Seminar WS 15/16

Big Data

Winter Semester 2015/16

Contact: Weiping Qu

Presentations Schedule

Group 1:

Wednesday, Feb 10th, 2016

9:00-10:00

SEMA-JOIN: joining semantically-related tables using big table corpora

Florian Haubold

10:00-11:00

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

Moiz Hasan

11:00-12:00

BigDansing: A System for Big Data Cleansing

Jeffrey Joseph

lunch break

13:30-14:30

Rethinking serializable multiversion concurrency control

Gagan Gowda

14:30-15:30

In-memory performance for big data

Ganesh Harugeri

15:30-16:30

Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems

Zubair Jaleel

Thursday, Feb 11th, 2016

9:30-10:30

Scalable Distributed Stream Join Processing

Soumen Pramanik

10:30-11:30

Persistent Data Sketching

Maximilian van den Berg

lunch break

13:00-14:00

Spark SQL: Relational data processing in Spark

Saman Ardalan

14:00-15:00

Continuous cloud-scale query optimization and processing

Nanda Vidya

15:00-16:00

SQLGraph: An Efficient Relational-Based Property Graph Store

Johannes Götz

 

 

Group 2:

Monday, Feb 15th, 2016

9:00-10:00

SEMA-JOIN: joining semantically-related tables using big table corpora

Divya Venkatesan

10:00-11:00

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

Alireza Koochali

11:00-12:00

BigDansing: A System for Big Data Cleansing

Stefan Braun

lunch break

13:30-14:30

Rethinking serializable multiversion concurrency control

Pascal Stahl

14:30-15:30

In-memory performance for big data

Max Gilbert

15:30-16:30

Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems

Felipe Schmidt

Tuesday, Feb 16th, 2016

9:00-10:00

Spark SQL: Relational data processing in Spark

Jahanzeb Khan

10:00-11:00

Continuous cloud-scale query optimization and processing

Zahra Zamansani

11:00-12:00

SQLGraph: An Efficient Relational-Based Property Graph Store

Philipp Thau

lunch break

13:30-14:30

Scalable Distributed Stream Join Processing

Priyamvada Shankar

14:30-15:30

Persistent Data Sketching

Fabian Neffgen

 

News

Registration: You can sign up for this term's seminar through this link until April 21st 2015. You will be informed of the registration result after the registration deadline. Please follow the news published on this website for more detailed information about the date and place of the kick-off meeting. Attendance at the kick-off meeting is mandatory! wq

Date

Announcement

March 20, 2016

Results

 

You can find the final results here.

 

wq

Feb 14, 2016

The template for the written report can be downloaded here. Preview here.

Oct 15, 2015

The registration results have been sent to all students.

 

If you registered for our seminar before Oct 1st and have not yet received any feedback, please contact me.

 

wq

Oct 1, 2015

Registration is over.

 

wq

Sept 16, 2015

Results of seminar SS15

 

The results can be found here.

 

wq

Aug 13, 2015

Registration

 

You can sign up for this term's seminar through this link until September 30th 2015.

You will be informed of the registration result at the beginning of Oct. 2015, after the registration deadline.

 

The kick-off meeting will take place on Wednesday, October 28, 2015 at 2:00pm in 36/336.

Attendance at the kick-off meeting is mandatory!

 

wq

Overview

In this term's seminar, we will look into recent trends in the research area of Big Data.

Prerequisites

Participants should have successfully attended the lecture Datenbankanwendung (database applications) or an equivalent. Having attended the core course Informationssysteme (information systems) is also recommended.

Requirements for Certificate

 

 

Time period

Regulation

before Kick-off

Organization:

 

- Sept. 30th: end of registration

- Oct. 1st ~ Oct. 23rd:

* registration result notification

* collection of preferences on assigned topics from students

* topic assignment notification

Kick-off

The kick-off meeting will take place on Oct. 28th; attendance is mandatory.

Seminar starts!

Our seminar provides students with a place where assigned readings are discussed, questions can be raised and debates can be conducted.

In general, each participant has to prepare presentation slides, give a talk, submit a report and, above all, join the discussions.

Hence, attendance at all final presentations is mandatory.

Your performance, i.e., the final grade for your seminar work, is derived from the following aspects:

- active participation in final discussion

- quality of your talk incl. slides, Q&A

- understanding of your topic incl. basic theory and paper-specific idea/algorithm

- active participation during entire seminar, i.e. participation in review & moderation phases

Kick-off ~ the end of November

Understanding seminar paper:

 

The first month is used to read and understand the assigned seminar paper.

 

Each student has to make a first appointment with their tutor to discuss the outline of the presentation. There is no need to have slides ready, but you should have a solid understanding of the paper and concrete ideas (e.g., a bullet point list) for how you want to organize the talk. You are responsible for scheduling meetings with your tutor.

 

Meanwhile, each student prepares a short written summary of around 5000-5500 characters (incl. whitespace) as an introduction to the relevant foundations of the assigned seminar paper. It should address the questions/hints listed below.

- What is the research field of the paper, e.g., query optimization in database systems, graph data mining, social network analysis, or index structures?

- What is the motivation of the paper, i.e., why are the results presented in the paper useful?

- What does the paper propose? A new system, algorithms, theory, experimental evaluation, or any subset of these?

- More specifically, what is the main idea of the paper? For instance, it proposes an algorithm to allow efficient search in high-dimensional data.

- What other work exists and how do the authors set their work apart from existing papers on a high level? List 2-3 papers and read (at least) their abstracts.

- Describe three things you like about the paper and three things you dislike.

- What questions would you ask the authors if they were available for a discussion?

 

This summary is to be sent to the tutor at least 2 days before the first meeting. The idea is that this summary is going to be used as the basis for discussions, for the preparation of the talk, and eventually also as the basis for the final report.

 

In addition to these generic questions, the tutor will provide 3-5 additional questions that are specific to the paper.

in December

Slides preparation:

 

Point out advantages or potential weaknesses of the work covered in your presentation. If you are unsure about what to present, talk to your tutor. Note that—even though relevant presentations may be available on the web—we expect that you prepare your own slides (which may be, of course, inspired by the original slides).

 

Send your slides, as complete as possible (i.e., with no TODOs left), to your tutor and to your future peer reviewer (explained in the next part), and discuss the slides with both of them. Otherwise, your talk may be canceled.

in January

Review process:

 

The review process consists of two parts.

 

The first part is called peer review. Your slides will be reviewed by a randomly selected peer student who also participates in the seminar this term. In turn, you should review another student's slides and give him/her feedback on his/her work. The peer pairings are randomly selected by us.

Meanwhile, your tutor will give you professional feedback on your complete (no TODOs left) slides.


By incorporating the feedback from both your peer student and your tutor, you should prepare the final version of your presentation slides before your talk.

February 8th ~ 12th

Final presentation week:

 

The 45-min talk should be a mixture (e.g., 20:80) of an introduction to relevant foundations and the details of the paper. If two or more papers have identical or highly related foundations, the introductory talk is shared among the students.

 

Each presentation is followed by approximately 15 minutes of discussion. The discussion is moderated by your peer student. The moderator's role is to provide interesting input (such as observations, questions, related work) for the discussion and, in general, to enable a constructive discussion.

 

After the seminar, the final slides will be collected.

by end of February

Completing report:

 

Extend the previous written summary to a short report (no longer than 4 pages) about your topic. A LaTeX template and its preview are provided. The report should concisely summarize the article and point out its strengths and weaknesses.

Organisation

You will be assigned one of the topics below (coming soon), do a literature review, and write a seminar report in either German or English. You will present your results in a 45-minute talk.

Assigned Papers

Paper 1

Armbrust, Michael, et al.

"Spark SQL: Relational data processing in Spark."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 2

Lin, Qian, et al.

"Scalable Distributed Stream Join Processing."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 3

He, Yeye, Kris Ganjam, and Xu Chu.

"SEMA-JOIN: joining semantically-related tables using big table corpora."

Proceedings of the VLDB Endowment 8.12 (2015): 1358-1369.

Paper 4

Chu, Xu, et al.

"KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 5

Khayyat, Zuhair, et al.

"BigDansing: A System for Big Data Cleansing."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 6

Bruno, Nicolas, Sapna Jain, and Jingren Zhou.

"Continuous cloud-scale query optimization and processing."

Proceedings of the VLDB Endowment 6.11 (2013): 961-972.

Paper 7

Sun, Wen, et al.

"SQLGraph: An Efficient Relational-Based Property Graph Store."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 8

Wei, Zhewei, et al.

"Persistent Data Sketching."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.

Paper 9

Faleiro, Jose M., and Daniel J. Abadi.

"Rethinking serializable multiversion concurrency control."

Proceedings of the VLDB Endowment 8.1 (2014)

Paper 10

Graefe, Goetz, et al.

"In-memory performance for big data."

Proceedings of the VLDB Endowment 8.1 (2014): 37-48.

Paper 11

Arulraj, Joy, Andrew Pavlo, and Subramanya R. Dulloor.

"Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems."

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.