Seminar WS 15/16
Big Data
Winter Semester 2015/16
Contact: Weiping Qu
Presentation Schedule
Group 1:
Wednesday, Feb 10th, 2016 | ||
9:00-10:00 | SEMA-JOIN: joining semantically-related tables using big table corpora | Florian Haubold |
10:00-11:00 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | Moiz Hasan |
11:00-12:00 | BigDansing: A System for Big Data Cleansing | Jeffrey Joseph |
lunch break | ||
13:30-14:30 | Rethinking serializable multiversion concurrency control | Gagan Gowda |
14:30-15:30 | In-memory performance for big data | Ganesh Harugeri |
15:30-16:30 | Let's talk about storage & recovery methods for non-volatile memory database systems | Zubair Jaleel |
Thursday, Feb 11th, 2016 | ||
9:30-10:30 | Scalable Distributed Stream Join Processing | Soumen Pramanik |
10:30-11:30 | Persistent Data Sketching | Maximilian van den Berg |
lunch break | ||
13:00-14:00 | Spark SQL: Relational data processing in Spark | Saman Ardalan |
14:00-15:00 | Continuous cloud-scale query optimization and processing | Nanda Vidya |
15:00-16:00 | SQLGraph: An Efficient Relational-Based Property Graph Store | Johannes Götz |
Group 2:
Monday, Feb 15th, 2016 | ||
9:00-10:00 | SEMA-JOIN: joining semantically-related tables using big table corpora | Divya Venkatesan |
10:00-11:00 | KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing | Alireza Koochali |
11:00-12:00 | BigDansing: A System for Big Data Cleansing | Stefan Braun |
lunch break | ||
13:30-14:30 | Rethinking serializable multiversion concurrency control | Pascal Stahl |
14:30-15:30 | In-memory performance for big data | Max Gilbert |
15:30-16:30 | Let's talk about storage & recovery methods for non-volatile memory database systems | Felipe Schmidt |
Tuesday, Feb 16th, 2016 | ||
9:00-10:00 | Spark SQL: Relational data processing in Spark | Jahanzeb Khan |
10:00-11:00 | Continuous cloud-scale query optimization and processing | Zahra Zamansani |
11:00-12:00 | SQLGraph: An Efficient Relational-Based Property Graph Store | Philipp Thau |
lunch break | ||
13:30-14:30 | Scalable Distributed Stream Join Processing | Priyamvada Shankar |
14:30-15:30 | Persistent Data Sketching | Fabian Neffgen |
News
Registration: You can sign up for this term's seminar through this link until April 21st 2015. You will be informed of the registration result after the registration deadline. Please follow the news published on this website for details on the date and place of the kick-off meeting. Attendance at the kick-off meeting is mandatory!
Date | Announcement |
---|---|
March 20, 2016 | Results
You can find the final results here.
|
Feb 14, 2016 | The template for the written report can be downloaded here. Preview here. |
Oct 15, 2015 | The registration results have been sent to all students.
If you registered for our seminar before Oct 1st and have not received any feedback yet, please contact me.
|
Oct 1, 2015 | Registration is over.
|
Sept 16, 2015 | Results of seminar SS15
The results can be found here.
|
Aug 13, 2015 | Registration
You can sign up for this term's seminar through this link until September 30th 2015. You will be informed of the registration result at the beginning of Oct. 2015, after the registration deadline.
The kick-off meeting will take place on Wednesday, October 28, 2015 at 2:00pm in 36/336. Attendance at the kick-off meeting is mandatory!
|
Overview
In this term's seminar, we will look into recent trends in the research area of Big Data.
Prerequisites
Participants should have successfully attended the lecture Datenbankanwendung (database application) or an equivalent. Having attended the core course Informationssysteme (information systems) is also recommended.
Requirements for Certificate
Time period | Regulation |
---|---|
before Kick-off | Organization:
- Sept. 30th: end of registration
- Oct. 1st ~ Oct. 23rd:
  * registration result notification
  * collection of students' preferences on the topics
  * topic assignment notification |
Kick-off | Takes place on Oct. 28th; attendance is mandatory. The seminar starts! |
Our seminar provides students with a place where assigned readings are discussed, questions can be raised, and debates can be conducted. In general, each participant has to prepare presentation slides, give a talk, submit a report and, above all, join the discussion. Hence, attendance at all final presentations is mandatory. Your final grade for the seminar is derived from the following aspects:
- active participation in the final discussion
- quality of your talk, incl. slides and Q&A
- understanding of your topic, incl. basic theory and the paper-specific idea/algorithm
- active participation during the entire seminar, i.e., participation in the review and moderation phases | |
Kick-off ~ the end of November | Understanding seminar paper:
The first month is used to read and understand the assigned seminar paper.
Each student has to make a first appointment with their tutor to discuss the outline of the presentation. There is no need to have slides ready, but you should have a solid understanding of the paper and concrete ideas (e.g., a bullet point list) on how you want to organize the talk. You are responsible for scheduling meetings with your tutor.
Meanwhile, each student prepares a short written summary of around 5000-5500 characters (incl. whitespace) as an introduction to the relevant foundations of the seminar paper. It should address the following questions and hints:
- What is the research field of the paper, e.g., query optimization in database systems, graph data mining, social network analysis, or index structures?
- What is the motivation of the paper, i.e., why are the results presented in the paper useful?
- What does the paper propose? A new system, algorithms, theory, experimental evaluation, or any subset of these?
- More specifically, what is the main idea of the paper? For instance, it proposes an algorithm to allow efficient search in high-dimensional data.
- What other work exists, and how do the authors set their work apart from existing papers on a high level? List 2-3 papers and read (at least) their abstracts.
- Describe three things you like about the paper and three things you dislike.
- What questions would you ask the authors if they were available for a discussion?
This summary is to be sent to the tutor at least 2 days before the first meeting. The idea is that this summary is going to be used as the basis for discussions, for the preparation of the talk, and eventually also as the basis for the final report.
In addition to these generic questions, the tutor will provide 3-5 additional questions that are specific to the paper. |
in December | Slides preparation:
Point out advantages or potential weaknesses of the work covered in your presentation. If you are unsure about what to present, talk to your tutor. Note that, even though relevant presentations may be available on the web, we expect you to prepare your own slides (which may, of course, be inspired by the original slides).
Send your slides, as complete as possible (i.e., with no TODOs left), to your tutor and to your future peer reviewer (explained in the next part), and discuss them with both. Otherwise, your talk may be canceled. |
in January | Review process:
The review process consists of two parts.
The first part is the peer review. Your slides will be reviewed by a randomly selected fellow student who also participates in this term's seminar. In turn, you review another student's slides and give him/her feedback on the work. The reviewer pairings are assigned randomly by us. The second part is the review by your tutor, who will meanwhile give you professional feedback on your complete (no TODOs left) slides. By merging the feedback from both your peer and your tutor, you should produce the final version of your presentation slides before your talk. |
February 8th ~ 12th | Final presentation week:
The 45-min talk should be a mixture (e.g., 20:80) of an introduction to relevant foundations and the details of the paper. If two or more papers have identical or highly related foundations, the introductory talk is shared among the students.
Each presentation is followed by approximately 15 minutes of discussion. The discussion is moderated by your peer student. The moderator's role is to provide interesting input (such as observations, questions, related work) for the discussion and, in general, to enable a constructive discussion.
After the seminar, the final slides will be collected. |
by end of February | Completing report:
Extend the previous written summary to a short report (no longer than 4 pages) about your topic. A LaTeX template and its preview are provided. The report should concisely summarize the article and point out its strengths and weaknesses. |
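The length guideline for the written summary (around 5000-5500 characters, whitespace included) can be checked before sending the text to the tutor. The following is only a minimal sketch: the function name, the default bounds as hard limits, and the file name in the usage comment are assumptions made for illustration, and the guideline itself says "around", so a small deviation is presumably acceptable.

```python
def summary_length_ok(text, low=5000, high=5500):
    """Return (character count, within-range flag); whitespace counts as characters."""
    count = len(text)
    return count, low <= count <= high


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_length.py summary.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            count, ok = summary_length_ok(handle.read())
        print(f"{count} characters: {'within' if ok else 'outside'} the 5000-5500 range")
```

Note that `len` counts every character of the decoded text, including spaces and line breaks, which matches the "incl. whitespace" wording above.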
Organisation
You will be assigned one of the topics below (coming soon), do a literature review, and write a seminar report in either German or English. You will present your results in a 45-minute talk.
Assigned Papers
Paper 1 | Armbrust, Michael, et al. "Spark SQL: Relational data processing in Spark." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 2 | Lin, Qian, et al. "Scalable Distributed Stream Join Processing." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 3 | He, Yeye, Kris Ganjam, and Xu Chu. "SEMA-JOIN: joining semantically-related tables using big table corpora." Proceedings of the VLDB Endowment 8.12 (2015): 1358-1369. | |
Paper 4 | Chu, Xu, et al. "KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 5 | Khayyat, Zuhair, et al. "BigDansing: A System for Big Data Cleansing." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 6 | Bruno, Nicolas, Sapna Jain, and Jingren Zhou. "Continuous cloud-scale query optimization and processing." Proceedings of the VLDB Endowment 6.11 (2013): 961-972. | |
Paper 7 | Sun, Wen, et al. "SQLGraph: An Efficient Relational-Based Property Graph Store." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 8 | Wei, Zhewei, et al. "Persistent Data Sketching." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |
Paper 9 | Faleiro, Jose M., and Daniel J. Abadi. "Rethinking serializable multiversion concurrency control." Proceedings of the VLDB Endowment 8.1 (2014) | |
Paper 10 | Graefe, Goetz, et al. "In-memory performance for big data." Proceedings of the VLDB Endowment 8.1 (2014): 37-48. | |
Paper 11 | Arulraj, Joy, Andrew Pavlo, and Subramanya R. Dulloor. "Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems." Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015. | |