Advanced database systems (CSC2508), Fall 2016


Course Description

Over the last few years, a plethora of new data management systems and architectures have become mainstream. These systems vary widely in application focus and architecture, and differ significantly from traditional transactional database management systems and data warehouses. The goal of this course is to explore these systems, broadly characterised as noSQL/newSQL systems, and understand their strengths and limitations. We will also explore new trends in data management fueled by application needs, such as support for advanced analytics, stream processing systems, and main memory data processing.

This is a graduate seminar course, combining presentations by the instructor and by the participants. All participants are expected to actively engage in the course, be familiar with all the material presented, and drive the discussions for the part of the course they are responsible for. The course involves a project; more details will be available in class.

Announcements and clarifications

Administrivia

Instructor: Nick Koudas
Lectures: BA025
Office: BA 5240
TA: TBD
Office hours: by appointment
Instructor telephone: 416 946-5819
Instructor email: my last name @ uoft cs domain
Course web page: here

Course structure

At the start of every lecture, I will ask a member of the class to summarise the main topic that we will discuss. I would be interested to hear your thoughts on why this paper is important and whether there is anything you would challenge in the methodology or thesis of this paper. This is your chance to bring up any issues that demonstrate your deep understanding of the topic.

You are expected to actively participate in the discussions for each lecture and be fully familiar with the paper presented. For each system you are assigned to present, you are expected to do all the background research and collect all suitable references. You will share your slide deck with the class and make it available through this website, along with all the references you used. For each type of system presented, make sure you structure your presentation along the following lines:

  1. History
  2. Programming Language Interface
  3. Data Model + Operators
  4. Physical Structures (e.g., indexes, hash tables)
  5. Users + Applications + Target Workloads
  6. Transaction Support
  7. Elasticity + Data Redistribution
  8. Optimizations
  9. System Architecture
  10. What makes the system different from others?

Readings

Parallel Databases (9/12)

Introduction to noSQL (9/19)

SQL on Hadoop (9/26)

SQL on Hadoop Cont (10/3)

Spark (10/17,10/24,10/31)

Profiling (11/7)

Analytics I (11/14)

Analytics II (11/21)

Analytics III (11/28 and 12/5)

Project Proposals / Paper presentations (12/12)

Other Resources

Breakdown of marks

The course mark will be broken down into the categories listed below, with points assigned as indicated:

Weight | Item | Minimal mark | Moderate mark | High mark
30% | Participation | Present | Talkative | Insightful comments or questions
20% | Presentations | Factually correct | Designed and delivered well | Transmits effectively key points, implications, etc.
5% | Quality of feedback to peers | Focus on nitpicks and minutiae | Suggest incremental improvements | Identify structural strengths and flaws
45% | Final project | Unambitious and/or badly planned | Partially implemented and/or poorly presented | Implemented successfully with key learning points presented

Project proposals

The course is associated with a project. Proposed class projects will be described by the instructor. Feel free to discuss your ideas with the instructor and propose your own project. However, the project you propose HAS to be associated with the material in the class. This is very important and it is not up for discussion. The project should have a research component. A simple implementation using the systems we discuss in class, on a data set you find interesting, does not constitute a project for this class. The projects will be outlined in class, and descriptions will be distributed in class. Some background reading is associated with each project; the relevant technical papers will be distributed in class. The project proposal is due Nov 7 and should be a couple of pages at most. It should contain the following information: