Models & Intro to DB Architectures

Similar documents
Models & Intro to DB Architectures

SQL & intro to db architectures

data systems 101 prof. Stratos Idreos class 2

column-stores basics

basic db architectures & layouts

column-stores basics

from bits to systems

data systems 101 prof. Stratos Idreos class 2

HOW INDEX TO STORE DATA DATA

complex plans and hybrid layouts

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

class 17 updates prof. Stratos Idreos

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

class 13 scans vs indexes prof. Stratos Idreos

class 11 b-trees prof. Stratos Idreos

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (The Course Trailer )

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Modern Database Systems CS-E4610

CAS CS 460/660 Introduction to Database Systems. Fall

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

CS425 Midterm Exam Summer C 2012

class 8 b-trees prof. Stratos Idreos

Chapter 1: Introduction

Modern Database Systems Lecture 1

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

systems & research project

CMPT 354: Database System I. Lecture 1. Course Introduction

Overview. CS165: Project Document. The goal of the project is to design and build a main memory optimized column store.

Database Technology Introduction. Heiko Paulheim

CMPT 354: Database System I. Lecture 3. SQL Basics

Evolution of Database Systems

The Relational Model Constraints and SQL DDL

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

Who we are: Database Research - Provenance, Integration, and more hot stuff. Boris Glavic. Department of Computer Science

CSCI1270 Introduction to Database Systems

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

CSC 453 Database Technologies. Tanu Malik DePaul University

EECS 647: Introduction to Database Systems

Introduction to Data Management. Lecture #2 Intro II & Data Models I

Standard stuff. Class webpage: cs.rhodes.edu/db Textbook: get it somewhere; used is fine. Prerequisite: CS 241 Coursework:

class 10 b-trees 2.0 prof. Stratos Idreos

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Relational Algebra for sets Introduction to relational algebra for bags

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CS425 Fall 2016 Boris Glavic Chapter 2: Intro to Relational Model

relational Key-value Graph Object Document

LAB 2 Notes. Conceptual Design ER. Logical DB Design (relational) Schema Refinement. Physical DD

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

CS 245: Database System Principles

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

DATABASE MANAGEMENT SYSTEMS

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Lecture 16. The Relational Model

class 17 updates prof. Stratos Idreos

CMPSCI 645 Database Design & Implementation

Lecture 3: SQL Part II

EECS 482 Introduction to Operating Systems

CSE 344 JANUARY 3 RD - INTRODUCTION

Announcements (September 21) SQL: Part III. Triggers. Active data. Trigger options. Trigger example

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

NJIT Department of Computer Science PhD Qualifying Exam on CS 631: DATA MANAGEMENT SYSTEMS DESIGN. Summer 2012

Database Design. Goal: specification of database schema Methodology:

Column-Stores vs. Row-Stores: How Different Are They Really?

CMPT 354: Database System I. Lecture 2. Relational Model

Query Processing & Optimization. CS 377: Database Systems

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

CMPT 354: Database System I. Lecture 11. Transaction Management

CSCB20. Introduction to Database and Web Application Programming. Anna Bretscher Winter 2017

class 9 fast scans 1.0 prof. Stratos Idreos

LAB 3 Notes. Codd proposed the relational model in 70 Main advantage of Relational Model : Simple representation (relationstables(row,

The Relational Model. Week 2

CS145: Intro to Databases. Lecture 1: Course Overview

Exam code: Exam name: Database Fundamentals. Version 16.0

MIT Database Management Systems Lesson 01: Introduction

NoSQL database and its business applications

Introduction and Overview

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Score. 1 (10) 2 (10) 3 (8) 4 (13) 5 (9) Total (50)

Data about data is database Select correct option: True False Partially True None of the Above

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

class 6 more about column-store plans and compression prof. Stratos Idreos

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Midterm 1: CS186, Spring 2012

MongoDB Schema Design

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

CSC 261/461 Database Systems Lecture 19

CS W Introduction to Databases Spring Computer Science Department Columbia University

class 20 updates 2.0 prof. Stratos Idreos

Transcription:

class 3 Models & Intro to DB Architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

welcome brave cs165 students! 42+44 Stratos Idreos 2 /49

NO LAPTOP/PHONE POLICY class is based on participation! we will bring a copy of the slides for every one in each time class so you can follow and keep notes + there is enough evidence that laptops and phones slow you down (check syllabus for more info) Stratos Idreos 3 /49

applications sql database kernel algorithms/operators cpu memory data data data disk Stratos Idreos 4 /49

a simple example assume an array of N integers: find all positions where value>x qualifying positions select operator exists in all systems: sql, nosql, newsql data even the simplest tasks are actually far from trivial no obvious solutions; just a taste of what to come Stratos Idreos 5 /49

what we will do design data structures and algorithms = access methods study design tradeoffs with respect to modern hardware application requirements & complete system design next couple of weeks very basics of data models and languages (today) basics of db architectures (today next couple of classes) column-stores and hardware-conscious designs Stratos Idreos 6 /49

Stratos Idreos 7 /49

for project: install MonetDB & PostgreSQL/MySQL and play with SQL explain + SQL query to see query plans in MonetDB (use read-only mode) repeat throughout the semester compare with your system in terms of performance this is part of final deliverable & several logistics and tools (code.seas, gdb, perf, valgrind, testing infrastructure) do not underestimate this Stratos Idreos 8 /49

what should I be doing? do P0 register to Piazza and stay logged in check syllabus/website carefully check project timeline and plan around it keep up with reading (goes fast) register for notes today (tmr we will do assignments) come to Labs and OH frequently Stratos Idreos 9 /49

how to read research papers 1) abstract-intro-related work-conclusions what is the problem why is it important why past solutions do not work what is the core idea what is success 2) core part-analysis basic idea what matters any gaps? 3) follow a few citations and repeat goal: by the end of the semester understand these papers fully Stratos Idreos 10/49

we want you to have fun! data systems is an exciting field! tell us how you are keeping up tell us what you need to better follow the class tell us your suggestions about how to improve the class Stratos Idreos 11/49

photo of Anand here the evolution of data-driven applications Anand Rajaraman (3) use social data (social networks) use own data (1) (recommendations) (2) use public data (search) (4) all of the above (all of the above) (5) training data (machine learning) Stratos Idreos 12/49

food for (big data-driven) thought Anand Rajaraman amazon google Cyborg the cyborg knows & manages all your models linkedin bank X Stratos Idreos 13/49

food for (big data-driven) thought the biggest transportations companies have no cars the biggest data companies have no data the biggest hotel companies have no hotels more about this and other thoughts in our first brainstorming session (TBA) Stratos Idreos 14/49

logical design physical design system design Stratos Idreos 15/49

essential steps in using a database system experts/system admins clean schema load tune query user/apps Stratos Idreos 16/49

relational model+sql database professors (id,name, ) key table/relation courses (id,name, profid, ) column/attribute students (id,name, ) create table for professors: create table professors (id:integer, name: char(40), telephone: char(10), ) insert into professors (76897689, john smith, ) give me the names of all students: select name from students where GPA>3.0 Stratos Idreos 17/49

employee (id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int) data schema (1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, NULL, city9, salary9) SQL:insert into employee (1, name1, office1, tel1, city1, salary1) cardinality=9 value does not exist Stratos Idreos 18/49

relational model+sql database professor (id,name, ) enrolled (studentid, courseid, ) course (id,name, profid, ) foreign key student (id,name, ) give me all students enrolled in cs165 select student.name from student, enrolled, course where course.name= cs165 and enrolled.courseid=course.id and student.id=enrolled.studentid join Stratos Idreos 19/49

enrolled (studentid,courseid, ) student (id,name, ) how do we join Stratos Idreos 20/49

normalization say schema about university db contains one table AllData(student ID,student name,student address, course name, grade, professor name, professor ID, professor telephone, ) good duplicates - tons of data - updates - but no joins Stratos Idreos 21/49

star schema dimension table 1 (id1, ) fact table (id1,id2, ) dimension table 2 (id2, ) Stratos Idreos 22/49

snowflake schema Stratos Idreos 23/49

Alex Liu, class 2016 165/265 project adaptive denormalization 1st prize in ACM SIGMOD undergrad research competition Special Interest Group on Management of Data Stratos Idreos 24/49

NORMALIZED DATA DENORMALIZED DATA only fast scans but expensive to create, storage & updates good for updates, storage but we need joins Stratos Idreos 25/49

adaptive denormalization continuously physically reorganize data based on incoming query patterns (joins) denormalized fragments queries only need to fast scan normalized data possible denormalized space Stratos Idreos 26/49

constraints create table employee (id:integer, name:varchar(50) not null, must have a value office:char(5), at most 5 chars telephone:char(10), city:varchar(30), salary:integer, primary key (id) must be unique check (salary<100000)) must not become rich when and how do we enforce constraints Stratos Idreos 27/49

more SQL examples aggregations select max(gpa),avg(gpa),min(gpa) from students math select R.a - R.b + R.c from R nested select * from R where R.a IN (select b from S where C<10) set ops select * from R where a =10 UNION select * from B where b =20 Stratos Idreos 28/49

select avg(gpa), class, major from students where GPA>3.0 and class>1990 group by class, major order by class Stratos Idreos 29/49

base table Employee (id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int) view to be used by managers in Berlin Employee-Berlin-Manager select * from employee where city= berlin how should we store views view to be used by all employees in Berlin Employee-Berlin-All select id,name,city,office from employee where city= berlin Stratos Idreos 30/49

why are models great? Stratos Idreos 31/49

other models? Stratos Idreos 32/49

it is summer 2017 - now you know all about data systems you are building an augmented reality startup using Google Glass people wearing Google Glass can tag places/objects - voice/image recognition works fine tagging means assigning values, comments, etc to an object you can then query this data - again assume voice recognition works fine and a black box translates natural language to SQL how does the schema of your app look like? (tables, attributes, keys, relationships) (assume a limited working environment/features, say walking around Harvard square/yard) describe 2 interesting queries in SQL Stratos Idreos 33/49

a possible example q1: get all places where jenny said awesome q2: get all users that like what I like and are close by comment (id,user_id,oject_id,text, ) likes_comment (user_id,comment_id) object (id,name,location,telephone,date,url,color,taste, many more) user (id,name,location,device, ) likes_object (user_id,oject_id,) trust (user_id,user_id) select user.name, user.location select object.location from user, likes_object as L1, likes_object as from L2 object, user where L1.user_id=my_id and L1.object_id=L2.object_id where user.name and L2.user_id = jenny!=my_id and and user.id=l2.user_id and close(user.location,mylocation)=true comment.user_id=user.id and comment.text LIKE %awesome% Stratos Idreos 34/49

how do we store the object table? what if we want to add another kind of object? object (id,name,location,telephone,date,url,color,taste, many more) open research and business problem Stratos Idreos 35/49

design logical design physical design system design Stratos Idreos 36/49

declarative interface ask what you want so do db systems just work? db system Stratos Idreos 37/49

declarative interface ask what you want indexes/views/tuning knobs DBA but db cracking, adaptive* ideas db system Stratos Idreos 38/49

essential steps in using a database system experts/system admins clean schema load tune query user/apps Stratos Idreos 39/49

design logical design physical design system design next up: db architectures 101 Stratos Idreos 40/49

applications sql algorithms/operators database kernel design/implement numerous possible algorithms + data representations choose the best data source, algorithms and path for each query data data data Stratos Idreos 41/49

select min(a) from R where B<10 and C<80 algorithms/operators database kernel data data data parser optimizer execution storage Stratos Idreos 42/49

applications sql parser optimizer in/out admission execution storage database kernel Stratos Idreos 43/49

applications sql client programs thread 1 thread 2 thread 3 thread 4 thread 5 db program thread pool database kernel Stratos Idreos 44/49

applications sql parser in/out database kernel cpu optimizer thread pool memory execution transactions disk storage buffer pool is it good to have modules Stratos Idreos 45/49

Notes to remember models help create the right abstractions models help create >>1 applications over the same data we first need to clean, structure and load data data systems consist of software components Stratos Idreos 46/49

reading textbook: chapters 1, 3 (-3.5), 5 (-5.8,-5.9) intro + relational model + SQL browse the Fourth Paradigm Stratos Idreos 47/49

readings for next 3 classes Architecture of a Database System (Sections 1,2,3,4) by J. Hellerstein, M. Stonebraker and J. Hamilton The Design and Implementation of Modern Column-store Database Systems by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden Stratos Idreos 48/49

class 3 Models & Intro to DB Architectures DATA SYSTEMS prof. Stratos Idreos