Models & Intro to DB Architectures

Size: px
Start display at page:

Download "Models & Intro to DB Architectures"

Transcription

1 class 3 Models & Intro to DB Architectures prof. Stratos Idreos

2 welcome brave cs165 students! Stratos Idreos 2 /55

3 NO LAPTOP/PHONE POLICY class is based on participation! we will bring a copy of the slides for every one in each time class so you can follow and keep notes + there is enough evidence that laptops and phones slow you down (check syllabus for more info) Stratos Idreos 3 /55

4 slides are not notes! slides are mainly there to trigger discussion note keeping is your task: starting class 3 we will do collaborative note taking Stratos Idreos 4 /55

5 Will cs165 help me with using (big, nosql, relational) data systems Yes it will but if that is your only goal, there are easier ways to do this Stratos Idreos 5 /55

6 applications sql database kernel algorithms/operators cpu memory data data data disk Stratos Idreos 6 /55

7 what should I be doing? do P0 register to Piazza and stay logged in check syllabus/website carefully check project timeline and plan around it keep up with reading (goes fast!) register for notes today (tmr we will do assignments) come to Labs and OH frequently Stratos Idreos 7 /55

8 for the semester project: install MonetDB & PostgreSQL/MySQL and play with SQL explain + SQL query to see query plans in MonetDB (use read-only mode) repeat throughout the semester compare with your system in terms of performance this is part of final deliverable & several logistics and tools (code.seas, gdb, perf, valgrind, testing infrastructure) do not underestimate this Stratos Idreos 8 /55

9 Read completely and more than once Read Browse Extra Read intro/related work, browse main part Read only if interested Stratos Idreos 9 /55

10 From the syllabus: In this class many of the reading assignments are recent research papers. Unless mentioned otherwise in class, you are not expected to read and understand all these research papers in extreme detail. The main purpose is for you to get exposed to recent ideas and concepts and get inspiration about new opportunities and what is coming in the future. We expect you to read and understand well the abstract, introduction and related work parts of all papers. For the rest of the material in a paper, i.e., the main technical part and the analysis we expect you to have a high level idea of what this does unless we explicitly cover in detail the exact techniques in class. Our goal is that by the end of the semester you will have enough background to be able to pick up any of these papers again and understand it fully! Of course if you want to discuss any of those papers in more detail we will happily do so during office hours. Stratos Idreos 10/55

11 how to read research papers 1) abstract-intro-related work-conclusions what is the problem why is it important why past solutions do not work what is the core idea what is success 2) core part-analysis basic idea what matters any gaps? 3) follow a few citations and repeat goal: by the end of the semester understand these papers fully Stratos Idreos 11/55

12 we want you to have fun! data systems is an exciting field! tell us how you are keeping up tell us what you need to better follow the class tell us your suggestions about how to improve the class Stratos Idreos 12/55

13 next few weeks: (1) data models and languages (today) (2) db architectures (today & next few classes) (3) column-stores & hardware-conscious designs Stratos Idreos 13/55

14 thousands of users - tons of data - frequent updates - tons of queries analytics in the background for game intelligence = the ultimate data-driven challenge Extra: Scaling Games to Epic Proportions By W. White, A. Demers, C. Koch, J. Gehrke and R. Rajagopalan ACM SIGMOD International Conference on Management of Data, 2007 Stratos Idreos 14/55

15 a simple example assume an array of N integers: find all positions where value>x qualifying positions select operator exists in all systems: sql, nosql, newsql data even the simplest tasks are actually far from trivial no obvious solutions; just a taste of what to come Stratos Idreos 15/55

16 philosophy/ sciences ~1960s declarative/ physical/logical independence recovery/ consistency ~2017 papyrus/ paper dbs dbs dbs vs OSs dbs dbs vs nosql dbs vs hand code history/timeline Stratos Idreos 16/55

17 db complex legacy tuning expensive nosql simple clean just enough abstractions declarative processing it is all the same as apps become more complex as apps need to be more scalable newsql->db memory hierarchy I/O vs CPU modeling optimization data layout concepts tradeoffs Stratos Idreos 17/55

18 photo of Anand here the evolution of data-driven applications Anand Rajaraman (3) use social data (social networks) use own data (1) (recommendations) use public data (2) (search) (4) all of the above (all of the above) (5) training data (machine learning) Stratos Idreos 18/55

19 food for (big data-driven) thought Anand Rajaraman amazon google Cyborg the cyborg knows & manages all your models linkedin bank X Stratos Idreos 19/55

20 food for (big data-driven) thought the biggest transportations companies have no cars the biggest data companies have no data the biggest hotel companies have no hotels more about this and other thoughts in our first brainstorming session (TBA) Stratos Idreos 20/55

21 logical design physical design system design Stratos Idreos 21/55

22 essential steps in using a database system experts/system admins clean schema load tune query user/apps Stratos Idreos 22/55

23 professors (id,name, ) key table/relation relational model+sql courses (id,name, profid, ) column/attribute database students (id,name, ) create table for professors: create table professors (id:integer, name: char(40), telephone: char(10), ) insert into professors ( , john smith, ) give me the names of all students: select name from students where GPA>3.0 Stratos Idreos 23/55

24 employee (id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int) data schema (1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, NULL, city9, salary9) SQL:insert into employee (1, name1, office1, tel1, city1, salary1) cardinality=9 value does not exist Stratos Idreos 24/55

25 professor (id,name, ) enrolled (studentid, courseid, ) relational model+sql course (id,name, profid, ) foreign key database student (id,name, ) give me all students enrolled in cs165 select student.name from student, enrolled, course where course.name= cs165 and enrolled.courseid=course.id and student.id=enrolled.studentid join Stratos Idreos 25/55

26 enrolled (studentid,courseid, ) student (id,name, ) how do we join Stratos Idreos 26/55

27 normalization say schema about university db contains one table AllData(student ID,student name,student address, course name, grade, professor name, professor ID, professor telephone, ) good duplicates - tons of data - updates - but no joins Stratos Idreos 27/55 query : no joins try to insert data now try to update professor data

28 star schema dimension table 1 (id1, ) fact table (id1,id2, ) dimension table 2 (id2, ) Stratos Idreos 28/55

29 snowflake schema Stratos Idreos 29/55

30 Alex Liu, class /265 project adaptive denormalization 1st prize in ACM SIGMOD undergrad research competition Special Interest Group on Management of Data Stratos Idreos 30/55

31 NORMALIZED DATA DENORMALIZED DATA only fast scans but expensive to create, storage & updates good for updates, storage but we need joins Stratos Idreos 31/55

32 adaptive denormalization continuously physically reorganize data based on incoming query patterns (joins) denormalized fragments queries only need to fast scan normalized data possible denormalized space Stratos Idreos 32/55

33 constraints create table employee (id:integer, name:varchar(50) not null, must have a value office:char(5), at most 5 chars telephone:char(10), city:varchar(30), salary:integer, primary key (id) must be unique check (salary<100000)) must not become rich when and how do we enforce constraints Stratos Idreos 33/55

34 more SQL examples aggregations select max(gpa),avg(gpa),min(gpa) from students math select R.a - R.b + R.c from R nested select * from R where R.a IN (select b from S where C<10) set ops select * from R where a =10 UNION select * from B where b =20 Stratos Idreos 34/55

35 select avg(gpa), class, major from students where GPA>3.0 and class>1990 group by class, major order by class Stratos Idreos 35/55

36 base table Employee (id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int) view to be used by managers in Berlin Employee-Berlin-Manager select * from employee where city= berlin Why is this useful How should we store views view to be used by all employees in Berlin Employee-Berlin-All select id,name,city,office from employee where city= berlin Stratos Idreos 36/55

37 why are data models great? Stratos Idreos 37/55 allow for abstraction

38 other models? Stratos Idreos 38/55 graph, RDF,

39 it is summer now you know all about data systems you are building an augmented reality startup using Google Glass people wearing Google Glass can tag places/objects - voice/image recognition works fine tagging means assigning values, comments, etc to an object you can then query this data - again assume voice recognition works fine and a black box translates natural language to SQL how does the schema of your app look like? (tables, attributes, keys, relationships) (assume a limited working environment/features, say walking around Harvard square) describe 2 interesting queries in SQL Stratos Idreos 39/55 user(id,first,last,dob,curlocation, ) comment(id,userid,objectid,text) object(id,location, all attributes of objects with nulls: telephone, year built, or only common attributes in object and store rest in extra tables with objid as both primary and foreign key) likesc(userid,commentid) likeso(userid,objid)

40 a possible example q1: get all places where jenny said awesome q2: get all users that like what I like and are close by comment (id,user_id,oject_id,text, ) likes_comment (user_id,comment_id) object (id,name,location,telephone,date,url,color,taste, many more) user (id,name,location,device, ) likes_object (user_id,oject_id,) trust (user_id,user_id) select user.name, user.location select object.location from user, likes_object as L1, likes_object as from L2 object, user where L1.user_id=my_id and L1.object_id=L2.object_id where user.name and L2.user_id = jenny!=my_id and and user.id=l2.user_id and close(user.location,mylocation)=true comment.user_id=user.id and comment.text LIKE %awesome% Stratos Idreos 40/55 tell them about dictionary compression - you do not do it in the schema - system does it underneath - better not let the designer worry about low level details - not always possible someone do an SQL query: select o.location from O,U where user.name = jenny and comment.user_id=user.id and comment.text LIKE %awesome% what the plan should look like? first the selection on comment or the join? select names,location from U, likes_o as L1, likes_o as L2 where L1.user_id=my_id and L1.object_id=L2.object_id and L2.user_id!=my_id and user.id=l2.user_id and close(user.location,mylocation)=true

41 how do we store the object table? what if we want to add another kind of object? object (id,name,location,telephone,date,url,color,taste, many more) open research and business problem sql vs nosql Stratos Idreos 41/55 we are going to have TONS of objects! Talk about nodal

42 design logical design physical design system design Stratos Idreos 42/55

43 declarative interface ask what you want so do db systems just work? db system Stratos Idreos 43/55

44 declarative interface ask what you want DBA indexes/views/tuning knobs but db cracking, adaptive* ideas db system Stratos Idreos 44/55

45 essential steps in using a database system experts/system admins clean schema load tune query user/apps Stratos Idreos 45/55

46 design logical design physical design system design next up: db architectures 101 Stratos Idreos 46/55

47 applications algorithms/operators sql database kernel design/implement numerous possible algorithms + data representations choose the best data source, algorithms and path for each query data data data Stratos Idreos 47/55 design best possible storage/access level with a specific API decision level whenever we make decisions - things can go wrong

48 select min(a) from R where B<10 and C<80 algorithms/operators database kernel data data data parser optimizer execution storage Stratos Idreos 48/55 parse like compiler flex, yacc, language grammar etc check catalog:maintain all meta, tables columns, access control : am I allowed to see this data etc who decides how to store the data? based on what?

49 applications sql parser optimizer in/out admission execution storage database kernel Stratos Idreos 49/55 simple listen send out with buffer admission control - memory load etc simple number restriction - assign priorities etc

50 applications sql client programs thread 1 thread 2 thread 3 thread 4 thread 5 db program thread pool database kernel Stratos Idreos 50/55 diff between program and thread - thread pool to avoid creating and destroying threads - databases had their own thread support - now use OS

51 applications sql parser in/out database kernel cpu memory disk optimizer execution storage thread pool transactions buffer pool is it good to have modules Stratos Idreos 51/55

52 Notes to remember models help create the right abstractions models help create >>1 applications over the same data we first need to clean, structure and load data data systems consist of software components Stratos Idreos 52/55

53 reading Read: textbook: chapters 1, 3 (-3.5), 5 (-5.8,-5.9) intro + relational model + SQL Browse: the Fourth Paradigm Stratos Idreos 53/55

54 readings for next few classes Read: Architecture of a Database System (Sections 1,2,3,4) by J. Hellerstein, M. Stonebraker and J. Hamilton Read: The Design and Implementation of Modern Columnstore Database Systems by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden Stratos Idreos 54/55

55 Models & Intro to DB Architectures class 3 DATA SYSTEMS prof. Stratos Idreos

Models & Intro to DB Architectures

Models & Intro to DB Architectures class 3 Models & Intro to DB Architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ welcome brave cs165 students! 42+44 Stratos Idreos 2 /49 NO LAPTOP/PHONE POLICY class is based

More information

SQL & intro to db architectures

SQL & intro to db architectures class 3 SQL & intro to db architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ welcome brave cs165 students! 35+62 Stratos Idreos 2 /55 guest lecture Laura Haas Data Systems

More information

data systems 101 prof. Stratos Idreos class 2

data systems 101 prof. Stratos Idreos class 2 class 2 data systems 101 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ big data V s (it is not about size only) volume velocity variety veracity actually none of that is really new

More information

basic db architectures & layouts

basic db architectures & layouts class 4 basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ videos for sections 3 & 4 are online check back every week (1-2 sections weekly) there is a schedule

More information

column-stores basics

column-stores basics class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ project description is now online First background info will be given this Friday and detailed lecture on Feb 21 Basic Readings

More information

column-stores basics

column-stores basics class 3 column-stores basics prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ Goetz Graefe Google Research guest lecture Justin Levandoski Microsoft Research projects option 1: systems project (now

More information

from bits to systems

from bits to systems class 2 from bits to systems prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ today logistics, goals, etc big data & systems (cont d) designing a data system algorithm: what can go wrong

More information

data systems 101 prof. Stratos Idreos class 2

data systems 101 prof. Stratos Idreos class 2 class 2 data systems 101 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ 2 classes per week - OH/Labs every day 1 presentation/discussion lead - 2 reviews each week research (or systems)

More information

HOW INDEX TO STORE DATA DATA

HOW INDEX TO STORE DATA DATA Stratos Idreos HOW INDEX DATA TO STORE DATA ALGORITHMS data structure decisions define the algorithms that access data INDEX DATA ALGORITHMS unordered [7,4,2,6,1,3,9,10,5,8] INDEX DATA ALGORITHMS unordered

More information

complex plans and hybrid layouts

complex plans and hybrid layouts class 7 complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ essential column-stores features virtual ids late tuple reconstruction (if ever) vectorized execution

More information

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Welcome to one

More information

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Chen Li 1 Today s Topics v Welcome to one of my biggest classes ever! v Read (and live by) the course wiki page: http://www.ics.uci.edu/~cs122a/

More information

CAS CS 460/660 Introduction to Database Systems. Fall

CAS CS 460/660 Introduction to Database Systems. Fall CAS CS 460/660 Introduction to Database Systems Fall 2017 1.1 About the course Administrivia Instructor: George Kollios, gkollios@cs.bu.edu MCS 283, Mon 2:30-4:00 PM and Tue 1:00-2:30 PM Teaching Fellows:

More information

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics! Welcome to my biggest

More information

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1 CS564: Database Management Systems Lecture 1: Course Overview Acks: Chris Ré 1 2 Big science is data driven. 3 Increasingly many companies see themselves as data driven. 4 Even more traditional companies

More information

Introduction to Data Management. Lecture #1 (The Course Trailer )

Introduction to Data Management. Lecture #1 (The Course Trailer ) Introduction to Data Management Lecture #1 (The Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Welcome to

More information

systems & research project

systems & research project class 4 systems & research project prof. HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS265/ index index knows order about the data data filtering data: point/range queries index data A B C sorted A B C initial

More information

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri data exploration not always sure what we are looking for (until we find it) data has always been big volume

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects

More information

CSCI1270 Introduction to Database Systems

CSCI1270 Introduction to Database Systems CSCI1270 Introduction to Database Systems with thanks to Prof. George Kollios, Boston University Prof. Mitch Cherniack, Brandeis University Prof. Avi Silberschatz, Yale University 1.1 What is a Database

More information

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook: CS 133: Databases Fall 2018 Lec 01 09/04 Introduction & Relational Model Data! Need systems to Data is everywhere Banking, airline reservations manage the data Social media, clicking anything on the internet

More information

class 13 scans vs indexes prof. Stratos Idreos

class 13 scans vs indexes prof. Stratos Idreos class 13 scans vs indexes prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ b-tree - dynamic tree - always balanced 35,50 35, 12,20 50, 1,2,3 12,15,17 20, Stratos Idreos 2 /24 select from

More information

Modern Database Systems CS-E4610

Modern Database Systems CS-E4610 Modern Database Systems CS-E4610 Aristides Gionis Michael Mathioudakis Spring 2017 what is a database? a collection of data what is a database management system?... a.k.a. database system software to store,

More information

class 11 b-trees prof. Stratos Idreos

class 11 b-trees prof. Stratos Idreos class 11 b-trees prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ Midway check-in: Two design docs tmr (Canvas) & tests on Sunday Next weekend: Lab marathon for midway check-in & tests

More information

class 8 b-trees prof. Stratos Idreos

class 8 b-trees prof. Stratos Idreos class 8 b-trees prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ I spend a lot of time debugging am I doing something wrong? maybe but probably not 1. learn to use gdb 2. after spending

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Introduction to Data Management Lecture #2 (Big Picture, Cont.) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Still hanging

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:

More information

class 6 more about column-store plans and compression prof. Stratos Idreos

class 6 more about column-store plans and compression prof. Stratos Idreos class 6 more about column-store plans and compression prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ query compilation an ancient yet new topic/research challenge query->sql->interpet

More information

class 5 column stores 2.0 prof. Stratos Idreos

class 5 column stores 2.0 prof. Stratos Idreos class 5 column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ worth thinking about what just happened? where is my data? email, cloud, social media, can we design systems

More information

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

Introduction to Data Management. Lecture #4 (E-R à Relational Design) Introduction to Data Management Lecture #4 (E-R à Relational Design) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Reminders:

More information

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)

More information

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on

More information

CMPT 354: Database System I. Lecture 1. Course Introduction

CMPT 354: Database System I. Lecture 1. Course Introduction CMPT 354: Database System I Lecture 1. Course Introduction 1 Outline Motivation for studying this course Course admin and set up Overview of course topics 2 Trend 1: Data grows exponentially 1 ZB = 1,

More information

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage

More information

Introduction to Data Management. Lecture #2 Intro II & Data Models I

Introduction to Data Management. Lecture #2 Intro II & Data Models I Introduction to Data Management Lecture #2 Intro II & Data Models I Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v The biggest

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Chapter 2: Intro. To the Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Database Management System (DBMS) DBMS is Collection of

More information

Shine a Light on Dark Data with Vertica Flex Tables

Shine a Light on Dark Data with Vertica Flex Tables White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,

More information

Overview. CS165: Project Document. The goal of the project is to design and build a main memory optimized column store.

Overview. CS165: Project Document. The goal of the project is to design and build a main memory optimized column store. Overview The goal of the project is to design and build a main memory optimized column store. By the end of the project you will have designed, implemented, and evaluated several key elements of a modern

More information

Modern Database Systems Lecture 1

Modern Database Systems Lecture 1 Modern Database Systems Lecture 1 Aristides Gionis Michael Mathioudakis T.A.: Orestis Kostakis Spring 2016 logistics assignment will be up by Monday (you will receive email) due Feb 12 th if you re not

More information

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh Databasesystemer, forår 2005 IT Universitetet i København Forelæsning 8: Database effektivitet. 31. marts 2005 Forelæser: Rasmus Pagh Today s lecture Database efficiency Indexing Schema tuning 1 Database

More information

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li Introduction to Data Management Lecture #2 (Big Picture, Cont.) Instructor: Chen Li 1 Announcements v We added 10 more seats to the class for students on the waiting list v Deadline to drop the class:

More information

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems SQL: Part III CPS 216 Advanced Database Systems Announcements 2 Reminder: Homework #1 due in 12 days Reminder: reading assignment posted on Web Reminder: recitation session this Friday (January 31) on

More information

CSC 453 Database Technologies. Tanu Malik DePaul University

CSC 453 Database Technologies. Tanu Malik DePaul University CSC 453 Database Technologies Tanu Malik DePaul University A Data Model A notation for describing data or information. Consists of mostly 3 parts: Structure of the data Data structures and relationships

More information

MIS Database Systems.

MIS Database Systems. MIS 335 - Database Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query in a Database

More information

BIS Database Management Systems.

BIS Database Management Systems. BIS 512 - Database Management Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query

More information

CS145: Intro to Databases. Lecture 1: Course Overview

CS145: Intro to Databases. Lecture 1: Course Overview CS145: Intro to Databases Lecture 1: Course Overview 1 The world is increasingly driven by data This class teaches the basics of how to use & manage data. 2 Key Questions We Will Answer How can we collect

More information

CMPT 354: Database System I. Lecture 3. SQL Basics

CMPT 354: Database System I. Lecture 3. SQL Basics CMPT 354: Database System I Lecture 3. SQL Basics 1 Announcements! About Piazza 97 enrolled (as of today) Posts are anonymous to classmates You should have started doing A1 Please come to office hours

More information

Announcements (September 21) SQL: Part III. Triggers. Active data. Trigger options. Trigger example

Announcements (September 21) SQL: Part III. Triggers. Active data. Trigger options. Trigger example Announcements (September 21) 2 SQL: Part III CPS 116 Introduction to Database Systems Homework #2 due next Thursday Homework #1 sample solution available today Hardcopies only Check the handout box outside

More information

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g.

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g. 4541.564; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room 301-203) ADVANCED DATABASES Copyright by S.-g. Lee Review - 1 General Info. Text Book Database System Concepts, 6 th Ed., Silberschatz,

More information

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept] Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept] 1. What is DBMS? A Database Management System (DBMS) is a program that controls creation, maintenance and use

More information

Who we are: Database Research - Provenance, Integration, and more hot stuff. Boris Glavic. Department of Computer Science

Who we are: Database Research - Provenance, Integration, and more hot stuff. Boris Glavic. Department of Computer Science Who we are: Database Research - Provenance, Integration, and more hot stuff Boris Glavic Department of Computer Science September 24, 2013 Hi, I am Boris Glavic, Assistant Professor Hi, I am Boris Glavic,

More information

TA hours and labs start today. First lab is out and due next Wednesday, 1/31. Getting started lab is also out

TA hours and labs start today. First lab is out and due next Wednesday, 1/31. Getting started lab is also out Announcements TA hours and labs start today. First lab is out and due next Wednesday, 1/31. Getting started lab is also out Get you setup for project/lab work. We ll check it with the first lab. Stars

More information

class 10 b-trees 2.0 prof. Stratos Idreos

class 10 b-trees 2.0 prof. Stratos Idreos class 10 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ CS Colloquium HV Jagadish Prof University of Michigan 10/6 Stratos Idreos /29 2 CS Colloquium Magdalena Balazinska

More information

Lesson 14 Transcript: Triggers

Lesson 14 Transcript: Triggers Lesson 14 Transcript: Triggers Slide 1: Cover Welcome to Lesson 14 of DB2 on Campus Lecture Series. Today, we are going to talk about Triggers. My name is Raul Chong, and I'm the DB2 on Campus Program

More information

relational Key-value Graph Object Document

relational Key-value Graph Object Document NoSQL Databases Earlier We have spent most of our time with the relational DB model so far. There are other models: Key-value: a hash table Graph: stores graph-like structures efficiently Object: good

More information

CS 245: Database System Principles

CS 245: Database System Principles CS 245: Database System Principles Notes 01: Introduction Peter Bailis CS 245 Notes 1 1 This course pioneered by Hector Garcia-Molina All credit due to Hector All mistakes due to Peter CS 245 Notes 1 2

More information

EECS 647: Introduction to Database Systems

EECS 647: Introduction to Database Systems EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 Summary of SQL Features Query SELECT-FROM-WHERE statements Set and bag operations Table expressions, subqueries Aggregation

More information

class 17 updates prof. Stratos Idreos

class 17 updates prof. Stratos Idreos class 17 updates prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES (value1,value2,value3,...)

More information

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang Course Web Site CMPSCI445: Information Systems Yanlei Diao and Haopeng Zhang University of Massachusetts Amherst http://avid.cs.umass.edu/courses/445/s2015/ or http://www.cs.umass.edu/~yanlei à Teaching

More information

LAB 2 Notes. Conceptual Design ER. Logical DB Design (relational) Schema Refinement. Physical DD

LAB 2 Notes. Conceptual Design ER. Logical DB Design (relational) Schema Refinement. Physical DD LAB 2 Notes For students that were not present in the first lab TA Web page updated : http://www.cs.ucr.edu/~cs166/ Mailing list Signup: http://www.cs.ucr.edu/mailman/listinfo/cs166 The general idea of

More information

DATABASE MANAGEMENT SYSTEMS

DATABASE MANAGEMENT SYSTEMS www..com Code No: N0321/R07 Set No. 1 1. a) What is a Superkey? With an example, describe the difference between a candidate key and the primary key for a given relation? b) With an example, briefly describe

More information

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT SQL: Aggregation, Joins, and Triggers CSC 343 Winter 2018 MICHAEL LIUT (MICHAEL.LIUT@UTORONTO.CA) DEPARTMENT OF MATHEMATICAL AND COMPUTATIONAL SCIENCES UNIVERSITY OF TORONTO MISSISSAUGA Aggregation Operators

More information

CMPT 354: Database System I. Lecture 11. Transaction Management

CMPT 354: Database System I. Lecture 11. Transaction Management CMPT 354: Database System I Lecture 11. Transaction Management 1 Why this lecture DB application developer What if crash occurs, power goes out, etc? Single user à Multiple users 2 Outline Transaction

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

CMPSCI445: Information Systems

CMPSCI445: Information Systems CMPSCI445: Information Systems Yanlei Diao and Haopeng Zhang University of Massachusetts Amherst Course Web Site http://avid.cs.umass.edu/courses/445/s2015/ or http://www.cs.umass.edu/~yanlei à Teaching

More information

Performance Issue : More than 30 sec to load. Design OK, No complex calculation. 7 tables joined, 500+ millions rows

Performance Issue : More than 30 sec to load. Design OK, No complex calculation. 7 tables joined, 500+ millions rows Bienvenue Nicolas Performance Issue : More than 30 sec to load Design OK, No complex calculation 7 tables joined, 500+ millions rows Denormalize, Materialized Views, Columnstore Index Less than 5 sec to

More information

class 20 updates 2.0 prof. Stratos Idreos

class 20 updates 2.0 prof. Stratos Idreos class 20 updates 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES

More information

Why are you here? Introduction. Course roadmap. Course goals. What do you want from a DBMS? What is a database system? Aren t databases just

Why are you here? Introduction. Course roadmap. Course goals. What do you want from a DBMS? What is a database system? Aren t databases just Why are you here? 2 Introduction CPS 216 Advanced Database Systems Aren t databases just Trivial exercises in first-order logic (says AI)? Bunch of out-of-fashion I/O-efficient indexes and algorithms (says

More information

Introduction and Overview

Introduction and Overview Introduction and Overview (Read Cow book Chapter 1) Instructor: Leonard McMillan mcmillan@cs.unc.edu Comp 521 Files and Databases Spring 2010 1 Course Administrivia Book Cow book New (to our Dept) More

More information

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre Overview of the Class and Introduction to DB schemas and queries Lois Delcambre 1 CS 386/586 Introduction to Databases Instructor: Lois Delcambre lmd@cs.pdx.edu 503 725-2405 TA: TBA Office Hours: Immediately

More information

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm) SQL: Part II CPS 116 Introduction to Database Systems Announcements (September 18) 2 Homework #1 due today (11:59pm) Submit in class, slide underneath my office door Sample solution available Thursday

More information

CMPSCI 645 Database Design & Implementation

CMPSCI 645 Database Design & Implementation Welcome to CMPSCI 645 Database Design & Implementation Instructor: Gerome Miklau Overview of Databases Gerome Miklau CMPSCI 645 Database Design & Implementation UMass Amherst Jan 19, 2010 Some slide content

More information

Recommendation Systems

Recommendation Systems Recommendation Systems + data privacy, and startup stories Filip Kaliszan Stanford University October 1, 2015 Filip Kaliszan studied computer science at Stanford - systems in undergrad, then Master s in

More information

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13 CS121 MIDTERM REVIEW CS121: Relational Databases Fall 2017 Lecture 13 2 Before We Start Midterm Overview 3 6 hours, multiple sittings Open book, open notes, open lecture slides No collaboration Possible

More information

Introduction to Data Management. Lecture #26 (Transactions, cont.)

Introduction to Data Management. Lecture #26 (Transactions, cont.) Introduction to Data Management Lecture #26 (Transactions, cont.) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW and exam

More information

NoSQL database and its business applications

NoSQL database and its business applications COSC 657 Db. Management Systems Professor: RAMESH K. Student: BUER JIANG Research paper NoSQL database and its business applications The original purpose has been contemporary web-expand dbs. The movement

More information

CSE544 Database Architecture

CSE544 Database Architecture CSE544 Database Architecture Tuesday, February 1 st, 2011 Slides courtesy of Magda Balazinska 1 Where We Are What we have already seen Overview of the relational model Motivation and where model came from

More information

Introduction to Data Management. Lecture #6 E-Rà Relational Mapping (Cont.)

Introduction to Data Management. Lecture #6 E-Rà Relational Mapping (Cont.) Introduction to Data Management Lecture #6 E-Rà Relational Mapping (Cont.) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 It s time again for...

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

CSE 444: Database Internals. Lecture 3 DBMS Architecture

CSE 444: Database Internals. Lecture 3 DBMS Architecture CSE 444: Database Internals Lecture 3 DBMS Architecture 1 Announcements On Wednesdays lecture will be in FSH 102 Lab 1 part 1 due tonight at 11pm Turn in using script in local repo:./turninlab.sh lab1-part1

More information

Data Collection, Simple Storage (SQLite) & Cleaning

Data Collection, Simple Storage (SQLite) & Cleaning Data Collection, Simple Storage (SQLite) & Cleaning Duen Horng (Polo) Chau Georgia Tech CSE 6242 A / CS 4803 DVA Jan 15, 2013 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

CSE 444: Database Internals. Lecture 3 DBMS Architecture

CSE 444: Database Internals. Lecture 3 DBMS Architecture CSE 444: Database Internals Lecture 3 DBMS Architecture CSE 444 - Spring 2016 1 Upcoming Deadlines Lab 1 Part 1 is due today at 11pm Go through logistics of getting started Start to make some small changes

More information

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems. Outline CMPSCI445: Information Systems Overview of databases and DBMS s Course topics and requirements Yanlei Diao University of Massachusetts Amherst Databases and DBMS s Commercial DBMS s A database

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

The Relational Model. Why Study the Relational Model? Relational Database: Definitions The Relational Model Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Microsoft, Oracle, Sybase, etc. Legacy systems in

More information

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

Introduction to Data Management. Lecture #4 (E-R Relational Translation) Introduction to Data Management Lecture #4 (E-R Relational Translation) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Today

More information

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas CS639: Data Management for Data Science Lecture 1: Intro to Data Science and Course Overview Theodoros Rekatsinas 1 2 Big science is data driven. 3 Increasingly many companies see themselves as data driven.

More information

class 9 fast scans 1.0 prof. Stratos Idreos

class 9 fast scans 1.0 prof. Stratos Idreos class 9 fast scans 1.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ 1 pass to merge into 8 sorted pages (2N pages) 1 pass to merge into 4 sorted pages (2N pages) 1 pass to merge into

More information

(Refer Slide Time: 01:25)

(Refer Slide Time: 01:25) Computer Architecture Prof. Anshul Kumar Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture - 32 Memory Hierarchy: Virtual Memory (contd.) We have discussed virtual

More information

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS BBM371- Data Management Lecture 1: Course policies, Introduction to DBMS 26.09.2017 Today Introduction About the class Organization of this course Introduction to Database Management Systems (DBMS) About

More information

What we already know. Late Days. CSE 444: Database Internals. Lecture 3 DBMS Architecture

What we already know. Late Days. CSE 444: Database Internals. Lecture 3 DBMS Architecture CSE 444: Database Internals Lecture 3 1 Announcements On Wednesdays lecture will be in FSH 102 Lab 1 part 1 due tonight at 11pm Turn in using script in local repo:./turninlab.sh lab1-part1 Remember to

More information

This handbook contains directions on using tools and resources in WebAccess at CSM.

This handbook contains directions on using tools and resources in WebAccess at CSM. WebAccess Handbook This handbook contains directions on using tools and resources in WebAccess at CSM. Contents Logging in to WebAccess... 2 Setting up your Shell... 3 Docking Blocks or Menus... 3 Course

More information

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall The Relational Model Chapter 3 Comp 521 Files and Databases Fall 2012 1 Why Study the Relational Model? Most widely used model by industry. IBM, Informix, Microsoft, Oracle, Sybase, etc. It is simple,

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query

More information

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm) Announcements (September 18) 2 SQL: Part II Homework #1 due today (11:59pm) Submit in class, slide underneath my office door Sample solution available Thursday Homework #2 assigned today CPS 116 Introduction

More information

Data Modeling using ER Model

Data Modeling using ER Model Data Modeling using ER Model Database design process - requirements collection and analysis: database requirements and functional requirements - conceptual DB design using a high-level model: easier to

More information