CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

Similar documents
CS145: Intro to Databases. Lecture 1: Course Overview

CS145: Intro to Databases. Lecture 1: Course Overview

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Lecture 3: SQL Part II

Introduction and Overview

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

Introduction and Overview

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

Introduction to Data Management. Lecture #1 (Course Trailer )

CS 564: DATABASE MANAGEMENT SYSTEMS. Spring 2018

Introduction to Data Management. Lecture #1 (The Course Trailer )

Database Management Systems. Chapter 1

Database Systems ( 資料庫系統 ) Practicum in Database Systems ( 資料庫系統實驗 ) 9/20 & 9/21, 2006 Lecture #1

Introduction to Database Systems CS432. CS432/433: Introduction to Database Systems. CS432/433: Introduction to Database Systems

Introduction to Database Systems

CSE 303: Database. Teaching Staff. Lecture 01. Lectures: 1 st half - from a user s perspective. Lectures: 2 nd half - understanding how it works

Introduction to Database Management Systems

Introduction to Database Systems. Chapter 1. Instructor: . Database Management Systems, R. Ramakrishnan and J. Gehrke 1

CMPSCI 645 Database Design & Implementation

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

Introduction to Data Management. Lecture #2 Intro II & Data Models I

CS634 Architecture of Database Systems Spring Elizabeth (Betty) O Neil University of Massachusetts at Boston

CAS CS 460/660 Introduction to Database Systems. Fall

CS 245: Database System Principles

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

CMPT 354 Database Systems I. Spring 2012 Instructor: Hassan Khosravi

Database Management Systems Chapter 1 Instructor: Oliver Schulte Database Management Systems 3ed, R. Ramakrishnan and J.

9/8/2018. Prerequisites. Grading. People & Contact Information. Textbooks. Course Info. CS430/630 Database Management Systems Fall 2018

Database Management System

CMPT 354: Database System I. Lecture 1. Course Introduction

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

Announcements. PS 3 is out (see the usual place on the course web) Be sure to read my notes carefully Also read. Take a break around 10:15am

Introduction. Example Databases

Lecture 2: Introduction to SQL

CS 525 Advanced Database Organization - Spring 2017 Mon + Wed 1:50-3:05 PM, Room: Stuart Building 111

Score. 1 (10) 2 (10) 3 (8) 4 (13) 5 (9) Total (50)

CS430/630 Database Management Systems Spring, Betty O Neil University of Massachusetts at Boston

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Database Management Systems MIT Introduction By S. Sabraz Nawaz

Database Applications (15-415)

CSE 344 JANUARY 3 RD - INTRODUCTION

Database Concepts. CS 377: Database Systems

CSC 261/461 Database Systems Lecture 13. Fall 2017

Lecture 16. The Relational Model

CS275 Intro to Databases. File Systems vs. DBMS. Why is a DBMS so important? 4/6/2012. How does a DBMS work? -Chap. 1-2

CMPUT 291 File and Database Management Systems

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

Database Design and Implementation

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

Fall Semester 2002 Prof. Michael Franklin

Database Systems Management

CMPSCI445: Information Systems

Database Systems (INFR10070) Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Introduction to Data Management. Lecture #18 (Transactions)

Lecture 2 08/26/15. CMPSC431W: Database Management Systems. Instructor: Yu- San Lin

CSC 4710 / CSC 6710 Database Systems. Rao Casturi

Modern Database Systems CS-E4610

Avi Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concept, McGraw- Hill, ISBN , 6th edition.

Week 1 Part 1: An Introduction to Database Systems

CMPUT 391 Database Management Systems. Fall Semester 2006, Section A1, Dr. Jörg Sander. Introduction

CMSC 424 Database design Lecture 2: Design, Modeling, Entity-Relationship. Book: Chap. 1 and 6. Mihai Pop

Distributed Systems Intro and Course Overview

Database Management System Implementation. Who am I? Who is the teaching assistant? TR, 10:00am-11:20am NTRP B 140 Instructor: Dr.

CS 245: Principles of Data-Intensive Systems. Instructor: Matei Zaharia cs245.stanford.edu

Introduction to Data Management. Lecture #24 (Transactions)

John Edgar 2

Database Applications (15-415)

Lecture 1: Introduction

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

CS 241 Data Organization using C

CSE 544 Principles of Database Management Systems

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims

Michael Kifer, Arthur Bernstein, Philip M. Lewis. Solutions Manual

COURSE 1. Database Management Systems

Database Systems CSE 414

Lecture 12. Lecture 12: The IO Model & External Sorting

CSE 344 Final Review. August 16 th

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Introduction. Who wants to study databases?

CS 4400 Database Systems. Meeting 1: Introduction Brandon Myers University of Iowa

EECS 647: Introduction to Database Systems

CS 443 Database Management Systems. Professor: Sina Meraji

Textbook(s) and other required material: Raghu Ramakrishnan & Johannes Gehrke, Database Management Systems, Third edition, McGraw Hill, 2003.

Lectures 8 & 9. Lectures 7 & 8: Transactions

Introduction to Information Systems SSC, Semester 6

SQL: Transactions. Announcements (October 2) Transactions. CPS 116 Introduction to Database Systems. Project milestone #1 due in 1½ weeks

Relational Model and Relational Algebra

Intro to Transactions

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

Standard stuff. Class webpage: cs.rhodes.edu/db Textbook: get it somewhere; used is fine. Prerequisite: CS 241 Coursework:

INF 315E Introduction to Databases School of Information Fall 2015

Introduction to Database Systems CSE 444. Lecture #1 March 26, 2007

CMPT 354: Database System I. Lecture 11. Transaction Management

Transcription:

CS564: Database Management Systems Lecture 1: Course Overview Acks: Chris Ré 1

2

Big science is data driven. 3

Increasingly many companies see themselves as data driven. 4

Even more traditional companies 5

The world is increasingly driven by data This class teaches the basics of how to use & manage data. 6

Today s Lecture 1. Introduction, admin & setup ACTIVITY: Jupyter Hello World! 2. Overview of the relational data model ACTIVITY: SQL in Jupyter 3. Overview of DBMS topics: Key concepts & challenges 7

Section 1 1. Introduction, admin & setup 8

Section 1 What you will learn about in this section 1. Motivation for studying DBs 2. Administrative structure 3. Course logistics 4. Overview of lecture coverage 5. ACTIVITY: Jupyter Hello World! 9

Section 1 > Introduction New tech. Same Principles. 10

Section 1 > Introduction Why should you study databases? Mercenary-make more $$$: Startups need DB talent right away = low employee # Massive industry Intellectual: Science: data poor to data rich No idea how to handle the data! Fundamental ideas to/from all of CS: Systems, theory, AI, logic, stats, analysis. Many great computer systems ideas started in DB. 11

Section 1 > Introduction What this course is (and is not) Discuss fundamentals of data management How to design databases, query databases, build applications with them. How to debug them when they go wrong! Not how to be a DBA or how to tune Oracle 12g. We ll cover how database management systems work And some basic principles of how to build them 12

Section 1 > Administrative > Course Staff Who we are Instructor (me) Theo Rekatsinas Faculty in the Computer Sciences and part of the UW-Database Group Research: data integration and cleaning, statistical analytics, and machine learning. thodrek@cs.wisc.edu Office hours: Wed 4:00-5:00pm (after class), Fri 11:00am-12:00 pm @CS 4361 13

Section 1 > Administrative > Course Staff Teaching Assistants (TAs) TAs are humans too! 14

Section 1 > Administrative > Course Staff Teaching Assistants (TAs) TAs are humans too! 15

Section 1 > Administrative > Course Staff Minzhen Vishnu 16

Section 1 > Administrative Communication w/ Course Staff Piazza https://piazza.com/wisc/fall2017/cs5643 Class email: cs564fall17@gmail.com The goal is to get you to answer each other s questions so you can benefit and learn from each other. Office hours OHs are listed on the course website! By appointment! 17

Section 1 > Administrative Course Website: https://thodrek.github.io/cs564-fall17/ Course Email: cs564fall17@gmail.com 18

Section 1 > Logistics Lectures Lecture slides cover essential material This is your best reference. We are trying to get away from book, but we will have pointers Recommended textbooks listed on website Try to cover same thing in many ways: Lecture, lecture notes, homework, exams (no shock) Attendance makes your life easier 19

Section 1 > Logistics Attendance You should attend lectures plus guest lecture Guest lectures are fun. Great guests: they want to meet you! Show up! Attendance is for your benefit People who did not attend did worse L People who did not attend used more course resources L People who did not attend were less happy with the course L 20

Section 1 > Logistics Graded Elements Two Problem Sets (15%) Programming project (30%) Split into four parts Auction base: Experience with a DB application. Implementation of DB internals Midterm (20%) All due dates are posted on website!!! Final exam (35%) 21

Section 1 > Logistics Un-Graded Elements Readings provided to help you! Only items in lecture, homework, or project are fair game. Activities are again mainly to help / be fun! Will occur during class- not graded, but count as part of lecture material (fair game as well) Jupyter Notebooks provided These are optional but hopefully helpful. Redesigned so that you can interactively replay parts of lecture 22

Section 1 > Logistics What is expected from you Attend lectures If you don t, it s at your own peril Be active and think critically Ask questions, post comments on forums Do programming and homework projects Start early and be honest Study for tests and exams 23

Section 1 > Logistics Problem Sets Two problems sets at the beginning Individual assignments Python plus Jupiter notebooks 1 week per problem set Ask questions, post comments on forums Start early! 24

Section 1 > Logistics Project Split into four parts Two parts cover DB applications Two parts cover DB internals In groups of 3 One person per team emails the group info by Wednesday 9/13. Use cs564fall17@gmail.com subject should be CS564-3: Project Group Write your names and University ID Python (DB design) and C++ (DB internals) Varying duration for different parts (2 weeks at least) Exact dates posted on website Ask questions, post comments on forums Start early! 25

Section 1 > Logistics To encourage awesomeness Bonus assignments, activities, and projects Some extremes 1. I was hung over when I took the test. Intended to make up for silly mistakes. 2. I want to be a research star! There will be some challenging assignments that could indicate possible publication (e.g., ACM SIGMOD undergrad competition) 26

Section 1 > Lectures Lectures: 1 st part - from a user s perspective 1. Foundations: Relational data models & SQL Lectures 2-4 How to manipulate data with SQL, a declarative language reduced expressive power but the system can do more for you 2. Database Design: Design theory and constraints Lectures 5-8 Designing relational schema to keep your data from getting corrupted 27

Section 1 > Lectures Lectures: 2 nd part database internals 3. Introduction to database systems Lectures 9-11 Data Storage and IO models Buffer Manager and File Organization External sorting 4. Indexing and Hashing Lectures 12-15 Intro to indexing B+ Tree, Hash, and Bitmap Indexes 5. Query processing Lectures 16-20 Access methods and operators Joins Relational algebra and Query optimization 28

Section 1 > Lectures Lectures: 3 rd part transactions 6. Transactions Lectures 21-22 Transactions from a user s perspective Logging and Locking 7. Bonus Guest Lecture and Lecture 23 Stratis Viglas from Google Machine Learning meets Data Management 29

Section 1 > Lectures Lectures: A note about format of notes These are asides / notes (still need to know these in general!) Take note!! Definitions in blue with concept being defined bold & underlined Main point of slide / key takeaway at bottom Warnings- pay attention here! 30

Section 1 > ACTIVITY Jupyter Notebook Hello World Jupyter notebooks are interactive shells which save output in a nice notebook format They also can display markdown, LaTeX, HTML, js FYI: Jupyter Notebook are also called ipython notebooks but they handle other languages too. You ll use these for in-class activities interactive lecture supplements/recaps homeworks, projects, etc.- if helpful! Note: you do need to know or learn python for this course! 31

Section 1 > ACTIVITY Jupyter Notebook Setup 1. HIGHLY RECOMMENDED. Install on your laptop via the instructions on the next slide / Piazza 2. Other options running via one of the alternative methods: 1. Ubuntu VM. 2. CS Machines. Please help out your peers by posting issues / solutions on Piazza! 3. Come to office hours if you need help with installation! As a general policy in upper-level CS courses, Windows is not officially supported. 32

Section 1 > ACTIVITY Jupyter Notebook Setup https://thodrek.github.io/cs564- fall17/misc/jupyter_install.html 33

Section 1 > ACTIVITY Activity-1-1.ipynb 34

Section 2 2. Overview of the relational data model 35

Section 2 What you will learn about in this section 1. Definition of DBMS 2. Data models & the relational data model 3. Schemas & data independence 4. ACTIVITY: Jupyter + SQL 36

Section 2 > DBMS What is a DBMS? A large, integrated collection of data Models a real-world enterprise Entities (e.g., Students, Courses) Relationships (e.g., Alice is enrolled in CS564) A Database Management System (DBMS) is a piece of software designed to store and manage databases 37

Section 2 > Data models A Motivating, Running Example Consider building a course management system (CMS): Students Courses Professors Entities Who takes what Who teaches what Relationships 38

Section 2 > Data models Data models A data model is a collection of concepts for describing data The relational model of data is the most widely used model today Main Concept: the relation- essentially, a table A schema is a description of a particular collection of data, using the given data model E.g. every relation in a relational data model has a schema describing types, etc. 39

Section 2 > Data models Modeling the Course Management System Logical Schema Students(sid: string, name: string, gpa: float) Courses(cid: string, cname: string, credits: int) Enrolled(sid: string, cid: string, grade: string) sid Name Gpa 101 Bob 3.2 123 Mary 3.8 Students Relations sid cid Grade 123 564 A cid cname credits 564 564-2 4 308 417 2 Courses Enrolled 40

Section 2 > Data models Modeling the Course Management System Logical Schema Students(sid: string, name: string, gpa: float) Courses(cid: string, cname: string, credits: int) Enrolled(sid: string, cid: string, grade: string) sid Name Gpa 101 Bob 3.2 123 Mary 3.8 Students Corresponding keys sid cid Grade 123 564 A cid cname credits 564 564-2 4 308 417 2 Courses Enrolled 41

Section 2 > Schemata Other Schemata Physical Schema: describes data layout Relations as unordered files Some data in sorted order (index) Administrators Logical Schema: Previous slide External Schema: (Views) Course_info(cid: string, enrollment: integer) Derived from other tables Applications 42

Section 2 > Schemata Data independence Concept: Applications do not need to worry about how the data is structured and stored Logical data independence: protection from changes in the logical structure of the data Physical data independence: protection from physical layout changes I.e. should not need to ask: can we add a new entity or attribute without rewriting the application? I.e. should not need to ask: which disks are the data stored on? Is the data indexed? One of the most important reasons to use a DBMS 43

Section 2 > ACTIVITY Activity-1-2.ipynb 44

Section 3 3. Overview of DBMS topics Key concepts & challenges 45

Section 3 What you will learn about in this section 1. Transactions 2. Concurrency & locking 3. Atomicity & logging 4. Summary 46

Section 3 > DBMS Challenges Challenges with Many Users Suppose that our CMS application serves 1000 s of users or morewhat are some challenges? Security: Different users, different roles We won t look at too much in this course, but is extremely important Performance: Need to provide concurrent access Disk/SSD access is slow, DBMS hide the latency by doing more CPU work concurrently Consistency: Concurrency can lead to update problems DBMS allows user to write programs as if they were the only user 47

Section 3 > DBMS Challenges Transactions A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes) Atomicity: An action either completes entirely or not at all Acct Balance a10 20,000 a20 15,000 Transfer $3k from a10 to a20: 1. Debit $3k from a10 2. Credit $3k to a20 Acct Balance a10 17,000 a20 18,000 Written naively, in which states is atomicity preserved? Crash before 1, After 1 but before 2, After 2. DB Always preserves atomicity! 48

Section 3 > DBMS Challenges Transactions A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes) If a user cancels a TXN, it should be as if nothing happened! Atomicity: An action either completes entirely or not at all Transactions leave the DB in a consistent state Users may write integrity constraints, e.g., each course is assigned to exactly one room However, note that the DBMS does not understand the real meaning of the constraints consistency burden is still on the user! Consistency: An action results in a state which conforms to all integrity constraints 49

Section 3 > DBMS Challenges Challenge: Scheduling Concurrent Transactions The DBMS ensures that the execution of {T 1,,T n } is equivalent to some serial execution One way to accomplish this: Locking Before reading or writing, transaction requires a lock from DBMS, holds until the end Key Idea: If T i wants to write to an item x and T j wants to read x, then T i, T j conflict. Solution via locking: only one winner gets the lock loser is blocked (waits) until winner finishes A set of TXNs is isolated if their effect is as if all were executed serially What if T i and T j need X and Y, and T i asks for X before T j, and T j asks for Y before T i? -> Deadlock! One is aborted All concurrency issues handled by the DBMS 50

Section 3 > DBMS Challenges Ensuring Atomicity & Durability DBMS ensures atomicity even if a TXN crashes! One way to accomplish this: Write-ahead logging (WAL) Key Idea: Keep a log of all the writes done. After a crash, the partially executed TXNs are undone using the log Write-ahead Logging (WAL): Before any action is finalized, a corresponding log entry is forced to disk We assume that the log is on stable storage All atomicity issues also handled by the DBMS 51

Section 3 > Summary A Well-Designed DBMS makes many people happy! End users and DBMS vendors Reduces cost and makes money DB application programmers Can handle more users, faster, for cheaper, and with better reliability / security guarantees! Database administrators (DBA) Easier time of designing logical/physical schema, handling security/authorization, tuning, crash recovery, and more Must still understand DB internals 52

Section 3 > Summary Summary of DBMS DBMS are used to maintain, query, and manage large datasets. Provide concurrency, recovery from crashes, quick application development, integrity, and security Key abstractions give data independence DBMS R&D is one of the broadest, most exciting fields in CS. Fact! 53