Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

Similar documents
Administrivia. The Relational Model. Review. Review. Review. Some useful terms

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Review. The Relational Model. Glossary. Review. Data Models. Why Study the Relational Model? Why use a DBMS? OS provides RAM and disk

9/8/2018. Prerequisites. Grading. People & Contact Information. Textbooks. Course Info. CS430/630 Database Management Systems Fall 2018

CS430/630 Database Management Systems Spring, Betty O Neil University of Massachusetts at Boston

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

Introduction to Data Management. Lecture #2 Intro II & Data Models I

The Relational Model. Database Management Systems

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Introduction to Data Management. Lecture #5 Relational Model (Cont.) & E-Rà Relational Mapping

Introduction to Data Management. Lecture #1 (The Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer )

Database Management Systems. Chapter 3 Part 1

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction and Overview

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

EGCI 321: Database Systems. Dr. Tanasanee Phienthrakul

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

CAS CS 460/660 Introduction to Database Systems. Fall

Why are you here? Introduction. Course roadmap. Course goals. What do you want from a DBMS? What is a database system? Aren t databases just

The Relational Data Model. Data Model

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

The Relational Model. Chapter 3. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Introduction to Database Systems

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

The Relational Model. Week 2

Introduction and Overview

The Relational Model

Database Applications (15-415)

The Relational Model of Data (ii)

The Relational Model. Chapter 3

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Database Systems ( 資料庫系統 )

CMPT 354: Database System I. Lecture 2. Relational Model

Relational data model

The Relational Model

Lecture 16. The Relational Model

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

CS 564: DATABASE MANAGEMENT SYSTEMS. Spring 2018

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

The Relational Model 2. Week 3

CSE 544 Principles of Database Management Systems

Database Systems ( 資料庫系統 ) Practicum in Database Systems ( 資料庫系統實驗 ) 9/20 & 9/21, 2006 Lecture #1

Database Management Systems. Chapter 1

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Chapter 1: Introduction

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

CompSci 516 Database Systems. Lecture 2 SQL. Instructor: Sudeepa Roy

Introduction. Example Databases

Database Systems. Lecture2:E-R model. Juan Huo( 霍娟 )

Modern Database Systems CS-E4610

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

BBM371- Data Management. Lecture 1: Course policies, Introduction to DBMS

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

Database Management Systems MIT Introduction By S. Sabraz Nawaz

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

Introduction. Random things to do after this course. Course roadmap. CPS 116 Introduction to Database Systems

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

CS634 Architecture of Database Systems Spring Elizabeth (Betty) O Neil University of Massachusetts at Boston

Announcements. PS 3 is out (see the usual place on the course web) Be sure to read my notes carefully Also read. Take a break around 10:15am

Announcements. Using Electronics in Class. Review. Staff Instructor: Alvin Cheung Office hour on Wednesdays, 1-2pm. Class Overview

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

Data Modeling. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

EECS 647: Introduction to Database Systems

Standard stuff. Class webpage: cs.rhodes.edu/db Textbook: get it somewhere; used is fine. Prerequisite: CS 241 Coursework:

CS145: Intro to Databases. Lecture 1: Course Overview

Database Systems. Course Administration

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

CMPT 354: Database System I. Lecture 1. Course Introduction

Comp 5311 Database Management Systems. 4b. Structured Query Language 3

CSC 261/461 Database Systems Lecture 13. Fall 2017

CS 2451 Database Systems: Relational Data Model

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Relational Databases Fall 2017 Lecture 1

Modern Database Systems Lecture 1

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1

L3 The Relational Model. Eugene Wu Fall 2018

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g.

John Edgar 2

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

Ø Last Thursday: String manipulation & Regular Expressions Ø guest lecture from the amazing Sam Lau Ø reviewed in section and in future labs & HWs

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

CS143: Relational Model

Introduction. Who wants to study databases?

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #1: Introduction

CGS 3066: Spring 2017 SQL Reference

Introduction to Databases CSE 414. Lecture 2: Data Models

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

CS275 Intro to Databases. File Systems vs. DBMS. Why is a DBMS so important? 4/6/2012. How does a DBMS work? -Chap. 1-2

CS275 Intro to Databases

Transcription:

CS 133: Databases Fall 2018 Lec 01 09/04 Introduction & Relational Model Data! Need systems to Data is everywhere Banking, airline reservations manage the data Social media, clicking anything on the internet http://www.bigdataanalyticstoday.com/category/infographics/ Prof. Beth Trushkowsky http://www.puntogeek.com/wp-content/uploads/2012/12/21168.strip_.gif facebook friend wheel, https://www.flickr.com/photos/antjeverena/ Goals for Today What is a database anyway? Important DBMS features and challenges! Course logistics Relational data model Why it s great What it looks like (intro to SQL) So, what is a database? From the textbook: Database: a collection of data, typically describing the activities of one or more related organizations Database system, DataBase Management System (DBMS): software designed to assist in maintaining and utilizing large collections of data

DBMS desiderata Ask questions (queries) about data Add and update data Persist the data (keep it around) E.g., banking application Query: What is Alice s balance? Update: Alice deposits $100 Persist: Alice hopes her money is still there after a power outage Sounds easy! Account, Branch, Name, Balance 45, Claremont, Alice, 200 67, Claremont, Bob, 100000 78, Pasadena, Carl, 987654... Store data in text files Accounts separated by newlines Fields separated by commas Query: what is Alice s balance? Abstracting data management Relational DBMS to the rescue Can come up with tricks to optimize a particular query/application End up redoing this work for new apps Edgar F. Codd Turing award, 1981 [There should be] a clear boundary between the logical and physical aspects of database management 1 1 http://en.wikiquote.org/wiki/e._f._codd Physical Independence Applications need not know how data is physically structured and stored Instead, have logical data model Leave the implementation details and optimization to DBMS

Relational DBMS to the rescue Relational data model: data is stored in relations Example: Banking info account branch name balance 45 Claremont Alice 200 67 Claremont Bob 100000 78 Pasadena Carl 987654 A declarative query language Specify what answers a query should return, but not how the query is executed E.g., SQL, Datalog (subset of Prolog) Query: what is Alice s balance? SELECT balance FROM Banking WHERE name = Alice ; DBA designs these! Relational Model: Levels of Abstraction Conceptual/Logical schema (sid: string, name: string, login: string, gpa: real) Courses (cid: string, cname: string, credits: integer) Enrolled (sid: string, cid: string, grade: string) Specifies storage details Physical schema Store the relations as unsorted files Create indexes on.sid and Courses.cid External schema ( views ) view each course s enrollment CourseInfo( cid: string, enrollmnt: integer) Entities and relationships! CREATE VIEW CourseInfo AS SELECT cid, COUNT (*) as enrollmnt FROM Enrolled GROUP BY cid; Describes data in terms of data model Allow customized data access Data Independence Modern DBMS Features Logical data independence Protected from changes in conceptual schema Physical data independence Protected from changes in physical schema View 1 View 2 View n Conceptual schema Physical schema Logical data model We focus on relational in this course May touch on others, e.g., XML, Document Data independence! Declarative language Queries Updates Persistence But wait, there s more

Concurrent Access Banking example: ATM withdrawal pseudocode get balance; if balance > amount withdraw amount; newbalance = balance - amount; write balance = newbalance; Alice and Bob share an account. Alice goes to one ATM, withdraws $100 Bob goes to another ATM, withdraws $50 Initial balance = $400 Final balance? Example from Jun Yang Alice withdraws $100 get balance; $400 if balance > amount withdraw amount; newbalance = balance - amount; write balance = newbalance; Final balance = $300!! Concurrent Access Bob withdraws $50 get balance; $400 if balance > amount withdraw amount; newbalance = balance - amount; write balance = newbalance; $350 $300 What can we do? Example from Jun Yang System Failures Modern DBMS Features (cntd) Banking example: balance transfer decrement account X by $100 increment account Y by $100 What if power goes out after first instruction? DBMS buffers and updates some data in memory before writing to disk what if power goes out before write to disk? Keep a log of updates, undo/redo upon recovery Logical data model Declarative language Persistence Concurrent access Fault tolerance Performance! Lots of queries Lots of data Example from Jun Yang

Simplified RDBMS Architecture Course Overview Design principles behind DBMS! Application Declarative Query Query optimizer Query executor Access methods Buffer management Disk management Data records Query results Concerned with concurrency control and recovery Bottom-up order of topics to show role of abstraction and algorithms for efficiency/ optimization Physical data organization Relational algebra and SQL Query evaluation and optimization Transactions, concurrency control, recovery Database design Course Objectives Provide a solid background in database management system design principles Promote understanding of these principles through handson exercises implementing the internals of a relational database management system Further develop students' ability to reason about algorithm and software design, optimization, and tradeoffs generally applicable in computer science Labs: SimpleDB Labs 1-4: Implement key features of a (simplified) DBMS in Java Files, Storage Relational Operators Query Optimizer Locking with Transactions Lab 5: database design Lab 1: Getting started due next Wednesday

Administrivia Course website: https://www.cs.hmc.edu/~beth/courses/cs133/current Syllabus, calendar, lab descriptions Textbook: Database Management Systems 3rd Edition, by Ramakrishnan and Gehrke Piazza for questions about labs, problem sets, etc.: piazza.com/hmc/fall2018/cs133/home Assignment submission on Sakai Office hours Monday and Tuesday, time TBD, or by appointment Grutors Devang Hours: TBD The Relational Model Many RDBMS vendors, including open-source Oracle MySQL PostgreSQL SQLite DB2 SQL Server We ll touch on other data models as well Key Concepts: Relational Model Relational Model: Synonyms Database: collection of relations Relation: list of attributes Relations have sets of tuples Schema (metadata) Specification of how data is to be structured logically Contains attribute types Defined at set-up SID name login gpa 45 Alice alicious 3.4 67 Bob bobtastic 3.9 Courses CID Name Dept 121 Software Dev CS 70 Data Structures CS More formal Less formal Relation Table Tuple Row Record Attribute Column Field Domain Type

Structured Query Language (SQL) Data definition language (DDL) Define the schema (create, change, delete relations) Specify constraints, user permissions Data modification language (DML) Find data that matches criteria Add, remove, update data The DBMS is responsible for efficient evaluation! Co-invented by Don Chamberlin (HMC 66)! A Relation Instance An instance of a relation is its contents at a given time cardinality: # tuples arity: # attributes SID Name Login SSN GPA 45 Alice alicious 000-00-0000 3.4 67 Bob bobtastic 000-00-0001 3.9 78 Carl carl 000-00-0010 2.5 Photo: http://researcher.watson.ibm.com/ researcher/view.php?person=us-dchamber SQL: Creating Relations Adding and Removing Tuples Create relation: CREATE TABLE ( sid CHAR(20), name CHAR(20), login CHAR(100), SSN CHAR(12), gpa FLOAT); Create Enrolled relation: CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(2)); Insert a single tuple INSERT INTO (sid, name, login, SSN, gpa) VALUES (45, Alice, alicious, 000-00-0000, 3.4); Delete tuples that satisfy condition (predicate) Domain info is type of Integrity constraint (IC) IC: a condition on the database schema, restricts data that can be stored DELETE FROM S WHERE S.name = Alice ;

Integrity Constraints: Keys Superkey is a set of field(s) that Uniquely identifies a tuple Candidate key: does so minimally Primary key: a chosen candidate key SID Name Login SSN GPA 45 Alice alicious 000-00-0000 3.4 67 Bob bobtastic 000-00-0001 3.9 78 Carl carl 000-00-0010 2.5 Primary key Integrity Constraints: Foreign Keys Referential integrity, logical pointer Set of fields in one relation refer to primary key of another SID Name Login SSN GPA 45 Alice alicious 000-00-0000 3.4 67 Bob bobtastic 000-00-0001 3.9 78 Carl carl 000-00-0010 2.5 Primary key INSERT INTO Enrolled(sid,cid,grade) VALUES (43,CS133,D);? Enrolled SID CID Grade 45 CS133 A 45 CS121 B 78 CS5 A Foreign key Defining Key Constraints Specified in schema definition CREATE TABLE ( sid CHAR(20), name CHAR(20), login CHAR(10), SSN CHAR(20), gpa FLOAT, PRIMARY KEY(sid), UNIQUE (SSN) ); CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), ); FOREIGN KEY (sid) REFERENCES Primary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key. Keys must be used carefully! Example: For a given student and course, there is a single grade. CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), vs. grade CHAR(2), PRIMARY KEY (sid,cid)) CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade)) can take only one course, and no two students in a course receive the same grade.

SQL: Single Relation Queries SELECT name FROM WHERE gpa > 3.7; SID name login gpa 45 Alice alicious 3.4 67 Bob bobtastic 3.9 78 Carl carl 2.5 name Bob SQLite Demo The database is in the file you specify. The file is created if it doesn t exist. SQL statements end with a semicolon. Capitalization looks nice, but not required. SELECT * FROM S WHERE S.gpa > 3.7; sid name login gpa 67 Bob bobtastic 3.9 These two settings for mode and header make query results easier to read. Also see: Resources on course website and www.sqlite.org Query Execution: Teaser Query optimizer transforms a declarative query into a pipeline of dataflow operators called a query execution plan Iterators!! SELECT name FROM WHERE gpa > 3.7; Project (name) Filter (gpa > 3.7)