What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Similar documents
Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

Relational Model & Algebra. Announcements (Tue. Sep. 3) Relational data model. CompSci 316 Introduction to Database Systems

Announcements (September 14) SQL: Part I SQL. Creating and dropping tables. Basic queries: SFW statement. Example: reading a table

EECS 647: Introduction to Database Systems

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

Introduction to Wireless Sensor Network. Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University

EECS 647: Introduction to Database Systems

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

EECS 647: Introduction to Database Systems

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Announcements (September 21) SQL: Part III. Triggers. Active data. Trigger options. Trigger example

EECS 647: Introduction to Database Systems

Relational Databases

Chapter 2: Intro to Relational Model

Midterm Review. March 27, 2017

Basic operators: selection, projection, cross product, union, difference,

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Database Technology Introduction. Heiko Paulheim

CSEN 501 CSEN501 - Databases I

Basant Group of Institution

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

Mahathma Gandhi University

Relational Model and Algebra. Introduction to Databases CompSci 316 Fall 2018

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

CSE Midterm - Spring 2017 Solutions

Indexing. Announcements. Basics. CPS 116 Introduction to Database Systems

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

Query Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems

Relational Model and Relational Algebra

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

Today's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example

Relational Model, Relational Algebra, and SQL

Announcements (March 1) Query Processing: A Systems View. Physical (execution) plan. Announcements (March 3) Physical plan execution

The Relational Model. Week 2

SQL - Data Query language

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

Ian Kenny. November 28, 2017

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Parser: SQL parse tree

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

COMP9311 Week 10 Lecture. DBMS Architecture. DBMS Architecture and Implementation. Database Application Performance

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Query Processing & Optimization. CS 377: Database Systems

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Lecture 2: Introduction to SQL

Database Management Systems Paper Solution

Relational Query Languages: Relational Algebra. Juliana Freire

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Lecture 16. The Relational Model

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Relational Model: History

Final Review. CS 145 Final Review. The Best Of Collection (Master Tracks), Vol. 2

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

OBJECTIVES. How to derive a set of relations from a conceptual data model. How to validate these relations using the technique of normalization.

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Relational Algebra 1

Chapter 12: Query Processing

Relational Database Systems 2 5. Query Processing

Relational Query Languages

CMPT 354: Database System I. Lecture 3. SQL Basics

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Chapter 2: Intro to Relational Model

8) A top-to-bottom relationship among the items in a database is established by a

Transactions and Concurrency Control

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Chapter 1: Introduction

COSC344 Database Theory and Applications. σ a= c (P) Lecture 3 The Relational Data. Model. π A, COSC344 Lecture 3 1

CSE344 Midterm Exam Winter 2017

Chapter 1: Introduction

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

CSC 261/461 Database Systems Lecture 13. Fall 2017

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

DATABASE MANAGEMENT SYSTEM

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Data about data is database Select correct option: True False Partially True None of the Above

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Database Applications (15-415)

CSE 544 Principles of Database Management Systems

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Evaluation of Relational Operations

CS 405G: Introduction to Database Systems. Storage

5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator

Storage hierarchy. Textbook: chapters 11, 12, and 13

Transcription:

What s a database system? Review of Basic Database Concepts CPS 296.1 Topics in Database Systems According to Oxford Dictionary Database: an organized body of related information Database system, DataBase Management System, or DBMS: a software system that facilitates the creation and maintenance and use of an electronic database More precisely, a DBMS should support Efficient and convenient querying and updating of large amounts of persistent data Safe, multi-user access 2 Two important questions What is the right API for a DBMS? Data model How is the data structured conceptually? Query language How do users ask queries about the data? How does the DBMS support the API? Query processing and optimization What is the most efficient way to answer a query? Transaction processing How are atomicity, consistency, isolation, and durability of transaction ensured? 3 Entity-relationship (E/R) diagram Entities: students and courses Relationships: students enroll in courses SID name age GPA Enroll Course Widely used for database design by humans DBMS does not need a graphical data model CID title 4 Before the relational revolution Hierarchical and network data models Relationships are modeled as pointers Queries require explicit pointer following Example: a simplified CODASYL query.gpa := 4.0 FIND RECORD BY CALC-KEY FIND OWNER OF CURRENT -Course SET IF Course.CID = CPS 296 THEN PRINT.name Assume that we can quickly find student records by GPA Assume there is a pointer from students to courses How about navigating from courses to students? 5 Physical data independence Problems with hierarchical and network data models Access to data is not declarative Whenever data is reorganized, applications must be reprogrammed!! Physical data independence Applications should not need to worry about how data is physically structured and stored Applications should work with a logical data model and declarative query language Leave the implementation details and optimization to DBMS 6

Relational data model A database is a collection of relations (or tables) Each relation has a list of attributes (or columns) Each relation contains a set of tuples (or rows) Duplicates not allowed Course Enroll CID title 142 CPS 296 CPS 296 Topics in Database Systems 142 CPS 216 CPS 216 Advanced Database Systems 123 CPS 296 CPS 116 Intro. to Database Systems 857 CPS 296 857 CPS 116 456 CPS 116 7 Schema versus instance Schema (metadata) Structure and constraints over data (SID integer, name string, age integer, GPA float) Course (CID string, title string) Enroll (SID integer, CID integer).sid is a key, Enroll.SID is a foreign key referencing.sid, etc. Changes infrequently Instance Actual contents that conform to the schema { <142, Bart, 10, 2.3>, <123, Milhouse, 10, 3.1>,...} { <CPS 296, Topics in Database Systems>,...} { <142, CPS 296>, <142, CPS 216>,...} Changes frequently 8 Relational algebra Core set of operators: Selection, projection, cross product, union, difference, and renaming Additional, derived operators: Join, etc. Operator Operator 9 Selection Notation: σ p ( R ) p is called a selection condition/predicate Output: only rows that satisfy p Example: s with GPA higher than 3.0 σ GPA > 3.0 ( ) σ GPA > 3.0 10 Projection Notation: π L ( R ) L is a list of columns in R Output: only the columns in L Duplicate rows are removed Example: age distribution of students π age ( ) π age 11 Cross product Notation: R S Output: for each row r in R and each row s in S, output a row rs (concatenation of r and s) Example: Enroll 142 CPS 296 142 CPS 216 123 CPS 296 142 CPS 296 142 CPS 216 123 CPS 296 142 CPS 296 142 CPS 216 123 CPS 296 12

Derived operator: join Union and difference Notation: R >< p S (shorthand for σ p (R S)) p is called a join condition/predicate Example: students and CIDs of their courses ><.SID = Enroll.SID Enroll ><.SID = Enroll.SID 142 CPS 296 142 CPS 216 123 CPS 296 142 CPS 296 142 CPS 216 123 CPS 296 142 CPS 296 142 CPS 216 123 CPS 296 13 Notation: R S R and S must have identical schema Output: Same schema as R and S Contains all rows in R and all rows in S, with duplicates eliminated Notation: R S R and S must have identical schema Output: Same schema as R and S Contains all rows in R that are not found in S 14 Renaming Notation: ρ S ( R ), or ρ S ( A1, A 2,...) ( R ) Purpose: rename a table and/or its columns No real processing involved Used to avoid confusion caused by identical column names Example: all pairs of (different) students >< SID1 < > SID2 ρ 1 (SID1, name1, age1, GPA1) ρ 2 (SID2, name2, age2, GPA2) 15 Relational algebra example Names of students in CPS 296 with 4.0 GPA π name ><.SID = Enroll.SID σ GPA = 4.0 σ CID = CPS 296 Enroll Compare this query to the CODASYL version! 16 SQL SQL (Structured Query Language) Pronounced S-Q-L or sequel The query language of every commercial DBMS Simplest form: SELECT A 1, A 2,,A n FROM R 1, R 2,,R m WHERE condition; Also called an SPJ (select-project-join) query Equivalent (more or less) to relational algebra query π A1, A2,, A n (σ condition (R 1 R 2 R m )) Unlike relational algebra, SQL preserves duplicates by default 17 SQL example Names of students in CPS 296 with 4.0 GPA SELECT.name FROM, Enroll WHERE Enroll.CID = CPS 296 AND Enroll.SID =.SID AND.GPA = 4.0; Compare this query to the CODASYL version! 18

More SQL features SELECT [DISTINCT] list_of_output_columns FROM list_of_tables WHERE where_condition GROUP BY list_of_group_by_columns HAVING having_condition ORDER BY list_of_order_by_columns; Operational semantics FROM: take the cross product of list_of_tables WHERE: apply σ where_condition GROUP BY: group result tuples according to list_of_group_by_columns HAVING: apply σ having_condition to the groups SELECT: apply π list_of_output_columns (preserve duplicates) DISTINCT: eliminate duplicates ORDER BY: sort the result by list_of_order_by_columns 19 SQL example with aggregation Find the average GPA for each age group with at least three students SELECT age, AVG(GPA) FROM GROUP BY age HAVING COUNT(*) >= 3; GROUP BY HAVING SELECT age AVG(GPA) 10 3.2 20 Summary: relational query languages Access paths Not your general-purpose programming language Not expected to be Turing-complete Not intended to be used for complex calculations Amenable to much optimization More declarative than languages for hierarchical and network data models No explicit pointer following Replaced by joins that can be easily reordered Next: How do we support relational query languages Store data in ways to speed up queries Heap file: unordered set of records B+-tree index: disk-based balanced search tree with logarithmic lookup and update Linear/extensible hashing: disk-based hash tables that can grow dynamically Bitmap indexes: potentially much more compact And many more One table may have multiple access paths One primary index that stores records directly Multiple secondary indexes that store pointers to records efficiently? 21 22 Query processing methods The same query operator can be implemented in many different ways Example: R >< R.A=S.B S Nested-loop join: for each tuple of R, and for each tuple of S, join Index nested-loop join: for each tuple of R, use the index on S.B to find joining S tuples Sort-merge join: sort R by R.A, sort S by S.B, and merge-join Hash join: partition R and S by hashing R.A and S.B, and join corresponding partitions And many more 23 Motivation for query optimization The same query can have many different execution plans Example: SELECT.name FROM, Enroll WHERE Enroll.CID = CPS 296 AND Enroll.SID =.SID AND.GPA = 4.0; Plan 1: evaluate σ GPA = 4.0 (); for each result SID, find the Enroll tuples with this SID and check if CID is CPS 296 Plan 2: evaluate σ CID = CPS 296 (Enroll); for each result SID, find the tuple with this SID and check if GPA is 4.0 Plan 3: evaluate both σ GPA = 4.0 () and σ CID = CPS 296 (Enroll), and join them on SID Any many more 24

Query optimization A huge number of possible execution plans With different access methods, join order, join methods, etc. Query optimizer s job Enumerate candidate plans Query rewrite: transform queries or query plans into equivalent ones Estimate costs of plans Use statistics such as histograms Pick a plan with reasonably low cost Dynamic programming Randomized search 25 Optimizing for I/O Location Cycles Location Time Registers 1 My head 1 min. Memory 100 Washington D.C. 1.5 hr. Disk 10 6 Pluto 2 yr. (source: AlphaSort paper, 1995)! I/O costs dominate database operations DBMS typically optimizes the number of I/O s Example: Which of the following is a more efficient way to process SELECT * FROM R ORDER BY R.A;? Use an available secondary B+-tree index on R.A: follow leaf pointers, which are already ordered by R.A Just sort the table 26