Using a DBMS. Shan-Hung Wu & DataLab CS, NTHU

Similar documents
Database Systems. Shan-Hung Wu CS, NTHU

A Deeper Look at Data Modeling. Shan-Hung Wu & DataLab CS, NTHU

Background. vanilladb.org

DATABASES SQL INFOTEK SOLUTIONS TEAM

Introduction. Example Databases

CMPT 354: Database System I. Lecture 3. SQL Basics

Data Modelling and Databases. Exercise Session 7: Integrity Constraints

Chapter 1: Introduction

Databases. Jörg Endrullis. VU University Amsterdam

COGS 121 HCI Programming Studio. Week 03 - Tech Lecture

Session 6: Relational Databases

Database Systems CSE 414

Course Logistics & Chapter 1 Introduction

John Edgar 2

DB Creation with SQL DDL

Relational Database Systems Part 01. Karine Reis Ferreira

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Database Technology Introduction. Heiko Paulheim

EGCI 321: Database Systems. Dr. Tanasanee Phienthrakul

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

Data! CS 133: Databases. Goals for Today. So, what is a database? What is a database anyway? From the textbook:

DATABASE MANAGEMENT SYSTEMS

Chapter 4: Intermediate SQL

SQL: Data Definition Language. csc343, Introduction to Databases Diane Horton Fall 2017

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Chapter 8: Working With Databases & Tables

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

ACS-3902 Fall Ron McFadyen 3D21 Slides are based on chapter 5 (7 th edition) (chapter 3 in 6 th edition)

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

CMPT 354: Database System I. Lecture 2. Relational Model

Database Management System 9

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

CS275 Intro to Databases

Today Learning outcomes LO2

CGS 3066: Spring 2017 SQL Reference

CSE 544 Principles of Database Management Systems

CS143: Relational Model

In This Lecture. SQL Data Definition SQL SQL. Non-Procedural Programming. Notes. Database Systems Lecture 5 Natasha Alechina

Database Systems Relational Model. A.R. Hurson 323 CS Building

Database Management Systems Introduction to DBMS

Overview. Data Integrity. Three basic types of data integrity. Integrity implementation and enforcement. Database constraints Transaction Trigger

CMSC 424 Database design Lecture 2: Design, Modeling, Entity-Relationship. Book: Chap. 1 and 6. Mihai Pop

SQL: Concepts. Todd Bacastow IST 210: Organization of Data 2/17/ IST 210

Lab IV. Transaction Management. Database Laboratory

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Transactions Processing (i)

Chapter 1 SQL and Data

Consistency The DBMS must ensure the database will always be in a consistent state. Whenever data is modified, the database will change from one

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

MIS Database Systems.

BIS Database Management Systems.

Data, Databases, and DBMSs

EECS 647: Introduction to Database Systems

CSC 453 Database Technologies. Tanu Malik DePaul University

SQL Interview Questions

Database Applications (15-415)

3/3/2008. Announcements. A Table with a View (continued) Fields (Attributes) and Primary Keys. Video. Keys Primary & Foreign Primary/Foreign Key

CSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

CMPE 131 Software Engineering. Database Introduction

CSE 344 Final Review. August 16 th

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

Database Technology. Topic 8: Introduction to Transaction Processing

Course Introduction & Foundational Concepts

Data about data is database Select correct option: True False Partially True None of the Above

SQL DATA DEFINITION LANGUAGE

Bases de Dades: introduction to SQL (indexes and transactions)

The Relational Model of Data (ii)

Informatics 1: Data & Analysis

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Textbook: Chapter 4. Chapter 5: Intermediate SQL. CS425 Fall 2016 Boris Glavic. Chapter 5: Intermediate SQL. View Definition.

COMP 430 Intro. to Database Systems

Lecture 2: Introduction to SQL

Transaction Management & Concurrency Control. CS 377: Database Systems

Transforming ER to Relational Schema

SQL DATA DEFINITION LANGUAGE

SQL DATA DEFINITION LANGUAGE

CS425 Fall 2017 Boris Glavic Chapter 5: Intermediate SQL

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

COSC 304 Introduction to Database Systems. Database Introduction. Dr. Ramon Lawrence University of British Columbia Okanagan

CS 445 Introduction to Database Systems

Chapter 4. Basic SQL. SQL Data Definition and Data Types. Basic SQL. SQL language SQL. Terminology: CREATE statement

Relational Databases. Week 7 INFM 603

Chapter 4. Basic SQL. Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

CSE 135. Applications View of a Relational Database Management System (RDBMS) SQL. Persistent data structure. High-level API for access &modification

CS6312 DATABASE MANAGEMENT SYSTEMS LABORATORY L T P C

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Keys, SQL, and Views CMPSCI 645

The data structures of the relational model Attributes and domains Relation schemas and database schemas

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

CPS510 Database System Design Primitive SYSTEM STRUCTURE

DATABASE SYSTEMS. Database programming in a web environment. Database System Course,

II B.Sc(IT) [ BATCH] IV SEMESTER CORE: RELATIONAL DATABASE MANAGEMENT SYSTEM - 412A Multiple Choice Questions.

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #4: SQL---Part 2

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Today's Party. Example Database. Faloutsos/Pavlo CMU /615

Transcription:

Using a DBMS Shan-Hung Wu & DataLab CS, NTHU

DBMS Database A database is a collection of your data stored in a computer A DBMS (DataBase Management System) is a software that manages databases 2

Outline Main Features of a DBMS Data Models SQL Queries 3

Why not file systems? 4

Advantages of a Database System It answers queries fast E.g., among all posts, find those written by Bob and contain word db Groups modifications into transactions such that either all or nothing happens E.g., money transfer Recovers from crash Modifications are logged No corrupt data after recovery 5

Advantages of a Database System It answers queries fast E.g., among all posts, find those written by Bob and contain word db Groups modifications into transactions such that either all or nothing happens E.g., money transfer Recovers from crash Modifications are logged No corrupt data after recovery 6

Queries Q: find ID and text of all pages written by Bob and containing word db Step1: structure data using tables users id name karma 729 Bob 35 730 John 0 posts Column/field id text ts authorid 33981 Hello DB! 1493897351 729 33982 Show me code 1493854323 812 Row/record 7

Queries Q: find ID and text of all pages written by Bob and containing word db Step2: users id name karma 729 Bob 35 730 John 0 SELECT p.id, p.text FROM posts AS p, users AS u WHERE u.id = p.authorid AND u.name='bob' AND p.text ILIKE '%db%'; posts id text ts authorid 33981 Hello DB! 1493897351 729 33982 Show me code 1493904323 812 8

How Is a Query Answered? SELECT p.id, p.text FROM posts AS p, users AS u WHERE u.id = p.authorid AND u.name='bob' AND p.text ILIKE '%db%'; (p, u) p p.id p.text p.ts p.authorid u.id u.name u.karma 33981 Hello DB!... 729 729 Bob 35 33981 Hello DB!... 729 730 John 0 33982 Show me code 812 729 Bob 35 33982 Show me code 812 730 John 0 id text ts authorid 33981 Hello DB!... 729 33982 Show me code 812 u id name karma 729 Bob 35 730 John 0 9

How Is a Query Answered? SELECT p.id, p.text FROM posts AS p, users AS u WHERE u.id = p.authorid AND u.name='bob' AND p.text ILIKE '%db%'; where(p, u) p.id p.text p.ts p.authorid u.id u.name u.karma 33981 Hello DB!... 729 729 Bob 35 (p, u) p.id p.text p.ts p.authorid u.id u.name u.karma 33981 Hello DB!... 729 729 Bob 35 33981 Hello DB!... 729 730 John 0 33982 Show me code 812 729 Bob 35 33982 Show me code 812 730 John 0 10

How Is a Query Answered? SELECT p.id, p.text FROM posts AS p, users AS u WHERE u.id = p.authorid AND u.name='bob' AND p.text ILIKE '%db%'; where(p, u) select(where(p, u)) p.id p.text 33981 Hello DB! p.id p.text p.ts p.authorid u.id u.name u.karma 33981 Hello DB!... 729 729 Bob 35 11

Why fast? 12

Query Optimization Planning: DBMS finds the best plan tree for each query select( ) select( ) where(u.id = p.authorid) where(u.id = p.authorid AND u.name='bob') p (p, u) u p (p, u) where(u.name='bob') u 13

Query Optimization Indexing: creates a search tree for column(s) ts (ordered) SELECT text FROM posts WHERE ts > 1493897220; idx_ts posts id text ts authorid 1 Good day 1493880220 664 33981 Hello DB! 1493897351 729 33982 Show me code 1493904323 812 14

Advantages of a Database System It answers queries fast E.g., among all posts, find those written by Bob and contain word db Groups modifications into transactions such that either all or nothing happens E.g., money transfer Recovers from crash Modifications are logged No corrupt data after recovery 15

Transactions I Each query, by default, is placed in a transaction (tx for short) automatically BEGIN; SELECT...; -- query COMMIT; 16

Transactions II Can group multiple queries in a tx All or nothing takes effect E.g., karma transfer users id name karma 729 Bob 35 730 John 0 BEGIN; UPDATE users SET karma = karma - 10 WHERE name='bob'; UPDATE users SET karma = karma + 10 WHERE name='john'; COMMIT; 17

ACID Guarantees Atomicity Operation are all or none in effect Consistency Data are correct after each tx commits E.g., posts.authorid must be a valid users.id Isolation Concurrent txs = serial txs (in some order) Durability Changes will not be lost after a tx commits (even after crashes) 18

users Why model data as tables? id name karma 729 Bob 35 730 John 0 id text ts authorid 33981 Hello DB! 1493897351 729 33982 Show me code 1493904323 812 posts 19

Storing Data Let s say, you have data/states in memory to store What do states look like? Objects References to objects Objects formatted by classes you defined Can we store these objects and references directly? Client (Stateful) HTTP Web Server (Stateless) SQL DB Server (Stateful) 20

Data Models Definition: A data model is a framework for describing the structure of databases in a DBMS Common data models at client side: Tree model Common data models at server side: ER model and relational model A DBMS supporting the relational model is called the relational DBMS 21

Tree Model At client side, data are usually stored as trees { // state of client 1 name: 'Bob', karma: 32, posts: [...], friends: [{ name: 'Alice', karma: 10 }, { name: 'John', karma: 17 },...],... } { // state of client 2 name: 'Alice', karma: 10, posts: [...], friends: [{ name: 'Bob', karma: 32 }, { name: 'John', karma: 17 },...],... } 22

Problems at Server Side Space complexity: large redundancy { // state of a client 1 name: 'Bob', karma: 35, posts: [...], friends: [{ name: 'Alice', karma: 10 }, { name: 'John', karma: 17 },...],... } { // state of a client 2 name: 'Alice', karma: 10, posts: [...], friends: [{ name: 'Bob', karma: 35 }, { name: 'John', karma: 17 },...],... } 23

Data Modeling at Server Side 1. Identify entity groups/classes Each class represents an atomic part of the data 2. Store entities of the same class in a table A rows/record denotes an entity A column/field denote an attribute (e.g., name ) 3. Define primary keys for each table Special column(s) that uniquely identifies an entity E.g., ID 24

users posts Identifying Entity Classes { // state of a client 1 name: 'Bob', karma: 32, posts: [...], friends: [{ name: 'Alice', karma: 10 }, { name: 'John', karma: 17 },...],... } { // state of a client 2 name: 'Alice', karma: 10, posts: [...], friends: [{ name: 'Bob', karma: 32 }, { name: 'John', karma: 17 },...],... } 25

One Table per Entity Class users posts posts id name karma 729 Bob 35 730 John 0 id text 33981 Hello DB! 33982 Show me code No redundancy No repeated update 26

Wait, relationship is missing! 27

friend { // state of a client 1 name: 'Bob', karma: 32, } posts: [...], friends: [{ name: 'Alice', karma: 10 }, { name: 'John', karma: 17 },...],... users write Step1 (ER Model) Identify relationships between entities friend (N-N) posts write (1-N) 28

friend (N-N) posts users write (1-N) posts Step 2 (Relational Model) Relationships as foreign keys id name karma 729 Bob 35 730 John 0 friend uid1 uid2 since foreign keys write id text authorid ts 729 730 14928063 729 882 14827432 33981 Hello DB! 729 1493897351 33982 Show me code 729 1493854323 29

Recap on Terminology Columns = fields = attributes Rows = records = tuples Tables = relations Relational database: a collection of tables Relational DBMS Schema: column definitions of tables in a database Basically, the look of a database Schema of a relation/table is fields and field types 30

Why ER Model? Allows thinking your data in OOP way Entity An object (or instance of a class) With attributes Entity group/class A class Must define the ID attribute for each entity Relationship between entities References ( has-a relationship) Could be 1-1, 1-N, or N-N 31

Why Relational Model? Simplifies data management and query processing Table/relations for all kinds of entity classes Primary/foreign keys for all kinds of relationships between entities Relational schema is logical Not how your data stored physically Vs. physical schema 32

Exercise: Student DB Storing course-enrollment info in a school Each department has many students and offers different courses Each courses can have multiple sections (e.g., 2018 spring, 2019 fall, etc.) Each students can enroll in different sections Can you model data and draw a relational schema? 33

Exercise: Student DB students departments s-id: int s-name: varchar(10) grad-year: int major-id: int * 1 d-id: int d-name: varchar(8) 1 1 * enroll sections * courses e-id: int student-id: int section-id: int grade: double * 1 sect-id: int course-id: int prof: int year-offered: int * 1 c-id: int title: varchar(20) dept-id: int Relation (table) Realization of 1) an entity group via table; or 2) a relationship Fields/attributes as columns Records/tuples as rows 34

Exercise: Student DB students departments s-id: int s-name: varchar(10) grad-year: int major-id: int * 1 d-id: int d-name: varchar(8) 1 1 * enroll sections * courses e-id: int student-id: int section-id: int grade: double * 1 sect-id: int course-id: int prof: int year-offered: int * 1 c-id: int title: varchar(20) dept-id: int Primary Key Realization of ID via a group of fields 35

Exercise: Student DB students departments s-id: int s-name: varchar(10) grad-year: int major-id: int * 1 d-id: int d-name: varchar(8) 1 1 * enroll sections * courses e-id: int student-id: int section-id: int grade: double * 1 sect-id: int course-id: int prof: int year-offered: int * 1 c-id: int title: varchar(20) dept-id: int Foreign key Realization of relationship A record can point to the primary key of the other record Only 1-1 and 1-many Intermediate relation is needed for many-many 36

PostgreSQL Download and install For Mac users, try PostgreSQL.app 37

Using PostgreSQL $ createdb <db> $ psql <db> [user] > \h or \? > SELECT now(); -- SQL commands Default schema: public \dn for listing all schemas Multiple lines until ; -- for comments Case insensitive Use to distinguish lower and upper cases E.g., SELECT "authorid" FROM posts; 38

Structured Query Language (SQL) Data Definition Language (DDL) on schema CREATE TABLE ALTER TABLE DROP TABLE Data Manipulation Language (DML) on records INSERT INTO VALUES SELECT FROM WHERE UPDATE SET WHERE DELETE FROM WHERE 39

Schema users id name karma 729 Bob 35 730 John 0 friend uid1 uid2 since 729 730 14928063 729 882 14827432 posts foreign keys id text authorid ts 33981 Hello DB! 729 1493897351 33982 Show me code 729 1493854323 40

Creating Tables/Relations CREATE TABLE posts ( id serial PRIMARY KEY NOT NULL, text text NOT NULL, "authorid" integer NOT NULL REFERENCES users ON DELETE CASCADE, ts bigint NOT NULL DEFAULT (extract(epoch from now())),... ); Column types: Integer, bigint, real, double, etc. varchar(10), text, etc. Non-null constraint Default values 41

Creating Tables/Relations CREATE TABLE posts ( id serial PRIMARY KEY NOT NULL, text text NOT NULL, "authorid" integer NOT NULL REFERENCES users ON DELETE CASCADE, ts bigint NOT NULL DEFAULT (extract(epoch from now())),... ); Primary key: Unique (no duplicate values among rows) Usually of type serial (auto-filled integer) Index automatically created 42

Creating Tables/Relations CREATE TABLE posts ( id serial PRIMARY KEY NOT NULL, text text NOT NULL, "authorid" integer NOT NULL REFERENCES users ON DELETE CASCADE, ts bigint NOT NULL DEFAULT (extract(epoch from now())),... ); Foreign key: post.authorid must be a valid user.id When deleting a user (row): NO ACTION (default): user not deleted, error raised CASCADE: user and all referencing posts deleted 43

Inserting Rows INSERT INTO posts(text, "authorid",...) VALUES ('Today is a good day!, 5,...); String values should be single quoted Inserting dummy rows: INSERT INTO posts(text, "authorid") SELECT 'Dummy word ' i '.', round(random() * 10) + 1 FROM generate_series(1, 20) AS s(i); 44

Queries SELECT * FROM posts WHERE ts > 147988213 AND text ILIKE '%good%' ORDER BY ts DESC, id ASC LIMIT 2; To see how a query is processed: EXPLAIN ANALYZE -- show plan tree SELECT * FROM posts WHERE ts > 147988213 AND text ILIKE '%good%' ORDER BY ts DESC, id ASC LIMIT 2; 45

(Batch) Updating Rows UPDATE post SET ts = ts + 3600 WHERE "authorid" = 10; All rows satisfying the WHERE clause will be updated ts + 3600 is an expression Can be evaluated to a single value 46

Handling Big Data INSERT INTO posts(text, "authorid") SELECT 'Dummy word ' i '.', rount(random() * 10) + 1 FROM generate_series(1, 1000000) AS s(i); Some queries will be long: EXPLAIN ANALYZE SELECT * FROM posts WHERE id > 500000 AND id < 501000; -- 1ms EXPLAIN ANALYZE SELECT * FROM posts WHERE ts > 1400000000 AND ts < 1403600000; -- 230ms 47

Using Index ts (ordered) CREATE INDEX posts_idx_ts ON posts USING btree(ts); \di -- list indices EXPLAIN ANALYZE SELECT * FROM posts WHERE ts > 1400000000 AND ts < 1403600000; -- 2ms posts_idx_ts posts id text ts 1 Good day 1493880220 33981 Hello DB! 1493897351 33982 Show me code 1493904323 48

Index for ILIKE? CREATE INDEX posts_idx_text ON posts USING btree(text); EXPLAIN ANALYZE SELECT * FROM posts WHERE text ILIKE '% word 500000%'; -- 1.5s B-tree indices are not helpful for text searches Use GIN (generalized inverted index) instead: CREATE EXTENSION pg_trgm; \dx -- list extensions CREATE INDEX posts_idx_text_trgm ON posts USING gin(text gin_trgm_ops); EXPLAIN ANALYZE SELECT * FROM posts WHERE text ILIKE '%word 500000%'; -- 50ms 49

Assignment Reading A nice SQL Tutorial We will have a quiz on SQL next Mon! 50