Database!! Structured data collection!! Records!! Relationships. Enforces that data maintains certain consistency properties

Similar documents
6.830 Lecture 2 9/11/2017 Relational Data Model (PS1 Out)

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

Page 1. Goals for Today" What is a Database " Key Concept: Structured Data" CS162 Operating Systems and Systems Programming Lecture 13.

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Page 1. Quiz 18.1: Flow-Control" Goals for Today" Quiz 18.1: Flow-Control" CS162 Operating Systems and Systems Programming Lecture 18 Transactions"

6.830 Lecture Transactions October 23, 2017

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Instructor: Chen Li

Introduction to Transactions: Controlling Concurrent "Behavior" Virtual and Materialized Views" Indexes: Speeding Accesses to Data"

Each animal in ONE cage, multiple animals can share a cage Each animal cared for by ONE keeper, a keeper cares for multiple animals

TRANSACTION PROPERTIES

Introduction to Data Management. Lecture #1 (Course Trailer )

Outline. Database Management Systems (DBMS) Database Management and Organization. IT420: Database Management and Organization

Introduction to Transaction Management

Intro to Transaction Management

COURSE 1. Database Management Systems

Introduction. Example Databases

Transactions, Views, Indexes. Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data

Lecture 12. Lecture 12: The IO Model & External Sorting

Transactions. ACID Properties of Transactions. Atomicity - all or nothing property - Fully performed or not at all

Database Management Systems. Chapter 1

Introduction to Database Management Systems

BIS Database Management Systems.

MIS Database Systems.

Course Logistics & Chapter 1 Introduction

Introduction to Database Systems. Chapter 1. Instructor: . Database Management Systems, R. Ramakrishnan and J. Gehrke 1

COSC344 Database Theory and Applications. Lecture 21 Transactions

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Introduction to Database Systems CS432. CS432/433: Introduction to Database Systems. CS432/433: Introduction to Database Systems

Introduction to Database Systems

Intro to Transactions

Database Management Systems Chapter 1 Instructor: Oliver Schulte Database Management Systems 3ed, R. Ramakrishnan and J.

Databases. Jörg Endrullis. VU University Amsterdam

In This Lecture. Transactions and Recovery. Transactions. Transactions. Isolation and Durability. Atomicity and Consistency. Transactions Recovery

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

The functions performed by a typical DBMS are the following:

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CMPT 354: Database System I. Lecture 11. Transaction Management

Introduction to Transaction Management

CSE 344 Final Review. August 16 th

6.830 Lecture Recovery 10/30/2017

Introduction. Who wants to study databases?

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Databases - Transactions

6.830 Lecture Recovery 10/30/2017

Database Systems ( 資料庫系統 ) Practicum in Database Systems ( 資料庫系統實驗 ) 9/20 & 9/21, 2006 Lecture #1

Transactions. Juliana Freire. Some slides adapted from L. Delcambre, R. Ramakrishnan, G. Lindstrom, J. Ullman and Silberschatz, Korth and Sudarshan

Transaction Management: Introduction (Chap. 16)

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

Chapter 1: Introduction

Database Technology Introduction. Heiko Paulheim

ECE 650 Systems Programming & Engineering. Spring 2018

Transactions. The Setting. Example: Bad Interaction. Serializability Isolation Levels Atomicity

Course Introduction & Foundational Concepts

Legacy SQL is a terrible language -- Date paper in 1985

Overview. Introduction to Transaction Management ACID. Transactions

CMPUT 291 File and Database Management Systems

Overview of Transaction Management

Transactions. Kathleen Durant PhD Northeastern University CS3200 Lesson 9

UVA. Database Systems. Need for information

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Transaction Management

Chapter 1: Introduction

Introduction to Databases

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CS352 Lecture - The Transaction Concept

Database Systems. Announcement

Concurrency Control In Distributed Main Memory Database Systems. Justin A. DeBrabant

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

T ransaction Management 4/23/2018 1

CSCD43: Database Systems Technology. Lecture 4

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

CAS CS 460/660 Introduction to Database Systems. Fall

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Introduction C H A P T E R1. Exercises

Chapter 7 (Cont.) Transaction Management and Concurrency Control

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

CSE 190D Database System Implementation

Introduction and Overview

TRANSACTION PROCESSING PROPERTIES OF A TRANSACTION TRANSACTION PROCESSING PROPERTIES OF A TRANSACTION 4/3/2014

RELATIONAL DESIGN THEORY / LECTURE 4 TIM KRASKA

CPS352 Lecture - The Transaction Concept

John Edgar 2

CS275 Intro to Databases. File Systems vs. DBMS. Why is a DBMS so important? 4/6/2012. How does a DBMS work? -Chap. 1-2

Page 1. CS194-3/CS16x Introduction to Systems. Lecture 8. Database concurrency control, Serializability, conflict serializability, 2PL and strict 2PL

Weak Levels of Consistency

CPSC 421 Database Management Systems. Lecture 19: Physical Database Design Concurrency Control and Recovery

14.1 Answer: 14.2 Answer: 14.3 Answer: 14.4 Answer:

Consistent deals with integrity constraints, which we are not going to talk about.

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer DBMS

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

Bases de Dades: introduction to SQL (indexes and transactions)

6.830 Problem Set 3 Assigned: 10/28 Due: 11/30

Goal A Distributed Transaction

COMP Instructor: Dimitris Papadias WWW page:

Transcription:

Relational Databases Sam Madden Key ideas: Declarative programming Transactions Database Structured data collection Records Relationships Database management system (DBMS) Why? 1) Widely used 2) Several big ideas Record oriented data model Explicit Model of Data Declarative language Consistent Atomic & Isolated Durable What is a database? - persistent collection of structured data - typically organized as "records" - and relationships between records A Database management system (DBMS) is a piece of software to access and manipulate a database Why should you care: - Databases are ubiquitous - Almost all websites use them - Amazon, Google, your bank, etc - Many organizations use them internally (e.g., UCB payroll/account/etc.) - Databases provide a convenient way to encapsulate an applicationʼs state as a collection of records - Often much easier to think of state as records than files (closer to representation used in most programs) Explicit model of data provides several attractive properties Can look at data and see names and types of fields Can share data between applications easily Can evolve representation of data over time Enforces that data maintains certain consistency properties - High level declarative language - Say what I want, not how to do it - Query optimizer that systematically determines how to execute a query efficiently - Allows concurrent access from multiple users while ensuring correct behavior ( Atomicity, Isolation ) - Updates are stored persistently on disk; strong guarantees in the face of program crashes, etc. ( Durability )

Zoo admin interface edit add animal public pictures + maps zookeeper feeding 1K animals, 5K pages, 10 admins, 200 keepers ZooFS: store each page in a text file ZooFS Ops: move each snake to a new bldg custom code, consistency issues multiple simultaneous admins serial equivalence system crashes pages in uncertain state hungriest animal custom code, slow suppose i am creating a web site that stores information about a zoo. has : - admin interface that allows me to add new animals, edit animals - public interface that allows me to look at pictures and maps - zookeeper interface that allows keepers to find which animals need to be fed why not just use a file system? what does a database give the developer? 1,000 animals, 5,000 pages, 10 admins, 200 zookeepers, 10,000 hits per day why not just create a separate set of pages for each animal, store it in FS (one page for zookeepers, one page for public) Operations - suppose move all the snakes to a new building - database => queries - suppose multiple admins try to edit the same page at the same time - need some kind of locking database => ("concurrency control; serial equivalence") - suppose the system crashes mid-update - pages might be in uncertain states database provides transactions + recovery groups of actions that happen atomically -- "all or nothing" - suppose i want to find the animal that was fed the longest ago - have to write a complex program - could be very slow if it has to read and search all of the pages - suppose i to add a new field, or share with someone else Databases address all of these issues.

Modeling Features to capture How to (logically) represent data Features: Animals: name, age species, cage Cages: feedtime, bldgs Keepers:... Data Model: logical structures used for data Relational (tables): animals name age species cageno stoica 0.5 shrew 1 sam 3 salama 2 cages no feedtime bdlg 1 1:30 1 2 2:30 2 sally 1 student 1 schema: tables, fields, names, types Lets look at how data might be structured in database What features of our zoo do we want to capture? each animal has a name, age, species, and is in a cage each cage gets fed at a particular time, is in a particular building each cage is kept by many keepers each keeper keeps many cages each animal is in one cage, each cage has many animals data model --> schema Relational data model -- tables that represent entities and their properties Translates into tables by taking all of the one-to-one relations and putting them in table named for object. This tabular approach is called the relational model. Why relational? because each record is a relation between fields ( keys capture relations) keeps keepers keeper 1 1 1 2 2 1 keeper cage name 1 jenny 2 joe cage 1 stoica stoica shrew animals.5 sally sam salamander sam 1 stoica isa shrew cage 2... stioica livesin cage1 sam isa salamander... cage1 cage2 For many to one relations, need to store a reference to other table Many to many relations require a mapping table Alternatives hierarchy, network, triplets Why might I prefer one representation over the other? Are they equivalent? Think about writing a program that manipulates these structures Think about expressing certain complex relationships in some of these models?

Relational Model Many possible representations of a given data set name age species cageno feedtime bldg stoica 13 shrew 1 1:30 VLSB sam 3 salam 2 2:30 SODA sally 1 student 1 1:30 VLSB Normalization -- avoid potential inconsistencies Accessing a database names of shrews 1 for each row r in animals if r.species = ʻshrewʼ output r.name selection query SELECT r.name FROM animals WHERE r.species = ʻshrewʼ Also that there are many possible relations for a given set of data (example with joined column) Rules for choosing the best set of relations for a given data set "schema normalization" Also that the logical model of tables doesnʼt say how they are physically arranged on disk ->Physical data model -- will come back to this For now, use a physical representation similar to the logical representation -- e.g., rows in a file. what kind of operations might i want to perform on a relation? SQL -- structured query language; show slide caged in bldg 2 for each row r1 in animals 2 for each row r2 in cages if r1.bldg = r2.no and r2.bldg = VLSB output r1 join operator (join) SQL: SELECT r FROM animals AS r1, cages AS r2 WHERE r1.bldg = r2.no AND r2.bldg = VLSB avg bear age SELECT AVG(age) FROM animals WHERE type = 'bear' INSERT, DELETE, UPDATE 3 1) find the names of animals that are shrews. 2) find the animals in a cage in VLSB. (you guys) -- join 3) find the average age of the bears. add a a new snake named bill INSERT delete barney DELETE move the snakes to a new cage UPDATE Show additional queries

Under the covers -- Declarative queries: multiple equivalent procedural plans sorted animals on type => binary search + search performance - update performance indices: map from (value) -> (record list) Query optimization Declarative query -> physical execution plan DBMS chooses the execution plan Cost model Data Independence Physical Logical Can change/evolve schema over time Create views that look like old schema Declarative: Notice, however, that our procedural programs are not the only way to compute the answers to these queries When could I do something besides the procedural programs shown above? For example, if we store animals in animal type order, we can use binary search to find the animals of a particular type quickly. Is there a cost to doing this? Have to store in sorted order (more expensive inserts) Lots of other possibilities -- e.g., can have hash table (index) that maps from type -> records Query optimization -- Depending on physical representation of data, and type of query, DBMS selects what it believes to be the best plan. Uses a cost model to estimate how long different plans will take to run. Optimization selects which implementation of each operation to use, as well as order of individual operations -- e.g., can move selection below join. In declarative programming, the physical representation -- e.g., the layout in memory or on disk -- is different than the logical representation the userʼs programs interact with. Optimizerʼs job is to implement the logical query effectively on physical representation. in standard imperative programming, logical and physical representation are typically more closely aligned. E.g. can represent store the table in sorted order, or not. Repr is not exposed in SQL, or app View Replace age with birthday create view animals as (select name, now() - bday as age, species from new_animals) Decoupling of logical model from physical representation is known as physical data independence Can store the data in different ways on disk, donʼt have to change program Also talk about logical data independence Can change the logical schema, and can avoid changing program Views

Transactions -- Atomic actions begin //T1 M = read sam feedtime S = read sal feedtime change sam feedtime to S change sal feedtime to M end Intermediate state is never visible All or nothing Recoverable Concurrency control begin //T2 M = read sam feedtime change sal feedtime to M end name feedtime sam 2 sal 1 Valid outcomes: If not careful, could get n f n f n f ------- ------- ------- sam1 sam2 sam1 sal 1 sal 2 sal 2 Serial equivalence Locking protocol -- run by the DBMS acquire locks on objects before using them, releases locks at end of transaction Summary : Database systems provide relational model of data declarative query language automatic optimization transactions atomicity serial equivalence durability & recoverability Next 2 lectures + CS186 Third big idea in databases (besides data modeling, optimization + data independence): Transactions Powerful way to handle concurrent access to the database name feedtime sam 2 sally 1 Allow a user to group operations into atomic sections: begin //T1 M = read sam feedtime S = read sal feedtime change sam feedtime to S <--- external xaction cant see this until after end change sal feedtime to M end Another concurrent user canʼt see intermediate state All or nothing -- xaction may fail (because it violates some constraint, for example), but if it does all its effects are undone If xaction succeeds, its effects are permanent and on disk Even if system crashes in the middle of a query -- need some way to ensure that partial state isnʼt reflected on disk -- recovery Transactions may run concurrently, but effect is indistinguishable from running in some serial order -- Serial Equivalence begin //T2 M = read sam feedtime change sal feedtime to M end Valid outcomes: If not careful, could get n f n f n f - - - - - - sam 1 sam 2 sam 1 sal 1 sam 2 sal 2 Under the covers, how does the system achieve serializabiliy? Run only one transaction at a time? Bad idea -- no concurrency, canʼt take advantage of multiple CPUs, canʼt mask disk stalls Idea -- use automatic locking protocol