Introduction to Data Management. Lecture #7 (Relational Design Theory)

Similar documents
Introduction to Data Management. Lecture #6 E-Rà Relational Mapping (Cont.)

Introduction to Data Management. Lecture #5 (E-R Relational, Cont.)

From Murach Chap. 9, second half. Schema Refinement and Normal Forms

Lecture 5 Design Theory and Normalization

Schema Refinement and Normal Forms

Introduction to Data Management. Lecture #5 Relational Model (Cont.) & E-Rà Relational Mapping

Lecture #8 (Still More Relational Theory...!)

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

Introduction to Data Management. Lecture #4 (E-R à Relational Design)

Course Content. Database Design Theory. Objectives of Lecture 2 Database Design Theory. CMPUT 391: Database Design Theory

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

CSE 562 Database Systems

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Introduction to Data Management. Lecture 16 (SQL: There s STILL More...) It s time for another installment of...

Introduction to Data Management. Lecture #3 (Conceptual DB Design)

The Relational Model. Chapter 3. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Introduction to Data Management. Lecture #11 (Relational Languages I)

Introduction to Data Management. Lecture #4 E-R Model, Still Going

Introduction to Data Management. Lecture #13 (Relational Calculus, Continued) It s time for another installment of...

CS411 Database Systems. 05: Relational Schema Design Ch , except and

SCHEMA REFINEMENT AND NORMAL FORMS

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

FUNCTIONAL DEPENDENCIES

Lecture #16 (Physical DB Design)

Introduction to Data Management. Lecture #3 (E-R Design, Cont d.)

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Introduction to Data Management. Lecture #10 (Relational Calculus, Continued)

The Relational Model 2. Week 3

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Schema Refinement: Dependencies and Normal Forms

Keys, SQL, and Views CMPSCI 645

Introduction to Data Management. Lecture #3 (Conceptual DB Design)

Friday Nights with Databases!

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms

The Relational Model. Chapter 3

Introduction to Data Management. Lecture #14 (Relational Languages IV)

The Entity-Relationship Model. Overview of Database Design

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

Database Applications (15-415)

Database Applications (15-415)

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

Database Normalization

Introduction to Data Management. Lecture 14 (SQL: the Saga Continues...)

Database Management System

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Introduction to Data Management. Lecture #17 (SQL, the Never Ending Story )

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

Database Design Theory and Normalization. CS 377: Database Systems

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

CompSci 516 Database Systems. Lecture 2 SQL. Instructor: Sudeepa Roy

Why Study the Relational Model? The Relational Model. Relational Database: Definitions. The SQL Query Language. Relational Query Languages

Informal Design Guidelines for Relational Databases

Introduction to Data Management. Lecture #11 (Relational Algebra)

CS 2451 Database Systems: Database and Schema Design

Database Management Systems. Chapter 3 Part 2

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Introduction to Data Management. Lecture #3 (Conceptual DB Design) Instructor: Chen Li

The Relational Model (ii)

Database Systems ( 資料庫系統 )

Databases - Relational Algebra. (GF Royle, N Spadaccini ) Databases - Relational Algebra 1 / 24

This lecture. Projection. Relational Algebra. Suppose we have a relation

Relational Algebra. Chapter 4, Part A. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Data Modeling. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

CSCI 127 Introduction to Database Systems

UNIT 3 DATABASE DESIGN

SQL: The Query Language Part 1. Relational Query Languages

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

To overcome these anomalies we need to normalize the data. In the next section we will discuss about normalization.

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language)

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

CS 338 Functional Dependencies

We shall represent a relation as a table with columns and rows. Each column of the table has a name, or attribute. Each row is called a tuple.

The Relational Model

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

Functional Dependencies CS 1270

CIS 330: Applied Database Systems

The Relational Data Model. Data Model

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

MIS Database Systems Relational Algebra

Database Management Systems. Chapter 3 Part 1

Introduction to Data Management. Lecture #1 (The Course Trailer )

CSE 544 Principles of Database Management Systems

Typical relationship between entities is ((a,b),(c,d) ) is best represented by one table RS (a,b,c,d)

Chapter 8: Relational Database Design

Contents. Database. Information Policy. C03. Entity Relationship Model WKU-IP-C03 Database / Entity Relationship Model

The Entity-Relationship Model

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

The Relational Model

Transcription:

Introduction to Data Management Lecture #7 (Relational Design Theory) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW#2 is underway...! (From HW #1 solution) Use your best judgement on data types and NULLs Capture all the capturable ICs (PK/FK/UNIQUE) v A few logistical notes (waiting list, office hours) Send me an e-mail if you need a quiz make-up plan 420 ( 330) Hybrid F2F/online (Piazza) class (K) v Today s plan: Super-quick E-R wrap-up (from last time) Relational DB design theory (I) Disclaimer: Not the most exciting part of CS122A J Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

Reminder: Putting the Basics Together (from E-Rà Relational) cid cname login oid shipto total Customer 1 Placed N Order 1 sku pname color listprice N Has N LineItem lno price Product 1 For qty Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Reminder: Putting It Together (Cont d.) CREATE TABLE Customer ( cid INTEGER, cname VARCHAR(50), login VARCHAR(20) NOT NULL, PRIMARY KEY (cid), UNIQUE (login)) CREATE TABLE Product ( sku INTEGER, pname VARCHAR(100), color VARCHAR(20), listprice DECIMAL(8,2), PRIMARY KEY (sku)) CREATE TABLE Order ( oid INTEGER, custid INTEGER, shipto VARCHAR(200), total DECIMAL(8,2), PRIMARY KEY (oid), FOREIGN KEY (custid) REFERENCES Customer)) CREATE TABLE LineItem ( oid INTEGER, lno INTEGER, price DECIMAL(8,2), qty INTEGER, sku INTEGER, PRIMARY KEY (oid, lno), FOREIGN KEY (oid) REFERENCES Order ON DELETE CASCADE), FOREIGN KEY (sku) REFERENCES Product)) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

Reminder: Putting It Together (Cont d.) Customer cid cname login 1 Smith, James jsmith@aol.com 2 White, Susan suzie@gmail.com 3 Smith, James js@hotmail.com Product sku pname color listprice 123 Frozen DVD null 24.95 456 Graco Twin Stroller green 199.99 789 Moen Kitchen Sink black 350.00 Order oid custid shipto total LineItem oid lno price qty item 1 3 J. Smith, 1 Main St., USA 199.95 1 1 169.95 1 456 2 1 Mrs. Smith, 3 State St., USA 300.00 1 2 15.00 2 123 2 1 300.00 1 789 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Relational Model and E-R Schema Translation: Summary v Relational model: a tabular representation of data. v Simple and intuitive, also widely used. v Integrity constraints can be specified by the DBA based on application semantics. DBMS then checks for violations. Most important ICs: Primary and foreign keys (PKs, FKs). In addition, we always have domain constraints. v Powerful and natural query languages exist (soon!) v Rules to translate E-R to relational model Can be done by a human, or automatically (using a tool) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

Relational Database Design v Two aspects to the RDB design problem: Logical schema design: We just saw one approach, namely, doing E-R modeling followed by an E-R à relational schema translation step Physical schema design: Later, once we learn about indexes, when should we utilize them? v We will look at both problem aspects this term, starting first with relational schema design Our power tools will be functional dependencies (FDs) and normalization theory Note: FDs also play an important role in other contexts as well, e.g., SQL query optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 So, Given a Relational Schema... v How do I know if my relational schema is a good logical database design or not? What might make it not good? How can I fix it, if indeed it s not good? How good is it, after I ve fixed it? v Note that your relational schema might have come from one of several places You started from an E-R model (but maybe that model was wrong or incomplete in some way?) You went straight to relational in the first place It s not your schema you inherited it! J Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

Ex: Wisconsin Sailing Club Proposed schema design #1: sid sname rating age date bid bname color 22 Dustin 7 45.0 10/10/98 101 Interlake blue 22 Dustin 7 45.0 10/10/98 102 Interlake red 22 Dustin 7 45.0 10/8/98 103 Clipper green 22 Dustin 7 45.0 10/7/98 104 Marine red 31 Lubber 8 55.5 11/10/98 102 Interlake red 31 Lubber 8 55.5 11/6/98 103 Clipper green 31 Lubber 8 55.5 11/12/98 104 Marine red........................ Q: Do you think this is a good design? (Why or why not?) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Ex: Wisconsin Sailing Club Proposed schema design #2: sid sname rating age 22 Dustin 7 45.0 31 Lubber 8 55.5............ Q: What about this design? Is #2 better than #1...? Explain! Is it a best design? How can we go from design #1 to this one? si d bid date 22 101 10/10/98 22 102 10/10/98 22 103 10/8/98 22 104 10/7/98 31 102 11/10/98 31 103 11/6/98 31 104 11/12/98......... bid bname color 101 Interlake blue 102 Interlake red 103 Clipper green 104 Marine red Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Ex: Wisconsin Sailing Club Proposed schema design #3: sid sname rating age 22 Dustin 7 45.0 31 Lubber 8 55.5............ Q: What about this design? Is #3 better or worse than #2...? What sort of tradeoffs do you see between the two? si d bid date 22 101 10/10/98 22 102 10/10/98 22 103 10/8/98 22 104 10/7/98 31 102 11/10/98 31 103 11/6/98 31 104 11/12/98......... Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 bid bname 101 Interlake 102 Interlake 103 Clipper 104 Marine bid color 101 blue 102 red 103 green 104 red The Evils of Redundancy (or: The Evils of Redundancy) v Redundancy is at the root of several problems associated with relational schemas: Redundant storage (space) Good rule to follow: Insert/delete/update anomalies One fact, one place! v Functional dependencies can help in identifying problem schemas and suggesting refinements. v Main refinement technique: decomposition, e.g., replace R(ABCD) with R1(AB) + R2(BCD). v Decomposition should be used judiciously: Is there reason to decompose a relation? Does the decomposition cause any problems? Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Functional Dependencies (FDs) v A functional dependency X à Y holds over relation R if, for every allowable instance r of R: For t1 and t2 in r, t1.x = t2.x implies t1.y = t2.y I.e., given two tuples in r, if the X values agree, then their Y values must also agree. (X and Y can be sets of attributes.) v An FD is a statement about all allowable relations. Identified based on application semantics (similar to E-R). Given some instance r1 of R, we can check to see if it violates some FD f, but we cannot tell if f holds over R! v Saying K is a candidate key for R means that Kà R Note: K à R alone does not require K to be minimal! If K is minimal, then K is a candidate key (else it s a super key). Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Example: Constraints on an Entity Set v Suppose you re given a relation called HourlyEmps: HourlyEmps (ssn, name, lot, rating, hrly_wages, hrs_worked) v Notation: Let s denote this relation schema by simply listing the attributes: SNLRWH This is really the set of attributes {S,N,L,R,W,H}. Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., HourlyEmps for SNLRWH). v Suppose we also have some FDs on HourlyEmps: ssn is the key: S à SNLRWH rating determines hrly_wages: R à W Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Example (Cont d.) Wages HourlyEmps2 5 7 v Problems due to R à W : S N L R H Update anomaly: What if we change W in just 123-22-3666 Attishoo 48 8 40 the 1st tuple of SNLRWH? 231-31-5368 Smiley 22 8 30 Insertion anomaly: What if 131-24-3650 Smethurst 35 5 30 we want to insert a new employee and don t know 434-26-3751 Guldu 35 5 32 the proper hourly wage for 612-67-4134 Madayan 35 8 40 his or her rating? Deletion anomaly: If we S N L R W H delete all employees with 123-22-3666 Attishoo 48 8 10 40 rating 5, we lose the stored 231-31-5368 Smiley 22 8 10 30 information about the 131-24-3650 Smethurst 35 5 7 30 wage for rating 5! How about two smaller tables? 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15 R W 8 10 Reasoning About FDs v Given some FDs, we can usually infer additional FDs: ssn did, did lot implies ssn lot (Translation: Matching ssns imply matching lots.) v An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold. F + = closure of F is the set of all FDs that are implied by F. v Armstrong s Axioms (X, Y, Z are sets of attributes): Reflexivity: If X Í Y, then Y X Augmentation: If X Y, then XZ YZ for any Z Transitivity: If X Y and Y Z, then X Z v These are sound and complete inference rules for FDs! Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Armstrong s Axioms: Examples pno name title state zip 1 Sandy Professor CA 92697 2 Joe Jim Gray Professor CA 94720 3 Anhai Professor WI 53706 4 Alex Associate Professor CA 92697 v Reflexivity: If X Í Y then YàX: Í zip (zip, name), so (zip, name) à zip. v Augmentation: If XàY then XZàYZ for any Z: zip à state, so (zip, title) à (state, title). v Transitivity: If XàY and YàZ then XàZ: pno à zip and zip à state, so pno à state. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17 Reasoning About FDs (Cont d.) (Recall: two matching X s always have the same Y ) v A few additional rules (which follow from AA): Union: If X Y and X Z, then X YZ Decomposition: If X YZ, then X Y and X Z v Example: Contracts(cid,sid,pjid,did,pid,qty,value), and: The contract id is the key: C CSJDPQV A project purchases each part using single contract: JP C A dept purchases at most one part from a supplier: SD P v JP C, C CSJDPQV imply JP CSJDPQV v SD P implies SDJ JP (New candidate keys...!) v SDJ JP, JP CSJDPQV imply SDJ CSJDPQV Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18