Requirement Analysis & Conceptual Database Design Problem analysis Entity Relationship notation Integrity constraints Generalization Introduction: Lifecycle Requirement analysis Conceptual Design Logical Schema Design Physical Schema Design Administration -> Text -> ER-Model -> Database schema -> Access paths Name DID Name VName VName Email SID Studentin Dozentin Email (1,*) (1,*) anmelden Name halten (0,*) KID Kurs Dauer (0,*) (0,*) Vorgänger Nachfolger voraussetzen CREATE TABLE Studentin (SID INTEGER PRIMARY KEY, VName CHAR(20) Name CHAR(30) NOT NULL, CREATE TABLE Kurs Email Char(40)); (KID CHAR(10) PRIMARY KEY, Name CHAR(40) NOT NULL, Dauer INTEGER); CREATE TABLE Anmelden (SID INTEGER PRIMARY KEY FOREIGN KEY REFERENCES Studentin (SID), KID INTEGER PRIMARY KEY FOREIGN KEY REFERENCES Studentin (KID)); 2.2 1
Introduction: Database Design Terminology Different from Software Engineering! Database Design - process of defining the structure of a database - layers of abstraction: Conceptual, logical, physical level Includes "Analysis" and "Design" from SE Database Requirements Conceptual modeling Logical modeling Physical modeling Software Engineering Requirements Analysis Design 2.3 Requirement Analysis Customer communication! Identify essential "real world" information Remove redundant, unimportant details Clarify unclear natural language statements Fill remaining gaps in discussions Distinguish data and operations Requirement analysis & Conceptual Design aims at focusing thoughts and discussions! 2.4 2
Requirement Analysis: Example I m the owner of a medium size veo store. We have over 10.000 veo tapes that we need to keep track of. Since a few weeks, we also have DVDs. Each of our veo tapes has a tape number. For each movie, we need to know its title and category (e.g. comedy, suspense, drama, action, or SciFi), director and year. Yes, we do have multiple copies of many of our movies. We give each movie a specific, and then track which movie a tape contains. A tape may be either Beta or VHS Format. We always have at least one tape for each movie we track, and each tape is always a copy of a single, specific movie. Our tapes are adapted to the movie lengths, so we don t have any movies which require multiple tapes. The movies are stored on shelf according to their category sorted by movie title. We are frequently asked for movies starring specific actors John Wayne and Katherine Hepburn are always popular. So we d like to keep track of the star actors appearing in each movie... 2.5 Conceptual Design: Example... Not all of our movies have star actors. Customers like to know each actor s real birth name and age. We track only actors who appear in the movies in our inventory. We have lots of customers. We only rent veos to people who have joined our veo club. To belong to our club, they must have good credit. For each club member, we d like to keep their first and last name, current phone number, and current address. And, of course each club member has a membership number. Then we need to keep track of what veo tapes each customer currently has checked out. A customer may check out multiple veo tapes at any given time. Rentals are for one or more days, each movie with an indivual price per day. Furthermore we additionally charge 1 $ per beta format tape, 2 $ for a DVD and another $ for movies longer than 2 hours. Maximum rental time is 4 weeks. The customer gets a bill with movie titles, indivual prices and total amount, when he /she returns movies." 2.6 3
Requirement Analysis: Example Note: what is important depends on the applications Essential information - customers, tapes, movies,... - movies have many copies - tape (or DVD) contains exactly one movie - customers have customer entification () - four weeks maximal rental time - Redundant, unimportant details - "...DVDs since a few weeks - "... John Wayne.. - Names of the categories (but categories are important!) - Tapes on shelf (since we don't design a tape robot) - 2.7 Requirement Analysis: Example Clarify unclear statements - Veo club: admission / annual fee? - Charge per tape: is price for one day the minimum? -... Fill gaps - Any discounts? -... Distinguish data/operations - Processing a bill - Becoming a member of the club - 2.8 4
Conceptual Design: Modeling primitives Entity - something which exists - Examples: tape, movie Attribute - property of an entity - Examples: tape has, movie has title Relationship: - connects two or more entities - Examples: tape contains a movie. Attributes have values Tape - Example: John Wayne is starring - Not modeled! Movie Tape Important terms Tape Movie title contain Movie title 2.9 Conceptual Design: Structuring the problem Modeling / Conceptual Design - typically: Entity-Relationship-Model, data-oriented - 1976 introduced by P.P. Chen (Peter P. Chen: The Entity-Relationship Model - Toward a Unified View of Data. TODS 1(1): 9-36, 1976) Title LID hours - sometimes: UML (problematic!) Student Matr-Nr:Number Name: String FName: String Email: String Student attend attend Name FName Matr-Nr Email LID: Number Title: String hours: Number 2.10 5
Conceptual Design: E-R notation Entities & Attributes: - Entities have entifying (or key) attributes or sets of attributes Tape key Entity attribute attribute attribute Movie title director category These attributes together entify a movie 2.11 Conceptual Design: E-R notation Relationships attribute attribute Entity_1 relationship Entity_2 - Always have a name - Do not have a key! - Not always reflected by the relationship name see example: "rents" is equivalent to "is-rented_by" Customer rents Tape attribute attribute 2.12 6
Conceptual Design: Example [ ] We have [ ] veo tapes that we need to keep track of. Since a few weeks, we also have DVDs. Each of our veo tapes has a tape number. For each movie, we need to know its title and category, director and year. We do have multiple copies of many of our movies. We give each movie a specific, and then track which movie a tape contains. A tape may be either Beta or VHS Format. We always have at least one tape for each movie we track, and each tape is always a copy of a single, specific movie. Our tapes are adapted to the movie lengths, so we don t have any movies which require multiple tapes. The movies are stored on shelf according to their category sorted by movie title. We are frequently asked for movies starring specific actors. [ ] Customers like to know each actor s real birth name and age. We track only actors who appear in the movies in our inventory. [ ] Rentals are for one or more days, each movie with an indivual price per day. Furthermore we additionally charge 1 $ per beta format tape, 2 $ for a DVD and another $ for movies longer than 2 hours. [abbreviated] 2.13 Conceptual Design: Example (without customers) Tape belong_to Format name charge category year title Movie play Actor stage_name real_name director Price_per_day length birthday 2.14 7
Conceptual Design: E-R notation Multiple relationships attend Student Recursive relationships predecessor successor require role name parent Person have child 2.15 Conceptual Design: E-R notation Relationships with attributes rating - Different semantics in separate entity: belong_to _date rating r r 2.16 8
Conceptual Design: Integrity constraints Integrity constraints (=Invariants) Database must always meet these invariants Important concept Explicit state restrictions from requirements Examples: - "There is always at least one tape for each movie, and each tape is always a copy of a single, specific movie - "Not all of our movies have star actors" Implicit invariants common Examples: - Tape cannot be loaned by more than one customer a time - Actor may be starring in more than one movie 2.17 Conceptual Design: Integrity constraints Attribute restrictions Constraints: Cardinality restrictions Attribute restrictions - Attribute contain at most one value: unstructured attribute restriction of E-RM - Key attributes are unique - Attribute must / may have a value e.g., movie must have title, director is not necessarily known - Value may be restricted e.g., movies are made after 1900 : movie.year > 1900 - Special restrictions not in E-R model! 2.18 9
Conceptual Design: Integrity constraints Cardinality restrictions - # entities-instances in relationship (UML: multiplicity) Examples: one movie on a tape, tape loaned by at most one customer at a time, #rented tapes per customer 0 one licence per driver 1:1 Person own drivers license one movie on a tape 1:N Movie Movie play Tape several actors in many movies N:M Actor 2.19 Conceptual Design: Cardinalities Simple Notation k1 k2 E1 R E2 N 1 Tape rent Customer N tapes are rented by 1 customer Min-Max Notation: (min1,max1) E1 R E2 (0,1) (0,*) Tape rent Customer Each tape is rented by 0-1 customers Each customer has rented 0-N tapes (min2,max2) 2.20 10
Conceptual Design: Cardinalities 1:1 relationship 1:N relationship N:M relationship Do not mix notations! (min,max) easier to implement person have driverslicence (0,1) Movie Tape (1,*) r (0,*) User_account has emailmessage (0,*) (1,*) 2.21 Conceptual Design: Semantics N:M, (min,max) Let R E1 x E2 be a relationship between entity sets E1 and E2 R is 1:N R is a partial function R: E2 E1 for all extensions of R e2 E2: {e1 e1 E1 (e1,e2) R } 1 R is 1:1 E2 E1 is a injective partial function R is M:N R is a relation, but not a function (R,E1) has (min1, max1) for all extensions of R and for all x E1 min1 {y y E2 (x,y) R } max1 (R,E2) has (min2, max2) for all extensions of R and for all y E2 min2 {x x E1 (x,y) R } max2 Important concept 2.22 11
Conceptual Design: Example (without customers) title Tape (1,*) belong_to category year Movie (0,*) play (0,*) (1,*) Format Actor name charge stage_name real_name director Price_per_day length birthday 2.23 Conceptual Design: notational overview E-RM (min,max) E-RM 1:N UML mandatory/ multiple (1,*) N or M 1..* k..j k optional/ multiple (0,*) N or M 0..* * 0.. k optional/ single (0,1) 1 0.. 1 mandatory/ single k,j,n,m are natural numbers More notations in use, e.g., oracle-special notation... 1 1 2.24 12
Conceptual Design: Weak entities Existentially dependent entities Not globally entifiable Example: Notation: (1,*) Building has Room (min1,max1) (min2, max2) E1 R E2 min2 = 1 : at least one e1 for each e2, otherwise e2 could not be entified partial key if max1 > 1 2.25 Conceptual Design: Weak entities Example: Movies and their scenes Example: Order and Delivery O Movie (1,*) Order contains Delivery (0,*) has (1,*) has D Scene Item I 2.26 13
Conceptual Design: Weak entities Modeling decision: (0,*) employee has child e MID or c (0,*) employee has child c Advantages and disadvantages? 2.27 Conceptual Design: Temporal data 1. Condition: tape only rented by one customer at the same time: c 2. Condition: tape rented by many customers one after the other customer c custmer rent (0,*) (0,*) customer rent (0,*) (0,1) veotape from has (0,*) until veotape is_in rental veotape (0,*) from until v v 2.28 14
Conceptual Design: Example year title category director charge Format Price_per_day name (0,*) (1,*) Movie length belong_to (0,*) play Tape First_name (0,*) Address Last_name (1,*) is_in Telephone Actor Mem_no from Rental stage_name birthday until have (0,*) Customer real_name 2.29 Conceptual Design: n-ary relationships Example: - r recommends textbook for lecture (0,*) (1,*) r recommend Textbook Cardinalities: (1,*) - A particular lecturer recommends zero or more "(textbook,lecture) pairs" - For each lecture at least one textbook is recommended by lecturers - here, (min,max)-constraints not very expressive 2.30 15
Conceptual Design: n-ary relationships 1:N notation in n-ary relationships - Remember: R is 1:N R is function R:E2 E1 - Generalization: R: E 1 x x E j-1 x E j+1 x x E n E j E1 1 N R 1 r recommend N Textbook recommend: Textbook x r recommend: Textbook x r M E2 2.31 Conceptual Design: n-ary relationships N-ary relationships expressed by binaries 1. Connect pairs of entities r 2. New central entity, binary relationships to old ones r Textbook plan Textbook - Less constraints expressible! 2.32 16
Conceptual Design: Generalization Conser lecturing students: - similar objects in different entities - Redundancy errors! FName Name FName Name Matr_no Contract Student Email r attend Generalization (specialization) helps l Email LID title hours 2.33 Conceptual Design: Generalization attend Person is-a is-a Student r Matr_no hours title LID FName Name Email p Contract DB Interpretation of Generalization differs from OO! instances of Student and r also instances of Person Student Person, r Person 2.34 17
Conceptual Design: Generalization OO-interpretation: - Instances of A, B and C different but share some attributes DB interpretation: - Instances of B and C also instances of A: B A and C A - Key-inheritance - "is-a" different from ordinary relationships Special cases: is-a - Disjoint specialization: Instances of B and C are disjoint - Complete specialization: A = B C, no extra tupel in A B A is-a C 2.35 Conceptual Design: Short summary E-R Modeling - Standard semi-formal language for conceptual modeling - Different notations: we use E-R and (min,max) Important terms - Entity (type, set): set of objects with the same attributes - Attribute: property of an entity, at most one value per attribute. - Relationship (type): association between entity types - Role: name of one se of relationship - Cardinality (multiplicity): number of entity objects that participate in relationship - Weak entity: existentially dependent entity - Generalization 2.36 18