Database Systems Lecture2:E-R model Juan Huo( 霍娟 ) Reference slides: http://www.cs.wisc.edu/ dbbook Berkeley, Professor Eben Haber,Professor Mary Roth
Review: Benefits of a DBMS 1. Data independence applications worry about what data they want, not how it is stored 2. Efficient data access DBMS is smart about how to retrieve data 3. Data integrity and security DBMS won t let you corrupt data 4. Centralized administration stored data on single server and let people specialize in managing it 5. Concurrent access Handles multiple users efficiently and recoverably 6. Reduced application development time Derived from 1-5
Review: DBMS components Database application Talks to DBMS to manage data for a specific task -> e.g. app to withdraw/deposit money or provide a history of the account Query Optimization and Execution Relational Operators Access Methods Buffer Management Disk Space Management Figures out the best way to answer a question -> There is always nore than 1 way to skin a cat! Provides generic ways to combine data -> Do you want a list of customers and accounts or the total account balance of all customers? Provides efficient ways to extract data -> Do you need 1 record or a bunch? Makes efficient use of RAM -> Think 1,000,000 simultaneous requests! Makes efficient use of disk space -> Think 300,000,000 accounts! DB
Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. Legacy systems in older models E.G., IBM s IMS Recent competitor: object-oriented model ObjectStore, Versant, Ontos A synthesis emerging: object-relational model Informix Universal Server, UniSQL, O2, Oracle, DB2 1
Database Design Requirements Analysis ---use needs, what data, what application Conceptual Database Design ---high level description (often done w/er model) Logical Database design ---translate ER into DBMS data model Schema Refinement ---consistency, normalization Physical Database Design ---index, disk access and buffer management Security Design ---who can access what and how E-R model related
Database Design Database Design Tools: Design tools are available from RDBwiS vendors as well as third-party vendors. For example! see the following link fordetails on design and analysis tools from Sybase: http://www.sybase.com/products/application_tools The following provides details on Oracle's tools: http://www.oracle.com/tools
Conceptual Design The world(facts and information) Data model and schema
Conceptual Design Define enterprise entities and relationships What information about entities and relationships should be in database? What are the integrity constraints or business rules that hold? A database `schema in the ER Model is represented pictorially (ER diagrams). Can map an ER diagram into a relational schema.
ER Model Basics ssn name lot Entity: Employees Real-world thing, distinguishable from other objects. Entity described by set of attributes. Entity Set: A collection of similar entities. E.g., all employees. All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!) Each entity set has a key (underlined). Each attribute has a domain.
ER Model Basics (Contd.) name since dname ssn lot did budget Employees Works_In Departments Relationship: Association among two or more entities. E.g., X works in Pharmacy department. relationships can have their own attributes. Relationship Set: Collection of similar relationships. An n-ary relationship set R relates n entity sets E 1... E n ; each relationship in R involves entities e 1 E 1,..., e n E n
Relationship Sets A relationship is an association among several entities Example: Hayes depositor A-102 customer entity relationship set account entity A relationship set is a mathematical relation among n 2 entities, each taken from entity sets {(e 1, e 2, e n ) e 1 E 1, e 2 E 2,, e n E n } where (e 1, e 2,, e n ) is a relationship Example: (Hayes, A-102) depositor
Attributes An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set. Example: customer = (customer_id, customer_name, customer_street, customer_city ) loan = (loan_number, amount ) Domain the set of permitted values for each attribute Attribute types: Simple and composite attributes. Single-valued and multi-valued attributes Example: multivalued attribute: phone_numbers Derived attributes Can be computed from other attributes Example: age, given date_of_birth(abase attribute)
Attribute Types Each attribute of a relation has a name The set of allowed values for each attribute is called the domain of the attribute Attribute values are (normally) required to be atomic; that is, indivisible E.g. the value of an attribute can be an account number, but cannot be a set of account numbers Domain is said to be atomic if all its members are atomic The special value null is a member of every domain The null value causes complications in the definition of many operations We shall ignore the effect of null values in our main presentation and consider their effect later
Example of a Relation attributes (or columns) tuples (or rows)
Relationship Sets (Cont.) An attribute can also be property of a relationship set. For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date
Composite Attributes
A Problem with the Relational Model CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2)) CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT) With complicated schemas, it may be hard for a person to understand the structure from the data definition. Enrolled cid grade sid Carnatic101 C 53666 Reggae203 B 53666 Topology112 A 53650 History105 B 53666 Students sid name login age gpa 53666 Jones jones@cs 18 3.4 53688 Smith smith@eecs 18 3.2 53650 Smith smith@math 19 3.8
One Solution: The E-R Model Instead of relations, it has: Entities and Relationships These are described with diagrams, both structure, notation more obvious to humans A visual language for describing schemas name since dname ssn lot did budget Students Enrolled_in Courses
E-R Diagrams n n n n n Rectangles represent entity sets. Diamonds represent relationship sets. Lines link attributes to entity sets and entity sets to relationship sets. Ellipses represent attributes l l Double ellipses represent multivalued attributes. Dashed ellipses denote derived attributes. Underline indicates primary key attributes (will study later)
E-R Diagram With Composite, Multivalued, and Derived Attributes
Relationship Sets with Attributes
ER Model Basics (Cont.) name ssn lot did dname budget since subordinate supervisor Employees Departments Works_In Reports_To Same entity set can participate in different relationship sets, or in different roles in the same set.
Roles Entity sets of a relationship need not be distinct The labels manager and worker are called roles; they specify how employee entities interact via the works_for relationship set. Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles. Role labels are optional, and are used to clarify semantics of the relationship
name since dname ssn lot did budget Key Constraints An employee can work in many departments; a dept can have many employees. Employees since Manages Works_In Departments In contrast, each dept has at most one manager, according to the key constraint on Manages. Many-to- Many 1-to Many 1-to-1
Integrity Constraints (ICs) IC: condition that must be true for any instance of the database; e.g., domain constraints. ICs are specified when schema is defined. ICs are checked when relations are modified. A legal instance of a relation is one that satisfies all specified ICs. DBMS should not allow illegal instances. If the DBMS checks ICs, stored data is more faithful to real-world meaning. Avoids data entry errors, too! Database Management Systems, R. Ramakrishnan and J. Gehrke 10
Primary Key Constraints A set of fields is a key for a relation if : 1. No two distinct tuples can have same values in all key fields, and 2. This is not true for any subset of the key. Part 2 false? A superkey. If there s >1 key for a relation, one of the keys is chosen (by DBA) to be the primary key. E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey. Database Management Systems, R. Ramakrishnan and J. Gehrke 10
Primary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key. For a given student and course, there is a single grade. vs. Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade. Used carelessly, an IC can prevent the storage of database instances that arise in practice! Database Management Systems, R. Ramakrishnan and J. Gehrke CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) ) CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade) ) 10
Foreign Keys, Referential Integrity Foreign key : Set of fields in one relation that is used to `refer to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a `logical pointer. E.g. sid is a foreign key referring to Students: Enrolled(sid: string, cid: string, grade: string) If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references. Can you name a data model w/o referential integrity? Links in HTML! Database Management Systems, R. Ramakrishnan and J. Gehrke 10
Mapping Cardinality Constraints Express the number of entities to which another entity can be associated via a relationship set. Most useful in describing binary relationship sets. For a binary relationship set the mapping cardinality must be one of the following types: One to one One to many Many to one Many to many
Mapping Cardinalities One to one One to many Note: Some elements in A and B may not be mapped to any elements in the other set
Mapping Cardinalities Many to one Many to many Note: Some elements in A and B may not be mapped to any elements in the other set
Cardinality Constraints We express cardinality constraints by drawing either a directed line ( ), signifying one, or an undirected line ( ), signifying many, between the relationship set and the entity set. One-to-one relationship: A customer is associated with at most one loan via the relationship borrower A loan is associated with at most one customer via borrower
One-To-Many Relationship In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower Note: the arrow can be either in the entity side or in the relationship side
Many-To-One Relationships In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower
Weak Entity Sets An entity set that does not have a primary key is referred to as a weak entity set. The existence of a weak entity set depends on the existence of an identifying entity set it must relate to the identifying entity set via a total, oneto-many relationship set from the identifying to the weak entity set Identifying relationship depicted using a double diamond The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set. The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set s discriminator.
Weak Entity Sets (Cont.) We depict a weak entity set by double rectangles. We underline the discriminator of a weak entity set with a dashed line. payment_number discriminator of the payment entity set Primary key for payment (loan_number, payment_number)
Many-To-Many Relationship A customer is associated with several (possibly 0) loans via borrower A loan is associated with several (possibly 0) customers via borrower
Participation of an Entity Set in a Relationship Set n Total participation (indicated by double line or a dark line): every entity in the entity set participates in at least one relationship in the relationship set l E.g. participation of loan in borrower is total every loan must have a customer associated to it via borrower n Partial participation: some entities may not participate in any relationship in the relationship set l Example: participation of customer in borrower is partial
Participation Constraints Does every employee work in a department? If so, this is a participation constraint the participation of Employees in Works_In is said to be total (vs. partial) What if every department has an employee working in it? Basically means at least one ssn name lot since did dname budget Employees Manages Departments Works_In since Means: exactly one
Alternative Notation for Cardinality Limits n Cardinality limits can also express participation constraints With numbers associated with the line, we can describe a more complex constraint.
Binary vs. Ternary Relationships ssn name lot pname age If each policy is owned by just 1 employee: Key constraint on Policies would mean policy can only cover 1 dependent! Employees Bad design name ssn lot Employees Covers Policies policyid cost Dependents pname age Dependents Think through all the constraints in the 2nd diagram! Purchaser Better design Beneficiary Policies policyid cost
Binary vs. Ternary Relationships (Contd.) Previous example illustrated case when two binary relationships were better than one ternary relationship. Opposite example: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute qty. No combination of binary relationships is an adequate substitute.
Binary vs. Ternary Relationships (Contd.) qty Parts Contract Departments Suppliers VS. Parts needs Departments can-supply Suppliers deals-with S can-supply P, D needs P, and D deals-with S does not imply that D has agreed to buy P from S. How do we record qty?
E-R Diagram with a Ternary Relationship
Aggregation n Consider the ternary relationship works_on, which we saw earlier n Suppose we want to record managers for tasks performed by an employee at a branch
Aggregation (Cont.) Relationship sets works_on and manages represent overlapping information Every manages relationship corresponds to a works_on relationship However, some works_on relationships may not correspond to any manages relationships So we can t discard the works_on relationship Eliminate this redundancy via aggregation Treat relationship as an abstract entity Allows relationships between relationships Abstraction of relationship into new entity Without introducing redundancy, the following diagram represents: An employee works on a particular job at a particular branch An employee, branch, job combination may have an associated manager
E-R Diagram With Aggregation
Reduction to Relation Schemas Primary keys allow entity sets and relationship sets to be expressed uniformly as relation schemas that represent the contents of the database. A database which conforms to an E-R diagram can be represented by a collection of schemas. For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set. Each schema has a number of columns (generally corresponding to attributes), which have unique names.
Representing Entity Sets as Schemas A strong entity set reduces to a schema with the same attributes. A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set payment = ( loan_number, payment_number, payment_date, payment_amount )
Representing Relationship Sets as Schemas A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set. Example: schema for relationship set borrower borrower = (customer_id, loan_number )
Entity Type Hierarchies
Properties of IsA
Advantages of IsA
IsA Hierarchy - Example
UML UML: Unified Modeling Language UML has many components to graphically model different aspects of an entire software system UML Class Diagrams correspond to E-R Diagram, but several differences.
Summary of UML Class Diagram Notation
Schema Diagram for University Database
Logical DB Design: ER to Relational Entity sets to tables. ssn name lot Employees Database Management Systems, R. Ramakrishnan and J. Gehrke CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn)) 20
Relationship Sets to Tables In translating a relationship set to a relation, attributes of the relation must include: Keys for each participating entity set (as foreign keys). This set of attributes forms a superkey for the relation. All descriptive attributes. CREATE TABLE Works_In( ssn CHAR(1), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)
Summary of Conceptual Design Conceptual design follows requirements analysis, Yields a high-level description of data to be stored ER model popular for conceptual design Constructs are expressive, close to the way people think about their applications. Basic constructs: entities, relationships, and attributes (of entities and relationships). Some additional constructs: weak entities, ISA hierarchies, and aggregation. Note: There are many variations on ER model. Database Management Systems, R. Ramakrishnan and J. Gehrke 10
Summary of ER (Contd.) Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set. Some constraints (notably, functional dependencies) cannot be expressed in the ER model. Constraints play an important role in determining the best database design for an enterprise. Database Management Systems, R. Ramakrishnan and J. Gehrke 10
Summary of ER (Contd.) ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include: Entity vs. attribute, entity vs. relationship, binary or ternary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation. Ensuring good database design: resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful. Database Management Systems, R. Ramakrishnan and J. Gehrke 10