Conceptual Data Modeling

A data model is a way to describe the structure of the data. In models that are implemented, it also includes a set of operations that manipulate the data.

A data model is a combination of at least three components:
1. A collection of data structure types.
2. A collection of operators or rules of inference, which can be applied to any valid instance of the data types listed above.
3. A collection of general integrity rules, which implicitly or explicitly define the set of consistent database states, changes of state, or both.

There are two types of data models, conceptual and logical. Logical data models are relational, network, hierarchical, inverted list, or object oriented.

A conceptual data model shows the structure of the data, including relationships. It is also:
- A communication tool
- Independent of commercial DBMSs
- Easy to learn and use
- A provider of semantics
- A graphical representation of the data
The Entity-Relationship model is the most commonly used conceptual data model.

The relational logical model: data is stored in tables with no repeating groups. It is based on a mathematical model. E.F. Codd first presented the idea in 1970. Some commercial relational DBMSs are DB2, Oracle, Ingres, and Microsoft Access.

The network logical model: data is stored in records and in associations that are called sets. This is a complex model based on the CODASYL model and was created in the 1970s (by a committee).
The hierarchical logical model: data is stored in a tree structure with parent/child relationships. One of the first commercial DBMSs was Information Management System (IMS), created by IBM in the 1960s.

An inverted list logical model is a tabular representation of the data that uses indices to access the tables. When these systems were first used in the 1970s they were described as relational, but they are not, because repeating groups are allowed. ADABAS is a current example. Inverted lists are also used in search engine indexing algorithms.

An object oriented logical data model: data is stored as objects. These objects have an object identifier, attributes, and methods. A class can be thought of as an abstract data type.

Using high-level conceptual data models for database design

The database designer must collect and analyze the requirements for the system. This is done by interviewing the people who will be using the database, in order to understand and document the data requirements and the functional requirements of the application.

The conceptual schema and design is a description of the data requirements. This includes detailed descriptions of the entity types, relationships, and constraints. The conceptual schema is then transformed from a high-level data model into an implementation data model.

A logical design, or data model mapping, results in a database schema that can be implemented.

The physical design phase includes internal storage structures, file organization, indexes, access paths, and other physical design parameters.
Entity-Relationship model

The ER model was first introduced by Peter Chen in 1976. It is simple and easy to understand for both the database designer and the end user.

The ER model describes data as entities, relationships, and attributes.

Entity: an object that is distinguishable from other objects, something in the real world. An entity set is a set of similar entities. For example, in a bank you have employees and customers. All are entities, but customers and employees are both people, so they constitute an entity set.

Attributes are particular properties that describe an entity. There are a variety of attribute types:
- Composite versus simple (atomic)
- Single valued versus multivalued
- Stored versus derived
- Null values
- Complex attributes

So for a person entity you might have attributes like:
- Gender
- Social security number
- Name
- Height
- Weight

Some specific kinds of attributes:
- Super key: a set of attributes capable of uniquely identifying an entity
- Candidate key: a super key such that no proper subset can uniquely identify the entity
- Primary key: a chosen candidate key

In mathematics, a relation is typically defined as a collection of ordered pairs containing one object from each set. If you have an object x from the first set and an object y from the second, the objects are said to be related if the ordered pair (x, y) is in the relation.
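The super key and candidate key definitions above can be tried out on a small sample of person entities. The following is a minimal sketch: the sample rows and the function names `is_super_key` and `candidate_keys` are made up for illustration and do not come from the slides.

```python
from itertools import combinations

# Hypothetical person entities; every value here is invented for illustration.
people = [
    {"ssn": "111-11-1111", "name": "Bill", "gender": "M", "height": 70},
    {"ssn": "222-22-2222", "name": "Jill", "gender": "F", "height": 65},
    {"ssn": "333-33-3333", "name": "Bill", "gender": "M", "height": 70},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows):
    """Super keys for which no proper subset is also a super key."""
    attrs = sorted(rows[0])
    keys = []
    # Try small attribute sets first so minimal keys are found before supersets.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # A set that contains a smaller key is a super key but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_super_key(rows, combo):
                keys.append(combo)
    return keys

print(is_super_key(people, ("name", "ssn")))     # super key, but not minimal
print(is_super_key(people, ("name", "gender")))  # two rows share (Bill, M)
print(candidate_keys(people))
```

A check like this can only show that a set of attributes is not a key for the sample at hand. Whether something is a key for every possible instance is a decision made from the business rules, and the primary key is then chosen from the candidate keys by the designer.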
Ordered n-tuple: for n > 0, an ordered n-tuple is an ordered sequence of n objects that we denote <a1, a2, a3, a4, ..., an>.

Cartesian product (cross product): a mathematical operation that returns a set (the product set, or simply the product) from multiple sets. That is, for sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.

A relation can be considered a subset of A × B. A relationship is an element of a relation, that is, one of the ordered n-tuples.

So if you have a customer entity set and an account entity set:

X (customer)    Relationship set    Y (account)
x1 Bill         <x2, y4>            y1 acct1
x2 Jill         <x1, y1>            y2 acct2
x3 Joe          <x3, y2>            y3 acct3
                                    y4 acct4

These are ordered n-tuples, which means that the first element is from X and the second is from Y.

The function that an entity plays in the relationship is called the role. Roles are always there and are usually obvious, but sometimes they need to be explicitly stated.

If you look at the entity set employee and the relationship set works for:

Employee    Works for
e1 Bill     <e1, e2>
e2 John     <e3, e2>
e3 Mary

Here the second element of each ordered pair is the supervisor, so the order must be stated.

The degree of a relationship type is the number of entity types that participate in the relationship. Binary means there are two entity types, ternary means three, and so on.
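The customer and account example can be written out directly to see that a relationship set is a subset of the Cartesian product. This is a minimal sketch that reuses the identifiers from the table; the variable name `owns` is an assumption, since the slides do not name the relationship.

```python
from itertools import product

# Entity sets from the example, keyed by their identifiers.
customers = {"x1": "Bill", "x2": "Jill", "x3": "Joe"}
accounts = {"y1": "acct1", "y2": "acct2", "y3": "acct3", "y4": "acct4"}

# Cartesian product X × Y: every possible ordered (customer, account) pair.
cartesian = set(product(customers, accounts))

# Relationship set from the example; "owns" is a hypothetical name.
owns = {("x2", "y4"), ("x1", "y1"), ("x3", "y2")}

# A relation is a subset of the Cartesian product.
is_relation = owns <= cartesian

# The degree is the number of participating entity types,
# which is the length of each tuple (2 for a binary relationship).
degree = len(next(iter(owns)))

print(len(cartesian), is_relation, degree)
```

The product contains 3 × 4 = 12 pairs, of which only three are actual relationships. A pair such as ("x1", "y3") is in the product but not in the relation, so Bill is not related to acct3.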
Cardinalities of relationships: this is usually stated in terms of binary relationships, but it can be useful in higher (n-ary) relationships. It is used to describe the relationship set. This is one type of constraint we can put on a relationship.
- One-to-one
- One-to-many (many-to-one)
- Many-to-many

Cardinalities are used to describe the intended relationship between entity sets, for example customers and accounts.

We might define the relationship between customer and account to be one-to-many, meaning that a customer may have multiple accounts. On any given day it may be that every customer has only one account, so the data looks like a one-to-one relationship. But this is ok.

On the other hand, this one-to-many relationship says that this database will never support the idea of a joint account (multiple customers with one account). If that is what is needed or wanted (based on the business rules), then the cardinality needs to be many-to-many.

Existence dependent relationship is another constraint. If we say our bank database has accounts and accounts have transactions, we cannot have a situation where there is a transaction that is not associated with any account. If the account is removed, then the transactions associated with it must also be removed.

The account is the dominant entity and the transaction is the subordinate entity.
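The difference between the declared cardinality and what the data happens to look like on a given day can be sketched as follows. The function names `cardinality` and `satisfies`, the `PERMITS` table, and the sample pairs are all assumptions made for this illustration.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify a binary relationship set by the pairs it currently holds."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # Some left entity (customer) is paired with several right entities (accounts).
    left_has_many = any(len(v) > 1 for v in rights_per_left.values())
    # Some right entity (account) is paired with several left entities (customers).
    right_has_many = any(len(v) > 1 for v in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"

# A declared cardinality also permits any tighter pattern in the actual data.
PERMITS = {
    "one-to-one": {"one-to-one"},
    "one-to-many": {"one-to-one", "one-to-many"},
    "many-to-one": {"one-to-one", "many-to-one"},
    "many-to-many": {"one-to-one", "one-to-many", "many-to-one", "many-to-many"},
}

def satisfies(pairs, declared):
    """True if the current pairs do not violate the declared cardinality."""
    return cardinality(pairs) in PERMITS[declared]

# Today every customer has exactly one account.
today = {("Bill", "acct1"), ("Jill", "acct2")}
# A joint account: two customers share acct1.
joint = {("Bill", "acct1"), ("Jill", "acct1")}

print(cardinality(today), satisfies(today, "one-to-many"))
print(cardinality(joint), satisfies(joint, "one-to-many"))
```

Data that looks one-to-one today is still valid under a one-to-many declaration, while the joint account is rejected under one-to-many and accepted only once the declaration is many-to-many.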
If an entity set does not have enough attributes to uniquely identify its entities, it is said to be a weak entity set. An entity set that can form a primary key is a strong entity set. Weak entity sets must be in a one-to-many relationship with a strong entity set (one strong, many weak).

The relationship set should have no attributes; they should be pushed to the weak entity side. The weak entity will have a set of attributes called a discriminator, which can uniquely identify it within the context of its strong entity partner. Weak entities are always existence dependent on the strong entities.

This does not mean that all existence dependent relationships are weak/strong entity pairs. The primary key of the weak entity set is the concatenation of the primary key of the strong entity and its discriminator.

Example:
- Account: strong entity. Account # is the primary key.
- Transaction: weak entity. Trans id is the discriminator.

Attributes of relationships always include the primary keys of the entity sets that participate in the relationship. Relationships can also have their own additional attributes.

If you look at something like student takes course, there might or might not be relationship data. It depends on the rules.

If (only allowed to take a course once)
    PK of takes is pk(student) + pk(course)
Else
    PK of takes is pk(student) + pk(course) + Semester
Endif

(Diagram: Student, takes, Course, with Semester as an attribute of takes.)
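The account and transaction example can be sketched in code to show both ideas at once: the weak entity's primary key is the strong entity's key plus the discriminator, and the weak entity is existence dependent on the strong one. The `Transaction` and `Bank` classes, their methods, and the sample values are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    account_no: str  # primary key of the strong entity (Account)
    trans_id: int    # discriminator: unique only within one account
    amount: float

    @property
    def primary_key(self):
        # Strong entity's primary key concatenated with the discriminator.
        return (self.account_no, self.trans_id)

class Bank:
    def __init__(self):
        self.accounts = set()
        self.transactions = {}

    def add_account(self, account_no):
        self.accounts.add(account_no)

    def add_transaction(self, tx):
        # Existence dependency: a transaction needs an existing account.
        if tx.account_no not in self.accounts:
            raise ValueError("transaction must belong to an existing account")
        if tx.primary_key in self.transactions:
            raise ValueError("duplicate primary key")
        self.transactions[tx.primary_key] = tx

    def remove_account(self, account_no):
        # Removing the dominant entity removes its subordinate entities.
        self.accounts.discard(account_no)
        self.transactions = {
            key: tx for key, tx in self.transactions.items()
            if key[0] != account_no
        }

bank = Bank()
bank.add_account("A1")
bank.add_account("A2")
# The same trans_id is fine in two different accounts.
bank.add_transaction(Transaction("A1", 1, 10.0))
bank.add_transaction(Transaction("A2", 1, 5.0))
bank.remove_account("A1")
remaining = sorted(bank.transactions)
print(remaining)
```

Trans id 1 appears under both accounts without conflict because it only has to be unique within its own account. After account A1 is removed, only the transaction keyed by ("A2", 1) remains.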
How do you know when to make something an entity versus making it an attribute? It depends. In some situations having phone as an attribute of employee works, but if the employee has multiple phone numbers, or multiple people use the same phone number, it might work better to have it as an entity.

Entity Relationship Diagrams
- Rectangles: entity sets
- Diamonds: relationship sets
- Ovals: attributes

Each customer can have one account. (Diagram labels: 1 and 1.)

Each customer can have multiple accounts. (Diagram labels: 1 and M.)

Each account is owned by many customers. (Diagram labels: M and 1.)

Each customer can have multiple accounts and each account can be owned by multiple customers. (Diagram labels: M and N.)
Recursive: a manager has many people working for her.
(Diagram: Employee is related to itself through Works for, with the roles Manager and subordinate and the labels 1 and N.)

Existence dependent: put an E in the relationship diamond.
(Diagram: Entity 1, Rel E, Entity 2.)

Weak/strong entities: put an E in the relationship diamond and draw a second rectangle around the weak entity. The weak entity is always existence dependent upon the strong.
(Diagram: Entity 1, Rel E, Entity 2.)

Ternary Relationships
A customer can have several accounts, each account can have several owners, and accounts are at one branch.
(Diagram: a single relationship connecting customer, account, and Branch.)

Subtype/Supertype Entities
In some instances it is possible for the same entity to play multiple roles in a database. Suppose we have a university database and there are students and faculty we want to keep track of.

Students are enrolled in courses. Faculty teach courses.
(Diagram: Student with attributes GPA, Name, and SSN, connected by Enrolls to Course; Faculty with attributes Name, SSN, and Rank, connected by Teaches to Course. The labels N and 1 appear on the relationships.)
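The recursive works-for relationship above can be sketched with both roles drawn from the same entity set. The employee identifiers and the names `manager_of` and `subordinates_of` are assumptions made for this illustration.

```python
from collections import defaultdict

# One entity set plays both roles; the identifiers are hypothetical.
employees = {"e1", "e2", "e3"}

# Each key plays the subordinate role and each value plays the manager role.
# Using a dict means a subordinate has at most one manager (the "1" side).
manager_of = {"e1": "e2", "e3": "e2"}

# Invert the mapping to see the "N" side: one manager, many subordinates.
subordinates_of = defaultdict(set)
for subordinate, manager in manager_of.items():
    subordinates_of[manager].add(subordinate)

# Both roles must be filled by members of the same entity set.
roles_are_employees = all(
    s in employees and m in employees for s, m in manager_of.items()
)

print(sorted(subordinates_of["e2"]), roles_are_employees)
```

Because the two positions of each pair come from the same set, the role names are the only thing that tells the manager apart from the subordinate.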
An entity in both student and faculty is going to produce duplicate data. Since redundancy is one of the problems that a database needs to address, this is bad.

Place all of the common attributes in the supertype. The subtype inherits all of the attributes of the supertype, and you can then add the attributes that are different.
(Diagram: Person with attributes Name and SSN is the supertype; Student with GPA and Faculty with Rank are connected to it by ISA and are related to Course.)

Specialization: some members of the higher level entity set are not found in any lower level entity set.
Generalization: the higher level entity set is the union of all lower level entity sets.
The determining factor is whether or not you allow higher level entities to exist without at least one lower level representation. If the higher level entity can exist alone, that is specialization; if not, it is generalization.
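The Person, Student, and Faculty arrangement can be sketched with plain dictionaries keyed by SSN, so that common attributes are stored once in the supertype. The sample people, the helper names `view` and `is_generalization`, and the dictionary layout are assumptions made for this illustration.

```python
# Supertype: attributes common to every role, stored once per person.
person = {
    "111": {"name": "Ann"},
    "222": {"name": "Bob"},
    "333": {"name": "Cy"},
}

# Subtypes: only the attributes that differ, keyed by the same SSN.
student = {"111": {"gpa": 3.6}, "222": {"gpa": 3.1}}
faculty = {"111": {"rank": "Lecturer"}}

def view(ssn, subtype):
    """A subtype entity inherits the supertype's attributes."""
    return {"ssn": ssn, **person[ssn], **subtype[ssn]}

def is_generalization(supertype, *subtypes):
    """True when every supertype entity appears in at least one subtype."""
    covered = set().union(*(s.keys() for s in subtypes))
    return set(supertype) <= covered

# Ann is both a student and a faculty member, but her name is stored once.
print(view("111", student))
print(view("111", faculty))

# Cy exists only in the supertype, so this instance reflects specialization.
print(is_generalization(person, student, faculty))
```

Here the person with SSN 333 has no subtype entry, so the higher level entity exists alone and the design is a specialization. Requiring every person to appear in student or faculty would make it a generalization.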