College of Computer and Information Sciences - Information Systems Dept.
IS220: Database Fundamentals
Lecture 2: Database Environment
Topics Covered
Data abstraction
Schemas and instances
Three-level architecture
Mapping
Data independence
Data models
Database system development lifecycle
Classification (models) of DBMSs
Data Abstraction
One fundamental characteristic of the database approach is that it provides some level of data abstraction. Data abstraction generally refers to the suppression of details of data organization and storage, and the highlighting of the essential features for an improved understanding of data. Data abstraction enables different users to perceive data at their preferred level of detail.
Schemas and Instances
In any data model, it is important to distinguish between the description of the database and the database itself.
Schema (intension)
  The description of the database: the structure, data types, and constraints that describe it.
  It rarely changes. When we define a new database, we specify its schema.
  A displayed schema is called a schema diagram.
  Each object in the schema is called a schema construct.
Instance (database state / extension)
  The actual data in the database at any point in time.
  It changes rapidly.
  When we initially load data into the database, the database is said to move into its initial state.
  Each write operation (insert, delete, modify) changes the current state of the database to a new state.
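The distinction between schema and instance can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `student` table, its columns, and the sample rows are made up for the example and do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema (intension): specified once, when the database is defined.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

def schema_of(table):
    """Return (column name, declared type) pairs for a table."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def state_of(table):
    """Return the current instance (all rows) of a table."""
    return conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()

schema_before = schema_of("student")
state_before_load = state_of("student")   # schema exists, no data yet

# Initial load: the database moves into its initial state.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Sara", "IS"), (2, "Noura", "CS")],
)
initial_state = state_of("student")

# Each write operation changes the current state to a new state.
conn.execute("UPDATE student SET major = 'IS' WHERE student_id = 2")
conn.execute("DELETE FROM student WHERE student_id = 1")
current_state = state_of("student")

# The schema is the same before and after all the writes.
schema_after = schema_of("student")
```

The instance goes through three different states while `schema_before` and `schema_after` stay identical.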
[Figure: example of a schema and an instance]
Three-Schema Architecture and Data Independence
Three-level Architecture
The goal of the three-schema architecture is to separate user applications from the physical database.
Three-level Architecture
1. External (View) Level
Includes a number of external schemas or user views (the ways users perceive the data).
Each external schema describes the part of the database that is relevant to a particular user.
Three-level Architecture
2. Conceptual Level
Has a conceptual schema (the logical structure of the entire database), which describes the structure of the whole database for a community of users.
Describes what data is stored in the database and the relationships among the data.
Concentrates on describing entities, data types, relationships, user operations, and constraints.
Three-level Architecture
3. Internal Level
Has an internal schema (the way the DBMS and OS perceive the data).
The physical representation of the database on the computer: how the data is stored in the database.
Contains the definitions of stored records, the methods of representation, the data fields, and the indexes and storage structures used.
[Figure: illustrating example]
Mapping
In a DBMS based on the three-schema architecture, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. The processes of transforming requests and results between levels are called mappings.
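As a rough analogy, the three levels and the mapping between them can be imitated in SQLite from Python: a table stands in for the conceptual schema, a view for an external schema, and an index for part of the internal schema. The `staff` table, the `staff_directory` view, the index name, and the sample data are all hypothetical, and SQLite is not a textbook three-schema DBMS, so treat this as a sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: logical structure of the whole database.
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        branch   TEXT NOT NULL,
        salary   REAL NOT NULL
    );

    -- Internal level: a storage structure used to reach the data.
    CREATE INDEX idx_staff_branch ON staff (branch);

    -- External level: one user's view, which hides the salary column.
    CREATE VIEW staff_directory AS
        SELECT staff_id, name, branch FROM staff;
""")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Ann", "B003", 12000.0),
     (2, "David", "B005", 18000.0),
     (3, "Mary", "B003", 9000.0)],
)

# A request written against the external schema (the view) ...
via_view = conn.execute(
    "SELECT name FROM staff_directory WHERE branch = 'B003' ORDER BY name"
).fetchall()

# ... is mapped by the DBMS onto the conceptual schema (the table),
# so it returns the same rows as the equivalent request on the table.
via_table = conn.execute(
    "SELECT name FROM staff WHERE branch = 'B003' ORDER BY name"
).fetchall()

# The view exposes only the columns chosen for that user.
view_columns = [d[0] for d in conn.execute("SELECT * FROM staff_directory").description]
```

The user of `staff_directory` never names the table or the index; the DBMS performs those mappings.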
Reasons for Separation
1. Each user should be able to access the data, but with a different, customized view of it.
2. The DBA should be able to change the database storage structure without affecting the users' views.
3. The internal structure of the database should be unaffected by changes to the physical aspects of storage, such as a change to a new storage device.
Data Independence
The three-level architecture provides data independence, which means that upper levels are unaffected by changes to lower levels.
Data independence is the ability to modify a schema definition at one level without affecting the schema definition at the next higher level.
There are two kinds of data independence:
Logical data independence
Physical data independence
Data Independence
Logical Data Independence
Refers to the immunity of external schemas to changes in the conceptual schema.
Conceptual schema changes (e.g. addition or removal of entities) should not require changes to external schemas or rewrites of application programs.
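Logical data independence can be sketched with a view that shields an application query from a conceptual schema change. The table, view, and column names here are hypothetical, and the example assumes SQLite via Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        branch   TEXT NOT NULL
    );
    -- External schema used by the application.
    CREATE VIEW staff_directory AS
        SELECT staff_id, name FROM staff;
    INSERT INTO staff VALUES (1, 'Ann', 'B003'), (2, 'David', 'B005');
""")

# The application only ever issues this query against its view.
APP_QUERY = "SELECT staff_id, name FROM staff_directory ORDER BY staff_id"

before = conn.execute(APP_QUERY).fetchall()

# Conceptual schema change: a new attribute and a new entity are added.
conn.executescript("""
    ALTER TABLE staff ADD COLUMN email TEXT;
    CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT);
""")

# The external schema and the application query are left untouched.
after = conn.execute(APP_QUERY).fetchall()
```

The same unmodified `APP_QUERY` returns the same rows before and after the change.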
Data Independence
Physical Data Independence
Refers to the immunity of the conceptual schema to changes in the internal schema.
Internal schema changes (e.g. using different file organizations or storage structures/devices) should not require changes to the conceptual or external schemas.
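Physical data independence can be sketched the same way: an index is added (an internal schema change) while the table definition and the application query stay as they were. The names and data are hypothetical, and SQLite is used only as a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff ("
    " staff_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " branch TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Ann", "B003"), (2, "David", "B005"), (3, "Mary", "B003")],
)

APP_QUERY = "SELECT name FROM staff WHERE branch = ? ORDER BY name"

def conceptual_schema():
    """Return (column name, declared type) pairs of the staff table."""
    return [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(staff)")]

schema_before = conceptual_schema()
result_before = conn.execute(APP_QUERY, ("B003",)).fetchall()

# Internal schema change: a new access structure is introduced.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch)")

schema_after = conceptual_schema()
result_after = conn.execute(APP_QUERY, ("B003",)).fetchall()
```

Only the way the DBMS may reach the rows has changed; the conceptual schema and the query result have not.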
[Figure: Data Independence and the Three-Level Architecture]
Data Models
Data Model
A data model is a collection of concepts that can be used to describe the structure of a database. By the structure of a database we mean the data types, relationships, and constraints that apply to the data.
Purpose: to represent data in an understandable way.
Database System Development Lifecycle
As a database system is a fundamental component of the larger organization-wide information system, the database system development lifecycle is inherently associated with the lifecycle of the information system. The stages of the database system development lifecycle are shown in the following figure:
[Figure: The Stages of the Database System Development Lifecycle: Analysis Phase, Design Phase, Implementation Phase, Maintenance]
Database Design
Database design has three main phases: conceptual, logical, and physical design.
Conceptual database design: builds the conceptual representation of the database, which includes identification of the important entities, relationships, and attributes.
Logical database design: translates the conceptual representation into the logical structure of the database, which includes designing the relations.
Physical database design: decides how the logical structure is to be physically implemented (as base relations) in the target Database Management System (DBMS).
Conceptual Data Model
Conceptual database design: the process of constructing a model of the data used in an enterprise, independent of all physical considerations.
The conceptual data model includes ER diagrams and a data dictionary.
To build the conceptual data model:
Step 1.1 Identify entity types
Step 1.2 Identify relationship types
Step 1.3 Identify and associate attributes with entity or relationship types
Step 1.4 Determine attribute domains
Step 1.5 Determine candidate, primary, and alternate key attributes
Step 1.6 Check model for redundancy
Step 1.7 Validate conceptual model against user transactions
Step 1.8 Review conceptual data model with user
Logical Data Model
Logical database design: the process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.
To build and validate the logical data model (for the relational model):
Step 2.1 Derive relations for logical data model
Step 2.2 Validate relations using normalization: the process of organizing data to minimize redundancy, for example by dividing large tables into smaller (and less redundant) tables and defining relationships between them
Step 2.3 Validate relations against user transactions
Step 2.4 Check integrity constraints
Step 2.5 Review logical data model with user
Step 2.6 Check for future growth
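The idea behind normalization in Step 2.2 can be sketched by splitting one redundant table into two smaller ones and linking them. The tables and data are invented for the illustration, and this shows only the splitting idea, not a full normalization procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One large table: the branch city is repeated for every staff member.
    CREATE TABLE staff_branch (
        staff_id    INTEGER PRIMARY KEY,
        name        TEXT,
        branch_no   TEXT,
        branch_city TEXT
    );
    INSERT INTO staff_branch VALUES
        (1, 'Ann',   'B003', 'Glasgow'),
        (2, 'David', 'B003', 'Glasgow'),
        (3, 'Mary',  'B005', 'London');

    -- Two smaller tables with a relationship between them.
    CREATE TABLE branch (
        branch_no TEXT PRIMARY KEY,
        city      TEXT NOT NULL
    );
    CREATE TABLE staff (
        staff_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_no TEXT NOT NULL REFERENCES branch (branch_no)
    );
    INSERT INTO branch SELECT DISTINCT branch_no, branch_city FROM staff_branch;
    INSERT INTO staff  SELECT staff_id, name, branch_no FROM staff_branch;
""")

original = conn.execute(
    "SELECT staff_id, name, branch_no, branch_city"
    " FROM staff_branch ORDER BY staff_id"
).fetchall()

# Joining the smaller tables gives back the original information.
rejoined = conn.execute(
    "SELECT s.staff_id, s.name, s.branch_no, b.city"
    " FROM staff s JOIN branch b ON b.branch_no = s.branch_no"
    " ORDER BY s.staff_id"
).fetchall()

# How many times is 'Glasgow' stored before and after the split?
copies_before = conn.execute(
    "SELECT COUNT(*) FROM staff_branch WHERE branch_city = 'Glasgow'"
).fetchone()[0]
copies_after = conn.execute(
    "SELECT COUNT(*) FROM branch WHERE city = 'Glasgow'"
).fetchone()[0]
```

After the split, each branch city is stored once, so changing it requires updating a single row.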
Physical Data Model
Physical database design: the process of producing a description of the implementation of the database on secondary storage.
The physical database design phase allows the designer to decide how the database is to be implemented. Physical design is therefore tailored to a specific DBMS.
To build the physical data model:
Step 3.1 Translate logical data model for target DBMS
Step 3.2 Design file organizations and indexes
Step 3.3 Design user views
Step 3.4 Design security mechanisms
Step 3.5 Denormalization and controlled redundancy: the process of attempting to optimise the read performance of a database, for example by adding attributes to a relation from another relation with which it will be joined
Step 3.6 Monitor and tune the operational system
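Step 3.5 can be sketched as follows: an attribute from `branch` is copied into `staff` so that a frequent read no longer needs a join. The tables, the `branch_city` column, and the data are hypothetical; whether such a change actually pays off depends on the workload and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE staff (
        staff_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_no TEXT NOT NULL
    );
    INSERT INTO branch VALUES ('B003', 'Glasgow'), ('B005', 'London');
    INSERT INTO staff VALUES (1, 'Ann', 'B003'), (2, 'David', 'B003'), (3, 'Mary', 'B005');

    -- Controlled redundancy: duplicate branch.city inside staff.
    ALTER TABLE staff ADD COLUMN branch_city TEXT;
    UPDATE staff
       SET branch_city = (SELECT city FROM branch
                           WHERE branch.branch_no = staff.branch_no);
""")

# Normalized read: needs a join.
with_join = conn.execute(
    "SELECT s.name, b.city FROM staff s"
    " JOIN branch b ON b.branch_no = s.branch_no ORDER BY s.staff_id"
).fetchall()

# Denormalized read: a single table is enough.
without_join = conn.execute(
    "SELECT name, branch_city FROM staff ORDER BY staff_id"
).fetchall()
```

The redundancy is "controlled" only if every later change to `branch.city` is also applied to `staff.branch_city`.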
Classification of DBMSs
Classification or Models of DBMSs
1. First generation: Network, Hierarchical
2. Second generation: Relational
3. Third generation: Object-oriented, Object-relational
First Generation
Network Data Model
  Allows a record to participate in multiple parent/child relationships.
  Child records can have multiple parents (M:N relationships).
Hierarchical Data Model
  Each parent record can have many children, but each child record has only one parent (1:M relationships).
  Tree-like structure.
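The structural difference between the two first-generation models can be sketched with plain Python dictionaries. The record names are invented, and real hierarchical and network DBMSs store and navigate records quite differently; only the parent/child shape is shown.

```python
# Hierarchical model: every child record maps to exactly one parent (a tree).
hierarchical = {
    "Sales": "Company",
    "IT": "Company",
    "Ann": "Sales",
    "David": "IT",
}

# Network model: a child record may have several parents.
network = {
    "Ann": {"Sales", "Project X"},
    "David": {"IT", "Project X"},
}

def children_in_tree(parent):
    """Children of a parent record in the hierarchical structure."""
    return sorted(c for c, p in hierarchical.items() if p == parent)

def children_in_network(parent):
    """Children of a parent record in the network structure."""
    return sorted(c for c, parents in network.items() if parent in parents)

def path_to_root(record):
    """Follow the single parent link upward until the root is reached."""
    path = [record]
    while path[-1] in hierarchical:
        path.append(hierarchical[path[-1]])
    return path

tree_children = children_in_tree("Company")
ann_path = path_to_root("Ann")
project_members = children_in_network("Project X")
ann_parent_count = len(network["Ann"])
```

In the tree there is exactly one path from any record to the root, whereas in the network "Ann" is reachable from both "Sales" and "Project X".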
First Generation
Disadvantages of hierarchical and network DBMSs:
1. Required complex programs for even simple queries.
2. Minimal data independence.
3. No widely accepted theoretical foundation.
Second Generation
Relational Data Model: a database in which all data is stored in relations, which are tables with rows and columns. Each table is composed of records (called tuples), and each record is identified by a field (an attribute containing a unique value).
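A relation with an identifying attribute can be sketched in SQLite from Python. The `course` table and its `credits` value are assumptions made for the example; only the course code and title are taken from this lecture's title slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation: a table whose columns are attributes and whose rows are tuples.
conn.execute(
    "CREATE TABLE course ("
    " course_code TEXT PRIMARY KEY,"   # attribute holding a unique value
    " title TEXT NOT NULL,"
    " credits INTEGER NOT NULL)"
)
conn.execute("INSERT INTO course VALUES ('IS220', 'Database Fundamentals', 3)")

# A second tuple with the same identifying value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO course VALUES ('IS220', 'Duplicate', 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The identifying attribute picks out exactly one tuple.
matching_rows = conn.execute(
    "SELECT * FROM course WHERE course_code = 'IS220'"
).fetchall()
```

Because `course_code` is unique, looking up a code can never return more than one tuple.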
Advantages of the Relational Model
The benefits of a database that has been designed according to the relational model are numerous. Some of them are:
1. Data entry, updates, and deletions will be efficient.
2. Data retrieval, summarization, and reporting will also be efficient.
3. Since much of the information is stored in the database rather than in the application, the database is somewhat self-documenting.
4. Changes to the database schema are easy to make.
Third Generation
Object-oriented Data Model
A response to the increasing complexity of database applications.