DATABASE CONCEPTS Dr. Awad Khalil Computer Science & Engineering Department AUC s are considered as major components in almost all recent computer application systems, including business, management, engineering, education, medicine, science,... etc. technology has a major impact on the growing use of computer systems. What is a? A database is simply a collection of related data. Data mean known facts that can be recorded and that have implicit meaning. Properties 1. A database is a logically coherent collection of data with some inherent meaning. A random collection of data cannot be considered as a database. 2. A database represents aspects of a real system in real world. Examples of these real systems : A hospital A bank A factory A company A university 3. A database is designed and implemented to afford the informational needs of a specific group of users. These informational needs are supported by a set of application systems running against the implemented database. Example I Consider a small personal database to maintain data concerning the names, telephone numbers, and addresses of the people you know TELEPHONE DIRECTORY NAME PHONE NO ADDRESS Ahmed 2451615 Heliopolis Aly 2467295 Heliopolis Badran 2977251 Nozah Sadek 3401312 Zamalek Salah 3409123 Dokkie 1
Example II Consider a suppliers-and-parts database to maintain data concerning suppliers supplying parts for a certain company. SUPPLIER PART SUPPLY SNO Sname Status City S1 Ahmed 20 Cairo S2 Badran 10 Cairo S3 Aly 10 Alex. S4 Saleh 30 Tanta S5 Sadek 20 Cairo PNO Pname Color Weight P1 Nut Red 12 P2 Bolt Green 15 P3 Screw Blue 15 P4 Cam Red 17 P5 Cog Black 20 P6 Screw Black 14 SNO PNO QTY S1 P1 100 S1 P2 200 S1 P3 100 S2 P1 150 S2 P3 100 S4 P2 200 S4 P3 300 S5 P2 100 System Environment In a database system environment, four main components can be recognized 1.. 2. Management System (DBMS). 3. programs. 4. Users. 2
Users/Programmers DATABASE SYSTEM Programs/Queries DBMS Software Software to Process Queries/Programs Software to Access Stored Data Stored Definition (Meta-Data) Stored (1) The Data in the database will be both integrated and shared. management system (DBMS) programs STUDENT Name Address Department... End-users ENROLLMENT Name Course... 3
(2) The Management System (DBMS) The Management System (DBMS) is a general-purpose software that enables users to create and maintain a database. The DBMS facilitates the process of defining, constructing, and manipulating databases for various applications. 1. Defining: Data Definition Language (DDL). 2. Constructing. 3. Manipulating: Data Manipulation Language (DML). DBMS Layers 1. Software to process Queries and Programs: (DML). The language SQL (Structured Query Language) is a typical example of a database query language. 2. Software to access stored data (3) The Programs These are the programs written to support the end users requirements. A given end user can access the database via one of the online applications, where he or she operates by choosing items from a menu or filling in items on a form. Such menu- or forms-driven interfaces tend to be easier to use for people who do not have a formal training in data processing. users Common users (4) USERS Users Professional Users End Users Designers Adnibistrators System Analysts & Programmers Casual end users Parametric end users Sophisticated end users Stand-alone end users 4
Characteristics of the Approach 1. Self-describing nature of a database 2. Program-data independence and Data Abstraction 3. Support multiple views of the data 4. Sharing of data and multiuser transaction processing files users files users Traditional file processing approach users Common users The approach. 5
DATA STORING APPROACHES 1. The File-based Approach Each application has its own set of files. Disadvantages: Inflexibility: a mass production facilty committed to processing particular queries. Uncontrolled Redundancy: If these separate applications need to process the same data then must duplicate copies of the data i.e., each application has its own data files, several copies of the same data may exist in different applications. This leads to: Wastage of valuable storage space. Need to input data to several files. Data inconsistency (one fact may have more than one value various versions may occur). Poor Enforcement of System Standards: Data names, formats, access restrictions, etc. are not standardized across an organization, may have many synonyms and homonyms. This makes modifications difficult and hinders sharing of data. Limited Data Sharing: Each application has its own private file providing little opportunity for users to share existing data. Additionally any new applications would not be able to use existing files leading to low productivity. Progran Data Dependency: Descriptions of files, records, data items are embedded within application programs. Any modification to a data file requires that the application programs using that file must also be changed. In other words, program maintenance will be excessive. 2. The Approach The database approach improves upon file-based systems. A DBMS (Data Base Management System) does not fragment data into separate files but regards data as being stored in a large concptual repository termed as database. The DBMS handles the addition, storage, update, and retrieval of data. DBMSs are based on semantically rich data models, which can accurately represent real world data. DBMSs allow: Persistence of Data Transaction Control Concurrency Control Recovery Control Querying Integrity Control Data Security Version Control Performance Tuning Components of DBMS DBMS Engine Interface Subsystem (DDL, DML, DCL, Graphical User Interface, Forms Interface, Natural Language Interface) Data Dictionary Subsystem Performance Management Subsystem Concurrency Control Subsystem 6
Data Integrity Management Subsystem Backup and Recovery Subsystem Development Subsystem Security Management Subsystem Benefits of the Approach Ease of application development Minimal data redundancy Enforcement of standards Data can be shared Physical data independence Logical data independence Better modeling of real world data Uniform security and integrity controls Economy of scale Risks of the Approach New specialized personnel Need for explicit backup Organizational conflict Large size Expensive High impact of failure Modelling Structure A database structure is the description and definition of all basic structures such as simple conceptual files, datatypes, relationships, and constraints that should hold on the data. In any data model it is important to distinguish between the description of the database (Schema) and the database itself (Instance). Schema: The description of a database is called the database schema (or the meta-data). A database schema is specified during database design and is not expected to change frequently. The following is a simplified database schema diagram: 7
A Schema in SQL CREATE TABLE EMPLOYEE (FNAME VARCHAR(15) NOT NULL, MINIT CHAR, LNAME VARCHAR(15) NOT NULL, SSN SSN_TYPE NOT NULL, BDATE DATE ADDRESS VARCHAR(30), SEX CHAR, SALARY DECIMAL(10,2), SUPERSSN SSN_TYPE, DNO INT NOT NULL, PRIMARY KEY (SSN), FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN), FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER)); CREATE TABLE DEPARTMENT (DNAME VARCHAR(15) NOT NULL, DNUMBER INT, NOT NULL, MGRSSN SSN_TYPE NOT NULL, MGRSTARTDATE DATE PRIMARY KEY (DNUMBER), UNIQUE (DNAME) FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN)); CREATE TABLE DEPT_LOCATIONS (DNUMBER INT NOT NULL, DLOCATION VARCHAR(15) NOT NULL, PRIMARY KEY (DNUMBER, DLOCATION), FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT(DNUMBER)); CREATE TABLE PROJECT (PNAME VARCHAR(15) NOT NULL, PNUMBER INT NOT NULL, PLOCATION VARCHAR(15) DNUM INT NOT NULL, PRIMARY KEY (PNUMBER), UNIQUE (PNAME) FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER)); CREATE TABLE WORKS_ON (ESSN SSN_TYPE NOT NULL, PNO INT NOT NULL, HOURS DECIMAL(3,1) NOT NULL, PRIMARY KEY (ESSN, PNO), FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN), FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER)); CREATE TABLE DEPENDENT (ESSN SSN_TYPE NOT NULL, DEPENDENT_NAME VARCHAR(15) NOT NULL, SEX CHAR, BDATE DATE, RELATIONSHIP VARCHAR(8) PRIMARY KEY (ESSN, DEPENDENT_NAME), FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN)); 8
State (Instance) The data in the database at a particular moment of time is called the database state (or instance). The Three-Layer Architecture End Users External Level External View 1 External View n external/conceptual mapping Conceptual Level Conceptual Schema conceptual/internal mapping Internal Level Internal Schema Stored 9
Data Independence Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. Defined as: Logical data independence is the capacity to change the conceptual schema without having to change external schemas or Logical data application programs. Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas. Data Models A data model is a set of concepts that can be used to describe a database structure. Data Models High-level (conceptual) Data Models Implementation Data Models Low-level (physical) Data Models Relational Hierarchical Network Objectoriented Classification of DBMSs Classification according to Data Model DBMSs Relational Hierarchical Network Objectoriented 10
The Relational Data Model The relational data model represents the database as a collection of tables, where each table can be stored as separate file. Examples of commercial relational DBMSs: DB2 from IBM ORACLE from Oracle Corporation Informix from Informix SyBase from OpenSoft SQL Server from Microsoft MS-ACCESS from Microsoft An Example of a Relational 11
The Network Data model The network data model represents data as a record types. An example of a network model is known as the CODASYL DBTG model. The Hierarchical Data model The hierarchical data model represents data as hierarchical tree structure. Each hierarchical represents a number of related records. 12
The Object-Oriented Data model The object-oriented data model defines a database in terms of objects, their properties, and their operations. Objects with the same structure and behavior belong to a class, and classes are organized into hierarchies or a cyclic graphs. The operations of each class are specified in terms of predefined procedures called methods. Experimental OO prototypes The ORION system developed at MCC, The OpenOODB system at Texas Instruments, The IRIS system developed at HP laboratories, The ODE system at ATT Bell Labs, and The ENCORE/ObServer project at Brown University. Commercially available OO systems GEM-STONE/OPAL of SerioLogic, ONTOS of Ontologic, Objectivity of Objectivity Inc., Versant of Versant Technologies, ObjectStore of Object Design, and, O2 of O2 Technology. Classification of DBMSs (Cont d) Classification according to Number of Users Single user systems support only one user at a time and are mostly used with personal computers. Multiuser systems, which include the majority of DBMSs, support many users concurrently. Classification according to Number of Sites Centralized DBMS where the data is stored at a single computer site. Most DBMSs are centralized. A centralized DBMS can support multiple users, but the DBMS and the database themselves reside totally at a single computer site. Distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites, connected by a computer network. Many DDBMSs use a client-server architecture. 13