Database Review
Outline: Definitions; History; Basic concepts of DBMS; Data Models; Relational database; Normalization
Definitions. Database: an organized collection of data. Relational database: a database based on the relational model. Database management system (DBMS): a computer program (or a suite of programs) designed to manage a database and run operations on the data requested by numerous consumers. Source: www.en.wikipedia.org
History. 1960s: Two main data models were developed: the network model (CODASYL, COnference on DAta SYstems Languages) and the hierarchical model (IMS, Information Management System). 1970-72: Edgar Frank Codd proposed the relational model for databases, which separates the schema (logical organization) of a database from its physical storage methods.
History. 1970s: Two main prototypes for relational systems were developed during 1974-77. Ingres, developed at UCB, led to Ingres Corp., Sybase, MS SQL Server, Britton-Lee, and Wang's PACE; it used QUEL as its query language. System R, developed at IBM San Jose, led to IBM's SQL/DS and DB2, Oracle, HP's Allbase, and Tandem's Non-Stop SQL; it used SEQUEL (Structured English QUEry Language) as its query language. The term Relational Database Management System (RDBMS) was coined during this period.
History. 1976: P. Chen proposed the Entity-Relationship (ER) model for database design. Early 1980s: Commercialization of relational systems begins. Mid-1980s: SQL (Structured Query Language) becomes a standard, and DB2 becomes IBM's flagship product. Development of the IBM PC gives rise to many DB companies and products such as RIM, RBASE 5000, PARADOX, OS/2 Database Manager, Dbase III and IV (later Foxbase, even later Visual FoxPro), and Watcom SQL.
History. Early 1990s: Companies offer increasingly complex products at higher prices. Development centers on client tools for applications, such as PowerBuilder (Sybase), Oracle Developer, and VB (Microsoft). The client-server model of computing becomes the norm for future business decisions. Personal productivity tools such as Excel/Access (MS) and ODBC are developed. This period also marks the beginning of Object Database Management System (ODBMS) prototypes. Mid-1990s: The Internet/WWW appears. Web/DB grows exponentially.
History. Late 1990s: Boom in Web/Internet/DB connectors: Active Server Pages, FrontPage, Java Servlets, JDBC, Enterprise JavaBeans, ColdFusion, Dreamweaver, Oracle Developer 2000, etc. Open source solutions come online, with widespread use of gcc, CGI, Apache, MySQL, etc. Online transaction processing (OLTP) and online analytical processing (OLAP) come of age, with many merchants using point-of-sale (POS) technology on a daily basis.
History. Early 21st century: Decline of the Internet industry as a whole, while solid growth of DB applications continues. More interactive applications appear with the use of PDAs, POS transactions, consolidation of vendors, etc. Three main (Western) companies predominate in the large DB market: IBM (which buys Informix), Microsoft, and Oracle. Future trends: large data sizes; mobile computing.
Problems with Conventional File Systems: data redundancy and inconsistency; difficulty in accessing data; data isolation; multiple users; security problems; integrity problems.
DBMS Characteristics: self-contained; insulation between programs and data; data abstraction; support of multiple views of the data.
DBMS Intended Uses: controlling redundancy; sharing of data (concurrency control); restricting unauthorized access; providing multiple interfaces; representing complex relationships among data; enforcing integrity constraints; providing backup and recovery.
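As one illustration of "enforcing integrity constraints", the following is a minimal sketch using Python's standard sqlite3 module. The table layout, column names, and rows are assumptions made for the example (loosely modeled on the slsrep and customer tables used later in these slides), not a schema given in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE slsrep (
        slsrep_number INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL,
        credit_limit    INTEGER CHECK (credit_limit >= 0),
        slsrep_number   INTEGER NOT NULL REFERENCES slsrep(slsrep_number)
    );
    INSERT INTO slsrep VALUES (6, 'Mark');
""")


def try_insert(row):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((1, "Ann", 500, 6))         # valid row
unknown_rep = try_insert((2, "Bob", 300, 99))     # no sales rep 99: foreign key fails
negative_limit = try_insert((3, "Cy", -10, 6))    # CHECK constraint fails
stored = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(accepted, unknown_rep, negative_limit, stored)
```

The point of the sketch is that the rules live in the database itself, so every program that writes to it is held to the same constraints.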
DBMS Advantages: potential for enforcing standards; flexibility; reduced application development time; availability of up-to-date information; economies of scale.
Data Abstraction: an abstract view of the database system, at three levels. Physical level: how data are stored physically. Conceptual level: what data are stored and the relationships among the data. View level: which parts of the data a specific user can view.
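The view level can be sketched with an SQL view that exposes only some columns of a table. This is a minimal example using Python's sqlite3 module; the view name faculty_public and the faculty IDs are hypothetical, and the names and salaries are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: what data are stored.
    CREATE TABLE faculty (
        faculty_id     INTEGER PRIMARY KEY,
        faculty_name   TEXT NOT NULL,
        faculty_salary INTEGER NOT NULL
    );
    INSERT INTO faculty VALUES (1, 'Anne Taylor', 2500);
    INSERT INTO faculty VALUES (2, 'Santi Li', 2300);

    -- View level: users of this view never see the salary column.
    CREATE VIEW faculty_public AS
        SELECT faculty_id, faculty_name FROM faculty;
""")

cur = conn.execute("SELECT * FROM faculty_public ORDER BY faculty_id")
columns = [d[0] for d in cur.description]  # column names visible through the view
rows = cur.fetchall()
print(columns, rows)
```

The physical level (file layout, pages, indexes) stays hidden from both the table definition and the view, which is the insulation the slide describes.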
Data Models: object-based logical data models (ER model, object-oriented model); record-based logical data models (relational model, network model, hierarchical model); physical data model.
E-R Model: consists of entities (objects) and relationships among those entities; each entity has a set of attributes describing it; can be described using an E-R diagram.
E-R Diagram: a graphical expression of the logical structure of a database. Rectangles represent entity sets; ellipses represent attributes; diamonds represent relationships among entity sets; lines link attributes to entity sets and entity sets to relationships. http://wofford.org/ecs/dataandvisualization/ermodel/material.htm
Example of E-R Diagram http://www.conceptdraw.com/products/img/screenshots/cd5/software/chen_erd.gif
Relational Model: represents data and relationships with tables; attributes are represented as columns in each table; each column in a table must have a unique name.
Example of Relational Database
Dept_ID  Dept_Name  Faculty_ID  Faculty_Name
1        IT         1           Anne Taylor
2        Math       4           Santi Li
1        IT         3           Sirima Li

Faculty_ID  Faculty_Name  Faculty_Salary
1           Anne Taylor   $2500
3           Sirima Li     $2000
4           Santi Li      $2300
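How tables carry both data and relationships can be sketched as follows with Python's sqlite3 module. The split into dept and faculty tables, and the ID values, are assumptions made for the example; the relationship between a faculty member and a department is nothing more than a shared column value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT
    );
    CREATE TABLE faculty (
        faculty_id   INTEGER PRIMARY KEY,
        faculty_name TEXT,
        dept_id      INTEGER REFERENCES dept(dept_id)
    );
    INSERT INTO dept VALUES (1, 'IT'), (2, 'Math');
    INSERT INTO faculty VALUES
        (1, 'Anne Taylor', 1), (4, 'Santi Li', 2), (3, 'Sirima Li', 1);
""")

# Follow the relationship by matching the shared dept_id column.
rows = conn.execute("""
    SELECT d.dept_name, f.faculty_name
    FROM faculty AS f
    JOIN dept AS d ON d.dept_id = f.dept_id
    ORDER BY f.faculty_id
""").fetchall()

# Column names must be unique within a table; a duplicate is rejected.
try:
    conn.execute("CREATE TABLE bad (name TEXT, name TEXT)")
    duplicate_rejected = False
except sqlite3.OperationalError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```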
Physical Data Model: describes data at the lowest level; depends on the physical DBMS.
Technical Terms Comparison
Formal Name  Common Name  A.K.A.
Relation     Table        Entity
Tuple        Row          Record
Attribute    Column       Field
Relational Operations. Query: select. Update: insert, update, delete. Table manipulation: create table, drop table, alter table. Index manipulation: create index, drop index. View manipulation: create view, drop view.
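The table, index, and view operations listed above can be sketched in one short script. This is an assumption-laden example run through Python's sqlite3 module: the customer columns, the index name idx_customer_name, and the view name customer_names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def schema_objects():
    """List (type, name) for every object currently in the schema."""
    return sorted(conn.execute("SELECT type, name FROM sqlite_master").fetchall())


conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT,
        credit_limit    INTEGER
    );
    ALTER TABLE customer ADD COLUMN address TEXT;
    CREATE INDEX idx_customer_name ON customer (customer_name);
    CREATE VIEW customer_names AS
        SELECT customer_number, customer_name FROM customer;
""")
created = schema_objects()
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

conn.executescript("""
    DROP VIEW customer_names;
    DROP INDEX idx_customer_name;
    DROP TABLE customer;
""")
remaining = schema_objects()
print(created, table_columns, remaining)
```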
SELECT: select a subset of tuples in a relation.
Syntax: SELECT field(s) FROM table_name WHERE condition ORDER BY field(s)
Example: SELECT customer_name, credit_limit, address FROM customer WHERE slsrep_number = 6 ORDER BY credit_limit, current_balance
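The SELECT example can be tried against a small in-memory table. This is a minimal sketch with Python's sqlite3 module; the column list of the customer table and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT,
        address         TEXT,
        current_balance REAL,
        credit_limit    INTEGER,
        slsrep_number   INTEGER
    )
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ann", "12 Oak St.", 120.0, 500, 6),
        (2, "Bob", "9 Elm St.", 50.0, 300, 6),
        (3, "Cy", "4 Pine St.", 20.0, 300, 6),
        (4, "Dee", "1 Fir St.", 0.0, 100, 3),
    ],
)


def customers_for_rep(slsrep_number):
    """Run the slide's query, passing the sales rep number as a bound parameter."""
    query = """
        SELECT customer_name, credit_limit, address
        FROM customer
        WHERE slsrep_number = ?
        ORDER BY credit_limit, current_balance
    """
    return conn.execute(query, (slsrep_number,)).fetchall()


rows = customers_for_rep(6)
print(rows)
```

Rows with the same credit_limit are ordered by current_balance, even though current_balance is not among the selected columns.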
INSERT: add a single tuple to a relation.
Syntax: INSERT INTO table_name VALUES (values to be inserted in columns)
Example: INSERT INTO slsrep VALUES (0, 'Mark', '400 Joyce St.', 0.00, 0.05)
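A variation on the INSERT example, sketched with Python's sqlite3 module: naming the target columns makes the statement independent of column order and lets omitted columns take their defaults. The slsrep column names, the DEFAULT clause, and the sales rep number 6 are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE slsrep (
        slsrep_number    INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        address          TEXT,
        total_commission REAL DEFAULT 0.0,
        commission_rate  REAL
    )
""")

with conn:  # commit the insert when the block exits without error
    conn.execute(
        """
        INSERT INTO slsrep (slsrep_number, name, address, commission_rate)
        VALUES (?, ?, ?, ?)
        """,
        (6, "Mark", "400 Joyce St.", 0.05),
    )

row = conn.execute("SELECT * FROM slsrep").fetchone()
print(row)
```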
DELETE: remove tuples from a relation.
Syntax: DELETE FROM table_name WHERE condition
Example: DELETE FROM customer WHERE name = 'Mark'
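A minimal sketch of the DELETE example using Python's sqlite3 module, with an invented two-column customer table. It shows that every tuple matching the condition is removed, not only the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Mark"), (2, "Ann"), (3, "Mark")],
)

with conn:
    cur = conn.execute("DELETE FROM customer WHERE name = ?", ("Mark",))
deleted = cur.rowcount  # number of tuples the statement removed

remaining = conn.execute("SELECT customer_number, name FROM customer").fetchall()
print(deleted, remaining)
```

Leaving out the WHERE clause would delete every row in the table, so the condition deserves a second look before the statement is run.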
UPDATE: modify attribute values of one or more selected tuples.
Syntax: UPDATE table_name SET column = value WHERE condition
Example: UPDATE customer SET name = 'Ann' WHERE customer_number = 00
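The UPDATE example can be sketched the same way with Python's sqlite3 module. The customer numbers 100 and 101 and the two-column table are hypothetical values chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(100, "Mark"), (101, "Bob")])

with conn:
    hit = conn.execute(
        "UPDATE customer SET name = ? WHERE customer_number = ?", ("Ann", 100)
    ).rowcount
    # A condition that matches no tuple is not an error; it simply changes nothing.
    miss = conn.execute(
        "UPDATE customer SET name = ? WHERE customer_number = ?", ("Zed", 999)
    ).rowcount

rows = conn.execute(
    "SELECT customer_number, name FROM customer ORDER BY customer_number"
).fetchall()
print(hit, miss, rows)
```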
Normalization: the process of converting a complex data structure into the simplest, most stable structure by removing redundant attributes, keys, and relationships from a conceptual data model.
1NF First normal form (1NF) is achieved by moving repeating or multivalued attributes to another, child entity.
Example Order _ID Order_ Date Cust Prd_ ID Prd_Dscr Prd_Price Prd_ QTY 00 02006 TOS 2 Hammer Saw $0 $5 20 0 003 02006 ASI Hammer $0 5 004 03006 DTC 3 Plier Hammer $5 $2 5 5 005 03006 ORA 2 Saw $6 20
Example First Normal Form ORN DTC DTC ASI TOS TOS Cust 0 $5 Saw 2 02006 00 5 $2 Hammer 03006 004 $6 $5 $0 $0 Prd_Price Saw Plier Hammer Hammer Prd_Dscr 20 2 03006 005 5 3 03006 004 5 02006 003 20 02006 00 Prd_ QTY Prd _ID Order_ Date Order _ID
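The step from the unnormalized order table to first normal form can be sketched in plain Python. The order numbers, dates, prices, and quantities below are illustrative values and are not the figures from the slide. Each order starts with a repeating group of products, and the function emits one row per (order, product) pair so that every attribute holds a single value.

```python
def to_first_normal_form(orders):
    """Flatten each order's repeating product group into one row per product."""
    rows = []
    for order in orders:
        for item in order["items"]:
            rows.append({
                "order_id": order["order_id"],
                "order_date": order["order_date"],
                "cust": order["cust"],
                **item,  # prd_id, prd_dscr, prd_price, prd_qty
            })
    return rows


unnormalized = [
    {"order_id": 1, "order_date": "2006-01-02", "cust": "TOS",
     "items": [
         {"prd_id": 1, "prd_dscr": "Hammer", "prd_price": 10, "prd_qty": 20},
         {"prd_id": 2, "prd_dscr": "Saw", "prd_price": 15, "prd_qty": 10},
     ]},
    {"order_id": 2, "order_date": "2006-01-03", "cust": "ASI",
     "items": [
         {"prd_id": 1, "prd_dscr": "Hammer", "prd_price": 10, "prd_qty": 5},
     ]},
]

flat = to_first_normal_form(unnormalized)
keys = [(row["order_id"], row["prd_id"]) for row in flat]
print(keys)
```

After flattening, the pair (order_id, prd_id) identifies each row, at the cost of repeating the order date and customer on every line of the same order.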
2NF First normal form entities can be reduced to second normal form (2NF) by removing attributes that are not dependent on the whole primary key.
Example Second Normal Form 0 $5 Saw 2 02006 00 5 $2 Hammer 03006 004 $6 $5 $0 $0 Prd_ Price Saw Plier Hammer Hammer Prd_Dscr 20 2 03006 005 5 3 03006 004 5 02006 003 20 02006 00 Prd_ QTY Prd _ID Order_ Date Order _ID ORA 005 DTC 004 ASI 003 TOS 00 Cust Order _ID
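The 2NF rule (no attribute may depend on only part of the primary key) can be checked mechanically on sample data. The sketch below is plain Python over a trimmed, illustrative set of order rows with the composite key (order_id, prd_id). A check like this only shows which dependencies hold in the sample; the real dependencies come from the meaning of the data.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always imply equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key):
    """Map each non-key attribute to the proper subsets of the key that determine it."""
    found = {}
    for attr in (c for c in rows[0] if c not in key):
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if determines(rows, subset, attr):
                    found.setdefault(attr, []).append(subset)
    return found


rows = [
    {"order_id": 1, "order_date": "2006-01-02", "cust": "TOS", "prd_id": 1, "prd_qty": 20},
    {"order_id": 1, "order_date": "2006-01-02", "cust": "TOS", "prd_id": 2, "prd_qty": 10},
    {"order_id": 2, "order_date": "2006-01-03", "cust": "ASI", "prd_id": 1, "prd_qty": 5},
]

found = partial_dependencies(rows, ("order_id", "prd_id"))

# Move the partially dependent attributes into their own table keyed by order_id.
orders = sorted({(r["order_id"], r["order_date"], r["cust"]) for r in rows})
order_lines = [(r["order_id"], r["prd_id"], r["prd_qty"]) for r in rows]
print(found, orders, order_lines)
```

Here prd_qty stays with the composite key because neither order_id nor prd_id alone determines it.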
3NF Second normal form entities can be reduced to third normal form (3NF) by removing attributes that depend on other, nonkey attributes (other than alternative keys).
Example Second Normal Form 0 $5 Saw 2 02006 00 5 $2 Hammer 03006 004 $6 $5 $0 $0 Prd_ Price Saw Plier Hammer Hammer Prd_Dscr 20 2 03006 005 5 3 03006 004 5 02006 003 20 02006 00 Prd_ QTY Prd _ID Order_ Date Order _ID
Example Order _ID Order_Date Prd_ID Prd_ QTY Prd_ ID Prd_Dscr Prd_ Price 00 02006 20 Hammer $0 00 02006 2 0 2 Saw $5 003 02006 5 3 Plier $5 004 03006 3 5 004 03006 5 005 03006 2 20 Third Normal Form
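One possible fully decomposed layout for the order example is sketched below with Python's sqlite3 module. The table names (orders, product, order_line) and all values are assumptions for illustration; this sketch also keeps order-level attributes in a table of their own. Because each product's description and price are stored exactly once, a price change is a single-row update that every order line sees through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL,
        cust       TEXT NOT NULL
    );
    CREATE TABLE product (
        prd_id    INTEGER PRIMARY KEY,
        prd_dscr  TEXT NOT NULL,
        prd_price INTEGER NOT NULL
    );
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        prd_id   INTEGER NOT NULL REFERENCES product(prd_id),
        prd_qty  INTEGER NOT NULL,
        PRIMARY KEY (order_id, prd_id)
    );
    INSERT INTO orders VALUES (1, '2006-01-02', 'TOS'), (2, '2006-01-03', 'ASI');
    INSERT INTO product VALUES (1, 'Hammer', 10), (2, 'Saw', 15);
    INSERT INTO order_line VALUES (1, 1, 20), (1, 2, 10), (2, 1, 5);
""")

# The price lives in one place, so one UPDATE is enough.
conn.execute("UPDATE product SET prd_price = 12 WHERE prd_id = 1")

rows = conn.execute("""
    SELECT o.order_id, o.cust, p.prd_dscr, p.prd_price, l.prd_qty
    FROM order_line AS l
    JOIN orders  AS o ON o.order_id = l.order_id
    JOIN product AS p ON p.prd_id = l.prd_id
    ORDER BY o.order_id, p.prd_id
""").fetchall()
print(rows)
```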
Denormalization: the process of attempting to optimize the performance of a database by adding redundant data; sometimes necessary because current DBMSs implement the relational model poorly. Source: www.en.wikipedia.org
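The trade-off behind denormalization can be sketched with Python's sqlite3 module. The tables and values are illustrative, and order_line_wide is a hypothetical name for a table that stores product details redundantly so that reads need no join. The same sketch shows the cost: the redundant copy does not follow later changes to the source table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prd_id INTEGER PRIMARY KEY, prd_dscr TEXT, prd_price INTEGER);
    CREATE TABLE order_line (
        order_id INTEGER, prd_id INTEGER, prd_qty INTEGER,
        PRIMARY KEY (order_id, prd_id)
    );
    INSERT INTO product VALUES (1, 'Hammer', 10), (2, 'Saw', 15);
    INSERT INTO order_line VALUES (1, 1, 20), (1, 2, 10);

    -- Denormalized copy: description and price are repeated on every line.
    CREATE TABLE order_line_wide AS
        SELECT l.order_id, l.prd_id, p.prd_dscr, p.prd_price, l.prd_qty
        FROM order_line AS l
        JOIN product AS p ON p.prd_id = l.prd_id;
""")

# Benefit: line totals can be read without a join.
totals = conn.execute(
    "SELECT prd_dscr, prd_price * prd_qty FROM order_line_wide ORDER BY prd_id"
).fetchall()

# Cost: updating the source table leaves the redundant copy stale.
conn.execute("UPDATE product SET prd_price = 12 WHERE prd_id = 1")
copied_price = conn.execute(
    "SELECT prd_price FROM order_line_wide WHERE prd_id = 1"
).fetchone()[0]
current_price = conn.execute(
    "SELECT prd_price FROM product WHERE prd_id = 1"
).fetchone()[0]
print(totals, copied_price, current_price)
```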