DATABASE MANAGEMENT SYSTEMS
UNIT I: Introduction to Database Systems

Terminology
- Data = known facts that can be recorded
- Database (DB) = logically coherent collection of related data with some inherent meaning
  - Entities, such as students, courses, sections
  - Relationships between entities, such as students taking courses and sections being part of courses
- Database management system (DBMS) = collection of programs that enables users to create and maintain a DB; a general-purpose software system that facilitates the process of defining, constructing, and manipulating DBs for various applications

What Is a Database?
- A shared collection of logically related data and a description of this data, designed to meet the information needs of an organization
- Data repository (data resource)
- Designed independently of applications (i.e., data abstraction)
- Serves long-term information needs at the enterprise level
- Primarily designed for quick and efficient data retrieval

Why Learn about Databases?
- Paradigm shift: data-driven business environment
  - Production efficiencies
  - Knowledge and innovation (e.g., knowledge management, business intelligence)
  - Coordination of vendors (e.g., supply chain management)
  - Competitor and marketplace information
  - Customer information (e.g., database marketing, CRM)

History (1)
Early 60s
- Charles Bachman introduced the first general-purpose DBMS, the Integrated Data Store (IDS), at General Electric (GE) (Turing Award 1973)
- IDS formed the basis for the network data model
- The network data model was standardized by the Conference on Data Systems Languages (CODASYL)
Late 60s
- IBM developed the Information Management System (IMS)
- IMS formed the basis for the hierarchical data model
- SABRE system for making airline reservations, developed jointly by IBM and American Airlines (allowed several people to access the same data through a computer network)
70s
- Edgar Codd, at IBM, proposed the relational data model (Turing Award 1981)
- Use of DBMSs for managing corporate data became standard practice

History (2)
80s to now
- The relational data model became the dominant DBMS paradigm
- SQL, the query language for relational DBs developed as part of IBM's System R project, is now the standard query language
- Transaction management (concurrent execution of DB programs) (James Gray, Turing Award 1998)
- Object-oriented data model
- Data warehouses and data mining
- Accessing databases through the web/internet
- Multimedia data
- Text data (information retrieval)
- Structure of the data (XML)

Traditional File-Based System
Definition: "A collection of application programs that perform services for the end-users such as the production of reports. Each program defines and manages its own data."
[Figure: five separate data files (customer transactions, operating expenses, inventory, vendors, payroll), each read by its own program, which produces its own report. One file, one application.]

Data Redundancy
Customer Order File
- Invoice number
- Customer account number
- Customer name, address, city, state, zip code
- Order date
- Product code, product description, price, unit
Customer Account File
- Account number
- Customer name, mailing address, city, state, zip code
Customer Mailing List File
- Customer name, mailing address, city, state, zip code

File-Based Systems
- Records contain logically related data
- Limitations:
  - Separation and isolation of data (one file, one program)
  - Duplication of data
  - Loss of data integrity: uncertainty about the correct version of the data, and no consistency
  - Data dependence: the application program defines the data
  - Incompatibility of file formats
  - Fixed queries / proliferation of application programs: little flexibility in meeting changing information needs

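The duplication and integrity problems listed above can be made concrete with a small sketch. It models two of the files as plain Python lists, each owned by a different program; the customer values and the function name are made up for illustration.

```python
# Two "files", each owned by a separate application, each holding its own
# copy of the customer's city (hypothetical sample data).
customer_account_file = [
    {"account_number": 101, "name": "A. Kumar", "city": "Chennai"},
]
customer_mailing_list_file = [
    {"name": "A. Kumar", "city": "Chennai"},
]


def update_city_in_account_file(account_number, new_city):
    """The accounts program updates only the file it owns."""
    for record in customer_account_file:
        if record["account_number"] == account_number:
            record["city"] = new_city


update_city_in_account_file(101, "Mumbai")

# The mailing-list program is never told about the change,
# so the two copies of the same fact now disagree.
cities = {
    customer_account_file[0]["city"],
    customer_mailing_list_file[0]["city"],
}
inconsistent = len(cities) > 1
print(inconsistent)
```

Nothing in a file-based system flags this disagreement; which copy is correct is left to whoever reads the files.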
Database
- A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization
[Figure: applications are separated from a central repository that holds the data and the data definitions.]

Data Abstraction
- Separation between the data's structure (definition) and the application programs
- Application programs can be run on either the clients or the server
[Figure: applications reach the central repository (data and data definitions) through the DBMS.]

Organizing Data
- Entity: distinct object (e.g., person, place, thing, concept, or event)
- Attribute: describes some aspect (property) of the entity
- Relationship: association between entities
Example
- Entity: Customers; attributes: Account_number, Name, Address
- Relationship: Purchases; attributes: Invoice_number, Account_number, Purchase_date

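As a rough sketch, the Customers entity and the Purchases relationship can be written as two tables, with Account_number linking them. Python's built-in sqlite3 module stands in for the DBMS here; the column types, the key constraints, and the sample rows are assumptions, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Entity: Customers, with its attributes as columns.
conn.execute("""
    CREATE TABLE Customers (
        Account_number INTEGER PRIMARY KEY,
        Name TEXT,
        Address TEXT
    )""")

# Relationship: Purchases refers back to Customers through Account_number.
conn.execute("""
    CREATE TABLE Purchases (
        Invoice_number INTEGER PRIMARY KEY,
        Account_number INTEGER REFERENCES Customers(Account_number),
        Purchase_date TEXT
    )""")

conn.execute("INSERT INTO Customers VALUES (101, 'A. Kumar', '12 Main St')")
conn.execute("INSERT INTO Purchases VALUES (5001, 101, '2024-01-15')")

# Follow the relationship from a purchase to the customer who made it.
rows = conn.execute("""
    SELECT c.Name, p.Invoice_number
    FROM Customers c JOIN Purchases p ON p.Account_number = c.Account_number
""").fetchall()
print(rows)

# A purchase for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO Purchases VALUES (5002, 999, '2024-01-16')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```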
Database
[Figure: a central repository (customers, orders, order items, products, manufacturers) managed by the DBMS, which serves management queries, application programs, DDL, DML, and other interfaces.]
- Central repository (organizational resource)
- Controlled-access software (single access point)
- Multitude of applications

Advantages of the Database Approach
- Control of data redundancy
- Data consistency
- Efficient data access
- Greater informational gain: more information from the same amount of data
- Sharing of data as an organizational resource (i.e., shared resource)
- Improved data integrity, validity, and consistency
- Improved access and security
- Enforcement of standards
- Concurrent access and crash recovery
- Data administration
- Reduced application development time

Database Applications
- Traditional database applications (banks, library catalogs, inventory, airlines, universities)
- Multimedia databases (images)
- Geographic information systems
- Data warehouses and online analytical processing (OLAP)
- Real-time and active database technology (sensor systems, safety-critical systems)
- World Wide Web (e-commerce, internet banking)

Available DBMSs
- Oracle
- DB2 (IBM)
- Microsoft SQL Server (MS-SQL)
- Teradata
- Sybase
- Informix

Data Model
- Data model: collection of high-level data description constructs that hide many low-level storage details
- Semantic data model
  - More abstract, high-level data model (makes it easier to describe the data)
  - A widely used one is the ER model, which pictorially denotes entities and the relationships among them
- Relational model
  - Relation: set of records
- Schema
  - A description of data in terms of a data model
  - The schema for a relation specifies its name, and the name and type of each field (or attribute or column)
  - Example: Students(sid: string, name: string, login: string, gpa: real)
  - Each row in the relation is a record that describes a student

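A minimal sketch of the Students schema, using Python's built-in sqlite3 module as a stand-in DBMS. The mapping of string to TEXT and real to REAL, and the sample records, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: relation name plus the name and type of each field.
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, gpa REAL)")

# Each tuple below becomes one row (record) describing one student.
sample_records = [
    ("53666", "Jones", "jones@cs", 3.4),
    ("53688", "Smith", "smith@ee", 3.2),
    ("53650", "Smith", "smith@math", 3.8),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", sample_records)

cursor = conn.execute("SELECT * FROM Students WHERE gpa > 3.3 ORDER BY sid")
field_names = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(field_names)
print(records)
```

The field names come back from the DBMS itself, not from the program, which is the point of declaring the schema once.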
Other Data Models
- Relational data model (dominant model)
- Hierarchical data model
- Network model
- Object-oriented model
- Object-relational model

Types of Database Models
[Figure: hierarchical and relational database models; the relational panel labels the table, its columns, its rows, and a value.]

Database Architecture / Levels of Data Abstraction
- External level (individual user views)
- Conceptual level (community user view)
- Internal level (storage view)
- Database

Conceptual Schema
- Describes data in terms of the data model of the DBMS
- In an RDBMS, the conceptual schema describes all relations that are stored in the database
- E.g., university DB:
    Students(sid: string, name: string, gpa: real)
    Faculty(fid: string, fname: string, sal: real)
Physical Schema
- Specifies additional storage details
- Summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes
- Decides which file organizations to use to store relations, and which indexes to create to speed up data retrieval operations
External Schema
- Allows data access to be customized at the level of individual users or groups of users

An Example of the Three Levels
External View 1: SNo, FName, LName, Age, Salary
External View 2: SNo, LName, BranchNo
Conceptual View: SNo, FName, LName, Age, Salary, BranchNo
Internal View:
    struct STAFF {
        int staffno;
        int branchno;
        char fname[15];
        char lname[15];
        struct date dateofbirth;
        float salary;
        struct STAFF *next;    /* pointer to next Staff record */
    };
    index staffno; index branchno;    /* define indexes for staff */

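The three levels can be approximated in a relational DBMS as a base table (conceptual), views (external), and indexes (internal). The sketch below uses Python's built-in sqlite3 module; the view and index names, the column types, and the sample row are hypothetical, and it assumes the first external view exposes SNo, FName, LName, Age, Salary and the second SNo, LName, BranchNo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full Staff relation.
conn.execute("""
    CREATE TABLE Staff (
        SNo INTEGER PRIMARY KEY,
        FName TEXT, LName TEXT,
        Age INTEGER, Salary REAL,
        BranchNo INTEGER
    )""")

# Internal level: a storage decision that users of the views never see.
conn.execute("CREATE INDEX idx_staff_branch ON Staff(BranchNo)")

# External level: two user-specific views over the same data.
conn.execute("""CREATE VIEW StaffPayView AS
                SELECT SNo, FName, LName, Age, Salary FROM Staff""")
conn.execute("""CREATE VIEW StaffBranchView AS
                SELECT SNo, LName, BranchNo FROM Staff""")

conn.execute("INSERT INTO Staff VALUES (1, 'Mary', 'Howe', 40, 30000.0, 3)")

view1_rows = conn.execute("SELECT * FROM StaffPayView").fetchall()
view2_rows = conn.execute("SELECT * FROM StaffBranchView").fetchall()
print(view1_rows)
print(view2_rows)
```

Each view shows only the columns its users need, while the single stored row lives in Staff.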
Database Design Phases
- DATA ANALYSIS: entities, attributes, relationships, integrity rules
- LOGICAL DESIGN: tables, columns, primary keys, foreign keys
- PHYSICAL DESIGN: DDL for tablespaces, tables, indexes

Data Independence
- Ability to change one schema level without affecting the higher-level schemas
- Physical data independence: ability to change the physical (internal) schema without affecting the conceptual (logical) schema
- Logical data independence: ability to change the logical schema without affecting the external (view) schemas or the application programs
- Data independence is one of the most important advantages of a DBMS

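Both kinds of independence can be observed in a small experiment: run the same application query before and after a physical change (adding an index) and a logical change (adding a column), and check that the answer does not move. This is a sketch using Python's built-in sqlite3 module; the HonorRoll view, the index name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Asha", 3.8), ("s2", "Ravi", 3.6), ("s3", "Tara", 3.1)],
)

# External schema: the application only ever sees this view.
conn.execute(
    "CREATE VIEW HonorRoll AS SELECT sid, name FROM Students WHERE gpa >= 3.5"
)


def application_query():
    """The application's query; it is never edited in this example."""
    return conn.execute("SELECT name FROM HonorRoll ORDER BY name").fetchall()


before = application_query()

# Physical change: add an index (internal schema only).
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
after_physical_change = application_query()

# Logical change: add a column to the conceptual schema.
conn.execute("ALTER TABLE Students ADD COLUMN login TEXT")
after_logical_change = application_query()

print(before == after_physical_change == after_logical_change)
```

Not every logical change is this harmless; dropping a column that the view uses would still break it.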
Characteristics of the DB Approach (1)
- Single repository of data, defined once, then maintained and accessed by users
- Self-describing nature of the DB
  - DB + description of DB structures and constraints
  - The description (metadata) is stored in the catalog
  - DBMS software works with any number of DB applications
- Insulation between programs and data, and data abstraction
  - Program-data independence
  - Program-operation independence (OO DBMS)
  - Abstraction: conceptual representation of data, with no details of how data is stored or how operations are implemented

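The self-describing property can be seen by asking a DBMS for its own catalog. The sketch below uses Python's built-in sqlite3 module, where the catalog is the sqlite_master table; other DBMSs expose their catalogs under different names. The Faculty table is borrowed from the university example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")

# The catalog is itself queryable data: one row per schema object.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# Field-level metadata for one relation (second column of each row is the field name).
field_names = [row[1] for row in conn.execute("PRAGMA table_info(Faculty)")]

print(catalog)
print(field_names)
```

A generic program can therefore discover the structure of any database it connects to, instead of having that structure compiled into it.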
Characteristics of the DB Approach (2)
- Data model
  - Relational data model
  - Object-oriented data model
  - Entity-relationship data model
- Support of multiple views of data
  - View = subset of the DB, or virtual data derived from the DB (not explicitly stored)
- Sharing of data and multi-user transaction processing
  - Concurrency control
  - Online transaction processing (OLTP)

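One ingredient of transaction processing is that a group of updates either all take effect or none do. The sketch below shows only that all-or-nothing behaviour with Python's built-in sqlite3 module; it does not demonstrate concurrency control between users. The Accounts table, the CHECK constraint, and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (id INTEGER PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer(1, 2, 30.0)     # enough funds: both updates are kept
second_ok = transfer(1, 2, 500.0)   # overdraft: the credit is undone as well

balances = conn.execute("SELECT balance FROM Accounts ORDER BY id").fetchall()
print(first_ok, second_ok, balances)
```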
Query Languages
- Query: question involving data stored in a DBMS
- Relational algebra: formal query language based on a collection of operators for manipulating relations
- Relational calculus: formal query language based on mathematical logic
- DDL: Data Definition Language
  - Defines the DB structure
  - Commands are used for creating, altering, and dropping DB structures
- DML: Data Manipulation Language
  - For manipulating (inserting, deleting, updating) DB contents
  - Procedural and nonprocedural (declarative) DML

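The split between DDL and DML is easiest to see side by side. In this sketch, run through Python's built-in sqlite3 module, the first group of statements changes the structure of the database and the second group changes its contents. The Faculty table comes from the university example; the added dept column and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure.
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")
conn.execute("ALTER TABLE Faculty ADD COLUMN dept TEXT")

# DML: insert, update, and delete contents.
conn.execute("INSERT INTO Faculty VALUES ('f1', 'Rao', 50000.0, 'CS')")
conn.execute("INSERT INTO Faculty VALUES ('f2', 'Iyer', 60000.0, 'EE')")
conn.execute("UPDATE Faculty SET dept = 'IT' WHERE fid = 'f1'")
conn.execute("DELETE FROM Faculty WHERE fid = 'f2'")

remaining = conn.execute("SELECT fid, dept FROM Faculty").fetchall()
print(remaining)

# DDL again: remove the structure itself.
conn.execute("DROP TABLE Faculty")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_left)
```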
Types of DML
- Procedural DML
  - Must be embedded in a programming language
  - Searches for and retrieves individual DB records, and uses looping and other constructs of the host programming language to retrieve multiple records
- Nonprocedural (declarative) DML
  - Can be used as a stand-alone query language or embedded in a programming language
  - Searches for and retrieves information from multiple related DB records in a single command

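The contrast between the two styles can be sketched by answering one question twice: "which students are enrolled in a given course?" The first function imitates procedural DML by walking records one at a time with host-language loops; the second states the result in a single declarative SQL command. Python's built-in sqlite3 module is used for both, and the Enrolled table, course codes, and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT)")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [("s1", "Asha"), ("s2", "Ravi"), ("s3", "Tara")],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?)",
    [("s1", "DB101"), ("s3", "DB101"), ("s2", "OS201")],
)


def names_enrolled_procedural(cid):
    """Record-at-a-time: the host program does the looping and matching."""
    names = []
    for sid, course in conn.execute("SELECT sid, cid FROM Enrolled"):
        if course == cid:
            lookup = conn.execute("SELECT name FROM Students WHERE sid = ?", (sid,))
            for (name,) in lookup:
                names.append(name)
    return sorted(names)


def names_enrolled_declarative(cid):
    """One command: say what is wanted, let the DBMS decide how to get it."""
    rows = conn.execute(
        "SELECT s.name FROM Students s JOIN Enrolled e ON e.sid = s.sid "
        "WHERE e.cid = ? ORDER BY s.name",
        (cid,),
    )
    return [name for (name,) in rows]


print(names_enrolled_procedural("DB101"))
print(names_enrolled_declarative("DB101"))
```

Both functions return the same names; the declarative version leaves the access strategy to the DBMS.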
Components of a Database Environment
- Hardware
- Software: DBMS, application programs, and query software
- Data: organized in a schema, partitioned into subschemas
- Procedures: govern the design, access, and use of the database
- People: administrators (DA, DBA), designers (logical and physical), application developers, and users (novice and high-powered)

Database System
[Figure: users submit application programs/queries to the database system. Inside it, the DBMS software consists of software to process queries/programs and software to access stored data, which works on the stored database definition (metadata) and the stored database.]

Users of the Database
Day-to-day use of the DB:
- Database administrators (DBA)
- Database designers
- End users
  - Casual end users
  - Naïve or parametric users
  - Sophisticated end users
  - Stand-alone users
- System analysts and application programmers (software engineering)

Implications of the DB Approach
- Potential for enforcing standards
- Reduced application development time
- Flexibility
- Availability of up-to-date information

When Not to Use a DBMS
- Unnecessary overhead costs
  - Overhead of providing security, concurrency control, recovery, and integrity
  - High initial investment in hardware, software, and training
- DB and applications are simple, well defined, and not expected to change
- Real-time requirements cannot be met (due to overhead)
- Multi-user access is not required