MODULE NAME: Database Management
TOPIC: Introduction to Basic Database Concepts
LECTURE 2
Prepared by Miss N. Nembhard

Functions of a DBMS

A typical DBMS performs the following functions:

Data Definition
The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.

Data Manipulation
Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle both planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad hoc queries which are performed on a need basis.

Data Security & Integrity
The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application, so the application programmer need not code these functions in his/her programs.

Data Recovery & Concurrency
Recovery of data after a system failure and concurrent access to records by multiple users are also handled by the DBMS.

Data Dictionary Maintenance
Maintaining the Data Dictionary, which contains the data definition of the application, is also one of the functions of a DBMS.

Performance
Optimizing the performance of queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer
which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

Traditional/File Processing Approach

This is an approach to storing and managing data where each department within an organization typically has its own set of files.

Traditional Approach to Storing Data
o Files are often designed specifically for their particular application.
o Files are designed to meet the needs of a given program (e.g. Prog 1 uses the employee file only).
o The focus is on procedures (what needs to be done by the programs Prog 1 and Prog 2).
o The records in a file may not relate to records in any other file (e.g. the employee file is in no way related to the Warehouse file).
o Companies have usually been using file processing for many years.

Major Weaknesses of the Traditional Approach
o Data redundancy. Each department has its own files, therefore:
   o The same fields are stored multiple times, wasting resources.
   o The chance for errors is increased (e.g. different spellings in different locations causing inconsistency).
o Isolated data, which makes it difficult to access data stored in different files (e.g. Prog 1 cannot directly access the files designed for Prog 2).
o Poor data control. With no centralized control at the data element level, it is common for the same data element to have multiple names.
o Data has to be kept sorted (e.g. in order to locate a particular item).
o File structure changes severely impact existing programs.
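The data redundancy weakness can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two departmental files, their field names, the sample values and the helper find_inconsistent_names are all hypothetical. It only shows how the same field, stored twice, can drift apart.

```python
# Hypothetical departmental files: each department keeps its own copy
# of the same employee fields (all names and values are made up).
personnel_file = [
    {"emp_no": 101, "name": "Ann Brown", "address": "12 Hope Road"},
    {"emp_no": 102, "name": "Carl Reid", "address": "5 Main Street"},
]
payroll_file = [
    {"emp_no": 101, "name": "Anne Brown", "salary": 52000},
    {"emp_no": 102, "name": "Carl Reid", "salary": 48000},
]


def find_inconsistent_names(file_a, file_b):
    """Return the emp_no values whose name differs between the two files."""
    names_b = {rec["emp_no"]: rec["name"] for rec in file_b}
    return [
        rec["emp_no"]
        for rec in file_a
        if rec["emp_no"] in names_b and names_b[rec["emp_no"]] != rec["name"]
    ]


inconsistent = find_inconsistent_names(personnel_file, payroll_file)
print(inconsistent)
```

Because the name is held in both files, a spelling change made in one department is not seen by the other. Storing the field once in a shared database removes this class of error.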
The Database Approach to Data Management

Database technology can cut through many of the problems a traditional file organization creates. A more rigorous definition of a database is a collection of data organized to serve many applications efficiently by centralizing the data and minimizing redundant data. Rather than storing data in separate files for each application, data are physically stored so that they appear to users to be in only one location. A single database services multiple applications. For example, instead of a corporation storing employee data in separate information systems and separate files for personnel, payroll, and benefits, the corporation could create a single common human resources database. (You can think of a database as an electronic filing system.)

Traditional databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number.

To access information from a database, you need a database management system (DBMS). This is a collection of programs that enables you to enter, organize, and select data in a database.

The DBMS relieves the programmer or end user of the task of understanding where and how the data are actually stored by separating the logical and physical views of the data. The logical view presents data as they would be perceived by end users or business specialists, whereas the physical view shows how data are actually organized and structured on physical storage media.

There is only one physical view of the data, but there can be many different logical views. The database management software makes the physical database available for the different logical views presented to various application programs.
For example, an employee retirement benefits program might use a logical view of the human resources database.

Data Hierarchy

Some terms associated with the structure of data include:

File: A collection of related data or facts, e.g. payroll facts for all employees in a company form a payroll file.
Record: A collection of data, or facts, about a single entity in the file, e.g. payroll facts for a single employee in a company form a payroll record.
Field: Any single piece of data, or fact, about a single entity (record) in a file, e.g. employee number represents a field.
Character: A letter (A-Z, a-z), a number (0-9), or a special character (?, %, &).
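The four levels of the data hierarchy can be sketched in Python as follows. This is only an illustration under assumed names: PayrollRecord, its three fields and the sample values are hypothetical and do not come from any particular payroll system.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    """One record: the payroll facts for a single employee."""
    employee_number: int  # a field
    name: str             # a field
    gross_pay: float      # a field


# A file is a collection of related records.
payroll_file = [
    PayrollRecord(101, "Ann Brown", 4300.00),
    PayrollRecord(102, "Carl Reid", 4000.00),
]

first_record = payroll_file[0]   # one record from the file
first_field = first_record.name  # one field from that record
characters = list(first_field)   # a field is made up of characters

print(len(payroll_file), first_field, characters[:3])
```

Reading from the bottom up: characters make up a field, fields make up a record, and records make up a file.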
Advantages of DBMS
o additional capabilities: sorting, querying, integrity checking
o easy access to data
o reduces data redundancy / inconsistency
o improved data consistency
o central control of data creation / definitions

Disadvantages of DBMS
o few graphical or statistical capabilities
o proprietary formats may limit archival quality of data
o requires expertise and resources to administer

Types of Database Systems

Database systems can be categorized according to the data structures and operators they present to the user. The oldest systems fall into inverted list, hierarchical and network systems. These are the pre-relational models.

In the Hierarchical Model, different records are inter-related through hierarchical or tree-like structures. A parent record can have several children, but a child can have only one parent. IMS (Information Management System) of IBM is an example of a Hierarchical DBMS.

You can still find older systems that are based on a hierarchical or network data model. The hierarchical DBMS is used to model one-to-many relationships, presenting data to users in a tree-like structure. Within each record, data elements are organized into pieces of records called segments. To the user, each record looks like an organizational chart with one top-level segment called the root. An upper segment is connected logically to a lower segment in a parent-child relationship. A parent segment can have more than one child, but a child can have only one parent.

Figure 1 shows a hierarchical structure that might be used for a human resources database. The root segment is Employee, which contains basic employee information such as name, address, and identification number. Immediately below it are three child segments: Compensation (containing salary and promotion data), Job Assignments (containing data about job positions and departments), and Benefits (containing data about beneficiaries and benefit options).
The Compensation segment has two children below it: Performance Ratings (containing data about employees' job performance evaluations) and Salary History (containing historical data about employees' past salaries). Below the Benefits segment are child segments for Pension, Life Insurance, and Health, containing data about these benefit plans.
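The segment hierarchy described for the human resources example can be sketched as a nested Python dictionary. This is an in-memory illustration only and does not show how IMS or any other hierarchical DBMS stores data. The segment names come from the description above; the helper path_to is hypothetical.

```python
# Each segment name maps to its child segments, so every child
# appears under exactly one parent (illustrative structure only).
hr_hierarchy = {
    "Employee": {
        "Compensation": {
            "Performance Ratings": {},
            "Salary History": {},
        },
        "Job Assignments": {},
        "Benefits": {
            "Pension": {},
            "Life Insurance": {},
            "Health": {},
        },
    }
}


def path_to(segment, tree, path=()):
    """Return the root-to-segment path as a tuple, or None if absent."""
    for name, children in tree.items():
        current = path + (name,)
        if name == segment:
            return current
        found = path_to(segment, children, current)
        if found is not None:
            return found
    return None


salary_path = path_to("Salary History", hr_hierarchy)
print(" -> ".join(salary_path))
```

Every segment is reached by walking down from the root, so there is exactly one access path to each child. This is the one-parent rule in practice.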
In the Network Model, a parent can have several children and a child can also have many parent records. Records are physically linked through linked lists. IDMS from Computer Associates International Inc. is an example of a Network DBMS, as are CODASYL systems and Total.

In the Relational Model, unlike the Hierarchical and Network models, there are no physical links. All data is maintained in the form of tables consisting of rows and columns. Data in two tables is related through common columns and not through physical links or pointers. Operators are provided for operating on rows in tables. Unlike the other two types of DBMS, there is no need to traverse pointers in a Relational DBMS. This makes querying much easier in a Relational DBMS than in a Hierarchical or Network DBMS. This, in fact, is a major reason why the relational model is more programmer-friendly and has become dominant and popular in both industry and academia. Oracle, Sybase, DB2, Ingres, Informix and MS-SQL Server are a few of the popular Relational
DBMSs.

[Figure: example relational tables; column headings shown: Cust-Name, Salesperson, Order-no, Part-No]

Properties of Relational Tables:
o Values are atomic
o Each row is unique
o Column values are of the same kind
o The sequence of columns is insignificant
o The sequence of rows is insignificant
o Each column has a unique name

Object-Oriented Model

Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility. A major benefit of this approach is the unification of application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and have code bases that are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs."

In contrast to a relational DBMS, where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches: it provides higher performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support
applications such as financial portfolio risk analysis systems, telecommunications service applications, World Wide Web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.

Object/Relational Model

Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data, and diverse binary media such as audio, video, images, and applets. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin. Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities. Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces. And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.

Comparison of Models

Whereas hierarchical structures depict one-to-many relationships, network DBMS depict data logically as many-to-many relationships. In other words, parents can have multiple children, and a child can have more than one parent. A typical many-to-many relationship for a network DBMS is the student-course relationship (see Figure 2). There are many courses in a university and many students. A student takes many courses, and a course has many students.
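The student-course relationship can be sketched in Python with two lookup tables, one in each direction. This is a logical illustration of a many-to-many relationship under assumed data (the student names, course names and variable names are hypothetical). It does not reproduce the linked-list storage used by an actual network DBMS.

```python
from collections import defaultdict

# Hypothetical enrolment data: (student, course) pairs.
enrolments = [
    ("Alice", "Database Management"),
    ("Alice", "Programming"),
    ("Brian", "Database Management"),
]

courses_of = defaultdict(set)   # student -> courses taken
students_of = defaultdict(set)  # course -> students enrolled

for student, course in enrolments:
    courses_of[student].add(course)
    students_of[course].add(student)

print(sorted(courses_of["Alice"]))
print(sorted(students_of["Database Management"]))
```

Each student links to several courses and each course links back to several students. A strict hierarchy cannot express this, because there a child may have only one parent.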
Hierarchical and network DBMS are considered outdated and are no longer used for building new database applications. They are much less flexible than relational DBMS and do not support ad hoc, English-language-like inquiries for information. All paths for accessing data must be specified in advance and cannot be changed without a major programming effort. For instance, if you queried the human resources database illustrated in Figure 1 to find out the names of the employees with the job title of administrative assistant, you would discover that there is no way the system can find the answer in a reasonable amount of time. This path through the data was not specified in advance.

Relational DBMS, in contrast, have much more flexibility in providing data for ad hoc queries, combining information from different sources, and providing the capability to add new data and records without disturbing existing programs and applications. However,
these systems can be slowed down if they require many accesses to the data stored on disk to carry out the select, join, and project commands. Selecting one part number from among millions, one record at a time, can take a long time. Of course, the database can be tuned to speed up pre-specified queries.

Hierarchical DBMS can still be found in large legacy systems that require intensive high-volume transaction processing. Banks, insurance companies, and other high-volume users continue to use reliable hierarchical databases, such as IBM's Information Management System (IMS), developed in 1969. As relational products acquire more muscle, firms will shift away completely from hierarchical DBMS, but this will happen over a long period of time.

3-Level Database System Architecture
o The External level represents the collection of views available to different end users.
o The Conceptual level is the representation of the entire information content of the database.
o The Internal level is the physical level, which shows how the data is stored, how the fields are represented, etc.

References:
o http://www.unixspace.com/context/databases.html
o Laudon & Laudon, Managing the Digital Office