HNC Computing - Databases
Stephen McKearney, 2003

Database Architecture

References

- Fundamentals of Database Systems, Elmasri/Navathe, Chapter 2
- Database Systems: A Practical Approach, Connolly/Begg/Strachan, Chapter 2

Definitions

- Schema: description of the database.
- Integrity constraints: rules describing the consistency and validity of the data.
- Indexes: data structures providing fast access to data.
- Buffers: areas of memory used for transferring data between disc and memory.
- Three-level architecture: external, conceptual and internal views of the DBMS structure.
- SQL language INSERT statement: SQL command used to add records into a relation.
Overview

- Components of a DBMS: Database Manager, File Manager, Dictionary Manager, Language Processor
- Database Languages: Data Definition Language, Data Manipulation Language, Data Query Language
- DBMS Interfaces: Menus, Forms
- DBMS Utilities: Loaders, Backup, File Reorganisation, Performance Monitoring
- Classifications of DBMSs: Number of Users, Distribution, Cost

Components of a DBMS

[Figure, from Connolly et al: components of a DBMS. Labels: Application Programs, Queries, Schema, DML Pre-processor, Query Processor, DDL Compiler, Program Object Code, Database Manager, Dictionary Manager, Access Methods, File Manager, System Buffers.]

A database management system (DBMS) is a complex piece of software. Its purpose is to store and retrieve large volumes of data as efficiently as possible. A DBMS normally consists of several program modules, each providing part of the overall functionality of the system. These modules include the language processors, the query processor, the database manager, the data dictionary manager and the file manager.

For most DBMSs there are three major types of input:

1. Application programs that change data in the database.
2. Queries that retrieve data from the database (usually in response to a user's request).
3. Commands to change the database schema.

Some of the functions of the DBMS may be provided by the underlying operating system, for example, the file manager or the system buffers.

Ref: Connolly, sec. 2.5; Elmasri, sec. 2.4
Database Manager

Responsible for:

- Authorisation and access control
- Command processor
- Integrity checking
- Query optimisation
- Transaction management
- Recovery management

The database manager is responsible for accepting commands from the user, ensuring that the commands are valid, calculating the most effective way to execute them, and executing them. Elmasri et al call the database manager the run-time database processor. Connolly et al identify the following major responsibilities of the database manager:

Authorisation control
All commands from the user are checked to ensure that the user is allowed to execute them. This involves checking the security permissions that have been given to the user and the restrictions that have been placed on the data requested.

Command processor
When a command has been authorised it is carried out by the command processor. Carrying out a command involves selecting the best method of executing it, using the query optimiser and the transaction manager.

Integrity checking
All commands that change the contents of the database must be checked to ensure that they do not introduce errors into the database. Integrity constraints are created by the database administrator.

Query optimisation
The most efficient method of executing a query must be identified. This is done by analysing a variety of possible plans and selecting the best.

Transaction and recovery management
Large DBMSs use transaction processing to manage very large changes to the database. Recovery management returns the database to a consistent state after a failure.

Ref: Connolly, p59; Elmasri, sec. 2.4

File Manager

Responsible for:

- Allocating disc storage
- Maintaining files and indexes
- Managing system buffers in main memory
- Transferring blocks between discs and buffers

Functionality may be provided by the underlying operating system.
The file manager is responsible for the operation of all the discs and buffers used by the DBMS. Elmasri et al call the file manager the stored data manager. The file manager manages the following components:

- Disc space. The file manager allocates the disc space in which the data in the database system is stored. Storing data in a large DBMS can be very complicated because of the complexity of the data structures required to make retrieval efficient.
- Indexes and hash functions. These are used to improve the performance of queries and updates to the database. Indexes must be updated automatically when data is added to or removed from the database.
- Buffers. A large DBMS requires many main memory buffers (sometimes called caches). Buffers hold blocks read from the disc and data that has been written to the database but not yet stored on the disc. When buffers become full their contents must be written to the disc. The use of buffers can greatly affect the overall performance of the database.

The functionality of the file manager may be provided by the underlying operating system, for example, UNIX or MS-DOS. That is, the operating system is responsible for managing buffers and disc allocation. This is normally true for small-scale database systems.

Ref: Connolly, p59; Elmasri, sec. 2.4
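The buffering behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS implements its buffers: the class name BufferPool, its capacity parameter and the dictionary standing in for the disc are all assumptions made for the example.

```python
class BufferPool:
    """Toy write-back buffer: written blocks stay in memory until the buffer is full."""

    def __init__(self, capacity, disc):
        self.capacity = capacity   # maximum number of blocks held in memory
        self.disc = disc           # a dict standing in for disc storage (assumption)
        self.dirty = {}            # block number -> data not yet stored on the disc
        self.flush_count = 0       # how many times the buffer has been emptied

    def write(self, block_no, data):
        # A new block needs a free slot; if there is none, empty the buffer first.
        if block_no not in self.dirty and len(self.dirty) >= self.capacity:
            self.flush()
        self.dirty[block_no] = data

    def read(self, block_no):
        # Prefer the buffered copy, which may be newer than the copy on disc.
        if block_no in self.dirty:
            return self.dirty[block_no]
        return self.disc.get(block_no)

    def flush(self):
        # Transfer every buffered block to the disc and free the buffer.
        self.disc.update(self.dirty)
        self.dirty.clear()
        self.flush_count += 1


disc = {}
pool = BufferPool(capacity=2, disc=disc)
pool.write(1, "block one")
pool.write(2, "block two")
blocks_on_disc_before = sorted(disc)   # both blocks are still only in memory
pool.write(3, "block three")           # buffer is full, so blocks 1 and 2 are flushed
blocks_on_disc_after = sorted(disc)
print(blocks_on_disc_before, blocks_on_disc_after, pool.read(3))
```

A real file manager would also decide which blocks to keep in memory for reading; that replacement policy is left out here to keep the sketch short.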
Dictionary Manager

Responsible for:

- Keeping the data dictionary up-to-date
- Providing information about the database schema
- Storing integrity constraints
- Storing authorisation permissions

The dictionary manager is responsible for managing all aspects of the data dictionary. Elmasri et al call the dictionary manager the data dictionary system. The data dictionary is a database that stores information about all the data stored in the DBMS, for example, descriptions of the tables and attributes. The dictionary manager is responsible for:

- Updating the data dictionary when the database schema changes, for example, when a new table is added.
- Providing the database manager with information about the content of the database.
- Storing information about the integrity constraints and the authorisation permissions of the users. This allows the database manager to enforce the integrity and security constraints set by the database administrator.

Ref: Connolly, p60; Elmasri, sec. 2.4

Language Processors

Includes:

- Data Manipulation Language Pre-processor
- Data Definition Language Compiler

Responsible for:

- Checking that commands to the DBMS are correct
- Translating commands into machine-readable form
- Working with the query processor

There are many languages in a DBMS, and the language processor actually consists of several different processes. The data manipulation language pre-processor is responsible for converting manipulation commands, for example, the SQL INSERT command, into commands that may be executed by the database manager.
The data definition language compiler is responsible for converting commands that define the structure of the database into entries in the data dictionary.

The language processor checks that every command is correct and translates each command into a form that may be executed by the database manager. It must also work with the query processor to produce the most efficient method of answering queries. Data manipulation commands are often stored in application programs, and these too must be converted into commands that the database manager can understand.

Ref: Elmasri, sec. 2.3
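The two jobs described above, recording a definition in the data dictionary and rejecting incorrect commands, can be observed in a small DBMS. The sketch below assumes SQLite, used through Python's standard sqlite3 module, as a stand-in. SQLite keeps its catalogue in a table called sqlite_master, and other DBMSs name and organise their data dictionaries differently. The dept table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a throwaway in-memory database

# A data definition command: the DBMS records the new table in its catalogue.
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT NOT NULL)")

# The catalogue is itself a table that can be queried.
catalogue = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'dept'"
).fetchall()
for entry in catalogue:
    print(entry)

# A command that is not correct is rejected before anything is executed.
try:
    conn.execute("CREATE TABEL broken (x INTEGER)")   # deliberate spelling mistake
    rejected = False
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)
```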
Components

[Figure: database components. Source: http://www.openlinksw.com/virtuoso/virtuowp/virtuowp.htm#_toc430023374]

Oracle Instance

[Figure: Oracle instance. Source: Oracle Concepts Manual]
Database Languages

- Users communicate with the DBMS through a database language.
- A database language is simpler than a programming language.
- Types of database languages: Data Definition Language, Data Manipulation Language, Data Query Language

Users of a DBMS communicate with the database by giving it commands to execute. These commands are expressed using a database language, for example, SQL. A database language consists of a set of commands that allow the user to change the database's structure and content, for example, creating new tables or inserting new records. A database language is normally not as complex as a programming language. It does not, for instance, contain while-loops or for-loops.

There are three important types of database language:

- Data Definition Language
- Data Manipulation Language
- Data Query Language

Each database language deals with a different aspect of the database. For example, the data definition language provides commands to change the structure of the database.

Users can execute commands by either:

- Sending the commands directly to the DBMS, or
- Embedding the commands in a programming language as part of an application program.

Modern database languages do not distinguish between the different language types. For example, SQL contains commands for all three.

Ref: Connolly, sec. 2.2; Elmasri, sec. 2.3
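As an illustration of embedding database commands in an application program, the sketch below sends one command of each language type from a Python program. It assumes SQLite through Python's standard sqlite3 module. The cut-down emp table and its single row are chosen for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: change the structure of the database.
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal REAL)")

# Data manipulation: change the content of the database.
conn.execute(
    "INSERT INTO emp (empno, ename, sal) VALUES (?, ?, ?)",
    (7890, "JINKS", 1200),
)

# Data query: retrieve data from the database.
rows = conn.execute("SELECT empno, ename, sal FROM emp").fetchall()
print(rows)
```

The loop and the print statement belong to the host language (Python). The database language only supplies the three commands inside the strings.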
Data Definition Language

- Used to describe the database schema
  - Creating relations, attributes, etc.
  - Declaring integrity constraints
- Results of executing a DDL command
  - Updated data dictionary
  - New file structures
  - Deleted file structures

Data Definition - Creating a Table

    CREATE TABLE scott.emp (
        empno    NUMBER        CONSTRAINT pk_emp PRIMARY KEY,
        ename    VARCHAR2(10)  CONSTRAINT nn_ename NOT NULL
                               CONSTRAINT upper_ename CHECK (ename = UPPER(ename)),
        job      VARCHAR2(9),
        mgr      NUMBER        CONSTRAINT fk_mgr REFERENCES scott.emp(empno),
        hiredate DATE          DEFAULT SYSDATE,
        sal      NUMBER(10,2)  CONSTRAINT ck_sal CHECK (sal > 500),
        comm     NUMBER(9,0)   DEFAULT NULL,
        deptno   NUMBER(2)     CONSTRAINT nn_deptno NOT NULL
                               CONSTRAINT fk_deptno REFERENCES scott.dept(deptno)
    )
    PCTFREE 5 PCTUSED 75;   -- Oracle block storage parameters

The data definition language (DDL) is used to describe the structure of the database. It allows the database administrator to describe the relations, attributes and integrity constraints of the system. Connolly et al define the DDL as a descriptive language that allows the DBA or user to describe and name the entities required for the application and the relationships that may exist between the different entities.

When the DBMS receives a DDL command it:

1. Creates or changes the underlying file structures that are used to implement the database. For example, a file might be created for each new relation created by a CREATE TABLE DDL command.
2. Changes the data dictionary to record the change that has been made to the database structure. For example, a new record may be added to the data dictionary describing the structure of a newly created relation.

In the three-level schema, Elmasri et al identify three different data definition languages:

1. A view definition language for creating entities and relationships at the external level and mapping them to the conceptual schema.
2. A data definition language for creating entities and relationships at the conceptual level and mapping them to the internal level.
3. A storage definition language for creating file and index structures at the internal level.

Ref: Connolly, sec. 2.2; Elmasri, sec. 2.3
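Once integrity constraints have been declared in the DDL, the DBMS checks every later change against them. The sketch below shows this with a cut-down version of the emp table, keeping three of its named constraints. It assumes SQLite through Python's sqlite3 module, so the Oracle column types are replaced by SQLite ones and the storage clause is dropped. The helper try_insert and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (
        empno INTEGER CONSTRAINT pk_emp PRIMARY KEY,
        ename TEXT    CONSTRAINT nn_ename NOT NULL
                      CONSTRAINT upper_ename CHECK (ename = UPPER(ename)),
        sal   NUMERIC CONSTRAINT ck_sal CHECK (sal > 500)
    )
""")


def try_insert(empno, ename, sal):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO emp (empno, ename, sal) VALUES (?, ?, ?)",
            (empno, ename, sal),
        )
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False


results = [
    try_insert(1, "SMITH", 800),   # satisfies every constraint
    try_insert(2, "jones", 900),   # violates upper_ename (name is not upper case)
    try_insert(3, "ADAMS", 400),   # violates ck_sal (salary is not above 500)
    try_insert(4, None, 900),      # violates nn_ename (name is missing)
]
stored = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(results, stored)
```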
Data Manipulation Language

- Used to change the content of the database
  - Insertion
  - Deletion
  - Modification
- Two types of DML
  - Procedural (how to make changes)
  - Non-procedural (what changes to make)

Data Manipulation - Inserting Record

    INSERT INTO emp (empno, ename, job, sal, comm, deptno)
    VALUES (7890, 'JINKS', 'CLERK', 1200, NULL, 40);

    INSERT INTO bonus
    SELECT ename, job, sal, comm
    FROM emp
    WHERE comm > 0.25 * sal
       OR job IN ('PRESIDENT', 'MANAGER');

    UPDATE emp
    SET empno = 1356
    WHERE ename = 'SMITH';

The data manipulation language (DML) is used to make changes to the content of the database. For example, in SQL the DML command INSERT INTO inserts a new tuple into a relation. Connolly et al define the DML as a language that provides a set of operations that support the basic data manipulation operations on the data held in the database.

The DML includes commands to:

- Insert new data into the database,
- Delete existing data from the database, and
- Modify existing data in the database.

In the three-level schema a DML is used to make changes at the external and conceptual levels of the schema. The internal level DML is more complex because it must handle low-level file and index structures.

There are two main types of DML:

1. Procedural DMLs describe how the desired changes should be made to the database. For example, they allow the user to describe the process to be used by the DBMS to update the database.
2. Non-procedural DMLs describe what the desired changes are but not how to perform them. The DBMS must select the best method of making the changes in the database.

Relational DBMSs use non-procedural languages, for example, SQL. A non-procedural DML allows users to concentrate on what they require rather than how to get it.

Ref: Connolly, sec. 2.2; Elmasri, sec. 2.3
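The three kinds of DML command can be tried out together in a short program. The sketch below assumes SQLite through Python's sqlite3 module. The table layouts are simplified and the rows for KING and WARD are invented sample data, not taken from a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT,
                      sal NUMERIC, comm NUMERIC, deptno INTEGER);
    CREATE TABLE bonus (ename TEXT, job TEXT, sal NUMERIC, comm NUMERIC);
""")

# Insert: add new tuples to a relation (sample rows are made up).
conn.executemany(
    "INSERT INTO emp (empno, ename, job, sal, comm, deptno) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7890, "JINKS", "CLERK", 1200, None, 40),
        (7001, "KING", "PRESIDENT", 5000, None, 10),
        (7002, "WARD", "SALESMAN", 1250, 500, 30),
    ],
)

# Insert the result of a query into another relation.
conn.execute("""
    INSERT INTO bonus
    SELECT ename, job, sal, comm FROM emp
    WHERE comm > 0.25 * sal OR job IN ('PRESIDENT', 'MANAGER')
""")

# Modify: change existing data.
conn.execute("UPDATE emp SET sal = sal + 100 WHERE ename = 'JINKS'")

# Delete: remove existing data.
conn.execute("DELETE FROM emp WHERE ename = 'WARD'")

bonus_names = sorted(row[0] for row in conn.execute("SELECT ename FROM bonus"))
remaining = conn.execute("SELECT ename, sal FROM emp ORDER BY empno").fetchall()
print(bonus_names, remaining)
```

JINKS does not appear in bonus: a NULL commission makes the comparison unknown rather than true, and the job is neither PRESIDENT nor MANAGER.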
Data Manipulation

Procedural

    Input.openFile("emp.dat");
    Output.openFile("emp.out");
    r = Input.readInt();
    salary = Input.readInt();
    while (r != -1) {
        if (r == 15) {
            // Only employee 15 receives the increase.
            salary = salary + 100;
        }
        // Every record is copied to the output file, changed or not.
        Output.writeInt(r);
        Output.writeInt(salary);
        r = Input.readInt();
        salary = Input.readInt();
    }
    Output.close();
    Input.close();

Non-Procedural

    UPDATE emp SET salary = salary + 100 WHERE empno = 15;

Data Query Language

- Used to retrieve data from the database
- The DQL is part of the data manipulation language, e.g. the SELECT statement in SQL
- Types of DQL
  - Procedural (how to retrieve the data)
  - Non-procedural (what data to retrieve)

The data query language (DQL) is used to retrieve data from the database. It is part of the data manipulation language. For example, the SELECT statement is the DQL component of SQL. Using the SELECT statement the user can express all queries on the database. Connolly et al define a DQL as a high-level special-purpose language used to satisfy diverse requests for the retrieval of data held in the database.

As with the data manipulation language, the DQL can be either procedural or non-procedural. A procedural DQL describes how to answer the query. For example, a procedural DQL describes the tables that should be accessed, the indexes to use in accessing the tables and the order in which to access them. A non-procedural DQL describes what data is required to answer the query but not how to retrieve the data. For example, a non-procedural DQL will not describe the indexes that should be accessed.

A non-procedural DQL is used in relational DBMSs because this allows the DBMS to decide the best strategy for accessing the data. This is particularly important when many users are accessing the same data.

Ref: Connolly, sec. 2.2; Elmasri, sec. 2.3
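The point that a non-procedural query leaves the access strategy to the DBMS can be checked directly in some systems. The sketch below assumes SQLite through Python's sqlite3 module and uses SQLite's EXPLAIN QUERY PLAN command, which reports the strategy chosen. Other DBMSs offer similar commands under different names. The index name idx_emp_deptno is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
conn.execute("CREATE INDEX idx_emp_deptno ON emp (deptno)")

# The query states only what data is required; it never mentions the index.
query = "SELECT ename FROM emp WHERE deptno = 30"

# Ask the DBMS to describe the access strategy it has chosen for the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[-1] for row in plan]   # the last column is a readable description
for detail in plan_details:
    print(detail)
```

The exact wording of the plan differs between SQLite versions, but it names the index that the DBMS decided to use.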
Data Query - Select

    SELECT * FROM emp WHERE deptno = 30;

    SELECT deptno, MIN(sal), MAX(sal)
    FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno;

    SELECT deptno, MIN(sal), MAX(sal)
    FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno
    HAVING MIN(sal) < 1000;

Data Query

Procedural

    Input.openFile("emp.dat");
    r = Input.readInt();
    salary = Input.readInt();
    while (r != -1) {
        if (r == 15) {
            // Only the matching record is written out.
            Output.writeInt(r);
            Output.writeInt(salary);
        }
        r = Input.readInt();
        salary = Input.readInt();
    }
    Input.close();

Non-Procedural

    SELECT * FROM emp WHERE empno = 15;
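To see what the grouped queries return, they can be run against a few rows. The sketch below assumes SQLite through Python's sqlite3 module. The five employees are invented sample data, and an ORDER BY clause has been added to each query so that the rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, job TEXT, sal NUMERIC, deptno INTEGER)"
)
conn.executemany(
    "INSERT INTO emp (empno, job, sal, deptno) VALUES (?, ?, ?, ?)",
    [
        (1, "CLERK", 800, 20),
        (2, "CLERK", 1100, 20),
        (3, "CLERK", 1300, 10),
        (4, "MANAGER", 2850, 30),
        (5, "CLERK", 950, 30),
    ],
)

# Lowest and highest clerk salary in each department.
grouped = conn.execute("""
    SELECT deptno, MIN(sal), MAX(sal) FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno
    ORDER BY deptno
""").fetchall()

# The same, keeping only departments whose lowest clerk salary is below 1000.
filtered = conn.execute("""
    SELECT deptno, MIN(sal), MAX(sal) FROM emp
    WHERE job = 'CLERK'
    GROUP BY deptno
    HAVING MIN(sal) < 1000
    ORDER BY deptno
""").fetchall()

print(grouped)
print(filtered)
```

WHERE removes rows before the groups are formed, so the manager in department 30 is never counted. HAVING removes whole groups afterwards, which is why department 10 disappears from the second result.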
DBMS Interfaces

- 4GL: non-procedural programming languages
- Forms: user interfaces
- Menus: introductory screens
- Reports: formal printed reports
DBMS Utilities

- Loaders: load or extract large amounts of data from the database
- Backup: copies data in case of a failure
- File Reorganiser: improves performance by reorganising data
- Performance Monitor: monitors the DBMS

A large DBMS provides many tools that the database administrator can use to manage the database.

Loader
A loader is a piece of software that loads data from a file into the database or extracts data from the database into a file. It is used because executing many individual INSERT statements may be too slow.

Backup
To safeguard the data it is important to back up the database on a regular basis. Special tools are provided to perform this function.

File Reorganiser
In a large database changing the structure of files can be a slow process. It can also be difficult to understand how the current structure is performing. The file reorganiser changes the structure of the database to improve its efficiency.

Performance Monitor
The performance monitor allows the database administrator to investigate the performance of the database. It indicates how fast or slow the DBMS is running and highlights any problems with the system. The monitor provides statistics on all aspects of the system.

Ref: Elmasri, sec. 2.4
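A very small loader can be sketched as follows. It assumes SQLite through Python's sqlite3 module and a comma-separated data file, and it does not reproduce the loader shipped with any real DBMS. The functions load and extract, the file layout and the three rows are all made up for the example, and an in-memory text buffer stands in for the file on disc.

```python
import csv
import io
import sqlite3

# Stand-in for a data file on disc (contents invented for the example).
data_file = io.StringIO("empno,ename,sal\n1,SMITH,800\n2,ALLEN,1600\n3,WARD,1250\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal NUMERIC)")


def load(conn, file):
    """Load every record in the file into emp as one batch; return the row count."""
    reader = csv.DictReader(file)
    rows = [(int(r["empno"]), r["ename"], float(r["sal"])) for r in reader]
    with conn:   # a single transaction for the whole batch
        conn.executemany("INSERT INTO emp (empno, ename, sal) VALUES (?, ?, ?)", rows)
    return len(rows)


def extract(conn, file):
    """Write the contents of emp to the file, with a header line first."""
    writer = csv.writer(file)
    writer.writerow(["empno", "ename", "sal"])
    writer.writerows(conn.execute("SELECT empno, ename, sal FROM emp ORDER BY empno"))


loaded = load(conn, data_file)
out_file = io.StringIO()
extract(conn, out_file)
extracted_lines = out_file.getvalue().splitlines()
print(loaded, len(extracted_lines))
```

Sending the rows as one batch inside one transaction avoids the per-statement overhead of issuing each INSERT separately, which is the reason the notes give for using a loader.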
Classifications of DBMSs

- Type of data model, e.g. relational, network, hierarchical
- Number of users, e.g. single-user, multi-user
- Distribution, e.g. number of sites
- Cost

There are many different types of DBMS. The database administrator must select the most appropriate DBMS for the requirements of the users. For example, if the database is to hold details of one million customers and is to be accessed by 1500 salespeople from numerous locations then a very complex DBMS is required. However, if the database is to store a list of 100 customers for a small manufacturing firm and is to be accessed by the firm's one salesperson then a simple DBMS will be sufficient.

The major differences between DBMS packages include:

- The type of data model used to describe the data. The most popular data models include relational, network and hierarchical. Newer models include object-oriented databases.
- The number of users that will be accessing the data, for example, 100, 1000 or 2000 users.
- The distribution of the data across sites in a network. A database may be held on one site (centralised) or on many sites (distributed). The databases in a distributed system may use the same DBMS software (homogeneous) or different DBMS software (heterogeneous).
- The cost of the DBMS software and equipment.

Ref: Elmasri, sec. 2.5