Architecture and Implementation of Database Management Systems

Prof. Dr. Marc H. Scholl
Winter 2006/07
University of Konstanz, Dept. of Computer & Information Science
www.inf.uni-konstanz.de/dbis/teaching/
Module 1: Introduction & Overview

Module Outline
1.1 What it's all about
1.2 Outline of the Course
1.3 Organizational Matters

[Figure: DBMS architecture. Labels: Web Forms, Applications, SQL Interface, SQL Commands; Query Processor with Parser, Optimizer, Plan Executor, Operator Evaluator; Transaction Manager, Lock Manager, Concurrency Control; Files and Index Structures, Buffer Manager, Disk Space Manager; Recovery Manager; DBMS; Database with Index Files, Data Files, System Catalog.]
1.1 What it's all about

This is a systems-oriented course that focuses on the infrastructure needed to build a DBMS. Understanding this infrastructure helps to thoroughly analyze, compare, and tune DBMSs for performance-critical applications.

While introductory courses have presented the functionality, i.e., the interface, of database management systems (DBMSs), this course dives into the internals. We will, for example, learn how a DBMS can
- efficiently organize and access data on disk, given that I/O is far more expensive than CPU cycles,
- translate SQL queries into efficient execution plans, including query rewrite optimization and index exploitation,
- sort, combine, and filter data volumes that far exceed main memory size,
- allow many users to consistently access and modify the database at the same time,
- take care of failures and guarantee recovery to a consistent state after crashes.
1.1.1 Overall System Architecture

A DBMS is typically run as a back-end server in a (local or global) network, offering services to clients directly or to application servers. Generally, we call this the 3-tier reference architecture.

[Figure: 3-tier reference architecture. Labels: Users, Clients, Request/Reply, Application Server with Application Program 1 and Application Program 2, Request/Reply, Data Server, encapsulated data, Objects, exposed data, Stored Data (Pages).]
1.1.2 Layered DBMS Architecture

Typically, a DBMS implements its functionality in a layered architecture that incrementally adds abstractions, from the low level of block I/O devices up to the high level of a declarative (SQL) user interface.

[Figure: layered database server. Clients send requests to the database server, where request execution threads pass through the following layers before data accesses reach the database:
- Language & Interface Layer
- Query Decomposition & Optimization Layer
- Query Execution Layer
- Access Layer
- Storage Layer]
1.1.3 Storage Structures

Whether the DBMS offers relational, object-relational, or other data structures at the user interface, internally they have to be mapped into fixed-length blocks that serve as the basic I/O unit of transfer between main and secondary memory.

[Figure: database page layout. Labels: Page Header; records (Ben, 55, Las Vegas), (Sue, 23, Seattle), (Joe, 29, San Antonio); free space; forwarding RID; Slot Array; Extent Table; Database Extents.]
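The page layout in the figure can be illustrated with a minimal sketch of a slotted page. Everything here is an assumption made for illustration: the class name SlottedPage, the tiny page size, the 4-byte slot cost, and the "|"-separated record encoding are hypothetical, and the slot array is kept as a Python list instead of being serialized into the page. Forwarding RIDs and extents are left out.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 128  # bytes; assumed toy value, real pages are typically several KB
SLOT_COST = 4    # assumed bytes charged per slot-array entry (offset + length)


class SlottedPage:
    """Toy fixed-size page: record bytes fill the page, a slot array locates them."""

    def __init__(self, page_no: int) -> None:
        self.page_no = page_no
        self.data = bytearray(PAGE_SIZE)
        # One (offset, length) entry per record; None marks a deleted record.
        self.slots: List[Optional[Tuple[int, int]]] = []
        self.free_start = 0  # first unused byte of the record area

    def free_space(self) -> int:
        """Bytes still available after accounting for records and slot entries."""
        return PAGE_SIZE - self.free_start - SLOT_COST * len(self.slots)

    def insert(self, record: bytes) -> Optional[Tuple[int, int]]:
        """Store a record; return its RID (page_no, slot_no) or None if it does not fit."""
        if len(record) + SLOT_COST > self.free_space():
            return None
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_no, len(self.slots) - 1)

    def read(self, slot_no: int) -> Optional[bytes]:
        """Return the record stored in the given slot, or None if it was deleted."""
        entry = self.slots[slot_no]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no: int) -> None:
        """Mark a slot as empty; the slot itself stays so other RIDs remain valid."""
        self.slots[slot_no] = None


page = SlottedPage(page_no=7)
rid_ben = page.insert(b"Ben|55|Las Vegas")
rid_sue = page.insert(b"Sue|23|Seattle")
print(rid_sue, page.read(rid_sue[1]))
```

The point of the indirection is that a RID names a slot, not a byte offset, so records can be moved within the page without invalidating references held by indexes.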
1.1.4 Access Paths

A DBMS typically provides a number of indexing techniques that allow for fast content-based searching of records, such as tree-structured or hash-based methods. Often, the suite of such indexing techniques can be extended to match the requirements of particular applications.

[Figure: B+-tree. Root node with keys Bob, Eve, Tom; leaf nodes holding the keys Adam, Bill, Bob, Dick, Eve, Hank, Jane, Jill, Tom, each with RIDs pointing to the records.]
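As a rough illustration of tree-structured access, the following sketch searches a small hand-built B+-tree with the keys from the figure. The split of keys across leaves, the RID values, and all names (Leaf, Inner, find_leaf, search, range_scan, build_example) are assumptions for illustration only; insertion, node splits, and page I/O are omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

RID = Tuple[int, int]  # (page number, slot number); values below are made up


@dataclass
class Leaf:
    keys: List[str]
    rids: List[RID]
    next: Optional["Leaf"] = None  # sibling pointer used for range scans


@dataclass
class Inner:
    keys: List[str]                         # separator keys
    children: List[Union["Inner", Leaf]]    # len(children) == len(keys) + 1


def find_leaf(node: Union[Inner, Leaf], key: str) -> Leaf:
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Inner):
        # children[i] holds keys k with keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root: Union[Inner, Leaf], key: str) -> Optional[RID]:
    """Exact-match lookup: return the RID stored for key, or None."""
    leaf = find_leaf(root, key)
    for k, rid in zip(leaf.keys, leaf.rids):
        if k == key:
            return rid
    return None


def range_scan(root: Union[Inner, Leaf], low: str, high: str) -> List[str]:
    """Return all keys in [low, high] by walking the leaf chain."""
    leaf: Optional[Leaf] = find_leaf(root, low)
    result: List[str] = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


def build_example() -> Inner:
    """Build the example tree; the leaf boundaries are an assumed reading of the figure."""
    groups = [["Adam", "Bill"], ["Bob", "Dick"], ["Eve", "Hank", "Jane", "Jill"], ["Tom"]]
    leaves = [Leaf(g, [(page, slot) for slot in range(len(g))]) for page, g in enumerate(groups)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Inner(["Bob", "Eve", "Tom"], list(leaves))


root = build_example()
print(search(root, "Jane"), range_scan(root, "Bob", "Hank"))
```

An exact-match lookup touches one node per tree level, while a range query descends once and then follows the sibling pointers between leaves.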
1.1.5 Query Execution

Declarative query specifications, e.g., expressed in SQL, need to be optimized and transformed into efficient query execution plans (QEPs), i.e., sequential or even parallelized programs that compute the results.

[Figure: alternative query execution plans over Person records. Operators shown: Projection, Filtering, RID Access (Fetch Person Record), RID List Intersection, Index Scan on AgeIndex, Index Scan on CityIndex.]
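To make the idea of alternative execution plans concrete, here is a small sketch of two plans built from the operators named in the figure. The query itself (names of persons below a given age in a given city), the in-memory stand-ins for AgeIndex and CityIndex, and all function names are assumptions for illustration; a real executor works on pages and typically pipelines operators.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

RID = int
Person = Tuple[str, int, str]  # (name, age, city)

# Toy "data file": RID -> record. RIDs are made-up integers.
PERSONS: Dict[RID, Person] = {
    1: ("Ben", 55, "Las Vegas"),
    2: ("Sue", 23, "Seattle"),
    3: ("Joe", 29, "San Antonio"),
}

# Toy indexes: AGE_INDEX is a sorted list of (age, RID); CITY_INDEX maps city -> RIDs.
AGE_INDEX: List[Tuple[int, RID]] = sorted((p[1], rid) for rid, p in PERSONS.items())
CITY_INDEX: Dict[str, List[RID]] = {}
for _rid, _person in PERSONS.items():
    CITY_INDEX.setdefault(_person[2], []).append(_rid)


def index_scan_age(max_age: int) -> List[RID]:
    """Index scan on the age index: RIDs of persons with age < max_age, sorted by RID."""
    return sorted(rid for age, rid in AGE_INDEX if age < max_age)


def index_scan_city(city: str) -> List[RID]:
    """Index scan on the city index: RIDs of persons living in city, sorted by RID."""
    return sorted(CITY_INDEX.get(city, []))


def intersect(a: List[RID], b: List[RID]) -> List[RID]:
    """Merge-style intersection of two sorted RID lists."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def fetch(rids: Iterable[RID]) -> Iterator[Person]:
    """RID access: fetch the Person record for each RID."""
    return (PERSONS[rid] for rid in rids)


def project_name(rows: Iterable[Person]) -> List[str]:
    """Projection onto the name attribute."""
    return [row[0] for row in rows]


def plan_intersection(max_age: int, city: str) -> List[str]:
    """Plan A: scan both indexes, intersect RID lists, fetch only qualifying records."""
    rids = intersect(index_scan_age(max_age), index_scan_city(city))
    return project_name(fetch(rids))


def plan_filter(max_age: int, city: str) -> List[str]:
    """Plan B: scan the city index, fetch records, filter on age afterwards."""
    rows = fetch(index_scan_city(city))
    return project_name(row for row in rows if row[1] < max_age)


print(plan_intersection(30, "Seattle"), plan_filter(30, "Seattle"))
```

Both plans compute the same result. Which one is cheaper depends on how selective each predicate is, which is the kind of decision the optimizer has to make.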
1.1.6 Implementing a Lock Manager

Most DBMSs use a locking protocol (e.g., 2PL) for concurrency control. Efficiently implementing the lock manager and exploiting the synchronization primitives offered by the underlying operating system is crucial for a high degree of parallelism.

[Figure: lock manager data structures.
- Hash Table indexed by Resource Id
- Transaction Control Blocks (TCBs): Transaction Id, Update Flag, Transaction Status, Number of Locks, LCB Chain
- Resource Control Blocks (RCBs): Resource Id, Hash Chain, FirstInQueue
- Lock Control Blocks (LCBs): Transaction Id, Resource Id, Lock Mode, Lock Status, NextInQueue, LCB Chain]
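The control blocks in the figure can be sketched as follows. This is a single-threaded toy under stated assumptions: only shared ("S") and exclusive ("X") modes, strict FIFO queues, no lock upgrades, no deadlock handling, and none of the operating system synchronization primitives a real lock manager relies on. Python dicts and lists stand in for the hash table, the hash chain, and the NextInQueue and LCB chain pointers; the method names acquire and release_all are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # compare LCBs by identity, not by field values
class LCB:
    """Lock control block: one lock request of one transaction on one resource."""
    txn_id: int
    resource_id: str
    mode: str    # "S" (shared) or "X" (exclusive)
    status: str  # "granted" or "waiting"


@dataclass
class RCB:
    """Resource control block: the FIFO queue of requests for one resource."""
    resource_id: str
    queue: List[LCB] = field(default_factory=list)


@dataclass
class TCB:
    """Transaction control block: all lock requests of one transaction."""
    txn_id: int
    locks: List[LCB] = field(default_factory=list)


class LockManager:
    def __init__(self) -> None:
        self.resources: Dict[str, RCB] = {}  # stands in for the hash table on resource id
        self.txns: Dict[int, TCB] = {}

    def acquire(self, txn_id: int, resource_id: str, mode: str) -> bool:
        """Queue a lock request; return True if granted immediately, False if it must wait."""
        rcb = self.resources.setdefault(resource_id, RCB(resource_id))
        tcb = self.txns.setdefault(txn_id, TCB(txn_id))
        # A shared request is compatible only if everything queued is a granted shared
        # lock, so it cannot overtake an exclusive request that is already waiting.
        all_shared = all(l.mode == "S" and l.status == "granted" for l in rcb.queue)
        granted = not rcb.queue or (mode == "S" and all_shared)
        lcb = LCB(txn_id, resource_id, mode, "granted" if granted else "waiting")
        rcb.queue.append(lcb)
        tcb.locks.append(lcb)
        return granted

    def release_all(self, txn_id: int) -> None:
        """Release every lock of a transaction, e.g. at commit or abort."""
        tcb = self.txns.pop(txn_id, None)
        if tcb is None:
            return
        for lcb in tcb.locks:
            rcb = self.resources[lcb.resource_id]
            rcb.queue.remove(lcb)
            self._grant_waiters(rcb)
            if not rcb.queue:
                del self.resources[lcb.resource_id]

    def _grant_waiters(self, rcb: RCB) -> None:
        """Grant waiting requests in FIFO order until one is incompatible."""
        for lcb in rcb.queue:
            if lcb.status == "granted":
                continue
            holders = [l for l in rcb.queue if l.status == "granted"]
            if not holders or (lcb.mode == "S" and all(h.mode == "S" for h in holders)):
                lcb.status = "granted"
            else:
                break


lm = LockManager()
print(lm.acquire(1, "page:7", "S"), lm.acquire(2, "page:7", "X"))
```

Releasing all locks of a transaction in one step at its end matches the strict variant of 2PL; a caller whose request returns False would block until the status of its LCB changes to "granted".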
1.2 Outline of the Course

We will pursue a bottom-up strategy, starting from the block I/O devices used for secondary storage management and working our way up to the SQL interface.

Most of the lecture is based on the book (Ramakrishnan and Gehrke, 2003). Additional references to other textbooks and related literature will be given when appropriate.

[Figure: DBMS architecture. Labels: Web Forms, Applications, SQL Interface, SQL Commands; Query Processor with Parser, Optimizer, Plan Executor, Operator Evaluator; Transaction Manager, Lock Manager, Concurrency Control; Files and Index Structures, Buffer Manager, Disk Space Manager; Recovery Manager; DBMS; Database with Index Files, Data Files, System Catalog.]
1.3 Organizational Matters

- Register with the Account Tool.
- Actively participate in lectures and assignments.
- There will be a written exam at the end of the semester.
- Let us know when you have problems or suggestions.
- 10 copies of the book underlying this course are available in the U KN library.
Bibliography

Elmasri, R. and Navathe, S. (2000). Fundamentals of Database Systems. Addison-Wesley, Reading, MA, 3rd edition. Title of the 2002 German edition: Grundlagen von Datenbanken.

Härder, T. (1987). Realisierung von operationalen Schnittstellen, chapter 3. In (Lockemann and Schmidt, 1987). Springer.

Härder, T. (1999). Datenbanksysteme: Konzepte und Techniken der Implementierung. Springer.

Heuer, A. and Saake, G. (1999). Datenbanken: Implementierungstechniken. Int'l Thomson Publishing, Bonn.

Lockemann, P. and Dittrich, K. (1987). Architektur von Datenbanksystemen, chapter 2. In (Lockemann and Schmidt, 1987). Springer.

Lockemann, P. and Schmidt, J., editors (1987). Datenbank-Handbuch. Springer-Verlag.

Mitschang, B. (1995). Anfrageverarbeitung in Datenbanksystemen: Entwurfs- und Implementierungsaspekte. Vieweg.

Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, New York, 3rd edition.