BIS 512 - Database Management Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim
Learning Objectives
Database systems concepts
Designing and implementing a database application
Life of a query in a database system
Database Systems
Data: streams of raw facts representing events occurring in organizations
Information: data shaped into a form that is meaningful and useful to human beings
Database: an organized collection of data
Information vs. Data
Raw data is processed and organized to produce meaningful and useful information
  e.g. total unit sales, total sales revenue
Database Management System (DBMS)
A DBMS contains information about a particular enterprise:
  Collection of interrelated data
  Set of programs to access the data
  An environment that is both convenient and efficient to use
Database applications:
  Banking: all transactions
  Airlines: reservations, schedules
  Universities: registration, grades
  Sales: customers, products, purchases
Transaction Processing Systems
Serve operational managers and staff
Keep track of the elementary activities and transactions of the organization (sales, cash deposits, flow of materials)
Monitor the status of internal operations and the firm's relations with the external environment
Perform and record the daily routine transactions necessary to conduct business, e.g. sales order entry, payroll, and shipping
Serve predefined, structured goals, tasks, and resources at the operational level
Are major producers of information for the other systems and business functions
Data Management Systems
Responsible for organizing and managing the firm's data so that it can be efficiently accessed and used
Allow the definition, creation, querying, update, and administration of databases (transactions, persistence of data, recovery, concurrency control)
Database software providers:
  Commercial: Microsoft SQL Server, Oracle, IBM DB2, Sybase
    These four companies supply more than 90% of the US database market
  Open source: MySQL, PostgreSQL
Data Management and Storage
NoSQL and Big Data systems:
  Amazon Dynamo, Cassandra, MongoDB, Neo4j
  Apache Hadoop
Why do we need DBMSs?
Computers were originally developed for number crunching
Over time, data storage and processing became as important as scientific computing
The amount and types of data increased:
  Image/audio/video data
  Genome data
  Customer transactions
DBMSs were developed to manage this data
Why do we need DBMSs?
Data independence
Efficient access
Data integrity and security
Uniform data administration
Concurrent access
Recovery from crashes
User-friendly declarative query language
Terminology and Basic Ideas
Data model: describes the conceptual structuring of the data stored in the database
  Example: the data model is a set of records (each record might have a student ID and a name)
  The relational model of data is the most widely used model today
  Main concept: the relation, basically a table with rows and columns
  Every relation has a schema, which describes the columns, or fields
Schema vs. data
  The schema describes how the data is to be structured; it is defined at set-up time, rarely changes, and is part of the "metadata"
  The data is the actual "instance" of the database and may change rapidly
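The schema/data distinction can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `student` table, its columns, the sample names, and the helper `column_names` are assumptions made for this example, not part of the course material.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Schema: defined once, at set-up time.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")


def column_names(connection, table):
    """Return the column names of `table`, read from SQLite's metadata."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


schema_before = column_names(conn, "student")

# Data: the instance changes as rows are inserted and deleted.
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.execute("DELETE FROM student WHERE student_id = 1")

schema_after = column_names(conn, "student")
rows = conn.execute("SELECT student_id, name FROM student").fetchall()

print(schema_before == schema_after)  # the schema did not change
print(rows)                           # the instance did
```

The rows come and go, while the column list read from the metadata stays the same.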
Relational Model
Relational Model Concepts
Terminology and Basic Ideas
Data Definition Language (DDL)
  Commands for setting up the schema of the database
  The process of designing a schema can be complex and may use a design methodology and/or tool
Data Manipulation Language (DML)
  Commands to manipulate data in the database: RETRIEVE, INSERT, DELETE, MODIFY
  Also called a "query language"
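As a rough sketch of the DDL/DML split, the snippet below uses SQLite through Python's sqlite3 module. In SQL the generic operations RETRIEVE and MODIFY are written SELECT and UPDATE. The `account` table and its sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: set up the schema (hypothetical table for illustration).
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)

# DML, INSERT: add rows.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Ayse", 100.0), (2, "Mehmet", 50.0)],
)

# DML, MODIFY: spelled UPDATE in SQL.
conn.execute("UPDATE account SET balance = balance + 25 WHERE account_id = 2")

# DML, DELETE: remove rows.
conn.execute("DELETE FROM account WHERE account_id = 1")

# DML, RETRIEVE: spelled SELECT in SQL.
remaining = conn.execute(
    "SELECT account_id, owner, balance FROM account"
).fetchall()
print(remaining)
```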
Database Design
The process of designing the general structure of the database
Logical design: decide on a good collection of relation schemas
  Logical modeling deals with gathering business requirements and converting those requirements into a model
  Business decision: what attributes should we record in the database?
  IS decision: what relation schemas should we have, and how should the attributes be distributed among them?
Database Design
The process of designing the general structure of the database
Physical design: decide on the physical layout of the database
  Physical modeling deals with the conversion of the logical, or business, model into a relational database model
  It is specific to the database software: the objects defined during physical modeling can vary depending on the relational database software being used
  Most relational database systems differ in how data types are represented and how data is stored, although the basic data types are conceptually the same across implementations
Structured Query Language (SQL)
A widely used non-procedural database query language
Question: What are the name, email address, and yearly income of the customer with customerkey 11009?
Customer table: DimCustomer
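One way the question could be phrased in SQL is sketched below, run against a toy in-memory copy of the table. The column names (`CustomerKey`, `FirstName`, `LastName`, `EmailAddress`, `YearlyIncome`) are assumptions about how DimCustomer is laid out, and the two rows are invented placeholders, not real customer data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for DimCustomer; column names are assumed, rows are made up.
conn.execute(
    """CREATE TABLE DimCustomer (
           CustomerKey  INTEGER PRIMARY KEY,
           FirstName    TEXT,
           LastName     TEXT,
           EmailAddress TEXT,
           YearlyIncome REAL
       )"""
)
conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?, ?, ?, ?)",
    [
        (11008, "Other", "Person", "other@example.com", 40000.0),
        (11009, "Sample", "Customer", "sample@example.com", 70000.0),
    ],
)

# Non-procedural: the query states WHAT is wanted, not HOW to find it.
query = """
    SELECT FirstName, LastName, EmailAddress, YearlyIncome
    FROM DimCustomer
    WHERE CustomerKey = ?
"""
result = conn.execute(query, (11009,)).fetchone()
print(result)
```

The query names the desired columns and a condition; choosing how to locate the row is left to the DBMS.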
Components of a DBMS
A database system has five main pieces:
  Client communications manager: manages communication between users and the database
  Process manager: encapsulates and schedules the various tasks in the system
  A statement-at-a-time query processing engine
  A shared transactional storage subsystem: knits together storage, buffer management, concurrency control, and recovery
  A set of shared utilities: memory management, disk space management, replication, and various batch utilities used for administration
Architectural Components of a DBMS
Life of a Query
Consider a database interaction at an airport, in which a gate agent clicks on a form to request the passenger list for a flight
This button click results in a single-query transaction
Life of a Query
The personal computer at the airport gate (the "client") calls an API that in turn communicates over a network to establish a connection with the Client Communications Manager of a DBMS
It is the responsibility of the DBMS client communications manager to:
  establish and remember the connection state for the caller
  respond to SQL commands from the caller
  return both data and control messages (result codes, errors, etc.) as appropriate
In the gate agent's query example, the communications manager would:
  establish the security credentials of the client
  set up state to remember the details of the new connection and the current SQL command across calls
  forward the client's first request deeper into the DBMS to be processed
Life of a Query
Upon receiving the client's first SQL command, the DBMS (Process Manager) must:
  assign a thread of computation to the command
  make a decision regarding admission control: whether the system should begin processing the query immediately, or defer execution until enough system resources are available to devote to this query
The gate agent's query then begins executing in the Relational Query Processor:
  it checks that the user is authorized to run the query, and compiles the user's SQL query text into an internal query plan
  the plan executor consists of a suite of operators (relational algorithm implementations) for executing any query
  operators implement relational query processing tasks including joins, selection, projection, aggregation, sorting, etc.
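To get a feel for what "compiling SQL text into a query plan" produces, SQLite can be asked to describe its plan with `EXPLAIN QUERY PLAN`. This is a sketch using SQLite only; other DBMSs expose plans differently. The `flight` and `passenger` tables and the flight number `"TK1"` are hypothetical stand-ins for the gate agent's passenger-list request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema for the gate agent's passenger-list request.
conn.executescript(
    """
    CREATE TABLE flight (
        flight_id INTEGER PRIMARY KEY,
        flight_no TEXT
    );
    CREATE TABLE passenger (
        passenger_id INTEGER PRIMARY KEY,
        name         TEXT,
        flight_id    INTEGER REFERENCES flight(flight_id)
    );
    """
)

# The query combines a join, a selection, a projection, and a sort.
sql = """
    SELECT p.name
    FROM passenger AS p
    JOIN flight AS f ON f.flight_id = p.flight_id
    WHERE f.flight_no = ?
    ORDER BY p.name
"""

# Ask SQLite to describe the plan instead of running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + sql, ("TK1",)).fetchall()
for step in plan:
    # The last column of each row is a human-readable description of a step.
    print(step[-1])
```

The exact wording of the plan steps depends on the SQLite version, so no particular output is shown here.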
Life of a Query
One or more operators exist to request data from the database. These operators make calls to fetch data from the DBMS Transactional Storage Manager, which:
  manages all data access (read) and manipulation (create, update, delete) calls
  invokes the transaction management code to ensure the well-known ACID properties of transactions
Before data is accessed, locks are acquired from a lock manager to ensure correct execution in the face of other concurrent queries
If the gate agent's query involved updates to the database, it would interact with the log manager to ensure that the transaction was durable if committed, and fully undone if aborted
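The "durable if committed, fully undone if aborted" behaviour can be observed from the client side with a small sketch. It uses SQLite via Python's sqlite3, which handles locking and logging internally, so only the visible effect of commit and rollback is shown. The `seat` table, the passenger labels, and the helper `assign_seat` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL)")
conn.commit()


def assign_seat(connection, seat_no, passenger, fail=False):
    """Assign a seat inside one transaction; optionally simulate a failure."""
    try:
        # `with connection` commits on success and rolls back on an exception.
        with connection:
            connection.execute(
                "UPDATE seat SET passenger = ? WHERE seat_no = ?",
                (passenger, seat_no),
            )
            if fail:
                raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        pass  # the transaction has already been rolled back


def current_passenger(connection, seat_no):
    row = connection.execute(
        "SELECT passenger FROM seat WHERE seat_no = ?", (seat_no,)
    ).fetchone()
    return row[0]


assign_seat(conn, "12A", "first passenger", fail=True)
after_abort = current_passenger(conn, "12A")   # update was undone

assign_seat(conn, "12A", "second passenger")
after_commit = current_passenger(conn, "12A")  # update persisted

print(after_abort, after_commit)
```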
Life of a Query
The agent's query has begun to access data records and is ready to use them to compute results for the client. This is done by unwinding the stack of activities:
  the access methods return control to the query executor's operators, which orchestrate the computation of result tuples from database data
  as result tuples are generated, they are placed in a buffer for the client communications manager, which ships the results back to the caller
At the end of the query, the transaction is completed and the connection is closed:
  the transaction manager cleans up state for the transaction
  the process manager frees any control structures for the query
  the communications manager cleans up communication state for the connection
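From the client's point of view, results arriving through a buffer look roughly like a cursor that hands back tuples in batches. The sketch below is only an analogy using sqlite3's `fetchmany`; it does not reproduce the internals of a DBMS communications manager. The `passenger` table, its rows, and the helper `fetch_in_batches` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (passenger_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO passenger (name) VALUES (?)",
    [(f"passenger-{i}",) for i in range(1, 8)],  # seven made-up rows
)


def fetch_in_batches(connection, sql, batch_size):
    """Yield result tuples in batches, as a client would drain a result buffer."""
    cursor = connection.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


batches = list(
    fetch_in_batches(conn, "SELECT name FROM passenger ORDER BY passenger_id", 3)
)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)

# End of the query: release the connection and its state.
conn.close()
```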
Life of a Query
Shared components and utilities are vital to the operation of a full-function DBMS
The catalog and memory managers are invoked as utilities during any transaction
  The catalog is used by the query processor during authentication, parsing, and query optimization
  The memory manager is used throughout the DBMS whenever memory needs to be dynamically allocated or deallocated
The remaining modules run independently of any particular query, keeping the database as a whole well tuned and reliable
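In many systems the catalog is itself queryable. As a small sketch, SQLite keeps its catalog in a table named `sqlite_master`; other DBMSs use different catalog tables or views. The `passenger` table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (passenger_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_passenger_name ON passenger(name)")

# The catalog records which tables and indexes exist (the metadata).
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

for entry in catalog:
    print(entry)
```

The query processor consults this kind of metadata when it parses a query and checks that the referenced tables and columns exist.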