Data Organization and Processing I
Data Organization in Oracle Server 11g R2 (NDBI007)
RNDr. Michal Kopecký, Ph.D.
http://www.ms.mff.cuni.cz/~kopecky
Outline
Database structure
o Database structure
o Database files
o Memory structure
o Process structure
Data access
o SQL execution
o SQL optimization
o Indexes
Physical Database Structure
1. Data are stored in files and/or disk partitions.
2. Data are accessed by the database instance: a set of processes communicating via shared memory.
Data File Structure
1. Each file consists of
   1. Header
   2. Used space
   3. Free space
Logical Data Structure
1. Data (tables and indexes) are each stored in their own segment: a linked list of extents.
2. Extent: a logically contiguous set of database blocks.
3. Database block: the I/O unit. Its size is fixed for the whole tablespace (usually for the whole database).
Tablespaces
Tablespaces
1. Data are stored in so-called tablespaces: logical database volumes.
2. Individually managed
   1. Quotas
   2. Offline/Online
   3. Back up/restore
   4. Recover
Tablespaces
1. Dictionary managed (obsolete): information about free and used space is kept in the data dictionary
2. Locally managed: a bitmap of free/used space is kept in the tablespace itself
   1. Automatic space management
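The idea of a locally managed tablespace can be sketched as a bitmap with one bit per extent. This is only a toy model: the class name `ExtentBitmap` and its methods are hypothetical, and Oracle's real on-disk bitmap format is not reproduced here.

```python
class ExtentBitmap:
    """Toy free/used bitmap: one flag per extent, kept with the tablespace."""

    def __init__(self, extent_count):
        # False = extent is free, True = extent is used
        self.bits = [False] * extent_count

    def allocate(self):
        """Mark the first free extent as used and return its index."""
        for i, used in enumerate(self.bits):
            if not used:
                self.bits[i] = True
                return i
        raise RuntimeError("tablespace is full")

    def release(self, index):
        """Mark an extent as free again."""
        self.bits[index] = False


bitmap = ExtentBitmap(4)
first = bitmap.allocate()    # extent 0
second = bitmap.allocate()   # extent 1
bitmap.release(first)
reused = bitmap.allocate()   # extent 0 is free again, so it is reused
```

Because allocation only flips bits inside the tablespace, no data dictionary tables need to be updated, which is the main difference from dictionary management.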
Segments
Data Blocks
1. One DB block (4 KiB, 8 KiB, ...) consists of a sequence of OS blocks
Data Blocks
Data Rows
1. Each column is stored as a tuple (length, data)
2. Trailing NULLs are not stored
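The row format above can be illustrated with a small encoder. It is a simplified sketch, not Oracle's actual row piece layout: the function name `encode_row`, the one-byte length, and the `0xFF` marker for a NULL in the middle of a row are assumptions made for the example.

```python
def encode_row(values):
    """Encode a row as (length, data) pairs; trailing NULLs (None) are dropped."""
    values = list(values)
    # Trailing NULLs take no space at all.
    while values and values[-1] is None:
        values.pop()

    out = bytearray()
    for value in values:
        if value is None:
            out.append(0xFF)          # assumed marker for a non-trailing NULL
            continue
        data = value.encode("utf-8")
        if len(data) >= 0xFF:
            raise ValueError("toy format supports only short columns")
        out.append(len(data))         # length byte
        out += data                   # column data
    return bytes(out)


encoded = encode_row(["Smith", None, "NY", None, None])
# encoded == b"\x05Smith\xff\x02NY": the two trailing NULLs are not stored
```

A practical consequence is that columns that are often NULL are cheapest when declared last in the table.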
Block Free Space Management
1. PCTFREE=x: if free space in the block drops under x%, the block is removed from the blocks available for INSERTs
2. PCTUSED=y: if used space in the block then drops below y%, the block becomes available for INSERTs again
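The interplay of the two thresholds can be simulated as follows. The sketch assumes the usual semantics, where PCTFREE is compared against free space and PCTUSED against used space; the class `Block` and the attribute `on_freelist` are hypothetical names, and the block size of 1000 bytes is chosen only to keep the arithmetic readable.

```python
class Block:
    """Toy data block tracking whether it accepts new INSERTs."""

    def __init__(self, size=8192, pctfree=10, pctused=40):
        self.size = size
        self.pctfree = pctfree
        self.pctused = pctused
        self.used = 0
        self.on_freelist = True   # available for INSERTs

    def _update(self):
        free_pct = 100 * (self.size - self.used) / self.size
        used_pct = 100 * self.used / self.size
        if self.on_freelist and free_pct < self.pctfree:
            self.on_freelist = False          # too full: stop inserting
        elif not self.on_freelist and used_pct < self.pctused:
            self.on_freelist = True           # emptied enough: insert again

    def insert(self, nbytes):
        if not self.on_freelist:
            raise RuntimeError("block is not available for INSERTs")
        if self.used + nbytes > self.size:
            raise ValueError("row does not fit into the block")
        self.used += nbytes
        self._update()

    def delete(self, nbytes):
        self.used = max(0, self.used - nbytes)
        self._update()


block = Block(size=1000, pctfree=10, pctused=40)
block.insert(950)   # 5% free < PCTFREE=10  -> removed from the free list
block.delete(300)   # 65% used >= PCTUSED=40 -> still not available
block.delete(300)   # 35% used < PCTUSED=40  -> available again
```

The gap between the two thresholds prevents a block from flipping between the two states on every small INSERT or DELETE.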
Block Free Space Coalescing
System Global Area
1. SGA stores all data that can be shared by all running sessions
   1. Database buffer cache
   2. Redo log buffer
   3. Shared pool
   4. Large pool
   5. ...
2. Program Global Area (PGA) stores data belonging to a given process
Database Buffer
1. Default pool: stores copies of DB blocks read from external memory (disk)
2. Keep pool: stores (all) blocks of small, frequently used tables
3. Recycle pool: stores blocks of large, infrequently used tables so that they do not consume a lot of space in the default pool
4. Blocks of nondefault sizes have their own pools
Redo-log Buffer
1. Circular buffer that stores redo-log entries. When necessary (COMMIT, checkpoint, buffer full, ...), the entries are written to the online redo-log file
Process Architecture
1. Client processes
   1. Run client code
2. Server processes
   1. Run Oracle code
3. Background processes
   1. Manipulate data (bottom of the figure)
   2. Check the database (right side of the figure)
Process Architecture
1. Individual server processes access shared data in the SGA and private data in the PGA
Dedicated Server Processes
1. Each client process has its own server process
   1. Optimal for batch processing that uses the server process intensively (with little or no idle time)
Shared Server Processes
1. Several clients share one server process through a dispatcher process
   1. Optimal for less intensive clients with much idle time
Redo Log Writer
1. The Log Writer process (LGWR) writes data from the redo log buffer to the online redo log file.
2. When the file is full, a redo log switch occurs and a checkpoint is forced
   1. All dirty data blocks are written to the database files
Process Structure
Index Coalescing
Row Chaining
1. A row too long for one block is stored partially in that block and partially in other block(s), linked via pointers.
Row Migration
1. A row that no longer fits in its block after an update is migrated to another block. Only a pointer remains in the original block, so the original ROWID stays valid.
2. Two block reads instead of one.
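Why a migrated row costs two block reads can be shown with a toy model. Blocks are plain dictionaries here, a ROWID is assumed to be a `(block_no, slot)` pair, and the functions `migrate` and `fetch` are hypothetical helpers, not Oracle internals.

```python
def migrate(blocks, rowid, new_block, new_data):
    """Move an updated row to new_block, leaving a forwarding pointer behind."""
    block_no, slot = rowid
    new_slot = len(blocks[new_block])
    blocks[new_block][new_slot] = ("row", new_data)
    # The original slot keeps only a pointer, so the ROWID stays valid.
    blocks[block_no][slot] = ("ptr", (new_block, new_slot))


def fetch(blocks, rowid):
    """Return (data, number_of_block_reads) for the given ROWID."""
    reads = 1
    kind, payload = blocks[rowid[0]][rowid[1]]
    if kind == "ptr":
        # Follow the forwarding pointer: one extra block read.
        reads += 1
        kind, payload = blocks[payload[0]][payload[1]]
    return payload, reads


blocks = {1: {0: ("row", "short value")}, 2: {}}
before = fetch(blocks, (1, 0))            # ("short value", 1)
migrate(blocks, (1, 0), 2, "x" * 50)      # the updated row no longer fits
after = fetch(blocks, (1, 0))             # same ROWID, but 2 block reads
```

Indexes keep pointing at the original ROWID, which is why they do not need to be updated when a row migrates.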
Clustered Tables
SELECT Query Lifecycle
1. SQL statement parsing
   1. Syntax checking
   2. Semantic checking
2. Check whether a query plan already exists in the Shared Pool
3. If the plan doesn't exist, generate as many of the available plans as possible and choose the best one (the one with the lowest cost)
4. If the plan doesn't exist, generate the row sources that will execute the chosen plan
5. Execute the query
Query Optimization
1. The query can be reformulated (the IN operator can be replaced by OR and vice versa, ...)
2. For each plan, the optimizer estimates the number of rows processed in each node, the number of bytes, the number of blocks read, CPU usage, etc.
3. The cost of the plan is computed as a weighted sum of these estimates, according to internal statistics
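Choosing a plan by a weighted-sum cost can be sketched as below. The estimate fields, the weights in `WEIGHTS`, and the numbers for the two plans are all invented for illustration; they are not Oracle's actual cost formula or statistics.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    name: str
    single_block_reads: int   # estimated single-block I/O operations
    multi_block_reads: int    # estimated multi-block I/O operations
    cpu_cycles: int           # estimated CPU work

# Hypothetical weights standing in for the optimizer's internal statistics.
WEIGHTS = {
    "single_block_reads": 1.0,
    "multi_block_reads": 2.0,
    "cpu_cycles": 1e-6,
}


def cost(plan):
    """Weighted sum of the plan's estimates."""
    return sum(weight * getattr(plan, field) for field, weight in WEIGHTS.items())


plans = [
    PlanEstimate("full table scan", 0, 500, 20_000_000),
    PlanEstimate("index range scan", 120, 0, 1_000_000),
]
best = min(plans, key=cost)   # the plan with the lowest cost wins
```

With these made-up numbers the index range scan is cheaper; different statistics or weights could just as well favor the full table scan.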
Plan Cache
Each execution plan is stored. Depending on the CURSOR_SHARING setting, it can be used repeatedly:
o EXACT cursor sharing
  The plan can be reused only for exactly the same SQL statement
o SIMILAR cursor sharing
  The plan can be reused for a similar SQL statement with the same structure but different constants
  Each constant is replaced by a placeholder:
  SELECT * FROM Emp WHERE ID=1
  SELECT * FROM Emp WHERE ID=:x1
  Values are bound to the placeholders at execution time
  PROS: fewer statement parses
  CONS: optimization without knowledge of the exact value may be ineffective when the values are not uniformly distributed
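Replacing constants by placeholders so that similar statements share one cached plan can be sketched as follows. The regular expression, the helper names `normalize` and `get_plan`, and the dictionary `plan_cache` are assumptions made for this example; the placeholder names follow the `:x1` style used above, not the names Oracle generates internally.

```python
import re

# Matches quoted string literals (with '' as an escaped quote) and plain numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Replace each literal by a placeholder; return (new_sql, bound_values)."""
    binds = []

    def replace(match):
        binds.append(match.group(0))
        return f":x{len(binds)}"

    return _LITERAL.sub(replace, sql), binds


plan_cache = {}


def get_plan(sql):
    """Look up the plan by normalized text; 'optimize' only on a cache miss."""
    key, binds = normalize(sql)
    hit = key in plan_cache
    if not hit:
        plan_cache[key] = f"plan for: {key}"   # stand-in for a real optimizer run
    return plan_cache[key], binds, hit


_, _, hit1 = get_plan("SELECT * FROM Emp WHERE ID=1")   # miss: plan is created
_, _, hit2 = get_plan("SELECT * FROM Emp WHERE ID=2")   # hit: same structure
```

The second statement reuses the plan built for the first one, which illustrates both the benefit (no second optimization) and the risk (the plan was chosen without knowing that the constant is now 2).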
Indexes
B-Tree
o For columns with high selectivity
Bitmap
o For columns with low selectivity
o Can be combined using AND, OR to achieve higher selectivity
Index-organized tables
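Combining bitmap indexes with AND and OR can be sketched with Python integers used as bitmaps, where bit i stands for row i. The table contents and the helper names `build_bitmap_index` and `rows_of` are made up for the example; real bitmap indexes are stored compressed.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int with bit i set for row i."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def rows_of(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


rows = [
    {"gender": "F", "dept": "IT"},
    {"gender": "M", "dept": "IT"},
    {"gender": "F", "dept": "HR"},
    {"gender": "F", "dept": "IT"},
]
gender = build_bitmap_index(rows, "gender")
dept = build_bitmap_index(rows, "dept")

f_and_it = rows_of(gender["F"] & dept["IT"])   # [0, 3]
m_or_hr = rows_of(gender["M"] | dept["HR"])    # [1, 2]
```

Each column alone has low selectivity, but the bitwise AND of the two bitmaps narrows the result to the rows matching both conditions without touching the table.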