Data Organization and Processing I

Data Organization in Oracle Server 11g R2 (NDBI007)
RNDr. Michal Kopecký, Ph.D.
http://www.ms.mff.cuni.cz/~kopecky

Outline
Database structure
o Database structure
o Database files
o Memory structure
o Process structure
Data access
o SQL execution
o SQL optimization
o Indexes

Physical Database Structure
1. Data are stored in files and/or disc partitions.
2. Data are accessed by the database instance: a set of processes communicating via shared memory.

Data File Structure
1. Each file consists of
   1. Header
   2. Used space
   3. Free space

Logical Data Structure
1. Data (tables and indexes) are each stored in their own segment: a linked list of extents.
2. Extent: a logically contiguous set of database blocks.
3. Database block: the I/O unit. Its size is fixed for the whole tablespace (usually for the whole database).

Tablespaces

Tablespaces
1. Data are stored in so-called tablespaces: logical database volumes.
2. Individually managed
   1. Quotas
   2. Offline/Online
   3. Back up/restore
   4. Recover

Tablespaces
1. Dictionary managed (obsolete): information about free and used space is kept in the data dictionary
2. Locally managed: a bitmap of free/used space is kept in the tablespace itself
   1. Automatic space management

Segments

Data Blocks
1. One DB block (4 KiB, 8 KiB, ...) consists of a sequence of OS blocks

Data Blocks

Data Rows
1. Each column is stored as a tuple (length, data)
2. Trailing NULLs are not stored
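The row layout above can be illustrated with a minimal sketch. It is not Oracle's actual on-disk format: the single-byte length prefix, the `NULL_MARKER` value used for NULLs between stored columns, and the function name `encode_row` are assumptions made for this example only.

```python
NULL_MARKER = 0xFF  # assumption of this sketch: length byte standing for an interior NULL


def encode_row(values):
    """Encode column values as (length, data) pairs, omitting trailing NULLs."""
    # Find the last non-NULL column; everything after it is simply not stored.
    end = len(values)
    while end > 0 and values[end - 1] is None:
        end -= 1

    out = bytearray()
    for value in values[:end]:
        if value is None:
            # A NULL between stored columns still needs a marker,
            # otherwise the following columns would shift position.
            out.append(NULL_MARKER)
            continue
        data = value.encode("utf-8")
        if len(data) >= NULL_MARKER:
            raise ValueError("column too long for the one-byte length used here")
        out.append(len(data))  # length part of the tuple
        out += data            # data part of the tuple
    return bytes(out)
```

In this sketch, `encode_row(["Smith", None, "NY", None, None])` and `encode_row(["Smith", None, "NY"])` produce the same 10 bytes, because the two trailing NULLs occupy no space.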

Block Free Space Management
1. PCTFREE=x: if free space in the block drops below x%, the block is removed from the set of blocks available for INSERTs
2. PCTUSED=y: if used space in the block drops below y%, the block becomes available for INSERTs again
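The two thresholds form a hysteresis: a block leaves the set of insertable blocks when its free space falls below PCTFREE, and returns only after its used space falls below PCTUSED. The following sketch models that behaviour for a single block. The class name `Block`, its attributes, and the byte-level accounting are hypothetical simplifications, not Oracle internals.

```python
class Block:
    """Toy model of one data block governed by PCTFREE and PCTUSED."""

    def __init__(self, size=8192, pctfree=10, pctused=40):
        self.size = size
        self.pctfree = pctfree
        self.pctused = pctused
        self.used = 0                # bytes currently occupied by rows
        self.available = True        # may the block receive new INSERTs?

    def _update_availability(self):
        used_pct = 100 * self.used / self.size
        free_pct = 100 - used_pct
        if self.available and free_pct < self.pctfree:
            self.available = False   # remaining space is reserved for UPDATEs
        elif not self.available and used_pct < self.pctused:
            self.available = True    # enough rows were deleted

    def insert(self, nbytes):
        if not self.available:
            raise RuntimeError("block is not available for INSERTs")
        if self.used + nbytes > self.size:
            raise ValueError("row does not fit into the block")
        self.used += nbytes
        self._update_availability()

    def delete(self, nbytes):
        self.used = max(0, self.used - nbytes)
        self._update_availability()
```

With `Block(size=1000, pctfree=10, pctused=40)`, inserting 950 bytes makes the block unavailable; deleting 300 bytes (65% used) leaves it unavailable, and deleting another 300 bytes (35% used) makes it available again.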

Block Free Space Coalescing

System Global Area
1. The SGA stores all data that can be shared by all running sessions
   1. Database buffer cache
   2. Redo log buffer
   3. Shared pool
   4. Large pool
   5. ...
2. The Program Global Area (PGA) stores data belonging to a given process

Database Buffer
1. Default pool: stores copies of DB blocks held in external memory (disc)
2. Keep pool: stores (all) blocks of small, frequently used tables
3. Recycle pool: stores blocks of large, infrequently used tables so that they do not consume a lot of space in the default pool
4. Blocks of non-default sizes have their own pools

Redo-log Buffer
1. Circular buffer that stores redo-log entries. When necessary (COMMIT, checkpoint, buffer full, ...), the entries are written to the online redo-log file
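A circular buffer that is flushed on COMMIT or when it fills up could be sketched as follows. The class `RedoLogBuffer`, its methods, and the plain Python list standing in for the online redo-log file are all hypothetical; checkpoints and the other flush triggers are left out.

```python
class RedoLogBuffer:
    """Toy circular buffer of redo entries, flushed on commit or when full."""

    def __init__(self, capacity, log_file):
        self.slots = [None] * capacity
        self.head = 0             # index of the oldest entry not yet written
        self.count = 0            # number of entries not yet written
        self.log_file = log_file  # a list standing in for the online redo-log file

    def append(self, entry):
        if self.count == len(self.slots):
            self.flush()          # buffer full: write entries out first
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = entry  # slots are reused in a circle
        self.count += 1

    def flush(self):
        # Write pending entries in the order in which they were generated.
        while self.count:
            self.log_file.append(self.slots[self.head])
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1

    def commit(self):
        self.flush()              # COMMIT forces the entries to the log file
```

Because `head` wraps around, the buffer keeps reusing the same fixed set of slots while the log file receives the entries in their original order.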

Process Architecture
1. Client processes
   1. Run client code
2. Server processes
   1. Run Oracle code
3. Background processes
   1. Manipulate data (bottom of the figure)
   2. Check the database (right side of the figure)

Process Architecture
1. Individual server processes access shared data in the SGA and private data in the PGA

Dedicated Server Processes
1. Each client process has its own server process
   1. Optimal for batch processing with intensive process usage (little or no idle time)

Shared Server Processes
1. Several clients share one server process through a dispatcher process
   1. Optimal for less intensive clients with much idle time

Redo Log Writer
1. The Log Writer process (LGWR) writes data from the buffer to the online redo log file.
2. When the file is full, a redo log switch occurs and a checkpoint is forced
   1. All dirty data blocks are written to the database files

Process Structure

Index Coalescing

Row Chaining
1. A row too long for a single block is stored partially in that block and partially in other block(s). The pieces are linked via pointers.

Row Migration
1. A row that no longer fits into its block after an update is migrated to another block. Only a pointer remains in the original block, so the original ROWID stays valid.
2. Two block reads instead of one.

Clustered Tables

SELECT Query Lifecycle
1. SQL statement parsing
   1. Syntax checking
   2. Semantic checking
2. Check whether a query plan already exists in the Shared Pool
3. If the plan doesn't exist, generate as many of the available plans as possible and choose the best one (with the lowest cost)
4. If the plan doesn't exist, plan the row source execution
5. Execute the query

Query Optimization
1. The query can be reformulated (the IN operator can be replaced by OR and vice versa, ...)
2. For each plan, the optimizer estimates the number of rows processed in each node, the number of bytes, the number of blocks read, CPU usage, etc.
3. The cost of the plan is computed as a weighted sum of these estimates, based on internal statistics

Plan Cache
Each execution plan is stored. Depending on the CURSOR_SHARING setting, it can be used repeatedly:
o EXACT cursor sharing: the plan can be reused only for exactly the same SQL statement
o SIMILAR cursor sharing: the plan can be reused for a similar SQL statement with the same structure but different constants
  Each constant is replaced by a placeholder:
  SELECT * FROM Emp WHERE ID=1
  SELECT * FROM Emp WHERE ID=:x1
  Values are bound to the placeholders at execution time
PROS: fewer statement parses
CONS: optimization without knowledge of the exact value may be ineffective when the values are distributed non-uniformly
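The effect of the two settings on the number of optimizations can be sketched with a toy plan cache. Everything here is illustrative: `normalize`, `PlanCache`, the regular expression, and the `:xN` placeholder names (taken from the example above) are assumptions of this sketch and do not reproduce how Oracle rewrites statements internally.

```python
import re

# Matches quoted string literals ('' is an escaped quote) and standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Replace each constant by a placeholder; return (new_sql, bind_values)."""
    binds = []

    def replace(match):
        binds.append(match.group(0))
        return f":x{len(binds)}"

    return _LITERAL.sub(replace, sql), binds


class PlanCache:
    """Toy plan cache keyed by statement text."""

    def __init__(self, cursor_sharing="EXACT"):
        self.cursor_sharing = cursor_sharing
        self.plans = {}
        self.optimizations = 0  # how many times a plan had to be generated

    def get_plan(self, sql):
        key = normalize(sql)[0] if self.cursor_sharing == "SIMILAR" else sql
        if key not in self.plans:
            self.optimizations += 1
            self.plans[key] = f"plan #{self.optimizations}"  # stand-in for a real plan
        return self.plans[key]
```

Running `SELECT * FROM Emp WHERE ID=1` and `SELECT * FROM Emp WHERE ID=2` through the cache generates two plans under `"EXACT"` and one shared plan under `"SIMILAR"`. The shared plan was chosen without knowing the actual value, which is the drawback named above.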

Indexes
B-Tree
o For columns with high selectivity
Bitmap
o For columns with low selectivity
o Can be combined using AND, OR to achieve higher selectivity
Index organized Tables
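Combining bitmap indexes with AND and OR can be shown with a small sketch in which each distinct column value maps to an integer whose bit i is set when row i holds that value. The helper names `build_bitmap_index` and `matching_rows`, as well as the sample table, are hypothetical; real bitmap indexes are stored compressed.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitmap of matching row numbers."""
    index = {}
    for rowno, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << rowno)
    return index


def matching_rows(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample table with two low-selectivity columns.
emp = [
    {"gender": "F", "dept": "IT"},
    {"gender": "M", "dept": "IT"},
    {"gender": "F", "dept": "HR"},
    {"gender": "F", "dept": "IT"},
]
by_gender = build_bitmap_index(emp, "gender")
by_dept = build_bitmap_index(emp, "dept")

# WHERE gender = 'F' AND dept = 'IT'
women_in_it = matching_rows(by_gender["F"] & by_dept["IT"])  # [0, 3]

# WHERE gender = 'M' OR dept = 'HR'
men_or_hr = matching_rows(by_gender["M"] | by_dept["HR"])    # [1, 2]
```

Each column alone filters poorly, but a single bitwise operation over the two bitmaps narrows the result to the rows that satisfy the combined condition.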