Efficient Lazy Timestamping in BerkeleyDB
Student: Shilong (Stanley) Yao
Advisor: Dr. Richard T. Snodgrass
Qualifying Oral Exam, Computer Science Department, University of Arizona
04/18/03 (1:30-3:00pm)

Temporal Database
A temporal database supports some aspects of time and simplifies sophisticated queries over time.
Almost all database applications concern time:
Academic: course schedules over time
GIS: land use over time
Accounting: bill management over time
Etc.

A Challenge
What is …'s salary history?
Sample data (Name, Salary, Title, Start, Stop): salaries 60,000 and 70,000; titles Assistant Prof. and Associate Prof.; dates 2002-05-20 and 2003-03-10.
In conventional SQL, answering the question requires coalescing the history:

    CREATE TABLE Temp (Salary, Start, Stop) AS
      SELECT Salary, Start, Stop
      FROM Employee
      WHERE Name = '…';

    SELECT DISTINCT F.Salary, F.Start, L.Stop
    FROM Temp AS F, Temp AS L
    WHERE F.Start < L.Stop
      AND F.Salary = L.Salary
      AND NOT EXISTS (SELECT *
                      FROM Temp AS M
                      WHERE M.Salary = F.Salary
                        AND F.Start < M.Start AND M.Start < L.Stop
                        AND NOT EXISTS (SELECT *
                                        FROM Temp AS T1
                                        WHERE T1.Salary = F.Salary
                                          AND T1.Start < M.Start AND M.Start <= T1.Stop))
      AND NOT EXISTS (SELECT *
                      FROM Temp AS T2
                      WHERE T2.Salary = F.Salary
                        AND ((T2.Start < F.Start AND F.Start <= T2.Stop)
                          OR (T2.Start < L.Stop AND L.Stop < T2.Stop)));

With transaction-time support, the same question is a single query:

    TRANSACTIONTIME SELECT Salary FROM Employee WHERE Name = '…';

Transaction Time & Valid Time
Transaction time: when the fact is stored as current in the database.
Valid time: when the fact is true in the modeled reality.
Example: Eva buys … on Jan 10; Peter buys … on Jan 20; Peter sells … on Jan 30.
[Figure: the example facts plotted against valid time and transaction time, with both axes marked 01/10, 01/20, 01/30]

Example of a Transaction-Time Database
(2002-05-20) Insert: the tuple (Name, Salary 60,000) is stored with Start = 2002-05-20 and Stop = UC (until changed).
Update: the salary is changed from 60,000 to 70,000.
(2003-03-10) Delete: Stop is set to 2003-03-10 (Start remains 2002-05-20).
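To make the Start/Stop/UC convention concrete, here is a minimal C sketch of how a transaction-time table might version one employee's salary: a delete only closes the current version, an update closes it and appends a new one, and an "as of t" query picks the version whose [Start, Stop) contains t. Modeling an update as close-plus-append is one common transaction-time convention, assumed here rather than taken from the slides. The Version type, the tt_* functions, the YYYYMMDD integer encoding of dates, and the sentinel value used for UC are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: dates as YYYYMMDD integers, so numeric order is time order. */
#define UC 99999999L /* "until changed": the Stop value of the current version */

typedef struct {
    long salary;
    long start; /* transaction time at which this version became current */
    long stop;  /* transaction time at which it stopped being current, or UC */
} Version;

/* Insert: append a new current version; returns the new history length.
 * The caller must ensure hist has room for one more element. */
static size_t tt_insert(Version *hist, size_t n, long salary, long now) {
    /* Only the last version can be current, and it must already be closed. */
    assert(n == 0 || hist[n - 1].stop != UC);
    hist[n].salary = salary;
    hist[n].start = now;
    hist[n].stop = UC;
    return n + 1;
}

/* Delete: nothing is physically removed; the current version is closed. */
static void tt_delete(Version *hist, size_t n, long now) {
    for (size_t i = 0; i < n; i++)
        if (hist[i].stop == UC)
            hist[i].stop = now;
}

/* Update: close the current version, then append the new value. */
static size_t tt_update(Version *hist, size_t n, long salary, long now) {
    tt_delete(hist, n, now);
    return tt_insert(hist, n, salary, now);
}

/* Which salary did the database record as current at time t? */
static bool tt_as_of(const Version *hist, size_t n, long t, long *salary) {
    for (size_t i = 0; i < n; i++) {
        if (hist[i].start <= t && t < hist[i].stop) {
            *salary = hist[i].salary;
            return true;
        }
    }
    return false;
}
```

For example, inserting 60,000 at 20020520 and deleting at 20030310 leaves one version with Start 20020520 and Stop 20030310, and tt_as_of returns 60,000 for any time in between. Because no version is ever overwritten, the full salary history stays available to the query.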
Timestamping
Definition: timestamping is providing the transaction time to a tuple's internal time fields.
Key problem: ensure transaction-consistent timestamps, i.e., all timestamps written by the same transaction are identical.
Key issues:
Which time should be chosen as the transaction time?
When should the stamping be done?
What information is needed for the stamping?

Choosing the Time
Begin time of the transaction
Advantages: the time is available whenever a tuple is updated, so no double visiting is needed.
Disadvantages: requires a concurrency control scheme based on timestamp ordering, which gives up the advantages of conventional locking.
Commit time of the transaction
Advantages: supports 2PL; lower abort rate.
Disadvantages: a stamper ID is needed as a placeholder for the timestamp, and records carrying a stamper ID must be revisited.

When to Stamp?
Eager timestamping: efficient in-memory stamping; no double visiting or time table needed; heavy I/O when the steal policy is in effect.
Pure lazy timestamping: needs a list of updated but unstamped pages.
Laziness. Definition: the laziness of the stamping is the latency between the transaction commit and the stamping.
[Figure: eager vs. pure lazy timestamping on a timeline (begin, commit, a subsequent read), compared by time-information overhead, I/O load, and CPU load]

Lazy Timestamping
How to stamp after commit?
Keep a time table (TT for short) that maps each transaction ID to its timestamp, e.g.:
ID      Timestamp
111111  09:00, 01/01/2002
222222  16:05, 01/01/2002
333333  09:10, 01/05/2002
Double-visit each record: once before commit (writing the stamper ID) and once after commit (replacing it with the timestamp).
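The TT-based double visit can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the STP code: TupleHdr, TTEntry, write_stamper_id, and lazy_stamp are hypothetical names, the TT is a plain array searched linearly, and times are plain integers (e.g. 900 for 09:00). Because every tuple a transaction writes is stamped from that transaction's single TT entry, all of them receive the same commit time, which is the transaction-consistency requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_TIME ((int64_t)-1) /* the writer has not committed yet */

/* Hypothetical tuple header: holds the stamper ID until the tuple is stamped. */
typedef struct {
    bool stamped;
    uint32_t txnid; /* stamper ID (the writer's transaction ID) while unstamped */
    int64_t ttime;  /* commit time once stamped */
} TupleHdr;

/* Hypothetical TT entry: transaction ID -> commit time. */
typedef struct {
    uint32_t txnid;
    int64_t commit_time; /* INVALID_TIME until the transaction commits */
} TTEntry;

static const TTEntry *tt_lookup(const TTEntry *tt, size_t n, uint32_t txnid) {
    for (size_t i = 0; i < n; i++)
        if (tt[i].txnid == txnid)
            return &tt[i];
    return NULL;
}

/* First visit (before commit): store the stamper ID as a placeholder. */
static void write_stamper_id(TupleHdr *h, uint32_t txnid) {
    h->stamped = false;
    h->txnid = txnid;
    h->ttime = INVALID_TIME;
}

/* Second visit (any time after commit): replace the stamper ID with the
 * writer's commit time. Returns true if the tuple is stamped afterwards. */
static bool lazy_stamp(TupleHdr *h, const TTEntry *tt, size_t n) {
    if (h->stamped)
        return true;
    const TTEntry *e = tt_lookup(tt, n, h->txnid);
    assert(e != NULL); /* an unstamped tuple's writer must still be in TT */
    if (e->commit_time == INVALID_TIME)
        return false; /* writer still active: keep the placeholder */
    h->ttime = e->commit_time;
    h->stamped = true;
    return true;
}
```

Calling lazy_stamp on a tuple whose writer is still active returns false and leaves the stamper ID in place, so the tuple is simply revisited later.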
BerkeleyDB Overview
An open-source embedded database library, developed at UC Berkeley and distributed by Sleepycat.
Current release version: 4.1.x.
Not a relational database.

BerkeleyDB Subsystems
Original major subsystems: Access Methods, Memory Pool, Locking, Logging.
New subsystem: Temporal Subsystem.

Role of the STP Module
Maintain the TT and TL tables.
Do the stamping (replace the stamper ID with the actual transaction time of the transaction).
Make TT survive system crashes.

STP Module
[Figure: STP module architecture; STP works with the logging and recovery functions, the MPOOL functions, the Txn functions, and the clock (CLK), and maintains the TT and TL tables]
A Page's Tour
[Figure: a record's page moves between memory and disk while database operations, the STAMPER, TL, and TT act on it]

Tuple Categories
According to transaction-time status:
type-1: unstamped & uncommitted
type-2: unstamped & committed
type-3: stamped
According to the storage status of its page: in-memory or on-disk.

Tuple Type Evolution
[Figure: a tuple created in the memory pool starts unstamped & uncommitted, becomes unstamped & committed when its transaction commits, and becomes stamped & committed when the stamper processes it; the same states are shown for in-memory and on-disk copies]

Data Structures
TL vs. TT:
All in-memory changes are kept in TL (not logged).
All on-disk changes are kept in TT (logged and recovery-protected).

TT Table
TT (Time Table), protected by logging and recovery. Fields:
ID: (EnvID, Txnid)
Time
In-memory unstamped page list
On-disk unstamped page count

TL Table
TL (Location Table): a non-logged, non-recoverable 2-D table whose rows are BT entries (one per buffered page) and whose columns are TT entries.
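The three tuple types can be derived from a tuple's stamper ID and the TT entry it names. A minimal sketch, assuming a C struct whose fields follow the TT slide (EnvID, Txnid, time, in-memory page list, on-disk page count); the type and function names and the linked-list representation of the page list are hypothetical. The slides do not say how a tuple's type is computed; reading it from the TT entry's time field, as below, is an assumption consistent with begin initializing the time to INVALID_TIME and commit filling it in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_TIME ((int64_t)-1) /* "not committed yet" */

/* Hypothetical node of a TT entry's in-memory unstamped page list. */
typedef struct PageNode {
    uint32_t pgno;
    struct PageNode *next;
} PageNode;

/* TT entry with the fields from the TT slide. */
typedef struct {
    uint32_t env_id;     /* ID, part 1: EnvID */
    uint32_t txnid;      /* ID, part 2: Txnid */
    int64_t time;        /* transaction time, or INVALID_TIME before commit */
    PageNode *mem_pages; /* in-memory unstamped page list */
    uint32_t od_pgcnt;   /* on-disk unstamped page count */
} TTEntry;

typedef enum {
    TYPE1_UNSTAMPED_UNCOMMITTED = 1,
    TYPE2_UNSTAMPED_COMMITTED = 2,
    TYPE3_STAMPED = 3
} TupleType;

/* 'writer' is the TT entry named by the tuple's stamper ID; it is ignored
 * for stamped tuples, which carry a real timestamp instead. */
static TupleType classify(bool stamped, const TTEntry *writer) {
    if (stamped)
        return TYPE3_STAMPED;
    /* An unstamped tuple's TT entry must not have been garbage collected. */
    assert(writer != NULL);
    if (writer->time == INVALID_TIME)
        return TYPE1_UNSTAMPED_UNCOMMITTED;
    return TYPE2_UNSTAMPED_COMMITTED;
}

/* Only type-2 tuples can be stamped right now; stamping makes them type-3. */
static bool can_stamp(bool stamped, const TTEntry *writer) {
    return classify(stamped, writer) == TYPE2_UNSTAMPED_COMMITTED;
}
```

The assertion in classify expresses why garbage collection must wait: a TT entry may only disappear once no tuple still points at it.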
How to Construct TL
Add a BT entry for each page read into memory and for each page created in memory.
Build the in-memory unstamped page lists over the BT entries: at a transaction's commit, get the list of WRITE-locked pages from the LOCK subsystem and add a node to the 2-D table for each of these pages.

Garbage Collecting the TT Table
Garbage collecting TT entries is necessary:
TT is an in-memory data structure.
TT grows as new transactions begin.
TT entry lookup is faster when TT is smaller.
A TT entry can be garbage collected once its in-memory unstamped page list is empty and its on-disk unstamped page count is zero.
Problem: tuples containing transaction IDs may move among pages.

Handling Tuple Movement
When does tuple movement happen?
B-tree page split / merge
Copying duplicate keys off the page
Solution: adjust the on-disk page count, and pass pages that receive moved records to STP so that their unstamped tuples are registered in TL.

Algorithm Overview
Begin / Abort / Commit
Handling tuple movement
pgread / pgwrite / fget
Buffer free
Log / Recovery: adding a new TT entry; modifying TT[i].od_pgcnt; backing up TT at checkpoint; rebuilding TT at recovery
Renovation

Begin: stp_txn_begin(DB_TXN *tid)
Add a TT entry for this transaction and initialize it: transaction time = INVALID_TIME; in-memory page list = EMPTY; on-disk page count = 0.

Abort: stp_txn_abort(DB_TXN *tid)
Garbage collect the TT entry for this transaction.
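The begin and abort handlers and the garbage-collection condition fit in a short sketch. Here tt_begin, tt_abort, tt_collectable, and tt_sweep are hypothetical stand-ins for the slides' stp_txn_begin and stp_txn_abort (which take a DB_TXN *, not an integer ID); the TT is a fixed-size array and the in-memory page list is reduced to its length. One added assumption: the sweep only collects committed entries, because a transaction that has just begun also has an empty page list and a zero on-disk count but must keep its entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_TIME ((int64_t)-1)
#define TT_MAX 64 /* hypothetical fixed TT capacity */

typedef struct {
    bool in_use;
    uint32_t txnid;
    int64_t time;      /* transaction time; INVALID_TIME until commit */
    size_t mem_pgcnt;  /* length of the in-memory unstamped page list */
    uint32_t od_pgcnt; /* on-disk unstamped page count */
} TTEntry;

typedef struct {
    TTEntry e[TT_MAX];
} TTTable;

static TTEntry *tt_find(TTTable *tt, uint32_t txnid) {
    for (size_t i = 0; i < TT_MAX; i++)
        if (tt->e[i].in_use && tt->e[i].txnid == txnid)
            return &tt->e[i];
    return NULL;
}

/* Like stp_txn_begin: add an entry and initialize it as on the slide. */
static TTEntry *tt_begin(TTTable *tt, uint32_t txnid) {
    for (size_t i = 0; i < TT_MAX; i++) {
        if (!tt->e[i].in_use) {
            tt->e[i] = (TTEntry){ .in_use = true, .txnid = txnid,
                                  .time = INVALID_TIME,
                                  .mem_pgcnt = 0, .od_pgcnt = 0 };
            return &tt->e[i];
        }
    }
    return NULL; /* table full */
}

/* The slide's GC condition: no in-memory and no on-disk unstamped pages. */
static bool tt_collectable(const TTEntry *e) {
    return e->mem_pgcnt == 0 && e->od_pgcnt == 0;
}

/* Like stp_txn_abort: the aborted transaction's entry is collected at once. */
static void tt_abort(TTTable *tt, uint32_t txnid) {
    TTEntry *e = tt_find(tt, txnid);
    assert(e != NULL); /* abort without a matching begin */
    e->in_use = false;
}

/* Collect every committed entry that satisfies the GC condition; active
 * entries are skipped because right after begin they also look "empty". */
static size_t tt_sweep(TTTable *tt) {
    size_t freed = 0;
    for (size_t i = 0; i < TT_MAX; i++) {
        TTEntry *e = &tt->e[i];
        if (e->in_use && e->time != INVALID_TIME && tt_collectable(e)) {
            e->in_use = false;
            freed++;
        }
    }
    return freed;
}
```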
Commit: stp_txn_commit(DB_TXN *tid)
Fill in the transaction time.
Get the list of pages WRITE-locked by this transaction and add them to TL as the in-memory page list of this transaction.

Handling Tuple Movement: stp_addbt(void *addrp)
If there is no entry for this page in the TL table, add one.
Scan the page and register in the TL table all the transactions that updated this page.

When Reading a Page: stp_pgread(void *addrp)
For each record in page *addrp, do the corresponding operation and update TL.
Update TT according to TL.

When Writing a Page: stp_pgwrite(void *addrp)
For each record in page *addrp, do the corresponding operation and update TL.
Modify the page LSN.

Per-record operations: the slide's table is only partially recoverable. It maps each record's Txnid, In_type, and Out_type to an operation; the operations listed are TT[]->pgcnt++, TT[]->pgcnt--, Do nothing, and Impossible (two cases).

Buffer Free: stp_bhfree(void *addrp)
Delete this page's BT entry and its row of nodes in TL.
Garbage collect the TT entries whose in-memory page list has become empty and whose od_pgcnt is zero.

Logging
Snapshot TT at checkpoint: write all TT entries with a positive od_pgcnt sequentially to the log.
Adding a new TT entry: whenever a new TT entry is created, log it.
Modifying TT[i].od_pgcnt: whenever TT[i].od_pgcnt is decreased, log it.
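Commit and buffer free cooperate through TL: commit adds one (page, transaction) node per WRITE-locked page, and freeing a buffer deletes that page's row, which may leave some TT entries collectable. A minimal sketch under these assumptions: a flat array of TL nodes replaces the 2-D table, an array of page numbers stands in for the list the LOCK subsystem would return, and pgwrite is assumed to have already updated od_pgcnt before the buffer is freed. StpState, sketch_commit, and sketch_bhfree are hypothetical names, not the STP functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_TIME ((int64_t)-1)
#define MAX_TXN 8    /* hypothetical TT capacity */
#define MAX_NODES 64 /* hypothetical TL capacity */

typedef struct {
    bool in_use;
    int64_t time;      /* commit time, or INVALID_TIME while active */
    uint32_t od_pgcnt; /* on-disk unstamped page count */
} TTEntry;

/* One TL node = one cell of the 2-D table: (page's BT entry, TT entry). */
typedef struct {
    bool used;
    uint32_t pgno; /* row: the buffered page */
    size_t txn;    /* column: index into the TT array */
} TLNode;

typedef struct {
    TTEntry tt[MAX_TXN];
    TLNode tl[MAX_NODES];
} StpState;

static bool tl_has(const StpState *s, uint32_t pgno, size_t txn) {
    for (size_t i = 0; i < MAX_NODES; i++)
        if (s->tl[i].used && s->tl[i].pgno == pgno && s->tl[i].txn == txn)
            return true;
    return false;
}

static void tl_add(StpState *s, uint32_t pgno, size_t txn) {
    if (tl_has(s, pgno, txn))
        return;
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (!s->tl[i].used) {
            s->tl[i] = (TLNode){ .used = true, .pgno = pgno, .txn = txn };
            return;
        }
    }
    assert(0 && "TL table full");
}

/* Length of a transaction's in-memory unstamped page list (its TL column). */
static size_t tl_pages_of(const StpState *s, size_t txn) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_NODES; i++)
        if (s->tl[i].used && s->tl[i].txn == txn)
            n++;
    return n;
}

/* Commit: fill in the time, then add a TL node per WRITE-locked page. */
static void sketch_commit(StpState *s, size_t txn, int64_t now,
                          const uint32_t *locked, size_t nlocked) {
    s->tt[txn].time = now;
    for (size_t i = 0; i < nlocked; i++)
        tl_add(s, locked[i], txn);
}

/* Buffer free: drop the page's TL row, then collect committed TT entries
 * with no in-memory and no on-disk unstamped pages. Returns how many. */
static size_t sketch_bhfree(StpState *s, uint32_t pgno) {
    for (size_t i = 0; i < MAX_NODES; i++)
        if (s->tl[i].used && s->tl[i].pgno == pgno)
            s->tl[i].used = false;

    size_t freed = 0;
    for (size_t t = 0; t < MAX_TXN; t++) {
        TTEntry *e = &s->tt[t];
        if (e->in_use && e->time != INVALID_TIME && e->od_pgcnt == 0 &&
            tl_pages_of(s, t) == 0) {
            e->in_use = false;
            freed++;
        }
    }
    return freed;
}
```

A linear scan keeps the sketch short; the future-work item about hash tables or AVL trees addresses exactly this lookup cost.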
TT Table Recovery: stp_recover()
Restore the TT table snapshot from the latest checkpoint log entry.
Scan from the latest checkpoint toward the end of the log, modifying the TT table according to the TT modification log entries.
[Figure: logging/recovery example]

Renovation
What is renovation? The process of asynchronously reading the on-disk pages and stamping them so that TT entries can be garbage collected.
Why renovation? TT is an in-memory data structure that grows as new transactions begin, so its entries need garbage collecting. Renovation is similar to the vacuum cleaner in POSTGRES.
How to renovate? A db_renovate utility works in parallel with user applications, sequentially scanning the database so that STP can stamp its pages.
[Figure: several user applications and db_renovate join the same DB environment and its shared regions]

Design Decisions
Higher level vs. lower level
Per database vs. per environment
In-memory strategy: stamping at commit vs. after commit
On-disk unstamped page tracking: in-memory vs. on-disk data structure
Renovation: at checkpointing vs. concurrent
How to identify the in-memory unstamped pages
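Recovery replays the TT log records (checkpoint snapshot, new entries, od_pgcnt decreases) on top of the latest checkpoint snapshot. The sketch below is hypothetical throughout: the LogRec layout, the record type names, and the choice to log the value of od_pgcnt after each decrease (which makes replay idempotent) are assumptions. How commit times themselves are recovered is not shown on the slides, so the sketch simply carries a time field on the records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_TIME ((int64_t)-1)
#define TT_MAX 16 /* hypothetical capacity */

typedef enum {
    REC_CKPT_BEGIN, /* start of a TT snapshot written at checkpoint */
    REC_CKPT_ENTRY, /* one snapshot entry (only entries with od_pgcnt > 0) */
    REC_TT_NEW,     /* a new TT entry was created */
    REC_TT_ODDEC    /* TT[i].od_pgcnt was decreased */
} RecType;

typedef struct {
    RecType type;
    uint32_t txnid;
    int64_t time;      /* used by REC_CKPT_ENTRY and REC_TT_NEW */
    uint32_t od_pgcnt; /* snapshot value, or the value after the decrease */
} LogRec;

typedef struct { uint32_t txnid; int64_t time; uint32_t od_pgcnt; } TTEntry;
typedef struct { TTEntry e[TT_MAX]; size_t n; } TTTable;

static TTEntry *tt_find(TTTable *tt, uint32_t txnid) {
    for (size_t i = 0; i < tt->n; i++)
        if (tt->e[i].txnid == txnid)
            return &tt->e[i];
    return NULL;
}

/* Insert a new entry or overwrite an existing one. */
static void tt_put(TTTable *tt, uint32_t txnid, int64_t time, uint32_t od_pgcnt) {
    TTEntry *e = tt_find(tt, txnid);
    if (e == NULL) {
        assert(tt->n < TT_MAX);
        e = &tt->e[tt->n++];
        e->txnid = txnid;
    }
    e->time = time;
    e->od_pgcnt = od_pgcnt;
}

static void sketch_recover(TTTable *tt, const LogRec *recs, size_t nrec) {
    /* 1. Find the latest checkpoint snapshot (start of the log if none). */
    size_t start = 0;
    for (size_t i = 0; i < nrec; i++)
        if (recs[i].type == REC_CKPT_BEGIN)
            start = i;

    /* 2. Restore the snapshot, then roll forward to the end of the log. */
    tt->n = 0;
    for (size_t i = start; i < nrec; i++) {
        const LogRec *r = &recs[i];
        switch (r->type) {
        case REC_CKPT_BEGIN:
            break;
        case REC_CKPT_ENTRY:
            tt_put(tt, r->txnid, r->time, r->od_pgcnt);
            break;
        case REC_TT_NEW:
            tt_put(tt, r->txnid, r->time, 0);
            break;
        case REC_TT_ODDEC: {
            TTEntry *e = tt_find(tt, r->txnid);
            if (e != NULL)
                e->od_pgcnt = r->od_pgcnt;
            break;
        }
        }
    }
}
```

Records written before the latest checkpoint are ignored because the snapshot already reflects them; only entries with a positive od_pgcnt need to be in it, since the others were collectable.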
[Figure: CPU time measurements]

Comparison with POSTGRES
[Table: T-BerkeleyDB, BerkeleyDB, and Postgres compared on 2PL, WAL, time support, time management, TT table, force at commit, and steal. Recoverable cells: T-BerkeleyDB: WAL, commit time, TT in memory, No; BerkeleyDB: No, WAL, No; Postgres: No, archive storage, commit time, TT as a regular relation.]

Future Work
Minimize the log flush overhead.
Use more efficient data structures to store TT and TL, such as hash tables, AVL trees, etc.

Summary
Choose commit time as the timestamp.
Maintain a TT table for timestamping.
Keep TT stable with the aid of the original logging/recovery system.
Use the auxiliary data structure TL to aid TT.
Minimal I/O overhead.

References
Richard T. Snodgrass, Michael H. Böhlen, Christian S. Jensen, and Andreas Steiner, "Transitioning Temporal Support in TSQL2 to SQL3."
Richard T. Snodgrass, "Temporal Database," lecture notes, CSc 630, Spring 2002.
Sleepycat, http://www.sleepycat.com/
Betty Salzberg, "Timestamping After Commit," IEEE, 1994.
Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, and Roberto Zicari, Advanced Database Systems, Morgan Kaufmann, 1997.
C. S. Jensen, J. Clifford, S. K. Gadia, A. Segev, and R. T. Snodgrass, "A Glossary of Temporal Database Concepts," SIGMOD Record, Vol. 21, No. 3, Sept. 1992.
References (cont.)
Michael Stonebraker, Lawrence A. Rowe, and Michael Hirohama, "The Implementation of POSTGRES," IEEE TKDE, Vol. 2, No. 1, March 1990.
Michael Stonebraker, "The Design of the POSTGRES Storage System," Proc. 13th VLDB, Sept. 1987.
Lawrence A. Rowe and Michael R. Stonebraker, "The POSTGRES Data Model," Proc. 13th VLDB, Brighton, 1987.
Raghu Ramakrishnan, Database Management Systems, WCB/McGraw-Hill, 1998.
Christian S. Jensen, Temporal Database Management, April 2000 (http://www.cs.auc.dk/~csj/thesis/).