What the Hekaton? In-Memory OLTP Overview
Kalen Delaney
www.sqlserverinternals.com
2016-05-10 Kalen Delaney, 2016

Kalen Delaney
Background:
- MS in Computer Science from UC Berkeley
- Working exclusively with SQL Server for 28 years
- SQL Server In-Memory OLTP Internals (RedGate 2014)
- Primary Author: SQL Server 2012 Internals (MS Press/O'Reilly, 2013)
- Author: SQL Server Concurrency (RedGate 2010)
- Primary Author: SQL Server 2008 Internals (MS Press, 2009)
- Primary Author: Inside SQL Server 2005: Query Tuning and Optimization (MS Press, 2007)
- Author: Inside SQL Server 2005: The Storage Engine (MS Press, 2006)
- SQL Server Magazine columnist and contributing editor
- Website: www.sqlserverinternals.com
- Twitter: @sqlqueen
- Blog: www.sqlblog.com

In-Memory OLTP Overview
SQL Server 2014 adds in-memory technology to boost performance of OLTP workloads
- Memory-optimized table and index structures
- Native compilation of business logic in stored procedures
- Latch- and lock-free data structures
- Fully integrated into SQL Server
- Stream-based storage
- Multi-versioning built into table structures
- Familiar management tools: SSMS/SMO/DMVs

Agenda
- In-memory Data Structures
- Managing Memory and Garbage Collection
- Persistence and Storage Management
- Logging and Recovery

In-Memory Data Structures
Rows
- New row format
  - Structure of the row is optimized for in-memory residency and access
- One copy of row
  - Indexes point to rows, they do not duplicate them
Indexes
- Hash index for equality search
- Memory-optimized B-tree for range and equality search
- Do not exist on disk; recreated during recovery
- Every table must have at least one index!

In-memory Table: Row Format
Row layout: Row header, then Payload (table columns)
Row header fields:
  Begin Ts        8 bytes
  End Ts          8 bytes
  StmtId          4 bytes
  IdxLinkCount    2 bytes
  Remainder       8 bytes * (IdxLinkCount - 1)
Key Points:
- Begin/End timestamps determine a row's validity
- There are no data or index pages, just rows
- Row size limited to 8060 bytes
  - Allows data to be moved to disk-based tables
- Not every SQL table schema is supported
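
The key point that Begin/End timestamps determine a row's validity can be sketched in a few lines. This is a minimal illustration, not the engine's actual logic: the `RowVersion` class, the `is_visible` helper, the half-open interval test, and the use of infinity for "not yet ended" are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # assumed stand-in for "no end timestamp yet"


@dataclass
class RowVersion:
    begin_ts: float  # timestamp of the transaction that created the version
    end_ts: float    # timestamp of the transaction that replaced or deleted it
    payload: tuple   # the table columns


def is_visible(row: RowVersion, read_ts: float) -> bool:
    """Assumed rule: a version is valid for readers with begin_ts <= read_ts < end_ts."""
    return row.begin_ts <= read_ts < row.end_ts


current = RowVersion(begin_ts=50, end_ts=INFINITY, payload=("Jane", "Prague"))
old = RowVersion(begin_ts=90, end_ts=150, payload=("Susan", "Bogota"))
```

A reader at timestamp 100 would see both versions; a reader at timestamp 200 would see only the first, because the second ended at 150.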

Indexes On Memory-Optimized Tables
Indexes are what combine rows into a table
Hash Indexes
- Defined as NONCLUSTERED HASH
- Predefined number of buckets (fixed memory size)
- Good for point lookups
Range Indexes
- Defined as NONCLUSTERED
- Stored as Bw-Tree
- Good for range searches and ordered scans
Indexes ONLY exist in memory

Hash Indexes
[Figure: rows laid out as Timestamps, Chain ptrs, Name, City. A hash index on Name and a hash index on City each point into the same rows through the rows' chain pointers. Rows shown: Jane, Prague (begin timestamp 50); John, Prague (begin timestamp 100); Susan, Bogota (timestamps 90, 150).]
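
The hash index picture (a fixed number of buckets, one chain pointer per index stored in the row, one copy of each row) can be sketched as follows. The `Row` and `HashIndex` classes, their methods, and the use of Python's built-in `hash` are assumptions for illustration only; this single-threaded sketch says nothing about how the real lock-free structure is implemented.

```python
class Row:
    def __init__(self, name, city):
        self.name = name
        self.city = city
        self.next = {}  # one chain pointer per index, keyed by index name


class HashIndex:
    def __init__(self, index_name, key_fn, bucket_count):
        self.index_name = index_name
        self.key_fn = key_fn
        self.buckets = [None] * bucket_count  # predefined, fixed number of buckets

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, row):
        b = self._bucket(self.key_fn(row))
        row.next[self.index_name] = self.buckets[b]  # link row into the bucket chain
        self.buckets[b] = row

    def lookup(self, key):
        row = self.buckets[self._bucket(key)]
        matches = []
        while row is not None:  # walk the chain; colliding keys share a bucket
            if self.key_fn(row) == key:
                matches.append(row)
            row = row.next[self.index_name]
        return matches


by_name = HashIndex("name", lambda r: r.name, bucket_count=8)
by_city = HashIndex("city", lambda r: r.city, bucket_count=8)
for r in (Row("Jane", "Prague"), Row("John", "Prague"), Row("Susan", "Bogota")):
    by_name.insert(r)  # both indexes point at the same row object
    by_city.insert(r)
```

Looking up "Prague" through `by_city` walks one bucket chain and returns the Jane and John rows, which are the same objects that `by_name` points to.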

Range Index
Core Characteristics
- Resizable: grows and shrinks with utilization
- Unidirectional
- Good performance for point lookup, excellent performance for range and table scan
- Lock free
- Bw-tree based
  - Leaf pages used to store index keys and pointers to chains of rows
  - Value chains have same characteristics as bucket chains for hash index

Bw-Tree
[Figure: a Page Mapping Table maps PageIDs to physical pages. The root page (keys 10, 20, 28) and the non-leaf pages hold PageIDs rather than memory addresses; leaf pages hold keys, and each key points to a chain of data rows.]
- Page size: up to 8K
- Logical pointers
  - Indirect physical pointers through Page Mapping table
  - Page Mapping table grows (doubles) as table grows
- Sibling pages linked in one direction
  - Requires two indexes for ASC/DESC
- No in-place updates on index pages
  - Handled through delta pages or building new pages
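
The two ideas on the Bw-Tree slide (indirection through a Page Mapping table that doubles as it grows, and delta records instead of in-place page updates) can be sketched like this. The `PageMappingTable` class, its method names, and the tuple encoding of pages are assumptions for illustration; a plain assignment stands in for whatever atomic operation the real lock-free structure uses.

```python
class PageMappingTable:
    """Maps logical PageIDs to the current physical page (hypothetical model)."""

    def __init__(self, capacity=2):
        self.slots = [None] * capacity
        self.used = 0

    def allocate(self, page):
        if self.used == len(self.slots):
            self.slots.extend([None] * len(self.slots))  # table doubles when full
        page_id = self.used
        self.slots[page_id] = page
        self.used += 1
        return page_id

    def install_delta(self, page_id, key):
        # No in-place update: the delta record points at the old page,
        # and only the mapping slot is switched to the delta.
        self.slots[page_id] = ("delta", key, self.slots[page_id])

    def keys(self, page_id):
        node = self.slots[page_id]
        found = []
        while node[0] == "delta":  # walk delta records down to the base page
            found.append(node[1])
            node = node[2]
        return sorted(found + list(node[1]))


table = PageMappingTable()
base = ("base", (10, 20))
pid = table.allocate(base)
table.install_delta(pid, 15)
```

Parent pages that refer to `pid` never need to change: after the delta is installed, `table.keys(pid)` reflects the new key while the base page itself is untouched.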

Point Lookups and Range Scans
- Point lookups similar to B-Trees
- Range scans
  - Search for starting point
  - Follow keys, duplicate chains and right page pointers until end key is reached
  - Uni-directional because pages linked in only one direction

Limitations on Tables in SQL 2014
Optimized for in-memory
- Rows are at most 8060 bytes; no off-row data
- No Large Object (LOB) types like varchar(max)
Scoping limitations
- No FOREIGN KEY and no CHECK constraints
- IDENTITY only (1,1)
- No schema changes (ALTER TABLE): need to drop/recreate table
- No add/remove index: need to drop/recreate table
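
The range-scan steps from the "Point Lookups and Range Scans" slide (find the starting point, then follow keys and right page pointers until the end key is reached) can be sketched as below. The `LeafPage` class and `range_scan` function are hypothetical, and the starting point is found by walking the leaf level from the left rather than descending the tree, purely to keep the example short.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeafPage:
    keys: List[int]
    right: Optional["LeafPage"] = None  # pages are linked in one direction only


def range_scan(first_leaf, start_key, end_key):
    page = first_leaf
    # Search for the starting point: skip pages that end before start_key.
    while page is not None and (not page.keys or page.keys[-1] < start_key):
        page = page.right
    result = []
    while page is not None:
        for key in page.keys[bisect_left(page.keys, start_key):]:
            if key > end_key:  # end key reached, stop the scan
                return result
            result.append(key)
        page = page.right  # follow the right page pointer
    return result


# Leaf keys taken from the Bw-Tree figure.
p3 = LeafPage([25, 26, 27])
p2 = LeafPage([6, 7, 8], right=p3)
p1 = LeafPage([1, 2, 4], right=p2)
```

Because there is no left pointer, the same structure cannot be walked from 27 down to 1, which is why a descending scan needs a second index.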

Memory Management
- Table data resides in memory at all times; no paging
- Must configure SQL box with sufficient memory to store memory-optimized tables; max supported 512GB
  - Failure to allocate memory will fail transactional workload at runtime
- Integrated with SQL Server memory manager and reacts to memory pressure where possible
- Integration with Resource Governor
  - Bind a database to a resource pool
  - Ensures memory consumption from recovery is accounted for
  - Hard limit (80% of phys. memory) to ensure system remains stable under low-memory situations

Garbage Collection
Stale Row Versions
- Updates, deletes, and aborted insert operations create row versions that (eventually) are no longer visible to any transaction
  - Slows down scans of index structures
  - Creates unused memory that needs to be reclaimed (i.e. garbage collected)
Garbage Collection (GC)
- Analogous to version store cleanup task for disk-based tables to support Read Committed Snapshot (RCSI)
- System maintains oldest active transaction information
- Design Goals: Non-blocking, Cooperative, Efficient, Responsive, Scalable
  - Active transactions work cooperatively and pick up parts of GC work
  - A dedicated system thread to do GC
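
How the oldest active transaction decides which row versions are stale can be sketched as follows. This is a simplified, single-threaded illustration: the `Version` class, the `collect_garbage` function, and the exact comparison used are assumptions, and nothing here models the cooperative or non-blocking aspects named in the design goals.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    begin_ts: float
    end_ts: float  # math.inf while the version is still current
    payload: str


def collect_garbage(versions, oldest_active_ts):
    """Split versions into (kept, reclaimed).

    Assumed rule: a version whose end_ts is at or before the oldest active
    transaction's timestamp can no longer be seen by any transaction.
    """
    kept = [v for v in versions if v.end_ts > oldest_active_ts]
    reclaimed = [v for v in versions if v.end_ts <= oldest_active_ts]
    return kept, reclaimed


chain = [
    Version(50, math.inf, "Jane v1"),
    Version(90, 150, "Susan v1"),   # replaced at timestamp 150
    Version(150, math.inf, "Susan v2"),
]
kept, reclaimed = collect_garbage(chain, oldest_active_ts=200)
```

With the oldest active transaction at timestamp 200, only "Susan v1" is reclaimed; if a transaction that started at timestamp 120 were still running, that version would have to stay.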

Durability
- Memory-optimized tables can be durable or non-durable
  - Default is durable
  - Non-durable tables are useful for transient data
- Durable tables are persisted in a single memory-optimized filegroup
  - Storage used for memory-optimized tables has a different access pattern than for disk tables
  - Filegroup can have multiple containers (volumes)
  - Additional containers aid in parallel recovery; recovery happens at the speed of I/O

On-disk Storage
- Filestream is the underlying storage mechanism
- Checksums and single-bit correcting ECC on files
Data files
- ~128MB in size (unless machine has <16GB memory), write 256KB chunks at a time
- Stores only the inserted rows (i.e. table content)
- Chronologically organized streams of row versions
Delta files
- File size is not constant, write 4KB chunks at a time
- Stores IDs of deleted rows

Storage: Data and Delta Files
[Figure: a Checkpoint File Pair covering the timestamp range 0 to 100. Each data file entry holds TS (ins), RowId, TableId and the row payload. Each delta file entry holds TS (ins), RowId and TS (del).]

Populating Data/Delta files
[Figure: log records in the SQL transaction log on disk (inserts into T1; deletes by Tran1 of the row with TS 150, by Tran2 of the row with TS 450, and by Tran3 of the row with TS 250) are read by the offline checkpoint thread and written to the memory-optimized table filegroup. The filegroup holds file pairs for the ranges 100-199, 200-299, 300-399, 400-499 and 500-. New inserts go to the newest range; the deletes for TS 150, TS 250 and TS 450 go to the delta files of the ranges that contain those rows.]
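
The routing shown in the "Populating Data/Delta files" figure can be sketched as follows: an insert is appended to the data file of the pair whose timestamp range covers it, and a delete is appended to the delta file of the pair that holds the deleted row. The `CheckpointFilePair` class, the helper functions, the row IDs, and the delete timestamps are hypothetical; only the ranges and the inserted-row timestamps 150, 250 and 450 come from the figure.

```python
import math


class CheckpointFilePair:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.data = []   # entries: (insert_ts, row_id, table_id, payload)
        self.delta = []  # entries: (insert_ts, row_id, delete_ts)

    def covers(self, ts):
        return self.low <= ts <= self.high


def find_pair(pairs, ts):
    return next(p for p in pairs if p.covers(ts))


def apply_insert(pairs, insert_ts, row_id, table_id, payload):
    find_pair(pairs, insert_ts).data.append((insert_ts, row_id, table_id, payload))


def apply_delete(pairs, insert_ts, row_id, delete_ts):
    # The delete is recorded next to the data file that contains the row.
    find_pair(pairs, insert_ts).delta.append((insert_ts, row_id, delete_ts))


pairs = [CheckpointFilePair(100, 199), CheckpointFilePair(200, 299),
         CheckpointFilePair(300, 399), CheckpointFilePair(400, 499),
         CheckpointFilePair(500, math.inf)]  # "500-" is the open range

apply_insert(pairs, 520, row_id=7, table_id=1, payload=("new row",))
apply_delete(pairs, insert_ts=150, row_id=1, delete_ts=510)  # delete_ts is made up
apply_delete(pairs, insert_ts=450, row_id=2, delete_ts=511)
apply_delete(pairs, insert_ts=250, row_id=3, delete_ts=512)
```

Data files only ever receive appended inserts, while deletes accumulate in the delta file of whichever older range holds the affected row.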

Logging for Memory-Optimized Tables
- Uses SQL transaction log to store content
  - Each HK log record contains a log record header followed by opaque memory-optimized-specific log content
- All logging for memory-optimized tables is logical
  - No log records for physical structure modifications
  - No index-specific / index-maintenance log records
  - No UNDO information is logged

Backup for Memory-Optimized Tables
- Memory-optimized filegroup is backed up as part of the SQL database backup
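
Why no UNDO information is needed can be illustrated with a small model in which a transaction collects its logical changes and writes them to the log only when it commits. The `Transaction` class, its methods, and the shape of the log record are assumptions for the example, not the actual log record format, which the slide describes as opaque.

```python
class Transaction:
    def __init__(self, log):
        self.log = log      # shared list standing in for the transaction log
        self.writes = []    # logical operations, buffered until commit

    def insert(self, table, row):
        self.writes.append(("insert", table, row))

    def delete(self, table, row_id):
        self.writes.append(("delete", table, row_id))

    def commit(self, commit_ts):
        # One logical record per committed transaction; no index or page changes.
        self.log.append({"commit_ts": commit_ts, "ops": list(self.writes)})

    def abort(self):
        # Nothing reached the log, so there is nothing to undo.
        self.writes.clear()


log = []
t1 = Transaction(log)
t1.insert("T1", ("Jane", "Prague"))
t1.abort()

t2 = Transaction(log)
t2.insert("T1", ("John", "Prague"))
t2.delete("T1", row_id=3)
t2.commit(commit_ts=100)
```

After both transactions finish, the log holds a single record for `t2`; the aborted `t1` left no trace.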

Recovery for Memory-Optimized Tables
- Analysis Phase
  - Finds the last completed checkpoint in transaction log
- Data Load
  - Load from set of data/delta files from the last completed checkpoint
  - Parallel load by reading data/delta files using 1 thread / file
- Redo phase to apply tail of the log
  - Apply the transaction log from last checkpoint
  - Concurrent with REDO on disk-based tables
- No UNDO phase for memory-optimized tables
  - Only committed transactions are logged

Recovery: Parallel load
[Figure: three Recovery Data Loaders, one per file pair (Data File1/Delta File1, Data File2/Delta File2, Data File3/Delta File3), spread across Memory Optimized Container 1 and Container 2. Each loader builds a delta map from the delta file, uses it as a filter over the data file, and loads the remaining rows into the memory-optimized tables.]
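
The parallel data load can be sketched as one worker per data/delta file pair, each filtering its data file through a delta map. The function names, the tuple layouts, and the use of Python's `ThreadPoolExecutor` are assumptions for illustration; the redo of the log tail is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def load_pair(pair):
    """Load one data file, skipping rows listed in its paired delta file."""
    data_rows, delta_rows = pair
    # Delta map: identifies deleted rows by (insert_ts, row_id).
    deleted = {(insert_ts, row_id) for insert_ts, row_id, _delete_ts in delta_rows}
    return [row for row in data_rows if (row[0], row[1]) not in deleted]


def recover(pairs):
    # One thread per file pair, as on the slide.
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        loaded = list(pool.map(load_pair, pairs))
    return [row for rows in loaded for row in rows]


# Data rows: (insert_ts, row_id, table_id, payload); delta rows: (insert_ts, row_id, delete_ts)
pairs = [
    ([(110, 1, 1, "a"), (150, 2, 1, "b")], [(150, 2, 510)]),
    ([(250, 3, 1, "c")], [(250, 3, 512)]),
    ([(520, 4, 1, "d")], []),
]
rows = recover(pairs)
```

Only rows that were never deleted reach memory, so the load ends with rows "a" and "d"; indexes would then be rebuilt from the loaded rows, since they are not stored on disk.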

Summary
- In-memory Data Structures
- Managing Memory and Garbage Collection
- Persistence and Storage Management
- Logging and Recovery