SQL Server 2014: In-Memory OLTP for Database Administrators


Presenter: Sunil Agarwal
Moderator: Angela Henry

Session Objectives and Takeaways
Session objective: understand SQL Server 2014 In-Memory OLTP
- Memory usage and management
- Storage usage and management
Key takeaway 1: Memory-optimized tables have new memory-optimized data structures and must fit in memory.
Key takeaway 2: Durability of memory-optimized tables requires consideration for storage and recovery.

Memory Usage and Management

In-Memory Data Structures
Rows
- New row format: the structure of the row is optimized for memory residency and access
- No data page containers
- Rows are versioned (payload never updated in place); an update is a delete followed by an insert
Indexes
- Nonclustered indexes only
- Indexes point to physical rows, so there is no need for covering columns
- Nonclustered hash index for point lookups
- Nonclustered index for range (inequality) and ordered scans
- Not logged and do not exist on disk; maintained online or recreated during recovery
- No index fragmentation on disk

Memory-Optimized Table: Row Format
Row layout: row header, followed by the payload (table columns)
Row header fields
- Begin Ts: 8 bytes
- End Ts: 8 bytes
- StmtId: 4 bytes
- IdxLinkCount: 2 bytes + 2 bytes padding
- Index pointers: 8 bytes * IdxLinkCount
Key points
- Begin/End timestamps determine a row version's validity and visibility
- No data pages; just rows
- Row size is limited to 8060 bytes (checked at table creation time) so that data can be moved to a disk-based table
- Not every SQL table schema is supported (for example, LOB and sql_variant columns are not supported)
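The header arithmetic and the timestamp rule can be sketched in a few lines of Python. This is an illustrative model, not SQL Server code: the names `row_header_bytes` and `RowVersion` are hypothetical, and the half-open visibility interval (begin inclusive, end exclusive) is an assumption about how the Begin/End timestamps are compared.

```python
from dataclasses import dataclass

INFINITY = float("inf")  # stand-in for "version has not been ended yet"

# Header field sizes in bytes, as listed on the slide.
BEGIN_TS = 8
END_TS = 8
STMT_ID = 4
IDX_LINK_COUNT = 2 + 2  # 2-byte counter plus 2 bytes of padding
INDEX_POINTER = 8       # one chain pointer per index on the table


def row_header_bytes(index_count: int) -> int:
    """Header size for a row in a table that has `index_count` indexes."""
    return BEGIN_TS + END_TS + STMT_ID + IDX_LINK_COUNT + INDEX_POINTER * index_count


@dataclass
class RowVersion:
    begin_ts: float
    end_ts: float
    payload: tuple

    def visible_at(self, read_ts: float) -> bool:
        # Assumed rule: a version is valid from begin_ts up to, not including, end_ts.
        return self.begin_ts <= read_ts < self.end_ts


old_john = RowVersion(100, 200, ("John", "Prague"))
new_john = RowVersion(200, INFINITY, ("John", "Beijing"))
```

Under these assumptions a table with two indexes carries 24 + 16 = 40 bytes of header per row version, and a reader at timestamp 150 sees `old_john` while a reader at 250 sees `new_john`.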

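Memory-Optimized Tables: Hash Indexes
[Figure: row versions, each with timestamps, chain pointers, and Name and City columns, linked from the buckets of two hash indexes.]
Row versions (Begin Ts, End Ts, Name, City)
- 50, (no end timestamp shown), Jane, Prague
- 100, 200, John, Prague
- 200, (no end timestamp shown), John, Beijing
- 90, 150, Susan, Bogota
Hash index on Name: buckets f(Jane), f(John)
Hash index on City: buckets f(Prague), f(Beijing), f(Bogota)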
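A minimal sketch of the figure's idea follows: one set of row versions, reachable through two hash indexes, with a point lookup that walks a bucket chain and filters by timestamp. The `HashIndex` class, the bucket count, and the use of infinity for the two end timestamps that are blank in the transcript are all assumptions made for illustration.

```python
INFINITY = float("inf")  # assumed value for end timestamps not shown in the figure


class HashIndex:
    """Toy hash index: each bucket holds a chain of references to row versions."""

    def __init__(self, key_column: str, bucket_count: int = 8):
        self.key_column = key_column
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, row: dict) -> None:
        # The index stores a reference to the row itself, not a copy of its columns.
        self._bucket(row[self.key_column]).append(row)

    def lookup(self, key, read_ts):
        # Walk the chain, skipping hash collisions and versions not valid at read_ts.
        return [
            row for row in self._bucket(key)
            if row[self.key_column] == key and row["begin"] <= read_ts < row["end"]
        ]


rows = [
    {"begin": 50, "end": INFINITY, "name": "Jane", "city": "Prague"},
    {"begin": 100, "end": 200, "name": "John", "city": "Prague"},
    {"begin": 200, "end": INFINITY, "name": "John", "city": "Beijing"},
    {"begin": 90, "end": 150, "name": "Susan", "city": "Bogota"},
]

by_name = HashIndex("name")
by_city = HashIndex("city")
for row in rows:
    by_name.insert(row)
    by_city.insert(row)
```

Because both indexes hold references to the same row objects, neither needs covering columns: `by_name.lookup("John", 150)` and `by_city.lookup("Prague", 150)` both reach the full John/Prague version.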

Memory-Optimized Tables: Bw-Tree
[Figure: a page mapping table translating logical page IDs to physical pages, with a root page, non-leaf pages, and leaf pages whose keys point to data rows.]
- Page size is up to 8K, sized to the row
- Logical pointers: indirect physical pointers through the page mapping table
- Page mapping table grows (doubles) as the table grows
- Sibling pages are linked in one direction, so ASC and DESC ordering require two indexes
- No in-place updates on index pages; changes are handled through delta pages or by building new pages
- No covering columns (only the key is stored)
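The two mechanisms named above, a mapping table that doubles and updates applied as delta records instead of in place, can be illustrated with a small single-threaded model. Everything here (`PageMappingTable`, the tuple encoding of pages and deltas) is hypothetical; the real structure is latch-free and far more involved.

```python
class PageMappingTable:
    """Toy model: logical page ID -> current head of that page's delta chain."""

    def __init__(self, capacity: int = 4):
        self.slots = [None] * capacity
        self.used = 0

    def allocate(self, keys) -> int:
        if self.used == len(self.slots):
            # The mapping table doubles when it runs out of slots.
            self.slots.extend([None] * len(self.slots))
        page_id = self.used
        self.slots[page_id] = ("base", tuple(keys))
        self.used += 1
        return page_id

    def prepend_delta(self, page_id: int, op: str, key) -> None:
        # No in-place update: the old page stays untouched and a delta
        # record pointing at it becomes the new head for this logical ID.
        self.slots[page_id] = ("delta", op, key, self.slots[page_id])

    def keys(self, page_id: int):
        node = self.slots[page_id]
        deltas = []
        while node[0] == "delta":
            deltas.append(node)
            node = node[3]
        result = set(node[1])
        for _, op, key, _ in reversed(deltas):  # apply oldest delta first
            if op == "insert":
                result.add(key)
            else:
                result.discard(key)
        return sorted(result)


table = PageMappingTable(capacity=4)
leaf = table.allocate([1, 2])
base_page = table.slots[leaf]
table.prepend_delta(leaf, "insert", 3)
table.prepend_delta(leaf, "delete", 1)
```

Other pages refer to `leaf` by its logical ID, so swapping the head of the chain in one mapping-table slot is enough to publish the change.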

Memory Management
Data resides in memory at all times
- Must configure SQL Server with sufficient memory
- Failure to allocate memory will fail the transactional workload at run time
Integrated with the SQL Server memory manager
- Reacts to memory pressure for garbage collection (GC)
- Guidance for SQL Server 2014 is not to exceed 256 GB of in-memory table user data
Integrated with Resource Governor
- Recommend using a dedicated resource pool to ensure performance stability for disk-based table workloads
- Bind a database to a resource pool
- Memory-optimized tables cannot exceed the limit of the resource pool
- Hard top limit to ensure the system remains stable under memory pressure

Garbage Collection
Stale row versions
- Updates, deletes, and aborted insert operations create row versions that (eventually) are no longer visible to any transaction
- Stale versions slow down scans of index structures
- Stale versions occupy memory that needs to be reclaimed (garbage collected)
Garbage collection (GC)
- Analogous to the version store cleanup task for disk-based tables that supports Read Committed Snapshot Isolation (RCSI)
- System maintains an oldest active transaction hint
GC design
- Non-blocking, cooperative, efficient, responsive, scalable
- Active transactions work cooperatively and pick up parts of the GC work
- A dedicated system thread for GC

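Memory-Optimized Tables: Garbage Cleanup
[Figure: the same row versions and hash indexes on Name and City as in the hash index slide, with T250 marked as the lowest active transaction timestamp.]
Row versions (Begin Ts, End Ts, Name, City)
- 50, (no end timestamp shown), Jane, Prague
- 100, 200, John, Prague
- 200, (no end timestamp shown), John, Beijing
- 90, 150, Susan, Bogota
T250: lowest active transaction timestamp
Query: Select * from <table> where name = John
Automatic garbage collection kicks in: the versions that ended at 200 and 150 ended before T250, so no active transaction can see them.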
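The cleanup decision in this example can be sketched as a filter on end timestamps. The function name `collect_garbage` is hypothetical, infinity stands in for the end timestamps that are blank in the transcript, and the strict comparison is a conservative assumption rather than the engine's documented rule.

```python
INFINITY = float("inf")  # assumed for end timestamps not shown in the figure


def collect_garbage(versions, oldest_active_ts):
    """Split row versions into (live, garbage).

    A version is treated as garbage once its end timestamp is older than
    the oldest active transaction, because no transaction can see it.
    """
    live, garbage = [], []
    for version in versions:
        begin_ts, end_ts, name, city = version
        (garbage if end_ts < oldest_active_ts else live).append(version)
    return live, garbage


versions = [
    (50, INFINITY, "Jane", "Prague"),
    (100, 200, "John", "Prague"),
    (200, INFINITY, "John", "Beijing"),
    (90, 150, "Susan", "Bogota"),
]

live, garbage = collect_garbage(versions, oldest_active_ts=250)
```

With the oldest active transaction at 250, `garbage` holds the John/Prague and Susan/Bogota versions. In the cooperative design described on the previous slide, a scan such as the query on John could unlink the stale John/Prague version as it passes over it.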

Memory Usage and Monitoring

Demo Memory Usage

In-Memory OLTP - Storage

Disk-Based Tables: Storage Revisited
Page based
- Latch contention due to concurrent access
- Random IO when a modified page is flushed to disk
- Checkpoint can flood the IO subsystem and cause a temporary glitch in the workload
Transaction log
- Each change to a data or index row is logged separately
- UNDO is part of the transaction log for rollback

Durability
Memory-optimized tables can be durable or non-durable
- Default is durable
- Non-durable tables are useful for transient data (relational cache scenario)
Transaction log
- Only committed transactions are logged
- No UNDO is logged
- The log is a common system bottleneck; SSD or Fusion-IO recommended
- Delayed durability is supported, defined at the stored procedure or transaction level
Durable tables are persisted in the memory-optimized filegroup
- Storage used for memory-optimized tables has a different access pattern (sequential IO) than storage for disk-based tables
- Recommend SSD or fast (for example, 15K) SAS drives
- The filegroup can have multiple containers (volumes) to aid parallel recovery; recovery typically happens at the speed of IO

Storage in Memory-Optimized Filegroup
- Filestream is the underlying storage mechanism
- Checksums and single-bit correcting ECC on files
Data files
- Size: 16 MB on machines with <= 16 GB of memory; 128 MB on larger machines
- Written in 256 KB chunks
- Store only the inserted rows (i.e. table content)
- Chronologically organized streams of row versions
Delta files
- Size: 1 MB on machines with <= 16 GB of memory; 8 MB on larger machines
- Written in 4 KB chunks
- Store IDs of deleted rows

Checkpoint File Pair Storage: Data and Delta Files
[Figure: a data file and its delta file covering the transaction timestamp range 0 to 100.]
Data file
- Contains rows inserted within a given transaction timestamp range
- Each entry: TS (ins), RowId, TableId, row payload
Delta file
- Each entry: TS (ins), RowId, TS (del)

Populating Data/Delta Files
[Figure: the offline checkpoint thread reads the SQL transaction log (from LogPool) and populates the files of the memory-optimized table filegroup.]
- File pairs cover the timestamp ranges 100-200, 200-300, 300-400, 400-500, and 500-
- Log records shown: inserts into T1, and deletes by Tran1 (row TS 150), Tran2 (row TS 450), and Tran3 (row TS 250)
- New inserts are written to a data file
- The deletes of the rows with TS 150, TS 250, and TS 450 are written to delta files
- Legend: data file with rows generated in a timestamp range; IDs of deleted rows (height indicates % deleted)
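One way to read the figure is that each log record is routed by timestamp: an insert goes to the data file whose range contains its timestamp, and a delete goes to the delta file paired with the data file that holds the deleted row. The sketch below models that reading. The class `CheckpointFiles`, the half-open ranges, and the sample row IDs and delete timestamps are assumptions for illustration.

```python
import bisect


class CheckpointFiles:
    """Toy set of data/delta file pairs, one pair per timestamp range."""

    def __init__(self, lower_bounds):
        # [100, 200, 300, 400, 500] means [100,200), [200,300), ..., [500, open)
        self.lower_bounds = list(lower_bounds)
        self.data = [[] for _ in self.lower_bounds]   # (ts_ins, row_id, table_id, payload)
        self.delta = [[] for _ in self.lower_bounds]  # (ts_ins, row_id, ts_del)

    def pair_for(self, ts: int) -> int:
        index = bisect.bisect_right(self.lower_bounds, ts) - 1
        if index < 0:
            raise ValueError(f"timestamp {ts} precedes the first file pair")
        return index

    def apply_insert(self, ts_ins, row_id, table_id, payload):
        self.data[self.pair_for(ts_ins)].append((ts_ins, row_id, table_id, payload))

    def apply_delete(self, ts_ins, row_id, ts_del):
        # Routed by the row's insert timestamp, so the delete lands next to
        # the data file that contains the row being deleted.
        self.delta[self.pair_for(ts_ins)].append((ts_ins, row_id, ts_del))


files = CheckpointFiles([100, 200, 300, 400, 500])
files.apply_delete(ts_ins=150, row_id=1, ts_del=510)  # Tran1
files.apply_delete(ts_ins=450, row_id=2, ts_del=520)  # Tran2
files.apply_delete(ts_ins=250, row_id=3, ts_del=530)  # Tran3
files.apply_insert(ts_ins=540, row_id=4, table_id="T1", payload=("new row",))
```

Under this model the data files are append-only: a delete never rewrites a data file, it only adds a small entry to the matching delta file.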

Merge Operations
What is a merge operation?
- Merges one or more adjacent data/delta file pairs into one pair
Need for merge
- Deleting rows causes data files to have stale rows
- The DMV sys.dm_xtp_checkpoint_files can be used to find inserted/deleted rows and free space
Benefits of merge
- Reduces the storage (i.e. fewer data/delta files) required to store active data rows
- Improves recovery time because there are fewer files to load
Merge is a (non-blocking) background operation
- Merge does not block concurrent deletes in the affected file pairs

Merge Operation
[Figure: the memory-optimized data filegroup before and after merging the range 200-400.]
- Files as of time 500: ranges 100-200, 200-300, 300-400, 400-500
- Files as of time 600: ranges 100-200, 200-400 (merged), 400-500, 500-600; the source pairs for 200-300 and 300-400 are shown as files under merge and then as deleted files
- Legend: data file with rows generated in a timestamp range; IDs of deleted rows (height indicates % deleted); files under merge; deleted files
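The effect of the merge in the figure can be approximated as follows: adjacent pairs are combined into one pair that spans both ranges and keeps only the rows not listed in a delta file. The dictionary layout, the function `merge_pairs`, and the sample rows are hypothetical, chosen only to make the idea concrete.

```python
def merge_pairs(pairs):
    """Merge adjacent data/delta file pairs into a single pair.

    Each pair is a dict with keys "low", "high", "data", and "delta".
    Data entries are (ts_ins, row_id, payload); delta entries are
    (ts_ins, row_id, ts_del).
    """
    for left, right in zip(pairs, pairs[1:]):
        if left["high"] != right["low"]:
            raise ValueError("file pairs must cover adjacent timestamp ranges")

    surviving = []
    for pair in pairs:
        deleted = {(ts_ins, row_id) for ts_ins, row_id, _ in pair["delta"]}
        surviving.extend(
            entry for entry in pair["data"] if (entry[0], entry[1]) not in deleted
        )
    # Deleted rows were dropped, so the merged pair starts with an empty delta file.
    return {"low": pairs[0]["low"], "high": pairs[-1]["high"],
            "data": surviving, "delta": []}


pair_200_300 = {
    "low": 200, "high": 300,
    "data": [(210, 1, "a"), (250, 2, "b"), (290, 3, "c")],
    "delta": [(250, 2, 530)],
}
pair_300_400 = {
    "low": 300, "high": 400,
    "data": [(310, 4, "d"), (350, 5, "e")],
    "delta": [(310, 4, 540), (350, 5, 550)],
}

merged = merge_pairs([pair_200_300, pair_300_400])
```

Here five stored rows across two pairs shrink to two active rows in one pair covering 200-400, which is the storage and recovery-time benefit listed on the previous slide.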

Logging for Memory-Optimized Tables
- Uses the SQL Server transaction log to store content
- Log record: a log record header followed by opaque log content specific to memory-optimized tables
- All logging for memory-optimized tables is logical
  - No log records for physical structure modifications
  - No index-specific or index-maintenance log records
  - No UNDO information is logged
- All three recovery models (Simple, Full, Bulk-logged) are supported

In-Memory OLTP: Checkpoint
Disk-based tables
- Frequency is determined by the recovery interval configuration option
- Can potentially flood the IO queue when flushing dirty pages
In-Memory OLTP
- Automatic checkpoint is done after 512 MB of transaction log has been written since the last checkpoint
- Independent of the checkpoint for disk-based tables
- No flooding of the IO queue; the checkpoint writes meta-information (e.g. the list of data/delta files)
Manual checkpoint
- Forces a checkpoint for both disk-based tables and memory-optimized tables
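The automatic trigger amounts to a running counter of log bytes that resets at each checkpoint. The sketch below is a hypothetical model of that bookkeeping: `CheckpointTrigger` is not a SQL Server interface, and it assumes 512 MB means 512 * 1024 * 1024 bytes and that reaching the threshold exactly is enough to fire.

```python
AUTO_CHECKPOINT_BYTES = 512 * 1024 * 1024  # assumed binary megabytes


class CheckpointTrigger:
    """Counts log bytes written since the last checkpoint."""

    def __init__(self, threshold: int = AUTO_CHECKPOINT_BYTES):
        self.threshold = threshold
        self.bytes_since_checkpoint = 0
        self.checkpoints_taken = 0

    def on_log_write(self, byte_count: int) -> bool:
        """Record a log write; return True if it triggered an automatic checkpoint."""
        self.bytes_since_checkpoint += byte_count
        if self.bytes_since_checkpoint >= self.threshold:
            self.checkpoint()
            return True
        return False

    def checkpoint(self) -> None:
        # Also what a manual checkpoint would do in this model: reset the counter.
        self.checkpoints_taken += 1
        self.bytes_since_checkpoint = 0


trigger = CheckpointTrigger()
first = trigger.on_log_write(300 * 1024 * 1024)   # below the threshold
second = trigger.on_log_write(300 * 1024 * 1024)  # crosses it
```

The point of the model is that checkpoint frequency for memory-optimized tables follows log volume, not the recovery interval setting that drives disk-based checkpoints.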

Storage Considerations

Backup for Memory-Optimized Tables
- The memory-optimized filegroup is backed up as part of the SQL Server database backup
- Differential backup is supported: it includes only the data files that are new since the last full backup
- Piecemeal backups are supported

In-Memory OLTP: Recovery
[Figure: three recovery data loaders populate the memory-optimized tables in parallel. Each loader reads one data file (Data File1, Data File2, Data File3) through a filter built from the delta map of the matching delta file (Delta File1, Delta File2, Delta File3). The files are spread across Memory Optimized Container 1 and Memory Optimized Container 2.]
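The loader-and-filter arrangement in the figure can be mimicked with a thread pool: each worker loads one data file and skips the rows named in that pair's delta map. This is a simplified sketch with hypothetical names (`load_pair`, `recover`) and an in-memory stand-in for the files; it does not model reading the log or rebuilding indexes.

```python
from concurrent.futures import ThreadPoolExecutor


def load_pair(pair):
    """Load one data file, filtering out rows listed in its delta file."""
    delta_map = {(ts_ins, row_id) for ts_ins, row_id, _ in pair["delta"]}
    return [
        (ts_ins, row_id, payload)
        for ts_ins, row_id, payload in pair["data"]
        if (ts_ins, row_id) not in delta_map
    ]


def recover(pairs, loaders: int = 3):
    """Rebuild the in-memory table from all file pairs using parallel loaders."""
    with ThreadPoolExecutor(max_workers=loaders) as pool:
        loaded = list(pool.map(load_pair, pairs))  # one task per data/delta pair
    table = {}
    for rows in loaded:
        for ts_ins, row_id, payload in rows:
            table[row_id] = (ts_ins, payload)
    return table


pairs = [
    {"data": [(110, 1, "a"), (150, 2, "b")], "delta": [(150, 2, 510)]},
    {"data": [(250, 3, "c")], "delta": [(250, 3, 530)]},
    {"data": [(310, 4, "d"), (390, 5, "e")], "delta": []},
]

table = recover(pairs)
```

Since each pair is loaded independently, spreading the files over several containers lets the loaders read from different volumes at once, which is why the durability slide ties recovery time to IO speed.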

In-Memory OLTP: High Availability
AlwaysOn Failover Clustering
- Fully supported
AlwaysOn Availability Groups
- All configurations supported
- Data is kept in memory on the secondary replica
- Instant failover
- Query workload can be offloaded to a readable secondary
- Backups can be offloaded to a secondary replica
Transactional replication
- Only supported as a subscriber

Demo Logging and Storage

Session Objectives and Takeaways
Session objective: understand SQL Server 2014 In-Memory OLTP
- Memory usage
- Storage usage
Key takeaway 1: Memory-optimized tables have new memory-optimized data structures and must fit in memory.
Key takeaway 2: Durability of memory-optimized tables requires consideration for storage and recovery.

Thank You for Attending