YCSB++ Benchmarking Tool: Performance Debugging Advanced Features of Scalable Table Stores. Swapnil Patil, Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs*, Billie Rinaldi*. Carnegie Mellon University, *National Security Agency. Open Cirrus Summit, October 2011, Atlanta GA
Scalable table stores are critical systems: for data processing & analysis (e.g., Pregel, Hive) and for systems services (e.g., Google Colossus metadata). Swapnil Patil, CMU 2
Evolution of scalable table stores. Growing set of HBase features across the 2008–2011+ HBASE releases: RangeRowFilters, batch updates, bulk-load tools, RegEx filtering, scan optimizations, co-processors, access control. Simple, lightweight stores have evolved into complex, feature-rich stores. This supports a broader range of applications and services, but makes performance problems hard to debug and understand, because of the complex behavior and interaction of the various components.
We need richer tools for understanding advanced features in table stores. YCSB++ functionality: ZooKeeper-based distributed and coordinated testing; API extensions for the new Apache ACCUMULO DB; fine-grained, correlated monitoring using OTUS. Features tested using YCSB++: batch writing, table pre-splitting, bulk loading, weak consistency, server-side filtering, fine-grained security. Tool released at http://www.pdl.cmu.edu/ycsb++
Outline: Problem; YCSB++ design; Illustrative examples; Ongoing work and summary
Yahoo Cloud Serving Benchmark [Cooper2010]. [Architecture diagram: a workload parameter file and command-line parameters drive a workload executor whose threads issue operations through DB clients (HBASE and other DBs) to the storage servers, emitting stats.] For CRUD (create-read-update-delete) benchmarking. Single-node system with an extensible API.
YCSB++: New extensions. [Architecture diagram: the YCSB pipeline extended with new parameters, extended workload executors, and a DB client for ACCUMULO.] Added support for the new Apache ACCUMULO DB. New parameters and workload executors.
YCSB++: Distributed & parallel tests. [Architecture diagram: multiple YCSB clients running multi-phase workloads, coordinated through ZooKeeper, against the storage servers.] Multi-client, multi-phase coordination using ZooKeeper. Enables testing at large scales and testing asymmetric features.
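The coordination idea on this slide is a phase barrier: no client starts the next workload phase until every client has finished the current one. YCSB++ implements this with ZooKeeper; the sketch below is a minimal in-process stand-in using Python's `threading.Barrier`, with all names (`PhaseBarrier`, `enter_phase`) hypothetical.

```python
import threading

class PhaseBarrier:
    """Toy stand-in for the ZooKeeper-based multi-client coordination:
    every client runs its share of a phase, then blocks at the barrier
    until all clients have finished that phase."""

    def __init__(self, num_clients):
        self._barrier = threading.Barrier(num_clients)

    def enter_phase(self, client_id, phase_name, work):
        result = work()        # this client's share of the phase
        self._barrier.wait()   # block until every client reaches this point
        return result
```

In the real tool the barrier state lives in ZooKeeper znodes, so clients on different machines can coordinate; threads here only illustrate the control flow.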
YCSB++: Collective monitoring. [Architecture diagram: the OTUS monitoring service collects metrics from the YCSB clients and the storage servers alongside the coordinated pipeline.] OTUS monitor built on Ganglia [Ren2011]. Collects information from YCSB, table stores, HDFS and the OS.
Example of YCSB++ debugging: monitoring resource usage and table-store metrics. [Plot: HDFS DataNode CPU usage and Accumulo TabletServer CPU usage (%), overlaid with the avg. number of StoreFiles per tablet, over time.] OTUS collects fine-grained information from both the HDFS process and the TabletServer process on the same node.
Outline: Problem; YCSB++ design; Illustrative examples: YCSB++ on HBASE and ACCUMULO (Bigtable-like stores); Ongoing work and summary
Recap of Bigtable-like table stores. [Diagram: tablet servers buffer inserts in a memtable backed by a write-ahead log; minor compactions flush memtables to sorted, indexed files on HDFS nodes; major compactions merge them into fewer files per tablet.] Write-path: in-memory buffering & async FS writes: (1) mutations are appended to the write-ahead log and buffered in memory tables; (2) minor compaction flushes memtables to sorted, indexed files in HDFS; (3) major compaction performs LSM-tree-based file merging in the background. Read-path: look up both memtables and on-disk files.
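The three-step write path and the two-place read path above can be sketched as a toy LSM store. This is an illustration of the general Bigtable-style design, not HBase or Accumulo code; the class and method names (`MiniLSM`, `minor_compaction`, etc.) are invented for the sketch, and the write-ahead log is omitted.

```python
import bisect

class MiniLSM:
    """Toy Bigtable-style store: memtable + sorted on-disk files."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}   # in-memory buffer of recent mutations
        self.files = []      # each "file": a sorted list of (key, value)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.minor_compaction()

    def minor_compaction(self):
        # flush the memtable to a new sorted, indexed file
        self.files.append(sorted(self.memtable.items()))
        self.memtable = {}

    def major_compaction(self):
        # merge all files into one; newer files win on duplicate keys
        merged = {}
        for f in self.files:          # oldest file first
            merged.update(dict(f))
        self.files = [sorted(merged.items())]

    def get(self, key):
        # read path: memtable first, then files from newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for f in reversed(self.files):
            i = bisect.bisect_left(f, (key,))
            if i < len(f) and f[i][0] == key:
                return f[i][1]
        return None
```

The sketch shows why reads slow down as files accumulate (each `get` may probe every file) and why major compaction restores read performance, which is the behavior examined in the later slides.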
Apache ACCUMULO. Started at NSA; now an Apache Incubator project. Designed for high-speed ingest and scan workloads. http://incubator.apache.org/projects/accumulo.html New features in ACCUMULO: an iterator framework for user-specified programs placed between different stages of the DB pipeline (e.g., supporting joins and stream processing using iterators); also supports fine-grained cell-level access control.
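Conceptually, an iterator in this framework is a program inserted into the scan pipeline that transforms or filters the key-value stream before it leaves the server. Accumulo iterators are written in Java against Accumulo's own interfaces; the fragment below only sketches the idea in Python, with hypothetical names.

```python
def filtering_iterator(source, predicate):
    """Sketch of an iterator stage: consume a stream of (key, value)
    pairs and pass through only those matching a user-supplied predicate,
    the way a server-side filter would before results reach the client."""
    for key, value in source:
        if predicate(key, value):
            yield (key, value)
```

Because stages are generators, they compose: the output of one iterator can feed the next, mirroring how user programs can be stacked between pipeline stages.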
ILLUSTRATIVE EXAMPLE #1: Analyzing the fast inserts vs. weak consistency tradeoff using YCSB++
Client-side batch writing. [Diagram: two YCSB++ clients with store clients; client #1 batches inserts locally before sending them to the table store servers; client #2 issues reads of {K}; a ZooKeeper cluster manager coordinates the two.] Feature: clients batch inserts, delaying writes to the server. This improves insert throughput and latency, but newly inserted data may not be immediately visible to other clients.
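The batching feature above amounts to buffering mutations on the client and issuing one RPC per buffer-full rather than one per record. A minimal sketch of that pattern, with all names (`BatchWriter`, `send`, `batch_bytes`) hypothetical rather than taken from any store's API:

```python
class BatchWriter:
    """Sketch of client-side batch writing: mutations accumulate in a
    local buffer and are shipped to the server only when the buffered
    size reaches batch_bytes (or on an explicit flush)."""

    def __init__(self, send, batch_bytes):
        self.send = send              # callable that ships a list of mutations
        self.batch_bytes = batch_bytes
        self.buffer = []
        self.buffered = 0

    def put(self, key, value):
        self.buffer.append((key, value))
        self.buffered += len(key) + len(value)
        if self.buffered >= self.batch_bytes:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)    # one round trip for the whole batch
            self.buffer = []
            self.buffered = 0
```

The visibility gap studied on the next slides falls out of this design: until `flush` runs, the server has never seen the buffered records, so another client cannot read them.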
Batch writing improves throughput. [Plot: insert throughput (thousands of inserts/sec) vs. batch size (10 KB, 100 KB, 1 MB, 10 MB) for HBase and Accumulo.] Setup: 6 clients creating 9 million 1-KB records on 6 servers. Small batches: high client CPU utilization limits throughput. Large batches: the servers saturate, so batching yields limited further benefit.
Batch writing causes weak consistency. [Diagram: (1) client #1 inserts {K:V} (10^6 records) through its batching store client; (2) it enqueues K for a 1% sample of records; (3) client #2 polls and dequeues K; (4) client #2 reads {K}.] Test setup: ZooKeeper-based client coordination, sharing a producer-consumer queue between readers and writers. R-W lag = delay before C2 can read C1's most recent write.
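The measurement loop in this test can be sketched as follows: the writer timestamps a sample of its keys into a shared queue, and the reader re-reads each sampled key until it becomes visible, recording the delay. This is a simplified single-process stand-in (the real queue is in ZooKeeper and the clients are separate processes); `measure_rw_lag`, `writer_put`, and `reader_get` are invented names.

```python
import queue
import time

def measure_rw_lag(writer_put, reader_get, keys, sample_every=100):
    """Sketch of the R-W lag test: write all keys, timestamping a sample
    into a producer-consumer queue; then poll each sampled key until it
    is readable and record the elapsed time."""
    q = queue.Queue()
    lags = []
    for i, k in enumerate(keys):
        writer_put(k)
        if i % sample_every == 0:
            q.put((k, time.monotonic()))      # sample ~1% of records
    while not q.empty():
        k, t0 = q.get()
        while reader_get(k) is None:          # re-read until the write is visible
            time.sleep(0.001)
        lags.append(time.monotonic() - t0)
    return lags
```

With a batching writer, the sampled keys stay invisible until the batch flushes, so the recorded lags grow with the batch size, which is the trend in the CDFs on the next slide.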
Batch writing causes weak consistency. [CDF plots: fraction of requests vs. read-after-write time lag (ms, log scale, 1 to 100,000) for buffer sizes 10 KB, 100 KB, 1 MB, and 10 MB; fraction of requests that needed multiple read()s: HBase <1%, 7.4%, 17%, 23%; Accumulo <1%, 1.2%, 14%, 33%.] Deferred writes win on throughput, but the lag can reach ~100 seconds. The implementation of batching affects the median latency.
ILLUSTRATIVE EXAMPLE #2: Benchmarking high-speed ingest features using YCSB++
Features for high-speed insertions. Most table stores have high-speed ingest features for periodically inserting large amounts of data or migrating old data in bulk; these are classic relational DB techniques applied to new stores. Two features: bulk loading and table pre-splitting. Both reduce data migration during inserts and engage more tablet servers immediately, but need careful tuning and configuration [Sasha2002].
8-phase test setup: table bulk loading. Phases: (1) pre-load (re-formatting); (2) pre-load (importing); (3) read/update workload; (4) load (re-formatting); (5) load (importing); (6) read/update workload; (7) sleep; (8) read/update workload. Bulk loading involves two steps: Hadoop-based data formatting, then importing the store files into the table store. Pre-load phases (1 and 2) bulk load 6M rows into an empty table; goal: parallelism by engaging all servers. Load phases (4 and 5) load 48M new rows; goal: study rebalancing during ingest. R/U measurements (phases 3, 6 and 8) correlate latency with rebalancing work.
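The multi-phase structure above is essentially a scripted sequence of named phases with per-phase timing, so that the measured read/update latencies can be lined up against server-side metrics. A minimal driver sketch (the function name and phase representation are invented for illustration):

```python
import time

def run_phases(phases):
    """Sketch of a multi-phase test driver: run each (name, fn) phase in
    order and record its wall-clock duration, so client-side timings can
    later be correlated with server-side monitoring data."""
    timings = []
    for name, fn in phases:
        t0 = time.monotonic()
        fn()
        timings.append((name, time.monotonic() - t0))
    return timings
```

In the actual YCSB++ setup each phase additionally ends at a ZooKeeper barrier so that all distributed clients advance together; that step is omitted here.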
Read latency affected by rebalancing work. [Plot: Accumulo read latency (ms, log scale, 1 to 1000) over each 300-second measurement phase, for R/U 1 (phase 3), R/U 2 (phase 6), and R/U 3 (phase 8).] Latency is high after heavy insertion periods that cause servers to rebalance (compactions); it drops once the store reaches a steady state.
Rebalancing on ACCUMULO servers. [Plot: StoreFiles, tablets, and compactions (log scale, 1 to 1000) over the 1800-second experiment, annotated with the eight phases.] The OTUS monitor shows the server-side compactions during the post-ingest measurement phases.
HBASE is slower: different compaction policies. [Plots, side by side for Accumulo and HBase: read latency (ms, log scale) over each 300-second measurement phase for R/U 1 (phase 3), R/U 2 (phase 6), and R/U 3 (phase 8); and StoreFiles, tablets, and compactions over the 1800-second experiment.]
Extending to table pre-splitting. Pre-split a key range into N partitions to avoid splitting during insertion. Bulk loading test phases: pre-load (re-formatting), pre-load (importing), read/update workload, load (re-formatting), load (importing), read/update workload, sleep, read/update workload. Table pre-splitting test phases: pre-load, read/update workload, pre-split into N ranges, load, read/update workload, sleep, read/update workload.
Outline: Problem; YCSB++ design; Illustrative examples; Ongoing work and summary
Things not covered in this talk. More features involving function shipping to the servers: data filtering at the servers, and fine-grained, cell-level access control; more details are in the ACM SOCC 2011 paper. Ongoing work: analyzing more table stores (Cassandra, CouchDB, MongoDB); continuing this research through the new Intel Science and Technology Center for Cloud Computing at CMU (with GaTech).
Summary: the YCSB++ tool. A tool for performance debugging and benchmarking advanced features using new extensions to YCSB: weak consistency semantics, fast insertions (pre-splits & bulk loads), server-side filtering, and fine-grained access control; distributed clients using ZooKeeper; multi-phase testing (with Hadoop); new workload generators and database client API extensions. Two case studies: Apache HBASE and ACCUMULO. Tool available at http://www.pdl.cmu.edu/ycsb++