Time Series Storage with Apache Kudu (incubating)

Dan Burkert (Committer) - dan@cloudera.com - @danburkert
Tweet about this talk: @getkudu or #kudu

Time Series: machine metrics, event logs, sensor telemetry.

A time series datapoint is a (series, time, value) triple:

Series                             Time                   Value
us-east.appserver01.loadavg.1min   2016-05-09T15:14:30Z   0.44
us-east.appserver01.loadavg.1min   2016-05-09T15:14:40Z   0.53
us-west.dbserver03.rss             2016-05-09T15:14:30Z   1572864
us-west.dbserver03.rss             2016-05-09T15:15:00Z   2097152

Time Series Design Criteria
- Insert performance (throughput & latency)
- Read performance (throughput & latency)
- Storage overhead (compression ratio)

Time Series Common Patterns
- Datapoints are inserted in time order across all series
- Reads specify a series and a time range, containing hundreds to many thousands of datapoints

SELECT time, value
FROM timeseries
WHERE series = 'us-west.dbserver03.rss'
  AND time >= '2016-05-08T00:00:00';


Storage Landscape in the Hadoop Ecosystem
HDFS (GFS) excels at:
- Batch ingest only (e.g. hourly)
- Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
- Efficiently finding and writing individual rows
- Making data mutable
Gaps exist when these properties are needed simultaneously.

Kudu Design Goals
- High throughput for big scans (goal: within 2x of Parquet)
- Low latency for short accesses (goal: 1 ms read/write on SSD)
- Database-like semantics (initially single-row ACID)
- Relational data model: SQL queries are easy, plus NoSQL-style scan/insert/update (Java, C++, Python clients)
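As a sketch of that NoSQL-style write path, here is what a single-row insert looks like with the Python client. This is a minimal sketch, assuming a 'metrics' table and a hypothetical master address; exact client method names may differ slightly between Kudu versions.

import kudu
from datetime import datetime

# Connect to the Kudu master (address is hypothetical).
client = kudu.connect(host='kudu-master.example.com', port=7051)

# Assume a 'metrics' table with a (series, time) primary key.
table = client.table('metrics')
session = client.new_session()

# NoSQL-style single-row insert: no SQL engine involved.
session.apply(table.new_insert({
    'series': 'us-west.dbserver03.rss',
    'time': datetime(2016, 5, 9, 15, 14, 30),
    'value': 1572864.0,
}))

# Writes are buffered in the session until flushed.
session.flush()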

Novel Kudu Features
- Columnar storage
- Flexible data partitioning
- Next-generation consistency & replication

Columnar Storage

Row Storage
(us-east.appserver01.loadavg.1min, 2016-05-09T15:14:30Z, 0.56)
(us-east.appserver01.loadavg.1min, 2016-05-09T15:14:40Z, 0.70)
(us-east.appserver01.loadavg.1min, 2016-05-09T15:14:50Z, 0.72)
(us-east.appserver01.loadavg.1min, 2016-05-09T15:15:00Z, 0.55)

Columnar Storage (the same datapoints, stored column by column)
series: us-east.appserver01.loadavg.1min, us-east.appserver01.loadavg.1min, us-east.appserver01.loadavg.1min, us-east.appserver01.loadavg.1min
time: 2016-05-09T15:14:30Z, 2016-05-09T15:14:40Z, 2016-05-09T15:14:50Z, 2016-05-09T15:15:00Z
value: 0.56, 0.70, 0.72, 0.55

Columnar Storage improves scan performance:
- Predicates (e.g. WHERE time >= '2016-05-08T00:00:00') can be evaluated without reading unnecessary data from other columns
- Efficient encodings can dramatically improve compression ratios, which reduces effective IO load
- Typed, homogeneous data plays well to modern processor strengths (vectorization, pipelining)
This comes at the cost of random access performance: single-row access requires a number of seeks proportional to the number of columns. BUT, random access is becoming cheaper (cheap RAM, SSDs, NVRAM).
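From the client side, this access pattern corresponds to projecting only the needed columns and pushing predicates down to the server. A hedged sketch with the Python client (the table and column names are carried over from the insert example above, and the scanner method names are assumptions that may vary by client version):

import kudu
from datetime import datetime

client = kudu.connect(host='kudu-master.example.com', port=7051)
table = client.table('metrics')

scanner = table.scanner()
# Read only the time and value columns; other columns are never
# materialized for this scan.
scanner.set_projected_column_names(['time', 'value'])
# Predicates are evaluated server-side against the columnar data.
scanner.add_predicates([
    table['series'] == 'us-west.dbserver03.rss',
    table['time'] >= datetime(2016, 5, 8),
])
scanner.open()
for time, value in scanner.read_all_tuples():
    print(time, value)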

Columnar Storage - Dictionary Encoding
Transform blocks of low-cardinality values (e.g. series) into an offset array and a dictionary lookup table:

values:                            offsets:   dictionary:
us-east.appserver01.loadavg.1min   0          0: us-east.appserver01.loadavg.1min
us-west.dbserver03.rss             1          1: us-west.dbserver03.rss
us-east.appserver01.loadavg.1min   0          2: asia.appserver02.rss
asia.appserver02.rss               2
us-west.dbserver03.rss             1
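The transform itself is simple. A minimal pure-Python sketch of the idea (illustrative only, not Kudu's implementation):

def dict_encode(values):
    """Encode a block as (offsets, dictionary): each distinct value
    gets a code in order of first appearance."""
    dictionary, offsets, codes = [], [], {}
    for v in values:
        if v not in codes:
            codes[v] = len(dictionary)
            dictionary.append(v)
        offsets.append(codes[v])
    return offsets, dictionary

offsets, dictionary = dict_encode([
    'us-east.appserver01.loadavg.1min',
    'us-west.dbserver03.rss',
    'us-east.appserver01.loadavg.1min',
    'asia.appserver02.rss',
    'us-west.dbserver03.rss',
])
print(offsets)     # [0, 1, 0, 2, 1]
print(dictionary)  # the three distinct series names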

Columnar Storage - Bitshuffle Encoding
Transform blocks of similar data by transposing values at the bit level. This dramatically improves the effectiveness of high-throughput general-purpose compression algorithms (LZ4, Snappy).

time                   time (µs since Unix epoch)
2016-05-09T15:14:30Z   1462806870000000
2016-05-09T15:14:40Z   1462806880000000
2016-05-09T15:14:50Z   1462806890000000
2016-05-09T15:15:00Z   1462806900000000

time bits (µs since Unix epoch):
00000000 00000101 00110010 01101010 01000011 11011100 10000001 10000000
00000000 00000101 00110010 01101010 01000100 01110101 00011000 00000000
00000000 00000101 00110010 01101010 01000101 00001101 10101110 10000000
00000000 00000101 00110010 01101010 01000101 10100110 01000101 00000000

transposed time bits:
00000000 00000000 00000000 00000000 00000000 00000000 00001111 00001111
00000000 11111111 00000000 11110000 00001111 11110000 11110000 11110000
00001111 00000000 00000111 10001011 10011100 01011100 10101111 00010110
10100001 00100100 01100011 00101001 10100000 00000000 00000000 00000000

After transposing, individual runs of N bits are highly correlated (in this case N = 4 datapoints).
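A minimal pure-Python sketch of the bit-level transpose (real implementations, such as the bitshuffle library, use SIMD, but the effect is the same): after transposing, the mostly-constant high-order bits of nearby timestamps end up in long, highly compressible runs.

def bitshuffle(values, width=64):
    """Transpose a block of width-bit integers at the bit level:
    output bit i*len(values)+j is bit i (MSB first) of values[j],
    so identical high-order bits become long runs."""
    n = len(values)
    out = bytearray((n * width + 7) // 8)
    for i in range(width):                # bit plane, MSB first
        for j, v in enumerate(values):    # datapoint within block
            bit = (v >> (width - 1 - i)) & 1
            pos = i * n + j
            out[pos // 8] |= bit << (7 - pos % 8)
    return bytes(out)

# The four timestamps from the slide, in µs since the Unix epoch.
times = [1462806870000000, 1462806880000000,
         1462806890000000, 1462806900000000]
print(bitshuffle(times).hex())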

Flexible Data Partitioning

Partitioning vs Indexing
Partitioning: how datapoints are distributed among partitions (Kudu: tablet; HBase: region; Cassandra: vnode).
Indexing: how data within a single partition is sorted. For time series, data should typically be indexed by (series, time):

(series, time)
(us-east.appserver01.loadavg, 2016-05-09T15:14:00Z)
(us-east.appserver01.loadavg, 2016-05-09T15:15:00Z)
(us-west.dbserver03.rss, 2016-05-09T15:14:30Z)
(us-west.dbserver03.rss, 2016-05-09T15:15:00Z)

(time, series)
(2016-05-09T15:14:00Z, us-east.appserver01.loadavg)
(2016-05-09T15:14:30Z, us-west.dbserver03.rss)
(2016-05-09T15:15:00Z, us-east.appserver01.loadavg)
(2016-05-09T15:15:00Z, us-west.dbserver03.rss)

A query like

SELECT * FROM timeseries WHERE series = 'us-east.appserver01.loadavg';

reads one contiguous range of rows under the (series, time) ordering, but scattered rows under the (time, series) ordering.
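The payoff of the (series, time) sort order is that one series' datapoints are physically contiguous, so the query above is a seek plus a sequential read. A small pure-Python sketch of the idea using bisect:

import bisect

# Datapoints sorted by (series, time), as in the first ordering above.
index = [
    ('us-east.appserver01.loadavg', '2016-05-09T15:14:00Z'),
    ('us-east.appserver01.loadavg', '2016-05-09T15:15:00Z'),
    ('us-west.dbserver03.rss', '2016-05-09T15:14:30Z'),
    ('us-west.dbserver03.rss', '2016-05-09T15:15:00Z'),
]

def scan_series(series):
    """All datapoints for one series: a single contiguous slice,
    located with two O(log n) binary searches."""
    lo = bisect.bisect_left(index, (series, ''))
    hi = bisect.bisect_right(index, (series, '\uffff'))
    return index[lo:hi]

print(scan_series('us-east.appserver01.loadavg'))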

Partitioning
- Kudu has flexible policies for distributing data among partitions
- Indexing is independent of partitioning
- Hash partitioning is built in, and can be combined with range partitioning

Partitioning By Time Range

time ->
Partition 1: 2016-05-07T00:00:00Z to 2016-05-08T00:00:00Z
Partition 2: 2016-05-08T00:00:00Z to 2016-05-09T00:00:00Z
Partition 3: 2016-05-09T00:00:00Z to 2016-05-10T00:00:00Z

Inserts: all inserts go to the latest partition (a single-partition write hot spot).
Scans: big scans (across large time intervals) can be parallelized across many partitions.

Partitioning By Series Range

series ->
Partition 1: <start> to 'us-east'
Partition 2: 'us-east' to 'us-west'
Partition 3: 'us-west' to <end>

Inserts: spread among all partitions.
Scans: a scan for one series touches a single partition.
But partitions can become unbalanced, resulting in hot spotting.

Partitioning By Series Hash

hash(series) ->
Partition 1: series bucket 0
Partition 2: series bucket 1
Partition 3: series bucket 2

Inserts: spread among all partitions.
Scans: a scan for one series touches a single partition.
But partitions grow over time, eventually becoming too big for a single server.
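The bucketing rule is a pure function of the series name, which is why inserts spread evenly while any single series still lands in exactly one partition. A hedged pure-Python sketch (Kudu uses its own internal hash function; CRC32 merely stands in here as a stable example):

import zlib

NUM_BUCKETS = 3

def bucket(series):
    """Map a series name to a hash bucket; a stable stand-in for
    Kudu's internal hash function, not the real one."""
    return zlib.crc32(series.encode('utf-8')) % NUM_BUCKETS

for s in ['us-east.appserver01.loadavg.1min',
          'us-west.dbserver03.rss',
          'asia.appserver02.rss']:
    print(bucket(s), s)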

Partitioning By Series Hash + Time Range

time range (2016)   bucket 0       bucket 1       bucket 2
05-07 to 05-08      Partition 1    Partition 2    Partition 3
05-08 to 05-09      Partition 4    Partition 5    Partition 6
05-09 to 05-10      Partition 7    Partition 8    Partition 9
05-10 to 05-11      Partition 10   Partition 11   Partition 12

Inserts are spread among all partitions in the latest time range.
Big scans (across large time intervals) can be parallelized across partitions.
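Putting the pieces together, a hedged sketch of creating such a table with the Python client. The encodings noted in comments, the Partitioning method names, and the 'metrics' table name are assumptions based on the client API and the earlier examples; details may differ by Kudu version.

import kudu
from kudu.client import Partitioning

client = kudu.connect(host='kudu-master.example.com', port=7051)

builder = kudu.schema_builder()
# Series names are low cardinality: a good fit for dictionary encoding.
builder.add_column('series', kudu.string, nullable=False)
# Timestamps and values compress well with bitshuffle encoding.
builder.add_column('time', kudu.unixtime_micros, nullable=False)
builder.add_column('value', kudu.double, nullable=False)
# Index within each partition by (series, time).
builder.set_primary_keys(['series', 'time'])
schema = builder.build()

# Hash on series (3 buckets), combined with range partitioning on time.
partitioning = Partitioning()
partitioning.add_hash_partitions(column_names=['series'], num_buckets=3)
partitioning.set_range_partition_columns(['time'])

client.create_table('metrics', schema, partitioning)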

Time Series Storage in Kudu
- Columnar storage
- Flexible partitioning

Kudu Roadmap
- Current version: 0.8 (beta quality), on a 6-week release cycle
- 1.0 tentatively planned for late summer: the first production-ready release

More Kudu
- "Using Kafka and Kudu for Fast, Low Latency SQL Analytics on Streaming Data" - Mike Percy & Ashish Singh, Tuesday at 3:00pm
- "Kudu and Spark for Fast Analytics on Streaming Data" - Mike Percy & Dan Burkert, Vancouver Spark Meetup (in the Hyatt), Tuesday at 6:00pm
- getkudu.io - @getkudu - @danburkert
- Slack room: getkudu-slack.herokuapp.com
- github.com/danburkert/kudu-ts - an early-stage metrics storage API built on Kudu