
Apache Kudu Zbigniew Baranowski

Intro

What is KUDU?
- A new storage engine for structured data (tables); it does not use HDFS!
- Columnar store
- Mutable (insert, update, delete)
- Written in C++
- Apache-licensed open source
- Quite new: the 1.0 version was recently released (first commit on October 11th, 2012) and still somewhat immature?

Kudu tries to fill the gap:
- HDFS excels at scanning large amounts of data at speed and accumulating data with high throughput
- HBase (on HDFS) excels at fast random lookups by key and making data mutable

Table-oriented storage
- A Kudu table has an RDBMS-like schema: a primary key (one or many columns), no secondary indexes
- A finite and constant number of columns (unlike HBase)
- Each column has a name and a type: boolean, int(8,16,32,64), float, double, timestamp, string, binary
- Horizontally partitioned (by range and/or hash); partitions are called tablets
- Tablets can have 3 or 5 replicas
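A minimal sketch of how such a schema and partitioning could be defined through the Kudu Java client. Assumptions not in the slides: Kudu 1.0 client with the org.apache.kudu package naming (the later slides use the older org.kududb naming), a master at kudu-master.cern.ch:7051, illustrative table and column names.

import java.util.Arrays;
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.PartialRow;

public class CreatePartitionedTable {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("kudu-master.cern.ch:7051").build();

    // Primary key = (runnumber, eventnumber); every column has a name and a type
    Schema schema = new Schema(Arrays.asList(
        new ColumnSchema.ColumnSchemaBuilder("runnumber", Type.INT64).key(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("eventnumber", Type.INT64).key(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("datatype", Type.STRING).nullable(true).build()));

    // Hash partitioning on eventnumber combined with range partitioning on runnumber;
    // every tablet gets 3 replicas
    CreateTableOptions options = new CreateTableOptions()
        .addHashPartitions(Arrays.asList("eventnumber"), 16)
        .setRangePartitionColumns(Arrays.asList("runnumber"))
        .setNumReplicas(3);

    PartialRow lower = schema.newPartialRow();
    lower.addLong("runnumber", 0L);
    PartialRow upper = schema.newPartialRow();
    upper.addLong("runnumber", 1000000L);
    options.addRangePartition(lower, upper);   // one explicit range: [0, 1000000)

    client.createTable("events_by_run", schema, options);
    client.shutdown();
  }
}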

Data consistency
- Writing: single-row mutations are done atomically across all columns; no multi-row ACID transactions
- Reading: tuneable freshness of the data - read whatever is available, or wait until all changes committed in the WAL are available
- Snapshot consistency: changes made during a scan are not reflected in the results; point-in-time queries are possible (based on a provided timestamp)
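To illustrate the tuneable read freshness, here is a small sketch of a snapshot scan with the Java client. Assumptions not in the slides: org.apache.kudu 1.0 package names, a table called kudu_example, illustrative column names; the commented-out snapshotTimestampMicros call is the point-in-time variant.

import java.util.Arrays;
import org.apache.kudu.client.AsyncKuduScanner;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.RowResult;

public class SnapshotScan {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("kudu-master.cern.ch:7051").build();
    KuduTable table = client.openTable("kudu_example");

    // READ_LATEST (default): read whatever is currently available on the replica.
    // READ_AT_SNAPSHOT: wait until all changes committed in the WAL up to the snapshot
    // timestamp are applied, so concurrent changes do not show up in the scan.
    KuduScanner scanner = client.newScannerBuilder(table)
        .setProjectedColumnNames(Arrays.asList("runnumber", "datatype"))
        .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
        // .snapshotTimestampMicros(pastTimestampMicros)  // optional: point-in-time query
        .build();

    while (scanner.hasMoreRows()) {
      for (RowResult row : scanner.nextRows()) {
        System.out.println(row.getLong("runnumber") + " " + row.getString("datatype"));
      }
    }
    client.shutdown();
  }
}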

Kudu simplifies the Big Data deployment model for online analytics (low-latency ingestion and access).
[Diagram: classical low-latency design - stream sources emit events that are flushed immediately into an indexed staging area (serving fast data access) and flushed periodically as big files to HDFS (serving batch processing).]

Implementing low latency with Kudu
[Diagram: stream sources write events directly into Kudu, which serves both batch processing and fast data access.]

Kudu Architecture

Architecture overview
- Master server (there can be multiple masters for HA): stores metadata (table definitions) and the tablet directory (tablet locations); coordinates cluster reconfigurations
- Tablet servers (worker nodes): write and read tablets, which are stored on local disks (no HDFS); track the status of tablet replicas (followers); replicate the data to followers
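A small sketch of what the master knows, queried through the Java client: the list of tables it manages and the tablet locations of one of them. Assumptions not in the slides: org.apache.kudu 1.0 client, the getTabletsLocations call with a 10-second deadline, illustrative host and table names.

import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.LocatedTablet;

public class MasterMetadataProbe {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("kudu-master.cern.ch:7051").build();

    // The master stores the table definitions...
    for (String name : client.getTablesList().getTablesList()) {
      System.out.println("table: " + name);
    }

    // ...and the tablet directory (which tablet servers host which replicas)
    KuduTable table = client.openTable("kudu_example");
    for (LocatedTablet tablet : table.getTabletsLocations(10000)) {
      System.out.println("tablet with " + tablet.getReplicas().size()
          + " replicas, leader at " + tablet.getLeaderReplica().getRpcHost());
    }
    client.shutdown();
  }
}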

Tables and tablets
The master keeps a map of each table's tablets and their replica placement, e.g. for tablets TEST1..TEST3:
  Tablet   Leader   Follower1   Follower2
  TEST1    TS1      TS2         TS3
  TEST2    TS4      TS1         TS2
  TEST3    TS3      TS4         TS1
Each tablet server (TabletServer1..TabletServer4) hosts a mix of leader and follower replicas.

Data changes propagation in Kudu (Raft consensus - https://raft.github.io)
[Diagram: the client locates the tablet via the master, then sends the write to the tablet's leader replica (tablet server X); the leader commits the change to its WAL and replicates it to the follower replicas (tablet servers Y and Z), which commit it to their own WALs.]

Insert into a tablet (without uniqueness check)
- Inserts first go to the in-memory MemRowSet, a B+tree whose leaves (rows: Col1, Col2, Col3) are sorted by primary key
- MemRowSets are flushed into 32 MB DiskRowSets: a columnar store encoded similarly to Parquet, with rows sorted by PK
- Each DiskRowSet stores its PK {min, max} range and Bloom filters for PK ranges, kept in a cached B-tree
- An interval tree keeps track of the PK ranges covered by the DiskRowSets
- There might be thousands of DiskRowSets per tablet on a tablet server

DiskRowSet compaction
- A periodic task that removes deleted rows and reduces the number of DiskRowSets with overlapping PK ranges
- e.g. DiskRowSet1 with PK {A, G} and DiskRowSet2 with PK {B, E} are compacted into DiskRowSet1 {A, D} and DiskRowSet2 {E, G}
- It does not create bigger DiskRowSets: the 32 MB size of each DiskRowSet is preserved

How columns are stored on disk (within a DiskRowSet)
- A B-tree index maps the primary key to a row offset; per-column B-tree indexes map row offsets to pages
- Each column (Column1, Column2, Column3, ...) is stored as a sequence of pages (values plus page metadata), about 256 KB each
- Pages are encoded with a variety of encodings, such as dictionary encoding, bitshuffle, or RLE
- Pages can be compressed with Snappy, LZ4 or zlib
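Encodings and compression codecs are chosen per column when the table is created. A minimal sketch of what that choice looks like in the Java API; assumptions not in the slides: org.apache.kudu 1.0 package names, illustrative column names and codec choices.

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Type;

public class ColumnEncodingChoices {
  public static void main(String[] args) {
    // Integer key column: bitshuffle encoding plus LZ4 compression
    ColumnSchema runNumber = new ColumnSchema.ColumnSchemaBuilder("runnumber", Type.INT64)
        .key(true)
        .encoding(ColumnSchema.Encoding.BIT_SHUFFLE)
        .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.LZ4)
        .build();

    // Low-cardinality string column: dictionary encoding plus Snappy compression
    ColumnSchema dataType = new ColumnSchema.ColumnSchemaBuilder("datatype", Type.STRING)
        .encoding(ColumnSchema.Encoding.DICT_ENCODING)
        .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.SNAPPY)
        .build();

    System.out.println(runNumber + "\n" + dataType);
  }
}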

Kudu deployment

Three options for deployment
- Build from source
- Using RPMs: one core RPM, two service RPMs (master and tablet servers), one shared config file
- Using Cloudera Manager: click, click, click, done

Interfacing with Kudu

Table access and manipulation
- NoSQL operations on tables: insert, update, delete, scan
- Python, C++ and Java APIs
- Integrated with Impala and Hive (SQL), MapReduce and Spark
- Flume sink (ingestion)

Manipulating Kudu tables with SQL (Impala/Hive)

Table creation:

CREATE TABLE `kudu_example` (
  `runnumber` BIGINT,
  `eventnumber` BIGINT,
  `project` STRING,
  `streamname` STRING,
  `prodstep` STRING,
  `datatype` STRING,
  `amitag` STRING,
  `lumiblockn` BIGINT,
  `bunchid` BIGINT
)
DISTRIBUTE BY HASH (runnumber) INTO 64 BUCKETS
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'example_table',
  'kudu.master_addresses' = 'kudu-master.cern.ch:7051',
  'kudu.key_columns' = 'runnumber, eventnumber'
);

DMLs:

insert into kudu_example values (1, 30, 'test', ...);
insert into kudu_example select * from data_parquet;
update kudu_example set datatype='test' where runnumber=1;
delete from kudu_example where project='test';

Queries:

select count(*), max(eventnumber) from kudu_example
where datatype like '%AOD%' group by runnumber;

select * from kudu_example k, parquet_table p where k.runnumber = p.runnumber;

Creating a table with Java

import org.kududb.*;

// CREATING TABLE
String tableName = "my_table";
String KUDU_MASTER_NAME = "master.cern.ch";
KuduClient client = new KuduClient.KuduClientBuilder(KUDU_MASTER_NAME).build();

List<ColumnSchema> columns = new ArrayList<>();
columns.add(new ColumnSchema.ColumnSchemaBuilder("runnumber", Type.INT64)
    .key(true)
    .encoding(ColumnSchema.Encoding.BIT_SHUFFLE)
    .nullable(false)
    .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.SNAPPY)
    .build());
columns.add(new ColumnSchema.ColumnSchemaBuilder("eventnumber", Type.INT64)
    .key(true)
    .encoding(ColumnSchema.Encoding.BIT_SHUFFLE)
    .nullable(false)
    .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.SNAPPY)
    .build());
// ...
Schema schema = new Schema(columns);

List<String> partColumns = new ArrayList<>();
partColumns.add("runnumber");
partColumns.add("eventnumber");
CreateTableOptions options = new CreateTableOptions()
    .addHashPartitions(partColumns, 64)
    .setNumReplicas(3);
client.createTable(tableName, schema, options);
// ...

Inserting rows with Java

// INSERTING
KuduTable table = client.openTable(tableName);
KuduSession session = client.newSession();
session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH); // batch operations on the client side

Insert insert = table.newInsert();
PartialRow row = insert.getRow();
row.addLong(0, 1);
row.addString(2, "test");
// ...
session.apply(insert);  // stores the operation in memory on the client side (for batch upload)
session.flush();        // sends the data to Kudu
// ...
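Updates and deletes follow the same session/apply pattern as inserts, with the row addressed by its full primary key. A minimal sketch, assuming the table and session from the slide above and illustrative column names (not part of the original slides):

// UPDATING and DELETING
Update update = table.newUpdate();
PartialRow urow = update.getRow();
urow.addLong("runnumber", 1);        // primary key columns identify the row
urow.addLong("eventnumber", 30);
urow.addString("datatype", "test");  // non-key column gets the new value
session.apply(update);

Delete delete = table.newDelete();
PartialRow drow = delete.getRow();
drow.addLong("runnumber", 1);
drow.addLong("eventnumber", 31);
session.apply(delete);

session.flush();   // push the batched operations to Kudu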

Scanner in Java

// configuring column projection
List<String> projectColumns = new ArrayList<>();
projectColumns.add("runnumber");
projectColumns.add("datatype");

// setting a scan range
Schema s = table.getSchema();
PartialRow start = s.newPartialRow();
start.addLong("runnumber", 8);
PartialRow end = s.newPartialRow();
end.addLong("runnumber", 10);

KuduScanner scanner = client.newScannerBuilder(table)
    .lowerBound(start)
    .exclusiveUpperBound(end)
    .setProjectedColumnNames(projectColumns)
    .build();

while (scanner.hasMoreRows()) {
  RowResultIterator results = scanner.nextRows();
  while (results.hasNext()) {
    RowResult result = results.next();
    System.out.println(result.getString(1)); // getting the 2nd projected column (datatype)
  }
}
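Besides primary-key bounds, scans can also carry predicates on non-key columns, which Kudu evaluates on the tablet servers so that only matching rows travel back to the client. A small sketch in the same style, assuming the table, client and projection list from the slide above; the column and value are illustrative, not from the original slides:

// predicate on a non-key column, evaluated server side
KuduPredicate pred = KuduPredicate.newComparisonPredicate(
    table.getSchema().getColumn("datatype"),
    KuduPredicate.ComparisonOp.EQUAL,
    "AOD");

KuduScanner predScanner = client.newScannerBuilder(table)
    .setProjectedColumnNames(projectColumns)
    .addPredicate(pred)
    .build();

while (predScanner.hasMoreRows()) {
  for (RowResult r : predScanner.nextRows()) {
    System.out.println(r.getString("datatype"));
  }
}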

Spark with Kudu

wget http://central.maven.org/maven2/org/apache/kudu/kudu-spark_2.10/1.0.0/kudu-spark_2.10-1.0.0.jar
spark-shell --jars kudu-spark_2.10-1.0.0.jar

import org.apache.kudu.spark.kudu._
import org.apache.kudu.client.CreateTableOptions

// Read a table from Kudu
val df = sqlContext.read.options(
  Map("kudu.master" -> "kudu-master.cern.ch:7051", "kudu.table" -> "kudu_table")).kudu

// Query using the DataFrame API...
df.select(df("runnumber"), df("eventnumber"), df("db0"))
  .filter($"runnumber" === 169864).filter($"eventnumber" === 1).show()

// ...or register a temporary table and use SQL
df.registerTempTable("kudu_table")
sqlContext.sql("select id from kudu_table where id >= 5").show()

// Create a new Kudu table from a dataframe schema
// NB: no rows from the dataframe are inserted into the table
val kuduContext = new KuduContext("kudu-master.cern.ch:7051")
kuduContext.createTable("test_table", df.schema, Seq("key"),
  new CreateTableOptions().setNumReplicas(1))

// Insert data
kuduContext.insertRows(df, "test_table")

Kudu Security To be done!

Performance (based on ATLAS EventIndex case)

Average row length [bytes]
- Very good compaction ratio, about the same as Parquet
- Each row consists of 56 attributes: most of them strings, a few integers and floats; the row length in CSV is about 1559 bytes
[Bar chart: average row length for Kudu, Parquet, HBase and Avro, each with no compression, Snappy and GZip-like compression.]

Insertion speed [kHz]: insertion rates (per machine, per partition) with Impala
- Average ingestion speed: worse than Parquet, better than HBase
[Bar chart: insertion rates for Kudu, Parquet, HBase and Avro, each with no compression, Snappy and GZip-like compression.]

Average random lookup speed [s]: random lookups with Impala
- Good random data lookup speed, similar to HBase
[Bar chart: average random lookup times for Kudu, Parquet, HBase and Avro, each with no compression, Snappy and GZip-like compression.]

Scan speed [kHz]: data scan rate per core with a predicate on a non-PK column (using Impala)
- Quite good data scanning speed, much better than HBase
- If natively supported predicate operations are used, it is even faster than Parquet
[Bar chart: scan rates for Kudu, Parquet and HBase, each with no compression, Snappy and GZip-like compression.]

Kudu monitoring

Cloudera Manager
- A lot of metrics are published through the servers' HTTP endpoints
- All are collected by CM agents and can be plotted
- Predefined CM dashboards: monitoring of Kudu processes, workload plots
- CM can also be used for Kudu configuration
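For scripted checks outside Cloudera Manager, the same metrics can be pulled straight from the servers' HTTP endpoints. A minimal sketch; assumptions not in the slides: the default web UI ports (8051 for the master, 8050 for tablet servers), a /metrics endpoint returning JSON, and an illustrative host name.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class KuduMetricsProbe {
  public static void main(String[] args) throws Exception {
    // Dump the JSON metrics document published by a tablet server's embedded web server
    URL url = new URL("http://kudu-tserver.cern.ch:8050/metrics");
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}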

CM - Kudu host status

CM - Workload plots

CM - Resource utilisation

Observations & Conclusions

What is nice about Kudu
- The first one in the Big Data open source world trying to combine a columnar store with indexing
- Simple to deploy
- It works (almost) without problems
- It scales (depending on how the schema is designed): writing, accessing, scanning
- Integrated with the mainstream Big Data processing frameworks: Spark, Impala, Hive, MapReduce
- SQL and NoSQL on the same data
- Gives more flexibility in optimizing the schema design compared to HBase (two levels of partitioning)
- Cloudera is pushing to deliver production-like quality of the software ASAP

What is bad about Kudu?
- No security yet (it should be added in the next releases): authentication (who connected), authorization (ACLs)
- Raft consensus does not always work as it should: too frequent tablet leader changes (sometimes a leader cannot be elected at all); the period without a leader is quite long (sometimes it never ends), which freezes updates on the affected tables
- Handling disk failures: you have to erase/reinitialize the entire server
- Only one index per table
- No nested types (but there is a binary type)
- Cannot control tablet placement on servers

When can Kudu be useful?
- When you have structured big data, like in an RDBMS, without complex types
- When sequential and random data access are required simultaneously and have to scale: data extraction and analytics at the same time, time series
- When low ingestion latency is needed and a lambda architecture is too expensive

Learn more
- Main page: https://kudu.apache.org/
- Video: https://www.oreilly.com/ideas/kudu-resolvingtransactional-and-analytic-trade-offs-in-hadoop
- Whitepaper: http://kudu.apache.org/kudu.pdf
- KUDU project: https://github.com/cloudera/kudu
- Some Java code examples: https://gitlab.cern.ch:8443/zbaranow/kudu-atlas-eventindex
- Get the Cloudera Quickstart VM and test it