ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Part II: Data Center Software Architecture
Topic 3: Programming Models

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
Presented by Fahad Mirza

Agenda
- Part 1: Background, Objectives, Previous Techniques
- Part 2: RCFile Implementation
- Part 3: Results, Conclusion

Part 1

Background
- MapReduce-based data warehouse systems:
  - Support big data analytics
  - Adjust quickly to the dynamics of user behavior and trends
  - Needed by typical web-service providers and social network sites (e.g., Facebook)
- The data placement structure is a crucial factor in warehouse performance.
- The Facebook warehouse identified four requirements for the data placement structure:
  - Fast data loading
  - Fast query processing
  - Highly efficient storage space utilization
  - Strong adaptivity to highly dynamic workload patterns

Background
- Why is data placement so critical for a warehouse?
  - The MapReduce framework and Hadoop provide a scalable and fault-tolerant infrastructure for big data analysis on large clusters.
  - MapReduce-based warehouse systems cannot directly control storage disks in clusters.
  - They utilize a cluster-level distributed file system (e.g., HDFS, the Hadoop Distributed File System) to store huge amounts of table data.
  - Serious challenge: find an efficient data placement method to organize table data in the underlying HDFS.

Objectives
- Fast data loading
  - Quick data loading is important for the Facebook data warehouse: on average, more than 20 TB of data are pushed into it daily.
- Fast query processing
  - Response-time-critical queries, such as decision support queries, comprise the major workload in such applications.
  - The data placement structure should sustain high query-processing throughput as the number of queries rapidly increases.
- Highly efficient storage space utilization
  - High space utilization on disks avoids storage waste; rapidly growing user activities demand scalable storage capacity.
- Strong adaptivity to highly dynamic workload patterns
  - The underlying system must be highly adaptive to unexpected dynamics in data processing, with limited storage, for both expected and unexpected queries.

Previous Techniques
- Common data placement structures in conventional databases:
  - Row-stores
  - Column-stores
  - Hybrid stores
- In the context of large-scale data analysis using MapReduce, these are not very suitable for big data processing in distributed systems.
- Conventional database placement structures carried into a MapReduce data warehouse:
  - Horizontal row-store structure
  - Vertical column-store structure
  - Hybrid PAX store structure
- Importing database structures into a MapReduce data warehouse system cannot meet all four objectives.

Previous Techniques: Horizontal Row-store Structure
- Advantages
  - Fast data loading and strong adaptive ability to dynamic workloads
- Disadvantages
  - Cannot support fast query processing
    - Reason: it cannot skip unnecessary column reads when a query requires only a few columns from a wide table with many columns.
  - Compression ratio is low, and hence storage space utilization is poor
    - Reason: columns with different data domains are mixed together and compress poorly.

Previous Techniques: Vertical Column-store Structure
- Advantages
  - High compression ratios
    - Reason: data fields of similar type and length are stored together.
- Disadvantages
  - Column-stores can often cause high record reconstruction overhead, with expensive network transfers in a cluster, and hence slower query processing.
    - Reason: HDFS cannot guarantee that all fields of the same record are stored on the same cluster node, so tuple reconstruction carries high overhead.
- Alternative
  - Pre-grouping multiple columns together can reduce the overhead.
  - Disadvantage: it does not have strong adaptivity to respond to highly dynamic workload patterns.

Previous Techniques: Two Schemes of Vertical Stores
- Put each column in its own sub-relation, as in the Decomposition Storage Model (DSM): a pure column store.
- Organize all the columns of a relation into different column groups, usually allowing column overlap among multiple column groups. (See the sketch below.)
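To make the two vertical schemes concrete, here is a minimal Python sketch (all names and data are hypothetical, not from the paper) that lays out the same three records as a row-store, as a DSM-style column-store, and as overlapping column groups:

```python
# Hypothetical records: (adid, userid, clicks)
records = [
    ("ad1", "u1", 9),
    ("ad2", "u2", 5),
    ("ad3", "u1", 7),
]

# Row-store: whole records stored contiguously, one after another.
row_store = list(records)

# Column-store (DSM): each column becomes its own sub-relation.
column_store = {
    "adid":   [r[0] for r in records],
    "userid": [r[1] for r in records],
    "clicks": [r[2] for r in records],
}

# Column groups: co-accessed columns stored together; groups may overlap,
# so "adid" is duplicated across both groups here.
column_groups = {
    ("adid", "userid"): [(r[0], r[1]) for r in records],
    ("adid", "clicks"): [(r[0], r[2]) for r in records],
}
```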

Previous Techniques: Hybrid PAX Store Structure
- A hybrid placement structure aiming to improve CPU cache performance.
- Multiple fields of a record, from different columns, are put in a single disk page, saving additional operations for record reconstruction.
- Within each disk page, PAX uses a mini-page to store all fields belonging to each column, and uses a page header to store pointers to the mini-pages (sketched below).
- Advantages
  - Strong adaptive ability for various dynamic query workloads
  - CPU performance improved by better cache utilization
- Disadvantages
  - Cannot satisfy high storage space utilization and fast query processing speed.
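A toy sketch of a PAX disk page, assuming a simplified layout (pointers replaced by list indexes; not the actual PAX implementation): all fields of the page's records stay on the same page, but each column's fields are grouped into a mini-page addressed from the page header:

```python
def build_pax_page(records):
    """Group each column's fields into a mini-page within one disk page."""
    ncols = len(records[0])
    minipages = [[rec[c] for rec in records] for c in range(ncols)]
    # The page header keeps pointers (here: indexes) to each mini-page.
    header = {"minipage_ptrs": list(range(ncols)), "num_records": len(records)}
    return {"header": header, "minipages": minipages}

page = build_pax_page([("ad1", "u1"), ("ad2", "u2")])
# Reconstructing a record touches only this page (no extra seek), while a
# column scan reads one contiguous mini-page (better CPU cache behavior).
record0 = tuple(mp[0] for mp in page["minipages"])
```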

Previous Techniques: Drawbacks of the PAX Store Architecture
- PAX is not associated with data compression, since compression is not necessary for cache optimization (its design goal).
- It cannot improve I/O performance, because it does not change the actual content of a page; hence query processing remains slow.
- Limited by the page-level data manipulation inside a traditional DBMS engine, PAX uses a fixed page as the basic unit of data record organization.

Part 2

New Technique: RCFile
- RCFile (Record Columnar File): a data placement structure for big data.
- Satisfies all four requirements; adopted by Hive and Pig.
- RCFile applies the concept of "first horizontally partition, then vertically partition" from PAX, combining the advantages of both row-store and column-store:
  - As in a row-store, RCFile guarantees that data in the same row are located on the same node, so tuple reconstruction is cheap.
  - As in a column-store, RCFile can exploit column-wise data compression and skip unnecessary column reads.
- Utilizes column-wise data compression within each row group.
- Provides a lazy decompression technique to avoid unnecessary column decompression during query execution.

New Technique: RCFile - Data Layout
- HDFS structure:
  - A table can span multiple HDFS blocks.
  - All stored records are partitioned into row groups: a table stored in an RCFile is first horizontally partitioned into multiple row groups; then each row group is vertically partitioned so that each column is stored independently.
  - For a table, all row groups have the same size. Depending on the row group size and the HDFS block size, an HDFS block can hold one or multiple row groups.
- RCFile allows a flexible row group size:
  - A default size is chosen considering both data compression performance and query execution performance.
  - RCFile also allows users to select the row group size for a given table.

New Technique: RCFile - Data Layout (continued)
Each row group contains three sections (see the sketch below):
- First section: a sync marker placed at the beginning of the row group, used mainly to separate two consecutive row groups in an HDFS block.
- Second section: a metadata header for the row group. The header records how many records are in the row group, how many bytes are in each column, and how many bytes are in each field of a column.
- Third section: the table data section, which is actually a column-store. In this section, all the fields of the same column are stored contiguously.
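A minimal sketch of this layout in Python (illustrative only, with hypothetical helper names; not Hive's actual RCFile implementation): partition the table horizontally into row groups, then serialize each row group as sync marker, metadata header, and column-stored table data:

```python
SYNC_MARKER = b"\x00SYNC"  # separates consecutive row groups in an HDFS block

def to_row_groups(records, rows_per_group):
    """Horizontal partitioning: fixed-size row groups."""
    for i in range(0, len(records), rows_per_group):
        yield records[i:i + rows_per_group]

def layout_row_group(group):
    """Vertical partitioning within one row group."""
    ncols = len(group[0])
    columns = [[str(rec[c]).encode() for rec in group] for c in range(ncols)]
    header = {
        "num_records": len(group),
        "bytes_per_column": [sum(len(f) for f in col) for col in columns],
        "field_lengths": [[len(f) for f in col] for col in columns],
    }
    # Table data section: all fields of the same column stored contiguously.
    data = b"".join(b"".join(col) for col in columns)
    return SYNC_MARKER, header, data
```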


New Technique: RCFile - Data Compression
In each row group, the metadata header section and the table data section are compressed independently, as follows (sketched below):
- The metadata header section is compressed using RLE (Run-Length Encoding).
  - Advantage of RLE: the field-length values of the same column are stored contiguously, so the RLE algorithm can find long runs of repeated values, especially for fixed field lengths.
  - RLE is not used for the column data itself, because it is not sorted.
- The table data section is not compressed as a whole unit. Instead, each column is independently compressed with the heavyweight Gzip compression algorithm.
  - Thanks to the lazy decompression technique, RCFile does not need to decompress all the columns when processing a row group.
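A sketch of the two compression paths, assuming zlib as a stand-in for Gzip (illustrative only; the header dict follows the hypothetical layout above): run-length encode the field lengths in the metadata header, and compress each column of the table data section independently:

```python
import zlib

def rle_encode(values):
    """Run-length encode a list, e.g. [4, 4, 4, 7] -> [(4, 3), (7, 1)].
    Field lengths of a fixed-width column form long runs and compress well."""
    runs, prev, count = [], values[0], 0
    for v in values:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs

def compress_row_group(header, columns):
    # Metadata header: RLE over each column's list of field lengths.
    header = dict(header,
                  field_lengths=[rle_encode(fl) for fl in header["field_lengths"]])
    # Table data: each column compressed independently, so a reader can later
    # decompress only the columns a query actually touches.
    compressed_cols = [zlib.compress(b"".join(col)) for col in columns]
    return header, compressed_cols
```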

New Technique: RCFile
- Independent column compression allows different algorithms for different columns; in the future, multiple types of compression schemes can be adopted.

Data Appending
- RCFile does not allow arbitrary data-writing operations; only appends are allowed, because HDFS only supports writes at the end of a file.
- The method of data appending in RCFile is summarized as follows:
  - RCFile creates and maintains an in-memory column holder for each column.
  - When a record is appended, all its fields are scattered, and each field is appended into its corresponding column holder.

New Technique RC File In addition, RCFile will record corresponding metadata of each field in the metadata header. RCFile - two parameters provided to control records counts in memory before they are flushed into the disk. i. Number of records ii. Limitofthesizeofthememorybuffer. RCFile first compresses the metadata header and stores it in the disk. Then it compresses each column holder separately, and flushes compressed column holders into one row group in the underlying file system. 20 (7)

New Technique: RCFile - Data Reads
- When processing a row group, RCFile does not need to fully read the row group's whole content into memory.
- It only reads the metadata header and the columns needed by the given query.
- Advantage: it can skip unnecessary columns and gain the I/O benefits of a column-store.
- For instance, suppose we have a table with four columns, tbl(c1, c2, c3, c4), and the query SELECT c1 FROM tbl WHERE c4 = 1. Then, in each row group, RCFile only reads the content of columns c1 and c4.

New Technique: RCFile - Data Reads (continued)
- Once the metadata header and the data of the required columns are loaded, they need to be decompressed.
- The metadata header is always decompressed and held in memory until RCFile processes the next row group.
- RCFile does not decompress all the loaded columns; instead, it uses a lazy decompression technique:
  - A column is not decompressed in memory until RCFile has determined that its data will really be useful for query execution.
  - If the WHERE condition is not satisfied by any record in a row group, RCFile does not decompress the columns that do not occur in the WHERE condition. (See the sketch below.)
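A sketch of the read path for the earlier example query SELECT c1 FROM tbl WHERE c4 = 1 (illustrative Python with hypothetical helpers; zlib stands in for Gzip, and the header here is a dict keyed by column name). Only c1 and c4 are read from disk, and c1 is decompressed lazily, only when some record in the row group actually satisfies the WHERE condition:

```python
import zlib

def split_fields(blob, field_lengths):
    """Cut a column's byte stream into fields using the header's field lengths."""
    out, pos = [], 0
    for length in field_lengths:
        out.append(blob[pos:pos + length])
        pos += length
    return out

def scan_row_group(compressed_cols, header):
    # I/O skipping: only the needed columns are even loaded.
    loaded = {name: compressed_cols[name] for name in ("c1", "c4")}
    # The WHERE column must be decompressed to evaluate the condition.
    c4 = split_fields(zlib.decompress(loaded["c4"]), header["field_lengths"]["c4"])
    matches = [i for i, v in enumerate(c4) if v == b"1"]
    if not matches:
        return []  # lazy decompression: c1 is never decompressed for this group
    c1 = split_fields(zlib.decompress(loaded["c1"]), header["field_lengths"]["c1"])
    return [c1[i] for i in matches]
```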

Part 3

Results: Effect of Row Group Size on Compression Ratio
- I/O performance is a major concern of RCFile, so RCFile needs to use a large and flexible row group size. The size currently adopted by Facebook is 4 MB.
- A larger row group size:
  - Advantage: better data compression efficiency than a small one.
  - Disadvantages: may have lower read performance than a small one; can undermine the benefits of lazy decompression; higher memory usage.


Results: Performance Evaluation
Effectiveness of RCFile versus the other structures (row-store, column-store, PAX) in three aspects:
  i. Data storage space
  ii. Data loading time
  iii. Query execution time

Data storage space evaluation setup:
- USERVISITS table from the benchmark
- Generated a data set of about 120 GB, all in plain text
- Loaded it into Hive using the different data placement structures
- Data compressed with the Gzip algorithm for each structure

Results: Data Storage Space
Interpretation: RCFile stores data in two independently compressed sections (the metadata header and the table data), each achieving its own compression ratio; this yields better compression efficiency and lower storage use.

Results: Data Loading Time
- Data loading time: the time required to load the raw data into the data warehouse.
Interpretation:
- Row-store has the smallest data loading time, due to the minimum overhead of re-organizing records from the raw text file.
- Column-store and column-group are slower, because the raw data file must be written to multiple HDFS blocks for the different columns (or column groups).
- RCFile is only slightly slower than row-store, due to the small overhead of re-organizing records inside each row group.

Results: Query Execution Time
- Executed two queries on the RANKING table (three columns) from the benchmark. For the column-store, all three columns are stored independently.
  Q1: SELECT pagerank, pageurl FROM RANKING WHERE pagerank > 400;
  Q2: SELECT pagerank, pageurl FROM RANKING WHERE pagerank < 400;
- Q1: RCFile outperforms the others by exploiting lazy decompression.
- Q2: Column-store performs slightly better, due to the high selectivity of the predicate.
- Note that the performance advantage of column-group is not free: it relies on column combinations pre-defined before query execution.

Results: Effect of Different Row Group Sizes on RCFile's Performance
Workloads:
- The industry-standard TPC-H benchmark for warehouse system evaluations
- A workload generated by daily operations of the advertisement business at Facebook
Factors examined:
- Data storage space
- Query execution time

Results: TPC-H Workload
- RCFile can significantly decrease storage space compared with row-store.
- Increasing the row group size beyond a threshold does not significantly improve data compression efficiency.
- A large row group can also decrease the advantage of lazy decompression and cause unnecessary data decompression.

Results: Facebook Workload
Query A: SELECT adid, userid FROM adclicks;
Query B: SELECT adid, userid FROM adclicks WHERE userid = "X";
- For row-store, the average mapper time of Query B is longer than that of Query A, because the WHERE clause causes more computation.
- For RCFile, the average mapper time of Query B is significantly shorter than that of Query A, reflecting the performance benefit of RCFile's lazy decompression.

Conclusion: Competitive Systems
- Cheetah: RCFile outperforms it, because Cheetah heavily applies Gzip to both the metadata and the column data.
- Bigtable (Google): a low-level key-value store for both read- and write-intensive applications, whereas RCFile serves an almost read-only data warehouse.
- Facebook is transforming its existing data into the RCFile format.
- An integration of RCFile into Pig is being developed by Yahoo.

Questions
(1) What is the row-store data placement? What are the disadvantages of this data placement? (Section II-A)