The Stratosphere Platform for Big Data Analytics

Hongyao Ma, Franco Solleza
April 20, 2015

Stratosphere

Big Data Analytics
- BIG data
- Heterogeneous datasets: structured / unstructured / semi-structured
- Users have different needs for declarativity and expressivity

What we have covered so far
- Polybase
- Shark
- MLBase
- SharedDB
- BlinkDB

The Promises
- Declarative, high-level language
- In situ data analysis
- Richer set of primitives than MapReduce
- Treat UDFs as first-class citizens
- Automated parallelization and optimization
- Support for iterative programs
- Includes external memory query processing algorithms to support arbitrarily long programs

Outline
- Meteor & Sopremo
- PACT
- Nephele
- Experiment Results
- Future Work & Discussions

Sopremo

Meteor Script
- Declarative interface
- High-level script

Meteor Translates to Sopremo
- Operators shown in the plan: Output, Group, Join, Compute Revenue, Filter
- Inputs: Lineitem, Supplier

Sopremo
- Modular and extensible
- Composable

Sopremo Compiled to PACT
- Operators shown in the plan: Output, Group, Join, Compute Revenue, Filter
- Inputs: Lineitem, Supplier
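The operator names in the plan (Filter, Compute Revenue, Join, Group, Output over Lineitem and Supplier) suggest a simple dataflow. The following is a minimal Python sketch of such a pipeline, not Meteor, Sopremo, or PACT code. The field names borrow TPC-H conventions, and the sample rows, the date predicate, and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Made-up sample rows (assumption); field names follow TPC-H conventions.
lineitem = [
    {"suppkey": 1, "extendedprice": 100.0, "discount": 0.5, "shipdate": "1996-01-15"},
    {"suppkey": 1, "extendedprice": 200.0, "discount": 0.0, "shipdate": "1996-02-01"},
    {"suppkey": 2, "extendedprice": 50.0, "discount": 0.0, "shipdate": "1995-12-31"},
    {"suppkey": 2, "extendedprice": 80.0, "discount": 0.25, "shipdate": "1996-03-10"},
]
supplier = [
    {"suppkey": 1, "name": "Supplier#1"},
    {"suppkey": 2, "name": "Supplier#2"},
]


def filter_op(rows, predicate):
    """Filter: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def compute_revenue(rows):
    """Compute Revenue: add revenue = extendedprice * (1 - discount)."""
    return [{**r, "revenue": r["extendedprice"] * (1 - r["discount"])} for r in rows]


def join_op(left, right, key):
    """Join: equi-join two inputs on a shared key using a hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_op(rows, key, value):
    """Group: sum a value per group key."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)


def run_plan():
    """Chain the operators in the order the plan suggests (bottom to top)."""
    filtered = filter_op(lineitem, lambda r: r["shipdate"] >= "1996-01-01")
    with_revenue = compute_revenue(filtered)
    joined = join_op(with_revenue, supplier, "suppkey")
    return group_op(joined, "name", "revenue")


print(run_plan())
```

Each operator takes and returns a plain collection, which is what makes the operators composable in the sense of the Sopremo slide.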

PACT

PACT
- Programmer makes a pact with the system
- Uses one of 5 functions: Map, Reduce, Match, Cross, CoGroup
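To make the five functions concrete, here is a minimal single-machine sketch in Python of how each second-order function decides which data a user-defined function (UDF) sees together. The function names and signatures are assumptions chosen for illustration; they are not the Stratosphere API, and the sketch ignores parallel execution entirely.

```python
from collections import defaultdict
from itertools import product


def group_by_key(records):
    """Group (key, value) records into {key: [values]}."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def pact_map(records, udf):
    """Map: call the UDF once per record."""
    return [out for rec in records for out in udf(rec)]


def pact_reduce(records, udf):
    """Reduce: call the UDF once per key, with all values of that key."""
    return [out for k, vs in group_by_key(records).items() for out in udf(k, vs)]


def pact_match(left, right, udf):
    """Match: call the UDF once per pair of records with equal keys."""
    rg = group_by_key(right)
    return [out for k, lv in left for rv in rg.get(k, []) for out in udf(k, lv, rv)]


def pact_cross(left, right, udf):
    """Cross: call the UDF once per element of the Cartesian product."""
    return [out for l, r in product(left, right) for out in udf(l, r)]


def pact_cogroup(left, right, udf):
    """CoGroup: call the UDF once per key, with the groups of both inputs."""
    lg, rg = group_by_key(left), group_by_key(right)
    keys = sorted(set(lg) | set(rg))
    return [out for k in keys for out in udf(k, lg.get(k, []), rg.get(k, []))]


# Word count with Reduce: one UDF call per distinct word.
words = [("a", 1), ("b", 1), ("a", 1)]
print(pact_reduce(words, lambda k, vs: [(k, sum(vs))]))  # [('a', 2), ('b', 1)]

# Equi-join with Match: one UDF call per matching pair.
print(pact_match([("x", 1), ("y", 2)], [("x", 10), ("x", 20)],
                 lambda k, l, r: [(k, l + r)]))  # [('x', 11), ('x', 21)]
```

In every case the UDF stays a black box; only the grouping contract differs, which is the information a system needs in order to partition the data.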

What's a PACT?
- Data and a function
- Specifies how data are partitioned across the system
- An atomic(?) operation on all specified data

Iterative PACT Programs

Iterative PACT Programs
- Implicitly, iteration mutates state
- How to do iteration without explicit mutation of state?

Iterative PACT Programs: Bulk iteration
1. Starts with a solution set
2. Sends labels to neighbors, then groups them
3. Finds the minimum among those neighbors
4. Outputs an incremental solution set
5. The incremental solution set becomes the input to the next iteration
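The bulk iteration steps can be sketched for connected components by label propagation, which is the workload the evaluation later uses. This is a minimal single-machine Python sketch under stated assumptions: the function name, the choice of vertex ids as initial labels, and the fixed-point stopping test are illustrative and not taken from the slides.

```python
def bulk_iteration_cc(vertices, edges, max_iterations=100):
    """Connected components where every iteration recomputes every label."""
    # Build an undirected adjacency structure.
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initial solution set: every vertex is labeled with its own id.
    labels = {v: v for v in vertices}

    for _ in range(max_iterations):
        # Send each vertex's label to its neighbors, grouped by receiver.
        candidates = {v: [labels[v]] for v in vertices}
        for v in vertices:
            for n in neighbors[v]:
                candidates[n].append(labels[v])

        # Find the minimum candidate per vertex.
        new_labels = {v: min(c) for v, c in candidates.items()}

        # Stop at a fixed point; otherwise feed the result into the next round.
        if new_labels == labels:
            break
        labels = new_labels
    return labels


print(bulk_iteration_cc([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

No state is mutated in place: each round builds a fresh label mapping from the previous one, at the cost of touching every vertex even when only a few labels still change.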

Iterative PACT Programs: Incremental iteration
1. Starts with a work set and a solution set
2. Calculates the minimum for each group
3. Merges the work set with the solution set and checks whether the label changed
4. If the label is new, it becomes part of the delta set, which is sent back to the next iteration
5. If changed, the element is also matched to its neighbors, and those matches become the new work set
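The incremental variant can be sketched on the same connected-components problem. This is a minimal single-machine Python sketch; the function name, the representation of the work set as (vertex, candidate label) pairs, and the returned superstep count are assumptions for illustration.

```python
def incremental_iteration_cc(vertices, edges):
    """Connected components that only revisits vertices whose label changed."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: current label per vertex.
    solution = {v: v for v in vertices}
    # Initial work set: every vertex offers its label to its neighbors.
    workset = [(n, solution[v]) for v in vertices for n in neighbors[v]]

    supersteps = 0
    while workset:
        supersteps += 1

        # Calculate the minimum candidate label per vertex.
        best = {}
        for v, label in workset:
            if v not in best or label < best[v]:
                best[v] = label

        # Merge with the solution set; only real improvements form the delta set.
        delta = {v: label for v, label in best.items() if label < solution[v]}
        solution.update(delta)

        # Changed vertices are matched to their neighbors: the new work set.
        workset = [(n, label) for v, label in delta.items() for n in neighbors[v]]
    return solution, supersteps


labels, steps = incremental_iteration_cc([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(labels, steps)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4} 3
```

The work set shrinks as labels converge, so later supersteps process only the part of the graph that is still changing, and the loop ends when the work set is empty.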

PACT Optimization

Nephele

Nephele Execution: Tasks, channels, scheduling
- Tasks, together with all local pipelines associated with them, are pushed to slaves
- Tasks can request to send data over the network (only when necessary or ready)

Nephele Execution: Fault tolerance
- Conceptually, follows the same idea as lineage (RDDs), but...
- Intermediate blocking operator model
- Intermediate non-blocking operator model

Nephele Execution: Runtime operators

Does it deliver?
- Maybe: what do the experiments say?
- What's old? A lot of things
- What's new?
  - Second-order functions that abstract parallelization
  - Optimization in a UDF-heavy environment
  - Integrated iterative processing
  - An extensible query language and underlying operator model

Experimental Evaluation

Experimental Setup
- Setup:
  - 1 master + 25 slave machines
  - 16 cores @ 2.0 GHz with 32 GB of RAM (29 GB of operating memory)
  - 80 TB HDFS in plain ASCII, 4 SATA drives at 500 MB/s read/write per node
  - 8 parallel tasks per slave, total DOP 40-200
- Comparison with Hadoop:
  - Vanilla MapReduce engine
  - Apache Hive
  - Apache Giraph

Summary of Results
- Stratosphere achieves linear speedup and performance similar to Hadoop on simple tasks (TeraSort, Word Count)
- Stratosphere is 5x faster than Hive and Hadoop on complex tasks such as TPC-H and triangle enumeration, though increasing the DOP yields no gain
- Stratosphere performs worse than Giraph on Connected Components because Giraph's implementation is better tuned
- Checkpointing adds little overhead and saves much time when a failure occurs

TeraSort: Stratosphere vs. Hadoop
- Stratosphere achieves performance similar to Hadoop, with linear speedup

Word Count: Stratosphere vs. Hadoop
- Stratosphere is 20% faster than Hadoop and achieves linear speedup

Triangle Enumeration: Reducer 1

Triangle Enumeration: Reducer 2

Triangle Enumeration: PACT

Triangle Enumeration
- Stratosphere is 5x faster than Hadoop, though parallelism does not help
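The figures for the two reducers and the PACT plan did not survive extraction, so the following is not a reconstruction of them. It is a minimal Python sketch of one common two-step way to enumerate triangles, assuming undirected edges: first build open triads from edges grouped by their smaller endpoint, then keep the triads whose closing edge exists. The second step is a key-equality join, which is the kind of work a Match contract expresses directly. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_triads(edges):
    """Step 1: group edges by smaller endpoint and emit open triads.

    Each triad is ((u, w), v): v is connected to both u and w, and v < u < w.
    """
    by_vertex = defaultdict(set)
    for a, b in edges:
        lo, hi = min(a, b), max(a, b)
        if lo != hi:  # ignore self-loops
            by_vertex[lo].add(hi)
    triads = []
    for v, higher in by_vertex.items():
        for u, w in combinations(sorted(higher), 2):
            triads.append(((u, w), v))
    return triads


def close_triads(triads, edges):
    """Step 2: keep the triads whose missing edge (u, w) is in the graph."""
    edge_keys = {(min(a, b), max(a, b)) for a, b in edges}
    return sorted((v, u, w) for (u, w), v in triads if (u, w) in edge_keys)


def enumerate_triangles(edges):
    """Run both steps; each triangle is reported once, smallest vertex first."""
    return close_triads(build_triads(edges), edges)


print(enumerate_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]))
# [(1, 2, 3), (2, 3, 4)]
```

Anchoring every triad at its smallest vertex means no triangle is emitted more than once, so no deduplication pass is needed.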

TPC-H Query

TPC-H: Stratosphere vs. Hive
- Parallelism does not seem to help, but Stratosphere is still 5x faster

Connected Components
- Giraph is faster, due to a better-tuned implementation

CC: Execution time per superstep

Fault Tolerance
- Checkpointing adds little overhead and saves much time when a failure occurs

What Else Do We Want to See?
- For the presented experiments:
  - Breakdown of execution time to distinguish bottlenecks
  - What happens with an even smaller DOP?
  - What happens with more or fewer tasks on each core?
- Further:
  - What happens with even larger data? The current size does fit into RAM
  - Comparison with MPP or split query processing systems like Polybase, or with Shark, given the size of the tested data

The Promises?
- Declarative, high-level language
- In situ data analysis
- Richer set of primitives than MapReduce
- Treat UDFs as first-class citizens
- Automated parallelization and optimization
- Support for iterative programs
- Includes external memory query processing algorithms to support arbitrarily long programs

Ongoing and Future Work
- One-pass optimizer unifying the PACT and Sopremo layers
- Strengthening fault-tolerance capabilities
- Improving the scalability and efficiency of Nephele
- Design, compilation, and optimization of higher-level languages
- Scalable, efficient, and adaptive algorithms and architecture
- Stateful systems for fast ingestion and low-latency data analysis

Discussions and Questions
- Declarativity vs. expressiveness tradeoff
  - More declarative -> less expressive, but easier to optimize
- Is run-time optimization the way to go?
  - Skewed data distribution may become a bottleneck for such systems
  - Detecting performance bottlenecks on the fly

QED THANKS!