Apache Flink: Big Data Stream Processing
Tilmann Rabl, Berlin Big Data Center
www.dima.tu-berlin.de | bbdc.berlin | rabl@tu-berlin.de
XLDB, 11.10.2017
Disclaimer: I am neither a Flink developer nor affiliated with data Artisans.
Agenda
- Flink primer: background & APIs (-> polystore functionality)
- Execution engine: some key features
- Stream processing with Apache Flink: key features
With slides from data Artisans, Volker Markl, and Asterios Katsifodimos.
Flink Timeline
(Figure: timeline of the project's evolution.)
Stratosphere: General-Purpose Programming + Database Execution
- Draws on database technology: relational algebra, declarativity, query optimization, robust out-of-core execution
- Adds: iterations, advanced dataflows, general APIs, native streaming
- Draws on MapReduce technology: scalability, user-defined functions, complex data types, schema on read
The APIs
- Analytics: Stream SQL, Table API (dynamic tables)
- Stream & batch processing: DataStream API (streams, windows)
- Stateful event-driven applications: Process Function (events, state, time)
Process Function

class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] =
    getRuntimeContext.getState( )

  override def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { }

    out.collect( )  // emit events
    state.update( ) // modify state

    // schedule a timer callback
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  override def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when event-/processing-time instant is reached
  }
}
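The skeleton above elides the state descriptor and the actual logic. Below is a filled-in sketch in the same spirit, modeled on the count-with-timeout pattern: it counts events per key and emits the count once a key has been idle for a minute. CountWithTimeoutFunction, the tuple types, and the "myState" descriptor name are illustrative, not from the slides.

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

case class CountWithTimestamp(key: String, count: Long, lastModified: Long)

class CountWithTimeoutFunction extends ProcessFunction[(String, String), (String, Long)] {

  // keyed state holding the running count and the time of the last update
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext.getState(
    new ValueStateDescriptor[CountWithTimestamp]("myState", classOf[CountWithTimestamp]))

  override def processElement(
      value: (String, String),
      ctx: ProcessFunction[(String, String), (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    // increment the count and remember the event's timestamp
    val current = Option(state.value).getOrElse(CountWithTimestamp(value._1, 0L, 0L))
    val updated = CountWithTimestamp(value._1, current.count + 1, ctx.timestamp)
    state.update(updated)

    // schedule a callback for one minute after the last seen event
    ctx.timerService.registerEventTimeTimer(updated.lastModified + 60000)
  }

  override def onTimer(
      timestamp: Long,
      ctx: ProcessFunction[(String, String), (String, Long)]#OnTimerContext,
      out: Collector[(String, Long)]): Unit = {
    // emit only if no further event arrived in the meantime
    val result = state.value
    if (result != null && timestamp == result.lastModified + 60000) {
      out.collect((result.key, result.count))
    }
  }
}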
Data Stream API

val lines: DataStream[String] = env.addSource(
  new FlinkKafkaConsumer09[String]( ))

val events: DataStream[Event] = lines.map(line => parse(line))

val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .apply(new MyAggregationFunction())

stats.addSink(new RollingSink(path))
Table API & Stream SQL
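Since the slide leaves the code to the demo, here is a minimal, hedged Stream SQL sketch: a stream is registered as a dynamic table and queried with standard SQL. Table API entry points and imports have moved between Flink releases; this roughly follows later 1.x versions, and Quote and the query are illustrative.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.bridge.scala._

case class Quote(symbol: String, price: Double)

object StreamSqlSketch {
  def main(args: Array[String]): Unit = {
    val env  = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)

    val quotes: DataStream[Quote] = env.fromElements(
      Quote("HDP", 23.8), Quote("SPX", 2113.9), Quote("FTSE", 6931.7))

    // register the stream as a dynamic table and query it declaratively
    tEnv.createTemporaryView("Quotes", quotes)
    val highest = tEnv.sqlQuery(
      "SELECT symbol, MAX(price) AS maxPrice FROM Quotes GROUP BY symbol")

    // grouped results update over time, hence a retract stream
    highest.toRetractStream[(String, Double)].print()
    env.execute("stream-sql-sketch")
  }
}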
What can I do with it?
Stream processing, batch processing, machine learning at scale, complex event processing, graph analysis.
Flink: an engine that natively supports all of these workloads.
Flink in the Analytics Ecosystem
- Applications & languages: Hive, Mahout, Cascading, Pig, Giraph, Crunch
- Data processing engines: MapReduce, Spark, Storm, Flink, Tez
- App and resource management: YARN, Mesos
- Storage, streams: HDFS, HBase, Kafka
Where in my cluster does Flink fit?
- Gathering: server logs, upstream systems, transaction logs, sensor logs
- Integration: gather and back up streams, offer streams for consumption, provide stream recovery
- Analysis: analyze and correlate streams, create derived streams and state, provide these to upstream systems
Architecture
- Hybrid MapReduce and MPP database runtime
- Pipelined/streaming engine
- Complete DAG deployed
(Diagram: a Job Manager coordinating Workers 1-4.)
Flink Execution Model
- Flink program = DAG* of operators and intermediate streams
- Operator = computation + state
- Intermediate streams = logical stream of records
Technology inside Flink

case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges).where("to").equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

- Pre-flight (client): type extraction stack, cost-based optimizer
- Master: task scheduling, recovery metadata, tracking of intermediate results
- Workers: memory manager, out-of-core algorithms, batch & streaming, state & checkpoints
(Diagram: optimized dataflow graph joining orders.tbl and lineitem.tbl via a hybrid hash join over hash-partitioned inputs, followed by a sorted group reduce.)
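For reference, the snippet above can be turned into a small self-contained batch program. A hedged sketch: the sample edges and the surrounding boilerplate are illustrative.

import org.apache.flink.api.scala._

object TransitiveClosure {
  case class Path(from: Long, to: Long)

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val edges = env.fromElements(Path(1L, 2L), Path(2L, 3L), Path(3L, 4L))

    // repeatedly extend known paths by one hop, for up to 10 iterations
    val tc = edges.iterate(10) { paths =>
      paths
        .join(edges).where("to").equalTo("from") { (path, edge) => Path(path.from, edge.to) }
        .union(paths)
        .distinct()
    }

    tc.print()
  }
}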
Rich set of operators
Map, Reduce, Join, CoGroup, Union, Iterate, Delta Iterate, Filter, FlatMap, GroupReduce, Project, Aggregate, Distinct, Vertex-Update, Accumulators, ...
Effect of optimization
The optimizer chooses: hash vs. sort, partition vs. broadcast, caching, reusing partitioning/sort order.
- Execution Plan A: run on a sample on the laptop
- Execution Plan B: run on large files on the cluster
- Execution Plan C: run a month later, after the data evolved
Flink Optimizer
What you write is not what is executed: there is no need to hardcode execution strategies. The Flink optimizer decides:
- Pipelines and dam/barrier placement
- Sort- vs. hash-based execution
- Data exchange (partition vs. broadcast)
- Data partitioning steps
- In-memory caching
(Diagram: optimized transitive-closure plan with DISTINCT co-located with JOIN, JOIN co-located with UNION, hash partitioning on the join keys, and loop-invariant data cached in memory.)
Scale Out
Stream Processing with Flink
8 Requirements of Big Streaming
1. Keep the data moving: streaming architecture
2. Integrate stored and streaming data: hybrid stream and batch
3. Declarative access: e.g. StreamSQL, CQL
4. Data safety and availability: fault tolerance, durable state
5. Handle imperfections: late, missing, unordered items
6. Automatic partitioning and scaling: distributed processing
7. Predictable outcomes: consistency, event time
8. Instantaneous processing and response
(The 8 Requirements of Real-Time Stream Processing, Stonebraker et al., 2005)
How to keep data moving?
Discretized streams (mini-batch): a stream discretizer feeds a sequence of jobs.

while (true) {
  // get next few records
  // issue batch computation
}

Native streaming: long-standing operators.

while (true) {
  // process next record
}
Declarative Access: Stream SQL
Stream/table duality: a table without a primary key corresponds to an append-only stream of rows, while a table with a primary key corresponds to a stream of updates per key (see the sketch below).
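Continuing the Stream SQL sketch from above (hedged, names illustrative): a query without a key only ever appends rows, matching a table without a primary key, while a keyed aggregate updates earlier results and therefore converts to a retract stream of (add/retract, row) pairs.

// append-only: no primary key, every row is final
val allQuotes = tEnv.sqlQuery("SELECT symbol, price FROM Quotes")
allQuotes.toAppendStream[(String, Double)].print()

// keyed: results per symbol are updated; (true, row) adds, (false, row) retracts
val maxQuotes = tEnv.sqlQuery("SELECT symbol, MAX(price) FROM Quotes GROUP BY symbol")
maxQuotes.toRetractStream[(String, Double)].print()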
Handle Imperfections: Event Time et al.
- Event time: data item production time
- Ingestion time: system time when the data item is received
- Processing time: system time when the data item is processed
Typically, these do not match. In practice, streams are unordered!
(Image: Tyler Akidau)
Time: Event-Time Example
In processing-time order, the Star Wars episodes arrived as IV (1977), V (1980), VI (1983), I (1999), II (2002), III (2005), VII (2015); in event time, the story runs I through VII.
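To make Flink order windows by event time rather than arrival time, the program declares event time and tells Flink where each event's timestamp lives. A minimal sketch, assuming a stream `events` of a type MyEvent with a `timestamp` field (both illustrative); the extractor below tolerates 10 seconds of out-of-orderness.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// attach timestamps and emit watermarks lagging 10 s behind the largest seen timestamp
val ordered = events.assignTimestampsAndWatermarks(
  new BoundedOutOfOrdernessTimestampExtractor[MyEvent](Time.seconds(10)) {
    override def extractTimestamp(e: MyEvent): Long = e.timestamp
  })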
Flink's Windowing
Windows can be any combination of (multiple) triggers & evictions:
- Arbitrary tumbling, sliding, session, etc. windows can be constructed
- Common triggers/evictions are part of the API: time (processing vs. event time), count
- Even more flexibility: define your own UDF trigger/eviction (see the sketch below)
Examples:

dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)))
dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)))

Flink will handle event time, ordering, etc.
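A few more window constructions from the same toolbox, as a hedged sketch; `byKey` stands for any keyed stream from the examples above, and the sizes are illustrative.

import org.apache.flink.streaming.api.windowing.assigners.{EventTimeSessionWindows, TumblingEventTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger

byKey.window(EventTimeSessionWindows.withGap(Time.minutes(10)))  // session windows
byKey.countWindow(100, 10)                                       // count-based sliding window
byKey
  .window(TumblingEventTimeWindows.of(Time.seconds(5)))
  .trigger(CountTrigger.of(50))                                  // override the default trigger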
Example Analysis: Windowed Aggregation
Example: (1) window contents: StockPrice(HDP, 23.8), StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 26.6); (2) lowest overall: StockPrice(HDP, 23.8); (3) maximum per stock: StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 26.6); (4) rolling mean per stock: StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 25.2).

val windowedStream = stockStream
  .window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS))           // (1)

val lowest = windowedStream.minBy("price")                           // (2)
val maxByStock = windowedStream.groupBy("symbol").maxBy("price")     // (3)
val rollingMean = windowedStream.groupBy("symbol").mapWindow(mean _) // (4)
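The calls above use the early windowing API this slide was written against. As a hedged sketch, the same maximum-per-stock query in the current DataStream API looks roughly like this; the sample data and job name are illustrative.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class StockPrice(symbol: String, price: Double)

object WindowedAggregation {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stockStream: DataStream[StockPrice] = env.fromElements(
      StockPrice("HDP", 23.8), StockPrice("SPX", 2113.9),
      StockPrice("FTSE", 6931.7), StockPrice("HDP", 26.6))

    // sliding window of 10 s, evaluated every 5 s, per stock symbol
    val maxByStock = stockStream
      .keyBy(_.symbol)
      .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
      .maxBy("price")

    maxByStock.print()
    env.execute("windowed-aggregation-sketch")
  }
}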
Data Safety and Availability
- Ensure that operators see all events: at least once. Solved by replaying a stream from a checkpoint, but not sufficient on its own for correct results.
- Ensure that operators do not perform duplicate updates to their state: exactly once. Several solutions exist.
- Ensure the job can survive failure.
Lessons Learned from Batch
If a batch computation fails, simply repeat the computation as a transaction; the transaction rate is constant. Can we apply these principles to a true streaming execution?
Taking Snapshots the Naïve Way
Initial approach (e.g., Naiad): pause execution at t1, t2, ..., collect the state, then restore execution.
Asynchronous Snapshots in Flink
Instead of pausing, Flink propagates markers/barriers through the dataflow and snapshots each operator's state as the barrier passes (snap-t1, snap-t2, ...). Snapshots can be full or incremental.
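From user code, the barrier mechanism is switched on with a single setting. A minimal sketch (the 10 s interval is illustrative): Flink then injects a barrier at the sources at that interval and snapshots each operator as the barrier flows past it.

import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(10000)  // inject a checkpoint barrier every 10 s
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)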
Conclusion
Apache Flink:
- The case for Flink as a stream processor
- Ideal basis for polystore computations
- Full-featured big data streaming engine
Thank You!
Contact: Tilmann Rabl, rabl@tu-berlin.de