Apache Flink Streaming Done Right. Till

Franklin Goodwin
6 years ago

1 Apache Flink Streaming Done Right Till

2 What Is Apache Flink?
- Apache TLP since December 2014
- Parallel streaming data flow runtime
- Low latency & high throughput
- Exactly-once semantics
- Stateful operations

3 Why Stream Processing?
- Most problems have a streaming nature
- Stream processing gives lower latency
- Data volumes are more easily tamed
- More predictable resource consumption
[Figure: event stream]

4 Counting Tweet Impressions
[Figure: an input stream of tweets is grouped, and the number of tweets (#tweets) is counted per group]

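5 How Would I Do It with Flink?

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    case class Tweet(id: Long, timestamp: Long, count: Long)

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // MyTwitterSource is the custom source named on the slide; its implementation is not shown.
    val tweets: DataStream[Tweet] = env.addSource(new MyTwitterSource())

    // Sum the impression counts per tweet id over 10-minute windows.
    val result: DataStream[Tweet] = tweets
      .keyBy("id")
      .timeWindow(Time.minutes(10))
      .sum("count")

    result.print()
    env.execute()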

8 How Would I Do It with Flink? (same program as on slide 5)
We have to define a time frame for the aggregation; here it is the 10-minute time window.
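
To make the "time frame" idea concrete, here is a minimal plain-Scala sketch of a keyed tumbling-window sum. It does not use Flink; the object name TumblingWindowSketch and the helpers windowStart and sumPerWindow are hypothetical, and the runtime itself handles this incrementally and in parallel rather than on an in-memory Seq.

```scala
// Plain-Scala sketch (no Flink dependency) of a keyed tumbling-window sum.
object TumblingWindowSketch {
  final case class Tweet(id: Long, timestamp: Long, count: Long)

  // Start of the tumbling window that contains `timestamp` (both in milliseconds).
  def windowStart(timestamp: Long, sizeMs: Long): Long =
    timestamp - Math.floorMod(timestamp, sizeMs)

  // Sum `count` per (tweet id, window start).
  def sumPerWindow(tweets: Seq[Tweet], sizeMs: Long): Map[(Long, Long), Long] =
    tweets
      .groupBy(t => (t.id, windowStart(t.timestamp, sizeMs)))
      .map { case (key, group) => key -> group.map(_.count).sum }
}
```

With a window size of 600000 ms (10 minutes), two tweets with the same id at timestamps 0 and 599999 fall into the same window, while a tweet at timestamp 600000 opens the next one.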

11 Expressive Windows
- Time and count based
- Event-time support
- Custom windows
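
As an illustration of time-based and count-based windows, the following plain-Scala sketch computes which sliding windows an event timestamp belongs to and how a count window groups elements. The names SlidingWindowSketch, windowStarts and countWindows are hypothetical and not part of the Flink API; the sketch assumes windows aligned to timestamp 0 and count windows that only fire when full.

```scala
// Plain-Scala sketch (no Flink dependency) of window assignment.
object SlidingWindowSketch {
  // Start timestamps of all sliding windows [start, start + sizeMs)
  // that contain `timestamp`, newest window first.
  def windowStarts(timestamp: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    require(sizeMs > 0 && slideMs > 0, "size and slide must be positive")
    val lastStart = timestamp - Math.floorMod(timestamp, slideMs)
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > timestamp - sizeMs).toList
  }

  // Count-based windows: emit a group once `n` elements have arrived;
  // a trailing incomplete group is not emitted.
  def countWindows[A](events: Seq[A], n: Int): List[Seq[A]] = {
    require(n > 0, "window size must be positive")
    events.grouped(n).filter(_.size == n).toList
  }
}
```

With a size equal to the slide, windowStarts returns exactly one window, which is the tumbling case.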

12 Stateful Operators
- What if a window grows too large?
- Solution: stateful mapper with counter
[Figure: an operator whose user function "Count tweet impressions" reads and writes the operator state; labels #13, #37, #42]

13 Intuitive API

    import org.apache.flink.streaming.api.scala._

    case class Tweet(id: Long, timestamp: Long, count: Long)

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tweets: DataStream[Tweet] = env.addSource(new MyTwitterSource())

    // Keep a per-key counter in operator state; every tweet after the
    // first one for a key is emitted with the updated counter.
    val result: DataStream[Tweet] = tweets
      .keyBy("id")
      .mapWithState { (tweet, state: Option[Long]) =>
        state match {
          case Some(counter) => (tweet.copy(count = counter + 1L), Some(counter + 1L))
          case None          => (tweet, Some(1L))
        }
      }
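
To show what the stateful mapper does to a concrete input, here is a plain-Scala sketch that mimics the (element, Option[state]) => (result, Option[state]) contract with an in-memory map per key. The object StatefulMapSketch and the helper mapWithKeyedState are hypothetical stand-ins, not Flink code; only countImpressions is taken from the slide.

```scala
// Plain-Scala sketch (no Flink dependency) of a keyed stateful map.
object StatefulMapSketch {
  final case class Tweet(id: Long, timestamp: Long, count: Long)

  // Applies `f` to each element together with the state stored for its key.
  // Returning None from `f` clears the state for that key.
  def mapWithKeyedState[K, A, S, R](input: Seq[A], key: A => K)(
      f: (A, Option[S]) => (R, Option[S])): Vector[R] = {
    val state = scala.collection.mutable.Map.empty[K, S]
    input.toVector.map { a =>
      val k = key(a)
      val (result, next) = f(a, state.get(k))
      next match {
        case Some(s) => state.update(k, s)
        case None    => state -= k
      }
      result
    }
  }

  // The counting function from the slide.
  def countImpressions(tweet: Tweet, state: Option[Long]): (Tweet, Option[Long]) =
    state match {
      case Some(counter) => (tweet.copy(count = counter + 1L), Some(counter + 1L))
      case None          => (tweet, Some(1L))
    }
}
```

Note that, as written on the slide, the first tweet for a key is passed through with its original count, and only later tweets carry the running counter.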

15 What If a Failure Occurs?
- Loss of data and incorrect count!
[Figure: the same operator with user function "Count tweet impressions" and its operator state after a failure; the in-flight tweets are unknown (#??)]

16 We Need to Save Our State!
- Consistent snapshots of distributed data stream and operator state

17 How Does It Work?
- Markers for checkpoints
- Injected in the data flow
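
The marker idea can be sketched for a single operator in plain Scala: the input is a sequence of events with checkpoint barriers interleaved, and the operator records its state whenever a barrier passes. All names here (BarrierSnapshotSketch, Element, Event, Barrier, Snapshot, run) are hypothetical; the sketch ignores multiple inputs, barrier alignment and state backends.

```scala
// Plain-Scala sketch (no Flink dependency) of snapshotting on checkpoint barriers.
object BarrierSnapshotSketch {
  sealed trait Element
  final case class Event(tweetId: Long) extends Element
  final case class Barrier(checkpointId: Long) extends Element

  type Counts = Map[Long, Long]

  // State captured when a barrier passes: the counts so far and the
  // input position from which a replay would resume.
  final case class Snapshot(checkpointId: Long, counts: Counts, nextOffset: Int)

  // Counts events per tweet id, starting from `counts` at position `offset`.
  // Every barrier records a snapshot of the state reached before it.
  def run(stream: Vector[Element], counts: Counts = Map.empty, offset: Int = 0): (Counts, Vector[Snapshot]) =
    stream.zipWithIndex.drop(offset).foldLeft((counts, Vector.empty[Snapshot])) {
      case ((state, snaps), (Event(id), _)) =>
        (state.updated(id, state.getOrElse(id, 0L) + 1L), snaps)
      case ((state, snaps), (Barrier(cp), i)) =>
        (state, snaps :+ Snapshot(cp, state, i + 1))
    }
}
```

After a failure, calling run again with a snapshot's counts and nextOffset replays only the events behind the barrier, so the final counts match a failure-free run.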

18 Processing Guarantees
- At most once: no guarantees at all
- At least once: ensure that all operators see all events
- Exactly once
Flink gives you all guarantees
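
The difference between the three guarantees can be illustrated with a deliberately simplified plain-Scala model of recovery for a counting operator. This is an assumption-laden toy, not how Flink implements recovery: GuaranteeSketch and its functions are hypothetical, a failure is modeled as a position in the input, and "at least once" is modeled as a restored state that already contains a few events past the replay offset.

```scala
// Plain-Scala toy model (no Flink dependency) of recovery outcomes.
object GuaranteeSketch {
  type Counts = Map[Long, Long]

  // Count occurrences per id on top of an existing state.
  def count(events: Seq[Long], state: Counts): Counts =
    events.foldLeft(state)((s, id) => s.updated(id, s.getOrElse(id, 0L) + 1L))

  // At most once: state is lost at the failure and nothing is replayed.
  def atMostOnce(events: Seq[Long], failAt: Int): Counts =
    count(events.drop(failAt), Map.empty)

  // At least once: events are replayed from the checkpoint offset, but the
  // restored state already includes `overlap` events past that offset.
  def atLeastOnce(events: Seq[Long], checkpointAt: Int, overlap: Int): Counts =
    count(events.drop(checkpointAt), count(events.take(checkpointAt + overlap), Map.empty))

  // Exactly once: restored state and replay offset are consistent.
  def exactlyOnce(events: Seq[Long], checkpointAt: Int): Counts =
    count(events.drop(checkpointAt), count(events.take(checkpointAt), Map.empty))
}
```

In this model only exactlyOnce reproduces the counts of a failure-free run; atMostOnce under-counts and atLeastOnce can count the overlapping events twice.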

19 Checkpointed Operators
- State is automatically checkpointed
- State backend is configurable
[Figure: barriers flow through an operator whose user function "Count tweet impressions" reads and writes the operator state; labels #13, #37, #42]

20 What About Performance?
- Continuous streaming
- Latency-bound buffering
- Distributed snapshots
High throughput & low latency, with a configurable throughput/latency tradeoff
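
Latency-bound buffering can be pictured as a buffer that is shipped either when it is full or when its oldest record has waited too long. The plain-Scala sketch below simulates this on timestamped records; BufferTimeoutSketch and flushes are hypothetical names, and the timeout is only checked when the next record arrives, whereas a real runtime would use a timer.

```scala
// Plain-Scala simulation (no Flink dependency) of size- and time-bounded buffering.
object BufferTimeoutSketch {
  // Groups (arrivalTimeMs, value) records, given in arrival order, into buffers.
  // A buffer is flushed when it holds `capacity` records, or when a record
  // arrives at least `timeoutMs` after the buffer's first record.
  def flushes[A](records: Seq[(Long, A)], capacity: Int, timeoutMs: Long): Vector[Vector[A]] = {
    val start = (Vector.empty[Vector[A]], Vector.empty[A], 0L)
    val (flushed, pending, _) = records.foldLeft(start) {
      case ((done, open, openedAt), (time, value)) =>
        // Flush an expired buffer before adding the new record.
        val (d1, o1, t1) =
          if (open.nonEmpty && time - openedAt >= timeoutMs) (done :+ open, Vector.empty[A], time)
          else if (open.isEmpty) (done, open, time)
          else (done, open, openedAt)
        val o2 = o1 :+ value
        // Flush immediately once the buffer is full.
        if (o2.size >= capacity) (d1 :+ o2, Vector.empty[A], t1) else (d1, o2, t1)
    }
    // Ship whatever is left at the end of the input.
    if (pending.nonEmpty) flushed :+ pending else flushed
  }
}
```

A small timeout favors latency (many small buffers), while a large capacity with a long timeout favors throughput (few large buffers).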

21 What's coming next?

22 Asynchronous Snapshots
- Taking snapshots stalls the operator
- Solution: out-of-core & asynchronous snapshots
[Figure: barriers flow through an operator whose user function "Count tweet impressions" reads and writes the operator state; the state can spill to disk and is snapshotted asynchronously/incrementally]

23 Streams with Varying Data Rate
- With static resources: provision for max. rate
[Plot: events/second over time; the gap between the provisioned maximum and the actual rate is idle capacity]

24 (1) Adjust Parallelism
- Initial configuration
- Scale out (for load)
- Scale in (save resources)
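
A back-of-the-envelope version of the scale-out/scale-in decision, as a plain-Scala sketch. The function requiredParallelism and its parameters are hypothetical; it assumes every parallel task can absorb the same fixed number of events per second, which real jobs rarely satisfy exactly.

```scala
// Plain-Scala sketch (no Flink dependency) of choosing a parallelism for a data rate.
object ParallelismSketch {
  // Smallest number of parallel tasks able to absorb `eventsPerSecond`,
  // assuming each task handles `perTaskCapacity` events per second,
  // clamped to the range [1, maxParallelism].
  def requiredParallelism(eventsPerSecond: Long, perTaskCapacity: Long, maxParallelism: Int): Int = {
    require(perTaskCapacity > 0 && maxParallelism > 0, "capacity and limit must be positive")
    val needed = (eventsPerSecond + perTaskCapacity - 1) / perTaskCapacity // ceiling division
    math.max(1L, math.min(needed, maxParallelism.toLong)).toInt
  }
}
```

Re-evaluating this as the rate changes gives the scale-out and scale-in steps from the slide, instead of provisioning statically for the maximum rate.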

25 (2) Dynamic Worker Pool
[Figure: the JobManager allocates and releases TaskManagers through a resource manager backed by a pool of cluster resources (YARN/Mesos/)]

26 Declarative Queries

StreamSQL

    // StreamSQL API as previewed on the slide.
    val tabEnv = new TableEnvironment(env)
    tabEnv.registerStream(stream, "mystream", ("ID", "MEASURE", "COUNT"))
    val sqlQuery = tabEnv.sql("SELECT ID, MEASURE FROM mystream WHERE COUNT > 17")

Complex event processing
[Figure: state machine with states Sober, Drunk, Afoot and Car, transitions labeled "drink", "walks" and "enters car", and a "Raise alert" action]
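
For comparison, the StreamSQL query above corresponds to a filter followed by a projection. The plain-Scala sketch below expresses that on an in-memory collection; the case class Measurement and the field types (Long, Double, Long) are assumptions, since the slide does not say what types ID, MEASURE and COUNT have.

```scala
// Plain-Scala sketch (no Flink dependency) of the query's semantics.
object QuerySketch {
  // Hypothetical row type for the registered stream "mystream".
  final case class Measurement(id: Long, measure: Double, count: Long)

  // Equivalent of: SELECT ID, MEASURE FROM mystream WHERE COUNT > 17
  def query(rows: Seq[Measurement]): Seq[(Long, Double)] =
    rows.filter(_.count > 17).map(r => (r.id, r.measure))
}
```

The declarative form leaves the choice of execution strategy to the system, while the functional form spells out the operators by hand.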

27 Where to Find Us?

28 Architecture Overview
[Figure: Flink architecture]
- Client: turns the Flink program (program code) into a dataflow graph via the optimizer / graph builder, submits the job (sends the dataflow), cancels or updates the job, and receives status updates, statistics and results
- JobManager (Master / YARN Application Master): holds the dataflow graph; the scheduler deploys, stops and cancels tasks; the checkpoint coordinator triggers checkpoints
- TaskManagers (Workers): task slots running tasks, a memory & I/O manager and a network manager; they exchange data streams with each other and send task status, heartbeats and statistics to the JobManager
- Client, JobManager and TaskManagers each contain an actor system
