PREDICTIVE DATACENTER ANALYTICS WITH STRYMON

Size: px

Start display at page:

Download "PREDICTIVE DATACENTER ANALYTICS WITH STRYMON"

Deborah McLaughlin
6 years ago
Views:

1 PREDICTIVE DATACENTER ANALYTICS WITH STRYMON Vasia Kalavri QCon San Francisco 14 November 2017 Support:

2 ABOUT ME Postdoc at ETH Zürich Systems Group: PMC member of Apache Flink Research Large-scale graph processing Streaming dataflow engines Current project Predictive datacenter analytics and management 2

3 DATACENTER MANAGEMENT

4 4

5 5

6 DATACENTER MANAGEMENT Configuration updates Resource scaling Network failures Service deployment Workload fluctuations Software updates 6

7 DATACENTER MANAGEMENT Configuration updates Resource scaling Network failures Service deployment Workload fluctuations Software updates Can we predict the effect of changes? 6

8 DATACENTER MANAGEMENT Configuration updates Resource scaling Network failures Service deployment Workload fluctuations Software updates Can we predict the effect of changes? Can we prevent catastrophic faults? 6

9 What-if Analysis: Predicting outcomes under hypothetical conditions 7

10 What-if Analysis: Predicting outcomes under hypothetical conditions How will response time change if we migrate a large service? What will happen to link utilization if we change the routing protocol costs? Will SLOs will be violated if a certain switch fails? Which services will be affected if we change load balancing strategy? 7

11 Test deployment? a physical small-scale cluster to try out configuration changes and what-ifs Data Center Test Cluster expensive to operate and maintain some errors only occur in a large scale! 8

12 Analytical model? infer workload distribution from samples and analytically model system components (e.g. disk failure rate) Data Center hard to develop, large design space to explore often inaccurate 9

13 MODERN ENTERPRISE DATACENTERS ARE ALREADY HEAVILY INSTRUMENTED 10

14 Trace-driven online simulation Use existing instrumentation to build a datacenter model construct DC state from real events simulate the state we cannot directly observe current DC state Data Center forked forked what-if forked what-if what-if DC DC state DC state state 11

updates, Datacenter state policy enforcement, what-if

15 STRYMON: ONLINE DATACENTER ANALYTICS AND MANAGEMENT event streams Strymon Datacenter traces, configuration, topology updates, Datacenter state policy enforcement, what-if scenarios, queries, complex analytics, simulations, strymon.systems.ethz.ch 12

16 feedback TIMELY DATAFLOW ingress egress

17 STRYMON S OPERATIONAL REQUIREMENTS Low latency: react quickly to network failures High throughput: keep up with high stream rates Iterative computation: complex graph analytics on the network topology Incremental computation: reuse already computed results when possible e.g. do not recompute forwarding rules after a link update 14

TIMELY DATAFLOW: STREAM PROCESSING IN RUST Data-parallel computations Arbitrary cyclic dataflows Logical timestamps (epochs) Asynchronous execution Low latency

18 TIMELY DATAFLOW: STREAM PROCESSING IN RUST Data-parallel computations Arbitrary cyclic dataflows Logical timestamps (epochs) Asynchronous execution Low latency D. Murray, F. McSherry, M. Isard, R. Isaacs, P. Barham, M. Abadi. Naiad: A Timely Dataflow System. In SOSP,

19 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace().map(move word (word.to_owned(), 1)).collect::<Vec<_>>() ).aggregate( _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ).inspect( data println!("seen {:?}", data)).probe_with(&mut probe); }); //feed data }).unwrap(); } 16

20 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace().map(move word (word.to_owned(), 1)).collect::<Vec<_>>() ).aggregate( _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ).inspect( data println!("seen {:?}", data)).probe_with(&mut probe); }); //feed data }).unwrap(); } initialize and run a timely job 16

21 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { } let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); create input and progress handles worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace().map(move word (word.to_owned(), 1)).collect::<Vec<_>>() ).aggregate( _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ).inspect( data println!("seen {:?}", data)).probe_with(&mut probe); }); //feed data }).unwrap(); 16

22 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace().map(move word (word.to_owned(), 1)).collect::<Vec<_>>() } }); //feed data }).unwrap(); ).aggregate( define the dataflow and its operators _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ).inspect( data println!("seen {:?}", data)).probe_with(&mut probe); 16

23 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace().map(move word (word.to_owned(), 1)).collect::<Vec<_>>() ).aggregate( _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ) } }); //feed data }).unwrap(); watch for progress.inspect( data println!("seen {:?}", data)).probe_with(&mut probe); 16

24 WORDCOUNT IN TIMELY fn main() { timely::execute_from_args(std::env::args(), worker { let mut input = InputHandle::new(); let mut probe = ProbeHandle::new(); let index = worker.index(); worker.dataflow( scope { input.to_stream(scope).flat_map( text: String text.split_whitespace() } }); //feed data }).unwrap();.map(move word (word.to_owned(), 1)).collect::<Vec<_>>() ).aggregate( _key, val, agg { *agg += val; }, key, agg: i64 (key, agg), key hash_str(key) ).inspect( data println!("seen {:?}", data)).probe_with(&mut probe); a few Rust peculiarities to get used to :-) 16

25 PROGRESS TRACKING A distributed protocol that allows operators reason about the possibility of receiving data All tuples bear a logical timestamp (think event time) To send a timestamped tuple, an operator must hold a capability for it Workers broadcast progress changes to other workers Each worker independently determines progress 17

26 MAKING PROGRESS fn main() { timely::execute_from_args(std::env::args(), worker { for round in { input.send(("round".to_owned(), 1)); input.advance_to(round + 1); while probe.less_than(input.time()) { worker.step(); } } }).unwrap(); } 18

27 MAKING PROGRESS fn main() { timely::execute_from_args(std::env::args(), worker { } push data to the input stream for round in { input.send(("round".to_owned(), 1)); input.advance_to(round + 1); while probe.less_than(input.time()) { worker.step(); } } }).unwrap(); 18

28 MAKING PROGRESS fn main() { timely::execute_from_args(std::env::args(), worker { for round in { input.send(("round".to_owned(), 1)); } } }).unwrap(); advance the input epoch input.advance_to(round + 1); while probe.less_than(input.time()) { worker.step(); } 18

29 MAKING PROGRESS fn main() { timely::execute_from_args(std::env::args(), worker { for round in { input.send(("round".to_owned(), 1)); input.advance_to(round + 1); while probe.less_than(input.time()) { worker.step(); } } } }).unwrap(); do work while there s still data for this epoch 18

30 TIMELY ITERATIONS timely::example( scope { let (handle, stream) = scope.loop_variable(100, 1); (0..10).to_stream(scope).concat(&stream).inspect( x println!("seen: {:?}", x)).connect_loop(handle); }); 19

31 TIMELY ITERATIONS timely::example( scope { let (handle, stream) = scope.loop_variable(100, 1); (0..10).to_stream(scope).concat(&stream) }); loop 100 times at most.inspect( x println!("seen: {:?}", x)).connect_loop(handle); create the feedback loop advance timestamps by 1 in each iteration 19

32 TIMELY ITERATIONS timely::example( scope { let (handle, stream) = scope.loop_variable(100, 1); (0..10).to_stream(scope).concat(&stream) }); (t, l1) loop 100 times at most.inspect( x println!("seen: {:?}", x)).connect_loop(handle); create the feedback loop advance timestamps by 1 in each iteration t (t, (l1, l2)) 19

33 TIMELY & I 20

34 TIMELY & I Relationship status: it s complicated 20

35 TIMELY & I Relationship status: it s complicated performance fault-tolerance debugging expressiveness incremental computation APIs & libraries deployment ecosystem performance 20

36 STRYMON USE-CASES

37 Strymon Real-time datacenter analytics Incremental network routing Online critical path analysis What-if analysis Query and analyze state online Control and enforce configuration Understand performance Simulate what-if scenarios 22

38 Strymon Real-time datacenter analytics Incremental network routing Online critical path analysis What-if analysis Query and analyze state online Control and enforce configuration Understand performance Simulate what-if scenarios 23

39 RECONSTRUCTING USER SESSIONS A.1 Application A A.2 A.3 B.1 Application B B.2 B.3 B.4 Time: 2015/09/01 10:03: Session ID: XKSHSKCBA53U088FXGE7LD8 Transaction ID:

40 PERFORMANCE RESULTS Logs from 1263 streams and 42 servers Client Log event Transaction n-2-8 Transaction ID Time 1.3 million events/s at MB/s A ms per epoch vs. 2.1s per epoch with Flink B Inactivity More Results: Zaheer Chothia, John Liagouris, Desislava Dimitrova, and Timothy Roscoe. Online Reconstruction of Structural Information from Datacenter Logs. (EuroSys '17). 25

41 Strymon Real-time datacenter analytics Incremental network routing Online critical path analysis What-if analysis Query and analyze state online Control and enforce configuration Understand performance Simulate what-if scenarios 26

42 ROUTING AS A STREAMING COMPUTATION Compute APSP and keep forwarding rules as operator state Flow requests translate to lookups on that state Network updates cause re-computation of affected rules only iteration Network changes delta-join G R Updated rules Forwarding rules 27

43 REACTION TO LINK FAILURES Fail random link 500 individual runs 32 threads More Results: Desislava C. Dimitrova et. al. Quick Incremental Routing Logic for Dynamic Network Graphs. SIGCOMM Posters and Demos '17 28

44 Strymon Real-time datacenter analytics Incremental network routing Online critical path analysis What-if analysis Query and analyze state online Control and enforce configuration Understand performance Simulate what-if scenarios 29

45 ONLINE CRITICAL PATH ANALYSIS input stream output stream periodic snapshot trace snapshot stream analyzer Apache Flink, Apache Spark, TensorFlow, Heron, Timely Dataflow performance summaries 30

46 TASK SCHEDULING IN APACHE SPARK DRIVER W1 W2 W3 Venkataraman, Shivaram, et al. "Drizzle: Fast and adaptable stream processing at scale." Spark Summit (2016). 31

47 SCHEDULING BOTTLENECK IN APACHE SPARK Strymon Profiling Conventional Profiling 0.8 CP % weight Snapshot Processing Snapshot Scheduling Apache Spark: Yahoo! Streaming Benchmark, 16 workers, 8s snapshots 32

48 SCHEDULING BOTTLENECK IN APACHE SPARK Strymon Profiling Conventional Profiling 0.8 CP % weight Snapshot Processing Snapshot Scheduling Apache Spark: Yahoo! Streaming Benchmark, 16 workers, 8s snapshots 32

49 Strymon Real-time datacenter analytics Incremental network routing Online critical path analysis What-if analysis Query and analyze state online Control and enforce configuration Understand performance Simulate what-if scenarios 33

50 EVALUATING LOAD BALANCING STRATEGIES 13k services ~100K user requests/s OSPF routing Weighted Round-Robin load balancing Traffic matrix simulation under topology and load balancing changes 34

51 WHAT-IF DATAFLOW RPC logs (text) ETL Load Balancing Routing Traffic Matrix Construction Logging Servers Topology changes (graph) LB View Construction Topology Discovery Configuration DBs Device specs, application placement (relational) DC Model construction Routing View Construction 35

52 STRYMON DEMO

53 Strymon has been released! Try it out: Send us feedback: 37

54 THE STRYMON TEAM & FRIENDS Prof. Timothy Roscoe Desislava Dimitrova Moritz Hoffmann Andrea Lattuada John Liagouris Zaheer Chothia Sebastian Wicki Frank McSherry Vasiliki Kalavri 38

55 THE STRYMON TEAM & FRIENDS Prof. Timothy Roscoe Desislava Dimitrova Moritz Hoffmann Andrea Lattuada John Liagouris Zaheer Chothia Sebastian Wicki Frank McSherry?? Vasiliki Kalavri 38

56 THE STRYMON TEAM & FRIENDS Prof. Timothy Roscoe Desislava Dimitrova Moritz Hoffmann Andrea Lattuada IT COULD BE YOU! strymon.systems.ethz.ch John Liagouris Zaheer Chothia Sebastian Wicki Frank McSherry?? Vasiliki Kalavri 38

57 PREDICTIVE DATACENTER ANALYTICS WITH STRYMON Vasia Kalavri QCon San Francisco 14 November 2017 Support:

SnailTrail Generalizing Critical Paths for Online Analysis of Distributed Dataflows

SnailTrail Generalizing Critical Paths for Online Analysis of Distributed Dataflows Moritz Hoffmann, Andrea Lattuada, John Liagouris, Vasiliki Kalavri, Desislava Dimitrova, Sebastian Wicki, Zaheer Chothia,