Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco


1 Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco

2 Introduction
Harsh realities of network analytics
netbeam
Demo
Technology Stack
Alternative Approaches
Lessons Learned

3 Architecture

4 The Harsh Realities of Network Analytics
1. It's a mess: your data isn't neat and tidy.
2. Things change: what you need today may not be what you need tomorrow.
3. There's always more: more devices and more telemetry.
4. It's never really done: time and money are limited.

5 Coping strategies
1. It's a mess: design knowing things won't be tidy.
2. Things change: keep raw data to keep your options open.
3. There's always more: rely on the cloud for scaling.
4. It's never really done: what, not how.

6 netbeam: Network Analytics in Google Cloud
Three pillars:
1. Real time analytics: low latency, incomplete
2. Offline analytics: high latency, complete
3. Flexible data model: changing needs? Recompute from raw data!
Secret sauce: Apache Beam

7 What is Apache Beam?
1. The Beam Programming Model
2. SDKs for writing Beam pipelines
3. Runners for existing distributed processing backends: Apache Apex, Apache Flink, Apache Spark, Google Cloud Dataflow, and a local runner for testing
Slide courtesy of the Apache Beam Project

8 The Evolution of Apache Beam
Diagram labels: Colossus, BigTable, PubSub, Dremel, Google Cloud Dataflow, Spanner, Megastore, Millwheel, Flume, Apache Beam, MapReduce.
Slide courtesy of the Apache Beam Project

9 Architecture Diagram
Diagram labels: SNMP collection system; Old SNMP system; Apache Beam (Stream Processing); avro; (realtime); Apache Beam (Batch Processing); (immutable); (historical); API; Client.

10 Architecture Diagram: Google Pubsub
Python code running outside of Google Cloud polls devices and writes to a Pubsub topic.
Code within Google Cloud subscribes to the topic to process the data.
Diagram labels: SNMP collection system; Old SNMP system; Apache Beam (Stream); avro; Align/rates; (realtime); Rollups 5m, 1h, 1d avg; (immutable); (historical); Percentiles; API; Client.

11 Architecture Diagram: Apache Beam / Google Dataflow, stream processing
Subscribes to the Pubsub topic.

12 Architecture Diagram: Apache Beam / Google Dataflow, stream processing
Subscribes to the Pubsub topic.
Raw data is written to

13 Architecture Diagram: Apache Beam / Google Dataflow, stream processing
Subscribes to the Pubsub topic.
Raw data is written to
Real time transformed data (e.g. aligned data rates) written to
Writes and makes use of metadata in BigTable (not shown).

14 Architecture Diagram: Cloud
Like HBase: write to cells in rows, indexed by keys.
We write 1 day of data to a single row (columns are the time of day; the key is metric and day).
Fast access to a row by key, so data can be served from here.
We store one year of data here.
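
The one-row-per-day layout described on this slide can be sketched in plain Python. This is only an illustration: the key format (`metric::YYYY-MM-DD`), the use of seconds since UTC midnight as the column name, and the helper names `row_key`, `column_name` and `place` are all assumptions, and a nested dict stands in for the real table rather than any storage client library.

```python
from datetime import datetime, timezone


def row_key(metric, ts):
    """Hypothetical row key: metric name plus the UTC day of the sample."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return "{}::{}".format(metric, day)


def column_name(ts):
    """Hypothetical column name: seconds since UTC midnight, zero padded."""
    t = ts.astimezone(timezone.utc)
    return "{:05d}".format(t.hour * 3600 + t.minute * 60 + t.second)


def place(metric, ts, value, table):
    """Store one sample in a dict-of-dicts that stands in for the table."""
    table.setdefault(row_key(metric, ts), {})[column_name(ts)] = value
    return table
```

With this shape, fetching one day of one metric is a single lookup by row key, which is the access pattern the slide relies on for serving data.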

15 Architecture Diagram: data warehousing solution
Cheap storage and SQL access, but not suitable for real-time access.
Allows SQL queries for ad hoc investigation.
We store our source of truth here.

16 Architecture Diagram: data warehousing solution
Cheap storage and SQL access, but not suitable for real-time access.
Allows SQL queries for ad hoc investigation.
We store our source of truth here.
We also store historical data (7 years), imported via avro files.

17 Architecture Diagram: Apache Beam / Google Dataflow, batch processing
Run with a cron job.

18 Architecture Diagram: Apache Beam / Google Dataflow, batch processing
Run with a cron job.
Recalculate data each night from source of truth in

19 Architecture Diagram: Apache Beam / Google Dataflow, batch processing
Run with a cron job.
Recalculate data each night from source of truth in
Process rows into new rows of 5 min, 1 hr and 1 day aggregations.
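
The 5 min, 1 hr and 1 day aggregations can be illustrated with a small windowed average in plain Python. This is a sketch under stated assumptions, not the pipeline's code: timestamps are taken to be integer epoch seconds, and `rollup_avg` and `WINDOWS` are hypothetical names.

```python
from collections import defaultdict

# Window sizes named on the slide, in seconds.
WINDOWS = {"5m": 300, "1h": 3600, "1d": 86400}


def rollup_avg(points, window_seconds):
    """Average (timestamp, value) points into fixed windows.

    Returns a sorted list of (window_start, mean) tuples, where window_start
    is the timestamp floored to a multiple of window_seconds.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_seconds].append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]
```

Because each rollup is computed from the raw points rather than from another rollup, changing a window size only requires rerunning the job over the stored raw data.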

20 Architecture Diagram: Apache Beam / Google Dataflow, batch processing
Run with a cron job.
Recalculate data each night from source of truth in
Process rows into new rows of 5 min, 1 hr and 1 day aggregations.
Additional pre-computed views, e.g. percentiles for traffic distribution over a month.

21 Architecture Diagram: API
Currently runs on App Engine Node.js.
Serves data out of
Timeseries data is served as tiles; each tile is one row.
Would like to use Cloud Endpoints and provide a gRPC service.
Looking forward to a grpc-web solution.
Diagram label added on this slide: Dataserver API (node.js).

22 Use case example: Historical Trends

23 Use case example: Historical Trends
Diagram labels: SNMP collection system; Stream to BQ; Old SNMP system; avro; (historical); Per-day interface totals; Per-month totals rows; Dataserver API (node.js); Client.
Row snmp-daily:: ::$interface: columns Jan 1, Jan 2, Dec; values Pb, 1.9 Pb, 3.1 Pb.
Row snmp-monthly-totals: columns Jan 1991, Feb 1991, Sep; values Gb, 29 Gb, 56 Pb.

24 Use case: real time anomaly detection
Baseline generation: generates an average for each interface over the past 3 months for that hour/day.
Anomaly detection: compares the baseline to real time values to generate the current deviation from normal.
Diagram labels: SNMP collection system; Stream to BQ; Dataserver API (node.js); Client.
Row baseline::5m::avg::$interface: columns Mon 12am, Mon 1am, Mon 2am, Sun 11pm.
Row anomaly::5m::avg: columns iface-1, iface-2, iface-n.
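
A minimal sketch of the baseline and deviation steps, in plain Python. Several things here are assumptions rather than details from the slides: the baseline is keyed by a (weekday, hour) slot in UTC, matching the "Mon 12am ... Sun 11pm" columns; deviation is taken to be a simple difference from the baseline mean; and `slot`, `build_baseline` and `deviation` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone


def slot(ts):
    """Map epoch seconds to a (weekday, hour) slot; (0, 1) is Monday 1am UTC."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return (t.weekday(), t.hour)


def build_baseline(history):
    """history: iterable of (epoch_seconds, value). Returns {slot: mean}."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in history:
        acc = sums[slot(ts)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def deviation(baseline, ts, value):
    """Current value minus the baseline for its slot, or None without history."""
    expected = baseline.get(slot(ts))
    return None if expected is None else value - expected
```

In this sketch one baseline dict would be built per interface from roughly three months of history, then consulted for each incoming real time value.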

25 Use case example: Percentiles

26 Use case example: Percentiles
Diagram labels: SNMP collection system; Stream to; Daily rollups 5m avg; Percentiles rows; Dataserver API (node.js); Client.
Row rollup-month-5m:: ::$interface::in: values Gbps, 5Gbps, 2Gbps.
Row percentiles:: ::$interface::in: columns 1 pct, 2 pct, 99 pct; values 0.1 Gbps, 0.3 Gbps, 22.1 Gbps.
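
Turning a month of 5 minute averages into a row of percentiles might look like the following plain Python sketch. The nearest-rank definition of a percentile is an assumption (the slides do not say which definition is used), and `percentile` and `percentile_row` are hypothetical names.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, for integer 0 < pct <= 100."""
    if not sorted_values:
        raise ValueError("no values")
    # Integer ceiling of pct * n / 100, which avoids floating point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def percentile_row(values, pcts=range(1, 100)):
    """Build one pre-computed row: {percentile: value} for 1 pct to 99 pct."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in pcts}
```

Pre-computing the whole row once per month means a client asking for a traffic distribution reads a single row instead of scanning every 5 minute value.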

27 Demo

28 Example: Computing Total Traffic

    # Python Beam SDK
    import apache_beam as beam
    from apache_beam.io import ReadFromText

    # FormatCSVDoFn, group_by_device_interface and compute_rate are helpers
    # from the full code; they are not shown on this slide.
    pipeline = beam.Pipeline('DirectRunner')

    (pipeline
     | 'read' >> ReadFromText('./example.csv')
     | 'csv' >> beam.ParDo(FormatCSVDoFn())
     | 'ifname key' >> beam.Map(group_by_device_interface)
     | 'group by iface' >> beam.GroupByKey()
     | 'compute rate' >> beam.FlatMap(compute_rate)
     | 'timestamp key' >> beam.Map(lambda row: (row['timestamp'], row['ratein']))
     | 'group by timestamp' >> beam.GroupByKey()
     | 'sum by timestamp' >> beam.Map(lambda rates: (rates[0], sum(rates[1])))
     | 'format' >> beam.Map(lambda row: '{},{}'.format(row[0], row[1]))
     | 'save' >> beam.io.WriteToText('./total_by_timestamp'))

    pipeline.run()

Full code available at:
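
The slide's pipeline calls `group_by_device_interface` and `compute_rate` without showing them. The sketch below is a guess at what such helpers could look like, not the authors' code: the input field names `device`, `interface` and `in`, and the choice to skip samples where the counter went backwards, are assumptions. Only `timestamp` and `ratein` are taken from the slide. It uses plain Python so it can be tried without Beam installed.

```python
def group_by_device_interface(row):
    """Key a parsed row by (device, interface) so grouping yields one series."""
    return ((row['device'], row['interface']), row)


def compute_rate(keyed_rows):
    """Turn consecutive counter samples for one interface into per-second rates.

    keyed_rows is a (key, rows) pair as produced by a group-by-key step.
    Samples where the counter went backwards (reset or wrap) are skipped
    rather than guessed at.
    """
    (device, interface), rows = keyed_rows
    ordered = sorted(rows, key=lambda r: r['timestamp'])
    for prev, cur in zip(ordered, ordered[1:]):
        dt = cur['timestamp'] - prev['timestamp']
        delta = cur['in'] - prev['in']
        if dt <= 0 or delta < 0:
            continue
        yield {'device': device, 'interface': interface,
               'timestamp': cur['timestamp'], 'ratein': delta / dt}
```

Because `compute_rate` is a generator, it fits a flat-map step: each grouped interface can emit zero or more rate records.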

29 Our Stack
Apache Beam using Scio
Google Cloud Platform: Dataflow, Pub/Sub, App Engine
Languages: Scala, Javascript / Typescript, Python
Logos: Cloud Dataflow, Cloud Pub/Sub, Cloud App Engine, Cloud Endpoints

30 Current Status & Future Plans
Current: alpha version for SNMP data
Ingest to is working
Migration of historical data is implemented. Awaiting final details before full conversion
Streaming ingest to still in process
Early version of utilization visualization
Simple data server can provide data to clients, but grpc API coming
Interface timeseries charts functional
Future:
More types of data: flow data, perfsonar
Machine Learning
Anomaly Detection
Mash up various data sources

31 Why not InfluxDB, Elastic or ${FAVORITE_DB}
We have a data processing problem, not a data storage problem per se.
Beam and the ecosystem around it give a huge amount of flexibility -- we can try new ideas as they occur to us.
Ability to move to different platform components: machine learning (TensorFlow and others).
InfluxDB & Elastic require care and feeding -- we have to think about disks and machines, etc.
At our last evaluation (a while ago now) InfluxDB wasn't able to keep up with our load -- this may have changed but other benefits outweigh that.
Elastic doesn't seem to be a good fit for long term storage -- everything is in the hot tier.

32 Why the cloud? Why Google Cloud Platform?
Why the cloud?
Focus on our problems, not on infrastructure
Scalability without needing to own lots of systems
Managed services for databases and compute
Why Google Cloud?
Apache Beam was Google Dataflow when we first encountered it
More cohesive ecosystem than AWS in our experience

33 Lessons learned / Life in the cloud / Good & Bad
This approach is not a silver bullet, but it definitely makes many things easier.
Scaling is pretty sweet: we processed 4,005,271,066 points in 13 hours.
GCP tech support could be better.
Despite early indications, Python streaming support in Beam has been slow to appear. Python is a second class citizen.
Fortunately Scio and Scala allow working with the Java SDK at a high level of abstraction.
Scala is powerful but challenging at times.
Focus on developing your services, not on setting up machines to run them.
Nice options for decomposing services (Endpoints/esp, load balancing, etc).
Service oriented.
Battle tested software stacks.

34 Thank you!
Peter Murphy, Jon Dugan
MyESnet:
ESnet Open Source:
Scio:
Beam:
