Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco
1 Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco
2 Introduction; Harsh realities of network analytics; netbeam; Demo; Technology Stack; Alternative Approaches; Lessons Learned
3 Architecture
4 The Harsh Realities of Network Analytics. 1. It's a mess: your data isn't neat and tidy. 2. Things change: what you need today may not be what you need tomorrow. 3. There's always more: more devices and more telemetry. 4. It's never really done: time and money are limited.
5 Coping strategies. 1. It's a mess: design knowing things won't be tidy. 2. Things change: keep raw data to keep your options open. 3. There's always more: rely on the cloud for scaling. 4. It's never really done: What, not How.
6 netbeam: Network Analytics in Google Cloud. Three Pillars: 1. Real time analytics (low latency, incomplete). 2. Offline analytics (high latency, complete). 3. Flexible data model (changing needs? Recompute from raw data!). Secret sauce: Apache Beam.
7 What is Apache Beam? 1. The Beam Programming Model. 2. SDKs for writing Beam pipelines. 3. Runners for existing distributed processing backends: Apache Apex, Apache Flink, Apache Spark, Google Cloud Dataflow, and a local runner for testing. Slide courtesy of the Apache Beam Project.
8 The Evolution of Apache Beam. Diagram labels: Colossus, BigTable, PubSub, Dremel, Google Cloud Dataflow, Spanner, Megastore, Millwheel, Flume, Apache Beam, MapReduce. Slide courtesy of the Apache Beam Project.
9 Architecture Diagram. Diagram labels: SNMP collection system; old SNMP system (avro); Apache Beam (stream processing); Apache Beam (batch processing); (realtime); (immutable); (historical); API; Client.
10 Architecture Diagram: Google Pubsub. Python code outside of Google Cloud polls devices and writes to a Pubsub topic. Code within Google Cloud subscribes to the topic to process the data. Further diagram labels: Align/rates; Rollups (5m, 1h, 1d avg); Percentiles.
11 Architecture Diagram: Apache Beam / Google Dataflow, stream processing. Subscribes to the Pubsub topic.
12 Architecture Diagram: Apache Beam / Google Dataflow, stream processing. Subscribes to the Pubsub topic. Raw data is written to [BigQuery].
13 Architecture Diagram: Apache Beam / Google Dataflow, stream processing. Subscribes to the Pubsub topic. Raw data is written to [BigQuery]. Real time transformed data (e.g. aligned data rates) is written to [Bigtable]. Writes and makes use of metadata in BigTable (not shown).
14 Architecture Diagram: Cloud [Bigtable]. Like HBase: write to cells in rows, indexed by keys. We write 1 day of data to a single row (the columns are the time of day; the key is the metric and day). Fast access to a row by key, so we can serve data from here. Stores one year of data.
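The one-row-per-metric-per-day layout described on this slide can be sketched in plain Python. This is only an illustration of the addressing idea: the `::` separator, the date and time formats, and the function names are assumptions, not the key format netbeam actually uses, and no Bigtable client is involved.

```python
from datetime import datetime, timezone


def row_key(metric: str, ts: datetime) -> str:
    """Build a row key from the metric name and the UTC day (hypothetical format)."""
    return f"{metric}::{ts.strftime('%Y-%m-%d')}"


def column_qualifier(ts: datetime) -> str:
    """Name the column after the time of day within that row (hypothetical format)."""
    return ts.strftime("%H:%M:%S")


def cell_address(metric: str, epoch_seconds: int) -> tuple:
    """Map one measurement to the (row key, column) pair it would be written to."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return row_key(metric, ts), column_qualifier(ts)
```

With this layout, fetching a day of data for one metric is a single lookup by row key, which is what makes serving directly from the store practical.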
15 Architecture Diagram: [BigQuery]. Data warehousing solution: cheap storage and SQL access, but not suitable for real-time access. Allows SQL queries for ad hoc investigation. We store our source of truth here.
16 Architecture Diagram: [BigQuery]. Data warehousing solution: cheap storage and SQL access, but not suitable for real-time access. Allows SQL queries for ad hoc investigation. We store our source of truth here. We also store historical data (7 years), imported via avro files.
17 Architecture Diagram: Apache Beam / Google Dataflow, batch processing. Run with a cron job.
18 Architecture Diagram: Apache Beam / Google Dataflow, batch processing. Run with a cron job. Recalculates data each night from the source of truth in [BigQuery].
19 Architecture Diagram: Apache Beam / Google Dataflow, batch processing. Run with a cron job. Recalculates data each night from the source of truth in [BigQuery]. Processes rows into new rows of 5 min, 1 hr and 1 day aggregations.
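The 5 min, 1 hr and 1 day aggregations amount to bucketing points by window start and averaging each bucket. A minimal sketch in plain Python, assuming points arrive as `(epoch_seconds, value)` pairs; the function name `rollup_avg` is hypothetical, and the real pipeline does this with Beam rather than in-memory dictionaries.

```python
from collections import defaultdict


def rollup_avg(points, window_seconds):
    """Average (epoch_seconds, value) points per fixed window.

    Returns a dict mapping each window's start time to the mean of the
    values that fall inside that window.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor the timestamp to the start of its window.
        buckets[ts - ts % window_seconds].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

Calling it with 300, 3600 and 86400 as `window_seconds` gives the three rollup levels from the same raw points, which is why recomputing from the source of truth each night is straightforward.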
20 Architecture Diagram: Apache Beam / Google Dataflow, batch processing. Run with a cron job. Recalculates data each night from the source of truth in [BigQuery]. Processes rows into new rows of 5 min, 1 hr and 1 day aggregations. Additional pre-computed views, e.g. percentiles for traffic distribution over a month.
21 Architecture Diagram: Dataserver API (node.js). Currently runs on App Engine with Node.js. Serves data out of [Bigtable]. Timeseries data is served as tiles; each tile is one row. We would like to use Cloud Endpoints and provide a gRPC service. Looking forward to a grpc-web solution.
22 Use case example: Historical Trends
23 Use case example: Historical Trends. Diagram labels: SNMP collection system; old SNMP system (avro, historical); Stream to BQ; Per-day interface totals; Per-month totals rows; Dataserver API (node.js); Client. Rows: snmp-daily:: ::$interface, with one column per day (Jan 1, Jan 2, ...) holding totals such as 1.9 Pb and 3.1 Pb; snmp-monthly-totals, with one column per month (Jan 1991, Feb 1991, ...) holding totals such as 29 Gb and 56 Pb.
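Going from per-day interface totals to per-month totals is a simple grouped sum. The sketch below assumes daily totals are keyed by an ISO date string and expressed in bytes; both choices, and the function name `monthly_totals`, are assumptions made for illustration.

```python
from collections import defaultdict


def monthly_totals(daily_totals):
    """Sum per-day totals into per-month totals.

    daily_totals maps 'YYYY-MM-DD' to a byte count; the result maps
    'YYYY-MM' to the sum over all days in that month.
    """
    totals = defaultdict(int)
    for day, nbytes in daily_totals.items():
        totals[day[:7]] += nbytes  # 'YYYY-MM-DD'[:7] is 'YYYY-MM'
    return dict(totals)
```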
24 Use case: real time anomaly detection. Baseline generation: generates the avg for each interface over the past 3 months for that hour/day. Anomaly detection: compares the baseline to real time values to generate the current deviation from normal. Rows: baseline::5m::avg::$interface (columns Mon 12am, Mon 1am, Mon 2am, ... Sun 11pm); anomaly::5m::avg (columns iface-1, iface-2, ... iface-n). Diagram labels: SNMP collection system; Stream to BQ; Dataserver API (node.js); Client.
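The baseline-and-compare idea can be sketched as follows. The slide does not say how deviation is measured, so the relative difference used here is an assumption, as are the function names and the choice of UTC for the hour-of-week slot.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_of_week(epoch_seconds):
    """Return the (weekday, hour) slot in UTC; Monday is weekday 0."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return ts.weekday(), ts.hour


def build_baseline(history):
    """Average historical (epoch_seconds, rate) samples per hour-of-week slot."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, rate in history:
        slot = sums[hour_of_week(ts)]
        slot[0] += rate
        slot[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def deviation(baseline, epoch_seconds, rate):
    """Relative deviation of a live rate from the baseline for its slot.

    Returns None when the slot has no baseline or the baseline is zero.
    """
    expected = baseline.get(hour_of_week(epoch_seconds))
    if not expected:
        return None
    return (rate - expected) / expected
```

In practice the history passed to `build_baseline` would cover the past 3 months for one interface, giving up to 168 slots that mirror the Mon 12am to Sun 11pm columns.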
25 Use case example: Percentiles
26 Use case example: Percentiles. Diagram labels: SNMP collection system; Stream to [BQ]; Daily rollups 5m avg; Percentiles rows; Dataserver API (node.js); Client. Rows: rollup-month-5m:: ::$interface::in, holding 5-minute averages such as 5Gbps and 2Gbps; percentiles:: ::$interface::in, with columns 1 pct, 2 pct, ... 99 pct holding values such as 0.1 Gbps, 0.3 Gbps and 22.1Gbps.
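Turning a month of 5-minute averages into a row of percentiles could look like the sketch below. The slide does not name a percentile definition, so the nearest-rank method is an assumption, and `percentile` and `percentile_row` are hypothetical names.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, for integer pct in 1..100."""
    if not sorted_values:
        raise ValueError("no values to compute a percentile from")
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def percentile_row(values, pcts=range(1, 100)):
    """Build a {percentile: value} mapping, one entry per 'N pct' column."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in pcts}
```

Because the result is precomputed per interface and direction, a client can read the whole traffic distribution for a month with a single row fetch.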
27 Demo
28 Example: Computing Total Traffic

    # Python Beam SDK
    import apache_beam as beam
    from apache_beam.io import ReadFromText

    # formatcsvdofn, group_by_device_interface and compute_rate
    # are defined in the full code.
    pipeline = beam.Pipeline('DirectRunner')
    (pipeline
     | 'read' >> ReadFromText('./example.csv')
     | 'csv' >> beam.ParDo(formatcsvdofn())
     | 'ifname key' >> beam.Map(group_by_device_interface)
     | 'group by iface' >> beam.GroupByKey()
     | 'compute rate' >> beam.FlatMap(compute_rate)
     | 'timestamp key' >> beam.Map(lambda row: (row['timestamp'], row['ratein']))
     | 'group by timestamp' >> beam.GroupByKey()
     | 'sum by timestamp' >> beam.Map(lambda rates: (rates[0], sum(rates[1])))
     | 'format' >> beam.Map(lambda row: '{},{}'.format(row[0], row[1]))
     | 'save' >> beam.io.WriteToText('./total_by_timestamp'))
    pipeline.run()

Full code available at:
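The slide names two helpers, `group_by_device_interface` and `compute_rate`, without showing their bodies. The following is a guess at what such helpers might do, written in plain Python so it runs without Beam. The input field names `device`, `ifname` and `in`, and the decision to skip counter resets, are assumptions; only `timestamp` and `ratein` appear on the slide.

```python
def group_by_device_interface(row):
    """Key a parsed row by (device, interface) so GroupByKey can collect it."""
    return ((row["device"], row["ifname"]), row)


def compute_rate(keyed_rows):
    """Turn consecutive counter samples for one interface into rates.

    keyed_rows is a (key, rows) pair as produced by a group-by-key step.
    Yields one dict per usable pair of consecutive samples.
    """
    key, rows = keyed_rows
    ordered = sorted(rows, key=lambda r: r["timestamp"])
    for prev, cur in zip(ordered, ordered[1:]):
        elapsed = cur["timestamp"] - prev["timestamp"]
        delta = cur["in"] - prev["in"]
        if elapsed <= 0 or delta < 0:
            continue  # skip duplicate timestamps and counter resets or wraps
        yield {
            "device": key[0],
            "ifname": key[1],
            "timestamp": cur["timestamp"],
            "ratein": delta / elapsed,
        }
```

Because `compute_rate` is a generator that may yield zero or more results per group, it fits a flat-map step rather than a one-to-one map.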
29 Our Stack. Apache Beam using Scio. Google Cloud Platform: Dataflow, Pub/Sub, App Engine. Languages: Scala, JavaScript / TypeScript, Python. Logos: Cloud Dataflow, Cloud Pub/Sub, Cloud App Engine, Cloud Endpoints.
30 Current Status & Future Plans. Current: alpha version for SNMP data. Ingest to [...] is working. Migration of historical data is implemented; awaiting final details before full conversion. Streaming ingest to [...] is still in process. Early version of utilization visualization. Simple data server can provide data to clients, but a gRPC API is coming. Interface timeseries charts are functional. Future: more types of data (flow data, perfSONAR); machine learning; anomaly detection; mash up various data sources.
31 Why not InfluxDB, Elastic or ${FAVORITE_DB}? We have a data processing problem, not a data storage problem per se. Beam and the ecosystem around it give a huge amount of flexibility: we can try new ideas as they occur to us. Ability to move to different platform components: machine learning (TensorFlow and others). InfluxDB and Elastic require care and feeding: we have to think about disks, machines, etc. At our last evaluation (a while ago now) InfluxDB wasn't able to keep up with our load; this may have changed, but other benefits outweigh that. Elastic doesn't seem to be a good fit for long term storage: everything is in the hot tier.
32 Why the cloud? Why Google Cloud Platform? Why the cloud? Focus on our problems, not on infrastructure. Scalability without needing to own lots of systems. Managed services for databases and compute. Why Google Cloud? Apache Beam was Google Dataflow when we first encountered it. More cohesive ecosystem than AWS, in our experience.
33 Lessons learned / Life in the cloud / Good & Bad. This approach is not a silver bullet, but it definitely makes many things easier. Scaling is pretty sweet: we processed 4,005,271,066 points in 13 hours. GCP tech support could be better. Despite early indications, Python streaming support in Beam has been slow to appear; Python is a second class citizen. Fortunately, Scio and Scala allow working with the Java SDK at a high level of abstraction. Scala is powerful but challenging at times. Focus on developing your services, not on setting up machines to run them. Nice options for decomposing services (Endpoints/ESP, load balancing, etc.). Service oriented. Battle tested software stacks.
34 Thank you! Peter Murphy Jon Dugan MyESnet: ESnet Open Source: Scio: Beam: 34
Simplifying ML Workflows with Apache Beam & TensorFlow Extended Tyler Akidau @takidau Software Engineer at Google Apache Beam PMC Apache Beam Portable data-processing pipelines Example pipelines Python
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationMigrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring
Migrating massive monitoring to Bigtable without downtime Martin Parm, Infrastructure Engineer for Monitoring This is a big deal. -- Nicholas Harteau/VP, Engineering & Infrastructure https://news.spotify.com/dk/2016/02/23/announcing-spotify-infrastructures-googley-future/
More informationShen PingCAP 2017
Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationSpark Streaming. Guido Salvaneschi
Spark Streaming Guido Salvaneschi 1 Spark Streaming Framework for large scale stream processing Scales to 100s of nodes Can achieve second scale latencies Integrates with Spark s batch and interactive
More informationSparkStreaming. Large scale near- realtime stream processing. Tathagata Das (TD) UC Berkeley UC BERKELEY
SparkStreaming Large scale near- realtime stream processing Tathagata Das (TD) UC Berkeley UC BERKELEY Motivation Many important applications must process large data streams at second- scale latencies
More informationPrincipal Software Engineer Red Hat Emerging Technology June 24, 2015
USING APACHE SPARK FOR ANALYTICS IN THE CLOUD William C. Benton Principal Software Engineer Red Hat Emerging Technology June 24, 2015 ABOUT ME Distributed systems and data science in Red Hat's Emerging
More informationStreaming Data: The Opportunity & How to Work With It
Streaming Data: The Opportunity & How to Work With It Roger Barga, GM Amazon Kinesis April 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interest in and demand for stream
More informationExadata. Presented by: Kerry Osborne. February 23, 2012
Exadata Presented by: Kerry Osborne February 23, 2012 whoami Worked with Oracle Since 1982 (V2) Working with Exadata since early 2010 Work for Enkitec (www.enkitec.com) (Enkitec owns a Half Rack V2/X2)
More informationTurning Relational Database Tables into Spark Data Sources
Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016
Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation
More informationEvolution of an Apache Spark Architecture for Processing Game Data
Evolution of an Apache Spark Architecture for Processing Game Data Nick Afshartous WB Analytics Platform May 17 th 2017 May 17 th, 2017 About Me nafshartous@wbgames.com WB Analytics Core Platform Lead
More informationPNDA.io: when BGP meets Big-Data
PNDA.io: when BGP meets Big-Data Let s go back in time 26 th April 2017 The Internet is very much alive Millions of BGP events occurring every day 15 Routers Monitored 410 active peers (both IPv4 and IPv6)
More informationAsanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
More informationApache Drill. Interactive Analysis of Large-Scale Datasets. Tomer Shiran
Apache Drill Interactive Analysis of Large-Scale Datasets Tomer Shiran Latency Matters Ad-hoc analysis with interactive tools Real-time dashboards Event/trend detection Network intrusions Fraud Failures
More information