Processing Data Like Google Using the Dataflow/Beam Model
|
|
- Silas Griffith
- 5 years ago
- Views:
Transcription
1 Todd Reedy Google for Work Sales Engineer Google Processing Data Like Google Using the Dataflow/Beam Model
2 Goals: Write interesting computations Run in both batch & streaming Use custom timestamps Handle late data
3 Weave provides the fastest way to connect smart devices to the network, the cloud, and an ecosystem of (business) apps & smart devices. Weave setup works on Android, ios, and desktop. CRM Want more details? Visit:
4 Brillo is a free OS, tightly integrated with Android, tailored for IoT. Brillo makes it cost-effective to build a secure connected device, with verified boot to ensure it runs only trusted code. Brillo comes with services to help understand how customers interact with it, and keep it updated over time. Have Brillo running on your device itself or use it as a hub to aggregate data from nearby (micro) processors. OTA Updates Weave Metrics Android HAL Kernel Hardware Telemetry
5 Agenda 1 Data Shapes 2 Google s Data Processing Story 3 The Beam Model 4 Demo: Google Cloud Dataflow
6 1 Data Shapes
7 Data...
8 ...can be big...
9 ...really, really big... Thursday Wednesday Tuesday
10 ...maybe even infinitely big... 8:00 1:00 9:00 2:00 10:00 3:00 11:00 4:00 12:00 5:00 13:00 6:00 14:00 7:00
11 with unknown delays. 8:00 8:00 8:00 8:00 9:00 10:00 11:00 12:00 13:00 14:00
12 Data Processing Tradeoffs = 2 $$$ Completeness Latency Cost
13 Requirements: Billing Pipeline Important Not Important Completeness Latency Cost
14 Requirements: Live Cost Estimate Pipeline Important Not Important Completeness Latency Cost
15 Requirements: Abuse Detection Pipeline Important Not Important Completeness Latency Cost
16 Requirements: Abuse Detection Backfill Pipeline Important Not Important Completeness Latency Cost
17 2 Google s Data Processing Story
18 Google s Data-Related Papers Dataflow MapReduce FlumeJava Dremel GFS Big Table Pregel Spanner Colossus MillWheel
19 MapReduce: Batch Processing M M M R R
20 FlumeJava: Easy and Efficient MapReduce Pipelines Higher-level API with simple data processing abstractions. Focus on what you want to do to your data, not what the underlying system supports. A graph of transformations is automatically transformed into an optimized series of MapReduces.
21 Batch Patterns: Creating Structured Data MapReduce
22 Batch Patterns: Repetitive Runs Thursday Wednesday Tuesday MapReduce
23 Batch Patterns: Time Based Windows Tuesday [11:00-12:00) [12:00-13:00) [13:00-14:00) [14:00-15:00) [15:00-16:00) [16:00-17:00) [18:00-19:00) [19:00-20:00) MapReduce [21:00-22:00) [22:00-23:00) [23:00-0:00)
24 Batch Patterns: Sessions Tuesday Wednesday Tuesday Wednesday Jose Lisa MapReduce Ingo Asha Cheryl Ari
25 MillWheel: Streaming Computations Framework for building low-latency data-processing applications User provides a DAG of computations to be performed System manages state and persistent flow of elements
26 Streaming Patterns: Element-wise transformations 8:00 9:00 10:00 11:00 12:00 13:00 14:00 Processing Time
27 Streaming Patterns: Aggregating Time Based Windows 8:00 9:00 10:00 11:00 12:00 13:00 14:00 Processing Time
28 Streaming Patterns: Event Time Based Windows Input Processing Time 10:00 11:00 12:00 13:00 14:00 15:00 Output Event Time 10:00 11:00 12:00 13:00 14:00 15:00
29 Streaming Patterns: Session Windows Input Processing Time 10:00 11:00 12:00 13:00 14:00 15:00 Output Event Time 10:00 11:00 12:00 13:00 14:00 15:00
30 Processing Time Event-Time Skew Skew Event Time
31 Processing Time Event-Time Skew Watermark Skew Watermarks describe event time progress. "No timestamp earlier than the watermark will be seen" Often heuristic-based. Event Time Too Slow? Results are delayed. Too Fast? Some data is late.
32 Streaming or Batch? = 2 $$$ Completeness Latency Cost Why not both?
33 3 The Dataflow Model
34 3 The Apache Beam Model
35 What are you computing? Where in event time? When in processing time? How do refinements relate?
36 What are you computing? A Pipeline represents a graph of data processing transformations PCollections flow through the pipeline Optimized and executed as a unit for efficiency
37 What are you computing? A PCollection<T> is a collection of data of type T Maybe be bounded or unbounded in size Each element has an implicit timestamp Initially created from backing data stores
38 What are you computing? PTransforms transform PCollections into other PCollections. Element-Wise Aggregating Composite What Where When How
39 Example: Computing Integer Sums // Collection of raw log lines PCollection<String> raw =...; // Element-wise transformation into team/score pairs PCollection<KV<String, Integer>> input = raw.apply(pardo.of(new ParseFn())) // Composite transformation containing an aggregation PCollection<KV<String, Integer>> output = input.apply(sum.integersperkey()); What Where When How
40 Example: Computing Integer Sums What Where When How
41 Example: Computing Integer Sums What Where When How
42 Example: Computing Integer Sums What Where When How
43 Where in Event Time? Windowing divides data into eventtime-based finite chunks Key 1 Key 2 Key Key 1 Key 2 Key Key 1 Key 2 Key Required when doing aggregations over unbounded data. Fixed Sliding Sessions What Where When How
44 Example: Fixed 2-minute Windows PCollection<KV<String, Integer>> output = input.apply(window.into(fixedwindows.of(minutes(2)))).apply(sum.integersperkey()); What Where When How
45 Example: Fixed 2-minute Windows What Where When How
46 Processing Time When in Processing Time? Watermark Triggers control when results are emitted. Event Time Triggers are often relative to the watermark. What Where When How
47 Example: Triggering at the Watermark PCollection<KV<String, Integer>> output = input.apply(window.into(fixedwindows.of(minutes(2))).trigger(atwatermark())).apply(sum.integersperkey()); What Where When How
48 Example: Triggering at the Watermark What Where When How
49 Example: Triggering for Speculative & Late Data PCollection<KV<String, Integer>> output = input.apply(window.into(fixedwindows.of(minutes(2))).trigger(atwatermark().withearlyfirings(atperiod(minutes(1))).withlatefirings(atcount(1)))).apply(sum.integersperkey()); What Where When How
50 Example: Triggering for Speculative & Late Data What Where When How
51 How do Refinements Relate? How should multiple outputs per window accumulate? Appropriate choice depends on consumer. Firing Elements Discarding Speculative 3 Watermark 5, 1 Late 2 Total Observ What Where When How
52 How do Refinements Relate? How should multiple outputs per window accumulate? Appropriate choice depends on consumer. Firing Elements Discarding Accumulating Acc. & Retracting Speculative Watermark 5, , -3 Late , -9 Total Observ What Where When How
53 Example: Add Newest, Remove Previous PCollection<KV<String, Integer>> output = input.apply(window.into(sessions.withgapduration(minutes(1))).trigger(atwatermark().withearlyfirings(atperiod(minutes(1))).withlatefirings(atcount(1))).accumulatingandretracting()).apply(new Sum()); What Where When How
54 Example: Add Newest, Remove Previous What Where When How
55 Customizing What Where When How 1. Classic Batch 2. Batch with Fixed Windows 3. Streaming 4. Streaming with 5. Streaming with Speculative + Late Data Retractions What Where When How
56 4 Demo: Google Cloud Dataflow
57 Google Cloud Dataflow A fully-managed cloud service and programming model for batch and streaming big data processing. Cloud Dataflow
58 Open Source SDKs Used to construct a Dataflow pipeline. Java available now. Python in the works. Pipelines can run On your development machine On the Dataflow Service on Google Cloud Platform On third party environments like Spark or Flink.
59 Fully Managed Dataflow Service Runs the pipeline on Google Cloud Platform. Includes: Graph optimization: Modular code, efficient execution Smart Workers: Lifecycle management, Autoscaling, and Smart task rebalancing Easy Monitoring: Dataflow UI, Restful API and CLI, Integration with Cloud Logging, etc.
60 Demo Time!
61 Wrote Reusable code (PTransforms) Built upon the same code (PTransform) in batch and streaming Used event time, not processing time Adjusted windowing and late data settings
62 Learn More! tec2016.toddreedy.com
Google Cloud Dataflow
Google Cloud Dataflow A Unified Model for Batch and Streaming Data Processing Jelena Pjesivac-Grbovic STREAM 2015 Agenda 1 Data Shapes 2 Data Processing Tradeoffs 3 Google s Data Processing Story 4 Google
More informationAn Introduction to The Beam Model
An Introduction to The Beam Model Apache Beam (incubating) Slides by Tyler Akidau & Frances Perry, April 2016 Agenda 1 Infinite, Out-of-order Data Sets 2 The Evolution of the Beam Model 3 What, Where,
More informationFundamentals of Stream Processing with Apache Beam (incubating)
Google Docs version of slides (including animations): https://goo.gl/yzvlxe Fundamentals of Stream Processing with Apache Beam (incubating) Frances Perry & Tyler Akidau @francesjperry, @takidau Apache
More informationData Processing with Apache Beam (incubating) and Google Cloud Dataflow
Data Processing with Apache Beam (incubating) and Google Cloud Dataflow Jelena Pjesivac-Grbovic Staff software engineer Cloud Big Data In collaboration with Frances Perry, Tayler Akidau, and Dataflow team
More informationStreaming Auto-Scaling in Google Cloud Dataflow
Streaming Auto-Scaling in Google Cloud Dataflow Manuel Fahndrich Software Engineer Google Addictive Mobile Game https://commons.wikimedia.org/wiki/file:globe_centered_in_the_atlantic_ocean_(green_and_grey_globe_scheme).svg
More informationApache Beam. Modèle de programmation unifié pour Big Data
Apache Beam Modèle de programmation unifié pour Big Data Who am I? Jean-Baptiste Onofre @jbonofre http://blog.nanthrax.net Member of the Apache Software Foundation
More informationTOWARDS PORTABILITY AND BEYOND. Maximilian maximilianmichels.com DATA PROCESSING WITH APACHE BEAM
TOWARDS PORTABILITY AND BEYOND Maximilian Michels mxm@apache.org DATA PROCESSING WITH APACHE BEAM @stadtlegende maximilianmichels.com !2 BEAM VISION Write Pipeline Execute SDKs Runners Backends !3 THE
More informationIntroduction to Apache Beam
Introduction to Apache Beam Dan Halperin JB Onofré Google Beam podling PMC Talend Beam Champion & PMC Apache Member Apache Beam is a unified programming model designed to provide efficient and portable
More informationPortable stateful big data processing in Apache Beam
Portable stateful big data processing in Apache Beam Kenneth Knowles Apache Beam PMC Software Engineer @ Google klk@google.com / @KennKnowles https://s.apache.org/ffsf-2017-beam-state Flink Forward San
More informationUsing Apache Beam for Batch, Streaming, and Everything in Between. Dan Halperin Apache Beam PMC Senior Software Engineer, Google
Abstract Apache Beam is a unified programming model capable of expressing a wide variety of both traditional batch and complex streaming use cases. By neatly separating properties of the data from run-time
More informationApache Beam: portable and evolutive data-intensive applications
Apache Beam: portable and evolutive data-intensive applications Ismaël Mejía - @iemejia Talend Who am I? @iemejia Software Engineer Apache Beam PMC / Committer ASF member Integration Software Big Data
More informationNexmark with Beam. Evaluating Big Data systems with Apache Beam. Etienne Chauchot, Ismaël Mejía. Talend
Nexmark with Beam Evaluating Big Data systems with Apache Beam Etienne Chauchot, Ismaël Mejía. Talend 1 Who are we? 2 Agenda 1. Big Data Benchmarking a. b. 2. Nexmark on Apache Beam a. b. c. d. e. f. 3.
More informationFlexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco
Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Introduction Harsh realities of network analytics netbeam Demo
More informationNaiad (Timely Dataflow) & Streaming Systems
Naiad (Timely Dataflow) & Streaming Systems CS 848: Models and Applications of Distributed Data Systems Mon, Nov 7th 2016 Amine Mhedhbi What is Timely Dataflow?! What is its significance? Dataflow?! Dataflow?!
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 9: Real-Time Data Analytics (1/2) March 27, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationReal-Time Decisions Using ML on the Google Cloud Platform. Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018
Real-Time Decisions Using ML on the Google Cloud Platform Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018 How many of you are interested in machine learning? but how many of you are running
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationHow Apache Beam Will Change Big Data
How Apache Beam Will Change Big Data 1 / 21 About Big Data Institute Mentoring, training, and high-level consulting company focused on Big Data, NoSQL and The Cloud Founded in 2008 We help make companies
More informationProcessing Data of Any Size with Apache Beam
Processing Data of Any Size with Apache Beam 1 / 19 Chapter 1 Introducing Apache Beam 2 / 19 Introducing Apache Beam What Is Beam? Why Use Beam? Using Beam 3 / 19 Apache Beam Apache Beam is a unified model
More informationTurning Relational Database Tables into Spark Data Sources
Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following
More informationExperiences with Apache Beam. Dan Debrunner Programming Model Architect IBM Streams STSM, IBM
Experiences with Apache Beam Dan Debrunner Programming Model Architect IBM Streams STSM, IBM Background To define my point of view IBM Streams brief history 2002 IBM Research/DoD joint research project
More informationApache Flink. Alessandro Margara
Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate
More informationFast and Easy Stream Processing with Hazelcast Jet. Gokhan Oner Hazelcast
Fast and Easy Stream Processing with Hazelcast Jet Gokhan Oner Hazelcast Stream Processing Why should I bother? What is stream processing? Data Processing: Massage the data when moving from place to place.
More informationFROM ZERO TO PORTABILITY
FROM ZERO TO PORTABILITY? Maximilian Michels mxm@apache.org APACHE BEAM S JOURNEY TO CROSS-LANGUAGE DATA PROCESSING @stadtlegende maximilianmichels.com FOSDEM 2019 What is Beam? What does portability mean?
More informationMEAP Edition Manning Early Access Program Flink in Action Version 2
MEAP Edition Manning Early Access Program Flink in Action Version 2 Copyright 2016 Manning Publications For more information on this and other Manning titles go to www.manning.com welcome Thank you for
More informationHadoop, Spark, Flink, and Beam Explained to Oracle DBAs: Why They Should Care
Hadoop, Spark, Flink, and Beam Explained to Oracle DBAs: Why They Should Care Kuassi Mensah Jean De Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 Safe Harbor
More informationPractical Big Data Processing An Overview of Apache Flink
Practical Big Data Processing An Overview of Apache Flink Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de With slides from Volker Markl and data artisans 1 2013
More informationTowards a Real- time Processing Pipeline: Running Apache Flink on AWS
Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Dr. Steffen Hausmann, Solutions Architect Michael Hanisch, Manager Solutions Architecture November 18 th, 2016 Stream Processing Challenges
More informationReal-time Streaming Applications on AWS Patterns and Use Cases
Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationContainer 2.0. Container: check! But what about persistent data, big data or fast data?!
@unterstein @joerg_schad @dcos @jaxdevops Container 2.0 Container: check! But what about persistent data, big data or fast data?! 1 Jörg Schad Distributed Systems Engineer @joerg_schad Johannes Unterstein
More informationDeep Dive into Concepts and Tools for Analyzing Streaming Data
Deep Dive into Concepts and Tools for Analyzing Streaming Data Dr. Steffen Hausmann Sr. Solutions Architect, Amazon Web Services Data originates in real-time Photo by mountainamoeba https://www.flickr.com/photos/mountainamoeba/2527300028/
More informationMillWheel:Fault Tolerant Stream Processing at Internet Scale. By FAN Junbo
MillWheel:Fault Tolerant Stream Processing at Internet Scale By FAN Junbo Introduction MillWheel is a low latency data processing framework designed by Google at Internet scale. Motived by Google Zeitgeist
More informationThrill : High-Performance Algorithmic
Thrill : High-Performance Algorithmic Distributed Batch Data Processing in C++ Timo Bingmann, Michael Axtmann, Peter Sanders, Sebastian Schlag, and 6 Students 2016-12-06 INSTITUTE OF THEORETICAL INFORMATICS
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationApache Spark 2.0. Matei
Apache Spark 2.0 Matei Zaharia @matei_zaharia What is Apache Spark? Open source data processing engine for clusters Generalizes MapReduce model Rich set of APIs and libraries In Scala, Java, Python and
More informationThe Evolution of Big Data Platforms and Data Science
IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering
More informationMaking Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST
Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0 WEBINAR MAY 15 th, 2018 1PM EST 10AM PST Welcome and Logistics If you have problems with the sound on your computer, switch
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationRIPE75 - Network monitoring at scale. Louis Poinsignon
RIPE75 - Network monitoring at scale Louis Poinsignon Why monitoring and what to monitor? Why do we monitor? Billing Reducing costs Traffic engineering Where should we peer? Where should we set-up a new
More informationPutting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21
Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the Cloud Dean Bryen, Solutions Architect AWS Andrew Wheat, Senior Software Engineer - BBC April 15, 2015 London, UK 2015, Amazon Web Services, Inc. or its affiliates.
More informationOpen Source IoT. Eclipse IoT. Tim De Borger - Senior Solution Architect 13/06/2017
Open Source IoT Eclipse IoT Tim De Borger - tdeborge@redhat.com Senior Solution Architect 13/06/2017 Disclaimer The content set forth herein is Red Hat confidential information and does not constitute
More informationCSE 444: Database Internals. Lecture 23 Spark
CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More informationArcGIS GeoEvent Server: Making 3D Scenes Come Alive with Real-Time Data
ArcGIS GeoEvent Server: Making 3D Scenes Come Alive with Real-Time Data Morakot Pilouk, Ph.D. Senior Software Developer, Esri mpilouk@esri.com @mpesri Agenda 1 2 3 4 5 6 3D for ArcGIS Real-Time GIS Static
More informationPython, PySpark and Riak TS. Stephen Etheridge Lead Solution Architect, EMEA
Python, PySpark and Riak TS Stephen Etheridge Lead Solution Architect, EMEA Agenda Introduction to Riak TS The Riak Python client The Riak Spark connector and PySpark CONFIDENTIAL Basho Technologies 3
More informationAn Introduction to Apache Spark
An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations
More informationDevops Practices on google cloud. Jeff Liu Google
Devops Practices on google cloud Jeff Liu Google Jeff Liu SWE / Devops Google SRE - Twitter Devops - Splunk SRE - Dell Devops Evolution Devops Evolution Devops Evolution Cloud/Platform Microservices Big
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationDevelop and test your Mobile App faster on AWS
Develop and test your Mobile App faster on AWS Carlos Sanchiz, Solutions Architect @xcarlosx26 #AWSSummit 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The best mobile apps are
More informationChapter 4: Apache Spark
Chapter 4: Apache Spark Lecture Notes Winter semester 2016 / 2017 Ludwig-Maximilians-University Munich PD Dr. Matthias Renz 2015, Based on lectures by Donald Kossmann (ETH Zürich), as well as Jure Leskovec,
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationData Analytics with HPC. Data Streaming
Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationApache Flink Big Data Stream Processing
Apache Flink Big Data Stream Processing Tilmann Rabl Berlin Big Data Center www.dima.tu-berlin.de bbdc.berlin rabl@tu-berlin.de XLDB 11.10.2017 1 2013 Berlin Big Data Center All Rights Reserved DIMA 2017
More informationApache Flink- A System for Batch and Realtime Stream Processing
Apache Flink- A System for Batch and Realtime Stream Processing Lecture Notes Winter semester 2016 / 2017 Ludwig-Maximilians-University Munich Prof Dr. Matthias Schubert 2016 Introduction to Apache Flink
More informationA BIG DATA STREAMING RECIPE WHAT TO CONSIDER WHEN BUILDING A REAL TIME BIG DATA APPLICATION
A BIG DATA STREAMING RECIPE WHAT TO CONSIDER WHEN BUILDING A REAL TIME BIG DATA APPLICATION Konstantin Gregor / konstantin.gregor@tngtech.com ABOUT ME So ware developer for TNG in Munich Client in telecommunication
More informationConsuming Model-Driven Telemetry
Consuming Model-Driven Telemetry Cristina Precup & Stefan Braicu Software Systems Engineers Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session
More informationNXOS in the Real World Using NX-API REST
NXOS in the Real World Using NX-API REST Adrian Iliesiu Corporate Development Engineer Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session
More informationBringing Data to Life
Bringing Data to Life Data management and Visualization Techniques Benika Hall Rob Harrison Corporate Model Risk March 16, 2018 Introduction Benika Hall Analytic Consultant Wells Fargo - Corporate Model
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationGoogle Cloud Platform for Systems Operations Professionals (CPO200) Course Agenda
Google Cloud Platform for Systems Operations Professionals (CPO200) Course Agenda Module 1: Google Cloud Platform Projects Identify project resources and quotas Explain the purpose of Google Cloud Resource
More informationHow to Route Internet Traffic between A Mobile Application and IoT Device?
Whitepaper How to Route Internet Traffic between A Mobile Application and IoT Device? Website: www.mobodexter.com www.paasmer.co 1 Table of Contents 1. Introduction 3 2. Approach: 1 Uses AWS IoT Setup
More informationBuild the unified end to end IoT solution on ARM LEADING COLLABORATION IN THE ARM ECOSYSTEM
Build the unified end to end IoT solution on ARM LEADING COLLABORATION IN THE ARM ECOSYSTEM Agenda Linaro Linaro s IoT efforts Demo Business Models Design and sell x86 chips 2016 $59.5Bn Revenue Sells
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 12: Real-Time Data Analytics (2/2) March 30, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationYARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa
YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa ozawa.tsuyoshi@lab.ntt.co.jp ozawa@apache.org About me Tsuyoshi Ozawa Research Engineer @ NTT Twitter: @oza_x86_64 Over 150 reviews in 2015
More informationSimplifying ML Workflows with Apache Beam & TensorFlow Extended
Simplifying ML Workflows with Apache Beam & TensorFlow Extended Tyler Akidau @takidau Software Engineer at Google Apache Beam PMC Apache Beam Portable data-processing pipelines Example pipelines Python
More informationPrioritizing Attention in Fast Data
Prioritizing ttention in Fast Data Principles and Promise Peter ailis Edward Gan Kexin Rong Sahaana Suri CIDR 2017 Edward Kexin Sahaana Edward Kexin Sahaana Firas Deepak John Matei Tony Edward Kexin Ihab
More informationStreaming ETL of High-Velocity Big Data Using SAS Event Stream Processing and SAS Viya
SAS 1679-2018 Streaming ETL of High-Velocity Big Data Using SAS Event Stream Processing and SAS Viya ABSTRACT Joydeep Bhattacharya and Manish Jhunjhunwala, SAS Institute Inc. A typical ETL happens once
More informationGoogle GSuite Intro Demo of GSuite and GCP integration
Google GSuite Intro Demo of GSuite and GCP integration May 2017 Sara Djelassi - Sales Steve Mansfield - PSO 7 Cloud products with 1 billion users ML is core to differentiating Google services Search Search
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationAn Introduction to Developing for Cisco Kinetic
An Introduction to Developing for Cisco Kinetic Krishna Chengavalli Technical Marketing Engineer IoT Software Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session
More informationStreamBox: Modern Stream Processing on a Multicore Machine
StreamBox: Modern Stream Processing on a Multicore Machine Hongyu Miao and Heejin Park, Purdue ECE; Myeongjae Jeon and Gennady Pekhimenko, Microsoft Research; Kathryn S. McKinley, Google; Felix Xiaozhu
More informationWelcome to the New Era of Cloud Computing
Welcome to the New Era of Cloud Computing Aaron Kimball The web is replacing the desktop 1 SDKs & toolkits are there What about the backend? Image: Wikipedia user Calyponte 2 Two key concepts Processing
More informationGetting Started with ArcGIS Runtime SDK for the Microsoft.NET Framework. Morten Nielsen Mike Branscomb Antti Kajanus Rex Hansen
Getting Started with ArcGIS Runtime SDK for the Microsoft.NET Framework Morten Nielsen Mike Branscomb Antti Kajanus Rex Hansen Agenda What is the ArcGIS Runtime? ArcGIS Runtime SDK for.net - Platform -
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationLecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Apache Flink
Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Apache Flink Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur Schmid, Daniyal Kazempour,
More informationMapReduce Algorithm Design
MapReduce Algorithm Design Contents Combiner and in mapper combining Complex keys and values Secondary Sorting Combiner and in mapper combining Purpose Carry out local aggregation before shuffle and sort
More informationAn Introduction to Software Architecture By David Garlan & Mary Shaw 94
IMPORTANT NOTICE TO STUDENTS These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague and incomplete on purpose to spark a class discussion An Introduction to
More informationDRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE
DRIZZLE: FAST AND Adaptable STREAM PROCESSING AT SCALE Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael Franklin, Benjamin Recht, Ion Stoica STREAMING WORKLOADS
More informationStream and Batch Processing in the Cloud with Data Microservices. Marius Bogoevici and Mark Fisher, Pivotal
Stream and Batch Processing in the Cloud with Data Microservices Marius Bogoevici and Mark Fisher, Pivotal Stream and Batch Processing in the Cloud with Data Microservices Use Cases Predictive maintenance
More informationCS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab
CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material
More informationThe Stream Processor as a Database. Ufuk
The Stream Processor as a Database Ufuk Celebi @iamuce Realtime Counts and Aggregates The (Classic) Use Case 2 (Real-)Time Series Statistics Stream of Events Real-time Statistics 3 The Architecture collect
More informationIntegrating Splunk with AWS services:
Integrating Splunk with AWS services: Using Redshi+, Elas0c Map Reduce (EMR), Amazon Machine Learning & S3 to gain ac0onable insights via predic0ve analy0cs via Splunk Patrick Shumate Solutions Architect,
More informationRAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko
RAMCloud Scalable High-Performance Storage Entirely in DRAM 2009 by John Ousterhout et al. Stanford University presented by Slavik Derevyanko Outline RAMCloud project overview Motivation for RAMCloud storage:
More informationMapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia
MapReduce Spark Some slides are adapted from those of Jeff Dean and Matei Zaharia What have we learnt so far? Distributed storage systems consistency semantics protocols for fault tolerance Paxos, Raft,
More informationBig data streaming: Choices for high availability and disaster recovery on Microsoft Azure. By Arnab Ganguly DataCAT
: Choices for high availability and disaster recovery on Microsoft Azure By Arnab Ganguly DataCAT March 2019 Contents Overview... 3 The challenge of a single-region architecture... 3 Configuration considerations...
More informationReal-Time & Big Data GIS: Best Practices. Suzanne Foss Josh Joyner
Real-Time & Big Data GIS: Best Practices Suzanne Foss Josh Joyner ArcGIS Enterprise With Real-time Capabilities Desktop Apps APIs visualization ingestion dissemination & actuation analytics storage Agenda:
More informationModel-Driven Telemetry and Analytics
Model-Driven Telemetry and Analytics Steven Barth & Cristina Precup Software Systems Engineers Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this
More informationCompSci 516: Database Systems
CompSci 516 Database Systems Lecture 12 Map-Reduce and Spark Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements Practice midterm posted on sakai First prepare and
More informationIntroduction to MapReduce (cont.)
Introduction to MapReduce (cont.) Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com USC INF 553 Foundations and Applications of Data Mining (Fall 2018) 2 MapReduce: Summary USC INF 553 Foundations
More informationStreaming analytics better than batch - when and why? _Adam Kawa - Dawid Wysakowicz_
Streaming analytics better than batch - when and why? _Adam Kawa - Dawid Wysakowicz_ About Us At GetInData, we build custom Big Data solutions Hadoop, Flink, Spark, Kafka and more Our team is today represented
More informationStreaming OLAP Applications
Streaming OLAP Applications From square one to multi-gigabit streams and beyond C. Scott Andreas HPTS 2013 @cscotta Roadmap Framing the problem Four phases of an architecture s evolution Code: A general-purpose
More informationCreating a Recommender System. An Elasticsearch & Apache Spark approach
Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused
More informationApache Ignite and Apache Spark Where Fast Data Meets the IoT
Apache Ignite and Apache Spark Where Fast Data Meets the IoT Denis Magda GridGain Product Manager Apache Ignite PMC http://ignite.apache.org #apacheignite #denismagda Agenda IoT Demands to Software IoT
More informationDeveloping Qt Apps with the Runtime SDK
Developing Qt Apps with the Runtime SDK Thomas Dunn and Michael Tims Esri UC 2014 Technical Workshop Agenda Getting Started Creating the Map Geocoding and Routing Geoprocessing Message Processing Work
More informationOne year of Deploying Applications for Docker, CoreOS, Kubernetes and Co.
One year of Deploying Applications for Docker, CoreOS, Kubernetes and Co thomas@endocode.com HI! Thomas Fricke thomas@endocode.com CTO Endocode System Automation DevOps Cloud, Database and Software Architect
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More information