Apache Beam: Unified Programming Model for Big Data
- Erica Logan
- 6 years ago
Transcription
1 Apache Beam: Unified Programming Model for Big Data
2 Who am I? Jean-Baptiste Onofre (@jbonofre). Member of the Apache Software Foundation. Fellow/Software Architect at Talend. PMC on ~20 Apache projects, from system integration & container (Karaf, Camel, ActiveMQ, Archiva, Aries, ServiceMix, ...) to big data (Beam, CarbonData, Falcon, Gearpump, Lens, ...)
3 Apache Beam origin [Diagram: Google technologies leading to Apache Beam: MapReduce, Colossus, BigTable, PubSub, Dremel, Spanner, Megastore, Millwheel, Flume, Google Cloud Dataflow, Apache Beam]
4 Beam model: asking the right questions What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?
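The "what" and "where" questions can be illustrated without any Beam dependency. The following is a minimal plain-JDK sketch, not the Beam API: the `Event` class, the 60-second window size, and the sample values are all assumptions made for illustration. It computes a sum (the "what") grouped by the fixed event-time window each element falls into (the "where").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-JDK sketch of fixed event-time windows; this is not the Beam API. */
final class FixedWindowSketch {

    /** Hypothetical event: a value plus the event-time timestamp at which it occurred. */
    static final class Event {
        final long timestampMillis;
        final int value;

        Event(long timestampMillis, int value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** "Where": start of the fixed window containing the timestamp (assumes timestamp >= 0). */
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    /** "What": a sum of values, grouped by each element's event-time window. */
    static Map<Long, Integer> sumPerWindow(List<Event> events, long windowSizeMillis) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Event e : events) {
            sums.merge(windowStart(e.timestampMillis, windowSizeMillis), e.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(5_000L, 1),    // falls in window [0s, 60s)
            new Event(61_000L, 2),   // falls in window [60s, 120s)
            new Event(30_000L, 3));  // processed last, but its event time is still in [0s, 60s)
        System.out.println(sumPerWindow(events, 60_000L));
    }
}
```

The third event is processed after the second, yet it is still counted in the first window, because grouping depends on event time and not on arrival order.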
5 Customizing What / Where / When / How: 1. Classic Batch 2. Windowed Batch 3. Streaming 4. Streaming + Accumulation
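The "how do refinements relate" question is easiest to see when a window emits more than one result. The sketch below is plain JDK, not the Beam API, and the `paneSums` helper and its inputs are hypothetical. It assumes a single window whose trigger fires three times, and contrasts an accumulating mode (each result is the running total) with a discarding mode (each result covers only the elements that arrived since the previous firing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-JDK sketch of accumulating vs. discarding refinements; this is not the Beam API. */
final class AccumulationSketch {

    /**
     * Emits one sum per trigger firing for a single window.
     * Each inner list holds the elements that arrived since the previous firing.
     */
    static List<Integer> paneSums(List<List<Integer>> firings, boolean accumulating) {
        List<Integer> emitted = new ArrayList<>();
        int runningTotal = 0;
        for (List<Integer> newElements : firings) {
            int delta = 0;
            for (int value : newElements) {
                delta += value;
            }
            runningTotal += delta;
            // Accumulating: refine the previous result. Discarding: report only the new part.
            emitted.add(accumulating ? runningTotal : delta);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<Integer>> firings = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println("accumulating: " + paneSums(firings, true));
        System.out.println("discarding:   " + paneSums(firings, false));
    }
}
```

With accumulation, a downstream consumer can simply overwrite the previous value; with discarding, it has to add the partial results together itself.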
6 What is Apache Beam?
1. Unified model (batch + stream): What / Where / When / How
2. SDKs (Java, Python, ...) & DSLs (Scala, ...)
3. Runners for existing distributed processing backends (Google Dataflow, Spark, Flink, ...)
4. IOs: data store sources / sinks
7 Apache Beam vision
1. End users: who want to write pipelines in a language that's familiar.
2. SDK/DSL writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
[Diagram: Beam Java, Beam Python, and other languages feed Beam Model: Pipeline Construction; Beam Model: Fn Runners targets Apache Flink, Apache Spark, and Cloud Dataflow for execution.]
8 Apache Beam - SDKs & DSLs
SDKs: API based on the Beam Model
1. Current: a. Java b. Python
2. Future (possible) SDKs: Go, Ruby, etc.
DSLs: Domain-Specific Languages based on the Beam Model
1. Current: Scio (Scala API)
2. Future (ideas): Streaming SQL (Calcite), Machine Learning, Complex Event Processing
9 Apache Beam SDK concepts
1. Pipeline - data processing job as a directed graph of transformations
2. PCollection - the data inside a pipeline
3. PTransform - a transformation step in the pipeline
a. IO transforms - read from a Source or write to a Sink
b. Core transforms - common transformations provided by the SDK (ParDo, GroupByKey, ...)
c. Composite transforms - combine multiple transforms
10 Apache Beam - Pipeline. Data processing pipeline (executed via a Beam runner): Read PTransform (source) -> PTransform -> PTransform -> Write PTransform (sink)
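The idea of a composite transform, several smaller transforms packaged as one step, can be sketched with ordinary function composition. This is a plain-JDK analogy and not the Beam API: a `java.util.function.Function` over lists stands in for a PTransform over PCollections, and the `splitWords`, `lowerCase`, and `normalizedWords` names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-JDK analogy for composite transforms; this is not the Beam API. */
final class CompositeTransformSketch {

    /** Element-wise step: split each line into words (one input may yield many outputs). */
    static Function<List<String>, List<String>> splitWords() {
        return lines -> lines.stream()
            .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());
    }

    /** Element-wise step: one output per input. */
    static Function<List<String>, List<String>> lowerCase() {
        return words -> words.stream()
            .map(word -> word.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
    }

    /** Composite step: two smaller transforms chained into one reusable unit. */
    static Function<List<String>, List<String>> normalizedWords() {
        return splitWords().andThen(lowerCase());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Apache Beam", "Unified  MODEL");
        // Each step returns a new list; the input list is never modified.
        System.out.println(normalizedWords().apply(lines));
    }
}
```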
11 Apache Beam - PCollection
1. A PCollection is immutable, does not support random access to elements, and belongs to a Pipeline
2. Each element in a PCollection has a timestamp (commonly set by the IO Source)
3. A Coder supports different data serialization formats
4. Bounded (batch) or unbounded (streaming), depending on the IO Source
12 Apache Beam - PTransform
1. PTransforms are operations that transform data
2. They receive one or multiple PCollections and produce one or multiple PCollections
3. They must be Serializable
4. They should be thread-compatible (if you create your own threads, you must synchronize them)
5. Idempotency is not required but recommended
13 Apache Beam - IO Transforms
1. IOs read/write data as PCollections (Source/Sink)
2. Support bounded and/or unbounded PCollections
3. Extensible API to create custom sources & sinks
4. Deal with timestamps, watermarks, deduplication, and read/write parallelism
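The role of a coder, turning elements into bytes and back so they can move between workers, can be sketched as follows. This is a plain-JDK illustration and not Beam's `Coder` class: the `SimpleCoder` interface, the length-prefixed UTF-8 format, and the `roundTrip` helper are all assumptions chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Plain-JDK sketch of the coder idea; this is not Beam's Coder API. */
final class CoderSketch {

    /** Hypothetical minimal coder contract: encode to bytes, decode back to a value. */
    interface SimpleCoder<T> {
        void encode(T value, OutputStream out) throws IOException;

        T decode(InputStream in) throws IOException;
    }

    /** Length-prefixed UTF-8 strings: a 4-byte length followed by the encoded bytes. */
    static final SimpleCoder<String> UTF8_STRING = new SimpleCoder<String>() {
        @Override
        public void encode(String value, OutputStream out) throws IOException {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(bytes.length);
            data.write(bytes);
            data.flush();
        }

        @Override
        public String decode(InputStream in) throws IOException {
            DataInputStream data = new DataInputStream(in);
            byte[] bytes = new byte[data.readInt()];
            data.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    /** Encodes a value and decodes it again, as would happen when it crosses a worker boundary. */
    static <T> T roundTrip(SimpleCoder<T> coder, T value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            coder.encode(value, buffer);
            return coder.decode(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(UTF8_STRING, "Apache Beam"));
    }
}
```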
14 Agenda 1. Evolution of the Big Data programming models 2. The Beam approach 3. Apache Beam
15 Apache Beam - Current IOs Ready File Avro Google Cloud Storage BigQuery BigTable DataStore MQTT JDBC Mongo / GridFS JMS Kafka Kinesis WIP Hive Cassandra Redis RabbitMQ... HDFS Elasticsearch HBase
16 Apache Beam - Pipeline with IO Example
public static void main(String[] args) {
  // Create a pipeline parameterized by command-line flags, e.g. --runner
  Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
  p.apply(KafkaIO.read().withBootstrapServers(servers).withTopics(topics)) // Read input
   .apply(new YourFancyFn()) // Do some processing
   .apply(ElasticsearchIO.write().withAddress(esServer).withIndex(index).withType(type)); // Write output
  // Run the pipeline.
  p.run();
}
17 What are you computing? Element-wise, aggregating, composite
18 Apache Beam - Programming model in the SDK
Element-wise: ParDo, MapElements, FlatMapElements, Filter, WithKeys, Keys, Values
Grouping: GroupByKey, Combine -> Reduce, Sum, Count, Min, Max, Mean, ...
Windowing/Triggers: FixedWindows, GlobalWindows, SlidingWindows, Sessions, AfterWatermark, AfterProcessingTime, AfterPane, ...
19 Apache Beam - Example - GDELT Events by location
Pipeline pipeline = Pipeline.create(options);
// Read events from a text file and parse them.
pipeline.apply("gdeltfile", TextIO.Read.from(options.getInput()))
  // Extract location from the fields
  .apply("extractlocation", ParDo.of(...))
  // Count events per location
  .apply("countperlocation", Count.<String>perElement())
  // Reformat KV as a String
  .apply("StringFormat", MapElements.via(...))
  // Write to result files
  .apply("results", TextIO.Write.to(options.getOutput()));
// Run the batch pipeline.
pipeline.run();
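To make the counting and formatting steps of this example concrete, here is a minimal plain-JDK sketch of what they compute. It is not Beam code. It assumes the location strings have already been extracted (the slide elides that step, so it is not reconstructed here), and the sample country codes and the `key: count` output format are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-JDK sketch of the count-per-element and formatting steps; this is not Beam code. */
final class CountPerLocationSketch {

    /** Counts how many times each distinct location occurs (keys sorted for stable output). */
    static Map<String, Long> countPerElement(List<String> locations) {
        return locations.stream()
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    /** Turns each key/count pair into one output line; the "key: count" format is an assumption. */
    static List<String> formatCounts(Map<String, Long> counts) {
        return counts.entrySet().stream()
            .map(entry -> entry.getKey() + ": " + entry.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical, already-extracted locations.
        List<String> locations = Arrays.asList("FR", "US", "FR");
        formatCounts(countPerElement(locations)).forEach(System.out::println);
    }
}
```

In the pipeline above these steps are distributed by the runner; the sketch only shows the per-key result they produce on a small in-memory list.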
20 Apache Beam - Runners / Execution Engines
Runners translate the code to a target runtime (the runner itself doesn't provide the runtime).
Many runners are tied to other top-level Apache projects, such as Apache Flink and Apache Spark.
Because of this, runners can be run on-premise (on your local Flink cluster) or in a public cloud (using Google Cloud Dataproc or Amazon EMR), for example.
Apache Beam is focused on treating runners as a top-level use case (with APIs, support, etc.) so runners can be developed with minimal friction for maximum pipeline portability.
21 Runners. Apache Beam Direct Runner (local), Google Cloud Dataflow (managed, NoOps), Apache Spark, Apache Flink. WIP: Apache Apex, Apache Gearpump, Apache MapReduce, Apache Karaf. Same code, different runners & runtimes.
22 Apache Beam - Use cases
Apache Beam is a great choice for both batch and stream processing and can handle bounded and unbounded datasets.
Batch can focus on ETL/ELT, catch-up processing, daily aggregations, and so on.
Stream can focus on handling real-time processing on a record-by-record basis.
Real use cases: data processing (both batch and stream), real-time event processing from IoT devices, fraud detection, ...
23 Why Apache Beam?
1. Portable - You can use the same code with different runners (agnostic) and backends: on premise, in the cloud, or locally
2. Unified - Same unified model for batch and stream processing
3. Advanced features - Event windowing, triggering, watermarking, lateness, etc.
4. Extensible model and SDK - Extensible API; can define custom sources to read and write in parallel
24 Growing the Beam Community
Collaborate - Beam is becoming a community-driven effort with participation from many organizations and contributors.
Grow - We want to grow the Beam ecosystem and community with active, open involvement so Beam is a part of the larger OSS ecosystem.
25 Learn More! Apache Beam Join the Beam mailing lists! on Twitter
26 Thank You!