Simplifying ML Workflows with Apache Beam & TensorFlow Extended
|
|
- Estella Karin Wheeler
- 5 years ago
- Views:
Transcription
1 Simplifying ML Workflows with Apache Beam & TensorFlow Extended Tyler Software Engineer at Google Apache Beam PMC
2 Apache Beam Portable data-processing pipelines
3 Example pipelines Python Java
4 Cross-language Portability Framework Language A SDK Language B SDK Language C SDK The Beam Model Runner 1 Runner 2 Runner 3 The Beam Model Language A Language B Language C
5 Python compatible runners Direct runner (local machine): Now Google Cloud Dataflow: Now Apache Flink: Q2-Q3 Apache Spark: Q3-Q4
6 TensorFlow Extended End-to-end machine learning in production
7 Doing ML in production is hard. -Everyone who has ever tried
8 Because, in addition to the actual ML... ML Code
9 ...you have to worry about so much more. Data Verification Monitoring Configuration Data Collection ML Code Serving Infrastructure Analysis Tools Process Management Tools Feature Extraction Machine Resource Management Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems
10 In this talk, I will...
11 In this talk, I will... Show you how to apply transformations... TensorFlow Transform
12 In this talk, we will... Show you how to apply transformations consistently between Training and Serving TensorFlow Transform TensorFlow Estimators TensorFlow Serving
13 In this talk, we will... Introduce something new... TensorFlow Transform TensorFlow Estimators TensorFlow Model Analysis TensorFlow Serving
14 TensorFlow Transform Consistent In-Graph Transformations in Training and Serving
15 Typical ML Pipeline During training During serving data request batch processing live processing
16 Typical ML Pipeline During training During serving data request batch processing live processing
17 TensorFlow Transform During training During serving data request tf.transform batch processing transform as tf.graph
18 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } Many operations available for dealing with text and numeric features, user can define their own. Y Z
19 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } Many operations available for dealing with text and numeric features, user can define their own. mean stddev normalize Y Z
20 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } Many operations available for dealing with text and numeric features, user can define their own. mean stddev normalize multiply Y Z
21 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } mean stddev normalize multiply quantiles Many operations available for dealing with text and numeric features, user can define their own. bucketize A Y Z
22 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } mean Z Y stddev normalize multiply quantiles Many operations available for dealing with text and numeric features, user can define their own. bucketize A B
23 Defining a preprocessing function in TF Transform X def preprocessing_fn(inputs): x = inputs['x']... return { "A": tft.bucketize( tft.normalize(x) * y), "B": tensorflow_fn(y, z), "C": tft.ngrams(z) } mean Y Z stddev normalize multiply quantiles Many operations available for dealing with text and numeric features, user can define their own. bucketize A B C
24 Analyzers mean stddev Transforms normalize Reduce (full pass) Implemented as a distributed data pipeline multiply quantiles bucketize Instance-to-instance (don t change batch dimension) Pure TensorFlow
25 data constant tensors mean stddev normalize multiply Analyze normalize multiply quantiles bucketize bucketize
26 What can be done with TF Transform? tf.transform batch processing Pretty much anything.
27 What can be done with TF Transform? tf.transform batch processing Pretty much anything. Serving Graph Anything that can be expressed as a TensorFlow Graph
28 Some common use-cases... Scale to... Bag of Words / N-Grams Bucketization Feature Crosses
29 Some common use-cases... Scale to... Bag of Words / N-Grams Bucketization Feature Crosses Apply another TensorFlow Model
30 github.com/tensorflow/transform
31 Introducing TensorFlow Model Analysis Scaleable, sliced, and full-pass metrics
32 Let s Talk about Metrics... How accurate? Converged model? What about my TB sized eval set? Slices / subsets? Across model versions?
33 ML Fairness: analyzing model mistakes by subgroup ROC Curve Sensitivity (True Positive Rate) All groups Specificity (False Positive Rate) Learn more at ml-fairness.com
34 ML Fairness: analyzing model mistakes by subgroup ROC Curve All groups Sensitivity (True Positive Rate) Group A Group B Specificity (False Positive Rate) Learn more at ml-fairness.com
35 ML Fairness: understand the failure modes of your models
36 ML Fairness: Learn More ml-fairness.com
37 How does it work? Inference Graph (SavedModel)... estimator = DNNLinearCombinedClassifier(...) estimator.train(...) estimator.export_savedmodel( serving_input_receiver_fn=serving_input_fn) tfma.export.export_eval_savedmodel ( estimator=estimator, eval_input_receiver_fn=eval_input_fn)... SignatureDef
38 How does it work? Inference Graph (SavedModel)... estimator = DNNLinearCombinedClassifier(...) estimator.train(...) SignatureDef estimator.export_savedmodel( serving_input_receiver_fn=serving_input_fn) tfma.export.export_eval_savedmodel ( estimator=estimator, eval_input_receiver_fn=eval_input_fn)... Eval Graph (SavedModel) Eval SignatureDef Metadata
39 github.com/tensorflow/model-analysis
40 Summary Apache Beam: Data-processing framework the runs locally and scales to massive data, in the Cloud (now) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below. tf.transform: Consistent in-graph transformations in training and serving. tf.modelanalysis: Scalable, sliced, and full-pass metrics.
Fundamentals of Stream Processing with Apache Beam (incubating)
Google Docs version of slides (including animations): https://goo.gl/yzvlxe Fundamentals of Stream Processing with Apache Beam (incubating) Frances Perry & Tyler Akidau @francesjperry, @takidau Apache
More informationTOWARDS PORTABILITY AND BEYOND. Maximilian maximilianmichels.com DATA PROCESSING WITH APACHE BEAM
TOWARDS PORTABILITY AND BEYOND Maximilian Michels mxm@apache.org DATA PROCESSING WITH APACHE BEAM @stadtlegende maximilianmichels.com !2 BEAM VISION Write Pipeline Execute SDKs Runners Backends !3 THE
More informationPortable stateful big data processing in Apache Beam
Portable stateful big data processing in Apache Beam Kenneth Knowles Apache Beam PMC Software Engineer @ Google klk@google.com / @KennKnowles https://s.apache.org/ffsf-2017-beam-state Flink Forward San
More informationApache Beam: portable and evolutive data-intensive applications
Apache Beam: portable and evolutive data-intensive applications Ismaël Mejía - @iemejia Talend Who am I? @iemejia Software Engineer Apache Beam PMC / Committer ASF member Integration Software Big Data
More informationReal-Time Decisions Using ML on the Google Cloud Platform. Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018
Real-Time Decisions Using ML on the Google Cloud Platform Przemysław Pastuszka & Carlos Garcia QCon London 7th March 2018 How many of you are interested in machine learning? but how many of you are running
More informationData Analysis and Validation for ML
Analysis and for ML Neoklis (Alkis) Polyzotis, Google Research Collaborators: Eric Breck, Sudip Roy, Steven Whang, Martin Zinkevich Outline ML in production is hard, and a big part of hardness is related
More informationAn Introduction to The Beam Model
An Introduction to The Beam Model Apache Beam (incubating) Slides by Tyler Akidau & Frances Perry, April 2016 Agenda 1 Infinite, Out-of-order Data Sets 2 The Evolution of the Beam Model 3 What, Where,
More informationBenchmarking Apache Flink and Apache Spark DataFlow Systems on Large-Scale Distributed Machine Learning Algorithms
Benchmarking Apache Flink and Apache Spark DataFlow Systems on Large-Scale Distributed Machine Learning Algorithms Candidate Andrea Spina Advisor Prof. Sonia Bergamaschi Co-Advisor Dr. Tilmann Rabl Co-Advisor
More informationFlexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco
Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Introduction Harsh realities of network analytics netbeam Demo
More informationIntroduction to Apache Beam
Introduction to Apache Beam Dan Halperin JB Onofré Google Beam podling PMC Talend Beam Champion & PMC Apache Member Apache Beam is a unified programming model designed to provide efficient and portable
More informationApache Beam. Modèle de programmation unifié pour Big Data
Apache Beam Modèle de programmation unifié pour Big Data Who am I? Jean-Baptiste Onofre @jbonofre http://blog.nanthrax.net Member of the Apache Software Foundation
More informationA CD Framework For Data Pipelines. Yaniv
A CD Framework For Data Pipelines Yaniv Rodenski @YRodenski yaniv@apache.org Archetypes of Data Pipelines Builders Data People (Data Scientist/ Analysts/BI Devs) Exploratory workloads Code centric Software
More informationData Processing with Apache Beam (incubating) and Google Cloud Dataflow
Data Processing with Apache Beam (incubating) and Google Cloud Dataflow Jelena Pjesivac-Grbovic Staff software engineer Cloud Big Data In collaboration with Frances Perry, Tayler Akidau, and Dataflow team
More informationLuigi Build Data Pipelines of batch jobs. - Pramod Toraskar
Luigi Build Data Pipelines of batch jobs - Pramod Toraskar I am a Principal Solution Engineer & Pythonista with more than 8 years of work experience, Works for a Red Hat India an open source solutions
More informationFROM ZERO TO PORTABILITY
FROM ZERO TO PORTABILITY? Maximilian Michels mxm@apache.org APACHE BEAM S JOURNEY TO CROSS-LANGUAGE DATA PROCESSING @stadtlegende maximilianmichels.com FOSDEM 2019 What is Beam? What does portability mean?
More informationPersonalizing Netflix with Streaming datasets
Personalizing Netflix with Streaming datasets Shriya Arora Senior Data Engineer Personalization Analytics @shriyarora What is this talk about? Helping you decide if a streaming pipeline fits your ETL problem
More informationUsing Apache Beam for Batch, Streaming, and Everything in Between. Dan Halperin Apache Beam PMC Senior Software Engineer, Google
Abstract Apache Beam is a unified programming model capable of expressing a wide variety of both traditional batch and complex streaming use cases. By neatly separating properties of the data from run-time
More informationHadoop, Spark, Flink, and Beam Explained to Oracle DBAs: Why They Should Care
Hadoop, Spark, Flink, and Beam Explained to Oracle DBAs: Why They Should Care Kuassi Mensah Jean De Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 Safe Harbor
More informationBuilding a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch
Nick Pentreath Nov / 14 / 16 Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch About @MLnick Principal Engineer, IBM Apache Spark PMC Focused on machine learning
More informationIoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow
1 IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow Kafka-Native End-to-End IoT Data Integration and Processing Kai Waehner - Technology Evangelist kontakt@kai-waehner.de - LinkedIn Twitter :
More informationOnto Petaflops with Kubernetes
Onto Petaflops with Kubernetes Vishnu Kannan Google Inc. vishh@google.com Key Takeaways Kubernetes can manage hardware accelerators at Scale Kubernetes provides a playground for ML ML journey with Kubernetes
More informationTensorFlowOnSpark Scalable TensorFlow Learning on Spark Clusters Lee Yang, Andrew Feng Yahoo Big Data ML Platform Team
TensorFlowOnSpark Scalable TensorFlow Learning on Spark Clusters Lee Yang, Andrew Feng Yahoo Big Data ML Platform Team What is TensorFlowOnSpark Why TensorFlowOnSpark at Yahoo? Major contributor to open-source
More informationDeploying Machine Learning Models in Practice
Deploying Machine Learning Models in Practice Nick Pentreath Principal Engineer @MLnick About @MLnick on Twitter & Github Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies
More informationStreaming Auto-Scaling in Google Cloud Dataflow
Streaming Auto-Scaling in Google Cloud Dataflow Manuel Fahndrich Software Engineer Google Addictive Mobile Game https://commons.wikimedia.org/wiki/file:globe_centered_in_the_atlantic_ocean_(green_and_grey_globe_scheme).svg
More informationProcessing Data Like Google Using the Dataflow/Beam Model
Todd Reedy Google for Work Sales Engineer Google Processing Data Like Google Using the Dataflow/Beam Model Goals: Write interesting computations Run in both batch & streaming Use custom timestamps Handle
More informationDeep Learning Frameworks with Spark and GPUs
Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationTensorFlow: A System for Learning-Scale Machine Learning. Google Brain
TensorFlow: A System for Learning-Scale Machine Learning Google Brain The Problem Machine learning is everywhere This is in large part due to: 1. Invention of more sophisticated machine learning models
More informationDeveloping Machine Learning Models. Kevin Tsai
Developing Machine Learning Models Kevin Tsai GOTO Chicago 2018 Developing Machine Learning Models Kevin Tsai, Google Agenda Machine Learning with tf.estimator Performance pipelines with TFRecords and
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More information15.1 Data flow vs. traditional network programming
CME 323: Distributed Algorithms and Optimization, Spring 2017 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Matroid and Stanford. Lecture 15, 5/22/2017. Scribed by D. Penner, A. Shoemaker, and
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Lyamine Hedjazi 2015 The MathWorks, Inc. 1 Data Analytics Workflow Preprocessing Data Business Systems Build Algorithms Smart Connected Systems Take Decisions
More informationHadoop, Yarn and Beyond
Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets
More informationNVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS
TECHNICAL OVERVIEW NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS A Guide to the Optimized Framework Containers on NVIDIA GPU Cloud Introduction Artificial intelligence is helping to solve some of the most
More informationOPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS
OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS 1 Why GPUs? A Tale of Numbers 100x Performance Increase Infrastructure Cost Savings Performance 100x gains over traditional
More informationRapid growth of massive datasets
Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,
More informationAzure Data Factory. Data Integration in the Cloud
Azure Data Factory Data Integration in the Cloud 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and
More informationHandling Non-Relational Databases on Cloud using Scheduling Approach with Performance Analysis
Handling Non-Relational Databases on Cloud using Scheduling Approach with Performance Analysis *1 Bansri Kotecha, 2 Hetal Joshiyara 1 PG Scholar, 2 Assistant Professor 1, 2 Computer science Department,
More informationIntroduction to TensorFlow. Mor Geva, Apr 2018
Introduction to TensorFlow Mor Geva, Apr 2018 Introduction to TensorFlow Mor Geva, Apr 2018 Plan Why TensorFlow Basic Code Structure Example: Learning Word Embeddings with Skip-gram Variable and Name Scopes
More informationBuilding a Data-Friendly Platform for a Data- Driven Future
Building a Data-Friendly Platform for a Data- Driven Future Benjamin Hindman - @benh 2016 Mesosphere, Inc. All Rights Reserved. INTRO $ whoami BENJAMIN HINDMAN Co-founder and Chief Architect of Mesosphere,
More informationGrow your cloud knowledge.
Grow your cloud knowledge. Bootcamps. Dates: Monday, July 23 Friday, July 27. Price: $495 Full Day $249 Half Day. Build your practical skills and theoretical knowledge at in-person training courses that
More informationResearch challenges in data-intensive computing The Stratosphere Project Apache Flink
Research challenges in data-intensive computing The Stratosphere Project Apache Flink Seif Haridi KTH/SICS haridi@kth.se e2e-clouds.org Presented by: Seif Haridi May 2014 Research Areas Data-intensive
More informationMigrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring
Migrating massive monitoring to Bigtable without downtime Martin Parm, Infrastructure Engineer for Monitoring This is a big deal. -- Nicholas Harteau/VP, Engineering & Infrastructure https://news.spotify.com/dk/2016/02/23/announcing-spotify-infrastructures-googley-future/
More informationProcessing Data of Any Size with Apache Beam
Processing Data of Any Size with Apache Beam 1 / 19 Chapter 1 Introducing Apache Beam 2 / 19 Introducing Apache Beam What Is Beam? Why Use Beam? Using Beam 3 / 19 Apache Beam Apache Beam is a unified model
More informationIntro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect
Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing
More informationExperiences with Apache Beam. Dan Debrunner Programming Model Architect IBM Streams STSM, IBM
Experiences with Apache Beam Dan Debrunner Programming Model Architect IBM Streams STSM, IBM Background To define my point of view IBM Streams brief history 2002 IBM Research/DoD joint research project
More informationPython lab session 1
Python lab session 1 Dr Ben Dudson, Department of Physics, University of York 28th January 2011 Python labs Before we can start using Python, first make sure: ˆ You can log into a computer using your username
More informationScaled Machine Learning at Matroid
Scaled Machine Learning at Matroid Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Machine Learning Pipeline Learning Algorithm Replicate model Data Trained Model Serve Model Repeat entire pipeline Scaling
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationHadoop. Introduction / Overview
Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures
More informationPursuit of stability. Growing AWS ECS in production. Alexander Köhler Frankfurt, September 2018
Pursuit of stability Growing AWS ECS in production Alexander Köhler Frankfurt, September 2018 Alexander Köhler DevOps Engineer Systems Engineer Big Data Engineer Application Developer 2 @la3mmchen inovex
More informationHow Apache Beam Will Change Big Data
How Apache Beam Will Change Big Data 1 / 21 About Big Data Institute Mentoring, training, and high-level consulting company focused on Big Data, NoSQL and The Cloud Founded in 2008 We help make companies
More informationCenter for Information Services and High Performance Computing (ZIH) Current trends in big data analysis: second generation data processing
Center for Information Services and High Performance Computing (ZIH) Current trends in big data analysis: second generation data processing Course overview Part 1 Challenges Fundamentals and challenges
More informationHow to go serverless with AWS Lambda
How to go serverless with AWS Lambda Roman Plessl, nine (AWS Partner) Zürich, AWSomeDay 12. September 2018 About myself and nine Roman Plessl Working for nine as a Solution Architect, Consultant and Leader.
More informationGabriel Villa. Architecting an Analytics Solution on AWS
Gabriel Villa Architecting an Analytics Solution on AWS Cloud and Data Architect Skilled leader, solution architect, and technical expert focusing primarily on Microsoft technologies and AWS. Passionate
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationMachine Learning. Bridging the OT IT Gap for Machine Learning with Ignition and AWS Greengrass
Machine Learning with Ignition and AWS Greengrass Bridging B the OT IT Gap for Machine Learning Simply Connect Ignition & AWS Greengrass for Machine Learning! Bridging the OT IT gaps is now easier using
More informationDatabase infrastructure for electronic structure calculations
Database infrastructure for electronic structure calculations Fawzi Mohamed fawzi.mohamed@fhi-berlin.mpg.de 22.7.2015 Why should you be interested in databases? Can you find a calculation that you did
More informationEnabling High-Quality Printing in Web Applications. Tanu Hoque & Craig Williams
Enabling High-Quality Printing in Web Applications Tanu Hoque & Craig Williams New Modern Print Service with ArcGIS Enterprise 10.6 Quality Improvements: Support for true color level transparency PDF produced
More informationDown the event-driven road: Experiences of integrating streaming into analytic data platforms
Down the event-driven road: Experiences of integrating streaming into analytic data platforms Dr. Dominik Benz, Head of Machine Learning Engineering, inovex GmbH Confluent Meetup Munich, 8.10.2018 Integrate
More informationCrossing the AI Chasm. What We Learned from building Apache PredictionIO (incubating)
Crossing the AI Chasm What We Learned from building Apache PredictionIO (incubating) Simon Chan Sr. Director, Product Management, Salesforce Co-founder, PredictionIO PhD, University College London simon@salesforce.com
More informationMachine Learning in Python. Rohith Mohan GradQuant Spring 2018
Machine Learning in Python Rohith Mohan GradQuant Spring 2018 What is Machine Learning? https://twitter.com/myusuf3/status/995425049170489344 Traditional Programming Data Computer Program Output Getting
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationThe Future of Analytics or The New SQL
The Future of Analytics or The New SQL Gerhard Otterbach, Sales Manager Teradata Germany Hanau, Feb. 28th, 2018 Teradata At A Glance: 39 Years Ago Teradata was big data before there was big data Donald
More informationMLeap: Release Spark ML Pipelines
MLeap: Release Spark ML Pipelines Hollin Wilkins & Mikhail Semeniuk SATURDAY Web Dev @ Cornell Studied some General Biology Rails Consulting for TrueCar and other companies Implement ML model for ClearBook
More informationUnderstanding the latent value in all content
Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence
More informationBig data systems 12/8/17
Big data systems 12/8/17 Today Basic architecture Two levels of scheduling Spark overview Basic architecture Cluster Manager Cluster Cluster Manager 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores
More informationPython based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E
Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc Overview Supported Technologies Cray PE Python Support Shifter Urika-XC Anaconda Python Spark Intel BigDL machine
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationWHY AND HOW TO LEVERAGE THE POWER AND SIMPLICITY OF SQL ON APACHE FLINK - FABIAN HUESKE, SOFTWARE ENGINEER
WHY AND HOW TO LEVERAGE THE POWER AND SIMPLICITY OF SQL ON APACHE FLINK - FABIAN HUESKE, SOFTWARE ENGINEER ABOUT ME Apache Flink PMC member & ASF member Contributing since day 1 at TU Berlin Focusing on
More informationMoven. Machine/Deep Learning Models Distribution Relying on the Maven Infrastructure. Sergio Fernández (Redlink GmbH) November 14th, Sevilla
Moven Machine/Deep Learning Models Distribution Relying on the Maven Infrastructure Sergio Fernández (Redlink GmbH) November 14th, 2016 - Sevilla SSIX aims to exploit the predictive power of Social Media
More informationRelay: a high level differentiable IR. Jared Roesch TVMConf December 12th, 2018
Relay: a high level differentiable IR Jared Roesch TVMConf December 12th, 2018!1 This represents months of joint work with lots of great folks:!2 TVM Stack Optimization Relay High-Level Differentiable
More informationTransitioning From SSIS to Azure Data Factory. Meagan Longoria, Solution Architect, BlueGranite
Transitioning From SSIS to Azure Data Factory Meagan Longoria, Solution Architect, BlueGranite Microsoft Data Platform MVP I enjoy contributing to and learning from the Microsoft data community. Blogger
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationDECISION SUPPORT SYSTEM USING TENSOR FLOW
Volume 118 No. 24 2018 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DECISION SUPPORT SYSTEM USING TENSOR FLOW D.Anji Reddy 1, G.Narasimha 2, K.Srinivas
More informationIntro. Scheme Basics. scm> 5 5. scm>
Intro Let s take some time to talk about LISP. It stands for LISt Processing a way of coding using only lists! It sounds pretty radical, and it is. There are lots of cool things to know about LISP; if
More informationContainer 2.0. Container: check! But what about persistent data, big data or fast data?!
@unterstein @joerg_schad @dcos @jaxdevops Container 2.0 Container: check! But what about persistent data, big data or fast data?! 1 Jörg Schad Distributed Systems Engineer @joerg_schad Johannes Unterstein
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Aurélie Urbain MathWorks Consulting Services 2015 The MathWorks, Inc. 1 Data Analytics Workflow Data Acquisition Data Analytics Analytics Integration Business
More informationIntroduction to MATLAB application deployment
Introduction to application deployment Antti Löytynoja, Application Engineer 2015 The MathWorks, Inc. 1 Technical Computing with Products Access Explore & Create Share Options: Files Data Software Data
More informationBlackPearl Customer Created Clients Using Free & Open Source Tools
BlackPearl Customer Created Clients Using Free & Open Source Tools December 2017 Contents A B S T R A C T... 3 I N T R O D U C T I O N... 3 B U L D I N G A C U S T O M E R C R E A T E D C L I E N T...
More informationDesigning for the Cloud
Design Page 1 Designing for the Cloud 3:47 PM Designing for the Cloud So far, I've given you the "solution" and asked you to "program a piece of it". This is a short-term problem! In fact, 90% of the real
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
More informationDeep Learning Applications
October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning
More informationMachine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich
Machine Learning In A Snap Thomas Parnell Research Staff Member IBM Research - Zurich What are GLMs? Ridge Regression Support Vector Machines Regression Generalized Linear Models Classification Lasso Regression
More informationSTATS Data Analysis using Python. Lecture 8: Hadoop and the mrjob package Some slides adapted from C. Budak
STATS 700-002 Data Analysis using Python Lecture 8: Hadoop and the mrjob package Some slides adapted from C. Budak Recap Previous lecture: Hadoop/MapReduce framework in general Today s lecture: actually
More informationModern Stream Processing with Apache Flink
1 Modern Stream Processing with Apache Flink Till Rohrmann GOTO Berlin 2017 2 Original creators of Apache Flink da Platform 2 Open Source Apache Flink + da Application Manager 3 What changes faster? Data
More informationDistributed systems for stream processing
Distributed systems for stream processing Apache Kafka and Spark Structured Streaming Alena Hall Alena Hall Large-scale data processing Distributed Systems Functional Programming Data Science & Machine
More informationEagle Eye. Sommersemester 2017 Big Data Science Praktikum. Zhenyu Chen - Wentao Hua - Guoliang Xue - Bernhard Fabry - Daly
Eagle Eye Sommersemester 2017 Big Data Science Praktikum Zhenyu Chen - Wentao Hua - Guoliang Xue - Bernhard Fabry - Daly 1 Sommersemester Agenda 2009 Brief Introduction Pre-processiong of dataset Front-end
More informationWHITE PAPER: TOP 10 CAPABILITIES TO LOOK FOR IN A DATA CATALOG
WHITE PAPER: TOP 10 CAPABILITIES TO LOOK FOR IN A DATA CATALOG The #1 Challenge in Successfully Deploying a Data Catalog The data cataloging space is relatively new. As a result, many organizations don
More informationServerless Architecture Hochskalierbare Anwendungen ohne Server. Sascha Möllering, Solutions Architect
Serverless Architecture Hochskalierbare Anwendungen ohne Server Sascha Möllering, Solutions Architect Agenda Serverless Architecture AWS Lambda Amazon API Gateway Amazon DynamoDB Amazon S3 Serverless Framework
More informationCS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.
CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE
More informationCentral platform for business-wide, standardized Conversion of documents to PDF and PDF/A
PDF Days Europe 2017 Central platform for business-wide, standardized Conversion of documents to PDF and PDF/A A PDF Association Presentation 2017 by PDF Association 1 Sources of PDF-files in larger organizations
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationSoftware Quality Assurance. David Janzen
Software Quality Assurance David Janzen What is quality? Crosby: Conformance to requirements Issues: who establishes requirements? implicit requirements Juran: Fitness for intended use Issues: Who defines
More informationCloudMan cloud clusters for everyone
CloudMan cloud clusters for everyone Enis Afgan usecloudman.org This is accessibility! But only sometimes So, there are alternatives BUT WHAT IF YOU WANT YOUR OWN, QUICKLY The big picture A. Users in different
More informationFor this class we are going to create a file in Microsoft Word. Open Word on the desktop.
File Management Windows 10 What is File Management? As you use your computer and create files you may need some help in storing and retrieving those files. File management shows you how to create, move,
More informationGoogle Cloud Dataflow
Google Cloud Dataflow A Unified Model for Batch and Streaming Data Processing Jelena Pjesivac-Grbovic STREAM 2015 Agenda 1 Data Shapes 2 Data Processing Tradeoffs 3 Google s Data Processing Story 4 Google
More information