MLlib: Distributed Machine Learning on Spark. Evan Sparks. UC Berkeley


MLlib & MLbase: Distributed Machine Learning on Spark. Evan Sparks, UC Berkeley. January 31st, 2014. Collaborators: Ameet Talwalkar, Xiangrui Meng, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph Gonzalez, Michael Franklin, Michael I. Jordan, Tim Kraska. www.mlbase.org

Problem: Scalable implementations are difficult for ML Developers.

Problem: ML is difficult for End Users.
Too many algorithms. Too many knobs. Difficult to debug. Doesn't scale.
Users want ML that is fast, reliable, provable, and accurate.

MLbase sits between ML Experts and Systems Experts, with two goals:
1. Easy, scalable ML development (ML Developers).
2. User-friendly ML at scale (End Users).

Matlab Stack (single machine):
Lapack: low-level Fortran linear algebra library.
Matlab Interface: higher-level abstractions for data access/processing; more extensive functionality than Lapack; leverages Lapack whenever possible.
Similar stories for R and Python.

MLbase Stack (the distributed analogue of the single-machine Matlab stack):
Spark: cluster computing system designed for iterative computation.
MLlib: production-quality ML library in Spark.
MLI: experimental API for simplified feature extraction and algorithm development.
ML Optimizer: a declarative layer to simplify access to large-scale ML.

Overview: MLlib, Collaborative Filtering, ALS Details

MLlib
Classification: Logistic Regression, Linear SVM (+L1, L2), Decision Trees, Naive Bayes.
Regression: Linear Regression (+Lasso, Ridge).
Collaborative Filtering: Alternating Least Squares.
Clustering / Exploration: K-Means, SVD.
Optimization Primitives: SGD, Parallel Gradient.
Interoperability: Scala, Java, PySpark (0.9).
Included within the Spark codebase, unlike Mahout/Hadoop. Part of the Spark 0.8 release, with continued support via the Spark project. Community involvement has been terrific: ALS with implicit feedback (0.8.1), Naive Bayes (0.9), SVD (0.9), Decision Trees (soon!).

MLlib Performance
Walltime: elapsed time to execute a task.
Weak scaling: fix the problem size per processor; ideally, constant walltime as the cluster grows.
Strong scaling: fix the total problem size; ideally, linear speedup as the cluster grows.
EC2 experiments: m2.4xlarge instances, up to 32-machine clusters.
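The two scaling regimes can be made concrete with a little arithmetic. The walltimes below are made up for illustration; they are not the measurements reported in this talk:

```python
# Sketch: turning measured walltimes into scaling metrics.
# Hypothetical numbers only.

def strong_scaling_speedup(t1, tn):
    """Speedup on n machines for a fixed total problem size (ideal: n)."""
    return t1 / tn

def weak_scaling_efficiency(t1, tn):
    """Efficiency on n machines with fixed per-machine problem size.
    Ideal weak scaling keeps walltime constant, i.e. efficiency 1.0."""
    return t1 / tn

# Strong scaling: same total problem, more machines.
strong_walltimes = {1: 100.0, 2: 55.0, 4: 30.0, 8: 18.0}
speedups = {n: strong_scaling_speedup(strong_walltimes[1], t)
            for n, t in strong_walltimes.items()}

# Weak scaling: 8x the data on 8x the machines.
weak_walltimes = {1: 100.0, 8: 120.0}
eff = weak_scaling_efficiency(weak_walltimes[1], weak_walltimes[8])

print(round(speedups[8], 2), round(eff, 2))  # 5.56 0.83
```

On these hypothetical numbers, 8 machines give a 5.56x speedup (ideal would be 8x), and weak-scaling efficiency is 0.83 (ideal would be 1.0).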

Logistic Regression - Weak Scaling
[Plot: relative walltime vs. number of machines (1-32) for MLbase/MLlib, VW, and Ideal; bar chart of walltimes for n = 6K to 200K examples at d = 160K, including Matlab]
Full dataset: 200K images, 160K dense features.
MLbase/MLlib and VW show similar weak scaling.
MLlib within a factor of 2 of VW's walltime.

Logistic Regression - Strong Scaling
[Plot: speedup vs. number of machines (1-32) for MLlib, MLbase, VW, and Ideal; bar chart of walltimes from 1 to 32 machines, including Matlab]
Fixed dataset: 50K images, 160K dense features.
MLlib exhibits better scaling properties.
MLlib faster than VW with 16 and 32 machines.

ALS - Walltime
System     Walltime (seconds)
Matlab     15443
Mahout     4206
GraphLab   291
MLlib      481
Dataset: scaled version of the Netflix data (9X in size). Cluster: 9 machines.
MLlib is an order of magnitude faster than Mahout, and within a factor of 2 of GraphLab.

Deployment Considerations
Vowpal Wabbit, GraphLab: data preparation is specific to each program; non-trivial setup on a cluster; no fault tolerance.
MLlib: reads files from HDFS; launch/compile/run on a cluster with a few commands; RDDs provide fault tolerance naturally; part of Spark's swiss-army-knife ecosystem (Shark, Spark Streaming, GraphX, BlinkDB, etc.).

Outline: Vision, MLlib, Collaborative Filtering, ALS Details

Matrix Completion
Goal: recover a matrix from a subset of its entries.

Reducing Degrees of Freedom
Problem: impossible without additional information - a full m x n matrix has mn degrees of freedom.
Solution: assume a small number of factors determines preference. A rank-r factorization (an m x r matrix times an r x n matrix, with r small) has only O(m + n) degrees of freedom.
[Diagram: m x n matrix = (m x r) x (r x n), low-rank]
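A quick NumPy sanity check of this parameter counting; the sizes here are arbitrary, chosen only to make the gap obvious:

```python
import numpy as np

# The low-rank assumption: an m x n matrix built from rank-r factors
# is described by r*(m + n) numbers instead of m*n.
m, n, r = 1000, 500, 10
rng = np.random.default_rng(0)
W = rng.standard_normal((m, r))   # "user" factors
H = rng.standard_normal((n, r))   # "movie" factors
M = W @ H.T                       # m x n matrix of rank (at most) r

assert np.linalg.matrix_rank(M) == r
print(m * n)        # 500000 entries in the full matrix...
print(r * (m + n))  # ...described by only 15000 factor parameters
```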

Alternating Least Squares
[Diagram: the ratings matrix factored as a product of user and movie factors; the training error for the first user is a sum of squared differences over that user's observed ratings]
ALS: alternate between updating the user and movie factors.
Update the first user by finding the factor vector that minimizes its training error; this reduces to a standard linear regression problem.
All users can be updated in parallel!

Alternating Least Squares
[Diagram: the ratings matrix as $M \approx W H^\top$, with $W$ the user factors and $H$ the movie factors]
Training error for the first user:
$\sum_{(1,j) \in \Omega} (M_{1j} - W_1 H_j^\top)^2$
Minimizing over $W_1$ is an ordinary least-squares problem with the closed-form solution
$W_1 = (H_{\Omega_1}^\top H_{\Omega_1})^{-1} H_{\Omega_1}^\top M_{1,\Omega_1}^\top$
where $\Omega$ is the set of observed entries and $H_{\Omega_1}$ stacks the rows of $H$ for the movies rated by the first user.
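The alternating updates can be sketched in a few lines of NumPy. This is a minimal dense variant on a fully observed matrix, with a tiny ridge term to keep the normal equations well-conditioned; it illustrates the math above, and is not MLlib's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 20, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # exactly rank r

W = rng.standard_normal((m, r))   # user factors
H = rng.standard_normal((n, r))   # movie factors
lam = 1e-6                        # small ridge term for conditioning

for _ in range(20):
    # Update all user rows at once: W = M H (H^T H + lam I)^-1
    W = np.linalg.solve(H.T @ H + lam * np.eye(r), H.T @ M.T).T
    # Symmetric update for the movie factors: H = M^T W (W^T W + lam I)^-1
    H = np.linalg.solve(W.T @ W + lam * np.eye(r), W.T @ M).T

err = np.linalg.norm(M - W @ H.T) / np.linalg.norm(M)
print(err)  # near zero: the fully observed rank-r matrix is recovered
```

Each `solve` is the regression step described above, applied to every user (or movie) simultaneously; in a cluster setting those per-row regressions are the work that gets parallelized.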

Exercise Today
Load 1,000,000 ratings from MovieLens.
Get YOUR ratings.
Split into training/validation.
Fit a model.
Validate and tune hyperparameters.
Get YOUR recommendations.
A great example of a Spark application!
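The middle steps of the exercise (split, fit, validate, tune) can be sketched locally in NumPy. Everything here is an assumption for illustration: the data is synthetic rather than MovieLens, `als_fit` is a toy masked ALS rather than MLlib's API, and only the rank is tuned:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, true_r = 40, 30, 3
M = rng.standard_normal((m, true_r)) @ rng.standard_normal((true_r, n))
mask = rng.random((m, n)) < 0.5            # which ratings are observed at all
train = mask & (rng.random((m, n)) < 0.8)  # training split
val = mask & ~train                        # held-out validation ratings

def als_fit(M, obs, r, iters=15, lam=0.1):
    """Toy ALS over only the observed entries `obs` (boolean mask)."""
    m, n = M.shape
    W = rng.standard_normal((m, r))
    H = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):                 # per-user least squares
            Hj = H[obs[i]]
            W[i] = np.linalg.solve(Hj.T @ Hj + lam * np.eye(r), Hj.T @ M[i, obs[i]])
        for j in range(n):                 # symmetric per-movie update
            Wi = W[obs[:, j]]
            H[j] = np.linalg.solve(Wi.T @ Wi + lam * np.eye(r), Wi.T @ M[obs[:, j], j])
    return W, H

def rmse(W, H, obs):
    return np.sqrt(np.mean((M[obs] - (W @ H.T)[obs]) ** 2))

scores = {}
for r in (1, 2, 3, 5):                     # hyperparameter sweep over rank
    W, H = als_fit(M, train, r)
    scores[r] = rmse(W, H, val)
best = min(scores, key=scores.get)
print(best, round(scores[best], 4))
```

In the actual exercise the same workflow runs on a cluster, with Spark handling the partitioned ratings and the model fit.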

Outline: Vision, MLlib, Collaborative Filtering, ALS Details

Three Kinds of ALS: Broadcast Everything, Data Parallel, Fully Parallel

Broadcast Everything
[Diagram: the master holds the Ratings matrix and the Movie and User factor models; all are broadcast to the workers]
Master loads the (small) data file and initializes the models.
Master broadcasts the data and the initial models.
At each iteration, updated models are broadcast again.
Works OK for small data.
Lots of communication overhead - doesn't scale well.
Ships with the Spark examples.

Data Parallel
[Diagram: the Ratings matrix is partitioned across the workers; the Movie and User factor models are broadcast from the master]
Workers load the data.
Master broadcasts the initial models.
At each iteration, updated models are broadcast again.
Much better scaling; works on large datasets.
Works well for smaller models (low K).

Fully Parallel
[Diagram: both the Ratings matrix and the Movie and User factor models are partitioned across the workers]
Workers load the data.
Models are instantiated at the workers.
At each iteration, models are shared via a join between workers.
Much better scalability; works on large datasets.
Works on big models (higher K).
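The join-based sharing can be made concrete with a toy sketch in plain Python. The partitioning, ids, and scalar "factors" below are all made up; the point is the data movement pattern, not the ALS math:

```python
from collections import defaultdict

# Fully parallel pattern: ratings are partitioned by user across workers,
# and each iteration a join ships each movie factor only to the workers
# whose ratings reference that movie, instead of broadcasting everything.
ratings = {("u1", "m1"): 4.0, ("u1", "m2"): 3.0,
           ("u2", "m2"): 5.0, ("u3", "m3"): 1.0}
user_partition = {"u1": 0, "u2": 0, "u3": 1}   # users sharded over 2 workers

# Per worker, collect the movie ids its ratings need (the join keys).
needed = defaultdict(set)
for (u, mv) in ratings:
    needed[user_partition[u]].add(mv)

movie_factors = {"m1": 0.1, "m2": 0.2, "m3": 0.3}

# The "join": each worker receives only the factors it needs.
shipped = {w: {mv: movie_factors[mv] for mv in mvs}
           for w, mvs in needed.items()}
print(shipped[0])  # worker 0 needs m1 and m2 only
print(shipped[1])  # worker 1 needs m3 only
```

With many workers and a large factor model (high K), shipping only the referenced factors is what keeps the per-iteration communication bounded.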

Three Kinds of ALS - make that Four: Broadcast Everything, Data Parallel, Fully Parallel, Blocked

MLbase Stack (recap): ML Optimizer / MLI / MLlib / Spark.
ML Optimizer: a declarative layer to simplify access to large-scale ML.
MLI: experimental API for simplified feature extraction and algorithm development.
MLlib: production-quality ML library in Spark.
Spark: cluster computing system designed for iterative computation.
THANKS! QUESTIONS?
MLbase - www.mlbase.org