MLlib & MLbase: Distributed Machine Learning
Evan Sparks, UC Berkeley, January 31st, 2014
Collaborators: Ameet Talwalkar, Xiangrui Meng, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph Gonzalez, Michael Franklin, Michael I. Jordan, Tim Kraska
www.mlbase.org
Problem: Scalable implementations are difficult for ML Developers
Problem: ML is difficult for End Users
Too many algorithms. Too many knobs. Difficult to debug. Doesn't scale.
What users want: fast, reliable, provable, accurate.
ML Experts - MLbase - Systems Experts
1. Easy, scalable ML development (ML Developers)
2. User-friendly ML at scale (End Users)
Matlab Stack (Single Machine)
Lapack: low-level Fortran linear algebra library
Matlab Interface: higher-level abstractions for data access / processing; more extensive functionality than Lapack; leverages Lapack whenever possible
Similar stories for R and Python
MLbase Stack
Spark: cluster computing system designed for iterative computation
MLlib: production-quality ML library in Spark
MLI: experimental API for simplified feature extraction and algorithm development
ML Optimizer: a declarative layer to simplify access to large-scale ML
Overview: MLlib | Collaborative Filtering | ALS Details
MLlib
Classification: Logistic Regression, Linear SVM (+L1, L2), Decision Trees, Naive Bayes
Regression: Linear Regression (+Lasso, Ridge)
Collaborative Filtering: Alternating Least Squares
Clustering / Exploration: K-Means, SVD
Optimization Primitives: SGD, Parallel Gradient
Interoperability: Scala, Java, PySpark (0.9)

Included within the Spark codebase (unlike Mahout/Hadoop); part of the Spark 0.8 release, with continued support via the Spark project. Community involvement has been terrific: ALS with implicit feedback (0.8.1), Naive Bayes (0.9), SVD (0.9), Decision Trees (soon!)
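The optimization primitives listed here (notably SGD) drive several of these models. As a rough, single-machine illustration of what an SGD-based logistic regression computes (a sketch, not MLlib's actual implementation), consider this plain-Python version:

```python
import math
import random

def train_logistic_sgd(data, lr=0.1, epochs=100, seed=0):
    """Minimal SGD for logistic regression; data is a list of (features, label)
    pairs with labels in {0, 1}. One gradient step per example per epoch."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))        # logistic prediction
            for i in range(dim):                  # log-loss gradient is (p - y) * x
                w[i] -= lr * (p - y) * x[i]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# toy separable data: label is 1 exactly when the first feature exceeds the
# second (the third component is a constant bias term)
data = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0),
        ([2.0, 0.5, 1.0], 1), ([0.5, 2.0, 1.0], 0)]
w = train_logistic_sgd(data)
```

MLlib distributes this pattern by computing gradients over data partitions in parallel and aggregating them, rather than looping over examples on one machine.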
MLlib Performance
Walltime: elapsed time to execute task
Weak scaling: fix the problem size per processor; ideally, constant walltime as we grow the cluster
Strong scaling: fix the total problem size; ideally, linear speedup as we grow the cluster
EC2 Experiments: m2.4xlarge instances, up to 32-machine clusters
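The two scaling regimes reduce to simple ratios; as a small illustration (the walltimes below are hypothetical, not measurements from this talk):

```python
def strong_scaling_speedup(t1, tn):
    """Speedup with a fixed total problem size: T(1) / T(n); ideal value is n."""
    return t1 / tn

def weak_scaling_efficiency(t1, tn):
    """Efficiency with the per-machine problem size held fixed: T(1) / T(n);
    ideal value is 1.0, i.e. constant walltime as the cluster grows."""
    return t1 / tn

# hypothetical walltimes in seconds, for illustration only
speedup = strong_scaling_speedup(1200.0, 150.0)      # fixed total size, 8 machines
efficiency = weak_scaling_efficiency(100.0, 125.0)   # per-machine size fixed
```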
Logistic Regression - Weak Scaling
[Plot: relative walltime vs. # machines (up to 32) for MLbase/MLlib, VW, and Ideal; bar chart of walltime (s) for MLbase/MLlib, VW, and Matlab at n = 6K to 200K, d = 160K]
Full dataset: 200K images, 160K dense features
Similar weak scaling
MLlib within a factor of 2 of VW's walltime
Logistic Regression - Strong Scaling
[Plot: speedup vs. # machines (up to 32) for MLlib/MLbase, VW, and Ideal; bar chart of walltime (s) for MLbase/MLlib, VW, and Matlab at 1 to 32 machines]
Fixed dataset: 50K images, 160K dense features
MLlib exhibits better scaling properties
MLlib faster than VW with 16 and 32 machines
ALS - Walltime
Dataset: scaled version of Netflix data (9x in size). Cluster: 9 machines.

System     Walltime (seconds)
Matlab     15443
Mahout     4206
GraphLab   291
MLlib      481

MLlib an order of magnitude faster than Mahout
MLlib within a factor of 2 of GraphLab
Deployment Considerations
Vowpal Wabbit, GraphLab: data preparation specific to each program; non-trivial setup on a cluster; no fault tolerance
MLlib: reads files from HDFS; launch/compile/run on a cluster with a few commands; RDDs provide fault tolerance naturally
Part of Spark's "swiss army knife" ecosystem: Shark, Spark Streaming, GraphX, BlinkDB, etc.
Vision | MLlib | Collaborative Filtering | ALS Details
Matrix Completion
Goal: recover a matrix from a subset of its entries
Reducing Degrees of Freedom
Problem: impossible without additional information (mn degrees of freedom)
Solution: assume a small number of factors determines preference, giving O(m + n) degrees of freedom
[Figure: m x n matrix = (m x r) x (r x n) low-rank factorization]
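The counting argument on this slide can be written out explicitly. Under the low-rank assumption, the m x n matrix is (approximately) a product of two thin factors:

```latex
M \approx W H^{\top}, \qquad
M \in \mathbb{R}^{m \times n},\;
W \in \mathbb{R}^{m \times r},\;
H \in \mathbb{R}^{n \times r}
```

Storing M directly takes mn numbers, while storing the two factors takes only r(m + n); for a small fixed rank r this is the O(m + n) degrees of freedom that makes recovery from a subset of entries plausible.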
Alternating Least Squares
[Figure: ratings matrix factored into user and movie factor matrices]
training error for the first user = (rating - prediction)^2 summed over her observed ratings
ALS: alternate between updating user and movie factors
update the first user by finding the factor vector that minimizes her training error
reduces to a standard linear regression problem
can update all users in parallel!
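The alternation described above can be sketched end-to-end in a few lines. This is a single-machine, rank-1 toy (not MLlib's implementation): with rank 1, each closed-form least-squares update is just a ratio, and the loop recovers a rank-1 matrix from a subset of its entries.

```python
def als_rank1(ratings, m, n, iters=100):
    """Tiny rank-1 ALS sketch: ratings maps (user, movie) -> value over a
    subset of an m x n matrix. Each update is the closed-form least-squares
    solution for one side's factors, holding the other side fixed."""
    w = [1.0] * m  # user factors
    h = [1.0] * n  # movie factors
    for _ in range(iters):
        for i in range(m):  # update user factors, movie factors fixed
            obs = [(j, v) for (ui, j), v in ratings.items() if ui == i]
            den = sum(h[j] ** 2 for j, _ in obs)
            if den > 0:
                w[i] = sum(h[j] * v for j, v in obs) / den
        for j in range(n):  # update movie factors, user factors fixed
            obs = [(ui, v) for (ui, mj), v in ratings.items() if mj == j]
            den = sum(w[ui] ** 2 for ui, _ in obs)
            if den > 0:
                h[j] = sum(w[ui] * v for ui, v in obs) / den
    return w, h

# recover a rank-1 matrix (entries u[i] * v[j], u = [1,2,3], v = [4,5])
# from only 4 of its 6 entries
observed = {(0, 0): 4.0, (0, 1): 5.0, (1, 0): 8.0, (2, 1): 15.0}
w, h = als_rank1(observed, 3, 2)
```

In the distributed versions later in this deck, the per-user updates in the inner loop are exactly what gets parallelized across workers.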
Alternating Least Squares
$M \approx W H^{\top}$
training error for the first user:
$\sum_{(1,j) \in \Omega} (M_{1j} - W_1 H_j^{\top})^2$
closed-form update (over the first user's observed ratings):
$W_1 = (H_1^{\top} H_1)^{-1} H_1^{\top} M_1^{\top}$
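The closed-form update on this slide is an ordinary least-squares solve. As a quick numerical check (illustrative code, rank r = 2 so the 2x2 normal equations can be inverted by hand), ratings generated exactly from a known user factor are recovered exactly:

```python
def solve_user_factor(h_rows, ratings):
    """Closed-form ALS update for one user with rank r = 2: returns
    w = (H^T H)^{-1} H^T m, where H stacks the factor vectors of the movies
    this user rated and m holds the corresponding ratings."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (h1, h2), r in zip(h_rows, ratings):
        a11 += h1 * h1; a12 += h1 * h2; a22 += h2 * h2  # A = H^T H (2x2)
        b1 += h1 * r;   b2 += h2 * r                    # b = H^T m
    det = a11 * a22 - a12 * a12                         # invert A directly
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# factor vectors of three movies the user rated, with ratings generated
# exactly from the true user factor (2, 3) -- the solve should recover it
h_rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ratings = [2.0, 3.0, 5.0]
w = solve_user_factor(h_rows, ratings)
```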
Exercise Today
Load 1,000,000 ratings from MovieLens.
Get YOUR ratings.
Split into training/validation.
Fit a model.
Validate and tune hyperparameters.
Get YOUR recommendations.
Great example of a Spark application!
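The split/fit/validate/tune loop in this exercise has a simple shape, independent of Spark. Here is a hedged sketch of that loop in plain Python; `fit_mean` is a deliberately trivial stand-in for the real ALS model fit, and all names here are hypothetical, not part of the actual exercise code.

```python
import math
import random

def rmse(model, data):
    """Root-mean-squared error over (user, movie, rating) triples."""
    se = sum((model(u, mv) - r) ** 2 for u, mv, r in data)
    return math.sqrt(se / len(data))

def tune(ratings, lambdas, fit, seed=0):
    """Shuffle, split 80/20 into train/validation, fit one model per lambda,
    and return the (lambda, model) pair with the lowest validation RMSE."""
    rng = random.Random(seed)
    shuffled = rng.sample(ratings, len(ratings))
    cut = int(0.8 * len(shuffled))
    train, valid = shuffled[:cut], shuffled[cut:]
    return min(((lam, fit(train, lam)) for lam in lambdas),
               key=lambda pair: rmse(pair[1], valid))

# trivial stub standing in for an ALS fit: predicts the training mean,
# shrunk toward 3.0 by the regularization parameter
def fit_mean(train, lam):
    mean = sum(r for _, _, r in train) / len(train)
    pred = (mean + lam * 3.0) / (1.0 + lam)
    return lambda u, mv: pred

ratings = [(u, 0, 4.0) for u in range(10)]
best_lam, model = tune(ratings, [0.0, 10.0], fit_mean)
```

In the actual exercise, `fit` would train MLlib's ALS on a Spark RDD of ratings, but the tuning logic is the same.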
Vision | MLlib | Collaborative Filtering | ALS Details
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Broadcast Everything
[Figure: master holds the ratings and the movie/user models; workers receive broadcasts]
Master loads the (small) data file and initializes the models.
Master broadcasts the data and the initial models.
At each iteration, the updated models are broadcast again.
Works OK for small data.
Lots of communication overhead - doesn't scale well.
Ships with the Spark examples.
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Data Parallel
[Figure: ratings partitioned across workers; master broadcasts the movie/user models]
Workers load the data.
Master broadcasts the initial models.
At each iteration, the updated models are broadcast again.
Much better scaling.
Works on large datasets.
Works well for smaller models (low k).
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Fully Parallel
[Figure: ratings and the movie/user models all partitioned across workers]
Workers load the data.
Models are instantiated at the workers.
At each iteration, models are shared via a join between workers.
Much better scalability.
Works on large datasets.
Works on big models (higher k).
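The "shared via join" step above is the key difference from the broadcast variants: no single machine ever holds the whole model. As a toy illustration of the idea (plain dicts standing in for keyed RDD partitions; the keys and values here are made up):

```python
def join(left, right):
    """Inner join of two keyed maps, standing in for an RDD join on movie id."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}

# one worker's shard of the ratings, keyed by movie id:
# movie -> [(user, rating), ...]
ratings_shard = {0: [(0, 4.0), (1, 8.0)], 1: [(0, 5.0)]}
# the movie-factor model, partitioned across workers and also keyed by movie id
movie_factors = {0: 0.97, 1: 1.01, 2: 0.50}

# the join co-locates each movie's factor with the shard holding its ratings,
# giving this worker exactly the (ratings, factor) pairs it needs for an update
pairs = join(ratings_shard, movie_factors)
```

In Spark, both sides are RDDs partitioned by key, so this co-location happens via a shuffle between workers rather than through the master.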
Three Four Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel | Blocked
[Stack diagram: ML Optimizer / MLI / MLlib / Spark]
ML Optimizer: a declarative layer to simplify access to large-scale ML
MLI: experimental API for simplified feature extraction and algorithm development
MLlib: production-quality ML library in Spark
Spark: cluster computing system designed for iterative computation

THANKS! QUESTIONS?
MLbase - www.mlbase.org