MLlib & MLbase: Distributed Machine Learning
Evan Sparks, UC Berkeley, January 31st, 2014
Collaborators: Ameet Talwalkar, Xiangrui Meng, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph Gonzalez, Michael Franklin, Michael I. Jordan, Tim Kraska
www.mlbase.org
Problem: Scalable implementations are difficult for ML Developers
Problem: ML is difficult for End Users
Too many algorithms. Too many knobs. Difficult to debug. Doesn't scale.
What users want: fast, reliable, provable, accurate.
ML Experts - MLbase - Systems Experts
1. Easy, scalable ML development (ML Developers)
2. User-friendly ML at scale (End Users)
Matlab Stack (Single Machine)
Lapack: low-level Fortran linear algebra library
Matlab Interface: higher-level abstractions for data access / processing; more extensive functionality than Lapack; leverages Lapack whenever possible
Similar stories for R and Python
MLbase Stack
Spark: cluster computing system designed for iterative computation
MLlib: production-quality ML library in Spark
MLI: experimental API for simplified feature extraction and algorithm development
ML Optimizer: a declarative layer to simplify access to large-scale ML
Overview: MLlib | Collaborative Filtering | ALS Details
MLlib
Classification: Logistic Regression, Linear SVM (+L1, L2), Decision Trees, Naive Bayes
Regression: Linear Regression (+Lasso, Ridge)
Collaborative Filtering: Alternating Least Squares
Clustering / Exploration: K-Means, SVD
Optimization Primitives: SGD, Parallel Gradient
Interoperability: Scala, Java, PySpark (0.9)

Included within the Spark codebase (unlike Mahout/Hadoop); part of the Spark 0.8 release, with continued support via the Spark project. Community involvement has been terrific: ALS with implicit feedback (0.8.1), Naive Bayes (0.9), SVD (0.9), Decision Trees (soon!)
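The optimization primitives listed here (notably SGD) drive several of these models. As a rough, single-machine illustration of what an SGD-based logistic regression computes (a sketch, not MLlib's actual implementation), consider this plain-Python version:

```python
import math
import random

def train_logistic_sgd(data, lr=0.1, epochs=100, seed=0):
    """Minimal SGD for logistic regression; data is a list of (features, label)
    pairs with labels in {0, 1}. One gradient step per example per epoch."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))        # logistic prediction
            for i in range(dim):                  # log-loss gradient is (p - y) * x
                w[i] -= lr * (p - y) * x[i]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# toy separable data: label is 1 exactly when the first feature exceeds the
# second (the third component is a constant bias term)
data = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0),
        ([2.0, 0.5, 1.0], 1), ([0.5, 2.0, 1.0], 0)]
w = train_logistic_sgd(data)
```

MLlib distributes this pattern by computing gradients over data partitions in parallel and aggregating them, rather than looping over examples on one machine.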
MLlib Performance
Walltime: elapsed time to execute task
Weak scaling: fix the problem size per processor; ideally, constant walltime as we grow the cluster
Strong scaling: fix the total problem size; ideally, linear speedup as we grow the cluster
EC2 Experiments: m2.4xlarge instances, up to 32-machine clusters
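The two scaling regimes reduce to simple ratios; as a small illustration (the walltimes below are hypothetical, not measurements from this talk):

```python
def strong_scaling_speedup(t1, tn):
    """Speedup with a fixed total problem size: T(1) / T(n); ideal value is n."""
    return t1 / tn

def weak_scaling_efficiency(t1, tn):
    """Efficiency with the per-machine problem size held fixed: T(1) / T(n);
    ideal value is 1.0, i.e. constant walltime as the cluster grows."""
    return t1 / tn

# hypothetical walltimes in seconds, for illustration only
speedup = strong_scaling_speedup(1200.0, 150.0)      # fixed total size, 8 machines
efficiency = weak_scaling_efficiency(100.0, 125.0)   # per-machine size fixed
```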
Logistic Regression - Weak Scaling
[Plot: relative walltime vs. # machines (up to 32) for MLbase/MLlib, VW, and Ideal; bar chart of walltime (s) for MLbase/MLlib, VW, and Matlab at n = 6K to 200K, d = 160K]
Full dataset: 200K images, 160K dense features
Similar weak scaling
MLlib within a factor of 2 of VW's walltime
Logistic Regression - Strong Scaling
[Plot: speedup vs. # machines (up to 32) for MLlib/MLbase, VW, and Ideal; bar chart of walltime (s) for MLbase/MLlib, VW, and Matlab at 1 to 32 machines]
Fixed dataset: 50K images, 160K dense features
MLlib exhibits better scaling properties
MLlib faster than VW with 16 and 32 machines
ALS - Walltime
Dataset: scaled version of Netflix data (9x in size). Cluster: 9 machines.

System     Walltime (seconds)
Matlab     15443
Mahout     4206
GraphLab   291
MLlib      481

MLlib an order of magnitude faster than Mahout
MLlib within a factor of 2 of GraphLab
Deployment Considerations
Vowpal Wabbit, GraphLab: data preparation specific to each program; non-trivial setup on a cluster; no fault tolerance
MLlib: reads files from HDFS; launch/compile/run on a cluster with a few commands; RDDs provide fault tolerance naturally
Part of Spark's "swiss army knife" ecosystem: Shark, Spark Streaming, GraphX, BlinkDB, etc.
Vision | MLlib | Collaborative Filtering | ALS Details
Matrix Completion
Goal: recover a matrix from a subset of its entries
Reducing Degrees of Freedom
Problem: impossible without additional information (mn degrees of freedom)
Solution: assume a small number of factors determines preference, giving O(m + n) degrees of freedom
[Figure: m x n matrix = (m x r) x (r x n) low-rank factorization]
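The counting argument on this slide can be written out explicitly. Under the low-rank assumption, the m x n matrix is (approximately) a product of two thin factors:

```latex
M \approx W H^{\top}, \qquad
M \in \mathbb{R}^{m \times n},\;
W \in \mathbb{R}^{m \times r},\;
H \in \mathbb{R}^{n \times r}
```

Storing M directly takes mn numbers, while storing the two factors takes only r(m + n); for a small fixed rank r this is the O(m + n) degrees of freedom that makes recovery from a subset of entries plausible.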
Alternating Least Squares
[Figure: ratings matrix factored into user and movie factor matrices]
training error for the first user = (rating - prediction)^2 summed over her observed ratings
ALS: alternate between updating user and movie factors
update the first user by finding the factor vector that minimizes her training error
reduces to a standard linear regression problem
can update all users in parallel!
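The alternation described above can be sketched end-to-end in a few lines. This is a single-machine, rank-1 toy (not MLlib's implementation): with rank 1, each closed-form least-squares update is just a ratio, and the loop recovers a rank-1 matrix from a subset of its entries.

```python
def als_rank1(ratings, m, n, iters=100):
    """Tiny rank-1 ALS sketch: ratings maps (user, movie) -> value over a
    subset of an m x n matrix. Each update is the closed-form least-squares
    solution for one side's factors, holding the other side fixed."""
    w = [1.0] * m  # user factors
    h = [1.0] * n  # movie factors
    for _ in range(iters):
        for i in range(m):  # update user factors, movie factors fixed
            obs = [(j, v) for (ui, j), v in ratings.items() if ui == i]
            den = sum(h[j] ** 2 for j, _ in obs)
            if den > 0:
                w[i] = sum(h[j] * v for j, v in obs) / den
        for j in range(n):  # update movie factors, user factors fixed
            obs = [(ui, v) for (ui, mj), v in ratings.items() if mj == j]
            den = sum(w[ui] ** 2 for ui, _ in obs)
            if den > 0:
                h[j] = sum(w[ui] * v for ui, v in obs) / den
    return w, h

# recover a rank-1 matrix (entries u[i] * v[j], u = [1,2,3], v = [4,5])
# from only 4 of its 6 entries
observed = {(0, 0): 4.0, (0, 1): 5.0, (1, 0): 8.0, (2, 1): 15.0}
w, h = als_rank1(observed, 3, 2)
```

In the distributed versions later in this deck, the per-user updates in the inner loop are exactly what gets parallelized across workers.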
Alternating Least Squares
$M \approx W H^{\top}$
training error for the first user:
$\sum_{(1,j) \in \Omega} (M_{1j} - W_1 H_j^{\top})^2$
closed-form update (over the first user's observed ratings):
$W_1 = (H_1^{\top} H_1)^{-1} H_1^{\top} M_1^{\top}$
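The closed-form update on this slide is an ordinary least-squares solve. As a quick numerical check (illustrative code, rank r = 2 so the 2x2 normal equations can be inverted by hand), ratings generated exactly from a known user factor are recovered exactly:

```python
def solve_user_factor(h_rows, ratings):
    """Closed-form ALS update for one user with rank r = 2: returns
    w = (H^T H)^{-1} H^T m, where H stacks the factor vectors of the movies
    this user rated and m holds the corresponding ratings."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (h1, h2), r in zip(h_rows, ratings):
        a11 += h1 * h1; a12 += h1 * h2; a22 += h2 * h2  # A = H^T H (2x2)
        b1 += h1 * r;   b2 += h2 * r                    # b = H^T m
    det = a11 * a22 - a12 * a12                         # invert A directly
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# factor vectors of three movies the user rated, with ratings generated
# exactly from the true user factor (2, 3) -- the solve should recover it
h_rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ratings = [2.0, 3.0, 5.0]
w = solve_user_factor(h_rows, ratings)
```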
Exercise Today
Load 1,000,000 ratings from MovieLens.
Get YOUR ratings.
Split into training/validation.
Fit a model.
Validate and tune hyperparameters.
Get YOUR recommendations.
Great example of a Spark application!
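The split/fit/validate/tune loop in this exercise has a simple shape, independent of Spark. Here is a hedged sketch of that loop in plain Python; `fit_mean` is a deliberately trivial stand-in for the real ALS model fit, and all names here are hypothetical, not part of the actual exercise code.

```python
import math
import random

def rmse(model, data):
    """Root-mean-squared error over (user, movie, rating) triples."""
    se = sum((model(u, mv) - r) ** 2 for u, mv, r in data)
    return math.sqrt(se / len(data))

def tune(ratings, lambdas, fit, seed=0):
    """Shuffle, split 80/20 into train/validation, fit one model per lambda,
    and return the (lambda, model) pair with the lowest validation RMSE."""
    rng = random.Random(seed)
    shuffled = rng.sample(ratings, len(ratings))
    cut = int(0.8 * len(shuffled))
    train, valid = shuffled[:cut], shuffled[cut:]
    return min(((lam, fit(train, lam)) for lam in lambdas),
               key=lambda pair: rmse(pair[1], valid))

# trivial stub standing in for an ALS fit: predicts the training mean,
# shrunk toward 3.0 by the regularization parameter
def fit_mean(train, lam):
    mean = sum(r for _, _, r in train) / len(train)
    pred = (mean + lam * 3.0) / (1.0 + lam)
    return lambda u, mv: pred

ratings = [(u, 0, 4.0) for u in range(10)]
best_lam, model = tune(ratings, [0.0, 10.0], fit_mean)
```

In the actual exercise, `fit` would train MLlib's ALS on a Spark RDD of ratings, but the tuning logic is the same.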
Vision | MLlib | Collaborative Filtering | ALS Details
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Broadcast Everything
[Figure: master holds the ratings and the movie/user models; workers receive broadcasts]
Master loads the (small) data file and initializes the models.
Master broadcasts the data and the initial models.
At each iteration, the updated models are broadcast again.
Works OK for small data.
Lots of communication overhead - doesn't scale well.
Ships with the Spark examples.
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Data Parallel
[Figure: ratings partitioned across workers; master broadcasts the movie/user models]
Workers load the data.
Master broadcasts the initial models.
At each iteration, the updated models are broadcast again.
Much better scaling.
Works on large datasets.
Works well for smaller models (low k).
Three Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel
Fully Parallel
[Figure: ratings and the movie/user models all partitioned across workers]
Workers load the data.
Models are instantiated at the workers.
At each iteration, models are shared via a join between workers.
Much better scalability.
Works on large datasets.
Works on big models (higher k).
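The "shared via join" step above is the key difference from the broadcast variants: no single machine ever holds the whole model. As a toy illustration of the idea (plain dicts standing in for keyed RDD partitions; the keys and values here are made up):

```python
def join(left, right):
    """Inner join of two keyed maps, standing in for an RDD join on movie id."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}

# one worker's shard of the ratings, keyed by movie id:
# movie -> [(user, rating), ...]
ratings_shard = {0: [(0, 4.0), (1, 8.0)], 1: [(0, 5.0)]}
# the movie-factor model, partitioned across workers and also keyed by movie id
movie_factors = {0: 0.97, 1: 1.01, 2: 0.50}

# the join co-locates each movie's factor with the shard holding its ratings,
# giving this worker exactly the (ratings, factor) pairs it needs for an update
pairs = join(ratings_shard, movie_factors)
```

In Spark, both sides are RDDs partitioned by key, so this co-location happens via a shuffle between workers rather than through the master.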
Three Four Kinds of ALS: Broadcast Everything | Data Parallel | Fully Parallel | Blocked
[Stack diagram: ML Optimizer / MLI / MLlib / Spark]
ML Optimizer: a declarative layer to simplify access to large-scale ML
MLI: experimental API for simplified feature extraction and algorithm development
MLlib: production-quality ML library in Spark
Spark: cluster computing system designed for iterative computation

THANKS! QUESTIONS?
MLbase - www.mlbase.org