MLlib. Distributed Machine Learning on. Evan Sparks. UC Berkeley

Size: px

Start display at page:

Download "MLlib. Distributed Machine Learning on. Evan Sparks. UC Berkeley"

Bathsheba Bradford
5 years ago
Views:

MLlib & ML base Distributed Machine Learning on Evan Sparks UC Berkeley January 31st, 2014 Collaborators: Ameet Talwalkar, Xiangrui Meng, Virginia Smith, Xinghao

1 MLlib & ML base Distributed Machine Learning on Evan Sparks UC Berkeley January 31st, 2014 Collaborators: Ameet Talwalkar, Xiangrui Meng, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph Gonzalez, Michael Franklin, Michael I. Jordan, Tim Kraska UC Berkeley

2 Problem: Scalable implementa.ons difficult for ML Developers

3 Problem: Scalable implementa.ons difficult for ML Developers

4 Problem: Scalable implementa.ons difficult for ML Developers

5 Problem: ML is difficult for End Users Too many algorithms

6 Problem: ML is difficult for End Users Too many knobs Too many algorithms

7 Problem: ML is difficult for End Users Too many knobs Too many algorithms Difficult to debug

8 Problem: ML is difficult for End Users Too many knobs Too many algorithms Difficult to debug Doesn t scale

9 Problem: ML is difficult Fast for End Users Too many knobs Reliable Provable Accurate Too many algorithms Difficult to debug Doesn t scale

10 ML Experts MLbase Systems Experts

11 ML Experts MLbase Systems Experts 1. Easy scalable ML development (ML Developers) 2. User- friendly ML at scale (End Users)

12 Matlab Stack

13 Matlab Stack Single Machine

14 Matlab Stack Lapack Single Machine Lapack: low- level Fortran linear algebra library

15 Matlab Stack Matlab Interface Lapack Single Machine Lapack: low- level Fortran linear algebra library Matlab Interface Higher- level abstrac]ons for data access / processing More extensive func]onality than Lapack Leverages Lapack whenever possible

16 Matlab Stack Matlab Interface Lapack Single Machine Lapack: low- level Fortran linear algebra library Matlab Interface Higher- level abstrac]ons for data access / processing More extensive func]onality than Lapack Leverages Lapack whenever possible Similar stories for R and Python

17 MLbase Stack Matlab Interface Lapack Single Machine

18 MLbase Stack Matlab Interface Lapack Single Machine Runtime(s)

19 MLbase Stack Matlab Interface Lapack Single Machine Runtime(s) Spark Spark: cluster compu]ng system designed for itera]ve computa]on

20 MLbase Stack Matlab Interface Lapack Single Machine MLlib Runtime(s) Spark Spark: cluster compu]ng system designed for itera]ve computa]on MLlib: produc]on- quality ML library in Spark

21 MLbase Stack Matlab Interface Lapack Single Machine MLI MLlib Runtime(s) Spark Spark: cluster compu]ng system designed for itera]ve computa]on MLlib: produc]on- quality ML library in Spark MLI: experimental API for simplified feature extrac]on and algorithm development

22 MLbase Stack ML Optimizer Matlab Interface Lapack Single Machine MLI MLlib Runtime(s) Spark Spark: cluster compu]ng system designed for itera]ve computa]on MLlib: produc]on- quality ML library in Spark MLI: experimental API for simplified feature extrac]on and algorithm development ML Op.mizer: a declara]ve layer to simplify access to large- scale ML

23 Overview MLlib Collabora.ve Filtering ALS Details

24 MLlib Classifica.on: Regression: Collabora.ve Filtering: Clustering / Explora.on: Op.miza.on Primi.ves: Interopera.lity: Logis]c Regression, Linear SVM (+L1, L2), Decision Trees, Naive Bayes Linear Regression (+Lasso, Ridge) Alterna]ng Least Squares K- Means, SVD SGD, Parallel Gradient Scala, Java, PySpark (0.9)

25 MLlib Classifica.on: Regression: Collabora.ve Filtering: Clustering / Explora.on: Op.miza.on Primi.ves: Logis]c Regression, Linear SVM (+L1, L2), Decision Trees, Naive Bayes Linear Regression (+Lasso, Ridge) Alterna]ng Least Squares K- Means, SVD SGD, Parallel Gradient Interopera.lity: Scala, Java, PySpark (0.9) Included within Spark codebase Unlike Mahout/Hadoop Part of Spark 0.8 release Con]nued support via Spark project Community involvement has been terrific: ALS with implicit feedback (0.8.1), Naive Bayes (0.9), SVD (0.9), Decision Trees (soon!)

26 MLlib Performance

27 MLlib Performance Wall.me: elapsed ]me to execute task

28 MLlib Performance Wall.me: elapsed ]me to execute task Weak scaling fix problem size per processor ideally: constant wall]me as we grow cluster

29 MLlib Performance Wall.me: elapsed ]me to execute task Weak scaling fix problem size per processor ideally: constant wall]me as we grow cluster Strong scaling fix total problem size ideally: linear speed up as we grow cluster

30 MLlib Performance Wall.me: elapsed ]me to execute task Weak scaling fix problem size per processor ideally: constant wall]me as we grow cluster Strong scaling fix total problem size ideally: linear speed up as we grow cluster EC2 Experiments m2.4xlarge instances, up to 32 machine clusters

31 Logis]c Regression - Weak Scaling

32 Logis]c Regression - Weak Scaling Full dataset: 200K images, 160K dense features

33 Logis]c Regression - Weak Scaling 10 8 MLbase MLlib VW Ideal relative walltime # machines Full dataset: 200K images, 160K dense features Similar weak scaling

34 Logis]c Regression - Weak Scaling 10 8 MLbase MLlib VW Ideal relative walltime # machines Full dataset: 200K images, 160K dense features Similar weak scaling MLlib within a factor of 2 of VW s wall]me

35 Logis]c Regression - Weak Scaling relative walltime MLbase MLlib VW Ideal walltime (s) n=6k, d=160k n=12.5k, d=160k n=25k, d=160k n=50k, d=160k n=100k, d=160k n=200k, d=160k # machines 0 MLbase MLlib VW Matlab Full dataset: 200K images, 160K dense features Similar weak scaling MLlib within a factor of 2 of VW s wall]me

36 Logis]c Regression - Strong Scaling

37 Logis]c Regression - Strong Scaling Fixed Dataset: 50K images, 160K dense features

38 Logis]c Regression - Strong Scaling MLlib MLbase VW Ideal speedup # machines Fixed Dataset: 50K images, 160K dense features MLlib exhibits bejer scaling proper]es

39 Logis]c Regression - Strong Scaling speedup MLlib MLbase VW Ideal walltime (s) Machine 2 Machines 4 Machines 8 Machines 16 Machines 32 Machines # machines MLbase MLlib VW Matlab Fixed Dataset: 50K images, 160K dense features MLlib exhibits bejer scaling proper]es MLlib faster than VW with 16 and 32 machines

40 ALS - Wall]me

41 ALS - Wall]me Dataset: Scaled version of Neklix data (9X in size) Cluster: 9 machines

42 ALS - Wall]me System Wall.me (seconds) Matlab Dataset: Scaled version of Neklix data (9X in size) Cluster: 9 machines

43 ALS - Wall]me System Wall.me (seconds) Matlab Mahout 4206 Dataset: Scaled version of Neklix data (9X in size) Cluster: 9 machines

44 ALS - Wall]me System Wall.me (seconds) Matlab Mahout 4206 GraphLab 291 MLlib 481 Dataset: Scaled version of Neklix data (9X in size) Cluster: 9 machines MLlib an order of magnitude faster than Mahout

45 ALS - Wall]me System Wall.me (seconds) Matlab Mahout 4206 GraphLab 291 MLlib 481 Dataset: Scaled version of Neklix data (9X in size) Cluster: 9 machines MLlib an order of magnitude faster than Mahout MLlib within factor of 2 of GraphLab

46 Deployment Considera]ons

47 Deployment Considera]ons Vowpal Wabbit, GraphLab Data prepara]on specific to each program Non- trivial setup on cluster No fault tolerance

48 Deployment Considera]ons Vowpal Wabbit, GraphLab Data prepara]on specific to each program Non- trivial setup on cluster No fault tolerance MLlib Reads files from HDFS Launch/compile/run on cluster with a few commands RDD s provide fault tolerance naturally

49 Deployment Considera]ons Vowpal Wabbit, GraphLab Data prepara]on specific to each program Non- trivial setup on cluster No fault tolerance MLlib Reads files from HDFS Launch/compile/run on cluster with a few commands RDD s provide fault tolerance naturally Part of Spark s swiss army knife ecosystem

50 Deployment Considera]ons Vowpal Wabbit, GraphLab Data prepara]on specific to each program Non- trivial setup on cluster No fault tolerance MLlib Reads files from HDFS Launch/compile/run on cluster with a few commands RDD s provide fault tolerance naturally Part of Spark s swiss army knife ecosystem Shark, Spark Streaming, Graph- X, BlinkDB, etc.

51 Vision MLlib Collabora.ve Filtering ALS Details

52 Matrix Comple]on

53 Matrix Comple]on Goal: Recover a matrix from a subset of its entries

54 Matrix Comple]on Goal: Recover a matrix from a subset of its entries

55 Matrix Comple]on Goal: Recover a matrix from a subset of its entries

56 Reducing Degrees of Freedom

57 Reducing Degrees of Freedom Problem: Impossible without additional information n m

58 Reducing Degrees of Freedom Problem: Impossible without additional information mn degrees of freedom n r n r m = m Low-rank

59 Reducing Degrees of Freedom Problem: Impossible without additional information mn degrees of freedom Solution: Assume small # of factors determine preference O(m + n) degrees of freedom n r n r m = m Low-rank

60 Alterna]ng Least Squares =

61 Alterna]ng Least Squares =

62 Alterna]ng Least Squares =

63 Alterna]ng Least Squares =

64 Alterna]ng Least Squares = training error for first user = ( - ) + ( - )

65 Alterna]ng Least Squares = training error for first user = ( - ) + ( - ) ALS: alternate between updating user and movie factors

66 Alterna]ng Least Squares = training error for first user = ( - ) + ( - ) ALS: alternate between updating user and movie factors update first user by finding that minimizes training error

67 Alterna]ng Least Squares = training error for first user = ( - ) + ( - ) ALS: alternate between updating user and movie factors update first user by finding that minimizes training error reduces to standard linear regression problem

68 Alterna]ng Least Squares... = training error for first user = ( - ) + ( - ) ALS: alternate between updating user and movie factors update first user by finding that minimizes training error reduces to standard linear regression problem can update all users in parallel!

69 Alterna]ng Least Squares = training error for first user = ( - ) + ( - )

70 Alterna]ng Least Squares = H > M W training error for first user = ( - ) + ( - )

71 Alterna]ng Least Squares = H > M W training error for first user = ( - ) + ( - ) = X (M 1j W 1 H > j ) 2 (1,j)2

72 Alterna]ng Least Squares = H > M W training error for first user = ( - ) + ( - ) = X (M 1j W 1 H > j ) 2 (1,j)2 W 1 =(H > 1 H 1 ) 1 H > 1 M > 1 1

73 Exercise Today 19

74 Exercise Today Load 1,000,000 ra]ngs from MovieLens. 19

75 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. 19

76 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. Split into training/valida]on. 19

77 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. Split into training/valida]on. Fit a model. 19

78 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. Split into training/valida]on. Fit a model. Validate and tune hyperparameters. 19

79 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. Split into training/valida]on. Fit a model. Validate and tune hyperparameters. Get YOUR recommenda]ons. 19

80 Exercise Today Load 1,000,000 ra]ngs from MovieLens. Get YOUR ra]ngs. Split into training/valida]on. Fit a model. Validate and tune hyperparameters. Get YOUR recommenda]ons. Great example of a Spark applica]on! 19

81 Vision MLlib Collabora.ve Filtering ALS Details

82 Three Kinds of ALS Broadcast Everything Data Parallel Fully Parallel

83 Three Kinds of ALS Broadcast Everything Data Parallel Fully Parallel

84 Broadcast Everything Ratings Movie! User! Master Workers

85 Broadcast Everything Ratings Movie! User! Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

86 Broadcast Everything Ratings Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. Movie! User! At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

87 Broadcast Everything Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. Ratings Movie! User! At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

88 Broadcast Everything Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. Ratings Movie! User! At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

89 Broadcast Everything Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. Ratings Movie! User! At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

90 Broadcast Everything Master loads (small) data file and ini]alizes models. Master broadcasts data and ini]al models. Ratings Movie! User! At each itera]on, updated models are broadcast again. Works OK for small data. Master Workers Lots of communica]on overhead - doesn t scale well. Ships with Spark Examples

91 Three Kinds of ALS Broadcast Everything Data Parallel Fully Parallel

92 Data Parallel Ratings Movie! Ratings User! Ratings Master Workers Ratings

93 Data Parallel Ratings Workers load data Movie! Ratings User! Ratings Master Workers Ratings

94 Data Parallel Ratings Workers load data Movie! Ratings Master broadcasts ini]al models User! Ratings Master Workers Ratings

95 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Master Workers Ratings

96 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Master Workers Ratings

97 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Works on large datasets Master Workers Ratings

98 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Works on large datasets Master Workers Ratings Works well for smaller models. (low K)

99 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Works on large datasets Master Workers Ratings Works well for smaller models. (low K)

100 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Works on large datasets Master Workers Ratings Works well for smaller models. (low K)

101 Data Parallel Ratings Workers load data Movie! User! Ratings Ratings Master broadcasts ini]al models At each itera]on, updated models are broadcast again Much bejer scaling Works on large datasets Master Workers Ratings Works well for smaller models. (low K)

102 Three Kinds of ALS Broadcast Everything Data Parallel Fully Parallel

103 Fully Parallel Ratings Movie! User! Ratings Movie! User! Ratings Movie! User! Master Workers Ratings Movie! User!

104 Fully Parallel Ratings Movie! User! Workers load data Ratings Movie! User! Ratings Movie! User! Master Workers Ratings Movie! User!

105 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! Ratings Movie! User! Master Workers Ratings Movie! User!

106 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! At each itera]on, models are shared via join between workers. Ratings Movie! User! Master Workers Ratings Movie! User!

107 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! Ratings Movie! User! At each itera]on, models are shared via join between workers. Much bejer scalability. Master Workers Ratings Movie! User!

108 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! Ratings Movie! User! At each itera]on, models are shared via join between workers. Much bejer scalability. Works on large datasets Master Workers Ratings Movie! User!

109 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! At each itera]on, models are shared via join between workers. Ratings Movie! User! Much bejer scalability. Works on large datasets Master Workers Ratings Movie! User! Works on big models (higher K)

110 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! At each itera]on, models are shared via join between workers. Ratings Movie! User! Much bejer scalability. Works on large datasets Master Workers Ratings Movie! User! Works on big models (higher K)

111 Fully Parallel Ratings Movie! User! Workers load data Models are instan]ated at workers. Ratings Movie! User! At each itera]on, models are shared via join between workers. Ratings Movie! User! Much bejer scalability. Works on large datasets Master Workers Ratings Movie! User! Works on big models (higher K)

112 Three Kinds of ALS Broadcast Everything Data Parallel Fully Parallel

113 Three Kinds of ALS Four Broadcast Everything Data Parallel Fully Parallel Blocked

114 ML Optimizer MLI MLlib Runtime(s) Spark ML Op.mizer: a declara]ve layer to simplify access to large- scale ML MLI: experimental API for simplified feature extrac]on and algorithm development MLlib: produc]on- quality ML library in Spark Spark: cluster compu]ng system designed for itera]ve computa]on

115 ML Optimizer MLI MLlib Runtime(s) Spark ML Op.mizer: a declara]ve layer to simplify access to large- scale ML MLI: experimental API for simplified feature extrac]on and algorithm development MLlib: produc]on- quality ML library in Spark Spark: cluster compu]ng system designed for itera]ve computa]on ML base

116 ML Optimizer MLI MLlib Runtime(s) Spark ML Op.mizer: a declara]ve layer to simplify access to large- scale ML MLI: experimental API for simplified feature extrac]on and algorithm development MLlib: produc]on- quality ML library in Spark Spark: cluster compu]ng system designed for itera]ve computa]on THANKS! QUESTIONS? ML base

Evan Sparks and Ameet Talwalkar UC Berkeley. UC Berkeley

ML base Evan Sparks and Ameet Talwalkar UC Berkeley UC Berkeley Three Converging Trends Three Converging Trends Big Data Three Converging Trends Big Data Distributed Compu2ng Three Converging Trends Machine