SCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX

Size: px

Start display at page:

Download "SCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX"

Silvia Bryant
5 years ago
Views:

1 THE MISSING PIECE IN COMPLEX ANALYTICS: SCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX Daniel Crankshaw, Peter Bailis, Joseph Gonzalez, Haoyuan Li, Zhao Zhang, Ali Ghodsi, Michael Franklin, and Michael I. Jordan UC Berkeley AMPLab CIDR 2015

2 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

3 Catify: Music for Cats

4 MODELING TASK Rating Songs

5 MODELING TASK Ratings Prediction Songs

6 Data

7 Data Model

8 Data Training Model

9 BERKELEY DATA ANALYTICS STACK (BDAS) Spark Streaming BlinkDB Spark SQL GraphX MLBase MLlib Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

10 Catify: Music for Cats

11 Catify: Music for Cats CatID Song Score

12 Catify: Music for Cats Pipeline CatID Song Score

13 Catify: Music for Cats Pipeline CatID Song Score Tachyon + HDFS

14 Catify: Music for Cats Pipeline CatID Song Score Tachyon + HDFS

15 Prediction Latency Prediction Error

16 Prediction Latency Lower is Better Prediction Error

17 Prediction Latency Lower is Better Lower is Better Prediction Error

18 Online Retraining e.g., Prediction Latency Lower is Better Lower is Better Prediction Error

19 Catify: Music for Cats Pipeline Tachyon + HDFS

20 Catify: Music for Cats Apache Web Server Pipeline Node.js App Server MySQL Tachyon + HDFS

21 Catify: Music for Cats Apache Web Server Pipeline Node.js App Server MySQL Tachyon + HDFS

22 Catify: Music for Cats Users Songs

23 Catify: Music for Cats Users Songs O(users * songs)

24 Catify: Music for Cats NGINX Pipeline Node.js App Server MySQL Tachyon + HDFS

25 Catify: Music for Cats NGINX Pipeline Node.js App Server New Model MySQL Tachyon + HDFS

26 Catify: Music for Cats Training Data NGINX Pipeline Node.js App Server New Model MySQL Tachyon + HDFS

27 Online Retraining e.g., Prediction Latency Prediction Error

28 Online Retraining e.g., Prediction Latency Full pre-materialization e.g., Prediction Error

29 What s wrong?

30 What s wrong? 1. Predictions have either:

31 What s wrong? 1. Predictions have either: a. High latency, low staleness

32 What s wrong? 1. Predictions have either: a. High latency, low staleness b. Low latency, high staleness

33 What s wrong? 1. Predictions have either: a. High latency, low staleness b. Low latency, high staleness 2. Limited optimization of model semantics

34 What s wrong? 1. Predictions have either: a. High latency, low staleness b. Low latency, high staleness 2. Limited optimization of model semantics 3. Ad-hoc lifecycle management

35 Talk Outline ML model management today Velox system architecture Split model family Prediction serving Model management Next directions

36 Talk Outline ML model management today Velox system architecture Split model family Prediction serving Model management Next directions

37 Online Retraining e.g., Prediction Latency Full pre-materialization e.g., Prediction Error

38 Online Retraining e.g., Prediction Latency VELOX Full pre-materialization e.g., Prediction Error

39 VELOX GOALS

40 VELOX GOALS 1. Low latency and low error predictions

41 VELOX GOALS 1. Low latency and low error predictions 2. Cross-cutting model-specific optimizations

42 VELOX GOALS 1. Low latency and low error predictions 2. Cross-cutting model-specific optimizations 3. Unified system eases operation

43 Online Retraining e.g., Prediction Latency VELOX Full pre-materialization e.g., Prediction Error

44 Prediction Latency Online Retraining e.g., key idea: split model into staleness insensitive and staleness sensitive components VELOX Full pre-materialization e.g., Prediction Error

45 Prediction Latency Online Retraining e.g., BATCH key idea: split model into staleness insensitive and staleness sensitive components VELOX Full pre-materialization e.g., Prediction Error

46 Prediction Latency Online Retraining e.g., BATCH INCREMENTAL key idea: split model into staleness insensitive and staleness sensitive components VELOX Full pre-materialization e.g., Prediction Error

47 BERKELEY DATA ANALYTICS STACK (BDAS) Spark Streaming BlinkDB Spark SQL GraphX MLbase MLlib Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

48 THE MISSING PIECE Training Spark Streaming BlinkDB Spark SQL Graph X MLbase ML library Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

49 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Graph X MLbase ML library Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

50 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Graph X MLbase ML library Velox Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

51 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Graph X MLbase ML library Velox Spark Mesos Hadoop Yarn Tachyon HDFS, S3,

52 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Spark Graph X MLbase ML library Model Manager Velox Mesos Hadoop Yarn Tachyon HDFS, S3,

53 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Spark Graph X MLbase ML library Model Manager Velox Prediction Service Mesos Hadoop Yarn Tachyon HDFS, S3,

54 PREDICTION SERVICE

55 PREDICTION SERVICE 1. Implements model serving API

56 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time

57 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time 3. Fuzzy materialized view of model state

58 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time 3. Fuzzy materialized view of model state MODEL MANAGER

59 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time 3. Fuzzy materialized view of model state MODEL MANAGER 1. Maintains models via online and batch retraining

60 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time 3. Fuzzy materialized view of model state MODEL MANAGER 1. Maintains models via online and batch retraining 2. Stores model catalog, metadata, versioning

61 PREDICTION SERVICE 1. Implements model serving API 2. Low latency; < 10ms response time 3. Fuzzy materialized view of model state MODEL MANAGER 1. Maintains models via online and batch retraining 2. Stores model catalog, metadata, versioning 3. Contains library of standard models + custom API

62 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

63 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

64 PERSONALIZED MODELING

65 PERSONALIZED MODELING

66 PERSONALIZED MODELING A Separate Model for Each User?

67 PERSONALIZED MODELING A Separate Model for Each User? Computationally Inefficient many complex models

68 PERSONALIZED MODELING A Separate Model for Each User? Computationally Inefficient many complex models Statistically Inefficient not enough data per user

69 Input (Song) Rating

70 Input (Song) Rating

71 Input (Song) Rating Split

72 Input (Song) Rating Split

73 PERSONALIZED MODELING Personalized User Model Input (Song)

74 PERSONALIZED MODELING Shared Basis Feature Model Personalized User Model Input (Song)

75 PERSONALIZED MODELING Shared Basis Feature Model Trained across users Changes Slowly Personalized User Model Input (Song)

76 PERSONALIZED MODELING Shared Basis Feature Model Trained across users Changes Slowly Personalized User Model Input (Song) Trained for each user Changes Quickly

77 SPLIT MODEL FORMULATION Shared Basis Feature Model Personalized User Model Input (Song)

78 SPLIT MODEL FORMULATION Shared Basis Feature Model Personalized User Model Input (Song)

79 SPLIT MODEL FORMULATION Shared Basis Feature Model Personalized User Model Meow Input (Song)

80 SPLIT MODEL FORMULATION Shared Basis Feature Model Personalized User Model Meow Input (Song) Terrible

81 MATHEMATICAL FORMULATION Input (Song)

82 MATHEMATICAL FORMULATION Input (Song) x

83 MATHEMATICAL FORMULATION Input (Song) x Shared Basis Feature Models

84 MATHEMATICAL FORMULATION Input (Song) x f(x; ) Shared Basis Feature Models

85 MATHEMATICAL FORMULATION Input (Song) x f(x; ) Changes slowly Shared Basis Feature Models

86 MATHEMATICAL FORMULATION Input (Song) x f(x; ) Changes slowly Shared Basis Feature Models Personalized User Model

87 MATHEMATICAL FORMULATION Input (Song) x f(x; ) w u Changes slowly Shared Basis Feature Models Personalized User Model

88 MATHEMATICAL FORMULATION Input (Song) x f(x; ) w u Changes slowly Shared Basis Feature Models Personalized User Model Highly dynamic

89 MATHEMATICAL FORMULATION Input (Song) x f(x; ) w u = Rating Changes slowly Shared Basis Feature Models Personalized User Model Highly dynamic

90 MATHEMATICAL FORMULATION Input (Song) x Terrible f(x; ) w u = Rating Changes slowly Shared Basis Feature Models Personalized User Model Highly dynamic

91 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

92 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

93 System Architecture Training Management + Serving Spark Straming BlinkDB Shark SQL Spark Graph X MLbase ML library Model Manager Velox Prediction Service Mesos Hadoop Yarn Mesos Tachyon HDFS, S3,

94 PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632

95 PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632 More complex ordering queries: GET /velox/catify/predict_top_k?userid=22&k=100

96 PREDICTIONS def predict( u: UUID, x: Context ) w u f(x; )

97 PREDICTIONS def predict( u: UUID, x: Context ) Look up user weight w u f(x; )

98 PREDICTIONS def predict( u: UUID, x: Context ) Look up user weight Primary key lookup w u f(x; )

99 PREDICTIONS def predict( u: UUID, x: Context ) Look up user weight Primary key lookup Partition query by user: always local w u f(x; )

100 PREDICTIONS def predict( u: UUID, x: Context ) Compute Features w u f(x; ) } user independent

101 PREDICTIONS def predict( u: UUID, x: Context ) Feature computation could be costly Compute Features w u f(x; ) } user independent

102 PREDICTIONS def predict( u: UUID, x: Context ) Feature computation could be costly Compute Features Cache features for reuse across users w u f(x; ) } user independent

103 FEATURE CACHING GAINS Without Caching Caching Enabled Feature caching leads to order-of-magnitude reduction in latency.

104 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs

105 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords

106 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords Candidate Songs

107 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords Candidate Songs Score and rank all candidates

108 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords Candidate Songs Score and rank all candidates By exploiting split model design we can leverage:

109 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords Candidate Songs Score and rank all candidates By exploiting split model design we can leverage: A. Shrivastava, P. Li. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). NIPS 14 Best Paper

110 TOP-K QUERIES Query predicate to pre-filter candidate set All Songs Playlist Keywords Candidate Songs Score and rank all candidates By exploiting split model design we can leverage: A. Shrivastava, P. Li. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). NIPS 14 Best Paper Y. Low and A. X. Zheng. Fast Top-K Similarity Queries Via Matrix Compression. CIKM 2012

111 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

112 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

113 System Architecture Training Management + Serving Spark Straming BlinkDB Shark SQL Spark Graph X MLbase ML library Model Manager Velox Prediction Service Mesos Hadoop Yarn Mesos Tachyon HDFS, S3,

114 System Architecture Training Management + Serving Spark Straming BlinkDB Shark SQL Spark Graph X MLbase ML library Model Manager Velox Prediction Service Mesos Hadoop Yarn Mesos 1. Online and offline model training 2. Sample bias problem Tachyon HDFS, S3,

115 FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7

116 FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning Continuously update user models in Velox

117 FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning Continuously update user models in Velox Offline Learning Logged to DFS for feature learning in Spark

118 FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning Continuously update user models in Velox Offline Learning Logged to DFS for feature learning in Spark Evaluation Continuously assess model performance

119 ONLINE LEARNING def observe(u: UUID, x: Context, y: Score) w u f(x; )

120 ONLINE LEARNING def observe(u: UUID, x: Context, y: Score) Update wu with new training point w u f(x; )

121 ONLINE LEARNING def observe(u: UUID, x: Context, y: Score) Update wu with new training point Stochastic gradient descent w u f(x; )

122 ONLINE LEARNING def observe(u: UUID, x: Context, y: Score) Update wu with new training point Stochastic gradient descent Incremental linear algebra w u f(x; )

123 OFFLINE LEARNING def retrain(trainingdata: RDD) w u f(x; ) Spark Based Training Algs. Efficient batch training using Spark

124 OFFLINE LEARNING def retrain(trainingdata: RDD) w u f(x; ) Spark Based Training Algs. Efficient batch training using Spark When do we retrain?

125 OFFLINE LEARNING def retrain(trainingdata: RDD) w u f(x; ) Spark Based Training Algs. Efficient batch training using Spark When do we retrain? Periodically

126 OFFLINE LEARNING def retrain(trainingdata: RDD) w u f(x; ) Spark Based Training Algs. Efficient batch training using Spark When do we retrain? Periodically Trigger by the evaluation system

127 PREDICTION LATENCY WITH ONLINE TRAINING Spark Velox Velox keeps models updated at low latency

128 Data Model

129 Data Model Sample Bias: model affects the training data.

130 ALWAYS SERVE THE BEST SONG? Predicted Rating Songs

131 ALWAYS SERVE THE BEST SONG? Predicted Rating Songs

132 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song Predicted Rating Songs

133 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song Predicted Rating Songs

134 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song With prob. ϵ pick a random song Predicted Rating Songs

135 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song With prob. ϵ pick a random song Predicted Rating Songs

136 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song With prob. ϵ pick a random song Epsilon Greedy Predicted Rating Songs

137 VELOX SOLUTION With prob. 1- ϵ serve the best predicted song With prob. ϵ pick a random song Epsilon Greedy Predicted Rating Active Learning Opportunity to explore new systems for this emerging analytics workload Songs

138 Talk Outline ML model management today Velox system architecture Key idea: Split model family Prediction serving Model management Next directions

139 Talk Outline ML model management today Velox system architecture Split model family Prediction serving Model management Next directions

140 OPEN CHALLENGES FOR DATABASE SYSTEMS

141 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family

142 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language

143 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines

144 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines standard set of physical operators

145 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines standard set of physical operators Automatically choose split for online & offline training

146 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines standard set of physical operators Automatically choose split for online & offline training view maintenance and query optimization

147 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines standard set of physical operators Automatically choose split for online & offline training view maintenance and query optimization Ensure user privacy

148 OPEN CHALLENGES FOR DATABASE SYSTEMS Going beyond the split model family logical model pipeline language More generic training pipelines standard set of physical operators Automatically choose split for online & offline training view maintenance and query optimization Ensure user privacy Privacy-Preserving DBMS

149 Data

150 Data Model

151 Data Training Model

152 Data Training Predictions Serving Model

153 Data Feedback Training Predictions Serving Model

154 The future of research in scalable learning systems will be in the integration of the learning lifecycle: Data Feedback Training Predictions Serving Model

155 THE MISSING PIECE Training Management + Serving Spark Streaming BlinkDB Spark SQL Spark Graph X MLbase ML library Model Manager Velox Prediction Service Mesos Hadoop Yarn Tachyon HDFS, S3,

156 Prediction Latency Online Retraining e.g., BATCH INCREMENTAL key idea: split model into staleness insensitive and staleness sensitive components VELOX Full pre-materialization e.g., Prediction Error

157 SUMMARY

158 SUMMARY Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems

159 SUMMARY Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions

160 SUMMARY Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions Velox is coming soon as part of BDAS

161 SUMMARY Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions Velox is coming soon as part of BDAS

162 QUESTIONS?

Intelligent Services Serving Machine Learning

Intelligent Services Serving Machine Learning Joseph E. Gonzalez jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley joseph@dato.com; Co-Founder @ Dato Inc. Contemporary Learning Systems Big Data