Intelligent Services Serving Machine Learning

1 Intelligent Services Serving Machine Learning. Joseph E. Gonzalez, Assistant Professor, UC Berkeley; joseph@dato.com; Dato Inc.

2 Contemporary Learning Systems Big Data Training Big Models

3 Contemporary Learning Systems: Create, MLlib, BIDMach, MLC, VW, LIBSVM, Oryx 2

4 What happens after we train a model? Data → Training → Model → Conference Papers; Dashboards and Reports; Drive Actions

6 Suggesting Items at Checkout; Fraud Detection; Cognitive Assistance; Internet of Things. Low-Latency, Personalized, Rapidly Changing

7 Data Train Model

8 Actions Data Train Model

9 Machine Learning Intelligent Services

10 The Life of a Query in an Intelligent Service
[Diagram: User → Web Serving Tier → Intelligent Service. Request: "Items like x". Content Request; Top-K Query → Top Items; New Page, Images. Feedback: Preferred Item. Inside the Intelligent Service: Lookup Model, Feature Lookup. Backing stores: Model Info, User Data, Product Info.]

11 Essential Attributes of Intelligent Services
Responsive: intelligent applications are interactive
Adaptive: ML models are out of date the moment learning is done
Manageable: many models created by multiple people

12 Responsive: Now and Always
Compute predictions in < 20 ms for complex models, queries (Top K), and features (SELECT * FROM users JOIN items, click_logs, pages WHERE ...), under heavy query load and with system failures.

13 Experiment: End-to-end Latency in Spark MLlib
[Diagram, input digit "4": HTTP Req. → Feature Trans. → Evaluate Model → Encode Prediction → To JSON → HTTP Response]

14 End-to-end Latency for Digits Classification
784-dimension input, served using MLlib and Dato Inc. Latency measured in milliseconds; histogram y-axis: count out of 1000.

Model                        Avg (ms)   P99 (ms)
NOP                          5.5        20.6
Single Logistic Regression   21.8       38.6
Decision Tree                22.4       63.8
One-vs-all LR (10-class)     137.7      217.7
100 Tree Random Forest       50.5       73.4
500 Tree Random Forest       172.6      268.7
AlexNet CNN                  418.7      549.8
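The Avg and P99 columns above are summary statistics over individual request latencies. As a minimal sketch of how such a summary could be computed, assuming a nearest-rank percentile definition (the slides do not say which definition was used) and a hypothetical helper name `latency_summary`:

```python
def latency_summary(samples_ms):
    """Return (average, P99) for a list of latencies in milliseconds.

    Assumption: nearest-rank P99, i.e. the smallest sample such that at
    least 99% of all samples are less than or equal to it.
    """
    ordered = sorted(samples_ms)
    n = len(ordered)
    avg = sum(ordered) / n
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n)
    return avg, ordered[rank - 1]


# Hypothetical measurements, not the figures from the slide.
samples = [float(i) for i in range(1, 101)]
avg, p99 = latency_summary(samples)
```

The gap between the two numbers is what matters for a responsive service: a model with an acceptable average can still miss a latency target for its slowest requests.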

15 [Figure: average prediction latency in milliseconds per model. Bar labels: Is "4" LR, Decision Tree, Class LR, Random Forest, Random Forest, C++, AlexNet]

16 Adaptive to Change at All Scales
[Diagram: Granularity of Data (Population to Session) against Rate of Change (Months to Minutes). Session examples: Shopping for Mom, Shopping for Me]

17 Adaptive to Change at All Scales
Population (rate of change: months): Law of Large Numbers → change is slow. Rely on efficient offline retraining → high-throughput systems.

18 Adaptive to Change at All Scales
Session (rate of change: minutes): small data → rapidly changing. Low latency → online learning. Sensitive to feedback bias.

19 The Feedback Loop
"I once looked at cameras on Amazon" → similar cameras and accessories.
Opportunity for bandit algorithms. Bandits present new challenges: computation overhead; complicates caching + indexing.

20 Exploration / Exploitation Tradeoff
Systems that can take actions can adversely bias future data. Opportunity for bandits!
Bandits present new challenges: complicates caching + indexing; tuning + counterfactual reasoning.
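To make the tradeoff concrete, here is a minimal sketch of one simple bandit strategy, epsilon-greedy. The slides do not name a specific algorithm, so this choice, the class name, and the simulated click rates are all assumptions for illustration.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: explore with probability epsilon, else exploit."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm was shown
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            # Explore: pick a random arm so every arm keeps getting data.
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the arm with the best estimated reward so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean update.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical simulation: two items with made-up click probabilities.
click_rates = [0.2, 0.8]
bandit = EpsilonGreedy(n_arms=2)
env = random.Random(1)
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if env.random() < click_rates[arm] else 0.0
    bandit.update(arm, reward)
```

Without the exploration branch the service would only ever collect feedback on the items it already recommends, which is the bias the slide warns about. The random choice is also why caching and indexing get harder: the same query no longer maps to a single deterministic answer.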

21 Management: Collaborative Development
Teams of data scientists working on similar tasks
- competing features and models
Complex model dependencies: Cat Photo → Cat Classifier (iscat), Animal Classifier (isanimal) → Cuteness Predictor ("Cute!")

22 UC Berkeley AMPLab: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. Predictive Services

23 UC Berkeley AMPLab: Active Research Project. Predictive Services

24 Velox Model Serving System [CIDR 15, LearningSys 15]
Focuses on the multi-task learning (MTL) domain, with a separate function ($f_1$, $f_2$, ...) per task:
Spam Classification
Content Rec. Scoring (Session 1: $f_1$; Session 2: $f_2$)
Localized Anomaly Detection

25 Velox Model Serving System [CIDR 15, LearningSys 15]
Personalized Models (Multi-task Learning): Input → Output. Separate model for each user/context.

26 Velox Model Serving System [CIDR 15, LearningSys 15]
Personalized Models (Multi-task Learning): split into a Feature Model and a Personalization Model.

27 Hybrid Offline + Online Learning
Split model: $f(x; \theta)^T w_u$, with feature model $f(x; \theta)$ and personalization model $w_u$.
Update feature functions offline using batch solvers: leverage high-throughput systems (Apache Spark); exploit slow change in population statistics.
Update the user weights online: simple to train + more robust model; address rapidly changing user statistics.
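The online half of this split can be sketched as follows. This is an illustration under stated assumptions, not the Velox implementation: the feature vector stands in for the output of an offline-trained $f(x; \theta)$, and the user weights $w_u$ are updated with a plain stochastic gradient step on squared error, which is one of several possible online update rules.

```python
def predict(w_u, features):
    """Prediction f(x; theta)^T w_u for one user and one item."""
    return sum(w * f for w, f in zip(w_u, features))


def online_update(w_u, features, label, lr=0.1):
    """One SGD step on squared error; only the user weights change.

    The features come from the offline feature model and are treated as
    fixed here (assumption: theta is only retrained in batch).
    """
    err = predict(w_u, features) - label
    return [w - lr * err * f for w, f in zip(w_u, features)]


# Hypothetical feature vector and feedback value for a single user.
features = [1.0, 0.5]
label = 2.0
w_u = [0.0, 0.0]
error_before = abs(predict(w_u, features) - label)
for _ in range(20):
    w_u = online_update(w_u, features, label)
error_after = abs(predict(w_u, features) - label)
```

Because each update touches only one small weight vector, it is cheap enough to run at serving time, while the expensive feature model is left to the batch system.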

28 Hybrid Online + Offline Learning Results
Similar test error; substantially faster training.
[Figures: Full Offline compared with Hybrid; annotation: User Pref. Change]

29 Evaluating the Model Input Cache Feature Evaluation Split

30 Evaluating the Model
Feature Caching Across Users; Approximate Feature Hashing; Anytime Feature Evaluation
[Diagram: Input → Cache → Feature Evaluation → Split]

31 Feature Caching
New input: $x$. Compute feature: $f(x; \theta)$. Hash input: $h(x)$.
Feature Hash Table: $h(x) \to f(x; \theta)$
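A minimal sketch of the exact feature cache described above. The class name and the example feature function are hypothetical; a Python dict plays the role of the feature hash table, since it hashes the key and also checks key equality on lookup.

```python
class FeatureCache:
    """Exact cache: compute f(x; theta) once per distinct input x."""

    def __init__(self, feature_fn):
        self.feature_fn = feature_fn
        self.table = {}   # feature hash table: x -> f(x; theta)
        self.misses = 0

    def get(self, x):
        if x not in self.table:
            # Cache miss: pay for the feature computation and store it.
            self.misses += 1
            self.table[x] = self.feature_fn(x)
        return self.table[x]


def expensive_feature(x):
    # Stand-in for a costly feature function (assumption for illustration).
    return tuple(v * v for v in x)


cache = FeatureCache(expensive_feature)
first = cache.get((1.0, 2.0))
second = cache.get((1.0, 2.0))  # served from the table, no recomputation
```

Since the feature model is shared across users, a value cached for one user's request can be reused for any other user who asks about the same item.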

32 LSH Cache Coarsening
New input: $z \neq x$. Hash new input: $h(z)$.
If $h(z)$ collides with $h(x)$, the Feature Hash Table returns $f(x; \theta)$: use wrong value! → LSH hash fn.

33 LSH Cache Coarsening
Locality-Sensitive Hashing: $x \approx z \Rightarrow h(x) = h(z)$. Hash new input: $h(z)$.
Locality-Sensitive Caching: $f(x; \theta) \approx f(z; \theta) \Rightarrow h(x) = h(z)$.
Use value anyways! → Req. LSH. The Feature Hash Table returns $f(x; \theta)$ for $z$.
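One way to sketch this idea is with sign-of-random-projection hashing, a standard LSH family for angular similarity. The slides do not say which LSH family Velox uses, so the hash construction, the class name, and the feature function below are assumptions for illustration.

```python
import math
import random


def make_lsh(dim, n_bits, seed=0):
    """Build h(x): one bit per random hyperplane (sign of the projection)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(x):
        return tuple(
            sum(p * v for p, v in zip(plane, x)) >= 0.0 for plane in planes
        )

    return h


class LSHFeatureCache:
    """Coarsened cache: inputs in the same LSH bucket share one feature value."""

    def __init__(self, feature_fn, h):
        self.feature_fn = feature_fn
        self.h = h
        self.table = {}  # h(x) -> f(x; theta) of the first input in the bucket

    def get(self, x):
        key = self.h(x)
        if key not in self.table:
            self.table[key] = self.feature_fn(x)
        return self.table[key]  # may belong to a nearby input: used anyway


def direction_feature(x):
    # Hypothetical feature that depends only on the direction of x.
    return x[0] / math.sqrt(sum(v * v for v in x))


h = make_lsh(dim=3, n_bits=4)
cache = LSHFeatureCache(direction_feature, h)
x = (1.0, 2.0, 3.0)
z = (2.0, 4.0, 6.0)  # same direction as x, so it lands in the same bucket
fx = cache.get(x)
fz = cache.get(z)    # returns the value cached for x
```

Here `z` is an exact rescaling of `x`, so the collision is guaranteed; for inputs that are merely close, this hash family collides with high probability rather than always. Fewer bits give a coarser hash: more cache hits, at the cost of reusing values for less similar inputs.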

34 Anytime Predictions
Compute features asynchronously:
$f_1(x; \theta)\, w_{u1} + E[f_2(x; \theta)]\, w_{u2} + f_3(x; \theta)\, w_{u3}$
If a particular element does not arrive, use an estimator instead.
Always able to render a prediction by the latency deadline.
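The substitution in the formula above can be sketched as follows. This is a simplified, synchronous illustration: the asynchronous feature computation and the deadline itself are left out, and the function name, the `arrived` mapping, and the numbers are hypothetical.

```python
def anytime_predict(weights, arrived, expected):
    """Weighted sum over features, using E[f_i] for any feature not yet computed.

    weights:  user weights w_u1..w_un
    arrived:  dict {feature index: value} for features ready by the deadline
    expected: per-feature estimates E[f_i(x; theta)], e.g. running means
    """
    total = 0.0
    for i, w in enumerate(weights):
        value = arrived.get(i, expected[i])  # fall back to the estimator
        total += w * value
    return total


# Hypothetical example: the second feature missed the deadline.
weights = [0.5, 2.0, -1.0]
arrived = {0: 1.0, 2: 3.0}
expected = [0.0, 0.25, 0.0]
prediction = anytime_predict(weights, arrived, expected)
```

The prediction degrades gradually as more features are replaced by their expectations, instead of failing outright when one feature is slow.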

35 Coarsening + Anytime Predictions
[Figure 10: Cache miss rate against accuracy. Coarser hash: $f_i(x; \theta) \approx f_i(z; \theta)$, from No Coarsening to Overly Coarsened. More features approximated by expectation: $f_i(x; \theta) \approx E[f_i(x; \theta)]$. Markers: Better, Best.]
Check out our poster!

36 Part of Berkeley Data Analytics Stack
Training: Spark Streaming, BlinkDB, MLbase, GraphX, Spark SQL, Spark ML library
Management + Serving: Velox (Model Manager, Prediction Service)
Underlying layers: Mesos; Tachyon; HDFS, S3, ...

37 Predictive Services

38 Dato Predictive Services
Production-ready model serving and management system
- Elastic scaling and load balancing of docker.io containers
- AWS CloudWatch metrics and reporting
- Serves Dato Create models, scikit-learn, and custom Python
- Distributed shared caching: scale-out to address latency
- REST management API: Demo?

39 Predictive Services: Responsive, Adaptive, Manageable
Key Insights: Caching, Bandits, & Management; Online/Offline Learning; Latency vs. Accuracy

40 Future of Learning Systems Actions Data Train Model

41 Thank You. Joseph E. Gonzalez, Assistant Professor, UC Berkeley; joseph@dato.com; Dato

THE MISSING PIECE IN COMPLEX ANALYTICS: SCALABLE, LOW LATENCY MODEL SERVING AND MANAGEMENT WITH VELOX Daniel Crankshaw, Peter Bailis, Joseph Gonzalez, Haoyuan Li, Zhao Zhang, Ali Ghodsi, Michael Franklin,