Simplifying ML Workflows with Apache Beam & TensorFlow Extended

1 Simplifying ML Workflows with Apache Beam & TensorFlow Extended. Tyler, Software Engineer at Google, Apache Beam PMC

2 Apache Beam Portable data-processing pipelines

3 Example pipelines Python Java

4 Cross-language Portability Framework [Diagram: Language A, B, and C SDKs -> The Beam Model -> Runner 1, Runner 2, Runner 3 -> The Beam Model -> Language A, Language B, Language C]

5 Python compatible runners. Direct runner (local machine): Now; Google Cloud Dataflow: Now; Apache Flink: Q2-Q3; Apache Spark: Q3-Q4

6 TensorFlow Extended End-to-end machine learning in production

7 Doing ML in production is hard. -Everyone who has ever tried

8 Because, in addition to the actual ML... ML Code

9 ...you have to worry about so much more: Data Verification, Monitoring, Configuration, Data Collection, ML Code, Serving Infrastructure, Analysis Tools, Process Management Tools, Feature Extraction, Machine Resource Management. Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems"

10 In this talk, I will...

11 In this talk, I will... Show you how to apply transformations... TensorFlow Transform

12 In this talk, we will... Show you how to apply transformations consistently between Training and Serving TensorFlow Transform TensorFlow Estimators TensorFlow Serving

13 In this talk, we will... Introduce something new... TensorFlow Transform TensorFlow Estimators TensorFlow Model Analysis TensorFlow Serving

14 TensorFlow Transform Consistent In-Graph Transformations in Training and Serving

15 Typical ML Pipeline. During training: data -> batch processing. During serving: request -> live processing.

17 TensorFlow Transform. During training: data -> tf.transform batch processing. During serving: request -> transform as tf.Graph.
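
The idea on this slide can be sketched in plain Python: compute the statistics once over the training data, save them, and have both the training path and the serving path apply the same function with the same saved constants. This is only an illustration of the principle, not tf.Transform itself; the names `analyze` and `transform` and the JSON storage format are assumptions made for this sketch.

```python
import json
import statistics


def analyze(training_values):
    """Full pass over the training data; returns the constants to freeze."""
    return {
        "mean": statistics.fmean(training_values),
        "stddev": statistics.pstdev(training_values),
    }


def transform(x, constants):
    """Per-instance transform that depends only on the frozen constants."""
    return (x - constants["mean"]) / constants["stddev"]


# Training time: analyze once, then store the constants next to the model.
training_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
saved = json.dumps(analyze(training_values))

# Serving time: reload the stored constants instead of recomputing them.
serving_constants = json.loads(saved)
training_output = transform(9.0, analyze(training_values))
serving_output = transform(9.0, serving_constants)
```

Because serving never recomputes the statistics, a single request is transformed exactly as it would have been during training.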

23 Defining a preprocessing function in TF Transform
    def preprocessing_fn(inputs):
        x = inputs['x']
        ...
        return {
            "A": tft.bucketize(tft.normalize(x) * y),
            "B": tensorflow_fn(y, z),
            "C": tft.ngrams(z)
        }
Many operations are available for dealing with text and numeric features; users can define their own.
[Diagram: X -> mean, stddev -> normalize -> multiply (with Y) -> quantiles -> bucketize -> A; Y and Z -> B; Z -> C]

24 Analyzers (mean, stddev, quantiles): Reduce (full pass). Implemented as a distributed data pipeline. Transforms (normalize, multiply, bucketize): Instance-to-instance (don't change batch dimension). Pure TensorFlow.
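
A minimal plain-Python sketch of the analyzer/transform split, assuming a single numeric feature: the analyzers each need a full pass over the data, while the transform touches one instance at a time and only reads the analyzer results. The function names are hypothetical, and the multiplication by y from the earlier slide is left out to keep the sketch short.

```python
import bisect
import statistics


def analyze(xs, num_buckets):
    """Analyzer phase: reductions over the whole dataset."""
    # First full pass: the statistics that normalize needs.
    mean = statistics.fmean(xs)
    stddev = statistics.pstdev(xs)
    # Second full pass: quantile boundaries of the normalized values.
    normalized = [(x - mean) / stddev for x in xs]
    boundaries = statistics.quantiles(normalized, n=num_buckets)
    return mean, stddev, boundaries


def transform(x, mean, stddev, boundaries):
    """Transform phase: instance-to-instance, uses only the constants."""
    normalized = (x - mean) / stddev
    return bisect.bisect_right(boundaries, normalized)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
constants = analyze(xs, num_buckets=4)
buckets = [transform(x, *constants) for x in xs]
```

Note that the quantile analyzer depends on the output of normalize, which is why the analysis here takes two passes rather than one.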

25 [Diagram: Analyze runs over the data and replaces the analyzers (mean, stddev, quantiles) with constant tensors; the resulting graph keeps only the transforms normalize, multiply, and bucketize.]

26 What can be done with TF Transform? tf.transform batch processing: Pretty much anything.

27 What can be done with TF Transform? tf.transform batch processing: Pretty much anything. Serving Graph: Anything that can be expressed as a TensorFlow Graph.

28 Some common use-cases: Scale to..., Bag of Words / N-Grams, Bucketization, Feature Crosses

29 Some common use-cases: Scale to..., Bag of Words / N-Grams, Bucketization, Feature Crosses, Apply another TensorFlow Model
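
Two of these use-cases are easy to illustrate in plain Python. The sketch below is not the tf.Transform implementation; `ngrams` and `feature_cross` are hypothetical helpers, and hashing the crossed pair into a fixed number of buckets is one common choice assumed here.

```python
import hashlib


def ngrams(tokens, n_min, n_max, separator=" "):
    """All contiguous n-grams with n_min <= n <= n_max, ordered by position."""
    result = []
    for start in range(len(tokens)):
        for n in range(n_min, n_max + 1):
            if start + n <= len(tokens):
                result.append(separator.join(tokens[start:start + n]))
    return result


def feature_cross(a, b, num_buckets):
    """Cross two categorical values and hash the pair into a fixed id range."""
    # hashlib is used instead of hash() so the id is stable across processes.
    digest = hashlib.md5(f"{a}_X_{b}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


grams = ngrams(["the", "quick", "fox"], n_min=1, n_max=2)
crossed = feature_cross("US", "mobile", num_buckets=1000)
```

A stable hash matters for the same reason as the rest of the talk: the id assigned during training must be the id assigned during serving.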

30 github.com/tensorflow/transform

31 Introducing TensorFlow Model Analysis: Scalable, sliced, and full-pass metrics

32 Let's talk about metrics... How accurate? Converged model? What about my TB-sized eval set? Slices / subsets? Across model versions?
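
One way to see how a full-pass metric can scale to a very large eval set is to write it as a mergeable accumulator, so that each shard is reduced independently and the partial results are combined afterwards. The sketch below is plain Python and makes no claim about how TensorFlow Model Analysis is implemented; the four function names mirror the method names of Beam's `CombineFn`, and the sharded data is made up for illustration.

```python
from functools import reduce


def create_accumulator():
    return (0, 0)  # (correct predictions, total examples)


def add_input(acc, example):
    label, prediction = example
    correct, total = acc
    return (correct + int(label == prediction), total + 1)


def merge_accumulators(a, b):
    return (a[0] + b[0], a[1] + b[1])


def extract_output(acc):
    correct, total = acc
    return correct / total if total else float("nan")


# Each inner list stands for one shard of (label, prediction) pairs.
shards = [
    [(1, 1), (0, 1), (1, 1)],
    [(0, 0), (1, 0)],
    [(1, 1)],
]
partials = [reduce(add_input, shard, create_accumulator()) for shard in shards]
merged = reduce(merge_accumulators, partials, create_accumulator())
accuracy = extract_output(merged)
```

Since merging is associative and commutative, the shards can be processed on different workers in any order and still give the same accuracy as a single pass.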

33 ML Fairness: analyzing model mistakes by subgroup. [Figure: ROC curve for all groups; y-axis: Sensitivity (True Positive Rate); x-axis: 1 - Specificity (False Positive Rate).] Learn more at ml-fairness.com

34 ML Fairness: analyzing model mistakes by subgroup. [Figure: ROC curves for all groups, Group A, and Group B; y-axis: Sensitivity (True Positive Rate); x-axis: 1 - Specificity (False Positive Rate).] Learn more at ml-fairness.com
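
Each point on a per-group ROC curve is a (false positive rate, true positive rate) pair computed on that group's examples at one threshold. A minimal sketch of that computation, assuming binary labels and a score per example; the function name, the data, and the reserved slice name "all" are hypothetical.

```python
from collections import defaultdict


def rates_by_group(examples, threshold):
    """examples: iterable of (group, label, score) with label 0 or 1.

    Assumes no group is itself named "all", which is reserved for the
    overall slice.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, label, score in examples:
        predicted = int(score >= threshold)
        for key in (group, "all"):
            c = counts[key]
            if label == 1:
                c["tp" if predicted else "fn"] += 1
            else:
                c["fp" if predicted else "tn"] += 1
    rates = {}
    for key, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[key] = {
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


examples = [
    ("A", 1, 0.9), ("A", 1, 0.8), ("A", 0, 0.3), ("A", 0, 0.1),
    ("B", 1, 0.6), ("B", 1, 0.4), ("B", 0, 0.7), ("B", 0, 0.2),
]
rates = rates_by_group(examples, threshold=0.5)
```

In this made-up data the overall numbers hide the fact that every mistake falls on Group B, which is the kind of failure mode slicing is meant to surface.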

35 ML Fairness: understand the failure modes of your models

36 ML Fairness: Learn More ml-fairness.com

38 How does it work?
    ...
    estimator = DNNLinearCombinedClassifier(...)
    estimator.train(...)
    estimator.export_savedmodel(
        serving_input_receiver_fn=serving_input_fn)
    tfma.export.export_eval_savedmodel(
        estimator=estimator,
        eval_input_receiver_fn=eval_input_fn)
    ...
[Diagram: export_savedmodel -> Inference Graph (SavedModel) with SignatureDef; export_eval_savedmodel -> Eval Graph (SavedModel) with SignatureDef and Eval Metadata]

39 github.com/tensorflow/model-analysis

40 Summary
Apache Beam: Data-processing framework that runs locally and scales to massive data, in the Cloud (now) and soon on-premises via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below.
tf.transform: Consistent in-graph transformations in training and serving.
tf.modelanalysis: Scalable, sliced, and full-pass metrics.
