Creating a Cloud-Based Machine Learning Platform at the Petabyte Scale
Alex Sadovsky | alex.sadovsky@oracle.com | Director of Data Science, Oracle Data Cloud
Our mission: Unlocking the value of data to help advertisers connect more deeply with customers
Our data:
Over 5 billion global cookie IDs
1.05 billion mobile device IDs (covering virtually every country, even North Korea)
Over 15 million domains
1.1 trillion page views
$3 trillion in annual observed consumer spending
[Figure: world map of data coverage, shaded from lowest to highest activity]
This gives us a full view of the consumer:
What they BUY
What they DO
Where they GO
Who they ARE
Down to an extremely detailed level:
Married: Yes
Kids: 2
Education level: College grad
Online sites: Kayak travel
Travel: Italy, high-end hotels
Cars: cross-over vehicle
DMA: San Diego
Hobby: Photography
Buys high-end women's apparel
Drives a Honda CRV
Avid home cook
Organic product purchaser
Buys children's apparel
Favorite wine is Pinot Noir
Shops at luxury retailers
We use machine learning to go from data to targeted advertising:
1. Start with an action a client wants to find more of
2. Add in all of our consumer data
3. Apply machine learning to model consumer action patterns
4. Use models to create audiences of potential future purchasers
5. Target audiences across the digital landscape
We do this on a stable, self-built machine learning platform
Every month we create 10,000+ models, each with terabytes of data and best-in-class machine learning accuracy
So let's discuss how we do it! How can developers create a machine learning platform that can:
Use as much data as possible?
Utilize a data-driven machine learning approach?
Scale to meet business needs?
Agenda
1. Letting data tell the story
2. Creating effective cloud infrastructure
3. Constructing an end-to-end platform
Letting data tell the story: The data
Data is our most precious asset, but we often have too much for a single computer
We can use Hadoop to create large-scale, distributed data sets from data stored in cloud object stores
We can use Hive to access this data with simple SQL, making complex multi-computer computation as simple as:

SELECT SUM(spend) FROM consumers WHERE age > 21
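To make this concrete, here is a minimal, hedged sketch of running that same query from Python with the PyHive client library (the host name is a hypothetical placeholder, not something shown in these slides):

from pyhive import hive  # DB-API client for HiveServer2

# Connect to a HiveServer2 endpoint (hypothetical host).
cursor = hive.connect(host="hive.example.internal", port=10000).cursor()

# Hive compiles this SQL into distributed work across the cluster.
cursor.execute("SELECT SUM(spend) FROM consumers WHERE age > 21")
print(cursor.fetchone()[0])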
Letting data tell the story: Modeling
The wrong way to model data: manually pick a modeling method for your data and apply it
[Figure: two plots of height vs. age; a fit that looks right from newborn to age 10 years breaks down when extrapolated from newborn to age 100 years]
Letting data tell the story: Modeling
No Free Lunch Theorem: for any algorithm, elevated performance on one class of problems is offset by subpar performance on another class
So what can we do? Try multiple algorithms/models with training data and let machine learning pick the one, or combination, that works best
We can do it easily in under 50 lines of code (a sketch follows below)!
https://github.com/sadovsky/no-free-lunch/blob/master/no_free_lunch.ipynb
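In that spirit, a minimal sketch of trying several scikit-learn models with cross-validation and keeping the best performer (the dataset and candidate models here are illustrative assumptions, not the linked notebook's exact contents):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

# No single algorithm wins on every problem (no free lunch),
# so score each candidate with 5-fold cross-validation and keep the best.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "gradient_boosting": GradientBoostingClassifier(),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)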
Letting data tell the story: Python
Pros:
For machine learning, Python lets us quickly write code that is easily readable and can interface with other languages and software
Libraries like scikit-learn, as in the previous example, give access to hundreds of machine learning algorithms
Cons:
Limited to single machines
Not great for big data out of the box
So how can we fix that?
Letting data tell the story: Python + Spark
Spark is a distributed, in-memory, big data processing engine
PySpark gives us the simplicity of a Python API with the power of Spark

text_file = spark.sparkContext.textFile("hdfs://...")
counts = (text_file.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs://...")

Distributed computing in 5 lines of code!
But what if we want to use deep learning?
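(One answer lives inside Spark itself: its MLlib library provides distributed classical machine learning, though not deep learning. A minimal, hedged sketch; the table path, column names, and label are hypothetical placeholders, not from these slides:)

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("distributed-ml").getOrCreate()

# Hypothetical consumer table with numeric features and a 0/1 "purchased" label.
df = spark.read.parquet("hdfs://...")

features = VectorAssembler(inputCols=["age", "spend", "page_views"],
                           outputCol="features").transform(df)
train, test = features.randomSplit([0.8, 0.2], seed=42)

model = LogisticRegression(featuresCol="features", labelCol="purchased").fit(train)
print(model.evaluate(test).areaUnderROC)  # distributed training and evaluation

For deep learning proper, we turn to TensorFlow next.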
Letting data tell the story: Deep learning
Frameworks like TensorFlow give us the simplicity of a Python API on top of a C/C++ engine
GPU & distributed processing made easy
Letting data tell the story: Python + TensorFlow

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
import tensorflow as tf

iris = load_iris()
X = iris.data[:, (2, 3)].astype(np.float32)  # petal length, petal width
y = (iris.target == 0).astype(np.int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

tf.logging.set_verbosity(tf.logging.INFO)
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
clf = tf.contrib.learn.DNNClassifier(hidden_units=[2, 3], n_classes=2,
                                     feature_columns=feature_columns)
clf.fit(x=X_train, y=y_train, batch_size=50, steps=500)
y_pred = clf.predict(x=X_test, batch_size=50, as_iterable=False)
print(accuracy_score(y_test, y_pred))

Deep learning in <21 lines of code!
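As a hedged aside beyond the original slides: tf.contrib.learn has since been removed from TensorFlow, and roughly the same classifier in the modern tf.keras API might look like this (the layer sizes mirror hidden_units=[2, 3] above; illustrative only):

import tensorflow as tf

# Small feed-forward network: two hidden layers of 2 and 3 units.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary class probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=50, epochs=10)
print(model.evaluate(X_test, y_test))  # [loss, accuracy]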
Agenda
1. Letting data tell the story
2. Creating effective cloud infrastructure
3. Constructing an end-to-end platform
Creating effective cloud infrastructure: Why cloud?
Some workloads are CPU bound, some are CPU + GPU bound, and some are network, memory, and CPU bound
The needs of data science change all the time
The cloud lets us keep our infrastructure current and fit for the task at hand
But how can we efficiently keep track of all of our cloud databases, networks, and computers?
Creating effective cloud infrastructure: Infrastructure as code
With a tool like Terraform, the same declarative configuration style works across clouds:

Amazon Web Services:
provider "aws" { ... }
resource "aws_vpc" "default" { ... }
resource "aws_instance" "model_cpu" { ... }

Oracle Bare Metal Cloud:
provider "baremetal" { ... }
resource "baremetal_core_virtual_network" "default" { ... }
resource "baremetal_core_instance" "model_cpu" { ... }
Creating effective cloud infrastructure: Multiple environments
$ terraform apply: bring up the entire cloud infrastructure
$ terraform destroy: shut down the cloud infrastructure
The same configuration can be applied to separate test, staging, and production environments
Creating effective cloud infrastructure: How can we package different software and code?
Containers! All machine learning code can be stored in Docker containers
Resource isolation and allocation (CPU, memory) like a virtual machine
Abstract away the OS
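As a hedged sketch of what launching containerized model code might look like from Python with the Docker SDK (the image name and command are hypothetical placeholders; the original slides show no such code):

import docker  # the docker-py SDK for the Docker Engine API

client = docker.from_env()

# Run a (hypothetical) training image with capped CPU and memory,
# getting the same resource isolation a VM would give us.
logs = client.containers.run(
    image="odc/model-trainer:latest",  # hypothetical image name
    command="python train.py",         # hypothetical entry point
    mem_limit="8g",
    nano_cpus=4_000_000_000,           # 4 CPUs
    remove=True,
)
print(logs.decode())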
Creating effective cloud infrastructure: Failure recovery
Agenda
1. Letting data tell the story
2. Creating effective cloud infrastructure
3. Constructing an end-to-end platform
Constructing an end-to-end platform
Flexible: service oriented
Scalable: for data and for compute
Robust: containerized and failsafe
Creating a cloud-based machine learning platform at the petabyte scale
Lessons learned:
1. Use a scalable data infrastructure to make use of all data
2. Let the data drive machine learning decisions
3. Use the cloud to create a flexible, scalable, and robust platform
Thank you!
alex.sadovsky@oracle.com