
SVM multiclass classification in 10 steps

import numpy as np

# load digits dataset
from sklearn import datasets
digits = datasets.load_digits()

# define training set size
n_samples = len(digits.images)
n_training = int(0.9 * n_samples)

data = np.ascontiguousarray(digits.data, dtype=np.double)
labels = np.ascontiguousarray(digits.target.reshape(n_samples, 1), dtype=np.double)

from daal.data_management import HomogenNumericTable
train_data = HomogenNumericTable(data[:n_training])
train_labels = HomogenNumericTable(labels[:n_training])
test_data = HomogenNumericTable(data[n_training:])

1. load the digits dataset through the sklearn datasets module
2. DAAL expects contiguous arrays, hence np.ascontiguousarray
3. create HomogenNumericTable instances for the training data, the training labels and the test data

training algorithm setup

from daal.algorithms.svm import training as svm_training
from daal.algorithms.svm import prediction as svm_prediction
from daal.algorithms.multi_class_classifier import training as multiclass_training
from daal.algorithms.classifier import training as training_params
# RBF kernel module (import missing on the original slide; module path assumed)
from daal.algorithms.kernel_function import rbf

kernel = rbf.Batch_Float64DefaultDense()
kernel.parameter.sigma = 0.001

# Create two-class svm classifier
# training alg
twoclass_train_alg = svm_training.Batch_Float64DefaultDense()
twoclass_train_alg.parameter.kernel = kernel
twoclass_train_alg.parameter.c = 1.0
# prediction alg
twoclass_predict_alg = svm_prediction.Batch_Float64DefaultDense()
twoclass_predict_alg.parameter.kernel = kernel

# Create a multiclass classifier object (training)
train_alg = multiclass_training.Batch_Float64OneAgainstOne()
train_alg.parameter.nclasses = 10
train_alg.parameter.training = twoclass_train_alg
train_alg.parameter.prediction = twoclass_predict_alg

4. define the kernel and its parameters
5. create the two-class svm classifier (training and prediction algorithms)
6. create the multiclass svm classifier (training)

training phase

# Pass training data and labels
train_alg.input.set(training_params.data, train_data)
train_alg.input.set(training_params.labels, train_labels)
# training
model = train_alg.compute().get(training_params.model)

7. set input data and labels
8. run the training and get the model

prediction algorithm setup

from daal.algorithms.multi_class_classifier import prediction as multiclass_prediction
from daal.algorithms.classifier import prediction as prediction_params

# Create a multiclass classifier object (prediction)
predict_alg = multiclass_prediction.Batch_Float64DefaultDenseOneAgainstOne()
predict_alg.parameter.nclasses = 10
predict_alg.parameter.training = twoclass_train_alg
predict_alg.parameter.prediction = twoclass_predict_alg

9. create the multiclass svm classifier (prediction)

prediction phase

# Pass a model and input data
predict_alg.input.setmodel(prediction_params.model, model)
predict_alg.input.settable(prediction_params.data, test_data)
# Compute and return prediction results
results = predict_alg.compute().get(prediction_params.prediction)

10. set input model and data
11. run the prediction and get the predicted labels
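The predicted labels come back as a NumericTable as well. A minimal sketch (not on the original slides) of how one might pull them back into numpy and score them against the held-out labels, assuming the pydaal BlockDescriptor API:

# Sketch (assumption): read the predictions back into numpy and compute accuracy
from daal.data_management import BlockDescriptor, readOnly

block = BlockDescriptor()
results.getBlockOfRows(0, results.getNumberOfRows(), readOnly, block)
predicted = block.getArray().flatten()       # predicted class labels
results.releaseBlockOfRows(block)

true_labels = labels[n_training:].flatten()  # held-out ground truth (numpy)
accuracy = np.mean(predicted == true_labels)
print("test accuracy: {:.3f}".format(accuracy))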

Benchmark results

Test description: 1797 samples total (90% training set, 10% test set), 64 features per sample
Platform description: Intel Core i5-6300U CPU @ 2.40GHz

                    sklearn   pydaal   speedup
training time [s]   0.161     0.018    8.9x
test time [s]       0.017     0.004    4.3x
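The scikit-learn side of this comparison is presumably an SVC with an RBF kernel on the same 90/10 split; a minimal sketch (the exact kernel parameterization used for the timing is an assumption, since DAAL's sigma and scikit-learn's gamma are not defined identically):

# Sketch (assumption): scikit-learn baseline on the same 90/10 split of the digits dataset
from sklearn import datasets, svm

digits = datasets.load_digits()
n_training = int(0.9 * len(digits.images))

clf = svm.SVC(C=1.0, kernel='rbf', gamma=0.001)   # gamma chosen for illustration only
clf.fit(digits.data[:n_training], digits.target[:n_training])
accuracy = clf.score(digits.data[n_training:], digits.target[n_training:])
print("sklearn test accuracy: {:.3f}".format(accuracy))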

Advantages and disadvantages

DAAL in general:
- very simple installation/setup
- wide range of algorithms (both for machine learning and for deep learning)
- good support through the dedicated Intel forum (even for non-paid versions)
- DAAL C++ can be called from R and MATLAB (see the how-to forum posts)
- documentation is sometimes not exhaustive
- examples cover very simple application cases

as a Python user:
- comes with the Intel Python framework
- faster alternative to scikit-learn
- Python interface still in a development phase: not all neural network layer parameters are accessible/modifiable

NVIDIA Deep Learning Software

Tools and libraries for designing and deploying GPU-accelerated deep learning applications.
- Deep Learning Neural Network library (cuDNN): forward and backward convolution, pooling, normalization and activation layers;
- TensorRT: optimize, validate and deploy trained neural networks for inference to hyperscale data centers, embedded, or automotive product platforms;
- DeepStream SDK: uses TensorRT to deliver fast INT8-precision inference for real-time video content analysis; also supports FP16 and FP32 precisions;
- Linear Algebra (cuBLAS and cuBLAS-XT): accelerated BLAS subroutines for single- and multi-GPU acceleration;
- Sparse Linear Algebra (cuSPARSE): supports dense, COO, CSR, CSC, ELL/HYB and Blocked CSR sparse matrix formats, Level 1, 2, 3 routines, a sparse triangular solver and a sparse tridiagonal solver;
- Multi-GPU Communications (NCCL, pronounced "Nickel"): optimized primitives for collective multi-GPU communication.

NVIDIA-based frameworks

TensorFlow

- Google Brain's second-generation machine learning system
- computations are expressed as stateful dataflow graphs
- automatic differentiation capabilities (a minimal example follows below)
- optimization algorithms: gradient and proximal-gradient based
- code portability (CPUs, GPUs, on desktop, server, or mobile computing platforms)
- the Python interface is the preferred one (Java and C++ interfaces also exist)
- installation through: virtualenv, pip, Docker, Anaconda, from sources
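To make the first bullets concrete, here is a minimal sketch (not from the original slides) of a stateful dataflow graph with an automatically derived gradient node, in the TF 1.x style used throughout this deck:

# Sketch (assumption): a tiny dataflow graph with automatic differentiation
import tensorflow as tf

x = tf.constant(3.0)
w = tf.Variable(2.0)                 # stateful node
y = w * x * x                        # graph node: y = w * x^2
grad = tf.gradients(y, [w])[0]       # dy/dw = x^2, built automatically

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([y, grad]))       # [18.0, 9.0]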

Linear regression (I)

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Set up the data with a noisy linear relationship between X and Y.
num_examples = 50
X = np.array([np.linspace(-2, 4, num_examples),
              np.linspace(-6, 6, num_examples)])
X += np.random.randn(2, num_examples)
x, y = X
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)

losses = []
training_steps = 50
learning_rate = 0.002

[figure: scatter plot of the noisy (x, y) input data]

1. generate noisy input data
2. set slack variables and fix the algorithm parameters

Linear regression (II)

# Start of graph description
# Set up all the tensors, variables, and operations.
A = tf.constant(x_with_bias)
target = tf.constant(np.transpose([y]).astype(np.float32))
weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))

yhat = tf.matmul(A, weights)
yerror = tf.sub(yhat, target)        # tf.subtract from TensorFlow 1.0 onwards
loss = tf.nn.l2_loss(yerror)

update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(training_steps):
    # Repeatedly run the operations, updating the TensorFlow variables
    sess.run(update_weights)
    losses.append(sess.run(loss))

3. define tensorflow constants and variables
4. define nodes
5. run the graph (training loop)

Linear regression (III)

# Training is done, get the final values
betas = sess.run(weights)
yhat = sess.run(yhat)

[figures: the fitted regression line over the (x, y) data, and the loss versus training steps]
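A possible way to reproduce the two plots on the slide from the quantities computed above (a sketch, not part of the original code):

# Sketch (assumption): plot the fit and the loss history
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(x, y)                 # noisy input data
plt.plot(x, yhat, 'r')            # fitted line from the learned weights
plt.subplot(1, 2, 2)
plt.plot(range(training_steps), losses)
plt.xlabel('Training steps')
plt.ylabel('Loss')
plt.show()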

Benchmark results

MNIST dataset of handwritten digits:
- training set: 60k samples, 784 features per sample
- test set: 10k samples, 784 features per sample

Convolutional NN: two convolutional layers and two fully connected layers (plus regularization), about 3M variables:
- input image (28 x 28)
- convolutional layer with non-linearities plus subsampling -> (14 x 14 x 32)
- convolutional layer with non-linearities plus subsampling -> (7 x 7 x 64)
- fully connected layer -> (1024)
- fully connected layer -> output probabilities (10)

A sketch of this architecture is given below.
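The slides do not include the network code itself; the following is a minimal TF 1.x sketch of the architecture listed above. The layer sizes come from the slide, while kernel sizes, dropout and variable initializers are assumptions in line with the standard TensorFlow MNIST tutorial:

# Sketch (assumption): TF 1.x graph for the two-conv, two-FC network described on the slide
import tensorflow as tf

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

# conv layer 1: 5x5 kernels, 32 maps, ReLU + 2x2 max pooling -> 14x14x32
h1 = tf.nn.relu(tf.nn.conv2d(x_image, weight([5, 5, 1, 32]),
                             strides=[1, 1, 1, 1], padding='SAME') + bias([32]))
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# conv layer 2: 5x5 kernels, 64 maps, ReLU + 2x2 max pooling -> 7x7x64
h2 = tf.nn.relu(tf.nn.conv2d(p1, weight([5, 5, 32, 64]),
                             strides=[1, 1, 1, 1], padding='SAME') + bias([64]))
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# fully connected layer (1024 units) with dropout, then the 10-way output layer
flat = tf.reshape(p2, [-1, 7 * 7 * 64])
fc1 = tf.nn.relu(tf.matmul(flat, weight([7 * 7 * 64, 1024])) + bias([1024]))
keep_prob = tf.placeholder(tf.float32)
fc1_drop = tf.nn.dropout(fc1, keep_prob)
logits = tf.matmul(fc1_drop, weight([1024, 10])) + bias([10])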

Benchmark results (II)

Optimization method:
- stochastic gradient descent (batch size: 50 examples)
- fixed learning rate
- 2000 iterations

A sketch of such a training loop is given below.

Platforms:
(1) Intel Core i5-6300U CPU @ 2.4GHz
(2) 2 x Intel Xeon 2630 v3 @ 2.4GHz
(3) NVIDIA Tesla K40

                    PL (1)   PL (2)   PL (3)
training time [s]   3307.2   866.1    191.8
test time [s]       11.9     1.7      1.2

The same code achieves a 3.8x speedup when running on one GALILEO node and a 17.2x speedup on a single GPU (training phase).
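For completeness, a hedged sketch of the training loop described above, reusing x, keep_prob and logits from the CNN sketch earlier. The MNIST input pipeline, the cross-entropy loss and the learning rate value are assumptions consistent with the slide's description, not the original benchmark code:

# Sketch (assumption): SGD training loop, batch size 50, fixed learning rate, 2000 iterations
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)  # learning rate illustrative

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        batch_x, batch_y = mnist.train.next_batch(50)
        sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y, keep_prob: 0.5})
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_true: mnist.test.labels,
                                        keep_prob: 1.0}))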

Advantages and disadvantages

TensorFlow in general:
- very simple installation/setup
- plenty of tutorials, exhaustive documentation
- tools for exporting (partially) trained graphs (see MetaGraph)
- debugger, graph flow visualization

as a Python user:
- the Python API is the most complete and the easiest to use
- numpy interoperability
- lower level than pydaal (?)