Practical Deep Learning


Practical Deep Learning micha.codes / fastforwardlabs.com 1 / 70

deep learning can seem mysterious 2 / 70

let's find a way to just build a function 3 / 70

Feed Forward Layer

# X.shape == (512,)
# output.shape == (4,)
# weights.shape == (512, 4) == 2048
# biases.shape == (4,)
def feed_forward(activation, X, weights, biases):
    return activation(X @ weights + biases)

IE: f(X) = σ(X W + b)

4 / 70
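
As a quick sanity check of the layer above, here is an illustrative call with random inputs; the softmax helper and the random X, weights, and biases are assumptions for the example, not part of the slides.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

X = np.random.random(512)
weights = np.random.randn(512, 4)
biases = np.zeros(4)

probs = feed_forward(softmax, X, weights, biases)
# probs.shape == (4,) and probs.sum() == 1.0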

What's so special? 5 / 70

Composable

# Just like a Logistic Regression
result = feed_forward(
    softmax,
    X,
    outer_weights,
    outer_biases
)

6 / 70

Composable

# Just like a Logistic Regression with learned features?
result = feed_forward(
    softmax,
    feed_forward(
        tanh,
        X,
        inner_weights,
        inner_biases
    ),
    outer_weights,
    outer_biases
)

7 / 70

nonlinear 8 / 70

UNIVERSAL APPROXIMATION THEOREM neural networks can approximate arbitrary functions 9 / 70

differentiable

SGD: iteratively learn the values for the weights and biases given training data

10 / 70
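
A minimal sketch of that idea, assuming a single linear layer, mean-squared-error loss, and random toy data (all names and numbers here are illustrative, not from the slides):

import numpy as np

X = np.random.random((128, 512))          # toy inputs
y = np.random.random((128, 4))            # toy targets

weights = np.random.randn(512, 4) * 0.01
biases = np.zeros(4)
lr = 0.1

for step in range(100):
    pred = X @ weights + biases           # forward pass
    error = pred - y                      # gradient of MSE w.r.t. pred (up to a constant)
    grad_w = X.T @ error / len(X)         # backprop through the matrix multiply
    grad_b = error.mean(axis=0)
    weights -= lr * grad_w                # gradient descent step
    biases -= lr * grad_b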

11 / 70

12 / 70

13 / 70

Convolutional Layer

import numpy as np
from scipy.signal import convolve

# X.shape == (800, 600, 3)
# filters.shape == (8, 8, 3, 16)
# biases.shape == (3, 16)
# output.shape < (792, 592, 16)
def convnet(activation, X, filters, biases):
    return activation(
        np.stack([convolve(X, f) for f in filters]) + biases
    )

IE: f(X) = σ(X ∗ f + b)

14 / 70

15 / 70

Recurrent Layer

# X_sequence.shape == (None, 512)
# output.shape == (None, 4)
# W.shape == (512, 4)
# U.shape == (4, 4)
# biases.shape == (4,)
def RNN(activation, X_sequence, W, U, biases):
    output = np.zeros(biases.shape)
    for X in X_sequence:
        output = activation(X @ W + output @ U + biases)
        yield output

IE: f_t(X_t) = σ(X_t W + f_{t-1}(X_{t-1}) U + b)

16 / 70
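
An illustrative run of the generator above with random data; the shapes follow the comments, and the tanh activation and the numbers are assumptions for the example.

import numpy as np

X_sequence = np.random.random((10, 512))   # 10 timesteps of 512-dim input
W = np.random.randn(512, 4) * 0.01
U = np.random.randn(4, 4) * 0.01
biases = np.zeros(4)

outputs = list(RNN(np.tanh, X_sequence, W, U, biases))
# len(outputs) == 10, each element has shape (4,)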

GRU Layer

def GRU(activation_in, activation_out, X_sequence, W, U, biases):
    output = np.zeros(biases[0].shape)
    for X in X_sequence:
        z = activation_in(W[0] @ X + U[0] @ output + biases[0])
        r = activation_in(W[1] @ X + U[1] @ output + biases[1])
        o = activation_out(W[2] @ X + U[2] @ (r * output) + biases[2])
        output = z * output + (1 - z) * o
        yield output

17 / 70

What about theano/tensorflow/mxnet? 18 / 70

what happens here?

import numpy as np

a = np.random.random(100) - 0.5
a[a < 0] = 10

19 / 70

[Diagram: abstract syntax tree of a program — a statement sequence containing a while node (condition: compare op over variable name a, variable name b, and constant value 0; body), a branch, and a return]

20 / 70
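
For contrast with the eager numpy snippet above, here is a minimal sketch of how the same kind of conditional logic is expressed as a symbolic graph in Theano rather than executed immediately; the variable names are illustrative assumptions.

import theano
import theano.tensor as T

a = T.dvector('a')                    # symbolic vector: no data attached yet
b = T.switch(T.lt(a, 0), 10.0, a)     # builds graph nodes instead of running
f = theano.function([a], b)           # compile the graph into callable code

# f(some_numpy_array) now runs the compiled graph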

library     | widely used | auto-diff | gpu/cpu | mobile | frontend | models | multi-gpu | speed
numpy       |             |           |         |        |          |        |           | slow
theano      |             |           |         |        |          |        |           | fast
mx-net      |             |           |         |        |          |        |           | fast
tensorflow  |             |           |         |        |          |        |           | slow

21 / 70

Which should I use? 22 / 70

keras makes Deep Learning simple (http://keras.io/) 23 / 70

$ cat ~/.keras/keras.json
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}

or

$ cat ~/.keras/keras.json
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

24 / 70

(coming soon... hopefully)

$ cat ~/.keras/keras.json
{
    "image_dim_ordering": "mx",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "mxnet"
}

25 / 70

simple!

from keras.models import Sequential
from keras.layers.core import Dense

# Same as our Logistic Regression above with:
#   weights_outer.shape = (512, 4)
#   biases_outer.shape = (4,)
model_lr = Sequential()
model_lr.add(Dense(4, activation='softmax', input_shape=[512]))
model_lr.compile('sgd', 'categorical_crossentropy')
model_lr.fit(X, y)

26 / 70

extendible!

from keras.models import Sequential
from keras.layers.core import Dense

# Same as our "deep" Logistic Regression
model = Sequential()
model.add(Dense(128, activation='tanh', input_shape=[512]))
model.add(Dense(4, activation='softmax'))
model.compile('sgd', 'categorical_crossentropy')
model.fit(X, y)

27 / 70

model_lr.summary()
# __________________________________________________________________________
# Layer (type)        Output Shape    Param #    Connected to
# ==========================================================================
# dense_1 (Dense)     (None, 4)       2052       dense_input_1[0][0]
# ==========================================================================
# Total params: 2,052
# Trainable params: 2,052
# Non-trainable params: 0
# __________________________________________________________________________

model.summary()
# ___________________________________________________________________
# Layer (type)        Output Shape    Param #    Connected to
# ===================================================================
# dense_2 (Dense)     (None, 128)     65664      dense_input_2[0][0]
# ___________________________________________________________________
# dense_3 (Dense)     (None, 4)       516        dense_2[0][0]
# ===================================================================
# Total params: 66,180
# Trainable params: 66,180
# Non-trainable params: 0
# ___________________________________________________________________

28 / 70
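
The parameter counts above follow directly from the layer shapes; a quick check (my arithmetic, not from the slides):

print(512 * 4 + 4)        # 2052  -> dense_1: weights (512, 4) plus biases (4,)
print(512 * 128 + 128)    # 65664 -> dense_2: weights (512, 128) plus biases (128,)
print(128 * 4 + 4)        # 516   -> dense_3: weights (128, 4) plus biases (4,)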

let's build something 29 / 70

30 / 70

31 / 70

32 / 70

fastforwardlabs.com/luhn/ 33 / 70

34 / 70

35 / 70

36 / 70

37 / 70

38 / 70

39 / 70

40 / 70

41 / 70

42 / 70

43 / 70

def skipgram(words):
    for i in range(1, len(words)-1):
        yield words[i], (words[i-1], words[i+1])

44 / 70
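
For example (the sentence is made up for illustration):

words = "the quick brown fox".split()
print(list(skipgram(words)))
# [('quick', ('the', 'brown')), ('brown', ('quick', 'fox'))]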

45 / 70

46 / 70

from keras.models import Model
from keras.layers import Input, Embedding, merge, Lambda, Activation

vector_size = 300

word_index = Input(shape=(1,))
word_point = Input(shape=(1,))

syn0 = Embedding(len(vocab), vector_size)(word_index)
syn1 = Embedding(len(vocab), vector_size)(word_point)

merged = merge([syn0, syn1], mode='mul')
merge_sum = Lambda(lambda x: x.sum(axis=-1))(merged)
context = Activation('sigmoid')(merge_sum)

model = Model(input=[word_index, word_point], output=context)
model.compile(loss='binary_crossentropy', optimizer='adam')

47 / 70
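
A hedged sketch of how (word, context) training data for this model could be assembled from the skipgram generator above, using negative sampling: observed pairs get label 1 and randomly drawn contexts get label 0. The vocab mapping, the helper name, and the number of negatives are assumptions for illustration, not from the slides.

import random
import numpy as np

def training_pairs(words, vocab, negatives=5):
    indices, points, labels = [], [], []
    for word, (left, right) in skipgram(words):
        for context in (left, right):
            indices.append(vocab[word])
            points.append(vocab[context])
            labels.append(1)                               # observed pair
            for _ in range(negatives):
                indices.append(vocab[word])
                points.append(random.randrange(len(vocab)))
                labels.append(0)                           # sampled negative
    return np.array(indices), np.array(points), np.array(labels)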

Feed Forward vs Recurrent Network 48 / 70

49 / 70

RNN Summarization Sketch for Articles

1. Find articles with summaries heavy on quotes (http://thebrowser.com/)
2. Score every sentence in the articles based on their "closeness" to a quote (one possible scoring function is sketched below)
3. Use skip-thoughts to encode every sentence in the article
4. Train an RNN to predict these scores given the sentence vector
5. Evaluate trained model on new things!

50 / 70
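
A hedged sketch of step 2, assuming a simple token-overlap measure; the slides do not specify the scoring function (and later stress that its choice matters a lot), so this closeness helper is purely illustrative.

def closeness(sentence, quotes):
    """Jaccard overlap between a sentence and its nearest quote."""
    s = set(sentence.lower().split())
    best = 0.0
    for q in quotes:
        qset = set(q.lower().split())
        if s | qset:
            best = max(best, len(s & qset) / len(s | qset))
    return best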

keras makes RNNs simple (http://keras.io/) 51 / 70

Example: preprocess

from skipthoughts import skipthoughts
from .utils import load_data

(articles, scores), (articles_test, scores_test) = load_data()

articles_vectors = skipthoughts.encode(articles)
articles_vectors_test = skipthoughts.encode(articles_test)

52 / 70

Example: model def and training

from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense
from keras.layers.wrappers import TimeDistributed

model = Sequential()
model.add(LSTM(512, input_shape=(None, 4800), return_sequences=True,
               dropout_W=0.3, dropout_U=0.3))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_absolute_error', optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(articles_vectors, scores, validation_split=0.10)
loss, acc = model.evaluate(articles_vectors_test, scores_test)
print('test loss / test accuracy = {:.4f} / {:.4f}'.format(loss, acc))

model.save("models/new_model.h5")

53 / 70

[Diagram: pipeline from article to scores — preprocess: the article is split into sentences with list(text) (sent1 ... sent6) and each sentence is encoded with skip-thoughts into a (6, 4800) matrix; keras: the LSTM maps this to (6, 512) and a per-timestep Dense layer maps it to (6, 1) scores]

54 / 70

Example Model: evaluation

from keras.models import load_model
from flask import Flask, request
from skipthoughts import skipthoughts
import nltk

app = Flask(__name__)
model = load_model("models/new_model.h5")

@app.route('/api/evaluate', methods=['POST'])
def evaluate():
    article = request.data
    sentences = nltk.sent_tokenize(article)
    sentence_vectors = skipthoughts.encode(sentences)
    return model.predict(sentence_vectors)

55 / 70
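
Calling the endpoint might look something like this; the host, port, and file name are assumptions for illustration.

import requests

with open("article.txt", "rb") as f:          # article.txt is a hypothetical input file
    response = requests.post("http://localhost:5000/api/evaluate", data=f.read())
print(response.text)                           # per-sentence scores from the model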

56 / 70

57 / 70

Thoughts on doing this method

Scoring function used is SUPER important
Hope you have a GPU
Hyper-parameters for all!
Structure of model can change where it's applicable
SGD means random initialization... may need to fit multiple times

58 / 70

59 / 70

REGULARIZE!

dropout: only parts of the NN participate in every round
l1/l2: add penalty term for large weights
batchnormalization: unit mean/std for each batch of data

60 / 70
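
A minimal sketch of what these three look like on the earlier Keras model, assuming the Keras 1.x API used throughout these slides; the particular rates and penalty are illustrative choices.

from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l2

model = Sequential()
model.add(Dense(128, activation='tanh', input_shape=[512],
                W_regularizer=l2(0.01)))    # l2 penalty on large weights
model.add(BatchNormalization())             # unit mean/std per batch
model.add(Dropout(0.5))                     # only part of the NN participates each round
model.add(Dense(4, activation='softmax'))
model.compile('sgd', 'categorical_crossentropy')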

VALIDATE AND TEST!

lots of parameters == potential for overfitting

61 / 70

deploy? 62 / 70

63 / 70

[Diagram: deployment architecture — internet traffic reaches a front-end api (p2.xlarge) and a load balancer, which distributes requests across backend api instances, each with a gpu (p2.8xlarge), all loading weights from a shared model store]

64 / 70

careful: data drift

set monitoring on the distribution of results

65 / 70
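
One hedged sketch of such monitoring, assuming you keep a reference window of prediction scores and compare recent scores against it; the statistic and threshold are illustrative choices, not from the slides.

import numpy as np

def drift_alert(reference_scores, recent_scores, threshold=0.1):
    """Flag drift when the mean prediction moves more than `threshold`."""
    return abs(np.mean(recent_scores) - np.mean(reference_scores)) > threshold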

66 / 70

67 / 70

68 / 70

Emerging Applications

Text Generation
Audio Generation
Event Detection
Intent Identification
Decision Making

69 / 70

70 / 70