Mocha.jl Deep Learning in Julia Chiyuan Zhang (@pluskid) CSAIL, MIT

Deep Learning: learning with multi-layer (3~30 layers) neural networks on huge training sets. State-of-the-art on many AI tasks. Computer Vision: image classification, object detection, semantic segmentation, etc. Speech Recognition & Natural Language Processing: acoustic modeling, language modeling, word / sentence embeddings.

[Figure: GoogLeNet (Inception) architecture diagram.] GoogLeNet (Inception): winner of ILSVRC 2014; 27 layers, ~7 million parameters.

[Figure: ILSVRC top-5 error on ImageNet by year, 2010-2015, with human performance for reference.] Deep learning has dominated since 2012, surpassing human performance since 2015.

Deep Learning in Speech Recognition. (Image source: Li Deng and Dong Yu, Deep Learning: Methods and Applications.)

Deep Neural Networks: a network that consists of computation units (layers, or nodes) connected via a specific architecture. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Deep Learning Made Easy. A deep learning toolkit provides common layers, easy ways to define network architectures, and a transparent interface to high-performance computation backends (BLAS, GPUs, etc.). C++: Caffe (widely used in academia), dmlc/cxxnet, cuda-convnet, etc. Python: Theano (auto-differentiation) and its wrappers, NervanaSystems/neon, etc. Lua: Torch7 (Facebook). Matlab: MatConvNet (VGG). Julia: pluskid/Mocha.jl, dfdx/Boltzmann.jl.

Why Mocha.jl? Written in Julia and for Julia: easily makes use of data pre-/post-processing and visualization tools from Julia. Minimal dependencies: the pure-Julia backend is ready to run, easy for fast prototyping. Multiple backends: easily switch to the CUDA + cuDNN based backend for highly efficient deep net training. Correctness: all computation layers are unit-tested. Modular architecture: layers, activation functions, network topology, etc. are easily extendable.

Mocha.jl: a deep learning framework for (and written in) Julia; inspired by Caffe; focused on easy prototyping, customization, and efficiency (switchable computation backends).
> Pkg.add("Mocha")
or, for the latest dev version:
> Pkg.checkout("Mocha")
> Pkg.test("Mocha")

IJulia Example: image classification with a pre-trained ImageNet model.

Mini-tutorial: CNN on MNIST. MNIST: handwritten digits. Data preparation: image data in Mocha is represented as a 4D tensor, width-by-height-by-channels-by-batch; for MNIST: 28-by-28-by-1-by-64. Mocha also supports ND-tensors for general data. HDF5 files: a general format for tensor data, also supported by numpy, Matlab, etc. (see the data-preparation sketch below).
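To make the data layout concrete, here is a minimal sketch of writing MNIST-shaped data to HDF5 with the HDF5.jl package; the random arrays are placeholders for real MNIST images and labels, and the dataset names are assumed to match the blob names the data layer exposes (data and label):

using HDF5

n = 1000                                          # placeholder image count
data  = rand(Float32, 28, 28, 1, n)               # width x height x channels x count
label = convert(Array{Float32}, rand(0:9, 1, n))  # digit labels 0..9

h5open("data/train.hdf5", "w") do file
    write(file, "data", data)    # dataset name matches the :data blob
    write(file, "label", label)  # dataset name matches the :label blob
end

data/train.txt then simply lists data/train.hdf5 (one HDF5 file per line).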

Mini-tutorial: CNN on MNIST. Data layer:
data_layer = AsyncHDF5DataLayer(name="train-data", source="data/train.txt", batch_size=64, shuffle=true)
data/train.txt lists the HDF5 files for the training set; 64 images are provided in each mini-batch; the data is shuffled to improve convergence; the async data layer uses Julia's @async to pre-read data while waiting for computation on the CPU / GPU.

Convolution Layer. [Figure: LeNet-5 architecture: INPUT 32x32; C1: feature maps 6@28x28 (convolutions); S2: f. maps 6@14x14 (subsampling); C3: f. maps 16@10x10 (convolutions); S4: f. maps 16@5x5 (subsampling); C5: layer 120; F6: layer 84 (full connection); OUTPUT 10 (Gaussian connections).] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
conv_layer = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,5), bottoms=[:data], tops=[:conv])

Pooling Layer.
pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2), bottoms=[:conv], tops=[:pool])
The pooling layer operates on the output of the convolution layer. By default, MAX pooling is performed; switch to MEAN pooling by specifying pooling=Pooling.Mean().

Blobs & Net Architecture Network architecture is determined by connecting tops (output) blobs to bottoms (input) blobs with matching blob names. Layers are automatically sorted and connected as a directed acyclic graph (DAG).

Rest of the layers:
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,5), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,2), stride=(2,2), bottoms=[:conv2], tops=[:pool2])
fc1_layer = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=10, bottoms=[:ip1], tops=[:ip2])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2, :label])
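With all layers defined, the network is assembled by passing them to Net; Mocha matches tops to bottoms by blob name and sorts the layers into a DAG, as described above. A minimal sketch using the layer variables from these slides (a CPU backend is assumed here for simplicity; backend selection is shown in the GPU vs CPU demo below):

using Mocha

backend = CPUBackend()
init(backend)

# Order in the vector does not matter: layers are wired up by
# matching blob names and topologically sorted into a DAG.
net = Net("MNIST-train", backend,
          [data_layer, conv_layer, pool_layer,
           conv2_layer, pool2_layer,
           fc1_layer, fc2_layer, loss_layer])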

SGD Solver:
params = SolverParameters(max_iter=10000, regu_coef=0.0005, mom_policy=MomPolicy.Fixed(0.9), lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75), load_from=exp_dir)
solver = SGD(params)

Coffee Breaks for the solver:
setup_coffee_lounge(solver, save_into="$exp_dir/statistics.jld", every_n_iter=1000)
# report training progress every 100 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=100)
# save snapshots every 5000 iterations
add_coffee_break(solver, Snapshot(exp_dir), every_n_iter=5000)

Solver Statistics. Solver statistics are saved automatically if the coffee lounge is set up. Snapshots save the training progress periodically; training can continue from the last snapshot after an interruption (see the training sketch below).
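With the net, solver, and coffee breaks in place, training is launched with solve. A minimal sketch, where net and solver are the objects constructed above:

# Run SGD on the network. Because SolverParameters was given
# load_from=exp_dir, training automatically resumes from the most
# recent snapshot in that directory after an interruption.
solve(solver, net)

# When finished, release resources:
destroy(net)
shutdown(backend)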

Demo: GPU vs CPU. backend = use_gpu ? GPUBackend() : CPUBackend()
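A minimal sketch of the switch (use_gpu is an illustrative flag; the assumption here, following the Mocha setup of that era, is that the CUDA backend is enabled via ENV["MOCHA_USE_CUDA"] before the package is loaded):

use_gpu = true
if use_gpu
    # Enable the CUDA + cuDNN backend before `using Mocha`.
    ENV["MOCHA_USE_CUDA"] = "true"
end
using Mocha

# Everything else in the script is identical for both backends.
backend = use_gpu ? GPUBackend() : CPUBackend()
init(backend)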

Parameter Sharing. When a layer has trainable parameters (e.g. convolution and inner-product layers), those parameters are registered under the layer name and shared by layers with the same name. Use cases: a validation network during training; pre-training and fine-tuning; advanced architectures, time-delayed nodes (see the validation-network sketch below).

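As a sketch of the first use case, a test-time net can reuse the training layers so that it shares their parameters by name. AccuracyLayer, HDF5DataLayer, and ValidationPerformance are Mocha components; the variable names follow the slides above:

data_layer_test = HDF5DataLayer(name="test-data",
                                source="data/test.txt", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:ip2, :label])

# Reusing the layer names conv1, conv2, ip1, ip2 makes the test net
# share the trainable parameters of the training net.
test_net = Net("MNIST-test", backend,
               [data_layer_test, conv_layer, pool_layer,
                conv2_layer, pool2_layer,
                fc1_layer, fc2_layer, acc_layer])

# Evaluate on the test set every 1000 iterations during training.
add_coffee_break(solver, ValidationPerformance(test_net),
                 every_n_iter=1000)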

Mocha.jl is the 3rd most-starred Julia package. Contributions are very welcome!