Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics, Informatics and Intelligent Systems Institute for Artificial Intelligence and Decision Support Spitalgasse 23, 1090 Vienna, BT88.04.808 November 7, 2017

Introduction. References (available online for free): "Neural Networks and Deep Learning", Michael A. Nielsen, Determination Press, 2015 (intuition first, math after); "Deep Learning", Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016 (formal, with a fair amount of intuition, more general than Nielsen). These slides are based on DL courses: course notes "CNN for Visual Recognition" (Stanford, Spring 2017), and course notes "An Introduction to Deep Learning", Marc'Aurelio Ranzato (Facebook AI Research), DeepLearn Summer School, Bilbao, 17-21 July 2017.

Why Deep Learning? Peter Norvig's [1] recollection of Geoff Hinton's [2] talk on Boltzmann Machine [3] work (back in the 1980s): 1. Cognitive plausibility in terms of a model of the brain. 2. A model that learns from experience rather than being programmed by hand. 3. Continuous representations rather than Boolean ones, as in traditional symbolic expert systems. [1] Research Director at Google, co-author of classical texts on AI. [2] Professor at the University of Toronto, one of the pioneers of Deep Learning. [3] The Boltzmann Machine (and Probabilistic Graphical Models) is one of the theoretical foundations for generative DL models.

Neural networks and Deep Learning. Neural networks: a biologically-inspired programming paradigm that enables a computer to learn from observational data; a universal function approximation machine [4]. Deep learning: a powerful set of techniques for learning in neural networks; it harnesses GPU resources to parallelize and speed up matrix-vector computations, and gives rise to a modularized approach to learning. [4] Hornik, "Approximation capabilities of multilayer feedforward networks", Neural Networks, 1991.

Deep Learning - what's in the name? DL, roughly speaking, is a NN with many layers and many neurons in each layer; this is not true in all cases though (e.g., embeddings are often shallow). Figure 1: Simple and Deep NNs (image credit [5]). [5] https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e

Hierarchical feature learning. DL learns features automatically, and hierarchically. Figure 2: (Convolutional) Neural Network to detect a face [6]. [6] credit: Michael A. Nielsen.

Hierarchical feature learning (cont.). Learnt features can be combined. Figure 3: Further decomposition of learnt features [7]. [7] credit: Michael A. Nielsen.

Neural networks. Figure 4: 2-hidden-layer network / 4-layer network (+ input, output) [8]. A universal function approximator that maps input to output, $f : \mathbb{R}^n \to \mathbb{R}^m$. The class of functions considered to map input to output is a composition of simpler (including non-linear [9]) functions: $f(x) = o(h_2(h_1(x)))$, where e.g. $h_1$ is the non-linearity $\max(0, Wx + b)$, aka ReLU. [8] image credit M-A. Ranzato (Facebook AI Research). [9] A composition of only linear functions would be equivalent to one linear function.

Forward propagation. Figure 5: Forward pass on the network. $x \in \mathbb{R}^D$, $W_1 \in \mathbb{R}^{N_1 \times D}$, $b_1 \in \mathbb{R}^{N_1}$, $h_1 \in \mathbb{R}^{N_1}$, with $h_1 = \max(0, W_1 x + b_1)$; $W_1$ is the 1st-layer weight matrix (weights), $b_1$ the 1st-layer biases.
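A minimal numpy sketch of this forward pass (the layer sizes, random initialization, and the helper name relu are illustrative assumptions, not from the slides):

```python
import numpy as np

def relu(z):
    # element-wise max(0, z)
    return np.maximum(0, z)

# assumed toy dimensions: D = 4 input features, N1 = 3 hidden units
D, N1 = 4, 3
rng = np.random.default_rng(0)

x  = rng.standard_normal(D)         # input   x  in R^D
W1 = rng.standard_normal((N1, D))   # weights W1 in R^(N1 x D)
b1 = rng.standard_normal(N1)        # biases  b1 in R^N1

h1 = relu(W1 @ x + b1)              # h1 = max(0, W1 x + b1), h1 in R^N1
print(h1)
```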

Why non-linear layers? ReLU layers provide a piece-wise linear tiling: the number of planes (linear regions) grows exponentially with the number of hidden units. Multiple layers yield exponential savings in the number of parameters (parameter sharing). Figure 6: with ReLU the mapping is locally linear [10]. [10] Montufar et al., "On the number of linear regions of DNNs", arXiv, 2014.

How good is the network: a task-dependent loss function $V_i$. Regression: MSE (mean squared error), $V_1(y, f) = (y - f(x))^2$. Classification: variants of the Cross-Entropy loss, with class (category) index $k \in \{1, \ldots, C\}$; predicted classes as a one-hot vector, e.g. $f(x) = [0, \ldots, 1, \ldots, 0]$ with $f(x)_k = 1$, and true classes as a one-hot vector, e.g. $y = [1, 0, \ldots, 0, \ldots, 0]$ with $y_k = 0$. The probability that $x$ belongs to class $c_k$ is $p(c_k = 1 \mid x) = \frac{e^{f(x)_k}}{\sum_{j=1}^{C} e^{f(x)_j}}$, and the loss function with log-likelihoods (easier to optimize) is $V_2(y, f) = -\sum_k y_k \log p(c_k \mid x)$.
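A minimal numpy sketch of these two loss functions (the helper names and the toy scores/labels are illustrative assumptions):

```python
import numpy as np

def softmax(scores):
    # p(c_k | x) = exp(f(x)_k) / sum_j exp(f(x)_j)
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(y_onehot, scores):
    # V_2(y, f) = - sum_k y_k * log p(c_k | x)
    return -np.sum(y_onehot * np.log(softmax(scores)))

def mse(y, y_pred):
    # V_1(y, f) = (y - f(x))^2
    return (y - y_pred) ** 2

# assumed toy classification example with C = 3 classes
scores = np.array([2.0, 0.5, -1.0])   # network outputs f(x)
y_true = np.array([1.0, 0.0, 0.0])    # one-hot encoding of the true class
print(cross_entropy(y_true, scores), mse(1.2, 1.0))
```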

Optimization: finding the best $f$. Typical setup for optimization: $f$ can be parameterized with $\Theta$ ($f = \Theta x$ in the linear case); we minimize (learn) the loss function $V$ over all training examples $1 \ldots n$, plus regularizations: $\lambda_2(f)$ controls the complexity of the function (usually a norm of $f$), $\lambda_1(f, \Theta)$ controls the sparsity of the solution, where $\Theta$ are the parameters of $f$: $$f^* = \operatorname{argmin}_f \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda_2(f) + \lambda_1(f, \Theta)$$ To find $f^*$ you need to minimize a complicated function; backpropagation gives the gradients of that complicated function.
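A minimal sketch of this setup for the linear case $f(x) = \Theta x$ with an MSE loss, minimized by stochastic gradient descent; the toy data, learning rate, and number of epochs are illustrative assumptions, and the regularization terms are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression data: n = 100 examples, 5 features, f(x) = Theta x
X = rng.standard_normal((100, 5))
theta_true = rng.standard_normal(5)
y = X @ theta_true + 0.1 * rng.standard_normal(100)

theta = np.zeros(5)        # parameters Theta of f
lr = 0.01                  # learning rate (assumed)

for epoch in range(50):
    for i in rng.permutation(len(X)):     # stochastic: one example at a time
        err = X[i] @ theta - y[i]         # f(x_i) - y_i
        grad = 2.0 * err * X[i]           # gradient of (y_i - f(x_i))^2 w.r.t. Theta
        theta -= lr * grad

print(theta, theta_true)   # the learnt Theta should be close to theta_true
```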

Recap. Neural nets: a chain (composition) of non-linear operations, implementing highly non-linear functions. The forward pass computes the error between the currently learnt mapping function and the actual output. The backward pass computes gradients w.r.t. the inputs and parameters at each layer. Optimization (minimization of the loss error) is done by stochastic gradient descent (or variants of it).

Computation: speed up and parallelize with the GPU. In a nutshell, DL is all about matrix multiplication. Figure 7: Matrix-matrix multiplication [11]. When multiplying an $A \times B$ matrix by a $B \times C$ matrix, the entries of the resulting $A \times C$ matrix can be computed in parallel on the GPU, with rows of the $A \times B$ matrix and columns of the $B \times C$ matrix loaded into shared memory. [11] image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017).

Function composition and computational graph. $$f(x, y, z) = \sum_{i=1}^{n} (x_i y_i + z_i)$$ or, in vector notation, $f(x, y, z) = \sum_i (x \odot y + z)_i$, where $\odot$ is the Hadamard (element-wise) product. Decomposed into simple steps: $a = x \odot y$, $b = a + z$, $c = \sum_i b_i$, so that $f(x, y, z) = c$.

Function composition and computational graph (contd.). $f(x, y, z) = \sum_{i=1}^{n} (x_i y_i + z_i)$. Figure 8: computational graph with numpy [12]. [12] image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017).
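Since Figure 8 is not reproduced here, the following is a minimal numpy sketch of the same computational graph, with the intermediate nodes named a, b, c as in the decomposition above; the input values are arbitrary toy numbers:

```python
import numpy as np

# forward pass of f(x, y, z) = sum_i (x_i * y_i + z_i),
# decomposed into the intermediate nodes a, b, c of the graph
x = np.array([1.0, 2.0, 3.0])   # toy inputs (assumed values)
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0, 9.0])

a = x * y        # a = x (Hadamard product) y
b = a + z        # b = a + z
c = b.sum()      # c = sum_i b_i  ->  f(x, y, z)
print(c)
```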

Gradients of function composition. $$\nabla_x f = \frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right], \qquad \frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial c}\,\frac{\partial c}{\partial b}\,\frac{\partial b}{\partial a}\,\frac{\partial a}{\partial x_i}$$ Applied to $f(x, y, z) = \sum_i (x_i y_i + z_i)$ this gives $\nabla_x f = y$, $\nabla_y f = x$, $\nabla_z f = \mathbf{1}$.

Gradients of function composition (contd.). Cons of using numpy only: manual computation of gradients for every $f$; no GPU support. Figure 9: computational graph and gradients with numpy [13]. [13] image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017).
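Continuing the numpy sketch above, a hand-written backward pass for this graph might look as follows (a sketch only; automating exactly this step is what the frameworks on the next slides provide):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # same toy values as before
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0, 9.0])

# forward pass through the graph
a = x * y
b = a + z
c = b.sum()

# backward pass: gradients written out by hand via the chain rule
dc = 1.0                        # df/dc
db = dc * np.ones_like(b)       # dc/db_i = 1
da = db                         # db/da_i = 1
dx = da * y                     # da_i/dx_i = y_i  ->  grad_x f = y
dy = da * x                     # da_i/dy_i = x_i  ->  grad_y f = x
dz = db                         # db/dz_i = 1      ->  grad_z f = 1
print(dx, dy, dz)
```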

Deep Learning frameworks. Goals: 1. Easily build big computational graphs. 2. Easily compute gradients in computational graphs (automatic gradient computation). 3. Run it all efficiently on the GPU (wrapping low-level NVIDIA and linear algebra libraries, e.g., cuDNN, cuBLAS). Academia/industry open-source frameworks: Caffe (UC Berkeley), Caffe2 (Facebook), Torch (NYU/Facebook), PyTorch (Facebook), Theano (U Montreal), TensorFlow (Google). Industry (not necessarily open source) frameworks: Paddle (Baidu), CNTK (Microsoft), MXNet (Amazon), and others. High-level frameworks: Keras (Theano, TensorFlow or CNTK as backend), good for beginners.

DL frameworks comparison. Figure 10: Computational graph definition in numpy, pytorch and tensorflow [15]. [15] image credit: Course notes "CNN for Visual Recognition" (Stanford, Spring 2017).
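For illustration, the same toy graph written with pytorch autograd (a sketch, assuming pytorch is installed; the input values are the same toy numbers as in the numpy sketch):

```python
import torch

# the same graph f(x, y, z) = sum_i (x_i * y_i + z_i) in pytorch:
# gradients are computed automatically by autograd instead of by hand
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
z = torch.tensor([7.0, 8.0, 9.0], requires_grad=True)

c = (x * y + z).sum()   # forward pass builds the graph
c.backward()            # backward pass fills in the .grad fields

print(x.grad)           # equals y
print(y.grad)           # equals x
print(z.grad)           # all ones
```

The same tensors can be moved to the GPU, which addresses the second shortcoming of plain numpy.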

DL frameworks: Demo

Deep Learning for Vision. Idea: unwrap images (2D matrices) into 1D vectors, $\mathbb{R}^{200 \times 200} \to \mathbb{R}^{40000}$, and feed them into neural networks (fully connected layers). Figure 11: fully connected layer for visual recognition (image credit Ranzato, FAIR). Problem: spatial correlation is local, so this is a waste of resources, and it is not robust to transformations (scale, rotation, translation).
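A minimal numpy sketch of this idea with the sizes from the slide (the hidden-layer size of 1000 is an illustrative assumption); note how a single fully connected layer already needs 40 million weights, which is part of the waste-of-resources problem:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((200, 200))          # toy "image" in R^(200 x 200)
x = img.reshape(-1)                   # unwrapped vector in R^40000

n_hidden = 1000                       # assumed hidden-layer size
W = rng.standard_normal((n_hidden, x.size)) * 0.01
b = np.zeros(n_hidden)
h = np.maximum(0, W @ x + b)          # one fully connected ReLU layer

print(W.size)                         # 40,000,000 weights for this single layer
```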

Convolutional Layer. Weights are shared across the whole image; convolution takes advantage of stationarity (similar statistics at different locations) and of local spatial correlation. Figure 12: convolutional layer for visual recognition. Figure 13: convolutions with learnt kernels.

Multiple convolutional filters. $$h^n_j = \max\left(0, \sum_{k=1}^{K} h^{n-1}_k * w^n_{kj}\right)$$ Figure 14: multiple convolutional filters for visual recognition (image credit Ranzato, FAIR). Figure 15: one convolution layer.
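A minimal numpy sketch of such a layer, implemented as a sliding-window cross-correlation (what DL frameworks usually call "convolution"); all sizes and names are illustrative assumptions:

```python
import numpy as np

def conv_layer(h_prev, kernels, b=0.0):
    """One convolutional layer with ReLU:
    h^n_j = max(0, sum_k h^{n-1}_k * w^n_{kj}).
    h_prev: (K, H, W) input feature maps; kernels: (K, J, kh, kw)."""
    K, H, W = h_prev.shape
    _, J, kh, kw = kernels.shape
    out = np.zeros((J, H - kh + 1, W - kw + 1))
    for j in range(J):                       # for each output feature map
        for k in range(K):                   # sum contributions of input maps
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    patch = h_prev[k, r:r + kh, c:c + kw]
                    out[j, r, c] += np.sum(patch * kernels[k, j])
    return np.maximum(0, out + b)            # ReLU non-linearity

# assumed toy sizes: 2 input maps of 8x8, 3 filters of 3x3
rng = np.random.default_rng(0)
h0 = rng.standard_normal((2, 8, 8))
w1 = rng.standard_normal((2, 3, 3, 3))
print(conv_layer(h0, w1).shape)              # (3, 6, 6)
```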

Pooling layer. Pooling layer goal: spatial robustness for feature extraction. Assume our filter is an eye detector; the pooling layer makes the eye detector robust to the exact location of the eye. Figure 16: pooling layer (image credit Ranzato, FAIR).

Pooling layer (contd.). $$h^n_j(x, y) = \max_{x' \in N(x),\, y' \in N(y)} h^{n-1}_j(x', y')$$ Figure 17: pooling layer (image credit Ranzato, FAIR). By pooling (e.g., taking the max of) filter responses at different locations we gain robustness to the exact spatial location of features.
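A minimal numpy sketch of max pooling over non-overlapping 2x2 neighbourhoods (sizes and names are illustrative assumptions):

```python
import numpy as np

def max_pool(h, size=2):
    """Max pooling over non-overlapping size x size neighbourhoods:
    h^n_j(x, y) = max over the neighbourhood N(x), N(y) of h^{n-1}_j."""
    J, H, W = h.shape
    Ho, Wo = H // size, W // size
    out = np.zeros((J, Ho, Wo))
    for j in range(J):
        for r in range(Ho):
            for c in range(Wo):
                out[j, r, c] = h[j,
                                 r * size:(r + 1) * size,
                                 c * size:(c + 1) * size].max()
    return out

# assumed toy input: 3 feature maps of 6x6 (e.g., the conv_layer output above)
rng = np.random.default_rng(0)
h1 = rng.standard_normal((3, 6, 6))
print(max_pool(h1).shape)  # (3, 3, 3)
```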

ConvNets architecture. Figure 18: LeCun et al., "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 1998.

DL for vision Demo