Lecture 4: Backpropagation and Neural Networks



Administrative: Assignment 1 is due Thursday, April 20 at 11:59pm on Canvas.

Administrative: TA specialties and some project ideas are posted on Piazza.

Administrative: Google Cloud: all registered students will receive an email this week with instructions on how to redeem $100 in credits.

Where we are: we have a scores function s = f(x; W), an SVM loss on those scores, and a full loss L = data loss + regularization. We want the gradient of L with respect to W.

Optimization. [Figure: a loss landscape with a walking man illustrating gradient descent; both images CC0 1.0 public domain.]

Gradient descent
- Numerical gradient: slow :(, approximate :(, easy to write :)
- Analytic gradient: fast :), exact :), error-prone :(
In practice: derive the analytic gradient, then check your implementation against the numerical gradient (a "gradient check").
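To make that workflow concrete, here is a minimal gradient check in Python/numpy; the centered-difference step h and the example function are my choices for the sketch, not code from the slides.

    import numpy as np

    def numerical_gradient(f, x, h=1e-5):
        """Centered-difference approximation of the gradient of f at x."""
        grad = np.zeros_like(x)
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            i = it.multi_index
            old = x[i]
            x[i] = old + h
            fxph = f(x)      # f(x + h)
            x[i] = old - h
            fxmh = f(x)      # f(x - h)
            x[i] = old       # restore
            grad[i] = (fxph - fxmh) / (2 * h)
            it.iternext()
        return grad

    # Example: the analytic gradient of f(x) = ||x||^2 is 2x.
    x = np.random.randn(3, 4)
    num = numerical_gradient(lambda v: np.sum(v ** 2), x)
    ana = 2 * x
    print(np.max(np.abs(num - ana)))  # should be tiny, ~1e-9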

Computational graphs: for the linear classifier, the inputs x and W feed a multiply node that produces the scores s = Wx; the scores feed the hinge loss, which is added to the regularization term R(W) to give the total loss L.

Convolutional network (AlexNet): the same idea at scale, a computational graph from input image and weights to the loss. Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012; reproduced with permission.

Neural Turing Machine: even a far more complex model is still just a computational graph from input to loss. Figure reproduced with permission from a Twitter post by Andrej Karpathy.

Backpropagation: a simple example. Take f(x, y, z) = (x + y)z, with e.g. x = -2, y = 5, z = -4. Introduce the intermediate q = x + y, so that f = qz. We want the gradients ∂f/∂x, ∂f/∂y, ∂f/∂z.

Forward pass: q = x + y = 3, and f = qz = -12.

Backward pass: directly, ∂f/∂q = z = -4 and ∂f/∂z = q = 3. By the chain rule, ∂f/∂x = (∂f/∂q)(∂q/∂x) = -4 · 1 = -4, and likewise ∂f/∂y = (∂f/∂q)(∂q/∂y) = -4 · 1 = -4.
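The same example written out as code, a small sketch to make the chain-rule bookkeeping concrete (not code from the slides):

    # Backprop through f(x, y, z) = (x + y) * z at x = -2, y = 5, z = -4.
    x, y, z = -2.0, 5.0, -4.0

    # Forward pass: build up intermediates.
    q = x + y            # q = 3
    f = q * z            # f = -12

    # Backward pass: walk the graph in reverse, multiplying local
    # gradients by upstream gradients (the chain rule).
    df_dq = z            # local gradient of * with respect to q
    df_dz = q            # local gradient of * with respect to z
    df_dx = df_dq * 1.0  # dq/dx = 1
    df_dy = df_dq * 1.0  # dq/dy = 1

    print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0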

Zooming in on a single gate f in the graph: in the forward pass it computes its output from its inputs. In the backward pass it receives the upstream gradient of the loss with respect to its output, multiplies that by its local gradient (the derivative of its output with respect to each input), and passes the resulting gradients back to its inputs.

Another example: f(w, x) = 1 / (1 + e^-(w0·x0 + w1·x1 + w2)), evaluated at w0 = 2, x0 = -1, w1 = -3, x1 = -2, w2 = -3.

Forward pass: w0·x0 + w1·x1 + w2 = 1, so the graph computes -1 → e^-1 = 0.37 → 1.37 → 1/1.37 = 0.73.

Backward pass, gate by gate, each step computing [local gradient] x [upstream gradient]:
- 1/x gate: local gradient -1/x² = -1/1.37² = -0.53
- +1 gate: local gradient 1, so the gradient stays -0.53
- exp gate: local gradient e^-1 = 0.37, giving 0.37 x (-0.53) = -0.20
- *(-1) gate: (-1) x (-0.20) = 0.20
- add gate: [1] x [0.2] = 0.2 flows to both inputs
- multiply gates: x0 gets [2] x [0.2] = 0.4 and w0 gets [-1] x [0.2] = -0.2

Sigmoid gate: the middle of that graph is just the sigmoid function σ(x) = 1/(1 + e^-x), whose derivative has the convenient form dσ/dx = (1 - σ(x))·σ(x). So instead of backpropagating through four primitive gates, we can treat the sigmoid as a single gate: with output 0.73, the local gradient is (0.73)(1 - 0.73) = 0.2, matching the step-by-step result above.
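A sketch of how such a fused sigmoid gate might be implemented, caching the forward output so the backward pass can reuse it (class and method names are mine, not from the slides):

    import numpy as np

    class SigmoidGate:
        def forward(self, x):
            # Cache the output: the backward pass reuses it.
            self.out = 1.0 / (1.0 + np.exp(-x))
            return self.out

        def backward(self, upstream):
            # Local gradient sigma * (1 - sigma), times the upstream gradient.
            return self.out * (1.0 - self.out) * upstream

With the numbers above: forward(1.0) returns about 0.73, and backward(1.0) returns about 0.20.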

Patterns in backward flow:
- add gate: gradient distributor (passes the upstream gradient unchanged to every input)
- max gate: gradient router (routes the full upstream gradient to whichever input was the max; the rest get zero)
- mul gate: gradient switcher (each input's gradient is the upstream gradient scaled by the other input's value)
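The three patterns in code, as a minimal scalar sketch (function names assumed):

    def add_backward(upstream):
        # Distributor: every input receives the upstream gradient.
        return upstream, upstream

    def max_backward(x, y, upstream):
        # Router: the winning input gets the whole gradient.
        return (upstream, 0.0) if x > y else (0.0, upstream)

    def mul_backward(x, y, upstream):
        # Switcher: each input's gradient is scaled by the *other* input.
        return y * upstream, x * upstream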

Gradients add at branches: when a variable feeds into more than one node of the graph, the gradients flowing back along each branch are summed.
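A tiny numeric illustration (my example, not the slides'): when x feeds both inputs of a single multiply gate, the two branch gradients add.

    # f(x) = x * x: the variable x branches into both inputs of a mul gate.
    x = 3.0
    f = x * x
    # Backward: each branch contributes (other input) * upstream = x * 1.0,
    # and the branch gradients add:
    df_dx = x * 1.0 + x * 1.0   # 6.0, matching d(x^2)/dx = 2x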

Gradients for vectorized code: when x, y, z are vectors, the local gradient becomes the Jacobian matrix (the derivative of each element of z with respect to each element of x); otherwise the picture is the same, with gates multiplying upstream gradients by local Jacobians.

Vectorized operations: consider f(x) = max(0, x) applied elementwise to a 4096-d input vector, producing a 4096-d output vector.

Q: What is the size of the Jacobian matrix? [4096 x 4096!]

In practice we process an entire minibatch (e.g. 100 examples) at a time, so the Jacobian would technically be a [409,600 x 409,600] matrix.

Q2: What does it look like? The function is elementwise, so each output depends only on its corresponding input: the Jacobian is diagonal (1 where the input is positive, 0 elsewhere), and we never need to form it explicitly.
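Because that Jacobian is diagonal, the ReLU backward pass reduces to an elementwise mask rather than a giant matrix product; a sketch:

    import numpy as np

    x = np.random.randn(100, 4096)    # a minibatch of 4096-d inputs
    out = np.maximum(0, x)            # forward: elementwise ReLU

    upstream = np.random.randn(*out.shape)
    # Backward: multiply by the diagonal of the Jacobian, i.e. zero out
    # gradients wherever the input was negative. No 409,600^2 matrix needed.
    dx = upstream * (x > 0)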

A vectorized example: f(x, W) = ||W·x||² = Σ_i (W·x)_i², with x an n-vector and W an n×n matrix. Introduce the intermediate q = W·x, so that f(q) = ||q||² = q_1² + ... + q_n².

Backward pass: ∂f/∂q_i = 2q_i, so ∇_q f = 2q. For the matrix, ∂q_k/∂W_{i,j} = 1[k = i]·x_j, and the chain rule gives ∂f/∂W_{i,j} = 2q_i·x_j, i.e. ∇_W f = 2q·xᵀ. Similarly ∂q_k/∂x_i = W_{k,i}, so ∇_x f = 2Wᵀ·q.

Always check: the gradient with respect to a variable should have the same shape as the variable (here ∇_W f is n×n like W, and ∇_x f is an n-vector like x).
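The same computation in numpy with the shape check made explicit; a sketch assuming n = 4 (the slide's concrete numbers are not in the transcription):

    import numpy as np

    n = 4
    W = np.random.randn(n, n)
    x = np.random.randn(n)

    # Forward pass
    q = W.dot(x)              # intermediate, shape (n,)
    f = np.sum(q ** 2)        # scalar loss ||Wx||^2

    # Backward pass
    dq = 2.0 * q              # df/dq, shape (n,)
    dW = np.outer(dq, x)      # df/dW = 2 q x^T, shape (n, n)
    dx = W.T.dot(dq)          # df/dx = 2 W^T q, shape (n,)

    # Always check: gradients match the shapes of their variables.
    assert dW.shape == W.shape and dx.shape == x.shape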

Modularized implementation: forward / backward API. A Graph (or Net) object runs the forward pass by calling forward() on every node in topological order, and the backward pass by calling backward() on every node in reverse order (rough pseudocode on the slide). Each gate, e.g. a multiply gate z = x * y with scalar x, y, z, implements the same two-method API.
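A minimal sketch of that API for the multiply gate (the slide's actual code is not in the transcription; class and attribute names are assumptions):

    class MultiplyGate:
        def forward(self, x, y):
            # Cache the inputs: the backward pass needs them.
            self.x, self.y = x, y
            return x * y

        def backward(self, dz):
            # dz is the upstream gradient dL/dz; apply the chain rule.
            dx = self.y * dz   # dL/dx = y * dL/dz
            dy = self.x * dz   # dL/dy = x * dL/dz
            return dx, dy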

Example: Caffe layers. Caffe's layer catalogue is this same forward/backward API; for instance, the Caffe sigmoid layer's backward pass multiplies top_diff, the upstream gradient, by the local sigmoid gradient (the chain rule). Caffe is licensed under BSD 2-Clause.

In Assignment 1 (writing SVM / Softmax): stage your forward/backward computation! E.g. for the SVM, compute the margins in the forward pass and cache them for use in the backward pass.

Summary so far...
- Neural nets will be very large: it is impractical to write down the gradient formula by hand for all parameters.
- Backpropagation is the recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates.
- Implementations maintain a graph structure whose nodes implement the forward() / backward() API.
- forward: compute the result of an operation and save in memory any intermediates needed for the gradient computation.
- backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.

Next: Neural Networks

Neural networks: without the brain stuff. (Before) Linear score function: f = Wx. (Now) 2-layer Neural Network: f = W2·max(0, W1·x), or 3-layer Neural Network: f = W3·max(0, W2·max(0, W1·x)). Concretely for the 2-layer case: the input x has 3072 dimensions, the hidden layer h = max(0, W1·x) has 100 units, and the scores s = W2·h have 10 dimensions.

Full implementation of training a 2-layer Neural Network needs ~20 lines.
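The slide's code itself is not in the transcription; as a stand-in, here is a sketch in the same spirit, a 2-layer network with sigmoid activations trained by plain gradient descent on random data (all sizes and the learning rate are my choices):

    import numpy as np

    # Toy data and a 2-layer net: x -> sigmoid(x W1) -> (h W2) -> prediction
    N, D_in, H, D_out = 64, 1000, 100, 10
    x, y = np.random.randn(N, D_in), np.random.randn(N, D_out)
    w1, w2 = np.random.randn(D_in, H), np.random.randn(H, D_out)

    for t in range(2000):
        # Forward pass
        h = 1.0 / (1.0 + np.exp(-x.dot(w1)))
        y_pred = h.dot(w2)
        loss = np.square(y_pred - y).sum()
        if t % 500 == 0:
            print(t, loss)

        # Backward pass (chain rule, layer by layer)
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h.T.dot(grad_y_pred)
        grad_h = grad_y_pred.dot(w2.T)
        grad_w1 = x.T.dot(grad_h * h * (1 - h))   # sigmoid local gradient

        # Gradient descent step
        w1 -= 1e-4 * grad_w1
        w2 -= 1e-4 * grad_w2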

In Assignment 2: writing a 2-layer net.

[Figure: a biological neuron. Image by Fotis Bobolas, licensed under CC-BY 2.0.]

In a biological neuron, impulses are carried toward the cell body by the dendrites and carried away from the cell body along the axon to the presynaptic terminals (image by Felipe Perucho, licensed under CC-BY 3.0). The crude computational analogy: inputs x_i arrive along the "dendrites" scaled by synaptic weights w_i, the cell body sums Σ_i w_i·x_i + b, and an activation function (e.g. the sigmoid) turns the sum into the output "firing rate" carried along the axon.

Be very careful with your brain analogies! Biological neurons:
- come in many different types
- have dendrites that can perform complex non-linear computations
- have synapses that are not a single weight but a complex non-linear dynamical system
- may not be adequately described by a rate code
[Dendritic Computation. London and Hausser]

Activation functions: Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU.

Neural networks: architectures. A 2-layer Neural Net (i.e. 1 hidden layer) and a 3-layer Neural Net (i.e. 2 hidden layers), both built from fully-connected layers.

Example feed-forward computation of a neural network: because each layer is a matrix multiply followed by an elementwise non-linearity, we can efficiently evaluate an entire layer of neurons at once.
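A sketch of that layer-at-a-time evaluation for a 3-layer net with sigmoid activations (weight shapes and names are assumptions):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Random weights and biases for a 3-layer fully-connected net.
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
    W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
    W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

    x = np.random.randn(3, 1)         # input vector (3x1)
    h1 = sigmoid(W1.dot(x) + b1)      # first hidden layer (4x1)
    h2 = sigmoid(W2.dot(h1) + b2)     # second hidden layer (4x1)
    out = W3.dot(h2) + b3             # output neuron (1x1)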

Summary
- We arrange neurons into fully-connected layers.
- The abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies).
- Neural networks are not really neural.
- Next time: Convolutional Neural Networks