Backpropagation + Deep Learning
1 Introduction to Machine Learning. Machine Learning Department, School of Computer Science, Carnegie Mellon University. Backpropagation + Deep Learning. Matt Gormley, Lecture 13, Mar 1,
2 Reminders: Homework 5 (Neural Networks). Out: Tue, Feb 28. Due: Fri, Mar 9 at 11:59pm.
3 Q&A
4 BACKPROPAGATION
5 Background: A Recipe for Machine Learning
1. Given training data.
2. Choose each of these: a decision function and a loss function.
3. Define the goal.
4. Train with SGD (take small steps opposite the gradient).
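The slide's formulas did not survive extraction. As a reminder of how these four steps are usually written, here is one standard formulation; the notation (training pairs $(x^{(i)}, y^{(i)})$, decision function $f_\theta$, loss $\ell$, step size $\eta_t$) is assumed here rather than recovered from the slide:

```latex
\begin{aligned}
&\text{1. Training data: } \mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N} \\
&\text{2. Decision function: } \hat{y} = f_{\theta}(x^{(i)}), \qquad \text{Loss function: } \ell(\hat{y}, y^{(i)}) \in \mathbb{R} \\
&\text{3. Goal: } \theta^{*} = \arg\min_{\theta} \sum_{i=1}^{N} \ell\big(f_{\theta}(x^{(i)}), y^{(i)}\big) \\
&\text{4. SGD step: } \theta^{(t+1)} = \theta^{(t)} - \eta_t \nabla_{\theta}\, \ell\big(f_{\theta}(x^{(i)}), y^{(i)}\big)
\end{aligned}
```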
6 Approaches to Differentiation
Question 1: When can we compute the gradients of the parameters of an arbitrary neural network?
Question 2: When can we make the gradient computation efficient?
7 Approaches to Differentiation
1. Finite Difference Method
   - Pro: Great for testing implementations of backpropagation
   - Con: Slow for high-dimensional inputs / outputs
   - Required: Ability to call the function f(x) on any input x
2. Symbolic Differentiation
   - Note: The method you learned in high school
   - Note: Used by Mathematica / Wolfram Alpha / Maple
   - Pro: Yields easily interpretable derivatives
   - Con: Leads to exponential computation time if not carefully implemented
   - Required: Mathematical expression that defines f(x)
3. Automatic Differentiation, Reverse Mode
   - Note: Called backpropagation when applied to neural nets
   - Pro: Computes partial derivatives of one output f(x)_i with respect to all inputs x_j in time proportional to the computation of f(x)
   - Con: Slow for high-dimensional outputs (e.g. vector-valued functions)
   - Required: Algorithm for computing f(x)
4. Automatic Differentiation, Forward Mode
   - Note: Easy to implement. Uses dual numbers.
   - Pro: Computes partial derivatives of all outputs f(x)_i with respect to one input x_j in time proportional to the computation of f(x)
   - Con: Slow for high-dimensional inputs (e.g. vector-valued x)
   - Required: Algorithm for computing f(x)
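Forward mode is short enough to sketch directly. The following is a minimal illustration, assuming a hypothetical `Dual` class that supports only `+`, `*`, and `sin`, and a made-up example function $f(x, z) = xz + \sin(x)$. Each forward pass seeds one input with derivative 1 and all others with 0, so it returns the partial derivative with respect to that one input:

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number val + dot * eps with eps**2 = 0; `dot` carries the derivative."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps: the product rule
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(d):
    """sin(a + a' eps) = sin(a) + cos(a) a' eps"""
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)


def f(x, z):
    """Made-up example function f(x, z) = x*z + sin(x)."""
    return x * z + sin(x)


def partial(func, args, j):
    """One forward pass with input j seeded to 1 (all others 0) yields d func / d args[j]."""
    duals = [Dual(a, 1.0 if k == j else 0.0) for k, a in enumerate(args)]
    return func(*duals).dot


print(partial(f, [2.0, 3.0], 0))  # z + cos(x) = 3 + cos(2)
print(partial(f, [2.0, 3.0], 1))  # x = 2.0
```

Getting both partial derivatives takes one forward pass per input, which is exactly the cost listed as the Con for forward mode.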
8 Finite Difference Method. Notes: In practice, it suffers from floating-point precision issues. It is typically only appropriate for small examples, with an appropriately chosen epsilon.
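A sketch of the kind of gradient check this method is good for, assuming centered differences $\frac{f(x + \epsilon e_j) - f(x - \epsilon e_j)}{2\epsilon}$ and an illustrative relative tolerance (both choices made here, not taken from the slide). It needs $2n$ evaluations of $f$ for an $n$-dimensional input, and making $\epsilon$ very small worsens the floating-point cancellation in the numerator:

```python
import math


def finite_difference_grad(f, x, eps=1e-5):
    """Centered-difference estimate of the gradient of f at x (x is a list of floats).

    Costs 2 * len(x) evaluations of f, hence slow for high-dimensional inputs.
    """
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def check_gradient(f, grad_f, x, eps=1e-5, tol=1e-6):
    """True if an analytic gradient (e.g. from backprop) matches finite differences."""
    numeric = finite_difference_grad(f, x, eps)
    analytic = grad_f(x)
    return all(abs(a - n) <= tol * max(1.0, abs(a), abs(n))
               for a, n in zip(analytic, numeric))


def f(x):
    return x[0] ** 2 * x[1] + math.sin(x[1])


def grad_f(x):
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]


def buggy_grad_f(x):
    return [2 * x[0] * x[1], x[0] ** 2]  # forgot the cos(x1) term


print(check_gradient(f, grad_f, [2.0, 3.0]))        # True
print(check_gradient(f, buggy_grad_f, [2.0, 3.0]))  # False
```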
9 Symbolic Differentiation. Differentiation Quiz #1: Suppose x = 2 and z = 3. What are dy/dx and dy/dz for the function below?
10 Symbolic Differentiation. Differentiation Quiz #2:
11 Chain Rule (Whiteboard): Chain Rule of Calculus
12 Chain Rule
Given: $y = g(u)$ and $u = h(x)$.
Chain Rule: $\frac{dy_i}{dx_k} = \sum_{j=1}^{J} \frac{dy_i}{du_j} \frac{du_j}{dx_k}, \quad \forall i, k$
13 Backpropagation is just repeated application of the chain rule from calculus.
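To make the sum over $j$ concrete: the chain rule says the Jacobian of $y$ with respect to $x$ is the matrix product of the two Jacobians. A small check with hypothetical functions $h(x) = (x_1 x_2,\ x_1 + x_2)$ and $g(u) = (u_1 + u_2,\ u_1 u_2)$, chosen only for illustration:

```python
def h(x):
    """u = h(x) = (x1 * x2, x1 + x2) (hypothetical example)."""
    return [x[0] * x[1], x[0] + x[1]]


def jacobian_h(x):
    """Entry [j][k] is du_j/dx_k."""
    return [[x[1], x[0]], [1.0, 1.0]]


def g(u):
    """y = g(u) = (u1 + u2, u1 * u2) (hypothetical example)."""
    return [u[0] + u[1], u[0] * u[1]]


def jacobian_g(u):
    """Entry [i][j] is dy_i/du_j."""
    return [[1.0, 1.0], [u[1], u[0]]]


def chain_rule(dy_du, du_dx):
    """dy_i/dx_k = sum_j dy_i/du_j * du_j/dx_k, for all i, k."""
    n_y, n_u, n_x = len(dy_du), len(du_dx), len(du_dx[0])
    return [[sum(dy_du[i][j] * du_dx[j][k] for j in range(n_u)) for k in range(n_x)]
            for i in range(n_y)]


x = [2.0, 3.0]
print(chain_rule(jacobian_g(h(x)), jacobian_h(x)))  # [[4.0, 3.0], [21.0, 16.0]]
```

This matches differentiating $y_1 = x_1 x_2 + x_1 + x_2$ and $y_2 = x_1^2 x_2 + x_1 x_2^2$ directly at $x = (2, 3)$.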
14 Backpropagation (Whiteboard). Example: Backpropagation for Chain Rule #1. Differentiation Quiz #1: Suppose x = 2 and z = 3. What are dy/dx and dy/dz for the function below?
15 Backpropagation: Automatic Differentiation, Reverse Mode (aka. Backpropagation)
Forward Computation
1. Write an algorithm for evaluating the function $y = f(x)$. The algorithm defines a directed acyclic graph, where each variable is a node (i.e. the "computation graph").
2. Visit each node in topological order. For variable $u_i$ with inputs $v_1, \ldots, v_N$:
   a. Compute $u_i = g_i(v_1, \ldots, v_N)$
   b. Store the result at the node
Backward Computation
1. Initialize all partial derivatives $dy/du_j$ to 0 and $dy/dy = 1$.
2. Visit each node in reverse topological order. For variable $u_i = g_i(v_1, \ldots, v_N)$:
   a. We already know $dy/du_i$
   b. Increment $dy/dv_j$ by $(dy/du_i)(du_i/dv_j)$ (the choice of algorithm ensures computing $du_i/dv_j$ is easy)
Return partial derivatives $dy/du_i$ for all variables.
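The forward and backward computation above can be sketched as a tiny reverse-mode engine. This is an illustrative toy, not any library's API: the `Node` class, its four operations, and the depth-first topological sort are assumptions made for the example. Each node stores its value and the local derivatives $du_i/dv_j$ during the forward pass; `backward` then visits nodes in reverse topological order and increments each input's gradient:

```python
import math


class Node:
    """A variable u_i in the computation graph (toy implementation)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (input node v_j, local derivative du_i/dv_j)
        self.grad = 0.0         # dy/du_i, filled in by backward()


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def sin(a):
    return Node(math.sin(a.value), ((a, math.cos(a.value)),))


def cos(a):
    return Node(math.cos(a.value), ((a, -math.sin(a.value)),))


def backward(y):
    """Reverse-mode pass: increment dy/dv_j by (dy/du_i)(du_i/dv_j)."""
    order, seen = [], set()

    def visit(node):  # depth-first search: inputs are appended before their consumers
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(y)
    for node in order:
        node.grad = 0.0   # initialize all dy/du_j to 0
    y.grad = 1.0          # dy/dy = 1
    for node in reversed(order):  # reverse topological order
        for parent, local in node.parents:
            parent.grad += node.grad * local


# Usage on the simple example J = cos(sin(x^2) + 3x^2) at x = 2
x = Node(2.0)
t = mul(x, x)
J = cos(add(sin(t), mul(Node(3.0), t)))
backward(J)
print(x.grad)  # equals -sin(u) * (cos(t) + 3) * 2x with t = 4, u = sin(4) + 12
```

Because `mul(x, x)` lists `x` twice as an input, the increment rule adds both contributions, which produces the $2x$ factor.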
16 Backpropagation, Simple Example: The goal is to compute $J = \cos(\sin(x^2) + 3x^2)$ on the forward pass and the derivative $\frac{dJ}{dx}$ on the backward pass.
Forward:
$J = \cos(u)$
$u = u_1 + u_2$
$u_1 = \sin(t)$
$u_2 = 3t$
$t = x^2$
17 Backpropagation, Simple Example (continued): $J = \cos(\sin(x^2) + 3x^2)$.
Forward: $J = \cos(u)$, $u = u_1 + u_2$, $u_1 = \sin(t)$, $u_2 = 3t$, $t = x^2$
Backward:
$\frac{dJ}{du} = -\sin(u)$
$\frac{dJ}{du_1} = \frac{dJ}{du}\frac{du}{du_1}$, $\frac{du}{du_1} = 1$
$\frac{dJ}{du_2} = \frac{dJ}{du}\frac{du}{du_2}$, $\frac{du}{du_2} = 1$
$\frac{dJ}{dt} = \frac{dJ}{du_1}\frac{du_1}{dt} + \frac{dJ}{du_2}\frac{du_2}{dt}$, $\frac{du_1}{dt} = \cos(t)$, $\frac{du_2}{dt} = 3$
$\frac{dJ}{dx} = \frac{dJ}{dt}\frac{dt}{dx}$, $\frac{dt}{dx} = 2x$
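The same forward and backward pass written out one variable per line, mirroring the slide. The finite-difference comparison at the end is an added sanity check, with the evaluation point and step size $10^{-6}$ chosen arbitrarily:

```python
import math


def forward_backward(x):
    """Forward and backward pass for J = cos(sin(x^2) + 3x^2), following the slide."""
    # Forward pass: compute and store every intermediate value
    t = x ** 2
    u1 = math.sin(t)
    u2 = 3 * t
    u = u1 + u2
    J = math.cos(u)
    # Backward pass: reverse order, reusing the stored values
    dJ_du = -math.sin(u)
    dJ_du1 = dJ_du * 1.0                           # du/du1 = 1
    dJ_du2 = dJ_du * 1.0                           # du/du2 = 1
    dJ_dt = dJ_du1 * math.cos(t) + dJ_du2 * 3.0    # du1/dt = cos(t), du2/dt = 3
    dJ_dx = dJ_dt * 2 * x                          # dt/dx = 2x
    return J, dJ_dx


J, dJ_dx = forward_backward(1.5)
eps = 1e-6
numeric = (forward_backward(1.5 + eps)[0] - forward_backward(1.5 - eps)[0]) / (2 * eps)
print(abs(dJ_dx - numeric) < 1e-6)  # True
```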
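18 Backpropagation, Case 1: Logistic Regression
(Figure: input units connected to a single output unit by weights θ_1, θ_2, θ_3, ..., θ_M.)
Forward:
$J = y^* \log(y) + (1 - y^*) \log(1 - y)$
$y = \frac{1}{1 + \exp(-a)}$
$a = \sum_{j=0}^{D} \theta_j x_j$
Backward:
$\frac{dJ}{dy} = \frac{y^*}{y} + \frac{(1 - y^*)}{y - 1}$
$\frac{dJ}{da} = \frac{dJ}{dy}\frac{dy}{da}$, $\frac{dy}{da} = \frac{\exp(-a)}{(\exp(-a) + 1)^2}$
$\frac{dJ}{d\theta_j} = \frac{dJ}{da}\frac{da}{d\theta_j}$, $\frac{da}{d\theta_j} = x_j$
$\frac{dJ}{dx_j} = \frac{dJ}{da}\frac{da}{dx_j}$, $\frac{da}{dx_j} = \theta_j$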
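A direct transcription of these equations for one training example, as a minimal sketch in plain Python. The function name, list-based inputs, and example values are illustrative choices; as on the slide, the sum runs over every entry of `x`, so a constant `x[0] = 1` can play the role of a bias input. Multiplying out the backward pass gives $\frac{dJ}{da} = y^* - y$, which the last line checks. Note that $J$ as written is the log-likelihood of the label, so it is the quantity to increase; its negative is the usual cross-entropy loss.

```python
import math


def logistic_regression_grads(theta, x, y_star):
    """Forward and backward pass of Case 1 for a single example (x, y_star).

    Returns J, dJ/dtheta, dJ/dx, and the prediction y.
    """
    # Forward
    a = sum(theta_j * x_j for theta_j, x_j in zip(theta, x))
    y = 1.0 / (1.0 + math.exp(-a))
    J = y_star * math.log(y) + (1 - y_star) * math.log(1 - y)
    # Backward
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dy_da = math.exp(-a) / (math.exp(-a) + 1) ** 2
    dJ_da = dJ_dy * dy_da
    dJ_dtheta = [dJ_da * x_j for x_j in x]          # da/dtheta_j = x_j
    dJ_dx = [dJ_da * theta_j for theta_j in theta]  # da/dx_j = theta_j
    return J, dJ_dtheta, dJ_dx, y


theta, x = [0.5, -1.0, 2.0], [1.0, 0.3, -0.4]   # illustrative values; x[0] = 1 acts as a bias input
J, dJ_dtheta, dJ_dx, y = logistic_regression_grads(theta, x, y_star=1)
print(all(abs(g - (1 - y) * x_j) < 1e-12 for g, x_j in zip(dJ_dtheta, x)))  # True: dJ/da = y* - y
```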
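19 Backpropagation: a one-hidden-layer network (Figure: Input, Hidden Layer, Output)
(E) Output (sigmoid): $y = \frac{1}{1 + \exp(-b)}$
(D) Output (linear): $b = \sum_{j=0}^{D} \beta_j z_j$
(C) Hidden (sigmoid): $z_j = \frac{1}{1 + \exp(-a_j)}, \ \forall j$
(B) Hidden (linear): $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i, \ \forall j$
(A) Input: Given $x_i, \ \forall i$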
20 Backpropagation: the same network with a loss on top.
(F) Loss: $J = \frac{1}{2}(y - y^*)^2$
(E) through (A): as on slide 19.
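21 Backpropagation, Case 2: Neural Network
Forward:
$J = y^* \log(y) + (1 - y^*) \log(1 - y)$
$y = \frac{1}{1 + \exp(-b)}$
$b = \sum_{j=0}^{D} \beta_j z_j$
$z_j = \frac{1}{1 + \exp(-a_j)}$
$a_j = \sum_{i=0}^{M} \alpha_{ji} x_i$
Backward:
$\frac{dJ}{dy} = \frac{y^*}{y} + \frac{(1 - y^*)}{y - 1}$
$\frac{dJ}{db} = \frac{dJ}{dy}\frac{dy}{db}$, $\frac{dy}{db} = \frac{\exp(-b)}{(\exp(-b) + 1)^2}$
$\frac{dJ}{d\beta_j} = \frac{dJ}{db}\frac{db}{d\beta_j}$, $\frac{db}{d\beta_j} = z_j$
$\frac{dJ}{dz_j} = \frac{dJ}{db}\frac{db}{dz_j}$, $\frac{db}{dz_j} = \beta_j$
$\frac{dJ}{da_j} = \frac{dJ}{dz_j}\frac{dz_j}{da_j}$, $\frac{dz_j}{da_j} = \frac{\exp(-a_j)}{(\exp(-a_j) + 1)^2}$
$\frac{dJ}{d\alpha_{ji}} = \frac{dJ}{da_j}\frac{da_j}{d\alpha_{ji}}$, $\frac{da_j}{d\alpha_{ji}} = x_i$
$\frac{dJ}{dx_i} = \sum_{j=0}^{D} \frac{dJ}{da_j}\frac{da_j}{dx_i}$, $\frac{da_j}{dx_i} = \alpha_{ji}$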
22 Backpropagation, Case 2: Neural Network (the same forward and backward equations as slide 21, with the forward layers labeled from top to bottom: Loss, Sigmoid, Linear, Sigmoid, Linear).
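The Case 2 equations can be transcribed the same way. A minimal sketch under these assumptions: plain Python lists, weights stored as `alpha[j][i]` for $\alpha_{ji}$ and `beta[j]` for $\beta_j$ (illustrative names), no separate bias terms (a constant input can stand in for one), and a finite-difference check on one weight with made-up values:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def nn_forward_backward(alpha, beta, x, y_star):
    """Case 2 for one example: a_j = sum_i alpha[j][i] x_i, z_j = sigmoid(a_j),
    b = sum_j beta[j] z_j, y = sigmoid(b), J = y* log y + (1 - y*) log(1 - y).
    Returns J, dJ/dalpha, dJ/dbeta, dJ/dx."""
    n_hidden, n_in = len(alpha), len(x)
    # Forward pass: store intermediates for reuse in the backward pass
    a = [sum(alpha[j][i] * x[i] for i in range(n_in)) for j in range(n_hidden)]
    z = [sigmoid(a_j) for a_j in a]
    b = sum(beta[j] * z[j] for j in range(n_hidden))
    y = sigmoid(b)
    J = y_star * math.log(y) + (1 - y_star) * math.log(1 - y)
    # Backward pass, in reverse order of the forward pass
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dJ_db = dJ_dy * math.exp(-b) / (math.exp(-b) + 1) ** 2
    dJ_dbeta = [dJ_db * z[j] for j in range(n_hidden)]   # db/dbeta_j = z_j
    dJ_dz = [dJ_db * beta[j] for j in range(n_hidden)]   # db/dz_j = beta_j
    dJ_da = [dJ_dz[j] * math.exp(-a[j]) / (math.exp(-a[j]) + 1) ** 2
             for j in range(n_hidden)]                     # times dz_j/da_j
    dJ_dalpha = [[dJ_da[j] * x[i] for i in range(n_in)]   # da_j/dalpha_ji = x_i
                 for j in range(n_hidden)]
    dJ_dx = [sum(dJ_da[j] * alpha[j][i] for j in range(n_hidden))  # da_j/dx_i = alpha_ji
             for i in range(n_in)]
    return J, dJ_dalpha, dJ_dbeta, dJ_dx


# Usage: compare one entry with a centered finite difference (illustrative values)
alpha = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
beta = [0.7, -0.8]
x = [1.0, 0.5, -1.5]
J, dJ_dalpha, dJ_dbeta, dJ_dx = nn_forward_backward(alpha, beta, x, y_star=1)
eps = 1e-6
J_plus = nn_forward_backward(alpha, [beta[0] + eps, beta[1]], x, 1)[0]
J_minus = nn_forward_backward(alpha, [beta[0] - eps, beta[1]], x, 1)[0]
print(abs((J_plus - J_minus) / (2 * eps) - dJ_dbeta[0]) < 1e-6)  # True
```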
23 Backpropagation (Whiteboard): SGD for Neural Network. Example: Backpropagation for Neural Network
24 Backpropagation (Auto. Diff., Reverse Mode): the same forward and backward computation as on slide 15. In forward step 2, visit each node in topological order, compute the corresponding variable's value, and store the result at the node.
25 Background: A Recipe for Machine Learning (revisited)
1. Given training data.
2. Choose each of these: a decision function and a loss function.
3. Define the goal.
4. Train with SGD (take small steps opposite the gradient).
Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!
26 Summary
1. Neural networks
   - provide a way of learning features
   - are highly nonlinear prediction functions
   - (can be) a highly parallel network of logistic regression classifiers
   - discover useful hidden representations of the input
2. Backpropagation
   - provides an efficient way to compute gradients
   - is a special case of reverse-mode automatic differentiation
27 DEEP NETS
28 Background: A Recipe for Machine Learning (1. Given training data; 2. Choose a decision function and a loss function; 3. Define the goal; 4. Train with SGD, taking small steps opposite the gradient).
Goals for Today's Lecture:
1. Explore a new class of decision functions (Deep Neural Networks)
2. Consider variants of this recipe for training
29 Idea #1: No pre-training. Idea #1 (just like a shallow network): Compute the supervised gradient by backpropagation. Take small steps in the direction of the gradient (SGD).
30 Comparison on MNIST. (Figure: results from Bengio et al. (2006) on the MNIST digit classification task, percent error, lower is better, for a shallow net; Idea #1, a deep net with no pre-training; Idea #2, a deep net with supervised pre-training; and Idea #3, a deep net with unsupervised pre-training.)
32 Idea #1: No pre-training (continued). What goes wrong?
A. Gets stuck in local optima
   - Nonconvex objective
   - Usually start at a random (bad) point in parameter space
B. Gradient is progressively getting more dilute
   - Vanishing gradients
33 Problem A: Nonconvexity. Where does the nonconvexity come from? Even a simple quadratic objective, z = xy, is nonconvex. (Figure: surface plot of z over x and y.)
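One way to see the nonconvexity numerically: a convex function never rises above the chord between two points, and $z = xy$ does. Its Hessian $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has eigenvalues $+1$ and $-1$, so the surface is a saddle, neither convex nor concave. The helper below is a hypothetical illustration that tests the midpoint inequality at two chosen pairs of points:

```python
def z(x, y):
    return x * y


def violates_convexity(f, p, q, lam=0.5):
    """True if f at lam*p + (1 - lam)*q lies above the chord lam*f(p) + (1 - lam)*f(q).
    A convex function can never do this, so True proves f is not convex."""
    point = [lam * p_i + (1 - lam) * q_i for p_i, q_i in zip(p, q)]
    return f(*point) > lam * f(*p) + (1 - lam) * f(*q)


# z(1, -1) = z(-1, 1) = -1, but z(0, 0) = 0 lies above that chord: z is not convex
print(violates_convexity(z, (1.0, -1.0), (-1.0, 1.0)))                       # True
# -z fails the same test along the line x = y, so z is not concave either
print(violates_convexity(lambda a, b: -z(a, b), (1.0, 1.0), (-1.0, -1.0)))  # True
```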
35 Problem A: Nonconvexity. Stochastic Gradient Descent climbs to the top of the nearest hill, which might not lead to the top of the mountain.
40 Problem B: Vanishing Gradients. The gradient for an edge at the base of the network depends on the gradients of many edges above it. The chain rule multiplies many of these partial derivatives together. (Figure: Input, Hidden Layer, Hidden Layer, Hidden Layer, Output.)
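A small numerical illustration of the multiplication effect, under simplifying assumptions chosen here: a chain of scalar sigmoid units with every weight equal to 1. Each factor in the chain-rule product is $w_k \sigma'(a_k)$, and since $\sigma'(a) = \sigma(a)(1 - \sigma(a)) \le 1/4$, the gradient reaching the input shrinks at least as fast as $4^{-n}$ with depth $n$:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def input_gradient(x, weights):
    """Chain of scalar units h_k = sigmoid(w_k * h_{k-1}) with h_0 = x.

    Returns d h_n / d x, computed by the chain rule as the product over layers of
    w_k * sigmoid'(w_k * h_{k-1}), where sigmoid'(a) = sigmoid(a) * (1 - sigmoid(a)).
    """
    h, grad = x, 1.0
    for w in weights:
        h = sigmoid(w * h)
        grad *= w * h * (1.0 - h)
    return grad


for depth in (1, 5, 10, 20):
    # With unit weights every factor is at most 1/4, so grad <= 0.25 ** depth
    print(depth, input_gradient(0.5, [1.0] * depth), 0.25 ** depth)
```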
44 Idea #2: Supervised Pre-training. Idea #2 (two steps): Train each level of the model in a greedy way, then use our original idea.
1. Supervised pre-training
   - Use labeled data
   - Work bottom-up: train hidden layer 1, then fix its parameters; train hidden layer 2, then fix its parameters; ...; train hidden layer n, then fix its parameters.
2. Supervised fine-tuning
   - Use labeled data to train following Idea #1
   - Refine the features by backpropagation so that they become tuned to the end-task
45 Idea #2: Supervised Pre-training (figure sequence). The network is grown one hidden layer at a time: first Input, Hidden Layer 1, Output; then Hidden Layers 1 and 2 below the Output; then Hidden Layers 1, 2, and 3 below the Output.
51 Idea #3: Unsupervised Pre-training. Idea #3 (two steps): Use our original idea, but pick a better starting point. Train each level of the model in a greedy way.
1. Unsupervised pre-training
   - Use unlabeled data
   - Work bottom-up: train hidden layer 1, then fix its parameters; train hidden layer 2, then fix its parameters; ...; train hidden layer n, then fix its parameters.
2. Supervised fine-tuning
   - Use labeled data to train following Idea #1
   - Refine the features by backpropagation so that they become tuned to the end-task
52 Unsupervised pretraining of the first layer: What should it predict? What else do we observe? The input! The solution: unsupervised pretraining, in which the Output layer is replaced by a copy of the Input (Input, Hidden Layer, Input). This topology defines an auto-encoder.
54 Auto-Encoders
Key idea: Encourage z to give small reconstruction error, where x' is the reconstruction of x:
Loss $= \lVert x - \mathrm{DECODER}(\mathrm{ENCODER}(x)) \rVert^2$
Train with the same backpropagation algorithm used for 2-layer neural networks, with $x_m$ as both input and output.
ENCODER: $z = h(Wx)$
DECODER: $x' = h(W'z)$
(Figure: Input, Hidden Layer, Input.) Slide adapted from Raman Arora.
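The recipe on this slide can be sketched end to end. Assumptions made here, not on the slide: $h$ is the logistic sigmoid, there are no bias terms (matching $z = h(Wx)$ as written), the decoder weights $W'$ are stored in a separate matrix named `W2`, and training is plain per-example SGD with an illustrative learning rate, epoch count, and random seed. The backward pass is the same two-layer backpropagation as before, with the input $x$ doubling as the target:

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def autoencoder_loss_and_grads(W, W2, x):
    """Encoder z = h(W x), decoder x_hat = h(W2 z), h = sigmoid,
    loss = ||x - x_hat||^2. Returns (loss, dL/dW, dL/dW2)."""
    # Forward
    a = [sum(W[j][i] * x[i] for i in range(len(x))) for j in range(len(W))]
    z = [sigmoid(a_j) for a_j in a]
    c = [sum(W2[k][j] * z[j] for j in range(len(z))) for k in range(len(W2))]
    x_hat = [sigmoid(c_k) for c_k in c]
    loss = sum((x_k - xh_k) ** 2 for x_k, xh_k in zip(x, x_hat))
    # Backward
    dL_dc = [-2.0 * (x[k] - x_hat[k]) * x_hat[k] * (1.0 - x_hat[k]) for k in range(len(x))]
    dL_dW2 = [[dL_dc[k] * z[j] for j in range(len(z))] for k in range(len(x))]
    dL_dz = [sum(dL_dc[k] * W2[k][j] for k in range(len(x))) for j in range(len(z))]
    dL_da = [dL_dz[j] * z[j] * (1.0 - z[j]) for j in range(len(z))]
    dL_dW = [[dL_da[j] * x[i] for i in range(len(x))] for j in range(len(z))]
    return loss, dL_dW, dL_dW2


def train_autoencoder(data, n_hidden, lr=0.5, epochs=200, seed=0):
    """Plain SGD over unlabeled examples; each x is both the input and the target."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            _, gW, gW2 = autoencoder_loss_and_grads(W, W2, x)
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * gW[j][i]
            for k in range(n_in):
                for j in range(n_hidden):
                    W2[k][j] -= lr * gW2[k][j]
    return W, W2


data = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
W, W2 = train_autoencoder(data, n_hidden=2)
print(sum(autoencoder_loss_and_grads(W, W2, x)[0] for x in data))
```

With these settings the total reconstruction loss should end up below that of the random initialization (which `epochs=0` returns); how far it drops depends on the hyperparameters.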
55 The solution: Unsupervised pretraining. Work bottom-up: train hidden layer 1, then fix its parameters; train hidden layer 2, then fix its parameters; ...; train hidden layer n, then fix its parameters. (Figure sequence: first Input, Hidden Layer, Input; then a second Hidden Layer stacked on top; then a third.)
58 The solution: unsupervised pretraining (work bottom-up, training each hidden layer and then fixing its parameters), followed by supervised fine-tuning: backprop and update all parameters. (Figure: Input, Hidden Layer, Hidden Layer, Hidden Layer, Output.)
59 Deep Network Training
Idea #1:
1. Supervised fine-tuning only
Idea #2:
1. Supervised layer-wise pre-training
2. Supervised fine-tuning
Idea #3:
1. Unsupervised layer-wise pre-training
2. Supervised fine-tuning
62 Is layer-wise pre-training always necessary? In 2010, a record on a handwriting recognition task was set by standard supervised backpropagation (our Idea #1). How? A very fast implementation on GPUs. See Ciresan et al. (2010).
63 Deep Learning
Goal: learn features at different levels of abstraction.
Training can be tricky due to:
- Nonconvexity
- Vanishing gradients
Unsupervised layer-wise pre-training can help with both!
5-38 Fall 09, Homework 5 Due: Wednesday, November 8th, beginning of the class You can work in a group of up to two people. This group does not need to be the same group as for the other homeworks. You
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationAn Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation
An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007
More informationNeural Network Learning. Today s Lecture. Continuation of Neural Networks. Artificial Neural Networks. Lecture 24: Learning 3. Victor R.
Lecture 24: Learning 3 Victor R. Lesser CMPSCI 683 Fall 2010 Today s Lecture Continuation of Neural Networks Artificial Neural Networks Compose of nodes/units connected by links Each link has a numeric
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationMultinomial Regression and the Softmax Activation Function. Gary Cottrell!
Multinomial Regression and the Softmax Activation Function Gary Cottrell Notation reminder We have N data points, or patterns, in the training set, with the pattern number as a superscript: {(x 1,t 1 ),
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationReview: Final Exam CPSC Artificial Intelligence Michael M. Richter
Review: Final Exam Model for a Learning Step Learner initially Environm ent Teacher Compare s pe c ia l Information Control Correct Learning criteria Feedback changed Learner after Learning Learning by
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017
3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear
More informationDeep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.
Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics,
More informationClass 6 Large-Scale Image Classification
Class 6 Large-Scale Image Classification Liangliang Cao, March 7, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual
More informationMachine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart
Machine Learning The Breadth of ML Neural Networks & Deep Learning Marc Toussaint University of Stuttgart Duy Nguyen-Tuong Bosch Center for Artificial Intelligence Summer 2017 Neural Networks Consider
More informationModel Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal
More informationMachine Learning for Physicists Lecture 6. Summer 2017 University of Erlangen-Nuremberg Florian Marquardt
Machine Learning for Physicists Lecture 6 Summer 2017 University of Erlangen-Nuremberg Florian Marquardt Channels MxM image MxM image K K 3 channels conv 6 channels in any output channel, each pixel receives
More informationMulti-Layered Perceptrons (MLPs)
Multi-Layered Perceptrons (MLPs) The XOR problem is solvable if we add an extra node to a Perceptron A set of weights can be found for the above 5 connections which will enable the XOR of the inputs to
More informationTransfer Learning Using Rotated Image Data to Improve Deep Neural Network Performance
Transfer Learning Using Rotated Image Data to Improve Deep Neural Network Performance Telmo Amaral¹, Luís M. Silva¹², Luís A. Alexandre³, Chetak Kandaswamy¹, Joaquim Marques de Sá¹ 4, and Jorge M. Santos¹
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationLECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS
LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class
More informationHomework 5. Due: April 20, 2018 at 7:00PM
Homework 5 Due: April 20, 2018 at 7:00PM Written Questions Problem 1 (25 points) Recall that linear regression considers hypotheses that are linear functions of their inputs, h w (x) = w, x. In lecture,
More informationLecture 19: Generative Adversarial Networks
Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,
More informationAdvanced Introduction to Machine Learning, CMU-10715
Advanced Introduction to Machine Learning, CMU-10715 Deep Learning Barnabás Póczos, Sept 17 Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationGradient Descent - Problem of Hiking Down a Mountain
Gradient Descent - Problem of Hiking Down a Mountain Udacity Have you ever climbed a mountain? I am sure you had to hike down at some point? Hiking down is a great exercise and it is going to help us understand
More informationPractical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow
Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains
More information(Multinomial) Logistic Regression + Feature Engineering
-6 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University (Multinomial) Logistic Regression + Feature Engineering Matt Gormley Lecture 9 Feb.
More informationPractical Tips for using Backpropagation
Practical Tips for using Backpropagation Keith L. Downing August 31, 2017 1 Introduction In practice, backpropagation is as much an art as a science. The user typically needs to try many combinations of
More informationAutoencoders, denoising autoencoders, and learning deep networks
4 th CiFAR Summer School on Learning and Vision in Biology and Engineering Toronto, August 5-9 2008 Autoencoders, denoising autoencoders, and learning deep networks Part II joint work with Hugo Larochelle,
More informationCS 179 Lecture 16. Logistic Regression & Parallel SGD
CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)
More informationHomework 2. Due: March 2, 2018 at 7:00PM. p = 1 m. (x i ). i=1
Homework 2 Due: March 2, 2018 at 7:00PM Written Questions Problem 1: Estimator (5 points) Let x 1, x 2,..., x m be an i.i.d. (independent and identically distributed) sample drawn from distribution B(p)
More informationDay 3 Lecture 1. Unsupervised Learning
Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations
More informationUsing neural nets to recognize hand-written digits. Srikumar Ramalingam School of Computing University of Utah
Using neural nets to recognize hand-written digits Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the first chapter of the online book by Michael
More informationOptimization. Industrial AI Lab.
Optimization Industrial AI Lab. Optimization An important tool in 1) Engineering problem solving and 2) Decision science People optimize Nature optimizes 2 Optimization People optimize (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)
More informationLinear Regression and K-Nearest Neighbors 3/28/18
Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression Hypothesis Space Supervised learning For every input in the data set, we know the output Regression Outputs are continuous A number,
More informationDeep Learning and Its Applications
Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent
More informationNeural Networks: promises of current research
April 2008 www.apstat.com Current research on deep architectures A few labs are currently researching deep neural network training: Geoffrey Hinton s lab at U.Toronto Yann LeCun s lab at NYU Our LISA lab
More informationUnsupervised Domain Adaptation by Backpropagation. Chih-Hui Ho, Xingyu Gu, Yuan Qi
Unsupervised Domain Adaptation by Backpropagation Chih-Hui Ho, Xingyu Gu, Yuan Qi Problems Deep network: requires massive labeled training data. Labeled data: Available sometimes: Image recognition Speech
More informationAlternatives to Direct Supervision
CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of
More informationSequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015
Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, 10.2.1) Properties of RNNs (10.2.2, 8.2.6) Using
More information