Natural Language Processing with Deep Learning. CS224N/Ling284. Lecture 5: Backpropagation Kevin Clark


1 Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 5: Backpropagation Kevin Clark

2 Announcements Assignment 1 due Thursday, 11:59pm. You can use up to 3 late days (making it due Sunday at midnight). Default final project will be released February 1st, to help you choose which project option you want to do. Final project proposal due February 8th. See website for details and inspiration.

3 Overview Today: From one-layer to multi-layer neural networks! Fully vectorized gradient computation. The backpropagation algorithm. (Time permitting) Class project tips.

4 Remember: One-layer Neural Net x = [ x_museums x_in x_Paris x_are x_amazing ] (a concatenated window of word vectors)

5 Two-layer Neural Net x = [ x_museums x_in x_Paris x_are x_amazing ]

6 Repeat as Needed! x = [ x_museums x_in x_Paris x_are x_amazing ]

7 Why Have Layers? Hierarchical representations -> neural net can represent complicated features. Better results! [Figure: machine translation score vs. # layers, from the Transformer network (will cover in a later lecture)]

8 Remember: Gradient Descent Update equation: θ_new = θ_old − α ∇_θ J(θ), where α = step size or learning rate.

9 Remember: Gradient Descent Update equation: θ_new = θ_old − α ∇_θ J(θ), where α = step size or learning rate. This Lecture: How do we compute ∇_θ J(θ)? By hand, and algorithmically (the backpropagation algorithm).
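To make the update rule concrete, here is a minimal numpy sketch; the quadratic objective J(θ) = ||θ||^2 is a hypothetical stand-in for a real loss:

```python
import numpy as np

# A minimal sketch of the update equation above. The objective
# J(theta) = ||theta||^2 is a hypothetical stand-in for a real loss.
def grad_J(theta):
    return 2 * theta            # gradient of ||theta||^2

theta = np.array([3.0, -1.5])   # parameters
alpha = 0.1                     # alpha = step size or learning rate
for _ in range(100):
    theta = theta - alpha * grad_J(theta)

print(theta)                    # converges toward the minimum at 0
```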

10 Why learn all these details about gradients? Modern deep learning frameworks compute gradients for you. But why take a class on compilers or systems when they are implemented for you? Understanding what is going on under the hood is useful! Backpropagation doesn't always work perfectly; understanding why is crucial for debugging and improving models. Example in a future lecture: exploding and vanishing gradients.

11 Computing Gradients by Hand Review of multivariable derivatives. Fully vectorized gradients: much faster and more useful than non-vectorized gradients, but doing a non-vectorized gradient can be good practice; see slides in last week's lecture for an example. Lecture notes cover this material in more detail.

12 Gradients Given a function f(x_1, ..., x_n) with 1 output and n inputs, its gradient is a vector of partial derivatives.
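The defining formula (the slide's equation was an image; this reconstruction follows the lecture notes' notation):

```latex
% Gradient of f(x_1, ..., x_n): one partial derivative per input
\nabla f = \left[ \frac{\partial f}{\partial x_1},\ \frac{\partial f}{\partial x_2},\ \ldots,\ \frac{\partial f}{\partial x_n} \right]
```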

13 Jacobian Matrix: Generalization of the Gradient Given a function with m outputs and n inputs, its Jacobian is an m x n matrix of partial derivatives.
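Written out (again reconstructing the image-only formula in the lecture notes' notation):

```latex
% Jacobian of f: R^n -> R^m; entry (i, j) is \partial f_i / \partial x_j
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
```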

14 Chain Rule For Jacobians For one-variable functions: multiply derivatives. For multiple variables: multiply Jacobians.
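In symbols, for the composition used later in this lecture (h = f(z), z = Wx + b):

```latex
% One-variable chain rule: dz/dx = (dz/dy)(dy/dx).
% Vector version: multiply Jacobians.
\frac{\partial \mathbf{h}}{\partial \mathbf{x}}
  = \frac{\partial \mathbf{h}}{\partial \mathbf{z}}\,
    \frac{\partial \mathbf{z}}{\partial \mathbf{x}}
```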

15 Example Jacobian: Elementwise activation function h = f(z), i.e. h_i = f(z_i).

16 Example Jacobian: The function has n outputs and n inputs -> n by n Jacobian.
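The resulting Jacobian, consistent with the lecture notes:

```latex
% Elementwise h_i = f(z_i): off-diagonal partials vanish, so the
% n x n Jacobian is diagonal (matching the result in the lecture notes):
\frac{\partial \mathbf{h}}{\partial \mathbf{z}}
  = \mathrm{diag}\left(f'(z_1), \ldots, f'(z_n)\right)
  = \mathrm{diag}\left(f'(\mathbf{z})\right)
```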

17-19 Example Jacobian: (build slides working entry by entry toward the diagonal form above; see the lecture notes for the full derivation)

20-23 Other Jacobians Compute these at home for practice! Check your answers with the lecture notes.
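The practice Jacobians these slides refer to are most likely the following identities; they are the ones the derivations below use, so check them against the lecture notes:

```latex
% Standard identities (per the lecture notes):
\frac{\partial}{\partial \mathbf{x}}(\mathbf{W}\mathbf{x} + \mathbf{b}) = \mathbf{W}
\qquad
\frac{\partial}{\partial \mathbf{b}}(\mathbf{W}\mathbf{x} + \mathbf{b}) = \mathbf{I}
\qquad
\frac{\partial}{\partial \mathbf{u}}(\mathbf{u}^{\top}\mathbf{h}) = \mathbf{h}^{\top}
```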

24 Back to Neural Nets! x = [ x_museums x_in x_Paris x_are x_amazing ]

25 Back to Neural Nets! Let's find ∂s/∂b. In practice we care about the gradient of the loss, but we will compute the gradient of the score for simplicity. x = [ x_museums x_in x_Paris x_are x_amazing ]
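For reference, a minimal sketch of this network's forward pass, assuming the form used in the lecture (window classifier with score s = u^T f(Wx + b)); the sizes, values, and choice of sigmoid below are hypothetical:

```python
import numpy as np

# Forward pass of the assumed model: s = u^T f(Wx + b), f elementwise.
def f(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid nonlinearity (an assumption)

rng = np.random.default_rng(0)
n, m = 8, 20                          # hidden size n, input size m (made up)
W = rng.standard_normal((n, m))
b = rng.standard_normal(n)
u = rng.standard_normal(n)
x = rng.standard_normal(m)            # concatenated window of word vectors

z = W @ x + b                         # linear layer
h = f(z)                              # hidden activations
s = u @ h                             # scalar score
print(s)
```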

26 1. Break up the equations into simple pieces: s = u^T h, h = f(z), z = Wx + b (x is the input).

27-30 2. Apply the chain rule: ∂s/∂b = (∂s/∂h)(∂h/∂z)(∂z/∂b).

31-35 3. Write out the Jacobians, using the useful Jacobians from the previous slides: ∂s/∂h = u^T, ∂h/∂z = diag(f'(z)), ∂z/∂b = I, so ∂s/∂b = u^T diag(f'(z)).

36-38 Re-using Computation Suppose we now want to compute ∂s/∂W. Using the chain rule again: ∂s/∂W = (∂s/∂h)(∂h/∂z)(∂z/∂W). The first two factors are the same as for ∂s/∂b! Let's avoid duplicated computation: define δ = (∂s/∂h)(∂h/∂z) = u^T diag(f'(z)), the local error signal.

39-40 Derivative with respect to a Matrix What does ∂s/∂W look like? 1 output, nm inputs: a 1 by nm Jacobian? Inconvenient to work with. Instead follow the convention: the shape of the gradient is the shape of the parameters, so ∂s/∂W is n by m, just like W.

41 Derivative with respect to a Matrix Remember that δ is going to be in our answer. The other term should be x^T, because z = Wx + b. It turns out ∂s/∂W = δ^T x^T.
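Putting the pieces together (a reconstruction consistent with the lecture notes; δ is the error signal defined on slides 36-38):

```latex
% delta is 1 x n, so delta^T is n x 1, x^T is 1 x m,
% and the product is n x m, matching W.
\delta = \frac{\partial s}{\partial \mathbf{h}}\,
         \frac{\partial \mathbf{h}}{\partial \mathbf{z}}
       = \mathbf{u}^{\top}\,\mathrm{diag}\!\left(f'(\mathbf{z})\right)
\qquad
\frac{\partial s}{\partial \mathbf{W}} = \delta^{\top}\,\mathbf{x}^{\top}
```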

42-43 Why the Transposes? Hacky answer: this makes the dimensions work out (δ^T is n x 1 and x^T is 1 x m, so δ^T x^T is n x m, matching W). Useful trick for checking your work! Full explanation in the lecture notes.

44 What shape should ∂s/∂b be? The Jacobian ∂s/∂b is a row vector, but convention says our gradient should be a column vector because b is a column vector. There is a disagreement between Jacobian form (which makes the chain rule easy) and the shape convention (which makes implementing SGD easy). We expect answers to follow the shape convention, but Jacobian form is useful for computing the answers.

45 What shape should ∂s/∂b be? Two options: 1. Use Jacobian form as much as possible and reshape to follow the convention at the end: what we just did, but at the end transpose to make the derivative a column vector, resulting in δ^T. 2. Always follow the convention: look at dimensions to figure out when to transpose and/or reorder terms.

46 Notes on PA1 Don't worry if you used some other method for gradient computation (as long as your answer is right and you are consistent!). This lecture we computed the gradient of the score, but in PA1 it's the gradient of the loss. Don't forget to replace f' with the actual derivative of your nonlinearity. PA1 uses a different convention for the linear transformation (xW + b rather than Wx + b): the gradients are different!

47 Compute gradients algorithmically Converting what we just did by hand into an algorithm. Used by deep learning frameworks (TensorFlow, PyTorch, etc.).

48-50 Computation Graphs Representing our neural net equations as a graph. Source nodes: inputs. Interior nodes: operations. Edges pass along the result of the operation. Running the graph from the inputs through to the output is forward propagation.

51 Backpropagation Go backwards along the edges, passing along gradients.

52-55 Single Node A node receives an upstream gradient; the goal is to pass on the correct downstream gradient. Each node has a local gradient: the gradient of its output with respect to its input. Chain rule! [downstream gradient] = [upstream gradient] x [local gradient].
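A minimal sketch of this rule for a single, hypothetical sigmoid node:

```python
import numpy as np

# The single-node rule: downstream = upstream * local,
# illustrated with a sigmoid node (a hypothetical example).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.5                        # node input
h = sigmoid(z)                 # forward: node output
upstream = 2.0                 # dL/dh, handed back by the next node
local = h * (1.0 - h)          # dh/dz: the sigmoid's local gradient
downstream = upstream * local  # dL/dz, passed on to the previous node
print(downstream)
```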

56-57 Single Node What about nodes with multiple inputs? Multiple inputs -> multiple local gradients: one downstream gradient per input, each equal to the upstream gradient times that input's local gradient.

58-64 An Example f(x, y, z) = (x + y) max(y, z), evaluated at x = 1, y = 2, z = 0. Forward prop steps: a = x + y = 3, b = max(y, z) = 2, f = a * b = 6. Local gradients: ∂f/∂a = b = 2, ∂f/∂b = a = 3; ∂a/∂x = 1, ∂a/∂y = 1; ∂b/∂y = 1 (since y > z), ∂b/∂z = 0.

65-68 An Example (backward pass) upstream * local = downstream: start from ∂f/∂f = 1. Through *: ∂f/∂a = 1*2 = 2, ∂f/∂b = 1*3 = 3. Through max: 3*1 = 3 to y, 3*0 = 0 to z. Through +: 2*1 = 2 to x, 2*1 = 2 to y. Final gradients: ∂f/∂x = 2, ∂f/∂y = 2 + 3 = 5, ∂f/∂z = 0.
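A quick numerical check of the example, assuming (as the forward values 3 and 6 and the final gradients 2, 5, 0 suggest) that the function is f(x, y, z) = (x + y) * max(y, z) at x = 1, y = 2, z = 0:

```python
# Central differences confirm the hand-computed gradients.
def f(x, y, z):
    return (x + y) * max(y, z)

x, y, z = 1.0, 2.0, 0.0
print(f(x, y, z))  # 6.0

h = 1e-6
df_dx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)  # ~2
df_dy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)  # ~5
df_dz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)  # ~0
print(df_dx, df_dy, df_dz)
```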

69-70 Gradients add at branches When a variable feeds into multiple nodes (like y above, which goes into both + and max), its downstream gradients from each branch are summed.

71-73 Node Intuitions + distributes the upstream gradient to each input. max routes the upstream gradient: the input that achieved the max gets it all, the others get 0. * switches the upstream gradient: each input's downstream gradient is the upstream gradient scaled by the other input's value.

74-76 Efficiency: compute all gradients at once Incorrect way of doing backprop: first compute the gradient with respect to one input, then independently compute the gradient with respect to another, re-traversing the graph each time: duplicated computation! Correct way: compute all the gradients at once, in a single backward pass. Analogous to re-using δ when we computed gradients by hand.

77-79 Backprop: the forward/backward API Each node implements a forward function (compute its output from its inputs, saving any intermediate values needed later) and a backward function (compute its inputs' gradients from its output's gradient). The graph calls forward on every node in topological order, then backward in the reverse order.
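A minimal sketch of this API with a hypothetical multiply gate; real frameworks implement the same interface for every operation:

```python
# Forward/backward API sketch: a single multiply gate.
class MultiplyGate:
    def forward(self, x, y):
        # save intermediate values needed by backward
        self.x, self.y = x, y
        return x * y

    def backward(self, upstream):
        # * "switches" the upstream gradient (see Node Intuitions above)
        dx = upstream * self.y
        dy = upstream * self.x
        return dx, dy

gate = MultiplyGate()
out = gate.forward(3.0, 2.0)   # forward pass: 6.0
dx, dy = gate.backward(1.0)    # backward pass: dx = 2.0, dy = 3.0
print(out, dx, dy)
```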

80 An alternative to backprop: Numeric Gradient For small h, f'(x) ≈ (f(x + h) − f(x − h)) / 2h. Easy to implement, but approximate and very slow: you have to recompute f for every parameter of our model. Useful for checking your implementation.
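A minimal numeric gradient checker along these lines (a sketch, not PA1's exact interface):

```python
import numpy as np

# Perturb each parameter by +/- h and apply the central-difference
# formula from the slide; note f is recomputed twice per parameter.
def numeric_gradient(f, params, h=1e-5):
    grad = np.zeros_like(params)
    for i in range(params.size):
        old = params.flat[i]
        params.flat[i] = old + h
        fp = f(params)
        params.flat[i] = old - h
        fm = f(params)
        params.flat[i] = old                # restore the parameter
        grad.flat[i] = (fp - fm) / (2 * h)
    return grad

# Example: the gradient of f(w) = sum(w^2) is 2w.
w = np.array([1.0, -2.0, 3.0])
print(numeric_gradient(lambda p: np.sum(p**2), w))  # ~[2, -4, 6]
```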

81 Summary Backpropagation: recursively apply the chain rule along the computation graph. [downstream gradient] = [upstream gradient] x [local gradient]. Forward pass: compute the results of operations and save intermediate values. Backward pass: apply the chain rule to compute gradients.

82 Class Project Tips

83 Project Types 1. Apply an existing neural network model to a new task. 2. Implement a complex neural architecture(s): this is what PA4 will have you do! 3. Come up with a new model/training algorithm/etc.: get 1 or 2 working first. See the project page for some inspiration.

84 Must-haves (choose-your-own final project) 10,000+ labeled examples by the milestone. Feasible task. Automatic evaluation metric. NLP is central.

85 Details matter! Split your data into train/dev/test: only look at test for final experiments. Look at your data, collect summary statistics. Look at your model's outputs, do error analysis. Tuning hyperparameters is important. Writeup quality is important. Look at last year's prize winners for examples.

86 Project Advice Implement the simplest possible model first (e.g., average word vectors and apply logistic regression) and improve it: having a baseline system is crucial. First overfit your model to the train set (get really good training set results), then regularize it so it does well on the dev set. Start early!
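A minimal sketch of such a baseline; the toy embeddings and data below are hypothetical stand-ins for real pretrained vectors and a real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Baseline: average word vectors, then logistic regression.
dim = 4
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(dim) for w in
              ["museums", "in", "paris", "are", "amazing", "boring"]}

def featurize(sentence):
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

sents = ["museums in paris are amazing", "museums are boring"]
labels = [1, 0]
X = np.stack([featurize(s) for s in sents])
clf = LogisticRegression().fit(X, labels)
print(clf.score(X, labels))  # train accuracy; evaluate on dev for real
```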
