Natural Language Processing with Deep Learning. CS224N/Ling284. Lecture 5: Backpropagation Kevin Clark
1 Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 5: Backpropagation Kevin Clark
2 Announcements Assignment 1 due Thursday, 11:59pm You can use up to 3 late days (making it due Sunday at midnight) Default final project will be released February 1st To help you choose which project option you want to do Final project proposal due February 8th See website for details and inspiration
3 Overview Today: From one-layer to multi-layer neural networks! Fully vectorized gradient computation The backpropagation algorithm (Time permitting) Class project tips
4 Remember: One-layer Neural Net x = [x_museums; x_in; x_Paris; x_are; x_amazing]
5 Two-layer Neural Net x = [x_museums; x_in; x_Paris; x_are; x_amazing]
6 Repeat as Needed! x = [x_museums; x_in; x_Paris; x_are; x_amazing]
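The stacked layers above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the slides' exact setup: the word-vector size, hidden size, and the choice of sigmoid for the nonlinearity f are all assumptions.

```python
import numpy as np

# Hypothetical sizes: five 4-d word vectors concatenated into a 20-d input.
rng = np.random.default_rng(0)
x = rng.standard_normal(20)       # x = [x_museums; x_in; x_Paris; x_are; x_amazing]

def f(z):
    """Elementwise nonlinearity (sigmoid, as an example)."""
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.standard_normal((8, 20)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 8)), np.zeros(8)
u = rng.standard_normal(8)

h1 = f(W1 @ x + b1)               # first hidden layer
h2 = f(W2 @ h1 + b2)              # second hidden layer: "repeat as needed"
s = u @ h2                        # scalar score
```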
7 Why Have Layers? Hierarchical representations -> neural net can represent complicated features Better results! [Table on slide: # Layers vs. Machine Translation Score, from the Transformer network (will cover in a later lecture)]
8 Remember: Gradient Descent Update equation: θ_new = θ_old − α ∇_θ J(θ), where α = step size or learning rate
9 Remember: Gradient Descent This Lecture: How do we compute the gradient? By hand Algorithmically (the backpropagation algorithm)
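As a concrete toy illustration of the update rule, here is gradient descent in NumPy on J(θ) = ‖θ‖²/2, whose gradient is θ itself; the objective and the value of α are arbitrary choices for the sketch.

```python
import numpy as np

def gd_step(theta, grad, alpha=0.1):
    """One gradient descent update: theta_new = theta_old - alpha * grad."""
    return theta - alpha * grad

# Toy objective J(theta) = ||theta||^2 / 2, so grad J(theta) = theta.
theta = np.array([4.0, -2.0])
for _ in range(100):
    theta = gd_step(theta, theta)
# theta has been pulled toward the minimizer at the origin
```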
10 Why learn all these details about gradients? Modern deep learning frameworks compute gradients for you But why take a class on compilers or systems when they are implemented for you? Understanding what is going on under the hood is useful! Backpropagation doesn't always work perfectly. Understanding why is crucial for debugging and improving models Example in future lecture: exploding and vanishing gradients
11 Computing Gradients by Hand Review of multivariable derivatives Fully vectorized gradients Much faster and more useful than non-vectorized gradients But doing a non-vectorized gradient can be good practice; see slides in last week's lecture for an example Lecture notes cover this material in more detail
12 Gradients Given a function f(x) = f(x_1, x_2, ..., x_n) with 1 output and n inputs, its gradient is a vector of partial derivatives: ∂f/∂x = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]
13 Jacobian Matrix: Generalization of the Gradient Given a function with m outputs and n inputs, its Jacobian is an m x n matrix of partial derivatives: (∂f/∂x)_ij = ∂f_i/∂x_j
14 Chain Rule For Jacobians For one-variable functions: multiply derivatives (dz/dx = (dz/dy)(dy/dx)) For multiple variables at once: multiply Jacobians
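To make "multiply Jacobians" concrete, here is a hedged NumPy sketch for the composition h = f(Wx) with an elementwise f (tanh chosen arbitrarily): the Jacobian of the composition is the product diag(f'(z)) @ W, which we can verify against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

f = np.tanh
df = lambda z: 1.0 - np.tanh(z) ** 2   # derivative of tanh

z = W @ x
J = np.diag(df(z)) @ W                 # chain rule: (dh/dz)(dz/dx)

# Finite-difference check of the first column of J
eps = 1e-6
e0 = np.zeros(4); e0[0] = eps
num_col0 = (f(W @ (x + e0)) - f(W @ (x - e0))) / (2 * eps)
assert np.allclose(J[:, 0], num_col0, atol=1e-6)
```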
15 Example Jacobian: Elementwise activation function
16 Example Jacobian: Function has n outputs and n inputs -> n by n Jacobian
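The slide math did not survive transcription; the standard example at this point, assuming an elementwise activation h = f(z) with h_i = f(z_i), works out to a diagonal Jacobian:

```latex
\left(\frac{\partial h}{\partial z}\right)_{ij}
  = \frac{\partial h_i}{\partial z_j}
  = \begin{cases} f'(z_i) & \text{if } i = j\\ 0 & \text{otherwise} \end{cases}
\qquad\Longrightarrow\qquad
\frac{\partial h}{\partial z} = \operatorname{diag}\!\bigl(f'(z)\bigr)
```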
20 Other Jacobians Compute these at home for practice! Check your answers with the lecture notes
24 Back to Neural Nets! x = [x_museums; x_in; x_Paris; x_are; x_amazing]
25 Back to Neural Nets! Let's find the gradient of the score In practice we care about the gradient of the loss, but we will compute the gradient of the score for simplicity
26 1. Break up into simple pieces
27 2. Apply the chain rule
31 3. Write out the Jacobians Useful Jacobians from previous slide
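The Jacobians themselves are lost in this transcript; under the standard decomposition used in this lecture series (an assumption here: s = uᵀh, h = f(z), z = Wx + b), step 3 multiplies out to:

```latex
\frac{\partial s}{\partial b}
  = \frac{\partial s}{\partial h}\,
    \frac{\partial h}{\partial z}\,
    \frac{\partial z}{\partial b}
  = u^{\top}\,\operatorname{diag}\bigl(f'(z)\bigr)\,I
  = \bigl(u \circ f'(z)\bigr)^{\top}
```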
36 Re-using Computation Suppose we now want to compute another gradient Using the chain rule again:
37 Re-using Computation The first terms are the same! Let's avoid duplicated computation
39 Derivative with respect to a Matrix What does the gradient with respect to W look like? 1 output, nm inputs: a 1 by nm Jacobian? Inconvenient to work with
40 Derivative with respect to a Matrix Instead follow the convention: the shape of the gradient is the shape of the parameters So ∂s/∂W is n by m:
41 Derivative with respect to a Matrix Remember the error signal δ is going to be in our answer The other term should be x because z = Wx + b It turns out ∂s/∂W = δᵀxᵀ (an outer product)
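In NumPy the shape-convention gradient for z = Wx + b is just an outer product; here delta stands for the error signal ∂s/∂z, and the sizes are illustrative.

```python
import numpy as np

n, m = 3, 5                       # z is n-dim, x is m-dim, W is n x m
rng = np.random.default_rng(2)
delta = rng.standard_normal(n)    # error signal ds/dz
x = rng.standard_normal(m)

dW = np.outer(delta, x)           # ds/dW: outer product, same shape as W
assert dW.shape == (n, m)         # shape of gradient == shape of parameters
```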
42 Why the Transposes? Hacky answer: this makes the dimensions work out Useful trick for checking your work! Full explanation in the lecture notes
44 What shape should the gradient be? The Jacobian form is a row vector But convention says our gradient should be a column vector because b is a column vector Disagreement between Jacobian form (which makes the chain rule easy) and the shape convention (which makes implementing SGD easy) We expect answers to follow the shape convention But Jacobian form is useful for computing the answers
45 What shape should the gradient be? Two options: 1. Use Jacobian form as much as possible, reshape to follow the convention at the end: What we just did. But at the end transpose to make the derivative a column vector. 2. Always follow the convention Look at dimensions to figure out when to transpose and/or reorder terms.
46 Notes on PA1 Don't worry if you used some other method for gradient computation (as long as your answer is right and you are consistent!) This lecture we computed the gradient of the score, but in PA1 it's of the loss Don't forget to replace f' with the actual derivative PA1 uses a different form for the linear transformation: the gradients are different!
47 Compute gradients algorithmically Converting what we just did by hand into an algorithm Used by deep learning frameworks (TensorFlow, PyTorch, etc.)
48 Computation Graphs Representing our neural net equations as a graph Source nodes: inputs Interior nodes: operations Edges pass along the result of the operation
50 Forward Propagation Evaluating the graph from inputs to output
51 Backpropagation Go backwards along edges Pass along gradients
52 Single Node Node receives an upstream gradient Goal is to pass on the correct downstream gradient
53 Single Node Each node has a local gradient: the gradient of its output with respect to its input
55 Single Node Chain rule! [downstream gradient] = [upstream gradient] x [local gradient]
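As a minimal sketch of a single node (the node y = x² is my own example, not from the slides): the backward step multiplies the upstream gradient by the node's local gradient dy/dx.

```python
def square_forward(x):
    return x ** 2

def square_backward(x, upstream):
    local = 2 * x              # local gradient dy/dx for y = x**2
    return upstream * local    # downstream = upstream * local

y = square_forward(3.0)                  # forward: 9.0
dx = square_backward(3.0, upstream=1.0)  # backward: 6.0
```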
56 Single Node What about nodes with multiple inputs?
57 Single Node Multiple inputs -> multiple local gradients
58 An Example
59 An Example Forward prop steps: evaluate the expression through the +, max, and * nodes (the slide shows intermediate values 3 and 6)
61 An Example Local gradients: write the local gradient at each node
65 An Example Backward pass: upstream * local = downstream at every node (e.g., 1*3 = 3 and 3*0 = 0 on the slide)
69 Gradients add at branches
71 Node Intuitions + distributes the upstream gradient
72 Node Intuitions max routes the upstream gradient (only the max input receives it)
73 Node Intuitions * switches the upstream gradient (each input's gradient is the upstream times the other input)
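The example's exact inputs are not preserved in this transcript; f(x, y, z) = (x + y) · max(y, z) with x = 1, y = 2, z = 0 reproduces the intermediate values 3 and 6 that do survive, and exercises all three node intuitions above:

```python
x, y, z = 1.0, 2.0, 0.0     # assumed inputs consistent with the slide values

# Forward prop
a = x + y                   # 3.0
b = max(y, z)               # 2.0
f = a * b                   # 6.0

# Backward prop: upstream * local at each node
df = 1.0
da = df * b                 # * switches the upstream gradient -> 2.0
db = df * a                 # -> 3.0
dx = da * 1.0               # + distributes the gradient -> 2.0
dy = da * 1.0 + db * 1.0    # gradients add at branches -> 5.0
dz = db * 0.0               # max routes the gradient; z lost -> 0.0
```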
74 Efficiency: compute all gradients at once Incorrect way of doing backprop: First compute one parameter's gradient
75 Efficiency: compute all gradients at once Then independently compute another's: duplicated computation!
76 Efficiency: compute all gradients at once Correct way: Compute all the gradients at once Analogous to re-using the error signal δ when we computed gradients by hand
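A sketch of the "compute all gradients at once" point, under the same assumed setup as earlier (s = u·f(Wx + b), with f = tanh as an example): the shared error signal delta is computed once and then reused for both ∂s/∂b and ∂s/∂W.

```python
import numpy as np

rng = np.random.default_rng(3)
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
u, x = rng.standard_normal(3), rng.standard_normal(4)

# Forward pass, saving intermediate values
z = W @ x + b
h = np.tanh(z)
s = u @ h

# Backward pass: compute the shared prefix delta = ds/dz once...
delta = u * (1.0 - h ** 2)    # ds/dh * dh/dz, with f = tanh
# ...then reuse it for every parameter gradient
db = delta
dW = np.outer(delta, x)
```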
77 Backprop
78 forward/backward API
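The forward/backward API code on these slides is not preserved; here is a minimal illustrative version (class and method names are my own, not any framework's actual API): forward computes the node's output and caches what backward will need, and backward turns an upstream gradient into downstream gradients.

```python
class Add:
    def forward(self, a, b):
        return a + b                  # no values need caching for +

    def backward(self, upstream):
        return upstream, upstream     # + distributes the upstream gradient

class Multiply:
    def forward(self, a, b):
        self.a, self.b = a, b         # cache inputs for the backward pass
        return a * b

    def backward(self, upstream):
        # local gradients: d(ab)/da = b and d(ab)/db = a
        return upstream * self.b, upstream * self.a

# Graph for (a + b) * c with a=1, b=2, c=4
add, mul = Add(), Multiply()
s = add.forward(1.0, 2.0)             # 3.0
out = mul.forward(s, 4.0)             # 12.0

ds, dc = mul.backward(1.0)            # 4.0, 3.0
da, db = add.backward(ds)             # 4.0, 4.0
```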
80 Alternative to backprop: Numeric Gradient For small h, f'(x) ≈ (f(x + h) − f(x − h)) / 2h Easy to implement But approximate and very slow: Have to recompute f for every parameter of our model Useful for checking your implementation
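The central-difference check above can be sketched directly; the test function below is my own example with a known analytic gradient.

```python
def numeric_grad(func, x, h=1e-5):
    """Central differences: grad_i ~= (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h      # one forward pass per parameter...
        xm = list(x); xm[i] -= h      # ...which is why this is slow
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad

# f(v) = v0^2 + 3*v1 has analytic gradient [2*v0, 3]
g = numeric_grad(lambda v: v[0] ** 2 + 3 * v[1], [2.0, -1.0])
# g is approximately [4.0, 3.0]
```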
81 Summary Backpropagation: recursively apply the chain rule along the computation graph [downstream gradient] = [upstream gradient] x [local gradient] Forward pass: compute results of operations and save intermediate values Backward pass: apply the chain rule to compute gradients
83 Project Types 1. Apply an existing neural network model to a new task 2. Implement a complex neural architecture(s) This is what PA4 will have you do! 3. Come up with a new model/training algorithm/etc. Get 1 or 2 working first See project page for some inspiration
84 Must-haves (choose-your-own final project) 10,000+ labeled examples by milestone Feasible task Automatic evaluation metric NLP is central
85 Details matter! Split your data into train/dev/test: only look at test for final experiments Look at your data, collect summary statistics Look at your model's outputs, do error analysis Tuning hyperparameters is important Writeup quality is important Look at last year's prize winners for examples
86 Project Advice Implement the simplest possible model first (e.g., average word vectors and apply logistic regression) and improve it Having a baseline system is crucial First overfit your model to the train set (get really good training set results) Then regularize it so it does well on the dev set Start early!
Predict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry Jincheng Cao, SCPD Jincheng@stanford.edu 1. INTRODUCTION When running a direct mail campaign, it s common practice
More informationRecurrent Neural Network (RNN) Industrial AI Lab.
Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series
More informationReplacing Neural Networks with Black-Box ODE Solvers
Replacing Neural Networks with Black-Box ODE Solvers Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud University of Toronto, Vector Institute Resnets are Euler integrators Middle layers
More informationPractice Exam Sample Solutions
CS 675 Computer Vision Instructor: Marc Pomplun Practice Exam Sample Solutions Note that in the actual exam, no calculators, no books, and no notes allowed. Question 1: out of points Question 2: out of
More informationLecture 13. Deep Belief Networks. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen
Lecture 13 Deep Belief Networks Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 December 2012
More informationConvolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech
Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted for this course
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationMachine learning for vision. It s the features, stupid! cathedral. high-rise. Winter Roland Memisevic. Lecture 2, January 26, 2016
Winter 2016 Lecture 2, Januar 26, 2016 f2? cathedral high-rise f1 A common computer vision pipeline before 2012 1. 2. 3. 4. Find interest points. Crop patches around them. Represent each patch with a sparse
More informationVersion Control Systems
Version Control Systems ESA 2016/2017 Adam Belloum a.s.z.belloum@uva.nl Material Prepared by Eelco Schatborn Today IntroducGon to Version Control Systems Centralized Version Control Systems RCS CVS SVN
More informationBAYESIAN GLOBAL OPTIMIZATION
BAYESIAN GLOBAL OPTIMIZATION Using Optimal Learning to Tune Deep Learning Pipelines Scott Clark scott@sigopt.com OUTLINE 1. Why is Tuning AI Models Hard? 2. Comparison of Tuning Methods 3. Bayesian Global
More informationCS 224N: Assignment #1
Due date: assignment) 1/25 11:59 PM PST (You are allowed to use three (3) late days maximum for this These questions require thought, but do not require long answers. Please be as concise as possible.
More informationDeep Learning. Volker Tresp Summer 2014
Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there
More informationLecture 17: Neural Networks and Deep Learning. Instructor: Saravanan Thirumuruganathan
Lecture 17: Neural Networks and Deep Learning Instructor: Saravanan Thirumuruganathan Outline Perceptron Neural Networks Deep Learning Convolutional Neural Networks Recurrent Neural Networks Auto Encoders
More informationCode Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:
Code Mania 2019 Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python: 1. Introduction to Artificial Intelligence 2. Introduction to python programming and Environment
More informationImproving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah
Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com
More informationParallel Deep Network Training
Lecture 26: Parallel Deep Network Training Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Speech Debelle Finish This Album (Speech Therapy) Eat your veggies and study
More informationEnsemble methods in machine learning. Example. Neural networks. Neural networks
Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you
More information