Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 15


Slides adapted from Jordan Boyd-Graber.

Logistics
- HW3 available on GitHub, due on October 19
- First prelim next Monday
- Final project team formation due by October 12

Outline
- Recap

Recap: Logistic Regression as a Single-layer Neural Network
[Diagram: input layer x_1, x_2, ..., x_d feeding a single-layer output o_1]
What is the activation used in logistic regression?
What is the objective function used in logistic regression?

Recap: Nonlinearity Options
Sigmoid: $f(x) = \frac{1}{1 + \exp(-x)}$
tanh: $f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
ReLU (rectified linear unit): $f(x) = \max(0, x)$
softmax: $f(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$
https://keras.io/activations/
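As a quick reference, a minimal NumPy sketch of these activations (my own illustration, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + exp(-x)), elementwise
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # (exp(x) - exp(-x)) / (exp(x) + exp(-x)), elementwise
    return np.tanh(x)

def relu(x):
    # max(0, x), elementwise
    return np.maximum(0.0, x)

def softmax(x):
    # exp(x_i) / sum_j exp(x_j); shift by max(x) for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()
```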

Recap: Loss Function Options
$\ell_2$ loss: $\sum_i (y_i - \hat{y}_i)^2$
$\ell_1$ loss: $\sum_i |y_i - \hat{y}_i|$
Cross entropy (logistic regression): $-\sum_i y_i \log \hat{y}_i$
Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$
https://keras.io/losses/
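A minimal NumPy sketch of these losses (my own illustration, not from the slides; y denotes the target and y_hat the prediction):

```python
import numpy as np

def l2_loss(y, y_hat):
    return np.sum((y - y_hat) ** 2)

def l1_loss(y, y_hat):
    return np.sum(np.abs(y - y_hat))

def cross_entropy(y, y_hat):
    # y is a one-hot (or probability) target, y_hat a predicted distribution
    return -np.sum(y * np.log(y_hat))

def hinge_loss(y, y_hat):
    # y in {-1, +1}, y_hat a real-valued score
    return np.maximum(0.0, 1.0 - y * y_hat)
```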

Recap: Forward propagation algorithm
How do we make predictions based on a multi-layer neural network?
Store the biases for layer l in b_l and the weight matrix in W_l.
[Diagram: input x_1, x_2, ..., x_d passing through layers parameterized by (W_1, b_1), (W_2, b_2), (W_3, b_3), (W_4, b_4) to output o_1]

Recap: Forward propagation algorithm
Suppose your network has L layers. To make a prediction for an instance x:
1: Initialize a_0 = x
2: for l = 1 to L do
3:   z_l = W_l a_{l-1} + b_l
4:   a_l = g(z_l)   // g represents the nonlinear activation
5: end for
6: The prediction ŷ is simply a_L
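As an illustration (not part of the slides), a minimal NumPy sketch of this forward pass; the names `forward`, `weights`, and `biases` are my own, and `g` stands in for the chosen activation:

```python
import numpy as np

def forward(x, weights, biases, g=np.tanh):
    """Forward pass through an L-layer feed-forward network.

    weights[l-1] is W_l with shape (n_out, n_in); biases[l-1] is b_l.
    """
    a = x                  # a_0 = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # z_l = W_l a_{l-1} + b_l
        a = g(z)           # a_l = g(z_l)
    return a               # the prediction y_hat is a_L
```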

Recap: Quiz 1
How many parameters are there in the following feed-forward neural network (assuming no biases)?
[Diagram: input layer x_1, ..., x_100; hidden layer h_1, ..., h_50; hidden layer h_1, ..., h_20; output layer o_1, ..., o_5]
A. 100 * 50 + 50 * 20 + 20 * 5
B. 100 * 50 + 50 + 50 * 20 + 20 + 20 * 5 + 5

Recap: Neural networks in a nutshell
Training data: $S_{\text{train}} = \{(x, y)\}$
Network architecture (model): $\hat{y} = f_w(x)$
Loss function (objective function): $\mathcal{L}(y, \hat{y})$
Learning (next week)

Recap: Quiz 2
Which of the following statements is true? (Suppose that the training data is large.)
A. In training, K-nearest neighbors takes less time than neural networks.
B. In training, K-nearest neighbors takes more time than neural networks.
C. In testing, K-nearest neighbors takes less time than neural networks.
D. In testing, K-nearest neighbors takes more time than neural networks.

Outline
- Recap

Structured data: Spatial information
[Image credit: https://www.reddit.com/r/aww/comments/6ip2la/before_and_after_she_was_told_she_was_a_good_girl/]

Convolutional Layers
Sharing parameters across patches
[Diagram: a filter maps an input image or input feature map to output feature maps]
$a_{ij} = \sum_{i'=1}^{k} \sum_{j'=1}^{k} w_{i'j'} \, x_{i+i',\, j+j'}$ (a k * k filter w applied to the patch at output position (i, j))
Hyperparameters: number of filters, filter shape, stride size
https://github.com/davidstutz/latex-resources/blob/master/tikz-convolutional-layer/convolutional-layer.tex
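To make the sum concrete, a minimal NumPy sketch of one filter sliding over a 2D input with stride 1 and no padding (my own illustration, not from the slides):

```python
import numpy as np

def conv2d_single_filter(x, w, stride=1):
    """Apply one k x k filter w to a 2D input x (no padding, no bias)."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    a = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            a[i, j] = np.sum(w * patch)  # same k x k weights reused at every position
    return a

# e.g. a 10*10 input and a 5*5 filter (stride 1) give a 6*6 output feature map
```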

Quiz 3
A concrete example (assuming no biases): convolution with 4 filters, ReLU, max pooling, and finally a fully-connected softmax layer.
Pipeline: input image (10*10) -> [4 filters, 5*5, stride 1] -> 4@6*6 -> [max pooling 2*2, stride 2] -> 4@3*3 -> [fully connected softmax] -> 5*1
How many parameters are there in this convolutional neural network?
How many ReLU operations are performed on the forward pass?
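For reference, a minimal Keras sketch of this architecture (my own illustration; the slides only link to keras.io and do not give this code). Calling model.summary() reports the per-layer parameter counts:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 10*10 single-channel input -> 4 conv filters (5*5, stride 1, ReLU) -> 4@6*6
# -> 2*2 max pooling (stride 2) -> 4@3*3 -> flatten -> 5-way softmax
model = keras.Sequential([
    keras.Input(shape=(10, 10, 1)),
    layers.Conv2D(4, kernel_size=(5, 5), strides=1, activation="relu", use_bias=False),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Flatten(),
    layers.Dense(5, activation="softmax", use_bias=False),
])
model.summary()  # shows the number of parameters in each layer
```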

Structured data: Sequential information
"My words fly up, my thoughts remain below: Words without thoughts never to heaven go." (Hamlet)
Examples: language, activity history
A sequence is written as $x = (x_1, \ldots, x_T)$.

Recurrent Layers
Sharing parameters along a sequence:
$h_t = f(x_t, h_{t-1})$
https://colah.github.io/posts/2015-08-understanding-lstms/
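A minimal NumPy sketch of one recurrent step, using the common tanh cell as one possible choice of f (my own illustration, not from the slides):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = f(x_t, h_{t-1}).

    The same parameters W_xh, W_hh, b_h are shared across every
    position t of the sequence.
    """
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
```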

Long short-term memory
A commonly used recurrent neural network in natural language processing.
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
$h_t = o_t \odot \tanh(C_t)$
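A minimal NumPy sketch of a single LSTM step implementing these equations (my own illustration, not from the slides; sigma is the logistic sigmoid and * the elementwise product):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_o, b_o, W_C, b_C):
    """One LSTM step; each W_* acts on the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ hx + b_f)        # forget gate
    i_t = sigmoid(W_i @ hx + b_i)        # input gate
    o_t = sigmoid(W_o @ hx + b_o)        # output gate
    C_tilde = np.tanh(W_C @ hx + b_C)    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde   # new cell state
    h_t = o_t * np.tanh(C_t)             # new hidden state
    return h_t, C_t
```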

Wrap up
Neural networks:
- Network architecture (a lot more than fully connected layers): convolutional layers, recurrent layers
- Loss function

What is missing?
- How do we find good weights?
- How do we make the model work in practice (regularization, further architecture choices, etc.)?