
Table of Contents:
- What Really is a Hidden Unit?
- Visualizing Feed-Forward NNs
- Visualizing Convolutional NNs
- Visualizing Recurrent NNs
- Visualizing Attention
- Visualizing High Dimensional Data
- What do visualizations get us now?

What Really is a Hidden Unit? Some operation that outputs a weighted value. Sometimes it's bounded between -1 and +1, or it's just weighted by 1, giving a linear value. Let's say it's between -1 and +1 for this example. This is really a projection into a weighted space.
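To make that concrete, here's a minimal sketch of a single hidden unit as a projection. The tanh activation is my assumption; it's one common choice that bounds the output to (-1, +1), and the weights here are made up for illustration:

```python
import numpy as np

def hidden_unit(x, w, b):
    """One hidden unit: project the input onto a weight vector,
    shift by a bias, and squash with tanh into (-1, +1)."""
    return np.tanh(np.dot(w, x) + b)

x = np.array([0.5, -1.2])    # a 2-D input point
w = np.array([0.8, 0.3])     # the unit's weights (made up here)
b = 0.1
print(hidden_unit(x, w, b))  # one value in (-1, +1)
```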

Let's Look at a Distribution We Want to Model. A simple distribution of blue points totally surrounded by red points.
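A dataset like this is easy to synthesize. Here's a minimal sketch using scikit-learn's make_circles; that generator is my choice for illustration, not necessarily what the slide used:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Blue points (inner circle) totally surrounded by red points (outer ring).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
plt.scatter(X[y == 0, 0], X[y == 0, 1], c="red", label="outer (red)")
plt.scatter(X[y == 1, 0], X[y == 1, 1], c="blue", label="inner (blue)")
plt.legend()
plt.show()
```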

Let's Put This Data Into Random Hidden Units. [Three projections: still not linearly separable; still not linearly separable; finally linearly separable.] We just visualized a simple hidden layer!

Let's Put This Data Into Random Hidden Units. [The same three projections.] This distribution is obviously not linearly separable in its original space; to separate this data with a NN we need multiple layers.
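As a sanity check on that claim, here's a sketch comparing a linear model against a network with one hidden layer on the circles data. scikit-learn's LogisticRegression and MLPClassifier are illustrative choices; the layer size is arbitrary:

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = LogisticRegression().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    max_iter=5000, random_state=0).fit(X, y)

print("linear accuracy:", linear.score(X, y))  # near chance: no linear boundary exists
print("MLP accuracy:   ", mlp.score(X, y))     # near perfect: hidden layer makes it separable
```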

Let's See A Network of Layers Like These Learning... This is how you visualize a basic NN... Let's imagine a network structured like this: is it able to learn this distribution?

Visualize a basic NN... Let's look at some more networks: http://playground.tensorflow.org

Visualize a Convolutional NN... [Feature visualizations from the paper, shown layer by layer across several slides.] Paper Here: https://cs.nyu.edu/~fergus/papers/zeilereccv2014.pdf

Visualize a Convolutional NN... Layer 4 and Layer 5. Paper Here: https://cs.nyu.edu/~fergus/papers/zeilereccv2014.pdf

Visualize a Convolutional NN... Let's look at a more interactive visualization: http://scs.ryerson.ca/~aharley/vis/conv/
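If you want a static version of this for your own model, a common recipe (simpler than the deconvnet method in the Zeiler & Fergus paper above) is to plot an early layer's feature maps directly. A minimal PyTorch sketch, assuming a pretrained torchvision VGG16 and a random tensor standing in for a preprocessed image:

```python
import torch
import torchvision.models as models
import matplotlib.pyplot as plt

model = models.vgg16(weights="IMAGENET1K_V1").eval()

# Capture the first conv layer's activations with a forward hook.
acts = {}
def hook(module, inp, out):
    acts["conv1"] = out.detach()
model.features[0].register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
with torch.no_grad():
    model(x)

# Plot the first 16 feature maps.
fmaps = acts["conv1"][0]
fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(fmaps[i].numpy(), cmap="gray")
    ax.axis("off")
plt.show()
```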

What about an RNN? Turns out it's similar, but different.

Visualize a Recurrent NN... Activations that change depending on the input. Paper Here: https://arxiv.org/pdf/1506.02078.pdf
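Here's a minimal sketch of recording how an LSTM's hidden activations change with the input, in the spirit of the paper above. The network is untrained and the one-hot "characters" are random, purely for illustration:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=10, hidden_size=16, batch_first=True)

# A "sequence" of 20 random one-hot vectors standing in for characters.
seq = torch.eye(10)[torch.randint(0, 10, (20,))].unsqueeze(0)

with torch.no_grad():
    outputs, _ = lstm(seq)  # shape (1, 20, 16): hidden state at every step

# Heatmap: rows are hidden units, columns are time steps. In a trained
# model, some rows track interpretable things (quote scope, line length...).
plt.imshow(outputs[0].T.numpy(), aspect="auto", cmap="RdBu")
plt.xlabel("time step")
plt.ylabel("hidden unit")
plt.colorbar()
plt.show()
```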

Visualize a Recurrent NN... Average LSTM gate positions, dependent on their inputs. Paper Here: https://arxiv.org/pdf/1506.02078.pdf

Visualize a Recurrent NN... Average GRU gate positions, dependent on their inputs. Paper Here: https://arxiv.org/pdf/1506.02078.pdf

Visualizing Attention... First, let's talk about attention; we have not covered it yet. This is going to be handwavy, but the professor will cover it in lecture shortly. Note that there is only one value calculated per input. [Diagram: Outputs; Attention Values (calculated by the attention model); Inputs.] Paper Here: https://arxiv.org/pdf/1409.0473.pdf

Visualizing Attention... a() is a learned neural network. Paper Here: https://arxiv.org/pdf/1409.0473.pdf
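In the paper linked above, a() is a small feed-forward network that scores each encoder state against the current decoder state, and the scores are softmaxed into one weight per input. A minimal NumPy sketch of that additive-attention idea; the dimensions and random weights here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (arbitrary for this sketch)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)

def a(s, h):
    """Additive attention score: a tiny learned network mapping
    (decoder state s, encoder state h) -> one scalar."""
    return v @ np.tanh(W1 @ s + W2 @ h)

def attention_weights(s, H):
    scores = np.array([a(s, h) for h in H])  # one value per input
    e = np.exp(scores - scores.max())        # numerically stable softmax
    return e / e.sum()

H = rng.normal(size=(5, d))   # 5 encoder states
s = rng.normal(size=d)        # current decoder state
w = attention_weights(s, H)
print(w, w.sum())             # weights over the 5 inputs; sums to 1
context = w @ H               # the attended context vector
```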

Visualizing Attention... We can visualize these values!

Visualizing Attention... Soft attention (softmax activation, differentiable) vs. hard attention (step-function activation, not differentiable). Paper Here: https://arxiv.org/pdf/1502.03044.pdf
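The difference is easy to see in code. A toy sketch with made-up scores and inputs: soft attention averages the inputs with softmax weights (differentiable), while hard attention picks a single input through an argmax, a step that gradients cannot flow through:

```python
import numpy as np

scores = np.array([0.2, 1.5, -0.3, 0.7])          # one score per input
H = np.arange(4 * 3, dtype=float).reshape(4, 3)   # 4 toy input vectors

soft = np.exp(scores) / np.exp(scores).sum()      # softmax: smooth weights
soft_context = soft @ H                           # weighted average (differentiable)

hard = np.zeros_like(scores)
hard[np.argmax(scores)] = 1.0                     # step function: pick one input
hard_context = hard @ H                           # no gradient through argmax

print(soft_context, hard_context)
```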

Visualizing Attention... Paper Here: https://arxiv.org/pdf/1502.03044.pdf

Visualizing Attention... Sometimes this can show us where our errors are... Paper Here: https://arxiv.org/pdf/1502.03044.pdf

Visualizing Attention... You can also do this with seq2seq models. Don't these look a lot like word alignment charts? (A topic for another time.) Paper Here: https://arxiv.org/pdf/1409.0473.pdf

Visualizing High Dimensional Data... How do you view data that is in more than 3 dimensions? Projections!

Visualizing High Dimensional Data... (Dimensionality Reduction)
Principal Component Analysis (PCA): a deterministic mathematical projection onto the principal components. Common for visualizing embedding spaces. Great PCA reading here: https://www.cs.cmu.edu/~elaw/papers/pca.pdf
t-Distributed Stochastic Neighbor Embedding (t-SNE): a stochastic projection trained using gradient descent on a KL-divergence objective. Common for visualizing training data (non-deterministic). Paper Here: http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
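Here's a minimal sketch of both projections side by side, using scikit-learn and its small digits dataset (an 8x8 stand-in for MNIST; the dataset choice is mine):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64-dimensional points

pca_2d = PCA(n_components=2).fit_transform(X)                    # deterministic
tsne_2d = TSNE(n_components=2, random_state=0).fit_transform(X)  # stochastic

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(pca_2d[:, 0], pca_2d[:, 1], c=y, s=5)
ax1.set_title("PCA")
ax2.scatter(tsne_2d[:, 0], tsne_2d[:, 1], c=y, s=5)
ax2.set_title("t-SNE")
plt.show()
```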

Visualizing High Dimensional Data... (Dimensionality Reduction) Let's look at some MNIST examples: http://colah.github.io/posts/2014-10-visualizing-mnist/ Paper Here: http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

Visualizing High Dimensional Data... (Dimensionality Reduction) Let's look at some of our own examples: https://colab.research.google.com/drive/1bjjxecml544xp3hcZwFPNxAhFlF6VMTy#scrollTo=ULqt5rdaQPoi Paper Here: http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

What do visualizations get us now? Right now, nothing. NNs are still black boxes, and there is no way to really use these visualizations to help improve our models. They are currently just cool images to look at... maybe later it will be better.

Conclusion. Any questions?