Deep Learning: Practical Applications. Automatically Adding Sounds to Silent Movies


http://blog.csdn.net/zouxy09/article/details/8775360

Automatic Colorization of Black and White Images

Traditionally, colorization was done by hand because it is such a difficult task. Deep learning can use the objects and their context within the photograph to color the image, much as a human operator would approach the problem.
http://machinelearningmastery.com/inspirational-applications-deep-learning/

Automatically Adding Sounds to Silent Movies

In this task the system must synthesize sounds to match a silent video. The system is trained on 1,000 examples of video with sound of a drumstick striking different surfaces and creating different sounds. A deep learning model associates the video frames with a database of pre-recorded sounds in order to select the sound that best matches what is happening in the scene.
https://youtu.be/0fw99aqmmc8

Automatic Machine Translation

Given a word, phrase, or sentence in one language, automatically translate it into another language. Automatic machine translation has been around for a long time, but deep learning is achieving top results in two specific areas: automatic translation of text and automatic translation of images.

Object Classification and Detection in Photographs

This task requires classifying objects within a photograph as one of a set of previously known objects. A more complex variation, called object detection, involves identifying one or more objects within the scene of the photograph and drawing a box around each.
http://machinelearningmastery.com/inspirational-applications-deep-learning/

Automatic Handwriting Generation

Given a corpus of handwriting examples, generate new handwriting for a given word or phrase. The handwriting is provided as a sequence of pen coordinates recorded when the handwriting samples were created. From this corpus the relationship between pen movement and letters is learned, and new examples can be generated ad hoc. What is fascinating is that different styles can be learned and then mimicked. I would love to see this work combined with some forensic handwriting analysis expertise.

Automatic Text Generation

(Please do not use this tool to generate your course project report!) Text is generated word by word or character by character. The model is capable of learning how to spell, punctuate, and form sentences, and can even capture the style of the text in the corpus.

Automatic Image Caption Generation

Given an image, the system must generate a caption that describes the contents of the image.

Automatic Game Playing

Introduction

Deep learning (also known as deep structured learning, hierarchical learning, or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.

Biological Motivation

David Hubel and Torsten Wiesel. In one experiment, done in 1959, they inserted a microelectrode into the primary visual cortex of a cat and projected patterns of light and dark on a screen in front of it. They found that some neurons fired rapidly when presented with lines at one angle, while others responded best to another angle. They called these neurons "simple cells." Still other neurons, which they termed "complex cells," responded best to lines of a certain angle moving in one direction. These studies showed how the visual system builds an image up from simple stimuli into more complex representations.
https://en.wikipedia.org/wiki/torsten_wiesel

Why Brains Have a Deep Architecture

- Humans organize their ideas hierarchically, through composition of simpler ideas.
- Insufficiently deep architectures can be exponentially inefficient.
- Distributed (possibly sparse) representations are necessary: the input is represented by the activation of a set of features that are not mutually exclusive.
- Multiple levels of latent variables allow combinatorial sharing of statistical strength.
http://www.cs.toronto.edu/~fleet/courses/cifarschool09/slidesbengio.pdf

Features and Neural Network Representation

Example task: steer an autonomous vehicle driving at normal speeds on public highways. In machine learning algorithms, the pixels of an image serve as the features.

Deep Architecture in the Brain

Pixels on their own do not provide much useful information as features; simple cells respond to more structured stimuli.

Sparse structure of input data: natural images contain localized, oriented structures with limited phase alignment across spatial frequency.

Sparse Coding Experiment

Take 400 16x16-pixel image patches extracted from many natural scenes, denoted S[i], i = 0, 1, 2, ..., 399, and a target image patch T. The problem seeks as few patches S[k] as possible such that the residual T - SUM_k(a[k]*S[k]) is minimized. Surprisingly, the S[k] selected by the algorithm are always localized, oriented structures with limited phase alignment across spatial frequency.

Overview

Train networks with many layers (vs. shallow nets with just a couple of layers). Multiple layers work together to build an improved feature space:
- The first layer learns first-order features (e.g., edges).
- The second layer learns higher-order features (combinations of first-layer features, such as combinations of edges).
- In current models, layers often learn in an unsupervised mode and discover general features of the input space, serving multiple tasks related to the unsupervised instances (image recognition, etc.).
- The final-layer features are then fed into supervised layer(s), and the entire network is often subsequently tuned with supervised training, starting from the weights learned in the unsupervised phase.
- Fully supervised versions are also possible (as in early backpropagation attempts).
www.axon.cs.byu.edu/~martinez/classes/678/slides/

Tasks

Deep learning is usually best when the input space is locally structured, spatially or temporally (images, language, etc.), rather than made of arbitrary input features.

Images example: a view of a learned vision feature layer (basis). Each square in the figure shows the input image that maximally activates that feature.
www.axon.cs.byu.edu/~martinez/classes/678/slides/
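The patch-selection problem above can be sketched with greedy matching pursuit. This is a minimal NumPy illustration using random stand-in patches (the real experiment used 400 patches cropped from natural scenes), not the original authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dictionary: 400 random "patches", each flattened to a
# 256-dimensional vector (a 16x16 patch), normalized to unit length.
S = rng.standard_normal((400, 256))
S /= np.linalg.norm(S, axis=1, keepdims=True)
T = rng.standard_normal(256)                 # target patch

# Greedy matching pursuit: repeatedly pick the patch S[k] most
# correlated with the residual and subtract its contribution, so that
# T is approximated by a small sum a[k] * S[k].
residual = T.copy()
coeffs = {}
for _ in range(10):                          # keep at most 10 patches
    k = int(np.argmax(np.abs(S @ residual))) # best-matching atom
    a = float(S[k] @ residual)               # projection coefficient
    coeffs[k] = coeffs.get(k, 0.0) + a
    residual -= a * S[k]

approx = sum(a * S[k] for k, a in coeffs.items())
print(len(coeffs), np.linalg.norm(T - approx) < np.linalg.norm(T))
```

Because each step subtracts the projection onto a unit-norm atom, the residual norm strictly decreases, so a few atoms already shrink the reconstruction error.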

How Many Features?

Category of a document → topic (thousands) → term (tens of thousands) → word (millions).

Shallow Learning (Surface Learning)

Support vector machines, neural networks, and logistic regression, compared in terms of accuracy and time complexity.

Deep Learning and Neural Networks

Traditional neural networks suffer from a tendency to overfit, gradient diffusion, difficulty of tuning parameters, high computational complexity, and performance that is not clearly superior.

Denoising Autoencoder

The algorithm consists of multiple steps. It starts with a stochastic mapping of x to a corrupted version x~ through q_D(x~ | x); this is the corrupting step. The corrupted input x~ then passes through a basic autoencoder and is mapped to a hidden representation y = f_θ(x~) = s(W x~ + b). From this hidden representation we can reconstruct z = g_θ'(y). In the last stage, a minimization algorithm runs in order to make z as close as possible to the uncorrupted input x. The reconstruction error L_H(x, z) can be either the cross-entropy loss with an affine-sigmoid decoder, or the squared-error loss with an affine decoder.

Convolutional Neural Networks

CNNs exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. In other words, the inputs of hidden units in layer m come from a subset of units in layer m-1, units that have spatially contiguous receptive fields. The architecture thus ensures that the learned "filters" produce the strongest response to a spatially local input pattern. Stacking many such layers leads to non-linear "filters" that become increasingly "global".

CNN: LeNet-5

CNN: Convolutions

Replicating units in this way allows features to be detected regardless of their position in the visual field, thus providing the property of translation invariance.

Suppose you want to learn 9 features from a 5x5 image:
- with a fully connected neural network: 5x5x9 = 225 weights;
- with a locally connected network (3x3 receptive fields): 3x3x9 = 81 weights;
- with weight sharing: 3x3 + 1 = 10 weights.

CNN: Pooling
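The weight-sharing arithmetic above can be checked directly; this tiny script just reproduces the three counts:

```python
# Parameter counts for learning 9 features from a 5x5 image.
fully_connected   = 5 * 5 * 9   # every feature sees all 25 pixels
locally_connected = 3 * 3 * 9   # each feature has its own 3x3 window
weight_sharing    = 3 * 3 + 1   # one shared 3x3 kernel plus a bias

print(fully_connected, locally_connected, weight_sharing)  # 225 81 10
```

Sharing one kernel across all positions is what turns a locally connected layer into a convolutional layer, and it is why the count collapses from 81 to 10.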

Convolutional Layer C1

C1 is a convolutional layer with 6 feature maps of size 28x28; each neuron has a 5x5 receptive field in the input layer. One neuron corresponds to 5x5 weight parameters plus one bias parameter.
Connections: (5x5+1) x 6 x (28x28) = 122,304.
Trainable parameters: (5x5+1) x 6 = 156.
If it were fully connected, there would be (32x32) x (28x28) x 6 parameters.

Subsampling Layer S2

S2 is a subsampling layer with 6 feature maps of size 14x14, with 2x2 non-overlapping receptive fields in C1.
Trainable parameters: 6 x 2 = 12.
Connections: 14x14 x 2x2 x 6 = 4,704.

Convolutional Layer C3

C3 is a convolutional layer with 16 feature maps of size 10x10; each unit in C3 is connected to several 5x5 receptive fields.

Subsampling Layer S4

S4 is a subsampling layer with 16 feature maps of size 5x5; each unit in S4 is connected to the corresponding 2x2 receptive field in C3.
Trainable parameters: 16 x 2 = 32.
Connections: 5x5 x 2x2 x 16 = 1,600.
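The bookkeeping for C1 and S2 can be reproduced in a few lines, using only the figures stated above:

```python
# LeNet-5 layer bookkeeping.
# C1: 6 feature maps of 28x28, each neuron with a 5x5 receptive field
# plus one bias in the 32x32 input.
c1_params      = (5 * 5 + 1) * 6                 # 156
c1_connections = (5 * 5 + 1) * 6 * 28 * 28       # 122,304
c1_if_dense    = (32 * 32) * (28 * 28) * 6       # fully connected baseline

# S2: 6 maps of 14x14 with 2x2 non-overlapping receptive fields in C1;
# each map has one multiplicative coefficient and one bias.
s2_params      = 6 * 2                           # 12
s2_connections = 14 * 14 * 2 * 2 * 6             # 4,704

print(c1_params, c1_connections, s2_params, s2_connections)
```

The contrast between c1_connections and c1_if_dense is the point of the slide: convolution keeps the connectivity but shares the 156 parameters across all positions.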

Convolutional Layer C5

C5 is a convolutional layer with 120 feature maps of size 1x1; each unit in C5 is connected to all 16 of the 5x5 receptive fields in S4.
C5: 120 x (16 x 25) = 48,000 trainable parameters and connections (fully connected).

Layer F6

F6 consists of 84 fully connected units: 84 x (120+1) = 10,164 trainable parameters and connections.

Output layer: takes the 84 outputs of F6. Weights are updated by backpropagation.

Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNN)

An RNN is unrolled through time, sharing the weight matrices U, W, and V across all time steps:

A_t = f(U x_t + W A_{t-1})
h_t = softmax(V A_t)

Long-Term Dependencies

Some predictions need only recent context: "The clouds are in the sky." Others depend on context from much earlier: "I grew up in France ... I speak fluent French."
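The recurrence A_t = f(U x_t + W A_{t-1}), h_t = softmax(V A_t) can be run directly in NumPy. The sizes and random (untrained) weights here are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions; U, W, V are shared across every time step.
n_in, n_hidden, n_out = 3, 5, 2
U = rng.standard_normal((n_hidden, n_in)) * 0.1
W = rng.standard_normal((n_hidden, n_hidden)) * 0.1
V = rng.standard_normal((n_out, n_hidden)) * 0.1

xs = [rng.standard_normal(n_in) for _ in range(4)]  # input sequence
A = np.zeros(n_hidden)                              # A_0

for x_t in xs:
    A = np.tanh(U @ x_t + W @ A)   # hidden state carries history forward
    h = softmax(V @ A)             # output distribution at this step

print(h)
```

Because A_t feeds back into A_{t+1}, early inputs can in principle influence late outputs, which is exactly where the long-term-dependency problem arises in practice.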

Long Short-Term Memory Networks

LSTM cells regulate their memory through gates: a forget gate decides how much of the previous cell state to keep, and an input gate decides how much new information to write into it.
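One step of a standard LSTM cell makes the role of the two gates concrete. This is a generic textbook LSTM with random, untrained weights and toy sizes, not a specific model from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_h = 3, 4

def gate_weights():
    # Each gate has one weight matrix over [h_{t-1}, x_t] plus a bias.
    return rng.standard_normal((n_h, n_in + n_h)) * 0.1, np.zeros(n_h)

Wf, bf = gate_weights()   # forget gate
Wi, bi = gate_weights()   # input gate
Wc, bc = gate_weights()   # candidate cell state
Wo, bo = gate_weights()   # output gate

h, c = np.zeros(n_h), np.zeros(n_h)   # previous hidden and cell state
x = rng.standard_normal(n_in)         # current input
z = np.concatenate([h, x])            # [h_{t-1}, x_t]

f = sigmoid(Wf @ z + bf)              # forget gate: how much memory to keep
i = sigmoid(Wi @ z + bi)              # input gate: how much new info to admit
c_tilde = np.tanh(Wc @ z + bc)        # candidate values to write
c = f * c + i * c_tilde               # new cell state
h = sigmoid(Wo @ z + bo) * np.tanh(c) # new hidden state

print(h)
```

The additive update c = f*c + i*c_tilde, gated rather than repeatedly squashed, is what lets LSTMs carry information across many time steps.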