Fast Sliding Window Classification with Convolutional Neural Networks

Henry G. R. Gouk
Department of Computer Science, University of Waikato
Private Bag 3105, Hamilton 3240, New Zealand

Anthony M. Blake
Department of Computer Science, University of Waikato
Private Bag 3105, Hamilton 3240, New Zealand
ablake@waikato.ac.nz

ABSTRACT

Convolutional Neural Networks (CNNs) have repeatedly been shown to be the state of the art method for natural signal classification, image classification in particular. Unfortunately, due to their high model complexity, CNNs often cannot be used for object detection tasks with real-time constraints, where many predictions have to be made on sub-windows of a large input image. We demonstrate how two recent advances in CNN efficiency can be combined, with modifications, to provide a substantial speedup for sliding window classification. An in-depth analysis of the various factors that can impact performance is presented.

Categories and Subject Descriptors
I.5.1 [Pattern Recognition]: Neural Nets

Keywords
Convolutional Neural Networks, Object Detection, Sliding Window, FFT

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
IVCNZ '14, November, Hamilton, New Zealand. Copyright is held by the owner/author(s). Publication rights licensed to ACM.

1. INTRODUCTION

Convolutional Neural Networks (CNNs) are powerful models that have become very popular for image classification over the last few years, producing state of the art results on several benchmark datasets [10, 5]. The primary drawback of CNNs is their substantial complexity, which leads to slow training and inference and makes some applications unfeasible due to real-time constraints. In this paper we address the problem of slow inference in the context of sliding window classification. We extend a scalable inference procedure so that it evaluates all the sub-windows in a large input image simultaneously, and we apply computer architecture aware optimisations. We also provide a comparison between several modern methods for applying CNNs in a sliding window fashion.

It is known that computing convolutions by transforming the image and kernel into the frequency domain and performing point-wise complex multiplication is an effective way to speed up many computer vision and image processing algorithms [9]. However, this has only recently been applied in the context of CNNs, where the main motivation was to accelerate the training process [6]. The focus of our work is the optimisation of the entire deployed sliding window CNN classification system, as opposed to the CNN training process. The forward propagation procedure developed as part of the frequency domain method scales better to larger image sizes than the traditional space domain method, so it forms the basis of our sliding window inference procedure.

In [8] it was observed that the feature maps computed by CNNs for adjacent sub-windows during sliding window classification contain large overlapping sections, which allows a significant number of redundant computations to be avoided. The method as originally presented describes the computations in the spatial domain, and its performance is not quantified.
We use the frequency domain forward propagation algorithm from [6], perform the same extension described by [8], and apply architecture aware optimisations to greatly accelerate the sliding window classification process, to the point where it can be used in real time with modest sized networks. Neither of these methods impacts the final accuracy of the resulting classification system. That is to say, they produce results that are equivalent to those of the algorithms conventionally used for training and inference on CNNs. Thus, our work focused on the speedup that can be attained. We also show how the memory requirements of the algorithm presented in [6] can be reduced, and demonstrate the relationship between the reduced memory usage and the performance.

The paper is structured as follows: in Section 2 we give an overview of the methods presented in [6] and [8], followed by an explanation of how the methods can be combined. Section 3 discusses what modifications must be made to avoid several potential pitfalls, and also explores the tradeoff between memory usage and run time. Sections 6 and 7 provide an evaluation of how efficient the different algorithms are in a range of scenarios. Section 8 provides some final comments with suggestions for future work in this area.

2. BACKGROUND

Exploiting the convolution theorem to accelerate CNNs is not as trivial as using it to speed up other image processing and computer vision algorithms. If one were to simply replace all occurrences of spatial convolution with frequency domain convolution then in the vast majority of cases one would see a dramatic slowdown, because the kernels used in CNNs are usually quite small; 5×5 pixels is a very common size. To get around this, [6] leverages the linearity property of the Fourier transform to greatly reduce the number of inverse transforms required to compute each feature map, by performing the summation over input feature maps in the frequency domain. In addition, it was observed that one can simply precompute the frequency domain copies of the kernels and store them, further cutting down the number of transforms required. Although these optimisations are not particularly difficult to see once the task is approached with frequency domain methods in mind, it has taken over twenty years since the introduction of CNNs for them to be applied. It should be noted that [6] thoroughly explores the factors that determine the run-time performance of these frequency domain methods versus the traditional spatial domain methods, but the performance of sliding window classification was not within the scope of their investigation.

Recently [8] introduced OverFeat, a CNN based image recogniser and feature extractor. The authors also included a small section addressing the performance of sliding window classification, where they identified that the feature maps in convolutional layers produced by propagating several adjacent sub-windows through the same CNN contain a large amount of overlapping data. To exploit this fact they applied the convolutional layers to the entire input image rather than processing each sub-window individually, preventing the repeated calculation of the same values.
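The redundancy that [8] exploits can be illustrated with a small one-dimensional sketch. This is an illustration only, not the implementation evaluated in this paper: the function names, the signal, the kernel and the window size are all hypothetical, and a plain sliding dot product stands in for a convolutional layer. It checks that running the layer once over the whole signal and then slicing gives the same feature maps as running it on every sub-window, with fewer multiplications.

```python
def correlate_valid(signal, kernel):
    """Direct 'valid' sliding dot product of kernel over signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def per_window(signal, kernel, window):
    """Naive approach: extract each sub-window, then run the layer on it."""
    return [correlate_valid(signal[start:start + window], kernel)
            for start in range(len(signal) - window + 1)]


def whole_signal(signal, kernel, window):
    """Shared approach: run the layer once, then slice out per-window views."""
    full = correlate_valid(signal, kernel)
    width = window - len(kernel) + 1  # feature-map width of one sub-window
    return [full[start:start + width]
            for start in range(len(signal) - window + 1)]


# Hypothetical toy data, chosen only for illustration.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
kernel = [1, 0, -1]
window = 6

naive = per_window(signal, kernel, window)
shared = whole_signal(signal, kernel, window)

# Multiplications performed by each approach.
naive_mults = len(naive) * len(naive[0]) * len(kernel)
shared_mults = (len(signal) - len(kernel) + 1) * len(kernel)

print(naive == shared)            # True
print(naive_mults, shared_mults)  # 60 24
```

In this toy case the saving is modest because the signal is short; the gap widens as the input grows relative to the window.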
The pooling layers are left to continue pooling in the same pattern, but since the pools now extend over much larger feature maps, one is forced into using a particular sliding window stride, determined by the product of the sizes of the pooling layers in the network. Finally, the fully connected layers can be transformed into convolutional layers with 1×1 kernels and used in the same manner as the other convolutional layers.

The derivation of the fast sliding window inference algorithm presented by [8] can easily be modified to use the frequency domain methods presented by [6]. In the fast sliding window method each kernel in a convolutional layer is applied to the entirety of each input feature map, as opposed to each sub-window, so the algorithms can be combined by simply using the frequency domain convolution procedure with the CNN specific optimisations from [6]. However, one quickly encounters a major problem when attempting to perform inference on a large image. The convolution theorem specifically states that point-wise multiplication in the frequency domain is equivalent to circular convolution in the space domain. The way to get around this limitation and still perform linear convolution is to zero-pad the image and the kernel enough that the border effects caused by circular convolution are not present [2]. The problem with this, in the context of sliding window CNN classifiers, is that the input images one would like to apply the sliding window classifier to can be several megapixels in size. CNNs usually have a very large number of kernels, so padding all of these kernels to be at least as big as a large input image causes the memory requirements to skyrocket, a problem the fast spatial domain sliding window classification algorithm does not suffer from, since the padding is not required for space domain convolution. Additionally, the fast sliding window algorithm transforms fully connected layers into convolutional layers.
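The padding requirement, and the frequency domain summation borrowed from [6], can both be checked on a small one-dimensional example. The sketch below is an illustration only, not the implementation evaluated in this paper: it uses a naive O(n^2) DFT in place of FFTW, true (flipped-kernel) convolution, and hypothetical names and data throughout. Two input maps are multiplied by their kernel spectra, summed in the frequency domain, and brought back with a single inverse transform.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def linear_conv(x, h):
    """Direct linear convolution, used as the reference result."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def pad(x, n):
    """Zero-pad a sequence to length n."""
    return list(x) + [0.0] * (n - len(x))


def freq_layer(inputs, kernels, n):
    """Sum over input maps in the frequency domain, then one inverse DFT."""
    kernel_spectra = [dft(pad(h, n)) for h in kernels]  # could be precomputed
    acc = [0j] * n
    for x, spectrum in zip(inputs, kernel_spectra):
        x_hat = dft(pad(x, n))
        acc = [a + xk * hk for a, xk, hk in zip(acc, x_hat, spectrum)]
    return [v.real for v in dft(acc, inverse=True)]


# Hypothetical toy data: two input maps and one kernel per map.
inputs = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
kernels = [[1.0, -1.0, 0.5], [2.0, 0.0, 1.0]]

reference = [a + b for a, b in zip(linear_conv(inputs[0], kernels[0]),
                                   linear_conv(inputs[1], kernels[1]))]

# Padded to length 4 + 3 - 1 = 6: matches linear convolution.
padded = freq_layer(inputs, kernels, n=6)
# Padded only to the input length: circular, so the tail wraps around.
unpadded = freq_layer(inputs, kernels, n=4)

print(max(abs(p - r) for p, r in zip(padded, reference)) < 1e-9)  # True
print(abs(unpadded[0] - reference[0]) > 0.1)                      # True
```

Every kernel spectrum here has the same length as the padded input, which is the source of the memory growth described above when the input is a multi-megapixel image.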
If one were to indiscriminately modify all convolutional layers to perform their computations in the frequency domain, huge memory and computational overheads would be incurred for the layers converted from fully connected layers, since each of their kernels consists of only a single value. To gain a good speedup from combining these two methods, a slightly more sophisticated approach needs to be employed.

3. OPTIMISING SLIDING CNNS

Excessive memory use reduces the memory available to the rest of the system. If the CNN is part of a larger system this could be catastrophic, as it could leave the system unable to operate at all on a computer with limited memory. We propose a very simple way to avoid this first problem: coarsely divide the large input image into several smaller input images and apply the frequency domain sliding window inference to each of them. Note that these smaller sub-images must overlap to compensate for the cropping effect introduced by the convolutional layers in the CNN, so if one were to divide the input image into four sub-images then each of those sub-images would contain slightly more than one quarter of the input image. This overlap reintroduces some of the redundant computations that first provided the inspiration for the fast space domain forward propagation algorithm, and as the number of sub-images grows the performance will tend towards that of the standard method of extracting each sub-window and performing inference on it individually.

As an example of how this changes the run-time performance, the technique was applied to an image of pixels using the shallow model described in Section 6. Figure 1 shows how this method impacts the speed compared to the simple sliding window frequency domain approach that applies a model to the entire input image at once. When the number of sub-images is one, this approach is equivalent to the standard frequency domain sliding window inference algorithm.
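One way to choose the overlapping sub-images is sketched below along a single image axis. It is a sketch under stated assumptions rather than the procedure used for Figure 1: the function name and the example values (a 100 pixel axis, a receptive field of 20 pixels and an effective stride of 4) are hypothetical. The sliding window output positions are split into contiguous groups, and each group is mapped back to exactly the input range it needs, so every sub-window is evaluated once and adjacent sub-images overlap by the receptive field size minus the stride.

```python
def tile_ranges(image_size, window, stride, num_tiles):
    """Split sliding-window outputs along one axis into overlapping tiles.

    Returns one (in_start, in_stop, out_start, out_stop) tuple per tile,
    where image[in_start:in_stop] is exactly the input needed to produce
    output positions out_start..out_stop-1.
    """
    num_outputs = (image_size - window) // stride + 1
    base, extra = divmod(num_outputs, num_tiles)
    tiles = []
    out_start = 0
    for t in range(num_tiles):
        count = base + (1 if t < extra else 0)
        if count == 0:
            continue  # more tiles requested than output positions
        out_stop = out_start + count
        in_start = out_start * stride
        in_stop = (out_stop - 1) * stride + window
        tiles.append((in_start, in_stop, out_start, out_stop))
        out_start = out_stop
    return tiles


# Hypothetical example values.
tiles = tile_ranges(image_size=100, window=20, stride=4, num_tiles=2)
print(tiles)  # [(0, 60, 0, 11), (44, 100, 11, 21)]

# Adjacent tiles share window - stride input pixels.
overlap = tiles[0][1] - tiles[1][0]
print(overlap)  # 16
```

For a two-dimensional image the same split can be applied independently to rows and columns. As the number of tiles grows, each tile shrinks towards a single receptive field and the shared computation disappears, which matches the trend described above.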
The speeds in Figure 1 have very little correlation with the number of divisions. This is because the FFT then needs to be computed on a signal whose size is not a power of two. In most modern libraries this causes a significant degradation in performance [1]. Due to the sacrifice in speed, this optimisation is intended for situations where memory consumption needs to be minimised, such as embedded environments or mobile devices.

The second problem, incurring needless overheads for fully connected layers, can also be mitigated. Once again the solution is rather simple: only convert convolutional layers with sufficiently large kernels to use the frequency domain forward propagation method. We use the very simple heuristic of performing frequency domain convolutions for any layer whose kernels have more than one element. Thus, the computations for the fully connected layers are still performed in the spatial domain. However, there is still something that can be improved upon. If one truly treats the fully connected layers as convolutional layers with 1×1 kernels then a huge number of redundant branching instructions will be executed. To avoid this we use a variant of the convolutional layer in which the inner loops of the convolution methods have been unrolled to take into account that the kernels have only a single element. Figure 2 gives an idea of how much speedup the removal of these superfluous branch instructions provides.

Figure 1: This figure illustrates how dividing up the input image in order to reduce the model size impacts the speed. The network used here is the shallow network described in Section 6. (Axes: Number of Sub-images against Relative Speedup.)

Figure 2: Relative speedup of the specialised sliding fully connected layer over using a convolutional layer for different square input image sizes when using the shallow network described in Section 6.

4. BACKPROPAGATION

It is worth noting that the fast sliding window algorithm is also capable of speeding up backpropagation, by applying the same extension idea to the correlation and convolution operations used for calculating the derivatives. However, the accuracy is likely to be very poor unless two criteria are satisfied. Firstly, it must make sense to use very large minibatch sizes, as each sub-window is essentially a new instance. Secondly, the class distribution of the sub-windows in a single input image must be varied enough to prevent catastrophic forgetting (a phenomenon explained very well in [3]) for every new input image. Unfortunately, in the object detection setting there are no guarantees about how frequently an object class will occur, so it is unlikely that the fast sliding window backpropagation algorithm will be of any use in these cases. The task of pixel-level scene labeling might benefit from this backpropagation method, but that is outside the scope of this work.

5. IMPLEMENTATION DETAILS

The algorithms were implemented in C++, and extensive use was made of Intel's AVX (Advanced Vector Extensions) SIMD (Single Instruction, Multiple Data) intrinsics to provide a significant performance boost. FFTW [4], a popular fast Fourier transform library, was used for computing the Fourier transforms required in the convolutional layers. This library was chosen because of its relatively good performance and because it also utilises AVX instructions. The implementation is packaged as a library that can be included in other projects and is available for download.

In addition to the fast sliding window forward propagation algorithm, we implemented the sliding window backpropagation methods as well. We have, however, not yet found any application that would be suitable for these capabilities, due to the limitations outlined earlier. A GPU version of the library is also in development, and we plan to release it at a later date.

6. PERFORMANCE EVALUATION

From the outset, the goal of this paper was to quantify the relative performance of several recently introduced methods for accelerating CNNs, particularly in the context of sliding window classification. This section provides an in-depth evaluation of how each of these algorithms performs under a representative variety of typical scenarios. We aim to show how the performance of each algorithm is impacted by kernel size, input image size, network depth, and network breadth. All experiments are run on a machine with an Intel i processor and 16 GB of RAM. Each performance measurement reported for a parameter configuration is the average over 10 runs. Five different algorithms are evaluated:

Standard forward propagation in the space domain (FP-S);
Standard forward propagation in the frequency domain (FP-F);
Sliding window forward propagation in the space domain (SWFP-S);
Sliding window forward propagation in the frequency domain (SWFP-F);
Sliding window forward propagation in the frequency domain with the optimisations to the fully connected layers described in Section 3 (OSWFP-F).

Note that the optimisations introduced for OSWFP-F do not include the technique for reducing memory usage by coarsely dividing up the input image.

The first experiment we conducted provides empirical evidence for how fast the different inference procedures are on a shallow network. We measure the average time taken to run the CNN on randomly generated synthetic images at several different resolutions. Figure 3 shows these measurements for the five different inference techniques. The network architecture used in this experiment contains four layers: a convolutional layer with 9×9 kernels producing 64 feature maps, a max pooling layer with a 3×3 pool size, a fully connected layer with 64 units, and a binary output layer. Rectified Linear Units [7] are used for all the hidden layers and softmax is used for the output layer, as this has become the typical combination of activation functions in deep neural networks applied to classification tasks.

Figure 3: Speed of the five algorithms (FP-S, FP-F, SWFP-S, SWFP-F, OSWFP-F) for different square input image resolutions using a typical shallow network architecture.

Next, we investigate how the different methods cope with a significantly larger number of kernels by introducing additional convolutional and pooling layers. The network used for this experiment is as follows: a convolutional layer with 5×5 kernels producing 32 feature maps, a pooling layer with a pool size of 2×2, another convolutional layer with 5×5 kernels that also produces 32 feature maps, another 2×2 pooling layer, a fully connected layer with 32 units, and a binary output layer. Once again, we run several different sized images through the network and use all five algorithms to perform forward propagation. The results are presented in Figure 4.

Figure 4: Speed of the five algorithms (FP-S, FP-F, SWFP-S, SWFP-F, OSWFP-F) for different square input image resolutions using a typical deep network architecture.

The primary benefit of the frequency domain method of convolution in the context of CNNs is that the computation time becomes completely dependent on the feature map size, and the kernel size plays no role aside from determining the dimensions of the feature maps used as inputs for subsequent layers.
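A rough multiplication count makes the kernel size argument concrete. The sketch below is a back-of-envelope model under stated assumptions, not a measurement: it ignores constant factors and the cost of the transforms themselves, it assumes the kernel spectra are precomputed and padded to the feature map size, and the 256×256 single-channel input is a hypothetical value. Only the 9×9 kernels and 64 output maps are taken from the shallow network described above.

```python
def direct_mults(n, k, in_maps, out_maps):
    """Multiplications for direct 'valid' convolution of n x n maps
    with k x k kernels."""
    out = n - k + 1
    return in_maps * out_maps * out * out * k * k


def freq_mults(n, in_maps, out_maps):
    """Point-wise complex products on n x n spectra.

    The kernel size does not appear: every kernel spectrum is n x n.
    """
    return in_maps * out_maps * n * n


def transform_count(in_maps, out_maps):
    """Transforms per layer when the summation over input maps is done in
    the frequency domain: one forward transform per input map plus one
    inverse transform per output map."""
    return in_maps + out_maps


n = 256  # hypothetical feature map side length
for k in (5, 9, 13):
    print(k, direct_mults(n, k, 1, 64), freq_mults(n, 1, 64))
print(transform_count(1, 64))
```

The direct count grows with the kernel area while the point-wise product count stays fixed. Whether the frequency domain method wins in practice still depends on the transform cost, which this count leaves out.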
To see how the frequency domain algorithms compare to the space domain algorithms, we experimented with the same two networks used previously, this time showing how the performance of each of the five techniques changes when the kernel size is varied, as opposed to the input image size. For these experiments we use a fixed input image size of pixels. Figure 5 shows the results for the shallow network and Figure 6 shows the results for the deep network. When the size of the feature maps output by a convolutional layer is not divisible by the pool size, due to the varying kernel sizes, the feature maps are cropped to the largest size that is divisible by the pool size without any remainder.

7. DISCUSSION

As can be seen from the experiments presented in Section 6, the frequency domain sliding window inference method that takes advantage of computer architecture aware optimisations is the fastest method in most cases. The exception is where the image size is identical to the receptive field size of the original CNN classifier. The main controlling factor in the performance of the frequency domain approach is the speed of the FFT implementation. The erratic deviation in performance demonstrated in Figure 1, caused by the FFT execution time, can also be seen when observing how the speed varies with different kernel sizes. In Figures 5 and 6 the speedup of the frequency domain algorithms exhibits the same seemingly random variations with changes to the kernel size. This means great care must be taken when using the method described in Section 3 for reducing the memory usage of the frequency domain algorithm.

The difference between OSWFP-F and SWFP-F is most noticeable when a large fraction of the running time is spent on computing the fully connected layer activations, as that is what the performance optimisations targeted. This difference is most obvious in the shallow network, particularly when the image size is quite large, as shown in Figure 5.
In the deep network the vast majority of computations are devoted to the convolutional layers, and because of this there is very little difference between OSWFP-F and SWFP-F.

The fast space domain sliding window classification method presented in [8] (labeled SWFP-S in the figures) does indeed provide a massive speedup, of over an order of magnitude, compared to the naive approach of extracting each sub-window and running the inference procedure on them individually. This speedup is particularly noticeable when the input image becomes quite large.

8. CONCLUSIONS

We have presented an algorithm that improves upon the state of the art performance of sliding window inference with convolutional neural networks by exploiting a combination of frequency domain methods and knowledge of computer architecture. The performance of this method and a recently introduced spatial domain method were quantified in a range of scenarios representative of the typical CNN architectures used. Our analysis also shows that the quality of the FFT implementation used impacts the performance in a major way. In the future we plan to investigate how the issues with the fast sliding window backpropagation can be overcome, and also demonstrate that complex models can be applied to tasks with real-time constraints.

Figure 5: Speed of the five algorithms for different square kernel sizes using a typical shallow network architecture. The legend has been omitted due to space constraints, however the colour coding is the same as in Figure (Horizontal axis: Kernel Size.)

Figure 6: Speed of the five algorithms for different square kernel sizes using a typical deep network architecture. The legend has been omitted due to space constraints, however the colour coding is the same as in Figure (Horizontal axis: Kernel Size.)

9. REFERENCES

[1] Anthony Martin Blake. Computing the fast Fourier transform on SIMD microprocessors. PhD thesis, University of Waikato.
[2] CSS Burrus and Thomas W Parks. DFT/FFT and Convolution Algorithms: Theory and Implementation. John Wiley & Sons, Inc.
[3] Robert M French. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4).
[4] Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2). Special issue on Program Generation, Optimization, and Platform Adaptation.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25.
[6] Michael Mathieu, Mikael Henaff, and Yann LeCun. Fast Training of Convolutional Networks through FFTs. arXiv preprint.
[7] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10).
[8] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint.
[9] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer.
[10] Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13).

Accelerating Convolutional Neural Network Systems

Accelerating Convolutional Neural Network Systems Accelerating Convolutional Neural Network Systems Henry G.R. Gouk This report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Computing and Mathematical Sciences with

More information

Weighted Convolutional Neural Network. Ensemble.

Weighted Convolutional Neural Network. Ensemble. Weighted Convolutional Neural Network Ensemble Xavier Frazão and Luís A. Alexandre Dept. of Informatics, Univ. Beira Interior and Instituto de Telecomunicações Covilhã, Portugal xavierfrazao@gmail.com

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Real-time convolutional networks for sonar image classification in low-power embedded systems

Real-time convolutional networks for sonar image classification in low-power embedded systems Real-time convolutional networks for sonar image classification in low-power embedded systems Matias Valdenegro-Toro Ocean Systems Laboratory - School of Engineering & Physical Sciences Heriot-Watt University,

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Rotation Invariance Neural Network

Rotation Invariance Neural Network Rotation Invariance Neural Network Shiyuan Li Abstract Rotation invariance and translate invariance have great values in image recognition. In this paper, we bring a new architecture in convolutional neural

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Deep Learning With Noise

Deep Learning With Noise Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu

More information

Convolutional Neural Networks

Convolutional Neural Networks Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application

More information

Advanced Introduction to Machine Learning, CMU-10715

Advanced Introduction to Machine Learning, CMU-10715 Advanced Introduction to Machine Learning, CMU-10715 Deep Learning Barnabás Póczos, Sept 17 Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Know your data - many types of networks
