Deep Learning With Noise

Yixin Luo
Computer Science Department, Carnegie Mellon University

Fan Yang
Department of Mathematical Sciences, Carnegie Mellon University

Abstract

Recent works have shown that allowing some inaccuracy when training deep neural networks can improve not only the training performance but also the accuracy of the model. Our work, taking those previous works as examples and guidance, studies the impact of introducing different types of noise into different components of deep neural network training. We intend to experiment with several noise types, including Binomial noise, Gaussian noise, and Gamma noise. We also intend to study the effects of noise in different parts of the model, including neurons and network links in the input, hidden, and output layers, as well as matrix multiplication and gradient computation in the backward propagation process.

1 Introduction

Large-scale deep neural network models have become increasingly popular for solving hard classification problems and have demonstrated significant improvements in accuracy. Traditional statistical machine learning methods require a human domain expert to construct a good set of features as the input dataset. Deep learning models do not require a hand-crafted feature set to begin with, and hence are more powerful and better suited to hard AI tasks such as speech recognition or visual object classification. Without any hand crafting of the raw input data, a deep neural network can learn a hierarchy of features by itself in the first several layers of the model. Then, in the deepest layer of the model, a set of features is selected and weighted for each output to generate a prediction. By avoiding the inevitable human error in feature selection, deep learning often outperforms traditional approaches on those hard classification problems in terms of accuracy.
Because the model is more complicated and includes feature selection capability, a deep neural network is typically trained with more data than a traditional machine learning method. Due to the scale of the deep neural network (with multiple layers of neurons) and the scale of the input dataset, the training performance of these models, in addition to their accuracy, has become a significant factor in such implementations. Recent work [1, 2] on large-scale machine learning systems proposes to significantly improve performance by relaxing consistency when training neural network models (e.g., weights are not updated in every iteration). An interesting further observation in these papers is that this relaxation, surprisingly, also improves the accuracy of the deep learning model on the test data. However, the effect of noise on deep learning models has never been systematically studied, nor has the underlying reason for the improved accuracy been explained.

One hypothesis for the above observation is that relaxing consistency introduces stochastic noise into the training process [1]. This implicitly mitigates over-fitting and lets the model generalize better when classifying test data. Another hypothesis is that the introduced noise eliminates the memorization effect of a deep neural network, and hence allows the model to capture general patterns in the training data that also apply to the test data.
Our work, taking previous works as examples and guidance, tries to systematically study the effect of introducing different kinds of noise into different components of different types of deep neural networks. We observe that noise of a reasonable amount and magnitude, when introduced into a deep learning model, can improve the accuracy and the convergence rate of the model. We hope that our work can provide insights into future methods of approximate deep learning, and inspire and motivate more work to take advantage of the

2 Background and Related Work

In this section, we first introduce several common neural network models: the Logistic Regression (single neuron) model, the Multi-Layer Perceptron (MLP) model, and the Convolutional Neural Network (LeNet) model. Then we summarize several related works that introduce noise into those models to improve accuracy, and compare our work to them.

2.1 Neural Network Models Explained

The simplest form of a neural network, which is also the primary component of any neural network model, is a single neuron. Figure 1a illustrates a single-neuron neural network, which is also known as the Logistic Regression (LR) model. The neuron shown in the figure consumes a vector of numbers (X) as input and produces a single number as its output, which typically represents the prediction of the model. The neuron stores a vector of weights (W), where each weight represents how positively or negatively the corresponding input affects the output. The output of the neuron and the update to the neuron's weights are computed as follows:

$$\text{Output} = \tanh(W \cdot X)$$

$$W_{\text{new}} = W_{\text{old}} - \text{learningrate} \cdot \nabla_{W_{\text{old}}}\,\text{cost}(W)$$

A Multi-Layer Perceptron (MLP) model is essentially multiple layers of neurons connected by a network. Figure 1b illustrates a simple example of such a model, which is composed of three layers. The first layer is the input layer, which provides the raw inputs to the next layer.
The second layer is the hidden layer, whose input is fully connected to the input layer and whose output is fully connected to the output layer. The hidden layer is known to be capable of extracting features from the input. The third layer is the output layer, which outputs the prediction results for the data. Note that the figure shows only one example of such a model. The model can be made deeper, to extract more implicit features from the raw input, by adding more hidden layers between the input and output layers, each fully connected to its neighboring layers.

(a) Single-layer neural network (LR) model. (b) Multi-Layer Perceptron (MLP) model.
Figure 1: Illustration of two neural network models.

A Convolutional Neural Network (LeNet) model adds multiple convolution layers in front of the MLP model. Figure 2 illustrates an example of this model. A convolution layer processes its input in several steps. First, the input is transformed into a two-dimensional array. Then a sliding window, which contains a small two-dimensional array of weights, is applied to the input. The sliding window is capable of extracting 2-D features from inputs such as images. Finally, the processed input is downsampled with a 2x2 window, which reduces the size of the input by a factor of 4. The figure shows an example that contains two convolution layers and a hidden layer. A more complex model can be built by adding more convolution layers or more hidden layers, which also allows the model to extract more implicit features.
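The downsampling step described above can be illustrated with a short NumPy sketch. The text only says that the input is downsampled with a 2x2 window; taking the maximum of each window (max-pooling) is an assumption made here, and the function name `downsample_2x2` is hypothetical.

```python
import numpy as np

def downsample_2x2(feature_map):
    """Reduce a 2-D array by taking the max over non-overlapping 2x2 blocks.

    Max-pooling is an assumption; the paper only states that a 2x2
    downsample is applied.
    """
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing row/column if the size is odd, so blocks tile exactly.
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Axes 1 and 3 index the position inside each 2x2 block.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
pooled = downsample_2x2(image)
print(pooled.shape)  # (2, 2): 4 values remain out of 16
```

Each output value summarizes four input values, which is where the factor-of-4 size reduction comes from.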
Figure 2: Convolutional Neural Network (LeNet) model.

2.2 Comparison with Related Works

We summarize three recent works that explore mechanisms for introducing noise into a multi-layer neural network (MLP). Dropout proposes to regularize fully connected neural networks by probabilistically dropping (setting to zero) the output of a hidden layer neuron [3]; i.e., with a low probability (1 - p), an output of a hidden layer neuron is set to 0 in the forward propagation process. This can effectively decrease test error rates by preventing over-fitting of the model. Inspired by Dropout [3], DropConnect proposes to probabilistically drop a weight of a hidden layer neuron (as opposed to an output of a hidden layer neuron, as in Dropout) [2]. Maxout is designed to work together with Dropout: each hidden unit outputs the maximum over a group of linear activations [4]. While these works explore ideas similar to ours, we believe our work is more comprehensive, as we systematically and experimentally explore various noise models, various noise locations, and various neural networks.

3 Proposed Method

In this section, we give an overview of all types of noise that we introduce into each model (LR, MLP, and LeNet) in our experiments.

3.1 Adding Noise into Logistic Regression

We first introduce noise into the gradient descent component of Logistic Regression. In a noise-free Logistic Regression model, weights are updated in the following way:

$$W_{\text{new}} = W_{\text{old}} - \text{learningrate} \cdot \nabla_{W_{\text{old}}}\,\text{cost}(W)$$

In a noise-added Logistic Regression model, weights are updated as

$$W_{\text{new}} = W_{\text{old}} - \text{learningrate} \cdot \left(\text{mask} \odot \nabla_{W_{\text{old}}}\,\text{cost}(W)\right)$$

or

$$W_{\text{new}} = W_{\text{old}} - \text{mask}_{\text{Gau}} \odot \nabla_{W_{\text{old}}}\,\text{cost}(W)$$

where learningrate is a scalar, mask is a vector that has the same dimension as W, and $\odot$ denotes element-wise multiplication.
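As a rough illustration, the two noise-added update rules above can be sketched in NumPy as follows. The function names `make_mask` and `noisy_update` are hypothetical, the default learning rate of 0.13 is the value reported for Logistic Regression in this paper, and the spread of the Gaussian mask is exposed as a parameter (`gaussian_std`, an assumed name and value) because its exact value is unclear in this transcription. The element-wise product between mask and gradient is likewise an assumption, based on the mask having the same dimension as W.

```python
import numpy as np

def make_mask(kind, shape, rng, learning_rate=0.13, gaussian_std=0.01):
    """Draw a random mask from one of the distributions used for LR noise."""
    if kind == "binomial":
        return rng.binomial(1, 0.5, size=shape).astype(float)  # Bin(1, 0.5)
    if kind == "gaussian":
        # Centered on the learning rate; the spread is an assumed parameter.
        return rng.normal(learning_rate, gaussian_std, size=shape)
    if kind == "rayleigh":
        return rng.rayleigh(1.0, size=shape)                   # Rayleigh(1)
    if kind == "gamma":
        return rng.gamma(1.0, 1.0, size=shape)                 # Gamma(1, 1)
    raise ValueError(f"unknown mask kind: {kind}")

def noisy_update(w_old, grad, mask, learning_rate=None):
    """Apply one noisy gradient step.

    With a learning rate: W_new = W_old - learning_rate * (mask * grad).
    Without one, the mask itself plays the role of the step size
    (the Gaussian variant): W_new = W_old - mask * grad.
    """
    if learning_rate is None:
        return w_old - mask * grad
    return w_old - learning_rate * (mask * grad)

rng = np.random.default_rng(0)
w = np.zeros(4)
grad = np.ones(4)  # stand-in gradient, for illustration only

w_binomial = noisy_update(w, grad, make_mask("binomial", w.shape, rng),
                          learning_rate=0.13)
w_gaussian = noisy_update(w, grad, make_mask("gaussian", w.shape, rng))
```

With the binomial mask, roughly half of the weights skip the update in a given step; with the Gaussian mask, every weight is updated, each with a slightly different step size.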
We generate mask as a random vector from the Binomial distribution Bin(1, 0.5), the Gaussian distribution N(learningrate, 2 learningrate), the Rayleigh distribution Rayleigh(1), or the Gamma distribution Gamma(1, 1).

3.2 Adding Noise into Multi-layer Logistic Regression

Secondly, we introduce noise into the weights between layers. In our Multi-layer Logistic Regression model, there are three layers: an input layer, a hidden layer, and an output layer. Each layer consists of neurons, and neurons in different layers are connected by weights. During noise-free training of the model, weights between layers are transmitted and updated without any loss of information or added variance. During noise-added training, however, weights between layers are subject to some variation. To be more specific, let $W_{\text{input}}$ be the matrix of weights between the input layer and the hidden layer, and $W_{\text{output}}$ be the matrix of weights between the hidden layer and the output layer. In a noise-added training process, we apply a combination of the following steps:

$$W_{\text{input}} = W_{\text{input}} \odot \text{mask}$$

$$W_{\text{output}} = W_{\text{output}} \odot \text{mask}$$

$$W_{\text{input}} = W_{\text{input}} + \text{mask}$$

where mask is a matrix of the same dimension as $W_{\text{input}}$ or $W_{\text{output}}$ and $\odot$ denotes element-wise multiplication. We generate mask as a random matrix from the Binomial distribution Bin(1, 0.99) or the Gaussian distribution N(0, 0.01).

3.3 Adding Noise into Convolutional Neural Network

Last, we introduce noise into the feature mapping component of the model. The difference between the Convolutional Neural Network (LeNet) and Multi-layer Logistic Regression (MLP) is that LeNet has a feature mapping process before the MLP. Feature mapping is a process in which a small window moves along the image to extract local features. In other words, the window, acting as a function, computes a linear combination of the underlying pixels. In a noise-added feature mapping process, the extracted feature is subject to some variation.

4 Experiments

In this section, we first present the datasets we use in our experiments as well as the parameters for each model. Note that we fine-tuned these parameters to achieve the best possible outcomes before adding our modifications to the code. Next, we present the results and the findings from our experiments, including negative results and the lessons learned in this project.

4.1 Dataset and Implementation Parameters

We experiment on three datasets: a hand-written digit dataset (MNIST) and two tiny-image datasets (CIFAR-10 and CIFAR-100). Specifications of the datasets are summarized in Table 1.
Dataset     Description           Classes   Training Set Size   Testing Set Size
MNIST       hand-written digits   10        60,000              10,000
CIFAR-10    32x32 RGB images      10        50,000              10,000
CIFAR-100   32x32 RGB images      100       50,000              10,000

Table 1: Datasets: MNIST, CIFAR-10, CIFAR-100

We preprocess CIFAR-10 and CIFAR-100 by grey-scaling every image using a formula of the form

$$Y = w_R R + w_G G + w_B B$$

with fixed per-channel weights. In other words, every pixel in the image is now a linear combination of its original RGB values. These two datasets are preprocessed because of technical implementation limitations (which will be fixed after the deadline), not for machine learning theory reasons.

Our neural network models are implemented using the Python Theano library. The starter code is from DeepLearning.net. The parameters of each neural network model are summarized in Table 2.

Model                                   Parameters
Logistic Regression (LR)                learning rate = 0.13
Multi-layer Logistic Regression (MLP)   LR + hidden units = 500
Convolutional Neural Network            MLP + window size = 5x5, downsample = 2x2

Table 2: Parameters of Neural Network Models

We use stochastic logistic regression with learning rate = 0.13. In Multi-layer Logistic Regression, there are 500 neurons in the hidden layer. During feature mapping in the Convolutional Neural Network, windows are of size 5 by 5 and the downsample is of size 2 by 2.
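To make the weight-noise steps of Section 3.2 concrete for the MLP configuration above (500 hidden units, 10 MNIST classes), here is a minimal NumPy sketch. It is not the authors' Theano code. The helper names `drop_weights` and `jitter_weights` are hypothetical, the input size of 784 assumes flattened 28x28 MNIST images, the weight initialization is arbitrary, and 0.01 is passed as a standard deviation because the text does not say whether N(0, 0.01) specifies a variance or a standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 784 inputs (28x28 MNIST), 500 hidden units, 10 classes.
n_in, n_hidden, n_out = 784, 500, 10
w_input = rng.normal(0.0, 0.1, size=(n_in, n_hidden))    # input -> hidden
w_output = rng.normal(0.0, 0.1, size=(n_hidden, n_out))  # hidden -> output

def drop_weights(w, keep_prob, rng):
    """Multiply w element-wise by a Bin(1, keep_prob) mask."""
    mask = rng.binomial(1, keep_prob, size=w.shape)
    return w * mask

def jitter_weights(w, scale, rng):
    """Add zero-mean Gaussian noise to w; scale is treated as a std-dev."""
    return w + rng.normal(0.0, scale, size=w.shape)

# Binomial mask on the input-to-hidden weights.
dropped_input = drop_weights(w_input, 0.99, rng)
# Binomial mask on the hidden-to-output weights.
dropped_output = drop_weights(w_output, 0.99, rng)
# Additive Gaussian noise on the input-to-hidden weights.
jittered_input = jitter_weights(w_input, 0.01, rng)

print(np.mean(dropped_input == 0))  # fraction of zeroed weights, close to 0.01
```

In a training loop, one would presumably redraw the masks at every iteration so that different weights are perturbed each time.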
In our experiments, different models may run for different numbers of iterations. This is because we set a threshold on the accuracy increase when training the model. If the model's accuracy increase is less than the threshold, we stop training. Hence some models run for more iterations, as long as their accuracy increases stay above the threshold.

4.2 Adding Noise into Logistic Regression

Figure 3 shows the test error rate of noise-free and noise-added Logistic Regression on MNIST.

Figure 3: Logistic Regression with Noise on MNIST

In Figure 3, the vertical axis is the test error rate (%) and the horizontal axis is the number of iterations. The experiments all run on MNIST. The noise-free line shows the test error rate of a noise-free Logistic Regression model. The noise(gaussian) line shows the test error rate when $\text{mask}_{\text{Gau}}$ is applied during gradient descent. The noise(binomial) line shows the test error rate when a mask generated from Bin(1, 0.5) is applied during gradient descent. The noise(rayleigh) line shows the test error rate when a mask generated from Rayleigh(1) is applied during gradient descent. The noise(gamma) line shows the test error rate when a mask generated from Gamma(1, 1) is applied during gradient descent.

Finding 1: Noise of a reasonable amount and amplitude improves a deep neural network model's accuracy, while noise that is too significant does not.

As shown in Figure 3, noise-added models achieve better accuracy than the noise-free model. The Noise(Binomial) model has the lowest test error rate (7.156%) among the five experiments. However, it also converges the slowest. This is a phenomenon we have observed throughout the project: although adding noise can improve accuracy, the side effect is that the model takes longer to train, which lowers the convergence rate.

4.3 Adding Noise into Multi-layer Logistic Regression

Figure 4 shows the test error rate of noise-free and noise-added Multi-layer Logistic Regression on MNIST.
In Figure 4, the vertical axis is the test error rate (%) and the horizontal axis is the number of iterations. The experiments all run on MNIST. The noise-free line shows the test error rate of a noise-free Multi-layer Logistic Regression. The dropconnect line shows the test error rate when a mask generated from Bin(1, 0.99) is applied to $W_{\text{input}}$. The dropout line shows the test error rate when a mask generated from Bin(1, 0.99) is applied to $W_{\text{output}}$. The dropconnect&out line shows the test error rate when a mask generated from Bin(1, 0.99) is applied to both $W_{\text{input}}$ and $W_{\text{output}}$. The noise-variation line shows the test error rate when a mask generated from the Gaussian N(0, 0.01) is added to $W_{\text{input}}$.

Figure 4: Multi-layer Logistic Regression with Noise on MNIST

Finding 2: Deep learning models with noise perform no worse than the noise-free model.

As shown in Figure 4, noise-added models perform no worse than the noise-free model. Since the test error rate of the noise-free model is already quite low (2.63%), it is difficult for noise-added models to significantly improve accuracy. We notice that the dropout and dropconnect models perform better than the dropconnect&out and noise-variation models. It is difficult to provide a conclusive explanation for this observation at the moment because we have not finished fine-tuning our noise-added models. It is possible that noise from certain distributions is more likely to prevent overfitting and hence improve accuracy.

Figure 5 shows the test error rate of noise-free and noise-added Multi-layer Logistic Regression on CIFAR-10. In Figure 5, the vertical axis is the test error rate (%) and the horizontal axis is the number of iterations. The experiments all run on CIFAR-10. The noise-free, dropconnect, and dropout lines use the same models as the corresponding experiments in Figure 4.

Figure 5: Multi-layer Logistic Regression with Noise on CIFAR-10

Finding 3: Deep learning models with noise can take more iterations to converge, as the test error fluctuates due to noise.

As shown in Figure 5, noise-added models perform much better than the noise-free model, though they take longer to train. An interesting observation is that, as training iterations increase, the test error rate of noise-added models fluctuates. This is another side effect of adding noise to the model.

Finding 4: Noise added at an earlier stage of a deep learning model can be better integrated and generates less fluctuation.

From the above experiments using MLP, we observe that models with noise added between the input layer and the hidden layer outperform the other noise-added models. An intuitive explanation for this phenomenon is that noise added at an earlier stage of the model can be better integrated, while noise added at a later stage tends to cause more fluctuation.
4.4 Adding Noise into Convolutional Neural Network

Figure 6 shows the test error rate of noise-free and noise-added Convolutional Neural Networks on MNIST. In Figure 6, the vertical axis is the test error rate (%) and the horizontal axis is the number of iterations. The experiments all run on MNIST. The noise-free line shows the test error rate of a noise-free Convolutional Neural Network. The noise@downsampe line shows the test error rate when noise is added during the downsample process. The noise-before-hidden-layer line shows the test error rate when noise is added right before the hidden layer.

Figure 6: Convolutional Neural Network with Noise on MNIST

Finding 5: The convergence rate is faster for deep learning models with noise.

As shown in Figure 6, the three models perform equally well. We observe that, as the number of iterations increases, noise-added models converge slightly faster. This phenomenon is interesting because it is unexpected. A similar phenomenon appears in Figure 7 as well.

Figure 7 shows the test error rate of noise-free and noise-added Convolutional Neural Networks on CIFAR-10. In Figure 7, the vertical axis is the test error rate (%) and the horizontal axis is the number of iterations. The experiments all run on CIFAR-10. The noise-free line shows the test error rate of a noise-free Convolutional Neural Network. The convo-dropconnect line shows the test error rate when the MLP part of the model has noise added between the input layer and the hidden layer. The convo-dropout line shows the test error rate when the MLP part of the model has noise added between the hidden layer and the output layer.

Figure 7: Convolutional Neural Network with Noise on CIFAR-10
Finding 6: Noise improves both accuracy and convergence rate more with complex deep learning models.

As shown in Figure 7, the three models achieve the same lowest test error rate. However, throughout the training process, the noise-added model (convo-dropconnect) converges faster than the noise-free model. The intuition behind this phenomenon is that, since a Convolutional Neural Network is a complicated model, noise is better integrated and absorbed. We conjecture that noise added to complicated deep learning models can improve not only accuracy but also convergence rate.

4.5 Negative Results

Lesson Learned: Complex deep learning models can integrate noise better than simple models.

Figure 8 shows some negative results from our experiments. We run noise-free and noise-added Logistic Regression on CIFAR-100. The noise-added model performs much worse than the noise-free model. Our explanation for this result is that the Logistic Regression model is too simple to integrate noise when trained on CIFAR-100. This agrees with our previous conjecture that complicated models are better at integrating noise.

Figure 8: Negative Results on CIFAR-100

5 Conclusions

In this project, we systematically perform experiments to study the effect of adding noise to deep neural networks. We conduct experiments on adding different kinds of noise to different components of neural network models. The experimental results show that adding noise almost always improves accuracy. Our main observations are: (1) noise added at an early stage of the model can be better integrated, while noise added at a late stage tends to cause fluctuation of accuracy; (2) complicated neural network models can integrate and absorb noise better than simple neural network models; (3) sometimes adding noise can improve not only accuracy but also convergence rate. We hope that this experimental study can provide insights into the future design of deep neural network models and machine learning hardware.
Next-generation machine learning hardware can fully exploit the results that

Beyond this project, we hope to pursue three major research directions: (1) conduct more thorough experiments that quantitatively analyze the effect of noise on deep learning models; (2) provide a theoretical explanation for the effect of noise on deep learning models based on our experimental results and findings; (3) design and explore more efficient computer hardware and systems for deep learning models.

References

[1] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, "Project Adam: Building an efficient and scalable deep learning training system," in OSDI.
[2] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, "Regularization of neural networks using DropConnect," in ICML.
[3] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint.
[4] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. C. Courville, and Y. Bengio, "Maxout networks," in ICML.
More informationNeural Networks: promises of current research
April 2008 www.apstat.com Current research on deep architectures A few labs are currently researching deep neural network training: Geoffrey Hinton s lab at U.Toronto Yann LeCun s lab at NYU Our LISA lab
More informationSeminars in Artifiial Intelligenie and Robotiis
Seminars in Artifiial Intelligenie and Robotiis Computer Vision for Intelligent Robotiis Basiis and hints on CNNs Alberto Pretto What is a neural network? We start from the frst type of artifcal neuron,
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationConvolution Neural Networks for Chinese Handwriting Recognition
Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven
More informationLecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa
Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural
More informationUsing Capsule Networks. for Image and Speech Recognition Problems. Yan Xiong
Using Capsule Networks for Image and Speech Recognition Problems by Yan Xiong A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science Approved November 2018 by the
More informationFuzzy Set Theory in Computer Vision: Example 3, Part II
Fuzzy Set Theory in Computer Vision: Example 3, Part II Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Resource; CS231n: Convolutional Neural Networks for Visual Recognition https://github.com/tuanavu/stanford-
More informationCS229 Final Project: Predicting Expected Response Times
CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time
More informationRotation Invariance Neural Network
Rotation Invariance Neural Network Shiyuan Li Abstract Rotation invariance and translate invariance have great values in image recognition. In this paper, we bring a new architecture in convolutional neural
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationAdvanced Machine Learning
Advanced Machine Learning Convolutional Neural Networks for Handwritten Digit Recognition Andreas Georgopoulos CID: 01281486 Abstract Abstract At this project three different Convolutional Neural Netwroks
More informationA comparison between end-to-end approaches and feature extraction based approaches for Sign Language recognition
A comparison between end-to-end approaches and feature extraction based approaches for Sign Language recognition Marlon Oliveira, Houssem Chatbri, Suzanne Little, Noel E. O Connor, and Alistair Sutherland
More informationThe Mathematics Behind Neural Networks
The Mathematics Behind Neural Networks Pattern Recognition and Machine Learning by Christopher M. Bishop Student: Shivam Agrawal Mentor: Nathaniel Monson Courtesy of xkcd.com The Black Box Training the
More informationDeep Neural Network Acceleration Framework Under Hardware Uncertainty
Deep Neural Network Acceleration Framework Under Hardware Uncertainty Mohsen Imani, Pushen Wang, and Tajana Rosing Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, USA {moimani, puw001,
More informationarxiv: v1 [cs.cv] 29 Oct 2017
A SAAK TRANSFORM APPROACH TO EFFICIENT, SCALABLE AND ROBUST HANDWRITTEN DIGITS RECOGNITION Yueru Chen, Zhuwei Xu, Shanshan Cai, Yujian Lang and C.-C. Jay Kuo Ming Hsieh Department of Electrical Engineering
More informationCost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling
[DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationMotivation Dropout Fast Dropout Maxout References. Dropout. Auston Sterling. January 26, 2016
Dropout Auston Sterling January 26, 2016 Outline Motivation Dropout Fast Dropout Maxout Co-adaptation Each unit in a neural network should ideally compute one complete feature. Since units are trained
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationUsing neural nets to recognize hand-written digits. Srikumar Ramalingam School of Computing University of Utah
Using neural nets to recognize hand-written digits Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the first chapter of the online book by Michael
More informationHandwritten Hindi Numerals Recognition System
CS365 Project Report Handwritten Hindi Numerals Recognition System Submitted by: Akarshan Sarkar Kritika Singh Project Mentor: Prof. Amitabha Mukerjee 1 Abstract In this project, we consider the problem
More informationLecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 10, 2017
INF 5860 Machine learning for image classification Lecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 0, 207 Mandatory exercise Available tonight,
More informationImage Compression: An Artificial Neural Network Approach
Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and
More informationConvolutional Neural Network for Image Classification
Convolutional Neural Network for Image Classification Chen Wang Johns Hopkins University Baltimore, MD 21218, USA cwang107@jhu.edu Yang Xi Johns Hopkins University Baltimore, MD 21218, USA yxi5@jhu.edu
More informationOn the Effectiveness of Neural Networks Classifying the MNIST Dataset
On the Effectiveness of Neural Networks Classifying the MNIST Dataset Carter W. Blum March 2017 1 Abstract Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision.
More information3D model classification using convolutional neural network
3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing
More informationDeep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies
More informationNeural Networks (pp )
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.
More informationReal-time Object Detection CS 229 Course Project
Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection
More informationAssignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions
ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan
More informationAdaptive Dropout Training for SVMs
Department of Computer Science and Technology Adaptive Dropout Training for SVMs Jun Zhu Joint with Ning Chen, Jingwei Zhuo, Jianfei Chen, Bo Zhang Tsinghua University ShanghaiTech Symposium on Data Science,
More informationContextual Dropout. Sam Fok. Abstract. 1. Introduction. 2. Background and Related Work
Contextual Dropout Finding subnets for subtasks Sam Fok samfok@stanford.edu Abstract The feedforward networks widely used in classification are static and have no means for leveraging information about
More informationFacial Expression Classification with Random Filters Feature Extraction
Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle
More informationCharacter Recognition Using Convolutional Neural Networks
Character Recognition Using Convolutional Neural Networks David Bouchain Seminar Statistical Learning Theory University of Ulm, Germany Institute for Neural Information Processing Winter 2006/2007 Abstract
More informationSimple Model Selection Cross Validation Regularization Neural Networks
Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February
More informationLSTM: An Image Classification Model Based on Fashion-MNIST Dataset
LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationData Mining. Neural Networks
Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationVulnerability of machine learning models to adversarial examples
ITAT 216 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 187 194 http://ceur-ws.org/vol-1649, Series ISSN 1613-73, c 216 P. Vidnerová, R. Neruda Vulnerability of machine learning models to adversarial
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationKeras: Handwritten Digit Recognition using MNIST Dataset
Keras: Handwritten Digit Recognition using MNIST Dataset IIT PATNA January 31, 2018 1 / 30 OUTLINE 1 Keras: Introduction 2 Installing Keras 3 Keras: Building, Testing, Improving A Simple Network 2 / 30
More informationDeep Neural Networks for Recognizing Online Handwritten Mathematical Symbols
Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols Hai Dai Nguyen 1, Anh Duc Le 2 and Masaki Nakagawa 3 Tokyo University of Agriculture and Technology 2-24-16 Nakacho, Koganei-shi,
More informationStacked Denoising Autoencoders for Face Pose Normalization
Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University
More informationCS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning
CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with
More informationArtificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )
Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationDisguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601
Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,
More informationDNN-BASED AUDIO SCENE CLASSIFICATION FOR DCASE 2017: DUAL INPUT FEATURES, BALANCING COST, AND STOCHASTIC DATA DUPLICATION
DNN-BASED AUDIO SCENE CLASSIFICATION FOR DCASE 2017: DUAL INPUT FEATURES, BALANCING COST, AND STOCHASTIC DATA DUPLICATION Jee-Weon Jung, Hee-Soo Heo, IL-Ho Yang, Sung-Hyun Yoon, Hye-Jin Shim, and Ha-Jin
More informationBranchyNet: Fast Inference via Early Exiting from Deep Neural Networks
BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks Surat Teerapittayanon Harvard University Email: steerapi@seas.harvard.edu Bradley McDanel Harvard University Email: mcdanel@fas.harvard.edu
More informationMotivation. Problem: With our linear methods, we can train the weights but not the basis functions: Activator Trainable weight. Fixed basis function
Neural Networks Motivation Problem: With our linear methods, we can train the weights but not the basis functions: Activator Trainable weight Fixed basis function Flashback: Linear regression Flashback:
More informationarxiv: v2 [cs.cv] 30 Oct 2018
Adversarial Noise Layer: Regularize Neural Network By Adding Noise Zhonghui You, Jinmian Ye, Kunming Li, Zenglin Xu, Ping Wang School of Electronics Engineering and Computer Science, Peking University
More informationWeek 3: Perceptron and Multi-layer Perceptron
Week 3: Perceptron and Multi-layer Perceptron Phong Le, Willem Zuidema November 12, 2013 Last week we studied two famous biological neuron models, Fitzhugh-Nagumo model and Izhikevich model. This week,
More informationClassification of objects from Video Data (Group 30)
Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time
More informationLecture 19: Generative Adversarial Networks
Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,
More informationFinding Tiny Faces Supplementary Materials
Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution
More informationINTRODUCTION TO DEEP LEARNING
INTRODUCTION TO DEEP LEARNING CONTENTS Introduction to deep learning Contents 1. Examples 2. Machine learning 3. Neural networks 4. Deep learning 5. Convolutional neural networks 6. Conclusion 7. Additional
More information