DropConnect Regularization Method with Sparsity Constraint for Neural Networks


Chinese Journal of Electronics, Vol.25, No.1, Jan. 2016

DropConnect Regularization Method with Sparsity Constraint for Neural Networks

LIAN Zifeng 1, JING Xiaojun 1, WANG Xiaohan 2, HUANG Hai 1, TAN Youheng 1 and CUI Yuanhao 1
(1. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China)
(2. School of Software and Microelectronics, Peking University, Beijing, China)

Abstract: DropConnect is a recently introduced algorithm for preventing the co-adaptation of feature detectors. Compared to Dropout, DropConnect achieves state-of-the-art results on several image recognition benchmarks. Motivated by its success, we extend the algorithm with the ability of sparse feature selection. In the original DropConnect algorithm, the dropping masks of the weights are generated from Bernoulli gating variables that are independent of the weights and activations. We introduce a new strategy that generates the masks depending on the outputs of the previous layer. With this method, neurons that are likely to produce sparser features are assigned a higher probability of remaining active during forward and backward propagation. We evaluate this sparsity-constrained DropConnect on the MNIST and CIFAR datasets in comparison with ordinary DropConnect and Dropout. The results show that the new method improves the sparsity of the learned features significantly without degrading precision.

Key words: DropConnect, Sparse regularization, Deep learning, Neural networks.

I. Introduction

Feedforward artificial neural networks are well suited to large labeled datasets, since their capacity can be scaled up easily by adding more layers or more units per layer. However, to retain the ability to extract high-dimensional features from large datasets, the corresponding networks must themselves be large. Large-scale neural networks often contain millions or billions of parameters, and the relationship between the input and the correct output is complicated as well. Given a limited amount of labeled training data and a random initialization, a network can learn many different weight settings that model the training set almost perfectly through forward and backward propagation. When evaluated on the test and validation sets, however, almost all of these settings perform worse than on the training set, because the feature detectors have been tuned too finely to the training data. This phenomenon is known as overfitting.

In order to deal with overfitting, a wide range of techniques for regularizing neural networks have been developed. Recently, two state-of-the-art regularization methods were proposed, Dropout [1] and DropConnect [2], the latter being a generalization of the former. When training a neural network with DropConnect, each weight between two layers is kept with probability (1 − p) and set to zero with probability p. Extensive experiments show that DropConnect improves a network's generalization ability and gives improved test performance. In the DropConnect of Li Wan et al. [2], the dropping masks of the weights are generated from Bernoulli gating variables, so every weight shares the same probability of being dropped.
In this paper, we impose a sparsity constraint on DropConnect when generating the dropping masks, changing the dropping probability from a constant into a function of the activations of the neurons in the previous layer. The experiments below show that this modified version of DropConnect, hereinafter referred to as Sparse DropConnect, improves the sparsity of the learned features significantly while keeping generalization performance comparable to DropConnect. The rest of this paper is organized as follows.

Manuscript Received May 11, 2015; Accepted June 23. This work is supported by the National Natural Science Foundation of China and the National High Technology Research and Development Program of China (No.2011AA01A204). © 2016 Chinese Institute of Electronics.

In Section II we review prior work, including a brief introduction to the Dropout and DropConnect strategies. Section III describes the detailed methodology of the Sparse DropConnect method. Section IV presents our experiments with different models on the MNIST and CIFAR-10 datasets, together with discussion and analysis of the results. Finally, Section V concludes the paper and discusses future work.

II. Related Works

Deep learning algorithms are special cases of representation learning that have achieved important empirical successes in traditional AI applications such as computer vision and natural language processing [3,4]. Deep learning algorithms can learn multiple levels of representation, so more abstract features can be discovered automatically, and abstract, complex representations are believed to be more useful than shallow ones. But as neural networks grow deeper and larger, overfitting becomes a major challenge when only a small amount of labeled data is available for training.

To keep neural networks from overfitting and from co-adaptation of feature detectors, several simple approaches have shown favorable effects, such as imposing an L2 penalty on the network weights, weight elimination [5], Bayesian methods [6], and early stopping of training. Denoising autoencoders (DAEs) by Vincent et al. [7,8] add noise to the input units of an autoencoder as a form of regularization, and the network is trained to reconstruct the noise-free input; DAEs learn markedly more robust features than ordinary autoencoders.

Recently, a new regularization method called Dropout was proposed by Hinton et al. in 2012 [1]. Li Wan et al. then took the idea a step further and introduced a generalization of Dropout, called DropConnect [2], for regularizing large fully connected layers within neural networks. When training a neural network with Dropout, each element of a layer's output is kept unchanged with probability (1 − p) and set to 0 with probability p. DropConnect generalizes Dropout in that each connection, rather than each output unit, can be dropped with probability p. DropConnect is similar to Dropout in that it introduces dynamic noise into the model, but differs in that the noise is applied to the weights W rather than to the output vectors of a layer. Both can be seen as stochastic regularization techniques. Like Dropout, DropConnect is best suited to fully connected layers, but it can also be used in other neural network models such as Convolutional neural networks (CNNs) [9] and Deep belief networks (DBNs) [10].

Based on these previous works, we improve the original DropConnect strategy in order to obtain better feature sparsity while keeping the regularization ability of DropConnect. The detailed methodology and the sparseness measure are described below.
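To make the Dropout masking described above concrete, here is a minimal NumPy sketch of a Dropout layer at training time; the function name, array shapes, and random generator are illustrative choices of ours, not code from any of the cited implementations. DropConnect applies the same kind of Bernoulli mask to the weight matrix instead of to the output vector, as formulated in Section III.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y, p, rng):
    """Dropout (training time): keep each output element with probability 1 - p.

    y : activations of a layer, shape (n_units,)
    p : dropping probability
    """
    mask = rng.random(y.shape) >= p        # 1 with probability (1 - p)
    return y * mask                        # dropped elements become 0

# toy usage: a 5-unit layer output with p = 0.5
y = np.array([0.2, 0.9, 0.1, 0.7, 0.4])
print(dropout_forward(y, p=0.5, rng=rng))
```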
III. Methodology of Sparse DropConnect

In this section we first briefly review the DropConnect neural network model. Then the detailed methodology of Sparse DropConnect is elaborated, especially the calculation of the DropConnect probability of the weights. At the end of the section, a measure of feature sparseness is formulated, which will be used in the experiments to examine the performance of our method.

1. DropConnect formulation

Consider a feed-forward neural network with L fully connected hidden layers, in which one or more of the fully connected layers adopt DropConnect. Let l ∈ {1, ..., L} index a DropConnect layer, and let p denote the probability of dropping a connection, so each weight is retained with probability (1 − p). Let y_i^{(l)} denote the activation of unit i in layer l, with y^{(0)} = x being the input, and let z_i^{(l)} be the total weighted sum of the inputs to unit i in layer l. W^{(l)} and b^{(l)} are the weights and biases of layer l, the symbol * denotes element-wise multiplication, and f(·) denotes the activation function of layer l. The feed-forward operation of a DropConnect layer can then be described as follows:

$\mathrm{mask}^{(l)}_{ij} \sim \mathrm{Bernoulli}(1 - p)$   (1)
$\tilde{W}^{(l)} = W^{(l)} * \mathrm{mask}^{(l)}$   (2)
$z^{(l)}_i = \sum_{j=1}^{n} \tilde{W}^{(l)}_{ij} \, y^{(l-1)}_j + b^{(l)}_i$   (3)
$y^{(l)} = f(z^{(l)})$   (4)
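As a concrete reading of Eqs.(1)-(4), the following NumPy sketch runs one fully connected DropConnect layer at training time. The function name, the sigmoid choice for f, and the returned mask are illustrative assumptions on our part; the paper's actual implementation is built on cuda-convnet and is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dropconnect_forward(W, b, y_prev, p, rng, f=sigmoid):
    """One DropConnect layer, training time (Eqs. (1)-(4)).

    W      : weight matrix, shape (n_out, n_in)
    b      : bias vector, shape (n_out,)
    y_prev : activations of the previous layer, shape (n_in,)
    p      : probability of dropping a connection
    """
    mask = (rng.random(W.shape) < (1.0 - p)).astype(W.dtype)  # Eq. (1): keep with prob 1 - p
    W_masked = W * mask                                       # Eq. (2)
    z = W_masked @ y_prev + b                                 # Eq. (3)
    return f(z), mask                                         # Eq. (4); mask reused in backprop

# toy usage
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4)) * 0.1
b = np.zeros(3)
y_prev = rng.random(4)
y, mask = dropconnect_forward(W, b, y_prev, p=0.5, rng=rng)
```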

2. Sparse DropConnect

Sparse representations are a favorable compromise between dense input signals and local codes. In neural networks, sparse coding usually means a small set of active hidden neurons, or a small average activity ratio in the hidden layers. Given a potentially large set of input data, sparse representation attempts to find the smallest number of representative patterns automatically, such that these patterns, combined in appropriate proportions and weights, reproduce the original input patterns with minimum deviation. The sparse representation of the input then consists of those representative patterns.

In the DropConnect strategy, all the weights are dropped according to the same probability, without considering the sparsity of the neurons. What would happen if different units were assigned different propagation probabilities according to the sparseness of their activations? In other words, the masks of the weights become a function of the outputs produced by the neurons in the previous layer. Based on this intuition, the feed-forward operations of our Sparse DropConnect network can be formulated as follows:

$\mathrm{mask}^{(l)}_{ij} \sim \mathrm{Uniform}(0, 1)$   (5)
$\rho^{(l)}_{ij} = p_{\mathrm{drop}}(\tilde{y}^{(l-1)}_j)$   (6)
$\tilde{W}^{(l)} = W^{(l)} * \mathbb{1}(\mathrm{mask}^{(l)} > \rho^{(l)})$   (7)
$z^{(l)}_i = \sum_{j=1}^{n} \tilde{W}^{(l)}_{ij} \, y^{(l-1)}_j + b^{(l)}_i$   (8)
$y^{(l)} = f(z^{(l)})$   (9)

where W^{(l)}_{ij} denotes the weight associated with the connection between the j-th unit in layer l − 1 and the i-th unit in layer l, and the symbol * in Eq.(7) denotes element-wise multiplication. ρ^{(l)}_{ij} is the DropConnect probability associated with the weight W^{(l)}_{ij}, and p_drop(·) is the function that calculates this dropping probability; it is elaborated in Section III.3. The function takes a normalized activation of the previous layer, ỹ^{(l-1)}_j, as input and produces a DropConnect probability ρ^{(l)}_{ij} ranging from 0 to 1.0. The randomly generated masks are then modified according to ρ^{(l)}: if the initial random mask value associated with W^{(l)}_{ij} (denoted mask^{(l)}_{ij}) is bigger than ρ^{(l)}_{ij}, the mask is set to one and the corresponding weight remains active during forward and backward propagation; otherwise the mask is set to zero and the weight is omitted during propagation.

Since the dropping probabilities of the weights are functions of the previous layer's outputs, Sparse DropConnect adds the ability of selective feed-forward propagation to the normal DropConnect strategy. In addition, since the sparsity property is partially embodied in the distribution of the neural outputs, such selective propagation adds a new sparse regularization to DropConnect models.

To update the weight matrix W during the backward propagation phase, the DropConnect masks are applied to the gradient vectors so that only the weights and biases that were active in the forward pass are updated. During the testing phase, we use the same approximation method adopted by Hinton et al. in 2012 [1]:

$\mathbb{E}_M\big[f((M * W)x)\big] \approx f\big(\mathbb{E}_M[(M * W)x]\big)$   (10)

This method averages the outputs before the activation rather than after. Although it has not been justified mathematically, it works well in practice. The 1-dimensional Gaussian approximation used in DropConnect is not applicable here, since after the sparsity-constrained selection of Eq.(7) the masks are no longer Bernoulli variables, so the weighted sum cannot be approximated by a Gaussian variable.
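The following sketch numerically compares the two sides of Eq.(10) for ordinary Bernoulli masks: the exact average of the post-activation outputs over sampled masks, and the activation of the averaged pre-activation used as the test-time approximation. The layer sizes, keep probability, and sigmoid activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = rng.standard_normal((3, 5)) * 0.5
x = rng.random(5)
p_keep = 0.5
n_samples = 10000

masks = rng.random((n_samples, *W.shape)) < p_keep   # sampled masks M
pre = np.einsum('sij,j->si', masks * W, x)           # (M * W) x for every sampled mask

avg_after = sigmoid(pre).mean(axis=0)                # E_M[ f((M*W)x) ]   (exact target)
avg_before = sigmoid(pre.mean(axis=0))               # f( E_M[(M*W)x] )   (Eq. (10) approximation)

print(avg_after, avg_before)                         # the two vectors are close in practice
```

In expectation, averaging before the activation reduces to scaling the weights by the mean of the mask, which is why this approximation is cheap to apply at test time.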
3. DropConnect probability of weights

Under the hypothesis of sparse representation, the average activity ratio must be as low as possible, typically a small value close to zero; that is, most activations of the hidden units should be near zero. The typical way to achieve this is to add an extra penalty term to the objective function, which penalizes activations y_i^{(l)} that deviate significantly from the anticipated average activity ratio for a given input, where y_i^{(l)} denotes the activation of the i-th hidden unit in layer l. Under the sparsity hypothesis, most of the outputs should take the value zero; for the non-zero outputs, the preferred values are either close to zero or considerably larger than the average activation value.

According to Eq.(6), the probabilities of dropping the connections between layer l − 1 and layer l depend on the outputs of layer l − 1. Before calculating these probabilities, the output vector of the previous layer, denoted y^{(l-1)}, is normalized to the range [0, 1]; we use ỹ^{(l-1)} to denote the normalized output vector. For each element ỹ^{(l-1)}_j of the normalized outputs, the DropConnect probability associated with W^{(l)}_{ij} is

$\rho^{(l)}_{ij} = p_{\mathrm{drop}}(\tilde{y}^{(l-1)}_j) = 4\,\tilde{y}^{(l-1)}_j\,(1 - \tilde{y}^{(l-1)}_j)$   (11)

Fig.1 shows the DropConnect probability as a function of the activation. Activations are normalized to [0, 1] before being passed to this probability function, and the corresponding DropConnect probabilities range from zero to 1.0. When ỹ^{(l-1)}_j equals 0.5, ρ^{(l)}_{ij} takes its maximum of 1.0; as ỹ^{(l-1)}_j moves from the middle towards either end of [0, 1], the DropConnect probability decreases smoothly from 1.0 towards zero.

Fig. 1. Dropping probability curve for normalized activations
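Putting Eqs.(5)-(9) and Eq.(11) together, the sketch below generates the sparsity-constrained mask and runs the forward pass for one layer at training time. The min-max normalization used to obtain ỹ, the epsilon guard, and the function names are our own assumptions; the paper only states that the previous layer's outputs are normalized to [0, 1].

```python
import numpy as np

def p_drop(y_norm):
    """Eq. (11): drop probability peaks at 1.0 for normalized activations near 0.5."""
    return 4.0 * y_norm * (1.0 - y_norm)

def sparse_dropconnect_mask(W, y_prev, rng):
    """Build the sparsity-constrained DropConnect mask (Eqs. (5)-(7)).

    W      : weights, shape (n_out, n_in)
    y_prev : previous-layer activations, shape (n_in,)
    """
    # normalize previous-layer activations to [0, 1] (assumed min-max scaling)
    y_min, y_max = y_prev.min(), y_prev.max()
    y_norm = (y_prev - y_min) / (y_max - y_min + 1e-12)

    rho = p_drop(y_norm)                              # Eq. (6): one value per input unit j
    u = rng.random(W.shape)                           # Eq. (5): mask ~ Uniform(0, 1)
    return (u > rho[np.newaxis, :]).astype(W.dtype)   # Eq. (7): keep where u > rho

def sparse_dropconnect_forward(W, b, y_prev, rng, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    mask = sparse_dropconnect_mask(W, y_prev, rng)
    z = (W * mask) @ y_prev + b                       # Eqs. (7)-(8)
    return f(z), mask                                 # Eq. (9); mask reused in backprop

# toy usage
rng = np.random.default_rng(3)
W = rng.standard_normal((3, 4)) * 0.1
y, mask = sparse_dropconnect_forward(W, np.zeros(3), rng.random(4), rng)
```

Note that ρ depends only on the source unit j, so all connections leaving a unit whose normalized activation is near 0.5 are dropped with high probability, while connections from units near either end of [0, 1] are mostly kept.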

4. Measure of feature sparseness

This subsection defines the sparseness measure of the feature vectors output by the neural networks, which will be used later to quantify the performance of the Sparse DropConnect strategy. The concept of sparse coding, or sparse representation, refers to a representational scheme in which only a few units of a large neural network are effectively used to represent typical data vectors [11]. This constraint implies that most neurons in the network take values of zero or close to zero, while in some special but reasonable cases a small set of neurons may output values far from zero, concentrated in the neighborhood of a relatively large value.

Two commonly used sparseness measures, as well as sparsity constraints, are the L1 norm and the L2 norm. When the L1 norm is used as a regularizer in feature extraction tasks, sparsity is obtained by learning each variable (weight or bias) individually under the L1-constrained objective; regularization by linear combinations of L2 norms is also known to induce sparsity in W [12]. Both methods are widely used in neural networks as sparsity constraints and show remarkable regularization power, and both norms are commonly used as measures of sparseness. In this paper, we use a sparseness measure based on the relationship between the L1 norm and the L2 norm, introduced by Patrik O. Hoyer [13]:

$\mathrm{sparseness}(x) = \dfrac{\sqrt{n} - \left(\sum_i |x_i|\right) \big/ \sqrt{\sum_i x_i^2}}{\sqrt{n} - 1}$   (12)

where x denotes the input feature vector and n is the number of elements of x. A multidimensional feature matrix is flattened to one dimension, and normalization is applied if the features range beyond [0, 1]. This function evaluates to 1.0 if and only if x contains a single non-zero component, and to zero if and only if all components are equal; in all other cases it interpolates smoothly between 0 and 1.0.

Fig. 2. Illustration of various degrees of sparseness

Fig.2 illustrates the sparseness measure used in this paper. Four vectors with sparseness levels of about 0.1, 0.4, 0.7, and 0.9 are shown; each bar denotes one element of a vector. The vector with the lowest sparseness (leftmost) has the most non-zero values, and most of its elements do not deviate far from the mean. At the highest level (rightmost), most elements are zero and only a few take significant values.
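Eq.(12) translates directly into a few lines of NumPy; the flattening step and the guard for an all-zero input are our own additions.

```python
import numpy as np

def hoyer_sparseness(x):
    """Eq. (12): 0.0 for a constant vector, 1.0 for a vector with a single non-zero entry."""
    x = np.asarray(x, dtype=float).ravel()        # flatten multidimensional features
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    if l2 == 0.0:                                 # all-zero input: guard against division by zero
        return 0.0
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

print(hoyer_sparseness([0, 0, 0, 1]))             # -> 1.0 (single non-zero component)
print(hoyer_sparseness([0.5, 0.5, 0.5, 0.5]))     # -> 0.0 (all components equal)
```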
IV. Experiments and Results Analysis

In order to make clear comparisons, we evaluate the Sparse DropConnect method using DNN and CNN models similar to those used in the DropConnect experiments of Li Wan et al. [2] for image classification. Our main expectation for Sparse DropConnect is an improvement in feature sparsity; the utmost classification precision is not the primary concern. Therefore, the models do not contain as many layers and neurons as the state-of-the-art records. Such settings may not produce excellent test errors, but when the same models are compared, the improvements with respect to sparsity should be demonstrated clearly. We run experiments on the MNIST [9] and CIFAR-10 [14] datasets. All datasets are pre-processed by mean normalization, but no whitening or data augmentation schemes are used.

Sparse DropConnect, DropConnect, Dropout, and a non-regularized baseline are applied respectively in the last fully connected layers. All experiments use mini-batch Stochastic gradient descent (SGD) with a mini-batch size of 100 and the momentum parameter fixed at 0.9. Our implementation is based on the deep learning library cuda-convnet [15], a fast C++/CUDA implementation of feed-forward neural networks written by Alex Krizhevsky [14,15].
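For reference, the update rule implied by the configuration above (mini-batch SGD with momentum 0.9) is sketched below. This is a generic sketch rather than cuda-convnet code; `grad_fn`, `iterate_minibatches`, and `current_lr` are hypothetical placeholders.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update for a parameter array w and its velocity v."""
    v = momentum * v - lr * grad      # accumulate velocity
    w = w + v                         # apply update
    return w, v

# sketch of the training loop over mini-batches of size 100
# for epoch in range(num_epochs):
#     for x_batch, t_batch in iterate_minibatches(train_set, batch_size=100):
#         grad = grad_fn(w, x_batch, t_batch)   # backprop through the (Sparse) DropConnect net
#         w, v = sgd_momentum_step(w, v, grad, lr=current_lr, momentum=0.9)
```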

For each dataset, two kinds of results are reported after training and testing. The first concerns discriminative ability, characterized by the test error and the cross entropy. The second concerns the sparsity of the features learned by the models, measured by the sparseness measure of Section III.4.

ReLU and sigmoid activation functions are both used in these experiments: ReLU units in the experiments aimed at classification accuracy, and sigmoid units in the experiments intended for sparseness measurement. There are two reasons for this choice. On the one hand, ReLU units do not suffer from the vanishing-gradient problem of sigmoid and tanh units, so a higher classification accuracy can be obtained; at the same time, by clamping negative values to zero, ReLU outputs are already sparser than those of other units, an effect that would entangle with the sparsity constraints and distort the sparseness measurements. On the other hand, the output range of a ReLU unit is [0, +∞), which makes it less suitable for sparseness analysis than a sigmoid unit, whose output range is fixed to [0, 1]. Although ReLU outputs can be normalized to [0, 1] by dividing by the maximum value within each mini-batch, this maximum differs from mini-batch to mini-batch, so the normalization distorts the probability distribution of the activations when all mini-batches are considered together. Therefore, ReLU and sigmoid units are used to obtain classification accuracy and feature sparseness, respectively.

1. MNIST

The MNIST handwritten digit dataset [9] consists of 28 × 28 black-and-white images. It contains 70,000 handwritten digit samples belonging to 10 classes, the digits 0 to 9. Each digit in the 60,000 training images and 10,000 test images is size-normalized to fit in a 20 × 20 pixel box while preserving its aspect ratio.

To compare with the DropConnect of Li Wan et al. [2], we train models with two fully connected hidden layers of 400 neurons each, using ReLU or sigmoid activation functions. The first layer takes the 28 × 28 raw image pixels as input; layers 2 and 3 are the 400-unit hidden layers responsible for learning image features; the last hidden layer's output is fed into a 10-class Softmax layer, which yields the classification result from 0 to 9. We use stochastic gradient descent with a mini-batch size of 100 and a cross-entropy objective. We train the model in five stages of epochs, annealing the learning rate from its initial value of 0.1 at each stage.

In Fig.3 we show the performance and convergence of our Sparse DropConnect method in the fully connected DNN models described above, compared with the No-Drop, Dropout, and DropConnect methods on MNIST; Sparse DropConnect, DropConnect, Dropout, and No-Drop are denoted sdc, dc, do, and na in the figure, respectively. Fig.3(a) plots the error rates of each method on the training and test sets, and Fig.3(b) shows the convergence properties of the four methods. From the two figures we can see that the No-Drop model overfits quickly, while Sparse DropConnect, DropConnect, and Dropout converge more slowly but reach better test performance in the end. Sparse DropConnect converges more slowly than Dropout but slightly faster than DropConnect, and reaches the lowest test error in the end.

The final error rates of each method are summarized in the second column of Table 1, and the third column gives the sparseness values according to the measure of Section III.4. Note that, due to the limited capacity of this DNN architecture, the test errors of all the models fall short of the state-of-the-art results, but Sparse DropConnect still performs best among them. What really matters here is the sparsity property, in which Sparse DropConnect obtains an outstanding result.

Table 1. Accuracy and sparseness on MNIST dataset
Model              | Test error (%) | Sparseness
Sparse DropConnect |                |
DropConnect        |                |
Dropout            |                |
No-Drop            |                |

In order to demonstrate the sparsity of the features produced by networks trained with Sparse DropConnect, DropConnect, Dropout, and No-Drop respectively, we train networks with the four methods as described above, feed the test set into the fully trained networks, and analyze the distributions and sparsity of the outputs. The test set of MNIST contains 10,000 samples.
Fig. 3. Comparison of performance and convergence properties. (a) Error rates on the training and test sets; (b) Logistic regression costs on the training and test sets

The last fully connected layer above the output layer, i.e. layer 3, is a hidden layer containing 400 sigmoid neurons, so a total of 4,000,000 values ranging from zero to one are output by layer 3 for the test set. To demonstrate the sparsity of the networks trained with the different methods, we plot histograms of these 4,000,000 layer-3 outputs in Fig.4; each subfigure corresponds to one of Sparse DropConnect, DropConnect, Dropout, and No-Drop. In Fig.4 we can see that the output values of the hidden units trained with Sparse DropConnect are overall smaller and more concentrated than those trained with the other methods, reflecting sparser outputs; most Sparse DropConnect outputs lie between 0 and 0.1, which is the biggest difference from the other distributions. Using the sparseness measure of Section III.4, the network trained with Sparse DropConnect achieves the best sparseness value of 0.54; the sparseness values of the other networks are 0.24 for DropConnect, 0.05 for Dropout, and 0.14 for No-Drop, respectively.

Fig. 4. Comparison of feature distributions and sparseness. (a) Sparse DropConnect (0.54); (b) DropConnect (0.24); (c) Dropout (0.05); (d) No-Drop (0.14)

2. CIFAR-10

The CIFAR-10 [13,14] dataset is a subset of the Tiny Images dataset [16]. It contains 60,000 32 × 32 color images in 10 classes, with 6,000 images per class; each class contains 5,000 training images and 1,000 test images. The experiments on CIFAR-10 are based on the simple convolutional network described by Alex Krizhevsky [15], named layers-80sec.cfg in the author's code repository. This model contains three convolutional layers with 32, 32, and 64 feature maps respectively, each followed by a pooling layer. The convolutional kernel size is 5 and the stride is 1; the three max-pooling layers summarize a 3 × 3 neighborhood and use a stride of 2. Between the last convolutional layer and the output layer there is a fully connected layer with 64 neurons (a structured summary of this layer stack is given at the end of this subsection). Such a model is designed for rapid training rather than optimal classification performance. In the last fully connected layer, Sparse DropConnect, DropConnect, Dropout, or No-Drop is adopted respectively to evaluate classification accuracy and feature sparsity.

Since these experiments are intended to demonstrate the effects of the different regularization methods rather than to obtain optimal classification results, the model does not need to be very complicated, nor does the number of training epochs need to be large. We therefore train the models for 150 epochs with an initial learning rate of 0.1 and the default weight decay in all experiments on CIFAR-10.

Table 2 summarizes the classification accuracy and feature sparseness of the models trained with Sparse DropConnect, DropConnect, Dropout, and No-Drop. As shown in Table 2, the Sparse DropConnect method achieves the best feature sparsity while keeping classification accuracy comparable to the Dropout and DropConnect strategies.

Table 2. Accuracy and sparseness on CIFAR-10 dataset
Model              | Test error (%) | Sparseness
Sparse DropConnect |                |
DropConnect        |                |
Dropout            |                |
No-Drop            |                |
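For readability, the layer stack described above can be summarized as the following structured sketch; the ordering (each convolution followed by its pooling layer) is taken from the description, while padding and per-layer output shapes are not specified in the text and are therefore omitted. The dictionary format is our own illustrative choice, not the cuda-convnet configuration syntax.

```python
# CIFAR-10 model used in these experiments, as described in the text above.
cifar10_model = [
    {"type": "conv",    "maps": 32, "kernel": 5, "stride": 1},
    {"type": "maxpool", "window": 3, "stride": 2},
    {"type": "conv",    "maps": 32, "kernel": 5, "stride": 1},
    {"type": "maxpool", "window": 3, "stride": 2},
    {"type": "conv",    "maps": 64, "kernel": 5, "stride": 1},
    {"type": "maxpool", "window": 3, "stride": 2},
    # the drop method here is one of: sparse_dropconnect / dropconnect / dropout / none
    {"type": "fc",      "units": 64, "drop_method": "sparse_dropconnect"},
    {"type": "softmax", "classes": 10},
]

for layer in cifar10_model:
    print(layer)
```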
V. Conclusion and Further Work

We have presented the Sparse DropConnect regularization method, a modified version of DropConnect constrained by a sparse feature selection strategy. By adaptively regulating the probability with which each neuron's connections are kept or dropped, Sparse DropConnect adds a sparsity property to conventional DropConnect models while maintaining their key advantage of preventing overfitting. A series of experiments was conducted on the MNIST and CIFAR-10 datasets using DNN and CNN models. Trained with Sparse DropConnect, the outputs of the fully connected layers in both the DNNs and the CNNs exhibit considerably sparser distributions than their counterparts trained without it, and the classification accuracy is also improved slightly. These experiments demonstrate the preferable performance of the Sparse DropConnect method with respect to both feature sparsity and discriminative ability.

Although Sparse DropConnect performs better than the other methods, the error bound of this method has not been calculated theoretically.

Meanwhile, our implementation of Sparse DropConnect is slightly slower than Dropout and DropConnect. In very deep models used for large datasets, the speed of the feature extractor is an important parameter, since it may cause a significant difference in overall training time.

References

[1] G.E. Hinton, N. Srivastava, A. Krizhevsky, et al., "Improving neural networks by preventing co-adaptation of feature detectors", arXiv preprint.
[2] L. Wan, et al., "Regularization of neural networks using DropConnect", Proceedings of the 30th International Conference on Machine Learning (ICML-13).
[3] Y. Bengio, "Learning deep architectures for AI", Foundations and Trends in Machine Learning, Vol.2, No.1, pp.1-127, 2009.
[4] Y. Bengio, et al., "Representation learning: A review and new perspectives", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.35, No.8.
[5] A.S. Weigend, D.E. Rumelhart and B.A. Huberman, "Generalization by weight-elimination with application to forecasting", Neural Information Processing Systems (NIPS).
[6] D.J.C. MacKay, "Probable networks and plausible predictions: A review of practical Bayesian methods for supervised neural networks", Network: Computation in Neural Systems, Vol.6, No.3.
[7] P. Vincent, et al., "Extracting and composing robust features with denoising autoencoders", Proceedings of the 25th International Conference on Machine Learning, ACM.
[8] P. Vincent, et al., "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion", Proceedings of the 27th International Conference on Machine Learning, ACM.
[9] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, Vol.86, No.11.
[10] G.E. Hinton, S. Osindero and Y.-W. Teh, "A fast learning algorithm for deep belief nets", Neural Computation, Vol.18, No.7.
[11] D.J. Field, "What is the goal of sensory coding?", Neural Computation, Vol.6.
[12] P. Zhao, G. Rocha and B. Yu, "The composite absolute penalties family for grouped and hierarchical variable selection", Annals of Statistics, Vol.37, No.6A.
[13] P.O. Hoyer, "Non-negative matrix factorization with sparseness constraints", The Journal of Machine Learning Research, Vol.5.
[14] A. Krizhevsky, "Learning multiple layers of features from tiny images", Master's Thesis, University of Toronto.
[15] A. Krizhevsky, "cuda-convnet", available at om/p/cuda-convnet/.
[16] A. Torralba, R. Fergus and W.T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.30, No.11.

LIAN Zifeng is pursuing the Ph.D. degree in the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests include pattern recognition, machine learning and deep learning. (Email: lianzf@bupt.edu.cn)

JING Xiaojun received the M.S. and Ph.D. degrees in 1995 and 1999 respectively, both in communications and information systems. From 2000 to 2002 he was a postdoctoral researcher at Beijing University of Posts and Telecommunications, where he is now a professor. (Email: jxiaojun@bupt.edu.cn)


More information

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated

More information

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Tianyu Wang Australia National University, Colledge of Engineering and Computer Science u@anu.edu.au Abstract. Some tasks,

More information

Real-time convolutional networks for sonar image classification in low-power embedded systems

Real-time convolutional networks for sonar image classification in low-power embedded systems Real-time convolutional networks for sonar image classification in low-power embedded systems Matias Valdenegro-Toro Ocean Systems Laboratory - School of Engineering & Physical Sciences Heriot-Watt University,

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

An Efficient Learning Scheme for Extreme Learning Machine and Its Application

An Efficient Learning Scheme for Extreme Learning Machine and Its Application An Efficient Learning Scheme for Extreme Learning Machine and Its Application Kheon-Hee Lee, Miso Jang, Keun Park, Dong-Chul Park, Yong-Mu Jeong and Soo-Young Min Abstract An efficient learning scheme

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

Final Report: Classification of Plankton Classes By Tae Ho Kim and Saaid Haseeb Arshad

Final Report: Classification of Plankton Classes By Tae Ho Kim and Saaid Haseeb Arshad Final Report: Classification of Plankton Classes By Tae Ho Kim and Saaid Haseeb Arshad Table of Contents 1. Project Overview a. Problem Statement b. Data c. Overview of the Two Stages of Implementation

More information

Restricted Boltzmann Machines. Shallow vs. deep networks. Stacked RBMs. Boltzmann Machine learning: Unsupervised version

Restricted Boltzmann Machines. Shallow vs. deep networks. Stacked RBMs. Boltzmann Machine learning: Unsupervised version Shallow vs. deep networks Restricted Boltzmann Machines Shallow: one hidden layer Features can be learned more-or-less independently Arbitrary function approximator (with enough hidden units) Deep: two

More information

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

ETALON IMAGES: UNDERSTANDING THE CONVOLUTION NEURAL NETWORKS

ETALON IMAGES: UNDERSTANDING THE CONVOLUTION NEURAL NETWORKS ETALON IMAGES: UNDERSTANDING THE CONVOLUTION NEURAL WORKS Vladimir V. Molchanov 1, Boris V. Vishnyakov 1, Vladimir S. Gorbatsevich 1, Yury V. Vizilter 1 1 FGUP «State Research Institute of Aviation Systems»,

More information

Inception Network Overview. David White CS793

Inception Network Overview. David White CS793 Inception Network Overview David White CS793 So, Leonardo DiCaprio dreams about dreaming... https://m.media-amazon.com/images/m/mv5bmjaxmzy3njcxnf5bml5banbnxkftztcwnti5otm0mw@@._v1_sy1000_cr0,0,675,1 000_AL_.jpg

More information

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Intro to Deep Learning Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Why this class? Deep Features Have been able to harness the big data in the most efficient and effective

More information

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information