Tuning the Layers of Neural Networks for Robust Generalization
Int'l Conf. Data Science ICDATA'18

C. P. Chiu and K. Y. Michael Wong
Department of Physics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

Abstract — Neural networks are known to have generalization ability for test data. This ability depends on fine tuning of the network architecture, which has mainly depended on design experience. In this work, we explore a simple way to identify the network layer responsible for the lack of performance robustness under translational displacements of input patterns, and hence provide evidence that the translational robustness of the network can be improved by modifying that particular layer for small datasets. The method achieves a significant improvement in the weighted average error, with modification hints provided by the random epochs training process, on the MNIST and Fashion MNIST datasets. It also provides a way to understand the development of the weight space of neural networks.

Keywords — Generalization robustness, weak layer identification, architecture search, data augmentation, random epochs training.

I. INTRODUCTION

Neural networks are known to have generalization ability on different tasks [1]-[4]. However, this ability is often restricted by various factors, as illustrated by the occurrence of adversarial examples [5], [6]. A recent work has shown that neural networks may perform poorly even when dealing with input patterns that are simply rotationally or translationally displaced [7]. Different visualization techniques have been developed to understand the internal features of deep neural networks, in order to improve the network architecture and better understand the fundamental nature of its operation [8], [9]. These visualization techniques depend on the input image and are normally demanding in terms of resources. Model architecture plays an important role in generalization.
Different initialization [4], [10], regularization [4], [11], [12], and optimization [13] techniques have been proposed to enhance network generalization robustness, but model architecture design has generally relied on experience. Therefore, automatic architecture search has drawn attention owing to the current hand-crafted nature of neural network research. Common automatic architecture search methods are based on evolution of topologies [14] or reinforcement learning [15]. Despite their promising performance [16], these methods are demanding in terms of resources and are not able to reveal the fundamental nature of deep neural networks. In this work, we propose a method to identify the weak layer in terms of translational robustness using simple data augmentation, so the demand for resources is low. From this, we also demonstrate that the robustness can be improved by modifying the weak layer. This approach can also provide a better understanding of the development of the weights in the network during the training process. As an example of identifying the weak layer in a neural network, we consider the robustness of networks processing visual images when the input images are translated. For benchmarking, we select a network trained with augmented data consisting of epochs of randomly translated input images as a reference. Its performance is compared with that of networks trained with a less stochastic sequence of translated images, which normally results in weaker performance. The correlation of each layer of these networks with the reference network is monitored, and the layer with the weakest correlation is identified as the weakest layer for further tuning. Using the MNIST and Fashion MNIST datasets for testing, we found this method to be effective. The rest of the paper is organized as follows. In section 2, we first describe the architecture of the neural network. Then, we propose two data augmentation methods that impose different extents of perturbation on the network.
Third, we use the zero normalized cross correlation and the weighted average error to measure generalization robustness. In section 3, we discuss the experimental results of our proposed method for the identification of the weak layer. We demonstrate that the method is effective on standard machine learning datasets such as MNIST and Fashion MNIST. In section 4, we conclude the work and discuss possible applications and limitations of the method.

II. EXPERIMENTAL SETUP

A. Model Architecture

The network is based on the LeNet used to process the MNIST dataset [17]. The convolutional kernels are 5×5 pixels in both convolutional layers; the first layer has 20 kernels and the second layer has 50 kernels. The third and fourth layers are fully connected layers with sizes 800 and 500. The network is pre-trained on the MNIST dataset without data augmentation. Even when the network is further trained for another 200 epochs with weight decay, it does not show any further improvement without data augmentation. The activation function of all layers defaults to tanh, and the output of the last layer is the arg max of a softmax.
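The flattened size of 800 at the first fully connected layer pins down the convolutional geometry. A minimal bookkeeping sketch, assuming the standard LeNet layout of "valid" 5×5 convolutions each followed by 2×2 non-overlapping pooling (the pooling layers are not spelled out in the text above, so they are an assumption here):

```python
# Feature-map size bookkeeping for the LeNet-style model described above.
# Assumption (not explicit in the text): each 5x5 "valid" convolution is
# followed by 2x2 non-overlapping pooling, as in the standard LeNet.

def conv_out(size, kernel):
    return size - kernel + 1          # "valid" convolution, stride 1

def pool_out(size, window=2):
    return size // window             # non-overlapping pooling

side = 28                             # MNIST image side length
side = pool_out(conv_out(side, 5))    # layer 0: 28 -> 24 -> 12
side = pool_out(conv_out(side, 5))    # layer 1: 12 -> 8  -> 4

flattened = side * side * 50          # 50 kernels in the second conv layer
print(flattened)                      # -> 800, the fully connected input size
```

Under these assumptions, 4 × 4 × 50 = 800, which is consistent with the stated size of the first fully connected layer.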
Fig. 1: Pre-trained model, weighted average error = 0.66.

The learning rate is 0.05, and weight decay is applied. The network is initialized with a given seed for the pseudo-random number generator and undergoes mini-batch learning. The pre-trained model achieves a 0.91% error on the test set.

B. Training and Data Augmentation Methods

We deployed two data augmentation methods in this work. a) Random Epochs Training (RET): For every epoch, the validation and training datasets are displaced horizontally and vertically by a random amount in the range [-10, 10], and the network is trained for 200 epochs in total. b) Sequential Training: The network is trained and validated with the dataset displaced leftwards by 2 pixels for several epochs. The network is then trained with the dataset displaced further leftwards by 2 pixels. This is repeated in turn for displacements in the rightward, upward, and downward directions. A subset of the images in the MNIST training dataset is separated and used as the validation set.

C. Zero Normalized Cross Correlation of Weight Space

Direct visualization of the weight space is nearly impossible owing to the large number of parameters. Therefore, we use the Zero Normalized Cross Correlation (ZNCC) to check the similarity of different models in the fully connected layers as well as the convolutional layers. Consider the comparison of two network models A and B. For the convolutional layers,

C_{ij}^{AB} = \frac{\langle W_{ij}^{A} W_{ij}^{B} \rangle_n - \langle W_{ij}^{A} \rangle_n \langle W_{ij}^{B} \rangle_n}{\sigma_{ij}^{A} \sigma_{ij}^{B}}   (1)

where W_{ij}^{A} and W_{ij}^{B} are the weights in the i-th layer and j-th kernel of models A and B respectively, \langle \cdot \rangle_n denotes the average over the n dimensions of the weight W_{ij}, and \sigma_{ij}^{A} and \sigma_{ij}^{B} are the standard deviations of W_{ij}^{A} and W_{ij}^{B}. This is further simplified to the root mean square of the ZNCC of the kernels in a single layer i.
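The random epochs training scheme amounts to drawing one displacement per epoch and applying it to every image. A minimal NumPy sketch of that idea (the function names and the zero-fill convention for vacated border pixels are our assumptions; the paper does not specify how borders are padded):

```python
import numpy as np

def translate(img, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels; vacated pixels are zero-filled."""
    h, w = img.shape
    out = np.zeros_like(img)
    rows_dst = slice(max(dy, 0), h + min(dy, 0))
    cols_dst = slice(max(dx, 0), w + min(dx, 0))
    rows_src = slice(max(-dy, 0), h + min(-dy, 0))
    cols_src = slice(max(-dx, 0), w + min(-dx, 0))
    out[rows_dst, cols_dst] = img[rows_src, cols_src]
    return out

def ret_epoch(images, rng, max_shift=10):
    """One RET epoch: a single random displacement in [-10, 10]^2
    is drawn and applied to every image in the dataset."""
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.stack([translate(im, dx, dy) for im in images]), (dx, dy)

rng = np.random.default_rng(0)
batch = np.zeros((4, 28, 28))          # stand-in for a batch of MNIST images
shifted, (dx, dy) = ret_epoch(batch, rng)
```

Sequential training would instead call `translate` with a deterministic schedule of displacements (left by 2, left by 4, then the rightward, upward, and downward counterparts).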
C_{i}^{AB} = \sqrt{\langle (C_{ij}^{AB})^2 \rangle_j}   (2)

(a) The sequentially trained model at an intermediate stage, when x = 0 and y = 4. (b) The final state of the sequentially trained model. (c) The RET model, mean of 10 trials of the weighted average error = 0.36. Fig. 2: MNIST error maps of models trained with the different data augmentation methods (with the pre-training stage). x-axis: horizontal displacement of the input test images; y-axis: vertical displacement of the input test images; color bar: test set classification error.
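Equations (1) and (2) translate directly into code. A sketch in NumPy, treating each kernel as a flat vector (the names `zncc` and `layer_zncc` are ours):

```python
import numpy as np

def zncc(wa, wb):
    """Zero-normalized cross correlation of two weight arrays, eq. (1)."""
    wa, wb = wa.ravel(), wb.ravel()
    return ((wa * wb).mean() - wa.mean() * wb.mean()) / (wa.std() * wb.std())

def layer_zncc(kernels_a, kernels_b):
    """Root-mean-square of the per-kernel ZNCCs in one layer, eq. (2)."""
    c = [zncc(ka, kb) for ka, kb in zip(kernels_a, kernels_b)]
    return float(np.sqrt(np.mean(np.square(c))))
```

For a fully connected layer the same computation is applied to the whole weight matrix at once, i.e. `zncc(W_a, W_b)` directly. Identical layers give a ZNCC of 1, and the layer whose ZNCC against the reference model drops furthest is the candidate weak layer.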
For the fully connected layers, the ZNCC is similar to that for the convolutional layers and is given by

C_{i}^{AB} = \frac{\langle W_{i}^{A} W_{i}^{B} \rangle_n - \langle W_{i}^{A} \rangle_n \langle W_{i}^{B} \rangle_n}{\sigma_{i}^{A} \sigma_{i}^{B}}   (3)

D. Error Maps and Weighted Average Errors

The trained networks are tested with the test set for each horizontal and vertical displacement. The corresponding classification error for the differently displaced test sets is plotted in the two-dimensional space of pixel displacements as a map, to check the generalization ability. The generalization performance is further summarized by the weighted average error of the map, computed with the following equation:

\text{weighted average error} = \frac{\sum_i \exp\left(-\frac{x_i^2 + y_i^2}{2\sigma^2}\right) E_i}{\sum_i \exp\left(-\frac{x_i^2 + y_i^2}{2\sigma^2}\right)}   (4)

where E_i is the classification error of the model when the pixel displacement of the test set is (x_i, y_i), and \sigma is equal to 6, since the loss of information increases rapidly for translational displacements of the dataset around ±6 pixels. As seen in fig. 1, the generalization error accordingly rises rapidly to 0.9.

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Error Maps of Trained Models

Ideally, the error map should have a basin shape, since information is lost as the displacement increases gradually until saturation. The classification error of the pre-trained model increases rapidly for small displacements of around ±6 pixels, as shown in fig. 1, with a weighted average error of 0.66. There are patches of unnaturally high error regions beyond the center in the pre-trained model. Beyond these patches, the network input is virtually blank for large pixel displacements, and the network classifies most of the largely displaced patterns as class 1, since the handwritten digits in class 1 of the MNIST dataset have the largest number of blank pixels among the 10 classes. After sequential training and random epochs training, the high error patches vanish, as seen in figs. 2b and 2c.
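The weighted average error of eq. (4) is a Gaussian-weighted mean over the displacement grid. A small sketch (the dictionary representation of the error map is our choice of data structure):

```python
import numpy as np

def weighted_average_error(error_map, sigma=6.0):
    """Eq. (4): Gaussian-weighted mean of the classification errors E_i
    over displacements (x_i, y_i); sigma = 6 as in the text."""
    disp = np.array(list(error_map.keys()), dtype=float)
    errs = np.array(list(error_map.values()), dtype=float)
    w = np.exp(-(disp[:, 0] ** 2 + disp[:, 1] ** 2) / (2.0 * sigma ** 2))
    return float((w * errs).sum() / w.sum())

# A toy error map: low error near the center, high error far away,
# mimicking the basin shape expected of a robust model.
toy = {(x, y): 0.01 if abs(x) + abs(y) <= 2 else 0.9
       for x in range(-10, 11) for y in range(-10, 11)}
```

The Gaussian weight discounts large displacements, where most of the digit has left the image field anyway, so the score emphasizes robustness over the displacements that still carry information.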
Data augmentation is crucial for robust generalization; however, it does not guarantee good generalization: the mean of the weighted average error is 0.36, as seen in fig. 2c, and the error map has an irregular contour. Sequential training only improves the generalization robustness slightly, with the weighted average error dropping from the 0.66 of the pre-trained model. When the pixel displacement of the dataset is small in sequential training, the basin of the classification error is shifted to the corresponding displacement, as shown in fig. 2a. Therefore, the corresponding change in the weight space should be responsible for the displacement of the basin center. In the RET model, the basin is broadened, as seen in fig. 2c, with a mean weighted average error of 0.36. Therefore, in the analyses below, the RET model is regarded as a benchmark for robust generalization. By comparing the

(a) Pre-trained model. (b) RET model. Fig. 3: Normalized cross correlation (C_{i}^{AB}) between the sequentially trained model and (a) the pre-trained model and (b) the RET model.

sequentially trained and RET models layer by layer, the layer with the largest deviation can be found and identified as the layer responsible for the lack of translational robustness.

B. Comparison of Trained Models

As described previously, sequential training shifts the basin in the direction corresponding to the displacement of the dataset, as illustrated in fig. 2a. Fig. 3a shows the deviation of the network weights in the weight space from the pre-trained model through the training sequence. As the RET model achieves high generalization power with a significant drop in the weighted average error, we treat it as a benchmark model of generalization and check how the sequentially trained model deviates from it. As shown
(a) The RET model without the pre-training stage; the activation function is tanh. (b) The RET model without the pre-training stage; the activation function is ReLU, mean of 10 trials of the weighted average error = 0.26. Fig. 4: Error maps of the RET model (without the pre-training stage) with different activation functions in layer 2: (a) tanh, (b) ReLU.

(a) The pre-trained model, weighted average error = 0.7. (b) The RET model, mean of 10 trials of the weighted average error = 0.41. Fig. 5: Fashion MNIST error maps of the pre-trained and RET (after pre-training) models.

in fig. 3b, the normalized cross correlation of the sequential model changes relatively slightly for layers 0, 1, and 3. This means that these layers already capture some translationally invariant features. However, only the first fully connected layer (layer 2) is subjected to a large change compared with the other layers. It is likely that improving the first fully connected layer is crucial for the generalization of translational invariance for this model and dataset.

C. Robustness Improvement with Minimal Change

We have seen in fig. 1 that an irregular landscape with patches of high error regions is present in the pre-trained model. However, as shown in figs. 4a and 2c, the error map still has an irregular contour even when the network is trained without pre-training, and there is only a drop of around 0.04 from 0.36 in the mean of the weighted average error. Therefore, we suspect that the likely cause is the vanishing backpropagation gradient of the tanh activation function when the magnitude of its argument is too large. From the previous section, we know that layer 2 is possibly responsible for the lack of translational invariance. Hence, we focus on the modification of layer 2 only. As shown in fig.
4b, when the activation function in layer 2 is changed from tanh to ReLU, the basin of low error in the error map is broadened to roughly [-10, 10] both horizontally and vertically. The weighted average error is greatly reduced from 0.66 to 0.26 for the RET network.

D. Further Tests on Fashion MNIST

Fashion MNIST, which is similar to MNIST, is a dataset of clothing images of size 28 by 28 [18]. It is a more
(a) The RET model without the pre-training stage, mean of 10 trials of the weighted average error = 0.46. (b) The RET model without the pre-training stage, layer 2 activation: ReLU. (c) The RET model without the pre-training stage, layer 1 and 2 activations: ReLU. (d) The RET model without the pre-training stage, layer 1 activation: ReLU. Fig. 7: Fashion MNIST error maps of the RET models (without the pre-training stage) with different activation functions.

Fig. 6: Fashion MNIST normalized cross correlation between the sequentially trained models and the pre-trained (and RET) models.

challenging dataset. For example, the network is required to learn to differentiate T-shirts, shirts, coats, shoes, etc. from grayscale information. Similar to the MNIST model, the network performance decays rapidly after a displacement of a few pixels, with a weighted average error of 0.7, as shown in fig. 5a. The RET model achieves better generalization, with a 0.3 drop in the mean weighted average error. However, the basin is slightly shifted, as shown in fig. 5b. This is likely the result of the RET training strategy, which may be biased towards certain displacements that give a lower classification error. If the data augmentation were done randomly for individual images instead of individual epochs, the bias could likely be reduced. However, it should also be noted that the network may be biased owing to the network architecture. For the case without pre-training, the RET model performs worse than the case with pre-training, as the weighted average error increases from 0.41 in fig. 5b to 0.46 in fig. 7a. Note that the basins for the Fashion MNIST dataset are broader than those for MNIST because objects in the Fashion MNIST dataset cover broader regions in the image field.
Therefore, the large displacements in RET for Fashion MNIST result in a larger mean weighted average error than for the RET model with the pre-training stage. From fig. 6, all layers of weights gradually deviate from the pre-trained model, and layer 2 was found to have a relatively larger change in ZNCC compared with the RET model. Therefore, it is likely that layer 2 is responsible for the lack of translational robustness. Again, by modifying the activation function of layer 2, fig. 7b shows that the network becomes less biased and achieves a lower mean weighted average error. Unlike for the MNIST dataset, layers 0 and 3 have higher ZNCCs compared with those for the MNIST data, as shown in fig. 6b. This is because objects in the Fashion MNIST dataset have very different features, hence layers 0 and 3 already capture the more robust, translationally invariant features of clothing. Although there is a slight drop in the correlation for layer 1, it should be noted that this may result from the influence due to
the dramatic change of layer 2. Note that although changing the activation function gives us a way to explore performance improvements, its success is not guaranteed. As seen in figs. 7c and 7d, if layer 1 is changed alone or together with layer 2, the error map becomes irregular and biased. The mean weighted average error is larger for the RET model with a modification on layer 1 than for that with a modification on layer 2 only.

IV. CONCLUSION

In this work, we demonstrated that the generalization power of a network can be improved by random epochs training. This can be further utilized to identify the layer responsible for the lack of translational invariance, and hence to further improve the robustness of the network. Although we only demonstrated this idea on translated patterns from the MNIST and Fashion MNIST datasets, we believe that it is also applicable to image rotations in other types of small datasets. As the weight deviation changes with changes of the dataset, it can help to identify the layer responsible for the lack of robustness against the distortion. However, for a large dataset like ImageNet, the dataset itself may be robust enough to have all kinds of distortions embedded within it. Thus, this method should be further verified and tested on large-scale datasets.

ACKNOWLEDGMENT

This work is supported by the Research Grants Council of Hong Kong.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. NIPS, 25.
[2] A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," IEEE ICASSP, 2013.
[3] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of Go without human knowledge," Nature, 550.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," Proc. IEEE ICCV.
[5] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li, "Adversarial Examples: Attacks and Defenses for Deep Learning," arXiv preprint.
[6] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, "Deep Variational Information Bottleneck," ICLR.
[7] L. Engstrom, D. Tsipras, L. Schmidt, and A. Madry, "A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations," NIPS Workshop on Machine Learning and Computer Security.
[8] A. Mahendran and A. Vedaldi, "Understanding Deep Image Representations by Inverting Them," IEEE CVPR.
[9] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," ECCV, Springer.
[10] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proc. 13th Int. Conf. Artificial Intell. and Statist.
[11] A. Krogh and J. A. Hertz, "A simple weight decay can improve generalization," Adv. NIPS.
[12] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," Proc. 32nd ICML, vol. 37.
[13] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," ICLR, 2015.
[14] K. O. Stanley and R. Miikkulainen, "Evolving neural networks through augmenting topologies," Evolutionary Computation, vol. 10, 2002.
[15] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning Transferable Architectures for Scalable Image Recognition," arXiv preprint.
[16] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin, "Large-Scale Evolution of Image Classifiers," Proc. 34th ICML, vol. 70.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proc. IEEE, vol. 86.
[18] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms," arXiv preprint.
More informationSupplementary material for Analyzing Filters Toward Efficient ConvNet
Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable
More informationDeep Learning With Noise
Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu
More informationarxiv: v1 [cs.cv] 1 Jul 2018
Autonomous Deep Learning: A Genetic DCNN Designer for Image Classification Benteng Ma Yong Xia* School of Computer Science, Northwestern Polytechnical University Xian 710072, China yxia@nwpu.edu.cn arxiv:1807.00284v1
More informationAdvanced Machine Learning
Advanced Machine Learning Convolutional Neural Networks for Handwritten Digit Recognition Andreas Georgopoulos CID: 01281486 Abstract Abstract At this project three different Convolutional Neural Netwroks
More informationTo be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine
2014 22nd International Conference on Pattern Recognition To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine Takayoshi Yamashita, Masayuki Tanaka, Eiji Yoshida, Yuji Yamauchi and Hironobu
More informationDynamic Routing Between Capsules
Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet
More informationResidual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina
Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,
More informationWhere s Waldo? A Deep Learning approach to Template Matching
Where s Waldo? A Deep Learning approach to Template Matching Thomas Hossler Department of Geological Sciences Stanford University thossler@stanford.edu Abstract We propose a new approach to Template Matching
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationElastic Neural Networks for Classification
Elastic Neural Networks for Classification Yi Zhou 1, Yue Bai 1, Shuvra S. Bhattacharyya 1, 2 and Heikki Huttunen 1 1 Tampere University of Technology, Finland, 2 University of Maryland, USA arxiv:1810.00589v3
More informationSmart Content Recognition from Images Using a Mixture of Convolutional Neural Networks *
Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks * Tee Connie *, Mundher Al-Shabi *, and Michael Goh Faculty of Information Science and Technology, Multimedia University,
More informationReal-Time Depth Estimation from 2D Images
Real-Time Depth Estimation from 2D Images Jack Zhu Ralph Ma jackzhu@stanford.edu ralphma@stanford.edu. Abstract ages. We explore the differences in training on an untrained network, and on a network pre-trained
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun
More informationStructured Prediction using Convolutional Neural Networks
Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationSEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic
SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks
More informationUnderstanding Deep Networks with Gradients
Understanding Deep Networks with Gradients Henry Z. Lo, Wei Ding Department of Computer Science University of Massachusetts Boston Boston, Massachusetts 02125 3393 Email: {henryzlo, ding}@cs.umb.edu Abstract
More informationarxiv: v1 [cs.cv] 29 Oct 2017
A SAAK TRANSFORM APPROACH TO EFFICIENT, SCALABLE AND ROBUST HANDWRITTEN DIGITS RECOGNITION Yueru Chen, Zhuwei Xu, Shanshan Cai, Yujian Lang and C.-C. Jay Kuo Ming Hsieh Department of Electrical Engineering
More informationarxiv: v1 [cs.cv] 16 Mar 2018
Semantic Adversarial Examples Hossein Hosseini Radha Poovendran Network Security Lab (NSL) Department of Electrical Engineering, University of Washington, Seattle, WA arxiv:1804.00499v1 [cs.cv] 16 Mar
More informationIntroduction to Neural Networks
Introduction to Neural Networks Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold
More informationKaggle Data Science Bowl 2017 Technical Report
Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li
More informationOn the Effectiveness of Neural Networks Classifying the MNIST Dataset
On the Effectiveness of Neural Networks Classifying the MNIST Dataset Carter W. Blum March 2017 1 Abstract Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision.
More informationHandwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks
Downloaded from orbit.dtu.dk on: Nov 22, 2018 Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks Gallus, Michal; Nannarelli, Alberto Publication date: 2018 Document
More informationConvolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017
Convolutional Neural Networks + Neural Style Transfer Justin Johnson 2/1/2017 Outline Convolutional Neural Networks Convolution Pooling Feature Visualization Neural Style Transfer Feature Inversion Texture
More informationConvolution Neural Networks for Chinese Handwriting Recognition
Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven
More informationarxiv: v1 [cs.lg] 16 Jan 2013
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks arxiv:131.3557v1 [cs.lg] 16 Jan 213 Matthew D. Zeiler Department of Computer Science Courant Institute, New York University zeiler@cs.nyu.edu
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period
More informationFrom Maxout to Channel-Out: Encoding Information on Sparse Pathways
From Maxout to Channel-Out: Encoding Information on Sparse Pathways Qi Wang and Joseph JaJa Department of Electrical and Computer Engineering and, University of Maryland Institute of Advanced Computer
More informationLearning Discrete Representations via Information Maximizing Self-Augmented Training
A. Relation to Denoising and Contractive Auto-encoders Our method is related to denoising auto-encoders (Vincent et al., 2008). Auto-encoders maximize a lower bound of mutual information (Cover & Thomas,
More informationPart Localization by Exploiting Deep Convolutional Networks
Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.
More informationarxiv: v1 [cs.cv] 17 Nov 2016
Inverting The Generator Of A Generative Adversarial Network arxiv:1611.05644v1 [cs.cv] 17 Nov 2016 Antonia Creswell BICV Group Bioengineering Imperial College London ac2211@ic.ac.uk Abstract Anil Anthony
More informationarxiv: v1 [cs.ne] 11 Jun 2018
Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks arxiv:1806.03796v1 [cs.ne] 11 Jun 2018 Yash Upadhyay University of Minnesota, Twin Cities Minneapolis, MN, 55414
More informationIntroduction to Deep Q-network
Introduction to Deep Q-network Presenter: Yunshu Du CptS 580 Deep Learning 10/10/2016 Deep Q-network (DQN) Deep Q-network (DQN) An artificial agent for general Atari game playing Learn to master 49 different
More informationStacked Denoising Autoencoders for Face Pose Normalization
Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University
More informationTiny ImageNet Visual Recognition Challenge
Tiny ImageNet Visual Recognition Challenge Ya Le Department of Statistics Stanford University yle@stanford.edu Xuan Yang Department of Electrical Engineering Stanford University xuany@stanford.edu Abstract
More informationCONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR UNDERWATER OBJECT CLASSIFICATION
CONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR UNDERWATER OBJECT CLASSIFICATION David P. Williams NATO STO CMRE, La Spezia, Italy 1 INTRODUCTION Convolutional neural networks (CNNs) have recently achieved
More informationProgressive Neural Architecture Search
Progressive Neural Architecture Search Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy 09/10/2018 @ECCV 1 Outline Introduction
More informationFacial Expression Classification with Random Filters Feature Extraction
Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle
More informationLearning visual odometry with a convolutional network
Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com
More informationIn-Place Activated BatchNorm for Memory- Optimized Training of DNNs
In-Place Activated BatchNorm for Memory- Optimized Training of DNNs Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder Mapillary Research Paper: https://arxiv.org/abs/1712.02616 Code: https://github.com/mapillary/inplace_abn
More informationA FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen
A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
More informationCNN-based Human Body Orientation Estimation for Robotic Attendant
Workshop on Robot Perception of Humans Baden-Baden, Germany, June 11, 2018. In conjunction with IAS-15 CNN-based Human Body Orientation Estimation for Robotic Attendant Yoshiki Kohari, Jun Miura, and Shuji
More informationKeras: Handwritten Digit Recognition using MNIST Dataset
Keras: Handwritten Digit Recognition using MNIST Dataset IIT PATNA January 31, 2018 1 / 30 OUTLINE 1 Keras: Introduction 2 Installing Keras 3 Keras: Building, Testing, Improving A Simple Network 2 / 30
More informationA Quick Guide on Training a neural network using Keras.
A Quick Guide on Training a neural network using Keras. TensorFlow and Keras Keras Open source High level, less flexible Easy to learn Perfect for quick implementations Starts by François Chollet from
More informationStochastic Function Norm Regularization of DNNs
Stochastic Function Norm Regularization of DNNs Amal Rannen Triki Dept. of Computational Science and Engineering Yonsei University Seoul, South Korea amal.rannen@yonsei.ac.kr Matthew B. Blaschko Center
More informationProperties of adv 1 Adversarials of Adversarials
Properties of adv 1 Adversarials of Adversarials Nils Worzyk and Oliver Kramer University of Oldenburg - Dept. of Computing Science Oldenburg - Germany Abstract. Neural networks are very successful in
More informationCEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015
CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France
More informationSupplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Kihyuk Sohn 1 Sifei Liu 2 Guangyu Zhong 3 Xiang Yu 1 Ming-Hsuan Yang 2 Manmohan Chandraker 1,4 1 NEC Labs
More informationREGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION
REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological
More informationDeep Convolutional Neural Networks and Noisy Images
Deep Convolutional Neural Networks and Noisy Images Tiago S. Nazaré, Gabriel B. Paranhos da Costa, Welinton A. Contato, and Moacir Ponti Instituto de Ciências Matemáticas e de Computação Universidade de
More informationA Deep Learning Approach to Vehicle Speed Estimation
A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,
More informationMachine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart
Machine Learning The Breadth of ML Neural Networks & Deep Learning Marc Toussaint University of Stuttgart Duy Nguyen-Tuong Bosch Center for Artificial Intelligence Summer 2017 Neural Networks Consider
More informationKnow your data - many types of networks
Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for
More informationBranchyNet: Fast Inference via Early Exiting from Deep Neural Networks
BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks Surat Teerapittayanon Harvard University Email: steerapi@seas.harvard.edu Bradley McDanel Harvard University Email: mcdanel@fas.harvard.edu
More information