Applying Deep Learning for Classifying Images of Hand Postures in the Rock-Paper-Scissors Game

Otto-Friedrich-Universität Bamberg
Angewandte Informatik / Kognitive Systeme (Applied Computer Science / Cognitive Systems)
Master's thesis in the degree programme International Software Systems Science at the Faculty of Information Systems and Applied Computer Sciences of the Otto-Friedrich-Universität Bamberg
Topic: Applying Deep Learning for Classifying Images of Hand Postures in the Rock-Paper-Scissors Game
Submitted by: Rambabu Gupta (Matr. No )
Supervisor: Prof. Dr. Ute Schmid
Submission date:


Abstract

The goal of this research is to build a model for visual recognition. First, I describe convolutional neural networks together with their methods and techniques, and briefly present the history of deep learning. I then introduce several convolutional neural network architectures. A crucial aspect of this experiment is that it works with a small amount of training data and needs little computational power, because the architecture is relatively small. I also compare several models that use different methods and techniques to achieve the best accuracy on the CIFAR10 dataset. In the experiment, an architecture has to be designed to classify images of the three hand postures of the Rock-Paper-Scissors game. An intelligent agent (named NAO) plays this game with people, so a deep learning model has to be built that identifies the hand posture shown in an image. Based on the results, an architecture similar to the one that obtained the highest accuracy in the CIFAR10 experiment (with its techniques and methods) is used to classify the hand postures of the game.


Acknowledgements

I would specifically like to thank Professor Ute Schmid and Christina Zeller for the opportunity to write this master thesis and for their guidance during the thesis. I also want to thank them for supervising my research on this project. Additionally, I thank all the people who collaborated during the experiment.

Contents

Abstract
Acknowledgement
1 Introduction
2 Deep learning - An Overview
Basic Concepts
Background and History of Deep Learning
Feedforward Deep Network (MLP)
Types of Deep Neural Network Architectures
Approaches and Methods
CNN Approaches
CNN Training Methods
ConvNet Architectures
Applications and Evaluation Methods of Deep Learning
Applications
Evaluation methods
The Tensorflow Framework
Deep Learning in Computer Vision: CIFAR10 Images Data
Architectures Solving CIFAR10 and Their Accuracies
Comparing different CNN Architectures
Description of CNN Architecture
Setting of the Deep Learning Environment
3.3 Configuring and Installing in AWS Cloud
Analyzing outputs for Layover of ConvNets
Results on CIFAR
A Deep Convolutional Neural Network for Hand Posture Classification
Playing Rock-Paper-Scissors with an Artificial Agent
Creating Training Data for Rock-Paper-Scissors Hand Postures
Image Data Preprocessing
Image Data Augmentation
Realizations of Deep learning project
Convnet Architecture
Application of the CNN Architecture
Code implementation
Training: Epochs and Early Stopping
Results and Evaluation
Conclusion and Future work
References
Appendices
A Content of the DVD

List of Figures

2.1 Venn diagram[1]
2.2 Flowcharts showing how the different parts of an AI system relate to each other within different AI disciplines[1]
2.3 Analogy between biological neuron and node of artificial neural networks[2]
2.4 Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels)[2]
2.5 Multilayer neural networks and backpropagation[3]
2.6 Picture of input changing in MLP[3]
2.7 Larger Neural Networks can represent more complicated functions. The data are shown as circles colored by their class, and the decision regions by a trained neural network are shown underneath[2]
2.8 Deeper models tend to perform better. This is not merely because the model is larger. This experiment from Goodfellow et al. (2014d) shows that increasing the number of parameters in layers of convolutional networks without increasing their depth is not nearly as effective at increasing test set performance. The legend indicates the depth of network used to make each curve and whether the curve represents variation in the size of the convolutional or the fully connected layers.[1]

2.9 Analogy: A person is stuck in the mountains and is trying to get down (i.e. trying to find the minima). SGD: The person represents the backpropagation algorithm, and the path taken down the mountain represents the sequence of parameter settings that the algorithm will explore. The steepness of the hill represents the slope of the error surface at that point. The instrument used to measure steepness is differentiation (the slope of the error surface can be calculated by taking the derivative of the squared error function at that point). The direction he chooses to travel in aligns with the gradient of the error surface at that point. The amount of time he travels before taking another measurement is the learning rate of the algorithm. Source: closed-form-vs-gd/ball.png
2.10 Typical CNN architecture
2.11 Generative Adversarial Networks overview
2.12 CNN and RNN both are applied to find logical textual information directly from images. CNN does identify/classify the images and RNN generates meaningful sentences out of it
2.13 Convolutional matrix equation[1]
2.14 The visualization below iterates over the output activations (green), and shows that each element is computed by elementwise multiplying the highlighted input (blue) with the filter (red), summing it up, and then offsetting the result by the bias[2]
2.15 An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in text below[2]
2.16 Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square)[2]
2.17 The test accuracy graph of the MNIST network trained with and without Batch Normalization[6]
2.18 Test error for different architectures with and without dropout. The networks have 2 to 4 hidden layers each with 1024 to units.[7]
2.19 Effect on features by dropout on MNIST data with one hidden layer autoencoders having 256 rectified units[7]

2.20 Max-pooling dropout[8]
L1 and L2 regularization terms in mathematical formula
Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right: An example of a thinned net produced by applying dropout to the network on the left. Crossed units have been dropped.[7]
The effects of regularization strength: Each neural network above has 20 hidden neurons, but changing the regularization strength makes its final decision regions smoother with a higher regularization.[2]
A cartoon depicting the effects of different learning rates. With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Higher learning rates will decay the loss faster, but they get stuck at worse values of loss (green line).[2]
The gap between the training and validation accuracy indicates the amount of overfitting.[2]
Cost function[1]
Cross Entropy formula
Animations that may help your intuitions about the learning process dynamics. Left: Contours of a loss surface and time evolution of different optimization algorithms. Notice the overshooting behavior of momentum-based methods, which make the optimization look like a ball rolling down the hill. Right: A visualization of a saddle point in the optimization landscape, where the curvature along different dimension has different signs (one dimension curves up and another down). Notice that SGD has a very hard time breaking symmetry and gets stuck on the top. Conversely, algorithms such as RMSprop will see very low gradients in the saddle direction. Due to the denominator term in the RMSprop update, this will increase the effective learning rate along this direction, helping RMSProp proceed. Images credit: Alec Radford
LeNet-5: First convolutional architecture for digit recognition. Courtesy:
CNN Architecture: AlexNet. It contains 7 hidden layers, 650 thousand neurons, 60 million parameters and 650 million connections. Courtesy: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf
Confusion Matrix: Depending on the scenario, one can choose a performance measure for better insights from this confusion matrix. Courtesy: Wikipedia
Model: input-100c3-mp2-200c2-mp2-300c2-mp2-400c2-mp2-500c2-output. The hidden layer dimensions for l = 5 DeepCNets, and LeNet-7. In both cases, the spatial sizes decrease from 96 down to 1. The DeepCNet applies max-pooling much more slowly than LeNet-7[13]

2.33 DenseNet Architecture
D: CNN model by tutorial of tensorflow on CIFAR10[14]
A: 1st Architecture
B: 2nd Architecture
C: 3rd Architecture
First step to choose the OS
Choosing GPU in AWS
Storage of root in OS
Images generated from 1st convolutional layer
Images generated from 2nd convolutional layer
Original gray images
1st generated image
2nd generated image
3rd generated image
Convnet Architecture for RPSG16 data
1st result: test accuracy

List of Tables

3.1 Architecture details of CNN models for training CIFAR
Training details of 4 models
Results of Training using four models on CIFAR10 data
Test Accuracy from 5-fold cross-validation
Training and Validation Accuracy: 5-fold cross-validation

Abbreviations

CNN: Convolutional Neural Networks
MSE: Mean Square Error
MLP: Multi Layer Perceptron
SGD: Stochastic Gradient Descent
SVM: Support Vector Machines
NLP: Natural Language Processing
GAN: Generative Adversarial Networks
RNN: Recurrent Neural Network
ReLU: Rectified Linear Unit
PReLU: Parametric Rectified Linear Unit
ILSVRC: ImageNet Large Scale Visual Recognition Challenge
FC: Fully Connected
GPU: Graphics Processing Unit
AWS: Amazon Web Services
RPSG: Rock-Paper-Scissors Game hand postures (University of Bamberg, 2016)


Chapter 1 Introduction

Human beings, the most advanced species of animals, possess better learning abilities than other species. Thanks to the five senses, we learn intuitively by seeing, hearing, smelling, touching, and tasting. Learning through the five senses helps humans to gain skills they need in their lives, such as face recognition, driving a car or language translation. These activities are considered simple for most people. Computers, which as artificial intelligence agents can solve complex problems, have nevertheless failed to perform these simple intuitive tasks. Hence the question arises as to why computers cannot perform them. How can computers deal with these problems? How can we make computers achieve a high level of accuracy in performing non-algorithmic tasks?

Computers cannot fulfill intuitive tasks because these tasks cannot be described formally (by rules or logic). To find solutions to these intuitive problems, researchers have come up with various models and algorithms that learn from examples or experience using machine-learning techniques. Conventional machine learning needed suitable feature vectors and internal representations of the raw data. It required substantial effort, domain skills and expert knowledge to design feature extractors that transform raw data into such feature vectors. Using these feature vectors, the machine learning subsystem could then detect and classify patterns in the input. In contrast, feature learning or representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification[3]. Feature learning draws on probability theory, statistics, and neural networks. One of the most successful recent forms of feature learning is based on neural networks.
It acquires features from examples provided as raw input data (e.g., pixel values for images). For example, a multilayer neural network trained to identify cats learns edges, arrangements of edges, and parts of familiar objects, regardless of the orientation, location and color of the cats in the different images. In the past, to build such an image-recognition algorithm, researchers had to design these features manually and then pass them to a learning algorithm such as support vector machines, k-nearest neighbors or hidden Markov models. Deep learning is an advanced form of multilayer neural network model in which representation learning is carried out with multiple levels of representation. Each level adds non-linear capability to the model and transforms the representation into a more abstract one at the next higher level. This brings promising results in image and speech recognition. The success in performing these simple tasks is a milestone for the development of more sophisticated machine-learning tasks like driving cars and playing games.

The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning [1].

The neural network concept was the focus of research in the 1980s. Although deep learning is based on neural network concepts, it is a relatively new field of machine learning. Why are deep learning algorithms becoming more and more popular now? And how can they achieve good results? One reason is the recent success in designing and building effective multi-layer neural nets using techniques such as convolution, pooling and batch normalization. We also have access to a substantial amount of data (thanks to the Internet revolution) and to computational power, which has increased dramatically with the advent of GPUs. Deep learning models are giving significantly better results in computer vision, signal processing, and NLP. Deep learning attracted more attention when Google DeepMind's deep-learning software defeated a professional player at the game of Go for the first time. Deep learning models have even surpassed human levels of accuracy on some image recognition problems, especially on CIFAR data[4].

I have done experiments on classifying images using a CNN (a deep learning model) and collected images of the three hand postures of the Rock-Paper-Scissors game. We photographed hand postures of more than 30 students and staff members at the University of Bamberg and named this set of images RPSG16. The university lab has a robot named NAO, which is to play Rock-Paper-Scissors with humans. Based on the three hand positions of the game, NAO decides whether he has won or lost according to the rules of the game.
However, how NAO decides about winning and losing is not the focus of this study. Instead, this thesis tries to identify a deep learning model with which NAO can recognize his opponent's hand posture with high accuracy. Which deep learning architecture is suitable for that? How do the positions of the layers and their number affect the accuracy of CNN models? Finally, is it possible to get good results using deep learning models with little image data?

This thesis is structured as follows. Section 2 discusses the basic concepts of deep learning and its approaches, and then covers the methods needed to understand deep learning models. Section 3 describes relevant definitions of the framework and their relation to the experimental hypotheses, and presents the empirical study on CIFAR10 data using different CNN architectures and their accuracies. Section 4 provides information about the experiments on RPSG16 data, including details of the questionnaires, experimental procedures, results and discussion. Finally, Section 5 gives conclusions and suggestions for further research.

Chapter 2 Deep learning - An Overview

2.1 Basic Concepts

Deep learning is a relatively new sub-field of machine learning. It is inspired by the neuronal architecture of the brain. A deep learning model usually consists of multiple hidden layers and has relatively few connections among its nodes. Deep learning is also termed hierarchical learning, feature learning or representation learning, as it does not require hand-crafted feature vectors but uses raw data instead.

Background and History of Deep Learning

The history of deep learning algorithms can be traced back to 1965, when Ivakhnenko and Lapa modeled neural networks with multiple layers of non-linear features. Instead of backpropagation, however, this model employed polynomial activation functions that were analyzed by statistical methods. In 1979, Fukushima created another model consisting of multiple convolutional and pooling layers, but it also did not use backpropagation for training. In 1985, Rumelhart, Hinton, and Williams showed that backpropagation in neural networks yields distributed representations (connectionism). For this reason, deep learning is sometimes considered a rebranding of connectionism. In 1989 at Bell Labs, LeCun applied convolutional networks with backpropagation (LeNet) to handwritten digits in practice. Progress in the field was slow in the 1990s, and support vector machines (SVMs) later overshadowed it. With the advent of GPUs, the computational speed of computers increased drastically. Because neural networks are computationally expensive and require intensive matrix multiplication, it was not feasible to train deep neural networks before modern GPUs became available. Neural networks attain much better results than SVMs with the same amount of data, and they continue to improve with more training data.
However, the main problem in training deep neural networks was the vanishing gradient: features in early layers could not be learned because no learning signal reached these layers. In 2006, Hinton and Salakhutdinov proposed one of the first solutions for feed-forward networks, namely pre-training, i.e. initializing the features of the early layers. As the speed of GPUs increased rapidly, it became possible to train deep learning models without pretraining. Krizhevsky, Sutskever, and Hinton used a convolutional network with rectified linear units and dropout for regularization (AlexNet) and obtained outstanding results in the ILSVRC-2012 ImageNet competition. With more data, advances in hardware and software, and techniques for training deeper network models, we are now able to solve some complex problems in NLP, speech recognition and computer vision. After Google DeepMind won the game of Go against human players in 2016 using deep learning models, major companies such as Facebook, NVIDIA, Microsoft, Google and Intel started to invest in deep learning startups and created deep learning research teams.

Figure 2.1: Venn diagram[1]

Machine learning is the ability of a system to acquire its own knowledge by extracting patterns from raw data. For instance, a machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail. Conventional machine learning algorithms rely heavily on the representation of the data they are given. For example, practitioners have used logistic regression to recommend cesarean delivery, with doctors providing several pieces of relevant information to the algorithm. Each piece of information included in the representation of the patient is known as a feature, and logistic regression uses these features for its predictions. If logistic regression were given an MRI scan instead of the doctor's report, it would not be able to make predictions from the pixels of the scan. The process of finding features is called feature engineering, and it requires a lot of effort and time. Conventional machine learning algorithms perform quite well at mapping a representation to an output. However, designing this representation or feature vector from raw data (such as the pixel values of an image) is challenging and requires careful engineering and considerable domain knowledge. The set of methods for automatically discovering, from raw data, the representations needed for detection or classification is called representation learning.

In representation learning, for example when analyzing an image of a car, the main aim is to separate the factors of variation, such as the position of the car, its color, and the angle and brightness of the sun. Most applications require us to disentangle such factors of variation, and extracting such high-level, abstract features is not easy. Deep learning solves this problem by introducing representations that are expressed in terms of other, simpler representations. For instance, in image recognition a deep learning system represents the concept of an image by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.

Figure 2.2: Flowcharts showing how the different parts of an AI system relate to each other within different AI disciplines[1].

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.

[Picture: deep learning finding edges]

The quintessential example of a deep learning model is the feedforward deep network, or multilayer perceptron (MLP).

Feedforward Deep Network (MLP)

A feedforward neural network is a biologically inspired nonlinear classification algorithm. It consists of several simple neuron-like processing units, called nodes, organized in layers. Every node in a layer is connected to all the nodes of the previous layer, and each connection may have a different weight.

Figure 2.3: Analogy between biological neuron and node of artificial neural networks[2].

The main goal of a feedforward deep network is to approximate some function f. For example, a classifier y = f(x) maps an input x to a category y. A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that results in the best function approximation. Here θ consists of w and b, so f(x; w, b) = xw + b, where w represents the weights and b the bias.

Figure 2.4: Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels)[2].
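The mapping f(x; w, b) = xw + b described above can be illustrated with a minimal NumPy sketch. The function name `affine`, the batch of two examples and all sizes and values are hypothetical choices made for illustration; they are not taken from the thesis.

```python
import numpy as np

def affine(x, w, b):
    """Compute f(x; w, b) = xw + b for a batch of row-vector inputs."""
    return x @ w + b

# Hypothetical sizes: 2 examples, 4 input features, 3 output nodes.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0]])
w = np.full((4, 3), 0.5)          # one weight per (input, output) connection
b = np.array([0.0, 1.0, -1.0])    # one bias per output node

y = affine(x, w, b)
print(y)  # one row of outputs per input example
```

Each column of `w` holds the weights of one output node, which matches the statement that every node is connected to all nodes of the previous layer.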

If there are multiple layers, the output of the previous layer is the input of the current layer. To let the model learn nonlinear functions, we apply a nonlinear function called the activation function. Typical activation functions are the sigmoid (logistic) function, tanh and the rectified linear unit. Modern neural networks use the rectified linear unit, defined as g(z) = max{0, z}. A three-layer neural network could analogously look like s = W3 max(0, W2 max(0, W1 x)), where W3, W2 and W1 are parameters to be learned and s is a vector of scores for the visual categories. Figure 2.5 briefly shows how the nodes of a typical three-layer neural network are connected and how the error derivatives are obtained.

Figure 2.5: Multilayer neural networks and backpropagation[3].

A multilayer neural network (shown by the connected dots) can distort the input space to make the classes of data (examples of which are on the red and blue lines in Figure 2.6) linearly separable. Note how a regular grid (shown on the left) in input space is also transformed (shown in the middle panel) by hidden units. This is an illustrative example with only two input units, two hidden units and one output unit, but the networks used for object recognition or natural language processing contain tens or hundreds of thousands of units[3].

Figure 2.6: Picture of input changing in MLP[3].

The architecture of a deep feedforward model can vary in size and depth depending on the practical problem. More nodes in the architecture enlarge the space of representable functions: networks with more nodes can express more complicated functions (Figure 2.7) and can therefore classify more complicated data. However, this also raises the risk of overfitting.

Figure 2.7: Larger Neural Networks can represent more complicated functions. The data are shown as circles colored by their class, and the decision regions by a trained neural network are shown underneath[2].

Empirical studies show that test accuracy increases further if the model has sparse connections between the nodes of adjacent layers[1] (Figure 2.8). The convolutional neural network is a special type of deep feedforward network that uses sparse connections between the nodes of layers, along with some other specialized techniques, and generalizes better than networks with fully connected adjacent layers. It is discussed in detail in later sections. Greater depth also tends to increase test accuracy: increasing the depth of a network, as in a CNN, improves test accuracy more than merely adding parameters at a fixed depth (Figure 2.8), and sparse connections reduce the computation and time needed to train the model.

Figure 2.8: Deeper models tend to perform better. This is not merely because the model is larger. This experiment from Goodfellow et al. (2014d) shows that increasing the number of parameters in layers of convolutional networks without increasing their depth is not nearly as effective at increasing test set performance. The legend indicates the depth of network used to make each curve and whether the curve represents variation in the size of the convolutional or the fully connected layers.[1]
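The three-layer score function given earlier in this section, s = W3 max(0, W2 max(0, W1 x)), can be sketched in a few lines of NumPy. The layer widths, the random weights and the choice of three output scores (one per hand posture) are assumptions made for illustration only, and no training is performed.

```python
import numpy as np

def relu(z):
    """Rectified linear unit g(z) = max{0, z}, applied elementwise."""
    return np.maximum(0.0, z)

def three_layer_scores(x, W1, W2, W3):
    """Forward pass s = W3 max(0, W2 max(0, W1 x)) without bias terms."""
    h1 = relu(W1 @ x)    # first hidden layer
    h2 = relu(W2 @ h1)   # second hidden layer
    return W3 @ h2       # class scores; no activation on the output

# Hypothetical sizes: 8 inputs, two hidden layers of 16 nodes, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(3, 16))

s = three_layer_scores(x, W1, W2, W3)
print(s.shape)
```

The output of each layer is the input of the next, and the nonlinearity between the matrix products is what prevents the three layers from collapsing into a single linear map.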

Stochastic Gradient Descent

Stochastic gradient descent (SGD) is an iterative, stochastic approximation of the gradient descent optimization method for minimizing a loss function. SGD is effective when the training steps use mini-batches. The gradient of the loss over a mini-batch is an estimate of the gradient over the training set; the quality of this estimate improves as the batch size increases, and computation over a batch is more efficient than computation over the same number of individual examples.

Figure 2.9: Analogy: A person is stuck in the mountains and is trying to get down (i.e. trying to find the minima). SGD: The person represents the backpropagation algorithm, and the path taken down the mountain represents the sequence of parameter settings that the algorithm will explore. The steepness of the hill represents the slope of the error surface at that point. The instrument used to measure steepness is differentiation (the slope of the error surface can be calculated by taking the derivative of the squared error function at that point). The direction he chooses to travel in aligns with the gradient of the error surface at that point. The amount of time he travels before taking another measurement is the learning rate of the algorithm. Source: com/images/faq/closed-form-vs-gd/ball.png

Types of Deep Neural Network Architectures

Convolutional Neural Network (CNN)

Convolutional neural networks are a specialized kind of neural network. They are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. According to Yann LeCun, there are no fully connected layers in a convolutional neural network; fully connected layers are in fact convolutional layers with 1×1 convolution kernels. However, the non-linearity is built into the neurons of a fully connected layer, whereas a CNN has a separate non-linearity layer.
A CNN typically consists of several types of layers: convolutional layers, activation layers, pooling layers and fully connected layers. It usually takes a 2D or 3D matrix as input for learning. CNNs have been incredibly successful in many practical applications such as image recognition, NLP and signal processing. In practice, an architecture usually combines convolutional layers, pooling layers, dropout layers, fully connected layers and a classifier layer (SVM or softmax).
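Returning to the Stochastic Gradient Descent subsection above, the following is a minimal sketch of mini-batch SGD. It assumes a linear model with a mean squared error loss on synthetic, noise-free data; the function name `sgd_linear_regression`, the learning rate, the batch size and the data are all hypothetical and are not the training setup used in this thesis.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.1, batch_size=16, epochs=200, seed=0):
    """Fit y ~ Xw + b by mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]          # residuals on the mini-batch
            grad_w = 2.0 * X[idx].T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w                       # step against the gradient estimate
            b -= lr * grad_b
    return w, b

# Synthetic data generated from known parameters (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2))
y = X @ np.array([2.0, -3.0]) + 0.5

w, b = sgd_linear_regression(X, y)
print(w, b)
```

Each update uses only the gradient of the loss over one mini-batch, which is the estimate of the full training-set gradient that the text refers to; in the mountain analogy of Figure 2.9, `lr` plays the role of the learning rate.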

Figure 2.10: Typical CNN architecture

Deep Reinforcement Learning

In some problems, deep learning can be combined with reinforcement learning to learn complex behavior directly from raw data. For example, a model may have to learn a game directly from the changing pixel values of the screen. A deep learning model such as a CNN acquires its knowledge from the raw data (the vision task of the game), while reinforcement learning learns which moves to play. Reinforcement learning is based on rewards and punishments and learns moves by trial and error. One of the most popular reinforcement learning methods is Q-learning. Combining Q-learning with deep learning models gives Deep Q-Networks. A Deep Q-Network stores all of the agent's experiences and then randomly samples and replays them to provide training data. This model can be used to learn Atari games, and a similar approach has been used to learn the game of Go. DeepMind's deep learning algorithm was able to defeat a human player at Go.

Generative Adversarial Networks

Generative adversarial networks (GANs) were first introduced by Ian Goodfellow. A GAN consists of two neural network models, the generator and the discriminator. The generator takes noise as input and generates samples. The discriminator receives samples from the generator as well as training data and has to distinguish between the two sources. The generator learns to produce more and more realistic samples, and the discriminator becomes better at distinguishing generated from real data. The two networks are trained simultaneously, and gradient information is backpropagated from the discriminator to the generator so that the generator can adapt its parameters. The goal of the generator is to produce samples that are indistinguishable from real data. GANs are widely applied to modelling natural images.

Figure 2.11: Generative Adversarial Networks overview
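Returning to the Deep Reinforcement Learning subsection above, the idea of storing experiences and replaying random samples of them can be sketched as follows. This is a deliberately simplified illustration: a lookup table `Q` stands in for the deep network of a Deep Q-Network, and the class `ReplayBuffer`, the function `q_update`, the two-state toy problem and the values of `alpha` and `gamma` are all hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and returns random samples of them for training."""
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)   # oldest experiences are dropped first

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        k = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), k)

def q_update(Q, transition, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q[s][a] towards r + gamma * max_a' Q[s'][a']."""
    s, a, r, s2, done = transition
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])

# Toy problem (assumed): in state 0, action 1 ends the episode with reward 1,
# while action 0 gives no reward and stays in state 0.
Q = [[0.0, 0.0], [0.0, 0.0]]
buffer = ReplayBuffer(capacity=100)
buffer.store(0, 1, 1.0, 1, True)
buffer.store(0, 0, 0.0, 0, False)

rng = random.Random(0)
for _ in range(200):
    for transition in buffer.sample(2, rng):   # replay randomly sampled experiences
        q_update(Q, transition)
print(Q)
```

In a Deep Q-Network the table lookup `Q[s][a]` would be replaced by the output of a neural network (for example a CNN over screen pixels), and the update would be a gradient step on the difference between prediction and target.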

Recurrent Neural Network (RNN)

Recurrent neural networks are a special type of neural network architecture for sequential data. They use parameter sharing in which each member of the output is a function of the previous members of the output. They are usually applied to one-dimensional, sequential data. An RNN produces an output at each time step. Here time does not mean real time; it only denotes the position in the sequence. RNNs can be applied to problems with a sequential structure, such as natural language processing. Sometimes combining CNN and RNN models solves complex problems (Figure 2.12).

Figure 2.12: CNN and RNN are both applied to derive textual information directly from images. The CNN identifies/classifies the images and the RNN generates meaningful sentences from the result.

2.2 Approaches and Methods

CNN Approaches

A few approaches are widely used by practitioners when building a CNN architecture. Which of them to use depends on the amount of training data, the available computational power and other factors. Some common approaches are discussed below.

Convolution layer

A CNN must have at least one convolutional layer in its architecture. It is based on convolution, a special type of mathematical operation on two functions. Suppose we have a two-dimensional input denoted by I; then there is a 2D kernel denoted by K and a feature map (output) denoted by S. The equation for convolution is shown in Figure 2.13.

Figure 2.13: Convolutional matrix equation[1]
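The equation in Figure 2.13 is not reproduced in this text. Assuming it is the standard definition of discrete two-dimensional convolution, it reads as follows in the notation above (input I, kernel K, feature map S):

```latex
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m,\, j - n)
```

Many deep learning libraries actually implement the closely related cross-correlation, which does not flip the kernel; for learned kernels the difference does not matter in practice.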

Convolution can be understood as a sliding window function applied to a matrix.

Figure 2.14: The visualization below iterates over the output activations (green), and shows that each element is computed by elementwise multiplying the highlighted input (blue) with the filter (red), summing it up, and then offsetting the result by the bias[2]

Convolution leverages three ideas to improve a learning system: sparse interactions, parameter sharing and equivariant representations.

1. Sparse interaction

In a CNN, the kernel is much smaller than the input. An input image may contain millions of pixels, but small and meaningful features can be detected with kernels that cover only tens or hundreds of pixels. A CNN therefore has few connections between nodes, which significantly decreases computation time and makes the convolutional layer spatially sparse. When dealing with high-dimensional data such as images, it is impractical to connect each neuron to all neurons of the previous layer. Instead, each neuron is connected only to a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron (equivalently, this is the filter size)[2]. Along the depth axis, the extent of the connectivity is always equal to the depth of the input volume. For example, suppose that the input has size [ ]. If the filter size is [3 3], then each neuron of the convolutional layer will have weights to a [3 3 12] region of the input volume (the depth is 12 here).
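The sliding window view of Figure 2.14 can be made concrete with a small pure-Python sketch. It is an illustration only: the function name `conv2d_valid` is hypothetical, it handles a single depth slice, uses no padding, and computes cross-correlation (the kernel is not flipped), as most deep learning libraries do.

```python
def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide `kernel` over `image` (lists of lists) without padding.

    Each output element is the elementwise product of the kernel with the
    input window under it, summed up and offset by the bias.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for m in range(kh):
                for n in range(kw):
                    total += image[i * stride + m][j * stride + n] * kernel[m][n]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d_valid(image, kernel)
```

The same 2x2 kernel (four weights) is reused at every position, so the 4x4 input yields a 3x3 feature map. With a full input depth D, each neuron would have 2 * 2 * D weights instead.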

Figure 2.15: An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input - see discussion of depth columns in text below[2].

Three hyperparameters control the size of the output volume: the depth, the stride and the zero-padding. The depth corresponds to the number of filters used; each filter learns to detect something different in the input. The stride describes how the filter is slid over the input. If the stride is 1, the filter moves by one pixel at a time (in the case of images). A higher stride produces spatially smaller output volumes. To control the size of the output volume, it is sometimes necessary to pad the input volume with zeros around the border; this technique is called zero-padding.

2. Parameter sharing

Parameter sharing is used in convolutional layers to control the number of parameters. It dramatically reduces the number of parameters by making one assumption: if a feature is useful to compute at some spatial position (x1, y1), then it should also be useful at a different position (x2, y2). Each member of the kernel is therefore used at almost every position of the input. In this way, only a few parameters have to be learned, compared to learning separate parameters for every position of the input.

3. Equivariant representations

A function is called equivariant if, when its input changes, the output changes in the same way. When processing images with a CNN, convolution creates a 2D map of where certain features appear in the input. If an object is moved in the input, its representation moves by the same amount in the output. For example, a convolutional layer that detects edges will find the same type of edge everywhere in the image, so it is plausible to share parameters across the entire image.

Spatial pooling

Pooling layers are generally placed between two convolutional layers. The function of pooling is to reduce the number of parameters and the amount of computation in the CNN architecture. A pooling layer operates on every depth slice of the input independently and resizes it spatially, which reduces the spatial size of the representation. Max pooling resizes using the MAX operation, which outputs only the maximum value within each filter window.
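How stride, zero-padding and filter size determine the output size, and what max pooling does to a single depth slice, can be sketched as follows. The formula (W - F + 2P) / S + 1 is the commonly used rule for the spatial output size and is an addition of mine, not stated in the text; the function names are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // stride + 1


def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over one depth slice given as a list of lists."""
    out_h = conv_output_size(len(feature_map), size, stride)
    out_w = conv_output_size(len(feature_map[0]), size, stride)
    return [
        [
            max(
                feature_map[i * stride + m][j * stride + n]
                for m in range(size)
                for n in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


depth_slice = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(depth_slice)
```

For instance, `conv_output_size(224, 2, stride=2)` gives 112, which matches a 2x2 pooling with stride 2 applied to a 224x224 slice.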

The paper Striving for Simplicity: The All Convolutional Net [5] proposes that a CNN performs well even without pooling layers when a larger stride is used to reduce the size of the representation. Future CNN architectures may therefore do without pooling layers.

Figure 2.16: Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square)[2]

Batch normalization

Batch normalization is a method that performs normalization for each training mini-batch. It allows much higher learning rates and requires less care about initialization. Applied to a state-of-the-art image classification model, batch normalization achieves the same accuracy with 14 times fewer training steps and beats the original model by a significant margin in terms of accuracy[6]. Batch normalization addresses a problem called internal covariate shift. Covariate shift means that the distribution of the features differs between different parts of the data. If this happens inside a network from one layer to another, it is called internal covariate shift. It happens because, as the network learns and the weights are updated, the distribution of the outputs of a specific layer changes. This forces the higher layers to adapt to that drift, which slows down learning. Batch normalization reduces internal covariate shift through a normalization step that fixes the means and variances of layer inputs.

Figure 2.17: The test accuracy graph of the MNIST network trained with and without Batch Normalization[6]

Batch normalization also regularizes the model and reduces the need for dropout.
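The normalization step described above can be sketched for a single feature over one mini-batch. This is a simplified illustration: `gamma`, `beta` and `eps` are the usual learnable scale, learnable shift and numerical-stability constant, and the running averages that are used at inference time are deliberately left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift.

    `batch` is a list with one activation value per training example.
    """
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((x - mean) ** 2 for x in batch) / n  # biased estimate, as used in training
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

After the call, the values in `normalized` have approximately zero mean and unit variance, regardless of how the layer's input distribution drifted.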
Batch normalization thus helps the network to train faster and achieve higher accuracy[6].

Dropout layer

Deep neural networks with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks.

Dropout is a technique for randomly dropping units (along with their connections) from a neural network during training. This prevents units from co-adapting too much and hence prevents overfitting in the network. Dropout also provides a way to combine exponentially many neural network architectures efficiently.

Figure 2.18: Test error for different architectures with and without dropout. The networks have 2 to 4 hidden layers each with 1024 to units.[7]

Applying dropout to a neural network amounts to sampling a thinned network from it. The thinned network consists of all the units that survived dropout. A neural net with n units can be seen as a collection of 2^n possible thinned neural networks. These networks all share weights, so that the total number of parameters is still O(n^2) or less. For each presentation of each training case, a new thinned network is sampled and trained. Training a neural network with dropout can therefore be seen as training a collection of 2^n thinned networks with extensive weight sharing, where each thinned network gets trained very rarely, if at all.

Figure 2.19: Effect on features by dropout on MNIST data with one hidden layer autoencoders having 256 rectified units[7]
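Sampling one thinned network can be sketched as below. Note one assumption: this is the "inverted dropout" variant, which rescales the surviving units during training, whereas the original formulation in [7] scales the weights at test time instead. The function name and the seed are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout on a list of unit activations.

    During training each unit is zeroed with probability `drop_prob`; the
    survivors are scaled by 1 / (1 - drop_prob) so the expected activation
    is unchanged. At test time the input is returned untouched.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 8
thinned = dropout(hidden, drop_prob=0.5, rng=rng)  # one sampled thinned network
```

Every call during training draws a new mask, so each presentation of a training case uses a different thinned network that shares weights with all others.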

If dropout is applied near the fully connected layers of a CNN, it is called fully connected dropout. If dropout is used at a max-pooling layer, it is called max-pooling dropout. Both increase accuracy when applied to convolutional networks. Generally, CNN architectures with max-pooling dropout and fully connected dropout layers have shown better results than CNN architectures without max-pooling layers. However, it has also been shown that models which contain only convolutional layers perform quite well.

Figure 2.20: Max-pooling dropout[8]

Rectifiers

Rectified linear units (ReLU) are essential for state-of-the-art neural networks. The rectifier is an activation function that introduces non-linearity into the neural network model. Unlike the tanh or sigmoid functions, it mitigates the vanishing gradient problem. It is defined as f(x) = max(0, x). PReLU, a variant of ReLU, improves model fitting with nearly zero extra computational cost and little overfitting risk. A model that used PReLU surpassed the reported human-level performance (5.1% error)[4] on the ImageNet visual recognition challenge.

Fully connected

A fully connected layer is a traditional multi-layer perceptron. According to Yann LeCun, a fully connected layer can be seen as a kind of convolutional layer with a 1×1 convolutional kernel. It is usually placed at the end of a CNN architecture and takes a one-dimensional vector as input. The output of the convolutional and pooling layers represents high-level features in a low-dimensional feature space, and the fully connected layer learns a non-linear function in that space. The main task of the fully connected layer is to classify the input data into classes, using the features produced by the convolutional layers.

Classifiers

Classifiers are generally used on top of the fully connected layers of a CNN. The most widely used classifier is the softmax activation function in the output layer; other classifiers such as an SVM can also be used. Softmax outputs a probability for each class such that the probabilities sum to one over all classes, and the cross-entropy loss is used to measure the error at a softmax layer.
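The rectifier, the softmax output and the cross-entropy loss described above can be sketched as follows. The class scores for the three hand postures are made-up values for illustration, and subtracting the maximum score before exponentiating is a common numerical-stability trick that I assume here.

```python
import math


def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def softmax(logits):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy loss for a single example with a one-hot target."""
    return -math.log(probs[true_class])


scores = [2.0, 1.0, 0.1]  # hypothetical scores for rock, paper, scissors
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
```

The loss is small when the probability assigned to the true class is close to one and grows without bound as that probability approaches zero.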

CNN Training Methods

Regularization

Regularization comprises strategies designed to reduce the test error, sometimes at the cost of an increased training error. One of the main problems in machine learning is to develop an algorithm that performs well not just on the training data but also on new data. Regularization strategies reduce the generalization error, but not the training error. They include putting constraints on the model, such as restrictions on the parameter values, or adding extra terms to the objective function. These constraints and penalties promote generalization in a machine learning model and prevent overfitting. Some important regularization methods in deep learning are discussed below.

1. L1 and L2 Regularization

A regularizer term is added to the loss function in order to prevent the coefficients from fitting the training data so perfectly that the model overfits. The L1 regularization term is the sum of the absolute values of the weights, and the L2 regularization term is the sum of the squares of the weights.

Figure 2.21: L1 and L2 regularization terms in mathematical formula

2. Dataset Augmentation

One of the best ways to prevent overfitting in deep learning models is to use more data. Often, however, only a limited amount of data is available. One way around this problem is to create artificial data from the existing data. For object recognition this trick works quite well: multiple images can be generated from one image by randomly changing its orientation, contrast, colour, etc.

3. Early Stopping

When training large models with sufficient representational capacity to overfit the task, we often observe that the training error decreases steadily over time, but the validation set error begins to rise again[1]. The validation error thus follows a U-shaped curve.
This means we can obtain a model with better validation set error (and thus, hopefully, better test set error) by returning to the parameter setting at the point in time with the lowest validation set error. Instead of running the optimization algorithm until a (local) minimum of the validation error is reached, we run it until the error on the validation set has not improved for some amount of time. This strategy is called early stopping. One of the hyperparameters of early stopping is the number of training steps. Early stopping also reduces the computational cost, as it limits the number of training steps. It is an unobtrusive form of regularization, in that it requires no change to the objective function or to the set of allowable parameters.

4. Dropout

Dropout can be seen as a way of applying bagging. Bagging is a regularization method in which several models are trained and then all the models vote on the output for test examples. It reduces the generalization error. Bagging is impractical for large neural networks, because training many very large networks is expensive in runtime and memory. Dropout makes it feasible by approximating a bagged ensemble of exponentially many neural networks. For every new input, dropout randomly removes some nodes from the layers.

Figure 2.22: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right: An example of a thinned net produced by applying dropout to the network on the left. Crossed units have been dropped.[7]

5. Learning Rate as Hyperparameter

The learning rate is one of the hyperparameters of a neural network, and tuning it can act as a form of regularization: choosing a suitable learning rate affects overfitting and can improve generalization on the test set (Figure 2.23). It is one of the preferred ways to control overfitting caused by using a high number of filters.

Figure 2.23: The effects of regularization strength: Each neural network above has 20 hidden neurons, but changing the regularization strength makes its final decision regions smoother with a higher regularization.[2]

Figure 2.24: A cartoon depicting the effects of different learning rates. With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Higher learning rates will decay the loss faster, but they get stuck at worse values of loss (green line).[2]

6. Validation/Training Accuracy

The gap between training and validation accuracy is useful for tracking overfitting in the model. If the gap is increasing, the model is overfitting more, and the hyperparameters or the network topology need to be changed; a growing gap can also indicate that more data is needed. Using a validation set during training is therefore an important way to learn more about the model while it trains, and it allows the model hyperparameters to be tuned effectively.
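The early stopping strategy from item 3 above can be sketched as a small function over a recorded validation curve. The `patience` parameter (the number of epochs without improvement that is tolerated) is a common way to express "not improved for some amount of time"; it and the example curve are assumptions made for illustration.

```python
def early_stopping_epoch(val_errors, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs; the parameters from `best_epoch`
    would then be restored.
    """
    best_error = float("inf")
    best_epoch = -1
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


# Hypothetical U-shaped validation curve: the error falls, then rises again.
curve = [0.50, 0.40, 0.35, 0.33, 0.34, 0.36, 0.39, 0.45]
best, stopped = early_stopping_epoch(curve, patience=2)
```

With this curve, training stops two epochs after the minimum and the remaining epochs are never run, which is where the saving in computational cost comes from.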

Figure 2.25: The gap between the training and validation accuracy indicates the amount of overfitting.[2]

Optimization

Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x. One particular case of optimization is finding the parameters θ of a neural network that significantly reduce a cost function J(θ), which typically includes a performance measure evaluated on the entire training set as well as additional regularization terms[1]. Typically, the cost function can be written as an average over the training set, such as

Figure 2.26: Cost function[1]

where L is the per-example loss function, f(x; θ) is the predicted output when the input is x, and p_data is the empirical distribution[1]. The two most commonly used loss functions are discussed below.

1. Mean Square Error (MSE)

It is a multi-class loss formerly used to train neural networks:

$$ \mathrm{Loss}(x, y) = \frac{1}{n} \sum_{i} (x_i - y_i)^2 $$

where x is the vector of n predictions and y is the vector of the n true values.

2. Cross Entropy

It is usually preferred over MSE for training neural networks.

Figure 2.27: Cross Entropy formula
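The formulas of Figures 2.26 and 2.27 are not reproduced in this text. Assuming they show the standard forms of the cost function as an expectation over the empirical data distribution and of the cross entropy between two distributions, they can be written as:

```latex
J(\theta) = \mathbb{E}_{(x, y) \sim \hat{p}_{\text{data}}} \, L\big(f(x; \theta), y\big)

H(p, q) = -\sum_{x} p(x) \log q(x)
```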

In the cross-entropy formula, p is the true distribution and q is the model distribution. As before, the model q is parameterized with θ (i.e. these are the network weights). With an unlucky initialization of the parameters, the network may be decisively wrong for some training input: an output neuron saturates near 1 when it should be 0, or vice versa. In that situation, learning with the MSE loss usually slows down, whereas cross entropy is not affected in the same way.

Optimization algorithms in machine learning usually do not halt at a local minimum. Instead, they minimize a surrogate loss function and halt when an early stopping criterion is satisfied. Optimization algorithms that use the entire training set for each update are called batch gradient methods; if they use more than one training example but not all examples at once, they are called minibatch stochastic methods. Most of the time, stochastic gradient methods are applied for the optimization of deep learning models. The choice of the batch size involves trade-offs. Larger batches provide a more accurate estimate of the gradient, and they make good use of multicore architectures because the examples of a batch can be processed in parallel, which reduces the runtime. However, large batches require more memory and cannot offer the regularizing effect of smaller batches. The minibatch size should therefore be chosen with these factors in mind.

Figure 2.28: Animations that may help your intuitions about the learning process dynamics. Left: Contours of a loss surface and time evolution of different optimization algorithms. Notice the overshooting behavior of momentum-based methods, which make the optimization look like a ball rolling down the hill. Right: A visualization of a saddle point in the optimization landscape, where the curvature along different dimension has different signs (one dimension curves up and another down). Notice that SGD has a very hard time breaking symmetry and gets stuck on the top. Conversely, algorithms such as RMSprop will see very low gradients in the saddle direction. Due to the denominator term in the RMSprop update, this will increase the effective learning rate along this direction, helping RMSProp proceed. Images credit: Alec Radford.

There are many optimization algorithms, such as stochastic gradient descent, momentum and Nesterov momentum. The most popular one is stochastic gradient descent. There are also algorithms with adaptive learning rates, which perform quite well compared to algorithms with a fixed learning rate. AdaGrad, RMSProp and Adam are the most popular optimization algorithms with adaptive learning rates (Figure 2.28). The RMSProp algorithm modifies AdaGrad to perform better in the non-convex setting by changing the gradient accumulation into an exponentially weighted moving average[1].

Transfer Learning

Training neural networks is computationally expensive and requires a lot of data. It is not practical for everybody to invest the time and resources needed to train a neural network from scratch. It is therefore a good idea to reuse a model pre-trained by other people, with small changes. Transfer learning enables us to use a pre-trained model to accelerate our own solution. A pre-trained model has been created by someone else to solve a similar problem. For example, many pre-trained models are available that were trained on the ImageNet data of 1.2 million images in 1000 categories. Via transfer learning, these pre-trained models are able to generalize to images outside the ImageNet dataset. The pre-existing model is adapted by fine-tuning. Two ways of adapting the model are described below.

1. Feature extraction

Remove the output layer (the one that gives the probabilities for each of the 1000 classes) and then use the rest of the network as a fixed feature extractor for the new data set.

2. Use the architecture of the pre-trained model

Only the architecture is reused: the weights are initialized anew and the model is trained on the new data.

2.3 Convnets Architectures

There are several architectures based on convolutional neural networks. Some of the most common and influential architectures are listed below.

LeNet (1990s)

LeNet was one of the first CNNs. It was usually used for character recognition tasks such as reading zip codes and digits. The LeNet architecture consists of alternating convolutional layers, each immediately followed by a pooling layer. Convolution, pooling and non-linearity are used in the architecture. Least mean square is employed as the loss function in LeNet.
Figure 2.29: LeNet-5: First convolutional architecture for digit recognition. Courtesy: neural-network-architectures-156e5bad51ba

AlexNet (2012)

The first work that popularized convolutional networks in computer vision was AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. It significantly outperformed the hand-crafted machine learning algorithms in the ImageNet ILSVRC 2012 challenge. It is a deeper and wider version of LeNet that features consecutive (stacked) convolutional layers. ReLU is used in the model for non-linearity, and stacked convolutional layers and overlapping max pooling are applied.

Figure 2.30: CNN Architecture: AlexNet. It contains 7 hidden layers, 650 thousand neurons, 60 million parameters and 650 million connections. Courtesy: stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

GoogLeNet (2014)

GoogLeNet was the winner of ILSVRC. Its main contribution was the development of the inception module, which reduces the number of parameters to 4 million, compared with 60 million for AlexNet.

VGGNet (2014)

VGGNet was the runner-up of ILSVRC. Its main contribution was to show that the depth of the network is an important factor for the overall performance of a CNN model. It consists of 16 conv/fc layers and performs 3×3 convolutions and 2×2 pooling throughout the layers.

DenseNet (2016)

DenseNet is a network architecture in which each layer is directly connected to every other layer in a feed-forward fashion (within each dense block). For each layer, the feature maps of all preceding layers are treated as separate inputs, whereas its own feature maps are passed on as inputs to all subsequent layers. This connectivity pattern yields state-of-the-art accuracies on CIFAR10/100 (with or without data augmentation) and SVHN.

2.4 Applications and Evaluation Methods of Deep Learning

Applications

Nowadays, deep learning is applied in many fields such as computer vision, speech recognition, natural language processing and other commercial applications.

Computer Vision

Computer vision is an interdisciplinary field that deals with how computers process, analyze, understand and create images or videos. Many computer vision tasks, such as image recognition, are quite easy for humans but challenging for computers. Deep learning gives promising results in computer vision tasks such as object recognition, object segmentation, annotating an image or transcribing symbols from an image. Convolutional neural networks are usually applied to solve computer vision problems.

Speech recognition

Speech recognition is an interdisciplinary field in which computers translate spoken words into text. It is also called automatic speech recognition or computer speech recognition. The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance into the corresponding sequence of words intended by the speaker[book]. Recently, deep learning models such as deep LSTM RNNs have given quite good results compared to previous machine learning algorithms.

Natural Language Processing

Natural language processing (NLP) is the use of human languages, such as English or French, by a computer. NLP is largely about extracting meaning from language. Machine translation is one application of NLP: it converts one human language into another using computers. Sometimes very generic neural networks can be applied to natural language processing. However, to achieve good performance on some NLP problems such as machine translation, deep RNNs can be applied.
Other Applications

There are many other applications of deep learning, such as cancer detection, autonomous vehicles, music generation, photo description and real-time analysis of human behavior.

Evaluation methods

Performance Measure

A performance measure quantifies the quality of the predictions made by a trained model on a test dataset. It is specific to the type of problem, such as classification, regression or clustering. For classification problems, classification accuracy is used, which is the number of correct predictions divided by the total number of predictions. Sometimes the performance measure needs to be adjusted or broken down to gain better insight into the data.

Accuracy is defined as the total number of true positives and true negatives divided by the total population (Figure 2.31). In general, positive means identified and negative means rejected. Therefore:

1. True positive = correctly identified
2. False positive = incorrectly identified
3. True negative = correctly rejected
4. False negative = incorrectly rejected

Figure 2.31: Confusion Matrix: Depending on the scenario, one can choose a performance measure from this confusion matrix for better insights. Courtesy: Wikipedia

An automatic verification dataset

Part of the training data is automatically split off as a validation dataset, and the performance of the model is evaluated on that validation dataset at each epoch. This gives four outputs at every epoch: cross-entropy loss, accuracy, validation loss and validation accuracy. Based on the two accuracies and their difference, one can decide to stop the training when the accuracy is at its maximum and the difference is small. For my experiment, this evaluation method was realized with the Python library Keras.

Manual k-fold Cross Validation

The training dataset is split into k subsets, each of which is called a fold. The algorithm is trained on k-1 folds and tested on the one fold that was held back. This is repeated so that each fold of the dataset is used once as the held-back test set. Cross validation yields k different performance values; their average is the final performance value for the model.
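The accuracy definition and the manual k-fold procedure above can be sketched as follows. This is an illustration under my own assumptions: the folds are contiguous and unshuffled, and `evaluate` is a hypothetical callback that trains on the given training indices and returns a score on the held-back indices.

```python
def accuracy(tp, tn, fp, fn):
    """(true positives + true negatives) / total population."""
    return (tp + tn) / (tp + tn + fp + fn)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) so that each fold is held back once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(evaluate, n_samples, k):
    """Average the score returned by `evaluate(train, test)` over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

In practice the sample indices would usually be shuffled once before splitting, so that each fold contains a similar mix of classes.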

2.5 The Tensorflow Framework

TensorFlow is an open source software library for numerical computation using data flow graphs, developed by Google. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow can be used quite efficiently for deep learning, as many pre-defined deep learning libraries are available. It also provides a visualization of the data flow graph of a deep learning model, which makes the model easier to inspect. TensorFlow supports multiple programming languages such as Python, Java, C++ and R. TensorFlow can also make use of cuDNN, a GPU-accelerated library of primitives for deep neural networks. Keras is a high-level library that runs on top of TensorFlow and provides a more intuitive programming style for deep learning models.

2.6 Deep Learning in Computer Vision: CIFAR10 Images Data

The deep learning model AlexNet popularized CNNs in the field of computer vision when it won ILSVRC 2012 with an error rate of only around 16 percent. The runner-up had an error rate of almost 26 percent. There are now also deep learning models that surpass human-level accuracy[4]. In the past years, researchers have further improved the accuracy at the ILSVRC competition and on the CIFAR10 image data using modified convolutional networks. These convolutional networks have been used for different computer vision applications such as image segmentation, annotating images with text, or learning to play a game just by analyzing the screen pixels.

2.7 Architectures Solving CIFAR10 and Their Accuracies

Various CNN architectures have been applied to the CIFAR10 dataset. A few of the most important architectures are discussed below.
The first of these is AlexNet (2012), which became widely popular for achieving significantly better accuracy in image classification than other machine learning algorithms, and which has also been applied to CIFAR10.

ImageNet Classification with Deep Convolutional Neural Networks, 2012

AlexNet was one of the first CNN architectures to gain attention through its comparatively good accuracy on image data. The architecture is named after Alex Krizhevsky, whose group won ILSVRC. It consists of five convolutional layers and three fully connected layers. Dropout is used in the first two fully connected layers, and ReLU and max pooling are used in the convolutional layers. With data augmentation on the CIFAR10 image data, the AlexNet model has achieved 89 percent accuracy[12].

Spatially-sparse Convolutional Neural Networks, 2014

With some modifications to its architecture, a CNN performs significantly well when it processes spatially sparse inputs. A character drawn with a one-pixel wide pen on a high-resolution grid looks like a sparse matrix. If the picture is not sparse, it can be made sparse by adding padding. Slow max-pooling retains more spatial information; this is better for handwriting recognition, as handwriting is highly structured. Applying pooling slowly, i.e. using many layers of 2×2 pooling with small convolutional filters rather than a smaller number of 3×3 or 4×4 pooling layers, performed better. The architecture alternates convolutional layers with pooling layers. It is denoted by DeepCNet(l, k), where l + 1 is the number of convolutional layers, separated by l layers of 2×2 pooling, and nk is the number of convolutional filters in the n-th convolutional layer. The last convolutional layer is followed by a fully connected output layer with a softmax activation function. If the input has size N×N, then N = 3 · 2^l. For example, DeepCNet(5, 200) is an architecture with input layer size N = 96, 6 convolutional layers and 5 pooling layers. The spatial size of the filters is 3×3 in the first convolutional layer and 2×2 in the subsequent layers. The lower levels of a DeepCNet seem to be fairly immune to overfitting, but applying dropout in the upper levels improved test performance for larger values of k. Applying a deep convolutional neural network with sparsity resulted in an error rate of 6.28 percent on the CIFAR10 small picture dataset[13].

Figure 2.32: Model: input-100c3-mp2-200c2-mp2-300c2-mp2-400c2-mp2-500c2-output. The hidden layer dimensions for l = 5 DeepCNets, and LeNet-7. In both cases, the spatial sizes decrease from 96 down to 1. The DeepCNet applies max-pooling much more slowly than LeNet-7[13].
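The relation N = 3 · 2^l and the slow decrease of the spatial size can be checked with a short sketch. It assumes unpadded convolutions (3×3 in the first layer, 2×2 afterwards) and 2×2 max-pooling with stride 2 between consecutive convolutional layers; the function names are hypothetical.

```python
def deepcnet_spatial_sizes(l):
    """Spatial size after every layer of a DeepCNet(l, k)-style network."""
    size = 3 * 2 ** l                  # input size N
    sizes = [size]
    for layer in range(l + 1):         # l + 1 convolutional layers
        filter_size = 3 if layer == 0 else 2
        size = size - filter_size + 1  # unpadded convolution
        sizes.append(size)
        if layer < l:                  # l pooling layers in between
            size //= 2                 # 2x2 max-pooling, stride 2
            sizes.append(size)
    return sizes


def filters_per_layer(l, k):
    """Number of filters n * k in the n-th convolutional layer (n = 1 .. l + 1)."""
    return [n * k for n in range(1, l + 2)]


sizes = deepcnet_spatial_sizes(5)
```

For l = 5 the sequence starts at 96 and ends at 1, which is consistent with the caption of Figure 2.32.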
Striving for Simplicity: The All Convolutional Net, 2015

The pipeline of most CNN architectures consists of convolutional layers, pooling layers and fully connected layers. In the all convolutional net, convolutional layers replace the pooling layers as well as the fully connected layers without loss of accuracy. The architecture is built solely on convolutional layers, with occasional dimensionality reduction by using a stride of 2, and yields competitive performance on image recognition datasets. Trained with vanilla gradient descent with momentum, this architecture reaches state-of-the-art performance without the need for complicated activation functions, any response normalization or max-pooling. The pipeline consists of convolutional layers, and at the 3rd and 5th positions there is a 3*3 convolutional layer with a stride of 2. It is common to have pooling layers at those positions, but in All-CNN-C they are replaced by convolutional layers with a stride of 2, so that the next layer covers the same spatial region as before. The All-CNN-C architecture achieved an error rate of 9.08 percent without augmentation and an accuracy of [...] percent with image augmentation on the CIFAR10 dataset[5].

Densely Connected Convolutional Networks, 2016

Densely connected convolutional networks (DenseNets) connect each layer to every other layer in a feed-forward fashion. Conventional convolutional networks with l layers have l connections, one between each layer and its subsequent layer. DenseNets with l layers have l(l+1)/2 direct connections. For every layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs into all subsequent layers. DenseNets alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse and substantially reduce the number of parameters. DenseNets performed significantly better than most convolutional networks, whilst requiring less memory and computation to achieve high performance. DenseNets (k=24) achieved an error rate of 3.74 percent on CIFAR10 with data augmentation[14].

Figure 2.33: DenseNet Architecture
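The connection count of a densely connected block can be verified by enumerating the connections. This is only an illustrative sketch; the function name `dense_connections` and the convention that index 0 stands for the block input are assumptions of the example, not part of the cited paper.

```python
def dense_connections(num_layers):
    """List every direct connection (source, target) in a densely connected block.

    Index 0 stands for the block input (an assumption of this sketch);
    layers are numbered 1..num_layers. Each layer receives the feature
    maps of all preceding layers.
    """
    return [(src, dst)
            for dst in range(1, num_layers + 1)
            for src in range(dst)]


if __name__ == "__main__":
    for l in (1, 2, 5):
        # a plain feed-forward chain would have only l connections
        print(l, len(dense_connections(l)), l * (l + 1) // 2)
```

Layer n contributes n incoming connections, so the total is 1 + 2 + ... + l = l(l+1)/2, compared with l connections in a conventional chain.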

Chapter 3

Comparing different CNN Architectures

3.1 Description of CNN Architecture

Many recent CNN architectures perform quite well in the field of image recognition, such as AlexNet, VGGNet, GoogLeNet and DenseNets. These architectures introduce several regularization techniques, such as dropout, batch normalization and data augmentation, to prevent overfitting in the model. For non-linearity in the model, rectifiers like ReLU are employed; they also reduce overfitting. In this experiment, I apply some techniques and methods from different architectures to find out how effective they are on a model for CIFAR10 data. There are four CNN architectures for solving CIFAR10 in this thesis. Each CNN architecture differs slightly from the others in the positions of its dropout layers and in the number of convolutional layers. I have been able to reproduce the effects reported in a few research papers[4][7][8]: dropout reduces overfitting, and increasing the number of convolutional layers enhances model accuracy. I have applied multiple network topologies in the CNN architecture to see whether they affect the accuracy of the experiment. In addition, different techniques such as ReLU, pooling and dropout layers have been used for better accuracy. I have used VGGNet as the base model and then made some changes to it to improve accuracy with limited computational resources and relatively little runtime. The number of convolutional layers has been reduced and more dropout layers have been added to the architecture. The experiments have thus been performed on four different CNN architectures similar to VGGNet on the CIFAR10 dataset. All four architectures have distinct network topologies with respect to the positions and the number of the various layers.
I train three models (namely A, B and C) on the CIFAR10 dataset; they are shown in figures 3.2, 3.3 and 3.4. The fourth model, D (figure 3.1), is a replica of the CNN model provided in the TensorFlow tutorial. The other three models are modified versions of this architecture. I tried to reproduce or improve the results produced by this model. All architectures are based on VGGNet, with modifications to the network topology and the addition of dropout layers. The input size of the CIFAR10 data is [...]. The number of parameters of a convolutional layer does not depend on the input size. It depends only on the depth of the layer (its number of filters), the kernel size and the depth of the previous layer. Details of the CNN models, such as dimensions and numbers of parameters, are provided in table 3.1. The actual VGGNet model uses 16 convolutional layers. As it is computationally very expensive to use 16 convolutional layers, I have reduced the number of convolutional layers in the architecture. Dropout layers, which are not present in VGGNet, have also been added. Dropout reduces overfitting and improves accuracy on the test data set. It is also an inexpensive method of regularization. Model A has three convolutional layers and two dropout layers. Models B and C have four convolutional layers each. The detailed network topologies of these models are shown in figures 3.2 to 3.4. The positions of the dropout, convolutional and pooling layers differ across the three architectures. They are varied in order to see whether this affects the accuracy of the model. The main aim of building different architectures is to find out whether any of the models can achieve better results than model D, how the different layers affect the accuracy of the model, and which layers are effective in reducing overfitting.
Table 3.1: Architecture details of CNN models for training CIFAR10

Model   CNN architecture (network topology)        Number of parameters

A       CONV2D-[32x32x64]-Pooling                  (3*3*3)*64 = 1,728
        CONV2D-[16x16x64]-Pooling-Dropout(0.5)     (3*3*64)*64 = 36,864
        CONV2D-[8x8x128]-Dropout(0.5)              (3*3*64)*128 = 73,728
        FC-[1x1x384]                               (8*8*128)*384 = 3,145,728
        FC-[1x1x192]                               (192*192) = [...]
        Softmax
                                                   Total 3.2M

B       CONV2D-[32x32x64]-Pooling                  (3*3*3)*64 = 1,728
        CONV2D-[16x16x64]-Pooling                  (3*3*64)*64 = 36,864
        CONV2D-[8x8x64]-Pooling                    (3*3*64)*128 = 73,728
        CONV2D-[4x4x128]-Pooling                   (3*3*128)*128 = 147,456
        FC-[1x1x384]                               (2*2*128)*384 = [...]
        FC-[1x1x192]-Dropout(0.5)                  (192*192) = [...]
        Softmax
                                                   Total 1M

C       CONV2D-[32x32x64]                          (3*3*3)*64 = 1,728
        CONV2D-[32x32x64]-Dropout(0.3)-Pooling     (3*3*64)*64 = 36,864
        CONV2D-[16x16x64]                          (3*3*64)*128 = 73,728
        CONV2D-[16x16x128]-Pooling                 (3*3*128)*128 = 147,456
        FC-[1x1x384]-Dropout(0.5)                  (8*8*128)*384 = 3,145,728
        FC-[1x1x192]                               (192*192) = [...]
        Softmax
                                                   Total 3.2M

D       CONV2D-[32x32x64]-Pooling                  (3*3*3)*64 = 1,728
        CONV2D-[16x16x64]-Pooling                  (3*3*64)*64 = 36,864
        FC-[1x1x384]                               (8*8*128)*384 = 3,145,728
        FC-[1x1x192]                               (192*192) = [...]
        Softmax
                                                   Total 3.2M
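The per-layer counts in table 3.1 follow the rule kernel height * kernel width * depth of the previous layer * number of filters. The sketch below recomputes the first four rows of model A; like the table, it counts weights only and leaves out biases, and the helper names `conv_weights` and `fc_weights` are chosen for this illustration.

```python
def conv_weights(kernel, in_depth, out_depth):
    """Weights of a convolutional layer with square kernels (biases omitted)."""
    return kernel * kernel * in_depth * out_depth


def fc_weights(in_features, out_features):
    """Weights of a fully connected layer (biases omitted)."""
    return in_features * out_features


# First four parameterized layers of model A as listed in table 3.1
MODEL_A = [
    ("CONV2D 3*3, depth 3 -> 64", conv_weights(3, 3, 64)),
    ("CONV2D 3*3, depth 64 -> 64", conv_weights(3, 64, 64)),
    ("CONV2D 3*3, depth 64 -> 128", conv_weights(3, 64, 128)),
    ("FC 8*8*128 -> 384", fc_weights(8 * 8 * 128, 384)),
]

if __name__ == "__main__":
    for name, count in MODEL_A:
        print(f"{name}: {count:,}")
    print(f"sum of these layers: {sum(c for _, c in MODEL_A):,}")
```

The convolutional counts do not involve the 32*32 image size at all, whereas the first fully connected layer does depend on the 8*8 spatial size that reaches it, which is why that single layer dominates the total.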

All four architectures are represented pictorially in the figures below. These pictures show how the different layers are stacked in each architecture. All these architectures have been tested on CIFAR10 data. Rectifiers, pooling, convolutional and dropout layers are used in different fashions in models A, B and C.

Figure 3.1: D: CNN model by tutorial of tensorflow on CIFAR10[14]
Figure 3.2: A: 1st Architecture
Figure 3.3: B: 2nd Architecture
Figure 3.4: C: 3rd Architecture

3.2 Setting of the Deep Learning Environment

To get started with deep learning research, one should first set up the environment required for deep learning projects. I have set up the environment with Ubuntu Server LTS as its operating system and an NVIDIA GRID K520 with a memory size of 8 GB as its graphics processing unit (GPU). A GPU is needed because neural networks require a lot of memory and computation power to perform operations such as matrix multiplication. A GPU usually has thousands of cores, while a CPU usually has no more than 12 cores. Therefore, a GPU facilitates parallel processing of these computations and uses its cores and memory more effectively than a CPU.

A few software packages, libraries and frameworks are necessary for a deep learning experiment, such as Python, CUDA, cuDNN, TensorFlow and Keras. Python 2.7 is used as the programming language. CUDA is NVIDIA's API for programming on the graphics card; one can write high-performance programs for the GPU using CUDA. cuDNN is a library for deep neural networks built on CUDA. It provides high-performance GPU acceleration and highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization and activation layers. TensorFlow is a deep learning framework by Google which uses abstractions built on cuDNN, which in turn accelerates the framework. Keras is a high-level neural network API which runs on top of TensorFlow. It is written in Python. It provides a user-friendly API and makes easy and fast prototyping possible for CNNs and RNNs.

There are different ways to set up a deep learning environment, depending on the programming language, the deep learning framework, the operating system, the libraries, etc. Deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, Keras and Deeplearning4j could be used for deep learning programming. The frameworks support multiple programming languages such as Python, Java, C++, Scala, Go, R and MATLAB. A programming language can be used if it is supported by the chosen framework. Keras is a special deep learning framework which works with Theano or TensorFlow as its backend. Keras supports only Python as of now. MATLAB is also quite handy for deep learning projects, especially for computer vision tasks.
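Before starting a training run, it can help to confirm that the Python packages named above are importable in the active environment. The following is a small sketch using only the standard library; it assumes Python 3 (the thesis itself used Python 2.7), and the function name `check_packages` is hypothetical.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it is importable here.

    Sketch only: it checks importability, not versions or GPU support.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(["tensorflow", "keras"]).items():
        print(name, "found" if found else "missing")
```

The check does not import the packages themselves, so it runs quickly even when the frameworks are large.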
MATLAB is a powerful tool for image processing operations such as reducing pixel sizes, resizing images and image augmentation. In this experiment, for simplicity and to avoid conflicts with other libraries installed on the system, it is necessary to use virtualenv to create a Python environment and to set up CUDA, cuDNN, TensorFlow, Keras and IPython Notebook inside it. Virtualenv is a tool for creating isolated Python environments. It creates a folder which contains all the executables needed to use the packages that a Python project requires.

3.3 Configuring and Installing in AWS Cloud

It is quite easy and economical to configure and install a deep learning environment on cloud infrastructure, so this project utilizes the AWS Cloud. It provides flexibility in selecting a GPU and running it on demand. Therefore, instead of buying GPU hardware for deep learning projects, one can rent it on the AWS Cloud whenever a GPU is needed. It is one of the most inexpensive and efficient ways to perform small deep learning experiments. The first step in configuring and setting up the deep learning environment is to create a new EC2 GPU instance, for which one needs to select the OS image and the GPU type. For this project, I choose the OS image Ubuntu Server LTS and g2.2xlarge as the instance type, which contains an NVIDIA K520 GPU with a memory size of 8 GB. After the launch of the EC2 instance, CUDA, cuDNN, TensorFlow and Keras need to be installed on it. While installing software such as TensorFlow and Keras, one needs to be aware of versions. These are quite new frameworks and libraries, so their versions change frequently. Old code written for previous software versions sometimes throws errors if it is run on newer versions. The TensorFlow library has been used for the CIFAR10 training, but I have used Keras for the RPSG16 data. Keras runs on top of TensorFlow. Keras provides an easier way to program the models required for deep learning projects efficiently. Below are figures showing how a machine is started in AWS. As figure 3.7 shows, the storage capacity of the root volume of the OS needs to be increased, as CUDA and cuDNN require more than 8 GB of space.

Figure 3.5: First step to choose the OS
Figure 3.6: Choosing GPU in AWS
Figure 3.7: Storage of root in OS

3.4 Analyzing Outputs of the Layers of ConvNets

After the training, we can visualize the outputs of every layer. When an image of a dog is fed into the CNN, the first convolutional layer outputs 64 different images, shown below. The network learns edges in the first layer. The next layer generates another 64 images, but it is difficult to understand what exactly the model tries to learn there. All 64 images of the 2nd layer show random black and white spots.

Figure 3.8: 64 images generated from 1st convolutional layer
Figure 3.9: 64 images generated from 2nd convolutional layer
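Each of the 64 output images of a convolutional layer is the response of one filter slid over the input. The sketch below illustrates this with a single filter in pure Python. The 3*3 vertical-edge kernel and the toy image are hypothetical stand-ins for a learned filter and a real photo, and, as in deep learning frameworks, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical kernel that responds to dark-to-bright vertical edges
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Toy 4*6 image: dark left half (0), bright right half (1)
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

FEATURE_MAP = relu(conv2d_valid(IMAGE, VERTICAL_EDGE))

if __name__ == "__main__":
    for row in FEATURE_MAP:
        print(row)
```

The response is non-zero only in the columns where the window straddles the boundary between the dark and the bright half, which is the sense in which a first-layer filter "detects an edge".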

3.5 Results on CIFAR10

Table 3.3 shows the results for the models of the experiment. I have done cross-validation in this case, as I only have to compare my results with the results from the TensorFlow tutorial, which used the same test set as I used for all three models. Model A has more convolutional layers than model D and two added dropout layers, yet it produces lower accuracy than model D. This suggests that excessive use of dropout does not improve the test error, even when convolutional layers are added. Model C performed about 0.4 percentage points better than model D. It uses 4 convolutional layers with two dropout layers placed at the 3rd and 8th positions in the architecture. Model B uses a pooling layer after every convolutional layer and a dropout layer at the end, just before the softmax. The accuracy of model B is lower than that of model C but higher than that of model A. From the results of this empirical study, one can therefore infer that increasing the number of convolutional layers and adding dropout layers at the proper places have a noticeable impact on accuracy. However, how exactly the layer positions affect the accuracy is not clear from this empirical study. These results reaffirm[5] that dropout layers do enhance the accuracy on test data, as dropout is one of the most powerful regularization techniques for CNN models.

Table 3.2: Training details of 4 models

Model   Runtime details                          Resource
A       7 hours, 60k steps (batch size 128)      NVIDIA K520 8 GB
B       10 hours, 60k steps (batch size 128)     NVIDIA K520 8 GB
C       12 hours, 60k steps (batch size 128)     NVIDIA K520 8 GB
D       5 hours, 60k steps (batch size 128)      1 Tesla K20m
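The step counts in table 3.2 can be translated into passes over the training data. The sketch below is a back-of-the-envelope calculation; it assumes that every run sampled its batches from the full standard CIFAR10 training split of 50,000 images, which the text does not state explicitly, and the function name `epochs_seen` is chosen for this example.

```python
def epochs_seen(steps, batch_size, num_training_images):
    """Number of passes over the training set implied by a step count."""
    return steps * batch_size / num_training_images


# Assumption: the standard CIFAR10 training split of 50,000 images was used
CIFAR10_TRAIN_IMAGES = 50000

if __name__ == "__main__":
    # 60k steps with batch size 128, as listed in table 3.2
    print(epochs_seen(60000, 128, CIFAR10_TRAIN_IMAGES))
```

Under that assumption, all four models see the same number of training examples, so the differences in runtime reflect the cost per step of each architecture and the hardware rather than a longer training schedule.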

Table 3.3: Results of Training using four models on CIFAR10 data

Model   CNN architecture (network topology)        Accuracy

A       CONV2D-64-ReLU-Pooling                     [...]%
        CONV2D-64-ReLU-Pooling-Dropout(0.5)
        CONV2D-128-ReLU-Dropout(0.5)
        FC-384
        FC-192
        Softmax

B       CONV2D-64-ReLU-Pooling                     85.2%
        CONV2D-64-ReLU-Pooling
        CONV2D-64-ReLU-Pooling
        CONV2D-128-ReLU-Pooling
        FC-384
        FC-192-Dropout(0.5)
        Softmax

C       CONV2D-64-ReLU                             86.5%
        CONV2D-64-ReLU-Dropout(0.3)-Pooling
        CONV2D-64-ReLU
        CONV2D-128-ReLU-Pooling
        FC-384-Dropout(0.5)
        FC-192
        Softmax

D       CONV2D-64-ReLU-Pooling                     86.1%
        CONV2D-64-ReLU-Pooling
        FC-384
        FC-192
        Softmax

Chapter 4

A Deep Convolutional Neural Network for Hand Posture Classification

Deep CNNs have proven highly effective in the field of computer vision, owing to the recent state-of-the-art performance of deep learning algorithms in image recognition. Applying deep learning has become virtually the de facto standard for image recognition tasks. Many variants of CNNs have played an important role in increasing the accuracy of image classification. I have used a CNN model similar to the VGGNet architecture. The model has a dropout layer in its network, which is not present in VGGNet. The model also has fewer convolutional layers than VGGNet.

4.1 Playing Rock-Paper-Scissors with an Artificial Agent

There is an artificial agent named NAO at the university. It has been programmed to play the Rock-Paper-Scissors game with humans. Three hand postures are used in the game, so NAO must correctly identify the hand posture of its opponent. To identify hand postures, NAO needs to use some kind of machine learning algorithm. Human hand postures differ in color, size and orientation from person to person, and the background color of the images varies as well. It is therefore quite challenging for a conventional machine learning algorithm to classify them accurately. A deep learning algorithm like a CNN is good at feature extraction and is therefore able to perform well in classifying images with different colors, sizes or orientations. For these reasons, I use a CNN to classify the different hand postures of the game. We have collected images with the help of university staff members and students. We have taken a number of photos of people's hand postures (the three postures of the game) against three different backgrounds, using each of two cameras. NAO's camera is one of the two cameras used for taking photos. The other one is a mobile phone camera. We took photos of almost 35 university students and staff members.

4.2 Creating Training Data for Rock-Paper-Scissors Hand Postures

After collecting images from the different cameras, we get almost 800 images of all three hand postures. Every hand posture (the three hand images of the rock-paper-scissors game) of each person is taken by two cameras in three different orientations. There are 3 distinct orientations for every posture. There are also 3 different backgrounds for the images taken with the mobile camera. Only one background is used when taking pictures with NAO's camera.

Image count from the mobile camera: 3 (hand postures) x 3 (different orientations of the same hand posture) x 3 (distinct backgrounds) x 38 (total number of students and staff members) ≈ 1000 images
Image count from NAO's camera: 3 (hand postures) x 3 (different orientations of the same hand posture) x 38 (total number of students and staff members) ≈ 350 images
Total image count: ≈ 1350 images

The total count of the images of all people photographed in the experiment is around 1350, as shown in the calculation above. So, there are 12 images in total of every hand posture of a single person. The images have arbitrary resolutions and sizes, as they are taken from different angles and distances. All images are then split into three sets for training, validation and test in an 80:20:20 ratio. It is a standard proportion used in machine learning experiments. Each set is further split into three subsets for Rock, Paper and Scissors, which act as labels for the images. The images of the three hand postures are kept in folders named after their labels, Rock, Paper and Scissors. I named the data RPSG16.

Image Data Preprocessing

Data preprocessing is one of the most important parts of almost any deep learning experiment. Image data preprocessing includes converting the images into gray scale, reducing their pixel size, normalization, etc. The images have been taken from arbitrary distances and directions.
Therefore, the resolutions and sizes of the images vary arbitrarily. Some images are larger than 1200*1200 pixels. It is computationally very expensive to perform matrix multiplication on such images in a CNN model. It is also quite inefficient for a deep learning model to work on images of varying pixel sizes. So, in this experiment, I rescale all the images to 128*128 pixels. They are then converted into gray scale images so that the input matrix for the CNN consists of a single channel (128*128*1). Using a single channel in the input matrix makes the model less computationally intensive and hence significantly improves the training speed. Color makes learning complex and feature extraction difficult. In most image processing applications, color information does not help to learn important features. For all these reasons, I convert all the images to gray scale and reduce their size to 128*128 pixels. I have programmed this in MATLAB, which is one of the most powerful image preprocessing tools, for reducing the pixel size and converting images to gray scale.

After all images are resized and gray scaled, they have pixel values between 0 and 255. If these values are fed into the model, they may lead to numerical overflows, and the multiplication of weights with pixel values forces neurons to saturate. So, some technique should be applied to overcome this problem. One such technique is per-channel normalization, which first zero-centers the data and then normalizes it:

    X -= np.mean(X, axis=0)
    X /= np.std(X, axis=0)

However, I have used a simple rescaling technique which compresses the pixel values into the range between 0 and 1 (rescale=1./255). This is implemented with Keras, which is a high-level Python library.

Image Data Augmentation

The image data which we gathered for the experiment is not sufficient for deep learning training. Deep learning models require a significant amount of data so that all their kernel weights can be fully optimized. The more data is available for training, the lower the risk of overfitting, which eventually boosts performance. Data augmentation can also be an inexpensive and powerful method of regularization in a deep learning model. However, data augmentation alone is not enough to prevent overfitting, since the generated samples are still highly correlated. I created artificial image samples from the existing images by a number of random transformations. New image data has been generated randomly by rotating, horizontally flipping and shifting the width and height of the existing image data. These transformations have been applied randomly to all images, creating 3 new images from each image. Therefore, the total number of images is almost [...]. These techniques provide entirely new samples as inputs. Because of rotation and shifting, a fill mode strategy is applied to fill in the newly created pixels.
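To make the fill strategy concrete, the following sketch shifts a tiny image to the right and fills the exposed pixels by repeating the nearest edge pixel, and also shows a horizontal flip. It is a pure-Python illustration of the idea rather than the Keras implementation; the function names `shift_right_nearest` and `horizontal_flip` are made up for this example, and only non-negative integer shifts are handled.

```python
def shift_right_nearest(image, shift):
    """Shift every row `shift` pixels to the right (shift >= 0).

    Pixels exposed on the left are filled by repeating the nearest
    original pixel, i.e. the first pixel of the row.
    """
    width = len(image[0])
    return [[row[max(0, col - shift)] for col in range(width)] for row in image]


def horizontal_flip(image):
    """Mirror every row of the image."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    print(shift_right_nearest(image, 1))
    print(horizontal_flip(image))
```

Repeating the edge pixel avoids introducing an artificial black border, which the network could otherwise pick up as a feature of the augmented images.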
These generated images are not highly correlated with each other, as they differ in orientation, width, height, etc. All images look as distinct as the inputs. Keras is the library which I used for the data augmentation task, as it already has built-in functions for transformations such as rotating, flipping and shifting. It randomly applies several transformations together to every image at each iteration. As a result, the CNN model never sees the same picture twice. This helps to prevent overfitting and helps the model to generalize better. The parameters used for the random transformations are listed below. All new images are generated from the old ones based on these parameters. I have generated around 3 artificial images from each image.

    from keras.preprocessing.image import ImageDataGenerator

    # random transformations applied to every image
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

Below are details of the transformations and their parameters:

• Rotation range: a value in degrees (0-180), a range within which to randomly rotate pictures.
• Width shift and height shift: ranges (as a fraction of the total width or height) within which to randomly translate pictures horizontally or vertically.
• Rescale: a value by which the data is multiplied before any other processing. Our original images consist of RGB coefficients in the range 0-255, but such values would be too high for our models to process (given a typical learning rate), so we target values between 0 and 1 instead by scaling with a factor of 1/255.
• Shear range: for randomly applying shearing transformations.
• Zoom range: for randomly zooming inside pictures.
• Horizontal flip: for randomly flipping half of the images horizontally, which is relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
• Fill mode: the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.

We collected around 1400 images. Each of the three classes has almost 420 images. Using MATLAB, I mirror all the images, so there are around 840 images for each class. Then, I have applied the transformations mentioned above to generate 3 images from every image. Finally, there are 8 images in total for every original image (taken by the cameras).

Figure 4.1: Original gray images
Figure 4.2: 1st generated image
Figure 4.3: 2nd generated image
Figure 4.4: 3rd generated image
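The figure of 8 images per original follows from mirroring first and then generating 3 images from each of the resulting pictures. The arithmetic can be written out as a short sketch; the function name `images_per_original` is hypothetical, and the per-class number uses the rounded count of 420 quoted above, so it is only approximate.

```python
def images_per_original(mirrored_copies=1, generated_per_image=3):
    """Images available per camera picture after mirroring and augmentation."""
    base = 1 + mirrored_copies              # original plus its mirror image
    return base + base * generated_per_image


if __name__ == "__main__":
    per_original = images_per_original()
    print(per_original)
    # rough size of one class, using the rounded count of 420 originals
    print(420 * per_original)
```

Counting the original and its mirror image separately from the generated images makes it clear that only 2 of the 8 images are real photographs (or exact mirror images of them), while the remaining 6 are randomly transformed copies.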

4.3 Realization of the Deep Learning Project

Convolutional neural networks, a pillar algorithm of deep learning, are used for perceptual problems such as image classification. As only a small dataset is available for the deep learning algorithm, the data has been augmented programmatically. Data augmentation alone is not enough to prevent overfitting. The main focus in preventing overfitting should be the entropic capacity of the model, that is, how much information the model is allowed to store. If a model can store a lot of information, it is likely to learn more features, some of which might be irrelevant. If it can store only a few features, it has to focus on the most significant ones, which are more relevant and generalize better. One way to modulate the entropic capacity is the choice of parameters, i.e. the number of layers and the size of each layer. Considering all the above reasons, I design a convnet which is small compared to advanced industrial ones. Dropout layers are also used. Dropout layers, in addition to data augmentation, help to prevent overfitting.

Technical details

This experiment has been run on a remote machine provided by AWS. I rented a machine on which all the required software frameworks and libraries are configured. CUDA, cuDNN, TensorFlow and Keras are already installed in the OS.

Operating System: Linux/Unix, Ubuntu 14.04, 64-bit Amazon Machine Image (AMI)

AWS Instance details:
1. Instance Name: g2.2xlarge
2. GPU: 1 NVIDIA GRID K520 GPU, each with 1,536 CUDA cores and 4 GB of video memory
3. Internal Memory: 32 GB
4. CPU: 32 vCPUs

Software framework and Libraries:
1. NVIDIA CUDA and cuDNN: CUDA is NVIDIA's API for programming on the graphics card. cuDNN is a library for deep neural nets built using CUDA. CUDA Toolkit version 7.5 and cuDNN [...]
2. TensorFlow: TensorFlow [...]
3. Keras: Keras, as the deep learning library, runs on top of TensorFlow.
4. MATLAB: MATLAB R2017a has been used for data preprocessing.

Convnet Architecture

In this experiment, the dataset available for training is small, so I build a relatively small CNN architecture. Small networks have a lower entropic capacity, so they learn only the important features. Therefore, the chance of overfitting is lower. It is a good idea to build the architecture with a carefully chosen number of layers. The layers are 4 convolutional, 2 fully connected, 1 dropout, 3 ReLU and 3 max-pooling layers. Dropout randomly removes nodes with a given probability. In this case, I have set the probability to 0.5. The dropout layer is placed between the two fully connected layers. Excessive use of dropout layers may result in lower accuracy, as the available dataset is small and the entropic capacity of a small model is low. With every subsequent layer, the number of parameters increases drastically. My previous experiment on CIFAR10 data with model C has shown that a dropout layer is more effective near the end of the convolutional network in a smaller architecture. Considering all this, it is efficient to keep a dropout layer between the two fully connected layers. The network topology of this experiment is somewhat similar to model C, which was used for training on CIFAR10.
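The effect of a dropout layer with probability 0.5 can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the Keras layer used in the experiment: it uses the "inverted" formulation, in which the surviving activations are scaled up during training so that nothing needs to change at test time, and the function name `dropout` and its parameters are chosen for this example.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (sketch).

    During training, each activation is set to zero with probability `rate`
    and the survivors are divided by (1 - rate). At test time, the input is
    returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
    print(dropout([1.0, 2.0, 3.0], training=False))
```

Because a different random subset of nodes is removed at every training step, no single node of the fully connected layer can be relied upon, which is what discourages the network from memorizing the small training set.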

Figure 4.5: Convnet Architecture for RPSG16 data


More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why? Data Mining Deep Learning Deep Learning provided breakthrough results in speech recognition and image classification. Why? Because Speech recognition and image classification are two basic examples of

More information

Deep Learning With Noise

Deep Learning With Noise Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017 3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural

More information

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com

More information

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper Deep Convolutional Neural Networks Nov. 20th, 2015 Bruce Draper Background: Fully-connected single layer neural networks Feed-forward classification Trained through back-propagation Example Computer Vision

More information

Advanced Introduction to Machine Learning, CMU-10715

Advanced Introduction to Machine Learning, CMU-10715 Advanced Introduction to Machine Learning, CMU-10715 Deep Learning Barnabás Póczos, Sept 17 Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio

More information

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Lecture 20: Neural Networks for NLP. Zubin Pahuja Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple

More information

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( ) Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial

More information

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural

More information

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3 Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.

More information

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning, A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

Convolutional Neural Networks

Convolutional Neural Networks NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture

More information

Deep Learning. Architecture Design for. Sargur N. Srihari

Deep Learning. Architecture Design for. Sargur N. Srihari Architecture Design for Deep Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan CENG 783 Special topics in Deep Learning AlchemyAPI Week 11 Sinan Kalkan TRAINING A CNN Fig: http://www.robots.ox.ac.uk/~vgg/practicals/cnn/ Feed-forward pass Note that this is written in terms of the

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

On the Effectiveness of Neural Networks Classifying the MNIST Dataset

On the Effectiveness of Neural Networks Classifying the MNIST Dataset On the Effectiveness of Neural Networks Classifying the MNIST Dataset Carter W. Blum March 2017 1 Abstract Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision.

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies http://blog.csdn.net/zouxy09/article/details/8775360 Automatic Colorization of Black and White Images Automatically Adding Sounds To Silent Movies Traditionally this was done by hand with human effort

More information

CPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2016

CPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2016 CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2016 Assignment 5: Due Friday. Assignment 6: Due next Friday. Final: Admin December 12 (8:30am HEBB 100) Covers Assignments 1-6. Final from

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

Neural Networks. By Laurence Squires

Neural Networks. By Laurence Squires Neural Networks By Laurence Squires Machine learning What is it? Type of A.I. (possibly the ultimate A.I.?!?!?!) Algorithms that learn how to classify data The algorithms slowly change their own variables

More information

Inception Network Overview. David White CS793

Inception Network Overview. David White CS793 Inception Network Overview David White CS793 So, Leonardo DiCaprio dreams about dreaming... https://m.media-amazon.com/images/m/mv5bmjaxmzy3njcxnf5bml5banbnxkftztcwnti5otm0mw@@._v1_sy1000_cr0,0,675,1 000_AL_.jpg

More information

Deep Neural Networks Optimization

Deep Neural Networks Optimization Deep Neural Networks Optimization Creative Commons (cc) by Akritasa http://arxiv.org/pdf/1406.2572.pdf Slides from Geoffrey Hinton CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy

More information

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna, Convolutional Neural Networks: Applications and a short timeline 7th Deep Learning Meetup Kornel Kis Vienna, 1.12.2016. Introduction Currently a master student Master thesis at BME SmartLab Started deep

More information

Deep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers

Deep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers Deep Learning Workshop Nov. 20, 2015 Andrew Fishberg, Rowan Zellers Why deep learning? The ImageNet Challenge Goal: image classification with 1000 categories Top 5 error rate of 15%. Krizhevsky, Alex,

More information

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD. Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics,

More information

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

CS 523: Multimedia Systems

CS 523: Multimedia Systems CS 523: Multimedia Systems Angus Forbes creativecoding.evl.uic.edu/courses/cs523 Today - Convolutional Neural Networks - Work on Project 1 http://playground.tensorflow.org/ Convolutional Neural Networks

More information

Deep Learning Cook Book

Deep Learning Cook Book Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation

More information

A Deep Learning Approach to Vehicle Speed Estimation

A Deep Learning Approach to Vehicle Speed Estimation A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,

More information

Using Capsule Networks. for Image and Speech Recognition Problems. Yan Xiong

Using Capsule Networks. for Image and Speech Recognition Problems. Yan Xiong Using Capsule Networks for Image and Speech Recognition Problems by Yan Xiong A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science Approved November 2018 by the

More information

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold

More information

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Intro to Deep Learning Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Why this class? Deep Features Have been able to harness the big data in the most efficient and effective

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Convolu'onal Neural Networks

Convolu'onal Neural Networks Convolu'onal Neural Networks Dr. Kira Radinsky CTO SalesPredict Visi8ng Professor/Scien8st Technion Slides were adapted from Fei-Fei Li & Andrej Karpathy & Jus8n Johnson A bit of history: Hubel & Wiesel,

More information

CSE 559A: Computer Vision

CSE 559A: Computer Vision CSE 559A: Computer Vision Fall 2018: T-R: 11:30-1pm @ Lopata 101 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Course Staff: Zhihao Xia, Charlie Wu, Han Liu http://www.cse.wustl.edu/~ayan/courses/cse559a/

More information

Report: Privacy-Preserving Classification on Deep Neural Network

Report: Privacy-Preserving Classification on Deep Neural Network Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how

More information

Fuzzy Set Theory in Computer Vision: Example 3, Part II

Fuzzy Set Theory in Computer Vision: Example 3, Part II Fuzzy Set Theory in Computer Vision: Example 3, Part II Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Resource; CS231n: Convolutional Neural Networks for Visual Recognition https://github.com/tuanavu/stanford-

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

More information

Back propagation Algorithm:

Back propagation Algorithm: Network Neural: A neural network is a class of computing system. They are created from very simple processing nodes formed into a network. They are inspired by the way that biological systems such as the

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

MoonRiver: Deep Neural Network in C++

MoonRiver: Deep Neural Network in C++ MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Convolutional Neural Networks for Handwritten Digit Recognition Andreas Georgopoulos CID: 01281486 Abstract Abstract At this project three different Convolutional Neural Netwroks

More information

Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing

Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing feature 3 PC 3 Beate Sick Many slides are taken form Hinton s great lecture on NN: https://www.coursera.org/course/neuralnets

More information

Character Recognition Using Convolutional Neural Networks

Character Recognition Using Convolutional Neural Networks Character Recognition Using Convolutional Neural Networks David Bouchain Seminar Statistical Learning Theory University of Ulm, Germany Institute for Neural Information Processing Winter 2006/2007 Abstract

More information

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning  Ian Goodfellow Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains

More information

Deep Learning. Volker Tresp Summer 2014

Deep Learning. Volker Tresp Summer 2014 Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there

More information

Machine Learning. MGS Lecture 3: Deep Learning

Machine Learning. MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated

More information

Deep Learning Applications

Deep Learning Applications October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning

More information

Lecture 19: Generative Adversarial Networks

Lecture 19: Generative Adversarial Networks Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,

More information

Fei-Fei Li & Justin Johnson & Serena Yeung

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 9-1 Administrative A2 due Wed May 2 Midterm: In-class Tue May 8. Covers material through Lecture 10 (Thu May 3). Sample midterm released on piazza. Midterm review session: Fri May 4 discussion

More information

Apparel Classifier and Recommender using Deep Learning

Apparel Classifier and Recommender using Deep Learning Apparel Classifier and Recommender using Deep Learning Live Demo at: http://saurabhg.me/projects/tag-that-apparel Saurabh Gupta sag043@ucsd.edu Siddhartha Agarwal siagarwa@ucsd.edu Apoorve Dave a1dave@ucsd.edu

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Deep Face Recognition. Nathan Sun

Deep Face Recognition. Nathan Sun Deep Face Recognition Nathan Sun Why Facial Recognition? Picture ID or video tracking Higher Security for Facial Recognition Software Immensely useful to police in tracking suspects Your face will be an

More information

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

CS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Training Neural Networks II Connelly Barnes Overview Preprocessing Initialization Vanishing/exploding gradients problem Batch normalization Dropout Additional

More information

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA

More Learning. Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA More Learning Ensembles Bayes Rule Neural Nets K-means Clustering EM Clustering WEKA 1 Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector

More information

Lecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 10, 2017

Lecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 10, 2017 INF 5860 Machine learning for image classification Lecture : Neural net: initialization, activations, normalizations and other practical details Anne Solberg March 0, 207 Mandatory exercise Available tonight,

More information

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu Artificial Neural Networks Introduction to Computational Neuroscience Ardi Tampuu 7.0.206 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition

More information

ECE 5470 Classification, Machine Learning, and Neural Network Review

ECE 5470 Classification, Machine Learning, and Neural Network Review ECE 5470 Classification, Machine Learning, and Neural Network Review Due December 1. Solution set Instructions: These questions are to be answered on this document which should be submitted to blackboard

More information

ConvolutionalNN's... ConvNet's... deep learnig

ConvolutionalNN's... ConvNet's... deep learnig Deep Learning ConvolutionalNN's... ConvNet's... deep learnig Markus Thaler, TG208 tham@zhaw.ch www.zhaw.ch/~tham Martin Weisenhorn, TB427 weie@zhaw.ch 20.08.2018 1 Neural Networks Classification: up to

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Classifying Depositional Environments in Satellite Images

Classifying Depositional Environments in Satellite Images Classifying Depositional Environments in Satellite Images Alex Miltenberger and Rayan Kanfar Department of Geophysics School of Earth, Energy, and Environmental Sciences Stanford University 1 Introduction

More information