MATTI TUHOLA WIRELESS ACCESS POINT QUALITY ASSESSMENT USING CONVOLUTIONAL NEURAL NETWORKS. Bachelor of Science Thesis


MATTI TUHOLA
WIRELESS ACCESS POINT QUALITY ASSESSMENT USING CONVOLUTIONAL NEURAL NETWORKS
Bachelor of Science Thesis

Examiner: Heikki Huttunen
Submitted: April 29, 2016

ABSTRACT

TAMPERE UNIVERSITY OF TECHNOLOGY
Degree Programme in Information Technology
MATTI TUHOLA: Wireless Access Point Quality Assessment Using Convolutional Neural Networks
Bachelor of Science Thesis, 22 pages
April 2016
Major: Signal Processing
Examiner: Heikki Huttunen
Keywords: Convolutional Neural Networks, Deep Learning, Machine Learning, Positioning

Positioning has many applications in today's world, and there are many methods to improve its accuracy. Using wireless access points is a common way to improve positioning accuracy, but not all access points are reliable enough for this purpose. In this thesis, machine learning methods are used to assess the quality of wireless access points. The problem is treated as a supervised learning problem: the data is annotated and pre-processed, and a machine learning model is trained to predict the labels created in the annotation process. Convolutional neural networks are used as the principal model, and a logistic regression model is used for comparison. The results indicate that machine learning can be used for this purpose. Convolutional neural networks perform better than logistic regression, but not by a large margin.

PREFACE

This thesis was made in collaboration with HERE for the Bachelor's Seminar in Signal Processing in the spring of 2016. My personal motivation for this work arose from my interest in machine learning. This work was a great opportunity to increase my understanding of machine learning, in particular convolutional neural networks, and to apply it to a real-world problem.

I would like to thank Heikki Huttunen for his valuable feedback and for examining this work, the people at HERE for providing the topic and the data, and my friends and family for their support.

Tampere, April 29, 2016
Matti Tuhola

CONTENTS

1. Introduction
2. Theoretical Background
   2.1 Neural Networks and Deep Learning
       Multi-Layer Perceptrons
       Convolutional Neural Networks
       Network Training
   2.2 Model Evaluation
   2.3 Overfitting and Regularization
3. Implementation
   3.1 The Dataset
   3.2 Annotation
   3.3 Data Pre-processing
   3.4 Models and Learning Algorithms
4. Results
Discussion
Visualizing the Weights and the Feature Maps
Conclusions

LIST OF ABBREVIATIONS AND SYMBOLS

CNN     Convolutional neural network
GPS     Global Positioning System
GPU     Graphics processing unit
MAE     Mean absolute error
MLP     Multi-layer perceptron
MSE     Mean squared error
ReLU    Rectified linear unit
SGD     Stochastic gradient descent
WLAN    Wireless local area network

$b$                   The bias value of a neuron.
$J(\theta)$           A cost function that calculates a scalar cost.
$p$                   Dropout rate, the probability of a unit in a neural network to be dropped.
$\mathbf{w}$          The weight vector of a neuron.
$X$                   A complete dataset.
$\mathbf{x}^{(i)}$    The $i$th sample in a dataset.
$y$                   The set of labels for a dataset.
$\hat{y}$             The predictions made by a machine learning model.
$\eta$                The learning rate, a parameter of the optimization algorithm that controls the rate of change on each iteration.
$\theta$              The set of parameters of a machine learning model.
$\sigma(\cdot)$       The activation function applied to the output of a neuron.
$\omega$              The convolutional kernel used in convolutional neural networks.

1. INTRODUCTION

The use of positioning systems in different fields has proliferated in the early 21st century. Positioning systems are used in many products, including vehicles and mobile devices, for a vast range of applications such as navigation, mapping, and various location-based services. Emerging technologies such as self-driving cars also depend heavily on positioning. All of these applications benefit from accuracy and a fast signal acquisition time. Accordingly, improving them has been of interest to researchers for decades.

Positioning systems rely on global navigation satellite systems, such as the Global Positioning System (GPS), as their foundation. These systems have their limitations: they generally cannot operate indoors, and in some situations, acquiring a signal may take a long time. For this reason, information from additional sources, such as cellular sites and wireless local area network (WLAN) access points, is used. The information from these sources can be used to determine which satellites are in range, or to approximate the distance to nearby access points.

Data about the WLAN access points can be crowd-sourced from users' mobile devices and later be used for positioning. However, the incoming data is raw and may contain inaccuracies due to the heterogeneity of the devices, software bugs, and environmental and other factors. The goal of this thesis is to use machine learning to determine which access points provide data that is reliable enough for positioning. The problem will be treated as a supervised learning problem.

In Chapter 2, the theoretical background behind the machine learning techniques used in this thesis will be discussed. The implementation will be discussed in Chapter 3. The machine learning task can be split into two parts. First, the raw data is annotated: a numeric label for each sample is set based on the input from a human expert. The label represents the quality of the given access point. In the second part, the data is pre-processed and fed into a machine learning model that attempts to predict the labels. Experiments are run using multiple convolutional neural network models and a logistic regression model. The results will be discussed in Chapter 4.

2. THEORETICAL BACKGROUND

Machine learning algorithms are algorithms that are able to learn from data. The starting point for any machine learning application is thus a dataset $X$ that consists of $m$ samples, $X = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(m)}\}$. A machine learning model learns its parameters from the data. There are two primary categories of learning, namely unsupervised and supervised learning. In unsupervised learning, the model attempts to discover patterns or other insights from unlabeled data. In supervised learning there is, in addition to the dataset $X$, a set of corresponding labels $y = \{y^{(1)}, y^{(2)}, \ldots, y^{(m)}\}$. A supervised learning model approximates a function that predicts the labels based on an input $\mathbf{x}$. Supervised learning can be further divided into regression and classification tasks. In regression, the predicted outputs are continuous numeric values, whereas in classification, the predicted outputs are discrete class labels.

Machine learning models vary in terms of complexity. The more complex models have a greater representational capacity, meaning that they can represent more intricate functions. The most appropriate function of all the functions that a model can represent is chosen by training the parameters. The models are typically trained using gradient-based methods that change the model parameters iteratively. The representational capacity and the training of neural networks, a family of machine learning models, are discussed in Section 2.1.

In order to be useful in the real world, a machine learning model needs to be able to generalize to new data that it has not seen before. To properly evaluate the model, it is conventional to randomly split the dataset into two or three subsets, namely a training set, a test set, and often a validation set [4, p. 222]. The training set is used to train the parameters of the model. The validation set is used to find the best model out of many. It is also used to tune the hyperparameters, that is, the manually chosen settings that control the behavior of the learning algorithm. The purpose of the test set is to measure the model's ability to generalize to new, unseen data. Model evaluation and splitting the dataset are discussed in Section 2.2. A model is said to overfit if it performs well on the training data and poorly on the test data. Techniques to combat overfitting are discussed in Section 2.3.
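The random three-way split described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `split_dataset`, the 60/20/20 proportions, and the toy arrays are assumptions made for the example and do not come from the thesis.

```python
import numpy as np

def split_dataset(X, y, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Randomly partition the samples into training, validation, and test sets."""
    m = len(X)
    rng = np.random.default_rng(seed)
    order = rng.permutation(m)                 # random order of sample indices
    n_test = int(round(test_fraction * m))
    n_val = int(round(val_fraction * m))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]         # everything that is left
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Toy data: 50 samples with 2 features each (illustrative only).
X = np.arange(100, dtype=float).reshape(50, 2)
y = np.arange(50, dtype=float)
train, val, test = split_dataset(X, y)
```

Because the three index sets are slices of one permutation, every sample ends up in exactly one subset, which is the property that keeps the test set unseen during training and model selection.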

2.1 Neural Networks and Deep Learning

Deep learning is a branch of machine learning where neural network models with multiple computational layers are used to learn multiple levels of abstraction from the data. In recent years, deep learning has seen a surge in popularity. It is considered the state of the art in many fields, including computer vision and speech recognition [10]. Deep learning has also proved successful in tasks that were previously out of reach for computers, such as beating human champions in the game of Go [3].

Many machine learning models benefit from having the data represented as a set of hand-crafted features. The representation of the data has a heavy impact on the performance of the model. Choosing the right features for a given task is difficult and often requires substantial domain knowledge. One of the advantages of deep learning models is that they can learn not only to make predictions from existing features but also the features themselves, with little data pre-processing or feature extraction.

Artificial neural networks, or simply neural networks, are a central concept in deep learning. Neural networks are machine learning models that consist of layers of interconnected units. The units are often referred to as neurons. Neuroscience has been a source of inspiration for artificial neural networks. This link should not, however, be overemphasized. Today, neuroscience is not predominantly used to guide deep learning research, and the goal of deep learning is not to simulate the human brain [5].

Neural networks have a large number of parameters $\theta \in \mathbb{R}^N$, which allows them to represent very complex functions. In some recent models, $N$ has been as large as 144 million [13]. The parameters can be learned from data using methods such as the backpropagation algorithm and stochastic gradient descent, discussed later in this section.

In the following sections, two types of artificial neural networks will be discussed, namely multi-layer perceptrons and convolutional neural networks. They are examples of feedforward neural networks, where the computational layers are connected sequentially and information flows in one direction without feedback connections.

In terms of theoretical advances, few things about deep learning are new. Neural networks have existed for decades; many central ideas, such as the backpropagation algorithm and convolutional neural networks, were already known in the 1980s and the 1990s [8, 9]. The newfound success of neural networks is in large part due to the availability of large datasets and the growing computational capacity. The increased computational capacity has made it possible to train models with on the order of hundreds of layers. For comparison, neural networks in the 1990s typically had only two or three layers. The large number of layers, or depth, is where deep learning gets its name.

Multi-Layer Perceptrons

Multi-layer perceptrons (MLPs) are the quintessential example of a feedforward neural network. They are also an example of a supervised learning model: the parameters of the network have to be trained before the network can be used for prediction. Multi-layer perceptrons consist of many layers of neurons. They derive their name from the 1950s neuron model, the perceptron [12]. MLPs are discussed in more detail in the work of Goodfellow et al. [5].

The neurons used in MLPs are simple computational units that are loosely inspired by neurons in the brain. The neuron calculates the inner product of the inputs and the weights, adds a bias, and passes the result through an activation function. The structure of a neuron is illustrated in Figure 2.1.

Figure 2.1 A neuron in a multi-layer perceptron. The neuron calculates the inner product of the input vector $\mathbf{x}$ and the weight vector $\mathbf{w}$, adds the bias value, and passes the result through an activation function $\sigma(\cdot)$.

The inputs to the neuron can be represented as a vector $\mathbf{x} = [x_1 \; x_2 \; \cdots \; x_n]^T$. The input vectors in MLPs are one-dimensional. Inputs of two or more dimensions must be vectorized before feeding them into an MLP. The parameters of the neuron consist of a bias $b$ and a weight vector $\mathbf{w} = [w_1 \; w_2 \; \cdots \; w_n]^T$, where the values are the weights for the respective inputs. The output of a single neuron is defined by

$$a = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right) = \sigma(\mathbf{w}^T \mathbf{x} + b), \qquad (2.1)$$

where $\sigma(\cdot)$ is the activation function. The purpose of the activation function is to produce a non-linear decision boundary. Non-linearity is essential because a multi-layer perceptron with linear activation functions could be expressed with a single layer. Common activation functions for multi-layer perceptrons include the hyperbolic tangent function $\sigma(x) = \tanh(x)$ and the logistic sigmoid function $\sigma(x) = (1 + e^{-x})^{-1}$. The bias term allows the network to shift the activation function to the left or to the right.

Multi-layer perceptrons consist of an input layer, one or more hidden layers, and an output layer. The only purpose of the input layer is to feed the input vector to the next layer. The units in the other layers are neurons. Hidden layers are computational layers inside the network, whereas the output layer defines the output of the network. In classification, the number of neurons in the output layer is equal to the number of classes; in regression, there is just one neuron in the output layer. Figure 2.2 illustrates a multi-layer perceptron with two hidden layers.

Figure 2.2 A multi-layer perceptron with two hidden layers. The units in the hidden layers and in the output layer, depicted as circles, are neurons.

The layers in MLPs are fully connected. This means that every neuron is connected to all of the units in the preceding layer and the subsequent layer. The neurons get their input from the units in the previous layer and pass their output to the units in the next layer. They are not connected to other neurons in the same layer. The network defines a function $\hat{y} = f(\mathbf{x}; \theta)$, where $\theta$ consists of the biases and the weights of the neurons and $\hat{y}$ is the predicted output.

One of the problems with fully connected neural networks is that the number of connections grows rapidly as the size of the input grows, because every input value is connected to every neuron in the first hidden layer. This makes them impractical for large inputs. In addition, they are very sensitive to even small changes in the input. For example, translating or rotating an input image may completely change the predicted output.

Convolutional Neural Networks

Convolutional neural networks (CNNs) are variations of multi-layer perceptrons. Their structure draws inspiration from the visual cortex, and they are specifically designed for data that can be represented in a grid-like topology, such as images [5, p. 334]. In this section, convolutional neural networks will be discussed as they are typically applied to image data.

Convolutional layers are the core building blocks of CNNs. The parameters of a convolutional layer consist of a set of convolutional kernels. In convolutional layers, the input is not limited to one-dimensional vectors. Input images are represented as 2D or 3D arrays, depending on whether or not there are multiple color channels.

The neurons in a convolutional layer are arranged on a 2D plane called a feature map. The neurons in a feature map are locally connected: each output neuron is only connected to a limited area in the input, the size of which is determined by the convolutional kernel. Each neuron is connected to a different area in the input, but all neurons use the same convolutional kernel. The use of shared kernels allows the network to express large models with a fairly small number of parameters. Figure 2.3 illustrates a convolutional layer.

Figure 2.3 A convolutional layer convolves the input with a convolutional kernel, producing a feature map.
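The difference in parameter counts between a fully connected layer and a convolutional layer with shared kernels can be made concrete with a little arithmetic. The sketch below uses illustrative sizes (a single-channel square image, 128 dense units, 32 kernels of size 3 x 3); the helper names `dense_params` and `conv_params` are made up for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(n_kernels, kernel_size, n_channels):
    """Weights plus biases of a convolutional layer with shared square kernels."""
    return n_kernels * (kernel_size * kernel_size * n_channels) + n_kernels

# A 64 x 64 grayscale image fed to 128 fully connected neurons.
dense_small = dense_params(64 * 64, 128)      # 524416 parameters
# Doubling the image side quadruples the number of dense parameters.
dense_large = dense_params(128 * 128, 128)    # 2097280 parameters
# 32 shared 3 x 3 kernels need the same number of parameters for any image size.
conv_any = conv_params(32, 3, 1)              # 320 parameters
```

The dense layer's parameter count scales with the number of input pixels, whereas the convolutional layer's count depends only on the number and size of the kernels.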

Consider a simple case with an $N \times N$ grayscale image as the input $x$, and an $M \times M$ convolutional kernel $\omega$. The output of a single neuron is given by

$$a_{i,j} = \sum_{u=0}^{M-1} \sum_{v=0}^{M-1} \omega_{u,v} \, x_{i+u,\,j+v}, \qquad (2.2)$$

where $a_{i,j}$ is the neuron in the $i$th column and the $j$th row of the feature map. In this case, the feature map would be of size $(N - M + 1) \times (N - M + 1)$. If the dimensions of the input were to be retained, the input would have to be padded with zeros.

Each convolutional layer typically comprises multiple convolutional kernels. As a result, there will be multiple feature maps, and the output of the convolutional layer will be a 3D array. The kernel $\omega$ can also be three-dimensional and extend through a 3D input volume. The size of the kernel typically depends on the size of the input. For example, $3 \times 3$ and $5 \times 5$ are common kernel sizes for small images, whereas larger kernels might be used for fairly large images.

The output from a convolutional layer is passed through a non-linear activation function. The most common activation function for CNNs is the rectified linear unit (ReLU), $\sigma(x) = \max(0, x)$. ReLUs are preferred to the hyperbolic tangent or the logistic sigmoid function because they help convolutional neural networks converge faster [7]. In addition, they are computationally cheap, and they help avoid some problems that are present when training with other activation functions.

Pooling layers are often used between successive convolutional layers. Pooling layers reduce the spatial size of the feature maps by partitioning them into non-overlapping tiles of size $k \times k$, most commonly $2 \times 2$, and reducing each tile to a single pixel. The most common type of pooling is max pooling, which takes the maximum of each tile. Pooling makes the network more invariant to small translations in the input, which is useful if the presence of some feature is more important than its location [5]. Another benefit of pooling is that it increases the computational efficiency of the network.
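Equation (2.2), the ReLU activation, and max pooling can be sketched directly in NumPy. This is a deliberately slow, loop-based illustration rather than how a deep learning library computes convolutions; the function names, the toy 6 x 6 input with a vertical edge, and the edge-detecting kernel are assumptions made for the example. The code indexes arrays as (row, column).

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Equation (2.2): slide the kernel over x without zero padding."""
    n_rows, n_cols = x.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(kernel * x[i:i + k_rows, j:j + k_cols])
    return out

def relu(a):
    """Rectified linear unit, max(0, a), applied element-wise."""
    return np.maximum(0.0, a)

def max_pool(a, k=2):
    """Max pooling over non-overlapping k x k tiles (extra border rows/columns are cut)."""
    rows, cols = a.shape[0] // k, a.shape[1] // k
    trimmed = a[:rows * k, :cols * k]
    return trimmed.reshape(rows, k, cols, k).max(axis=(1, 3))

# Toy input: dark left half, bright right half (a vertical edge).
x = np.zeros((6, 6))
x[:, 3:] = 1.0
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to left-to-right increases

feature_map = relu(conv2d_valid(x, kernel))  # shape (6 - 3 + 1, 6 - 3 + 1) = (4, 4)
pooled = max_pool(feature_map)               # shape (2, 2)
```

The feature map is non-zero only in the columns where the kernel straddles the edge, and pooling halves each spatial dimension while keeping the strongest response in every tile.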
Figure 2.4 depicts the structure of a typical convolutional neural network. The structure follows a repeating pattern, where a convolutional layer with a non-linear activation function is followed by a pooling layer. This structure allows the network to learn features hierarchically. The first convolutional layers typically learn low-level features such as edges, whereas later layers learn features that are higher-level abstractions. The last layers in the network are typically fully connected like the layers in MLPs. Their purpose is to learn from the high-level features produced by the last convolutional layer, and to produce the final output.

Figure 2.4 A convolutional neural network consists of convolutional layers with non-linear activation functions, pooling layers, and fully connected layers.

Network Training

So far, the focus has been on the representational capacity of neural networks. The process described earlier, where a neural network with parameters $\theta$ gets an input $\mathbf{x}$ and produces a predicted output $\hat{y}$, is called forward propagation. A neural network model defines a family of functions. By learning the parameters, the most appropriate function to solve a given problem can be selected. In order to learn the parameters, the network needs to be trained on a dataset.

In the training phase, the first step is to perform forward propagation on an input $\mathbf{x}$ to calculate the predicted output $\hat{y}$. Then, based on the difference between $\hat{y}$ and the label $y$, a cost function $J(\theta)$ is used to calculate a scalar cost that measures how well the predicted outputs match the labels. The backpropagation algorithm is applied to compute the gradient of the cost with respect to the parameters. Another algorithm, stochastic gradient descent (SGD), is commonly used to learn, i.e. to minimize the cost function by updating the parameters using the gradient.

The choice of cost function typically depends on the kind of problem that is being solved. For regression problems, simple cost functions such as the mean squared error (MSE), given by

$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^2, \qquad (2.3)$$

or the mean absolute error (MAE), given by

$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left|\hat{y}^{(i)} - y^{(i)}\right|, \qquad (2.4)$$

are usually suitable. MAE has the advantage of being more robust to outliers. For classification problems, using a different cost function such as logarithmic loss can be beneficial.

The gradient of the cost with respect to the parameters of the network is calculated using backpropagation. Backpropagation starts with the scalar cost calculated by the cost function and uses the chain rule of calculus to propagate the gradient of the cost backwards through the outputs of the preceding layers. These gradients indicate how much the parameters of each unit contributed to the cost. The parameters can then be updated accordingly.

Minimizing the cost function is an optimization problem. Optimization algorithms used in machine learning are typically based on the concept of gradient descent. Gradient descent starts with some initial parameters and then iteratively moves towards a minimum based on the gradient of the cost function with respect to the parameters. The cost functions in neural networks are almost always non-convex, which means that there may be many local minima, and an optimization algorithm is not guaranteed to converge to the global minimum. Gradient descent is sensitive to the initial values of the parameters. Typically, small random values are used for initialization [5, p. 176].

The training is done in epochs. In an epoch, all the samples in the training set are presented to the network once. The number of epochs used to train a neural network can be in the order of hundreds. In vanilla gradient descent, the gradient is computed for the entire training set. In other words, the parameters are only updated after a complete epoch. In stochastic gradient descent, the gradient is estimated based on one sample or a mini-batch of $n$ samples of the data $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}\}$ with corresponding labels $\{y^{(1)}, y^{(2)}, \ldots, y^{(n)}\}$.
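The epoch and mini-batch structure of stochastic gradient descent can be sketched on a model simple enough that its gradient can be written by hand. The example below fits a linear model with the MSE cost of Equation (2.3); it is an illustrative stand-in, since a neural network would obtain its gradients through backpropagation and would have a non-convex cost. The function names, hyperparameter values, and synthetic data are assumptions made for the example.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error, Equation (2.3)."""
    return np.mean((y_pred - y_true) ** 2)

def sgd_linear(X, y, learning_rate=0.1, batch_size=16, epochs=200, seed=0):
    """Fit y ~ X @ w + b with mini-batch stochastic gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])   # small random initial weights
    b = 0.0
    for _ in range(epochs):                        # one epoch = one pass over the data
        order = rng.permutation(len(X))            # reshuffle the samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]
            # Gradient of the mini-batch MSE with respect to w and b.
            grad_w = 2.0 * X[idx].T @ error / len(idx)
            grad_b = 2.0 * np.mean(error)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Synthetic, noise-free regression data (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.5

w, b = sgd_linear(X, y)
final_cost = mse(X @ w + b, y)
```

With 200 samples and a batch size of 16, the parameters are updated 13 times per epoch, in contrast to the single update per epoch of vanilla gradient descent.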
The order in which the samples are presented to the network affects the outcome and is therefore typically randomized. Stochastic gradient descent updates the parameters based on the gradient calculated by backpropagation. The parameter update is given by

$$\theta \leftarrow \theta - \eta \nabla_{\theta} J(\theta),$$

where $J(\theta)$ is the cost function that calculates a scalar cost for the $n$ samples in a mini-batch and $\eta$ is the learning rate that controls the rate of change on each iteration.

Choosing the right learning rate can be difficult. If it is too small, the convergence will be slow; if it is too large, fluctuations or even divergence may occur. This is among the reasons that many alternative optimization algorithms, such as AdaGrad, RMSprop, and Adam [6], have been proposed. These alternative optimization algorithms are still based on the concept of stochastic gradient descent, but they typically have an adaptive learning rate or some other characteristics that may hasten the convergence.

Training a neural network is a computationally intensive task. Today, neural networks are most commonly trained on graphics processing units (GPUs). GPUs provide a high memory bandwidth and a high degree of parallelism. These features can be utilized to a great extent when training neural networks [5]. The success of deep learning can, to some extent, be attributed to the efficient use of GPUs [10].

2.2 Model Evaluation

The ultimate purpose of a machine learning model is to be used in the real world on new, unseen data. For this reason, it is important to evaluate the performance of the model. There are two separate problems to consider when evaluating the performance of machine learning models, namely model selection, choosing the best model out of many, and model assessment, estimating the model's ability to generalize to new data [4, p. 222].

In an ideal situation, where data is plentiful, the dataset should be split into three subsets: a training set, a validation set, and a test set. The model should then be trained on the training set, and any modifications to its hyperparameters or the model itself should be evaluated on the validation set. The generalization performance of the final model should be assessed on the test set, but only once no further changes to the model or its hyperparameters will be made.

The reasoning behind the split is manifold. The performance of the model on the data it was trained on does not reflect its ability to generalize to new data. For this reason, it is important that at least the test set always be kept separate, even if data is scarce. Furthermore, using a validation set is necessary to make sure that assessing the model on the test set provides an unbiased estimate of the error. If the model or its hyperparameters are chosen so that they minimize the error on the test set, the model may learn to slightly overfit the test set.

In many cases, the dataset is not large enough to be split into three separate datasets. One of the most common ways to solve this problem is to use k-fold cross-validation [4]. In k-fold cross-validation, the training set is split into $k$ subsets of equal size. One subset is used for validation, and the other $k - 1$ subsets are used for training the model. This is repeated $k$ times, so that each subset is used for validation once, and the results are combined. One of the drawbacks of k-fold cross-validation is that it can be computationally expensive on large models.

2.3 Overfitting and Regularization

Neural network models have a large number of parameters, which makes them very expressive. It also makes them prone to a problem known as overfitting. Overfitting occurs when the model starts to learn the noise in the training data, decreasing the model's ability to generalize to data outside the training set.

There are many ways to reduce overfitting. One of the simplest is to obtain more training data. More data can be gathered and annotated, or the existing data can be augmented by adding slightly modified copies of the existing samples to the training set. Augmentation is especially applicable to image data; new images can be created by rotating, translating, and mirroring existing ones. Augmentation has been found to be an effective way to improve generalization performance. It cannot, however, be applied to all kinds of data.

Techniques that reduce overfitting are known as regularization techniques. Traditional regularization techniques include Tikhonov, $L_2$, and $L_1$ regularization. They add a regularization term to the cost function. This term penalizes large weights. Consequently, the network will learn to prefer small weights, which makes it more difficult for the network to learn noise from the training data.
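As a small illustration of how a regularization term penalizes large weights, the sketch below adds an $L_2$ penalty to the MSE cost. The function name and the regularization strength of 0.01 are assumptions made for the example; the thesis does not state which form or strength, if any, was used.

```python
import numpy as np

def l2_regularized_mse(y_pred, y_true, weights, strength):
    """MSE data term plus an L2 penalty, strength * sum(w^2), on the weights."""
    data_term = np.mean((y_pred - y_true) ** 2)
    penalty = strength * np.sum(weights ** 2)
    return data_term + penalty

y_true = np.array([1.0, 0.0])
y_pred = np.array([0.5, 0.5])            # the same predictions in both cases

small_weights = np.array([0.1, -0.2])
large_weights = np.array([3.0, -4.0])

cost_small = l2_regularized_mse(y_pred, y_true, small_weights, strength=0.01)
cost_large = l2_regularized_mse(y_pred, y_true, large_weights, strength=0.01)
```

Both weight vectors give the same data term here, so the difference in cost comes entirely from the penalty, which is what steers the optimizer towards small weights.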
Dropout [14] is a method for reducing overfitting in neural networks. Instead of changing the cost function, dropout changes the network itself. The key idea is to randomly drop some units in the training phase. Dropping means that the output of the unit is set to zero, which effectively means that the unit has no effect on the output of the network. Each unit has a probability p of being dropped. The probability p is a hyperparameter of the model. Figure 2.5 illustrates a neural network with dropout applied to it.

Figure 2.5 The multi-layer perceptron from Figure 2.2 after applying dropout with a dropout rate of 0.5.

Dropout is performed in the training phase. When testing the model, no units are dropped. Dropout has been found to be a very effective method for reducing overfitting. Dropping units forces the remaining units in the network to learn more independently. Applying dropout is approximately equivalent to training many neural networks and averaging their output [14].
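A minimal sketch of dropout on a layer's activations follows. It assumes the "inverted" variant, in which the surviving activations are rescaled by $1 / (1 - p)$ during training so that nothing has to change at test time; the text above does not specify this detail, and the function name and array sizes are made up for the example.

```python
import numpy as np

def dropout(activations, p, training, rng):
    """Drop each unit with probability p during training; pass through otherwise."""
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p    # True with probability 1 - p
    # Assumption: rescale the kept units so the expected activation is unchanged.
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 100))                         # toy hidden-layer activations

train_out = dropout(hidden, p=0.5, training=True, rng=rng)
test_out = dropout(hidden, p=0.5, training=False, rng=rng)
```

In training mode roughly half of the units are set to zero and the rest are doubled, while in test mode the activations pass through untouched.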

3. IMPLEMENTATION

3.1 The Dataset

The dataset consists of signal strength measurements near WLAN access points. Each sample in the dataset is a point cloud with measurements from one access point. The number of measurements in a sample ranges from one to the order of one thousand. Each measurement comprises three values, namely the latitude, the longitude, and the signal strength. The data has been crowd-sourced from mobile devices that have been in range of the access point and connected to GPS.

The raw data is unlabeled and contains inaccuracies. The objective is to create a model that can filter out the less accurate access points and thus improve positioning accuracy. To achieve this, 7,500 randomly picked samples from a larger dataset were annotated and pre-processed.

3.2 Annotation

In order to apply a supervised learning algorithm, the data needs to have labels. In this case, a label is a score that tells how useful a particular sample of the data is. Annotating the data to create labels is often a laborious task that needs to be performed manually. In this particular case, it would be difficult for the user to consistently determine a score by just looking at one sample of the data at a time.

To solve this problem, a different approach was taken to annotating the data. Instead of looking at one sample of the data at a time, two samples were compared to each other and the user had to choose the one they found to be better. Comparing all the possible pairs would be impractical, so the Swiss tournament system [11] was used to create a limited number of pairs. The Swiss tournament system is a non-elimination system with a predetermined number of rounds, $n$. In the first round, the data is randomly split into pairs. The user determines the winners, each of which receives one point. In the following rounds, the pairs are randomly formed among samples with the same score. The process is repeated until all the comparisons have been made.
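The pairing procedure can be sketched in Python as follows. This is not the MATLAB annotation program of the thesis, only a hedged illustration: the `better` callback is a hypothetical stand-in for the human annotator, and the handling of score groups with an odd number of samples (the leftover sample is moved to the next lower score group, or sits the round out) is an assumption that the text does not describe. The final division by the number of rounds yields scores between 0 and 1.

```python
import random
from collections import defaultdict

def swiss_tournament(samples, better, n_rounds, seed=0):
    """Return {sample: score in [0, 1]} after n_rounds of pairwise comparisons.

    better(a, b) is a hypothetical callback that returns whichever of the
    two samples the annotator judges to be better.
    """
    rng = random.Random(seed)
    points = {s: 0 for s in samples}
    for _ in range(n_rounds):
        # Group the samples by their current number of points.
        groups = defaultdict(list)
        for s in samples:
            groups[points[s]].append(s)
        carried = None   # unpaired sample moved down to the next score group
        for score in sorted(groups, reverse=True):
            group = groups[score]
            rng.shuffle(group)                 # random pairing within the group
            if carried is not None:
                group.insert(0, carried)
                carried = None
            if len(group) % 2 == 1:
                carried = group.pop()
            for a, b in zip(group[0::2], group[1::2]):
                points[better(a, b)] += 1      # the winner receives one point
        # A sample still unpaired after the last group sits this round out.
    return {s: points[s] / n_rounds for s in samples}

# Toy run: eight "samples" whose quality is simply their numeric value.
scores = swiss_tournament(list(range(8)), better=max, n_rounds=3)
```

With $n$ rounds, every sample takes part in at most $n$ comparisons, so the number of comparisons grows linearly with the number of samples instead of quadratically as it would if all pairs were compared.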

Figure 3.1 A screenshot of the annotation software. The user chooses the better sample of the two displayed at a time. In the above example, the left sample is better, because the data points with the best signal strengths are clearly concentrated in a small area.

For this purpose, a program for annotating the data was written in MATLAB. A screenshot of this program is shown in Figure 3.1. The program takes the raw data as input and plots two samples from it on the map at a time. The signal strength is represented by color: the warmer the color, the stronger the signal. The map is used as a contextual cue to help the user determine which of the two samples compared at a time is better. The user then chooses the better sample, and the annotation process unfolds as described earlier. The final output is a file with the file names and the scores of the respective samples, scaled between 0 and 1 with $n + 1$ discrete values.

This method is not without its drawbacks. It relies on human intuition and understanding of what makes a good sample, which is likely to differ between people. In addition, the user may be forced to choose between two equally good or equally bad samples, which may skew the results. These problems could be alleviated by annotating the same data multiple times and combining the results, but in any case, they are limitations of the method that must be acknowledged.

3.3 Data Pre-processing

The data collected from the WLAN access points cannot be fed into a model as a raw object that consists of the data points and metadata; it needs to be pre-processed. The data was chosen to be represented as grayscale images. The size of these images was chosen to be $64 \times 64$ pixels, with values ranging from 0 to 255. Each non-zero pixel represents a data point, and the pixel value represents the signal strength. It was found that this size was sufficient to fit all the data points in more than 95 % of the samples. In many cases, the points that were left out were outliers. The median of the latitudes and the longitudes of all the points in a sample was chosen to be the center point of the image. Figure 3.2 illustrates the input images.

Figure 3.2 Samples from the dataset in the form they were fed to the learning model.

The 7,500 samples of data were randomly split into a training set of 80 % and a test set of 20 %. In practice, this means that there were 6,000 training samples and 1,500 test samples. Since the size of the training set can be considered small for a deep model, an augmented version of the training set was created. The input images were mirrored and rotated to produce in total 8 times as much data as there was originally. This was expected to increase the network's invariance to rotation and thus increase its ability to generalize.

3.4 Models and Learning Algorithms

The machine learning models were implemented using the Python library Keras [2]. Keras offers a high level of abstraction, making it relatively straightforward to build and experiment with different kinds of models. It needs another library, a so-called backend, to handle the low-level computations. Theano [1] was used as the backend. The input data was represented as a NumPy array [15].
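The eightfold augmentation from Section 3.3 and the resulting NumPy array layout can be sketched as follows. One plausible reading of "mirrored and rotated" is assumed here: the four rotations by multiples of 90 degrees, each with and without a left-right mirror. The function name and the toy image are made up for the example, and the thesis does not state how the augmentation was actually implemented.

```python
import numpy as np

def augment_eightfold(images):
    """Map an array of shape (n, 64, 64) to shape (8 * n, 1, 64, 64).

    Assumption: the eight variants are the four 90-degree rotations of each
    image, each with and without a left-right mirror.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(images, k=k, axes=(1, 2))   # rotate every image by k * 90 degrees
        variants.append(rotated)
        variants.append(rotated[:, :, ::-1])           # mirrored copy of the rotation
    stacked = np.concatenate(variants, axis=0)
    return stacked[:, np.newaxis, :, :]                # add the single color channel axis

# Toy input: one image with a single bright pixel away from any symmetry axis.
images = np.zeros((1, 64, 64), dtype=np.uint8)
images[0, 0, 1] = 255

augmented = augment_eightfold(images)
n_distinct = len({variant.tobytes() for variant in augmented})
```

Because the variants are concatenated block by block, a label vector `y` for the original images would have to be repeated in the same order, for example with `np.tile(y, 8)`.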
The training data array, as an example, had a four-dimensional shape. The first value refers to the number of samples, the second one to the number of color channels, and the remaining two to the dimensions of the input image. A convolutional neural network with two convolutional layers written in Keras is shown in Program 3.1.
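The axis convention described above, combined with the mirroring and rotation from Section 3.3, can be sketched in NumPy as follows. The sketch assumes that the eight variants are the four 90-degree rotations of each image and of its mirror image, which is one natural way to obtain 8 times as much data; the function name augment is hypothetical.

```python
import numpy as np


def augment(x, y):
    """Return all 8 mirrored/rotated variants of every image in x.

    x has shape (n_samples, 1, height, width) with height == width;
    y has shape (n_samples,). Labels are repeated to match the new images.
    """
    variants = []
    for base in (x, x[:, :, :, ::-1]):       # original and horizontal mirror
        for k in range(4):                   # rotations by 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k=k, axes=(2, 3)))
    return np.concatenate(variants, axis=0), np.tile(y, 8)


# Small stand-in for the training set: 10 single-channel 64 x 64 images.
x = np.zeros((10, 1, 64, 64), dtype=np.uint8)
y = np.linspace(0.0, 1.0, 10)
x_aug, y_aug = augment(x, y)
print(x_aug.shape)  # (80, 1, 64, 64)
```

With the 6,000 training samples mentioned in Section 3.3, the same function would produce 48,000 samples. The labels are copied unchanged, on the assumption that the quality of a sample does not depend on its orientation.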

    # Written against the Keras API that was current in 2016 (Theano backend).
    from keras.models import Sequential
    from keras.layers.core import Activation, Dense, Dropout, Flatten
    from keras.layers.convolutional import Convolution2D, MaxPooling2D

    model = Sequential()

    # Two convolutional layers with 32 kernels and a kernel size of 3x3.
    # The input is a single-channel 64x64 image (channels first).
    model.add(Convolution2D(32, 3, 3, border_mode='valid',
                            input_shape=(1, 64, 64)))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Convolution2D(32, 3, 3))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Two fully connected (dense) layers.
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))

    # Squash final output between zero and one.
    model.add(Activation('sigmoid'))

    model.compile(loss='mae', optimizer='rmsprop')

    # x_train, y_train, x_test and y_test are the NumPy arrays described above.
    # Train the model on the training data in batches of 128 samples,
    # iterating over the entire dataset 100 times. Validate on the test data.
    model.fit(x_train, y_train, batch_size=128, nb_epoch=100, verbose=1,
              validation_data=(x_test, y_test))

Program 3.1 A simple convolutional neural network written in Keras.

Multiple convolutional neural networks with a varying number of convolutional layers were made. After the convolutional layers, the models have two fully-connected layers, the latter of which is the output layer and consists of only one neuron. Many of the hyperparameters, such as the number and the size of the convolutional kernels, and the dropout rate, were chosen experimentally using 5-fold cross-validation on the training data. The models used 32 convolutional kernels in each layer, the size of the kernel being 3 × 3. A dropout rate of 0.25 was used for the convolutional layers.

In addition to the convolutional neural networks, one logistic regression model was implemented for comparison. Logistic regression models [4] are among the simplest machine learning models there are. Despite its simplicity, logistic regression often provides adequate results for many problems. Logistic regression can also be interpreted as a neural network with a single layer and a logistic sigmoid activation function. The input images were vectorized for logistic regression.

RMSprop was used as the optimization algorithm. The cost function was chosen to be mean absolute error (MAE) because of its easy interpretability. The outputs of the models range between 0 and 1, and a mean absolute error of 0.1 means that the prediction is, on average, off by 0.1.
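The interpretation of logistic regression as a single-layer network, and the meaning of the MAE cost, can be made concrete with a short NumPy sketch. It is an illustration only and not the thesis implementation, which used Keras; the function names are hypothetical, and the vector length of 4096 is an assumption derived from the 64 × 64 input of Program 3.1.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Logistic regression as a one-layer network: one weight per pixel."""
    flat = x.reshape(len(x), -1)      # (n, 1, 64, 64) -> (n, 4096)
    return sigmoid(flat @ w + b)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# With all-zero weights every prediction is sigmoid(0) = 0.5.
x = np.zeros((3, 1, 64, 64))
preds = predict(x, w=np.zeros(64 * 64), b=0.0)
mae = mean_absolute_error([0.4, 0.6, 0.5], preds)
```

In this toy case two of the three predictions are off by 0.1 and one is exact, so the MAE is 0.2 / 3, which matches the reading of the MAE as the average distance between a prediction and its label.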

4. RESULTS

The models were trained five times, each time with a new random split for the training data and the test data, and the results were averaged. This was done to compensate for the small size of the test set, which was just 1,500 samples.

The experiment was first run with the original training set of 6,000 samples. The test errors and the time to train the models are shown in Table 4.1. The number of convolutional layers is shown in parentheses after the name of the model.

Table 4.1 The test errors and training times of the models without data augmentation.

Model                 Error (MAE)   Time to train (100 epochs)
CNN (1)               …             … min
CNN (2)               …             … min
CNN (3)               …             … min
CNN (5)               …             … min
CNN (7)               …             … min
Logistic regression   …             … min

The same experiment was run with augmented training data. The augmented training set included rotated and mirrored copies of the input images. The size of the augmented training set was 48,000 samples. The results are shown in Table 4.2.

Table 4.2 The test errors and training times of the models with data augmentation.

Model                 Error (MAE)   Time to train (100 epochs)
CNN (1)               …             … min
CNN (2)               …             … h 17 min
CNN (3)               …             … h 45 min
CNN (5)               …             … h 20 min
CNN (7)               …             … h 28 min
Logistic regression   …             … min

All the models were trained on a single NVIDIA Tesla K40M GPU. The models were trained for 100 epochs with a mini-batch size of 64.
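The evaluation protocol described at the start of this chapter, five random 80/20 splits with the test errors averaged, can be sketched as follows. The function names and the train_and_score callback are hypothetical; the stand-in model simply predicts the mean training label and is not one of the models of this thesis, and the labels are synthetic.

```python
import random
from statistics import mean


def repeated_split_mae(samples, train_and_score, n_repeats=5,
                       test_fraction=0.2, seed=0):
    """Average a test error over several random train/test splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)                      # new random split each time
        n_test = int(round(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        errors.append(train_and_score(train, test))
    return mean(errors), errors


def mean_label_baseline(train, test):
    """Stand-in model: predict the mean training label for every test sample."""
    prediction = mean(label for _, label in train)
    return mean(abs(label - prediction) for _, label in test)


# 7,500 synthetic (sample id, label) pairs with labels between 0 and 1.
samples = [(i, (i % 11) / 10) for i in range(7500)]
average, errors = repeated_split_mae(samples, mean_label_baseline)
```

Averaging over several splits reduces the influence of any single, possibly unrepresentative, test set of 1,500 samples on the reported error.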

4.1 Discussion

Augmenting the data decreases the test error on all convolutional neural networks. The effect of augmentation on the logistic regression model is minute. Training with the larger dataset also comes with a cost, namely a significantly longer time to train the model. In practice this is not necessarily an issue, since the model only needs to be trained once in order to be used.

Convolutional neural networks performed better than logistic regression in every test. The convolutional neural network architecture with five layers was found to have the lowest test error in all cases. The differences among the error rates of the CNN models were, however, small and the significance of these differences can be questioned.

The results are slightly affected by noise. Sources of noise include the random initialization of the parameters and the randomized mini-batches. The fact that the results are averages from five separate training passes should decrease the effect of the noise.

Figure 4.1 The test error of the models over 100 epochs of training.

Figure 4.1 shows the test error of the different models over 100 epochs of training. The test errors in the figure are from one of the five training passes. The logistic regression model converges immediately and does not change in a meaningful way over the training epochs. The small convolutional neural network models also converge quickly, whereas the larger networks benefit from the large number of epochs. The test error of the CNN with one convolutional layer appears to increase over time, which implies overfitting and insufficient regularization.

4.2 Visualizing the Weights and the Feature Maps

Neural networks have been described as black box models because their inner workings can be difficult to understand. Visualizing the weights or the outputs of the layers can provide insight into this.

Figure 4.2 Examples of the feature maps produced by a convolutional neural network given the input image on the left. Panels, from left to right: Input, Layer 1, Layer 2, Layer 3, Layer 4, Layer 5. Other layers such as …

Figure 4.2 visualizes some of the feature maps produced by the convolutional neural network model with five layers. In the figure, the gray background color represents zeros, lighter shades represent positive values, and darker shades represent negative values. The feature maps are depicted before the ReLU non-linearity has been applied to them. The first convolutional layer appears to have responded to the edges, whereas the following convolutional layers appear to have learned increasingly high-level features about the sample.

For the logistic regression model, the input images were vectorized, so that each pixel becomes one input unit. Each input unit has a corresponding weight that dictates how much of an impact the input unit has on the output. The weights can be used to infer what kind of an input the model would consider ideal. The weights of the logistic regression model, reshaped into a 64 × 64 image, are shown in Figure 4.3. The color coding is the same as it was for the feature maps.

Figure 4.3 The weights of the logistic regression model reshaped into an image.

The weights show that the logistic regression model heavily favors inputs that have strong pixel values at the center of the image or in close proximity to it. The values surrounding this central region are negative. In other words, inputs that have their data points concentrated near the center are preferred. The values outside of these regions appear to be very small. The reason for this is probably that most samples simply did not have any non-zero values in these regions.
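The reshaping and color coding used for the weight image can be sketched in NumPy. This is a minimal illustration with synthetic weights, not the trained weights of the thesis; the function name weights_to_gray and the exact mapping to 8-bit gray levels are assumptions, and the side length of 64 follows the input shape in Program 3.1.

```python
import numpy as np


def weights_to_gray(weights, side=64):
    """Reshape a flat weight vector to side x side and map it to gray levels.

    Zero maps to mid-gray (128), positive weights to lighter shades and
    negative weights to darker shades, scaled by the largest magnitude.
    """
    w = np.asarray(weights, dtype=float).reshape(side, side)
    scale = np.max(np.abs(w))
    if scale == 0:
        return np.full((side, side), 128, dtype=np.uint8)
    return np.round(128 + 127 * w / scale).astype(np.uint8)


# Synthetic weights: a positive center pixel and a negative pixel nearby.
weights = np.zeros(64 * 64)
weights[32 * 64 + 32] = 2.0     # row 32, column 32
weights[32 * 64 + 40] = -2.0    # row 32, column 40
gray = weights_to_gray(weights)
```

Scaling symmetrically around zero keeps untouched weights at the same gray level as the background, which is what makes regions with no data stand out as flat gray in such an image.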

5. CONCLUSIONS

The motivation behind this thesis was to determine whether machine learning, and especially convolutional neural networks, could be used to determine whether a WLAN access point provides accurate data for the purposes of positioning.

The starting point was raw and unlabeled WLAN data. The data was annotated using a program specifically built for the purpose. The annotation method compared two samples at a time over multiple passes over the dataset. The data was pre-processed into images in order to feed it to a machine learning model. Five convolutional neural network models and a logistic regression model were used in the experiments. The experiments were run with the normal dataset and an artificially augmented dataset.

The results suggest that using machine learning to assess the quality of WLAN access points is feasible. The best results were achieved with a convolutional neural network architecture that had five convolutional layers. The differences between the models were, however, rather small. Furthermore, it was observed that logistic regression, a much simpler model, could also achieve results relatively close to those obtained with convolutional neural networks. Data augmentation was found to reduce the error of the models at the expense of a longer training time.

In the future, more work should go towards the annotation process. As it is, the annotation process may produce very different labels for very similar samples of the data. This problem could be alleviated by annotating the same data many times using the current method, or by using a different method altogether. Some, but probably limited, improvements could also be made by further tuning the models and their hyperparameters. In addition, having more data could improve the performance of the model, or at least increase the confidence in the results.

BIBLIOGRAPHY

[1] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, Theano: a CPU and GPU math expression compiler, in Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.

[2] F. Chollet, Keras.

[3] T. Chouard, The Go Files: AI computer wraps up 4-1 victory against human champion, Nature. [Online]. Available: http://dx.doi.org/ /nature

[4] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference and prediction, 2nd ed. Springer.

[5] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016, book in preparation for MIT Press. [Online]. Available: org

[6] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, CoRR, vol. abs/

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012. [Online]. Available: imagenet-classification-with-deep-convolutional-neural-networks.pdf

[8] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, vol. 1, no. 4, Winter 1989.

[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, no. 11, November

[10] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553. [Online]. Available: nature14539

[11] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, J. Astola, M. Carli, and F. Battisti, TID2008: a database for evaluation of full-reference visual quality assessment metrics, Advances of Modern Radioelectronics, vol. 10.

[12] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol. 65, no. 6.

[13] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR, vol. abs/

[14] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol. 15, no. 1, 2014.

[15] S. van der Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: A structure for efficient numerical computation, Computing in Science & Engineering, vol. 13, no. 2, March 2011.

Keras: Handwritten Digit Recognition using MNIST Dataset

Keras: Handwritten Digit Recognition using MNIST Dataset Keras: Handwritten Digit Recognition using MNIST Dataset IIT PATNA February 9, 2017 1 / 24 OUTLINE 1 Introduction Keras: Deep Learning library for Theano and TensorFlow 2 Installing Keras Installation

More information

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Keras: Handwritten Digit Recognition using MNIST Dataset

Keras: Handwritten Digit Recognition using MNIST Dataset Keras: Handwritten Digit Recognition using MNIST Dataset IIT PATNA January 31, 2018 1 / 30 OUTLINE 1 Keras: Introduction 2 Installing Keras 3 Keras: Building, Testing, Improving A Simple Network 2 / 30

More information

Convolutional Neural Networks

Convolutional Neural Networks Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

Deep Learning With Noise

Deep Learning With Noise Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu

More information

Character Recognition Using Convolutional Neural Networks

Character Recognition Using Convolutional Neural Networks Character Recognition Using Convolutional Neural Networks David Bouchain Seminar Statistical Learning Theory University of Ulm, Germany Institute for Neural Information Processing Winter 2006/2007 Abstract

More information

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple

More information

On the Effectiveness of Neural Networks Classifying the MNIST Dataset

On the Effectiveness of Neural Networks Classifying the MNIST Dataset On the Effectiveness of Neural Networks Classifying the MNIST Dataset Carter W. Blum March 2017 1 Abstract Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision.

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD. Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics,

More information

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning, A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3 Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.

More information

Advanced Introduction to Machine Learning, CMU-10715

Advanced Introduction to Machine Learning, CMU-10715 Advanced Introduction to Machine Learning, CMU-10715 Deep Learning Barnabás Póczos, Sept 17 Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Emotion Detection using Deep Belief Networks

Emotion Detection using Deep Belief Networks Emotion Detection using Deep Belief Networks Kevin Terusaki and Vince Stigliani May 9, 2014 Abstract In this paper, we explore the exciting new field of deep learning. Recent discoveries have made it possible

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture

More information

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Lecture 20: Neural Networks for NLP. Zubin Pahuja Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple

More information

INTRODUCTION TO DEEP LEARNING

INTRODUCTION TO DEEP LEARNING INTRODUCTION TO DEEP LEARNING CONTENTS Introduction to deep learning Contents 1. Examples 2. Machine learning 3. Neural networks 4. Deep learning 5. Convolutional neural networks 6. Conclusion 7. Additional

More information

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful

More information

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting

More information

A Quick Guide on Training a neural network using Keras.

A Quick Guide on Training a neural network using Keras. A Quick Guide on Training a neural network using Keras. TensorFlow and Keras Keras Open source High level, less flexible Easy to learn Perfect for quick implementations Starts by François Chollet from

More information

An Introduction to NNs using Keras

An Introduction to NNs using Keras An Introduction to NNs using Keras Michela Paganini michela.paganini@cern.ch Yale University 1 Keras Modular, powerful and intuitive Deep Learning python library built on Theano and TensorFlow Minimalist,

More information

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated

More information

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction

More information

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa

Lecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Convolutional Neural Networks for Handwritten Digit Recognition Andreas Georgopoulos CID: 01281486 Abstract Abstract At this project three different Convolutional Neural Netwroks

More information

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson: neuralnetworksanddeeplearning.com

More information

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun

More information

Learning visual odometry with a convolutional network

Learning visual odometry with a convolutional network Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com

More information

Deep Learning. Architecture Design for. Sargur N. Srihari

Deep Learning. Architecture Design for. Sargur N. Srihari Architecture Design for Deep Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation

More information

Neural Network Neurons

Neural Network Neurons Neural Networks Neural Network Neurons 1 Receives n inputs (plus a bias term) Multiplies each input by its weight Applies activation function to the sum of results Outputs result Activation Functions Given

More information

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies

More information

Notes on Multilayer, Feedforward Neural Networks

Notes on Multilayer, Feedforward Neural Networks Notes on Multilayer, Feedforward Neural Networks CS425/528: Machine Learning Fall 2012 Prepared by: Lynne E. Parker [Material in these notes was gleaned from various sources, including E. Alpaydin s book

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold

More information

SIIM 2017 Scientific Session Analytics & Deep Learning Part 2 Friday, June 2 8:00 am 9:30 am

SIIM 2017 Scientific Session Analytics & Deep Learning Part 2 Friday, June 2 8:00 am 9:30 am SIIM 2017 Scientific Session Analytics & Deep Learning Part 2 Friday, June 2 8:00 am 9:30 am Performance of Deep Convolutional Neural Networks for Classification of Acute Territorial Infarct on Brain MRI:

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
