Multi-Dimensional Recurrent Neural Networks

Alex Graves (1), Santiago Fernández (1), Jürgen Schmidhuber (1,2)

(1) IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
(2) TU Munich, Boltzmannstr. 3, Garching, Munich, Germany
{alex,santiago,juergen}@idsia.ch

Abstract. Recurrent neural networks (RNNs) have proved effective at one-dimensional sequence learning tasks, such as speech and online handwriting recognition. Some of the properties that make RNNs suitable for such tasks, for example robustness to input warping and the ability to access contextual information, are also desirable in multi-dimensional domains. However, there has so far been no direct way of applying RNNs to data with more than one spatio-temporal dimension. This paper introduces multi-dimensional recurrent neural networks, thereby extending the potential applicability of RNNs to vision, video processing, medical imaging and many other areas, while avoiding the scaling problems that have plagued other multi-dimensional models. Experimental results are provided for two image segmentation tasks.

1 Introduction

Recurrent neural networks (RNNs) were originally developed as a way of extending neural networks to sequential data. The addition of recurrent connections allows RNNs to make use of previous context, as well as making them more robust to warping along the time axis than non-recursive models [15]. Access to contextual information and robustness to warping are also important when dealing with multi-dimensional data. For example, a face recognition algorithm should use the entire face as context, and should be robust to changes in perspective, distance etc. It therefore seems desirable to apply RNNs to such tasks. However, the standard RNN architectures are explicitly one-dimensional, meaning that in order to use them for multi-dimensional tasks, the data must be pre-processed to one dimension, for example by presenting one vertical line of an image at a time to the network.

One successful use of neural networks for multi-dimensional data has been the application of convolution networks [10] to image processing tasks such as digit recognition [14]. One disadvantage of convolution nets is that, because they are not recurrent, they rely on hand-specified kernel sizes to introduce context. Another disadvantage is that they do not scale well to large images. For example, sequences of handwritten digits must be pre-segmented into individual characters before they can be recognised by convolution networks [10].

Various statistical models have been proposed for multi-dimensional data, notably multi-dimensional HMMs. However, multi-dimensional HMMs suffer from two serious drawbacks: (1) the time required to run the Viterbi algorithm, and thereby calculate the optimal state sequences, grows exponentially with the size of the data exemplars; (2) the number of transition probabilities, and hence the required memory, grows exponentially with the data dimensionality.

Fig. 1. 2D RNN Forward pass.
Fig. 2. 2D RNN Backward pass.

Numerous approximate methods have been proposed to alleviate one or both of these problems, including pseudo 2D and 3D HMMs [7], isolating elements [11], approximate Viterbi algorithms [9], and dependency tree HMMs [8]. However, none of these methods exploit the full multi-dimensional structure of the data.

As we will see, multi-dimensional recurrent neural networks (MDRNNs) bring the benefits of RNNs to multi-dimensional data, without suffering from the scaling problems described above. Section 2 describes the MDRNN architecture, Section 3 presents two experiments on image segmentation, and concluding remarks are given in Section 4.

2 Multi-Dimensional Recurrent Neural Networks

The basic idea of MDRNNs is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data. During the forward pass, at each point in the data sequence, the hidden layer of the network receives both an external input and its own activations from one step back along all dimensions. Figure 1 illustrates the two-dimensional case.

Note that, although the word sequence usually denotes one-dimensional data, we will use it to refer to data exemplars of any dimensionality. For example, an image is a two-dimensional sequence, a video is a three-dimensional sequence, and a series of fMRI brain scans is a four-dimensional sequence.

Clearly, the data must be processed in such a way that, when the network reaches a point in an n-dimensional sequence, it has already passed through all the points from which it will receive its previous activations. This can be ensured by following a suitable ordering on the set of points {(x_1, x_2, ..., x_n)}. One example of a suitable ordering is (x_1, ..., x_n) < (x'_1, ..., x'_n) if there exists m in (1, ..., n) such that x_m < x'_m and x_i = x'_i for all i in (1, ..., m-1). Note that this is not the only possible ordering, and that its realisation for a particular sequence depends on an arbitrary choice of axes. We will return to this point in Section 2.1. Figure 3 illustrates the ordering for a 2-dimensional sequence.

The forward pass of an MDRNN can then be carried out by feeding forward the input and the n previous hidden layer activations at each point in the ordered input sequence, and storing the resulting hidden layer activations. Care must be taken at the sequence boundaries not to feed forward activations from points outside the sequence.

Note that the points in the input sequence will in general be multivalued vectors. For example, in a two-dimensional colour image, the inputs could be single pixels represented by RGB triples, or blocks of pixels, or the outputs of a preprocessing method such as a discrete cosine transform.
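To make the ordering concrete, the following minimal Python sketch (an illustration added here, not part of the original presentation; the names lexicographic_less and scan_order are ours) compares two points under this ordering and generates the scan order for a small 2D sequence, so that (i-1, j) and (i, j-1) are always visited before (i, j).

from itertools import product

def lexicographic_less(p, q):
    """Return True if point p precedes point q in the ordering
    (x_1,...,x_n) < (x'_1,...,x'_n) used for the MDRNN forward pass."""
    for a, b in zip(p, q):
        if a != b:
            return a < b
    return False  # equal points do not precede each other

def scan_order(dims):
    """Yield every point of an n-dimensional sequence with dimensions
    (X_1,...,X_n) in the order in which the forward pass visits them."""
    yield from product(*(range(X) for X in dims))

# For a 2 x 3 sequence, each point is visited after the points it
# receives recurrent activations from, e.g. (1,1) after (0,1) and (1,0).
order = list(scan_order((2, 3)))
assert order.index((0, 1)) < order.index((1, 1))
assert order.index((1, 0)) < order.index((1, 1))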

Fig. 3. 2D sequence ordering. The MDRNN forward pass starts at the origin and follows the direction of the arrows. The point (i,j) is never reached before both (i-1,j) and (i,j-1).

The error gradient of an MDRNN (that is, the derivative of some objective function with respect to the network weights) can be calculated with an n-dimensional extension of backpropagation through time (BPTT) [15]. As with one-dimensional BPTT, the sequence is processed in the reverse order of the forward pass. At each timestep, the hidden layer receives both the output error derivatives and its own n 'future' derivatives. Figure 2 illustrates the BPTT backward pass for two dimensions. Again, care must be taken at the sequence boundaries.

At a point x = (x_1, ..., x_n) in an n-dimensional sequence, define i_j^x and h_k^x respectively as the activations of the j-th input unit and the k-th hidden unit. Define w_kj as the weight of the connection going from unit j to unit k. Then for an n-dimensional MDRNN whose hidden layer consists of H summation units, where θ_k is the activation function of hidden unit k, the forward pass for a sequence with dimensions (X_1, X_2, ..., X_n) can be summarised as follows:

for x_1 = 0 to X_1 - 1 do
 for x_2 = 0 to X_2 - 1 do
  ...
   for x_n = 0 to X_n - 1 do
    for k = 1 to H do
     a_k = Σ_j i_j^(x_1,...,x_n) w_kj
     for i = 1 to n do
      if x_i > 0 then
       a_k += Σ_j h_j^(x_1,...,x_i - 1,...,x_n) w_kj
     h_k^x = θ_k(a_k)

Algorithm 1: MDRNN Forward Pass
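As a companion to Algorithm 1, the following NumPy sketch (an illustration added here, assuming tanh hidden units and weight matrices W_in, W_rec1, W_rec2 supplied by the caller) implements the forward pass for the two-dimensional case.

import numpy as np

def mdrnn_forward_2d(inputs, W_in, W_rec1, W_rec2):
    """2D MDRNN forward pass (Algorithm 1 with n = 2).

    inputs: (X1, X2, I) input sequence, e.g. an image with I channels.
    W_in:   (H, I) input-to-hidden weights.
    W_rec1: (H, H) recurrent weights along dimension 1.
    W_rec2: (H, H) recurrent weights along dimension 2.
    Returns hidden activations of shape (X1, X2, H).
    """
    X1, X2, _ = inputs.shape
    H = W_in.shape[0]
    h = np.zeros((X1, X2, H))
    for x1 in range(X1):            # points visited in scan order, so
        for x2 in range(X2):        # (x1-1,x2) and (x1,x2-1) are already done
            a = W_in @ inputs[x1, x2]
            if x1 > 0:
                a += W_rec1 @ h[x1 - 1, x2]
            if x2 > 0:
                a += W_rec2 @ h[x1, x2 - 1]
            h[x1, x2] = np.tanh(a)  # theta_k applied elementwise
    return h

# Toy usage: a 5x4 "image" with 3 input channels and 8 hidden units.
rng = np.random.default_rng(0)
I, H = 3, 8
image = rng.standard_normal((5, 4, I))
hidden = mdrnn_forward_2d(image,
                          0.1 * rng.standard_normal((H, I)),
                          0.1 * rng.standard_normal((H, H)),
                          0.1 * rng.standard_normal((H, H)))
print(hidden.shape)  # (5, 4, 8)

The same structure generalises to n dimensions by adding one recurrent weight matrix and one boundary check per dimension.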

Defining ô_j^x and ĥ_k^x respectively as the derivatives of the objective function with respect to the activations of the j-th output unit and the k-th hidden unit at point x, the backward pass is:

for x_1 = X_1 - 1 to 0 do
 for x_2 = X_2 - 1 to 0 do
  ...
   for x_n = X_n - 1 to 0 do
    for k = 1 to H do
     e_k = Σ_j ô_j^(x_1,...,x_n) w_jk
     for i = 1 to n do
      if x_i < X_i - 1 then
       e_k += Σ_j ĥ_j^(x_1,...,x_i + 1,...,x_n) w_jk
     ĥ_k^x = θ'_k(e_k)

Algorithm 2: MDRNN Backward Pass

Fig. 4. Context available at (i,j) to a 2D RNN with a single hidden layer.
Fig. 5. Context available at (i,j) to a multi-directional 2D RNN.

Since the forward and backward pass require one pass each through the data sequence, the overall complexity of MDRNN training is linear in the number of data points and the number of network weights.
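A matching NumPy sketch of the backward recursion of Algorithm 2, again for the two-dimensional case and under the same assumptions as the forward sketch above (the derivative of the tanh activation is written as 1 - h^2):

import numpy as np

def mdrnn_backward_2d(h, grad_h, W_rec1, W_rec2):
    """Backward recursion for the 2D forward sketch above.

    h:      (X1, X2, H) hidden activations from the forward pass (tanh units).
    grad_h: (X1, X2, H) derivatives of the objective with respect to the
            hidden activations, coming directly from the output layer.
    Returns delta of shape (X1, X2, H): derivatives with respect to the
    hidden pre-activations, visiting points in reverse scan order.
    """
    X1, X2, H = h.shape
    delta = np.zeros((X1, X2, H))
    for x1 in reversed(range(X1)):
        for x2 in reversed(range(X2)):
            e = grad_h[x1, x2].copy()
            if x1 < X1 - 1:
                e += W_rec1.T @ delta[x1 + 1, x2]
            if x2 < X2 - 1:
                e += W_rec2.T @ delta[x1, x2 + 1]
            delta[x1, x2] = (1.0 - h[x1, x2] ** 2) * e  # tanh derivative
    return delta

The weight gradients are then obtained by accumulating, over all points, the outer products of these deltas with the corresponding inputs and previous hidden activations.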

2.1 Multi-Directional MDRNNs

At a point (x_1, ..., x_n) in the input sequence, the network described above has access to all points (x'_1, ..., x'_n) such that x'_i <= x_i for all i in (1, ..., n). This defines an n-dimensional context region of the full sequence, as illustrated in Figure 4. For some tasks, such as object recognition, this would in principle be sufficient. The network could process the image as usual, and output the object label at a point when the object to be recognised is entirely contained in the context region.

Intuitively, however, we would prefer the network to have access to the surrounding context in all directions. This is particularly true for tasks where precise localisation is required, such as image segmentation.

For one-dimensional RNNs, the problem of multi-directional context was solved by the introduction of bidirectional recurrent neural networks (BRNNs) [13]. BRNNs contain two separate hidden layers that process the input sequence in the forward and reverse directions. The two hidden layers are connected to a single output layer, thereby providing the network with access to both past and future context.

BRNNs can be extended to n-dimensional data by using 2^n separate hidden layers, each of which processes the sequence using the ordering defined above, but with a different choice of axes. The axes are chosen so that each one has its origin on a distinct vertex of the sequence. The 2-dimensional case is illustrated in Figure 6. As before, the hidden layers are connected to a single output layer, which now has access to all surrounding context (see Figure 5).

Fig. 6. Axes used by the 4 hidden layers in a multi-directional 2D network. The arrows inside the rectangle indicate the direction of propagation during the forward pass.

If the size of the hidden layers is held constant, the multi-directional MDRNN architecture scales as O(2^n) for n-dimensional data. In practice, however, the computing power of the network tends to be governed by the overall number of weights, rather than the size of the hidden layers, presumably because the data processing is shared between the layers. Since the complexity of the algorithm is linear in the number of parameters, the O(2^n) scaling factor can be removed by simply using smaller hidden layers for higher dimensions. Furthermore, the complexity of a task, and therefore the number of weights likely to be needed for it, does not necessarily increase with the dimensionality of the data. For example, both the networks described in this paper have fewer than half the weights of the one-dimensional networks we have previously applied to speech recognition [4,3]. For a given task, we have also found that using a multi-directional MDRNN gives better results than a uni-directional MDRNN with the same overall number of weights, as previously demonstrated in one dimension [4].

For a multi-directional MDRNN, the forward and backward passes through an n-dimensional sequence can be summarised as follows:

1: For each of the 2^n hidden layers, choose a distinct vertex of the sequence, then define a set of axes such that the vertex is the origin and all sequence co-ordinates are >= 0
2: Repeat Algorithm 1 for each hidden layer
3: At each point in the sequence, feed forward all hidden layers to the output layer

Algorithm 3: Multi-Directional MDRNN Forward Pass
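As an illustration of Algorithm 3 in two dimensions (Algorithm 4, the corresponding backward pass, follows below), the sketch here is an addition of ours: it reuses the hypothetical mdrnn_forward_2d from the earlier forward-pass sketch, runs four hidden layers over axis-flipped copies of the input, and combines them at a linear output layer.

import numpy as np

def multidir_forward_2d(inputs, layers, W_out, b_out):
    """Multi-directional 2D forward pass (Algorithm 3 with n = 2).

    inputs: (X1, X2, I) input sequence.
    layers: list of 4 parameter tuples (W_in, W_rec1, W_rec2), one per
            scan direction, each run by mdrnn_forward_2d (defined earlier).
    W_out:  (C, 4*H) hidden-to-output weights; b_out: (C,) output biases.
    Returns output activations of shape (X1, X2, C).
    """
    # The 4 choices of axes correspond to flipping the sequence along
    # neither, the first, the second, or both dimensions.
    flips = [(), (0,), (1,), (0, 1)]
    hidden = []
    for flip, (W_in, W_rec1, W_rec2) in zip(flips, layers):
        x = np.flip(inputs, axis=flip) if flip else inputs
        h = mdrnn_forward_2d(x, W_in, W_rec1, W_rec2)
        # Undo the flip so every layer's activations are indexed by the
        # original co-ordinates before feeding the output layer.
        hidden.append(np.flip(h, axis=flip) if flip else h)
    h_all = np.concatenate(hidden, axis=-1)   # (X1, X2, 4*H)
    return h_all @ W_out.T + b_out            # (X1, X2, C)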

1: At each point in the sequence, calculate the derivative of the objective function with respect to the activations of the output layer
2: With the same axes as above, repeat Algorithm 2 for each hidden layer

Algorithm 4: Multi-Directional MDRNN Backward Pass

2.2 Multi-Dimensional Long Short-Term Memory

So far we have implicitly assumed that the network can make use of all context to which it has access. For standard RNN architectures, however, the range of context that can practically be used is limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. This is usually referred to as the vanishing gradient problem [5].

Long Short-Term Memory (LSTM) [6,2] is an RNN architecture specifically designed to address the vanishing gradient problem. An LSTM hidden layer consists of multiple recurrently connected subnets, known as memory blocks. Each block contains a set of internal units, known as cells, whose activation is controlled by three multiplicative units: the input gate, forget gate and output gate. The effect of the gates is to allow the cells to store and access information over long periods of time, thereby avoiding the vanishing gradient problem.

The standard formulation of LSTM is explicitly one-dimensional, since the cell contains a single self connection, whose activation is controlled by a single forget gate. However, we can easily extend this to n dimensions by using instead n self connections (one for each of the cell's previous states along every dimension) with n forget gates.
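A minimal NumPy sketch of this n-dimensional LSTM cell for n = 2 (an illustration added here: the gate weight names Wi, Wf1, Wf2, Wo, Wc are assumptions, and the peephole connections used in the experiments below are omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def md_lstm_cell_2d(x, h_prev1, h_prev2, s_prev1, s_prev2, p):
    """One 2D LSTM step at a single point: two previous states, one per
    dimension, and therefore two forget gates.

    x:         (I,) input at the current point.
    h_prev1/2: (H,) hidden activations one step back along dims 1 and 2
               (zeros at the sequence boundary).
    s_prev1/2: (H,) cell states one step back along dims 1 and 2.
    p: dict of weight matrices of shape (H, I + 2*H): 'Wi' (input gate),
       'Wf1', 'Wf2' (forget gates), 'Wo' (output gate), 'Wc' (cell input).
    Returns (h, s): hidden activation and cell state at the current point.
    """
    z = np.concatenate([x, h_prev1, h_prev2])
    i  = sigmoid(p['Wi']  @ z)             # input gate
    f1 = sigmoid(p['Wf1'] @ z)             # forget gate for dimension 1
    f2 = sigmoid(p['Wf2'] @ z)             # forget gate for dimension 2
    o  = sigmoid(p['Wo']  @ z)             # output gate
    g  = np.tanh(p['Wc']  @ z)             # cell input
    s  = f1 * s_prev1 + f2 * s_prev2 + i * g   # one forget gate per dimension
    h  = o * np.tanh(s)
    return h, s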

3 Experiments

3.1 Air Freight Data

The Air Freight database [12] is a ray-traced colour image sequence that comes with a ground truth segmentation based on textural characteristics (Figure 7). The sequence is 455 frames long and contains 155 distinct textures. Each frame is 120 pixels high and 160 pixels wide. The advantage of ray-traced data is that the true segmentation can be defined directly from the 3D models. Although the images are not real, they are realistic in the sense that they have significant lighting, specular effects etc.

Fig. 7. Frame from the Air Freight database, showing the original image (left) and the colour-coded texture segmentation (right).

We used the sequence to define a 2D image segmentation task, where the aim was to assign each pixel in the input data to the correct texture class. We divided the data at random into a 250 frame train set, a 150 frame test set and a 55 frame validation set. Note that we could have instead defined a 3D task where the network processed the entire video as one sequence. However, this would have left us with only one exemplar.

For this task we used a multi-directional MDRNN with 4 LSTM hidden layers. Each layer consisted of 25 memory blocks, each containing 1 cell, 2 forget gates, 1 input gate, 1 output gate and 5 peephole weights. This gave a total of 600 hidden units. The input and output activation functions of the cells were both tanh, and the activation function for the gates was the logistic sigmoid in the range [0, 1]. The input layer was size 3 (one unit each for the red, green and blue components of the pixels) and the output layer was size 155 (one unit for each texture class). The network contained 43,257 trainable weights in total. As is standard for classification tasks, the softmax activation function was used at the output layer, with the cross-entropy objective function [1]. The network was trained using online gradient descent (weight updates after every training sequence) with a learning rate of 10^-6 and a momentum of 0.9.

The final pixel classification error rate was 7.1% on the test set.

3.2 MNIST Data

The MNIST database [10] of isolated handwritten digits is a subset of a larger database available from NIST. It consists of size-normalized, centered images, each of which is 28 pixels high and 28 pixels wide and contains a single handwritten digit. The data comes divided into a training set with 60,000 images and a test set with 10,000 images. We used 10,000 of the training images for validation, leaving 50,000 for training.

The usual task on MNIST is to label the images with the corresponding digits. This is a benchmark task for which many algorithms have been evaluated. We carried out a slightly modified task where each pixel was classified according to the digit it belonged to, with an additional class for background pixels. However, the original task can be recovered by simply choosing the digit whose corresponding output unit had the highest cumulative activation over the entire sequence.

To test the network's robustness to input warping, we also evaluated it on an altered version of the MNIST test set, where elastic deformations had been applied to every image (see Figure 8). We compared our results with the convolution neural network that has achieved the best results so far on MNIST [14]. Note that we re-implemented the convolution network ourselves, and that we did not augment the training set with elastic distortions, which gives a substantial improvement in performance.

The MDRNN for this task was identical to that for the Air Freight task with the following exceptions: the sizes of the input and output layers were now 1 (for grayscale pixels) and 11 (one for each digit, plus background) respectively, giving 27,511 weights in total, and the gradient descent learning rate was 10^-5.
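For illustration, the recovery of the image-level digit label from the per-pixel outputs mentioned above can be sketched as follows (an addition of ours; the assumption that the background class occupies the last output index is ours as well):

import numpy as np

def image_label_from_pixel_outputs(outputs):
    """outputs: (H, W, 11) per-pixel class activations, with index 10
    assumed to be the background class. Returns the digit 0-9 with the
    highest cumulative activation over the whole image."""
    cumulative = outputs.reshape(-1, outputs.shape[-1]).sum(axis=0)
    return int(np.argmax(cumulative[:10]))  # ignore the background class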

Fig. 8. MNIST image before and after deformation.

For the distorted test set, we used the same degree of elastic deformation used by Simard [14] to augment the training set (σ = 4.0, α = 34.0), with a different initial random field for every sample image.

Table 1 shows that the MDRNN matched the convolution net on the clean test set, and was considerably better on the warped test set. This suggests that MDRNNs are more robust to input warping than convolution networks.

Table 1. Image error rates on MNIST (pixel error rates in brackets)

Algorithm      Clean Test Set     Warped Test Set
MDRNN          0.9 % (0.4 %)      7.1 % (3.8 %)
Convolution    0.9 %              11.3 %

3.3 Analysis

One benefit of two-dimensional tasks is that the operation of the network can be easily visualised. Figure 9 shows the network activations during a frame from the Air Freight database. As can be seen, the network segments this image almost perfectly, in spite of difficult, reflective surfaces such as the glass and metal tube running from left to right. Clearly, classifying individual pixels in such a surface requires the use of context.

Fig. 9. 2D MDRNN applied to an image from the Air Freight database. The hidden layer activations display one unit from each of the layers. A common behaviour is to mask off parts of the image, exhibited here by layers 2 and 3.

An indication of the network's sensitivity to context can be found by analysing the derivatives of the network outputs at a particular point in the sequence with respect to the inputs at all other points. Collectively, we refer to these derivatives as the sequential Jacobian. Figure 10 shows the absolute value of the sequential Jacobian of an output during classification of an image from the MNIST database. It can be seen that the network responds to context from across the entire image, and seems particularly attuned to the outline of the digit.

Fig. 10. Sequential Jacobian of a 2D RNN for an image from the MNIST database. The white outputs correspond to the class 'background' and the light grey ones to the class '8'. The black outputs represent misclassifications. The output pixel for which the Jacobian is calculated is marked with a cross. Absolute values are plotted for the Jacobian, and lighter colours are used for higher values.
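As an illustration, the sequential Jacobian for a single output unit can be approximated by central finite differences (a slow but simple sketch added here; model stands for any callable mapping an input image of shape (H, W, C) to per-pixel output activations of shape (H, W, K)):

import numpy as np

def sequential_jacobian(model, image, out_pos, out_unit, eps=1e-4):
    """Approximate d output[out_pos, out_unit] / d input[y, x, c] for every
    input element by central finite differences. Returns an array with the
    same shape as `image`; in practice one would plot its absolute value
    summed over channels, as in Figure 10."""
    image = np.asarray(image, dtype=float)
    jac = np.zeros_like(image)
    it = np.nditer(image, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        plus, minus = image.copy(), image.copy()
        plus[idx] += eps
        minus[idx] -= eps
        jac[idx] = (model(plus)[out_pos + (out_unit,)]
                    - model(minus)[out_pos + (out_unit,)]) / (2 * eps)
    return jac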

4 Conclusion

We have introduced multi-dimensional recurrent neural networks (MDRNNs), thereby extending the applicability of RNNs to n-dimensional data. We have added multi-directional hidden layers that provide the network with access to all contextual information, and we have developed a multi-dimensional variant of the Long Short-Term Memory RNN architecture. We have tested MDRNNs on two image segmentation tasks, and found that they were more robust to input warping than a state-of-the-art digit recognition algorithm.

Acknowledgments

This research was funded by SNF grant /1.

References

1. J. S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogleman-Soulie and J. Herault, editors, Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag.
2. F. Gers, N. Schraudolph, and J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3, 2002.
3. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the International Conference on Machine Learning, ICML 2006, Pittsburgh, USA, 2006.

4. A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6), 2005.
5. S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
6. S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8), 1997.
7. F. Hülsken, F. Wallhoff, and G. Rigoll. Facial expression recognition with pseudo-3D hidden Markov models. In Proceedings of the 23rd DAGM-Symposium on Pattern Recognition, London, UK. Springer-Verlag.
8. J. Jiten, B. Mérialdo, and B. Huet. Multi-dimensional dependency-tree hidden Markov models. In ICASSP 2006, 31st IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, May 2006.
9. D. Joshi, J. Li, and J. Z. Wang. Parameter estimation of multi-dimensional hidden Markov models: A scalable approach.
10. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), November 1998.
11. J. Li, A. Najmi, and R. M. Gray. Image classification by a two-dimensional hidden Markov model. IEEE Transactions on Signal Processing, 48(2).
12. G. McCarter and A. Storkey. Air Freight Image Segmentation Database. homepages.inf.ed.ac.uk/amos/afreightdata.html.
13. M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45, November 1997.
14. P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR '03: Proceedings of the Seventh International Conference on Document Analysis and Recognition, page 958, Washington, DC, USA, 2003. IEEE Computer Society.
15. R. J. Williams and D. Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Y. Chauvin and D. E. Rumelhart, editors, Backpropagation: Theory, Architectures and Applications. Lawrence Erlbaum Publishers, Hillsdale, N.J., 1995.
