Speech Recognition with Quaternion Neural Networks
1 Speech Recognition with Quaternion Neural Networks
Titouan Parcollet, Mohamed Morchid, Georges Linarès
LIA, University of Avignon, France; ORKIS, France
2 Summary
I. Problem definition
II. Quaternion numbers
III. Quaternion neural networks
IV. Experiments and discussions
6 Problem definition
Established fact #1: the bigger the model, the better the results (given a good training procedure and enough data). But is the model really efficient?
Established fact #2: input features are often multidimensional. Is the usual flat real-valued representation a good one?
11 Problem definition
Can we define a more natural representation of multidimensional input features than the real-valued one, one that helps neural networks to be more efficient?
12 Quaternion numbers
Q = r1 + xi + yj + zk, with real part r1 and imaginary part xi + yj + zk.
Quaternions solve the multidimensionality problem!
17 Acoustic quaternion for speech processing: a purely imaginary quaternion built from MFCC or mel-filter-bank coefficients e(f,t) and their first and second order time derivatives:
Q(f,t) = 0 + e(f,t)i + (∂e(f,t)/∂t)j + (∂²e(f,t)/∂t²)k
21 Pixel quaternion for image processing: a purely imaginary quaternion
Q(p) = 0 + Red(p)i + Green(p)j + Blue(p)k
22 Quaternion numbers: the Hamilton product
The Hamilton product relates the components of the two operands to each other.
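As a concrete illustration, the Hamilton product can be written out in a few lines of plain Python (a minimal sketch, not the authors' implementation; the function name is mine):

```python
def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples.

    Unlike real or complex multiplication, the result depends on operand
    order: q*p != p*q in general."""
    qr, qx, qy, qz = q
    pr, px, py, pz = p
    return (qr*pr - qx*px - qy*py - qz*pz,
            qr*px + qx*pr + qy*pz - qz*py,
            qr*py - qx*pz + qy*pr + qz*px,
            qr*pz + qx*py - qy*px + qz*pr)

# The defining identities i^2 = j^2 = k^2 = -1 and ij = k follow:
i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
print(hamilton_product(i, i))  # i*i = -1
print(hamilton_product(i, j))  # i*j = k
print(hamilton_product(j, i))  # j*i = -k: non-commutative
```

The cross terms (e.g. qy*pz appearing in the i component) are what tie every component of one operand to every component of the other.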
24 Quaternion numbers: Hamilton product in neural networks
Real-valued layer: fully connecting 4 inputs (r, i, j, k) to 4 outputs (r, i, j, k) takes 4 x 4 = 16 weights.
Quaternion-valued layer: a single quaternion weight W connects a quaternion input X to a quaternion output through the Hamilton product:
W ⊗ X = (w_r x_r - w_x x_x - w_y x_y - w_z x_z)
      + (w_r x_x + w_x x_r + w_y x_z - w_z x_y)i
      + (w_r x_y - w_x x_z + w_y x_r + w_z x_x)j
      + (w_r x_z + w_x x_y - w_y x_x + w_z x_r)k
1 weight = 4 parameters
31 Quaternion numbers: Hamilton product in neural networks
Quaternions let the network learn internal relations within input features, and they reduce the number of neural parameters!
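The parameter saving is easy to verify by counting. The sketch below (illustrative only; variable names are mine) writes out the real 4x4 matrix that one quaternion weight induces through the Hamilton product: all 16 entries reuse the same 4 parameters, up to sign.

```python
# A real-valued dense connection from 4 inputs to 4 outputs stores 16 weights.
real_params = 4 * 4

# A quaternion connection stores one quaternion weight W = (w_r, w_x, w_y, w_z):
# 4 parameters, reused by the Hamilton product across the whole 4x4 mapping.
w_r, w_x, w_y, w_z = 0.5, -0.1, 0.3, 0.2
quaternion_params = 4

# Real matrix equivalent to "multiply by W on the left", acting on the
# input vector (x_r, x_x, x_y, x_z): every entry is one of the same
# 4 parameters, up to sign.
W_matrix = [
    [w_r, -w_x, -w_y, -w_z],
    [w_x,  w_r, -w_z,  w_y],
    [w_y,  w_z,  w_r, -w_x],
    [w_z, -w_y,  w_x,  w_r],
]

print(real_params // quaternion_params)  # 4x fewer parameters
```

Row 1 of `W_matrix` times the input reproduces the real part of W ⊗ X, row 2 the i part, and so on, which is where the "1 weight = 4 parameters" saving comes from.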
32 Quaternion Neural Networks (QNN)
QNN = NN with all parameters being quaternions
QNN = NN with the Hamilton product replacing the real-valued dot product
QNN backpropagation and weight updates differ from the real-valued NN ones [1]
[1] P. Arena, L. Fortuna, L. Occhipinti, and M. G. Xibilia, "Neural networks for quaternion-valued function approximation," in Circuits and Systems, ISCAS '94, 1994 IEEE International Symposium on, vol. 6, IEEE, 1994.
36 Quaternion Neural Networks (QNN)
Activation function: the "split" approach [1]
Q = r1 + xi + yj + zk
f(Q) = f(r) + f(x)i + f(y)j + f(z)k
The function f can be any real-valued activation function: Sigmoid, TanH, ReLU, ELU.
[1] P. Arena, L. Fortuna, L. Occhipinti, and M. G. Xibilia, "Neural networks for quaternion-valued function approximation," in Circuits and Systems, ISCAS '94, 1994 IEEE International Symposium on, vol. 6, IEEE, 1994.
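The split approach is a one-liner to sketch: apply the same real activation to each of the four components independently (illustrative plain Python; the names are mine):

```python
import math

def split_activation(q, f):
    """Apply a real-valued activation f component-wise to a quaternion (r, x, y, z)."""
    r, x, y, z = q
    return (f(r), f(x), f(y), f(z))

relu = lambda v: max(0.0, v)
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

q = (1.5, -2.0, 0.0, 3.0)
print(split_activation(q, relu))  # (1.5, 0.0, 0.0, 3.0)
```

Any of the listed real activations (Sigmoid, TanH, ReLU, ELU) can be dropped in as `f` unchanged.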
38 Quaternion Neural Networks (QNN)
Neural parameters initialization
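A quaternion-aware initialization, in the Glorot-style polar-form scheme described in the authors' "Quaternion Recurrent Neural Networks" paper, can be sketched as follows (a sketch under my reading of that paper, not the reference code; variable names are mine):

```python
import math, random

def quaternion_init(n_in, n_out, rng=random.Random(0)):
    """One quaternion weight: draw a magnitude, an angle, and a random unit
    purely imaginary quaternion, then assemble the four components."""
    sigma = 1.0 / math.sqrt(2.0 * (n_in + n_out))
    phi = rng.uniform(-sigma, sigma)        # magnitude
    theta = rng.uniform(-math.pi, math.pi)  # angle
    # random unit purely imaginary quaternion u = u_x i + u_y j + u_z k
    ux, uy, uz = (rng.uniform(-1.0, 1.0) for _ in range(3))
    norm = math.sqrt(ux*ux + uy*uy + uz*uz) or 1.0
    ux, uy, uz = ux/norm, uy/norm, uz/norm
    return (phi * math.cos(theta),
            phi * ux * math.sin(theta),
            phi * uy * math.sin(theta),
            phi * uz * math.sin(theta))

w = quaternion_init(256, 256)
# |w| equals |phi| by construction, so it stays within the Glorot bound sigma.
print(math.sqrt(sum(c * c for c in w)))
```

The point of initializing in polar form is that the four components of a weight are drawn as one quaternion rather than as four independent real numbers.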
41 Experiments and discussions
42 Experiments and discussions: neural networks reminder
- Convolutional neural networks (CNN)
- Recurrent neural networks (RNN)
- Long short-term memory recurrent neural networks (LSTM)
46 Speech recognition tasks: where are we using neural networks?
An (overly simplified) automatic speech recognition (ASR) system.
49 Acoustic modelling: speech recognition tasks

End-to-end
- TIMIT: 462-speaker training set, 50-speaker validation set, 192 sentences as a core test set; the SA records are removed from the training
- Model: Q-Convolutional Neural Network + CTC [2]

Traditional HMM
- TIMIT: same split as above
- Wall Street Journal (WSJ): 14h and 81h training sets; test-dev93 used as a validation set; test-eval92 used as a test set
- Models: Q-Recurrent Neural Networks (QRNN), Q-Long Short-Term Memory NN (QLSTM)

[2] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006.
54 Acoustic features
- Real-valued baseline: 40 mel-filter-banks + Δ + ΔΔ + ΔΔΔ = 160 real-valued inputs
- Acoustic quaternions: 40 MFCC + Δ + ΔΔ (real part set to 0) = 40 quaternion-valued inputs
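Packing the coefficients into purely imaginary acoustic quaternions can be sketched as follows (plain Python with hypothetical names; a real pipeline would compute the MFCCs and their derivatives with a toolkit such as Kaldi):

```python
def pack_acoustic_quaternions(mfcc, delta, delta2):
    """Turn per-frame MFCCs and their first/second derivatives into purely
    imaginary quaternions Q(f) = 0 + mfcc[f] i + delta[f] j + delta2[f] k."""
    assert len(mfcc) == len(delta) == len(delta2)
    return [(0.0, m, d, d2) for m, d, d2 in zip(mfcc, delta, delta2)]

# 3 streams of 40 real coefficients -> 40 quaternion-valued inputs per frame
mfcc, delta, delta2 = [0.1] * 40, [0.2] * 40, [0.3] * 40
quats = pack_acoustic_quaternions(mfcc, delta, delta2)
print(len(quats))  # 40
print(quats[0])    # (0.0, 0.1, 0.2, 0.3)
```

Each quaternion thus carries one frequency bin and its temporal context together, instead of scattering them across a flat 120- or 160-dimensional real vector.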
56 End-to-end results on TIMIT: QCNN + CTC (model architectures will be discussed during the questions! ;))
Error expressed in Phoneme Error Rate (PER %); FM = feature maps. 4x fewer parameters!
60 No more end-to-end results
61 HMM on TIMIT with PyTorch-Kaldi: QRNN
Error expressed in Phoneme Error Rate (PER %). 2.5x fewer parameters!
65 HMM on TIMIT with PyTorch-Kaldi: QLSTM
Error expressed in Phoneme Error Rate (PER %). 3.2x fewer parameters!
69 HMM on WSJ with PyTorch-Kaldi: QLSTM
Error expressed in Word Error Rate (WER %).
73 Conclusion
Can we define a more natural representation of multidimensional input features than the real-valued one, one that helps neural networks to be more efficient?
- Quaternions give a better and more natural representation of multidimensional features
- The Hamilton product within neural networks lets QNNs learn both internal and contextual dependencies well
- They reduce the number of free parameters
Y E S W E C A N
78 Resources
Related to this presentation:
- "Quaternion Recurrent Neural Networks", ICLR 2019, Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio
- "Speech Recognition with Quaternion Neural Networks", NIPS (NeurIPS) IRASL, Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato De Mori
- "Quaternion Convolutional Neural Networks for End-to-End Speech Recognition", Interspeech 2018 oral session on "End-to-End ASR", Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio
- "Bidirectional Quaternion Long Short-Term Memory Recurrent Neural Networks for Speech Recognition", submitted to ICASSP 2019, Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori
- "The Pytorch-Kaldi Speech Recognition Toolkit", submitted to ICASSP 2019, Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
QNN with PyTorch and Keras:
PyTorch-Kaldi:
79 Thank you! Questions?
*Eve follows a rotation described by a unit quaternion around Wall-e
80 Quaternion numbers Hamilton product in neural networks
81 Quaternion convolution
82 Computations
W ⊗ X = (w_r x_r - w_x x_x - w_y x_y - w_z x_z)
      + (w_r x_x + w_x x_r + w_y x_z - w_z x_y)i
      + (w_r x_y - w_x x_z + w_y x_r + w_z x_x)j
      + (w_r x_z + w_x x_y - w_y x_x + w_z x_r)k
28 operations that should be computed in parallel with CUDA and GPUs
83 Quaternion equations
Matrix representation:
    [ r  -x  -y  -z ]
Q = [ x   r  -z   y ]
    [ y   z   r  -x ]
    [ z  -y   x   r ]
Conjugate: Q* = r1 - xi - yj - zk
Norm: |Q| = sqrt(r² + x² + y² + z²); normalized quaternion: Q / |Q|
Polar form: Q = |Q| e^{nθ} = |Q| (cos(θ) + n sin(θ)), with n = (xi + yj + zk) / (|Q| sin(θ))
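These identities are easy to check numerically: multiplying Q by its conjugate under the Hamilton product lands on the real axis at |Q|². A plain Python sketch (function names are mine):

```python
import math

def conjugate(q):
    r, x, y, z = q
    return (r, -x, -y, -z)

def norm(q):
    return math.sqrt(sum(c * c for c in q))

def hamilton(q, p):
    qr, qx, qy, qz = q
    pr, px, py, pz = p
    return (qr*pr - qx*px - qy*py - qz*pz,
            qr*px + qx*pr + qy*pz - qz*py,
            qr*py - qx*pz + qy*pr + qz*px,
            qr*pz + qx*py - qy*px + qz*pr)

q = (1.0, 2.0, 3.0, 4.0)
print(hamilton(q, conjugate(q)))  # (30.0, 0.0, 0.0, 0.0): purely real, = |Q|^2
print(norm(q) ** 2)               # ~30.0
```

The same conjugate is what implements rotations (as in the Wall-e example): a unit quaternion u rotates a purely imaginary p via u ⊗ p ⊗ u*.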
84 Connectionist Temporal Classification [2]
A. Hannun, "Sequence Modeling with CTC", Distill.
[2] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006.
85 Learning Internal relations with QCNN
More informationComparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition
2012 International Conference on Frontiers in Handwriting Recognition Comparison of Bernoulli and Gaussian HMMs using a vertical repositioning technique for off-line handwriting recognition Patrick Doetsch,
More informationSPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION
Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing
More informationPair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification 2 1 Xugang Lu 1, Peng Shen 1, Yu Tsao 2, Hisashi
More informationTemporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks Alberto Montes al.montes.gomez@gmail.com Santiago Pascual TALP Research Center santiago.pascual@tsc.upc.edu Amaia Salvador
More informationESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA
ESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA Song Han 1,2, Junlong Kang 2, Huizi Mao 1, Yiming Hu 3, Xin Li 2, Yubin Li 2, Dongliang Xie 2, Hong Luo 2, Song Yao 2, Yu Wang 2,3, Huazhong
More informationA Deep Learning primer
A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications
More informationSlide credit from Hung-Yi Lee & Richard Socher
Slide credit from Hung-Yi Lee & Richard Socher 1 Review Word Vector 2 Word2Vec Variants Skip-gram: predicting surrounding words given the target word (Mikolov+, 2013) CBOW (continuous bag-of-words): predicting
More informationBidirectional Recurrent Convolutional Networks for Video Super-Resolution
Bidirectional Recurrent Convolutional Networks for Video Super-Resolution Qi Zhang & Yan Huang Center for Research on Intelligent Perception and Computing (CRIPAC) National Laboratory of Pattern Recognition
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More informationOffline Handwriting Recognition with Multidimensional Recurrent Neural Networks
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks Alex Graves TU Munich, Germany graves@in.tum.de Jürgen Schmidhuber IDSIA, Switzerland and TU Munich, Germany juergen@idsia.ch
More informationSemantic Word Embedding Neural Network Language Models for Automatic Speech Recognition
Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition Kartik Audhkhasi, Abhinav Sethy Bhuvana Ramabhadran Watson Multimodal Group IBM T. J. Watson Research Center Motivation
More informationRecurrent Neural Networks with Attention for Genre Classification
Recurrent Neural Networks with Attention for Genre Classification Jeremy Irvin Stanford University jirvin16@stanford.edu Elliott Chartock Stanford University elboy@stanford.edu Nadav Hollander Stanford
More informationManifold Constrained Deep Neural Networks for ASR
1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as
More informationGating Neural Network for Large Vocabulary Audiovisual Speech Recognition
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. XX, NO. X, JUNE 2017 1 Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition Fei Tao, Student Member, IEEE, and
More informationChapter 3. Speech segmentation. 3.1 Preprocessing
, as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents
More informationACOUSTIC MODELING WITH NEURAL GRAPH EMBEDDINGS. Yuzong Liu, Katrin Kirchhoff
ACOUSTIC MODELING WITH NEURAL GRAPH EMBEDDINGS Yuzong Liu, Katrin Kirchhoff Department of Electrical Engineering University of Washington, Seattle, WA 98195 ABSTRACT Graph-based learning (GBL) is a form
More informationObject Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal
Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn
More informationReverberant Speech Recognition Based on Denoising Autoencoder
INTERSPEECH 2013 Reverberant Speech Recognition Based on Denoising Autoencoder Takaaki Ishii 1, Hiroki Komiyama 1, Takahiro Shinozaki 2, Yasuo Horiuchi 1, Shingo Kuroiwa 1 1 Division of Information Sciences,
More informationMaximum Likelihood Beamforming for Robust Automatic Speech Recognition
Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Barbara Rauch barbara@lsv.uni-saarland.de IGK Colloquium, Saarbrücken, 16 February 2006 Agenda Background: Standard ASR Robust ASR
More informationCombining Neural Networks and Log-linear Models to Improve Relation Extraction
Combining Neural Networks and Log-linear Models to Improve Relation Extraction Thien Huu Nguyen and Ralph Grishman Computer Science Department, New York University {thien,grishman}@cs.nyu.edu Outline Relation
More informationarxiv: v5 [cs.lg] 2 Feb 2017
ONLINE SEQUENCE TRAINING OF RECURRENT NEURAL NETWORKS WITH CONNECTIONIST TEMPORAL CLASSIFICATION arxiv:5.0684v5 [cs.lg] 2 Feb 207 Kyuyeon Hwang & Wonyong Sung Department of Electrical and Computer Engineering
More informationLecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa
Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural
More informationarxiv: v1 [cs.cl] 30 Jan 2018
ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM Kyungmin Lee, Chiyoun Park, Namhoon Kim, and Jaewon Lee DMC R&D Center, Samsung Electronics, Seoul, Korea {k.m.lee,
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationEmpirical Evaluation of RNN Architectures on Sentence Classification Task
Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationAdversarial Feature-Mapping for Speech Enhancement
Interspeech 2018 2-6 September 2018, Hyderabad Adversarial Feature-Mapping for Speech Enhancement Zhong Meng 1,2, Jinyu Li 1, Yifan Gong 1, Biing-Hwang (Fred) Juang 2 1 Microsoft AI and Research, Redmond,
More informationAn Efficient End-to-End Neural Model for Handwritten Text Recognition
CHOWDHURY, VIG: AN EFFICIENT END-TO-END NEURAL MODEL FOR HANDWRITTEN 1 An Efficient End-to-End Neural Model for Handwritten Text Recognition Arindam Chowdhury chowdhury.arindam1@tcs.com Lovekesh Vig lovekesh.vig@tcs.com
More informationPixel-level Generative Model
Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,
More informationDeep Belief Networks for phone recognition
Deep Belief Networks for phone recognition Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton Department of Computer Science University of Toronto {asamir,gdahl,hinton}@cs.toronto.edu Abstract Hidden
More informationEncoding RNNs, 48 End of sentence (EOS) token, 207 Exploding gradient, 131 Exponential function, 42 Exponential Linear Unit (ELU), 44
A Activation potential, 40 Annotated corpus add padding, 162 check versions, 158 create checkpoints, 164, 166 create input, 160 create train and validation datasets, 163 dropout, 163 DRUG-AE.rel file,
More informationWhy DNN Works for Speech and How to Make it More Efficient?
Why DNN Works for Speech and How to Make it More Efficient? Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering, York University, CANADA Joint work with Y.
More informationDeep Learning Applications
October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning
More information2015 The MathWorks, Inc. 1
2015 The MathWorks, Inc. 1 개발에서구현까지 MATLAB 환경에서의딥러닝 김종남 Application Engineer 2015 The MathWorks, Inc. 2 3 Why MATLAB for Deep Learning? MATLAB is Productive MATLAB is Fast MATLAB Integrates with Open Source
More informationAsynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features
Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features Xu SUN ( 孙栩 ) Peking University xusun@pku.edu.cn Motivation Neural networks -> Good Performance CNN, RNN, LSTM
More informationLSTM: An Image Classification Model Based on Fashion-MNIST Dataset
LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application
More informationA PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION. Wei-Ning Hsu, Yu Zhang, and James Glass
A PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION Wei-Ning Hsu, Yu Zhang, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
More informationarxiv: v1 [cs.lg] 20 Apr 2018
Modelling customer online behaviours with neural networks: applications to conversion prediction and advertising retargeting Yanwei Cui, Rogatien Tobossi, and Olivia Vigouroux GMF Assurances, Groupe Covéa
More information