Voice command module for Smart Home Automation
|
|
- Doris Davidson
- 5 years ago
- Views:
Transcription
1 Voice command module for Smart Home Automation LUKA KRALJEVIĆ, MLADEN RUSSO, MAJA STELLA Laboratory for Smart Environment Technologies, University of Split, FESB Ruđera Boškovića 32, 21000, Split CROATIA Abstract: - Voice control is the most prominent feature of smart home environment. In this paper, we proposed a voice command module that enables users hands-free interaction with the smart home environment. We presented three components required for simple and efficient control of the smart home devices. Wake up word component allows actual voice command processing. Speech recognition component maps spoken voice commands to text and Voice Control Interface parse that text into appropriate JSON format for home automation. We evaluate the possibility of using the voice control module in a smart home environment by separately analyzing each component of the module. Key-Words: - Smart Home, Speech Recognition, Voice control, Wake-up Word, s Parsing 1 Introduction Smart home environment represents an automation system in which different sensors and intelligent devices work together providing efficient service functions to improve comfortable quality of house living. The smart environment can be seen as a promising way of supporting independent living providing in-home assistance which is especially significant to people with some disabilities [1]. Speech recognition is considered to be a bridge for better and more natural human-computer interaction [2]. Recent advances in automatic speech recognition (ASR) made this technology one of the essential features of the smart home automation (HA) system [3][4]. Our goal is to enable voice control within the smart home laboratory that can be used to control a large number of devices such as lights, plugs, sensors, dimmers, etc. In this paper, we present the architecture of the voice control module which has a role of converting the spoken commands to text and then mapping them to the appropriate actions in the HA system. Module continuously listens and process the sounds from the environment to ensures hands-free interaction. To minimize the number of cases in which the action in the HA system is triggered without actual user intent, we introduced wake up word (WUW) detection mechanism. 2 System Architecture In this section, we described the architecture of the proposed voice control module. The proposed module consists of Wake-Up Word detection module, Speech recognition engine and Voice command interface as shown in Fig 1. Voice Wake up Word Detection (Audio) Voice Interface JSON Speech to Text Engine (Text) Figure 1 Voice Module First, we will give a brief introduction inside deep network architecture behind our WUW and ASR components followed by the description of each component. 2.1 Deep Network Architecture Because of the ability to handle sequential information almost all end-to-end speech recognition systems use some recurrent neural network in their pipeline [5]-[8]. In the RNN, the internal representation of dynamic speech features is formed by feeding the low-level acoustic features into the hidden layer together with the recurrent ISSN: Volume 3, 2018
2 hidden features from the history. As an improvement of conventional RNN, Bidirectional Recurrent Neural Networks (BRNNs) can make use of future context. Data is processed both directions with two separate hidden layers by splitting the state neurons of a regular RNN in a part that is responsible for the positive time direction and part for the negative time direction which are then fed forwards to the same output layer as shown in Fig2. y y y y [10]. One of the solutions is to embed the memory structure like long-short-term memory cells (LSTM) [11]. The structure of a single LSTM memory cell is illustrated in Fig 3. Figure 3 Architecture of simple LSTM For LSTM, the computation at the time step t can be formally written as follows: Figure 2 Bidirectional LSTM Outputs from forward states are not connected to inputs of backward states, and vice versa [9]. Given the input sequence x = (x 1 x t) BRNN computes the forward hidden sequence h, the backward hidden sequence h and output sequence y=(y 1 y t) with the following iterative process: h t = (W xh x t + W h h h t 1 + b h ) (1) h t = (W xh x t + W h h h t+1 + b h ) (2) y t = W h y h t + W h y h t + b y (3) Learning RNN parameters W hy, W xh and W hh goes by using backpropagation through time (BPTT) algorithm. For learning those weight matrices of an RNN unfolds the network in time and propagates error signals backwards through time. BPTT can be viewed as extension of the classic backpropagation algorithm for feed-forward networks, where the stacked hidden layers for the same training frame t, are replaced by the T same single hidden layers across time t = 1,2,3,...,T. The use of the state space in the RNN enables its representation and learning of sequentially extended dependencies over arbitrarily long sequences but in practice has been shown that they are not capable of looking far back into the past in many types of input sequences. Generally, it is difficult to train RNN with commonly used activation functions (tanh, sigmoid, RELU) due to the exploding and vanishing gradient c t = f t c t 1 + i t tanh(w (xc) x t + W (hc) h t 1 + b (c) ) (4) o t = σ(w (xo) x t + W (ho) h t 1 + W (co) c t + b (o) ) (5) i t = σ(w (xi) x t + W (hi) h t 1 + W (ci) c t 1 + b (i) ) (6) f t = σ(w (xf) x t + W (hf) h t 1 + W (cf) c t 1 + b (f) ) (7) h t = o t tanh (c t ) (8) Above formulas describe five different types of information at time t representing the input gate, forget gate, cell activation, output gate, and hidden layer, where output σ is the logistic sigmoid function, W is weight matrices connecting different gates and b are the corresponding bias vectors. 2.2 Connectionist Temporal Classification Connectionist Temporal Classification (CTC) [12] is one of the most successful sequence-to-sequence learning algorithms for RNNs. CTC is an objective function that allows an RNN to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. It uses a softmax output layer to define a separate output distribution P(k t) at every step t over all possible label sequences, conditioned on a given input sequence. The output layer contains a single unit for each of the transcription labels (characters, phonemes, musical notes, etc.), plus an extra unit referred to as the blank which corresponds to a null. Alignment of CTC π is defined by length T sequence of blank and label ISSN: Volume 3, 2018
3 indices. The probability P(π x) is a product of the emission probabilities at every time step. T P(π x) = t=1 P(π t t, x) (9) For a given transcription sequence, there are as many possible alignments as there is different ways of separating the labels with blanks. Given this distribution, an objective function can be derived that directly maximizes the probabilities of the correct labeling. Since the objective function is differentiable, the network can then be trained with BPTT algorithm, where we try to minimize the CTC objective function: CTC(x) = log P(y x) (10) where y is target transcription. 2.4 Wake-up Word Detection Voice command module continuously receives and process sound. The goal of Wake-up Word (WUW) detection module is reducing the additional number of cases in which ASR preforms Speech to Text (SST) operations at the same time minimizing unwanted triggers of Voice Interface (VCI). Therefore, WUW spotting module can be viewed as a continuous identification process of the predefined wake-up word while rejecting all other words, sounds, and noises. In this work, we presented a context-aware keyword spotter which consists of unidirectional RNN (frontend) and decoder (back-end) similar to the one in [13]. RNN is trained as end-to-end ASR with CTC to transcribe the input speech to a sequence of character labels. Given the purpose, keyword spotting compared to large vocabulary ASR requires simpler architecture suitable for real-time lowpower systems; hereof network consists of two unidirectional LSTM layers and a softmax output layer. At each frame, the soft output of the RNN is fed into back-end decoder for computing posterior probabilities of keywords. Decoder takes labels probabilities (characters, word-boundary and CTC blank) and converts them to character level output sequences. Word-boundary label is responsible for detecting keyword as a substring of other words, i.e., decoders network is bounded by the word-boundary label _. At each time frame decoder updates state by multiplying networks input probabilities with RNN labels probabilities. Nodes of the network contain either character label or word-boundary label followed by CTC blank. Decoder network probability is calculated as a sum of all incoming probabilities, and when negative log posterior probability is below a threshold, then the keyword is detected where the threshold is proportional to the number of characters in the keyword. 2.5 Automatic Speech recognition When WUW component detects a predefined keyword, it triggers actual command recording, which is then processed by speech recognition component. In this work, we implemented end-toend ASR module aiming for offline speech recognition to protect users privacy. In the context of Voice command module, ASR needs to be lightweight and working with limited vocabulary since all command recording and processing should be done on low power embedded device. The architecture of ASR comprised of three components: Feature extraction part, Acoustic model (AM) responsible for describing distribution over acoustic observations for given the word sequence and the language model (LM) which assigns a probability to every possible word sequence. Underlying architecture for AM is Bidirectional Recurrent Neural Network with LSTM similar to one employed in [14]. Using bidirectional network allows the output layer to obtain information from past and future states i.e. information about the location of the characters before and after the current utterance can bring improvements in performance but introduces the increase in latency. This increase in latency is justified considering that role of the ASR in Voice Control module pipeline is speech-to-text transformation where the only accuracy is essential because transformed text command is further process by VCI and user doesn t see actual output form ASR component (there isn t feedback from ASR to the user). Each layer contains 512 LSTM cells (256 neurons per forward and backward sub-layer) CTC algorithm was used to train sequence to sequence network with two bidirectional LSTM layers. The goal of CTC is to use neural network to transform input sequence into probabilities of characters at each time step, the speaker has spoken 1 of 29 possible characters. ASR combines sequence score of acoustic model labels over a time frame and sequences score of words from language model and predicts actual speech command as word sequence that maximizes both the language and acoustic model scores. To speed up the training process of ASR in this work we used pretrained kalidi [15] LM to estimates the prior probabilities of spoken sounds. AM component is implemented in Tensorflow and for ISSN: Volume 3, 2018
4 learning parameters we used BPTT with AdamOptimizer which is an improved version of traditional gradient descent by using momentum (moving averages of the parameters) to control the learning rate. 2.6 Voice Interface Voice command Interface receives results of the speech-to-text processing as unstructured text data. To create structured input which can be later used by HA to realize user commands, we used the command parser. parser accepts the unstructured command text and performs the semantic analysis according to predefined command patterns. parser was realized using parse generator ANTLR 4[16] and all possible command patterns are defined in ANTLR grammar file. Before determining command patterns, we need to identify between two types of smart home devices according to their use case: actuator where the user can set state, and the sensor which allows only to get the state. To design commands patterns to control or read the state of an individual device or a group of devices we define three parts: action, device name/device group and location descriptors (optional). 3 Evaluation Experimental evaluation of the Voice Control module is made by analyzing each component separately. Before the start of the evaluation we asked 20 participants to read a text containing 400 sentences of which 220 were home automation commands. Each command line is preceded by WUW keyword Elise. All the recordings are altered by corrupting clean utterances using room simulator, adding varying degrees of noise and reverberation from daily life noisy environmental recording so that overall SNR is between 0db and 30db with average SNR of 12 db. This recorded data set is about 4 hours long and it will be used as our test set. In this work freely available recordings of transcribed English speech are used as the training set for both WUW and ASR components. Training corpus consists of 1000 hours of speech available as part of LibriSpeech dataset [17]. Inputs to RNN acoustic models are 40-dimensional MFCC features and their first and second-order derivates WUW component is further fine-tuned by adapting the WUW acoustical model to the speakers. Also, we used Speech s Dataset [18] containing 65,000 one-second long utterances of 30 short words, by thousands of different people for the ASR model refinement training. Experimental results for WUW are given regarding false alarm per hour, i.e., false alarm frequency (FAF) which is defined as the incorrect detection of a keyword per hour. False Alarm (FA) rate is calculated by dividing the number of false positives by the total number of examples. False alarm means that the system is activated even though the user did not intend to do so. Reported FAF of the keyword spotter is 0,5. Evaluation results for ASR component are given in terms of word error rate (WER). Reported average WER of LibriSpeeech corpus is 21.5% and 32% when the recorded test set is used. The accuracy of Interface depends most on accuracy of speech to text transcription, and it can achieve 100% if the command follows grammar rules. 4 Conclusion In this work, we presented the architecture of the voice command module that enabled voice control inside the smart home lab. The intent was to build a lightweight module that can run on low power embedded devices like Raspberry Pi. We presented the WUW spotting system that enables user to activate VCM using speech instead of manual activation. ASR was designed to allow offline speech to text processing regarding the matter of privacy. Unstructured text from ASR is parsed by Voice Interface and converted to appropriate JSON directive. Acknowledgment This work has been fully supported by the Croatian Science Foundation under the project number UIP References: [1] De Silva, L.C., Morikawa, C. and Petra, I.M., State of the art of smart homes. Engineering Applications of Artificial Intelligence, 25(7), pp [2] Picone, J., Fundamentals of speech recognition: A short course. Institute for Signal and Information Processing, Mississippi State University. [3] Giannakopoulos, T., Tatlas, N.A., Ganchev, T. and Potamitis, I., A practical, real-time speech-driven home automation front-end. ISSN: Volume 3, 2018
5 IEEE Transactions on Consumer Electronics, 51(2), pp [4] McLoughlin, I.V. and Sharifzadeh, H.R., 2007, December. Speech recognition engine adaptions for smart home dialogues. In Information, Communications & Signal Processing, th International Conference on (pp. 1-5). IEEE. [5] Graves, A., Mohamed, A.R. and Hinton, G., 2013, May. Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on (pp ). IEEE. [6] Graves, A., Sequence transduction with recurrent neural networks. arxiv preprint arxiv: [7] Vinyals, O., Ravuri, S.V. and Povey, D., 2012, March. Revisiting recurrent neural networks for robust ASR. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp ). IEEE. [8] Li, J., Zhang, H., Cai, X. and Xu, B., Towards end-to-end speech recognition for chinese mandarin using long short-term memory recurrent neural networks. In Sixteenth annual conference of the international speech communication association. [9] Schuster, M. and Paliwal, K.K., Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), pp [10] Graves, A., Generating sequences with recurrent neural networks. arxiv preprint arxiv: [11] Hochreiter, S. and Schmidhuber, J., Long short-term memory. Neural computation, 9(8), pp [12] Graves, A., Fernández, S., Gomez, F. and Schmidhuber, J., 2006, June. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on Machine learning (pp ). ACM. [13] Hwang, K., Lee, M. and Sung, W., Online keyword spotting with a character-level recurrent neural network. arxiv preprint arxiv: [14] Graves, A. and Jaitly, N., 2014, January. Towards end-to-end speech recognition with recurrent neural networks. In International Conference on Machine Learning (pp ). [15] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P. and Silovsky, J., The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding (No. EPFL-CONF ). IEEE Signal Processing Society. [16] Parr, T., The definitive ANTLR 4 reference. Pragmatic Bookshelf. [17] Panayotov, V., Chen, G., Povey, D. and Khudanpur, S., 2015, April. Librispeech: an ASR corpus based on public domain audio books. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp ). IEEE. [18] Warden, P., Speech s: A Dataset for Limited-Vocabulary Speech Recognition. arxiv preprint arxiv: ISSN: Volume 3, 2018
Low latency acoustic modeling using temporal convolution and LSTMs
1 Low latency acoustic modeling using temporal convolution and LSTMs Vijayaditya Peddinti, Yiming Wang, Daniel Povey, Sanjeev Khudanpur Abstract Bidirectional long short term memory (BLSTM) acoustic models
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training
More informationarxiv: v1 [cs.cl] 28 Nov 2016
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection arxiv:1611.09405v1 [cs.cl] 28 Nov 2016 Chris Lengerich Mindori Palo Alto, CA chris@mindori.com Abstract Awni Hannun Mindori
More informationEnd-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow
End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani Google Inc, Mountain View, CA, USA
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationGated Recurrent Unit Based Acoustic Modeling with Future Context
Interspeech 2018 2-6 September 2018, Hyderabad Gated Recurrent Unit Based Acoustic Modeling with Future Context Jie Li 1, Xiaorui Wang 1, Yuanyuan Zhao 2, Yan Li 1 1 Kwai, Beijing, P.R. China 2 Institute
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationSpeech Recognition with Quaternion Neural Networks
Speech Recognition with Quaternion Neural Networks LIA - 2019 Titouan Parcollet, Mohamed Morchid, Georges Linarès University of Avignon, France ORKIS, France Summary I. Problem definition II. Quaternion
More informationLong Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
INTERSPEECH 2014 Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling Haşim Sak, Andrew Senior, Françoise Beaufays Google, USA {hasim,andrewsenior,fsb@google.com}
More informationEmpirical Evaluation of RNN Architectures on Sentence Classification Task
Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks
More informationA Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition
A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October
More informationEmpirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio Presenter: Yu-Wei Lin Background: Recurrent Neural
More informationRecurrent Neural Networks
Recurrent Neural Networks Javier Béjar Deep Learning 2018/2019 Fall Master in Artificial Intelligence (FIB-UPC) Introduction Sequential data Many problems are described by sequences Time series Video/audio
More informationarxiv: v3 [cs.sd] 1 Nov 2018
AN IMPROVED HYBRID CTC-ATTENTION MODEL FOR SPEECH RECOGNITION Zhe Yuan, Zhuoran Lyu, Jiwei Li and Xi Zhou Cloudwalk Technology Inc, Shanghai, China arxiv:1810.12020v3 [cs.sd] 1 Nov 2018 ABSTRACT Recently,
More informationSEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic
SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks
More informationText Recognition in Videos using a Recurrent Connectionist Approach
Author manuscript, published in "ICANN - 22th International Conference on Artificial Neural Networks, Lausanne : Switzerland (2012)" DOI : 10.1007/978-3-642-33266-1_22 Text Recognition in Videos using
More informationJOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation
JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based
More informationVariable-Component Deep Neural Network for Robust Speech Recognition
Variable-Component Deep Neural Network for Robust Speech Recognition Rui Zhao 1, Jinyu Li 2, and Yifan Gong 2 1 Microsoft Search Technology Center Asia, Beijing, China 2 Microsoft Corporation, One Microsoft
More informationHybrid Accelerated Optimization for Speech Recognition
INTERSPEECH 26 September 8 2, 26, San Francisco, USA Hybrid Accelerated Optimization for Speech Recognition Jen-Tzung Chien, Pei-Wen Huang, Tan Lee 2 Department of Electrical and Computer Engineering,
More informationSequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015
Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, 10.2.1) Properties of RNNs (10.2.2, 8.2.6) Using
More informationA Novel Approach to On-Line Handwriting Recognition Based on Bidirectional Long Short-Term Memory Networks
A Novel Approach to n-line Handwriting Recognition Based on Bidirectional Long Short-Term Memory Networks Marcus Liwicki 1 Alex Graves 2 Horst Bunke 1 Jürgen Schmidhuber 2,3 1 nst. of Computer Science
More informationMaximum Likelihood Beamforming for Robust Automatic Speech Recognition
Maximum Likelihood Beamforming for Robust Automatic Speech Recognition Barbara Rauch barbara@lsv.uni-saarland.de IGK Colloquium, Saarbrücken, 16 February 2006 Agenda Background: Standard ASR Robust ASR
More informationTraining LDCRF model on unsegmented sequences using Connectionist Temporal Classification
Training LDCRF model on unsegmented sequences using Connectionist Temporal Classification 1 Amir Ahooye Atashin, 2 Kamaledin Ghiasi-Shirazi, 3 Ahad Harati Department of Computer Engineering Ferdowsi University
More informationDeep Neural Networks Applications in Handwriting Recognition
Deep Neural Networks Applications in Handwriting Recognition Théodore Bluche theodore.bluche@gmail.com São Paulo Meetup - 9 Mar. 2017 2 Who am I? Théodore Bluche PhD defended
More informationBidirectional Truncated Recurrent Neural Networks for Efficient Speech Denoising
Bidirectional Truncated Recurrent Neural Networks for Efficient Speech Denoising Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen Department of Electronics and Information Systems, Ghent University,
More informationDeep Neural Networks Applications in Handwriting Recognition
Deep Neural Networks Applications in Handwriting Recognition 2 Who am I? Théodore Bluche PhD defended at Université Paris-Sud last year Deep Neural Networks for Large Vocabulary Handwritten
More informationEND-TO-END CHINESE TEXT RECOGNITION
END-TO-END CHINESE TEXT RECOGNITION Jie Hu 1, Tszhang Guo 1, Ji Cao 2, Changshui Zhang 1 1 Department of Automation, Tsinghua University 2 Beijing SinoVoice Technology November 15, 2017 Presentation at
More informationNatural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu
Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward
More informationShow, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks
Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of
More informationCode Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:
Code Mania 2019 Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python: 1. Introduction to Artificial Intelligence 2. Introduction to python programming and Environment
More informationMulti-Dimensional Recurrent Neural Networks
Multi-Dimensional Recurrent Neural Networks Alex Graves 1, Santiago Fernández 1, Jürgen Schmidhuber 1,2 1 IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland 2 TU Munich, Boltzmannstr. 3, 85748 Garching,
More informationOn the Efficiency of Recurrent Neural Network Optimization Algorithms
On the Efficiency of Recurrent Neural Network Optimization Algorithms Ben Krause, Liang Lu, Iain Murray, Steve Renals University of Edinburgh Department of Informatics s17005@sms.ed.ac.uk, llu@staffmail.ed.ac.uk,
More informationarxiv: v1 [cs.ai] 14 May 2007
Multi-Dimensional Recurrent Neural Networks Alex Graves, Santiago Fernández, Jürgen Schmidhuber IDSIA Galleria 2, 6928 Manno, Switzerland {alex,santiago,juergen}@idsia.ch arxiv:0705.2011v1 [cs.ai] 14 May
More informationSEMANTIC COMPUTING. Lecture 9: Deep Learning: Recurrent Neural Networks (RNNs) TU Dresden, 21 December 2018
SEMANTIC COMPUTING Lecture 9: Deep Learning: Recurrent Neural Networks (RNNs) Dagmar Gromann International Center For Computational Logic TU Dresden, 21 December 2018 Overview Handling Overfitting Recurrent
More informationEncoding RNNs, 48 End of sentence (EOS) token, 207 Exploding gradient, 131 Exponential function, 42 Exponential Linear Unit (ELU), 44
A Activation potential, 40 Annotated corpus add padding, 162 check versions, 158 create checkpoints, 164, 166 create input, 160 create train and validation datasets, 163 dropout, 163 DRUG-AE.rel file,
More informationDetecting Fraudulent Behavior Using Recurrent Neural Networks
Computer Security Symposium 2016 11-13 October 2016 Detecting Fraudulent Behavior Using Recurrent Neural Networks Yoshihiro Ando 1,2,a),b) Hidehito Gomi 2,c) Hidehiko Tanaka 1,d) Abstract: Due to an increase
More informationTitle. Author(s)Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahit. CitationEAI Endorsed Transactions on Security and Safety, 16
Title Proposing Multimodal Integration Model Using LSTM an Author(s)Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahit CitationEAI Endorsed Transactions on Security and Safety, 16 Issue Date 216-12-28
More informationRecurrent Neural Network (RNN) Industrial AI Lab.
Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationLSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University
LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationIntroducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit
Journal of Machine Learning Research 6 205) 547-55 Submitted 7/3; Published 3/5 Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit Felix Weninger weninger@tum.de Johannes
More informationImage Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading
More informationEnd-To-End Spam Classification With Neural Networks
End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam
More informationLecture 2 Notes. Outline. Neural Networks. The Big Idea. Architecture. Instructors: Parth Shah, Riju Pahwa
Instructors: Parth Shah, Riju Pahwa Lecture 2 Notes Outline 1. Neural Networks The Big Idea Architecture SGD and Backpropagation 2. Convolutional Neural Networks Intuition Architecture 3. Recurrent Neural
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationXES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework
XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research
More informationSentiment Classification of Food Reviews
Sentiment Classification of Food Reviews Hua Feng Department of Electrical Engineering Stanford University Stanford, CA 94305 fengh15@stanford.edu Ruixi Lin Department of Electrical Engineering Stanford
More informationGate-Variants of Gated Recurrent Unit (GRU) Neural Networks
Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks Rahul Dey and Fathi M. Salem Circuits, Systems, and Neural Networks (CSANN) LAB Department of Electrical and Computer Engineering Michigan State
More information저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다.
저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우,
More informationarxiv: v1 [cs.cv] 4 Feb 2018
End2You The Imperial Toolkit for Multimodal Profiling by End-to-End Learning arxiv:1802.01115v1 [cs.cv] 4 Feb 2018 Panagiotis Tzirakis Stefanos Zafeiriou Björn W. Schuller Department of Computing Imperial
More informationHello Edge: Keyword Spotting on Microcontrollers
Hello Edge: Keyword Spotting on Microcontrollers Yundong Zhang, Naveen Suda, Liangzhen Lai and Vikas Chandra ARM Research, Stanford University arxiv.org, 2017 Presented by Mohammad Mofrad University of
More informationA PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION. Wei-Ning Hsu, Yu Zhang, and James Glass
A PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION Wei-Ning Hsu, Yu Zhang, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationImage Captioning with Object Detection and Localization
Image Captioning with Object Detection and Localization Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
More informationRecurrent neural networks for polyphonic sound event detection in real life recordings
Recurrent neural networks for polyphonic sound event detection in real life recordings Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen Tampere University of Technology giambattista.parascandolo@tut.fi
More information16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text
16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics
More informationarxiv: v1 [cs.cl] 30 Jan 2018
ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM Kyungmin Lee, Chiyoun Park, Namhoon Kim, and Jaewon Lee DMC R&D Center, Samsung Electronics, Seoul, Korea {k.m.lee,
More informationHouse Price Prediction Using LSTM
House Price Prediction Using LSTM Xiaochen Chen Lai Wei The Hong Kong University of Science and Technology Jiaxin Xu ABSTRACT In this paper, we use the house price data ranging from January 2004 to October
More informationDeep Learning Applications
October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning
More informationLecture 20: Neural Networks for NLP. Zubin Pahuja
Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple
More informationSlide credit from Hung-Yi Lee & Richard Socher
Slide credit from Hung-Yi Lee & Richard Socher 1 Review Word Vector 2 Word2Vec Variants Skip-gram: predicting surrounding words given the target word (Mikolov+, 2013) CBOW (continuous bag-of-words): predicting
More informationWhy DNN Works for Speech and How to Make it More Efficient?
Why DNN Works for Speech and How to Make it More Efficient? Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering, York University, CANADA Joint work with Y.
More information11/14/2010 Intelligent Systems and Soft Computing 1
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationCUED-RNNLM An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models
CUED-RNNLM An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models Xie Chen, Xunying Liu, Yanmin Qian, Mark Gales and Phil Woodland April 1, 2016 Overview
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview
More informationClinical Name Entity Recognition using Conditional Random Field with Augmented Features
Clinical Name Entity Recognition using Conditional Random Field with Augmented Features Dawei Geng (Intern at Philips Research China, Shanghai) Abstract. In this paper, We presents a Chinese medical term
More informationGMM-FREE DNN TRAINING. Andrew Senior, Georg Heigold, Michiel Bacchiani, Hank Liao
GMM-FREE DNN TRAINING Andrew Senior, Georg Heigold, Michiel Bacchiani, Hank Liao Google Inc., New York {andrewsenior,heigold,michiel,hankliao}@google.com ABSTRACT While deep neural networks (DNNs) have
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationCS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Training Neural Networks II Connelly Barnes Overview Preprocessing Initialization Vanishing/exploding gradients problem Batch normalization Dropout Additional
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationTransition-Based Dependency Parsing with Stack Long Short-Term Memory
Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationAssignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation
Farrukh Jabeen Due Date: November 2, 2009. Neural Networks: Backpropation Assignment # 5 The "Backpropagation" method is one of the most popular methods of "learning" by a neural network. Read the class
More informationDNN-BASED AUDIO SCENE CLASSIFICATION FOR DCASE 2017: DUAL INPUT FEATURES, BALANCING COST, AND STOCHASTIC DATA DUPLICATION
DNN-BASED AUDIO SCENE CLASSIFICATION FOR DCASE 2017: DUAL INPUT FEATURES, BALANCING COST, AND STOCHASTIC DATA DUPLICATION Jee-Weon Jung, Hee-Soo Heo, IL-Ho Yang, Sung-Hyun Yoon, Hye-Jin Shim, and Ha-Jin
More informationClinical Named Entity Recognition Method Based on CRF
Clinical Named Entity Recognition Method Based on CRF Yanxu Chen 1, Gang Zhang 1, Haizhou Fang 1, Bin He, and Yi Guan Research Center of Language Technology Harbin Institute of Technology, Harbin, China
More informationQuery-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based on phonetic posteriorgram
International Conference on Education, Management and Computing Technology (ICEMCT 2015) Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based
More informationConvolutional Sequence to Sequence Non-intrusive Load Monitoring
1 Convolutional Sequence to Sequence Non-intrusive Load Monitoring Kunjin Chen, Qin Wang, Ziyu He Kunlong Chen, Jun Hu and Jinliang He Department of Electrical Engineering, Tsinghua University, Beijing,
More informationDeep Neural Networks for Recognizing Online Handwritten Mathematical Symbols
Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols Hai Dai Nguyen 1, Anh Duc Le 2 and Masaki Nakagawa 3 Tokyo University of Agriculture and Technology 2-24-16 Nakacho, Koganei-shi,
More informationWHO WANTS TO BE A MILLIONAIRE?
IDIAP COMMUNICATION REPORT WHO WANTS TO BE A MILLIONAIRE? Huseyn Gasimov a Aleksei Triastcyn Hervé Bourlard Idiap-Com-03-2012 JULY 2012 a EPFL Centre du Parc, Rue Marconi 19, PO Box 592, CH - 1920 Martigny
More informationLSTM: An Image Classification Model Based on Fashion-MNIST Dataset
LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application
More informationDeep Learning for Program Analysis. Lili Mou January, 2016
Deep Learning for Program Analysis Lili Mou January, 2016 Outline Introduction Background Deep Neural Networks Real-Valued Representation Learning Our Models Building Program Vector Representations for
More informationImproving Bottleneck Features for Automatic Speech Recognition using Gammatone-based Cochleagram and Sparsity Regularization
Improving Bottleneck Features for Automatic Speech Recognition using Gammatone-based Cochleagram and Sparsity Regularization Chao Ma 1,2,3, Jun Qi 4, Dongmei Li 1,2,3, Runsheng Liu 1,2,3 1. Department
More informationDeep Learning in Visual Recognition. Thanks Da Zhang for the slides
Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object
More informationarxiv: v2 [stat.ml] 15 Jun 2018
Scalable Factorized Hierarchical Variational Autoencoder Training Wei-Ning Hsu, James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139,
More informationImproving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Interspeech 2018 2-6 September 2018, Hyderabad Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks Sheng Li 1, Xugang Lu 1, Ryoichi Takashima 1, Peng Shen 1, Tatsuya Kawahara
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for
More informationSpeech Technology Using in Wechat
Speech Technology Using in Wechat FENG RAO Powered by WeChat Outline Introduce Algorithm of Speech Recognition Acoustic Model Language Model Decoder Speech Technology Open Platform Framework of Speech
More informationBUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES
BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES O.O. Iakushkin a, G.A. Fedoseev, A.S. Shaleva, O.S. Sedova Saint Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg,
More informationDeep Learning and Its Applications
Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent
More informationDomain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNs Guangyuan Piao 1(B) and John G. Breslin 2 1 Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway,
More informationNeural Network Weight Selection Using Genetic Algorithms
Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationSemantic image search using queries
Semantic image search using queries Shabaz Basheer Patel, Anand Sampat Department of Electrical Engineering Stanford University CA 94305 shabaz@stanford.edu,asampat@stanford.edu Abstract Previous work,
More informationThe Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays
CHiME2018 workshop The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays Naoyuki Kanda 1, Rintaro Ikeshita 1, Shota Horiguchi 1,
More informationConfidence Measures: how much we can trust our speech recognizers
Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition
More informationCrowd Scene Understanding with Coherent Recurrent Neural Networks
Crowd Scene Understanding with Coherent Recurrent Neural Networks Hang Su, Yinpeng Dong, Jun Zhu May 22, 2016 Hang Su, Yinpeng Dong, Jun Zhu IJCAI 2016 May 22, 2016 1 / 26 Outline 1 Introduction 2 LSTM
More informationLSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia
1 LSTM for Language Translation and Image Captioning Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 2 Part I LSTM for Language Translation Motivation Background (RNNs, LSTMs) Model
More informationPixel-level Generative Model
Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,
More information