Sequence Prediction with Neural Segmental Models. Hao Tang
1 Sequence Prediction with Neural Segmental Models Hao Tang
2 About Me
- Pronunciation modeling [TKL 2012]
- Segmental models [TGL 2014] [TWGL 2015] [TWGL 2016] [TWGL 2016]
- American Sign Language fingerspelling recognition [KWTL 2015]
- Under-resourced speech recognition [LJTMSKHK 2016] [HJMMLDELMTLCHKSL 2016]
- State-tying with CCA [WTL 2016]
- Dialog state tracking [TWMH 2014]
Keywords: finite-state transducers, discriminative training, linear models, structured prediction, neural networks
3 Segments Netflix announces House of Cards return with dark inauguration day promo.
4 Frames and Segments Example: Netflix announces House of Cards return with dark inauguration day promo. A frame is a fixed-length unit; a segment is a variable-length unit.
5 Frame-Based Models Frame-based models assign one label per frame: labels y_1 ... y_7 = B O B I I O O over inputs x_1 ... x_7 (Netflix announces House of Cards return with).
6-7 Frame Labels Frame labels come from BIO tags in named-entity recognition (B O B I I O O over Netflix announces House of Cards return with) or from sub-phonetic states in phonetic recognition (ay-1 ay-2 ay-3 ...). With frame labels alone we cannot express segment-level features such as duration, formants, or indicators like 1[the segment has balanced parentheses].
8 Reduction to Graph Search Many sequence prediction problems (named-entity recognition, speech recognition, parsing, translation) reduce to graph search. Inference: 1. Take input x and build a search graph G. 2. Find the maximum-scoring path in G. A minimal sketch of step 2 follows.
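As a concrete illustration of step 2 (not from the talk), here is a minimal dynamic program over a topologically ordered DAG; the toy graph, labels, and weights are hypothetical placeholders.

```python
def max_scoring_path(vertices, edges, source, sink):
    """vertices: list in topological order; edges: dict tail -> [(head, label, weight)]."""
    best = {v: float("-inf") for v in vertices}
    back = {}                       # head -> (tail, label) of its best incoming edge
    best[source] = 0.0
    for tail in vertices:           # relax outgoing edges in topological order
        if best[tail] == float("-inf"):
            continue
        for head, label, weight in edges.get(tail, []):
            if best[tail] + weight > best[head]:
                best[head] = best[tail] + weight
                back[head] = (tail, label)
    path, v = [], sink              # trace back the label sequence
    while v != source:
        tail, label = back[v]
        path.append(label)
        v = tail
    return best[sink], path[::-1]

edges = {"s": [("a", "B", 1.0), ("b", "O", 0.2)],
         "a": [("t", "I", 0.5)],
         "b": [("t", "O", 0.1)]}
print(max_scoring_path(["s", "a", "b", "t"], edges, "s", "t"))  # (1.5, ['B', 'I'])
```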
9 Frame-Based Models As a search graph: every frame position y_1 ... y_4 (over x_1 ... x_4: Netflix announces House of) offers each label B, I, O as a candidate.
10-15 Segmental Models The same input, Netflix announces House of Cards return with, under alternative segmentations, e.g., (I, I, O, O, O), (I, I, O, O), (I, O, I, O), and (I, O): segmental models score and search over all such variable-length segmentations.
16-17 Segmental Models The same input, Netflix announces House of Cards return with, shown under two segmentations. The difference between frame-based models and segmental models is the search graph: features can be extracted from variable-length units.
18 Problem Definition Example input: Netflix announces House of Cards return with. Search space: G = (V, E). Weight: w_θ(x, e), where x is the input and e is an edge. Inference: find the maximum-scoring path. Learning: find θ that minimizes a loss function.
19 Past Research on Segmental Models
- Network-based digit recognition [Bush and Kopec, 1985]
- SUMMIT [Zue et al., 1989] [Glass, 2003]
- Stochastic segmental models [Ostendorf and Roukos, 1989]
- Hidden semi-Markov models [Sarawagi and Cohen, 2004]
- Segmental conditional random fields (SCRF) [Zweig and Nguyen, 2009] [Zweig et al., 2011] [Zweig, 2012]
- Boundary-factored segmental CRF [He and Fosler-Lussier, 2012]
- Deep segmental neural networks [Abdel-Hamid et al., 2013]
- Discriminative segmental cascades [TWGL 2015]
- Segmental recurrent neural networks [Lu et al., 2016]
20 Problem: Efficiency Runtime for inference is O(|E| c), where c is the time to compute the weight of an edge. Suppose |x| = T and the label set has size L. Frame-based models: |E| = O(TL). Segmental models: |E| = O(TLD), where D is the maximum duration. (Table of L and D by task: named-entity recognition has L = 30, D = 4; L and D are much larger for action recognition, phoneme recognition, and word recognition.) A sketch of the edge enumeration follows.
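The edge count is easy to see by enumerating the segmental search graph directly; a sketch with illustrative sizes:

```python
def segmental_edges(T, labels, max_duration):
    """Yield one edge (start, end, label) per candidate segment."""
    for start in range(T):
        for dur in range(1, max_duration + 1):
            if start + dur > T:
                break
            for label in labels:
                yield (start, start + dur, label)

T, L, D = 300, 48, 30  # illustrative phonetic-recognition sizes
edges = list(segmental_edges(T, range(L), D))
print(len(edges))  # 411,120: close to T * L * D = 432,000, minus right-edge truncation
```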
21 Past Research on Efficiency of Segmental Models
- Bottom-up approach [Zue and Glass, 1988]
- Other ASR systems [Chang and Glass, 1997] [Zweig et al., 2010]
- Augmentation [Glass et al., 1996] [Chang and Glass, 1997]
- Separate pruners [Okanohara et al., 2006]
- Different graph topologies [Andrew, 2006] [Vinh et al., 2011] [He and Fosler-Lussier, 2012]
22 Contribution Desideratum: no HMMs!
Discriminative segmental cascades [TWGL 2015] [TWGL 2016]
- Improved performance with segmental neural networks and higher-order features while maintaining efficiency
- Structured composition for computing higher-order features efficiently
- Speedup in inference and learning without accuracy loss
End-to-end training for segmental models [TWGL 2016]
- Two-stage training can serve as a good initialization for end-to-end training.
- Hinge loss converges the fastest and log loss achieves the best accuracy.
- Marginal log loss achieves strong results without relying on manual alignments.
23-27 Discriminative Segmental Cascades A cascade trades search-space size for feature complexity. Start from the first-pass search space H_1 = Y_1, scored with segmental features; prune it to a smaller search space H_2; then σ-compose H_2 with a bigram LM L_2 to obtain the second-pass search space H_2 ∘_σ L_2 = Y_2, on which higher-order features become affordable.
28 Max-Marginal Pruning [Sixtus and Ortmanns, 1999] [Weiss et al., 2012] The max-marginal of an edge e ∈ E is the score of the best path passing through it: γ(e) = max_{y : e ∈ y} w(x, y). For α ∈ (0, 1), the threshold is t = α max_{e ∈ E} γ(e) + (1 − α) (1/|E|) Σ_{e ∈ E} γ(e), and we prune e if γ(e) < t. At least one path is retained, and all paths with scores higher than t are retained. A sketch follows.
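A minimal sketch of max-marginal pruning on a topologically sorted DAG, assuming `edges` is a list of (tail, head, weight) triples ordered by the topological index of their tail vertices; all names are illustrative.

```python
import math

def max_marginal_prune(vertices, edges, source, sink, alpha):
    fwd = {v: -math.inf for v in vertices}  # best score from source to v
    bwd = {v: -math.inf for v in vertices}  # best score from v to sink
    fwd[source], bwd[sink] = 0.0, 0.0
    for tail, head, w in edges:             # forward max scores
        fwd[head] = max(fwd[head], fwd[tail] + w)
    for tail, head, w in reversed(edges):   # backward max scores
        bwd[tail] = max(bwd[tail], w + bwd[head])
    # Max-marginal of an edge: score of the best complete path through it.
    gamma = [fwd[t] + w + bwd[h] for t, h, w in edges]
    thresh = alpha * max(gamma) + (1 - alpha) * sum(gamma) / len(gamma)
    # Since thresh <= max(gamma), every edge on the best path survives.
    return [e for e, g in zip(edges, gamma) if g >= thresh]
```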
29 Structured Composition (σ-composition) The structured composition of A and B is an FST G with V_G = V_A × V_B and E_G = {(e_1, e_2) ∈ E_A × E_B : o_A(e_1) = i_B(e_2)}, so the weight function has access to a pair of labels. After σ-composition with an n-gram language model over a vocabulary of size L, the search space becomes L^(n−1) times larger.
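A toy sketch of σ-composition with a bigram label model, assuming the LM is given as an acceptor whose states are previous labels; the pairing condition is exactly o_A(e_1) = i_B(e_2) from the definition above. All names are illustrative.

```python
import itertools

def sigma_compose(lattice_edges, lm_edges):
    """lattice_edges: (tail, head, label, weight) tuples;
    lm_edges: (prev_label, label, lm_weight) tuples of a bigram acceptor."""
    composed = []
    for (t, h, lab, w), (prev, cur, lw) in itertools.product(lattice_edges, lm_edges):
        if lab == cur:  # o_A(e1) == i_B(e2)
            # Composed vertices pair a lattice vertex with an LM state, so a
            # weight function on the composed edge sees the label pair.
            composed.append(((t, prev), (h, cur), lab, w + lw))
    return composed
```

With a bigram LM the state space multiplies by the vocabulary size L, matching the L^(n−1) factor above.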
30 Experimental Setup (example phone sequence: iy v eh n ih f ...)
Task: phonetic recognition
Dataset: TIMIT
Size: 6 hours
Ground truth: manual alignments
Loss function: hinge loss
Maximum duration: 30
Label set size: 48
Average input length
31 Beam Pruning vs Max-Marginal Pruning (plots of oracle error (%) against density (edges per gold edge) and against real-time factor, for beam pruning and max-marginal pruning) Beam pruning is faster. Max-marginal pruning produces more compact lattices.
32-35 Beam Search vs Exact Search (plots of dev PER (%) and hit rate (%) against beam width) When the model is well trained, beam search can be as good as exact search. Dual decomposition is not an option, since we only allow a single pass over the edges.
36-39 Learning with Beam Search vs Learning with Cascades (plots of dev PER (%) against epoch for exact search, beam = 10/20/30, and cascades, in the unigram and bigram cases) Learning with beam search is fine in the unigram case but fails in the bigram case. Learning with cascades is both effective and efficient.
40 Phonetic Recognition on TIMIT (table of dev and test PER) Rows: HMM-DNN; 1st-pass segmental model; bigram LM; 2nd-order boundary features; 1st-order segment NN; 1st-order bi-phone NN; bottleneck.
41 American Sign Language Fingerspelling Recognition [Kim et al., 2016] We consider signer-dependent, signer-independent, and signer-adapted recognition; all of the recognizers use deep neural network (DNN) classifiers of letters or handshape features. (Figure: images and ground-truth segmentations of the fingerspelled word TULIP produced by two signers, sub-sampled at the same rate to show the true relative speeds; asterisks mark manually annotated peak frames for each letter; <s> and </s> denote non-signing intervals before/after signing.)
Letter error rates (LER): Tandem HMM 14.6%; rescoring SCRF 11.5%; cascade 1st pass 8.8%; cascade 2nd pass 7.6%.
42 Improving Efficiency Prune the first-pass search space H_1 = Y_1 to obtain the second-pass search space H_2 = Y_2.
43 Improving Efficiency (plots comparing baseline and proposed systems: dev PER (%) against real-time factor for the 1st and 2nd passes, and training hours for baseline 1st pass, proposed 1st pass, and proposed 2nd pass) The proposed approach speeds up inference and training without losing accuracy.
44 Contribution (recap of slide 22)
45-49 Two-Stage vs End-to-End Training (diagram, built up over several slides: the input x is fed through a network f_Λ that outputs a vector of log probabilities per frame, which the segmental model consumes as features; the final build replaces the log prob label with ???, asking what the intermediate representation should be)
50 Two-Stage vs End-to-End Training
Two-stage training: 1. Find Λ by minimizing cross entropy at each frame. 2. Fix Λ; find θ by minimizing the hinge loss
ℓ_hinge(θ, Λ; x, y, z) = max_{(y′, z′) ∈ P} [ cost((y′, z′), (y, z)) − θᵀφ_Λ(x, y, z) + θᵀφ_Λ(x, y′, z′) ].
End-to-end training from scratch: 1. Randomly initialize Λ. 2. Find θ and Λ jointly by minimizing the hinge loss.
End-to-end fine-tuning: 1. Run two-stage training. 2. Continue with end-to-end training.
A sketch of the hinge loss follows.
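To ground the hinge loss, here is a conceptual sketch (not the paper's code) of computing it by cost-augmented decoding: the max over (y′, z′) is itself a search problem, solved by decoding with the cost added to the scores. `decode`, `score`, and `cost` are assumed callables standing in for the search graph, θᵀφ_Λ, and the cost term.

```python
def hinge_loss(x, y, z, score, cost, decode):
    """Structured hinge loss via cost-augmented decoding (sketch).

    decode(x, f) is assumed to return the (y', z') in the search space P
    maximizing f(y', z'); score(x, y, z) stands in for the dot product
    of theta with phi_Lambda(x, y, z).
    """
    y_hat, z_hat = decode(x, lambda yp, zp: score(x, yp, zp) + cost((yp, zp), (y, z)))
    # When the gold (y, z) is in P, this value is nonnegative, since the
    # max is at least its value at (y', z') = (y, z), where the cost is zero.
    return (cost((y_hat, z_hat), (y, z))
            - score(x, y, z)
            + score(x, y_hat, z_hat))
```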
51-52 Two-Stage vs End-to-End Training for Hinge Loss (plots of test PER and training loss for 2-stage, e2e, and fine-tuning) End-to-end training can get stuck at a poor local optimum. Two-stage training provides a better starting point.
53 Log Loss
ℓ_log(θ, Λ; x, y, z) = −log p(y, z | x), where
p(y, z | x) = (1/Z) exp(θᵀφ_Λ(x, y, z)) and Z = Σ_{(y′, z′) ∈ P} exp(θᵀφ_Λ(x, y′, z′)).
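A sketch (not the paper's implementation) of computing the log loss: the partition function Z sums over all paths in the search graph, via a forward pass in the log semiring. `edges` is assumed to be a (tail, head, weight) list sorted in topological order of tails.

```python
import math

def forward_log_z(vertices, edges, source, sink):
    """log Z: log-sum-exp of all path scores from source to sink."""
    alpha = {v: -math.inf for v in vertices}
    alpha[source] = 0.0
    for tail, head, w in edges:
        a = alpha[tail] + w                # extend every path ending at tail
        m = max(alpha[head], a)
        if m > -math.inf:                  # numerically stable log-add
            alpha[head] = m + math.log(math.exp(alpha[head] - m) + math.exp(a - m))
    return alpha[sink]

def log_loss(vertices, edges, source, sink, gold_score):
    """-log p(y, z | x) = log Z - gold_score, where gold_score is the
    score of the gold path (theta dot phi_Lambda(x, y, z))."""
    return forward_log_z(vertices, edges, source, sink) - gold_score
```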
54 Two-Stage vs End-to-End Training for Log Loss (plots of test PER and training loss for 2-stage, e2e, and fine-tuning) End-to-end training with the log loss seems easier to optimize. Two-stage training provides a better starting point.
55 Frame-wise Cross Entropy (plot of train and dev cross entropy, with markers at best dev, best dev + dropout, and best dev + dropout + fine-tuning) End-to-end fine-tuning sticks to the log-probability representation and improves it.
56 Other Loss Functions
Marginal log loss: ℓ_mll(θ, Λ; x, y) = −log p(y | x) = −log Σ_{z ∈ Z} p(y, z | x).
Latent hinge loss: ℓ_latent-hinge(θ, Λ; x, y) = max_{(y′, z′) ∈ P} [ cost((y′, z′), (y, z̃)) − θᵀφ_Λ(x, y, z̃) + θᵀφ_Λ(x, y′, z′) ], where z̃ = argmax_{z′ ∈ Z} θᵀφ_Λ(x, y, z′).
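A sketch of the marginal log loss, which needs no manual alignments because it marginalizes over every segmentation z consistent with y. It reuses `forward_log_z` from the log-loss sketch above; `intersect` is an assumed helper returning the subgraph (same source and sink) whose paths carry exactly the label sequence y.

```python
def marginal_log_loss(vertices, edges, source, sink, y, intersect):
    sub_vertices, sub_edges = intersect(vertices, edges, y)
    log_numer = forward_log_z(sub_vertices, sub_edges, source, sink)  # log sum over z of score(y, z)
    log_z = forward_log_z(vertices, edges, source, sink)              # log sum over all paths
    return log_z - log_numer                                          # = -log p(y | x)
```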
57-58 End-to-End Training without Manual Alignments (plots of test PER for the latent hinge loss and the marginal log loss: MLL + alignments, 2-stage + fine-tuning, and e2e from scratch) End-to-end training with the marginal log loss seems easier. Two-stage training provides a better starting point.
59 Loss Functions (left: training hours for the hinge, log, latent hinge, and marginal log losses, with an LSTM baseline; right: a table comparing the same losses on whether alignments are required, convexity in θ, smoothness, and sparsity of updates)
60 Where are we? (table of TIMIT PER with speaker-independent and speaker-adapted columns)
- HMM-DNN
- HMM-CNN [Tóth, 2015]: 16.5
- Segment-based models [Glass, 2003]: 24.4
- SCRF [Zweig, 2012]: 33.1
- SCRF with shallow NN [He and Fosler-Lussier, 2012]: 26.5
- SCRF with DNN [He, 2015]: 19.1
- Deep segmental NN [Abdel-Hamid et al., 2013]: 21.9
- cascade 1st pass [TWGL 2015]: 21.7
- cascade 2nd pass [TWGL 2015]: 19.9
- End-to-end + two-stage training [TWGL 2016]: 19.7
- Segmental RNN [Lu et al., 2016]
61 Contribution (recap of slide 22)
62 Ongoing and Future Work
- Unsupervised learning: lexical unit discovery; contrastive estimation [Smith and Eisner, 2005]; autoencoders [Ammar et al., 2014] [Tran et al., 2016]; generative adversarial networks [Goodfellow et al., 2016]
- Structure + networks: deep structured models [Chen et al., 2015]; attention [Chorowski et al., 2015]; structured attention networks [Kim et al., 2016]
- Large-scale structured prediction: whole-word speech recognizers; TIDIGITS (4.45% SER); beam search + early update rule [Collins and Roark, 2004]
- First-order methods for inference: Dijkstra's algorithm is steepest descent in the dual [Murota and Shioura, 2010]; structured prediction energy networks [Belanger and McCallum, 2015]
63 Acknowledgements Weiran Wang, Taehwan Kim, Kevin Gimpel, Karen Livescu. This research was supported by a Google faculty research award and NSF grant IIS. The GPUs used for this research were donated by NVIDIA.
64 th ae ng k s