Discriminative Training and Adaptation of Large Vocabulary ASR Systems


1 Discriminative Training and Adaptation of Large Vocabulary ASR Systems
Phil Woodland, March 30th 2004
ICSI Seminar: March 30th 2004

2 Overview
- Why use discriminative training for LVCSR?
- MMIE/CMLE criterion & simple example
- Issues for LVCSR
  - optimisation methods: extended Baum-Welch algorithm
  - computation: lattice-based training
  - generalisation
- MMIE performance
  - within task
  - across task
- MPE criterion

3 Overview (continued)
- MPE criterion (continued)
  - error based criterion
  - I-smoothing
  - performance
- Discriminative MAP
  - weak-sense auxiliary functions
  - task-based adaptation
- Discriminative linear transform based adaptation
  - supervised adaptation
  - discriminative SAT
  - unsupervised discriminative adaptation
- Current lines of research
- Conclusions

4 Many others have worked, or are working, at CUED on HMM-based discriminative training and adaptation for large vocabulary ASR systems, including:
Ricky Chan, KK Chin, Ricardo de Cordoba, Mark Gales, Do Yeong Kim, Julian Odell, Dan Povey, Khe Chai Sim, Luis Uebel, Valtcho Valtchev, Lan Wang, Steve Young, Kai Yu

5 Why Discriminative Criteria?
- Standard HMM training uses maximum likelihood estimation (MLE)
- The MLE optimisation criterion is
$$ F_{\text{MLE}}(\lambda) = \sum_{r=1}^{R} \log P_\lambda(O_r \mid M_{w_r}) $$
  where $w_r$ is the transcription for utterance $r$ and $M_{w_r}$ the corresponding model.
- Would be optimal if several unrealistic assumptions were met
  - Infinite training set size
  - Model correctness
- Neither condition is met for speech recognition, hence it is interesting to investigate alternatives, especially discriminative schemes such as MMIE (& MPE)

6 MMIE Basics
- Maximum mutual information estimation (MMIE) maximises the sentence level posterior; in log form
$$ F_{\text{MMIE}}(\lambda) = \sum_{r=1}^{R} \log \frac{P_\lambda(O_r \mid M_{w_r})\, P(w_r)}{\sum_{w} P_\lambda(O_r \mid M_{w})\, P(w)} $$
- Numerator is likelihood of data given correct transcription (as for MLE)
- Denominator expands total likelihood in terms of all word sequences
- Can compute denominator by finding likelihood through composite HMM with all recognition constraints (recognition model)
- Need to optimise rational objective function (harder than for MLE)
  - Maximise numerator (MLE term)
  - Simultaneously minimise denominator
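As a rough illustration of the MMIE criterion, the sketch below evaluates it when the sum over all word sequences is replaced by a short explicit list of competing hypotheses per utterance. That finite list, the function names and the toy scores are assumptions made for illustration; they are not taken from the talk, which uses a recognition model for the denominator.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-domain values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mmie_criterion(utterances):
    """Sum over utterances of the log posterior of the reference transcription.

    Each utterance is (ref_index, hypotheses), where hypotheses is a list of
    (acoustic_log_likelihood, lm_log_probability) pairs and ref_index selects
    the correct transcription. The hypothesis list stands in for the sum over
    all word sequences in the denominator (an assumption of this sketch).
    """
    total = 0.0
    for ref_index, hypotheses in utterances:
        joint = [acoustic + lm for acoustic, lm in hypotheses]
        # numerator term minus denominator term, both in the log domain
        total += joint[ref_index] - log_sum_exp(joint)
    return total


# Hypothetical toy scores for two utterances with two hypotheses each.
example = [
    (0, [(-100.0, math.log(0.6)), (-103.0, math.log(0.4))]),
    (1, [(-250.0, math.log(0.5)), (-249.0, math.log(0.5))]),
]
print(mmie_criterion(example))
```

Because each term is a log posterior, the criterion is never positive; it approaches zero as the reference hypothesis dominates its competitors.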

7 MMIE Basics (continued)
- More closely related to word error rate than MLE
  - but not optimising an error rate directly
- Strictly this is a Conditional Maximum Likelihood Estimator, but here it is equivalent to MMIE since the LM is fixed
- Widely used on small vocab tasks since late 1980s/early 1990s
  - Can compute denominator using recognition pass
- MMIE weights training data unequally (well classified samples get a small weight)
  - MLE gives all training samples equal weight
- Simple example shows usefulness with incorrect model assumptions
  - Two class static pattern recognition problem
  - Two dimensional data from full covariance Gaussian
  - Modelled with diagonal covariance Gaussian

8 Simple MMIE Example
[Figure: four panels: MLE solution (full covariance); MLE solution (diagonal); MMIE solution; MMIE criterion / error rate against iteration.]

9 MMIE Issues for LVCSR
- Need to have an effective optimisation technique that scales well to large systems.
- Optimisation: Extended Baum-Welch (Gopalakrishnan et al, Normandin)
$$ \hat{\mu}_{jm} = \frac{\{\theta^{\text{num}}_{jm}(O) - \theta^{\text{den}}_{jm}(O)\} + D\,\mu_{jm}}{\{\gamma^{\text{num}}_{jm} - \gamma^{\text{den}}_{jm}\} + D} $$
$$ \hat{\sigma}^2_{jm} = \frac{\{\theta^{\text{num}}_{jm}(O^2) - \theta^{\text{den}}_{jm}(O^2)\} + D(\sigma^2_{jm} + \mu^2_{jm})}{\{\gamma^{\text{num}}_{jm} - \gamma^{\text{den}}_{jm}\} + D} - \hat{\mu}^2_{jm} $$
- Gaussian occupancies (summed over time) are $\gamma_{jm}$.
- $\theta_{jm}(O)$ and $\theta_{jm}(O^2)$ are sums of data and squared data respectively, weighted by occupancy.
- num and den denote the correct word sequence and the recognition model respectively.
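The Extended Baum-Welch update for one dimension of one Gaussian can be sketched directly from the numerator and denominator statistics. This is a minimal scalar sketch: the function and argument names are illustrative, and the smoothing constant D is passed in rather than chosen by any particular rule.

```python
def ebw_update(theta_num, theta_den, theta2_num, theta2_den,
               gamma_num, gamma_den, mu, var, D):
    """Extended Baum-Welch update for a single Gaussian dimension.

    theta_*  : occupancy-weighted sums of the data (num / den)
    theta2_* : occupancy-weighted sums of the squared data (num / den)
    gamma_*  : occupancies summed over time (num / den)
    mu, var  : current mean and variance
    D        : smoothing constant; it must be large enough that the
               denominator and the new variance stay positive
    """
    denom = (gamma_num - gamma_den) + D
    new_mu = ((theta_num - theta_den) + D * mu) / denom
    new_var = ((theta2_num - theta2_den) + D * (var + mu * mu)) / denom - new_mu ** 2
    return new_mu, new_var


# When numerator and denominator statistics coincide, the model is unchanged.
print(ebw_update(4.0, 4.0, 10.0, 10.0, 2.0, 2.0, mu=0.5, var=2.0, D=10.0))

# With empty denominator statistics and D = 0 the update reduces to the ML estimate.
print(ebw_update(4.0, 0.0, 10.0, 0.0, 2.0, 0.0, mu=0.5, var=2.0, D=0.0))
```

Larger values of D keep the new parameters closer to the current ones, which trades convergence speed for stability.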

10 MMIE Issues for LVCSR (continued)
- Denominator requires computation of all sentence likelihoods: approximate with lattices
- Require good generalisation
  - Can reduce training set error rate: need to reduce test-set errors!
  - Not just better with small numbers of parameters (as often thought with MMIE)
- Need to increase confusable data for training
  - Use acoustic scaling to broaden posterior distribution across denominator
  - Weakened language model type to increase confusable data with focus on acoustics (Schlueter et al)
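The effect of acoustic scaling on the denominator posterior can be seen in a small sketch. Scaling the acoustic log-likelihoods by a factor kappa below one flattens the posterior over competing hypotheses, so more of them contribute to training. The hypothesis list, the name `kappa` and the value 0.1 are assumptions chosen for illustration.

```python
import math


def hypothesis_posteriors(acoustic_logliks, lm_logprobs, kappa=1.0):
    """Posterior over competing hypotheses with acoustic scores scaled by kappa."""
    scores = [kappa * a + l for a, l in zip(acoustic_logliks, lm_logprobs)]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # shift for numerical stability
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical scores: three hypotheses, uniform language model.
acoustic = [-1000.0, -1010.0, -1020.0]
lm = [math.log(1.0 / 3.0)] * 3

unscaled = hypothesis_posteriors(acoustic, lm, kappa=1.0)
scaled = hypothesis_posteriors(acoustic, lm, kappa=0.1)
print(unscaled)  # nearly all mass on the best hypothesis
print(scaled)    # mass spread over the competitors
```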

11 Original Lattice Based MMIE
- Introduced by Valtchev, Odell, Woodland & Young (1996)
- Use a word-lattice to represent numerator & denominator terms
  - Recognise every training sentence with a bigram LM (denominator)
  - Accumulate statistics for EBW via forward-backward pass on lattice
  - Forward-backward at word-level: Viterbi at state level
  - Iterate EBW training using fixed word level lattice
- Evaluated on Wall Street Journal (SI284 training)
  - Good test-set gains for simpler models
  - Small/zero gains for more complex models
  - Very effective at reducing training-set error rate (for denominator lattices)

12 Current (2000-) MMIE Implementation
- Generate word lattices for training set with a fast recogniser using MLE models
- Generate phone-marked lattices with model boundary times
- Run EBW algorithm for several iterations
  - Exact match lattice search
  - Only run forward-backward between boundaries
  - Use acoustic scaling of complete segments (by LM probabilities)
  - F-B passes use unigram (or v. small bigram) language model scores
- Parameter updates
  - Standard updating formulae for means/variances
  - Gaussian specific D constant with flooring
  - Revised updates for mixture weights, which lead to faster convergence

13 NAB/WSJ MMIE Results
- Standard HTK LVCSR system (no adaptation)
- 66 hour training set
[Table: results by number of mixture components (#Mix Comp) for MLE and MMIE on H1 dev and H1 eval; numeric values not preserved.]
- Bigger reductions in WER for simpler systems
- All model complexities improve

14 Cross-Task NAB/WSJ MMIE Results
- Test discriminative training across task
  - Train on WSJ-type data and test on broadcast news
[Table: columns Train, Setup, Avg, F0, F1, F2, F4, FX; rows NAB-C2 MLE, NAB-C2 MMIE, BN-36H MLE, BN-72H MLE, BN-72H MMIE; numeric values not preserved.]
%WER on BNdev96pe data using trigram, GI, for NAB channel 2 models trained with either MLE or MMIE. The use of BN training data is also shown for comparison.
- MMIE always better than MLE, even with severe mismatch

15 Error Rates on Conversational Telephone Speech
[Table: columns Iteration Number, then eval97sub and eval98 for 68 hour training and for 265 hour training; first row is iteration 0 (MLE); numeric values not preserved.]
%WER from several iterations of MMIE training on CTS data
- Sizeable absolute reductions in WER after 4 iterations (eval98)
  - 2.3% for 68 hour training set
  - 3.4% for 265 hour training
- Larger gains with increased training set size is the general pattern

16 MPE Objective Function
- Maximise the following function:
$$ F_{\text{MPE}}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{w} p_\lambda(O_r \mid w)\, P(w)\, \text{RawAccuracy}(w)}{\sum_{w} p_\lambda(O_r \mid w)\, P(w)} $$
- RawAccuracy(w) measures the number of phones correctly transcribed in sentence w (derived from word recognition), i.e. the number of correct phones in w minus the number of inserted phones in w
- $F_{\text{MPE}}(\lambda)$ is a weighted average of RawAccuracy(w) over all w
- MPE is a smoothed approximation to phone error in a word recognition context
- Error measure reduces sensitivity to outliers
- Can use lattice-based implementation (requires time-based alignments for errors) and new statistics computation to still use EBW update formulae
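The MPE objective is a posterior-weighted average of RawAccuracy, which the sketch below computes over an explicit hypothesis list per utterance. The finite list in place of a lattice, the helper names and the toy numbers are assumptions for illustration only.

```python
import math


def raw_accuracy(num_correct, num_inserted):
    """Correct phones minus inserted phones for one hypothesis."""
    return num_correct - num_inserted


def mpe_criterion(utterances):
    """Sum over utterances of the posterior-weighted average RawAccuracy.

    Each utterance is a list of (log_joint_score, accuracy) pairs, where
    log_joint_score = log p(O | w) + log P(w) for hypothesis w.
    """
    total = 0.0
    for hypotheses in utterances:
        peak = max(score for score, _ in hypotheses)
        weights = [math.exp(score - peak) for score, _ in hypotheses]
        norm = sum(weights)
        total += sum(w * acc for w, (_, acc) in zip(weights, hypotheses)) / norm
    return total


# Hypothetical utterance: two equally likely hypotheses with accuracies 4 and 2.
toy = [[(-50.0, raw_accuracy(4, 0)), (-50.0, raw_accuracy(3, 1))]]
print(mpe_criterion(toy))
```

Unlike the MMIE criterion, a hypothesis that differs from the reference in only one phone still contributes most of its accuracy, which is one way to read the reduced sensitivity to outliers.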

17 Improved Generalisation using I-smoothing
- Use of discriminative criteria can easily cause over-training
- Get smoothed estimates of parameters by combining Maximum Likelihood (ML) and MPE objective functions for each Gaussian
- Rather than globally interpolate (H-criterion), amount of ML smoothing depends on the amount of data per Gaussian
- I-smoothing adds τ samples of the average ML statistics for each Gaussian. Typically τ = 50.
  - For MMI scale numerator counts appropriately
  - For MPE need ML counts in addition to other MPE statistics
- I-smoothing essential for MPE (& helps a little for MMI)
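One reading of "adds τ samples of the average ML statistics" is sketched below for a single Gaussian dimension: the ML statistics are normalised to per-sample averages and τ copies are added to the numerator statistics. The function name, argument layout and the handling of an unseen Gaussian are assumptions of this sketch.

```python
def i_smooth(gamma_num, theta_num, theta2_num,
             gamma_ml, theta_ml, theta2_ml, tau=50.0):
    """Add tau samples of the average ML statistics to the numerator statistics.

    gamma_* are occupancies, theta_* sums of data, theta2_* sums of squared data.
    Returns the smoothed (gamma_num, theta_num, theta2_num).
    """
    if gamma_ml <= 0.0:
        # No ML data for this Gaussian: nothing to average, leave unchanged.
        return gamma_num, theta_num, theta2_num
    scale = tau / gamma_ml  # tau samples, each equal to the average ML sample
    return (gamma_num + tau,
            theta_num + scale * theta_ml,
            theta2_num + scale * theta2_ml)


# Hypothetical statistics for one Gaussian dimension.
print(i_smooth(2.0, 3.0, 6.0, gamma_ml=10.0, theta_ml=20.0, theta2_ml=50.0, tau=50.0))
```

Because a fixed τ is added regardless of occupancy, Gaussians with little data are pulled strongly towards their ML estimates while well-trained Gaussians are barely affected.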

18 MPE CTS results
[Table: columns % WER Train, % WER eval98, % WER redn (test); rows MLE, MMIE, MMIE (τ = 200), MPE (τ = 50); numeric values not preserved.]
HMMs trained on 68hr set. Train uses lattice unigram.
[Table: columns % WER Train, % WER eval98, % WER redn (test); rows MLE baseline, MMIE, MMIE (τ = 200), MPE (τ = 100); numeric values not preserved.]
HMMs trained on 265hr set. Train uses lattice unigram.
- I-smoothing reduces the error rate with MMI by [value missing]% abs
- MPE/I-smoothing gives around 1% abs lower WER than previous MMIE results

19 Discriminative MAP
- Maximum A Posteriori (MAP) is a standard adaptation scheme:
  - as the amount of adaptation data increases, the estimate tends to the Maximum Likelihood estimate; referred to as ML-MAP
- For discriminative MAP schemes:
  - as the amount of adaptation data increases, the estimate tends to the discriminative estimate;
  - maximum mutual information (MMI-MAP) and minimum phone error (MPE-MAP) adaptation investigated.
- Evaluation for task porting from CTS to Voicemail
- Also used for creation of gender dependent models

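20 Strong/Weak Sense Auxiliary Functions
[Figure: two panels, (a) Strong Sense and (b) Weak Sense, each showing the objective F(λ) and the auxiliary function G(λ, λ̂) against λ around the point λ̂.]
- Strong Sense: used for standard EM, guaranteed convergence; requires
$$ G(\lambda, \hat{\lambda}) - G(\hat{\lambda}, \hat{\lambda}) \le F(\lambda) - F(\hat{\lambda}) $$
- Weak Sense: applicable to MMI, yields Extended BW; requires
$$ \left.\frac{\partial}{\partial \lambda} G(\lambda, \hat{\lambda})\right|_{\lambda=\hat{\lambda}} = \left.\frac{\partial}{\partial \lambda} F(\lambda)\right|_{\lambda=\hat{\lambda}} $$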

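21 Weak Sense Auxiliary Functions for MMI
- MMI criterion may be expressed as
$$ F_{\text{MMIE}}(\lambda) = \log p(O \mid M^{\text{num}}) - \log p(O \mid M^{\text{den}}) $$
- The weak sense auxiliary function is
$$ G_{\text{MMIE}}(\lambda, \hat{\lambda}) = G^{\text{num}}(\lambda, \hat{\lambda}) - G^{\text{den}}(\lambda, \hat{\lambda}) + G^{\text{sm}}(\lambda, \hat{\lambda}) $$
  where $G^{\text{num}}(\lambda, \hat{\lambda})$ and $G^{\text{den}}(\lambda, \hat{\lambda})$ are standard strong sense auxiliary functions.
- A smoothing term is added to improve stability; it satisfies
$$ \left.\frac{\partial}{\partial \lambda} G^{\text{sm}}(\lambda, \hat{\lambda})\right|_{\lambda=\hat{\lambda}} = 0 $$
- This ensures that the final function is still a valid weak sense auxiliary function, and an appropriate choice yields E-BW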

22 Incorporating Prior Information
- By definition a function is a weak sense auxiliary function of itself: a log-prior may be directly added to the weak sense auxiliary function.
- To make normal discriminative training more robust, the ML estimate of the parameter values can be used as the centre of an appropriately defined prior distribution
- This yields I-Smoothing
$$ \mu_j = \frac{\{\theta^{\text{num}}_j(O) - \theta^{\text{den}}_j(O)\} + D_j \hat{\mu}_j + \tau^I \mu^{\text{ml}}_j}{\{\gamma^{\text{num}}_j - \gamma^{\text{den}}_j\} + D_j + \tau^I} $$
  where $\hat{\mu}_j$ is the current mean estimate, the point at which the auxiliary function is constructed.
- $\tau^I$ determines the influence of the prior (ML estimate) on the final MMI estimate.

23 MMI-MAP
- For adaptation/porting the ML estimate may not be robust
  - use an ML-MAP estimate as the prior
- Use count-smoothing ML-MAP with prior parameters ($\tilde{\mu}_j$)
$$ \mu_j = \frac{\{\theta^{\text{num}}_j(O) - \theta^{\text{den}}_j(O)\} + D_j \hat{\mu}_j + \tau^I \mu^{\text{map}}_j}{\{\gamma^{\text{num}}_j - \gamma^{\text{den}}_j\} + D_j + \tau^I} $$
  where
$$ \mu^{\text{map}}_j = \frac{\theta^{\text{num}}_j(O) + \tau \tilde{\mu}_j}{\gamma^{\text{num}}_j + \tau} $$
- Two smoothing variables for MMI-MAP
  - $\tau$ determines how close the prior is to the ML estimate
  - $\tau^I$ determines how much the prior influences the final estimate.
- Similar form may be used for MPE-MAP.
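The two-stage smoothing in the MMI-MAP mean update can be followed in a short scalar sketch: the ML-MAP mean is formed first from the numerator statistics and the prior mean, and then enters the discriminative update with weight τ^I. Function and argument names are illustrative assumptions, and the values in the example are made up.

```python
def ml_map_mean(theta_num, gamma_num, prior_mean, tau):
    """Count-smoothed ML-MAP mean: tau pseudo-observations at the prior mean."""
    return (theta_num + tau * prior_mean) / (gamma_num + tau)


def mmi_map_mean(theta_num, theta_den, gamma_num, gamma_den,
                 current_mean, prior_mean, D, tau, tau_i):
    """MMI-MAP mean update using the ML-MAP estimate as the I-smoothing prior."""
    mu_map = ml_map_mean(theta_num, gamma_num, prior_mean, tau)
    numerator = (theta_num - theta_den) + D * current_mean + tau_i * mu_map
    denominator = (gamma_num - gamma_den) + D + tau_i
    return numerator / denominator


# Hypothetical statistics for one Gaussian dimension.
print(ml_map_mean(theta_num=8.0, gamma_num=4.0, prior_mean=1.0, tau=4.0))
print(mmi_map_mean(theta_num=8.0, theta_den=8.0, gamma_num=4.0, gamma_den=4.0,
                   current_mean=3.0, prior_mean=1.0, D=2.0, tau=4.0, tau_i=2.0))
```

Setting τ to zero makes the prior equal to the ML mean, which recovers the I-smoothing update; with no adaptation data at all, the ML-MAP mean falls back to the prior mean.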

24 Switchboard to Voicemail Porting Results
[Figure: test WER on Voicemail against hours of adaptation data, with curves ML→ML-MAP, ML→MMI-MAP, MMI→ML-MAP and MMI→MMI-MAP.]
WERs on Voicemail for varying amounts of adaptation data: (MMI or ML) models adapted with (MMI-MAP or ML-MAP)
- 4.5% relative improvement from MMI-MAP vs. ML-MAP (starting from 30h adaptation data)

25 Discriminative Linear-Transform Based Adaptation
- Adaptation by estimating a set of linear transforms for Gaussian means and/or variances
  - Normally computed using ML (MLLR)
  - Can estimate transforms for the model parameters or apply to the features
- Can estimate transforms for various discriminative criteria, including MMI and MPE, using the theory of weak-sense auxiliary functions.
- Investigated for supervised and unsupervised adaptation.
- Can also apply to discriminative speaker adaptive training
  - training set transforms estimated for each training speaker/condition to account for variability
  - estimate canonical model after applying transforms
  - use MMI/MPE for both canonical model and transforms
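Whichever criterion is used to estimate it, a mean transform is applied in the same way: each Gaussian mean is mapped through a shared affine transform. The sketch below shows only this application step, with a hand-written matrix and bias standing in for an estimated transform; estimating the transform under ML, MMI or MPE is not shown, and all names are illustrative.

```python
def transform_mean(A, b, mean):
    """Apply an affine transform to a Gaussian mean: A @ mean + b."""
    return [sum(a * m for a, m in zip(row, mean)) + offset
            for row, offset in zip(A, b)]


def adapt_means(A, b, means):
    """Apply one shared transform to every mean in a regression class."""
    return [transform_mean(A, b, mean) for mean in means]


# Hypothetical 2-dimensional transform and two Gaussian means.
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [1.0, -1.0]
means = [[1.0, 2.0], [0.0, 0.0]]
print(adapt_means(A, b, means))
```

Sharing one transform across many Gaussians is what lets a small amount of adaptation data move the whole model.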

26 Supervised Adaptation with DLT
- Adapt native speaker models to non-natives (40 enrollment utterances)
- WSJ/NAB 1994 S3 dev/eval sets
- Use either interpolation of MMI/ML criteria (H-Crit) or MPE
- Single iteration WERs shown (more iterations help further)
[Table: columns Test sets, Unadapt, MLLR, H-crit DLT, MPE-DLT; rows s3-dev, s3-eval; numeric values not preserved.]
- Large improvements from adaptation
- Gains from discriminative adaptation

27 Unsupervised Adaptation with DLT
- Unsupervised discriminative adaptation is a challenge!
  - DLT can learn supervision information very effectively...
- Use supervision from strong LM and weak LM for denominator
- Include confidence scores on words
- Evaluated in part of a CTS transcription system
  - Supervision from MLLR-adapted confusion network decoding (27.0% WER)
  - Small gains from unsupervised DLT
[Table: columns MLLR, MPE DLT, MPE DLT + conf; numeric values not preserved.]

28 Other Current Work
- Discriminative training is being applied in a range of other ways to various models
  - For very large datasets using lightly-supervised training methods
    - recogniser generated transcriptions with 5-10% WER (strong LM)
    - use weaker LM for confusable data
  - For joint cluster-adaptive training and linear transform estimation
  - For various types of precision matrix modelling
  - For other extended forms of HMMs
    - parameters
    - linear predictive HMMs
  - To best determine the model structure
- Still working on refinements to basic process
  - lattice (re-)generation & combination
  - forms of smoothing/prior

29 Summary & Outlook
- Discriminative training is effective for large vocabulary recognition
- Important to address basic issues
  - efficient optimisation (EBW + lattice schemes)
  - generalisation (acoustic scaling, weakened LMs, stopping overtraining)
- Interesting properties
  - WER difference to ML is bigger with more data
  - Most effective with smaller number of parameters
  - Improvements under within-task and cross-task conditions
- All leading large vocab research systems now use discriminative training
- Minimum Phone Error training more effective than MMIE

30 Summary & Outlook (continued)
- Standard approach to training systems at Cambridge since 2002
- Theoretical extensions using the concept of weak-sense auxiliary functions
  - New derivation of extended Baum-Welch algorithm
  - Discriminative MAP adaptation schemes (better task porting)
  - Discriminative linear transforms (supervised and unsupervised adaptation)
- Still refinements and application to more complex model structures


More information

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013

Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation

More information

Applications of Keyword-Constraining in Speaker Recognition. Howard Lei. July 2, Introduction 3

Applications of Keyword-Constraining in Speaker Recognition. Howard Lei. July 2, Introduction 3 Applications of Keyword-Constraining in Speaker Recognition Howard Lei hlei@icsi.berkeley.edu July 2, 2007 Contents 1 Introduction 3 2 The keyword HMM system 4 2.1 Background keyword HMM training............................

More information

Modeling time series with hidden Markov models

Modeling time series with hidden Markov models Modeling time series with hidden Markov models Advanced Machine learning 2017 Nadia Figueroa, Jose Medina and Aude Billard Time series data Barometric pressure Temperature Data Humidity Time What s going

More information

IBL and clustering. Relationship of IBL with CBR

IBL and clustering. Relationship of IBL with CBR IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS

LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS Tara N. Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, Bhuvana Ramabhadran IBM T. J. Watson

More information

EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition

EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition Yan Han and Lou Boves Department of Language and Speech, Radboud University Nijmegen, The Netherlands {Y.Han,

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Client Dependent GMM-SVM Models for Speaker Verification

Client Dependent GMM-SVM Models for Speaker Verification Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)

More information

Modeling Phonetic Context with Non-random Forests for Speech Recognition

Modeling Phonetic Context with Non-random Forests for Speech Recognition Modeling Phonetic Context with Non-random Forests for Speech Recognition Hainan Xu Center for Language and Speech Processing, Johns Hopkins University September 4, 2015 Hainan Xu September 4, 2015 1 /

More information

Overview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals

Overview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals Overview Search and Decoding Steve Renals Automatic Speech Recognition ASR Lecture 10 January - March 2012 Today s lecture Search in (large vocabulary) speech recognition Viterbi decoding Approximate search

More information

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October

More information

Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition

Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition Tara N. Sainath 1, Brian Kingsbury 1, Bhuvana Ramabhadran 1, Petr Fousek 2, Petr Novak 2, Abdel-rahman Mohamed 3

More information

Using Document Summarization Techniques for Speech Data Subset Selection

Using Document Summarization Techniques for Speech Data Subset Selection Using Document Summarization Techniques for Speech Data Subset Selection Kai Wei, Yuzong Liu, Katrin Kirchhoff, Jeff Bilmes Department of Electrical Engineering University of Washington Seattle, WA 98195,

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

Introduction to The HTK Toolkit

Introduction to The HTK Toolkit Introduction to The HTK Toolkit Hsin-min Wang Reference: - The HTK Book Outline An Overview of HTK HTK Processing Stages Data Preparation Tools Training Tools Testing Tools Analysis Tools A Tutorial Example

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Optimizing feature representation for speaker diarization using PCA and LDA

Optimizing feature representation for speaker diarization using PCA and LDA Optimizing feature representation for speaker diarization using PCA and LDA itsikv@netvision.net.il Jean-Francois Bonastre jean-francois.bonastre@univ-avignon.fr Outline Speaker Diarization what is it?

More information

Learning N-gram Language Models from Uncertain Data

Learning N-gram Language Models from Uncertain Data Learning N-gram Language Models from Uncertain Data Vitaly Kuznetsov 1,2, Hank Liao 2, Mehryar Mohri 1,2, Michael Riley 2, Brian Roark 2 1 Courant Institute, New York University 2 Google, Inc. vitalyk,hankliao,mohri,riley,roark}@google.com

More information

NON-LINEAR DIMENSION REDUCTION OF GABOR FEATURES FOR NOISE-ROBUST ASR. Hitesh Anand Gupta, Anirudh Raju, Abeer Alwan

NON-LINEAR DIMENSION REDUCTION OF GABOR FEATURES FOR NOISE-ROBUST ASR. Hitesh Anand Gupta, Anirudh Raju, Abeer Alwan NON-LINEAR DIMENSION REDUCTION OF GABOR FEATURES FOR NOISE-ROBUST ASR Hitesh Anand Gupta, Anirudh Raju, Abeer Alwan Department of Electrical Engineering, University of California Los Angeles, USA {hiteshag,

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Computationally Efficient M-Estimation of Log-Linear Structure Models

Computationally Efficient M-Estimation of Log-Linear Structure Models Computationally Efficient M-Estimation of Log-Linear Structure Models Noah Smith, Doug Vail, and John Lafferty School of Computer Science Carnegie Mellon University {nasmith,dvail2,lafferty}@cs.cmu.edu

More information

Lecture 7: Neural network acoustic models in speech recognition

Lecture 7: Neural network acoustic models in speech recognition CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic

More information

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen Structured Perceptron Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen 1 Outline 1. 2. 3. 4. Brief review of perceptron Structured Perceptron Discriminative Training Methods for Hidden Markov Models: Theory and

More information

Combining Audio and Video for Detection of Spontaneous Emotions

Combining Audio and Video for Detection of Spontaneous Emotions Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University

More information

Scott Shaobing Chen & P.S. Gopalakrishnan. IBM T.J. Watson Research Center. as follows:

Scott Shaobing Chen & P.S. Gopalakrishnan. IBM T.J. Watson Research Center.   as follows: SPEAKER, ENVIRONMENT AND CHANNEL CHANGE DETECTION AND CLUSTERING VIA THE BAYESIAN INFORMATION CRITERION Scott Shaobing Chen & P.S. Gopalakrishnan IBM T.J. Watson Research Center email: schen@watson.ibm.com

More information

Lecture 5: Markov models

Lecture 5: Markov models Master s course Bioinformatics Data Analysis and Tools Lecture 5: Markov models Centre for Integrative Bioinformatics Problem in biology Data and patterns are often not clear cut When we want to make a

More information

HIERARCHICAL LARGE-MARGIN GAUSSIAN MIXTURE MODELS FOR PHONETIC CLASSIFICATION. Hung-An Chang and James R. Glass

HIERARCHICAL LARGE-MARGIN GAUSSIAN MIXTURE MODELS FOR PHONETIC CLASSIFICATION. Hung-An Chang and James R. Glass HIERARCHICAL LARGE-MARGIN GAUSSIAN MIXTURE MODELS FOR PHONETIC CLASSIFICATION Hung-An Chang and James R. Glass MIT Computer Science and Artificial Intelligence Laboratory Cambridge, Massachusetts, 02139,

More information

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................

More information

Manifold Constrained Deep Neural Networks for ASR

Manifold Constrained Deep Neural Networks for ASR 1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK

PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK PARALLEL TRAINING ALGORITHMS FOR CONTINUOUS SPEECH RECOGNITION, IMPLEMENTED IN A MESSAGE PASSING FRAMEWORK Vladimir Popescu 1, 2, Corneliu Burileanu 1, Monica Rafaila 1, Ramona Calimanescu 1 1 Faculty

More information

Biology 644: Bioinformatics

Biology 644: Bioinformatics A statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states in the training data. First used in speech and handwriting recognition In

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Automatic Speech Recognition using Dynamic Bayesian Networks

Automatic Speech Recognition using Dynamic Bayesian Networks Automatic Speech Recognition using Dynamic Bayesian Networks Rob van de Lisdonk Faculty Electrical Engineering, Mathematics and Computer Science Delft University of Technology June 2009 Graduation Committee:

More information

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

More information

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 52 Optimization of Observation Membership Function By Particle Swarm Method for Enhancing

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Machine Learning. Semi-Supervised Learning. Manfred Huber

Machine Learning. Semi-Supervised Learning. Manfred Huber Machine Learning Semi-Supervised Learning Manfred Huber 2015 1 Semi-Supervised Learning Semi-supervised learning refers to learning from data where part contains desired output information and the other

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

Lecture 8. LVCSR Training and Decoding. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lecture 8. LVCSR Training and Decoding. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen Lecture 8 LVCSR Training and Decoding Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 November

More information

Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones

Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones What is machine learning? Data interpretation describing relationship between predictors and responses

More information

Expectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University

Expectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University Expectation Maximization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University April 10 th, 2006 1 Announcements Reminder: Project milestone due Wednesday beginning of class 2 Coordinate

More information

Optimizing Speech Recognition Evaluation Using Stratified Sampling

Optimizing Speech Recognition Evaluation Using Stratified Sampling INTERSPEECH 01 September 1, 01, San Francisco, USA Optimizing Speech Recognition Evaluation Using Stratified Sampling Janne Pylkkönen, Thomas Drugman, Max Bisani Amazon {jannepyl, drugman, bisani}@amazon.com

More information

Clustering. Shishir K. Shah

Clustering. Shishir K. Shah Clustering Shishir K. Shah Acknowledgement: Notes by Profs. M. Pollefeys, R. Jin, B. Liu, Y. Ukrainitz, B. Sarel, D. Forsyth, M. Shah, K. Grauman, and S. K. Shah Clustering l Clustering is a technique

More information

An empirical study of smoothing techniques for language modeling

An empirical study of smoothing techniques for language modeling Computer Speech and Language (1999) 13, 359 394 Article No. csla.1999.128 Available online at http://www.idealibrary.com on An empirical study of smoothing techniques for language modeling Stanley F. Chen

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process

More information