Detailed Notes on A Voice Dialing Application

It is assumed that you have worked through steps 1-3 of the HTK Workshop. These notes are intended to support you while working through the HTK tutorial example.

0. General Preparations

Create a new project folder named <voicedialsystem>.

1. Copy HParse, HDMan, HCopy, HLEd, HERest, HVite, HCompV, HHEd and HResults from the <bin.win32> folder to your project.
2. Copy prompts2wlist, prompts2mlf and maketrihed from the <HTKtutorial> folder to your project.
3. Copy mkclscript.prl from the <perl_scripts> folder to your project.
4. Copy beep-1.0 from the <Beep> folder to your project.

1. Data Preparations

1.1 The Task Grammar

Create a gram file with the following content:

    $digit = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | OH | ZERO;
    $name = [JOOP] [JULIAN] [DAVE] [PHIL] WOOD [STEVE] YOUNG;
    ( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name ) SENT-END )

Now execute on the command line:

    HParse gram wdnet

1.2 The Dictionary

Create a trainprompts file: copy and re-label the selected training text from the TIMIT database, which is available in the <speech> directory on the DVD. Extract the training word list (wlist) with the following script:

    perl prompts2wlist trainprompts wlist

Create a global.ded file with the following content:

    AS sp
    RS cmu
    MP sil sil sp
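As a rough illustration of the word-list extraction step in 1.2, the following Python sketch collects the distinct words from a prompts file and sorts them. It is not the prompts2wlist script itself. It assumes that each prompt line starts with an utterance ID followed by the words, and the function name and sample prompts are invented for the example.

```python
def build_word_list(prompt_lines):
    """Return the sorted, de-duplicated words found in the prompt lines.

    Assumption: each non-empty line looks like '<utterance-id> WORD WORD ...'.
    """
    words = set()
    for line in prompt_lines:
        tokens = line.split()
        # tokens[0] is assumed to be the utterance ID, not a word
        words.update(tokens[1:])
    return sorted(words)


# Hypothetical prompts, only for demonstration
prompts = [
    "S0001 DIAL ONE TWO",
    "S0002 CALL STEVE YOUNG",
    "S0003 PHONE YOUNG",
]

for word in build_word_list(prompts):
    print(word)
```

Writing the printed words to a file, one per line, gives a list in the same one-word-per-line form that HDMan reads through its -w option.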

Create a names dictionary file with the following content:

    DAVE        d ey v
    JOOP        jh uh p
    JULIAN      jh uw l y ax n
    JULIAN      jh uw l ia n
    LAW         l ao
    LEE         l iy
    PHIL        f ih l
    SENT-END    [] sil
    SENT-START  [] sil
    STEVE       s t iy v
    SUE         s uw
    SUE         s y uw
    TYLER       t ay l ax
    WOOD        w uh d
    YOUNG       y ah ng

Execute on the command line:

    HDMan -m -w wlist -n monophones1 -l dlog dict beep-1.0 names

Note that the original beep-1.0 dictionary may not correspond to your text, so you may need to modify it manually. Open the dict file and change

    SENT-END sil
    SENT-START sil

into

    SENT-END [] sil
    SENT-START [] sil

As a result, nothing is output even when these two words are recognized.

1.3 Recording the Data

This should be clear.

1.4 Creating the Transcription Files

Create a testprompts file: copy and re-label the selected training text from the TIMIT database. Then execute:

    perl prompts2mlf trainwords.mlf trainprompts
    perl prompts2mlf testwords.mlf testprompts
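To show what a word-level MLF looks like, here is a minimal Python sketch that converts prompt lines into Master Label File text. It is an illustration rather than a replacement for prompts2mlf. It assumes the first token of each prompt is the utterance ID, and the function name and the prefix parameter are hypothetical.

```python
def prompts_to_mlf(prompt_lines, prefix="*/"):
    """Return word-level MLF text for the given prompt lines.

    Assumption: each non-empty line looks like '<utterance-id> WORD WORD ...'.
    The prefix is put in front of every label file name, so '*/' yields a
    pattern that matches the label file in any directory.
    """
    out = ["#!MLF!#"]                          # MLF header line
    for line in prompt_lines:
        tokens = line.split()
        if not tokens:
            continue
        utt_id, words = tokens[0], tokens[1:]
        out.append(f'"{prefix}{utt_id}.lab"')  # quoted label file name
        out.extend(words)                      # one word per line
        out.append(".")                        # a single dot ends the entry
    return "\n".join(out) + "\n"


print(prompts_to_mlf(["S0001 CALL STEVE YOUNG"]), end="")
```

Passing prefix="" produces bare file names instead of the */ pattern.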

Create an mkphones0.led file with the following content:

    EX
    IS sil sil
    DE sp

Generate the phone-level MLF with the following command:

    HLEd -l * -d dict -i phones0.mlf mkphones0.led trainwords.mlf

Note: Some of the training words may not be included in the beep-1.0 dictionary, so you may need to add them to your dict dictionary manually.

1.5 Coding the Data

Create a config file with the following content:

    # Coding parameters
    SOURCEFORMAT = NIST
    TARGETKIND = MFCC_0_D_A
    TARGETRATE =
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE =
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F

Create a codetr.scp file that lists each training source file (left side) and its corresponding feature output file (right side). Use the same method to create a codete.scp file. Then execute:

    HCopy -T 1 -C config -S codetr.scp
    HCopy -T 1 -C config -S codete.scp
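The two-column script files for HCopy can be generated rather than typed by hand. The Python sketch below is one way to do it under stated assumptions: the directory names, the .wav and .mfc extensions and the function name are all hypothetical, and paths are written with forward slashes.

```python
from pathlib import PurePosixPath


def make_code_scp(source_paths, feature_dir):
    """Return 'source target' lines, one per audio file.

    Each target keeps the source's base name, gets the (assumed) .mfc
    extension and is placed in feature_dir.
    """
    lines = []
    for source in source_paths:
        stem = PurePosixPath(source).stem
        target = PurePosixPath(feature_dir) / f"{stem}.mfc"
        lines.append(f"{source} {target}")
    return lines


# Hypothetical file layout, only for demonstration
sources = ["speech/train/S0001.wav", "speech/train/S0002.wav"]
for line in make_code_scp(sources, "mfcc/train"):
    print(line)
```

Saving the printed lines as codetr.scp gives the left and right columns described in 1.5. The same function, called on the test recordings, would produce codete.scp.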

2. Creating Monophone HMMs

2.1 Creating Flat Start Monophones

Create a proto file that defines a prototype model with the following parameters:

    ~o <VecSize> 39 <MFCC_0_D_A>
    ~h "proto"
    <BeginHMM>
    <NumStates> 5
    <State> 2
    <Mean> (x39)
    <Variance> (x39)
    <State> 3
    <Mean> (x39)
    <Variance> (x39)
    <State> 4
    <Mean> (x39)
    <Variance> (x39)
    <TransP>
    <EndHMM>

Create a train.scp file with a list of all the training files. Then execute:

    mkdir hmm0
    HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto

Create a Master Macro File (MMF) called hmmdefs that contains a copy of the prototype for each of the monophones (including sil), by manually copying the model and relabeling each copy:

    ~h aa
    <BeginHMM>

    <EndHMM>
    ~h eh
    <BeginHMM>
    <EndHMM>
    ... etc.

Create a macros file with the following content:

    ~o <VECSIZE> 39 <MFCC_0_D_A>
    ~v varFloor1
    <Variance>

Delete the sp model from the monophones1 file and save the result as monophones0. Execute the following commands:

    mkdir hmm1
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
    mkdir hmm2
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones0
    mkdir hmm3
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones0

2.2 Fixing the Silence Models

Make a new directory:

    mkdir hmm4

Use a text editor on the file hmm3/hmmdefs to copy the centre state of the sil model to make a new sp model. Store the resulting MMF hmmdefs, which includes the new sp model, in the new directory <hmm4>. Copy the macros file to the <hmm4> folder as well.

Create the sil.hed file with the following content:

    AT {sil.transP}
    AT {sil.transP}
    AT {sp.transP}
    TI silst {sil.state[3],sp.state[2]}
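The text-editor step in 2.2 (copying the centre state of sil into a new sp model) can also be sketched in Python. This is only an illustration under several assumptions: the hmmdefs file is in text form with the usual <STATE>, <TRANSP> and <ENDHMM> keywords, the sil model has five states so that state 3 is its centre, and the initial 3x3 transition matrix for sp is a hypothetical choice that these notes do not specify. The tiny two-dimensional sil model is invented so that the example stays short.

```python
import re

# Hypothetical starting transition matrix for the 3-state sp model.
SP_TRANSP = [
    " 0.0 1.0 0.0",
    " 0.0 0.9 0.1",
    " 0.0 0.0 0.0",
]


def make_sp_model(hmmdefs_text):
    """Build an sp model whose single emitting state copies sil state 3."""
    sil = re.search(r'~h\s+"sil"(.*?)<ENDHMM>', hmmdefs_text, re.S | re.I)
    if sil is None:
        raise ValueError("no sil model found")
    # Everything between '<STATE> 3' and '<STATE> 4' is the centre state
    centre = re.search(r"<STATE>\s+3\s*\n(.*?)<STATE>\s+4",
                       sil.group(1), re.S | re.I)
    if centre is None:
        raise ValueError("sil model has no state 3")
    lines = ['~h "sp"', "<BEGINHMM>", "<NUMSTATES> 3", "<STATE> 2"]
    lines.append(centre.group(1).rstrip("\n"))
    lines += ["<TRANSP> 3", *SP_TRANSP, "<ENDHMM>"]
    return "\n".join(lines) + "\n"


# Invented 2-dimensional sil model, only for demonstration
example = '''~h "sil"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 2
 0.1 0.2
<VARIANCE> 2
 1.0 1.0
<STATE> 3
<MEAN> 2
 0.3 0.4
<VARIANCE> 2
 1.1 1.2
<STATE> 4
<MEAN> 2
 0.5 0.6
<VARIANCE> 2
 1.0 1.0
<TRANSP> 5
 0.0 1.0 0.0 0.0 0.0
 0.0 0.6 0.4 0.0 0.0
 0.0 0.0 0.6 0.4 0.0
 0.0 0.0 0.0 0.7 0.3
 0.0 0.0 0.0 0.0 0.0
<ENDHMM>
'''

print(make_sp_model(example), end="")
```

Appending the returned text to a copy of hmm3/hmmdefs would give the hmm4/hmmdefs file described in 2.2. The TI command in the edit script then ties the copied state back to sil state 3, so both models share one set of parameters.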

Execute the following commands:

    mkdir hmm5
    HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
    mkdir hmm6
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
    mkdir hmm7
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1

2.3 Realigning the Training Data

Add the entry

    SILENCE sil

to dict and save the result as dict1.

Note: Add */ before each file name in trainwords.mlf.

Execute the following commands:

    HVite -l * -o SWT -b SILENCE -C config -a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t -y lab -I trainwords.mlf -S train.scp dict1 monophones1
    mkdir hmm8
    HERest -C config -I aligned.mlf -t -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
    mkdir hmm9
    HERest -C config -I aligned.mlf -t -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1

3. Creating Tied-State Triphones

3.1 Making Triphones from Monophones

Create the file mktri.led with the following content:

    WB sp
    WB sil
    TC
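To make the effect of this edit script concrete, the Python sketch below mimics the context expansion on a single phone sequence. It is a simplified model of the behaviour, not HLEd's implementation: phones declared as word boundaries (sp and sil) are left unchanged and contribute no context to their neighbours, while every other phone becomes left-phone+right. The function name is hypothetical.

```python
def to_triphones(phones, boundaries=("sp", "sil")):
    """Expand a monophone sequence into word-internal triphone names."""
    out = []
    for i, phone in enumerate(phones):
        if phone in boundaries:
            out.append(phone)          # boundary symbols stay as they are
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        name = phone
        if left is not None and left not in boundaries:
            name = f"{left}-{name}"    # add left context
        if right is not None and right not in boundaries:
            name = f"{name}+{right}"   # add right context
        out.append(name)
    return out


print(" ".join(to_triphones("sil th ih s sp m ae n sil".split())))
# sil th+ih th-ih+s ih-s sp m+ae m-ae+n ae-n sil
```

Phones next to a boundary end up as biphones (th+ih, ih-s), which is why the triphone list contains a mixture of monophone, biphone and triphone names.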

Execute the following commands:

    HLEd -n triphones1 -l * -i wintri.mlf mktri.led aligned.mlf
    perl maketrihed monophones1 triphones1
    mkdir hmm10
    HHEd -B -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1
    mkdir hmm11
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1
    mkdir hmm12
    HERest -C config -I wintri.mlf -t -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1

3.2 Making Tied-State Triphones

Execute the following command:

    HDMan -b sp -n fulllist -g global.ded -l flog beep-tri beep-1.0

Copy the content of triphones1 and append it to the fulllist file.

Create the file tree.hed with the following content:

    TB 350 "ST_ah_2_" {("ah","*-ah+*","ah+*","*-ah").state[2]}
    TB 350 "ST_ax_2_" {("ax","*-ax+*","ax+*","*-ax").state[2]}
    TB 350 "ST_ey_2_" {("ey","*-ey+*","ey+*","*-ey").state[2]}
    TB 350 "ST_sh_2_" {("sh","*-sh+*","sh+*","*-sh").state[2]}
    etc.

Then execute the following command:

    perl mkclscript.prl TB monophones1 >> tree.hed

Add the following content to the tree.hed file:

    TR 1
    AU fulllist
    CO tiedlist
    ST trees
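The TB lines in tree.hed follow a fixed pattern, one per phone and emitting state, so they can be generated from the monophone list. The Python sketch below reproduces that pattern. It is not the mkclscript.prl script, and the function name, the default states 2 to 4 (the emitting states of the five-state prototype) and the ordering of the output are assumptions of this example.

```python
def tb_commands(phones, threshold=350, states=(2, 3, 4)):
    """Return one TB clustering command per (state, phone) pair."""
    lines = []
    for state in states:
        for p in phones:
            # All models whose centre phone is p: monophone, triphones
            # and the left and right biphones
            items = f'("{p}","*-{p}+*","{p}+*","*-{p}")'
            lines.append(
                f'TB {threshold} "ST_{p}_{state}_" {{{items}.state[{state}]}}'
            )
    return lines


for line in tb_commands(["ah", "ax"], states=(2,)):
    print(line)
# TB 350 "ST_ah_2_" {("ah","*-ah+*","ah+*","*-ah").state[2]}
# TB 350 "ST_ax_2_" {("ax","*-ax+*","ax+*","*-ax").state[2]}
```

Each command asks HHEd to cluster one state position across every context variant of a phone, which is what lets triphones with little training data share parameters.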

Execute the following commands:

    mkdir hmm13
    HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1 > log
    mkdir hmm14
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist
    mkdir hmm15
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist

4. The Evaluation of the Recognizer

4.1 Recognizing the Test Data

Finally, execute the following command to evaluate the recognizer:

    HVite -C config -H hmm15/macros -H hmm15/hmmdefs -S test.scp -l * -i result.mlf -w wdnet -p 0.0 -s 5.0 dict tiedlist
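HResults, one of the tools copied in the general preparations, scores a recognition output such as result.mlf against reference transcriptions. As a minimal sketch of the two figures it reports, the Python function below follows the definitions in the HTK Book: percent correct ignores insertions, while accuracy penalises them. The function name is hypothetical and the counts are invented to show the arithmetic. They are not results from this system.

```python
def word_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent correct, percent accuracy) for word-level scoring.

    n_ref is the number of words in the reference transcriptions.
    """
    hits = n_ref - deletions - substitutions
    percent_correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    return percent_correct, accuracy


# Invented counts, only for demonstration
corr, acc = word_scores(n_ref=200, deletions=4, substitutions=6, insertions=10)
print(f"%Corr = {corr:.2f}, Acc = {acc:.2f}")
# %Corr = 95.00, Acc = 90.00
```

Accuracy is the more conservative number: a recognizer that inserts many spurious words can score a high percent correct while its accuracy stays low.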

Introduction to The HTK Toolkit

Introduction to The HTK Toolkit Introduction to The HTK Toolkit Hsin-min Wang Reference: - The HTK Book Outline An Overview of HTK HTK Processing Stages Data Preparation Tools Training Tools Testing Tools Analysis Tools A Tutorial Example

More information

Tutorial of Building an LVCSR System

Tutorial of Building an LVCSR System Tutorial of Building an LVCSR System using HTK Shih Hsiang Lin( 林士翔 ) Department of Computer Science & Information Engineering National Taiwan Normal University Reference: Steve Young et al, The HTK Books

More information

Building a Simple Speaker Identification System

Building a Simple Speaker Identification System Building a Simple Speaker Identification System 1 Introduction 11 We will be using the Hidden Markov Model Toolkit (HTK) HTK is installed under linux on the lab chines Your path should already be set,

More information

DT2118 Speech and Speaker Recognition. Outline. HTK, What is it? Short History. Notes. Notes. Notes. Notes. HTK Tutorial. Giampiero Salvi VT2014

DT2118 Speech and Speaker Recognition. Outline. HTK, What is it? Short History. Notes. Notes. Notes. Notes. HTK Tutorial. Giampiero Salvi VT2014 DT2118 Speech and Speaker Recognition HTK Tutorial Giampiero Salvi KTH/CSC/TMH giampi@kthse VT2014 1 / 39 Outline Introduction General Usage Data formats and manipulation Training Recognition 2 / 39 HTK,

More information

Introduction to HTK Toolkit

Introduction to HTK Toolkit Introduction to HTK Toolkit Berlin Chen 2003 Reference: - The HTK Book, Version 3.2 Outline An Overview of HTK HTK Processing Stages Data Preparation Tools Training Tools Testing Tools Analysis Tools Homework:

More information

Lecture 5: Hidden Markov Models

Lecture 5: Hidden Markov Models Lecture 5: Hidden Markov Models Lecturer: Mark Hasegawa-Johnson (jhasegaw@uiuc.edu) TA: Sarah Borys (sborys@uiuc.edu) Web Page: http://www.ifp.uiuc.edu/speech/courses/minicourse/ May 27, 2005 1 Training

More information

HTK (v.3.1): Basic Tutorial

HTK (v.3.1): Basic Tutorial HTK (v.3.1): Basic Tutorial Nicolas Moreau / 02.02.2002 Content WHAT IS HTK?... 3 1 YES/NO RECOGNITION SYSTEM... 3 2 CREATION OF THE TRAINING CORPUS... 4 2.1 Record the Signal...4 2.2 Label the Signal...4

More information

EE627 Term Project : Jul Semester

EE627 Term Project : Jul Semester EE627 Term Project : Jul. 2013 Semester August 12, 2013 Title : Build and demonstrate a real time continuous speech recognition system in English Assigned to : Batch No. 1 TAs Assigned : Waquar Ahmad.

More information

Speech Recognition Tools

Speech Recognition Tools Speech Recognition Tools Mark Hasegawa-Johnson July 17, 2002 1 Bash, Sed, Awk 1.1 Installation If you are on a unix system, bash, sed, gawk, and perl are probably already installed. If not, ask your system

More information

Speech Recogni,on using HTK CS4706. Fadi Biadsy April 21 st, 2008

Speech Recogni,on using HTK CS4706. Fadi Biadsy April 21 st, 2008 peech Recogni,on using HTK C4706 Fadi Biadsy April 21 st, 2008 1 Outline peech Recogni,on Feature Extrac,on HMM 3 basic problems HTK teps to Build a speech recognizer 2 peech Recogni,on peech ignal to

More information

The HTK Hidden Markov Model Toolkit: Design and Philosophy. SJ Young. September 6, Cambridge University Engineering Department

The HTK Hidden Markov Model Toolkit: Design and Philosophy. SJ Young. September 6, Cambridge University Engineering Department The HTK Hidden Markov Model Toolkit: Design and Philosophy SJ Young CUED/F-INFENG/TR.152 September 6, 1994 Cambridge University Engineering Department Trumpington Street, Cambridge, CB2 1PZ (sjy@eng.cam.ac.uk)

More information

c COPYRIGHT Microsoft Corporation. c COPYRIGHT Cambridge University Engineering Department.

c COPYRIGHT Microsoft Corporation. c COPYRIGHT Cambridge University Engineering Department. The HTK Book Steve Young Gunnar Evermann Dan Kershaw Gareth Moore Julian Odell Dave Ollason Valtcho Valtchev Phil Woodland The HTK Book (for HTK Version 3.1) c COPYRIGHT 1995-1999 Microsoft Corporation.

More information

The HTK Book. Steve Young Gunnar Evermann Dan Kershaw Gareth Moore Julian Odell Dave Ollason Dan Povey Valtcho Valtchev Phil Woodland

The HTK Book. Steve Young Gunnar Evermann Dan Kershaw Gareth Moore Julian Odell Dave Ollason Dan Povey Valtcho Valtchev Phil Woodland The HTK Book Steve Young Gunnar Evermann Dan Kershaw Gareth Moore Julian Odell Dave Ollason Dan Povey Valtcho Valtchev Phil Woodland The HTK Book (for HTK Version 3.2) c COPYRIGHT 1995-1999 Microsoft Corporation.

More information

The HTK Book. Steve Young Dan Kershaw Julian Odell Dave Ollason Valtcho Valtchev Phil Woodland. The HTK Book (for HTK Version 3.1)

The HTK Book. Steve Young Dan Kershaw Julian Odell Dave Ollason Valtcho Valtchev Phil Woodland. The HTK Book (for HTK Version 3.1) The HTK Book Steve Young Dan Kershaw Julian Odell Dave Ollason Valtcho Valtchev Phil Woodland The HTK Book (for HTK Version 3.1) c COPYRIGHT 1995-1999 Microsoft Corporation. All Rights Reserved First published

More information

The HTK Book. The HTK Book (for HTK Version 3.4)

The HTK Book. The HTK Book (for HTK Version 3.4) The HTK Book Steve Young Gunnar Evermann Mark Gales Thomas Hain Dan Kershaw Xunying (Andrew) Liu Gareth Moore Julian Odell Dave Ollason Dan Povey Valtcho Valtchev Phil Woodland The HTK Book (for HTK Version

More information

Masters in Computer Speech Text and Internet Technology. Module: Speech Practical. HMM-based Speech Recognition

Masters in Computer Speech Text and Internet Technology. Module: Speech Practical. HMM-based Speech Recognition Masters in Computer Speech Text and Internet Technology Module: Speech Practical HMM-based Speech Recognition 1 Introduction This practical is concerned with phone-based continuous speech recognition using

More information

Lecture 8: Speech Recognition Using Finite State Transducers

Lecture 8: Speech Recognition Using Finite State Transducers Lecture 8: Speech Recognition Using Finite State Transducers Lecturer: Mark Hasegawa-Johnson (jhasegaw@uiuc.edu) TA: Sarah Borys (sborys@uiuc.edu) Web Page: http://www.ifp.uiuc.edu/speech/courses/minicourse/

More information

Contents I Tutorial Overview 1 The Overview of the HTK Toolkit HTK Software Architecture Generic Properties of

Contents I Tutorial Overview 1 The Overview of the HTK Toolkit HTK Software Architecture Generic Properties of The HTK Book Steve Young Gunnar Evermann Dan Kershaw Gareth Moore Julian Odell Dave Ollason Dan Povey Valtcho Valtchev Phil Woodland The HTK Book (for HTK Version 32) c COPYRIGHT 1995-1999 Microsoft Corporation

More information

Voiced/Unvoiced and Silent Classification Using HMM Classifier based on Wavelet Packets BTE features

Voiced/Unvoiced and Silent Classification Using HMM Classifier based on Wavelet Packets BTE features Voiced/Unvoiced and Silent Classification Using HMM Classifier based on Wavelet Packets BTE features Amr M. Gody 1 Fayoum University Abstract Wavelet Packets Best Tree Encoded (BTE) features is used here

More information

Learning The Lexicon!

Learning The Lexicon! Learning The Lexicon! A Pronunciation Mixture Model! Ian McGraw! (imcgraw@mit.edu)! Ibrahim Badr Jim Glass! Computer Science and Artificial Intelligence Lab! Massachusetts Institute of Technology! Cambridge,

More information

Research Report on Bangla OCR Training and Testing Methods

Research Report on Bangla OCR Training and Testing Methods Research Report on Bangla OCR Training and Testing Methods Md. Abul Hasnat BRAC University, Dhaka, Bangladesh. hasnat@bracu.ac.bd Abstract In this paper we present the training and recognition mechanism

More information

Julius rev LEE Akinobu, and Julius Development Team 2007/12/19. 1 Introduction 2

Julius rev LEE Akinobu, and Julius Development Team 2007/12/19. 1 Introduction 2 Julius rev. 4.0 L Akinobu, and Julius Development Team 2007/12/19 Contents 1 Introduction 2 2 Framework of Julius-4 2 2.1 System architecture........................... 2 2.2 How it runs...............................

More information

Lecture 8. LVCSR Training and Decoding. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lecture 8. LVCSR Training and Decoding. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen Lecture 8 LVCSR Training and Decoding Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 November

More information

RESOLUTION LIMITS ON VISUAL SPEECH RECOGNITION. Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan

RESOLUTION LIMITS ON VISUAL SPEECH RECOGNITION. Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan RESOLUTION LIMITS ON VISUAL SPEECH RECOGNITION Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK. helen.bear@uea.ac.uk,

More information

Segmentation free Bangla OCR using HMM: Training and Recognition

Segmentation free Bangla OCR using HMM: Training and Recognition Segmentation free Bangla OCR using HMM: Training and Recognition Md. Abul Hasnat, S.M. Murtoza Habib, Mumit Khan BRAC University, Bangladesh mhasnat@gmail.com, murtoza@gmail.com, mumit@bracuniversity.ac.bd

More information

Khmer OCR for Limon R1 Size 22 Report

Khmer OCR for Limon R1 Size 22 Report PAN Localization Project Project No: Ref. No: PANL10n/KH/Report/phase2/002 Khmer OCR for Limon R1 Size 22 Report 09 July, 2009 Prepared by: Mr. ING LENG IENG Cambodia Country Component PAN Localization

More information

Overview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals

Overview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals Overview Search and Decoding Steve Renals Automatic Speech Recognition ASR Lecture 10 January - March 2012 Today s lecture Search in (large vocabulary) speech recognition Viterbi decoding Approximate search

More information

Modeling Phonetic Context with Non-random Forests for Speech Recognition

Modeling Phonetic Context with Non-random Forests for Speech Recognition Modeling Phonetic Context with Non-random Forests for Speech Recognition Hainan Xu Center for Language and Speech Processing, Johns Hopkins University September 4, 2015 Hainan Xu September 4, 2015 1 /

More information

Applications of Keyword-Constraining in Speaker Recognition. Howard Lei. July 2, Introduction 3

Applications of Keyword-Constraining in Speaker Recognition. Howard Lei. July 2, Introduction 3 Applications of Keyword-Constraining in Speaker Recognition Howard Lei hlei@icsi.berkeley.edu July 2, 2007 Contents 1 Introduction 3 2 The keyword HMM system 4 2.1 Background keyword HMM training............................

More information

MRCP. PocketSphinx Plugin. Usage Guide. Powered by Universal Speech Solutions LLC

MRCP. PocketSphinx Plugin. Usage Guide. Powered by Universal Speech Solutions LLC Powered by Universal Speech Solutions LLC MRCP PocketSphinx Plugin Usage Guide Revision: 3 Created: February 16, 2017 Last updated: May 20, 2017 Author: Arsen Chaloyan Universal Speech Solutions LLC Overview

More information

An Experimental Evaluation of Keyword-Filler Hidden Markov Models

An Experimental Evaluation of Keyword-Filler Hidden Markov Models An Experimental Evaluation of Keyword-Filler Hidden Markov Models A. Jansen and P. Niyogi April 13, 2009 Abstract We present the results of a small study involving the use of keyword-filler hidden Markov

More information

INTEGRATION OF SPEECH & VIDEO: APPLICATIONS FOR LIP SYNCH: LIP MOVEMENT SYNTHESIS & TIME WARPING

INTEGRATION OF SPEECH & VIDEO: APPLICATIONS FOR LIP SYNCH: LIP MOVEMENT SYNTHESIS & TIME WARPING INTEGRATION OF SPEECH & VIDEO: APPLICATIONS FOR LIP SYNCH: LIP MOVEMENT SYNTHESIS & TIME WARPING Jon P. Nedel Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of

More information

DECODING VISEMES: IMPROVING MACHINE LIP-READING. Helen L. Bear and Richard Harvey

DECODING VISEMES: IMPROVING MACHINE LIP-READING. Helen L. Bear and Richard Harvey DECODING VISEMES: IMPROVING MACHINE LIP-READING Helen L. Bear and Richard Harvey School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, United Kingdom ABSTRACT To undertake machine

More information

Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments

Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments PAGE 265 Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments Patrick Lucey, Terrence Martin and Sridha Sridharan Speech and Audio Research Laboratory Queensland University

More information

Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 15.04.2015 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri, M. Riley and

More information

Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 10.04.2013 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri and M. Riley

More information

Modeling Coarticulation in Continuous Speech

Modeling Coarticulation in Continuous Speech ing in Oregon Health & Science University Center for Spoken Language Understanding December 16, 2013 Outline in 1 2 3 4 5 2 / 40 in is the influence of one phoneme on another Figure: of coarticulation

More information

PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION

PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION LucaCappelletta 1, Naomi Harte 1 1 Department ofelectronicand ElectricalEngineering, TrinityCollegeDublin, Ireland {cappelll, nharte}@tcd.ie Keywords:

More information

A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING. Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK

A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING. Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING Sarah Taylor Barry-John Theobald Iain Matthews Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK ABSTRACT This paper introduces

More information

Documentation of MASV

Documentation of MASV Documentation of MASV Munich Automatic Speaker Verification system Documentation version 1.3.00 (16.02.2004) Release 1.3 (16.02.2004) Ulrich Türk tuerk@phonetik.uni-muenchen.de Department of Phonetics

More information

Facial Animation System Based on Image Warping Algorithm

Facial Animation System Based on Image Warping Algorithm Facial Animation System Based on Image Warping Algorithm Lanfang Dong 1, Yatao Wang 2, Kui Ni 3, Kuikui Lu 4 Vision Computing and Visualization Laboratory, School of Computer Science and Technology, University

More information

StaRt. A Biofeedback App. Heather Campbell, Helen Carey, Celine Wu, Dalit Shalom Developing Assistive Technologies NYU October 2014

StaRt. A Biofeedback App. Heather Campbell, Helen Carey, Celine Wu, Dalit Shalom Developing Assistive Technologies NYU October 2014 StaRt A Biofeedback App Heather Campbell, Helen Carey, Celine Wu, Dalit Shalom Developing Assistive Technologies NYU October 2014 Name Development The StaRt app is a work in progress. It was temporarily

More information

arxiv: v1 [cs.cv] 3 Oct 2017

arxiv: v1 [cs.cv] 3 Oct 2017 Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Helen L. Bear, Richard W. Harvey, Barry-John Theobald and Yuxuan Lan School of Computing Sciences, University of East Anglia,

More information

Ping-pong decoding Combining forward and backward search

Ping-pong decoding Combining forward and backward search Combining forward and backward search Research Internship 09/ - /0/0 Mirko Hannemann Microsoft Research, Speech Technology (Redmond) Supervisor: Daniel Povey /0/0 Mirko Hannemann / Beam Search Search Errors

More information

Chapter 3. Speech segmentation. 3.1 Preprocessing

Chapter 3. Speech segmentation. 3.1 Preprocessing , as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents

More information

1st component influence. y axis location (mm) Incoming context phone. Audio Visual Codebook. Visual phoneme similarity matrix

1st component influence. y axis location (mm) Incoming context phone. Audio Visual Codebook. Visual phoneme similarity matrix ISCA Archive 3-D FACE POINT TRAJECTORY SYNTHESIS USING AN AUTOMATICALLY DERIVED VISUAL PHONEME SIMILARITY MATRIX Levent M. Arslan and David Talkin Entropic Inc., Washington, DC, 20003 ABSTRACT This paper

More information

ProTalk Plus. Basic Programming Tutorial

ProTalk Plus. Basic Programming Tutorial The ProTalk Plus will walk you through the steps needed to configure a basic database. This example illustrates programming one alarm input and one voice callout number. The contents of the Barnett Engineering

More information

The Julius book. Akinobu LEE

The Julius book. Akinobu LEE The Julius book Akinobu LEE October 2, 2008 The Julius book by Akinobu LEE Edition 1.0.0 - rev.4.1.0 Copyright c 2008 LEE Akinobu 2 Contents A Major Changes 7 A.1 Changes from 4.0 to 4.1........................................

More information

Lecture 3: Acoustic Features

Lecture 3: Acoustic Features Lecture 3: Acoustic Features Lecturer: Mark Hasegawa-Johnson (jhasegaw@uiuc.edu) TA: Sarah Borys (sborys@uiuc.edu) Web Page: http://www.ifp.uiuc.edu/speech/courses/minicourse/ June 27, 2005 1 Where to

More information

Decoding visemes: improving machine lip-reading (PhD thesis)

Decoding visemes: improving machine lip-reading (PhD thesis) Decoding visemes: improving machine lip-reading (PhD thesis) arxiv:17.01288v1 [cs.cv] 3 Oct 2017 Helen L. Bear University of East Anglia School of Computing Sciences July 2016 This copy of the thesis has

More information

of Manchester The University COMP14112 Markov Chains, HMMs and Speech Revision

of Manchester The University COMP14112 Markov Chains, HMMs and Speech Revision COMP14112 Lecture 11 Markov Chains, HMMs and Speech Revision 1 What have we covered in the speech lectures? Extracting features from raw speech data Classification and the naive Bayes classifier Training

More information

ProTalk Link. Basic Programming Tutorial

ProTalk Link. Basic Programming Tutorial The ProTalk Link will walk you through the steps needed to configure a basic database using 3 modules: the M1 (Main module), the T1 (Callout module) and the D1 (Digital input expansion module). This example

More information

MLSALT11: Large Vocabulary Speech Recognition

MLSALT11: Large Vocabulary Speech Recognition MLSALT11: Large Vocabulary Speech Recognition Riashat Islam Department of Engineering University of Cambridge Trumpington Street, Cambridge, CB2 1PZ, England ri258@cam.ac.uk I. INTRODUCTION The objective

More information

WaveSurfer at a glance

WaveSurfer at a glance WaveSurfer at a glance WaveSurfer has a simple but powerful interface. The basic document you work with is a sound. When WaveSurfer is first started, it contains an empty sound. You can load a sound file

More information

Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training

Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training Joint Optimisation of Tandem Systems using Gaussian Mixture Density Neural Network Discriminative Sequence Training Chao Zhang and Phil Woodland March 8, 07 Cambridge University Engineering Department

More information

A Collaborative Speech Transcription System for Live Streaming

A Collaborative Speech Transcription System for Live Streaming 1 2 2 1 ustream Yourscribe PodCastle Yourscribe 2.0 A Collaborative Speech Transcription System for Live Streaming Shunsuke Ukita, 1 Jun Ogata, 2 Masataka Goto 2 and Tetsunori Kobayashi 1 In this paper,

More information

2 Julius. Julius. Julius. Julius : Julius Web Julius. N-gram. Julius.

2 Julius. Julius. Julius. Julius :   Julius Web   Julius. N-gram. Julius. Julius 1 3 Julius 4.1.1 Web 1 Julius Web http://julius.sourceforge.jp/ Julius Julius 2 3 4 Julius 5 6 7 Tips 2 Julius 1: http://julius.sourceforge.jp/ N-gram Julius Microsoft SAPI 1 Julius 2.1 Julius Windows

More information

EPSON Speech IC Speech Guide Creation Tool User Guide

EPSON Speech IC Speech Guide Creation Tool User Guide EPSON Speech IC User Guide Rev.1.21 NOTICE No part of this material may be reproduced or duplicated in any form or by any means without the written permission of Seiko Epson. Seiko Epson reserves the right

More information

Revision: January 28, Henley Court Pullman, WA (509) Voice and Fax

Revision: January 28, Henley Court Pullman, WA (509) Voice and Fax Lab Project 2: Board Verification and Basic Logic Circuits Revision: January 28, 2012 1300 Henley Court Pullman, WA 99163 (509) 334 6306 Voice and Fax STUDENT I am submitting my own work, and I understand

More information

Quick Start Guide. Version 3.0

Quick Start Guide. Version 3.0 Quick Start Guide Version 3.0 Introduction and Requirements This document is a Quick Start Guide for the.net Telephony Tool, Voice Elements. For complete documentation on the Telephony API, please refer

More information

Speech Technology Using in Wechat

Speech Technology Using in Wechat Speech Technology Using in Wechat FENG RAO Powered by WeChat Outline Introduce Algorithm of Speech Recognition Acoustic Model Language Model Decoder Speech Technology Open Platform Framework of Speech

More information

Lecture 9. LVCSR Decoding (cont d) and Robustness. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lecture 9. LVCSR Decoding (cont d) and Robustness. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen Lecture 9 LVCSR Decoding (cont d) and Robustness Michael Picheny, huvana Ramabhadran, Stanley F. Chen IM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com

More information

Lab 4 Large Vocabulary Decoding: A Love Story

Lab 4 Large Vocabulary Decoding: A Love Story Lab 4 Large Vocabulary Decoding: A Love Story EECS E6870: Speech Recognition Due: November 19, 2009 at 11:59pm SECTION 0 Overview By far the sexiest piece of software associated with ASR is the large-vocabulary

More information

University of Energy and Natural Resources, Sunyani. Name: UBA, Felix. How to get ROMS Summer school August, 2016

University of Energy and Natural Resources, Sunyani. Name: UBA, Felix. How to get ROMS Summer school August, 2016 University of Energy and Natural Resources, Sunyani Name: UBA, Felix How to get ROMS running @ Summer school August, 2016 Introduction PRESENTATION How to download the code, Configure it for an Application,

More information

CMSC 201 Fall 2015 Lab 12 Tuples and Dictionaries

CMSC 201 Fall 2015 Lab 12 Tuples and Dictionaries CMSC 201 Fall 2015 Lab 12 Tuples and Dictionaries Assignment: Lab 12 Tuples and Dictionaries Due Date: During discussion, November 30 th through December 3 rd Value: 1% of final grade Part 1: Data Types

More information

Creating a Dashboard Prompt

Creating a Dashboard Prompt Creating a Dashboard Prompt This guide will cover: How to create a dashboard prompt which can be used for developing flexible dashboards for users to utilize when viewing an analysis on a dashboard. Step

More information
