Speech Recognition Project: Phone Recognition using Sphinx
Chia-Ho Ling, Sunya Santananchai
Professor: Dr. Kepuska

Objective
Use speech data corpora to build a model with CMU Sphinx, apply the built model to decode a test speech corpus, and use the built model in real time.

Introduction
The Sphinx Group at Carnegie Mellon University is committed to releasing the long-time, DARPA-funded Sphinx projects widely, in order to stimulate the creation of speech-using tools and applications, and to advance the state of the art both directly in speech recognition and in related areas including dialog systems and speech synthesis. The packages that the CMU Sphinx Group releases are a set of reasonably mature, world-class speech components that provide a basic level of technology to anyone interested in creating speech-using applications, without the once-prohibitive initial investment in research and development. The same components are open to peer review by all researchers in the field and are also used for linguistic research.

Requirements for CMU Sphinx
- Operating system: GNU/Linux, Unix variants, or Windows NT or later
- On Windows: Cygwin with Perl and the tcsh shell
- SPHINX system: Sphinxbase, Sphinx3, and SphinxTrain
- Perl to run the provided scripts, and a C compiler to compile the source code
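Before building anything, it can help to confirm that the command-line prerequisites in the requirements above are installed. The following is a minimal Python sketch, not part of the Sphinx distribution: it only checks whether executables can be found on PATH, and the compiler names `gcc` and `cc` are assumptions (any C compiler would do).

```python
import shutil

# Command-line tools from the requirements list. The compiler names
# "gcc" and "cc" are assumed; any working C compiler is acceptable.
REQUIRED_TOOLS = {
    "perl": ["perl"],
    "tcsh": ["tcsh"],
    "C compiler": ["gcc", "cc"],
}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return {requirement: True/False}, True when at least one
    candidate executable for that requirement is found on PATH."""
    return {
        name: any(shutil.which(exe) is not None for exe in candidates)
        for name, candidates in tools.items()
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Under Cygwin on Windows the same check applies, as long as it is run from a Python installation that sees the Cygwin PATH.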

Flow Chart
[Figure: set up the system (data, trainer, decoder); make features for the training and testing corpora; build a model from the training corpus; measure the word error rate on the test corpus; decode a live recording; result for decoding]

Set up system
We have to download and build several components to set up the complete system. Provided all the necessary software is installed, this means downloading the data package, the trainer, and one of the SPHINX decoders. The following sections detail the steps.

Corpora
The ICSI Meeting Recorder Digits Corpus provides a collection of connected-digit speech recorded in a real meeting room. Its aim is to support and ease the development and comparison of reverberation and noise reduction algorithms in real-world environments. The package contains non-segmented recordings of read connected digits made simultaneously with four table-top PZM microphones. (This audio data, along with recordings from personal mics and table-top electret microphones, is also available from the Linguistic Data Consortium as part of the ICSI Meeting Corpus.) Segmentation and utterance extraction scripts, transcription files, and additional documentation are also included; individual utterances become available after segmentation.

Make features
- Configuration file
- Extension file format: RAW or NIST

Build a model
- Dictionary file
- Phone file
- Training identity (file ID) file
- Transcription file
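The model-building inputs listed above are plain-text files. The sketch below shows one way to generate them for a digit task from (file ID, transcript) pairs. It assumes the layouts commonly used by SphinxTrain: one `WORD<tab>PHONES` entry per dictionary line, one phone per line in the phone file including a silence phone `SIL`, one file ID per line without extension, and transcription lines of the form `<s> WORDS </s> (utterance_id)`. The digit pronunciations and the function name `build_training_files` are illustrative assumptions, not taken from the project.

```python
# Hypothetical digit lexicon using CMUdict-style ARPAbet phones.
DIGIT_PRONUNCIATIONS = {
    "ZERO": "Z IH R OW",
    "OH": "OW",
    "ONE": "W AH N",
    "TWO": "T UW",
    "THREE": "TH R IY",
    "FOUR": "F AO R",
    "FIVE": "F AY V",
    "SIX": "S IH K S",
    "SEVEN": "S EH V AH N",
    "EIGHT": "EY T",
    "NINE": "N AY N",
}


def build_training_files(utterances, lexicon=DIGIT_PRONUNCIATIONS):
    """utterances: list of (file_id, transcript) pairs such as
    ("spk1/utt001", "ONE TWO THREE").
    Returns the text of the dictionary, phone, file ID and
    transcription files as a dict of strings (assumed layouts)."""
    used_words = sorted({w for _, text in utterances for w in text.split()})
    missing = [w for w in used_words if w not in lexicon]
    if missing:
        raise KeyError(f"no pronunciation for: {missing}")

    # Dictionary: one word and its phone sequence per line.
    dictionary = "".join(f"{w}\t{lexicon[w]}\n" for w in used_words)

    # Phone set: every phone used by the dictionary, plus silence.
    phones = {"SIL"}
    for w in used_words:
        phones.update(lexicon[w].split())
    phone_list = "".join(f"{p}\n" for p in sorted(phones))

    # File IDs: audio paths without extension, one per line.
    fileids = "".join(f"{fid}\n" for fid, _ in utterances)

    # Transcription: the utterance ID in parentheses is assumed to be
    # the last path component of the matching file ID.
    transcription = "".join(
        f"<s> {text} </s> ({fid.split('/')[-1]})\n" for fid, text in utterances
    )
    return {"dic": dictionary, "phone": phone_list,
            "fileids": fileids, "transcription": transcription}
```

Writing the strings to disk is omitted, since the file names the trainer expects depend on how it was set up.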

Implementation

The Result
c:/cmututorial/digitnumber/result/digitnumber.match3272
(# Snt = sentences, # Wrd = reference words; Corr, Sub, Del, Ins, Err and S.Err in %)

SPKR      # Snt  # Wrd   Corr   Sub   Del   Ins    Err  S.Err
mrd_data     14    897   78.7   7.2  14.0   3.7   25.0  100.0
calls         6      6    0.0  16.7  83.3   0.0  100.0  100.0
Project      10     60    6.
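In these columns, Err is the sum of Sub, Del and Ins, and Corr + Sub + Del covers 100% of the reference words, up to rounding: for mrd_data, 7.2 + 14.0 + 3.7 ≈ 25.0 and 78.7 + 7.2 + 14.0 ≈ 100. The sketch below reproduces that bookkeeping with a minimum edit-distance word alignment. It is an illustrative re-implementation, not the scoring tool used in the project; that tool may weight errors or break ties between equal-cost alignments differently, so its Sub/Del/Ins split can differ slightly from this one.

```python
def align_counts(reference, hypothesis):
    """Align two word lists by minimum edit distance and return
    (substitutions, deletions, insertions)."""
    n, m = len(reference), len(hypothesis)
    # best[i][j] = (errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)            # only deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)            # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = best[i - 1][j - 1]
            miss = reference[i - 1] != hypothesis[j - 1]
            via_diag = (e + miss, s + miss, d, k)   # match or substitution
            e, s, d, k = best[i - 1][j]
            via_del = (e + 1, s, d + 1, k)          # reference word dropped
            e, s, d, k = best[i][j - 1]
            via_ins = (e + 1, s, d, k + 1)          # extra hypothesis word
            # Fewest errors wins; ties prefer fewer subs, then fewer dels.
            best[i][j] = min(via_diag, via_del, via_ins)
    _, subs, dels, ins = best[n][m]
    return subs, dels, ins


def score(pairs):
    """pairs: list of (reference, hypothesis) sentence strings.
    Returns the percentage columns of the result table."""
    words = subs = dels = ins = bad_sentences = 0
    for ref_text, hyp_text in pairs:
        ref = ref_text.split()
        s, d, i = align_counts(ref, hyp_text.split())
        words += len(ref)
        subs += s
        dels += d
        ins += i
        bad_sentences += (s + d + i) > 0    # sentence has at least one error

    def pct(count):
        return 100.0 * count / words

    return {
        "Corr": pct(words - subs - dels),
        "Sub": pct(subs),
        "Del": pct(dels),
        "Ins": pct(ins),
        "Err": pct(subs + dels + ins),
        "S.Err": 100.0 * bad_sentences / len(pairs),
    }
```

For example, `score([("one two three", "one two three"), ("one two", "one")])` reports Corr 80.0, Del 20.0, Err 20.0 and S.Err 50.0: one of five reference words is deleted, and one of the two sentences contains an error.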

The same result file also reports summary rows Sum/Avg, Mean, S.D., and Median (values not shown).

Conclusion
Each sentence in the mrd_data corpus contains about 60 words (897 words in 14 sentences), so it is very unlikely that every word of a sentence is recognized correctly; therefore the sentence error rate is 100%. For the mrd_data corpus, the word error rate is 25%, which is a fairly good word error rate. For the Project corpus, we get a very high error rate. Several factors may affect it: the speakers' pronunciation, the recording environment, and the quality of the hardware and software.
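The link between sentence length and the 100% sentence error rate can be made concrete with a rough estimate. The sketch below assumes, purely as a simplification, that each word is recognized correctly with the same probability (the mrd_data Corr rate of 78.7%), independently of the other words; real recognition errors are correlated, so the result only indicates the order of magnitude.

```python
# Figures for the mrd_data corpus, taken from the result table.
WORDS = 897            # "# Wrd"
SENTENCES = 14         # "# Snt"
WORD_CORRECT = 0.787   # "Corr" (78.7 %)

words_per_sentence = WORDS / SENTENCES      # about 64 words
n = round(words_per_sentence)

# Simplifying assumption (not from the project): words are recognized
# correctly independently, each with probability WORD_CORRECT.
p_sentence_correct = WORD_CORRECT ** n

# Expected number of error-free sentences among the test sentences.
expected_correct_sentences = SENTENCES * p_sentence_correct

print(f"words per sentence: {words_per_sentence:.1f}")
print(f"P(sentence fully correct): {p_sentence_correct:.1e}")
print(f"expected error-free sentences: {expected_correct_sentences:.1e}")
```

Under that assumption a 64-word sentence is fully correct with probability of roughly 2 × 10^-7, so among 14 sentences none is expected to be error-free. Insertions make the real chance even smaller, because Corr does not account for them.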
