Speech Recognition. Project: Phone Recognition using Sphinx. Chia-Ho Ling. Sunya Santananchai. Professor: Dr. Kepuska



Objective

- Use a speech corpus to build a model with CMU Sphinx.
- Apply the built model to decode a test speech corpus.
- Use the built model for decoding in real time.

Introduction

The Sphinx Group at Carnegie Mellon University is committed to releasing the long-time, DARPA-funded Sphinx projects widely, in order to stimulate the creation of speech-using tools and applications, and to advance the state of the art both directly in speech recognition and in related areas, including dialog systems and speech synthesis.

The packages that the CMU Sphinx Group is releasing are a set of reasonably mature, world-class speech components that provide a basic level of technology to anyone interested in creating speech-using applications, without the once-prohibitive initial investment in research and development. The same components are open to peer review by all researchers in the field, and are used for linguistic research as well.

Requirements for CMU Sphinx

- GNU/Linux, a Unix variant, or Windows NT or later
- On Windows: Cygwin with Perl and the tcsh shell
- The Sphinx system: SphinxBase, Sphinx3, and SphinxTrain
- Perl to run the provided scripts, and a C compiler to compile the source code

Flow Chart

[Flow chart. Box labels in the order extracted: Set up system; Setting up the data; Setting up the trainer; Setting up the decoder; Training corpora; Testing corpora; Make features; Build a model; Training corpora; Word error rate; Test corpora; Live to decode; Live recording; Result for decoding.]

Set up system

Several components have to be downloaded and built to set up the complete system. Provided you have all the necessary software, you will have to download the data package, the trainer, and one of the Sphinx decoders. The following instructions detail the steps.

Corpora

The ICSI Meeting Recorder Digits Corpus provides a collection of connected-digit speech data recorded in a real meeting room. Its aim is to support and ease the development and comparison of reverberation and noise reduction algorithms in real-world environments. The package contains non-segmented recordings of read connected digits made simultaneously with four table-top PZM microphones. (This audio data, along with recordings from personal microphones and table-top electret microphones, is also available from the Linguistic Data Consortium as part of the ICSI Meeting Corpus.) Segmentation and utterance extraction scripts, transcription files, and additional documentation are also included. The individual utterances are available after segmentation.

Make features

- Configuration file
- Audio file extension and format: RAW or NIST

Build a model

- Dictionary file
- Phone file
- Training identity (file ID) file
- Transcription file
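The four input files listed under "Build a model" can be sketched for a digit task as follows. This is a minimal illustration, not the project's actual setup: the pronunciations are assumed to follow the CMU Pronouncing Dictionary phone set, the file names and line formats follow the layout commonly shown in SphinxTrain tutorials, and the database name `digitnumber_demo` and the utterance IDs are hypothetical.

```python
from pathlib import Path

# Assumed pronunciations (CMU Pronouncing Dictionary phone set, stress marks dropped).
DIGIT_PRONUNCIATIONS = {
    "ZERO": ["Z", "IY", "R", "OW"],
    "OH": ["OW"],
    "ONE": ["W", "AH", "N"],
    "TWO": ["T", "UW"],
    "THREE": ["TH", "R", "IY"],
    "FOUR": ["F", "AO", "R"],
    "FIVE": ["F", "AY", "V"],
    "SIX": ["S", "IH", "K", "S"],
    "SEVEN": ["S", "EH", "V", "AH", "N"],
    "EIGHT": ["EY", "T"],
    "NINE": ["N", "AY", "N"],
}


def dictionary_lines(pronunciations):
    """Dictionary file: one 'WORD PH1 PH2 ...' entry per line."""
    return [f"{word} {' '.join(phones)}"
            for word, phones in sorted(pronunciations.items())]


def phone_lines(pronunciations):
    """Phone file: every phone used in the dictionary, plus SIL for silence."""
    phones = {p for phone_seq in pronunciations.values() for p in phone_seq}
    phones.add("SIL")
    return sorted(phones)


def transcription_line(words, file_id):
    """Transcription file: '<s> WORDS </s> (utterance-id)'.

    The utterance ID is taken to be the last path component of the file ID
    (an assumption based on the usual SphinxTrain convention).
    """
    utt_id = file_id.rsplit("/", 1)[-1]
    return f"<s> {' '.join(words)} </s> ({utt_id})"


def write_training_files(out_dir, name, utterances):
    """Write the four files; `utterances` maps file ID -> list of words."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = {
        f"{name}.dic": dictionary_lines(DIGIT_PRONUNCIATIONS),
        f"{name}.phone": phone_lines(DIGIT_PRONUNCIATIONS),
        f"{name}_train.fileids": list(utterances),
        f"{name}_train.transcription": [
            transcription_line(words, file_id)
            for file_id, words in utterances.items()
        ],
    }
    for filename, lines in files.items():
        (out / filename).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(files)


if __name__ == "__main__":
    # Hypothetical utterances, printed rather than written to disk.
    demo = {"spk1/utt001": ["ONE", "TWO", "THREE"], "spk1/utt002": ["OH", "NINE"]}
    for file_id, words in demo.items():
        print(transcription_line(words, file_id))
```

Deriving the phone file from the dictionary keeps the two consistent, which matters because a trainer will typically reject a dictionary that uses a phone missing from the phone list.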

Implementation

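The Result

Scoring report: c:/cmututorial/digitnumber/result/digitnumber.match3272

[Table; the numeric values were not preserved in this transcription.]
Columns: SPKR | # Snt | # Wrd | Corr | Sub | Del | Ins | Err | S.Err
Rows: mrd_data | calls | Project | Sum/Avg | Mean | S.D. | Median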
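The Corr, Sub, Del, Ins, Err, and S.Err columns of a scoring report like this one come from aligning each decoded hypothesis with its reference transcript. The sketch below shows one way to compute them with a standard edit-distance alignment; it is an illustration of the metric, not the scoring tool used in the project, and the example sentences are made up.

```python
def align_counts(reference, hypothesis):
    """Return (correct, substitutions, deletions, insertions) for one sentence.

    Uses a minimum-edit-distance alignment over words. When several
    alignments share the minimum, one of them is reported.
    """
    ref, hyp = reference.split(), hypothesis.split()
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (edits, correct, substitutions, deletions, insertions).
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)   # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)   # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            e, c, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (e, c + 1, s, d, n)
            else:
                diagonal = (e + 1, c, s + 1, d, n)
            e, c, s, d, n = table[i - 1][j]
            deletion = (e + 1, c, s, d + 1, n)
            e, c, s, d, n = table[i][j - 1]
            insertion = (e + 1, c, s, d, n + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    _, c, s, d, n = table[-1][-1]
    return c, s, d, n


def score(pairs):
    """Aggregate word error rate and sentence error rate (both in percent)."""
    words = errors = wrong_sentences = 0
    for reference, hypothesis in pairs:
        _, s, d, n = align_counts(reference, hypothesis)
        words += len(reference.split())
        errors += s + d + n
        wrong_sentences += 1 if (s + d + n) else 0
    return {
        "word_error_rate": 100.0 * errors / words,
        "sentence_error_rate": 100.0 * wrong_sentences / len(pairs),
    }


if __name__ == "__main__":
    pairs = [
        ("one two three four", "one too three four five"),
        ("five six", "five six"),
    ]
    print(score(pairs))
```

Note that the word error rate divides by the number of reference words, so insertions can push it above 100%, while the sentence error rate counts a sentence as wrong if it contains any error at all.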


Conclusion

Each sample in the mrd_data corpus contains around 60 words, so it is hard to recognize every word in a sentence correctly; the sentence error rate is therefore 100%. For the mrd_data corpus, the word error rate is 25%, which is a reasonably good result. For the project corpus, the error rate is very high. Several factors may affect it: the speakers' pronunciation, the recording environment, and the quality of the hardware and software.
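The link between a 25% word error rate and a 100% sentence error rate can be made concrete with a rough calculation. The sketch below assumes, as a simplification not made in the original slides, that each word is recognized independently and that 1 minus the word error rate can be treated as the per-word probability of being correct.

```python
def sentence_accuracy(word_accuracy, words_per_sentence):
    """Probability that every word in a sentence is correct,
    assuming independent per-word errors (a simplifying assumption)."""
    return word_accuracy ** words_per_sentence


if __name__ == "__main__":
    word_error_rate = 0.25      # from the conclusion above
    words_per_sentence = 60     # approximate sample length in mrd_data
    p_correct = sentence_accuracy(1.0 - word_error_rate, words_per_sentence)
    print(f"P(sentence fully correct) = {p_correct:.2e}")
    print(f"Expected sentence error rate = {100.0 * (1.0 - p_correct):.5f}%")
```

Under these assumptions the chance of a fully correct 60-word sentence is on the order of 10^-8, so a sentence error rate of 100% is the expected outcome rather than a sign of a separate problem.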

References

[1] The Sphinx Group at Carnegie Mellon University. Releases the Sphinx projects in order to stimulate the creation of speech-using tools and applications, and to advance the state of the art in speech recognition and in related areas, including dialog systems and speech synthesis.
[2] The ICSI Meeting Corpus. Simultaneous multi-channel audio recordings, word-level orthographic transcriptions, and supporting documentation, collected at the International Computer Science Institute in Berkeley.
[3] CCW17.


More information

ATUC-50 Digital Discussion System Hear and be heard.

ATUC-50 Digital Discussion System Hear and be heard. ATUC-50 Digital Discussion System Hear and be heard. Simplicity You choose the scale and complexity of your communication needs and in return the ATUC-50 Discussion System gives you reliable, crystal-clear

More information

MINIMUM EXACT WORD ERROR TRAINING. G. Heigold, W. Macherey, R. Schlüter, H. Ney

MINIMUM EXACT WORD ERROR TRAINING. G. Heigold, W. Macherey, R. Schlüter, H. Ney MINIMUM EXACT WORD ERROR TRAINING G. Heigold, W. Macherey, R. Schlüter, H. Ney Lehrstuhl für Informatik 6 - Computer Science Dept. RWTH Aachen University, Aachen, Germany {heigold,w.macherey,schlueter,ney}@cs.rwth-aachen.de

More information

User Guide for ELAN Linguistic Annotator

User Guide for ELAN Linguistic Annotator User Guide for ELAN Linguistic Annotator version 5.0.0 This user guide was last updated on 2017-05-02 The latest version can be downloaded from: http://tla.mpi.nl/tools/tla-tools/elan/ Author: Maddalena

More information

Sonic Studio. User Manual

Sonic Studio. User Manual Sonic Studio User Manual DE157 First Edition October 2014 Copyright 2014 ASUSTeK COMPUTER INC. All Rights Reserved. No part of this manual, including the products and software described in it, may be reproduced,

More information

Students are placed in System 44 based on their performance in the Scholastic Phonics Inventory. System 44 Placement and Scholastic Phonics Inventory

Students are placed in System 44 based on their performance in the Scholastic Phonics Inventory. System 44 Placement and Scholastic Phonics Inventory System 44 Overview The System 44 student application leads students through a predetermined path to learn each of the 44 sounds and the letters or letter combinations that create those sounds. In doing

More information

Copyright 2012 Pulse Systems, Inc. Page 1 of 21

Copyright 2012 Pulse Systems, Inc. Page 1 of 21 The PulsePro Transcription module provides a method of creating and storing patient transcription documents within the PulsePro database. Use the Dictation functions to preview and listen to wave files

More information

CMU-UKA Syntax Augmented Machine Translation

CMU-UKA Syntax Augmented Machine Translation Outline CMU-UKA Syntax Augmented Machine Translation Ashish Venugopal, Andreas Zollmann, Stephan Vogel, Alex Waibel InterACT, LTI, Carnegie Mellon University Pittsburgh, PA Outline Outline 1 2 3 4 Issues

More information

Analysis and Optimization of Spatial and Appearance Encodings of Words and Sentences

Analysis and Optimization of Spatial and Appearance Encodings of Words and Sentences Analysis and Optimization of Spatial and Appearance Encodings of Words and Sentences Semi-Automatic Transcription of Interviews Thomas Lüdi Christian Vögeli Semester Thesis May 2014 Master Thesis SS 2005

More information

WEB APPLICATION FOR VOICE OPERATED EXCHANGE

WEB APPLICATION FOR VOICE OPERATED  EXCHANGE WEB APPLICATION FOR VOICE OPERATED E-MAIL EXCHANGE Sangeet Sagar 1, Vaibhav Awasthi 2, Samarth Rastogi 3, Tushar Garg 4, S. Kuzhalvaimozhi 5 1, 2,3,4,5 Information Science and Engineering, National Institute

More information

Integrate Speech Technology for Hands-free Operation

Integrate Speech Technology for Hands-free Operation Integrate Speech Technology for Hands-free Operation Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks

More information

Applications of Machine Translation

Applications of Machine Translation Applications of Machine Translation Index Historical Overview Commercial Products Open Source Software Special Applications Future Aspects History Before the Computer: Mid 1930s: Georges Artsrouni and

More information

Least Squares Signal Declipping for Robust Speech Recognition

Least Squares Signal Declipping for Robust Speech Recognition Least Squares Signal Declipping for Robust Speech Recognition Mark J. Harvilla and Richard M. Stern Department of Electrical and Computer Engineering Carnegie Mellon University, Pittsburgh, PA 15213 USA

More information

PJP-50USB. Conference Microphone Speaker. User s Manual MIC MUTE VOL 3 CLEAR STANDBY ENTER MENU

PJP-50USB. Conference Microphone Speaker. User s Manual MIC MUTE VOL 3 CLEAR STANDBY ENTER MENU STANDBY CLEAR ENTER MENU PJP-50USB Conference Microphone Speaker VOL 1 4 7 5 8 0 6 9 MIC MUTE User s Manual Contents INTRODUCTION Introduction... Controls and Functions... Top panel... Side panel...4

More information

THE THISL BROADCAST NEWS RETRIEVAL SYSTEM. Dave Abberley (1), David Kirby (2), Steve Renals (1) and Tony Robinson (3)

THE THISL BROADCAST NEWS RETRIEVAL SYSTEM. Dave Abberley (1), David Kirby (2), Steve Renals (1) and Tony Robinson (3) ISCA Archive THE THISL BROADCAST NEWS RETRIEVAL SYSTEM Dave Abberley (1), David Kirby (2), Steve Renals (1) and Tony Robinson (3) (1) University of Sheffield, Department of Computer Science, UK (2) BBC,

More information

Annotation Graphs, Annotation Servers and Multi-Modal Resources

Annotation Graphs, Annotation Servers and Multi-Modal Resources Annotation Graphs, Annotation Servers and Multi-Modal Resources Infrastructure for Interdisciplinary Education, Research and Development Christopher Cieri and Steven Bird University of Pennsylvania Linguistic

More information

AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES

AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES AUTOMATIC DIALOG ACT CORPUS CREATION FROM WEB PAGES Pavel Král Department of Computer Science and Engineering, University of West Bohemia, Plzeň, Czech Republic pkral@kiv.zcu.cz Christophe Cerisara LORIA

More information

1. Rich video conference control. Video Conferencing. System Solutions. Video Conferencing System

1. Rich video conference control. Video Conferencing. System Solutions. Video Conferencing System Video Conferencing System Solutions SparkleConference-Video Supports single node video conferencing for 100 people Supports multi-node overlay Support standard SIP rfc-4579 conference control protocol

More information

A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING. Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK

A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING. Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK A MOUTH FULL OF WORDS: VISUALLY CONSISTENT ACOUSTIC REDUBBING Sarah Taylor Barry-John Theobald Iain Matthews Disney Research, Pittsburgh, PA University of East Anglia, Norwich, UK ABSTRACT This paper introduces

More information