Automatic Speech Recognition on Mobile Devices and over Communication Networks
- Ilene Gwen O’Connor
- 6 years ago
Transcription
Zheng-Hua Tan and Børge Lindberg
Automatic Speech Recognition on Mobile Devices and over Communication Networks
Springer
Contents

Preface
Contributors

1. Network, Distributed and Embedded Speech Recognition: An Overview
Zheng-Hua Tan and Imre Varga
Introduction; ASR and Its Deployment in Devices and Networks; Automatic Speech Recognition; Resources and Constraints of Mobile Devices; Resources and Constraints of Communication Networks; Architectural Solutions for ASR in Devices and Networks; Network Speech Recognition; Distributed Speech Recognition; Feature Extraction; Source Coding; Channel Coding and Packetisation; Error Concealment; DSR Standards; A Configurable DSR System; Embedded Speech Recognition; ESR Scenario; Applications and Platforms; Fixed-Point Arithmetic; Optimisation; Robustness; Discussion; References

Part I Network Speech Recognition

2. Speech Coding and Packet Loss Effects on Speech and Speaker Recognition
Laurent Besacier
Introduction; Sources of Degradation in Network Speech Recognition; Speech and Audio Coding Standards; Packet Loss; Effects on the Automatic Speech Recognition Task; Experimental Setup; Degradation Due to Simulated Packet Loss; Degradation with Real Transmissions; Degradation Due to Speech and Audio Codecs; Effect for the Automatic Speaker Verification Task; Speaker Verification Experiments Over Compressed Speech and Packet Loss; Speaker Verification Experiments Over GSM Compressed Speech; Conclusion; Acknowledgments; References

3. Speech Recognition Over Mobile Networks
Hong Kook Kim and Richard C. Rose
Introduction; Techniques for Improving ASR Performance Over Mobile Networks; Bitstream-Based Approach; Feature Transform; Mel-Scaled LPCC; LPC-Based MFCC (LP-MFCC); Pseudo-Cepstrum (PCEP) and Its Mel-Scaled Variant (MPCEP); Enhancement of ASR Performance Over Mobile Networks; Compensation for the Effect of Mobile Systems; Compensation for Speech Coding Distortion in LSP Domain; Compensation for Channel Errors; Conclusion; References

4. Speech Recognition Over IP Networks
Hong Kook Kim
Introduction; Speech Recognition and IP Networks; Relationship Between ASR Performance and Speech Quality; Impact of Speech Coding Distortion; Impact of Network Channel Distortion; Robustness Against Packet Loss; Rate Control; Forward Error Correction; Interleaving; Error Concealment and ASR-Decoder-Based Concealment; Speech Coder for Speech Recognition Over IP Networks; MFCC-Based Speech Coder; Efficient Vector Quantization of MFCCs; Speech Quality Comparison; ASR Performance Comparison; Conclusion; References

Part II Distributed Speech Recognition

5. Distributed Speech Recognition Standards
David Pearce
Introduction; Overview of the Set of DSR Standards; Scope of the Standards; Electro-Acoustics; Speech Detection or External Control Signal; Pre-Processing; Parameterisation; Compression and Error Protection; Formatting; Error Detection and Mitigation; Decompression; Server Side Post Processing; Feature Derivatives; DSR Basic Front-End ES; Feature Extraction; Compression; Error Detection and Mitigation; DSR Advanced Front-End ES; Feature Extraction; VAD; Compression; Recognition Performance of the DSR Front-Ends; Aurora Speech Databases and ETSI Performance Testing; Aurora 3: Multilingual SpeechDat-Car Digits Small Vocabulary Evaluation; 3GPP Evaluations and Comparisons to AMR Coded Speech; ETSI DSR Extended Front-End Standards ES and ES; Transport Protocols: The IETF RTP Payload Formats for DSR; Conclusion; Acknowledgements; References

6. Speech Feature Extraction and Reconstruction
Ben Milner
Introduction; Feature Extraction; Basic Terminal-Side Feature Extraction; Advanced Terminal-Side Feature Extraction; Quantisation and Packetisation; Server-Side Processing; Speech Reconstruction; Analysis of Received Speech Information; Speech Reconstruction; Prediction of Voicing and Fundamental Frequency; Fundamental Frequency Prediction from MFCC Vectors; Voicing Prediction from MFCC Vectors; Speech Reconstruction from Predicted Fundamental Frequency and Voicing; Conclusion; References

7. Quantization of Speech Features: Source Coding
Stephen So and Kuldip K. Paliwal
Introduction; Quantization Schemes; Brief Introduction to Quantization Theory; Distortion Measures for Quantization in Speech Processing; Scalar Quantization; Block Quantization; Vector Quantization; GMM-Based Block Quantization; Quantization of ASR Feature Vectors; Introduction and Literature Review; Statistical Properties of MFCCs; Use of Cepstral Liftering for MFCC Variance Normalization; Relationship Between the Distortion Measure and Recognition Performance; Improving Noise Robustness: Perceptual Weighting of Filterbank Energies; Experimental Results; ETSI Aurora-2 Distributed Speech Recognition Task; Experimental Setup; Non-Uniform Scalar Quantization Using HRO Bit Allocation; Unconstrained Vector Quantization; GMM-Based Block Quantization; Multi-frame GMM-Based Block Quantization; Perceptually-Weighted Vector Quantization of Logarithmic Filterbank Energies; Conclusion; References

8. Error Recovery: Channel Coding and Packetization
Bengt J. Borgström, Alexis Bernard, and Abeer Alwan
Distributed Speech Recognition Systems; Characterization and Modeling of Communication Channels; Signal Degradation Over Wireless Communication Channels; Signal Degradation Over IP Networks; Modeling Bursty Communication Channels; Media-Specific FEC; Media-Independent FEC; Combining FEC with Error Concealment Methods; Linear Block Codes; Cyclic Codes; Convolutional Codes; Unequal Error Protections; Frame Interleaving; Optimal Spread Block Interleavers; Convolutional Interleavers; Decorrelated Block Interleavers; Examples of Modern Error Recovery Standards; ETSI DSR Standard (ETSI 2000); ETSI GSM/EFR Standard (ETSI 1998); Summary; Acknowledgements; References

9. Error Concealment
Reinhold Haeb-Umbach and Valentin Ion
Introduction; Speech Recognition in the Presence of Corrupted Features; Modified Observation Probability; Gaussian Approximation; Feature Posterior Estimation in a DSR Framework; ETSI DSR Standards; Source Coder Redundancy; Channel Models; Estimation of Feature Posterior; Related Work; Performance Evaluations; Experimental Setup; Results on GSM Data Channel; Results on Packet Erasure Channel; Conclusion; Acknowledgments; References

Part III Embedded Speech Recognition

10. Algorithm Optimizations: Low Computational Complexity
Miroslav Novak
Introduction; Common Limitations of Embedded Platforms; Memory Limitations; CPU Limitations; Overview of an ASR System; Front End; Observation Model; Model Organization; Efficient Computation Strategies; Search; Viterbi Search Implementation; Search Graph Construction; Fast Match; Alternative Decoding Schemes; Conclusion; Acknowledgments; References

11. Algorithm Optimizations: Low Memory Footprint
Marcel Vasilache
Introduction; Notations and Problem Statement; Model Complexity Control; Akaike's Information Criterion; Bayesian Information Criterion; Second Order Approximation; Other Measures; Parameter Tying; Model Level; State Level; Density Level; Subspaces Clustering; Parameter Representations; Floating Point Representation; Fixed Point Representation; Quantization; Quantized Parameters HMMs; Scalar Quantization; Vector Quantization; Subspace Distribution Clustering HMM; Subspace Partitioning; Density Clustering; Computational Complexity Implications; Practicalities and Conclusion; References

12. Fixed-Point Arithmetic
Enrico Bocchieri
Introduction; Fixed-Point Arithmetic; Programming with Fixed-Point Numbers; Fixed-Point Representation and Quantization; LVCSR MAP Recognizer; HMM State Likelihoods; State Duration Model; Language Model; Viterbi Decoder; Acoustic Front-End; Fixed-Point Implementation of the Recognizer; Log-Likelihoods; Viterbi Frame-Synchronous Search; Gaussian Parameters; MFCC Front-End; Experiments; Real-Time on the Device; Conclusion; Acknowledgements; References

Part IV Systems and Applications

13. Software Architectures for Networked Mobile Speech Applications
James C. Ferrans and Jonathan Engelsma
Introduction; Embedded and Distributed Speech Engines; The Voice Web; Multimodal User Interfaces; Distributed Speech Recognition; Multimodal Architectures; Simultaneous and Sequential Multimodality; Mode Composition; Classes of Multimodal Architectures; Fully Embedded or "Fat Client" (a); Distributed Processing Engines (b); Thin Client (d); Remote Visual Interface (e); "Pudgy" Client (c); Discussion; The "Plus V" Distributed Multimodal Architecture; Other Distributed Multimodal Architectures; Video Interactive Services with VoiceXML; Multimodal for Set-Top Boxes; Bare Minimum Mobile Voice Search; A Transcription-Based Architecture; Towards a Commercial Ecosystem; Conclusion; References

14. Speech Recognition in Mobile Phones
Imre Varga and Imre Kiss
Introduction; Applications of Speech Recognition for Mobile Phones; Multilinguality and Language Support; Multilingual Speaker Independent Name Dialing; Multilinguality in Other ASR Applications; Language Resources; Noise Robustness; Robust HMM Models; Feature Extraction; Noise Reduction; Footprint and Complexity Reduction; Footprint Reduction of Acoustic Models; Footprint Reduction of Language Models; Footprint Reduction of Pronunciation Lexicon; Reduction of Computational Complexity in Embedded ASR Systems; Low Memory, Fast Decoding; Platforms and an Example Application; Example Application: Large Vocabulary Isolated Word Dictation; Conclusion and Outlook; References

15. Handheld Speech to Speech Translation System
Yuqing Gao, Bowen Zhou, Weizhong Zhu and Wei Zhang
Introduction; System Overview; System Architecture; Hardware and OS Specifications; Interface; System Components and Optimization; LVCSR on Handheld Devices; Natural Language Understanding and Generation Based Translation; Weighted Finite State Transducer Based Translation; Embedded Speech Synthesis; Experiments and Discussions; Speech Recognition Experiments; Translation Experiments; Conclusion; References

16. Automotive Speech Recognition
Harald Höge, Sascha Hohenner, Bernhard Kämmerer, Niels Kunstmann, Stefanie Schachtl, Martin Schönle, and Panji Setiawan
Introduction; Siemens Speech Processing From Research to Products; Development for Performance and Quality; High-Performance Recognizer; Ultra-Compact Text-to-Speech Synthesizer; Natural Voice Dialog; Speaker Characterization and Recognition; Example Automotive Voice Applications: Infotainment, Navigation, Manuals, and Internet; Radio Station Selection; MP3 Title Selection; Navigation Destination Entry; Manuals and Help Systems; Access to Structured Web Content; Access to Web Services; Automotive Platform Issues and Challenges; Hardware Constraints; Software Constraints; User Constraints; Acoustic Channel; Noise Robust Recognition Technology; ASR Front-End; Minimum Mean Square Weighting Rules; Recursive Least Squares Weighting Rules; Implementation of RLS Weighting Rules; Recognition Results; Methodology for Evaluation of Automotive Recognizers; Quality Measurement Using SNR Curves; Common Evaluation Procedures; Proposed SNR-Approach; Data Recording; Evaluation; Best Practice; Conclusion; References

17. Energy Aware Speech Recognition for Mobile Devices
Brian Delaney
Introduction; Battery Technology; Energy Aware Design Principles; Related Work; Case Study of Distributed Speech Recognition Using the HP Labs Smartbadge System; Signal Processing Front-End; Energy Consumption of DSR with IEEE Wireless Networks; Energy Consumption of DSR Using Bluetooth Networks; Comparison of and Bluetooth in DSR; Conclusion; References

Index
More informationConvention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author s advance manuscript, without
More informationWHO WANTS TO BE A MILLIONAIRE?
IDIAP COMMUNICATION REPORT WHO WANTS TO BE A MILLIONAIRE? Huseyn Gasimov a Aleksei Triastcyn Hervé Bourlard Idiap-Com-03-2012 JULY 2012 a EPFL Centre du Parc, Rue Marconi 19, PO Box 592, CH - 1920 Martigny
More informationTHE H.264 ADVANCED VIDEO COMPRESSION STANDARD
THE H.264 ADVANCED VIDEO COMPRESSION STANDARD Second Edition Iain E. Richardson Vcodex Limited, UK WILEY A John Wiley and Sons, Ltd., Publication About the Author Preface Glossary List of Figures List
More informationThe BroadVoice Speech Coding Algorithm. Juin-Hwey (Raymond) Chen, Ph.D. Senior Technical Director Broadcom Corporation March 22, 2010
The BroadVoice Speech Coding Algorithm Juin-Hwey (Raymond) Chen, Ph.D. Senior Technical Director Broadcom Corporation March 22, 2010 Outline 1. Introduction 2. Basic Codec Structures 3. Short-Term Prediction
More informationAdvanced Video Coding: The new H.264 video compression standard
Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission
More information14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP
TRADEOFF BETWEEN COMPLEXITY AND MEMORY SIZE IN THE 3GPP ENHANCED PLUS DECODER: SPEED-CONSCIOUS AND MEMORY- CONSCIOUS DECODERS ON A 16-BIT FIXED-POINT DSP Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama
More informationPerceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding
Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.
More informationChapter 3. Speech segmentation. 3.1 Preprocessing
, as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents
More informationAudio Coding and MP3
Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1 Analogue audio frequencies: 20Hz - 20kHz mono: x(t)
More informationDistributed Signal Processing for Binaural Hearing Aids
Distributed Signal Processing for Binaural Hearing Aids Olivier Roy LCAV - I&C - EPFL Joint work with Martin Vetterli July 24, 2008 Outline 1 Motivations 2 Information-theoretic Analysis 3 Example: Distributed
More informationSource Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201
Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding
More informationError Concealment Used for P-Frame on Video Stream over the Internet
Error Concealment Used for P-Frame on Video Stream over the Internet MA RAN, ZHANG ZHAO-YANG, AN PING Key Laboratory of Advanced Displays and System Application, Ministry of Education School of Communication
More informationMpeg 1 layer 3 (mp3) general overview
Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,
More informationEpipolar Geometry in Stereo, Motion and Object Recognition
Epipolar Geometry in Stereo, Motion and Object Recognition A Unified Approach by GangXu Department of Computer Science, Ritsumeikan University, Kusatsu, Japan and Zhengyou Zhang INRIA Sophia-Antipolis,
More informationEVALITA 2009: Loquendo Spoken Dialog System
EVALITA 2009: Loquendo Spoken Dialog System Paolo Baggia Director of International Standards Speech Luminary at SpeechTEK 2009 Evalita Workshop December 12 th, 2009 Evalita Workshop 2009 Paolo Baggia 11
More informationSystem Identification Related Problems at SMN
Ericsson research SeRvices, MulTimedia and Network Features System Identification Related Problems at SMN Erlendur Karlsson SysId Related Problems @ ER/SMN Ericsson External 2016-05-09 Page 1 Outline Research
More informationError Protection of Wavelet Coded Images Using Residual Source Redundancy
Error Protection of Wavelet Coded Images Using Residual Source Redundancy P. Greg Sherwood and Kenneth Zeger University of California San Diego 95 Gilman Dr MC 47 La Jolla, CA 9293 sherwood,zeger @code.ucsd.edu
More informationACEEE Int. J. on Electrical and Power Engineering, Vol. 02, No. 02, August 2011
DOI: 01.IJEPE.02.02.69 ACEEE Int. J. on Electrical and Power Engineering, Vol. 02, No. 02, August 2011 Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Interaction Krishna Kumar
More informationFor Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer
For Mac and iphone James McCartney Core Audio Engineer Eric Allamanche Core Audio Engineer 2 3 James McCartney Core Audio Engineer 4 Topics About audio representation formats Converting audio Processing
More informationISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding
An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding Ali Mohsin Kaittan*1 President of the Association of scientific research and development in Iraq Abstract
More information3GPP TS V6.4.0 ( )
TS 26.235 V6.4.0 (2005-03) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Packet switched conversational multimedia applications;
More information