Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery
Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery
Achuth Rao MV, Prasanta Kumar Ghosh
SPIRE LAB, Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India
Outline
1 Introduction
2 Proposed approach
3 Previous work and baseline
4 Experiments and results (Database, Experimental setup, Evaluation)
5 Conclusion and future work
Section 1: Introduction
Motivation: Automatic speech recognition (ASR) systems are very common on mobile devices. Implementing ASR applications on mobile devices can be challenging due to their computational and memory constraints.
Distributed speech recognition (DSR) allows ASR applications to be used on mobile devices (Choi, Invited Paper: Enabling Technologies for Wearable Smart Headsets, 2016). Such systems replace low bit-rate speech codecs with feature vectors (such as MFCCs). Removing the speech codec increases recognition accuracy, particularly in the presence of acoustic noise or channel errors (Shao and Milner, Pitch prediction from MFCC vectors for speech reconstruction, 2004).
HMM-based recognizers use these features directly for ASR (Gales, Maximum likelihood linear transformations for HMM-based speech recognition, 1998).
Recently, in many practical scenarios, speech recognition accuracy has approached human level using end-to-end deep architectures (Xiong et al., The Microsoft 2016 Conversational Speech Recognition System; Zweig et al., Advances in All-Neural Speech Recognition; Chan et al., Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition, 2016).
One approach is to reconstruct the speech from the transmitted features. In most cases the features used for HMM-based ASR are Mel-frequency cepstral coefficients (MFCCs), so we need a way to reconstruct speech using only MFCCs. We therefore propose predicting the pitch from MFCCs as a first step in speech reconstruction.
How do Mel-frequency cepstral coefficients (MFCCs) encode pitch information?
Source-filter model of speech (Fant, Acoustic Theory of Speech Production: with Calculations Based on X-ray Studies of Russian Articulations, 1971). Note the sparse nature of the voiced speech spectrum!
MFCC computation (Huang et al., Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, 2001): the signal is multiplied by a window w(n), the magnitude spectrum is computed, the Mel filter bank energies (MFBEs) are obtained by applying the filters H_m[k], 0 ≤ k ≤ N − 1 (the frequency response of the m-th filter), and the log MFBEs are decorrelated by a DCT, whose first few coefficients are kept as the MFCCs.
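The computation chain above can be sketched as follows. This is a minimal sketch assuming numpy; the triangular Mel filter bank is a simplified textbook construction and the function names are illustrative, not from the talk.

```python
import numpy as np

def mel(f):
    # Hz -> Mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    # Mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(M, nfft, fs):
    # M triangular filters H_m[k] spaced uniformly on the Mel scale
    edges = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), M + 2))
    bins = np.floor((nfft + 1) * edges / fs).astype(int)
    H = np.zeros((M, nfft // 2 + 1))
    for m in range(1, M + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            H[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            H[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return H

def mfcc(frame, M=26, K=13, nfft=2048, fs=16000):
    w = np.hamming(len(frame))                       # w(n): analysis window
    X = np.abs(np.fft.rfft(frame * w, nfft))         # magnitude spectrum
    f = np.log(mel_filterbank(M, nfft, fs) @ X + 1e-10)  # log MFBEs
    n = np.arange(M)                                 # orthonormal DCT-II matrix
    D = np.sqrt(2.0 / M) * np.cos(np.pi * np.outer(np.arange(M), n + 0.5) / M)
    D[0] /= np.sqrt(2.0)
    return (D @ f)[:K]                               # keep first K coefficients
```

Truncating the DCT output from M to K coefficients is the lossy step the rest of the talk works to undo.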
Section 2: Proposed approach
Proposed approach: pitch prediction from MFCC. Which blocks of the MFCC computation need to be inverted, and which are non-invertible? The speech magnitude spectrum is enough to predict the pitch. We propose a three-step method to estimate the pitch from MFCCs:
1 Estimate the MFBEs from the MFCCs.
2 Recover the spectrum from the estimated MFBEs.
3 Estimate the pitch from the spectrum.
Proposed approach: estimation of the spectrum from the MFBEs.
(1) Estimate the MFBEs from the MFCCs. The DCT operation is invertible only if the number of MFBEs (M) and MFCCs (K) is the same. If K < M, we use two methods to recover the MFBEs:
1 Z_DCT: zero padding of the MFCC vector before the inverse DCT.
2 DNN_DCT: DNN-based estimation.
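The Z_DCT recovery can be sketched as below. This assumes an orthonormal DCT-II, so the transpose of the DCT matrix is its inverse; the function names are mine, not from the talk.

```python
import numpy as np

def dct_matrix(M):
    # orthonormal DCT-II matrix: D @ f gives the cepstral coefficients,
    # and D.T inverts the transform exactly when no truncation occurred
    n = np.arange(M)
    D = np.sqrt(2.0 / M) * np.cos(np.pi * np.outer(np.arange(M), n + 0.5) / M)
    D[0] /= np.sqrt(2.0)
    return D

def recover_mfbe_zeropad(mfcc_vec, M):
    # Z_DCT: pad the K kept MFCCs with zeros up to length M, then apply
    # the inverse DCT; the missing high-order coefficients stay zero
    c = np.zeros(M)
    c[: len(mfcc_vec)] = mfcc_vec
    return dct_matrix(M).T @ c
```

With K = M the log MFBEs come back exactly; with K < M the reconstruction is a smoothed approximation, which is the estimation noise γ discussed in step (2b).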
(2) Recover the spectrum from the estimated MFBEs.
(2a) Recover the spectrum from the estimated MFBEs. The voiced spectrum is sparse, and the pitch can be determined from it. The values around the harmonics are determined by the spectrum of the window. We model the voiced speech spectrum as

Y[k] ≈ W[k] * ( Σ_{l=1}^{L} x_l δ(k − N_0 l) ) = Σ_{l=1}^{L} x_l W(k − N_0 l)

where * denotes convolution, W[k] is the window spectrum, N_0 is the harmonic spacing in DFT bins, and L is the number of harmonics. This can be compactly written as Y ≈ W x, where x is a sparse vector.
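The model Y ≈ W x can be made concrete by building the dictionary W whose l-th column is the window spectrum shifted to the l-th harmonic bin. A minimal sketch under that assumption (names are illustrative):

```python
import numpy as np

def harmonic_dictionary(win_spec, N, N0, L):
    # column l holds the window magnitude spectrum centred at bin N0*(l+1),
    # so W @ x superimposes window lobes at the harmonic locations
    W = np.zeros((N, L))
    half = len(win_spec) // 2
    for l in range(L):
        centre = N0 * (l + 1)
        for j, v in enumerate(win_spec):
            k = centre + j - half
            if 0 <= k < N:
                W[k, l] = v
    return W
```

A sparse non-negative x then selects which harmonics are active and with what amplitude, which is exactly what steps (2c) and (2d) estimate.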
(2b) Recover the spectrum from the estimated MFBEs. There is error in the model because of non-invertibility. The estimated MFBE vector can be written as f̂ = H Ŵ x + γ, where γ is the sum of the model and estimation noise. We propose two methods to recover the spectrum from the MFBEs:
1 Direct estimation of the spectrum under the noise model given above.
2 Estimation of the spectrum with a sparsity constraint on the spectrum.
(2c) Recover the spectrum from the estimated MFBEs. Given f̂ = H Ŵ x + γ, the maximum likelihood estimate of the spectrum is

x_PINV = arg min_x || f̂ − H Ŵ x ||_2^2   (1)

The solution has a closed-form expression using the pseudo-inverse (PINV) of H Ŵ:

x_PINV = ((H Ŵ)^T H Ŵ)^{−1} (H Ŵ)^T f̂   (2)
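Equation (2) can be computed directly with a numerical pseudo-inverse; np.linalg.pinv also handles the rank-deficient case gracefully. A sketch, where A stands for the product H Ŵ:

```python
import numpy as np

def spectrum_pinv(A, f_hat):
    # least-squares / maximum-likelihood estimate under Gaussian noise:
    # x = (A^T A)^{-1} A^T f_hat, computed via the Moore-Penrose pseudo-inverse
    return np.linalg.pinv(A) @ f_hat
```

In practice A has far more columns (spectral bins) than rows (filter bank energies), so this solution is heavily underdetermined, which motivates the sparsity constraint in (2d).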
(2d) Recover the spectrum from the estimated MFBEs. We impose non-negativity and sparsity constraints on x. This results in the following optimization problem:

x_S = arg min_{x ≥ 0} || f̂ − H Ŵ x ||_2^2 + λ ||x||_1

Since x is constrained to be non-negative, the l1 norm of x equals the sum of its elements, so the equivalent problem is

x_S = arg min_{x ≥ 0} || f̂ − H Ŵ x ||_2^2 + λ 1^T x   (3)

This optimization can be posed as a quadratic programming problem (Koh, Kim, and Boyd, An interior-point method for large-scale l1-regularized logistic regression, 2007). Note that λ is a hyper-parameter and controls the sparsity.
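Problem (3) can be solved by many QP solvers; as a self-contained illustration, a simple projected-gradient iteration also works, since the constraint x ≥ 0 projects by clipping. This is a stand-in for the interior-point solver cited in the talk, not the implementation used there.

```python
import numpy as np

def spectrum_sparse(A, f_hat, lam, n_iter=2000):
    # projected gradient descent for  min_{x >= 0} ||f - A x||_2^2 + lam * 1^T x
    x = np.zeros(A.shape[1])
    # step size from the Lipschitz constant of the smooth part, 2 ||A||_2^2
    eta = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - f_hat) + lam   # gradient of the objective
        x = np.maximum(0.0, x - eta * grad)        # project onto x >= 0
    return x
```

Larger λ pushes more entries of x to exactly zero, matching the talk's observation that λ controls the sparsity of the recovered spectrum.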
(3) Estimation of the pitch from the estimated spectrum. We use the subharmonic-to-harmonic ratio (SHR) (Sun, Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio, 2002) to estimate the pitch from the spectrum. Given the magnitude spectrum X(f), the pitch search range S, and the number of harmonics Q, the pitch value p* is obtained from the following optimization:

p* = arg max_{f ∈ S} Σ_{k=1}^{Q} ∫_0^∞ log X(f′) [ δ(f′ − k f) − δ(f′ − (k − 1/2) f) ] df′   (4)
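Because of the delta functions, the integral in (4) reduces to sampling the log spectrum at harmonic bins and subtracting its value at subharmonic bins. A discrete sketch on a sampled magnitude spectrum (parameter names are mine):

```python
import numpy as np

def shr_pitch(mag_spec, fs, nfft, f_range, Q=4):
    # pick the candidate f maximising sum_k log X(k f) - log X((k - 1/2) f)
    log_X = np.log(mag_spec + 1e-10)
    best_f, best_score = None, -np.inf
    for f in f_range:
        score = 0.0
        for k in range(1, Q + 1):
            h = int(round(k * f * nfft / fs))          # harmonic bin
            s = int(round((k - 0.5) * f * nfft / fs))  # subharmonic bin
            if h < len(log_X) and s < len(log_X):
                score += log_X[h] - log_X[s]
        if score > best_score:
            best_f, best_score = f, score
    return best_f
```

Candidates at half the true pitch are penalised because their "harmonics" fall on the true spectrum's valleys, which is exactly the subharmonic term's job.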
Section 3: Previous work and baseline
Previous work and baseline: there are several works in the literature where pitch is predicted from MFCCs using statistical models such as Gaussian mixture models (GMMs) and hidden Markov models (Milner and Shao, Prediction of fundamental frequency and voicing from Mel-frequency cepstral coefficients for unconstrained speech reconstruction; Shao and Milner, Pitch prediction from MFCC vectors for speech reconstruction, 2004). As our baseline, we use a deep neural network (DNN) based method, which has shown considerable success in many fields, to predict pitch directly from MFCCs. We refer to this DNN as DNN_b.
Section 4: Experiments and results
Database: we use two databases, CMU-ARCTIC (Kominek and Black, The CMU ARCTIC speech databases) and KEELE (Plante, Meyer, and Ainsworth, A pitch extraction reference database, 1995). CMU-ARCTIC: one male (jmk) and one female (slt) speaker, 48 min each; we denote these sets CM and CF. KEELE: one male and one female speaker, 4 min each (KM and KF). We randomly choose 80% of the CMU-ARCTIC data from each speaker as the training set and the rest as the test set. All of KEELE is used as a test set to evaluate the generalization of the algorithms.
The pitch histograms for the different training and test sets are shown below. Note that the histogram mismatch is larger in the MALE case.
Experimental setup: MFCC and pitch computation.
MFCC computation:
1 Hamming window of 40 ms with a shift of 10 ms.
2 A 2048-point DFT is computed.
3 The MFBEs are computed by placing M = 26 filter banks uniformly on the Mel scale.
4 The DCT with K = 26, 21, 16, 13 is computed to investigate the estimation error due to different amounts of truncation of the DCT coefficients. The velocity and acceleration coefficients of the MFCCs are not used.
Pitch: we use the auto-correlation method from Praat (Boersma and Weenink, Praat: doing phonetics by computer, 2010) on the EGG signal available with the databases to determine the ground-truth fundamental frequency and voicing. Unvoiced frames are removed from the data for the experiments.
Experimental setup: hyper-parameter selection.
Proposed method: the sparse spectrum estimation has a hyper-parameter λ, which is found experimentally using 10% of randomly selected training data to minimize the pitch error for each training set.
Pitch estimation: we use 4 harmonics to compute the pitch score using SHR. The pitch search range S is chosen separately for CM, CF, and CM+CF.
Evaluation: we use root mean squared error (Tabrikian, Dubnov, and Dickalov, Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model, 2004) as the metric for pitch estimation performance. It is computed from the estimated pitch p̂_i and the original pitch p_i at the i-th frame over the entire test set with N_tot voiced frames:

RMSE = sqrt( (1/N_tot) Σ_{i=1}^{N_tot} (p̂_i − p_i)^2 )
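The metric above is a one-liner; a sketch matching the formula (function name is mine):

```python
import numpy as np

def pitch_rmse(p_est, p_ref):
    # root mean squared pitch error over the N_tot voiced frames
    p_est = np.asarray(p_est, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    return np.sqrt(np.mean((p_est - p_ref) ** 2))
```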
Sample recovered spectrum: the sparsity constraint helps in recovering the lower-pitch spectrum with higher accuracy.
Set of models trained (subscript z indicates DCT reconstruction using zero padding; subscript D indicates DCT reconstruction using the DNN):

model   | CM           | CF           | CM+CF
DNN_b   | DNN_b^CM     | DNN_b^CF     | DNN_b^CM+CF
PINV_z  | PINV_z^CM    | PINV_z^CF    | PINV_z^CM+CF
PS_z    | PS_z^CM      | PS_z^CF      | PS_z^CM+CF
PINV_D  | PINV_D^CM    | PINV_D^CF    | PINV_D^CM+CF
PS_D    | PS_D^CM      | PS_D^CF      | PS_D^CM+CF
RMSE with matched test data, MALE (test set CM). Observations:
1 DNN_b^CM outperforms all the other methods.
2 The pitch RMSE increases as the truncation increases.
3 The gender-mismatched DNN model performs poorly.
4 The PS and PINV methods are not affected much by the gender mismatch.
5 The DNN-based DCT estimation helps in all cases.
RMSE with matched test data, FEMALE (test set CF). The same observations apply. The RMSE is lower compared to the MALE case.
RMSE with matched test data, MALE+FEMALE (test set CM+CF). The same observations apply. The RMSE increases compared to the gender-dependent cases.
RMSE with mismatched MALE test data (KM):
1 The DNN performs poorly because of the histogram mismatch.
2 The PS method outperforms all the other methods.
3 The RMSE of PINV_D and PS_D is higher than that of PINV_z and PS_z because the DNN-DCT prediction is also poor.
RMSE with mismatched FEMALE test data (KF):
1 The PINV_z method outperforms all the other methods at lower amounts of truncation.
2 The DNN method is better at higher truncation, because the histogram mismatch is smaller for the FEMALE data.
RMSE with mismatched MALE+FEMALE test data: to evaluate performance in a general unseen scenario, we evaluate the gender-independent models of each method (DNN_b^CM+CF, PINV_z^CM+CF, PS_z^CM+CF). The average RMSE on the unseen KEELE database is shown below. Pitch prediction with the sparsity constraint outperforms the other two methods on general unseen data with unknown gender.
Section 5: Conclusion and future work
Conclusion and future work:
1 We proposed a three-step method to estimate pitch from MFCC vectors.
2 We showed that the sparsity constraint helps in recovering the pitch value more accurately for MALE subjects and generalizes well across databases.
3 It might be possible to train a DNN with many speakers to get a better model that generalizes well to unseen test cases. However, obtaining data with EGG from many speakers could be challenging.
4 Future work may include imposing a periodicity constraint along with the sparsity constraint on the spectrum, reconstruction of speech using the estimated pitch, and evaluation of ASR performance and the naturalness of the synthesized speech.
THANK YOU
Appendix. Experimental setup: deep neural network.
The structure of the DNNs is defined recursively on the layer index l. The input vector z_l ∈ R^{d1} is mapped to the representation vector z_{l+1} ∈ R^{d2} through an activation function f_l as follows:

z_{l+1} = f_l(W_l z_l + b_l), 0 ≤ l ≤ L − 1   (5)

where f_l(x) = tanh(x) for 0 ≤ l ≤ L − 2 and f_l(x) = x for l = L − 1. Here d1 and d2 are the input and output dimensions of the l-th layer, and W_l and b_l are the parameters of the network, estimated by back-propagation and stochastic gradient descent.
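The recursion in (5) can be sketched as a forward pass: tanh on every hidden layer, linear on the output layer. A minimal sketch (names are illustrative):

```python
import numpy as np

def dnn_forward(z, weights, biases):
    # z_{l+1} = f_l(W_l z_l + b_l); tanh for l <= L-2, identity for l = L-1
    L = len(weights)
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ z + b
        if l < L - 1:
            z = np.tanh(z)   # hidden-layer activation
    return z                 # linear output layer
```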
Experimental setup: deep neural network (continued).
1 The DNNs for DNN_DCT and DNN_b have the same architecture and training procedure, except for the number of hidden units per layer: a 4-layer network with 256 units per layer for DNN_b and 512 units for DNN_DCT.
2 The input data is normalized to zero mean and unit variance.
3 The network is initialized using Glorot initialization.
4 Training uses stochastic gradient descent with a batch size of 256 and a momentum of 0.9. 20% of the training data is used to monitor the validation loss at each epoch, and weight updates stop when the validation loss no longer improves.
More informationSOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2
Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,
More informationStacked Denoising Autoencoders for Face Pose Normalization
Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University
More informationAditi Upadhyay Research Scholar, Department of Electronics & Communication Engineering Jaipur National University, Jaipur, Rajasthan, India
Analysis of Different Classifier Using Feature Extraction in Speaker Identification and Verification under Adverse Acoustic Condition for Different Scenario Shrikant Upadhyay Assistant Professor, Department
More informationDEEP LEARNING IN PYTHON. The need for optimization
DEEP LEARNING IN PYTHON The need for optimization A baseline neural network Input 2 Hidden Layer 5 2 Output - 9-3 Actual Value of Target: 3 Error: Actual - Predicted = 4 A baseline neural network Input
More informationWhy DNN Works for Speech and How to Make it More Efficient?
Why DNN Works for Speech and How to Make it More Efficient? Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering, York University, CANADA Joint work with Y.
More informationGeometric Reconstruction Dense reconstruction of scene geometry
Lecture 5. Dense Reconstruction and Tracking with Real-Time Applications Part 2: Geometric Reconstruction Dr Richard Newcombe and Dr Steven Lovegrove Slide content developed from: [Newcombe, Dense Visual
More informationThe Automatic Musicologist
The Automatic Musicologist Douglas Turnbull Department of Computer Science and Engineering University of California, San Diego UCSD AI Seminar April 12, 2004 Based on the paper: Fast Recognition of Musical
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationTracking Algorithms. Lecture16: Visual Tracking I. Probabilistic Tracking. Joint Probability and Graphical Model. Deterministic methods
Tracking Algorithms CSED441:Introduction to Computer Vision (2017F) Lecture16: Visual Tracking I Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state,
More informationDynamic Time Warping
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Dynamic Time Warping Dr Philip Jackson Acoustic features Distance measures Pattern matching Distortion penalties DTW
More informationTWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University
TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing
More informationGating Neural Network for Large Vocabulary Audiovisual Speech Recognition
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. XX, NO. X, JUNE 2017 1 Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition Fei Tao, Student Member, IEEE, and
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description
More informationFUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION
Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and
More informationReverberant Speech Recognition Based on Denoising Autoencoder
INTERSPEECH 2013 Reverberant Speech Recognition Based on Denoising Autoencoder Takaaki Ishii 1, Hiroki Komiyama 1, Takahiro Shinozaki 2, Yasuo Horiuchi 1, Shingo Kuroiwa 1 1 Division of Information Sciences,
More informationHidden Markov Models. Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017
Hidden Markov Models Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017 1 Outline 1. 2. 3. 4. Brief review of HMMs Hidden Markov Support Vector Machines Large Margin Hidden Markov Models
More informationGender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV
Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Jan Vaněk and Josef V. Psutka Department of Cybernetics, West Bohemia University,
More informationReal Time Speaker Recognition System using MFCC and Vector Quantization Technique
Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Roma Bharti Mtech, Manav rachna international university Faridabad ABSTRACT This paper represents a very strong mathematical
More informationMultinomial Regression and the Softmax Activation Function. Gary Cottrell!
Multinomial Regression and the Softmax Activation Function Gary Cottrell Notation reminder We have N data points, or patterns, in the training set, with the pattern number as a superscript: {(x 1,t 1 ),
More informationModeling Phonetic Context with Non-random Forests for Speech Recognition
Modeling Phonetic Context with Non-random Forests for Speech Recognition Hainan Xu Center for Language and Speech Processing, Johns Hopkins University September 4, 2015 Hainan Xu September 4, 2015 1 /
More informationAutomatic Speech Recognition on Mobile Devices and over Communication Networks
Zheng-Hua Tan and Berge Lindberg Automatic Speech Recognition on Mobile Devices and over Communication Networks ^Spri inger g< Contents Preface Contributors v xix 1. Network, Distributed and Embedded Speech
More informationMACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014
MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)
More informationImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture
More informationRECOGNITION OF EMOTION FROM MARATHI SPEECH USING MFCC AND DWT ALGORITHMS
RECOGNITION OF EMOTION FROM MARATHI SPEECH USING MFCC AND DWT ALGORITHMS Dipti D. Joshi, M.B. Zalte (EXTC Department, K.J. Somaiya College of Engineering, University of Mumbai, India) Diptijoshi3@gmail.com
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationManifold Constrained Deep Neural Networks for ASR
1 Manifold Constrained Deep Neural Networks for ASR Department of Electrical and Computer Engineering, McGill University Richard Rose and Vikrant Tomar Motivation Speech features can be characterized as
More informationUsing Gradient Descent Optimization for Acoustics Training from Heterogeneous Data
Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data Martin Karafiát Λ, Igor Szöke, and Jan Černocký Brno University of Technology, Faculty of Information Technology Department
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationA Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition
Special Session: Intelligent Knowledge Management A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition Jiping Sun 1, Jeremy Sun 1, Kacem Abida 2, and Fakhri Karray
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More informationMachine Learning Feature Creation and Selection
Machine Learning Feature Creation and Selection Jeff Howbert Introduction to Machine Learning Winter 2012 1 Feature creation Well-conceived new features can sometimes capture the important information
More informationComparative Evaluation of Feature Normalization Techniques for Speaker Verification
Comparative Evaluation of Feature Normalization Techniques for Speaker Verification Md Jahangir Alam 1,2, Pierre Ouellet 1, Patrick Kenny 1, Douglas O Shaughnessy 2, 1 CRIM, Montreal, Canada {Janagir.Alam,
More informationMoonRiver: Deep Neural Network in C++
MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement
More informationLecture 13. Deep Belief Networks. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen
Lecture 13 Deep Belief Networks Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com 12 December 2012
More informationUsing Capsule Networks. for Image and Speech Recognition Problems. Yan Xiong
Using Capsule Networks for Image and Speech Recognition Problems by Yan Xiong A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science Approved November 2018 by the
More informationNajiya P Fathima, C. V. Vipin Kishnan; International Journal of Advance Research, Ideas and Innovations in Technology
ISSN: 2454-32X Impact factor: 4.295 (Volume 4, Issue 2) Available online at: www.ijariit.com Analysis of Different Classifier for the Detection of Double Compressed AMR Audio Fathima Najiya P najinasi2@gmail.com
More informationSparse Solutions to Linear Inverse Problems. Yuzhe Jin
Sparse Solutions to Linear Inverse Problems Yuzhe Jin Outline Intro/Background Two types of algorithms Forward Sequential Selection Methods Diversity Minimization Methods Experimental results Potential
More informationLeast Squares Signal Declipping for Robust Speech Recognition
Least Squares Signal Declipping for Robust Speech Recognition Mark J. Harvilla and Richard M. Stern Department of Electrical and Computer Engineering Carnegie Mellon University, Pittsburgh, PA 15213 USA
More informationRobust speech recognition using features based on zero crossings with peak amplitudes
Robust speech recognition using features based on zero crossings with peak amplitudes Author Gajic, Bojana, Paliwal, Kuldip Published 200 Conference Title Proceedings of the 200 IEEE International Conference
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun
More informationNeural Networks Based Time-Delay Estimation using DCT Coefficients
American Journal of Applied Sciences 6 (4): 73-78, 9 ISSN 1546-939 9 Science Publications Neural Networks Based Time-Delay Estimation using DCT Coefficients Samir J. Shaltaf and Ahmad A. Mohammad Department
More informationVoice Conversion Using Dynamic Kernel. Partial Least Squares Regression
Voice Conversion Using Dynamic Kernel 1 Partial Least Squares Regression Elina Helander, Hanna Silén, Tuomas Virtanen, Member, IEEE, and Moncef Gabbouj, Fellow, IEEE Abstract A drawback of many voice conversion
More informationLecture 20: Neural Networks for NLP. Zubin Pahuja
Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple
More informationThe Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem
Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran
More informationNeetha Das Prof. Andy Khong
Neetha Das Prof. Andy Khong Contents Introduction and aim Current system at IMI Proposed new classification model Support Vector Machines Initial audio data collection and processing Features and their
More informationOn Pre-Image Iterations for Speech Enhancement
Leitner and Pernkopf RESEARCH On Pre-Image Iterations for Speech Enhancement Christina Leitner 1* and Franz Pernkopf 2 * Correspondence: christina.leitner@joanneum.at 1 JOANNEUM RESEARCH Forschungsgesellschaft
More informationConfidence Measures: how much we can trust our speech recognizers
Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition
More informationMultifactor Fusion for Audio-Visual Speaker Recognition
Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007 70 Multifactor Fusion for Audio-Visual Speaker Recognition GIRIJA CHETTY
More informationRecurrent Neural Networks. Nand Kishore, Audrey Huang, Rohan Batra
Recurrent Neural Networks Nand Kishore, Audrey Huang, Rohan Batra Roadmap Issues Motivation 1 Application 1: Sequence Level Training 2 Basic Structure 3 4 Variations 5 Application 3: Image Classification
More informationPair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification 2 1 Xugang Lu 1, Peng Shen 1, Yu Tsao 2, Hisashi
More informationDevice Activation based on Voice Recognition using Mel Frequency Cepstral Coefficients (MFCC s) Algorithm
Device Activation based on Voice Recognition using Mel Frequency Cepstral Coefficients (MFCC s) Algorithm Hassan Mohammed Obaid Al Marzuqi 1, Shaik Mazhar Hussain 2, Dr Anilloy Frank 3 1,2,3Middle East
More informationDecentralized and Distributed Machine Learning Model Training with Actors
Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of
More informationAccelerating the Hessian-free Gauss-Newton Full-waveform Inversion via Preconditioned Conjugate Gradient Method
Accelerating the Hessian-free Gauss-Newton Full-waveform Inversion via Preconditioned Conjugate Gradient Method Wenyong Pan 1, Kris Innanen 1 and Wenyuan Liao 2 1. CREWES Project, Department of Geoscience,
More informationChapter 3. Speech segmentation. 3.1 Preprocessing
, as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents
More informationDeep Learning Cook Book
Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation
More informationA Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition
A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October
More informationHello Edge: Keyword Spotting on Microcontrollers
Hello Edge: Keyword Spotting on Microcontrollers Yundong Zhang, Naveen Suda, Liangzhen Lai and Vikas Chandra ARM Research, Stanford University arxiv.org, 2017 Presented by Mohammad Mofrad University of
More informationTutorial: Using Tina Vision s Quantitative Pattern Recognition Tool.
Tina Memo No. 2014-004 Internal Report Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. P.D.Tar. Last updated 07 / 06 / 2014 ISBE, Medical School, University of Manchester, Stopford
More informationAssignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions
ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan
More informationImplementation of Speech Based Stress Level Monitoring System
4 th International Conference on Computing, Communication and Sensor Network, CCSN2015 Implementation of Speech Based Stress Level Monitoring System V.Naveen Kumar 1,Dr.Y.Padma sai 2, K.Sonali Swaroop
More informationMultiple-View Object Recognition in Band-Limited Distributed Camera Networks
in Band-Limited Distributed Camera Networks Allen Y. Yang, Subhransu Maji, Mario Christoudas, Kirak Hong, Posu Yan Trevor Darrell, Jitendra Malik, and Shankar Sastry Fusion, 2009 Classical Object Recognition
More informationM. Sc. (Artificial Intelligence and Machine Learning)
Course Name: Advanced Python Course Code: MSCAI 122 This course will introduce students to advanced python implementations and the latest Machine Learning and Deep learning libraries, Scikit-Learn and
More informationProbabilistic Robotics
Probabilistic Robotics Sebastian Thrun Wolfram Burgard Dieter Fox The MIT Press Cambridge, Massachusetts London, England Preface xvii Acknowledgments xix I Basics 1 1 Introduction 3 1.1 Uncertainty in
More informationl1 ls: A Matlab Solver for Large-Scale l 1 -Regularized Least Squares Problems
l ls: A Matlab Solver for Large-Scale l -Regularized Least Squares Problems Kwangmoo Koh deneb@stanford.edu Seungjean Kim sjkim@stanford.edu May 5, 2008 Stephen Boyd boyd@stanford.edu l ls solves l -regularized
More informationA long, deep and wide artificial neural net for robust speech recognition in unknown noise
A long, deep and wide artificial neural net for robust speech recognition in unknown noise Feipeng Li, Phani S. Nidadavolu, and Hynek Hermansky Center for Language and Speech Processing Johns Hopkins University,
More informationPartitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning
Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning
More informationDeep Learning. Volker Tresp Summer 2015
Deep Learning Volker Tresp Summer 2015 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there
More informationObject Detection with Partial Occlusion Based on a Deformable Parts-Based Model
Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Johnson Hsieh (johnsonhsieh@gmail.com), Alexander Chia (alexchia@stanford.edu) Abstract -- Object occlusion presents a major
More information