Machine Learning for Speaker Recognition and Bioinformatics

1 Machine Learning for Speaker Recognition and Bioinformatics
Man-Wai MAK, Dept of Electronic and Information Engineering, The Hong Kong Polytechnic University
UTS/PolyU Workshop, 24 Oct 2017

2 Contents
1. High-Level Perspective
2. Speaker Recognition
   Robust speaker recognition: SNR-invariant PLDA, Mixture of PLDA
   Deep learning for speaker recognition
3. Machine Learning for Bioinformatics

3 A High-Level Perspective of My Work
Machine learning applied to speech applications (speaker recognition, emotion recognition) and bioinformatics applications (protein recognition, ECG recognition).

4 What is Speaker Recognition
Speaker recognition is based on the fact that speech production organs are speaker-dependent.
Automatic speaker recognition under controlled environments is easy, but under uncontrolled environments errors are still very high because of different types of variability in speech signals.

5 Processes of Speaker Verification
Enrollment: utterance from registered speaker → spectral analysis → 60-dim acoustic vectors → factor analysis → 500-dim i-vector x_s.
Test: utterance from test speaker → spectral analysis → 60-dim acoustic vectors → factor analysis → 500-dim i-vector x_t.
The i-vector is a low-dim representation of the whole utterance. PLDA scoring of x_s against x_t, together with a decision threshold, feeds the decision making (accept/reject).
PLDA: a supervised factor analysis model that can suppress the channel effects in the i-vectors.

6 Noise Robust Speaker Recognition
In conventional multi-condition training, we pool i-vectors from various background noise levels to train the PLDA model.
(Diagram: i-vectors with 2 SNR ranges → EM algorithm → PLDA model.)

7 SNR-Invariant PLDA
We proposed to use an SNR subspace to model the SNR variability in utterances.
(Diagram: Group 1, Group 2, and Group 3, with SNR Factor 1, SNR Factor 2, and SNR Factor 3 in the SNR subspace.)
N Li and MW Mak, "SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification", IEEE/ACM Trans on Audio Speech and Language Processing, 2015.
N Li, MW Mak, WW Lin and JT Chien, "Discriminative Subspace Modeling of SNR and Duration Variabilities for Robust Speaker Verification", Computer Speech and Language.

8 Compared with Conventional PLDA
Conventional PLDA:
$$x_{ij} = m + V h_i + \epsilon_{ij}$$
SNR-invariant PLDA:
$$x_{ij}^{k} = m + V h_i + U w_k + \epsilon_{ij}^{k}$$
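The difference between the two models on this slide can be illustrated by sampling from them. The sketch below is only an illustration under stated assumptions: the slide does not define the symbols, so it is assumed here that h_i is a standard-normal speaker factor for speaker i, w_k is a standard-normal SNR factor for SNR group k, j indexes sessions, and the residual is isotropic Gaussian noise (real PLDA residuals usually have a fuller covariance). All function and variable names are hypothetical.

```python
import numpy as np


def sample_plda(m, V, noise_std, n_speakers, n_sessions, rng):
    """Draw i-vectors from x_ij = m + V h_i + eps_ij.

    Returns an array of shape (n_speakers, n_sessions, D).
    """
    D, n_spk_factors = V.shape
    h = rng.standard_normal((n_speakers, n_spk_factors))  # assumed h_i ~ N(0, I)
    eps = noise_std * rng.standard_normal((n_speakers, n_sessions, D))
    return m + (h @ V.T)[:, None, :] + eps


def sample_snr_invariant_plda(m, V, U, noise_std, n_groups,
                              n_speakers, n_sessions, rng):
    """Draw i-vectors from x_ij^k = m + V h_i + U w_k + eps_ij^k.

    Returns an array of shape (n_groups, n_speakers, n_sessions, D).
    """
    D = V.shape[0]
    h = rng.standard_normal((n_speakers, V.shape[1]))  # speaker factors h_i
    w = rng.standard_normal((n_groups, U.shape[1]))    # SNR factors w_k
    eps = noise_std * rng.standard_normal(
        (n_groups, n_speakers, n_sessions, D))
    speaker_part = (h @ V.T)[None, :, None, :]  # shared across SNR groups
    snr_part = (w @ U.T)[:, None, None, :]      # shared across speakers
    return m + speaker_part + snr_part + eps
```

With the noise switched off, two SNR groups differ by the same offset U(w_k - w_k') for every speaker and session, which is the sense in which the extra term captures SNR variability separately from speaker identity.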

9 Noise Robust Speaker Recognition
Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
(Diagram: enrollment i-vectors → PLDA model → PLDA score.)

10 Mixture of PLDA
We proposed to handle utterances of diverse SNR by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR.
(Diagram: SNR estimator, SNR posterior estimator, PLDA Model 1, PLDA Model 2, PLDA Model 3, PLDA score.)
MW Mak, XM Pang and JT Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans on Audio Speech and Language Processing.

11 Mixture of PLDA
Use a GMM to estimate the mixture posteriors.
MW Mak, XM Pang and JT Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans on Audio Speech and Language Processing.
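As a rough illustration of how a GMM can turn an utterance's SNR into mixture posteriors, the sketch below evaluates a one-dimensional GMM over the SNR in dB and normalizes the component likelihoods. It is a minimal sketch, not the authors' implementation: the function name, the use of a scalar SNR, and every parameter value in the usage note are assumptions, and it does not show how the posteriors enter the PLDA score.

```python
import math


def snr_posteriors(snr_db, weights, means, stds):
    """Posterior probability of each 1-D Gaussian component given one SNR.

    The common -0.5*log(2*pi) term is omitted because it cancels
    in the normalization.
    """
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * ((snr_db - mu) / s) ** 2
        for w, mu, s in zip(weights, means, stds)
    ]
    peak = max(log_terms)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(t - peak) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

For example, with three equally weighted components centered at hypothetical SNRs of 6, 15, and 30 dB, calling `snr_posteriors(6.0, [1/3, 1/3, 1/3], [6.0, 15.0, 30.0], [3.0, 3.0, 3.0])` puts most of the posterior mass on the first component.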

12 DNN-Driven Mixture of PLDA
Use a DNN to estimate the mixture posteriors, given i-vectors as input.
N Li, MW Mak, and JT Chien, "DNN-driven Mixture of PLDA for Robust Speaker Verification", IEEE/ACM Transactions on Audio, Speech and Language Processing.

13 Deep Learning for Speaker Recognition
Use DNNs for noise reduction and feature extraction.
Z Tan, Y Zhu, MW Mak and B Mak, "Senone I-Vectors for Robust Speaker Verification", ISCSLP'16.

14 Deep Learning for Speaker Recognition
Use denoising DNNs for i-vector extraction.
Z Tan, Y Zhu, MW Mak and B Mak, "Senone I-Vectors for Robust Speaker Verification", ISCSLP'16.

15 Deep Learning for Speaker Recognition
Use multi-task DNNs for score calibration.
Z Tan, MW Mak and B Mak, "DNN-Based Score Calibration with Multi-Task Learning for Noise Robust Speaker Verification", IEEE/ACM Trans on Audio, Speech and Language Processing, to appear.

16 Machine Learning for Bioinformatics
We leverage the knowledge in the gene ontology database and the Swiss-Prot protein database for protein subcellular localization.
(Diagram labels: S, AC, BLAST, Swiss-Prot database, GOA database, GO terms retrieval, GO vectors construction, ensemble RP with RP 1 ... RP l ... RP L, each followed by an SVM with weights w 1 ... w l ... w L, and multi-label classification over m = 1, m = 2, ....)
SB Wan, MW Mak, and SY Kung, "mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines", BMC Bioinformatics.
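The "GO vectors construction" and "multi-label classification" stages named on this slide can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the published method: it assumes GO vectors are term-frequency counts over the GO terms seen in the data, and that a protein is assigned every location whose classifier score is positive, falling back to the top-scoring location when none is. The function names, accession numbers, and scores are hypothetical, and the SVM training step is omitted.

```python
from collections import Counter


def build_go_vectors(protein_go_terms):
    """Map each protein's list of GO terms to a term-frequency vector.

    protein_go_terms: dict of accession number -> list of GO term IDs.
    Returns (vocabulary, vectors), where vectors maps accession -> counts.
    """
    vocabulary = sorted(
        {term for terms in protein_go_terms.values() for term in terms})
    position = {term: i for i, term in enumerate(vocabulary)}
    vectors = {}
    for accession, terms in protein_go_terms.items():
        vector = [0] * len(vocabulary)
        for term, count in Counter(terms).items():
            vector[position[term]] = count
        vectors[accession] = vector
    return vocabulary, vectors


def multilabel_decision(scores):
    """Return all locations with a positive score (assumed decision rule).

    If no score is positive, return the single best-scoring location.
    """
    positive = sorted(loc for loc, s in scores.items() if s > 0)
    if positive:
        return positive
    return [max(scores, key=scores.get)]
```

For instance, `multilabel_decision({"nucleus": 0.4, "cytoplasm": 0.1, "mitochondrion": -1.2})` returns both `"cytoplasm"` and `"nucleus"`, which is what makes the predictor multi-label.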

17 Machine Learning for Bioinformatics
Using LASSO and elastic net, we discovered some essential GO terms for each subcellular location.
SB Wan, MW Mak and SY Kung, "Sparse Regressions for Predicting and Interpreting Subcellular Localization of Multi-label Proteins", BMC Bioinformatics.
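To see why a sparse regression can single out a few essential features, the sketch below fits a plain LASSO by cyclic coordinate descent and reports which coefficients stay nonzero. It is a generic textbook LASSO for illustration only, not the authors' pipeline: it covers neither the elastic net nor the multi-label setup, and the objective scaling, function names, and iteration count are assumptions.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1.

    Uses cyclic coordinate descent; returns the weight vector w.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples
    residual = np.array(y, dtype=float)  # equals y - X w because w starts at 0
    for _ in range(n_iters):
        for j in range(n_features):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            residual += X[:, j] * w[j]  # remove feature j's contribution
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_scale[j]
            residual -= X[:, j] * w[j]  # add the updated contribution back
    return w
```

The indices returned by `np.flatnonzero(w)` are the selected features. In the setting of this slide those would correspond to GO terms, with the caveat that the features here are synthetic stand-ins.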

18 Machine Learning for Bioinformatics
For each method (paper), we provide a web server for researchers to use our algorithm.
SB Wan, MW Mak and SY Kung, "FUEL-mLoc: Feature-Unified Prediction and Explanation of Multi-Localization of Cellular Proteins in Multiple Organisms", Bioinformatics, 2016.
SB Wan and MW Mak, Machine Learning for Protein Subcellular Localization Prediction, De Gruyter.

19 Thanks

20 PLDA Likelihood-Ratio Scores
x_t: i-vector from a test utterance. x_s: i-vector from an enrollment utterance of speaker s.
H_0 (same speaker):
$$x_s = m + V z + \epsilon_s, \qquad x_t = m + V z + \epsilon_t$$
against H_1 (different speakers):
$$x_s = m + V z_s + \epsilon_s, \qquad x_t = m + V z_t + \epsilon_t$$
$$\mathrm{Score}(x_s, x_t) = \log \frac{p(x_s, x_t \mid \text{same speaker})}{p(x_s, x_t \mid \text{different speakers})} = \log \frac{\int p(x_s, x_t \mid z)\, p(z)\, dz}{\int p(x_s \mid z_s)\, p(z_s)\, dz_s \int p(x_t \mid z_t)\, p(z_t)\, dz_t}$$
$$= \frac{1}{2}\left[ x_s^T Q x_s + x_t^T Q x_t + 2 x_s^T P x_t \right] + \text{const}$$
where ... Full derivation of this scoring function can be found in ...
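The closed-form score on this slide can be sketched in a few lines of NumPy. The expressions for P and Q below are the standard ones for Gaussian PLDA and are an assumption here, since the slide's own definitions are not shown. They follow from taking z ~ N(0, I) and a residual ε ~ N(0, Σ), so that one i-vector has total covariance VVᵀ + Σ and two i-vectors of the same speaker have cross-covariance VVᵀ. The sketch subtracts the mean m before scoring and drops the constant term. Function names are hypothetical.

```python
import numpy as np


def plda_scoring_matrices(V, Sigma):
    """Compute P and Q for the quadratic-form PLDA log-likelihood ratio.

    V: (D, M) speaker loading matrix; Sigma: (D, D) residual covariance.
    """
    sigma_ac = V @ V.T            # cross-covariance under H_0 (same speaker)
    sigma_tot = sigma_ac + Sigma  # covariance of a single i-vector
    tot_inv = np.linalg.inv(sigma_tot)
    schur_inv = np.linalg.inv(sigma_tot - sigma_ac @ tot_inv @ sigma_ac)
    Q = tot_inv - schur_inv
    P = tot_inv @ sigma_ac @ schur_inv
    return P, Q


def plda_score(x_s, x_t, m, P, Q):
    """Log-likelihood ratio up to an additive constant."""
    a = x_s - m  # centered enrollment i-vector
    b = x_t - m  # centered test i-vector
    return 0.5 * (a @ Q @ a + b @ Q @ b + 2.0 * (a @ P @ b))
```

Because P and Q depend only on the model, they can be computed once and reused for every trial, so each score costs a few matrix-vector products. The score is then compared with a decision threshold to accept or reject the claim.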

21 SNR-Invariant PLDA
Method of modeling SNR information.
(Figure: i-vectors at 6 dB, 15 dB, and clean conditions in the i-vector space, with the corresponding SNR factors w_6dB, w_15dB, and w_cln in the SNR subspace.)
N Li, MW Mak, WW Lin and JT Chien, "Discriminative Subspace Modeling of SNR and Duration Variabilities for Robust Speaker Verification", Computer Speech and Language.

Bottleneck Features from SNR-Adaptive Denoising Deep Classifier for Speaker Identification TAN Zhili & MAK Man-Wai APSIPA 2015 Department of Electronic and Information Engineering The Hong Kong Polytechnic