SRE08 system. Nir Krause, Ran Gazit, Gennady Karvitsky. Leave impersonators, fraudsters and identity thieves speechless


2 Focus: Multilingual telephone speech and 10sec conditions

4 Qualcomm-ICSI-OGI Wiener filter (microphone data only)

5 MFCC & LPCC

6 SGM: SVM in the GMM model space (SGM = GMM-SVM = GSV = GMS = ?)
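To make the SGM idea concrete, here is a minimal sketch of training a linear SVM in supervector space: the target speaker's supervector is the single positive example, a set of negative-example supervectors are the negatives, and a test supervector is scored by its position relative to the learned hyperplane. It assumes a linear kernel and uses a simple Pegasos-style subgradient solver with class-balancing weights; the slides do not say which SVM package or settings were used, and `train_linear_svm` / `svm_score` are hypothetical names.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-3, epochs=100, seed=0):
    """Pegasos-style primal linear SVM (hinge loss + L2 penalty).

    X: (n, d) supervectors; y: labels in {+1, -1}.
    Returns w of length d + 1; the last entry acts as a bias.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a constant feature so the bias is learned (and regularized) with w.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    n = Xa.shape[0]
    n_pos = np.sum(y > 0)
    n_neg = n - n_pos
    # Class-balancing weights: one target supervector vs. many negatives.
    cw = np.where(y > 0, n / (2.0 * n_pos), n / (2.0 * n_neg))
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xa[i] @ w) < 1.0  # margin check before the update
            w *= 1.0 - eta * lam                 # shrink step from the L2 penalty
            if violated:
                w += eta * cw[i] * y[i] * Xa[i]  # hinge-loss subgradient step
    return w

def svm_score(w, supervector):
    """Signed distance-like score of a test supervector for one target model."""
    return float(np.append(supervector, 1.0) @ w)
```

In practice one such model is trained per target speaker, and every test supervector in a trial against that speaker is scored with `svm_score`.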

7 NAP SGM

8 GMM

9 TNO. Thank you, David!

10 Tuning with FoCal. Thank you, Niko!

11 Condition: submitted system
Short2-short3: PRS1 (primary) = LPCC NAP SGM + MFCC NAP SGM + TNO
Short2-10sec: LPCC NAP SGM + MFCC NAP SGM
10sec-10sec: LPCC SGM + MFCC SGM + GMM
Short2-summed: LPCC NAP SGM + MFCC NAP SGM

13 Super vectors

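14 Super vector generation
$m_1$: means-only Bayesian adaptation of a UBM with 512 Gaussians (means $m_{ubm}$), using the top 10 scoring Gaussians.
A super vector of Gaussian means is then shifted, scaled and L2-normalized:
$$m_2 = m_1 - m_{ubm}$$
$$m_3(\mathrm{gaussian}, \mathrm{feature}) = m_2(\mathrm{gaussian}, \mathrm{feature}) \sqrt{\frac{\mathrm{weight}(\mathrm{gaussian})}{\mathrm{var}(\mathrm{gaussian}, \mathrm{feature})}}$$
$$m = \frac{m_3}{\lVert m_3 \rVert_2} \qquad \text{(L2 normalization)}$$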
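A minimal numpy sketch of the supervector recipe above, assuming diagonal-covariance Gaussians and classic relevance-MAP means-only adaptation with relevance factor `relevance` (3 in several conditions of this deck). The top-10 pruning is applied to the per-frame posteriors, and the square-root weight/variance scaling follows the usual GSV-kernel convention. Function names are hypothetical; this is an illustration, not the PerSay code.

```python
import numpy as np

def top_c_posteriors(frames, weights, means, variances, top_c=10):
    """Diagonal-GMM frame posteriors, keeping only the top_c Gaussians per frame."""
    diff = frames[:, None, :] - means[None, :, :]               # (T, G, D)
    log_like = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_post = np.log(weights)[None, :] + log_like             # (T, G)
    keep = np.argsort(log_post, axis=1)[:, -top_c:]             # best top_c per frame
    masked = np.full_like(log_post, -np.inf)
    np.put_along_axis(masked, keep, np.take_along_axis(log_post, keep, axis=1), axis=1)
    masked -= masked.max(axis=1, keepdims=True)                 # numerical stability
    post = np.exp(masked)                                       # pruned ones become 0
    return post / post.sum(axis=1, keepdims=True)

def make_supervector(frames, weights, means, variances, relevance=3.0, top_c=10):
    post = top_c_posteriors(frames, weights, means, variances, top_c)
    n = post.sum(axis=0)                                        # zeroth-order stats (G,)
    f = post.T @ frames                                         # first-order stats (G, D)
    alpha = n / (n + relevance)                                 # MAP adaptation coefficient
    ex = f / np.maximum(n, 1e-10)[:, None]                      # per-Gaussian data mean
    m1 = alpha[:, None] * ex + (1 - alpha)[:, None] * means     # means-only MAP
    m2 = m1 - means                                             # m2 = m1 - m_ubm
    m3 = m2 * np.sqrt(weights[:, None] / variances)             # weight/variance scaling
    sv = m3.ravel()
    return sv / np.linalg.norm(sv)                              # L2 normalization
```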

15 Data engineering

16 Strategy: gender-dependent, sub-condition-dependent, greedy

18 Optimized parameters: UBM, negative-example and NAP speaker sets (each chosen among a few candidates); NAP dimension; relevance factor

19 LPCC & MFCC NAP SGM: short2-short3 phone (telephone in both train and test) and short2-summed
Parameters were set separately for males and females.
UBM: background-data segments of different speakers from SREs 99, 03, 04 and 05.
Negative examples: segments from CallFriend and SREs 99, 03, 04 and 05.
NAP: different speakers with at least 6 calls in SREs 04 and 05 who do not appear in the negative examples.
Tuned per gender: NAP dimension and relevance factor (3 for both males and females).
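NAP removes a low-dimensional nuisance (session/channel) subspace from every supervector. A minimal sketch, assuming the subspace is taken as the leading principal directions of within-speaker scatter computed from speakers with several sessions (the "at least 6 calls" speakers above): center each speaker's supervectors on that speaker's mean, take the top `dim` right-singular vectors U, and project with P = I - U U^T. The names `train_nap` and `apply_nap` are hypothetical.

```python
import numpy as np

def train_nap(supervectors, speaker_ids, dim):
    """Estimate an orthonormal nuisance basis U (D x dim) from multi-session speakers."""
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    centered = np.empty_like(X)
    for spk in np.unique(ids):
        idx = ids == spk
        # Remove the speaker mean so only session variability remains.
        centered[idx] = X[idx] - X[idx].mean(axis=0)
    # Leading right-singular vectors = directions of largest within-speaker scatter.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T

def apply_nap(sv, U):
    """Project a supervector onto the complement of the nuisance subspace."""
    return sv - U @ (U.T @ sv)
```

The same projection is applied to target, negative-example and test supervectors before SVM training and scoring.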

20 LPCC NAP SGM: short2-short3 microphone (microphone in either train or test)
Parameters were set separately for males and females.
UBM: background-data segments of different speakers from SREs 99, 03, 04 and 05.
Negative examples: segments from SREs 99, 03, 04 and 05, including the SRE05 microphone test data.
NAP: different speakers with at least 6 calls in SRE04 and SRE05 microphone data who do not appear in the negative examples.
Tuned per gender: NAP dimension and relevance factor (3 for both males and females).

21 LPCC NAP SGM: short2-10sec (the MFCC system used slightly different background data)
Parameters were set separately for males and females.
UBM: background-data segments of different speakers from SREs 99, 03, 04 and 05.
Negative examples: segments from SREs 99, 03, 04 and 05. Only the first 15 sec of net audio was used to create each supervector, to match the test segment length.
NAP: different speakers with at least 6 calls in SREs 04 and 05 who do not appear in the negative examples.
Tuned per gender: NAP dimension and relevance factor (3 for both males and females).
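"Net audio" here means the speech frames left after silence removal. A minimal sketch of the truncation, assuming a frame-level speech/non-speech mask from the silence detector and a 10 ms frame shift (both are assumptions; the slides give neither). `first_net_seconds` is a hypothetical helper.

```python
import numpy as np

def first_net_seconds(frames, speech_mask, seconds=15.0, frame_shift=0.01):
    """Keep only the first `seconds` of detected speech (silence frames dropped)."""
    speech = frames[np.asarray(speech_mask, dtype=bool)]
    # round() guards against float error, e.g. 15 / 0.01 = 1499.999...
    max_frames = int(round(seconds / frame_shift))
    return speech[:max_frames]
```

If a segment has less than 15 sec of speech, all of its speech frames are kept.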

22 LPCC & MFCC SGM: 10sec-10sec
Parameters were set separately for males and females.
UBM: background-data segments of different speakers from SREs 99, 03, 04 and 05.
Negative examples: segments from SREs 99, 03, 04 and 05. Only the first 15 sec of net audio was used to create each supervector, to match the test segment length.
Relevance factor: 1 for both males and females.
Other: the silence detector parameters were optimized for this condition, to extract more frames.

23 10sec-10sec GMM: same as last year's system

25 Fusion: equal-weight fusion; FoCal logistic regression
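A minimal sketch of linear logistic-regression score fusion in the spirit of FoCal: each trial's subsystem scores s are combined as w·s + b, and the weights are trained by minimizing a prior-weighted cross-entropy so that the fused score behaves like a log-likelihood ratio. This is plain gradient descent written for illustration, not FoCal's code; `train_fusion` and its defaults (prior, learning rate, iteration count) are assumptions.

```python
import numpy as np

def train_fusion(tar_scores, non_scores, prior=0.5, iters=500, lr=0.5):
    """Prior-weighted linear logistic regression; fused llr = scores @ w + b."""
    tar = np.asarray(tar_scores, dtype=float)  # (n_tar, K): one column per subsystem
    non = np.asarray(non_scores, dtype=float)  # (n_non, K)
    X = np.vstack([tar, non])
    y = np.concatenate([np.ones(len(tar)), np.zeros(len(non))])
    # Each class carries total weight equal to its prior, whatever its trial count.
    sw = np.where(y == 1, prior / len(tar), (1 - prior) / len(non))
    offset = np.log(prior / (1 - prior))       # logit(prior)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(iters):
        z = X @ w + b + offset                 # posterior log-odds of "target"
        p = 0.5 * (1.0 + np.tanh(z / 2.0))     # numerically stable sigmoid
        g = sw * (p - y)                       # gradient of weighted cross-entropy wrt z
        w -= lr * (X.T @ g)
        b -= lr * g.sum()
    return w, b
```

Because the prior offset is added during training, the returned `w` and `b` map raw subsystem scores to a calibrated log-likelihood ratio, which is what actDCF rewards.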

26 Good old SRE06 (we didn't use the new short-short lists)

27 Results

28 Short2-short3 results: EER, minDCF and actDCF for PRS1 (NAP SGM + TNO) and PRS2 (NAP SGM) on the Int-Int (all, same, different), Int-Tel, Tel-Mic, Tel-Tel, Tel-Tel English and Tel-Tel native English trials

29 Short2-summed results: EER, minDCF and actDCF on Tel-Tel and Tel-Tel English trials

30 Short2-10sec results: EER, minDCF and actDCF on Tel-Tel and Tel-Tel English trials

31 10sec-10sec results: EER, minDCF and actDCF on Tel-Tel and Tel-Tel English trials for PRS1, PRS2 (SGM) and PRS3 (T-normed GMM)
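The result tables report EER, minDCF and actDCF. A minimal sketch of the first two, assuming the SRE08 cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01, with minDCF normalized by the cost of the best trivial (always accept or always reject) system. The EER is approximated at the threshold where the miss and false-alarm rates are closest, and tied scores get no special treatment. actDCF instead evaluates the decisions made at the system's own threshold, so it is not computed here.

```python
import numpy as np

def eer_and_min_dcf(tar, non, p_target=0.01, c_miss=10.0, c_fa=1.0):
    tar = np.asarray(tar, dtype=float)
    non = np.asarray(non, dtype=float)
    scores = np.concatenate([tar, non])
    labels = np.concatenate([np.ones(len(tar)), np.zeros(len(non))])
    labels = labels[np.argsort(scores, kind="mergesort")]
    # Sweep the threshold upward past each sorted score (N + 1 operating points):
    # misses = targets below the threshold, false alarms = non-targets above it.
    p_miss = np.concatenate([[0.0], np.cumsum(labels)]) / len(tar)
    p_fa = 1.0 - np.concatenate([[0.0], np.cumsum(1 - labels)]) / len(non)
    dcf = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    min_dcf = dcf.min() / c_default
    i = np.argmin(np.abs(p_miss - p_fa))
    eer = (p_miss[i] + p_fa[i]) / 2.0
    return eer, min_dcf
```

For example, `eer_and_min_dcf([1, 3], [0, 2])` gives an EER of 0.5 and a normalized minDCF of 0.5, while perfectly separated scores give 0 for both.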

32 Road to nowhere: Z/T/ZT-norm for SGM? Useful only in the GMM system.
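For reference, a minimal sketch of ZT-norm as it is commonly defined: Z-norm each raw score with the mean and standard deviation of its model's scores against a Z-impostor set, Z-norm the T-cohort models' scores on the test segment the same way, then T-norm using the statistics of those Z-normalized cohort scores. The argument layout and function names are assumptions for illustration.

```python
import numpy as np

def znorm(score, model_vs_zimp):
    """Normalize with the model's score distribution on Z-impostor segments."""
    return (score - np.mean(model_vs_zimp)) / np.std(model_vs_zimp)

def ztnorm(raw, model_vs_zimp, cohort_vs_test, cohort_vs_zimp):
    """raw: target model vs. test score; model_vs_zimp: (n_z,);
    cohort_vs_test: (n_t,) T-cohort models vs. the test segment;
    cohort_vs_zimp: (n_t, n_z) T-cohort models vs. Z-impostor segments."""
    z = znorm(raw, model_vs_zimp)
    cohort_vs_test = np.asarray(cohort_vs_test, dtype=float)
    cohort_vs_zimp = np.asarray(cohort_vs_zimp, dtype=float)
    # Z-normalize each cohort model's test score with that model's own statistics.
    cz = (cohort_vs_test - cohort_vs_zimp.mean(axis=1)) / cohort_vs_zimp.std(axis=1)
    # T-norm: compare against the cohort on the same test segment.
    return float((z - cz.mean()) / cz.std())
```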

33 Road to nowhere: 1024 Gaussians

34 Road to nowhere: concatenating a male-adapted and a female-adapted supervector (as in SRI's MLLR system)

35 Road to nowhere: Wiener filter on telephone data

36 Road to nowhere: factor analysis session compensation is as good as NAP but doesn't add much in fusion

37 Road to nowhere: NAP with 10sec training data doesn't help; it does help when training on 2.5 min and testing on 10sec

38 Road to nowhere: fusing the same system built with different background data is usually not useful

39 Back to the future: Joint Factor Analysis (still in progress)

40 PerSay's NIST vs. customers (mainly call centers of banks, telecoms, etc.)

41 Dev & test data
NIST: dev on SREs 06, 05 and 04; different background and development data; thousands of speakers; SRE target models; 100,000 tests; 100 GB.
Customers: same data for dev and test; 10 speakers; 3 speakers.

42 Duration
NIST focus: train 2.5 minutes, test 2.5 minutes.
Customers: train ~1 minute, test 20 sec, summed channel. Can we remove the agent?

43 TI/TD
NIST: text independent.
Customers: text dependent (90%), 0.5-3% EER; text independent (10%).
