Vulnerability of Voice Verification System with STC anti-spoofing detector to different methods of spoofing attacks
Vadim Shchemelinin 1,2, Alexandr Kozlov 2, Galina Lavrentyeva 2, Sergey Novoselov 1,2 and Konstantin Simonchik 1,2

1 ITMO University, St. Petersburg, Russia
2 Speech Technology Center Limited, St. Petersburg, Russia

Abstract. This paper explores the robustness of a text-independent voice verification system against different spoofing attacks based on speech synthesis and voice conversion techniques. Our experiments show that the most dangerous attacks are those based on speech synthesis, but a spoofing detection module based on the standard TV-JFA approach can reduce the False Acceptance error rate of the whole speaker recognition system from 80% to 1%.

Keywords: spoofing, anti-spoofing, speaker recognition, TV, SVM

1 Introduction

Speaker verification systems have become widespread in recent years. They are used in many areas of our lives: forensic research, physical access control systems, banking, and the web. The two main roles such systems play in everyday life are usability enhancement and security. To perform these functions, a voice verification system must be highly robust, especially when it guards access to a bank account or personal information. For this reason, it is important to continuously assess the stability of voice verification systems against spoofing attacks. The greatest threat comes from automatable spoofing methods based on speech synthesis or voice conversion. The works [1, 2] show that such attack methods may raise the false acceptance rate to unacceptable values. Alongside the growing security threat, methods for detecting such attacks have been developed. However, the question of their reliability and performance evaluation is still open.
The aim of our study was to determine the most dangerous spoofing methods for a modern verification system working together with a spoofing detection module.
2 Voice Verification System with Anti-spoofing

2.1 Voice Verification Module

One of the standard use-cases of text-independent voice verification systems is creating a client voice model and comparing it with the client's enrolled (etalon) model while the user interacts with an IVR (Interactive Voice Response) system in a call-center. The user calls the call-center and uses voice commands to navigate the IVR menu. Throughout the call session, the client's speech is sent to the verification system, which builds a voice model and estimates whether access to confidential information should be granted. In our experiments an i-vector based speaker recognition system was used. Before feature extraction, a signal preprocessing module was applied. It included energy-based voice activity detection, clipping detection [3], and pulse and multi-tonal noise detection. Pre-emphasis was also applied, and the speech signal was divided into 22 ms window frames with a 50% overlap and, as in the spoofing detection module, multiplied by a Hamming window function. As front-end features, 13 MFCC features of each frame with their first and second derivatives were selected. The derivatives were estimated over a 5-frame context, and cepstral mean subtraction (CMS) was applied to the cepstral coefficients. For acoustic space modelling we used Total Variability super-vectors with the Probabilistic LDA approach (TV-PLDA) to achieve better performance [4, 5]. According to this approach, the super-vector can be expressed as follows:

µ = m + T ω + ϵ,

where µ is the super-vector of the Gaussian Mixture Model (GMM) parameters of the speaker model, m is the super-vector of the Universal Background Model (UBM) parameters, T is the TV matrix defining the basis in the reduced feature space, ω is the i-vector in the reduced feature space, ω ∼ N(0, I), and ϵ is the error vector. In our system the dimension of the TV space was 600 and the UBM was gender-independent with 512 components.
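The front-end post-processing described above (cepstral mean subtraction plus first and second derivatives over a 5-frame context) can be sketched as follows. This is a minimal illustration: the MFCC extraction itself is assumed to be done by an external tool, so the sketch starts from a matrix of 13 MFCCs per frame.

```python
import numpy as np

def deltas(feats, context=2):
    """Regression-based derivatives over a (2*context+1)-frame window
    (5 frames for context=2, as in the text)."""
    denom = 2 * sum(k * k for k in range(1, context + 1))
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    d = np.zeros_like(feats)
    n = len(feats)
    for k in range(1, context + 1):
        d += k * (padded[context + k: context + k + n]
                  - padded[context - k: context - k + n]) / denom
    return d

def front_end(mfcc):
    """13 MFCCs per frame -> CMS -> append deltas and delta-deltas (39-dim)."""
    mfcc = mfcc - mfcc.mean(axis=0, keepdims=True)  # cepstral mean subtraction
    d1 = deltas(mfcc)   # first derivatives
    d2 = deltas(d1)     # second derivatives
    return np.hstack([mfcc, d1, d2])

feats = front_end(np.random.randn(200, 13))
print(feats.shape)  # (200, 39)
```

After CMS the per-coefficient mean of the static features is zero, which removes constant channel effects from the cepstrum.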
The UBM was obtained by standard ML-training on the telephone part of the NIST SRE datasets (all languages, both genders) [6, 7]. In our study we used more than 4000 training speakers in total. We used a diagonal rather than a full-covariance GMM UBM. The i-vector extractor and PLDA matrix were trained on telephone and microphone recordings from the NIST datasets comprising more than 4000 speakers' voices.

2.2 Spoofing Detection Module

The spoofing detection method was used in the considered speaker verification system as a preliminary step. It was first introduced in the ASVspoof Challenge 2015 [8]
and achieved 3.922% EER for unknown types of spoofing attacks and 0.008% EER for known spoofing attacks. It should be mentioned that zero spoofing detection error was achieved for the HMM-based spoofing attacks of the ASVspoof Challenge evaluation base. That was the motivation to include this method in the ASV system. The anti-spoofing method consists of four main components:

- Pre-detector
- Acoustic feature extractor
- TV i-vector extractor
- SVM classifier

The pre-detector checks whether the input signal has zero temporal energy and, in that case, declares the signal a spoofing attack. Otherwise acoustic features are extracted from the signal. As front-end acoustic features we used 12 Mel-Frequency Cepstral Coefficients (MFCC), 12 Mel-Frequency Principal Coefficients (MFPC) and 12 Cos-Phase Principal Coefficients (CosPhasePC) based on the phase spectrum, each with its first and second derivatives. To obtain these coefficients, Hamming windowing was used with a 256-sample window length and 50% overlap. For acoustic space modelling we used the standard TV-JFA approach, which is the state of the art in speaker verification [7, 9, 10]. In this version of joint factor analysis, the i-vector of the Total Variability space is extracted by means of a JFA modification, which is a usual Gaussian factor analyser defined on the mean super-vectors of the Universal Background Model (UBM) and the Total Variability matrix T. The UBM was represented by a Gaussian mixture model (GMM) of the described features; the diagonal-covariance UBM was trained by the standard EM-algorithm. For the anti-spoofing method the UBM was a 1024-component Gaussian mixture model of the described features.

Fig. 1. Voice Verification System with Anti-spoofing scheme
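The detector's control flow (pre-detector, then features/i-vector, then SVM) can be sketched as follows. This is a minimal illustration only: the i-vector extractor and the SVM decision function are hypothetical stand-ins, since the real components require trained models.

```python
import numpy as np

def detect_spoofing(signal, extract_ivector, svm_decision):
    """Sketch of the detector pipeline described above:
    pre-detector -> acoustic features / i-vector -> SVM classifier."""
    # Pre-detector: a signal with zero temporal energy is
    # immediately declared a spoofing attack.
    if np.sum(np.square(signal, dtype=np.float64)) == 0.0:
        return True
    # Otherwise extract an i-vector and let the SVM decide.
    ivec = extract_ivector(signal)
    return svm_decision(ivec) > 0.0

# Hypothetical stand-ins for the trained components.
toy_ivector = lambda s: np.array([float(np.std(s))])
toy_svm = lambda v: 1.0 if v[0] < 0.1 else -1.0   # "too flat" -> spoof

silence = np.zeros(16000)
speech = np.sin(np.linspace(0, 2000, 16000))
print(detect_spoofing(silence, toy_ivector, toy_svm))  # True  (pre-detector fires)
print(detect_spoofing(speech, toy_ivector, toy_svm))   # False
```

The early exit for zero-energy signals is cheap and catches degenerate inputs before any model-based scoring is attempted.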
2.3 Fusion Decision Module

The Fusion Decision Module fuses the speaker recognition module output with the spoofing detection module output, as shown in figure 1. The decision made by the verification and spoofing detection modules is expressed as

P = P_verification * (1 - P_spoofing),

where P_verification is the probability that the speaker in the test recording is the same as the speaker in the etalon recording, and P_spoofing is the probability that the test recording is a spoofing attack. To calculate probabilities from scores, we used the BOSARIS toolkit [18].

3 Experiments with Different Types of Spoofing

Fig. 2. DET curves for verification system without spoofing detection module against different methods of attacks

To examine the vulnerability of the Voice Verification System to different methods of spoofing attacks we used the ASVspoof development dataset [11]. It includes genuine and spoofed speech of 35 speakers, 15 male and 20 female, with 3497 genuine and spoofed trials. The spoofed speech is generated according to one of five spoofing methods (S1 - S5) as follows:
S1 - A voice conversion method using a simplified frame selection algorithm [12, 13]; the converted speech is generated by selecting target speech frames.
S2 - The simplest voice conversion algorithm [14], which adjusts only the first mel-cepstral coefficient in order to shift the slope of the source spectrum towards the target.
S3 - A Hidden Markov model based speech synthesis system using speaker adaptation techniques [15] and only 20 adaptation utterances.
S4 - The same Hidden Markov model based speech synthesis system [15] with 40 adaptation utterances.
S5 - A method based on a voice conversion toolkit built with the Festvox system [16].

First, we checked how strongly the FA error rate increased when the voice verification system did not contain the spoofing detection module. At this step we also wanted to make sure that the spoofing techniques proposed for the ASVspoof Challenge 2015 were a real threat to the verification system. As the baseline we used only the genuine speech of all speakers from the previously described dataset. It is interesting to note that S2, based on conversion of the first mel-cepstral coefficient, gives the greatest detection error [17], while this method has the least impact on the verification system without the spoofing detector, as shown in figure 2.

Fig. 3. DET curves for verification system with spoofing detection module against different methods of attacks
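The fusion rule P = P_verification * (1 - P_spoofing) and the measurement of FA on spoofed trials at the baseline EER-point threshold can be sketched as follows. All score distributions here are synthetic and purely illustrative.

```python
import numpy as np

def fuse(p_verification, p_spoofing):
    """P = P_verification * (1 - P_spoofing), as defined above."""
    return p_verification * (1.0 - p_spoofing)

def eer_threshold(genuine, impostor):
    """Score threshold where false rejection equals false acceptance."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    gaps = [abs(np.mean(genuine < t) - np.mean(impostor >= t))
            for t in candidates]
    return candidates[int(np.argmin(gaps))]

def fa_rate(scores, threshold):
    """Fraction of trials accepted at the given threshold."""
    return float(np.mean(scores >= threshold))

rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 2000)     # target-trial scores
impostor = rng.normal(-2.0, 1.0, 2000)   # impostor-trial scores
spoof = rng.normal(1.5, 1.0, 2000)       # spoofed trials mimic targets

t = eer_threshold(genuine, impostor)     # baseline EER-point threshold
print(round(fuse(0.9, 0.2), 2))          # 0.72
print(fa_rate(spoof, t) > 0.5)           # spoofing inflates FA at this point
```

Fixing the threshold at the baseline EER point, as the next section does for Table 1, lets the FA rate on spoofed trials be compared with and without the detector at the same operating point.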
The results of the experiments with the spoofing detection module enabled are presented in figure 3. Additionally, table 1 compares the FA values at the baseline EER-point threshold with the spoofing detection module switched on and off.

Table 1. FA verification error for spoofing the verification system based on different algorithms (FA at the threshold set at the EER point).

Voice Verification system                     S1      S2     S3      S4      S5
Without spoofing detection module             52.5%   1.7%   68.5%   77.1%   63.7%
With TV-JFA based spoofing detection module   0.36%   0%     0.23%   1.35%   0.98%

As can be seen from the table, adding spoofing detection significantly improves the FA error rate. The results also demonstrate that synthesis-based spoofing methods are more dangerous than those based on voice conversion techniques.

4 Conclusions

In this paper we analyzed the vulnerability of a voice verification system based on state-of-the-art speaker recognition and spoofing detection methods against different spoofing methods based on text-to-speech and voice conversion algorithms. As the experiments demonstrated, spoofing with a TTS voice is more threatening than the other methods: the Hidden Markov model based speech synthesis spoofing method gave a 1.35% False Acceptance error, compared to 0.98% for the method based on a voice conversion toolkit. It can also be concluded that spoofing detection methods should be evaluated together with voice verification systems. Firstly, a spoofing detector may appear reliable only because the spoofing attacks it faces are ineffective. Secondly, the system EER can be increased by the false acceptance errors of the spoofing detector itself. Our results show once again that it is highly necessary to test verification systems against different spoofing methods, and to develop anti-spoofing algorithms that are reliable in real use-cases.

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.

References

1. Shchemelinin V., Simonchik K.: Examining Vulnerability of Voice Verification Systems to Spoofing Attacks by Means of a TTS System. In: Proceedings of SPECOM 2013, Plzen, Czech Republic, September 2013 (2013)
2. Shchemelinin V., Topchina M., Simonchik K.: Vulnerability of Voice Verification Systems to Spoofing Attacks with TTS Voices Based on Automatically Labeled Telephone Speech. In: Lecture Notes in Computer Science, vol. 8773 (2014)
3. Aleinik S., Matveev Y.N.: Detection of Clipped Fragments in Speech Signals. International Journal of Electrical, Electronic Science and Engineering, 8(2) (2014)
4. Kenny P.: Bayesian speaker verification with heavy tailed priors. In: Proceedings of the Odyssey Speaker and Language Recognition Workshop, Brno, Czech Republic (2010)
5. Simonchik K., Pekhovsky T., Shulipa A., Afanasyev A.: Supervized Mixture of PLDA Models for Cross-Channel Speaker Verification. In: Proceedings of Interspeech 2012, Portland, Oregon, USA, September 9-13 (2012)
6. Matveev Yu., Simonchik K.: The speaker identification system for the NIST SRE. In: Proceedings of the 20th International Conference on Computer Graphics and Vision, GraphiCon 2010, St. Petersburg, Russia (2010)
7. Kozlov A., Kudashev O., Matveev Yu., Pekhovsky T., Simonchik K., Shulipa A.: SVID speaker recognition system for the NIST SRE. Lecture Notes in Computer Science (LNCS), vol. 8113 (2013)
8. Wu Z., Kinnunen T., Evans N., Yamagishi J.: ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan, Dec. 19, 2014
9. Novoselov S., Pekhovsky T., Simonchik K.: STC Speaker Recognition System for the NIST i-vector Challenge. In: Proc. Odyssey: The Speaker and Language Recognition Workshop (2014)
10. Kinnunen T., Li H.: An overview of text-independent speaker recognition: from features to supervectors. Speech Communication, vol. 52 (2010)
11. Wu Z., Kinnunen T., Evans N., Yamagishi J., Hanilçi C., Sahidullah M., Sizov A.: ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge. spoofingchallenge.org/is2015_asvspoof.pdf
12. Dutoit T., Holzapfel A., Jottrand M., Moinet A., Perez J., Stylianou Y.: Towards a voice conversion system based on frame selection. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
13. Wu Z., Virtanen T., Kinnunen T., Chng E., Li H.: Exemplar-based unit selection for voice conversion utilizing temporal information. In: Proc. Interspeech
14. Fukada T., Tokuda K., Kobayashi T., Imai S.: An adaptive algorithm for mel-cepstral analysis of speech. In: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
15. Yamagishi J., Kobayashi T., Nakano Y., Ogata K., Isogai J.: Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans. Audio, Speech and Language Processing, vol. 17, no. 1, pp. 66-83
16. Festvox project
17. Novoselov S., Kozlov A., Lavrentyeva G., Simonchik K., Shchemelinin V.: STC Anti-spoofing Systems for the ASVspoof 2015 Challenge. wp-content/uploads/2015/06/technical_report_asvspoof2015_stc.pdf
18. BOSARIS Toolkit
More informationWriter Identification In Music Score Documents Without Staff-Line Removal
Writer Identification In Music Score Documents Without Staff-Line Removal Anirban Hati, Partha P. Roy and Umapada Pal Computer Vision and Pattern Recognition Unit Indian Statistical Institute Kolkata,
More informationOptimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification
Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 52 Optimization of Observation Membership Function By Particle Swarm Method for Enhancing
More informationSimultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm
Griffith Research Online https://research-repository.griffith.edu.au Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Author Paliwal,
More informationInput speech signal. Selected /Rejected. Pre-processing Feature extraction Matching algorithm. Database. Figure 1: Process flow in ASR
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Feature Extraction
More informationAudiovisual Synchrony Detection with Optimized Audio Features
Audiovisual Synchrony Detection with Optimized Audio Features Sami Sieranoja, Md Sahidullah, Tomi Kinnunen School of Computing University of Eastern Finland, Joensuu, Finland Jukka Komulainen, Abdenour
More informationBiometrics already form a significant component of current. Biometrics Systems Under Spoofing Attack
[ Abdenour Hadid, Nicholas Evans, Sébastien Marcel, and Julian Fierrez ] s Systems Under Spoofing Attack [ An evaluation methodology and lessons learned ] istockphoto.com/greyfebruary s already form a
More informationBaseball Game Highlight & Event Detection
Baseball Game Highlight & Event Detection Student: Harry Chao Course Adviser: Winston Hu 1 Outline 1. Goal 2. Previous methods 3. My flowchart 4. My methods 5. Experimental result 6. Conclusion & Future
More informationDiscriminative training and Feature combination
Discriminative training and Feature combination Steve Renals Automatic Speech Recognition ASR Lecture 13 16 March 2009 Steve Renals Discriminative training and Feature combination 1 Overview Hot topics
More informationOutline. Incorporating Biometric Quality In Multi-Biometrics FUSION. Results. Motivation. Image Quality: The FVC Experience
Incorporating Biometric Quality In Multi-Biometrics FUSION QUALITY Julian Fierrez-Aguilar, Javier Ortega-Garcia Biometrics Research Lab. - ATVS Universidad Autónoma de Madrid, SPAIN Loris Nanni, Raffaele
More informationBiometrics Technology: Multi-modal (Part 2)
Biometrics Technology: Multi-modal (Part 2) References: At the Level: [M7] U. Dieckmann, P. Plankensteiner and T. Wagner, "SESAM: A biometric person identification system using sensor fusion ", Pattern
More informationApplications of Keyword-Constraining in Speaker Recognition. Howard Lei. July 2, Introduction 3
Applications of Keyword-Constraining in Speaker Recognition Howard Lei hlei@icsi.berkeley.edu July 2, 2007 Contents 1 Introduction 3 2 The keyword HMM system 4 2.1 Background keyword HMM training............................
More informationHow accurate is AGNITIO KIVOX Voice ID?
How accurate is AGNITIO KIVOX Voice ID? Overview Using natural speech, KIVOX can work with error rates below 1%. When optimized for short utterances, where the same phrase is used for enrolment and authentication,
More informationSpeech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri
Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Speech Recognition Components Acoustic and pronunciation model:
More informationFUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION
Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and
More informationXing Fan, Carlos Busso and John H.L. Hansen
Xing Fan, Carlos Busso and John H.L. Hansen Center for Robust Speech Systems (CRSS) Erik Jonsson School of Engineering & Computer Science Department of Electrical Engineering University of Texas at Dallas
More informationMulti-Modal Human Verification Using Face and Speech
22 Multi-Modal Human Verification Using Face and Speech Changhan Park 1 and Joonki Paik 2 1 Advanced Technology R&D Center, Samsung Thales Co., Ltd., 2 Graduate School of Advanced Imaging Science, Multimedia,
More informationNeetha Das Prof. Andy Khong
Neetha Das Prof. Andy Khong Contents Introduction and aim Current system at IMI Proposed new classification model Support Vector Machines Initial audio data collection and processing Features and their
More informationOn-line Signature Verification on a Mobile Platform
On-line Signature Verification on a Mobile Platform Nesma Houmani, Sonia Garcia-Salicetti, Bernadette Dorizzi, and Mounim El-Yacoubi Institut Telecom; Telecom SudParis; Intermedia Team, 9 rue Charles Fourier,
More informationLARGE-SCALE SPEAKER IDENTIFICATION
2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) LARGE-SCALE SPEAKER IDENTIFICATION Ludwig Schmidt MIT Matthew Sharifi and Ignacio Lopez Moreno Google, Inc. ABSTRACT
More informationMultimodal Biometric System by Feature Level Fusion of Palmprint and Fingerprint
Multimodal Biometric System by Feature Level Fusion of Palmprint and Fingerprint Navdeep Bajwa M.Tech (Student) Computer Science GIMET, PTU Regional Center Amritsar, India Er. Gaurav Kumar M.Tech (Supervisor)
More informationEFFECTIVE METHODOLOGY FOR DETECTING AND PREVENTING FACE SPOOFING ATTACKS
EFFECTIVE METHODOLOGY FOR DETECTING AND PREVENTING FACE SPOOFING ATTACKS 1 Mr. Kaustubh D.Vishnu, 2 Dr. R.D. Raut, 3 Dr. V. M. Thakare 1,2,3 SGBAU, Amravati,Maharashtra, (India) ABSTRACT Biometric system
More informationSecure E- Commerce Transaction using Noisy Password with Voiceprint and OTP
Secure E- Commerce Transaction using Noisy Password with Voiceprint and OTP Komal K. Kumbhare Department of Computer Engineering B. D. C. O. E. Sevagram, India komalkumbhare27@gmail.com Prof. K. V. Warkar
More informationChapter 3. Speech segmentation. 3.1 Preprocessing
, as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents
More informationProduction of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol.2, II-131 II-137, Dec. 2001. Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
More informationRLAT Rapid Language Adaptation Toolkit
RLAT Rapid Language Adaptation Toolkit Tim Schlippe May 15, 2012 RLAT Rapid Language Adaptation Toolkit - 2 RLAT Rapid Language Adaptation Toolkit RLAT Rapid Language Adaptation Toolkit - 3 Outline Introduction
More informationUsing Gradient Descent Optimization for Acoustics Training from Heterogeneous Data
Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data Martin Karafiát Λ, Igor Szöke, and Jan Černocký Brno University of Technology, Faculty of Information Technology Department
More informationREAL-TIME ROAD SIGNS RECOGNITION USING MOBILE GPU
High-Performance Сomputing REAL-TIME ROAD SIGNS RECOGNITION USING MOBILE GPU P.Y. Yakimov Samara National Research University, Samara, Russia Abstract. This article shows an effective implementation of
More informationEM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition
EM Algorithm with Split and Merge in Trajectory Clustering for Automatic Speech Recognition Yan Han and Lou Boves Department of Language and Speech, Radboud University Nijmegen, The Netherlands {Y.Han,
More informationAudio-visual interaction in sparse representation features for noise robust audio-visual speech recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for
More informationHIDDEN Markov model (HMM)-based statistical parametric
1492 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 5, JULY 2012 Minimum Kullback Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis Zhen-Hua Ling, Member,
More informationLec 08 Feature Aggregation II: Fisher Vector, Super Vector and AKULA
Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 08 Feature Aggregation II: Fisher Vector, Super Vector and AKULA Zhu Li Dept of CSEE,
More informationFigure 1. Example sample for fabric mask. In the second column, the mask is worn on the face. The picture is taken from [5].
ON THE VULNERABILITY OF FACE RECOGNITION SYSTEMS TO SPOOFING MASK ATTACKS Neslihan Kose, Jean-Luc Dugelay Multimedia Department, EURECOM, Sophia-Antipolis, France {neslihan.kose, jean-luc.dugelay}@eurecom.fr
More information2. Basic Task of Pattern Classification
2. Basic Task of Pattern Classification Definition of the Task Informal Definition: Telling things apart 3 Definition: http://www.webopedia.com/term/p/pattern_recognition.html pattern recognition Last
More informationQuery-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based on phonetic posteriorgram
International Conference on Education, Management and Computing Technology (ICEMCT 2015) Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based
More informationGender Classification Technique Based on Facial Features using Neural Network
Gender Classification Technique Based on Facial Features using Neural Network Anushri Jaswante Dr. Asif Ullah Khan Dr. Bhupesh Gour Computer Science & Engineering, Rajiv Gandhi Proudyogiki Vishwavidyalaya,
More informationMulti-modal Person Identification in a Smart Environment
Multi-modal Person Identification in a Smart Environment Hazım Kemal Ekenel 1, Mika Fischer 1, Qin Jin 2, Rainer Stiefelhagen 1 1 Interactive Systems Labs (ISL), Universität Karlsruhe (TH), 76131 Karlsruhe,
More information