Speaker Verification Using SVM


Mr. Rastoceanu Florin / Mrs. Lazar Marilena
Military Equipment and Technologies Research Agency
Aeroportului Street, No. 16, CP 19, OP Bragadiru 077025, Ilfov
ROMANIA
email: rastoceanu_florin@yahoo.com / mnvlazar@yahoo.com

ABSTRACT

In this paper, we describe an application of speaker verification that uses Romanian vowels as speaker models, for the case of a small Romanian-language database. The models are then classified with support vector machines (SVMs).

1. INTRODUCTION

If the twentieth century was the century of speed, the one that has just begun is the century of communication. Because communication is vital in many areas, effort has been concentrated on building high-capacity communication channels that can transmit more and more information in a shorter time. Today this objective is almost accomplished, and the main effort is now to protect the information that flows through these channels. This is very important because the most widely used communication channels today are public, such as the Internet or electromagnetic waves.

The first step in protecting this information is authentication. Biometrics are among the better methods for authentication, which is why many applications use them. The biometric methods currently used with good results are fingerprint identification, iris scanning, face recognition and hand geometry. Nevertheless, those methods need significant resources or are difficult to use. To overcome those limits, a person's voice can be used for authentication. For example, in telephony applications the required resources are provided by the phone itself.

Speaker recognition can be used in many areas, such as: homeland security (airport security, strengthening the national borders, travel documents, visas); enterprise-wide network security infrastructures; secure electronic banking, investing and other financial transactions; retail sales; law enforcement; health and social services.

Automatic speaker recognition systems also have a wide range of potential applications in Army environments: verifying the identity of users of various communication channels; providing access control to restricted areas, equipment and information; verifying computer users through terminals that accept voice input; counter-terrorism measures. A voice recognition system can also be used to identify an unknown voice recording intercepted by the authorities.

RTO-MP-IST-091 P12-1

Report Documentation Page (Standard Form 298, Rev. 8-98)

Report date: NOV 2010
Report type: N/A
Title and subtitle: Speaker Verification Using SVM
Performing organization: Military Equipment and Technologies Research Agency, Aeroportului Street, No. 16, CP 19, OP Bragadiru 077025, Ilfov, ROMANIA
Distribution/availability statement: Approved for public release, distribution unlimited
Supplementary notes: See also ADA564697. Information Assurance and Cyber Defence (Assurance de l'information et cyberdefense). RTO-MP-IST-091
Security classification: report unclassified; abstract unclassified; this page unclassified
Limitation of abstract: SAR
Number of pages: 6

The paper is organized as follows. After the introduction, the second section gives a brief introduction to the theory of SVMs. The third section describes an experiment that uses SVMs for a speaker verification application based on Romanian vowels. We conclude with the results obtained by the application described above and with the future work needed to increase the performance of the system.

2. SUPPORT VECTOR MACHINES

The support vector machine (SVM) is a supervised learning method that generates input-output mapping functions from a set of labeled training data [1]. The mapping function can be either a classification or a regression function. For classification, nonlinear kernel functions are often used to transform the input data to a high-dimensional feature space in which the data become more separable than in the original input space. Maximum-margin hyperplanes are then created. The model thus produced depends only on a subset of the training data near the class boundaries.

2.1 Linear Case

Consider the problem of separating the set of N training vectors $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, $x_i \in \mathbb{R}^m$, belonging to two different classes $y_i \in \{-1, +1\}$. The goal is to find the linear decision function D(x) and the separating hyperplane H:

$$H: \; w \cdot x + b = 0 \qquad (1)$$

$$D(x) = \operatorname{sign}(w \cdot x + b) \qquad (2)$$

where w is the normal to the hyperplane and b is the offset; the distance of the hyperplane from the origin is $|b| / \|w\|$.

Figure 1: Separation hyperplanes (labels: margins, hyperplanes, optimum hyperplane, maximum margin)

Let the margin of the SVM be defined as the distance from the separating hyperplane to the closest points of the two classes. The SVM training paradigm finds the separating hyperplane that gives the maximum margin. The margin is equal to $2 / \|w\|$. Once the hyperplane is obtained, all the training examples satisfy the following inequalities [2]:

$$x_i \cdot w + b \geq +1 \quad \text{for } y_i = +1 \qquad (3)$$

$$x_i \cdot w + b \leq -1 \quad \text{for } y_i = -1 \qquad (4)$$

The above procedure can be summarized as the constrained optimization problem (5).
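As an illustration of the linear case, the decision rule (2), the margin $2 / \|w\|$ and the constraints (3) and (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the weight vector, offset and toy samples are invented for the example, and the helper names (`dot`, `decision`, `margin`, `satisfies_constraints`) are hypothetical, not taken from the paper or from LIBSVM.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decision(w, b, x):
    """D(x) = sign(w . x + b); a value of exactly 0 is mapped to +1 here."""
    return 1 if dot(w, x) + b >= 0 else -1

def margin(w):
    """Width of the margin between the two classes, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, samples):
    """Check y_i * (x_i . w + b) >= 1 for every (x_i, y_i) pair,
    which combines inequalities (3) and (4) into one test."""
    return all(y * (dot(x, w) + b) >= 1 for x, y in samples)

# Hypothetical hyperplane and toy data, chosen only for illustration.
w, b = [1.0, 1.0], -3.0
samples = [([1.0, 1.0], -1), ([2.0, 2.0], +1),
           ([0.0, 0.0], -1), ([3.0, 1.0], +1)]

print(decision(w, b, [2.0, 2.0]))
print(margin(w))
print(satisfies_constraints(w, b, samples))
```

Multiplying each inequality by its label $y_i$ is what lets (3) and (4) be checked with a single expression, and the same form appears as the constraint of the optimization problem (5).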

$$\text{Minimize} \quad L(w) = \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (x_i \cdot w + b) \geq +1, \quad i = 1, 2, \ldots, N \qquad (5)$$

2.2 Non-Linear Case

Real-world classification problems typically involve data that can only be separated using a nonlinear decision surface. In this case, optimization on the input data involves a kernel-based transformation that maps the data into a higher-dimensional space (the feature space) in which the data are linearly separable:

$$k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \qquad (6)$$

Figure 2: SVM principle (the mapping $x \to \Phi(x)$ takes the N-dimensional input space to an M-dimensional feature space, M > N; support vectors are marked)

Kernels allow a dot product to be computed in a higher-dimensional space without explicitly mapping the data into that space. The kernels used in our application are:

Table 1: Kernels used in experiments

Kernel        Function
RBF           $\exp(-g \, \|x - x_i\|^2)$
Polynomial    $(s \, (x \cdot x_i) + c)^d$

3. SPEAKER VERIFICATION METHOD

The next figure shows the diagram of the text-independent speaker verification application realized by the authors using the SVM approach.

Figure 3: SVM speaker verification method (training: in-class and out-class data are used to train the SVM model; testing: test data are compared with the claimed SVM model, giving an accept or reject decision)
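Returning to the kernels of Table 1, the sketch below writes both out in Python and checks the identity (6) numerically for one simple case. It assumes the usual forms $\exp(-g \|x - x_i\|^2)$ for the RBF kernel and $(s (x \cdot x_i) + c)^d$ for the polynomial kernel; the explicit map `phi_quadratic` (valid only for two-dimensional inputs with s = 1, c = 0, d = 2), the function names and the sample vectors are illustrative assumptions, not part of the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(x, xi, g):
    """RBF kernel: exp(-g * ||x - xi||^2)."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-g * diff_sq)

def poly_kernel(x, xi, s, c, d):
    """Polynomial kernel: (s * (x . xi) + c) ** d."""
    return (s * dot(x, xi) + c) ** d

def phi_quadratic(x):
    """Explicit feature map for the 2-D polynomial kernel with
    s = 1, c = 0, d = 2: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

# Hypothetical input vectors, chosen only for illustration.
x, z = [1.0, 2.0], [3.0, 0.5]

# Kernel evaluated in the input space (no explicit mapping).
k_implicit = poly_kernel(x, z, s=1.0, c=0.0, d=2)
# Same value obtained as a dot product in the 3-D feature space.
k_explicit = dot(phi_quadratic(x), phi_quadratic(z))

print(k_implicit, k_explicit)
print(rbf_kernel(x, z, g=0.5))
```

The two printed kernel values agree up to floating-point rounding, which is the point of (6): the kernel returns the feature-space dot product while working only with the original two-dimensional vectors.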

A speaker verification system is composed of two distinct phases, a training phase and a test phase. In the training phase, the SVM models corresponding to each speaker are created. Seven models are created for each speaker, one for each Romanian vowel. These models are trained with in-class data (vowels extracted from the current speaker) and out-class data (vowels extracted from the other speakers). In the testing phase, the test data are compared with the claimed SVM model and a decision is made. The Equal Error Rate (EER) is used to measure system performance in all our evaluations. For the SVM implementation we use LIBSVM [3], a library for support vector machine classification and regression developed at National Taiwan University.

4. EXPERIMENTS AND RESULTS

The evaluation was carried out on a small database with 10 speakers (2 female and 8 male). Each speaker spoke 50 different sentences. The vowels used in the experiments were extracted from these sentences. The number of samples extracted for each vowel differs, according to how often that vowel occurs in the spoken sentences. Two feature sets were extracted from these vowels: 12 LPC coefficients concatenated with 12 delta LPC coefficients (24 coefficients in total), and a forty-dimensional feature vector composed of 12 mel-cepstrum coefficients, log energy, the 0th cepstral coefficient, and delta and delta-delta coefficients. The features corresponding to 80% of the vowels extracted from the sentences were used for training, and the other 20% were used for testing.

Experiments were carried out to compare the performance of the method for different kernel functions, extracted features and vowels in an SVM implementation. In the first stage, the comparison is made between the two types of coefficients and the kernels used in the SVM implementation. For this purpose, LPC and MFCC coefficients are used. The kernel functions used are RBF and polynomial (degree 2 and 3). The results are presented in Figure 4 and Table 2. They show that the best results are obtained with MFCC coefficients and the polynomial (degree 3) kernel function, but good results are also obtained using MFCC coefficients with the polynomial (degree 2) and RBF kernels.

Figure 4: Mean EER (for all speakers and vowels) for different kernels and coefficients

Table 2: Mean EER (for all speakers and vowels) for different kernels and coefficients

Methods      Mean EER
LPC+Pol2     11.86
LPC+Pol3     11.52
LPC+RBF      13.29
MFCC+Pol2    10.88
MFCC+Pol3    10.54
MFCC+RBF     10.67
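Since every result in this section is reported as an EER, a short sketch of how such a value can be computed from verification scores may help. The paper does not describe its exact procedure, so the following is only one common approach under stated assumptions: each trial yields a score (for example an SVM decision value), a trial is accepted when its score reaches the threshold, and the EER is approximated at the threshold where the false-acceptance rate (FAR) and false-rejection rate (FRR) are closest. The function names and the toy scores are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a trial is accepted if score >= threshold.
    genuine: scores of true-speaker (in-class) trials;
    impostor: scores of other-speaker (out-class) trials."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Scan every observed score as a candidate threshold and return
    (EER estimate, threshold) where |FAR - FRR| is smallest.
    The EER estimate is the mean of FAR and FRR at that threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

# Hypothetical scores, invented only for illustration.
genuine = [0.9, 0.8, 0.7, 0.3, 0.6]
impostor = [0.1, 0.2, 0.4, 0.65, 0.05]

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

A mean EER such as those in Table 2 would then be an average of values like this over all speaker and vowel models; with few test trials the FAR and FRR curves are coarse step functions, so the estimate depends on how the crossing point is interpolated.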

Using the results mentioned above, the comparisons in the second phase are made using only the polynomial (degree 3) kernel and MFCC coefficients. In this phase the experiments show which vowel gives the best results (Figure 5) and, using this vowel, the results obtained for each speaker (Figure 6).

Figure 5: Mean EER (for all speakers) for Romanian vowels obtained with Polynomial (degree 3) kernel and MFCC coefficients

Figure 6: EER for speakers obtained with Polynomial (degree 3) kernel and MFCC coefficients for vowel "a"

5. CONCLUSIONS

In this paper we describe a method for speaker verification implemented with SVMs using LIBSVM and a database, recorded in a laboratory, with 10 speakers and 500 sentences. For that purpose, we used LPC and MFCC coefficients as features extracted from Romanian vowels. For the SVM we used RBF and polynomial kernels (degree 2 and 3). Our conclusion is that, using the polynomial (degree 3) kernel, MFCC coefficients and the vowel "a", we obtained an EER equal to 4.82, which is a good result. The differences between the error rates across speakers and across vowels are large. For vowels, one supposition is that the database is small and some vowels are better trained than others. Certainly, all these results can be improved using a larger, professionally recorded database.

[1] V. Vapnik, "Three remarks on the support vector method of function estimation," in Advances in Kernel Methods - Support Vector Learning, MIT Press, 1999.
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, 2000.
[3] LIBSVM, a library for support vector machine classification and regression, developed at National Taiwan University, www.csie.ntu.edu.tw/~cjlin/libsvm/
