Complex Identification Decision Based on Several Independent Speaker Recognition Methods. Ilya Oparin Speech Technology Center

Similar documents
Passive Detection. What is KIVOX Passive Detection? Product Datasheet. Key Benefits. 3 APIs in one product

Manual operations of the voice identification program GritTec's Speaker-ID: The Mobile Client

) GNOMES. Professional audio recorders

SUGGESTED SOLUTION IPCC MAY 2017EXAM. Test Code - I M J

Voice Command Based Computer Application Control Using MFCC

Bo#leneck Features from SNR- Adap9ve Denoising Deep Classifier for Speaker Iden9fica9on

Authentication of Fingerprint Recognition Using Natural Language Processing

STC ANTI-SPOOFING SYSTEMS FOR THE ASVSPOOF 2015 CHALLENGE

Comparative Evaluation of Feature Normalization Techniques for Speaker Verification

Vulnerability of Voice Verification System with STC anti-spoofing detector to different methods of spoofing attacks

Hybrid Biometric Person Authentication Using Face and Voice Features

Implementation of Speech Based Stress Level Monitoring System

T3main. Powering comprehensive unified communications solutions.

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

SHT-2B/USB SHT-4B/USB

Implementing a Speech Recognition System on a GPU using CUDA. Presented by Omid Talakoub Astrid Yi

SkillsManager TM. Business advantage through IT skills management

Guide to Speaker Verification & Voice Biometrics

Multifactor Fusion for Audio-Visual Speaker Recognition

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

STATE-OF-THE-ART VOICE SOLUTIONS PROVIDER LED BY AWARD WINNING TEAM FROM NORDIC

Biometric Authentication. Bringing End users and Enterprise on the same page

Client Dependent GMM-SVM Models for Speaker Verification

The new maximum security smartphone No Camera - No GPS - No Recorder

SHT-8B/PCI SHT-8B/PCI/FAX SHT-16B-CT/PCI SHT-16B-CT/PCI/FAX

How accurate is AGNITIO KIVOX Voice ID?

Speaker Verification with Adaptive Spectral Subband Centroids

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Maximum Likelihood Beamforming for Robust Automatic Speech Recognition

Car Information Systems for ITS

When Recognition Matters WHITEPAPER CLFE CERTIFIED LEAD FORENSIC EXAMINER.

Tomorrow s Surveillance. Solutions Today

WELCOME TO THE CONNECTIVITY

Minimal-Impact Personal Audio Archives

User Guide GNOME 7.0. Portable Digital Stereo Voice Recorder STC-H476. Operation manual

Computer Information Science xxx

Technical Security Training

Text-Independent Speaker Identification

Best-in-class audio recording

Speaker Verification Using SVM

MULTIMODAL PERSON IDENTIFICATION IN A SMART ROOM. J.Luque, R.Morros, J.Anguita, M.Farrus, D.Macho, F.Marqués, C.Martínez, V.Vilaplana, J.

Smart Logger II. User Guide SMART LOGGER II. Multichannel Call Recording and Monitoring System STC-S303. User Guide

Multi-modal Person Identification in a Smart Environment

IBM ViaVoice for Windows, Release 8 Family of Products, Spanish Language Version, Delivers Improved ViaVoice Performance

IMPROVED SPEAKER RECOGNITION USING DCT COEFFICIENTS AS FEATURES. Mitchell McLaren, Yun Lei

TRBONET PLUS FOR MOTOTRBO PREMIUM CONTROL ROOM SOLUTION FOR MOTOTRBO DIGITAL TWO-WAY RADIO SYSTEMS SOLD AND SUPPORTED BY MOTOROLA SOLUTIONS

UltraMatch. Standalone Iris Recognition System

Aditi Upadhyay Research Scholar, Department of Electronics & Communication Engineering Jaipur National University, Jaipur, Rajasthan, India

PJP-50USB. Conference Microphone Speaker. User s Manual MIC MUTE VOL 3 CLEAR STANDBY ENTER MENU

Web Content Accessibility Guidelines (WCAG) Whitepaper

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification

STUDY OF SPEAKER RECOGNITION SYSTEMS

Mobile Biometrics (MoBio): Joint Face and Voice Verification for a Mobile Platform

Improving Robustness to Compressed Speech in Speaker Recognition

Salamanca Risk Management. Global Security and Risk Management Solutions. www. salamancarm. com

Input speech signal. Selected /Rejected. Pre-processing Feature extraction Matching algorithm. Database. Figure 1: Process flow in ASR

Speech-Coding Techniques. Chapter 3

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Cyber Defense Operations Center

PJP-25UR Conference Microphone Speaker

Biometric identity verification for large-scale high-security apps. Face Verification SDK

SRE08 system. Nir Krause Ran Gazit Gennady Karvitsky. Leave Impersonators, fraudsters and identity thieves speechless

Voiceprint-based Access Control for Wireless Insulin Pump Systems

A text-independent speaker verification model: A comparative analysis

Real Time Speaker Recognition System using MFCC and Vector Quantization Technique

Detection of Acoustic Events in Meeting-Room Environment

Yealink Audio Conferencing Solution Easy Conferencing, Clear Communication

SHT-2B/USB SHT-4B/USB

Cisco SP Wi-Fi Solution Support, Optimize, Assurance, and Operate Services

Building an Assurance Foundation for 21 st Century Information Systems and Networks

The Approach of Mean Shift based Cosine Dissimilarity for Multi-Recording Speaker Clustering

Symantec Backup Exec 2012

Topics for thesis. Automatic Speech-based Emotion Recognition

Sample Telephone Engineering Analysis

CCNP ROUTING & SWITCHING

Significant sales outside Germany have been recorded in 2012, the worldwide operating audio specialist shows an export quota of 84 per cent.

Detector. Flash. Detector

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

MILESTONE. MacroVIEW DM. Macro digital imaging for advanced anatomical and forensic investigations

Agent vs Agentless Log Collection

OUR VISION To be a global leader of computing research in identified areas that will bring positive impact to the lives of citizens and society.

Topics in Operating Systems (mini-project)

PJP-25UR Conference Microphone Speaker

REAL TIME RECOGNITION OF SPEAKERS FROM INTERNET AUDIO STREAM

white paper SMS Authentication: 10 Things to Know Before You Buy

Making each relationship with the client EXTRAORDINARY! Corporate Profile

Speakler recognition for stand-alone or Web applications. VeriSpeak SDK

Supriya M. H. Department of Electronics, Cochin University of Science and Technology, Cochin, India

Chapter 4 After Incident Detection

The Center for Internet Security

COPYRIGHTED MATERIAL. Introduction. 1.1 Introduction

Webinar Training Series NetBorder Express Version 4.0. May 5, 2011

MATLAB Apps for Teaching Digital Speech Processing

A network is a group of connected, communicating devices such as computers and printers. An internet

The Steganography In Inactive Frames Of Voip

FOUR WEIGHTINGS AND A FUSION: A CEPSTRAL-SVM SYSTEM FOR SPEAKER RECOGNITION. Sachin S. Kajarekar

CHAPTER 6 EFFICIENT TECHNIQUE TOWARDS THE AVOIDANCE OF REPLAY ATTACK USING LOW DISTORTION TRANSFORM

ATP-24A/PCI(2.0) ATP-24A/PCI+(2.0) ATP-24A/PCIe(3.0) ATP-24A/PCIe+(3.0) Analog Tap Passive Board

Device Activation based on Voice Recognition using Mel Frequency Cepstral Coefficients (MFCC s) Algorithm

Transcription:

Complex Identification Decision Based on Several Independent Speaker Recognition Methods Ilya Oparin Speech Technology Center

Corporate Overview Global provider of voice biometric solutions Company name: Speech Technology Center, Ltd Core expertise: Voice identification and verification Audio forensics Professional audio recording Noise cancelation Location: Russia Germany Mexico USA (office in 2009) The year of foundation: 1990 Staff: 250 including 25 world-class PhD

Global Customer Base in More than 60 Countries Law enforcement Government Corporate clients Integrators & developers

Strong R&D Capabilities Ambitious and experienced team: One of the leading R&D teams (voice sector) in the world: over 100 technical specialists, scientists and software developers (including 25 PhDs), 5 certified audio forensic experts. Strong management and sales teams STC R&D facility, Saint-Petersburg

Why Speech? Speech is a key communication tool in all fields of the human activity Voice can not be lost or stolen Speech Using speech we can identify the person without any direct contact with him Samples taking procedure does not require any additional hardware

Audio Forensics Global leader in audio forensics Over 15 years of experience Forensic speaker identification. Authenticity analysis of analog or digital audio recordings. Audio equipment for forensic examination and identification. Speech enhancement and audio restoration. Text transcription of low quality recordings.

Audio Forensics Automatic algorithms for real-time noise suppression and speech enhancement. Sound Cleaner Premium! the first and the second prize in audio enhancement contest by AES (Audio Engineering Society), Denver, 2008 Efficient suppression of all types of noises and distortions Adaptive algorithms of filtering Filters can be combined to process the record simultaneously

Main Challenges State-of-the-art voice-id systems face four basic challenges: Ensuring robustness to noise (real life audio) Ensuring robust performance across different sound recording channels and levels of speaker stress Effective processing of large-scale (nation-wide) databases Language and context independent identification

Speaker Identification Methods Spectral-formant method Spectral-forman( me(+o, -S/M1 is base, on (+e uni6ue s+a7e of eac+ 7erson9s vocal tract which is reflected in the visible speech of different people. An e=am7le of forman( re7resen(a(ion of (+e 7+rase >/orensic au,io? 7ronounce, by (Ao different persons is shown in the picture (The horizontal axis is time in seconds. The vertical axis is frequency in Hertz. Energy level is depicted by the darkness of the trace).

Speaker Identification Methods Pitch statistics method Pitch statistics method (PSM) engages 16 different pitch parameters, including average pitch value, maximum, minimum, median, percent of areas with rising pitch, pitch logarithm variation, pitch logarithm asymmetry, pitch logarithm excess and 8 parameters more. An e=am7le of au(oma(e, 7i(c+ e=(rac(ion in (+e 7+rase >/orensic au,io? 7ronounce, by two different persons is shown in the picture

Speaker Identification Methods GMM/SVM method In the GMM/SVM approach Gaussian mixtures are used to approximate statistical distributions of MFCC (Mel frequency cepstral coefficients) parameters extracted from speech of different speakers. Support Vector Machines are a robust classifier in multi-dimensional space.

Peculiarities of Different Methods Method Spectral- Formant Pitch Statistics Dependence on speech signal characteristics Signal duration Signal quality Emotional state + ++ +++ ++ +++ + GMM/SVM ++ + ++ Fusion (STC) ++ +++ +++

Fusion Solution Ability to work with signals from various communication channels Both microphone and telephone (landline, GSM) Robust to noise Low-quality signal processing (SNR down to 10 db) Processing of short speech signals Speaker identification by a few seconds of speech

Performance of Different Methods Database NIST SRE 2004 Spectral-Formant method EER=13% Pitch statistics EER=15.9% GMM/SVM Fusion EER=7.5% EER=4.7%

Adaptation Customization - ability to adapt the system to the key parameters of search Speech Database Adaptation of parameters! taking features of a specific speech database into account Identification results

Voice Identification for Experts TrawlLab - Facilitating voice ID analysis while carrying out multi-target forensic investigation by eliminating imposters and ranging the top-in-the-list speakers according to likelihood probability.

VoiceNet.ID VoiceNet.ID is designed for: Reliable identification on a nation-wide voice database of speakers. VoiceNet.ID highlights Storage and real-time processing of large volume of voiceprints Client-server architecture Web-client Ben(raliCe, s7eakers9 7rofiles re7osi(ory Multi-user system Secure storage and access Remote access to the database Additional information storage (video, photo, text)

VoiceNet.ID Architecture Record Web operator Application server Database server Record LAN operator Records Import operator Calculation cluster

VoiceNet.ID Speaker's profile card 0SPC3 Automatically extracts biometric traits of voice and speech from the attached sound records. Speaker card can contain wealth additional information about the person (text, photo, video etc).

VoiceNet.ID Database management SPCs in the database can be organized into unlimited number of sections and subsections to facilitate further search.

VoiceNet.ID Identification results The results of 4VoiceNet.ID; search presented in the form of a list with indication of likelihood probability (LR) of each record containing the speech of a target speaker.

VoiceNet.ID Technical specs: DBMS - Oracle 11g, PostgreSQL, ready to be adapted for others OS! UNIX (Solaris 10, Linux), Windows Server 2003 or later Web Service based architecture Application Server (GlassFish V3, Tomcat 6, ready to be adapted for others ) Cluster calculations JPPF 1.8 Performance & scalability: Size! Speed! Tasks! Database is scalable up to 10`000`000 cards Performance directly linked to the computing power of a server (parallel calculation support) The system can be adopted to any voice ID challenge (search for unknown speakers in the database or search for known speakers in the stream of audio files)

Contacts!"#$%&'()&*(+&'()+&#,,-$,.($/ WWW.SPEECHPRO.COM tel.: +7 812 331-0665 fax: +7 812 327-9297