ABC submission for NIST SRE 2016
1 ABC submission for NIST SRE 2016: Agnitio+BUT+CRIM
Oldrich Plchot, Pavel Matejka, Ondrej Novotny, Anna Silnova, Johan Rohdin, Mireia Diez, Ondrej Glembek, Xiaowei Jiang, Lukas Burget, Martin Karafiat, Lucas Ondel, Frantisek Grezl, Niko Brummer, Albert Swart, Paola García, Jesús Jorrín, Luis Buera, Patrick Kenny, Jahangir Alam, Gautam Bhattacharya
December 11, San Diego, NIST SRE 2016
2 Overview: General system architecture; Agnitio; CRIM; Data, features, system architecture; Speaker Classifier Network (SCN); Scoring with Beta-Bernoulli Backend; BUT; System design; Clustering analysis; Results; BUT subsystems; Analysis with PLDA system; Analysis with DPLDA system; BUT DEV set design; Different flavours of calibration/fusion; Results; Conclusions
3 System architecture [diagram; recoverable labels: SUM Fusion; LR calibration (BUT DEV); LR fusion (BUT DEV); MMFBG Fusion (DEV16); NIG calibration (DEV16); Linear cal. (DEV16); SUM Fusion; Calibrated score; numerals 2, 3, 7, 8]
4 Agnitio's System
Conservative and simple system.
Interesting:
- Stacking of i-vectors
- NDA + norm
- BNF (p-norm, 5 hidden layers)
Two main systems based on the position of the bottleneck:
- BNF-2: second layer
- BNF-4: fourth layer
Fusion
5 Agnitio's System
Interesting ideas we explored, but which didn't work quite well:
- Clustering: language and gender (HC and score PLDA)
- Adaptation of the NDA
- Normalizing according to clustering
In dev we could observe clusters:
- Unlabeled and labeled minor
- Unlabeled major
6 Agnitio's System: Results
[Table: Development, equalized; systems BNF-4, BNF-2, FUSION; metrics EER, minCprim, actCprim; values not preserved]
7 Agnitio's System: Results
[Table: Development, equalized; systems BNF-4, BNF-2, FUSION; metrics EER, minCprim, actCprim; values not preserved]
[Table: Eval, equalized; metrics EER, minCprim, actCprim; values not preserved]
Conclusions and highlights:
- Normalization helped.
- Speaker clustering was difficult and didn't help in normalization.
8 CRIM Site for NIST SRE 2016 Jahangir Alam, Patrick Kenny, Gautam Bhattacharya CRIM NIST SRE 2016 Workshop
9 Outline: Data preparation; Feature Extraction; Training and Extraction of I-vectors; Speaker Classifier Network; Beta-Bernoulli Backend; Results
10 Data preparation
- OBD (Oriental Background Data): Mandarin, Chinese, and Tagalog from NIST SREs
- PBD (Primary Background Data): Switchboard + all recordings from NIST SREs excluding the Mandarin, Chinese, and Tagalog
- SRE16UNLABELED: unlabeled training data from SRE16
- OD: OBD + SRE16UNLABELED
11 Feature Extraction
- MFCC (60-dimensional, MFCC_E_D_A)
- LFCC (60-dimensional, LFCC_E_D_A)
- LPCC (60-dimensional, LPCC_E_D_A)
12 Training and Extraction of I-vectors
13 Speaker Classifier Network
The SCN is two layers deep and uses a sigmoid non-linearity in the hidden layers. Each hidden layer consists of 2000 hidden units. The softmax output distribution is over the 4323 speakers in the background set (Primary Background Data + Oriental Background Data). We extract the activations of the last hidden layer and treat them as feature vectors (d-vectors) for speaker verification.
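The forward pass described on this slide can be sketched as follows. This is a minimal illustration, not the CRIM implementation: the input representation, the random weights, and the toy layer sizes are all assumptions made so the example runs quickly (the slide gives 2000 units per hidden layer and 4323 output speakers), and the function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dense(inputs, weights, biases):
    # One affine layer; `weights` holds one row of input weights per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    # Random placeholder weights (assumption); a real network would be trained.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def scn_forward(features, layers):
    # Two sigmoid hidden layers followed by a softmax over background speakers.
    (w1, b1), (w2, b2), (w_out, b_out) = layers
    h1 = [sigmoid(z) for z in dense(features, w1, b1)]
    h2 = [sigmoid(z) for z in dense(h1, w2, b2)]
    posteriors = softmax(dense(h2, w_out, b_out))
    # The last hidden layer's activations serve as the d-vector.
    return h2, posteriors

# Toy sizes (assumption); the slide uses 2000 hidden units and 4323 speakers.
rng = random.Random(0)
N_IN, N_HIDDEN, N_SPEAKERS = 8, 16, 5
layers = [init_layer(N_IN, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_SPEAKERS, rng)]
dvector, posteriors = scn_forward([rng.gauss(0.0, 1.0) for _ in range(N_IN)], layers)
```

Because the hidden units are sigmoids, every d-vector component lies strictly between 0 and 1, which is what lets a backend treat each one as a coin-toss probability.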
14 Beta-Bernoulli Backend (1/2)
For each node in the last hidden layer, we compare the activations on the enrollment side and the test side by supposing them to be generated by a biased coin toss:
- The probability of heads is drawn from a Beta prior.
- One draw for the same-speaker hypothesis.
- Two draws for the different-speaker hypothesis.
The Beta priors (one per node) are trainable.
Reference: T. Minka, Estimating a Dirichlet distribution, 2012.
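One way to turn the coin-toss model above into a score is sketched below. It rests on two assumptions that the slide does not state: each activation x in [0, 1] is treated as a fractional Bernoulli observation with likelihood θ^x (1-θ)^(1-x), and the nodes are treated as independent, so per-node log-likelihood ratios are summed. The function names are hypothetical, and the authors' actual scoring formula may differ.

```python
import math

def log_beta(a, b):
    # Logarithm of the Beta function B(a, b).
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def node_llr(x_enroll, x_test, a, b):
    """Log-likelihood ratio for one node with a Beta(a, b) prior.

    Assumption: an activation x in [0, 1] acts as a fractional coin-toss
    outcome with likelihood theta**x * (1 - theta)**(1 - x).
    """
    # Same speaker: one draw of theta explains both activations.
    log_same = (log_beta(a + x_enroll + x_test, b + 2.0 - x_enroll - x_test)
                - log_beta(a, b))
    # Different speakers: each side gets its own independent draw of theta.
    log_diff = (log_beta(a + x_enroll, b + 1.0 - x_enroll) - log_beta(a, b)
                + log_beta(a + x_test, b + 1.0 - x_test) - log_beta(a, b))
    return log_same - log_diff

def trial_llr(enroll, test, priors):
    # Sum over nodes, assuming the nodes are independent (assumption).
    return sum(node_llr(e, t, a, b) for e, t, (a, b) in zip(enroll, test, priors))
```

With a uniform prior (a = b = 1), two matching activations of 1 give log(4/3) > 0, while a 1 against a 0 gives log(2/3) < 0, so agreement between the two sides pushes the score towards the same-speaker hypothesis.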
15 Beta-Bernoulli Backend (2/2)
16 Results on Evaluation Data
17 Results on Evaluation Data
18 BUT subsystems
- DPLDA_PLP: single best system on DEV, submitted as contrastive 2
- *SVM_PLP: failed on eval, caused some miscalibration
- *PLDA_MFCC: used NDA instead of LDA, miscalibrated on eval
- DPLDA_MFCC: initialized from PLDA_MFCC, much better calibrated
- PLDA_TEL_PLP: PLP, only telephone data for PLDA training
- PLDA_TEL_PERS: Perseus, only telephone data for PLDA training
- PLDA_MFCCSBN: MFCC + bottleneck features
- Bottleneck on Fisher English: fixed condition
- Bottleneck on BABEL languages: open condition
* Problems with calibration on eval
19 Analysis on MFCC PLDA system
20 Feature comparison with PLDA system
i-vector system with 2048G/600ivec/L2norm/200lda, WCC (gender, lang, train+unlabeled data), adaptive z+t-norm
Results computed on all trials from the eval key
[Table: columns Features, feadim, EER[%], minCprim; rows MFCC, PLP, MFCC+SBN80-BABEL (open cond), Perseus, MFCC+SBN80-Fisher; values not preserved]
- All features perform about the same.
- There is no superiority of BN as we saw on SRE2010 data.
- More analysis and comparisons in our SLT paper.
21 BUT DEV set
- We used the PRISM language condition for calibration/fusion
- We split the segments into short cuts to reflect the speech duration in enroll/test of DEV16
- We split it into a calibration part and a test part (no jack-knifing)
- We added multi-enroll trials
- For the purposes of calibration/fusion, we used only non-English trials
- We tried to add short segments of non-English training data into the PLDA
- This did not improve the results of PLDA or DPLDA
- NO DEV16 annotated data in the BUT part of the submission
- In DPLDA, we used unlabeled data to form non-target trials
22 Calibration/Fusion
LR: optimizing cross-entropy on the supervised DEV set, training shift and scale
  s_cal = a*s + b
Linear (Gaussian, shared variance):
  f = ( (s-mnon).^2 - (s-mtar).^2 )/(2*v);
Quadratic (Gaussian, separate variances):
  f = ( (s-mnon).^2/vnon + log(vnon) - (s-mtar).^2/vtar - log(vtar) )/2;
Normal Inverse Gaussian (NIG) distribution:
- Can contain fat-tailed, skewed, or classical normal distributions
- The NIG process is a special case of a Lévy process (other examples: Brownian motion, Poisson process)
- Calibrate:
  tarlh = nig_logpdf(betat,gammat,deltat,mut,s);
  nonlh = nig_logpdf(betan,gamman,deltan,mun,s);
  llr = tarlh - nonlh;
Multiclass Multivariate Fully Bayesian Classifier (MMFBG)
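The two Gaussian calibration formulas on this slide can be sketched in Python as below. The shared-variance form is linear in the score s, and the separate-variance form is quadratic in s. How the means and variances were actually estimated is not stated on the slide; the moment estimates and count-weighted pooled variance in `fit_gaussians` are assumptions, as are all function names.

```python
import math
from statistics import mean, pvariance

def linear_gaussian_llr(s, m_tar, m_non, v):
    # Shared variance v: expands to a*s + b with a = (m_tar - m_non) / v.
    return ((s - m_non) ** 2 - (s - m_tar) ** 2) / (2.0 * v)

def quadratic_gaussian_llr(s, m_tar, v_tar, m_non, v_non):
    # Separate variances: the s**2 terms no longer cancel.
    return ((s - m_non) ** 2 / v_non + math.log(v_non)
            - (s - m_tar) ** 2 / v_tar - math.log(v_tar)) / 2.0

def fit_gaussians(tar_scores, non_scores):
    # Moment estimates from labelled calibration scores (assumption).
    m_tar, m_non = mean(tar_scores), mean(non_scores)
    v_tar, v_non = pvariance(tar_scores), pvariance(non_scores)
    n_tar, n_non = len(tar_scores), len(non_scores)
    # Pooled variance weighted by trial counts (assumption).
    v_shared = (n_tar * v_tar + n_non * v_non) / (n_tar + n_non)
    return {"m_tar": m_tar, "m_non": m_non,
            "v_tar": v_tar, "v_non": v_non, "v": v_shared}

params = fit_gaussians([1.0, 2.0, 3.0], [-1.0, 0.0, 1.0])
llr = linear_gaussian_llr(1.0, params["m_tar"], params["m_non"], params["v"])
```

When the two class variances are equal, the log-variance terms cancel and the quadratic form reduces to the linear one, which is a convenient sanity check for an implementation.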
23 System architecture [diagram; recoverable labels: SUM Fusion; LR calibration (BUT DEV); LR fusion (BUT DEV); Linear cal. (DEV16); SUM Fusion; MMFBG Fusion (DEV16); NIG calibration (DEV16); Calibrated score; Contrastive sys.; numerals 2, 3, 7, 8]
24 ABC primary system (NIG_CAL)
25 ABC primary system (Q_CAL)
26 BUT Fusion (contrastive1) - LR
27 no SVM,DPLDA+NDA - MMFBG, BUT DEV, QCAL
28 MMFBG on DEV16, QCAL
29 Results - equalized, NIST scoring tool
[Table: columns System, EER[%], minCprim, actCprim; rows ABC_PRIMARY_NIGCAL, ABC_PRIMARY_QCAL, Agnitio (SUM, NIGCAL), CRIM (SUM, QCAL), BUT (CONTRASTIVE_FIX on BUT_DEV), BUT_DPLDA_PLP (CONTRASTIVE2, LR); values not preserved]
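For readers unfamiliar with the EER[%] column, the sketch below shows one simple way to compute an equal error rate from target and non-target scores. It is not the NIST scoring tool: it performs no equalization across trial partitions, it only sweeps thresholds at the observed scores, and the function names are hypothetical.

```python
def error_rates(tar_scores, non_scores, threshold):
    # A trial is accepted when its score is at or above the threshold.
    p_miss = sum(1 for s in tar_scores if s < threshold) / len(tar_scores)
    p_fa = sum(1 for s in non_scores if s >= threshold) / len(non_scores)
    return p_miss, p_fa

def equal_error_rate(tar_scores, non_scores):
    # Sweep every observed score as a threshold and keep the operating
    # point where the miss and false-alarm rates are closest.
    best_gap, best_eer = None, None
    for threshold in sorted(set(tar_scores) | set(non_scores)):
        p_miss, p_fa = error_rates(tar_scores, non_scores, threshold)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2.0
    return best_eer

eer_percent = 100.0 * equal_error_rate([2.0, 3.0, 4.0], [0.0, 1.0, 2.5])
```

EER depends only on how well the scores rank targets above non-targets, so unlike actCprim it is unaffected by calibration.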
30 Conclusions
- Evaluation was challenging
- Calibration was not a big issue
- We were not able to successfully exploit clustering on unlabeled DEV data
- There is probably a big channel mismatch between SRE16 and all older MIXER data
- Even systems like relevance MAP, eigenchannel compensation, SVM, etc. were competitive
- Small models (256G, 400ivec) performed close to the big ones
- BN features and BN+MFCC did not outperform a single MFCC or PLP system
- Although we had a hard time fusing on the small DEV16
- We designed an out-of-domain dataset that provided good calibration for eval (BUT_DEV)
- We were positively surprised by the DPLDA performance
- There is definitely some room for improvement and tuning
31 THANK YOU We are happy for the dataset with a lot of room for improvement and research :)
32 Analysis with DPLDA
33 Analysis on MFCC PLDA system
34 Analysis on MFCC PLDA system II
s-norm* = mean(t-norm, z-norm); the 500 closest i-vectors were used (based on score).
nne = non-native English cuts; noe = non-English cuts
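The s-norm* definition above, the mean of z-norm and t-norm over the closest cohort entries, can be sketched as follows. The cohort scores are assumed to be precomputed: the enrollment side scored against every cohort i-vector, and likewise for the test side. The function names and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def top_n(scores, n):
    # Keep the n highest cohort scores, i.e. the closest cohort i-vectors.
    return sorted(scores, reverse=True)[:n]

def adaptive_snorm(raw_score, enroll_cohort_scores, test_cohort_scores, n_closest=500):
    # z-norm statistics come from the enrollment side scored against the cohort.
    enroll_top = top_n(enroll_cohort_scores, n_closest)
    # t-norm statistics come from the test side scored against the cohort.
    test_top = top_n(test_cohort_scores, n_closest)
    z_norm = (raw_score - mean(enroll_top)) / pstdev(enroll_top)
    t_norm = (raw_score - mean(test_top)) / pstdev(test_top)
    # s-norm* is the mean of the two normalized scores.
    return 0.5 * (z_norm + t_norm)

score = adaptive_snorm(4.0, [1.0, 2.0, 3.0, -10.0], [0.0, 2.0, 4.0, -5.0], n_closest=3)
```

Selecting the cohort per trial by score means the normalization statistics come from the cohort entries most similar to the trial sides. Note that the sketch divides by zero if all selected cohort scores are identical.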
35 Analysis with DPLDA
1. Baseline DPLDA system, trained on the telephone part of Mixer+Fisher+Switchboard. Uses PLP-based 600-dim i-vectors.
2. The same as 1, but with SRE16 unlabeled data added to the training.
3. The same as 2 with NAP applied (we use 20 language classes from the training set and one class for unlabeled data).
4. The same as 3, with snorm_easy applied. Snorm is calculated on unlabeled data only. This is what went into the BUT fusion.
5. The same as 4, but instead of snorm_easy, we apply adaptive snorm. The size of the cohort is set to 200; snorm is again calculated only on unlabeled data.
6. Contrastive 2 system. The same as 4, but SRE16 development data is also added to the training set.
7. Similar to what Nuance did: the same as the previous one, but instead of adding the development data once, we add it 6 times.
8. Add a corrupted version of the development data to the training set.
9. The same as 6, but instead of snorm_easy, we apply adaptive snorm.
36 DPLDA results
[Table: columns SRE16 dev EER[%], SRE16 dev minCprim, SRE16 eval EER[%], SRE16 eval minCprim; row labels 1. DPLDA (2048G/600ivec250LDA, tel data), SRE16 unlab, NAP, snorm, asnorm, SRE16 dev data, xSRE16 dev data, corr SRE16 dev data, SRE16 dev data; values not preserved]
37 BUT - SVM classifier
- One SVM per speaker, trained using the enrollment i-vector(s) as positive samples and the unlabeled major and unlabeled minor data as negative samples.
- Length normalization, WCCN and NAP were applied to the i-vectors. They were trained with telephone data from Mixer+Fisher+Switchboard. The classes for NAP were the languages present in the training data.
- ZT-norm was applied to the system scores.
- Z-norm was trained on a subset of Chinese utterances from the training portion of the non-English short cuts, plus the data from the unlabeled major and unlabeled minor sets.
- T-norm was trained with the SVM models trained on Chinese cuts, using the unlabeled major and minor sets as background data (negative samples).
More informationDealing with sensor interoperability in multi-biometrics: The UPM experience at the Biosecure Multimodal Evaluation 2007
Dealing with sensor interoperability in multi-biometrics: The UPM experience at the Biosecure Multimodal Evaluation 2007 Fernando Alonso-Fernandez, Julian Fierrez, Daniel Ramos, Javier Ortega-Garcia ATVS/Biometrics
More informationTodo before next class
Todo before next class Each project group should submit a short project report (4 pages presentation slides) including 1. Problem definition 2. Related work 3. Preliminary results 4. Future plan Submission:
More informationMACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014
MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)
More informationList of Exercises: Data Mining 1 December 12th, 2015
List of Exercises: Data Mining 1 December 12th, 2015 1. We trained a model on a two-class balanced dataset using five-fold cross validation. One person calculated the performance of the classifier by measuring
More informationMachine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013
Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork
More informationGrounded Compositional Semantics for Finding and Describing Images with Sentences
Grounded Compositional Semantics for Finding and Describing Images with Sentences R. Socher, A. Karpathy, V. Le,D. Manning, A Y. Ng - 2013 Ali Gharaee 1 Alireza Keshavarzi 2 1 Department of Computational
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More informationRossmann Store Sales. 1 Introduction. 3 Datasets and Features. 2 Related Work. David Beam and Mark Schramm. December 2015
Rossmann Store Sales David Beam and Mark Schramm 1 Introduction December 015 The objective of this project is to forecast sales in euros at 1115 stores owned by Rossmann, a European pharmaceutical company.
More informationPrototype of Silver Corpus Merging Framework
www.visceral.eu Prototype of Silver Corpus Merging Framework Deliverable number D3.3 Dissemination level Public Delivery data 30.4.2014 Status Authors Final Markus Krenn, Allan Hanbury, Georg Langs This
More informationDetection of Acoustic Events in Meeting-Room Environment
11/Dec/2008 Detection of Acoustic Events in Meeting-Room Environment Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of 34 Content Introduction State of the Art Acoustic
More informationTHE MNIST DATABASE of handwritten digits Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York
THE MNIST DATABASE of handwritten digits Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York The MNIST database of handwritten digits, available from this page, has a training set
More informationAutomatic summarization of video data
Automatic summarization of video data Presented by Danila Potapov Joint work with: Matthijs Douze Zaid Harchaoui Cordelia Schmid LEAR team, nria Grenoble Khronos-Persyvact Spring School 1.04.2015 Definition
More informationDeep Learning on Graphs
Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 6 April 2017 joint work with Max Welling (University of Amsterdam) The success story of deep
More informationK-Nearest Neighbor Classification Approach for Face and Fingerprint at Feature Level Fusion
K-Nearest Neighbor Classification Approach for Face and Fingerprint at Feature Level Fusion Dhriti PEC University of Technology Chandigarh India Manvjeet Kaur PEC University of Technology Chandigarh India
More informationClinical Named Entity Recognition Method Based on CRF
Clinical Named Entity Recognition Method Based on CRF Yanxu Chen 1, Gang Zhang 1, Haizhou Fang 1, Bin He, and Yi Guan Research Center of Language Technology Harbin Institute of Technology, Harbin, China
More information