Experimental Evaluation of Latent Variable Models for Dimensionality Reduction


Experimental Evaluation of Latent Variable Models for Dimensionality Reduction

Miguel Á. Carreira-Perpiñán and Steve Renals
Dept. of Computer Science, University of Sheffield
{M.Carreira,S.Renals}@dcs.shef.ac.uk

1998 IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP98), Aug.-Sep. 1998, Cambridge, UK

This work has been supported by a scholarship from the Spanish Ministry of Education and Science, by the ESPRIT Long Term Research Project SPRACH (20077) and by an award from the Nuffield Foundation.

Electropalatography (EPG)

A plastic pseudopalate fitted to a person's mouth detects the presence or absence of contact between the tongue and the palate in 62 different locations during an utterance (sampled at 200 Hz). Result: a sequence of 62-dimensional binary EPG frames. Data reduction is necessary, traditionally via fixed linear indices.

ACCOR-II database: synchronised data (EPG, acoustic, etc.) for different utterances and speakers.

The mapping phoneme-to-EPG is not one-to-one.
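To make "fixed linear indices" concrete, here is a minimal sketch, assuming a hypothetical electrode layout and weighting (not one of the standard EPG indices): each binary frame is collapsed to a single predetermined weighted sum, with no learning from data.

```python
import numpy as np

# Hypothetical example of a fixed linear data-reduction index: every
# 62-dimensional binary EPG frame is reduced to one scalar by a
# predetermined weight vector. Weights and layout are made up here.
rng = np.random.default_rng(0)
frames = rng.integers(0, 2, size=(1000, 62))     # N binary EPG frames

rows = np.repeat(np.arange(8), 8)[:62]           # assumed front-to-back rows
w = 1.0 - rows / rows.max()                      # heavier weight at the front
index = frames @ w / w.sum()                     # one "anteriority" score/frame

# A latent variable model instead learns its projection from the data,
# which is why the adaptive methods below can outperform fixed indices.
```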

Electropalatography (cont.)

[Figure: the Reading pseudopalate (lips, teeth, palate, velum, electrodes, wires to PC) and representative EPGs for the typical stable phase of different phonemes.]

Latent variable models

[Figure: a prior p(x) in the latent space (of dimension L) is mapped by f(x; Θ) onto a manifold M in the data space (of dimension D), inducing the density p(t|Θ).]

Marginalisation in latent space: $p(t) = \int p(t \mid x)\, p(x)\, dx$.

Maximum likelihood parameter estimation: $\ell(\Theta) = \sum_{n=1}^{N} \log p(t_n \mid \Theta)$.

Inverse mapping given by an informative point (mean, mode) of the posterior: $p(x \mid t) = \frac{p(t \mid x)\, p(x)}{p(t)}$.
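As a toy numerical illustration (a made-up discrete-prior model in the spirit of GTM, not the paper's implementation), with a prior that is uniform over K latent grid points the marginal, the likelihood and the posterior all reduce to finite sums:

```python
import numpy as np

# Toy latent variable model: uniform prior over K discrete latent grid
# points, Gaussian isotropic noise in data space. All numbers are made up.
K, D = 16, 3
grid = np.linspace(-1.0, 1.0, K)                       # latent grid points x_k
centres = np.stack([grid, grid**2, np.sin(grid)], 1)   # f(x_k; Theta), (K, D)
sigma2 = 0.05                                          # noise variance

def log_p_t_given_x(t):
    """log p(t | x_k) for every grid point k."""
    sq = ((t - centres) ** 2).sum(axis=1)
    return -0.5 * (sq / sigma2 + D * np.log(2 * np.pi * sigma2))

t = np.array([0.3, 0.1, 0.2])
log_joint = log_p_t_given_x(t) + np.log(1.0 / K)   # log p(t|x_k) + log p(x_k)
log_p_t = np.logaddexp.reduce(log_joint)           # marginalisation: sum over k
posterior = np.exp(log_joint - log_p_t)            # Bayes' rule: p(x_k | t)
x_mean = posterior @ grid                          # posterior-mean inverse mapping
print(f"log p(t) = {log_p_t:.3f}, posterior mean x = {x_mean:.3f}")
```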

Examples of latent variable models

Factor analysis: prior is normal N(0, I), mapping is linear, noise model is normal with diagonal covariance matrix.

PCA: like factor analysis, but the noise model has isotropic covariance.

GTM: prior is uniform over a discrete latent grid, mapping is a generalised linear model, noise model is normal with isotropic covariance matrix.

Mixtures of factor analysers (one mean parameter and one factor per analyser, noise model covariance matrix common to all analysers).

We also tried mixtures of multivariate Bernoulli distributions (not really a latent variable model).

All these models can be trained via an EM algorithm.
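A minimal modern sketch of fitting two of these models and comparing held-out log-likelihood, using scikit-learn analogues on hypothetical stand-in data (not the authors' original code):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis, PCA

# Hypothetical stand-in for the binary EPG frames (N x 62).
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(800, 62)).astype(float)
X_test = rng.integers(0, 2, size=(200, 62)).astype(float)

for L in (2, 4, 8):
    fa = FactorAnalysis(n_components=L).fit(X_train)   # diagonal noise model
    pca = PCA(n_components=L).fit(X_train)             # isotropic (probabilistic PCA)
    print(f"L={L}: FA test log-lik/frame = {fa.score(X_test):.2f}, "
          f"PCA = {pca.score(X_test):.2f}")
```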

Factors / prototypes (speaker RK)

[Figure: factor loadings λ1, λ2, ... for each model, mixture-of-factor-analysers components (Λ, μ, mixing proportions π) and mixture-of-Bernoullis prototypes (p, π), displayed as pseudopalate images.]

Log-likelihood and reconstruction error (speaker RK)

[Figure: log-likelihood on the training and test sets, and squared reconstruction error, for each model as a function of the number of latent dimensions / mixture components M.]
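The two criteria in these plots can be sketched as follows, with PCA standing in for the paper's models and hypothetical data (a sketch, not the original evaluation code):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 62))                       # hypothetical test frames

model = PCA(n_components=2).fit(X)
log_lik = model.score(X)                             # mean log-likelihood per frame
X_hat = model.inverse_transform(model.transform(X))  # reduce, then reconstruct
sq_err = ((X - X_hat) ** 2).sum(axis=1).mean()       # mean squared reconstruction error
print(f"log-lik/frame = {log_lik:.2f}, squared error = {sq_err:.2f}")
```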

! " #,. # ) *,+ - %, Two-dimensional representation (speaker RK) Factor analysis 5 6 7 8.5.8 frag replacements Factor.5.5.5 Factor Factor.6.....6 8 7 6 5 8 7 5 6 5 8.5.5.5.5 Factor.8 6 7.8.6.....6.8 Trajectory in latent space of the highlighted utterance fragment I prefer Kant to Hobbes for a good bedtime book ( #(! " $&% /. ). Miguel Á. Carreira-Perpiñán and Steve Renals Experimental Evaluation of Latent Variable Models for Dimensionality Reduction 7

Conclusions

Adaptive methods outperform fixed data reduction indices.

FA and PCA performed similarly in terms of likelihood.

Mixtures of factor analysers and multivariate Bernoulli distributions did not perform well.

Two-dimensional GTM outperformed all other methods in terms of likelihood and reconstruction error, and reveals nonlinear structure in the data. This suggests a low intrinsic dimensionality for the EPG data.

Additional results are available on the web at http://www.dcs.shef.ac.uk/~miguel/research/epg.html

Factors / prototypes (speaker HD)

[Figure: factor loadings λ1, λ2, ... for each model, mixture-of-factor-analysers components (Λ, μ, mixing proportions π) and mixture-of-Bernoullis prototypes (p, π), displayed as pseudopalate images.]

Log-likelihood and reconstruction error (speaker HD)

[Figure: log-likelihood on the training and test sets, and squared reconstruction error, for each model as a function of the number of latent dimensions / mixture components M.]

! " #,. # ) *,+ - %, Two-dimensional representation (speaker HD) Factor analysis 7 frag replacements Factor Factor.8.6 8. 5 5 8 5 6 Factor.. 7 8 6 7 5 8 8 7 6 5 7 6. 6 5.6 5.8.5.5.5 Factor.8.6.....6.8 Trajectory in latent space of the highlighted utterance fragment I prefer Kant to Hobbes for a good bedtime book ( #(! " $&% /. ). Miguel Á. Carreira-Perpiñán and Steve Renals Experimental Evaluation of Latent Variable Models for Dimensionality Reduction

Discontinuities in latent space (speaker RK)

[Figure: latent-space trajectories under factor analysis and GTM for a selected subsequence of the utterance fragment "I prefer Kant to Hobbes for a good bedtime book", together with the EPG frames around the transition.]

The abrupt transition from one phoneme to the next (frames 5-6) produces a discontinuity in latent space.

Factors before and after varimax rotation (speaker HD)

[Figure: factor loadings λ1, λ2, ... before and after varimax rotation, displayed as pseudopalate images.]

Both sets of components span the same linear subspace, but the varimax-rotated one is more easily interpretable.
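A brief sketch of the varimax point, using scikit-learn's rotation option on hypothetical stand-in data (an illustration under assumed settings, not the authors' code): the rotation is orthogonal, so the spanned subspace is unchanged while each factor's loadings concentrate on fewer electrodes.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 62))                 # hypothetical stand-in frames

fa = FactorAnalysis(n_components=9).fit(X)
fa_vm = FactorAnalysis(n_components=9, rotation="varimax").fit(X)

# Same subspace: stacking both loading matrices should not raise the rank.
stacked = np.vstack([fa.components_, fa_vm.components_])
print("rank of stacked loadings:", np.linalg.matrix_rank(stacked))   # expect 9

# Higher varimax criterion (variance of squared loadings) after rotation,
# i.e. loadings closer to all-or-nothing, hence easier to interpret.
crit = lambda L: np.var(L**2, axis=1).sum()
print(f"varimax criterion: {crit(fa.components_):.4f} -> {crit(fa_vm.components_):.4f}")
```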