LabROSA Research Overview

LabROSA Research Overview - Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA

1. Music
2. Environmental sound
3. Speech Enhancement

LabROSA: Getting information from sound
[Diagram: information extraction from music, speech and environmental sound, drawing on signal processing and machine learning for recognition, separation and retrieval.]

1. Music

Music Audio Analysis
Trained classifiers for low-level information: notes, chords, beats, section boundaries.
E.g. polyphonic transcription (Poliner & Ellis 06): feature-agnostic, but needs training data.
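The following minimal sketch illustrates the classification framing. It assumes one independent logistic-regression detector per note, trained by batch gradient descent on frame-level spectral features. The detector type, the helper names and the synthetic data are illustrative assumptions. This is not the classifier from Poliner & Ellis 06.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_note_detectors(X, Y, lr=0.5, n_iter=500):
    """One-vs-rest logistic regression: one binary 'note on' detector per note.

    X: (n_frames, n_features) spectral features, one row per frame.
    Y: (n_frames, n_notes) binary piano-roll targets (1 = note sounding).
    Returns weights W (n_features, n_notes) and biases b (n_notes,).
    """
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        G = sigmoid(X @ W + b) - Y          # cross-entropy gradient w.r.t. logits
        W -= lr * X.T @ G / n
        b -= lr * G.mean(axis=0)
    return W, b


def transcribe(X, W, b, threshold=0.5):
    """Frame-level piano roll: 1 wherever a note's detector fires."""
    return (sigmoid(X @ W + b) > threshold).astype(int)


# Synthetic data: each of 3 "notes" adds energy to its own 2 feature bins.
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(400, 3))
A = np.kron(np.eye(3), np.ones((1, 2)))     # note -> feature-bin mapping (3 x 6)
X = Y @ A + 0.1 * rng.standard_normal((400, 6))
W, b = train_note_detectors(X, Y)
accuracy = (transcribe(X, W, b) == Y).mean()
```

Treating each note as its own binary problem is what makes the approach feature-agnostic. Any frame-level representation can be fed in, provided labeled training frames exist.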

Million Song Dataset (Bertin-Mahieux, McFee)
An industrial-scale database for music information research, with many facets: Echo Nest audio features and metadata; the Echo Nest Taste Profile (user-song-listen counts); SecondHandSongs (cover songs); musiXmatch (lyrics as bag-of-words); last.fm (tags).
Now with audio? Resolving artist / album / track / duration against what.cd.
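The Taste Profile facet is a large list of (user, song, listen count) triplets. The sketch below aggregates total listens per song. It assumes one tab-separated triplet per line. The file layout and the IDs in the example are assumptions for illustration.

```python
import io
from collections import Counter


def song_play_totals(lines):
    """Sum listen counts per song from (user, song, count) triplets.

    Assumes one tab-separated triplet per line: "user<TAB>song<TAB>count".
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _user, song, count = line.split("\t")
        totals[song] += int(count)
    return totals


# Hypothetical user and song IDs, for illustration only.
sample = io.StringIO("u1\tSOAAA\t3\nu2\tSOAAA\t1\nu2\tSOBBB\t5\n")
totals = song_play_totals(sample)
print(totals.most_common(1))  # [('SOBBB', 5)]
```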

MIDI-to-MSD (Raffel)
MIDI aligned to audio makes a good transcription.
Can we find matches in large databases?
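Dynamic time warping (DTW) is a standard way to align a MIDI rendition to a recording. The following minimal sketch assumes both have already been converted to frame-level feature sequences (for example, chroma). It uses cosine distance as the local cost. The function names are hypothetical, and this is not necessarily the alignment used in the MIDI-to-MSD work.

```python
import numpy as np


def cosine_cost(A, B):
    """Pairwise cosine distance between frames of A (n x d) and B (m x d)."""
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-9)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-9)
    return 1.0 - A @ B.T


def dtw_path(C):
    """Dynamic time warping over a cost matrix C (n x m).

    Returns the total cost and the optimal monotonic path as (i, j) pairs,
    where row i indexes frames of one sequence (e.g. synthesized MIDI) and
    column j indexes frames of the other (e.g. the audio recording).
    """
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = {(i - 1, j - 1): D[i - 1, j - 1], (i - 1, j): D[i - 1, j], (i, j - 1): D[i, j - 1]}
        i, j = min(steps, key=steps.get)
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]


# Example: B is A with its first and last frames held for an extra frame.
A = np.eye(3)
B = A[[0, 0, 1, 2, 2]]
cost, path = dtw_path(cosine_cost(A, B))
```

Searching a large database would repeat this alignment for many candidate pairs, so in practice the O(nm) cost of each DTW call matters.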

Singing ASR (McVicar)
Adapting speech recognition to singing needs aligned data: extensive work went into lining up scraped a cappella tracks with the full mix, including jumps.
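A first step in lining up an a cappella track with its full mix is estimating a global time offset. The following minimal sketch uses FFT-based cross-correlation. This technique is an assumption, since the slide does not say how the alignment was done.

```python
import numpy as np


def estimate_lag(ref, other):
    """Return the lag (in samples) that best aligns `other` to `ref`.

    A positive result means `other` is delayed relative to `ref`.
    Uses FFT-based cross-correlation, zero-padded to avoid wrap-around.
    """
    n = len(ref) + len(other) - 1
    nfft = 1 << (n - 1).bit_length()        # next power of two >= n
    R = np.fft.rfft(ref, nfft)
    O = np.fft.rfft(other, nfft)
    xcorr = np.fft.irfft(O * np.conj(R), nfft)  # xcorr[k] = sum_t other[t+k] * ref[t]
    lag = int(np.argmax(xcorr))
    if lag >= len(other):                   # indices beyond this wrap to negative lags
        lag -= nfft
    return lag


# Example: a noise signal and a copy delayed by 37 samples.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)
delayed = np.concatenate([np.zeros(37), ref])
```

A single global lag cannot account for jumps. One option is to repeat the estimate on short windows and look for where the best lag changes.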

Block Structure RPCA (Papadopoulos)
RPCA separates vocals and background based on low-rank optimization, with a single trade-off parameter λ. Adjust λ based on higher-level musical features?
[Excerpt from the paper. Fig. 4: Separated voice for various values of λ for the Pink Noise Party song. Accompanying text: ... Fig. 4, one can hear some high-frequency isolated coefficients superimposed on the separated voice. This drawback could be reduced by including harmonicity priors in the sparse component of RPCA, as proposed in [20]. Ground truth versus estimated voice activity location: imperfect voice activity location still allows an improvement, although to a lesser extent than with ground-truth voice activity information. The decrease in the results mainly comes from background segments classified as vocal segments.]
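The trade-off parameter λ weights the sparse term in the RPCA objective. Below is a minimal sketch of plain RPCA (principal component pursuit), solved with the inexact augmented Lagrangian method. It uses the common default λ = 1/sqrt(max(m, n)). This is not the block-structured variant, which would vary λ across segments. The solver settings are conventional choices, assumed here.

```python
import numpy as np


def _shrink(X, tau):
    """Elementwise soft thresholding (proximal operator of the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def _svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * _shrink(s, tau)) @ Vt


def rpca(M, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Solve min ||L||_* + lam * ||S||_1  s.t.  M = L + S  (inexact ALM)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_two = np.linalg.norm(M, 2)                      # largest singular value
    Y = M / max(norm_two, np.abs(M).max() / lam)         # dual variable init
    mu, norm_fro = 1.25 / norm_two, np.linalg.norm(M)
    mu_bar = mu * 1e7
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = _svt(M - S + Y / mu, 1.0 / mu)
        S = _shrink(M - L + Y / mu, lam / mu)
        Z = M - L - S                                    # constraint residual
        Y = Y + mu * Z
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(Z) / norm_fro < tol:
            break
    return L, S


# Synthetic check: rank-2 matrix plus 5% large sparse spikes.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
S0 = np.zeros((80, 80))
mask = rng.random((80, 80)) < 0.05
S0[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
M = L0 + S0
L, S = rpca(M)
```

On a magnitude spectrogram, L would capture the repeating accompaniment and S the voice. A larger λ penalizes S more, pushing more energy into L.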

Ordinal LDA Segmentation (McFee)
Low-rank decomposition of a skewed self-similarity matrix to identify repeats.
Learned weighting of multiple factors to segment: Linear Discriminant Analysis between adjacent segments.
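This sketch assumes "skewed" refers to the usual time-lag representation. In that representation, each row of the self-similarity matrix is shifted so that repeats line up at a constant lag. The sketch shows both steps. The helper names are hypothetical, and the low-rank decomposition and LDA stages are not shown.

```python
import numpy as np


def self_similarity(F):
    """Cosine self-similarity between all pairs of frames in F (n_frames x dims)."""
    U = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-9)
    return U @ U.T


def time_lag(S):
    """Skew a self-similarity matrix into time-lag form.

    Row i, column l holds S[i, (i + l) mod n], so repeated sections show up as
    columns (constant lags) instead of off-diagonal stripes.
    """
    n = S.shape[0]
    idx = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    return S[np.arange(n)[:, None], idx]


# Example: a 3-frame pattern played twice (a b c a b c).
F = np.eye(3)[[0, 1, 2, 0, 1, 2]]
T = time_lag(self_similarity(F))
```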

2. Environmental Sound

Extracting useful information from soundtracks, e.g. the TRECVID Multimedia Event Detection (MED) task:
events such as Making a Sandwich or Getting a Vehicle Unstuck;
100 examples, find matches in 100k videos;
manual annotations for ~10 h.
[Example video frames: E009 Getting a Vehicle Unstuck]

Foreground Event Recognition (Cotton, Ellis, Loui 11)
Transients = foreground events?
Onset detector finds energy bursts (best SNR).
PCA basis to represent each 300 ms × auditory-frequency patch: a "bag of transients".
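The slide does not specify the onset detector. The following minimal sketch shows one simple option, assumed here. It flags frames where the log energy rises sharply relative to the previous frame, and keeps only local maxima of that rise. The frame size, hop and 6 dB threshold are illustrative values.

```python
import numpy as np


def energy_onsets(x, sr, frame=0.02, hop=0.01, threshold_db=6.0):
    """Return times (s) of frames whose log energy jumps by > threshold_db
    over the previous frame and is a local maximum of that rise."""
    flen, fhop = int(round(frame * sr)), int(round(hop * sr))
    n_frames = 1 + (len(x) - flen) // fhop
    energy = np.array([np.sum(x[i * fhop:i * fhop + flen] ** 2) for i in range(n_frames)])
    log_e = 10 * np.log10(energy + 1e-10)
    rise = np.diff(log_e, prepend=log_e[0])  # dB change from the previous frame
    onsets = [i for i in range(1, n_frames - 1)
              if rise[i] > threshold_db and rise[i] >= rise[i - 1] and rise[i] >= rise[i + 1]]
    return np.array(onsets) * fhop / sr


# Example: quiet noise with a loud 100 ms burst starting at 0.5 s.
sr = 8000
rng = np.random.default_rng(1)
x = 0.001 * rng.standard_normal(sr)
x[4000:4800] += rng.standard_normal(800)
onsets = energy_onsets(x, sr)
```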

NMF Transient Features (Smaragdis & Brown 03; Abdallah & Plumbley 04; Virtanen 07; Cotton & Ellis 11)
Decompose spectrograms into templates + activations: X = W H.
Well-behaved gradient descent; 2D patches; sparsity control; computation time.
[Figure: original mixture spectrogram (freq / Hz vs. time / s) and learned bases 1 (L2), 2 (L1) and 3 (L1).]
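The following minimal sketch implements the X = W H decomposition with the classic Lee & Seung multiplicative updates for the Euclidean cost. It performs plain NMF on a whole spectrogram, without the 2D patches or sparsity control mentioned above. The random initialization and iteration count are assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative spectrogram X (freq x time) as X ~= W @ H.

    W (freq x k) holds spectral templates and H (k x time) their activations.
    Multiplicative updates keep both factors nonnegative; eps avoids division by 0.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Example: an exactly rank-2 nonnegative "spectrogram".
rng = np.random.default_rng(0)
X = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(X, 2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```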

Background Retrieval (McDermott et al. 09; Ellis, Zheng, McDermott 11)
Classify soundtracks by statistics of ambience, e.g. texture features:
sound → automatic gain control → mel filterbank (18 channels) → subband envelopes, summarized by
subband distributions (histogram moments: mean, var, skew, kurt; 18 × 4),
envelope cross-band correlations (318 samples),
modulation energy (FFT, octave bins at 0.5, 1, 2, 4, 8, 16 Hz; 18 × 6).
[Figure: texture features for clips 1159_10 (urban, cheer, clap) and 1062_60 (quiet, dubbed speech, music): moments (M V S K), modulation frequency (Hz) and cross-band correlation panels per mel band.]
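The following minimal sketch computes two of the texture statistics. It assumes the subband envelopes have already been extracted (for example, from an 18-channel mel filterbank). The two statistics are:

1. Per-band moments together with cross-band envelope correlations.
2. Modulation energy in octave bins centered at 0.5 to 16 Hz.

These are plain statistics. McDermott et al. normalize some of them differently, and this sketch does not reproduce that.

```python
import numpy as np


def texture_stats(env):
    """env: (n_bands, n_frames) subband envelopes.

    Returns per-band [mean, variance, skewness, kurtosis] (n_bands x 4)
    and the band-by-band envelope correlation matrix (n_bands x n_bands).
    """
    mean = env.mean(axis=1)
    centered = env - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var) + 1e-12
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4
    return np.stack([mean, var, skew, kurt], axis=1), np.corrcoef(env)


def modulation_energy(env, frame_rate, centers=(0.5, 1, 2, 4, 8, 16)):
    """Envelope power in octave-wide modulation bands around each center (Hz)."""
    spec = np.abs(np.fft.rfft(env - env.mean(axis=1, keepdims=True), axis=1)) ** 2
    freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    out = np.zeros((env.shape[0], len(centers)))
    for j, c in enumerate(centers):
        band = (freqs >= c / np.sqrt(2)) & (freqs < c * np.sqrt(2))
        out[:, j] = spec[:, band].sum(axis=1)
    return out


# Example: 18 bands, 5 s of envelopes at a 100 Hz frame rate (random stand-ins).
rng = np.random.default_rng(0)
env = rng.random((18, 500))
moments, corr = texture_stats(env)
mod = modulation_energy(env, frame_rate=100.0)
```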

Auditory Model Features (Lyon et al.; Lee & Ellis 2012; Cotton & Ellis 2013)
Subband Autocorrelation PCA: a simplified version of the autocorrelogram, 10× faster than Lyon's original.
Captures fine time structure in multiple bands, information that is lost in MFCCs.
[Diagram: sound → cochlea filterbank → frequency channels → delay line / short-time autocorrelation → correlogram slice (freq × lag × time) → subband VQ per band → histogram → feature vector.]
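The subband VQ and histogram stage can be sketched as follows. The sketch assumes one codebook per band has already been learned offline (for example, with k-means). The array layout and function name are hypothetical.

```python
import numpy as np


def vq_histogram(frames, codebooks):
    """Bag-of-codewords feature from per-band vector quantization.

    frames: (n_frames, n_bands, dim) per-band feature vectors over time.
    codebooks: list of (n_codes, dim) arrays, one per band.
    Returns the concatenated codeword histograms, each normalized to sum to 1.
    """
    hists = []
    for b, cb in enumerate(codebooks):
        x = frames[:, b, :]                                  # (n_frames, dim)
        d2 = ((x[:, None, :] - cb[None, :, :]) ** 2).sum(-1)  # squared distances
        codes = d2.argmin(axis=1)                             # nearest codeword
        h = np.bincount(codes, minlength=len(cb)).astype(float)
        hists.append(h / h.sum())
    return np.concatenate(hists)


# Example: 2 bands, 2 one-dimensional codewords per band, 4 frames.
codebooks = [np.array([[0.0], [10.0]]), np.array([[0.0], [10.0]])]
frames = np.array([[[0], [0]], [[1], [0]], [[9], [0]], [[10], [10]]], dtype=float)
feat = vq_histogram(frames, codebooks)
```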

Subband Autocorrelation
Autocorrelation stabilizes fine time structure.
25 ms window, lags up to 25 ms, calculated every 10 ms, normalized to the maximum (zero lag).
[Diagram: cochlea filterbank channels → short-time autocorrelation → correlogram slice (freq × lag) → subband VQ → histogram → feature vector.]
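The following minimal sketch computes the per-band short-time autocorrelation with the parameters above: a 25 ms window, lags up to 25 ms, a 10 ms hop, and normalization to zero lag. It assumes the input is a single subband signal that has already been filtered. The cochlear filterbank itself is not shown.

```python
import numpy as np


def subband_autocorr(x, sr, win=0.025, max_lag=0.025, hop=0.010):
    """Short-time autocorrelation of one subband signal.

    For each frame (win seconds long, every hop seconds), computes
    r[l] = sum_t x[t] * x[t + l] for lags 0..max_lag, normalized so r[0] = 1.
    Returns an array of shape (n_frames, n_lags).
    """
    w, L, h = int(round(win * sr)), int(round(max_lag * sr)), int(round(hop * sr))
    n_frames = 1 + (len(x) - w - L) // h
    out = np.zeros((n_frames, L + 1))
    for f in range(n_frames):
        seg = x[f * h:f * h + w + L]        # window plus look-ahead for the lags
        base = seg[:w]
        for lag in range(L + 1):
            out[f, lag] = np.dot(base, seg[lag:lag + w])
        out[f] /= out[f, 0] + 1e-12         # normalize to the zero-lag value
    return out


# Example: a 200 Hz tone at 8 kHz has a 40-sample period.
x = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
ac = subband_autocorr(x, 8000)
```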

Retrieval Examples
High precision for in-domain top hits.
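Precision at k is the fraction of the top k retrieved items that are relevant. It is one common way to quantify "high precision for top hits". The following minimal sketch assumes this choice of metric.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k retrieved items that are relevant.

    ranked_relevance: sequence of 0/1 (or bool) flags in ranked order.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = list(ranked_relevance)[:k]
    return sum(bool(r) for r in top) / k
```

For example, if the first, second and fourth results are relevant, precision_at_k([1, 1, 0, 1, 0], 3) gives 2/3.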

3. Speech Enhancement

Noisy speech scenarios:
ambient recording (background noise);
communication channel (processing distortion).
[Figure: example spectrograms (freq / Hz vs. time / s, level / dB) labeled CAR KIT and HOME LAND, with a band-passed 1500 Hz channel.]

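RPCA Enhancement (Chen, McFee & Ellis 14)
Decompose the spectrogram into sparse + low-rank parts, with a sparse activation H of a dictionary W:
$$\min_{H,L,S} \; \lambda_H \|H\|_1 + \lambda_L \|L\|_* + \lambda_S \|S\|_1 + I_+(H) \quad \text{s.t.} \quad Y = WH + L + S$$
where $\|L\|_*$ is the nuclear norm and $I_+(H)$ is the indicator function enforcing $H \ge 0$.
ASR benefits: [Table: ASR scoring (C, S, D, I: correct, substitutions, deletions, insertions) for Orig, RPCA and wie+rpca.]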

Classification Pitch Tracker (Lee & Ellis 12)
SAcC (Subband Autocorrelation Classification): an MLP trained on noisy speech with ground-truth pitch-track targets.
Large benefits for in-domain noisy speech.
[Plot: pitch tracking error PTE (%) vs. SNR (dB), FDA RBF and pink noise; YIN, Wu, get_f0 and SAcC compared.]
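The slide plots PTE without defining it. The following minimal sketch uses one common formulation, assumed here. PTE is taken as the percentage of frames with either a voicing error or, where both tracks are voiced, a pitch deviation of more than 20%.

```python
import numpy as np


def pitch_tracking_error(ref_f0, est_f0, tol=0.2):
    """Percentage of frames with a pitch tracking error.

    f0 arrays use 0 for unvoiced frames. A frame is an error if the voicing
    decisions disagree, or if both are voiced and the estimate deviates from
    the reference by more than `tol` (relative).
    """
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    ref_v, est_v = ref > 0, est > 0
    voicing_err = ref_v != est_v
    both = ref_v & est_v
    gross = np.zeros_like(ref_v)
    gross[both] = np.abs(est[both] - ref[both]) / ref[both] > tol
    return 100.0 * np.mean(voicing_err | gross)
```

For example, with reference [100, 100, 0, 200] Hz and estimate [105, 150, 0, 0] Hz, the second frame is a gross error and the fourth a voicing error, giving 50%.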

Pitch-Normalized Enhancement
Use a noise-robust pitch tracker for enhancement?
Noisy signal → normalize voice pitch → fixed-pitch enhancement → reimpose pitch → clean signal.
[Figure panels (frequency vs. time): noisy signal with smoothed pitch track; resampled to pitch = 100 Hz; filtered; resampled back to original pitch.]
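The pitch-normalization step can be sketched by resampling. Stretching a segment by f0/100 turns pitch f0 into 100 Hz at the same sample rate, and the inverse ratio reimposes the original pitch. This minimal sketch assumes a constant f0 over the segment, with linear interpolation as the resampler. Real speech would need per-frame ratios that follow the smoothed pitch track.

```python
import numpy as np


def resample_to_pitch(segment, f0, target_f0=100.0):
    """Stretch a voiced segment so its pitch f0 becomes target_f0.

    Sampling the input every target_f0/f0 samples (linear interpolation)
    lengthens each period by f0/target_f0 while keeping the sample rate.
    """
    ratio = f0 / target_f0
    n_out = int(round(len(segment) * ratio))
    t_out = np.arange(n_out) / ratio           # read positions in input samples
    return np.interp(t_out, np.arange(len(segment)), segment)


# Example: a 200 Hz tone at 8 kHz, normalized to 100 Hz and back.
sr = 8000
x = np.sin(2 * np.pi * 200 * np.arange(800) / sr)
y = resample_to_pitch(x, 200.0)                    # now a 100 Hz tone
z = resample_to_pitch(y, 100.0, target_f0=200.0)   # original pitch reimposed
```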

Summary
Music: transcription, segmentation, alignment for ground truth.
Soundtracks: foreground events, background ambience.
Noisy speech: classification pitch tracking, spectrogram enhancement.

Mining Large-Scale Music Data Sets

Mining Large-Scale Music Data Sets Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu

More information

Minimal-Impact Personal Audio Archives

Minimal-Impact Personal Audio Archives Minimal-Impact Personal Audio Archives Dan Ellis, Keansub Lee, Jim Ogle Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu

More information

Audio & Music Research at LabROSA

Audio & Music Research at LabROSA Audio & Music Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/

More information

The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition

The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition Thomas C. Walters, David A. Ross, and Richard F. Lyon Google, Brandschenkestrasse 110, 8002 Zurich, Switzerland tomwalters@google.com

More information

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri 1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?

More information

Multimedia Indexing. Lecture 12: EE E6820: Speech & Audio Processing & Recognition. Spoken document retrieval Audio databases.

Multimedia Indexing. Lecture 12: EE E6820: Speech & Audio Processing & Recognition. Spoken document retrieval Audio databases. EE E6820: Speech & Audio Processing & Recognition Lecture 12: Multimedia Indexing 1 Spoken document retrieval 2 Audio databases 3 Open issues Dan Ellis http://www.ee.columbia.edu/~dpwe/e6820/

More information

Lecture 12: Multimedia Indexing. Spoken Document Retrieval (SDR)

Lecture 12: Multimedia Indexing. Spoken Document Retrieval (SDR) EE E68: Speech & Audio Processing & Recognition Lecture : Multimedia Indexing 3 Spoken document retrieval Audio databases Open issues Dan Ellis http://www.ee.columbia.edu/~dpwe/e68/

More information

Hands On: Multimedia Methods for Large Scale Video Analysis (Project Meeting) Dr. Gerald Friedland,

Hands On: Multimedia Methods for Large Scale Video Analysis (Project Meeting) Dr. Gerald Friedland, Hands On: Multimedia Methods for Large Scale Video Analysis (Project Meeting) Dr. Gerald Friedland, fractor@icsi.berkeley.edu 1 Today Today Project Requirements Today Project Requirements Data available

More information

MUSIC/VOICE SEPARATION USING THE 2D FOURIER TRANSFORM. Prem Seetharaman, Fatemeh Pishdadian, Bryan Pardo

MUSIC/VOICE SEPARATION USING THE 2D FOURIER TRANSFORM. Prem Seetharaman, Fatemeh Pishdadian, Bryan Pardo MUSIC/VOICE SEPARATION USING THE 2D FOURIER TRANSFORM Prem Seetharaman, Fatemeh Pishdadian, Bryan Pardo Northwestern University Electrical Engineering and Computer Science Evanston, IL ABSTRACT Audio source

More information

Short-Term Audio-Visual Atoms for Generic Video Concept Classification

Short-Term Audio-Visual Atoms for Generic Video Concept Classification Short-Term Audio-Visual Atoms for Generic Video Concept Classification Authors Wei Jiang Courtenay Cotton Shih-Fu Chang Dan Ellis Alexander C. Loui Presenters Armin Samii Images from the interwebs, 2009

More information

Multimedia Event Detection for Large Scale Video. Benjamin Elizalde

Multimedia Event Detection for Large Scale Video. Benjamin Elizalde Multimedia Event Detection for Large Scale Video Benjamin Elizalde Outline Motivation TrecVID task Related work Our approach (System, TF/IDF) Results & Processing time Conclusion & Future work Agenda 2

More information

Advanced techniques for management of personal digital music libraries

Advanced techniques for management of personal digital music libraries Advanced techniques for management of personal digital music libraries Jukka Rauhala TKK, Laboratory of Acoustics and Audio signal processing Jukka.Rauhala@acoustics.hut.fi Abstract In this paper, advanced

More information

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

More information

LARGE-SCALE COVER SONG RECOGNITION USING THE 2D FOURIER TRANSFORM MAGNITUDE

LARGE-SCALE COVER SONG RECOGNITION USING THE 2D FOURIER TRANSFORM MAGNITUDE LARGE-SCALE COVER SONG RECOGNITION USING THE D FOURIER TRANSFORM MAGNITUDE Thierry Bertin-Mahieux Columbia University LabROSA, EE Dept. tb33@columbia.edu Daniel P.W. Ellis Columbia University LabROSA,

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

EVALUATING MUSIC SEQUENCE MODELS THROUGH MISSING DATA

EVALUATING MUSIC SEQUENCE MODELS THROUGH MISSING DATA EVALUATING MUSIC SEQUENCE MODELS THROUGH MISSING DATA Thierry Bertin-Mahieux, Graham Grindlay Columbia University LabROSA New York, USA Ron J. Weiss and Daniel P.W. Ellis New York University / Columbia

More information

Sparse Models in Image Understanding And Computer Vision

Sparse Models in Image Understanding And Computer Vision Sparse Models in Image Understanding And Computer Vision Jayaraman J. Thiagarajan Arizona State University Collaborators Prof. Andreas Spanias Karthikeyan Natesan Ramamurthy Sparsity Sparsity of a vector

More information

Robustness and independence of voice timbre features under live performance acoustic degradations

Robustness and independence of voice timbre features under live performance acoustic degradations Robustness and independence of voice timbre features under live performance acoustic degradations Dan Stowell and Mark Plumbley dan.stowell@elec.qmul.ac.uk Centre for Digital Music Queen Mary, University

More information

Online music recognition: the Echoprint system

Online music recognition: the Echoprint system Online music recognition: the Echoprint system Cors Brinkman cors.brinkman@gmail.com Manolis Fragkiadakis vargmanolis@gmail.com Xander Bos xander.bos@gmail.com ABSTRACT Echoprint is an open source music

More information

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc.

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc. CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems Leigh M. Smith Humtap Inc. leigh@humtap.com Basic system overview Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc) Feature

More information

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

Available online   Journal of Scientific and Engineering Research, 2016, 3(4): Research Article Available online www.jsaer.com, 2016, 3(4):417-422 Research Article ISSN: 2394-2630 CODEN(USA): JSERBR Automatic Indexing of Multimedia Documents by Neural Networks Dabbabi Turkia 1, Lamia Bouafif 2, Ellouze

More information

Hello, I am from the State University of Library Studies and Information Technologies, Bulgaria

Hello, I am from the State University of Library Studies and Information Technologies, Bulgaria Hello, My name is Svetla Boytcheva, I am from the State University of Library Studies and Information Technologies, Bulgaria I am goingto present you work in progress for a research project aiming development

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

More information

AudioSet: Real-world Audio Event Classification

AudioSet: Real-world Audio Event Classification AudioSet: Real-world Audio Event Classification g.co/audioset Rif A. Saurous, Shawn Hershey, Dan Ellis, Aren Jansen and the Google Sound Understanding Team 2017-10-20 Outline The Early Years: Weakly-Supervised

More information

INTERACTIVE REFINEMENT OF SUPERVISED AND SEMI-SUPERVISED SOUND SOURCE SEPARATION ESTIMATES

INTERACTIVE REFINEMENT OF SUPERVISED AND SEMI-SUPERVISED SOUND SOURCE SEPARATION ESTIMATES INTERACTIVE REFINEMENT OF SUPERVISED AND SEMI-SUPERVISED SOUND SOURCE SEPARATION ESTIMATES Nicholas J. Bryan Gautham J. Mysore Center for Computer Research in Music and Acoustics, Stanford University Adobe

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Separating Speech From Noise Challenge

Separating Speech From Noise Challenge Separating Speech From Noise Challenge We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins

More information

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS Katherine M. Kinnaird Department of Mathematics, Statistics, and Computer Science Macalester College, Saint

More information

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting

More information

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION Rui Lu 1, Zhiyao Duan 2, Changshui Zhang 1 1 Department of Automation, Tsinghua University 2 Department of Electrical and

More information

The Sensitivity Matrix

The Sensitivity Matrix The Sensitivity Matrix Integrating Perception into the Flexcode Project Jan Plasberg Flexcode Seminar Lannion June 6, 2007 SIP - Sound and Image Processing Lab, EE, KTH Stockholm 1 SIP - Sound and Image

More information

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014 MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)

More information

SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS

SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS L. Gerosa, G. Valenzise, M. Tagliasacchi, F. Antonacci, A. Sarti Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza Leonardo da

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for

More information

Predicting Song Popularity

Predicting Song Popularity Predicting Song Popularity James Pham jqpham@stanford.edu Edric Kyauk ekyauk@stanford.edu Edwin Park edpark@stanford.edu Abstract Predicting song popularity is particularly important in keeping businesses

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Columbia University High-Level Feature Detection: Parts-based Concept Detectors TRECVID 2005 Workshop Columbia University High-Level Feature Detection: Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky Digital Video and Multimedia Lab

More information

Display. 2-Line VA LCD Display. 2-Zone Variable-Color Illumination

Display. 2-Line VA LCD Display. 2-Zone Variable-Color Illumination Display 2-Line VA LCD Display Equipped with VA (Vertical Alignment) LCD panels that offer a broader angle of view and better visibility. The 2-line display provides more information with animation effects.

More information

MASK: Robust Local Features for Audio Fingerprinting

MASK: Robust Local Features for Audio Fingerprinting 2012 IEEE International Conference on Multimedia and Expo MASK: Robust Local Features for Audio Fingerprinting Xavier Anguera, Antonio Garzon and Tomasz Adamek Telefonica Research, Torre Telefonica Diagonal

More information

Discriminative training and Feature combination

Discriminative training and Feature combination Discriminative training and Feature combination Steve Renals Automatic Speech Recognition ASR Lecture 13 16 March 2009 Steve Renals Discriminative training and Feature combination 1 Overview Hot topics

More information

IN the recent decades digital music has become more. Codebook based Audio Feature Representation for Music Information Retrieval

IN the recent decades digital music has become more. Codebook based Audio Feature Representation for Music Information Retrieval IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 1 Codebook based Audio Feature Representation for Music Information Retrieval Yonatan Vaizman, Brian McFee, member, IEEE, and Gert Lanckriet, senior

More information

Sparse coding for image classification

Sparse coding for image classification Sparse coding for image classification Columbia University Electrical Engineering: Kun Rong(kr2496@columbia.edu) Yongzhou Xiang(yx2211@columbia.edu) Yin Cui(yc2776@columbia.edu) Outline Background Introduction

More information

Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery

Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery Achuth Rao MV, Prasanta Kumar Ghosh SPIRE LAB Electrical Engineering, Indian Institute of Science (IISc), Bangalore,

More information

TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 1. Yonatan Vaizman, Brian McFee, and Gert Lanckriet

TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 1. Yonatan Vaizman, Brian McFee, and Gert Lanckriet TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 1 Codebook based Audio Feature Representation for Music Information Retrieval Yonatan Vaizman, Brian McFee, and Gert Lanckriet arxiv:1312.5457v1 [cs.ir]

More information

Codebook-Based Audio Feature Representation for Music Information Retrieval

Codebook-Based Audio Feature Representation for Music Information Retrieval IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 10, OCTOBER 2014 1483 Codebook-Based Audio Feature Representation for Music Information Retrieval Yonatan Vaizman, Brian McFee,

More information

Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings

Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings JOURNAL OF L A TEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 1 Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings TJ Tsai, Student Member, IEEE, Andreas Stolcke, Fellow, IEEE,

More information

High-level Event Recognition in Internet Videos

High-level Event Recognition in Internet Videos High-level Event Recognition in Internet Videos Yu-Gang Jiang School of Computer Science Fudan University, Shanghai, China ygj@fudan.edu.cn Joint work with Guangnan Ye 1, Subh Bhattacharya 2, Dan Ellis

More information

MPEG-1 Bitstreams Processing for Audio Content Analysis

MPEG-1 Bitstreams Processing for Audio Content Analysis ISSC, Cork. June 5- MPEG- Bitstreams Processing for Audio Content Analysis Roman Jarina, Orla Duffner, Seán Marlow, Noel O Connor, and Noel Murphy Visual Media Processing Group Dublin City University Glasnevin,

More information

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Colin Raffel AES 129th Convention San Francisco, CA February 16, 2011 Outline Introduction and Motivation Coding Error Analysis

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about

More information

THE IMPORTANCE OF F0 TRACKING IN QUERY-BY-SINGING-HUMMING

THE IMPORTANCE OF F0 TRACKING IN QUERY-BY-SINGING-HUMMING 15th International Society for Music Information Retrieval Conference (ISMIR 214) THE IMPORTANCE OF F TRACKING IN QUERY-BY-SINGING-HUMMING Emilio Molina, Lorenzo J. Tardón, Isabel Barbancho, Ana M. Barbancho

More information

Previous Lecture - Coded aperture photography

Previous Lecture - Coded aperture photography Previous Lecture - Coded aperture photography Depth from a single image based on the amount of blur Estimate the amount of blur using and recover a sharp image by deconvolution with a sparse gradient prior.

More information

An efficient face recognition algorithm based on multi-kernel regularization learning

An efficient face recognition algorithm based on multi-kernel regularization learning Acta Technica 61, No. 4A/2016, 75 84 c 2017 Institute of Thermomechanics CAS, v.v.i. An efficient face recognition algorithm based on multi-kernel regularization learning Bi Rongrong 1 Abstract. A novel

More information

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Minh Dao 1, Xiang Xiang 1, Bulent Ayhan 2, Chiman Kwan 2, Trac D. Tran 1 Johns Hopkins Univeristy, 3400

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Beyond Bag of Words Bag of Words a document is considered to be an unordered collection of words with no relationships Extending

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Consumer Video Understanding

Consumer Video Understanding Consumer Video Understanding A Benchmark Database + An Evaluation of Human & Machine Performance Yu-Gang Jiang, Guangnan Ye, Shih-Fu Chang, Daniel Ellis, Alexander C. Loui Columbia University Kodak Research

More information

Accelerating Multimodal Sequence Retrieval with Convolutional Networks

Accelerating Multimodal Sequence Retrieval with Convolutional Networks Accelerating Multimodal Sequence Retrieval with Convolutional Networks Colin Raffel LabROSA Columbia University New York, NY 10027 craffel@gmail.com Daniel P. W. Ellis LabROSA Columbia University New York,

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

The Automatic Musicologist

The Automatic Musicologist The Automatic Musicologist Douglas Turnbull Department of Computer Science and Engineering University of California, San Diego UCSD AI Seminar April 12, 2004 Based on the paper: Fast Recognition of Musical

More information

Speaker Verification with Adaptive Spectral Subband Centroids

Speaker Verification with Adaptive Spectral Subband Centroids Speaker Verification with Adaptive Spectral Subband Centroids Tomi Kinnunen 1, Bingjun Zhang 2, Jia Zhu 2, and Ye Wang 2 1 Speech and Dialogue Processing Lab Institution for Infocomm Research (I 2 R) 21

More information

Seamless transfer of ambient media from environment to mobile device

Seamless transfer of ambient media from environment to mobile device Technical Disclosure Commons Defensive Publications Series May 23, 2018 Seamless transfer of ambient media from environment to mobile device Dominik Roblek Matthew Sharifi Follow this and additional works

More information

Lecture 7: Audio Compression & Coding

Lecture 7: Audio Compression & Coding EE E682: Speech & Audio Processing & Recognition Lecture 7: Audio Compression & Coding 1 2 3 Information, compression & quantization Speech coding Wide bandwidth audio coding Dan Ellis

More information

Content-based Video Genre Classification Using Multiple Cues

Content-based Video Genre Classification Using Multiple Cues Content-based Video Genre Classification Using Multiple Cues Hazım Kemal Ekenel Institute for Anthropomatics Karlsruhe Institute of Technology (KIT) 76131 Karlsruhe, Germany ekenel@kit.edu Tomas Semela

More information

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System

l1-Graph Based Music Structure Analysis

Computer Vision Based Music Information Retrieval

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

BIRD-PHRASE SEGMENTATION AND VERIFICATION: A NOISE-ROBUST TEMPLATE-BASED APPROACH

KD-X240BT Digital Media Receiver

Video search requires efficient annotation of video content To some extent this can be done automatically

IDMT Transcription API Documentation

Recommendation ITU-R BS.1693: Procedure for the performance test of automated query-by-humming systems

Chapter 3: Preprocessing and Feature Extraction Techniques

Music Genre Classification

Online PLCA for Real-time Semi-supervised Source Separation

Lecture Video Indexing and Retrieval Using Topic Keywords

AUTOMATIC CONSUMER VIDEO SUMMARIZATION BY AUDIO AND VISUAL ANALYSIS

Image Matching. AKA: Image registration, the correspondence problem, Tracking

Audio-Based Action Scene Classification Using HMM-SVM Algorithm

Chapter 14 MPEG Audio Compression

Multimedia Databases: Audio Retrieval (Wolf-Tilo Balke, Silviu Homoceanu, Institut für Informationssysteme, Technische Universität Braunschweig)

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

Robust Shape Retrieval Using Maximum Likelihood Theory

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments

Novel Subband Autoencoder Features for Non-intrusive Quality Assessment of Noise Suppressed Speech

Video annotation based on adaptive annular spatial partition scheme

CHAPTER 8 Multimedia Information Retrieval

Separation Of Speech From Noise Challenge

TRAX SP User Guide. Direct any questions or issues you may encounter with the use or installation of ADX TRAX SP to:

Voice Command Based Computer Application Control Using MFCC

Auditory Sparse Coding

MuBu for Max/MSP. IMTR IRCAM Centre Pompidou. Norbert Schnell Riccardo Borghesi 20/10/2010
