LabROSA Research Overview
- Brianne Hopkins
Transcription
1 LabROSA Research Overview. Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio, Dept. Electrical Eng., Columbia Univ., NY USA. Outline: 1. Music; 2. Environmental sound; 3. Speech Enhancement. LabROSA Research Overview - Dan Ellis /20
2 LabROSA: Getting information from sound. Diagram labels: Information Extraction; Machine Learning; Signal Processing; Music; Speech; Environment; Recognition; Separation; Retrieval.
3 1. Music: Audio Analysis. Trained classifiers for low-level information: notes, chords, beats, section boundaries. E.g. polyphonic transcription: feature-agnostic, but needs training data (Poliner & Ellis 06).
4 Million Song Dataset (Bertin-Mahieux, McFee). Industrial-scale database for music information research. Many facets: Echo Nest audio features + metadata; Echo Nest taste profile (user-song-listen count); SecondHandSongs covers; musixmatch lyric BoW; last.fm tags. Now with audio? Resolving artist / album / track / duration against what.cd.
5 MIDI-to-MSD (Raffel). Aligned MIDI to audio is a nice transcription. Can we find matches in large databases?
6 Singing ASR (McVicar). Speech recognition adapted to singing needs aligned data. Extensive work to line up scraped a cappellas and full mix, including jumps.
7 Block Structure RPCA (Papadopoulos). RPCA separates vocals and background based on low-rank optimization, with a single trade-off parameter. Adjust based on higher-level features? Musical segments. Figure: paper excerpt comparing ground-truth versus estimated voice activity location, with the caption "Fig. 4. Separated voice for various values of λ for the Pink Noise Party song".
8 Ordinal LDA Segmentation (McFee). Low-rank decomposition of skewed self-similarity to identify repeats. Learned weighting of multiple factors to segment. Linear Discriminant Analysis between adjacent segments.
9 2. Environmental Sound. Extracting useful information from soundtracks, e.g. TRECVID Multimedia Event Detection (MED): "Making a Sandwich", "Getting a Vehicle Unstuck". 100 examples, find matches in 100k videos; manual annotations for ~10 h. Figure: E009 Getting a Vehicle Unstuck.
10 Foreground Event Recognition (Cotton, Ellis, Loui 11). Transients = foreground events? Onset detector finds energy bursts (best SNR). PCA basis to represent each 300 ms x auditory freq. "Bag of transients".
11 NMF Transient Features. Decompose spectrograms into templates + activation: X = W H. Well-behaved gradient descent; 2D patches; sparsity control; computation time. Figure: original mixture spectrogram (freq / Hz vs. time / s) with Basis 1 (L2), Basis 2 (L1), Basis 3 (L1). References: Smaragdis & Brown 03; Abdallah & Plumbley 04; Virtanen 07; Cotton & Ellis 11.
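The factorization X = W H on the NMF slide can be illustrated with a minimal sketch. It uses the standard Lee and Seung multiplicative updates for squared error, which is a swapped-in, simpler technique: the slide's own variant (gradient descent, 2D patches, sparsity control) is not reproduced here. The function name `nmf`, the rank, and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (freq x time) as W @ H.

    W holds spectral templates (one per column); H holds their
    activations over time. Plain multiplicative updates, no sparsity.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": 6 frequency bins, 20 frames, built from 2 templates.
toy_rng = np.random.default_rng(1)
X = toy_rng.random((6, 2)) @ toy_rng.random((2, 20))
W, H = nmf(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Adding an L1 penalty on H to the update is one common way to obtain the sparsity control the slide mentions; that extension is left out to keep the sketch short.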
12 Background Retrieval. Classify soundtracks by statistics of ambience, e.g. texture features (McDermott et al. 09; Ellis, Zheng, McDermott 11). Processing: sound; automatic gain control; mel filterbank (18 chans); subband envelopes. Features: subband distribution histogram moments: mean, var, skew, kurt (18 x 4); FFT into octave bins 0.5, 1, 2, 4, 8, 16 Hz: modulation energy (18 x 6); envelope correlation: cross-band correlations (318 samples). Figure: texture features (moments, mod frq / Hz, mel band) for clips 1159_10 (urban cheer clap) and 1062_60 (quiet dubbed speech music).
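The "mean, var, skew, kurt (18 x 4)" block of the texture features can be sketched as follows. This is a minimal illustration that assumes the subband envelopes have already been computed (gain control and the mel filterbank are omitted); the function names and the synthetic envelopes are hypothetical, not taken from the original system.

```python
import math

def moments(x):
    """Return (mean, variance, skewness, kurtosis) of a sequence."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        # A constant envelope has no defined skew or kurtosis.
        return mean, 0.0, 0.0, 0.0
    sd = math.sqrt(var)
    skew = sum(((v - mean) / sd) ** 3 for v in x) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in x) / n
    return mean, var, skew, kurt

def texture_moments(envelopes):
    """Concatenate the four moments of every subband envelope."""
    feats = []
    for env in envelopes:
        feats.extend(moments(env))
    return feats

# 18 synthetic subband envelopes of 100 frames each (assumed input).
envelopes = [[abs(math.sin(0.1 * (b + 1) * t)) for t in range(100)]
             for b in range(18)]
features = texture_moments(envelopes)  # 18 bands x 4 moments
```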
13 Auditory Model Features (Lyon et al.; Lee & Ellis 2012; Cotton & Ellis 2013). Subband Autocorrelation PCA: simplified version of autocorrelogram, 10x faster than Lyon original. Captures fine time structure in multiple bands: information lost in MFCCs. Figure: sound; cochlea filterbank (frequency channels); delay line; short-time autocorrelation; correlogram slice (freq, lag, time); subband VQ; histogram; feature vector.
14 Subband Autocorrelation. Autocorrelation stabilizes fine time structure: 25 ms window, lags up to 25 ms, calculated every 10 ms, normalized to max (zero lag). Figure: sound; cochlea filterbank (frequency channels); delay line; short-time autocorrelation; correlogram slice (freq, lag, time); subband VQ; histogram; feature vector.
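The framing parameters on the subband autocorrelation slide (25 ms window, lags up to 25 ms, one frame every 10 ms, normalized by the zero-lag value) can be sketched for a single subband signal. This is an illustrative, unoptimized sketch: the cochlea filterbank, VQ, and histogram stages are omitted, and the function name and test tone are assumptions.

```python
import math

def short_time_autocorr(x, sr, win_s=0.025, max_lag_s=0.025, hop_s=0.010):
    """Normalized short-time autocorrelation of one subband signal.

    Returns a list of frames; each frame holds the values for lags
    0..max_lag, divided by the zero-lag energy of that frame's window.
    """
    win = int(round(win_s * sr))
    max_lag = int(round(max_lag_s * sr))
    hop = int(round(hop_s * sr))
    frames = []
    start = 0
    while start + win + max_lag <= len(x):
        r0 = sum(x[start + n] ** 2 for n in range(win))
        frame = []
        for lag in range(max_lag + 1):
            r = sum(x[start + n] * x[start + n + lag] for n in range(win))
            frame.append(r / r0 if r0 > 0 else 0.0)
        frames.append(frame)
        start += hop
    return frames

# A 200 Hz tone sampled at 8 kHz has a period of 40 samples,
# so the normalized autocorrelation peaks again near lag 40.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
frames = short_time_autocorr(tone, sr)
```

In a full system this would run once per filterbank channel, giving one correlogram slice (frequency by lag) per 10 ms step.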
15 Retrieval Examples. High precision for in-domain top hits.
16 3. Speech Enhancement. Noisy speech scenarios: ambient recording (background noise); communication channel (processing distortion). Figure: spectrograms (freq / Hz vs. time / s, level / dB) for "CAR KIT - BP" and "HOME LAND - BP", with a 1500 Hz channel.
17 RPCA Enhancement (Chen, McFee & Ellis 14). Decompose spectrogram into sparse + low-rank. Sparse activation H of dictionary W: $\min_{H,L,S} \lambda_H \|H\|_1 + \lambda_L \|L\|_* + \lambda_S \|S\|_1 + I_+(H)$ s.t. $Y = WH + L + S$. ASR benefits: table of C, S, D, I for Orig, RPCA, and wie+rpca (values not recovered).
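The slide gives the objective but not the solver. As a hedged illustration, objectives that combine an entrywise L1 norm, a nuclear norm, and a non-negativity constraint are commonly minimized with splitting methods built from the three proximal operators below. This is an assumption about how such a problem is typically handled, not a description of the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 (entrywise shrinkage)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_* (shrink singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nonneg_soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1 plus a non-negativity constraint."""
    return np.maximum(A - tau, 0.0)

# Small demonstration: shrinkage zeroes small entries and small
# singular values, which is what yields sparse S and low-rank L.
S_step = soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0)
L_step = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
H_step = nonneg_soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0)
```

In this reading, `soft_threshold` would serve the sparse term S, `singular_value_threshold` the low-rank term L, and `nonneg_soft_threshold` the non-negative sparse activations H.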
18 Classification Pitch Tracker (Lee & Ellis 12). SAcC: MLP trained on noisy speech with ground-truth pitch track targets. Large benefits for in-domain noisy speech. Figure: PTE (%) vs. SNR (dB) on FDA with RBF and pink noise, comparing YIN, Wu, get_f0, and SAcC.
19 Pitch-Normalized Enhancement. Use noise-robust pitch tracker for enhancement? Steps: normalize voice pitch; fixed-pitch enhancement; reimpose pitch. Figure: spectrogram panels (frequency vs. time): clean signal; noisy signal; pitch; smoothed pvx; resampled to pitch = 100 Hz; filtered; pvsmooth; resampled back to original pitch.
20 Summary. Music: transcription, segmentation, alignment for ground truth. Soundtracks: foreground events, background ambience. Noisy speech: classification pitch tracking, spectrogram enhancement.