Audio & Music Research at LabROSA
- Benjamin Harrington
Transcription
1 Audio & Music Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA 1. Eigenrhythms: Representing drum tracks 2. Frequency-Domain Linear Prediction 3. Segmenting meeting turns 4. Analyzing personal audio recordings
2 LabROSA Projects Overview Information Extraction Music Eigenrhythms Environment Personal audio Machine Learning Meeting turns Speech FDLP Signal Processing
3 1. Eigenrhythms: Drum Pattern Space Pop songs built on repeating drum loop bass drum, snare, hi-hat small variations on a few basic patterns with John Arroyo Eigen-analysis (PCA) to capture variations? by analyzing lots of (MIDI) data Applications music categorization beat box synthesis
4 Aligning the Data Need to align patterns prior to PCA... tempo (stretch): by inferring BPM & normalizing downbeat (shift): correlate against mean template
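The downbeat (shift) step can be sketched as a circular cross-correlation against a template. This is only an illustration under assumptions: patterns are taken to be already tempo-normalized to a fixed number of steps, and the function names `best_circular_shift` and `align_to_template` are hypothetical, not from the slides. The template would typically be the mean of the current patterns.

```python
import numpy as np

def best_circular_shift(pattern, template):
    """Shift s (in steps) maximizing sum_t pattern[(t + s) % n] * template[t]."""
    n = len(template)
    # Circular cross-correlation computed through the FFT.
    spec = np.fft.rfft(pattern) * np.conj(np.fft.rfft(template))
    scores = np.fft.irfft(spec, n)
    return int(np.argmax(scores))

def align_to_template(patterns, template):
    """Rotate every pattern so that it lines up with the template."""
    aligned = []
    for p in np.asarray(patterns, dtype=float):
        s = best_circular_shift(p, template)
        aligned.append(np.roll(p, -s))  # undo the detected shift
    return np.array(aligned)
```

In practice the template could be re-estimated as the mean of the aligned patterns and the step repeated a few times; that loop is left out here.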
5 Eigenrhythms Need 20+ Eigenvectors for good coverage of 100 training patterns (1200 dims) Top patterns:
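A minimal sketch of the eigen-analysis, assuming each aligned drum pattern has been flattened into one row vector (e.g. 1200 dimensions). PCA is done with NumPy's SVD; the function names are hypothetical, and the slides do not say which PCA implementation was used.

```python
import numpy as np

def eigenrhythms(patterns, n_components):
    """PCA of drum patterns: returns (mean, basis, explained-variance ratios)."""
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var_ratio = s ** 2 / np.sum(s ** 2)
    return mean, Vt[:n_components], var_ratio[:n_components]

def project(pattern, mean, basis):
    """Coordinates of one pattern in eigenrhythm space."""
    return basis @ (np.asarray(pattern, dtype=float) - mean)

def resynthesize(coords, mean, basis):
    """Map eigenspace coordinates back to a full pattern."""
    return mean + np.asarray(coords, dtype=float) @ basis
```

The cumulative sum of the variance ratios is one way to judge how many eigenvectors give "good coverage" of the training patterns.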
6 Eigenrhythms for Classification Clusters in Eigenspace: all tracks projected onto 1st two eigenrhythms [Figure: scatter plot of tracks on the first two eigenrhythm axes, each point labeled with a genre code and short title, e.g. hh:gthang, rb:honey, bl:hideaway, rc:whteroom; genre codes hh, rb, ho, bl, pp, rc, nw, di, co, pu] Genre classification? (10 way) nearest neighbor in 4D eigenspace: 21% correct
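Nearest-neighbor classification in a low-dimensional eigenspace can be sketched as below. The leave-one-out evaluation is an assumption made for illustration; the slide does not state how the 21% figure was measured, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_label(query, train_coords, train_labels):
    """Label of the training point closest (Euclidean) to the query."""
    d = np.linalg.norm(np.asarray(train_coords, dtype=float)
                       - np.asarray(query, dtype=float), axis=1)
    return train_labels[int(np.argmin(d))]

def leave_one_out_accuracy(coords, labels):
    """Classify each track from all the others and report the hit rate."""
    coords = np.asarray(coords, dtype=float)
    hits = 0
    for i in range(len(coords)):
        keep = np.arange(len(coords)) != i
        rest = [lab for j, lab in enumerate(labels) if j != i]
        hits += nearest_neighbor_label(coords[i], coords[keep], rest) == labels[i]
    return hits / len(coords)
```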
7 Eigenrhythm BeatBox Resynthesize rhythms from eigen-space
8 2. Frequency-Domain Lin. Pred. (Time-domain) Linear Prediction (TDLP): the well-known spectral estimator $y[n] = \sum_{i=1}^{p} a_i\, y[n-i] + e[n]$ Apply to a frequency-domain signal (the DCT): dual, estimates temporal envelope (FDLP) $Y[k] = \sum_{i=1}^{p} b_i\, Y[k-i] + E[k]$ with Marios Athineos
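A rough sketch of the duality, assuming an autocorrelation-method LP solver and an unnormalized DCT-II computed directly. All function names are hypothetical and this is not the authors' implementation: the same `lpc` routine that would fit a spectral envelope to a waveform is applied to the DCT coefficients, and its all-pole response is then read along the time axis.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LP: x[n] ~ sum_i a[i-1] * x[n-i]; returns (a, err)."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])  # residual energy of the fit
    return a, err

def dct2(x):
    """Unnormalized DCT-II, computed directly (O(N^2), fine for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ np.asarray(x, dtype=float)

def fdlp_envelope(x, order, n_points=None):
    """Temporal envelope estimate: all-pole fit to the DCT of x."""
    n_points = n_points or len(x)
    b, err = lpc(dct2(x), order)
    # The model's "frequency" axis now indexes time: w = pi * (t + 0.5) / N.
    w = np.pi * (np.arange(n_points) + 0.5) / n_points
    i = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, i)) @ b
    return err / np.abs(A) ** 2
```

The model order plays the role that window length plays in frame-based analysis: more poles allow a more detailed envelope over the analyzed segment.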
9 Aside: Spectrogram of the DCT DCT gives a pure-real signal: Can we treat it like a waveform?
10 FDLP and TDLP Duality [Figure]
11 Subband FDLP Temporal envelopes without 25 ms windows Auditory STFT (10-25ms + Bark bin) TDLP (per time frame) Subband FDLP (per frequency subband)
12 FDLP Applications Time-scale modification Modulation-domain temporal equalization DCT Residual in freq. 1 sec up to whole sample OLA & idct Overlap Perceptual audio features... Flat Temporal Envelopes
13 PLP-squared Marios Athineos Hynek Hermansky FDLP fits temporal envelope with LP Perceptual Linear Prediction (PLP) smooths across frequency can we do both... iteratively? Speech features without ST windows [Figure: Bark band vs. t / sec]
14 3. Meeting Turns with Jerry Liu and ICSI Multi-mic recordings for speaker turns every voice reaches every mic... (?)... but with differing coupling filters (delays, gains) Find turns with minimal assumptions e.g. ad-hoc sensor setups (multiple PDAs) differences to remove effect of source signal - no spectral models, < 1xRT
15 Between-channel cues: Timing (ITD) & Level Timing diffs (ITD) (2 mic pairs, 250 ms win) Peak correlation coefficient r Per-channel energy Between-channel energy differences [Figure: panels vs. time / s: speaker ground-truth activity; xcorr peak lags (skew / samples, 5-pt median filter); normalized xcorr peak value; per-channel energy (dB); channel energy differences (dB)]
16 Pre-whitening for ITD Inverse-filter by 12-pole LPC models (32 ms windows) to remove local resonances Filter out noise < 500 Hz, > 6 kHz Then cross-correlate [Figure: short-time xcorr (lag / samples vs. time / sec) for raw signals and for whitened+filtered signals, each shown with speaker ground truth (spkr ID)]
17 Choosing Good Frames Correlation coef. r ~ channel similarity: $r_{ij}[l] = \frac{\sum_n m_i[n]\, m_j[n+l]}{\sqrt{\sum m_i^2 \sum m_j^2}}$ Select frames with r in top 50% in both pairs: about 35% of points Cleaner basis for models [Figure: Skew34 / samples vs. Skew12 / samples for "ITD - all points" and "ITD - high-correlation points (435/1201)"]
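A small sketch of the normalized cross-correlation and the frame selection rule, assuming two equal-length windows per microphone pair. The function names, the lag search range, and the use of a quantile threshold are illustrative assumptions.

```python
import numpy as np

def peak_norm_xcorr(mi, mj, max_lag):
    """Peak of r_ij[l] = sum_n mi[n] mj[n+l] / sqrt(sum mi^2 * sum mj^2).

    Returns (peak value r, lag l at the peak) for |l| <= max_lag.
    """
    mi = np.asarray(mi, dtype=float)
    mj = np.asarray(mj, dtype=float)
    denom = np.sqrt(np.dot(mi, mi) * np.dot(mj, mj))
    if denom == 0.0:
        return 0.0, 0
    n = len(mi)
    best_r, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            num = np.dot(mi[:n - lag], mj[lag:])
        else:
            num = np.dot(mi[-lag:], mj[:n + lag])
        if num / denom > best_r:
            best_r, best_lag = num / denom, lag
    return best_r, best_lag

def select_frames(r12, r34, keep_fraction=0.5):
    """Indices of frames whose peak r is in the top fraction for BOTH pairs."""
    r12, r34 = np.asarray(r12), np.asarray(r34)
    t12 = np.quantile(r12, 1.0 - keep_fraction)
    t34 = np.quantile(r34, 1.0 - keep_fraction)
    return np.flatnonzero((r12 >= t12) & (r34 >= t34))
```

Requiring the top 50% in both pairs keeps fewer than half of the frames, consistent with the 435 of 1201 points kept on the slide.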
18 Spectral clustering Eigenvectors of affinity matrix A to pick out similar points: $a_{mn} = \exp\{-\|x[m] - x[n]\|^2 / 2\sigma^2\}$ Ad-hoc mapping to clusters Number of clusters K from eigenvalues [Figure: affinity matrix A (point index vs. point index); first 12 eigenvectors (normalized) vs. points]
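The affinity matrix and its leading eigenvectors can be sketched as follows. The cluster assignment shown (each point goes to the eigenvector in which it has the largest magnitude) is just one simple ad-hoc mapping chosen for illustration; the slide does not specify the mapping actually used, and the function names are hypothetical.

```python
import numpy as np

def affinity_matrix(X, sigma):
    """a_mn = exp(-||x[m] - x[n]||^2 / (2 sigma^2)) for all pairs of points."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def leading_eigenvectors(A, k):
    """The k largest eigenvalues of symmetric A and their eigenvectors."""
    vals, vecs = np.linalg.eigh(A)          # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

def assign_by_dominant_eigenvector(vecs):
    """Assumed ad-hoc mapping: pick the eigenvector with the largest |entry|."""
    return np.argmax(np.abs(vecs), axis=1)
```

This mapping works when the affinity matrix is close to block diagonal with distinct leading eigenvalues; with nearly equal eigenvalues the eigenvectors can mix and a more careful step is needed.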
19 Speaker Models & Classification Actual clusters depend on σ and K heuristic Fit Gaussians to each cluster, assign that class to all frames within radius or: consider dimensions independently, choose best [Figure: three panels: ICSI0: good points; All pts: nearest class; All pts: closest dimension]
20 Performance Analysis Compare reference & system activity maps: system misses quiet speakers 2,3,4 (deletions) system splits speaker 6 (deletions+insertions) many short gaps (deletions) ~52% avg. error on NIST 2004 dev set speaker-characteristic-based systems ~25%
21 4. Segmenting Personal Audio Easy to record everything you hear ~100GB / 64 kbps Very hard to find anything how to scan? how to visualize? how to index? Starting point: Collect data ~ 60 hours (8 days, ~7.5 hr/day) hand-mark 139 segments (26 min/seg avg.) assign to 16 classes (8 have multiple instances) with Keansub Lee
22 Features for Long Recordings Feature frames = 1 min (not 25 ms!) Characterize variation within each frame and structure within coarse auditory bands [Figure: six panels, freq / bark vs. time / min: Average Linear Energy; Normalized Energy Deviation; Average Log Energy (dB); Log Energy Deviation (dB); Average Spectral Entropy (bits); Spectral Entropy Deviation (bits)]
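A sketch of the energy statistics over long frames, assuming the input is a matrix of short-time linear energies per Bark band (bands by short frames) and that `frames_per_block` short frames make up one minute. The names are hypothetical, and the spectral entropy features are omitted because they need the finer spectrum within each band.

```python
import numpy as np

def long_frame_features(band_energy, frames_per_block):
    """Per-band statistics of short-time energy within each long frame."""
    E = np.asarray(band_energy, dtype=float)       # (n_bands, n_short_frames)
    n_blocks = E.shape[1] // frames_per_block
    E = E[:, :n_blocks * frames_per_block]
    E = E.reshape(E.shape[0], n_blocks, frames_per_block)
    eps = 1e-12                                    # guards log and division
    mu_lin = E.mean(axis=2)
    log_e = 10.0 * np.log10(E + eps)
    return {
        "avg_linear_energy": mu_lin,
        "normalized_energy_deviation": E.std(axis=2) / (mu_lin + eps),
        "avg_log_energy_db": log_e.mean(axis=2),
        "log_energy_deviation_db": log_e.std(axis=2),
    }
```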
23 BIC Segmentation Untrained segmentation technique statistical test indicates good change points: $\log \frac{L(X_1;M_1)\,L(X_2;M_2)}{L(X;M_0)} \gtrless \frac{\lambda}{2}\log(N)\,\Delta\#(M)$ Evaluate: 60hr hand-marked boundaries, different features & combinations Correct Accept at False Accept = 2%: µdB 80.8%; µH 81.1%; σH/µH 81.6%; µdB + σH/µH 84.0%; µdB + σH/µH + µH 83.6% [Figure: Sensitivity vs. Specificity curves for the same five feature sets]
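The BIC test can be sketched for one candidate boundary, assuming each model M is a single full-covariance Gaussian (the slide does not name the model family). The function names, the small diagonal regularization, and the `margin` parameter are illustrative assumptions.

```python
import numpy as np

def _logdet_cov(X):
    """Log-determinant of the ML covariance, lightly regularized."""
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(cov.shape[0])
    return np.linalg.slogdet(cov)[1]

def delta_bic(X, t, lam=1.0):
    """Log-likelihood ratio minus BIC penalty for a change at frame t.

    Positive values favor two models (X[:t], X[t:]) over one model for X.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    llr = 0.5 * (n * _logdet_cov(X)
                 - t * _logdet_cov(X[:t])
                 - (n - t) * _logdet_cov(X[t:]))
    extra_params = d + d * (d + 1) / 2     # one extra mean and covariance
    return llr - 0.5 * lam * extra_params * np.log(n)

def best_change_point(X, margin=5, lam=1.0):
    """Scan candidate boundaries and return (best t, its score)."""
    n = len(X)
    scores = [delta_bic(X, t, lam) for t in range(margin, n - margin)]
    i = int(np.argmax(scores))
    return margin + i, scores[i]
```

Raising `lam` makes the test more conservative, which is one way to trade correct accepts against false accepts.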
24 Segment clustering Daily activity has lots of repetition: Automatically cluster similar segments [Figure labels: supermkt, meeting, karaoke, barber, lecture2, billiard, break, lecture1, car/taxi, home, bowling, street, restaurant, library, campus] Spectral clustering achieves ~70% correct 16-way ground truth labels KL distance, smoothed covariance estimates
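A sketch of a segment-to-segment distance, assuming each segment is summarized by a single Gaussian over its feature frames. The symmetrized form and the shrinkage toward the diagonal used for "smoothing" are assumptions made for illustration; the slide gives neither detail.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    logdet0 = np.linalg.slogdet(cov0)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + logdet1 - logdet0)

def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetrized KL, usable as a distance between segments."""
    return gaussian_kl(mu0, cov0, mu1, cov1) + gaussian_kl(mu1, cov1, mu0, cov0)

def smooth_covariance(cov, alpha=0.1):
    """Assumed smoothing: shrink off-diagonal terms toward zero."""
    cov = np.asarray(cov, dtype=float)
    return (1.0 - alpha) * cov + alpha * np.diag(np.diag(cov))
```

Such distances can be turned into affinities, for example by exponentiating their negative, before spectral clustering.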
25 Future Work Visualization / browsing / diary inference link to other information sources Privacy protection speaker/speech search and destroy
26 LabROSA Summary LabROSA signal processing + machine learning + information extraction Applications Eigenrhythms: drum pattern models FDLP temporal envelopes Meeting recordings Personal audio analysis Also... music similarity, signal separation,...