Audio Content-Based Music Retrieval

Size: px
Start display at page:

Download "Audio Content-Based Music Retrieval"

Transcription

1 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Meinard Müller 2007 Habilitation Bonn, Germany 2007 MPI Informatik Saarland, Germany Senior Researcher Multimedia Information Retrieval & Music Processing 6 PhD Students

2 Joan Serrà Ph.D. Music Technology Group Universitat Pompeu Fabra Barcelona, Spain Postdoctoral Resarcher Artificial Intelligence Research Institute Spanish National Research Council Music information retrieval, machine learning, time series analysis, complex networks,... Music Retrieval Textual metadata Traditional retrieval Searching for artist, title, Rich and expressive metadata Generated by experts Crowd tagging, social networks Content-based retrieval Automatic generation of tags Query-by-example

3 Query-by-Example Database Query Hits Retrieval tasks: Bernstein (1962) Beethoven, Symphony No. 5 Audio identification Audio matching Version identification Category-based music retrieval Beethoven, Symphony No. 5: Bernstein (1962) Karajan (1982) Gould (1992) Beethoven, Symphony No. 9 Beethoven, Symphony No. 3 Haydn Symphony No. 94 Query-by-Example Taxonomy Retrieval tasks: Audio identification Audio matching Specificity level High specificity Granularity level Fragment-based retrieval Version identification Category-based music retrieval Low specificity Document-based retrieval

4 Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

5 Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

6 Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

7 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

8 Audio Identification Database: Goal: Huge collection consisting of all audio recordings (feature representations) to be potentially identified. Given a short query audio fragment, identify the original audio recording the query is taken from. Notes: Instance of fragment-based retrieval High specificity Not the piece of music is identified but a specific rendition of the piece Application Scenario User hears music playing in the environment User records music fragment (5-15 seconds) with mobile phone Audio fingerprints are extracted from the recording and sent to an audio identification service Service identifies audio recording based on fingerprints Service sends back metadata (track title, artist) to user

9 Audio Fingerprints An audio fingerprint is a content-based compact signature that summarizes some specific audio content. Requirements: Discriminative power Invariance to distortions Compactness Computational simplicity Audio Fingerprints An audio fingerprint is a content-based compact signature that summarizes a piece of audio content Requirements: Discriminative power Invariance to distortions Compactness Computational simplicity Ability to accurately identify an item within a huge number of other items (informative, characteristic) Low probability of false positives Recorded query excerpt only a few seconds Large audio collection on the server side (millions of songs)

10 Audio Fingerprints An audio fingerprint is a content-based compact signature that summarizes a piece of audio content Requirements: Discriminative power Invariance to distortions Compactness Computational simplicity Recorded query may be distorted and superimposed with other audio sources Background noise Pitching (audio played faster or slower) Equalization Compression artifacts Cropping, framing Audio Fingerprints An audio fingerprint is a content-based compact signature that summarizes a piece of audio content Requirements: Discriminative power Invariance to distortions Compactness Reduction of complex multimedia objects Reduction of dimensionality Making indexing feasible Allowing for fast search Computational simplicity

11 Audio Fingerprints An audio fingerprint is a content-based compact signature that summarizes a piece of audio content Requirements: Discriminative power Invariance to distortions Computational efficiency Extraction of fingerprint should be simple Size of fingerprints should be small Compactness Computational simplicity Literature (Audio Identification) Allamanche et al. (AES 2001) Cano et al. (AES 2002) Haitsma/Kalker (ISMIR 2002) Kurth/Clausen/Ribbrock (AES 2002) Wang (ISMIR 2003) Dupraz/Richard (ICASSP 2010) Ramona/Peeters (ICASSP 2011)

12 Literature (Audio Identification) Allamanche et al. (AES 2001) Cano et al. (AES 2002) Haitsma/Kalker (ISMIR 2002) Kurth/Clausen/Ribbrock (AES 2002) Wang (ISMIR 2003) Dupraz/Richard (ICASSP 2010) Ramona/Peeters (ICASSP 2011) Fingerprints (Shazam) Steps: 1. Spectrogram 2. Peaks (local maxima) Frequency (Hz) Intensity Efficiently computable Standard transform Robust

13 Fingerprints (Shazam) Steps: 1. Spectrogram 2. Peaks Frequency (Hz) Intensity Fingerprints (Shazam) Steps: 1. Spectrogram 2. Peaks / differing peaks Robustness: Frequency (Hz) Intensity Noise, reverb, room acoustics, equalization

14 Fingerprints (Shazam) Steps: 1. Spectrogram 2. Peaks / differing peaks Robustness: Frequency (Hz) Intensity Noise, reverb, room acoustics, equalization Audio codec Fingerprints (Shazam) Steps: 1. Spectrogram 2. Peaks / differing peaks Robustness: Frequency (Hz) Intensity Noise, reverb, room acoustics, equalization Audio codec Superposition of other audio sources

15 Matching Fingerprints (Shazam) Database document Frequency (Hz) Intensity Matching Fingerprints (Shazam) Database document (constellation map) Frequency (Hz)

16 Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) Frequency (Hz) Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks Frequency (Hz) #(matching peaks) Shift (seconds)

17 Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks Frequency (Hz) #(matching peaks) Shift (seconds) Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks Frequency (Hz) #(matching peaks) Shift (seconds)

18 Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks Frequency (Hz) #(matching peaks) Shift (seconds) Matching Fingerprints (Shazam) Database document (constellation map) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks Frequency (Hz) #(matching peaks) Shift (seconds)

19 Matching Fingerprints (Shazam) Frequency (Hz) Database document (constellation map) #(matching peaks) Query document (constellation map) 1. Shift query across database document 2. Count matching peaks High count indicates a hit (document ID & position) Shift (seconds) Indexing (Shazam) Index the fingerprints using hash lists Hashes correspond to (quantized) frequencies Hash 2 B Fre equency (Hz) Hash 2 Hash 1

20 Indexing (Shazam) Index the fingerprints using hash lists Hashes correspond to (quantized) frequencies Hash list consists of time positions (and document IDs) N B 2 B = number of spectral peaks = #(bits) used to encode spectral peaks = number of hash lists N / 2 B = average number of elements per list Hash 2 B Fre equency (Hz) Hash 2 Problem: Individual peaks are not characteristic Hash lists may be very long Not suitable for indexing Hash 1 List to Hash 1: Indexing (Shazam) Idea: Use pairs of peaks to increase specificity of hashes Frequency (Hz) 1. Peaks 2. Fix anchor point 3. Define target zone 4. Use paris of points 5. Use every point as anchor point

21 Indexing (Shazam) Idea: Use pairs of peaks to increase specificity of hashes 1. Peaks 2. Fix anchor point f 2 3. Define target zone 4. Use paris of points Frequency (Hz) t f 1 5. Use every point as anchor point New hash: Consists of two frequency values and a time difference: (,, ) f 1 f 2 t Indexing (Shazam) A hash is formed between an anchor point and each point in the target zone using two frequency values and a time difference. Fan-out (taking pairs of peaks) may cause a combinatorial explosion in the number of tokens. However, this can be controlled by the size of the target zone. Using more complex hashes increases specificity (leading to much smaller hash lists) and speed (making the retrieval much faster).

22 Indexing (Shazam) Definitions: N = number of spectral peaks p = probability that a spectral peak can be found in (noisy and distorted) query F = fan-out of target zone, e. g. F = 10 B = #(bits) used to encode spectral peaks and time difference Consequences: F N = #(tokens) to be indexed 2 B+B = increase of specifity (2 B+B+B instead of 2 B ) p 2 = propability of a hash to survive p (1-(1-p) F ) = probability that, at least, on hash survives per anchor point Example: F = 10 and B = 10 Memory requirements: F N = 10 N Speedup factor: 2 B+B / F 2 ~ 10 6 / 10 2 = (F times as many tokens in query and database, respectively) Conclusions (Shazam) Many parameters to choose: Temporal and spectral resolution in spectrogram Peak picking strategy Target zone and fan-out parameter Hash function

23 Literature (Audio Identification) Allamanche et al. (AES 2001) Cano et al. (AES 2002) Haitsma/Kalker (ISMIR 2002) Kurth/Clausen/Ribbrock (AES 2002) Wang (ISMIR 2003) Dupraz/Richard (ICASSP 2010) Ramona/Peeters (ICASSP 2011) Fingerprints (Philips) Steps: 1. Spectrogram Frequency (Hz) Intensity Efficiently computable Standard transform Robust

24 Fingerprints (Philips) Steps: 1. Spectrogram (long window) Frequency (Hz) Intensity Coarse temporal resolution Large overlap of windows Robust to temporal distortion Fingerprints (Philips) Steps: 1. Spectrogram (long window) 2. Consider limited frequency range Frequency (Hz) Intensity Hz Most relevant spectral range (perceptually)

25 Band Fingerprints (Philips) Intensity Steps: 1. Spectrogram (long window) 2. Consider limited frequency range 3. Log-frequency (Bark scale) Hz Most relevant spectral range (perceptually) 33 bands (roughly bark scale) Coarse frequency resolution Robust to spectral distortions Bit Fingerprints (Philips) State Steps: 1. Spectrogram (long window) 2. Consider limited frequency range 3. Log-frequency (Bark scale) 4. Binarization Local thresholding Sign of energy difference (simultanously along time and frequency axes) Sequence of 32-bit vectors

26 Fingerprints (Philips) Sub-fingerprint: 32-bit vector Not characteristic enough Bit State Fingerprints (Philips) Sub-fingerprint: 32-bit vector Not characteristic enough Fingerprint-block: Bit State 256 consecutive sub-fingerprints Covers roughly 3 seconds Overlapping

27 Fingerprints (Philips) Sub-fingerprint: 32-bit vector Not characteristic enough Fingerprint-block: Bit State 256 consecutive sub-fingerprints Covers roughly 3 seconds Overlapping Fingerprints (Philips) Sub-fingerprint: 32-bit vector Not characteristic enough Fingerprint-block: Bit State 256 consecutive sub-fingerprints Covers roughly 3 seconds Overlapping

28 Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) Band Intensity Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) Band Intensity BER Shift (seconds)

29 Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) Band Intensity BER Shift (seconds) Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) Band Intensity BER Shift (seconds)

30 Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) Band Intensity BER Shift (seconds) Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) Band Intensity BER Shift (seconds)

31 Matching Fingerprints (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) Band 1. Shift query across database document 2. Calculate a block-wise bit-error-rate (BER) 3. Low BER indicates hit Intensity BER Shift (seconds) Indexing (Philips) Note: Problem: Strategy: Individual sub-fingerprints (32 bit) are not characteristic Fingerprint blocks (256 sub-fingerprints, 8 kbit) are used Computation of BER between query fingerprint-block and every database fingerprint-block is expensive Chance that a complete fingerprint-block survives is low Exact hashing problematic Only sub-fingerprints are indexed using hashing Exact sub-fingerprint matches are used to identify candidate fingerprint-blocks in database. BER is only computed between query fingerprint-block and candidate fingerprint-blocks Procedure is terminated when database fingerprint-block is found, where BER falls below a certain threshold

32 Indexing (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) 1. Efficient search for exact matches of sub-fingerprints (anchor points) Band Intensity Indexing (Philips) Database document (fingerprint-blocks) Query document (fingerprint-block) Band Intensity 1. Efficient search for exact matches of sub-fingerprints (anchor points) 2. Calculate BER only for blocks containing anchor points

33 Conclusions (Philips) Comparing binary fingerprint-blocks expressing tempo-spectral changes Usage of some sort of shingling technique see [Casey et al. 2008, IEEE-TASLP] for a similar approach applied to a more general retrieval task Acceleration using hash-based search for anchor-points (sub-fingerprints) Concepts of fault tolereance are required to increase robustness Susceptible to distortions in specific frequency bands (e. g. equalization) or to superpositions with other sources Conclusions (Audio Identification) Basic techniques used in Shazam and Philip systems Many more ways to define robust audio fingerprints Delicate trade-off between specificity, robustness, and efficiency Audio recording is identified (not a piece of music) Does not allow for identifying studio recording using a query taken from live recordings Does not generalize to identify different interpretations or versions of the same piece of music

34 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

35 Audio Matching Database: Goal: Notes: Audio collection containing: Several recordings of the same piece of music Different interpretations by various musicians Arrangements in different instrumentations Given a short query audio fragment, find all corresponding audio fragments of similar musical content. Instance of fragment-based retrieval Medium specificity A single document may contain several hits Cross-modal retrieval also feasible Audio Matching Beethoven s Fifth Various interpretations Bernstein Karajan Scherbakov (piano) MIDI (piano)

36 Application Scenario Content-based retrieval Application Scenario Cross-modal retrieval

37 Literature (Audio Matching) Pickens et al. (ISMIR 2002) Müller/Kurth/Clausen (ISMIR 2005) Suyoto et al. (IEEE TASLP 2008) Casey et al. (IEEE TASLP 2008) Kurth/Müller (IEEE TASLP 2008) Yu et al. (ACM MM 2010) Audio Matching Two main ingredients: 1.) Audio features Robust but discriminating Chroma-based features Correlate to harmonic progression Robust to variations in dynamics, timbre, articulation, local tempo 2.) Matching procedure Efficient Robust to local and global tempo variations Scalable using index structure

38 Audio Features Example: Chromatic scale Spectrogram Frequency (Hz z) Intensity (db) Audio Features Example: Chromatic scale Log-frequency spectrogram MIDI pitch Intensity (db)

39 Audio Features Example: Chromatic scale Chroma representation Chroma Intensity (db) Audio Features Example: Chromatic scale Chroma representation (normalized, Euclidean) Chroma Intensity (normalized) )

40 Audio Features Example: Beethoven s Fifth Chroma representation (normalized, 10 Hz) Karajan Scherbakov Audio Features Example: Beethoven s Fifth Chroma representation (normalized, 2 Hz) Smoothing (2 seconds) + downsampling (factor 5) Karajan Scherbakov

41 Audio Features Example: Beethoven s Fifth Chroma representation (normalized, 1 Hz) Smoothing (4 seconds) + downsampling (factor 10) Karajan Scherbakov Audio Features There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application MATLAB implementations for various chroma variants ISMIR 2011, Poster Session (PS2), Tuesday 13-15

42 Matching Procedure Compute chroma feature sequences Database Query N very large (database size), M small (query size) Matching curve Matching Procedure Query DB Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich

43 Matching Procedure Query DB Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich Matching Procedure Query DB Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich

44 Matching Procedure Query DB Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich Matching Procedure Matching curve Query: Beethoven s Fifth / Bernstein (first 20 seconds) Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich

45 Matching Procedure Matching curve Query: Beethoven s Fifth / Bernstein (first 20 seconds) Bach Beethoven/Bernstein Beethoven/Sawallisch Shostakovich Hits Matching Procedure Problem: How to deal with tempo differences? Karajan is much faster then Bernstein! Beethoven/Karajan Matching curve does not indicate any hits!

46 Matching Procedure 1. Strategy: Usage of local warping Karajan is much faster then Bernstein! Warping strategies are computationally expensive and hard for indexing. Beethoven/Karajan Matching Procedure 2. Strategy: Usage of multiple scaling Beethoven/Karajan

47 Matching Procedure 2. Strategy: Usage of multiple scaling Beethoven/Karajan Matching Procedure 2. Strategy: Usage of multiple scaling Beethoven/Karajan

48 Matching Procedure 2. Strategy: Usage of multiple scaling Query resampling simulates tempo changes Beethoven/Karajan Matching Procedure 2. Strategy: Usage of multiple scaling Query resampling simulates tempo changes Minimize over all curves Beethoven/Karajan

49 Matching Procedure 2. Strategy: Usage of multiple scaling Query resampling simulates tempo changes Minimize over all curves Resulting curve is similar warping curve Beethoven/Karajan Experiments Audio database 110 hours, 16.5 GB Preprocessing chroma features, 40.3 MB Query clip 20 seconds Retrieval time 10 seconds (using MATLAB)

50 Experiments Query: Beethoven s Fifth / Bernstein (first 20 seconds) Rank Piece Position 1 Beethoven s Fifth/Bernstein Beethoven s Fifth/Bernstein Beethoven s Fifth/Karajan Beethoven s Fifth/Karajan Beethoven (Liszt) Fifth/Scherbakov Beethoven s Fifth/Sawallisch Beethoven (Liszt) Fifth/Scherbakov Schumann Op. 97,1/Levine Experiments Query: Shostakovich, Waltz / Chailly (first 21 seconds) Expected hits Shostakovich/Chailly Shostakovich/Yablonsky

51 Experiments Query: Shostakovich, Waltz / Chailly (first 21 seconds) Rank Piece Position 1 Shostakovich/Chailly Shostakovich/Chailly Shostakovich/Chailly Shostakovich/Yablonsky Shostakovich/Yablonsky Shostakovich/Yablonsky Shostakovich/Chailly Bach BWV 582/Chorzempa Beethoven Op. 37,1/Toscanini Beethoven Op. 37,1/Pollini Indexing Matching procedure is linear in size of database Retrieval time was 10 seconds for 110 hours of audio Much too slow Does not scale to millions of songs Need of indexing methods

52 Indexing General procedure Convert database into feature sequence (chroma) Quantize features with respect to a fixed codebook Create an inverted file index contains for each codebook vector an inverted list each list contains feature indices in ascending order [Kurth/Müller, IEEE-TASLP 2008] Indexing Quantization Feature space Visualization (3D)

53 Indexing Quantization Feature space Codebook selection of suitable size R Quantization using nearest neighbors Indexing How to derive a good codebook? Codebook selection by unsupervised learning Linde Buzo Gray (LBG) algorithm similar to k-means adjust algorithm to spheres Codebook selection based on musical knowledge

54 Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.) Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.)

55 Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.) Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.)

56 Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.) Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.)

57 Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.) Indexing LBG algorithm Steps: 1. Initialization of codebook vectors 2. Assignment 3. Recalculation 4. Iteration (back to 2.) Until convergence

58 Indexing LBG algorithm for spheres Example: 2D Assignment Recalculation Projection Indexing LBG algorithm for spheres Example: 2D Assignment Recalculation Projection

59 Indexing LBG algorithm for spheres Example: 2D Assignment Recalculation Projection Indexing LBG algorithm for spheres Example: 2D Assignment Recalculation Projection

60 Indexing Codebook using musical knowledge Observation: Chroma features capture harmonic information Example: C-Major Example: C # -Major Experiments: For more then 95% of all chroma features >50% of energy lies in at most 4 components Indexing Codebook using musical knowledge C-Major C # -Major Choose codebook to contain n-chords for n=1,2,3,4 n template #

61 Indexing Codebook using musical knowledge Additional consideration of harmonics in chord templates Example: 1-chord C Harmonics Pitch C3 C4 G4 C5 E5 G5 Frequency Chroma C C G C E C Replace by with suitable weights for the harmonics Indexing Quantization Orignal chromagram and projections on codebooks Original LBG-based Model-based

62 Indexing Query and retrieval stage Query consists of a short audio clip (10-40 seconds) Specification of fault tolerance setting fuzzyness of query number of admissable mismatches tolerance to tempo variations tolerance to modulations Indexing Retrieval results Medium sized database 500 pieces 112 hours of audio mostly classical music Selection of various queries 36 queries duration between 10 and 40 seconds hand-labelled matches in database Indexing leads to speed-up factor between 15 and 20 (depending on query length) Only small degradation in precision and recall

63 Indexing Retrieval results No index LBG-based index Model-based index Average Precisi ion Average Recall Indexing Conclusions Described method suitable for medium-sized databases index is assumed to be in main memory inverted lists may be long Goal was to find all meaningful matches high-degree of fault-tolerance required (fuzzyness, mismatches) number of intersections and unions may explode What to do when dealing with millions of songs? Can the quantization be avoided? Better indexing and retrieval methods needed! kd-trees locality sensitive hashing

64 Conclusions (Audio Matching) Matching procedure Strategy: Exact matching and multiple scaled queries simulate tempo variations by feature resampling different queries correspond to different tempi indexing possible Strategy: Dynamic time warping subsequence variant more flexible (in particular for longer queries) indexing hard Conclusions (Audio Matching) Audio Features Strategy: Absorb variations already at feature level Chroma invariance to timbre Normalization invariance to dynamics Smoothing invariance to local time deviations Message: There is no standard chroma feature! Variants can make a huge difference!

65 Feature Design Enhancement of chroma features Usage of audio matching framework for evaluating the quality of obtained audio features Usage of matching curves as mid-level representation to reveal a feature s robustness and discriminative capability M. Müller and S. Ewert (2010): Towards Timbre-Invariant Audio Features for Harmony-Based Music. IEEE Trans. on Audio, Speech & Language Processing, Vol. 18, No. 3, pp [Müller/Ewert, IEEE-TASLP 2010] Motivation: Audio Matching

66 Motivation: Audio Matching Four occurrences of the main theme First occurrence Third occurrence Chroma Features First occurrence Third occurrence Chroma scale

67 Chroma Features First occurrence Third occurrence Chroma scale How to make chroma features more robust to timbre changes? Chroma Features First occurrence Third occurrence Chroma scale How to make chroma features more robust to timbre changes? Idea: Discard timbre-related information [Müller/Ewert, IEEE-TASLP 2010]

68 MFCC Features and Timbre MFCC coefficient [Müller/Ewert, IEEE-TASLP 2010] MFCC Features and Timbre MFCC coefficient Lower MFCCs Timbre [Müller/Ewert, IEEE-TASLP 2010]

69 MFCC Features and Timbre MFCC coefficient Lower MFCCs Timbre Idea: Discard lower MFCCs to achieve timbre invariance [Müller/Ewert, IEEE-TASLP 2010] Enhancing Timbre Invariance Short-Time Pitch Energy Steps: 1. Log-frequency spectrogram Pitch scale [Müller/Ewert, IEEE-TASLP 2010]

70 Enhancing Timbre Invariance Log Short-Time Pitch Energy Steps: 1. Log-frequency spectrogram 2. Log (amplitude) Pitch scale [Müller/Ewert, IEEE-TASLP 2010] Enhancing Timbre Invariance PFCC Steps: 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT Pitch scale [Müller/Ewert, IEEE-TASLP 2010]

71 Enhancing Timbre Invariance PFCC Steps: Pitch scale 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT 4. Discard lower coefficients [1:n-1] [Müller/Ewert, IEEE-TASLP 2010] Enhancing Timbre Invariance PFCC Steps: Pitch scale 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT 4. Keep upper coefficients [n:120] [Müller/Ewert, IEEE-TASLP 2010]

72 Enhancing Timbre Invariance Steps: Pitch scale 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT 4. Keep upper coefficients [n:120] 5. Inverse DCT [Müller/Ewert, IEEE-TASLP 2010] Enhancing Timbre Invariance Steps: Chroma scale 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT 4. Keep upper coefficients [n:120] 5. Inverse DCT 6. Chroma & Normalization [Müller/Ewert, IEEE-TASLP 2010]

73 Enhancing Timbre Invariance Steps: Chroma scale CRP(n) 1. Log-frequency spectrogram 2. Log (amplitude) 3. DCT 4. Keep upper coefficients [n:120] 5. Inverse DCT 6. Chroma & Normalization Chroma DCT-Reduced Log-Pitch [Müller/Ewert, IEEE-TASLP 2010] Chroma versus CRP Shostakovich Waltz First occurrence Third occurrence Chroma [Müller/Ewert, IEEE-TASLP 2010]

74 Chroma versus CRP Shostakovich Waltz First occurrence Third occurrence Chroma CRP(55) n = 55 [Müller/Ewert, IEEE-TASLP 2010] Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Shostakovich/Chailly Shostakovich/Yablonsky

75 Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Shostakovich/Chailly Shostakovich/Yablonsky Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Shostakovich/Chailly Shostakovich/Yablonsky

76 Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Shostakovich/Chailly Shostakovich/Yablonsky Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Standard Chroma (Chroma Pitch) Shostakovich/Chailly Shostakovich/Yablonsky

77 Quality: Audio Matching Query: Shostakovich, Waltz / Yablonsky (3. occurrence) Standard Chroma (Chroma Pitch) CRP(55) Shostakovich/Chailly Shostakovich/Yablonsky Quality: Audio Matching Query: Free in you / Indigo Girls (1. occurence) Standard Chroma (Chroma Pitch) Free in you/indigo Girls Free in you/dave Cooley

78 Quality: Audio Matching Query: Free in you / Indigo Girls (1. occurence) Standard Chroma (Chroma Pitch) CRP(55) Free in you/indigo Girls Free in you/dave Cooley Chroma Toolbox There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application MATLAB implementations for various chroma variants ISMIR 2011, Poster Session (PS2), Tuesday 13-15

79 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

80 Version Identification Goal: Given a music recording or a sufficiently long excerpt of it, find all corresponding music recordings within a collection that can be regarded as a kind of version or reinterpretation of the same piece. Version Identification Versions: Alternative recordings, renditions, arrangements or performances of the same musical piece. Live versions Cover songs Tributes Demos Parodies Contemporary versions of an old song Radically different interpretations of a musical piece

81 Version Identification Motivation Search and organization of music collections Near-duplicate detection/discovery of music documents Musical rights management and licenses Radio broadcast monitoring, advertisements,... Aid in court decisions/music copyright infringement (e.g. Müllensiefen & Pendzich, Mus. Sci., 2009) Creative application contexts Composers (e.g. assess originality of ideas) Musicologists (e.g. track origins of a motive) Understanding (what s the essence of a song?) Fun/leisure Version Identification Bob Dylan Knockin on Heaven s Door Duke Ellington Take the A train Metallica Enter Sandman Nirvana Poly [Incesticide Album] Stan Getz Garota de Ipanema Black Sabbath Paranoid AC/DC High Voltage Queen Bohemian rhapsody Key Harmonization Timbre Tempo Timing Lyrics Recording conditions Structure Avril Lavigne Knockin on Heaven s Door James Aebersold Take the A train Apocalyptica Enter Sandman Nirvana Poly [Unplugged] Rosa Pasos & Ron Carter Garota de Ipanema Cindy & Bert Der Hund Der Baskerville AC/DC High Voltage [live] Molotov Bohemian rhapsody

82 Version Identification Everything changes, but there must be some essence that allows us to identify versions How to compare two different recordings? What is kept?... What allows us to still identify such pieces? Tonal information: The played notes, their temporal succession, combinations and organization. Version Identification Main idea: Exploit common information (tonal information) Be invariant against potential changes (timbre, main tonality, tempo, structure, )

83 Main building blocks of a version identification system Version Identification: Approach A [Serrà et al., New J. Phys., 2009]

84 Version Identification: Approach A [Serrà et al., New J. Phys., 2009] Version Identification: Approach A Feature extraction Chroma (PCP) features Correlates to harmonic + melodic progression Robust to changes in timbre and instrumentation Normalization introduces invariance to dynamics

85 Version Identification: Approach A Feature extraction Harmonic Pitch Class Profiles (HPCP) [Gómez, PhD, 2006] Version Identification: Approach A [Serrà et al., New J. Phys., 2009]

86 Version Identification: Approach A Key invariance Knockin on Heaven s Door Bob Dylan Avril Lavigne Compute average chroma vectors for each song Consider cyclic shifts of the chroma vectors to simulate transpositions Determine optimal shift indices so that the shifted chroma vectors are matched with minimal cost Transpose the songs accordingly Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y x y 1 Cost Shift index

87 Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y x σ (y) 1 Cost Shift index Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y x σ 2 (y) 1 Cost Shift index

88 Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y x σ 3 (y) 1 Cost Shift index Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y x σ 11 (y) 1 Cost Shift index

89 Cyclic Chroma Shifts Given chroma vectors Fix a local cost measure Compute cost between x and shifted y Minimizing shift index: 3 x σ 3 (y) 1 Cost Shift index Version Identification: Approach A [Serrà et al., New J. Phys., 2009]

90 Version Identification: Approach A Local similarity Phase space embedding (smoothing) Cross-recurrence plot (thresholding) [Kantz & Schreiber, Camb. Univ. Press, 2004; Marwan et al., Phys. Rep., 2007] Version Identification: Approach A Local similarity Phase space embedding (smoothing) Taking m consecutive samples Separated with some delayτ Concatenating them to form a high-dimensional vector r xˆ i r r r [ x x x Lx ] = i i τ i 2τ i ( m 1) τ r [Kantz & Schreiber, Camb. Univ. Press, 2004]

91 Version Identification: Approach A Local similarity: Smooting Local dissimilarity (cost) matrix Version Identification: Approach A Local similarity: Smooting Local dissimilarity (cost) matrix

92 Version Identification: Approach A Local similarity: Smooting Local dissimilarity (cost) matrix: Tempo changes get blurred Version Identification: Approach A Local similarity: Smooting Local dissimilarity (cost) matrix: Tempo changes get blurred

93 Version Identification: Approach A Local similarity: Smooting Filtering along 8 plausible directions and minimizing [Müller & Kurth, ICASSP, 2006] Version Identification: Approach A Local similarity Cross recurrence plot (thresholding the local cost dissimilarity measure) [Marwan et al., Phys. Rep., 2007]

94 Version Identification: Approach A Local similarity Cross recurrence plot (thresholding the local cost dissimilarity measure) Absolute threshold Percentage of the mean cost found Taking each frame s K nearest neighbors [Marwan et al., Phys. Rep., 2007] Version Identification: Approach A [Serrà et al., New J. Phys., 2009]

95 Alignment Classical DTW Global correspondence between X and Y X Y Subsequence DTW Subsequence of Y corresponds to X X Y Local Alignment Subsequence of Y corresponds to subequence of X X Y Global Alignment Dynamic Time Warping (DTW) An optimal warping between two time series (from beginning to end) is found by dynamic programming ( accumulating local costs + backtracking) D( n, m) D( n 1, m 1) = C( n, m) + min D( n, m 1) D( n 1, m)

96 Global Alignment Dynamic Time Warping (DTW): We can apply global path constraints (Sakoe-Chiba band) (Itakura parallelogram) Global Alignment Dynamic Time Warping (DTW) Or local path constraints D ( n 1, m 1) D( n, m) = C( n, m) + min D( n, m 1) D( n 1, m)

97 Global Alignment Dynamic Time Warping (DTW) Or local path constraints D ( n 1, m 1) D( n, m) = C( n, m) + min D( n 1, m 2) D( n 2, m 1) Alignment Classical DTW Global correspondence between X and Y X Y Subsequence DTW Subsequence of Y corresponds to X X Y Local Alignment Subsequence of Y corresponds to subequence of X X Y

98 Subsequence Alignment Subsequence DTW Compute cumulative dissimilarity matrix as before Backtrack from lowest value in a given end region unti the start region is reached Alignment Classical DTW Global correspondence between X and Y X Y Subsequence DTW Subsequence of Y corresponds to X X Y Local Alignment Subsequence of Y corresponds to subequence of X X Y

99 Local Alignment Task: Let X=(x 1,,x N ) and Y=(y 1,,y M ) be two time series. Then, find the maximum similarity of a subsequence of X and a subsequence of Y Note: This problem is also known from bioinformatics. The Smith-Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Similar concepts have been developed in parallel in the nonlinear time series analysis community (physics). The cross recurrence plots and their recurrence quantification measures are also a good starting point. Strategy: Develop our own variant of these algorithms Specifically tackle structure changes and small temporal disruptions (both very important for version identification) Local Alignment Computation of accumulated score matrix S from a binary similarity (reward) matrix R Think positive! D( n 1, m 1) D( n, m) = C( n, m) + min D( n 1, m 2) D( n 2, m 1) S( n 1, m 1) S( n, m) = R( n, m) + max S( n 1, m 2) S( n 2, m 1) S( n 1, m 1) R( n, m) + max S( n 1, m 2) S( n 2, m 1) S( n, m) = 0 S( n 1, m 1) g max S( n 1, m 2) g S( n 2, m 1) g if R( n, m) > 0 otherwise Differentiate between matches and mismatches

100 Local Alignment Computation of accumulated score matrix S from a binary similarity (reward) matrix R S( n 1, m 1) R( n, m) + max S( n 1, m 2) if R( n, m) > 0 S( n 2, m 1) S( n, m) = 0 S( n 1, m 1) g max otherwise S( n 1, m 2) g S( n 2, m 1) g g penalizes``inserts and ``delets in alignment Zero-entry allows for jumping to any cell without penalty Best local alignment score is the highest value in S Best local alignment ends at cell of highest value Start is obtained by backtracking to first cell of value zero Local Alignment Score Example: Knockin on Heaven s Door 140 Knockin' on Heavens's Door Accumulated score matrix Bob Dylan Bob Dylan Guns and Roses Guns and Roses 0

101 Local Alignment Score Example: Knockin on Heaven s Door 140 Knockin' on Heavens's Door Accumulated score matrix Cell with max. score = 94.2 Bob Dylan Bob Dylan Guns and Roses Guns and Roses 0 Local Alignment Score Example: Knockin on Heaven s Door 140 Knockin' on Heaven's Door Accumulated score matrix Cell with max. score = 94.2 Bob Dylan Bob Dylan Alignment path of maximal score Guns and Roses Guns and Roses 0

102 Local Alignment Score Example: Knockin on Heaven s Door Knockin' on Heaven's Door Accumulated score matrix Bob Dylan Bob Dylan Accumulated score Cell with max. score = 94.2 Alignment path of maximal score 20 Alignment path Guns and Roses Guns and Roses 0 Local Alignment Score Example: Knockin on Heaven s Door Knockin' on Heaven's Door Accumulated score matrix Cell with max. score = 94.2 Bob Dylan Bob Dylan Alignment path of maximal score Guns and Roses Guns and Roses Matching subsequences

103 Version Identification: Conclusions Main idea: Extract common, shared information and be invariant towards potential changes Tonal information (notes) Timbre, tempo/rhythm, key/harmonicity, structure, lyrics,... Example approach Chroma (PCP) features Transposition indices (circular shifts) Local similarity measure (subsequence + thresholding) Local alignment Many other approaches exist! Casey et al., IEEE-TASLP, 2008: Shingling + Indexing Marolt, IEEE-TMM, 2008: Mid-level representation + 2D Spectrum Ellis & Polliner, ICASSP, 2007: Beat tracking + Correlation (cf. Serrà, PhD, Literature Review, 2011)

104 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Overview Part I: Audio identification Part II: Audio matching Coffee Break Part III: Version identification Part IV: Category-based music retrieval

105 Category-based Retrieval Goal: Find music recordings within a collection that share a specific, predefined characteristic or trait. Humans naturally group objects with similar characteristics or traits into categories (or classes). Category: Any word or concept that can be used to describe a set of recordings sharing some property. Also referred to as class, type or kind. Category-based Retrieval Categories are one of the most basic ways to represent and structure human knowledge They encapsulate characteristic, intrinsic features of a group of items Prototypes can be built and potential category members can be compared to these They can overlap and have some kind of organization (e.g. fuzzy and hierarchical) [Zbikowski, Oxford Univ. Press, 2007]

106 Category-based Retrieval Category-based Retrieval Category-based query examples: Genres (e.g. bossa-nova ) Moods (e.g. sad ) Specific instruments (e.g. extremely distorted electric guitar ) Instrument ensembles (e.g. string quartet ) Key/mode (e.g. G minor pieces ) Dynamics (e.g. lots of loudness changes ) Type of recording (e.g. live recording ) Epoch (e.g. music from the 80s )

107 Category-based Retrieval Objective: Label/annotate music documents (ideally with some confidence and probability measurements) in a way that makes sense to a user. Categories usually imply a taxonomy: Based on the opinion of experts, on the wisdom of the crowd (folksonomy), on a single user (personomy), Taxonomies can be organized, their members can overlap, different relation types can be applied (ontologies) Category-based Retrieval Unsupervised: No taxonomy. Categories emerge from the data organization under a predefined similarity measure Typical clustering algorithms: K-means, hierarchical clustering, SOM, growing hierarchical SOM,

108 Category-based Retrieval Supervised: Use a taxonomy Map features to categories (usually without describing the rules of this mapping explicitly) Typical supervised learning algorithms: KNN, neural networks, LDA, SVM, trees, Category-based Retrieval Motivation Search and organization of music collections (discovery of music documents) Fun/leisure Playlist generation Music recommendation Music similarity... [Tversky, Psych. Revs., 1977; Bogdanov et al., IEEE-TMM, 2011]

109 Typical Approach: Automatic Annotation Labeled audio Data processing: Feature transformation and selection Classifier training and validation Feature extraction Typical Approach: Automatic Annotation Labeled audio Data processing: Feature transformation and selection Classification Feature extraction Unlabeled audio Annotation, Prob(Annotation) and Conf(Annotation)

110 Typical Approach: Automatic Annotation Labeled audio Data processing: Feature transformation and selection Classification Feature extraction Unlabeled audio Annotation, Prob(Annotation) and Conf(Annotation) Typical Approach: Automatic Annotation Feature extraction Features depend on the task! (e.g. MFCC for key etimation?) Temporal: ZCR, energy, loudness, rise time, onset asyncrony, amplitude modulation, Spectral: MFCC, LPC, spectral centroid, spectral bandwidth, bark-band energies, Tonal: Chroma/PCP, melody, chords, key, Rhythmic: Beat histogram, IOIs, rhythm transform, FP, Features extracted on a short-time frame-by-frame basis

111 Typical Approach: Automatic Annotation Feature extraction But usually summarized to represent global characteristics of the whole audio excerpt: Component-wise means Component-wise variances Covariances Higher-order statistical moments (skewness, kurtosis, ) Derivatives (1st, 2nd, ) Max, min, percentiles, histograms,... (+ more exotic summarizations) Typical Approach: Automatic Annotation Labeled audio Data processing: Feature transformation and selection Classification Feature extraction Unlabeled audio Annotation, Prob(Annotation) and Conf(Annotation)

112 Typical Approach: Automatic Annotation Data processing Feature transformation and feature selection Rescale each feature so that all of them have the same influence (normalize, standardize, Gaussianize?) Detect highly correlated features (try to decorrelate them?) Remove uninformative features Reduce the dimensionality Learn about the features and the task! Typical Approach: Automatic Annotation Data processing: Feature transformation Principal Component Analysis (PCA) Linear transform Eliminates redundancy Decorrelates variables It can be used to reduce the number of features Assumes some homogeneity of the distributions We lose the information of which specific feature is relevant

113 Typical Approach: Automatic Annotation Data processing: Feature selection Fisher s ratio Ratio between inter- and intra-class scatter Measures discriminative power of feature j between classes i and i Ratio low when classes are well separated No standard way of performing this feature selection iteratively Sequential backward selection (removing bad features) Sequential forward selection (selecting good features) Typical Approach: Automatic Annotation Data processing: Feature selection Many other feature selection algorithms: Correlation-based Chi-squared Consistency Gain ratio Information gain Bootstrapping Classifier-affine feature selection [Witten & Frank, Elsevier, 2005]

114 Typical Approach: Automatic Annotation Labeled audio Data processing: Feature transformation and selection Classification Feature extraction Unlabeled audio Annotation, Prob(Annotation) and Conf(Annotation) Typical Approach: Automatic Annotation Classifiers Bayesian/probabilistic Classifiers Naïve conditional independence [Witten & Frank, Elsevier, 2005]

115 Typical Approach: Automatic Annotation Classifiers Gaussian Mixtures [Witten & Frank, Elsevier, 2005] Typical Approach: Automatic Annotation Classifiers Neural Networks

116 Typical Approach: Automatic Annotation Classifiers K Nearest Neighbors Classifier Typical Approach: Automatic Annotation Classifiers Trees

117 Typical Approach: Automatic Annotation Classifiers Support Vector Machines Typical Approach: Automatic Annotation Classifiers Many other classifiers exist Logistic regression Perceptrons Random forest Hidden Markov models Boosting Ensembles of classifiers

118 Typical Approach: Automatic Annotation Machine learning / data mining software packages Category-based Retrieval Classifiers: Issues Always perform a visual inspection of your features (Check that there are no things such as inf, or nan)

119 Category-based Retrieval Classifiers: Issues Do not classify M items with D features when D is larger or comparable to M As a rule of thumb (personal): M>20*D Category-based Retrieval Classifiers: Issues Be careful when comparing approaches that use a different number of features

120 Category-based Retrieval Classifiers: Issues If possible, use the same amount of instances per class ``Normalize by random sampling most populated classes Run several experiments (Monte-carlo) Category-based Retrieval Classifiers: Issues Avoid overfitting

121 Category-based Retrieval Classifiers: Issues Avoid overfitting Diagonal line = ideal solution for the problem Category-based Retrieval Classifiers: Issues Avoid overfitting Let s solve the same problem for a smaller dataset

122 Category-based Retrieval Classifiers: Issues Avoid overfitting Let s solve the same problem for a smaller dataset Category-based Retrieval Classifiers: Issues Avoid overfitting Classification of new data (3 errors)

123 Category-based Retrieval Classifiers: Issues Avoid overfitting Solution using a simple line (1 error) Category-based Retrieval Classifiers: Issues How to avoid overfitting: Use learning algorithms that intrinsically, by design, generalize well Pursue simple classifiers Evaluate with the most suitable measure for your task (unbiased and low-variance statistical estimators) Estimate the performance of a model with data you did not use to construct the model

124 Category-based Retrieval Classifiers: Issues Hold-out cross-validation Cross-fold validation LOOCV, Stratified cross-validation, nested crossvalidation, With all of them: ensure a sufficient number of repetitions Category-based Retrieval Classifiers: Issues Check the default parameters of your classifier Examples: Minimum number of instances per tree branch Number of NN in a KNN Complexity parameter in an SVM Distance-based classifier Which distance Effect of normalization/re-scaling

125 Category-based Retrieval Classifiers: Issues If you evaluate your features, demonstrate their usefulness/increment in accuracy Use several classifiers (include a one-rule classifier?) Use several datasets (music collections) Always compute/report random baselines (average accuracy, but also their std, min, and max values) Include also a baseline feature set in the evaluation Category-based Retrieval Classifiers: Issues If you evaluate your features, demonstrate their usefulness/increment in accuracy

126 Category-based Retrieval Classifiers: Issues Test for statistical significance! Category-based Retrieval: Conclusions Scenario: Retrieve music based on automatically annotated labels from audio Assign a label/class, a probability of that label, and a confidence value Automatic annotation Three main steps for supervised learning: Feature extraction Feature transformation and selection Classification Issues with classification tasks Overfitting, statistical significance, number of features, test several methods/datasets, randomizations,

127 Tutorial Audio Content-Based Music Retrieval Meinard Müller Saarland University and MPI Informatik Joan Serrà Artificial Intelligence Research Institute Spanish National Research Council Wrap Up Part I: Audio identification Part II: Audio matching Part III: Version identification Part IV: Category-based music retrieval

128 Wrap Up References (Audio Identification) E. Allamanche, J. Herre, O. Hellmuth, B. Fröba, and M. Cremer. AudioID: Towards content-based identification of audio material. In Proc. 110th AES Convention, P. Cano, E. Batlle, T. Kalker, and J. Haitsma. A review of algorithms for audio fingerprinting. In Proc. MMSP, pp , P. Cano, E. Batlle, T. Kalker, and J. Haitsma. A review of audio fingerprinting. Journal of VLSI Signal Processing Systems, 41(3): , E. Dupraz and G. Richard. Robust frequency-based audio fingerprinting. In Proc. IEEE ICASSP, pp , J. Haitsma and T. Kalker. A highly robust audio fingerprinting system. In Proc. ISMIR, pages , Y. Ke, D. Hoiem, and R. Sukthankar. Computer Vision for Music Identification. In Proc. IEEE CVPR, pages , 2005.

Machine Learning for Music Discovery

Machine Learning for Music Discovery Talk Machine Learning for Music Discovery Joan Serrà Artificial Intelligence Research Institute (IIIA-CSIC) Spanish National Research Council jserra@iiia.csic.es http://joanserra.weebly.com Joan Serrà

More information

Music Synchronization

Music Synchronization Lecture Music Processing Music Synchronization Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals

More information

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014 MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)

More information

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri 1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?

More information

1 Introduction. 3 Data Preprocessing. 2 Literature Review

1 Introduction. 3 Data Preprocessing. 2 Literature Review Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues

More information

The Automatic Musicologist

The Automatic Musicologist The Automatic Musicologist Douglas Turnbull Department of Computer Science and Engineering University of California, San Diego UCSD AI Seminar April 12, 2004 Based on the paper: Fast Recognition of Musical

More information

Mining Large-Scale Music Data Sets

Mining Large-Scale Music Data Sets Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

Information Retrieval for Music and Motion

Information Retrieval for Music and Motion Meinard Miiller Information Retrieval for Music and Motion With 136 Figures, 39 in color and 26 Tables ^y Springer Contents 1 Introduction 1 1.1 Music Information Retrieval 1 1.1.1 Outline of Part I 3

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning

More information

An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques

An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques (Dr.?) Zafar Rafii Northwestern University EECS department Acknowledgments Work done during internship at

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting

More information

Automatic Classification of Audio Data

Automatic Classification of Audio Data Automatic Classification of Audio Data Carlos H. C. Lopes, Jaime D. Valle Jr. & Alessandro L. Koerich IEEE International Conference on Systems, Man and Cybernetics The Hague, The Netherlands October 2004

More information

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Repeating Segment Detection in Songs using Audio Fingerprint Matching Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

MPEG-7 Audio: Tools for Semantic Audio Description and Processing

MPEG-7 Audio: Tools for Semantic Audio Description and Processing MPEG-7 Audio: Tools for Semantic Audio Description and Processing Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Why semantic description

More information

A Short Introduction to Audio Fingerprinting with a Focus on Shazam

A Short Introduction to Audio Fingerprinting with a Focus on Shazam A Short Introduction to Audio Fingerprinting with a Focus on Shazam MUS-17 Simon Froitzheim July 5, 2017 Introduction Audio fingerprinting is the process of encoding a (potentially) unlabeled piece of

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

Sequence representation of Music Structure using Higher-Order Similarity Matrix and Maximum likelihood approach

Sequence representation of Music Structure using Higher-Order Similarity Matrix and Maximum likelihood approach Sequence representation of Music Structure using Higher-Order Similarity Matrix and Maximum likelihood approach G. Peeters IRCAM (Sound Analysis/Synthesis Team) - CNRS (STMS) 1 supported by the RIAM Ecoute

More information

Two-layer Distance Scheme in Matching Engine for Query by Humming System

Two-layer Distance Scheme in Matching Engine for Query by Humming System Two-layer Distance Scheme in Matching Engine for Query by Humming System Feng Zhang, Yan Song, Lirong Dai, Renhua Wang University of Science and Technology of China, iflytek Speech Lab, Hefei zhangf@ustc.edu,

More information

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc.

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc. CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems Leigh M. Smith Humtap Inc. leigh@humtap.com Basic system overview Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc) Feature

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY Olivier Lartillot, Donato Cereghetti, Kim Eliard, Didier Grandjean Swiss Center for Affective Sciences, University of Geneva, Switzerland olartillot@gmail.com

More information

Multimedia Databases

Multimedia Databases Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Low Level Audio

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming Takuichi Nishimura Real World Computing Partnership / National Institute of Advanced

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition

The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition The Intervalgram: An Audio Feature for Large-scale Cover-song Recognition Thomas C. Walters, David A. Ross, and Richard F. Lyon Google, Brandschenkestrasse 110, 8002 Zurich, Switzerland tomwalters@google.com

More information

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS Katherine M. Kinnaird Department of Mathematics, Statistics, and Computer Science Macalester College, Saint

More information

CHAPTER 7 MUSIC INFORMATION RETRIEVAL

CHAPTER 7 MUSIC INFORMATION RETRIEVAL 163 CHAPTER 7 MUSIC INFORMATION RETRIEVAL Using the music and non-music components extracted, as described in chapters 5 and 6, we can design an effective Music Information Retrieval system. In this era

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS

ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/ TEXTURE ANALYSIS Texture analysis is covered very briefly in Gonzalez and Woods, pages 66 671. This handout is intended to supplement that

More information

Minimal-Impact Personal Audio Archives

Minimal-Impact Personal Audio Archives Minimal-Impact Personal Audio Archives Dan Ellis, Keansub Lee, Jim Ogle Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu

More information

LabROSA Research Overview

LabROSA Research Overview LabROSA Research Overview Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu 1. Music 2. Environmental sound 3.

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

MUSI-6201 Computational Music Analysis

MUSI-6201 Computational Music Analysis MUSI-6201 Computational Music Analysis Part 4.3: Feature Post-Processing alexander lerch November 4, 2015 instantaneous features overview text book Chapter 3: Instantaneous Features (pp. 63 69) sources:

More information

THIS article tackles the problem of live song identification.

THIS article tackles the problem of live song identification. JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 Known-Artist Live Song Identification Using Audio Hashprints TJ Tsai, Member, IEEE, Thomas Prätzlich, Student Member, IEEE, and Meinard Müller,

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Part 9: Representation and Description AASS Learning Systems Lab, Dep. Teknik Room T1209 (Fr, 11-12 o'clock) achim.lilienthal@oru.se Course Book Chapter 11 2011-05-17 Contents

More information

Chapter 5: Outlier Detection

Chapter 5: Outlier Detection Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 5: Outlier Detection Lecture: Prof. Dr.

More information

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems Rec. ITU-R BS.1693 1 RECOMMENDATION ITU-R BS.1693 Procedure for the performance test of automated query-by-humming systems (Question ITU-R 8/6) (2004) The ITU Radiocommunication Assembly, considering a)

More information

LARGE-SCALE COVER SONG RECOGNITION USING THE 2D FOURIER TRANSFORM MAGNITUDE

LARGE-SCALE COVER SONG RECOGNITION USING THE 2D FOURIER TRANSFORM MAGNITUDE LARGE-SCALE COVER SONG RECOGNITION USING THE D FOURIER TRANSFORM MAGNITUDE Thierry Bertin-Mahieux Columbia University LabROSA, EE Dept. tb33@columbia.edu Daniel P.W. Ellis Columbia University LabROSA,

More information

Multimedia Databases. 8 Audio Retrieval. 8.1 Music Retrieval. 8.1 Statistical Features. 8.1 Music Retrieval. 8.1 Music Retrieval 12/11/2009

Multimedia Databases. 8 Audio Retrieval. 8.1 Music Retrieval. 8.1 Statistical Features. 8.1 Music Retrieval. 8.1 Music Retrieval 12/11/2009 8 Audio Retrieval Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 8 Audio Retrieval 8.1 Query by Humming

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine

More information

Accelerometer Gesture Recognition

Accelerometer Gesture Recognition Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Detection of Acoustic Events in Meeting-Room Environment

Detection of Acoustic Events in Meeting-Room Environment 11/Dec/2008 Detection of Acoustic Events in Meeting-Room Environment Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of 34 Content Introduction State of the Art Acoustic

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4

More information

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University A Brief Overview of Audio Information Retrieval Unjung Nam CCRMA Stanford University 1 Outline What is AIR? Motivation Related Field of Research Elements of AIR Experiments and discussion Music Classification

More information

Efficient music identification using ORB descriptors of the spectrogram image

Efficient music identification using ORB descriptors of the spectrogram image Williams et al. EURASIP Journal on Audio, Speech, and Music Processing (2017) 2017:17 DOI 10.1186/s13636-017-0114-4 RESEARCH Efficient music identification using ORB descriptors of the spectrogram image

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Movie synchronization by audio landmark matching

Movie synchronization by audio landmark matching Movie synchronization by audio landmark matching Ngoc Q. K. Duong, Franck Thudor To cite this version: Ngoc Q. K. Duong, Franck Thudor. Movie synchronization by audio landmark matching. IEEE International

More information

Large-scale visual recognition Efficient matching

Large-scale visual recognition Efficient matching Large-scale visual recognition Efficient matching Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline!! Preliminary!! Locality Sensitive Hashing: the two modes!! Hashing!! Embedding!!

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

Motion illusion, rotating snakes

Motion illusion, rotating snakes Motion illusion, rotating snakes Local features: main components 1) Detection: Find a set of distinctive key points. 2) Description: Extract feature descriptor around each interest point as vector. x 1

More information

Classification: Feature Vectors

Classification: Feature Vectors Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12

More information

Region-based Segmentation

Region-based Segmentation Region-based Segmentation Image Segmentation Group similar components (such as, pixels in an image, image frames in a video) to obtain a compact representation. Applications: Finding tumors, veins, etc.

More information

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and

More information

Block Mean Value Based Image Perceptual Hashing for Content Identification

Block Mean Value Based Image Perceptual Hashing for Content Identification Block Mean Value Based Image Perceptual Hashing for Content Identification Abstract. Image perceptual hashing has been proposed to identify or authenticate image contents in a robust way against distortions

More information

3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction 3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

CS6716 Pattern Recognition

CS6716 Pattern Recognition CS6716 Pattern Recognition Prototype Methods Aaron Bobick School of Interactive Computing Administrivia Problem 2b was extended to March 25. Done? PS3 will be out this real soon (tonight) due April 10.

More information

Elemental Set Methods. David Banks Duke University

Elemental Set Methods. David Banks Duke University Elemental Set Methods David Banks Duke University 1 1. Introduction Data mining deals with complex, high-dimensional data. This means that datasets often combine different kinds of structure. For example:

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

More information

ARTICLE; BIOINFORMATICS Clustering performance comparison using K-means and expectation maximization algorithms

ARTICLE; BIOINFORMATICS Clustering performance comparison using K-means and expectation maximization algorithms Biotechnology & Biotechnological Equipment, 2014 Vol. 28, No. S1, S44 S48, http://dx.doi.org/10.1080/13102818.2014.949045 ARTICLE; BIOINFORMATICS Clustering performance comparison using K-means and expectation

More information

Norbert Schuff VA Medical Center and UCSF

Norbert Schuff VA Medical Center and UCSF Norbert Schuff Medical Center and UCSF Norbert.schuff@ucsf.edu Medical Imaging Informatics N.Schuff Course # 170.03 Slide 1/67 Objective Learn the principle segmentation techniques Understand the role

More information

Robustness and independence of voice timbre features under live performance acoustic degradations

Robustness and independence of voice timbre features under live performance acoustic degradations Robustness and independence of voice timbre features under live performance acoustic degradations Dan Stowell and Mark Plumbley dan.stowell@elec.qmul.ac.uk Centre for Digital Music Queen Mary, University

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

Detection, Classification, & Identification of Objects in Cluttered Images

Detection, Classification, & Identification of Objects in Cluttered Images Detection, Classification, & Identification of Objects in Cluttered Images Steve Elgar Washington State University Electrical Engineering 2752 Pullman, Washington 99164-2752 elgar@eecs.wsu.edu Voice: (509)

More information

MASK: Robust Local Features for Audio Fingerprinting

MASK: Robust Local Features for Audio Fingerprinting 2012 IEEE International Conference on Multimedia and Expo MASK: Robust Local Features for Audio Fingerprinting Xavier Anguera, Antonio Garzon and Tomasz Adamek Telefonica Research, Torre Telefonica Diagonal

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Predictive Analysis: Evaluation and Experimentation. Heejun Kim Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

Available online   Journal of Scientific and Engineering Research, 2016, 3(4): Research Article Available online www.jsaer.com, 2016, 3(4):417-422 Research Article ISSN: 2394-2630 CODEN(USA): JSERBR Automatic Indexing of Multimedia Documents by Neural Networks Dabbabi Turkia 1, Lamia Bouafif 2, Ellouze

More information

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique

Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique R D Neal, R J Shaw and A S Atkins Faculty of Computing, Engineering and Technology, Staffordshire University, Stafford

More information

Compression-Based Clustering of Chromagram Data: New Method and Representations

Compression-Based Clustering of Chromagram Data: New Method and Representations Compression-Based Clustering of Chromagram Data: New Method and Representations Teppo E. Ahonen Department of Computer Science University of Helsinki teahonen@cs.helsinki.fi Abstract. We approach the problem

More information

Adaptive Quantization for Video Compression in Frequency Domain

Adaptive Quantization for Video Compression in Frequency Domain Adaptive Quantization for Video Compression in Frequency Domain *Aree A. Mohammed and **Alan A. Abdulla * Computer Science Department ** Mathematic Department University of Sulaimani P.O.Box: 334 Sulaimani

More information

Predicting Popular Xbox games based on Search Queries of Users

Predicting Popular Xbox games based on Search Queries of Users 1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

More information

From Improved Auto-taggers to Improved Music Similarity Measures

From Improved Auto-taggers to Improved Music Similarity Measures From Improved Auto-taggers to Improved Music Similarity Measures Klaus Seyerlehner 1, Markus Schedl 1, Reinhard Sonnleitner 1, David Hauger 1, and Bogdan Ionescu 2 1 Johannes Kepler University Department

More information

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CX 4242 DVA March 6, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Analyze! Limited memory size! Data may not be fitted to the memory of your machine! Slow computation!

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 5, JULY

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 5, JULY IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 5, JULY 2008 1015 Analysis of Minimum Distances in High-Dimensional Musical Spaces Michael Casey, Member, IEEE, Christophe Rhodes,

More information