Musc/Voce Separaton usng the Smlarty Matrx Zafar Raf & Bryan Pardo
Introducton Muscal peces are often characterzed by an underlyng repeatng structure over whch varyng elements are supermposed Propellerheads - Hstory Repeatng - 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 2
Introducton The REpeatng Pattern Extracton Technque (REPET) was proposed to extract the repeatng structure from the non-repeatng structure Repeatng Structure Mxture REPET Non-repeatng Structure /2/2 Zafar Raf & Bryan Pardo 3
Step Step 2 Step 3 REPET Mxture Sgnal x 5 Mxture Spectrogram V Beat Spectrum b.8.6 5 2.9.8.7.4.2 25 3.6.5.4 -.2 -.4 -.6 -.8 -.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 35 4 45 p 5 55.3.2. 5 5 2 25 3 35 4 45 5 55 V p 2 3 2p 4 5 6 5 5 2 25 3 35 4 45 5 55 Medan 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55.2.4.6.8.2.4.6.8 2 55.2.4.6.8.2.4.6.8 2.2.4.6.8.2.4.6.8 2 Repeatng Segment S 5 S V 5 Repeatng Spectrogram W 5 Tme-Frequency Mask M 5 5 5 2 2 2 25 25 25 3 3 3 35 35 35 4 4 4 45 45 45 5 5 5 55 mn mn mn 55 55 Zafar Raf & Bryan Pardo 4
Step Step 2 Step 3 Adaptve REPET.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V -p 3 +p 5 6 2 4 V U 2 3 4 5 mn 6 5 5 2 25 3 35 4 45 Mxture Spectrogram V Beat Spectrogram B p 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 -p 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 +p Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 5.9.8.7.6.5.4.3.2.
Lmtatons Both the orgnal and the adaptve REPET assume perodcally repeatng patterns Mxture Beat spectrogram Perodcally repeatng background perod fnder /2/2 Zafar Raf & Bryan Pardo 6
Lmtatons Repettons can also happen ntermttently or wthout a global (or local) perod Mxture Beat spectrogram Non-perodcally repeatng background perod fnder /2/2 Zafar Raf & Bryan Pardo 7
Lmtatons Instead of lookng for perodctes, we can look for smlartes, usng a smlarty matrx Mxture Smlarty matrx +smlar +dssmlar Non-perodcally repeatng background /2/2 Zafar Raf & Bryan Pardo 8
Smlarty Matrx The smlarty matrx s a matrx where each bn measures the (ds)smlarty between any two elements of a sequence gven a metrc 2 Sequence metrc Smlarty matrx 2 +smlar +dssmlar /2/2 Zafar Raf & Bryan Pardo 9
frequency (khz) Smlarty Matrx In audo, the SM can help to vsualze the tme structure and fnd repeatng/smlar patterns 2 Spectrogram 2 4 6 8 2 cosne 2 8 6 4 2 Smlarty Matrx 2 4 6 8 2 +smlar +dssmlar.8.6.4.2 /2/2 Zafar Raf & Bryan Pardo
frequency (khz) frequency (khz) frequency (khz) Assumptons Gven a mxture of musc + voce: The repeatng background s dense & low-ranked The non-repeatng foreground s sparse & vared Mxture Spectrogram Background Spectrogram Foreground Spectrogram 2 2 2 2 4 6 8 2 2 4 6 8 2 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo
frequency (khz) frequency (khz) Assumptons The SM of a mxture s then lkely to reveal the structure of the repeatng background 2 Mxture Spectrogram 2 4 6 8 2 2 8 6 4 2 Smlarty Matrx 2 4 6 8 2.8.6.4.2 Background Spectrogram 2 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 2
REPET-SIM REPET wth Smlarty Matrx!. Identfy the repeatng/smlar elements 2. Derve a repeatng model 3. Extract the repeatng structure Repeatng Structure Mxture Sgnal REPET- SIM Non-repeatng Structure /2/2 Zafar Raf & Bryan Pardo 3
REPET-SIM Advantages compared wth REPET: Can handle ntermttent repeatng elements Can handle fast-varyng repeatng structures Can handle full-track songs Repeatng Structure Mxture Sgnal REPET- SIM Non-repeatng Structure /2/2 Zafar Raf & Bryan Pardo 4
Practcal Interests Audo post processng Melody extracton Karaoke gamng Interests Intellectual Interests Musc percepton Musc understandng Smply based on self-smlarty! /2/2 Zafar Raf & Bryan Pardo 5
Step 3 Step 2 Step REPET-SIM.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 6 j 3 j 2 j
Step 3 Step 2 Step. Repeatng Elements.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 7 j 3 j 2 j
frequency (khz). Repeatng Elements We take the cosne smlarty between any two pars of columns and get a smlarty matrx 2 Mxture Spectrogram 2 cosne 6 2 4 6 8 2 2 4 2 2 8 Smlarty Matrx 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 8
frequency (khz) frequency (khz). Repeatng Elements The SM reveals for every frame, the frames j k that are the most smlar to frame Mxture Spectrogram Smlarty Matrx Mxture Spectrogram 2 2 2 cosne 8 j 3 2 4 6 8 2 6 4 2 j 2 j 2 4 6 8 2 j 2 j j 3 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 9
Step 3 Step 2 Step. Repeatng Elements.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 2 j 3 j 2 j
Step 3 Step 2 Step 2. Repeatng Model.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 2 j 3 j 2 j
frequency (khz) frequency (khz) 2. Repeatng Model For every frame, we take the medan of ts most smlar frames j k found usng the SM Mxture Spectrogram Mxture Spectrogram 2 2 SM 2 4 6 8 2 2 4 6 8 2 j 2 j j 3 /2/2 Zafar Raf & Bryan Pardo 22
frequency (khz) frequency (khz) frequency (khz) 2. Repeatng Model We obtan an ntal repeatng spectrogram model Mxture Spectrogram Mxture Spectrogram Repeatng Spectrogram 2 2 2 SM medan 2 4 6 8 2 2 4 6 8 2 j 2 j j 3 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 23
Step 3 Step 2 Step 2. Repeatng Model.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 24 j 3 j 2 j
Step 3 Step 2 Step 3. Repeatng Structure.8.6.4.2 -.2 -.4 -.6 -.8-5 5 2 25 3 35 4 45 5 55 5 5 5 2 25 5 3 2 35 25 4 3 45 35 5 4 55 45 5 55 Mxture Sgnal x.5.5 2 2.5 3 3.5 4 4.5 5 5.5 6 V j V U j 2 = j 3 2 3 4 5 mn 6 Mxture Spectrogram V 6 5 5 4 5 2 3 25 3 2 35 4 45 Smlarty Matrx S 5 55 Medan Repeatng Spectrogram U 5 5 5 5 5 5 5 5 2 2 2 2 25 25 25 25 3 3 3 3 35 35 35 35 4 4 4 4 45 45 45 45 5 5 5 5 55 55 j 55 2 2 3 2 3 4 3 4 5 554 5 6 5 6 6 j 2 j 3 Repeatng Spectrogram W Tme-Frequency Mask M 5 5 5 5 2 2 25 25 3 3 35 35 4 4 45 45 5 5 55 55 Zafar Raf & Bryan Pardo 25 j 3 j 2 j
frequency (khz) 3. Repeatng Structure We take the element-wse mnmum between the repeatng and mxture spectrograms Mxture Spectrogram Repeatng Spectrogram 2 2 2 4 6 8 2 2 4 6tme 8(s) 2 mn /2/2 Zafar Raf & Bryan Pardo 26
frequency (khz) frequency (khz) 3. Repeatng Structure We obtan a refned repeatng spectrogram model for the repeatng background Mxture Spectrogram Repeatng Spectrogram 2 2 2 Repeatng Spectrogram 2 4 6 8 2 2 4 6tme 8(s) 2 mn 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 27
frequency (khz) frequency (khz) frequency (khz) 3. Repeatng Structure The repeatng spectrogram cannot have values hgher than the mxture spectrogram Mxture Spectrogram Repeatng Spectrogram Non-repeatng Spectrogram 2 2 2 2 4 6 8 2 2 4 6 8 2 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 28
frequency (khz) frequency (khz) frequency (khz) 3. Repeatng Structure We dvde the repeatng spectrogram by the mxture spectrogram, element-wse Mxture Spectrogram 2 Repeatng Spectrogram 2 2Mxture Spectrogram 2 4 6 8 2 2 4 6 8 2 2 4 6tme 8(s) 2 tme (sec) dvdes /2/2 Zafar Raf & Bryan Pardo 29
frequency (khz) frequency (khz) frequency (khz) frequency (khz) 3. Repeatng Structure We obtan a soft tme-frequency mask (wth values n [,]) Mxture Spectrogram Repeatng Spectrogram Tme-frequency Mask 2 2Mxture Spectrogram 2 2 2 4 6 8 2 2 4 6 8 2 2 4 6tme 8(s) 2 tme (sec) dvdes 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 3
frequency (khz) frequency (khz) frequency (khz) 3. Repeatng Structure We apply the t-f mask to the mxture STFT and obtan the repeatng background 2 Mxture Spectrogram Background Spectrogram 2 Background Sgnal 2 4 6 8 2.x 2 4 6 8 2 STFT - 2 4 6 8 2 Tme-frequency Mask 2 2 4 6 8 2 Zafar Raf & Bryan Pardo 3
frequency (khz) frequency (khz) 3. Repeatng Structure The non-repeatng foreground s obtaned by subtractng the background from the mxture 2 Mxture Spectrogram Background Spectrogram 2 Background Sgnal 2 4 6 8 2 2 4 6 8 2 STFT - 2 4 6 8 2 Mxture Sgnal Background Sgnal Foreground Sgnal - 2 4 6 8 2-2 4 6 8 2-2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 32
Musc/Voce Separaton Repeatng background musc component Non-repeatng foreground voce component Background Sgnal - Mxture Sgnal 2 4 6 8 2 REPET-SIM. Repeatng elements 2. Repeatng model 3. Repeatng structure - 2 4 6 8 2 Foreground Sgnal - 2 4 6 8 2 /2/2 Zafar Raf & Bryan Pardo 33
Evaluaton Compettve method [Lutkus et al., 22] Adaptve REPET wth automatc perods fnder and soft tme-frequency maskng Compettve method 2 [FtzGerald et al., 2] Medan flterng of the spectrogram at dfferent frequency resolutons to extract the vocals Data set 4 full-track real-world songs (Beach Boys) 3 voce-to-musc mxng ratos (-6,, and 6 db) /2/2 Zafar Raf & Bryan Pardo 34
Evaluaton MMFS = FtzGerald et al. REPET+ = Lutkus et al. Proposed = REPET-SIM /2/2 Zafar Raf & Bryan Pardo 35
Examples REPET-SIM vs. FtzGerald et al. Musc estmate (FtzGerald) Voce estmate (FtzGerald) Wham! - Freedom - 5 5 2 25-5 5 2 25-5 5 2 25 Musc estmate (REPET-SIM) Voce estmate (REPET-SIM) - 5 5 2 25-5 5 2 25 /2/2 Zafar Raf & Bryan Pardo 36
Examples REPET-SIM Blackalcous - Alphabet Aerobcs - 2 4 6 8 2 - - /2/2 Musc estmate 2 4 6 8 2 Voce estmate 2 4 6 8 2 Zafar Raf & Bryan Pardo 37
Examples Adaptve REPET Blackalcous - Alphabet Aerobcs - 2 4 6 8 2 - - /2/2 Musc estmate 2 4 6 8 2 Voce estmate 2 4 6 8 2 Zafar Raf & Bryan Pardo 38
Concluson The analyss of the repettons/smlartes n musc can be used for source separaton Repeatng Structure Mxture Sgnal REPET-SIM. Repeatng elements 2. Repeatng model 3. Repeatng structure Non-repeatng Structure /2/2 Zafar Raf & Bryan Pardo 39
Questons? D. FtzGerald and M. Ganza, Sngle Channel Vocal Separaton usng Medan Flterng and Factorsaton Technques, ISAST Transactons on Electronc and Sgnal Processng, vol. 4, no., pp. 62-73, 2. J. Foote, Vsualzng Musc and Audo usng Self-Smlarty, ACM Internatonal Conference on Multmeda, Orlando, FL, USA, October 3-November 5, 999. A. Lutkus, Z. Raf, R. Badeau, B. Pardo, and G. Rchard, Adaptve Flterng for Musc/Voce Separaton explotng the Repeatng Muscal Structure, IEEE Internatonal Conference on Acoustcs, Speech and Sgnal Processng, Kyoto, Japan, March 25-3, 22. Z. Raf and B. Pardo, A Smple Musc/Voce Separaton Method based on the Extracton of the Repeatng Muscal Structure, IEEE Internatonal Conference on Acoustcs, Speech and Sgnal Processng, Prague, Czech Republc, May 22-27, 2. /2/2 Zafar Raf & Bryan Pardo 4