Two-layer Distance Scheme in Matching Engine for Query by Humming System

Similar documents
Query by Singing/Humming System Based on Deep Learning

Sumantra Dutta Roy, Preeti Rao and Rishabh Bhargava

A Robust Music Retrieval Method for Queryby-Humming

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming

CHAPTER 7 MUSIC INFORMATION RETRIEVAL

CS3242 assignment 2 report Content-based music retrieval. Luong Minh Thang & Nguyen Quang Minh Tuan

A Top-down Approach to Melody Match in Pitch Contour for Query by Humming

Research Article Fast Query-by-Singing/Humming System That Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm

Multimedia Databases

Efficient and Simple Algorithms for Time-Scaled and Time-Warped Music Search

Towards an Integrated Approach to Music Retrieval

Retrieving musical information based on rhythm and pitch correlations

Multimedia Databases. 8 Audio Retrieval. 8.1 Music Retrieval. 8.1 Statistical Features. 8.1 Music Retrieval. 8.1 Music Retrieval 12/11/2009

Information Retrieval for Music and Motion

Intelligent query by humming system based on score level fusion of multiple classifiers

Story Unit Segmentation with Friendly Acoustic Perception *

FSRM Feedback Algorithm based on Learning Theory

Shape Similarity Measurement for Boundary Based Features

The MUSART Testbed for Query-By-Humming Evaluation

SEMEX An Efficient Music Retrieval Prototype æ

Face Alignment Under Various Poses and Expressions

QUERY BY HUMMING SYSTEM USING PERSONAL HYBRID RANKING

Optimizing the Deblocking Algorithm for. H.264 Decoder Implementation

Similarity Image Retrieval System Using Hierarchical Classification

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems

UNDERSTANDING SEARCH PERFORMANCE IN QUERY-BY-HUMMING SYSTEMS

Image Segmentation Based on Watershed and Edge Detection Techniques

Restricted Nearest Feature Line with Ellipse for Face Recognition

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

Face Hallucination Based on Eigentransformation Learning

Repeating Segment Detection in Songs using Audio Fingerprint Matching

A Query-by-Humming System based on Marsyas Framework and GPU Acceleration Algorithms

Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique

The MUSART Testbed for Query-by-Humming Evaluation

THE IMPORTANCE OF F0 TRACKING IN QUERY-BY-SINGING-HUMMING

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm

VisoLink: A User-Centric Social Relationship Mining

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS

Mubug: a mobile service for rapid bug tracking

Speaker Diarization System Based on GMM and BIC

BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES

Query by Humming. T. H. Merrett. McGill University. April 9, 2008

Sequence-Type Fingerprinting for Indoor Localization

Justin J. Salamon. MuSearch Query by Example Search Engine Computer Science Tripos, Part II Clare College 14th May 2007

A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition

CONTENT-BASED multimedia information retrieval

Toward Part-based Document Image Decoding

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages

Graph Matching Iris Image Blocks with Local Binary Pattern

A Survey of Light Source Detection Methods

Electrical Engineering and Computer Science Department

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

Discovering Nontrivial Repeating Patterns in Music Data

Variable-Component Deep Neural Network for Robust Speech Recognition

Pattern Recognition Using Graph Theory

Novelty Detection from an Ego- Centric Perspective

HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

Top-k Keyword Search Over Graphs Based On Backward Search

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

A Comparative Evaluation of Search Techniques for Query-by-Humming Using the MUSART Testbed

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations

Query-by-example spoken term detection based on phonetic posteriorgram Query-by-example spoken term detection based on phonetic posteriorgram

Effectiveness of HMM-Based Retrieval on Large Databases

DTV-BASED MELODY CUTTING FOR DTW-BASED MELODY SEARCH AND INDEXING IN QBH SYSTEMS

Ranking Clustered Data with Pairwise Comparisons

A Conflict-Based Confidence Measure for Associative Classification

An Adaptive Threshold LBP Algorithm for Face Recognition

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan 1,Qinzhao Wang 2

Efficient Indexing and Searching Framework for Unstructured Data

Speech-Music Discrimination from MPEG-1 Bitstream

A hierarchical network model for network topology design using genetic algorithm

Web Data mining-a Research area in Web usage mining

An algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng 1, WU Wei 2

A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD

Robust color segmentation algorithms in illumination variation conditions

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Adaptive Doppler centroid estimation algorithm of airborne SAR

Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System

Similarity Searching Techniques in Content-based Audio Retrieval via Hashing

Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm

Clustering-Based Distributed Precomputation for Quality-of-Service Routing*

Robust Signal-Structure Reconstruction

A Performance Evaluation of HMM and DTW for Gesture Recognition

An Integrated Face Recognition Algorithm Based on Wavelet Subspace

Automated Segmentation Using a Fast Implementation of the Chan-Vese Models

An indirect tire identification method based on a two-layered fuzzy scheme

Minimal Test Cost Feature Selection with Positive Region Constraint

Digital Media. Daniel Fuller ITEC 2110

Outsourcing Privacy-Preserving Social Networks to a Cloud

MPEG-7 Audio: Tools for Semantic Audio Description and Processing

Piecewise Linear Approximation Based on Taylor Series of LDPC Codes Decoding Algorithm and Implemented in FPGA

A Game Map Complexity Measure Based on Hamming Distance Yan Li, Pan Su, and Wenliang Li

SEMI-BLIND IMAGE RESTORATION USING A LOCAL NEURAL APPROACH

An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance *

A Fast Personal Palm print Authentication based on 3D-Multi Wavelet Transformation

A Comparative Evaluation of Search Techniques for Query-by-Humming Using the MUSART Testbed

DESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM

RiMOM Results for OAEI 2009

A Revisit to LSB Substitution Based Data Hiding for Embedding More Information

Transcription:

Two-layer Distance Scheme in Matching Engine for Query by Humming System Feng Zhang, Yan Song, Lirong Dai, Renhua Wang University of Science and Technology of China, iflytek Speech Lab, Hefei zhangf@ustc.edu, {songy, lrdai, rhw}@ustc.edu.cn Abstract: In query-by-humming system, minimizing the impact introduced by the pitch tracking errors and humming errors is always a difficult problem. In this paper, we propose a two-layer distance scheme instead of the global DTW distance in traditional QBH system, which is motivated by the people s perception of humming. The local distance measure tries to find the correct part of humming, which is robust to the hummed error and the noise. Also, the approach to combine the local and global distance measure is proposed. The experiment of our QBH system on the mobile phone channel shows that the performance is greatly improved. Keywords: Query By Humming, Dynamic Time Warping, Two-layer Distance 1 Introduction With the development of the multimedia technology, the amount of music data is increasing rapidly everyday. To deal with the overwhelming amount of music data, a Music Information Retrieval (MIR) system is necessary. Query by humming (QBH) is an interesting branch of MIR. By using the QBH system, people can find the song by humming instead of having their name in remembrance. The earlier QBH system was proposed by Ghias [1], in which the hummed tune was firstly converted into a string in a 3 letter alphabet, and then a string matching algorithm was used to search converted symbols. In later QBH systems, the melody was converted into a series of pitch values. In order to prevent the performance degradation introduced by the humming errors, some methods based on the Dynamic Programming (DP) were proposed to calculate Dynamic Time Warping (DTW) distance (by Jang [2]) or Edit Distance (by Lemstrem [3]) between the hummed tune and the MIDI template in database. The DTW algorithm is widely applied in QBH systems, although there are other methods for matching engine such as Linear Alignment Matching (LAM) proposed by Wu [4] etc. However, the DTW algorithm in most QBH systems only measures the overall distance of the whole hummed tune, which may not be suitable for the situation that there are a few of pitch tracking errors and humming errors. The performance of such system is still far from satisfactory on the mobile phone channel or noisy environment. But people can well perceive the hummed tune merely according to the

correct part of humming. Motivated by this intuition, in this paper, we propose a novel two-layer distance measure scheme, in which a local DTW distance is calculated to find the corresponding correct hummed part, and the global DTW distance is used to measure the overall similarity. These distance measures are combined into the final criterion for retrieve the right pieces in melody database. Furthermore, a QBH application system using the two-layer distance measure scheme is constructed for querying from the melody database consisted of the 15000 clips extracted from about 1000 MIDI files, and the querying humming is collected from the mobile telephone channel. The experiment results show that the performance of our QBH system is greatly improved (detailed in section 4). This paper is organized as follows. Section 2 gives the overall description of the QBH system. Section 3 details the two-layer distance scheme. And experiment results are shown in section 4, followed by the conclusion and future work in section 5. 2 Overview of the Proposed QBH System The proposed QBH system consists of 3 components including Melody Extraction, Matching Engine and Melody Database Construction, as shown in figure 1. The acoustic input of the QBH system is the hummed tunes recorded from mobile phone on the CDMA channel (China Unicom) in a common office environment, and the sample rate is 8000, 16-bit resolution. 2.1 Melody Extraction Melody extraction primarily consists of 2 steps including Pitch Extraction and Note Segmentation. After extracting the pitch features, the hummed tune clips into notes according the pitch and intensity features. It is worth noting that the pitch of humming is difficult to be extracted precisely in mobile telephone channel. The pitch errors such as double pitch and half pitch may greatly degrade the system performance. Here, the autocorrelation algorithm proposed by Boersma [5] is used to extract the pitch. Although this algorithm works well in our experiment, there are also some double pitch errors mainly caused by the strong second harmonic components. 2.2 Matching Engine In matching engine, a two-layer distance scheme is proposed, which will be detailed in Section 3. 2.3 Melody Database Construction The database is consisted of 1000 MIDI files. These MIDI files are downloaded from Internet including Chinese pop songs and western pop songs. Their main tracks are

picked up manually. Because people are always humming only at several boundaries of the MIDI file, each MIDI file is divided into several clips by the local boundary detection model proposed by Cambouropoulos [6]. In our database, the 1000 MIDI files are divided into about 15000 clips automatically. Humming Mobile Telephhone MIDI Files Pitch Detection MIDI Segmentaion Melody Extraction Note Segmentaion Melody Database Melody Database Construction DTW Maching Engine Two-layer Distance Calculate Posterior Probability Matching criterion Rank List Fig. 1. The flowchart of QBH system 3 Matching Engine Using the Two-layer Distance Scheme In this section, we first briefly introduce the matching engine in classical QBH system. Then, based on the analysis of this scheme, the matching engine using a novel two-layer distance measure is proposed in detail. 3.1 Matching engine in classical QBH system To account for tempo variation of each individual user, dynamic time warping (DTW) is used to compute the warping distance between the input note sequences and MIDI note sequences in the database. To deal with key variation, relative pitch value is used.

Note durations are also used and measured by the Inter Onset Interval (IOI) ratio, which gives the ratio between the current and the next note duration. A classical calculation pattern is used to compute the melodic distance D(Q, M) between the query note sequences Q = q 1,q 2,...,q m and the MIDI note sequences M= m 1,m 2,...,m n by filling the matrix (D=[d ij ] m*n ). Each entry d i, j denotes the minimal melodic edit distance between two sequences q 1,q 2,...,q i and m 1,m 2,...,m j, which is calculated as follows, di 1, j+ w( qi, ) ( deletion) di, j 1 w(, j + m ) ( insertion) di 1, j 1 + w qi mj replacement di, j min di k, j 1 w( qi k + 1,..., qi, mj), 2 k i (, ) ( ) = + ( consolidation) di 1, j k+ w( qi, mj k + 1,..., mj),2 k i ( fragmentation) where w() is the distance between the query note q i and MIDI note m j for different situations such as deletion, insertion etc., which can be termed as w i,j. The end point constraints were loosed to cope with different starting or ending points. We choose n = a*m (a is a tunable parameter, for example, a = 1.4).Since the query note sequences is same for each MIDI clip in the database, the final distance value d final can be calculated from the m-th row of the matrix (d 0... m, 0... n ) as follows: D(Q, M) min( m, k), 1 (1) = d k n (2) For each MIDI clip M i (i = 1,2 N) in database, where N is the size of database, the melodic distance D(Q, M i ) is calculated as equation (2). And by sorting D(Q, M i ), the rank list is obtained. In practical situation, there are a few of pitch tracking errors and humming errors, which may degrade the querying performance heavily. However people can still discriminate the tune based on the correct hummed part. According to this observation, the performance of the QBH system can be improved by introducing the local distance measure, which will be detailed next. 3.2 Matching engine using two-layer distance measure scheme In this section, we propose a two-layer distance measure scheme. The global distance measure D(Q, M), termed as D global, is calculated as aforementioned, and the local distance measure D local, is calculated as follows. 3.2.1 Local Distance Measure In calculation of the d(i,j) as equation (1), the best path P best (i,j) from start point p(0,0) to end point p(i,j) is obtained. The point sequence of P best (i,j) is termed as s 1 s 2 s k(i,j)

where k(i,j) is the number of the point sequence. For further calculating the local distance measure according to current P best (i,j), the local path in P best (i,j) needs to be determined. We can set the point p(i,j) as the end point of the local path, and trace back along the P best (i,j) to find the appropriate start point, and the corresponding local distance is termed as d local (i,j). After obtaining d local (i,j) of each P best (i,j), the final local distance D local is the minimal of the matrix [d local ] m*n. This approach can be shown to be equivalent to find the best local path by greedy method. In addition, if the length of the path is too short, the corresponding melody may be meaningless. Here, the low-bound threshold of path length is heuristically set to 5 notes in our experiments. The start point p start (i,j) of the path P best (i,j) and the d local (i,j) are calculated in following procedure. End point searching procedure: Initialization: pstart(1,1) = p(1,1) (3) d local(1,1) = (4) Iteration: 1. Given the end point p(i,j), the point s k(i,j)-1 is the point just before p(i,j) in the path P best (i,j). The start point of P best (s k(i,j)-1 ) termed as s q has already been obtained before. 2. If the length of the path with start point s q and end point p(i,j) in P best (i,j) is less than 5 notes, then p start (i,j) = s q. 3. If the length of the path with start point s q and end point p(i,j) in P best (i,j) is more than 5 notes, d local (i,j) is calculated as follows: dlocal(, i j) = dincrease( sq, sq + 1,..., sk ( i, j) ) pstart( i, j) = sq (4) min dincrease( sq + 1, sq + 2,..., sk ( i, j) ) pstart( i, j) = sq + 1 dincrease( sq + 2, sq + 3,..., sk ( i, j) ) pstart( i, j) = sq + 2 dincrease( sq, sq + 1,..., sk ( i, j) ) = sum( wq, wq + 1,..., wk( i, j) ) l 1+ sum( tq, tq + 1,..., tk( i, j) ) sum( tq, tq + 1,..., tk( i, j) ) (where w q is the w() part of the point s q in equation (1) and t q is the duration of s q s corresponding note. l is a tunable parameter, which is set to 10.0 in our experiment. To reduce the computation load, the length of the new local path is around the length of the old local path in equation (4).) End Procedure (5)

3.2.2 Matching Criterion Given a two-layer distance, D global and D local, it is necessary to determine which one is more reliable. Note sequences posterior probability is proposed to solve this problem. For each MIDI clip M i, note sequences posterior probability P(M i T) can be calculated as follows based on Bayes Theory: PN ( Mi) PM ( i) PM ( i N) = PN ( ) where P(N) is constant for each MIDI clip, and the prior probability P(M i ) is assumed to be the same. The class conditional probability P(N M i ) can be used instead of posterior probability P(M i T). For each MIDI clip, P(N M i ) can be calculated from distance D i (D global or D local ) as shown in equation (6), 1/ Di PN ( Mi) = (1 K number of MIDI clips) SUM (1/ D k) Using equation (7), two prior probabilities: P(N M i ) global and P(N M i ) local can be obtained. The layer with higher prior probability is assumed to be more reliable and the corresponding rank list is taken as a result. (6) (7) 4 Experiment The melody database is detailed in section 2.3. In our experiment, 16 people including 9 men and 7 women hum 200 pieces of randomly selected songs to test the two-layer distance scheme. The average searching time for one piece is one half of the humming time on the platform of PC(P4 1.4G). The result is shown in Table 1. The accuracy for rank 1 is 74% compared with 58% for the base DTW method. The mean reciprocal rank (MRR) rises from 0.6563 to 0.7971. Table 1. Query-by-humming experiment Top1 Top3 Top5 Top10 MRR the base DTW method 58.04% 69.93% 73.42% 79.72% 0.6563 Two-layer Distance Scheme 74.12% 82.23% 85.31% 88.81% 0.7971 5 Conclusion and Future Work In this paper, we propose a two-layer distance scheme based on the DTW algorithm in our QBM system, which follows people s perception to search the hummed tune. The experiment has shown that our approach improves the performance of the QBM

system. The accuracy of rank 1 is 74%, while for the accuracy of rank 1 in traditional DP method is 58%. There are still some issues to be investigated as future extension. First, we only use distances of two layers. An approach to use multi-layer distance scheme needs to be exploited to further improve the performance of QBH system. Second, the more reliable local distance measure needs to be found in the future work. Acknowledgments. The work is supported by iflytek Co., and this QBH system has been used in the service of China Unicom Co. References 1. A.Ghias, Logan, D.Chamberlin, B.C.Smith, Query by Humming: Musical Information Retrieval in an Audio Database, pp. 231-236 in Proe. ACM Multimedia, San Francisco, 1995 2. J.R.Jang, H.Lee, Hierarchical Filtering Method for Content based Music Retrieval via Acoustic Input, Proc. The ninth ACM international conference on Multimedia, Ottawa, Canada, 2001, pp. 401-410. 3. Lemstrem, S.Perttu, Semex - an Efticient Music Retrieval Prototype. in Proc. ISMIR, 2000. 4. Ya-Dong Wu, Yang Li, Bao-Long Liu, A new method for approximate melody matching, Machine Learning and Cybernetics, 2003 International Conference on Volume 5, 2-5 Nov. 2003 Page(s):2687-2691 Vol.5 5. Paul Boersma, Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, Proceedings of the Institute of Phonetic Sciences, 17:97110, 1993. 6. Cambouropoulos, E., Musical Rhythm: Inferring Accentuation and Metrical Structure from Grouping Structure,Music, Gestalt and Computing - Studies in Systematic and Cognitive Musicology. M. Leman(ed.), Springer-Verlag, Berlin(1997).