Research Article Fast Query-by-Singing/Humming System That Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm


Distributed Sensor Networks, Volume 2015, Article ID 17691, 10 pages

Gi Pyo Nam (1) and Kang Ryoung Park (2)

(1) Department of Electronics Engineering, Dongguk University, 26 Pil-dong 3-ga, Jung-gu, Seoul 100-715, Republic of Korea
(2) Division of Electronics and Electrical Engineering, Dongguk University, 26 Pil-dong 3-ga, Jung-gu, Seoul 100-715, Republic of Korea

Correspondence should be addressed to Kang Ryoung Park; parkgr@dgu.edu

Received 23 January 2015; Accepted 3 May 2015

Academic Editor: José Molina

Copyright 2015 G. P. Nam and K. R. Park. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a query-by-singing/humming (QbSH) system that considers both the preclassification and multiple classifier-based methods by combining the linear scaling (LS) and quantized dynamic time warping (QDTW) algorithms in order to enhance both the matching accuracy and the processing speed. This is appropriate for high-speed QbSH in a huge distributed server environment. This research is novel in the following three ways. First, the processing speed of QDTW is generally much slower than that of LS. So, we perform QDTW matching only when the matching distance by the LS algorithm is smaller than a predetermined threshold, by which the entire processing time is reduced while the matching accuracy is maintained. Second, we use different measurements of matching distance in the LS algorithm by considering the characteristics of the reference database. Third, we combine the calculated distances of the LS and QDTW algorithms based on score level fusion in order to enhance the matching accuracy.
The experimental results with the 2009 MIR-QbSH corpus and the AFA MIDI 100 databases showed that the proposed method reduced the total searching time of reference data while obtaining higher accuracy than QDTW.

1. Introduction

With the spread of music content and music databases on the Internet, portable media, and smartphones, fast and accurate content-based searching systems are required. Query-by-singing/humming (QbSH) is a representatively convenient and intelligent method in the field of content-based music retrieval systems. It matches a user's humming query to the corresponding reference music file. It can be used to retrieve a music file without the singer's name or the song title, based on the melody hummed or sung by the user.

Various kinds of QbSH systems have been studied in previous research [1-22]. Ghias et al. proposed a method of representing the pitch contour features extracted from humming or whistling data as an up-down-repeat (UDR) string and using them for matching [2]. McNab et al. proposed the MELDEX system based on pitch contour, interval, and duration with a string matcher [3, 4]. In previous research [5], the Tuneserver was proposed, which represents the pitch contour as a UDR string like the method of [2]. Kornstadt et al. developed the Themefinder system, which can search for the theme of music in the Humdrum database of 16th-century classical music and folk songs on the web [6, 7]. Previous research [8] showed a retrieval method using the changes of melody and the UDR string. Ryynänen and Klapuri proposed a method of extracting pitch vectors by using a fixed-size time window and matching them by using locality sensitive hashing (LSH) [9]. In another study [10], earth mover's distance (EMD), which can calculate the minimum cost between the features of humming and reference data with changes of weight, was adopted to measure melodic similarity.
In previous research [11], a content-based music retrieval method was proposed which first filters out 80% of unlikely candidates by using hierarchical filtering and then compares the input query with the remaining candidates. Salamon and Rohrmeier proposed a two-stage retrieval method for QbSH systems [12]. In the first stage, the number of candidates is reduced by indexing using n-grams. Detailed matching with the remaining candidates is then performed based on local alignment with modified cost functions. Wang et al. proposed a QbSH system that combines the EMD and dynamic time warping (DTW) classifiers based on the weighted SUM rule [13].

Table 1: Summarized comparisons of the proposed method to previous ones.

Category: Only by single classifier-based method [2-10, 14, 15, 18, 19]
  Method: Matching with a single classifier to calculate the distance between input query data and reference data
  Advantage: High processing speed
  Disadvantage: Limited ability to enhance the matching accuracy with only a single classifier

Category: Only by multiple classifier-based method [13, 16, 17]
  Method: Combining the matching scores of two or more classifiers
  Advantage: Higher matching accuracy than that of the single classifier-based method
  Disadvantage: Lower processing speed

Category: Only by preclassification-based method [11, 12, 20]
  Method: The system first reduces the number of candidates in a large database by preclassification, and it calculates the matching distance with the remaining candidate data
  Advantage: Higher matching speed by reducing the number of candidates based on preclassification
  Disadvantage: The final matching only by a single classifier has the limitation of lower accuracy

Category: Considering both preclassification and multiple classifier-based methods (proposed method)
  Method: Considering both preclassification and multiple classifier-based methods by combining the LS and QDTW algorithms
  Advantage: Higher matching speed with higher accuracy
  Disadvantage: Lower matching speed than that of the LS algorithm alone

The previous QbSH systems can be roughly categorized into top-down and bottom-up matching systems. As a top-down method, Wu et al. proposed a recursive alignment algorithm which first compares two-feature data in a global view and then compares them locally [14]. The methods of [11, 12] also belong to this category.
In contrast, the bottom-up method locally calculates the distance between the query and reference data at each position and searches for the optimal path to obtain a final matching score [2-4, 6-8, 15, 16]. For QbSH systems, the DTW algorithm has been widely used as a matcher. It has also been widely used in speech recognition and can easily solve the time alignment problem. Since there is generally much time misalignment between the input humming/singing and the reference music file, the DTW algorithm is suitable for QbSH systems, but it has the limitation of high computational cost. Jang and Gao converted the input query data into pitch vectors [15]. Using these, they measured the similarity between singing/humming and reference songs based on the distance calculated by DTW with high accuracy; however, this is also computationally demanding [23, 24]. Krishnamoorthy et al. also used DTW as the distance measurement for a QbSH system on embedded platforms [19]. However, it still has the problems of the high computational cost of DTW and the lower matching accuracy of a single classifier. Li et al. proposed a multistage matching-based system to enhance the performance of QbSH systems [20]. It includes three stages. The first and second stages aim to reduce the number of candidates in a large database by using earth mover's distance (EMD) based on tune and profile features, respectively. Finally, DTW calculates the matching distance with the remaining candidate data. However, the final matching only by the single classifier of DTW has the limitation of lower accuracy. Linear scaling (LS) has the advantage of fast processing time, but its accuracy is relatively lower than that of DTW [16].

All of these previous studies are single classifier-based only, multiple classifier-based only, or preclassification-based only. They do not adopt a scheme that considers both the preclassification and multiple classifier-based methods.
To overcome the problems of the previous research, we propose a QbSH system that considers both the preclassification and multiple classifier-based methods by combining the LS and quantized DTW (QDTW) algorithms in order to enhance both the matching accuracy and the processing speed. The processing speed of QDTW is generally much slower than that of LS, although QDTW is a modified version of DTW intended to enhance the matching accuracy and reduce the processing time. So, we perform QDTW matching only when the matching distance by the LS algorithm is smaller than a predetermined threshold, by which the entire processing time is reduced by more than 30% compared to that of QDTW while the matching accuracy is maintained. We use different measurements of matching distance in the LS algorithm by considering the characteristics of the reference database. In addition, we combine the calculated distances of the LS and QDTW algorithms based on score level fusion in order to enhance the matching accuracy. Table 1 shows the summarized comparisons of the proposed and previous methods.

The rest of this paper is structured as follows. Section 2 explains the proposed QbSH system. Section 3 discusses the experimental results, and Section 4 states the conclusions of this study.

Figure 1: Flowchart of the proposed method. Steps: input query humming data; pitch extraction; normalization; preclassification by LS algorithm (calculating the matching distance by LS algorithm); if the matching distance is less than the threshold, calculating the matching distance by QDTW algorithm; otherwise moving the matching window to the next matching position of the MIDI data; when the last part of the MIDI data is reached, combining the matching distances based on score level fusion; searching the genuine MIDI file in the database.

2. Proposed Method

2.1. Overview of the Proposed Method

Figure 1 shows a flowchart of the proposed method. First, pitch data are extracted from the user's input humming file by using musical note estimation. Second, we perform the following normalization. We remove pitch values of 0 in the extracted pitch data, since these can be regarded as meaningless data obtained from the silent periods of the melody. In general, the melody of the input humming/singing is relatively inaccurate compared to the reference musical instrument digital interface (MIDI) data because it is hummed or sung by an amateur. So, the pitch data of the input are quite different from those of the MIDI file, which requires further normalization of the pitch data in both the input and MIDI files as follows. After eliminating the 0 values, the input humming data are normalized through mean shifting, median filtering, average filtering, and min-max scaling [16, 17, 21]. Median and average filtering get rid of peaked and vibrated noises, and min-max scaling adjusts the amplitude variations. With the normalized pitch data, preclassification is performed based on the distance calculated by the LS algorithm in order to decide whether the QDTW algorithm should be executed. In detail, it calculates the matching distance between the input query data and the reference MIDI data in the matching window. If the matching distance is greater than a specific threshold, the QDTW algorithm does not run because the humming and MIDI data are different.
Then, the matching window of the MIDI data is moved to the next matching position, and the preclassification procedure is repeated. If the matching distance is less than the threshold, QDTW is executed in order to obtain a more accurate matching score. These procedures are iterated until the matching window reaches the last part of the MIDI data. On arriving at the last part of the MIDI data, the final matching distance between the input humming/singing and the MIDI file is determined by combining the matching distance of QDTW and that of the LS algorithm based on score level fusion. The correct MIDI file is selected based on the final matching distance.

2.2. Pitch Extraction and Normalization

In order to extract the pitch data, we used a voice-activity detection (VAD) algorithm [16, 17, 21]. First, the VAD algorithm estimates the voiced frames, and then the pitch data are extracted as integer values by the spectrotemporal autocorrelation (STA) method, which is based on temporal and spectral autocorrelations with sampling every 32 ms. However, a lot of noise is generally contained in the extracted pitch data. In addition, muted regions exist, and the pitch data of the input are quite different from those of the MIDI file, since users cannot hum or sing as perfectly as MIDI music. So, the extracted pitch data should be normalized to obtain an accurate matching result. In this research, we perform removal of 0 values, mean shifting, median filtering, average filtering, and min-max scaling for normalization.

2.3. Preclassification by LS Algorithm

2.3.1. LS

The LS algorithm has been widely used in QbSH systems, since its processing complexity is very low [16]. It calculates the matching distance between the input query data and the reference MIDI data by linearly changing the length of the input or reference data on the time axis. In this research, we change the length of the reference MIDI data. Figure 2 shows an example of the LS algorithm.

Figure 2: Example of the operation of the LS algorithm. The input query data are compared over time with the corresponding part of the reference MIDI data, whose length is scaled by 0.8, 1.0, and 1.2.

2.3.2. Measuring of Matching Distance

In general, the characteristics of MIDI data differ according to the kind of reference database. Although the 2009 MIR-QbSH corpus mostly consists of children's songs and folk songs, the AFA MIDI 100 database includes more varied kinds of songs. So, the melodies of the 2009 MIR-QbSH corpus database are usually simpler than those of the AFA MIDI 100 database. In addition, more noise is included in the AFA MIDI 100 database. So, we use different measurements of matching distance in the LS algorithm by considering the characteristics of the reference database.
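The normalization steps of Section 2.2 and the LS matching of Section 2.3.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the filter width of 3, the nearest-sample resampling, the squared-difference distance, and all function names are assumptions made for the example. Only the order of the normalization steps and the scale factors 0.8, 1.0, and 1.2 (from Figure 2) come from the text.

```python
from math import inf
from statistics import mean, median


def sliding(values, width, reducer):
    """Apply `reducer` over a centered window (truncated at the edges)."""
    half = width // 2
    return [reducer(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def normalize(pitch, width=3):
    """Remove 0 values, mean-shift, median-filter, average-filter, min-max scale."""
    voiced = [p for p in pitch if p != 0]          # drop silent frames
    center = mean(voiced)
    shifted = [p - center for p in voiced]         # mean shifting
    smoothed = sliding(sliding(shifted, width, median), width, mean)
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:                                   # flat contour: avoid dividing by zero
        return [0.0] * len(smoothed)
    return [(p - lo) / (hi - lo) for p in smoothed]


def resample(seq, length):
    """Stretch or shrink `seq` to `length` samples (nearest-sample, assumed)."""
    if length == 1:
        return [seq[0]]
    step = (len(seq) - 1) / (length - 1)
    return [seq[round(i * step)] for i in range(length)]


def ls_distance(query, reference, start, factors=(0.8, 1.0, 1.2)):
    """Smallest squared-difference distance over the scaled reference windows."""
    best = inf
    for factor in factors:
        span = round(len(query) * factor)          # scaled length of the reference part
        window = reference[start:start + span]
        if len(window) < 2:
            continue
        stretched = resample(window, len(query))
        dist = sum((q - r) ** 2 for q, r in zip(query, stretched))
        best = min(best, dist)
    return best
```

For example, `ls_distance([0, 1, 2, 3, 4], [9, 9, 0, 1, 2, 3, 4, 9], start=2)` finds an exact match at scale factor 1.0. In a full system the reference window would presumably be normalized in the same way as the query before the distance is computed.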
In general, the Euclidean distance is used to measure the dissimilarity between the input query data and the reference MIDI data in the LS algorithm, as shown in

\[ \mathrm{ED} = \sqrt{\sum_{i=0}^{n} (q_i - r_i)^2}, \tag{1} \]

where q_i and r_i denote the ith query and reference MIDI data, respectively, and n denotes the length of the data. For lower processing time, the following equation can be used instead of (1):

\[ \mathrm{SquareED} = \sum_{i=0}^{n} (q_i - r_i)^2. \tag{2} \]

In (2), we define (q_i - r_i)^2 as Dist_i as in (3), and we select one of the four functions of (4) as Dist_i according to the kind of reference database:

\[ \mathrm{SquareED} = \sum_{i=0}^{n} \mathrm{Dist}_i, \tag{3} \]

\[ \mathrm{Dist}_i = \begin{cases} (q_i - r_i)^2 \\ |q_i - r_i| \\ \log(|q_i - r_i| + 1) \\ \arctan(|q_i - r_i| - 0.5) + 0.5. \end{cases} \tag{4} \]

Figure 3 shows the relationship between the absolute difference (|q_i - r_i| of (4)) and the calculated Dist_i of (4). The square, log, and atan functions are nonlinear between the input and output values, whereas the abs function is linear.

Figure 3: The relationship between the absolute difference (|q_i - r_i| of (4)) and the calculated Dist_i of (4) (square, abs, log, and atan mean the 1st to 4th functions of (4), resp.).

2.4. Matching by QDTW

As shown in Figure 1, if the matching distance by the LS algorithm is less than the predetermined threshold, QDTW is executed to calculate a more accurate matching distance. In general, a difference in length exists between the MIDI and humming phrases. This problem of time alignment can be overcome by the DTW algorithm, which can calculate the dissimilarity between two patterns with insertion and deletion [16, 17, 21]. At each matching position of DTW, the dissimilarity between the humming and MIDI features is calculated by Euclidean distance. In this research, we adopted QDTW, which differs from DTW only in that it uses the quantized pitch value instead of the original one. Since the original pitch values have variations caused by noise, they are represented as quantized integer values in QDTW. Before matching by QDTW, we detect the zero-to-nonzero positions (the positions where the pitch value changes from zero to nonzero) of the MIDI data and match the starting position of the humming data with each zero-to-nonzero position of the MIDI data.
If the time interval between two zero-to-nonzero positions is less than the threshold, only the first zero-to-nonzero position is used for matching, through which we can reduce the processing time and enhance the matching accuracy.

2.5. Score Level Fusion of Matching Distances

Score level fusion has been widely used to enhance matching accuracy, and there are many methods of score level fusion. In this paper, we combined the two matching distances by the LS and QDTW methods based on simple fusion methods such as the MIN, MAX, PRODUCT, and SUM rules and compared the performance of each fusion method. The MIN and MAX rules select the smaller and greater of the two matching distances as the final matching score, respectively. The PRODUCT and SUM rules calculate the final matching score by multiplying and summing the two matching distances, respectively. Experimental results showed that the MIN rule gave the best performance among all methods.

3. Experimental Results

For the experiments, we used two databases. The first database was the 2009 MIR-QbSH corpus, which consists of 48 reference MIDI files and 4431 singing and humming queries as wav files [22]. A total of 118 persons sang or hummed for 8 s per query in various environments such as telephones and microphones. Since the 2009 MIR-QbSH corpus database provides pitch vector (PV) files which include manually extracted pitch data, we used the PV files for the experiments to exclude the pitch extraction error. The second database was the audio feature analysis (AFA) MIDI 100 database, which includes 1,000 singing and humming files recorded by microphone and 100 MIDI files which are made up of 84 Korean songs, 6 children's songs, and 10 pop songs. The average length of the input singing/humming files is 12 s. We performed our experiments on a desktop computer with a 3.4 GHz CPU and 8 GB RAM.
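The four distance functions of (4) and the QDTW matching of Section 2.4 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the arctan offsets of 0.5 follow a reconstruction of (4), the logarithm is taken as the natural logarithm, the quantization step of 1.0 and the three-neighbor DTW recursion are common choices that the text does not specify, and `onset_positions` compares each zero-to-nonzero position with the last kept one, which is one possible reading of the interval rule.

```python
import math

# Candidate per-sample distances of (4), applied to d = |q_i - r_i|.
DIST_FUNCS = {
    "square": lambda d: d * d,
    "abs": lambda d: d,
    "log": lambda d: math.log(d + 1),             # natural log (assumed)
    "atan": lambda d: math.atan(d - 0.5) + 0.5,   # offsets as reconstructed
}


def ls_score(query, reference, kind="abs"):
    """Sum of Dist_i over aligned samples, as in (3)."""
    dist = DIST_FUNCS[kind]
    return sum(dist(abs(q - r)) for q, r in zip(query, reference))


def quantize(pitch, step=1.0):
    """Represent pitch values as quantized integers (step is an assumption)."""
    return [round(p / step) for p in pitch]


def qdtw(query, reference, step=1.0):
    """DTW on quantized pitch with insertion, deletion, and match moves."""
    q, r = quantize(query, step), quantize(reference, step)
    n, m = len(q), len(r)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(q[i - 1] - r[j - 1])      # 1-D Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],      # deletion
                                     cost[i][j - 1],      # insertion
                                     cost[i - 1][j - 1])  # match
    return cost[n][m]


def onset_positions(midi, min_gap):
    """Zero-to-nonzero positions, skipping those closer than min_gap to the last kept one."""
    kept = []
    for i, pitch in enumerate(midi):
        starts_note = pitch != 0 and (i == 0 or midi[i - 1] == 0)
        if starts_note and (not kept or i - kept[-1] >= min_gap):
            kept.append(i)
    return kept
```

With these definitions, `qdtw([1.1, 2.0, 2.9], [1, 1, 2, 3, 3])` is 0 because quantization removes the small pitch deviations and the warping absorbs the repeated notes.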
To measure the matching accuracy, the mean reciprocal rank (MRR) is used as the criterion of performance; it has been frequently used for measuring the accuracy of QbSH systems [12, 16, 17, 21]:

\[ \mathrm{MRR} = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{\mathrm{rank}_i}, \tag{5} \]

where K is the number of input singing/humming files and rank_i is the ranking of the correct MIDI file (corresponding to the ith input file), as calculated by the proposed method. If all of the correct MIDI files (corresponding to the input files) are ranked 1st, the calculated MRR becomes 1, and the maximum MRR is 1 [12, 16, 17, 21]. Top 1, Top 10, and Top 20 indicate that the rank of the correct MIDI file is within rank 1, rank 10, and rank 20, respectively.

Table 2: The matching accuracy of the LS algorithm with PV files of the 2009 MIR-QbSH corpus database according to the various distance measurement methods of (3) and (4). Columns: distance measurement method; accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: square function; abs function; log function; arctan function.

Table 3: The matching accuracy of the LS algorithm with the AFA MIDI 100 database according to the various distance measurement methods of (3) and (4). Columns: distance measurement method; accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: square function; abs function; log function; arctan function.

As the 1st experiment, we measured the matching accuracy of the LS algorithm according to the various distance measurement methods of (3) and (4), as shown in Tables 2 and 3. The results showed that the log or arctan function gives better accuracy than the other cases when using the AFA MIDI 100 database, which contains a lot of noise. However, the abs function shows the best matching accuracy when using the 2009 MIR-QbSH corpus database, which contains less noise. From that, we can confirm that a linear function for distance measurement can show better performance with a database of less noise, while a nonlinear function can have better accuracy with a database of more noise.

As the 2nd experiment, we measured the matching accuracy and processing time when using the LS algorithm as preclassification before performing the QDTW algorithm. Based on Tables 2 and 3, the square function-based distance measurement for the LS algorithm was excluded because it had lower matching accuracy.
As shown in Tables 4 and 5, the processing time was much reduced by the proposed method compared to QDTW, although the MRR of the proposed method is the same as that of QDTW.

As the 3rd experiment, we compared the processing time and MRR of the original QDTW and the proposed method according to the threshold for preclassification by the LS method. If the matching distance by the LS method is greater than the threshold, the QDTW-based matching is not performed and the matching window is moved to the next position for matching. If not, the QDTW-based matching is performed. If the threshold increases, the number of cases in which the matching distance by the LS method is less than the threshold increases. Consequently, the number of cases in which the QDTW-based matching is performed also increases, which enhances the MRR but increases the processing time. As shown in Figures 4 and 5, we can confirm that the processing time of the proposed method is much reduced compared to that of QDTW while maintaining the MRR. By comparing Figures 4(a), 4(b), and 4(c), we can confirm that the proposed method using the preclassification based on the abs function of (4) shows the better performance. In addition, by comparing Figures 5(a), 5(b), and 5(c), we can also confirm that the proposed method using the preclassification based on the arctan function of (4) shows the better performance.

The predetermined threshold for LS was experimentally determined considering the minimum processing time with the maintained MRR (matching accuracy) of our method. That is, as shown in Figures 4(a)-4(c) and 5(a)-5(c), the predetermined thresholds are 3.2, 1.3, 1.5, 3.9, 1.5, and 1.5, respectively. The positions of the thresholds mean that the minimum processing time is taken while the MRR of our method does not degrade. As shown in Figures 4 and 5, the thresholds differ according to the dataset and the measurement method of matching distance (equation (4)) in the LS algorithm. The above results of Tables 4 and 5 and Figures 4 and 5 are the cases in which the two matching distances by the LS and QDTW methods are not combined.
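The threshold-gated preclassification discussed above can be sketched as follows. This is a hedged illustration: `cascade_match`, its parameters, and the choice to keep the minimum distance over all window positions are assumptions for the example, and the two distance callables are placeholders for whatever LS and QDTW implementations are used.

```python
from math import inf


def cascade_match(query, reference, positions, ls_dist, qdtw_dist, threshold):
    """Run the cheap LS distance at every position; run QDTW only below the threshold.

    Returns (best LS distance, best QDTW distance, number of QDTW calls).
    Keeping the minimum over positions is an assumption of this sketch.
    """
    best_ls, best_qdtw, qdtw_calls = inf, inf, 0
    for pos in positions:
        window = reference[pos:pos + len(query)]
        if len(window) < len(query):              # window runs past the end
            continue
        d_ls = ls_dist(query, window)
        best_ls = min(best_ls, d_ls)
        if d_ls < threshold:                      # preclassification gate
            qdtw_calls += 1
            best_qdtw = min(best_qdtw, qdtw_dist(query, window))
    return best_ls, best_qdtw, qdtw_calls


def abs_sum(a, b):
    """Stand-in distance used here for both stages, for demonstration only."""
    return sum(abs(x - y) for x, y in zip(a, b))


query = [1, 2, 3]
reference = [9, 9, 9, 1, 2, 3, 9]
for threshold in (5, 9, 100):
    print(threshold, cascade_match(query, reference, range(5),
                                   abs_sum, abs_sum, threshold))
```

Raising the threshold in this toy run increases the number of QDTW calls (1, 2, and 5 for thresholds 5, 9, and 100), which mirrors the trade-off described in the 3rd experiment: more candidates reach the expensive matcher, so processing time grows.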
As the last experiment, we compared the performance when combining the matching distances by the LS and QDTW algorithms. Since the matching distance by the LS algorithm was already calculated for preclassification and the processing time of score fusion such as the MIN, MAX, PRODUCT, and SUM rules is almost 0 ms, the final processing time is not increased by combining the two matching distances. Tables 6 and 7 show the results of the fusion of the two matching distances. Based on the above results of Tables 4 and 5, the abs function-based LS algorithm was used for the 2009 MIR-QbSH corpus database, and the arctan function-based LS algorithm was used for the AFA MIDI 100 database.

Tables 8 and 9 show the performance comparisons of the proposed method and others with the 2009 MIR-QbSH corpus database and the AFA MIDI 100 database, respectively. As shown in Table 8, the Top 10 and Top 20 rates of the proposed method are a little lower than those of QDTW and of QDTW with preclassification by LS (not combining the two matching distances) when using the 2009 MIR-QbSH corpus database. However, except for this case, the accuracies of the proposed method are higher than those of the other methods in all cases, as shown in Tables 8 and 9. In most QbSH systems, the accuracy is evaluated based on the MRR of (5) and the Top 1 rate. So, we can confirm that the matching accuracy of the proposed method was enhanced compared to the others, although the processing time of our algorithm was reduced by more than 30% compared to that of QDTW. Although LS has the lowest processing time among them, it could not be used as a single classifier because of its poor matching accuracy.
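The four fusion rules of Section 2.5 and the accuracy measures used in this section (the MRR of (5) and the Top-k rates) can be sketched as follows. The function names are hypothetical, and the sketch assumes that the LS and QDTW distances are already on comparable scales, which the text does not state.

```python
# Simple score level fusion rules for two matching distances.
FUSION_RULES = {
    "MIN": min,
    "MAX": max,
    "PRODUCT": lambda a, b: a * b,
    "SUM": lambda a, b: a + b,
}


def fuse(ls_distance, qdtw_distance, rule="MIN"):
    """Combine the LS and QDTW distances into one final matching score."""
    return FUSION_RULES[rule](ls_distance, qdtw_distance)


def rank_of(correct, fused):
    """Rank (1 = best) of the correct song; `fused` maps song -> fused distance."""
    ordered = sorted(fused, key=fused.get)        # smaller distance ranks higher
    return ordered.index(correct) + 1


def mean_reciprocal_rank(ranks):
    """MRR of (5): the average of 1 / rank_i over all K queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def top_k_rate(ranks, k):
    """Percentage of queries whose correct song is within rank k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)
```

As a usage example, fusing the distances `{"a": (0.9, 0.4), "b": (0.3, 0.7), "c": (0.5, 0.5)}` with the MIN rule gives final scores 0.4, 0.3, and 0.5, so song "b" ranks first and song "a" second. For ranks `[1, 2, 4]`, the MRR is (1 + 1/2 + 1/4) / 3.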

Table 4: The performance of the methods which combine the LS and QDTW algorithms with PV files of the 2009 MIR-QbSH corpus database. Columns: method; processing time (ms); accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: QDTW without preclassification by LS; preclassification by LS with LS (abs function) + QDTW, LS (log function) + QDTW, and LS (arctan function) + QDTW.

Table 5: The performance of the methods which combine the LS and QDTW algorithms with the AFA MIDI 100 database. Columns: method; processing time (s); accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: QDTW without preclassification by LS; preclassification by LS with LS (abs function) + QDTW, LS (log function) + QDTW, and LS (arctan function) + QDTW.

Figure 4: The relationship between the matching accuracy and processing time with the 2009 MIR-QbSH corpus database in case of using the following: (a) abs function of (4); (b) log function of (4); (c) arctan function of (4). Each panel plots the processing time (ms) and the MRR of the proposed method against the preclassification threshold by LS, with the selected threshold marked.

Figure 5: The relationship between the matching accuracy and processing time with the AFA MIDI 100 database in case of using the following: (a) abs function of (4); (b) log function of (4); (c) arctan function of (4). Each panel plots the processing time (s) and the MRR of the proposed method against the preclassification threshold by LS, with the selected threshold marked.

Table 6: The results of fusion of the two matching distances with the 2009 MIR-QbSH corpus database. Columns: fusion method; accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: MIN (proposed method); MAX; PRODUCT; SUM.

Table 7: The results of fusion of the two matching distances with the AFA MIDI 100 database. Columns: fusion method; accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: MIN (proposed method); MAX; PRODUCT; SUM.

4. Conclusions

In QbSH systems, DTW is typically adopted as a matcher. However, this is computationally expensive, and a reduction in processing time is required for real-time QbSH systems. To overcome this problem, in this paper we proposed a fast QbSH system that combines the LS algorithm and the QDTW algorithm. The experimental results showed that the proposed method enhanced the matching accuracy and reduced the processing time compared to the result when the QDTW algorithm was used as a single classifier.

As future work, we will compare the performance of our proposed method with other methods for a larger database on various platforms including mobile devices.

Table 8: Performance comparison of the proposed method with other single classifiers with the 2009 MIR-QbSH corpus database. Columns: method; processing time per query (ms); accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: LS (abs function); QDTW; preclassification by LS (abs function) + QDTW (not combining two matching distances); preclassification by LS (abs function) + QDTW (combining two matching distances) (proposed method).

Table 9: Performance comparison of the proposed method with other single classifiers with the AFA MIDI 100 database. Columns: method; processing time per query (ms); accuracy as MRR, Top 1 (%), Top 10 (%), and Top 20 (%). Rows: LS (arctan function); QDTW (processing time 9,); preclassification by LS (arctan function) + QDTW (not combining two matching distances) (processing time 6,); preclassification by LS (abs function) + QDTW (combining two matching distances) (proposed method) (processing time 6,).

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-212R1A1A238666) and in part by the Public Welfare and Safety Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF ).

References

[1] R. Typke, F. Wiering, and R. C. Veltkamp, "A survey of music information retrieval systems," in Proceedings of the International Conference on Music Information Retrieval, September 2005.
[2] A. Ghias, J. Logan, D. Chamberlin, and B. C. Smith, "Query by humming: musical information retrieval in an audio database," in Proceedings of ACM International Conference on Multimedia (MULTIMEDIA '95), November 1995.
[3] R. J. McNab, L. A. Smith, I. H. Witten, C. L. Henderson, and S. J. Cunningham, "Towards the digital music library: tune retrieval from acoustic input," in Proceedings of the 1st ACM International Conference on Digital Libraries, March.
[4] R. J. McNab, L. A. Smith, D. Bainbridge, and I. H. Witten, "The New Zealand digital library melody index," D-Lib Magazine, vol. 3, no. 5, pp. 4-15, 1997.
[5] L. Prechelt and R. Typke, "An interface for melody input," ACM Transactions on Computer-Human Interaction, vol. 8, no. 2, 2001.
[6] A. Kornstadt, "Themefinder: a web-based melodic search tool," Computing in Musicology, vol. 11, 1998.
[7] Themefinder.
[8] S. Blackburn and D. DeRoure, "A tool for content based navigation of music," in Proceedings of ACM International Conference on Multimedia.
[9] M. Ryynänen and A. Klapuri, "Query by humming of MIDI and audio using locality sensitive hashing," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, April 2008.
[10] R. Typke, P. Giannopoulos, R. C. Veltkamp, F. Wiering, and R. V. Oostrum, "Using transportation distances for measuring melodic similarity," in Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR '03), Baltimore, Md, USA, October 2003.
[11] J.-S. R. Jang and H.-R. Lee, "Hierarchical filtering method for content-based music retrieval via acoustic input," in Proceedings of the ACM International Conference on Multimedia, pp. 401-410, October 2001.
[12] J. Salamon and M. Rohrmeier, "A quantitative evaluation of a two stage retrieval approach for a melodic query by example system," in Proceedings of the 10th International Society for Music Information Retrieval Conference, Kobe, Japan, October 2009.
[13] L. Wang, S. Huang, S. Hu, J. Liang, and B. Xu, "An effective and efficient method for query by humming system based on multi-similarity measurement fusion," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '08), Shanghai, China, July 2008.
[14] X. Wu, M. Li, J. Liu, J. Yang, and Y. Yan, "A top-down approach to melody match in pitch contour for query by humming," in Proceedings of the International Symposium of Chinese Spoken Language Processing, 2006.
[15] J.-S. R. Jang and M.-Y. Gao, "A query-by-singing system based on dynamic programming," in Proceedings of the International Workshop on Intelligent Systems Resolutions, 2000.
[16] G. P. Nam, T. T. T. Luong, H. H. Nam, K. R. Park, and S. J. Park, "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP Journal on Advances in Signal Processing, vol. 2011, article 21, 11 pages, 2011.
[17] G. P. Nam, K. R. Park, S.-J. Park, S.-P. Lee, and M.-Y. Kim, "A new query-by-humming system based on the score level fusion of two classifiers," Communication Systems, vol. 25, no. 6, 2012.
[18] J. Song, S. Y. Bae, and K. Yoon, "Mid-level music melody representation of polyphonic audio for query-by-humming system," in Proceedings of the International Symposium on Music Information Retrieval, Paris, France, October 2002.
[19] P. Krishnamoorthy, R. Bhatt, A. Srinivas, and S. Kumar, "Query by humming system for embedded platforms," in Proceedings of the Annual IEEE India Conference, pp. 1-5, December 2010.
[20] J. Li, J. Han, Z. Shi, and J. Li, "An efficient approach to humming transcription for query-by-humming system," in Proceedings of the 3rd International Congress on Image and Signal Processing (CISP '10), IEEE, Yantai, China, October 2010.
[21] K. Kim, K. R. Park, S.-J. Park, S.-P. Lee, and M. Y. Kim, "Robust query-by-singing/humming system against background noise environments," IEEE Transactions on Consumer Electronics, vol. 57, no. 2, 2011.
[22] C.-C. Wang, J.-S. R. Jang, and W. Wang, "An improved query by singing/humming system using melody and lyrics information," in Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR '10), pp. 45-50, August 2010.
[23] E. J. Keogh and M. J. Pazzani, "Scaling up dynamic time warping to massive datasets," in Principles of Data Mining and Knowledge Discovery: Third European Conference, PKDD '99, Prague, Czech Republic, September 15-18, 1999, Proceedings, vol. 1704 of Lecture Notes in Computer Science, pp. 1-11, Springer, Berlin, Germany, 1999.
[24] A. M. Youssef, T. K. Abdel-Galil, E. F. El-Saadany, and M. M. A. Salama, "Disturbance classification utilizing dynamic time warping classifier," IEEE Transactions on Power Delivery, vol. 19, no. 1, 2004.
