Music Information Retrieval Schemes in Peer-to-Peer Environments


Journal of Computer Science 1 (3): 369-375, 2005
ISSN 1549-3636
Science Publications, 2005

Chaokun Wang, Jianzhong Li and Shengfei Shi
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, People's Republic of China

Abstract: Peer-to-peer systems are useful tools for music applications. Existing peer-to-peer systems support the storing and sharing of music data; however, they cannot effectively and efficiently support content-based information retrieval or the cooperation of musicians. This study focuses on methods of content-based music information retrieval in peer-to-peer environments. Four music information retrieval schemes are evaluated in detail on communication cost, retrieval time, update complexity and robustness. The peers-peers-coordinator scheme is found to be the best one from theoretical analysis and simulated experiments. Also, an algorithm is designed for an implementation of the peers-peers-coordinator scheme, and a simple but effective method is put forward to filter out replicas in the final results. The results of simulated experiments show the efficiency of the algorithm.

Key words: Music Information Retrieval, Peer-to-Peer Systems, CBP2PMIR

INTRODUCTION

Peer-to-Peer (P2P) systems have been greatly successful in facilitating the storage and exchange of huge volumes of data owing to their scalability, fault tolerance and self-organizing nature. In some P2P-based data systems, such as Kazaa, music data commonly makes up a large part of the contents. This calls for techniques that let users store music data safely and retrieve it quickly and accurately.

Example 1: Imagine an Internet-scale P2P music system that consists of peers ranging from desktops behind modem lines to powerful servers connected to the Internet through high-bandwidth lines. Each peer shares some resources with other peers, such as music data and a portion of its storage, for the common good of everyone. Peers in the music system can interchange music data and other information.
A user, for example, can store and publish a piece of music in the system without specifying its exact location. He or she can also search the whole system for a piece of music, for example, the piece titled My Sun. This example shows the role of P2P in the storing and sharing of music data.

Example 2: Suppose that you are sitting at your computer, connected to a P2P music system. You suddenly hear a song. It is the best song you have heard for a long time, but you missed the announcement and don't recognize the artist. Wouldn't it be nice if you could hum a melody you remembered, e.g. the melody shown in Fig. 1, or push a few keys on your keyboard, and a few seconds later the P2P system would tell you the name of the artist and the title of the music you're listening to? Perhaps the system even sends this piece of music to your computer. This example shows that content-based information retrieval in P2P systems is very important for music applications.

[Fig. 1: A Sample Melody]

Example 3: In a music community, the computers of composers, conductors, violinists, pianists, organists and so on are connected to establish a peer-to-peer system. In this P2P system, many tasks, such as the creation of a huge opera, are carried out cooperatively by several musicians via the P2P network. Each musician can compose his part, communicate with the others and then send his work to a certain musician who is responsible for combining all the parts into the final opera. When a composer is working on a symphony, he can use an organist's computer to tune various musical instruments, a pianist's computer to synthesize the symphony and his own computer to test the symphony. This example shows the role of P2P in the cooperation of musicians.

From the examples above, it can be seen that P2P systems are useful tools for music applications, and that the P2P systems used in music applications must support the storing and sharing of music data, content-based information retrieval and the cooperation of musicians.
Although existing P2P systems support the storing and sharing of music data, they cannot effectively and efficiently support content-based information retrieval or the cooperation of musicians. Thus, existing P2P systems are not suitable for music applications. The purpose of our research is to create a P2P music system that has the three abilities above and supports most music applications. In the rest of the study, CBP2PMIR is used to denote content-based music information retrieval in P2P environments.

There has been a considerable amount of work on content-based information retrieval for music data in non-P2P environments. A variety of results were obtained in symbolic music information retrieval (MIR), which is music information retrieval on music events, such as MIDI data [1, 2, 5, 6]. Other works focused on acoustic MIR, which is music information retrieval on digital music signals, such as the WAV format [3, 4, 7]. Most of the previous research on content-based music information retrieval is not related to P2P environments and cannot be directly applied to P2P systems.

Many P2P systems have been developed. They can be divided into three classes. The first is centralized, such as Napster. The second is decentralized, such as Gnutella. The third is hybrid, combining the advantages of centralized and decentralized systems, such as Kazaa and Morpheus. All these P2P systems are identifier or keyword based rather than content based. They can hardly process music queries such as a hummed melody.

Four schemes of CBP2PMIR were proposed in our previous conference paper [9], which also presents the query processing methods of each one. The first two schemes are centralized, the third is distributed and the last is hybrid. The four schemes have some common features. First, music resources are dispersed over the whole system. Second, music data is transferred directly from one peer to another. Finally, the systems behave in an ad-hoc manner, that is, any peer can move into and out of the P2P system freely. The performance of the schemes differs in terms of communication cost, robustness and information retrieval time, and it should be evaluated so that the best scheme can be selected.
Assumptions and Parameters: The key problem of CBP2PMIR is an optimization problem, that is, developing a music retrieval algorithm so that COMM and TIME are minimized under the constraint $RTN \ge n$, where COMM is the communication cost, TIME is the time of processing a music query Q and RTN is the number of music files in the query result. Parameters used in the following evaluation are defined in Table 1. In the rest of the study, R and N denote the set of nonnegative real numbers and the set of positive integers respectively.

Communication Cost: In each scheme, the communication cost consists of four parts. The first part is the music feature Qf transferred from the querying peer to the coordinator or other peers. The second is the results, $\{(Pd_i, Md_{ij}, Cf(Qf, Mf_{ij}))\}$, transferred from the coordinator or other peers to the querying peer, where $Pd_i$ is the network identifier of the i-th peer, $Md_{ij}$ is the identifier of the j-th music file in the i-th peer, $Mf_{ij}$ is the feature of the j-th music file in the i-th peer and Cf is a music feature-matching function, that is, $Cf: Mf \times Mf \to RANK$, where Mf is the set of music features and $RANK \subseteq [0, 1]$ is a set of real numbers. The third is the set of the user's downloading requests, {(the network identifier of the querying peer, the network identifier of a destination peer, the identifier of a selected music file)}, transferred from the querying peer to destination peers. The last is the music files downloaded.

From the description of the querying processes in the four music information retrieval schemes in [9] and the parameters in Table 1, we can derive the communication cost of each scheme. Owing to the limit on paper length, we omit the derivation here and only give the results in Table 2. Please note that $t' \le t$ and $(w + w^2 + \dots + w^d) \le W$ when the number of peers in a system is sufficiently large.

Lemma 1: The COMM of PsC is less than that of PsC+.
Proof: The result follows from Table 2.
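The cost formulas of Table 2 can be checked numerically. The following is a minimal sketch, not the authors' simulator: the byte sizes spd, sqf, smd and r are the values used later in the experiments, while the average file size m and all the counts passed to `comm_costs` are hypothetical values chosen only for illustration, as are the names `Params`, `flood_size` and `comm_costs`.

```python
from dataclasses import dataclass


@dataclass
class Params:
    spd: int = 4          # size of a peer id, bytes (value from the experiments)
    sqf: int = 8          # size of the query feature, bytes
    smd: int = 4          # size of a music file id, bytes
    r: int = 8            # size of a matching value, bytes
    m: int = 4_000_000    # average music file size, bytes (hypothetical)

    @property
    def q(self) -> int:       # query message: 2*spd + sqf
        return 2 * self.spd + self.sqf

    @property
    def re(self) -> int:      # one result entry: 2*spd + smd + r
        return 2 * self.spd + self.smd + self.r

    @property
    def q_dl(self) -> int:    # one download request q': 2*spd + smd
        return 2 * self.spd + self.smd


def flood_size(w: int, d: int) -> int:
    """w + w^2 + ... + w^d: peers reached at width w and depth d."""
    return sum(w ** k for k in range(1, d + 1))


def comm_costs(p, W, W_sel, w, d, t, t1, t2, n):
    """COMM of each scheme; W_sel, t1, t2 stand for W', t', t''."""
    download = n * p.q_dl + n * p.m          # parts 3 and 4, same in all schemes
    return {
        "PsC": p.q + t * p.re + download,
        "PsC+": W * p.q + t * p.re + download,
        "PsPs": flood_size(w, d) * p.q + t1 * p.re + download,
        "PsPsC": W_sel * p.q + t2 * p.re + download,
    }


# Hypothetical scenario: 10,000 peers, flooding at width 4 and depth 3.
costs = comm_costs(Params(), W=10_000, W_sel=50, w=4, d=3,
                   t=1_000, t1=200, t2=200, n=5)
```

With these made-up counts the ordering PsPsC, PsPs, PsC, PsC+ (cheapest first) agrees with the lemmas; other counts may of course violate their preconditions.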
Table 1: Definitions of Parameters

Parameter | Definition | Scope of Application
q = 2spd + sqf | spd is the size, in bytes, of a peer id; sqf is the size, in bytes, of the music feature extracted from a user's query Q | all schemes
re = 2spd + smd + r | spd is the size, in bytes, of a peer id; smd is the size, in bytes, of a music file's id; r is the size, in bytes, of the matching value between this file and Q | all schemes
t | Number of music files satisfying Q | PsC, PsC+
n | Number of music files a user selects | all schemes
q' = 2spd + smd | spd is the size, in bytes, of a peer id; smd is the size, in bytes, of a music file's id | all schemes
m | Average size, in bytes, of music files | all schemes
W | Number of peers in the P2P system | all schemes
w | Width of the system | PsPs, PsPsC
d | Depth of the system | PsPs, PsPsC
t' | Number of files satisfying Q | PsPs
W' | Number of peers satisfying Q | PsPsC
t'' | Number of files satisfying Q | PsPsC

Table 2: Communication Cost of Each Scheme

Part | PsC | PsC+ | PsPs | PsPsC
1 | q | W q | (w + w^2 + ... + w^d) q | W' q
2 | t re | t re | t' re | t'' re
3 | n q' | n q' | n q' | n q'
4 | n m | n m | n m | n m
COMM | q + t re + n q' + n m | W q + t re + n q' + n m | (w + w^2 + ... + w^d) q + t' re + n q' + n m | W' q + t'' re + n q' + n m

Lemma 2: When the number of peers in a system is sufficiently large, the COMM of PsPs is less than that of PsC.
Proof: Since the number of peers is sufficiently large ($W \to \infty$), $t \to \infty$. Because w, d, q and re are all constants, $w + w^2 + \dots + w^d - 1$ is also a constant and there is an upper bound for $t'$. Thus $[(w + w^2 + \dots + w^d - 1)\,q]/(re \cdot t) \to 0$, $1 - [(w + w^2 + \dots + w^d - 1)\,q]/(re \cdot t) \to 1$ and furthermore $t'/t \le 1 - [(w + w^2 + \dots + w^d - 1)\,q]/(re \cdot t)$. Finally, we have $(w + w^2 + \dots + w^d)\,q + t'\,re + n\,q' + n\,m \le t\,re + q + n\,q' + n\,m$, that is, the COMM of PsPs is less than that of PsC.

Lemma 3: The COMM of PsPsC is not more than that of PsPs.
Proof: It only needs to be shown that when the query results are the same in the PsPsC and PsPs schemes and the user selects all files from the merged results, the COMM of PsPsC is not more than that of PsPs. Since the user selects all files from the merged results in both schemes, the number of music files satisfying the user's query in PsPs is equal to that in PsPsC when the final results in the two schemes are the same, i.e. $t' = t''$. According to the definition of the PsPsC scheme, Cas, a data structure in the coordinator for accelerating the information retrieval process, accelerates music information retrieval by restricting the retrieval to some but not all peers, namely those holding more music files that match the user's query. Let Ht be the ratio of the number of files satisfying the user's query to the number of peers involved in processing the query. Ht in the PsPsC scheme is higher than Ht in the PsPs scheme, which means $t''/W' \ge t'/(w + w^2 + \dots + w^d)$. Thus $W' \le (w + w^2 + \dots + w^d)$ and $W'\,q + t''\,re + n\,q' + n\,m \le (w + w^2 + \dots + w^d)\,q + t'\,re + n\,q' + n\,m$, that is, the COMM of PsPsC is not more than that of PsPs.

Theorem 1: When the number of peers in the system is sufficiently large, the COMM of PsPsC is the least.
Proof: The result follows from Lemmas 1, 2 and 3.

Retrieval Time: The retrieval time of each scheme is composed of three parts. The first part is the time for computation, i.e. the time used for feature matching. The second part is the time used to merge the local results from the peers and to sort the final results. The last part is the time used for communication, which has been discussed above.
Thus we will consider the time for computation, merging and sorting here. The retrieval time of each scheme is listed in Table 3, where $A_i = \{(Mf_{ij}, Cf(Qf, Mf_{ij})) \mid j \in N\}$ is the set of features of music files stored on the i-th peer, $A = \{(Mf_{ij}, Cf(Qf, Mf_{ij})) \mid i, j \in N\} = \bigcup_{i \in N} A_i$ is the set of features of all music files stored in the whole P2P system, $T(A_i)$ is the time for computing and sorting $Cf(Qf, Mf_{ij})$ of $A_i$, $T(A)$ is the time for computing and sorting $Cf(Qf, Mf_{ij})$ of A, $M\{t\}$ is the time for merging the result of t files, and $M\{t'\}$ and $M\{t''\}$ are similar to $M\{t\}$.

Table 3: Retrieval Time of Each Scheme

Part | PsC | PsC+ | PsPs | PsPsC
Computation time | T(A) | max{T(A_i)} | max{T(A_i)} | max{T(A_i)}
Merge time | - | M{t} | M{t'} | M{t''}
TIME | T(A) | max{T(A_i)} + M{t} | max{T(A_i)} + M{t'} | max{T(A_i)} + M{t''}
Remark | | i = 1, ..., W | i = 1, ..., (w + w^2 + ... + w^d) | i = 1, ..., W'

It is obvious that the TIME of PsC is more than the others. Because $W' \le W$ and $M\{t''\} \le M\{t\}$, the TIME of PsPsC is less than the TIME of PsC+. Because $W' \le (w + w^2 + \dots + w^d)$ when $t'' = t'$ from Lemma 3, the time for computing and merging in PsPs is more than $\max\{T(A_i)\} + M\{t''\}$, that is, the TIME of PsPsC is less than the TIME of PsPs. Thus the TIME of PsPsC is the least.

Update Complexity and Robustness: Updates in a P2P system include music file updates on a peer and peers moving into or out of the P2P system.

In the PsC scheme, a peer sends its network identifier and the features of its shared music pieces to the coordinator when it moves into the P2P system. The information of the peer should be deleted from the coordinator when it moves out of the P2P system. The information of a document should be added to or deleted from the coordinator when the document is added to or deleted from the shared music set of a peer.

In the PsC+ scheme, the network identifier of a peer is saved in the coordinator when it moves into the P2P system. The information should be deleted from the coordinator when the peer moves out of the P2P system. In the PsPs scheme, the network identifier of a peer is saved in its neighbor peers when the peer moves into the P2P system. When the peer moves out of the P2P system, its information should be deleted from its neighbor peers. A document update on a peer in these two schemes has no effect on other peers.

In the PsPsC scheme, the network identifier of a peer is saved in the coordinator and its neighbor peers when it moves into a P2P system. Also, the statistic of the shared music files of the peer is saved in the coordinator. When the peer moves out of the P2P system, the information is deleted from the coordinator and its neighbor peers. After a piece of music is added to or deleted from the shared music set of a peer, the corresponding statistic saved in the coordinator should be updated if the difference between the new statistic and the old one is more than a value specified by the system.

In conclusion, when a peer moves into or out of a P2P system, the update cost of the PsC+ scheme is the smallest, and the update cost of the PsPs scheme is smaller than the update costs of the PsPsC and PsC schemes. When music files are updated on a peer, the update costs of the PsC+ and PsPs schemes are the smallest, the update cost of the PsC scheme is the largest and the update cost of the PsPsC scheme lies between them.

In the PsC and PsC+ schemes, the coordinator is easily overloaded and becomes the bottleneck of the whole system. P2P systems constructed by these schemes have weak robustness. For example, if the coordinator suffers a denial-of-service attack from a malicious peer, the system can fail. In contrast, the PsPs scheme has strong robustness because it is fully distributed. However, peers in this scheme send a large number of messages, so the communication cost of the PsPs scheme is high. PsPsC is a hybrid system that takes the advantages of the PsC, PsC+ and PsPs schemes. It can continue working via neighbor peers when there is something wrong with the coordinator. The PsPsC scheme therefore has strong robustness.
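The threshold rule for statistic updates in the PsPsC scheme can be illustrated with a short sketch. This is one possible reading under stated assumptions, not the authors' code: the statistic is assumed to be a (mean, standard deviation) pair over one number per shared file, the difference is taken as the larger of the two component differences, and the class name `PeerStatistic` and the threshold value are hypothetical.

```python
import statistics


class PeerStatistic:
    """Tracks a peer's statistic and decides when to notify the coordinator."""

    def __init__(self, values, threshold):
        self.values = list(values)      # one number per shared file (non-empty)
        self.threshold = threshold      # system-specified tolerance (assumed)
        self.reported = self._current() # what the coordinator currently holds

    def _current(self):
        return (statistics.fmean(self.values), statistics.pstdev(self.values))

    def _maybe_report(self):
        mean, std = self._current()
        drift = max(abs(mean - self.reported[0]), abs(std - self.reported[1]))
        if drift > self.threshold:
            self.reported = (mean, std)  # here a message would go to the coordinator
            return True
        return False                     # change too small, no message sent

    def add_file(self, value):
        self.values.append(value)
        return self._maybe_report()

    def remove_file(self, value):
        self.values.remove(value)        # assumes at least one file remains
        return self._maybe_report()


peer = PeerStatistic([0.5, 0.5, 0.5], threshold=0.05)
small_change = peer.add_file(0.52)   # statistic barely moves
large_change = peer.add_file(1.0)    # mean shifts by about 0.1
```

Only the second addition crosses the tolerance, which is how the scheme keeps its file-update cost between those of PsC (every change reported) and PsC+ or PsPs (nothing reported).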
Performance Evaluation of Schemes: From the above comparison, the PsPsC scheme is better than the others in terms of communication cost, retrieval time and robustness, although its update performance is lower in some cases. Our CBP2PMIR system is therefore developed on the basis of this scheme.

In order to compare the performance of the four schemes, a simulator, which simulates P2P systems of 10,000 PIII/600 personal computers, is first created. Then, four sets of music files with sizes of 1,268,870, 3,169,806, 5,073,459 and 6,341,780 are respectively used. Each set is generated by all the peers, each of which randomly generates one of its subsets based on a different mean and standard deviation between 0 and 1 under the normal distribution. Finally, we run 15 queries on the four sets in the four music information retrieval schemes mentioned previously. The features of the queries are 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85 and 0.9. The parameters are spd = 4 bytes, sqf = 8 bytes, smd = 4 bytes and r = 8 bytes, so q = 16 bytes and re = 20 bytes.

The experimental results are shown in Table 4. In the table, each retrieval time is the average of the retrieval times of the 15 queries. The retrieval time of a query is the elapsed time from the submission of the query by a user to the delivery of the query result to the user. The retrieval time in the PsC scheme is very long because the comparison of all documents with the query and the sorting of results are performed in the coordinator. When the computation is dispersed over the peers, the retrieval time comes down, as in the PsC+ scheme. If there are indices in the coordinator, the retrieval time of the PsC and PsC+ schemes will be shorter. The retrieval time in the PsPs scheme is short, but part of the query result may be lost because some peers may not be searched during query processing. The time used in the PsPsC scheme is the smallest because the query is processed approximately and computation is performed in a distributed manner. If a user only needs an approximate answer, the PsPsC scheme is the best selection.
In this scheme, if the coordinator fails, the system can still run in the PsPs scheme, whose performance is still very high. If a user needs an exact answer, PsC+ is a good selection.

Table 4: Average Retrieval Time in Each Scheme in a Simulator (Seconds)

Number of docs | PsC | PsC+ | PsPs | PsPsC
1,268,870 | 19.1769 | 1.0280 | 0.0973 | 0.0256
3,169,806 | 47.9910 | 2.6715 | 0.1670 | 0.0319
5,073,459 | 78.3206 | 3.9243 | 0.2304 | 0.0461
6,341,780 | 99.1541 | 6.5202 | 0.2831 | 0.0526

Implementation of PsPsC Scheme: In a P2P environment, there are usually a lot of music files similar to a query submitted by a user. However, the user often wants only a portion of all the similar music files. Therefore, approximate query processing is very common for music information retrieval in P2P environments, which makes the PsPsC scheme very important. In this study, an implementation of the PsPsC scheme is also proposed. Two key problems are considered. The first problem is how to find the set of destination-like peers. In a P2P data system, a peer is called a destination-like peer of a query if the peer is likely to hold many music files similar to the query. The second problem is how to filter out the repeated music files. In the following discussion, PNIopt represents the set of destination-like peers, each element of which is the network identifier of a destination-like peer.

In music theory, an interval is the difference in pitch between the current note and the previous note. The unit of interval used in this study is the semitone. For example, the interval between "do" and "re" is 2 semitones, while the interval between "mi" and "fa" is 1 semitone. A sequence of intervals can be used to represent a melody. For example, the sequence of intervals of the melody in Fig. 1 is (5 1 2 -2 -1 -2 2). A sequence of intervals can also be used to represent a piece of music, because a monophonic melody can be extracted from a piece of music [8].

Definition 1: Let d be a positive integer. For a piece of music whose sequence of intervals is $Seq = (i_1\ i_2\ \dots\ i_j\ \dots\ i_{N_{sum}})$, $1 \le j \le N_{sum}$, $i_j$ is an interval value and $N_{sum}$ is the number of all intervals in Seq. Let $N_d$ be the number of intervals whose absolute values are not less than d. Then $Rat_d = N_d / N_{sum}$ is called the ratio of d interval of the piece of music.

Let d be 2; the ratio of d interval of the melody in Fig. 1 is then 5/7, or about 0.71. When d is given in a P2P system, $Rat_d$ can be abbreviated to Rat. For two pieces of music $m_1$ and $m_2$, $Rat_{m_1}$ and $Rat_{m_2}$ are respectively the ratios of d interval of $m_1$ and $m_2$. Let offset be a small positive number; $m_1$ is called similar to $m_2$ if $|Rat_{m_1} - Rat_{m_2}| < offset$, that is, $Rat_{m_1}$ is close enough to $Rat_{m_2}$.

Definition 2: Let d be a given positive integer. For a set of music files, if a variable $R_d$ is used to denote the ratio of d interval of each piece of music in the set, (a, b) is called the feature of d interval of this set, where (1) a is the mean of $R_d$; (2) b is the standard deviation of $R_d$. It is obvious that $0 \le a, b \le 1$.

Definition 3: Let d be a given positive integer. $S = \{(a, b) \mid 0 \le a, b \le 1\}$ is called the space of d interval features, where $(a, b) \in S$ if and only if (a, b) is the feature of d interval of a certain music set.

Definition 4: Given $0 \le a_m, b_n \le 1$ (m, n = 0, 1, 2, ..., v), $a_0 = 0 < a_1 < a_2 < \dots < a_{v-1} < a_v = 1$ and $b_0 = 0 < b_1 < b_2 < \dots < b_{v-1} < b_v = 1$, S is partitioned by these points into $v^2$ subspaces that are denoted by $S_{ij}$, where
(1) $S_{ij} \ne \emptyset$, $1 \le i, j \le v$, with
$S_{ij} = \{(a, b) \mid a_{i-1} \le a < a_i,\ b_{j-1} \le b < b_j\}$, $1 \le i, j \le (v - 1)$;
$S_{ij} = \{(a, b) \mid a_{i-1} \le a \le a_i,\ b_{j-1} \le b < b_j\}$, $i = v$, $1 \le j \le (v - 1)$;
$S_{ij} = \{(a, b) \mid a_{i-1} \le a < a_i,\ b_{j-1} \le b \le b_j\}$, $1 \le i \le (v - 1)$, $j = v$;
$S_{ij} = \{(a, b) \mid a_{i-1} \le a \le a_i,\ b_{j-1} \le b \le b_j\}$, $i = j = v$;
(2) $S_{i_1 j_1} \cap S_{i_2 j_2} = \emptyset$, $1 \le i_1, i_2, j_1, j_2 \le v$, $i_1 \ne i_2$ or $j_1 \ne j_2$;
(3) $\bigcup_{i, j = 1}^{v} S_{ij} = S$.
$\{S_{ij} \mid i, j = 1, \dots, v\}$ is called a partition of S and $(a_0, a_1, \dots, a_v, b_0, b_1, \dots, b_v)$ is called the partitioning sequence.

Definition 5: $S_{(k,h)}$ is called the expanding set of (k, h) if (k, h) is the feature of d interval of a certain music set and $S_{(k,h)} = \{(a, b) \mid a \in [\max(k - \alpha, 0), \min(k + \alpha, 1)],\ b \in [\max(h - \beta, 0), \min(h + \beta, 1)]\} \subseteq S$, where $\alpha, \beta \in [0, 1]$. Then α is called the expanding factor on mean and β is called the expanding factor on standard deviation.

Definition 6: Suppose that $E \subseteq S$, $\{S_{ij} \mid i, j = 1, \dots, v\}$ is a partition of S and $G \subseteq \{S_{ij} \mid i, j = 1, \dots, v\}$. G is called the minimal overlay of E if (1) $E \subseteq \bigcup_{S' \in G} S'$; (2) $\forall G' \subseteq G$, $E \subseteq \bigcup_{S' \in G'} S' \Rightarrow G' = G$.

Definition 7: $S = \{(a, b) \mid 0 \le a, b \le 1\}$ is the space of d interval features and $\{S_{mn} \mid m, n = 1, \dots, v\}$ is a partition of S. In the PsPsC scheme, the mapping $f: \{Pd_i\} \to \{S_{mn}\}$ is called a partition mapping if $f(Pd_i) = S_{mn} \Leftrightarrow (a_i, b_i) \in S_{mn}$, where $(a_i, b_i)$ is the feature of d interval of the shared music set on the i-th peer.

Please note that the shared music files on a peer may have more than one feature of d interval. For example, the shared music files on a peer can be clustered into several sets and then one feature of d interval can be extracted from each set. To simplify the discussion of the implementation, a peer only corresponds to one feature of d interval in this section, that is, a peer only corresponds to one set of music files. The approach can, however, be extended to other environments. In the rest of this study, $ff(S_{mn})$ is used to denote the set $\{Pd_i \mid f(Pd_i) = S_{mn}, i \in N\}$. An implementation of Cas, the data structure in the coordinator, is $\{ff(S_{mn}) \mid m, n = 1, \dots, v\}$.
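Definitions 1 to 7 can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, the population standard deviation is assumed for b, the b-range of the expanding set is assumed to be centred on h, the partitioning sequence is taken to be uniform with v = 5, and the three peers with their features are invented.

```python
from bisect import bisect_right
from statistics import fmean, pstdev


def interval_ratio(intervals, d=2):
    """Rat_d of Definition 1: share of intervals with absolute value >= d."""
    return sum(1 for x in intervals if abs(x) >= d) / len(intervals)


def feature(ratios):
    """(a, b) of Definition 2: mean and standard deviation of the ratios."""
    return fmean(ratios), pstdev(ratios)


def cell_index(value, cuts):
    """1-based index i with cuts[i-1] <= value < cuts[i]; the last cell also
    contains its right end point, as in Definition 4."""
    return min(bisect_right(cuts, value), len(cuts) - 1)


def minimal_overlay(k, h, alpha, beta, a_cuts, b_cuts):
    """Cells (m, n) that intersect the expanding set of (k, h)."""
    lo_a, hi_a = max(k - alpha, 0.0), min(k + alpha, 1.0)
    lo_b, hi_b = max(h - beta, 0.0), min(h + beta, 1.0)
    rows = range(cell_index(lo_a, a_cuts), cell_index(hi_a, a_cuts) + 1)
    cols = range(cell_index(lo_b, b_cuts), cell_index(hi_b, b_cuts) + 1)
    return {(m, n) for m in rows for n in cols}


def build_cas(peer_features, a_cuts, b_cuts):
    """Cas as a dict: cell (m, n) -> set of peer ids mapped to that cell."""
    cas = {}
    for pid, (a, b) in peer_features.items():
        cell = (cell_index(a, a_cuts), cell_index(b, b_cuts))
        cas.setdefault(cell, set()).add(pid)
    return cas


def destination_like_peers(cas, cells):
    """Union of the peer sets stored for the given cells."""
    return set().union(*(cas.get(c, set()) for c in cells))


cuts = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]                  # uniform, v = 5 (assumed)
rat_q = interval_ratio([5, 1, 2, -2, -1, -2, 2], d=2)  # melody of Fig. 1
peers = {"p1": (0.70, 0.10), "p2": (0.30, 0.10), "p3": (0.75, 0.25)}
cas = build_cas(peers, cuts, cuts)
cells = minimal_overlay(rat_q, 0.0, alpha=0.01, beta=0.15, a_cuts=cuts, b_cuts=cuts)
pni_opt = destination_like_peers(cas, cells)
```

Because the cells are pairwise disjoint and non-empty, the minimal overlay of a rectangle is exactly the set of cells it touches, which is why two index lookups per axis are enough. In this toy setup only peer p1 falls in a touched cell.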
The Algorithm to Find PNIopt: Given the parameter d, the partitioning sequence and the expanding factors α and β, the algorithm to find PNIopt can be described as follows.

Input: a music query Q, which is a piece of music, a song sung, or a melody hummed.

Output: the set of destination-like peers PNIopt.

Steps:
* Extract the sequence of intervals of Q;
* Compute $Rat_Q$, the ratio of d interval of Q. Please note that $Rat_Q$ is considered as Qf in this implementation of the PsPsC scheme, that is, $Pf(Q) = Qf = Rat_Q$, where Pf denotes the operation of computing the ratio of d interval of Q from the sequence of intervals of Q;
* Compute $Qps = Stat(\{Q\}) = (a_Q, b_Q)$, where Stat( ) is the operation of computing the feature of d interval of a set of music files. Obviously $a_Q = Rat_Q$ and $b_Q = 0$;
* Compute $S_{(a_Q, b_Q)}$, the expanding set of Qps;
* Compute G, the minimal overlay of $S_{(a_Q, b_Q)}$, $G \subseteq \{S_{mn} \mid m, n = 1, \dots, v\}$;
* Compute $PNIopt = \bigcup_{S' \in G} ff(S') = \bigcup_{S' \in G} \{Pd_i \mid f(Pd_i) = S', i \in N\} = \{Pd_i \mid f(Pd_i) \in G, i \in N\}$;
* Return PNIopt.

PNIopt can be refined by tuning the partitioning sequence, v, α and β.

Filtering Method: In the implementation of the PsPsC scheme, the coordinator sends PNIopt to the querying peer. The querying peer sends $Rat_Q$, the feature of the music query, to these destination-like peers. Each destination-like peer, say the k-th peer, returns the local result, that is, $Pd_k$ and $\{(Mf_{kj}, Cf(Rat_Q, Mf_{kj})) \mid Cf(Rat_Q, Mf_{kj}) < offset, j \in N\}$, to the querying peer, where offset is the matching condition given by the user. The querying peer receives all results, sorts them and presents them to the user.

Repeated copies of one version of a piece of music stored on different peers may be returned. Thus, it is important to filter out the redundant music files in the result shown to the user. In this study a simple and effective method is presented to filter out the repetitions in the results. Music files with the same content may have different names on different peers, but they have the same sizes and usually the same timestamps (the date and time attributes of files). So the sizes or the timestamps of music files with equal matching values can be used to judge whether these files are repeated. Usually the file size is enough for the job.
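The filtering idea above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Hit` record and the function name `merge_and_filter` are hypothetical, each local result entry is assumed to carry the file size alongside the matching value, and a smaller matching value is assumed to mean a closer match.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    peer_id: str      # network identifier of the peer holding the file
    name: str         # file name, which may differ between copies
    size: int         # file size in bytes
    score: float      # matching value against the query feature


def merge_and_filter(local_results):
    """Merge per-peer result lists, sort them by matching value and drop
    entries whose (matching value, size) pair has already been seen."""
    merged = sorted(
        (hit for hits in local_results for hit in hits),
        key=lambda hit: hit.score,
    )
    seen = set()
    unique = []
    for hit in merged:
        key = (hit.score, hit.size)
        if key in seen:          # same matching value and same size: a replica
            continue
        seen.add(key)
        unique.append(hit)
    return unique


results = merge_and_filter([
    [Hit("p1", "my_sun.mp3", 4_100_000, 0.001)],
    [Hit("p7", "MySun(copy).mp3", 4_100_000, 0.001),
     Hit("p7", "other.mp3", 3_900_000, 0.004)],
])
```

Here the second copy of the first file is dropped even though its name differs. Two genuinely different files that happen to share both matching value and size would also be merged, which is the price of such a cheap test; comparing timestamps as well would narrow that case.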
Let the format of the result be $Pd_k$ and $\{(Mf_{kj}, Fs_{kj}, Cf(Rat_Q, Mf_{kj})) \mid Cf(Rat_Q, Mf_{kj}) < offset, j \in N\}$, where $Fs_{kj}$ is the size of the j-th similar music file in the k-th destination-like peer. During the merging and sorting of results from destination-like peers, if the querying peer finds music files with the same matching values, it compares their sizes and deletes the replicas when the sizes are the same.

RESULTS

Some simulated experiments are designed to show the efficiency of the proposed algorithm. In order to measure the experimental results, the following concepts are introduced.

Definition 8: Suppose that Q is the input query, Clst is the set of network identifiers of all peers and $PNIopt \subseteq Clst$ is the output. $PR = |PNIopt| / |Clst|$ is called the peer-ratio, which represents the percentage of destination-like peers among all peers.

Definition 9: Given a value of offset, the following ratio is called the hit-ratio:
$HR = \left| \bigcup_{Pd_i \in PNIopt} Ph_i \right| / \left| \bigcup_{Pd_i \in Clst} Ph_i \right|$,
where $Ph_i = \{Md_{ij} \mid Cf(Qf, Mf_{ij}) < offset, j \in N\} = \{Md_{ij} \mid |Rat_Q - Rat_{Md_{ij}}| < offset, j \in N\}$. HR represents the ratio of the number of music files similar to Q in PNIopt to the number of all music files similar to Q in the whole P2P system.

Definition 10: The ratio of HR to PR is called the accelerating-ratio, $\eta = HR / PR$. η represents the efficiency of the proposed algorithm.

The following experiments are made in our simulator with 6,341,780 music files. The 15 queries mentioned previously are also processed in the simulator. The parameter v is set to 5 and the partitioning sequence $(a_0, a_1, \dots, a_v, b_0, b_1, \dots, b_v)$ is altered from $a_0 = 0 < a_1 < a_2 < \dots < a_{v-1} < a_v = 1$, $b_0 = 0 < b_1 < b_2 < \dots < b_{v-1} < b_v = 1$ to $0 \le a_0 < a_1 < a_2 < \dots < a_{v-1} < a_v \le 1$, $0 \le b_0 < b_1 < b_2 < \dots < b_{v-1} < b_v \le 1$, $0 \le a_x, b_y \le 1$, x, y = 0, 1, 2, ..., v, where $a_0$, $a_v$, $b_0$ and $b_v$ are determined by the experiment data. Different expanding factors α and β are selected to test their effects. Then the averages of PR, HR and η are computed and shown as follows.

Figure 2 shows the relationship between average HR and offset at different expanding factors.
When α increases, HR increases because more peers are searched for music similar to Q. The same is true of β. When offset increases, HR decreases. It is better to keep offset within a proper range, such as [0.001, 0.01] in these experiments, to get a more stable HR.

Fig. 2: The Average HR in the P2P System of 10000 Peers

Figure 3 shows the relationship between average η and offset at different expanding factors. When α increases, η decreases since peers with lower Ph are searched. The same is true of β. When offset increases, η decreases because more irrelevant content, such as music files less similar to Q, is involved in the retrieval process.

Fig. 3: The Average η in the P2P System of 10000 Peers

In conclusion, HR and η are always stable at certain expanding factors α and β. An expanding factor α or β can be increased to improve HR, but the accelerating ratio η will fall. With α = 0.01, β = 0.15 and offset = 0.005, HR is close to one fourth (23.91%) and η is more than 4 (4.1191). This is a good tradeoff.

CONCLUSIONS

CBP2PMIR is introduced in this study. Four schemes of CBP2PMIR are evaluated in detail on communication cost, retrieval time, update complexity and robustness. The PsPsC scheme is found to be the best one for approximate queries and PsC+ is best for exact queries. After that, an implementation of the PsPsC scheme is presented. Based on some useful concepts, an algorithm is designed to find the destination-like peers. A simple yet effective algorithm is also given to filter out the replicas in the final results. Experiments show that these algorithms are efficient: in the reported setting the accelerating ratio exceeds 4.

ACKNOWLEDGEMENT

This work was supported by the 973 Research Plan of China under Grant No. G1999032704, the NSF of China under Grant No. 60273082, the 863 Research Plan of China under Grant No. 2002AA444110 and the Army Research Plan of China under Grant No. 41315.2.3.

REFERENCES

1. Downie, D. and M. Nelson, 2000. Evaluation of a Simple and Effective Music Information Retrieval Method. In Proceedings of the 23rd Intl. ACM SIGIR Conf. on Res. and Development in Information Retrieval, Athens, Greece, pp: 73-80.
2. Hsu, J.-L., C.-C. Liu and A.L.P. Chen, 2001. Discovering Nontrivial Repeating Patterns in Music Data. IEEE Transactions on Multimedia, 3: 311-325.
3. Jin, H. and H.V. Jagadish, 2002. Indexing Hidden Markov Models for Music Retrieval. In Proceedings of the 3rd Intl. Symposium on Music Information Retrieval, Service des Publications, Paris, France.
4. Liu, C.-C. and P.-J. Tsai, 2001. Content-Based Retrieval of MP3 Music Objects. In Proceedings of the 10th ACM Intl. Conf. on Information and Knowledge Management, Atlanta, Georgia, USA, pp: 506-511.
5. Melucci, M. and N. Orio, 1999. Musical Information Retrieval using Melodic Surface. In Proceedings of the 4th ACM Intl. Conf. on Digital Libraries, Berkeley, California, USA, pp: 152-160.
6. Shalev-Shwartz, S., S. Dubnov, N. Friedman and Y. Singer, 2002. Robust Temporal and Spectral Modeling for Query by Melody. In Proceedings of the 25th Intl. ACM SIGIR Conf. on Res. and Development in Information Retrieval, Tampere, Finland, pp: 331-338.
7. Tzanetakis, G. and P. Cook, 2002. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10: 293-302.
8. Uitdenbogerd, A.L. and J. Zobel, 1998. Manipulation of Music for Melody Matching. In Proceedings of the 6th ACM Intl. Conf. on Multimedia, Bristol, United Kingdom, pp: 235-240.
9. Wang, C., J. Li and S. Shi, 2002. A Kind of Content-Based Music Information Retrieval Method in a Peer-to-Peer Environment. In Proceedings of the 3rd Intl. Symposium on Music Information Retrieval, Service des Publications, Paris, France, pp: 178-186.