Shadow Document Methods of Results Merging


Shengli Wu and Fabio Crestani
Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK

ABSTRACT

In distributed information retrieval systems, document overlaps occur frequently across results from different databases. This is especially the case for meta-search engines which merge results from several general-purpose web search engines. This paper addresses the problem of merging results which contain overlaps in order to achieve better performance. Several algorithms for merging results are proposed, which take advantage of duplicate documents in two ways: one correlates scores from different results; the other regards duplicates as increasing evidence of being relevant to the given query. A variety of experiments have demonstrated that these methods are effective.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: [Distributed Information Retrieval, Result Merging]

General Terms
Algorithms

Keywords
Information Retrieval, Data Fusion

1. INTRODUCTION

With the widespread use of wide area networks and especially the Internet, on-line information systems proliferate very rapidly. Users often find it necessary to search different databases to satisfy an information need. In addition, information that is proprietary, costs money, or whose publisher wishes to control it carefully cannot be collected and indexed in a centralised database environment. In such cases, distributed information retrieval (DIR) systems become an alternative solution, and one in which merging of results is an aspect which demands careful attention.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC '04, March 14-17, 2004, Nicosia, Cyprus. Copyright 2004 ACM /03/04...$5.00.

For merging of results, three cases can be identified depending on the degree of overlap among the selected databases for a given query [11]:

1. the databases are pairwise disjoint or nearly disjoint;
2. the databases overlap but are not identical;
3. the databases are identical.

Cases 1 and 3 have been discussed extensively by many researchers, for example in [2, 4, 5, 10, 14, 16, 17, 19]. However, results merging algorithms suitable for Case 2 remain an open question [11]. Previously, meta search engines treated this case in two typical ways. One regards duplicates as redundant information and discards them once they are detected. Such a method was used by Metacrawler in its early stage [14]. The other uses methods such as linear combination, which are suitable for Case 3. This method was used by Profusion [7]. Neither is a good solution: the first does not use the information implied by the duplicates for better performance, while the second ignores the difference between identical databases and partially overlapping databases.

In Case 2, the results merging problem is different from, and even more complicated than, that in Case 1 or Case 3. In this paper, we present a group of algorithms which are suitable for Case 2. Our solution integrates the techniques used for Case 1 and Case 3. Experimental results are presented for comparison of these algorithms.

The rest of the paper is organised as follows. In Section 2 we review some related work on data fusion with totally identical databases and results merging without overlapping. Section 3 presents a group of methods for merging results from overlapping databases. The experimental setting and results are presented in Section 4. Section 5 concludes the paper.

2. RELATED WORK

There has been a lot of research on result merging with identical databases, which is called data fusion in many publications.
Fox and Shaw [6] proposed a number of combination techniques including operators like Min, Max, CombSum and CombMNZ. CombSum sets the score of each document in the combination to the sum of the scores obtained from the individual databases, while in CombMNZ the score of each document is obtained by multiplying this sum by the number of databases which gave it a non-zero score. Note that summing (CombSum) is equivalent to averaging while CombMNZ is equivalent to weighted averaging. Lee [10] studied this further with six different servers. His contribution was to normalise each information retrieval server on a per-query basis, which improved results substantially. Lee showed that CombMNZ worked best, followed by CombSum, while operators like Min and Max were the worst.

In addition, linear combination is a valuable alternative. In Vogt and Cottrell [17] the relevance of a document to a query is computed by combining both a score that captures the quality of each database and a score that captures the quality of the document with respect to the query. Formally, if $q$ is a query, $d$ is a document, $s$ is the number of databases, and $w = (w_1, w_2, \ldots, w_s)$ are the database scores, then the overall relevance $p$ of $d$ in the context of the combined list is given by

\[ p(w, d, q) = \sum_{i=1}^{s} w_i \, p_i(d, q). \]

For non-overlapping results from different databases, Round-robin is a simple merging method, which takes one document in turn from each available result set. However, the effectiveness of such a method depends on the performance of the component database servers. If all the results have similar effectiveness, then Round-robin performs well; if some of the results are very poor, then Round-robin becomes very poor as well.

Voorhees et al. [16] demonstrated a way of improving the Round-robin method. By running some training queries, they estimated the performance of each database. When a new query is encountered, they retrieve a certain number of documents from each database based on its estimated performance.

Callan et al. [2] proposed a merging strategy based on the scores achieved by both database and document. The score for each database is calculated by a database selection algorithm, CORI, and indicates the goodness of a database with respect to a given query among a set of available databases. The score of a document is the value that the document obtains from that database, which indicates the goodness of that document with respect to a given query among the set of documents retrieved from that database.
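As a rough illustration of the operators reviewed above, the following sketch implements CombSum, CombMNZ and a plain Round-robin merge. It assumes each result is a dictionary mapping a document identifier to a score (or, for Round-robin, a ranked list of identifiers); the function names and the data layout are illustrative choices, not taken from the cited papers, and skipping documents already seen in Round-robin is an added convenience.

```python
from collections import defaultdict
from itertools import zip_longest


def comb_sum(results):
    """CombSum: sum a document's scores over all result lists that contain it."""
    fused = defaultdict(float)
    for scores in results:
        for doc, score in scores.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(results):
    """CombMNZ: the CombSum score times the number of lists with a non-zero score."""
    sums = comb_sum(results)
    hits = defaultdict(int)
    for scores in results:
        for doc, score in scores.items():
            if score != 0:
                hits[doc] += 1
    return {doc: total * hits[doc] for doc, total in sums.items()}


def round_robin(ranked_lists):
    """Round-robin: take one document in turn from each ranked list.

    Documents already taken are skipped (an assumption of this sketch).
    """
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

With two results `{"d1": 0.5, "d3": 0.25}` and `{"d2": 0.75, "d3": 0.5}`, CombSum gives the shared document `d3` the sum of its two scores, and CombMNZ doubles that sum because two lists returned it.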
Calve and Savoy [3] proposed a merging strategy based on logistic regression. Some training queries are needed for setting up the model. Their experiments show that the logistic regression approach is significantly better than the Round-robin, raw-score, and normalised raw-score approaches.

Si and Callan [15] used a linear regression model for results merging, under the assumption that there is a centralised sample database which stores a certain number of documents from each database. The DIR broker runs the same query on the sample database as the different database servers do, picks out the duplicate documents shared by the result for the sample database and the result for each database, and normalises the scores of these documents in all results by a linear regression analysis, with the document scores from the sample database serving as the standard. Their experimental results show that their method is slightly better than CORI.

Finally, let us review the results merging policies used by some meta search engines. Metacrawler [14] is probably one of the first meta search engines developed for the World Wide Web. Its policy was relatively simple: eliminate duplicate URLs first, then merge the results in a Round-robin fashion. Profusion [7], another early meta search engine, used the linear combination algorithm. Savvy Search [8] focused primarily on the problem of learning and identifying the right set of search engines to which to issue the queries, and to a lesser extent on how the returned results are merged. Inquirus [9] took a different approach to the problem. Instead of relying on the engines' ranking mechanisms, it retrieves the full contents of the web pages returned and ranks them more like a traditional search engine, using information retrieval techniques applicable to full documents only. Similar to Inquirus, Mearf [13] retrieved the full contents of the web pages returned, then re-ranked them based on an analysis of the contents and/or links of these web pages.
Although regression analysis has been used by Calve and Savoy and by Si and Callan, we use this method in a different situation, where partial overlaps exist between different databases. In our case, no training is needed, unlike in Calve and Savoy's work, and neither a centralised sample database nor a centralised information retrieval system is needed, unlike in Si and Callan's work. On the other hand, how to merge documents which appear in only one result with those which appear in several different results is a new issue, which has not been considered before.

3. MERGING RESULTS FROM PARTIALLY OVERLAPPING DATABASES

In this paper, we assume that we have a group of databases which are partially overlapping. For a given query, we obtain results from these databases. Apart from that, no descriptive information about these databases is required. In addition, we do not consider downloading all the documents from the different databases for re-ranking. Usually, we do not need to check the full texts of all the documents in order to identify many of the duplicates. For example, in most cases the URL can be used as an identifier for every document in web search engines. As another example, we may estimate that two documents are the same by comparing their meta-data, such as titles, authors, publishers, and so on. Besides, one assumption we make is that all documents in the results have scores which have been obtained from their database servers.

From overlapping databases, we have more information (duplicates) with which to merge results than from pairwise disjoint databases, in which no database has any document in common with any other database. We may use this information in two different ways. First, the duplicate document pairs in two different results may be helpful for correlating the scores of all the documents contained in the two results. As in [3, 15], regression analysis is a useful tool for achieving this. Second, a document appearing in more than one result can be regarded as increasing evidence of being relevant to the given query.
However, how to merge documents which occur in only one result with those which occur in more than one result is a new issue. Some of the techniques [6, 10, 17] employed in data fusion can be used here. However, there are some important differences which need to be considered. In data fusion one assumption is: the more often a document occurs in different results, the more likely it is to be relevant to the given query. However, when different database servers retrieve from databases that only partially overlap, this assumption cannot be applied directly. For example, suppose that for a given query Q, database server A retrieves documents $d_1$ and $d_3$, and database server B retrieves documents $d_2$ and $d_3$. Thus, $d_3$ occurs in the results of both A and B, but $d_1$ and $d_2$ occur only once. Does that suggest $d_3$ is more likely to be relevant to the query than $d_1$ and $d_2$? We are unable to reach such a conclusion unless we know specifically that $d_1$ is in B's database and $d_2$ is in A's database. However, we may not have this information; we discuss this in more detail in Section 3.2.

3.1 Regression Analysis

For the scores of the same documents in two different results ($x_i$ and $y_i$), we assume that a linear relationship exists between them:

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

The least squares point estimates $b_1$ and $b_0$ of the parameters $\beta_1$ and $\beta_0$ in the simple linear regression model are calculated using the formulae [1]:

\[ b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \]

and

\[ b_0 = \frac{\sum_{i=1}^{n} y_i}{n} - b_1 \frac{\sum_{i=1}^{n} x_i}{n} \]

where the sums run over the $n$ duplicate documents shared by the two results.

If only two systems are involved, then we need only do this once. If $n$ ($n > 2$) database servers are involved, we assume that each of them overlaps with at least one other database. If this cannot be satisfied for some database, we can use the method proposed by Si and Callan [15], already mentioned in Section 2. We can do $n - 1$ regression analyses, each with a pair of databases. The following algorithm can be used for selecting the pairs:

1. assign every retrieval system to an isolated node;
2. check all pairs of nodes which have no connection (direct or indirect) and select the pair with the most duplicate documents; make a connection between these two nodes;
3. repeat Step 2 until all nodes are connected.

For every selected pair of systems, we carry out a regression analysis. After $n - 1$ pairs, we can obtain a normalised score for every document in every system without any inconsistency. We assume that the more duplicate documents are involved, the more accurate the regression analysis is; that is why we select the pair with the most duplicate documents in the above algorithm.

3.2 Shadow Document Methods

After performing regression analysis for each pair of results, we normalise the scores from different databases and make them comparable.
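The regression step and the pair-selection procedure of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions: each result is a dictionary from document identifier to score, the helper names (`fit_line`, `map_scores`, `select_pairs`) are hypothetical, and the connectivity check is done with a simple union-find structure, which is an implementation choice of this sketch rather than something the paper prescribes.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for y = b0 + b1 * x.

    Needs at least two distinct x values, otherwise the denominator is zero.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = sy / n - b1 * sx / n
    return b0, b1


def map_scores(source, target):
    """Map all scores of `source` onto the scale of `target` via their duplicates."""
    shared = sorted(set(source) & set(target))
    b0, b1 = fit_line([source[d] for d in shared], [target[d] for d in shared])
    return {doc: b0 + b1 * score for doc, score in source.items()}


def select_pairs(results):
    """Connect systems greedily, most duplicate documents first."""
    names = list(results)
    parent = {name: name for name in names}  # step 1: every system is isolated

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    candidates = sorted(
        ((len(set(results[a]) & set(results[b])), a, b)
         for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda item: -item[0],
    )
    pairs = []
    for overlap, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if overlap > 0 and root_a != root_b:  # step 2: not yet connected
            parent[root_a] = root_b
            pairs.append((a, b))
    return pairs  # n - 1 pairs when every system can be connected
```

Processing candidate pairs in descending order of overlap and skipping pairs that are already connected yields the same selection as repeatedly picking the unconnected pair with the most duplicates.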
Now the remaining problem is how to deal with the duplicate documents for better performance. One solution is to take the maximum score among the duplicates as the score of that document in the fused result; another is to take the minimum score; a third is to average them. Then we can merge the documents in descending order of score. These methods will be referred to as Max, Min, and Avg. Obviously, these methods do not use overlaps as increasing evidence of relevance to the query. Next let us introduce some further methods.

Suppose we have two database servers A and B. If document $d$ occurs in A's result but not in B's result, there are two possibilities: (1) $d$ is not stored in B's database; (2) $d$ is stored in B's database but is not retrieved by B. In case 1, we guess that if $d$ were in B's database, then B would assign $d$ a score similar to $s_a(d)$. In case 2, $d$ must have obtained a very low score, since it does not occur in the result. The likelihood of case 2 depends on the extent of overlap between A and B. However, for any particular document, we cannot judge which situation has occurred.

We introduce a method which works as follows. For a given query, if document $d$ occurs in both A's and B's results, then we sum the two scores as $d$'s total score; if $d$ occurs only in A's result with a score $s_a(d)$, we pretend that there is a shadow document of it in B, and assign a score to the (shadow) document $d$ in B. We do the same for those documents which occur in B's result but not in A's result. Then we can use the same method as above (sum) to merge all the results. More specifically, if document $d$ occurs only in A's result, $d$'s total score is calculated as $s_a(d)(2 - k_1)$, where $k_1$ is a coefficient ($0 \le k_1 \le 1$). The situation is similar for documents occurring only in B's result. Assigning a desirable value to $k_1$ is important. We leave this to be determined empirically by the experiments.

Some modifications can be made to the above method. One way is to let the coefficient $k_1$ be related to the overlapping rate of the two databases.
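To make the basic shadow-document rule concrete, here is a small sketch of Max, Min, Avg and the first shadow document method for two results. It assumes the scores have already been normalised by the regression step and that each result is a dictionary from document identifier to score; the function names and the tie-breaking rule in `ranked` are illustrative assumptions.

```python
def merge_duplicates(a, b, how):
    """Fuse two normalised results, resolving duplicates with Max, Min or Avg."""
    combine = {"max": max, "min": min, "avg": lambda x, y: (x + y) / 2}[how]
    fused = {}
    for doc in set(a) | set(b):
        if doc in a and doc in b:
            fused[doc] = combine(a[doc], b[doc])
        else:
            fused[doc] = a.get(doc, b.get(doc))
    return fused


def sdm1(a, b, k1=0.05):
    """Shadow document merging with a fixed coefficient k1 (0 <= k1 <= 1)."""
    fused = {}
    for doc in set(a) | set(b):
        if doc in a and doc in b:
            # A real duplicate: sum the two scores.
            fused[doc] = a[doc] + b[doc]
        else:
            # Only one score exists; the shadow copy contributes own * (1 - k1),
            # so the total is own * (2 - k1).
            own = a[doc] if doc in a else b[doc]
            fused[doc] = own * (2 - k1)
    return fused


def ranked(fused):
    """Document identifiers in descending score order (ties broken by identifier)."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

With `k1 = 0` a document seen once scores as if it had been retrieved twice with the same score, while `k1 = 1` reduces the rule to a plain sum over the results that actually returned the document.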
For this we need some statistical information (e.g., size, overlapping rate) about these databases, and methods such as the one in [18] can be used. However, collecting such information usually takes a lot of effort. Here we assume that this information is not available. For a given query, we obtain different results from different databases. We calculate the number of duplicate documents between each pair of them, then use this information as an indicator of the overlapping rate of the two databases. Suppose $n$ is the number of documents included in both results and $m$ is the number of duplicate documents. A higher overlapping rate suggests that there are more duplicate documents in both databases. We can rely on this to modify the above method. If document $d$ occurs only in A, then $d$'s total score is calculated as $s_a(d)(2 - k_2(1 - m/n))$, where $k_2$ is a coefficient ($0 \le k_2 \le 1$). As with $k_1$, we will carry out some experiments to empirically obtain a desirable value for $k_2$. In the following, these two methods will be referred to as Shadow Document Method (SDM) 1 and 2. Obviously, we can easily extend these SDMs to more than 2 database servers.

4. EXPERIMENTAL STUDY

We cannot fully report here the results of our extensive analysis of the performance of the proposed methods. We will report only the most interesting results.

4.1 Experimental Settings

Two groups of full-text databases were used for the experiments. Each group included 3 databases. All of them were subsets of the first two CDROMs of the TREC Collection (TREC-12). The characteristics of these databases are presented in Table 1. In group 1, WSJ indicates the Wall Street Journal collection of articles and the numbers indicate the years. Documents in group 1 are therefore quite homogeneous. In group 2, each database includes the same documents from WSJ90 (6103 documents) in addition to those indicated, which come from the Wall Street Journal, Associated Press newswires, and the Federal Register. Documents in this group are more heterogeneous. Both groups have considerable overlap among the databases, with group 1 having by far the largest (over 50,000 documents in common).

Group  Databases    Num. of docs  Retrieval model
1      WSJ(88-91)                 vector space
       WSJ(87-89)                 okapi
       WSJ(91-92)                 language model
2      AP(88-89)                  vector space
       WSJ(87-89)                 okapi
       FR(88-89)                  language model

Table 1: Databases used in the experimentation.

In both groups, a different retrieval model was used for each database. The Lemur information retrieval system, developed at Carnegie Mellon University, was used [12]. Lemur provides three options for the retrieval model: vector space, okapi, and language model. Users can select one of the models for their retrieval tasks. We evaluated the effectiveness of our algorithms in both heavily and lightly overlapping situations. In all cases, 100 queries (TREC topics) were used for the experiments.

4.2 Results with Heavy Overlap

The three databases indicated in Table 1 as group 1 were used for this experiment. For each query, each information retrieval system retrieves 1000 documents with scores. Different merging methods were applied to these result sets. The experimental results are shown in Table 2, in which the different methods are compared with Round-robin, and the difference between them is shown in parentheses. For the two SDMs, $k_1$ is set to 0.05 and $k_2$ is set to 0.1, which are the optimum empirical values observed in our experiments. The effect of the coefficients on the SDMs will be discussed later.

The experimental results show that all methods are better than Round-robin. In particular, Avg, Max, and Min are very similar in performance, and the two SDMs are very similar in performance as well. Because we use the duplicate documents in the regression analysis to normalise the scores in different results, the scores of duplicates become close to each other, which makes Avg, Max, and Min behave very similarly.
They outperform Round-robin by about 7%. The SDMs are the best of all the methods. They outperform Round-robin by about 9%, and they are slightly better than Avg, Max, and Min (e.g., they are better than Max by about 2%).

Different coefficients were tested for the two SDMs. The experimental results are shown in Figure 1. For SDM1, the maximum value occurs when $k_1$ is between 0.05 and 0.1, then performance goes down quite quickly. For SDM2, the curve is very flat from beginning to end. Because in many cases the overlapping rate for a query result is around 300/1000, a value of 0.8 for $k_2$ is roughly equivalent to a value of 0.24 for $k_1$. However, it seems that in both cases a very small coefficient may improve the performance as effectively as a larger one does.

Figure 1: Performance of two SDMs with different coefficients and with heavy overlaps among databases (x-axis: the value of coefficient; y-axis: average precision at 8 ranks).

4.3 Results with Light Overlap

In this experiment, only light overlaps exist among the different databases. The three databases indicated as group 2 in Table 1 were used. Very often, there is only a small number of duplicate documents in different results. In such a case, all 5 merging methods behave very similarly, as expected. The reason is that the number of duplicate documents is very small, so it may not be possible for the two SDMs to outperform Min, Max, or Avg. The experimental results are shown in Table 3. The results show that all merging methods are considerably better (over 10% improvement) than Round-robin. Figure 2 shows the experimental results for the two SDMs when different coefficients are assigned. The SDM1 curve decreases as the value of the coefficient increases, while SDM2 is almost totally flat from beginning to end.

Figure 2: Performance of two SDMs with different coefficients and with light overlaps among databases (x-axis: the value of coefficient; y-axis: average precision at 8 ranks).

Above we have presented experimental results for the two SDMs in two typical situations: either heavy overlap or light overlap exists among the different results.
SDM1 reaches its maximum performance when $k_1$ is between 0.05 and 0.1 for heavy overlaps, while it reaches its maximum performance when $k_1$ takes a value of 0.01 or even less for light overlaps. For SDM2, a value of 0.1 seems to fit both situations well. This suggests that for SDM1 it is better to select a different value of $k_1$ for different overlap rates. SDM2 already takes the overlap rate into account, so a fixed value of 0.1 for $k_2$ is very likely to be suitable for all situations.

Docs Rank  Round-Robin  Avg       Max       Min        SDM1 (k=0.05)  SDM2 (k=0.1)
                        (+11.0%)  (+10.1%)  (+11.0%)   (+14.0%)       (+14.5%)
                        (+7.4%)   (+7.7%)   (+8.4%)    (+10.0%)       (+9.8%)
                        (+9.6%)   (+9.0%)   (+9.7%)    (+9.8%)        (+9.7%)
                        (+7.3%)   (+7.2%)   (+7.1%)    (+10.3%)       (+9.7%)
                        (+8.1%)   (+6.9%)   (+7.9%)    (+9.0%)        (+9.1%)
                        (+7.7%)   (+6.3%)   (+7.3%)    (+7.9%)        (+8.1%)
                        (+6.0%)   (+6.1%)   (+5.9%)    (+7.3%)        (+7.6%)
                        (+4.4%)   (+4.3%)   (+3.8%)    (+6.0%)        (+6.0%)

Table 2: Precision comparison of different methods with heavy overlaps among databases. Round-robin serves as the baseline.

Docs Rank  Round-Robin  Avg       Max       Min        SDM1 (k=0.05)  SDM2 (k=0.1)
                        (+11.5%)  (+10.2%)  (+10.66%)  (+11.1%)       (+9.7%)
                        (+14.3%)  (+14.3%)  (+14.6%)   (+14.1%)       (+14.8%)
                        (+17.9%)  (+17.5%)  (+17.5%)   (+17.5%)       (+17.5%)
                        (+14.8%)  (+14.7%)  (+15.0%)   (+14.7%)       (+16.1%)
                        (+16.7%)  (+16.4%)  (+16.4%)   (+16.1%)       (+16.4%)
                        (+16.3%)  (+15.9%)  (+13.4%)   (+13.4%)       (+13.6%)
                        (+16.2%)  (+16.0%)  (+16.2%)   (+16.2%)       (+14.1%)
                        (+16.1%)  (+15.8%)  (+15.8%)   (+16.0%)       (+14.8%)

Table 3: Precision comparison of different methods with light overlaps among databases. Round-robin serves as the baseline.

4.4 Comparison with CombMNZ and CombSum

When different databases have considerable overlaps, is it a good solution to use merging algorithms such as CombMNZ and CombSum? This is an interesting question, so we conducted an experiment to evaluate the performance of CombMNZ and CombSum in such a situation. As before, we use the three databases in group 1. Here we use two different normalisation methods: linear regression and linear [0,1] normalisation [10]. We have described linear regression in Section 3.1. Linear [0,1] normalisation is the method usually used with CombMNZ and CombSum. For a list of documents from a database, we use the following formula to calculate the score for any document in the list:

\[ \mathit{norm\_score} = \frac{\mathit{score} - \mathit{min\_score}}{\mathit{max\_score} - \mathit{min\_score}} \]

where max_score denotes the highest score in the document list, min_score denotes the lowest score in the document list, score denotes the score of a given document, and norm_score is the normalised score of the given document.
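The linear [0,1] normalisation above amounts to a few lines of code. The sketch below assumes a result is a dictionary from document identifier to raw score; the function name and the handling of the degenerate case where all scores are equal are assumptions of this sketch, since the text does not cover that case.

```python
def linear_01(scores):
    """Rescale raw scores linearly so the lowest maps to 0 and the highest to 1."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: with identical scores the formula is undefined, so use 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (score - lo) / (hi - lo) for doc, score in scores.items()}
```

For instance, raw scores of 2.0, 3.0 and 4.0 are mapped to 0.0, 0.5 and 1.0 respectively.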
Thus, the document with the highest score is assigned a score of 1, the document with the lowest score is assigned a score of 0, and any other document is assigned a score between 0 and 1.

Experimental results for CombMNZ and CombSum are shown in Table 4. For all of them, SDM1 with $k_1 = 0.05$ serves as the baseline. The results show that both CombMNZ and CombSum are considerably worse than SDM1. Using either linear [0,1] normalisation or linear regression normalisation, both CombMNZ and CombSum are about 10% to 20% worse than SDM1.

4.5 Enhanced Shadow Document Methods

If we have more information about the databases with regard to document presence, can we improve the performance of SDM1 and SDM2? Suppose that for any given document, we could know whether it is stored in a given database. Such a situation may arise in some corporate networks where documents are distributed across different databases with very specific domains. In this case we could identify the source of any document by checking its identifier, and at the same time we could know whether a document has been stored at more than one site.

Let us discuss this with two databases A and B. For a given query Q, if a document $d$ occurs in the result from A but not in the result from B, one of two things must have happened: (a) $d$ is not stored in B; (b) $d$ is stored in B, but is not retrieved. If the DIR system is able to distinguish which situation has occurred, then we can modify the two SDMs in the following way: if $d$ is not stored in B, then we treat it just as before; if $d$ is stored in B but not retrieved, then $d$ gets only its (normalised) score from A, without any extra score from B. We refer to these modified SDMs as Enhanced SDMs (ESDMs), since they make use of additional information. We carried out an experiment on ESDM1 and ESDM2, with the databases in group 1, comparing them with SDM1 and SDM2. The results showed that for the top 5 and 10 documents, both ESDM1 and ESDM2 are better than their counterparts. However, this situation changes for higher document ranks.
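The enhanced rule for two databases can be sketched as below, as a variation on the first shadow document method. The sets `stored_in_a` and `stored_in_b`, which list the identifiers known to be held by each database, are a hypothetical way of representing the extra knowledge the text assumes; the function name and data layout are likewise illustrative.

```python
def esdm1(a, b, stored_in_a, stored_in_b, k1=0.05):
    """Shadow document merging that skips the shadow score when the other
    database is known to hold the document but did not retrieve it."""
    fused = {}
    for doc in set(a) | set(b):
        if doc in a and doc in b:
            fused[doc] = a[doc] + b[doc]
        elif doc in a:
            # Stored in B but not retrieved: keep only A's score.
            fused[doc] = a[doc] if doc in stored_in_b else a[doc] * (2 - k1)
        else:
            # Stored in A but not retrieved: keep only B's score.
            fused[doc] = b[doc] if doc in stored_in_a else b[doc] * (2 - k1)
    return fused
```

When neither membership set contains any document that was missed by the other server, the function gives the same totals as the unenhanced rule.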
After rank 15, ESDM1 and ESDM2 perform at the same level as SDM1 and SDM2.

Docs Rank  SDM1 (k=0.05)  CombMNZ (Reg.)  CombSum (Reg.)  CombMNZ (Lin.[0,1])  CombSum (Lin.[0,1])
                          (-10.3%)        (-10.3%)        (-11.8%)             (-11.8%)
                          (-13.8%)        (-13.8%)        (-14.5%)             (-13.6%)
                          (-17.0%)        (-15.9%)        (-14.3%)             (-12.1%)
                          (-17.9%)        (-16.2%)        (-18.0%)             (-15.0%)
                          (-17.9%)        (-15.8%)        (-17.2%)             (-13.4%)
                          (-19.0%)        (-16.3%)        (-17.8%)             (-13.1%)
                          (-23.1%)        (-20.0%)        (-20.4%)             (-14.6%)
                          (-26.6%)        (-21.8%)        (-22.0%)             (-11.2%)

Table 4: Precisions of CombMNZ and CombSum with heavy overlaps in databases. SDM1 serves as the baseline.

5. CONCLUSIONS

In this paper we have presented several methods for results merging when overlaps exist between different results in a distributed information retrieval system. These methods use linear regression analysis and shadow document merging methods (SDM) to improve the performance of result merging. Experiments have been conducted in two typical situations: heavy overlapping and light overlapping. In the former situation, the two SDMs perform slightly better than Min, Max, and Avg. With heavy overlapping, we compared the performance of CombMNZ and CombSum with that of SDM. The experimental results show that both CombMNZ and CombSum are 10% to 20% worse than SDM.

Since overlap happens frequently among general web search engines, the methods proposed in this paper are very desirable for merging web search results. Compared with the methods presented in Section 2, our methods can achieve very good results without the effort of downloading web pages and running a local retrieval process.

6. REFERENCES

[1] B. L. Bowerman and R. T. O'Connell. Linear Statistical Models: An Applied Approach. PWS-KENT Publishing Company.
[2] J. K. Callan, Z. Lu, and W. Croft. Searching distributed collections with inference networks. In Proceedings of the 18th Annual International ACM SIGIR Conference, pages 21-28, Seattle, USA, July.
[3] A. L. Calve and J. Savoy. Database merging strategy based on logistic regression. Information Processing and Management, 36(3).
[4] N. Craswell, P. Bailey, and D. Hawking. Server selection on the world wide web. In Proceedings of the Fifth ACM Conference on Digital Libraries, pages 37-46, San Antonio, CA, USA, June.
[5] D. Dreilinger and A. Howe. Experiences with selecting search engines using metasearch. ACM Transactions on Information Systems, 15(3).
[6] E. A. Fox and J. Shaw. Combination of multiple searches. In The Second Text REtrieval Conference (TREC-2), Gaithersburg, MD, USA, August.
[7] S. Gauch, G. Wang, and M. Gomez. Profusion: Intelligent fusion from multiple, distributed search engines. Journal of Universal Computer Science, 2(9), September.
[8] A. Howe and D. Dreilinger. Savvysearch: A meta-search engine that learns which search engines to query. AI Magazine, 18(2):19-25.
[9] S. Lawrence and C. L. Giles. Context and page analysis for improved web search. IEEE Internet Computing, 2(4):38-46.
[10] J. H. Lee. Analysis of multiple evidence combination. In Proceedings of the 20th Annual International ACM SIGIR Conference, Philadelphia, Pennsylvania, USA, July.
[11] W. Meng, C. Yu, and K. Liu. Building efficient and effective metasearch engines. ACM Computing Surveys, 34(1):48-89, March.
[12] P. Ogilvie and J. Callan. Experiments using the Lemur toolkit. In Proceedings of the 2001 Text REtrieval Conference, Gaithersburg, Maryland, USA, November.
[13] B. Oztekin, G. Karypis, and V. Kumar. Expert agreement and content based reranking in a meta search environment using Mearf. In Proceedings of the 11th International World Wide Web Conference, Honolulu, Hawaii, USA, May.
[14] E. Selberg and O. Etzioni. The Metacrawler architecture for resource aggregation on the web. IEEE Expert, 12(1):8-14.
[15] L. Si and J. Callan. Using sampled data and regression to merge search engine results. In Proceedings of the 25th Annual International ACM SIGIR Conference, pages 19-26, Tampere, Finland, August.
[16] E. M. Voorhees, N. K. Gupta, and B. Johnson-Laird. Learning collection fusion strategies. In Proceedings of the 18th Annual International ACM SIGIR Conference, Seattle, Washington, USA, July.
[17] C. C. Vogt and G. A. Cottrell. Fusion via a linear combination of scores. Information Retrieval, 1(3).
[18] S. Wu, F. Gibb, and F. Crestani. Experiments with document archive size detection. In Proceedings of the 25th European Conference on Information Retrieval Research, Pisa, Italy, April.
[19] C. Yu, G. Meng, K. Liu, W. Wu, and N. Rishe. Efficient and effective metasearch for a large number of text databases. In Proceedings of the Eighth ACM International Conference on Information and Knowledge Management, Kansas City, USA, November 2001.
