Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks


Jie Lu
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
jielu@cs.cmu.edu

Jamie Callan
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
callan@cs.cmu.edu

ABSTRACT
Peer-to-peer architectures are a potentially powerful model for developing large-scale networks of text-based digital libraries, but peer-to-peer networks have so far provided very limited support for text-based federated search of digital libraries using relevance-based ranking. This paper addresses the problems of resource representation, resource ranking and selection, and result merging for federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to text-based federated search are adapted and two new methods are developed for resource representation and resource selection according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches are both more accurate and more efficient than more common alternatives for text-based federated search in peer-to-peer networks.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Retrieval models, Search process, Selection process

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Peer-to-peer, Hierarchical, Federated Search, Text-Based, Retrieval, Digital Library

1. INTRODUCTION
Peer-to-peer (P2P) networks are an appealing approach to federated search over large networks of digital libraries. The activities involved for search in peer-to-peer networks include issuing requests ("queries"), routing requests ("query routing"), and responding to requests ("retrieval"). The nodes in peer-to-peer networks can participate as clients and/or servers. Client nodes issue queries to initiate search in peer-to-peer networks; server nodes provide information contents, respond to queries with documents that are likely to satisfy the requests, and/or route queries to other servers.

The first peer-to-peer networks were based on sharing popular music, videos, and software. These types of digital objects have relatively obvious or well-known naming conventions and descriptions, making it possible to represent them with just a few words from a name, title, or manual annotation. From a Library Science or Information Retrieval perspective, these systems were designed for known-item searches, in which the goal is to find a single instance of a known object (e.g., a particular song by a particular artist). In a known-item search, the user is familiar with the object being requested, and any copy is as good as any other. Known-item search of popular music, video, and software file-sharing systems is a task for which simple solutions suffice. If P2P systems are to scale to more varied content and larger digital libraries, they must adopt more sophisticated solutions.

A very large number of text-based digital libraries were developed during the last decade. Nearly all of them use some form of relevance ranking, in which term frequency information is used to rank documents by how well they satisfy an unstructured text query. Many of them allow free search access to their contents via the Internet, but do not provide complete copies of their contents, or even complete title lists for their contents, upon request. Many do not allow their contents to be crawled by Web search engines. They do not cooperate by conforming to a single method of text representation, query processing, or document retrieval; they don't even provide information about how these operations are done. We would argue that most of the recent research on peer-to-peer networks offers little useful guidance for providing federated search of current text-based digital libraries.

This paper addresses the problem of using peer-to-peer networks as a federated search layer for text-based digital libraries.
We study federated search in two different types of environments: cooperative environments where each digital library provides accurate resource description of its content upon request, and uncooperative environments where resource descriptions must be obtained indirectly. We start by assuming the current state of the art; that is, we assume that each digital library is a text database running a reasonably good conventional search engine, that it provides search access to its holdings, and that it provides individual documents in response to full text queries. We present in this paper how resource descriptions of digital libraries are obtained and used for efficient query routing, and how results from different digital libraries are merged into a single, integrated ranked list in peer-to-peer networks.

In the following section we give an overview of the prior research on federated search of text-based digital libraries and peer-to-peer networks. Section 3 describes our approaches to federated search of text-based digital libraries in peer-to-peer networks. Sections 4 and 5 discuss our data resources and evaluation methodologies. Experimental settings and results are presented in Section 6. Section 7 concludes.

2. OVERVIEW
Accurate and efficient federated search in peer-to-peer networks of text-based digital libraries requires both the appropriate peer-to-peer architecture and the effective search methods developed for the chosen architecture. In this section we present an overview of the prior research on federated search of text-based digital libraries, peer-to-peer network architectures, and text-based search in peer-to-peer networks in order to set the stage for the descriptions of our approaches to text-based federated search in peer-to-peer networks.

2.1 Federated Search of Text-Based Digital Libraries
Prior research on federated search of text-based digital libraries (also called "distributed information retrieval" in the research literature) identifies three problems that must be addressed:

Resource representation: Discovering the contents or content areas covered by each resource ("resource description");
Resource ranking and selection: Deciding which resources are most appropriate for an information need based on their resource descriptions; and
Result merging: Merging ranked retrieval results from a set of selected resources.

A directory service is responsible for acquiring resource descriptions of the digital libraries it serves, selecting the appropriate resources (digital libraries) given the query, and merging the retrieval results from selected resources into a single, integrated ranked list. Solutions to all these three problems for the case of a single directory service have been developed in distributed information retrieval. We briefly review them below.

2.1.1 Resource Representation
Different techniques for acquiring resource descriptions require different degrees of cooperation from digital libraries. STARTS is a cooperative protocol that requires every digital library to provide an accurate resource description to the directory service upon request [6]. STARTS is a good solution in environments where cooperation can be guaranteed. However, in some environments where digital libraries may not cooperate or may have an incentive to cheat, STARTS cannot be used to acquire accurate resource descriptions.

Query-based sampling is an alternative approach to acquiring resource descriptions without requiring explicit cooperation from digital libraries.
The resource description of a digital library is constructed by sampling its documents via the normal process of submitting queries and retrieving documents. Query-based sampling has been shown to acquire fairly accurate resource descriptions using a small number of queries and documents in distributed information retrieval environments [1].

The total number of documents of a digital library is one of the most important corpus statistics required by many resource selection algorithms. Capture-Recapture [12] and Sample-Resample [20] are two methods of estimating the total number of documents of an uncooperative digital library. Experimental results show that in most scenarios, Sample-Resample is more accurate and has less communication costs than the Capture-Recapture method.

2.1.2 Resource Ranking and Selection
Resource selection aims at selecting a small set of resources that contain a lot of documents relevant to the information request. Resources are ranked by their likelihood to return relevant documents and top-ranked resources are selected to process the information request.

Resource selection algorithms such as CORI [1], gGlOSS [7], and Kullback-Leibler (K-L) divergence-based algorithms [24] use techniques adapted from document retrieval for resource ranking. The resource description of a digital library used by these algorithms includes a list of terms with corresponding collection term frequencies, and corpus statistics such as the total number of terms and documents in the collection. These algorithms have been shown to work well with resource descriptions provided by cooperative digital libraries or acquired using query-based sampling.

Other resource selection algorithms including ReDDE [20] and DTF (the decision-theoretic framework for resource selection) [16] rank resources by directly estimating the number of relevant documents from each resource for a given query. ReDDE relies on sampled documents obtained using query-based sampling for such estimation. DTF has three variants: DTF-rp, DTF-sample and DTF-normal.
DTF-rp estimates the number of relevant documents from a resource by assuming a linearly decreasing recall-precision function and calculating the expected precision and recall from the resource. DTF-sample uses sampled documents to estimate how relevant documents are distributed among the available resources. DTF-normal models the distribution of document scores from a resource with a normal distribution and maps document scores to probability of relevance using a function learned with user relevance feedback.

Deciding how many top-ranked resources to select ("thresholding") is a problem that is usually simplified. Most resource selection algorithms use heuristic values such as 10 and 20 for the number of selected resources.

2.1.3 Result Merging
Many result-merging algorithms have been proposed in distributed information retrieval. Various approaches can be divided into two categories: approaches based on normalizing resource-specific document scores into resource-independent document scores, and approaches based on recalculating document scores at the directory service.

The CORI merging algorithm uses a heuristic linear combination of digital library scores and document scores to normalize the scores of the documents from different digital libraries. The intuition is to favor documents from digital libraries with high scores and also to enable high-scoring documents from low-scoring digital libraries to be ranked highly. It is effective when used together with the CORI resource selection and INQUERY document retrieval algorithms in federated search using a single directory service [1]. There has been some work on using logistic regression to learn merging models to normalize document scores, but relevance judgments are required for training [2]. The Semi-Supervised Learning result-merging algorithm uses the documents obtained by query-based sampling as training data to learn score normalizing functions on a query-by-query basis.
It is shown to work well with a variety of resource selection and document retrieval algorithms and is the current state-of-the-art for result merging in distributed information retrieval [19].

Document scores can be recalculated at the directory service by downloading all the documents in the retrieval results from selected resources, indexing them, and re-ranking them using a document retrieval algorithm. Downloading documents is not necessary if all the statistics required for score recalculation can be obtained alternatively. Kirsch's algorithm [10] requires each resource to provide summary statistics for each of the retrieved documents. It allows very accurate normalized document scores to be determined without the high communication cost of downloading. The corpus statistics required for recalculating document scores could also be substituted by a reference statistics database containing all the relevant statistics for some set of documents. This method is explored in [3] for federated search using a single directory service and shown to be effective compared with using the corpus statistics provided by cooperative digital libraries.

2.2 P2P Network Architectures
As mentioned in Section 1, the activities involved for search in peer-to-peer networks include issuing queries, query routing, and retrieval. Query routing is essentially a problem of resource selection and location. Resource location in first generation peer-to-peer networks is characterized by Napster, which used a single logical directory service, and Gnutella 0.4, which used undirected message flooding and a search horizon. The former proved easy to attack, and the latter didn't scale; both systems demonstrated the importance of robust and reliable methods of locating information in peer-to-peer networks. They also explored very different solutions: Napster was centralized and required cooperation (sharing of accurate information); Gnutella 0.4 was decentralized and required little cooperation.

Recent research provides a variety of solutions to the flaws of the Napster and Gnutella 0.4 architectures, but perhaps the most influential are hierarchical and structured P2P architectures. Structured P2P architecture associates each data item with a key and distributes keys among directory services using a Distributed Hash Table (DHT) [17, 18, 21, 22, 28].
Hierarchical P2P architecture [9, 11, 23] uses top-layer directory services to serve regions of bottom-layer digital libraries, and directory services work collectively to cover the whole network. The common characteristic of both approaches is the construction of an overlay network to organize the nodes that provide directory services (also called "look up services" by DHT-based approaches) for efficient query routing. An important distinction is that structured P2P networks require the ability to map (via a distributed hash table) from an information need to the identity of the directory service that satisfies the need, whereas hierarchical P2P networks rely on message-passing to locate directory services. Structured P2P networks require digital libraries to cooperatively share descriptions of data items in order to generate keys and construct distributed hash tables. In contrast, hierarchical P2P networks enable directory services to automatically discover the contents of (possibly uncooperative) digital libraries, which is well-matched to networks that are dynamic, heterogeneous, or protective of intellectual property.

2.3 Text-Based Search in P2P Networks
Most of the prior research on search in peer-to-peer networks supports only simple keyword-based search. Matches between query terms and keywords of documents are used to determine how to route queries and which documents to retrieve. There has been some recent work on developing systems that adopt more sophisticated retrieval models to support text-based search (also called "content-based retrieval") in peer-to-peer networks. Examples are PlanetP using a completely decentralized P2P architecture [5], pSearch using a structured P2P architecture [22], and content-based retrieval in hierarchical P2P networks [13].

In PlanetP [5], a node uses a TF.IDF algorithm to decide which nodes to contact for information requests based on the compact summaries it collects about all other nodes' inverted indexes.
Because no special resources are dedicated to support directory services in completely decentralized P2P architectures, it is somewhat inefficient for each node to collect and store information about the contents of all other nodes, especially in dynamic P2P networks.

pSearch [22] uses the semantic vector (generated by Latent Semantic Indexing) of each document as the key to distribute document index in a structured P2P network so that documents close in distance have similar contents. The relevance of a document to a query is determined by the similarity between their semantic vectors. To compute semantic vectors for documents and queries, global statistics such as the inverse document frequency and the basis of the semantic space need to be disseminated to each node in the network. Because global statistics can only be obtained in completely cooperative environments where each digital library shares its document and corpus statistics, this approach cannot be easily extended to uncooperative and heterogeneous environments.

There has been some prior research on content-based resource selection and document retrieval in hierarchical P2P networks of digital libraries [13]. Viewing peer-to-peer networks as a particular type of distributed information retrieval environment, content-based resource selection is extended to the case of multiple directory services in peer-to-peer environments where digital libraries cooperatively provide resource descriptions to connecting directory services. Experimental results demonstrate that content-based resource selection and document retrieval can provide more accurate and more efficient solutions to federated search in peer-to-peer networks of text-based digital libraries compared with the flooding and keyword-based approaches.

The problem of result merging in hierarchical P2P networks of uncooperative and barely-cooperative text-based digital libraries has also been studied in [15].
The Semi-Supervised Learning (SSL) result-merging algorithm is modified and an algorithm Score Estimation with Sample Statistics (SESS) which extends Kirsch's approach to result merging is proposed. Experimental results show that modified SSL has satisfactory precision for top-ranked merged documents, and SESS is able to provide near optimal performance with a small amount of cooperation from digital libraries.

3. TEXT-BASED FEDERATED SEARCH IN HIERARCHICAL P2P NETWORKS
The research described in this paper adopts a hierarchical P2P architecture because it provides a more flexible framework to incorporate various solutions to resource selection and result merging in both cooperative and uncooperative environments.

Following the terminology of prior research, we refer to text-based digital libraries as leaf nodes, and directory services as hub nodes. Each leaf node is a text database that provides functionality to process full text queries by running a document retrieval algorithm over its index of local document collection and generate responses. Each hub acquires and maintains necessary information about its neighboring hub and leaf nodes and uses it to provide resource selection and result merging services to peer-to-peer networks. In addition to leaf nodes and hubs, there are also nodes representing users with information requests in peer-to-peer networks. They are referred to as client nodes. In a hierarchical P2P network, leaf nodes and client nodes can only connect to hubs and hubs connect with each other.

[Figure 3.1: Federated search in hierarchical P2P networks.]

Search in peer-to-peer networks relies on message-passing between nodes. A request message ("query") is generated by a client node and routed from a client node to a hub, from one hub to another, or from a hub to a leaf node. A response message ("queryhit") is generated by a leaf node and routed back along the query path in reverse direction. Each message in the network has a time-to-live (TTL) field that determines the maximum number of times it can be relayed in the network. The TTL is decreased by 1 each time the message is routed to a node. When the TTL reaches 0, the message is no longer routed.

When a client node has an information request, it sends a query message to each of its connecting hubs. A hub that receives the query message uses its resource selection algorithm to rank and select one or more neighboring leaf nodes as well as hubs and routes the query to them if the message's TTL hasn't reached 0. A leaf node that receives the query message uses its document retrieval algorithm to generate a relevance ranking of its documents and responds with a queryhit message to include a list of top-ranked documents. Each top-level hub (the hub that connects directly to the client node that issues the request) collects the queryhit messages and uses its result merging algorithm to merge the documents retrieved from multiple leaf nodes into a single, integrated ranked list and returns it to the client node.
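
The message flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hub` class, the `route_query` function, the `select_leaves` and `select_hubs` callbacks (standing in for a hub's resource selection algorithm), and the `seen` set used to drop duplicate deliveries are all hypothetical names introduced here, and result merging is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Hub:
    name: str
    # leaf name -> callable(query) returning that leaf's top-ranked document ids
    leaves: dict = field(default_factory=dict)
    # neighboring Hub objects
    hubs: list = field(default_factory=list)

def route_query(hub, query, ttl, select_leaves, select_hubs, seen=None):
    """Deliver a query to `hub` and return the collected (leaf, doc) hits."""
    seen = set() if seen is None else seen
    ttl -= 1                      # TTL drops by 1 when the message reaches a node
    if hub.name in seen:          # assumption: duplicate deliveries are dropped
        return []
    seen.add(hub.name)
    if ttl <= 0:                  # TTL exhausted: the message is not routed further
        return []
    hits = []
    for leaf in select_leaves(hub, query):        # selected leaf nodes answer
        hits.extend((leaf, doc) for doc in hub.leaves[leaf](query))
    for neighbor in select_hubs(hub, query):      # selected hubs relay the query
        hits.extend(route_query(neighbor, query, ttl,
                                select_leaves, select_hubs, seen))
    return hits
```

In this sketch the hits simply propagate back up the call stack, which mirrors queryhit messages travelling back along the query path in the reverse direction.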
If the client node issues the request to more than one hub, then it also needs to merge results returned by multiple top-level hubs.

Figure 3.1 illustrates federated search of text-based digital libraries in hierarchical P2P networks. The C (white) node is the client node that issues the information request, the H (black) nodes are hubs, and the D (gray) nodes are leaf nodes (digital libraries). The edges between nodes represent connections. The arrows with solid lines indicate the directions to send query messages and the arrows with dashed lines indicate the directions to send queryhit messages.

In the following subsections, we present in more detail the solutions to the problems of resource representation, resource ranking and selection, and result merging in both cooperative and uncooperative peer-to-peer environments.

3.1 Resource Representation
The description of a resource is a very compact summary of its content. Compared with a copy of the complete index of a collection of documents, a resource description requires much less communication and storage cost but still provides useful information for resource selection algorithms to determine which resources are more likely to contain documents relevant to the query. As mentioned in Section 2.1.2, the resource description used by most resource selection algorithms includes a list of terms with corresponding term frequencies (collection language model), and corpus statistics such as the total number of terms and documents provided or covered by the resource. The resource here could be a single leaf node, a hub that covers multiple neighboring leaf nodes, or a "neighborhood" that includes all the nodes reachable from a hub. Although resource descriptions for different types of resources have the same format, different methods are required to acquire them, which we introduce below.

3.1.1 Resource Descriptions of Leaf Nodes
Resource descriptions of leaf nodes are used by hubs for query routing ("resource selection") among connecting leaf nodes.
In cooperative environments, each leaf node provides accurate resource description to its connecting hubs upon request. In uncooperative environments, each hub conducts query-based sampling independently to obtain sampled documents from its connecting leaf nodes. Sampled documents from a leaf node are used to generate its collection language model. They are also used by the Sample-Resample method to estimate the total number of documents in this leaf node's collection.

3.1.2 Resource Descriptions of Hubs
The resource description of a hub is the aggregation of the resource descriptions of its connecting leaf nodes. Since hubs work collaboratively in hierarchical P2P networks, neighboring hubs can exchange with each other their aggregate resource descriptions. However, because the aggregate resource descriptions of hubs only have information for nodes within 1 hop, if they are directly used by a hub to decide which neighboring hubs to route query messages to, the routing would not be effective when the nodes with relevant documents sit beyond this horizon. Thus for effective hub selection, a hub must have information about what contents can be reached if the query message it routes to a neighboring hub may further travel multiple hops. This kind of information is referred to as the resource description of a neighborhood and is introduced in the following subsection.

3.1.3 Resource Descriptions of Neighborhoods
A neighborhood of a hub $H_i$ in the direction of its neighboring hub $H_j$ is a set of hubs that can be reached by following the path from $H_i$ to $H_j$. Figure 3.2 illustrates the concept of neighborhood. Hub $H_1$ has three neighboring hubs $H_2$, $H_3$ and $H_4$. Thus it has three neighborhoods marked by $N_{1,2}$, $N_{1,3}$ and $N_{1,4}$. The resource description of a neighborhood provides information about the contents covered by all the hubs in this neighborhood. A hub uses resource descriptions of neighborhoods to select and route queries to its neighboring hubs.

Resource descriptions of neighborhoods provide similar functionality as routing indices [4].
An entry in a routing index records the number of documents that may be found along a path for a set of topics. The key difference between resource descriptions of neighborhoods and routing indices is that resource descriptions of neighborhoods represent contents with unigram language models (terms with their frequencies). Thus by using resource descriptions of neighborhoods, there is no need for hubs and leaf nodes to cluster their documents into a set of topics and it is not necessary to restrict queries to topic keywords.

[Figure 3.2: Neighborhoods in hierarchical P2P networks.]

Similar to exponentially aggregated routing indices [4], a hub calculates the resource description of a neighborhood by aggregating the resource descriptions of all the hubs in the neighborhood decayed exponentially according to the number of hops. For example, in the resource description of a neighborhood $N_{i,j}$ (the neighborhood of $H_i$ in the direction of $H_j$), a term $t$'s exponentially aggregated term frequency is calculated as:

$$\sum_{H_k \in N_{i,j}} \frac{\mathit{tf}(t, H_k)}{F^{\,\mathit{numhops}(H_i, H_k) - 1}} \tag{1}$$

where $\mathit{tf}(t, H_k)$ is $t$'s term frequency in the resource description of hub $H_k$, and $F$ is the average number of hub neighbors each hub has in the network. The exponentially aggregated total number of documents in a neighborhood is calculated as:

$$\sum_{H_k \in N_{i,j}} \frac{\mathit{numdocs}(H_k)}{F^{\,\mathit{numhops}(H_i, H_k) - 1}} \tag{2}$$

The creation of resource descriptions of neighborhoods requires several iterations at each hub and different hubs can run the creation process asynchronously. A hub $H_i$ in each iteration calculates and sends to its hub neighbor $H_j$ the resource description of neighborhood $N_{j,i}$, denoted by $ND_{j,i}$, by aggregating its hub description $HD_i$ and the most recent resource descriptions of neighborhoods it receives from all of its neighboring hubs excluding $H_j$. $ND_{j,i}$ is calculated as:

$$ND_{j,i} = HD_i + \sum_{H_k \in \mathit{directneighbors}(H_i) \setminus H_j} \frac{ND_{i,k}}{F} \tag{3}$$

The stopping condition could be either the number of iterations reaching a predefined limit, or the difference in resource descriptions between adjacent iterations being small enough. The process of maintaining and updating resource descriptions of neighborhoods is identical to the process used for creating them.
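
The iterative exchange of neighborhood descriptions can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the paper: all hubs update in synchronized rounds (the paper allows asynchronous updates), a description is a plain dictionary from term to frequency (a document count can be carried as one more entry), and the stopping condition is a fixed number of rounds. The function name and its parameters are hypothetical.

```python
from collections import defaultdict

def build_neighborhood_descriptions(hub_desc, neighbors, iterations=5):
    """Return nd[(j, i)], the description of the neighborhood of hub j
    in the direction of its neighbor i.

    hub_desc:  hub -> {term: frequency} (the hub's own description)
    neighbors: hub -> list of directly connected hubs
    """
    # F: average number of hub neighbors per hub in the network
    F = sum(len(nbrs) for nbrs in neighbors.values()) / len(neighbors)
    nd = {}
    for _ in range(iterations):
        new_nd = {}
        for i, nbrs in neighbors.items():
            for j in nbrs:
                # Hub i starts from its own description ...
                desc = defaultdict(float, hub_desc[i])
                # ... and adds what it last received from every neighbor
                # other than j, decayed by one more hop (divided by F).
                for k in nbrs:
                    if k == j:
                        continue
                    for term, value in nd.get((i, k), {}).items():
                        desc[term] += value / F
                new_nd[(j, i)] = dict(desc)   # what hub i sends to hub j
        nd = new_nd
    return nd
```

On an acyclic network the values stop changing once the number of rounds reaches the longest hub-to-hub path, at which point each entry equals the hop-decayed sum over all hubs in that neighborhood.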
The resource descriptions of neighborhoods could be updated when the difference between the old and the new value is significant, or periodically, or when a node disconnects from the network.

For networks that have cycles, frequencies of some terms and the number of documents may be overcounted, which will affect the accuracy of resource descriptions. How to deal with cycles in peer-to-peer networks using routing indices is discussed in detail in [4]. We could use the same solutions described in [4] for cycle avoidance or cycle detection and recovery. For simplicity, in this paper, we take the "no-op" solution, which completely ignores cycles. Experimental results show that resource selection using resource descriptions of neighborhoods generated in networks with cycles is still quite efficient and accurate.

3.2 Resource Ranking and Selection
The goal of query routing is to direct the information request to those nodes that are most likely to contain relevant documents with the minimum number of query messages. The flooding technique is guaranteed to reach nodes with relevant information contents but requires an exponential number of query messages. Randomly forwarding the request to a small subset of neighbors can significantly reduce the number of query messages but the reached nodes may not be relevant at all. To achieve both efficiency and accuracy, each hub needs to rank its neighboring leaf nodes by their likelihood to satisfy the information request and neighboring hubs by their likelihood to reach nodes with relevant information contents, and only forwards the request to top-ranked neighbors. Because the resource descriptions of leaf nodes and those of neighborhoods are not of the same magnitude, a hub handles separately the ranking and selection of its neighboring leaf nodes and hubs.

3.2.1 Leaf Node Ranking
Adapting language modeling approaches for ad-hoc information retrieval, we use the Kullback-Leibler (K-L) divergence-based method [24] for leaf node ranking.
In the language modeling framework, the K-L divergence resource selection algorithm calculates $P(L_i \mid Q)$, the conditional probability of predicting the collection of leaf node $L_i$ given the query $Q$, and uses it to rank different leaf nodes. $P(L_i \mid Q)$ is calculated as follows:

$$P(L_i \mid Q) = \frac{P(Q \mid L_i)\, P(L_i)}{P(Q)} \propto P(Q \mid L_i) \tag{4}$$

with uniform prior probability for leaf nodes;

$$P(Q \mid L_i) = \prod_{q \in Q} \frac{\mathit{tf}(q, L_i) + \mu\, P(q \mid G)}{\mathit{numterms}(L_i) + \mu} \tag{5}$$

where $\mathit{tf}(q, L_i)$ is the term frequency of query term $q$ in leaf node $L_i$'s resource description (collection language model), $P(q \mid G)$ is the background language model used for smoothing and $\mu$ is the smoothing parameter in Dirichlet smoothing.

3.2.2 Leaf Node Selection with Unsupervised Threshold Learning
After leaf nodes are ranked based on their $P(L_i \mid Q)$ values, the usual approach is to select the top-ranked leaf nodes up to a predetermined number. In hierarchical P2P networks, the number of leaf nodes served by individual hubs may be quite different, and different hubs may cover different content areas. In this case, it is not appropriate to use a static, query-independent and hub-independent number as threshold for a hub to decide how many leaf nodes to select for a given query. It is desirable that hubs have the ability to learn hub-specific and query type-specific thresholds automatically.

The problem of learning a threshold to convert relevance ranking scores into a binary decision has mostly been studied in information filtering [25, 26, 27]. However, the user relevance feedback required as training data is not as easily available for federated search in peer-to-peer networks as for the task of information filtering. Our goal is to develop a technique for each hub to learn the selection threshold without supervision based on the information and functionality it already has.

Because each hub has the ability to merge the retrieval results from multiple leaf nodes into a single, integrated ranked list, as long as the result merging has reasonably good performance, we could assume that the top-ranked merged documents are relevant. If so, the distribution of the top-ranked merged documents over the leaf nodes should provide useful hints on the number of relevant documents each leaf node is likely to retrieve. This is analogous to query expansion with pseudo-relevance feedback which treats the top-ranked documents retrieved initially as relevant documents and uses them to improve the quality of the query. The key differences are i) our approach uses the information about which top-ranked merged documents are from which leaf nodes and ignores the actual contents of these documents, and ii) the direct goal here is not to improve immediately the retrieval quality for current query, but to learn resource selection thresholds that are specific to hubs and types of queries and improve the overall retrieval performance for a set of queries.

For leaf node selection, if a hub selects more leaf nodes than necessary, although the retrieval results will include a lot of irrelevant documents, as long as there are enough relevant documents, a reasonably good result merging algorithm can rank most relevant documents above irrelevant documents, yielding good precisions at top-ranked documents. In this case, it seems that a loose threshold will almost always give good performance. However, a loose threshold leads to low efficiency and high communication costs.
Because accuracy and efficiency are equally important for search in peer-to-peer networks, the resource selection threshold must be neither too loose, in order to guarantee efficiency, nor too tight, so that enough relevant documents are returned (high recall). With the above criteria in mind, a hub uses the following procedure to decide the threshold of leaf node selection for a query:

1. Given a query, the hub uses the K-L divergence resource selection algorithm to calculate leaf node scores and sorts them in descending order;
2. The hub selects up to 100 top-ranked leaf nodes and normalizes their scores using the formula:

$$S' = \frac{S - S_{min}}{S_{max} - S_{min}} \tag{6}$$

   where $S_{max}$ is the maximum score and $S_{min}$ is the minimum score among these selected leaf nodes;
3. The hub forwards the query to selected leaf nodes and merges the retrieval results returned by these leaf nodes;
4. The hub calculates for each selected leaf node the number of documents that are ranked among top 50 in the merged result;
5. The hub goes down the list of leaf nodes sorted by their scores and stops at the leaf node which has the largest number of documents ranked among top 50 in the merged results (highest recall using pseudo-relevance feedback);
6. The hub regards the normalized score of this leaf node as the threshold of its leaf node selection for the given query.

Learning thresholds for individual queries is not useful unless the same queries appear again. Thus queries need to be classified into different types and thresholds for individual queries are used to compute thresholds for different query types. Queries can be classified based on their contents or statistical properties. When the number of queries for training is small (which is desired due to its low communication cost), classifying queries by contents often leads to sparse and skewed training data for various query types. Hence in our experiments we focused on classifying queries by their statistical properties and found the average probability of the query terms in a hub's resource description to be a good feature for query classification.
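
The six-step procedure for a single query can be sketched as below. The function name, the `search_and_merge` callback (standing in for steps 3 and the hub's merging algorithm, and assumed to return the source leaf of each merged document in rank order), and the tie-breaking rule (the highest-scoring leaf wins when several leaves contribute equally many top documents) are assumptions made for this illustration.

```python
from collections import Counter

def learn_query_threshold(leaf_scores, search_and_merge,
                          max_leaves=100, top_k=50):
    """Return the normalized-score threshold learned for one query.

    leaf_scores:      leaf id -> resource selection score (higher is better)
    search_and_merge: callable(list of leaf ids) -> list of leaf ids, one per
                      merged document, in merged rank order
    """
    if not leaf_scores:
        raise ValueError("at least one leaf node score is required")
    # Steps 1-2: sort by score, keep the top leaves, min-max normalize (Eq. 6).
    ranked = sorted(leaf_scores.items(), key=lambda item: item[1],
                    reverse=True)[:max_leaves]
    s_max, s_min = ranked[0][1], ranked[-1][1]
    span = s_max - s_min
    normalized = {leaf: (score - s_min) / span if span > 0 else 1.0
                  for leaf, score in ranked}
    # Step 3: forward the query and merge the returned results.
    merged_sources = search_and_merge([leaf for leaf, _ in ranked])
    # Step 4: count each leaf's documents among the top_k merged documents.
    counts = Counter(merged_sources[:top_k])
    # Step 5: walk down the ranked list; max() keeps the first (highest-scoring)
    # leaf among those with the largest count.
    best_leaf = max((leaf for leaf, _ in ranked), key=lambda leaf: counts[leaf])
    # Step 6: that leaf's normalized score is the threshold for this query.
    return normalized[best_leaf]
```

Thresholds obtained this way for individual training queries would then be averaged per query type, as the surrounding text describes.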
Given a set of training queries that have average probabilities of query terms in different ranges, probability values ranging from 0 to the maximum term probability in a hub's resource description are divided into 10 non-overlapping bins so that all bins have roughly the same number of queries for training. A query type is associated with each bin, so there are 10 query types in total. A query is classified into one of these 10 types based on the average probability of its terms in the hub's resource description. During the learning phase, each hub in the network learns the thresholds for a set of training queries, and the learned thresholds for queries of the same type are averaged to get the threshold for this query type at the hub. Given a new query, a hub determines the type of the query, ranks up to 100 leaf nodes, normalizes their scores, and uses the query type-specific threshold to select the leaf nodes that have normalized scores no less than the threshold.

Hub Ranking and Selection

The K-L divergence resource selection algorithm used for leaf ranking is also used for hub ranking. The resource descriptions of neighborhoods are used to calculate the collection language models needed by the resource selection algorithm. For hub selection, because selecting a neighboring hub is essentially selecting a neighborhood, using a prior distribution that favors larger neighborhoods could lead to better search performance, which was indeed the case in our experiments. Thus the prior probability of a neighborhood is set to be proportional to the exponentially aggregated total number of documents in the neighborhood.
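As a rough illustration of hub ranking with a size-based prior, the following Python sketch scores each neighborhood by a Dirichlet-smoothed query likelihood multiplied by its document count, computed in log space. It is a sketch under stated assumptions, not the authors' code: the `Neighborhood` container, the function names, the floor applied to missing background probabilities, and the default smoothing value are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Neighborhood:
    """Hypothetical container for one neighborhood's resource description."""
    term_freqs: Dict[str, int]  # term frequencies in the description
    num_terms: int              # total number of term occurrences
    num_docs: float             # aggregated document count, must be > 0


def log_score(query: Sequence[str], nbhd: Neighborhood,
              background: Dict[str, float], mu: float) -> float:
    """Log of (smoothed query likelihood * document-count prior)."""
    score = math.log(nbhd.num_docs)
    for q in query:
        # Assumed floor so a term missing from the background never gives log(0).
        p_bg = background.get(q, 1e-9)
        p = (nbhd.term_freqs.get(q, 0) + mu * p_bg) / (nbhd.num_terms + mu)
        score += math.log(p)
    return score


def rank_hubs(query: Sequence[str], neighborhoods: Dict[str, Neighborhood],
              background: Dict[str, float], mu: float = 1000.0) -> List[str]:
    """Return hub ids sorted from best to worst.

    The default mu is a placeholder, not the value used in the paper.
    """
    return sorted(
        neighborhoods,
        key=lambda hub: log_score(query, neighborhoods[hub], background, mu),
        reverse=True,
    )
```

Working in log space avoids numeric underflow for long queries and leaves the ranking unchanged, because the logarithm is monotonic.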
Given the query Q, the probability of predicting the neighborhood N that a neighboring hub node H represents is calculated as follows and used to rank neighboring hubs:

$$P(N \mid Q) = \frac{P(Q \mid N)\,P(N)}{P(Q)} \propto P(Q \mid N) \cdot \mathrm{numdocs}(N) \tag{7}$$

$$P(Q \mid N) = \prod_{q \in Q} \frac{tf(q, N) + \mu\,P(q \mid G)}{\mathrm{numterms}(N) + \mu} \tag{8}$$

where tf(q, N) is the term frequency of query term q in the resource description of neighborhood N (collection language model), P(q | G) is the background language model used for smoothing, and µ is the smoothing parameter in Dirichlet smoothing. A fixed number of top-ranked neighboring hubs are selected. It remains future work to apply unsupervised threshold learning to hub selection.

3.3 Result Merging

As described earlier, result merging takes place at each top-level hub. In cooperative environments, Kirsch's algorithm [10] is extended for result merging in peer-to-peer networks. In addition to a list of retrieved documents, each resource is required to provide summary statistics for each of the retrieved documents, for example, document length and how often each query term matched. The corpus statistics come from the aggregation of the hub's resource description and the resource descriptions of neighborhoods for all its neighboring hubs.

The modified Semi-Supervised Learning algorithm (modified SSL) [15] is used for result merging in uncooperative environments. Each hub along the query path contributes to result merging by providing document statistics for overlap documents, which are documents that appear both in the sampled documents maintained at the hub for its leaf node neighbors and in the retrieval results sent to the hub by these neighbors. Top-level hubs use these document statistics provided by collaborative hubs to recalculate document scores for overlap documents and pair them with their original scores returned in the retrieval results to use as training data for learning score normalizing functions.

The main difference between result merging in cooperative environments and that in uncooperative environments is that in cooperative environments leaf nodes provide document statistics for all the retrieved documents to top-level hubs, while in uncooperative environments, hubs provide document statistics for a subset of retrieved documents ("overlap" documents) to top-level hubs.

If the client node issues the request to more than one hub, then it also needs to merge results returned by multiple top-level hubs. Because client nodes don't maintain information about the contents of other nodes and corpus statistics as hubs do in hierarchical P2P networks, they cannot use advanced result-merging algorithms. Thus only simple, but probably less effective, merging methods can be applied at client nodes. For example, results can be merged based on the document scores returned by top-level hubs ("raw score merge") or in a round-robin fashion.

4. TEST DATA

We used the P2P testbed [14] developed based on the TREC WT10g web test collection [8] to evaluate the performance of federated search in hierarchical P2P networks of text-based digital libraries. The P2P testbed consists of 2,500 collections obtained by dividing WT10g data into 11,485 collections based on document URLs and randomly selecting 2,500 of them. The total number of documents in these 2,500 collections is 1,421,088. Each collection defines a leaf node (digital library) in a hierarchical P2P network.

There are 25 hubs in total in the P2P testbed, each of which covers a specific type of content. The connections between leaf nodes and hubs were determined by clustering leaf nodes into 25 clusters using a similarity-based soft clustering algorithm, associating each cluster with a hub, and connecting all the leaf nodes within a cluster to the associated hub. The connections between hubs were generated randomly. Each hub has at least 1 and at most 7 hub neighbors. A hub has on average 4 hub neighbors. Table 4.1 summarizes some statistics for the testbed.

Table 4.1 Summary statistics for the testbed. Columns: min, avg, max. Rows: number of documents for a leaf node; number of leaf nodes for a hub; number of hubs a leaf node connects to. (Only the fragments ",505" and ",008" of the values survive in this transcription.)

Experiments were run on two sets of queries. The first set of queries came from the title fields of TREC topics used for the TREC-8 and TREC-9 Web Tracks. The standard TREC relevance assessments supplied by the U.S. National Institute of Standards and Technology were used. The second set of queries was a set of 1,000 queries selected from the queries defined in the P2P testbed. Queries in the P2P testbed were automatically generated from WT10g data by extracting key terms from the documents in the collection. Table 4.2 shows the distribution of query lengths among the selected 1,000 queries. Table 4.3 shows the distribution of term frequencies in WT10g for all the query terms in these 1,000 queries.
Because it is expensive to obtain relevance judgments for these automatically generated queries, we used the ranked retrieval results from a single large collection as the baseline ("single collection" baseline), and measured how well federated search in the hierarchical P2P network could reproduce this baseline. The single large collection was the subset of the WT10g used to define the contents of the 2,500 leaf nodes in the peer-to-peer network, and the 50 top-ranked documents retrieved using this single large collection (WT10g-subset) were treated as the relevant documents for each query.

For each query, a leaf node was randomly chosen to act temporarily as a client node, issuing the query to the network and collecting the merged retrieval results for evaluation.

Table 4.2 Distribution of query length for 1,000 queries. Columns: Length, Distribution. (Values not preserved in this transcription.)

Table 4.3 Distribution of term frequency for 1,000 queries. Columns: Frequency Scale, Distribution. (Values not preserved in this transcription.)

5. EVALUATION METHODOLOGY

A simulator was used to evaluate the performance of text-based federated search in hierarchical P2P networks. Both retrieval accuracy and query routing efficiency are used as performance measures.

5.1 Measuring Retrieval Accuracy

Retrieval accuracy was measured by both set-based and rank-based Recall and Precision. Set-based Recall and Precision are defined as follows:

$$\text{Recall} = \frac{|r|}{|A|} \tag{9}$$

$$\text{Precision} = \frac{|r|}{|R|} \tag{10}$$

where R is the set of the documents returned by retrieval in the P2P network, A is the set of relevant documents for a query among the 100 TREC queries, or the set of (up to 50) top-ranked documents returned by retrieval using the single WT10g-subset collection for a query among the 1,000 WT10g queries, and r is the intersection of R and A. |·| denotes the size of the set.

The quality of document rankings was measured using precisions at document ranks 5, 10, 15, 20, 30, and 100. Set-based Recall and Precision focus attention on how well text-based federated search in hierarchical P2P networks returns the right documents for a query, while rank-based metrics measure directly the performance of document ranking and result merging.

5.2 Measuring Query Routing Efficiency

The efficiency of query routing was measured by the average number of query messages routed for each query in the network. The average number of query messages routed from hubs to leaf nodes ("Hub-Leaf Messages") for each query was also used to measure the efficiency of leaf node selection in some experiments.

6. EXPERIMENTS AND RESULTS

A series of experiments was conducted to study resource selection and result merging in both cooperative ("COOP") and uncooperative ("UNCOOP") P2P environments.

Table 6.1 Choices of algorithms in the experiments.

Leaf descriptions: Provided by leaf nodes in cooperative environments, OR generated by hubs using documents sampled from leaf nodes by query-based sampling in uncooperative environments
Hub descriptions: Generated by hubs by aggregating leaf descriptions
Neighborhood descriptions: Generated by hubs by aggregating hub descriptions and exponentially decayed neighborhood descriptions over several iterations
Leaf node ranking: K-L divergence resource selection algorithm using leaf descriptions
Leaf node selection: 1 of top-ranked leaf nodes, OR fixed number of top-ranked leaf nodes, OR top-ranked leaf nodes with normalized scores no less than the learned threshold (Section 3.2.2)
Hub ranking: K-L divergence resource selection algorithm using neighborhood descriptions
Hub selection: All neighboring hubs (flooding), OR 1 randomly selected neighboring hub, OR top-ranked neighboring hub
Document retrieval: K-L divergence document retrieval algorithm
Result merging at top-level hubs: Extended Kirsch's algorithm in cooperative environments, OR modified Semi-Supervised Learning in uncooperative environments (Section 3.3)
Result merging at client node: Raw score merge (Section 3.3)
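As a concrete reference for the accuracy measures of Section 5.1 that are used throughout the results, the following Python sketch computes set-based Recall and Precision and precision at a rank cutoff. The function names are hypothetical, and dividing by the cutoff k even when fewer than k documents were returned follows common TREC practice, which is an assumption here rather than something the paper states.

```python
from typing import Iterable, Sequence, Tuple


def set_recall_precision(returned: Iterable[str],
                         relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based Recall = |r|/|A| and Precision = |r|/|R|, with r = R intersect A."""
    R, A = set(returned), set(relevant)
    r = R & A
    recall = len(r) / len(A) if A else 0.0      # assumption: 0.0 when A is empty
    precision = len(r) / len(R) if R else 0.0   # assumption: 0.0 when R is empty
    return recall, precision


def precision_at(ranking: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (divides by k)."""
    rel = set(relevant)
    return sum(1 for doc in ranking[:k] if doc in rel) / k
```

For the 1,000 automatically generated queries, `relevant` would be the (up to 50) top-ranked documents of the single collection baseline instead of human judgments.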
The choices of the algorithms used for resource representation, resource ranking and selection, document retrieval and result merging are shown in Table 6.1. Table 6.2 shows the values of some parameters used in our experiments.

Unsupervised threshold learning required a set of queries for training. For each experiment that used leaf node selection with unsupervised threshold learning to run the 100 TREC queries, two runs were conducted. The first run used the first half of the 100 TREC queries for training and the second half for testing. The second run worked the other way around. The results from the two runs were averaged to get the final results. For the experiments that used leaf node selection with unsupervised threshold learning to run the 1,000 WT10g queries, the 100 TREC queries were used as training data. Unsupervised threshold learning only used queries and retrieved documents for training. The relevance judgments provided by NIST for the 100 TREC queries were not used to learn thresholds for leaf node selection.

Tables 6.3a and 6.3b show respectively the results of running the 100 TREC queries and the 1,000 WT10g queries for text-based federated search in a hierarchical P2P network using different methods. Both cooperative and uncooperative environments were studied. The "single collection" baseline, which returned the 50 top-ranked documents for each query by retrieval using the single WT10g-subset collection, is also shown in Table 6.3a for the 100 TREC queries. The following subsections present the analysis of the results from different perspectives.

6.1 Set-Based Recall/Precision vs. Precisions at Top Document Ranks

The set-based Precision figures (column 4) are much lower than one might expect because the number of relevant documents was very small (50 on average for the 100 TREC queries using relevance judgments and 50 maximum for the 1,000 WT10g queries using the single collection baseline), but the total number of retrieved documents was at least ten times larger for most queries in the hierarchical P2P network.
This demonstrates a limitation of set-based Recall and Precision metrics for this task, since generally users only care about the retrieval accuracy of top-ranked documents, but we include them as another way of comparing resource ranking and selection methods. Compared with set-based Precision, the differences between precisions at top document ranks for federated search in the hierarchical P2P network and for search using a centralized index are smaller. This implies that both result merging algorithms for cooperative and uncooperative environments performed quite well by ranking most irrelevant documents lower than relevant documents in spite of low set-based Precision.

Table 6.2 Parameter values used in the experiments.

Initial TTL for messages: 6
Number of documents sampled from each leaf node: Up to 300
Number of resample queries used for Sample-Resample to estimate total number of documents: (value not preserved in this transcription)
Number of iterations to create neighborhood descriptions: 6
F (average number of hub neighbors each hub has): 4
µ (Dirichlet smoothing parameter in K-L divergence resource selection): (value not preserved in this transcription)
Number of documents retrieved from each leaf node: Up to 50

Table 6.3a Search performance evaluated on the 100 TREC queries using relevance judgments provided by NIST. Surviving column headers: Environment, Hub, Leaf, Set-based Recall/Precision, # Query. (Numeric values not preserved in this transcription.)

Environment   Hub        Leaf
Centralized   N/A        N/A
COOP          Flooding   Top
COOP          Random 1   Top
COOP          Top 1      Top
COOP          Flooding   Threshold
COOP          Random 1   Threshold
COOP          Top 1      Threshold
UNCOOP        Flooding   Top
UNCOOP        Random 1   Top
UNCOOP        Top 1      Top
UNCOOP        Flooding   Threshold
UNCOOP        Random 1   Threshold
UNCOOP        Top 1      Threshold

Table 6.3b Search performance evaluated on the 1,000 WT10g queries using the single collection baseline. Surviving column headers: Environment, Hub, Leaf, Set-based Recall/Precision, # Query. (Numeric values not preserved in this transcription.)

Environment   Hub        Leaf
COOP          Flooding   Top
COOP          Random 1   Top
COOP          Top 1      Top
COOP          Flooding   Threshold
COOP          Random 1   Threshold
COOP          Top 1      Threshold
UNCOOP        Flooding   Top
UNCOOP        Random 1   Top
UNCOOP        Top 1      Top
UNCOOP        Flooding   Threshold
UNCOOP        Random 1   Threshold
UNCOOP        Top 1      Threshold

6.2 TREC Queries vs. WT10g Queries

In contrast to real queries and manual relevance judgments, the 1,000 WT10g queries were generated automatically by extracting key terms from documents, and the top-ranked documents retrieved using a single centralized index were used for relevance judgments. When this set of queries was used to evaluate the performance of text-based federated search in hierarchical P2P networks, it directly measured the ability of federated search in hierarchical P2P networks to match the results from search in a centralized environment. The strong performance indicated by high precisions at top document ranks in Table 6.3b demonstrates that federated search in the hierarchical P2P network mostly agreed with the centralized approach on which documents were most relevant. Additional evaluations on the 100 TREC queries that treated the documents in the single collection baseline as relevant documents (the same evaluation methodology as we used for the 1,000 WT10g queries) gave results very similar to those in Table 6.3b (not shown in this paper for reasons of space).
This is an encouraging sign for federated search in peer-to-peer networks because, although distributed retrieval systems are not yet better than the single collection baseline, our results show that their performance can be quite close at top-ranked documents. However, we note that Table 6.3b gives a slightly overoptimistic view of federated search quality, because in cases where federated search in the hierarchical P2P network disagreed with search using a centralized index, federated search was more likely to give a high rank to an irrelevant document which was ranked low by centralized search. Therefore, the performance difference between federated search in the hierarchical P2P network and search using a centralized index is expected to be slightly larger if we evaluate them using real relevance judgments, as shown in Table 6.3a.

In order to claim that a peer-to-peer system that reproduces the single collection baseline quite well is an effective system for federated search, we need to rely on the assumption that search using a centralized index is effective in satisfying users' information needs, which is not necessarily the case. For this reason, we were concerned with whether automatically generated queries would behave similarly to real queries and whether the conclusions drawn using the single collection baseline for evaluation would still be valid with real relevance judgments. If we compare the figures in Table 6.3a with those in Table 6.3b, we can see that although the absolute values were quite different, the relative performance difference of


A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Future Generation Computer Systems

Future Generation Computer Systems Future Generaton Computer Systems 29 (2013) 1631 1644 Contents lsts avalable at ScVerse ScenceDrect Future Generaton Computer Systems journal homepage: www.elsever.com/locate/fgcs Gosspng for resource

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

An Improved Neural Network Algorithm for Classifying the Transmission Line Faults

An Improved Neural Network Algorithm for Classifying the Transmission Line Faults 1 An Improved Neural Network Algorthm for Classfyng the Transmsson Lne Faults S. Vaslc, Student Member, IEEE, M. Kezunovc, Fellow, IEEE Abstract--Ths study ntroduces a new concept of artfcal ntellgence

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Using Language Models for Flat Text Queries in XML Retrieval

Using Language Models for Flat Text Queries in XML Retrieval Usng Language Models for Flat ext Queres n XML Retreval aul Oglve, Jame Callan Language echnoes Insttute School of Computer Scence Carnege Mellon Unversty ttsburgh, A USA {pto,callan}@cs.cmu.edu ABSRAC

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

DYNAMIC NETWORK OF CONCEPTS FROM WEB-PUBLICATIONS

DYNAMIC NETWORK OF CONCEPTS FROM WEB-PUBLICATIONS DYNAMIC NETWORK OF CONCEPTS FROM WEB-PUBLICATIONS Lande D.V. (dwl@vst.net), IC «ELVISTI», NTUU «KPI» Snarsk A.A. (asnarsk@gmal.com), NTUU «KPI» The network, the nodes of whch are concepts (people's names,

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES

REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES Laser dffracton s one of the most wdely used methods for partcle sze analyss of mcron and submcron sze powders and dspersons. It s quck and easy and provdes

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Measuring Integration in the Network Structure: Some Suggestions on the Connectivity Index

Measuring Integration in the Network Structure: Some Suggestions on the Connectivity Index Measurng Integraton n the Network Structure: Some Suggestons on the Connectvty Inde 1. Measures of Connectvty The connectvty can be dvded nto two levels, one s domestc connectvty, n the case of the physcal

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Generalized Team Draft Interleaving

Generalized Team Draft Interleaving Generalzed Team Draft Interleavng Eugene Khartonov,2, Crag Macdonald 2, Pavel Serdyukov, Iadh Ouns 2 Yandex, Russa 2 Unversty of Glasgow, UK {khartonov, pavser}@yandex-team.ru 2 {crag.macdonald, adh.ouns}@glasgow.ac.uk

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information