Querying and Ranking XML Documents Based on Data Synopses

Size: px
Start display at page:

Download "Querying and Ranking XML Documents Based on Data Synopses"

Transcription

1 Queryng and Rankng XML Documents Based on Data Synopses Wemn He 1, Teng Lv 2 1 Department of Computng and New Meda Technologes Unversty of Wsconsn-Stevens Pont Stevens Pont, Wsconsn 54481, USA whe@uwsp.edu 2 Teachng and Research Secton of Computer Artllery Academy, Hefe Anhu, Chna lt0410@163.com Journal of Dgtal Informaton Management ABSTRACT: There s an ncreasng nterest n recent years for queryng and rankng XML documents. In ths paper, we present a new framework for queryng and rankng schema-less XML documents based on concse summares of ther structural and textual content. We ntroduce a novel data synopss structure to summarze the textual content of an XML document for effcent ndexng. More mportantly, we extend the tradtonal vector space model to effectvely rank XML documents over the proposed data synopses. We conduct extensve experments over XML benchmark data to demonstrate the advantages of the ndexng scheme and the effectveness of our rankng scheme. We also compare our framework wth Lucene to demonstrate our extended TF*IDF scorng functon s effectve. Categores and Subject Descrptors D.3.2 [Language Classfcatons]: Data-flow languages; H.3.1 [Content Analyss and Indexng]: I.7 [Document and Text Processng]: Markup languages; H.3.3 [Informaton Search and Retreval]: Query Formulaton General Terms: XML, Informaton Retreval, Data Processng Keywords: XML, Query processng, Document rankng, Query synopses, Document rankng Receved: 11 June 2011, Revsed 12 August 2011, Accepted 19 August Introducton As XML has become the de facto form for representng and exchangng data on the web, there s an ncreasng nterest n queryng and rankng text-centrc XML documents. Although approxmate matchng wth relevance rankng for plan documents has long been the focus of Informaton Retreval (IR), the herarchcal nature of XML data has brought new challenges to both the IR and database communtes. Extensve work has been motvated on developng effcent ndexng and query evaluaton algorthms, and proposng effectve rankng schemes over XML data [1], [2], [3], [4], [15], [16], [17]. In ths paper, we present a new framework for effcent queryng and rankng schema-less XML documents based on condensed summares extracted from the structural and textual content of the documents. Instead of ndexng each sngle element or term n a document, we extract a structural summary and a small number of data synopses from the document, whch are ndexed n a way sutable for query evaluaton. The result of the query evaluaton s a ranked lst of XML documents that best match the query. In order to rank XML documents, we extend the tradtonal vector space model and develop an effectve rankng scheme based on TF*IDF scorng. We beleve that our framework can serve as an nfrastructure for a wde range of web applcatons, such as onlne nteractve exploraton of massve XML data stores and XML search engnes n peer-topeer envronment. In summary, we address the followng ssues n ths paper: We present a novel framework for ndexng, queryng, and rankng XML documents based on content and structure synopses. We propose an effcent XML meta-data ndexng scheme that s sutable for full-text XML query evaluaton. We extend the tradtonal vector space model to effectvely rank XML documents based on meta-data. We expermentally valdate the advantages of our ndexng scheme and the effectveness of our rankng scheme. 2. Meta-Data Indexng As our query language, we extend the XPath syntax wth a full-text search predcate e ~ S, where e s an arbtrary XPath expresson. Ths predcate returns true f at least one element from the sequence returned by e matches the search specfcaton, S. A search specfcaton s a smple IR-style boolean keyword search that takes the form term S 1 and S 2 S 1 or S 2 ( S ) where S, S 1, and S 2 are search specfcatons. A term s an ndexed term that must be present n the text of an element returned by the expresson e. As a runnng example used throughout the paper, the followng query, Q: //aucton//tem[locaton "Dallas"] [descrpton "mountan" and "bcycle"]/prce searches for the prces of all aucton tems located n Dallas that contan the words mountan and bcycle n ther descrpton. An example XML document s shown n Fgure 1. The document represents the aucton nformaton from an aucton web ste. When an XML document s ndexed, only a structural summary and a small number of concse data synopses are extracted and ndexed. Journal of Dgtal Informaton Management Volume 9 Number 5 October

2 <aucton> <sponsor> <name> ebay </name> <address> 1040 W. Abram Dr. San Jose, CA </address> </sponsor> <tem d="01"> <name> bcycle </name> <descrpton> a mountan bcycle used for 2 years </descrpton> <payment> Credt Card, Cash, Money Order </payment> <locaton> Dallas, TX </locaton> <prce> 30 </prce> </tem> <tem d="02"> <name> bcycle </name> <descrpton> a brand new 24-nch bcycle for grls </descrpton> <payment> Credt Card, Cash, Check </payment> <locaton> Stevens Pont, WI </locaton> <prce> 60 </prce> </tem> <tem d="03"> <name> car </name> <descrpton> 1999 Toyota Camery LE, mleage 80K </descrpton> <payment> Credt Card, Check </payment> <locaton> Arlngton, TX </locaton> <prce> 5000 </prce> </tem> </aucton> Fgure 1. Example XML Document 2.1 Structural Summary To match the structural components n the query durng the query evaluaton, we construct a type of data synopses, called Structural Summary(SS), whch s a structural markup that captures all the unque paths n the document. The structural summary of the example XML document s shown n Fgure 2. Each node n a structural summary has a tagname and a unque d. Fgure 2. Structural Summary of Example XML Document 2.2 Content Synopses To capture the textual content of a document, for each text node k n the structural summary S of the document D (an SS node that corresponds to at least one document element that contans text), we construct a content synopss (CS) H D to p summarze the textual data assocated wth k, where the path p s the unque smple path from the root of S to the node k n S. H D s a bt matrx of sze L W, where W s the number of p term buckets and L s the number of postonal ranges n the document. Here, W s proportonal to the number of elements n D reachable by the text label path p. L s proportonal to the number of begn/end tags n the document. The postonal nformaton s represented by the document order of the begn/ end tags of the elements. More specfcally, for each term t drectly nsde an element assocated wth the node k whose begn/end poston s b/e, we set all matrx values H D [, hash(t) p Fgure 3. Data Synopses Example modw] to one, for all b L/ D e L// D, where hash s a strng hashng functon and D s the document sze. That s, the [0, D ] range of tag postons n the document s compressed nto the range [0, L]. H D p s mplemented as a B+ -tree ndex wth ndex key p, because, durng the query processng, we need to retreve the content synopses of all documents for a gven path p. For example, the content synopss for the SS node descrpton (k = 3) s llustrated on the rght of Fgure Postonal Flters For each non-text node n n the structural summary of a document, we construct another type of data synopss, called Postonal Flter (PF), denoted by F D p. As we dd for H D p, F D p s also mplemented as a B-tree wth ndex key p. F D p s a bt matrx of sze L M, where L s the document postonal ranges of the elements assocated wth node n that s reachable by the label path p, and M s the number of bt vectors n F D p. The postonal flter for SS node tem s demonstrated on the left n Fgure 3. The 7 one-bt ranges ndcate there are 7 tem elements n the document. 2.4 Contanment Flterng Usng the above data synopses, we can enforce the element contanment constrants n the query usng an operaton called Contanment Flterng. Let F be a postonal flter of sze L M and V be a bt vector extracted from a content synopss whose sze s L W. The Contanment Flterng CF(F, V ) returns a new postonal flter F'. The bt F' [, m] s on ff: k [0, L] : V [k] = 1 j [, k] : F [j, m] = 1 Bascally, the Contanment Flterng copes a contnuous range of one-bts from F to F' f there s at least one poston wthn ths range n whch the correspondng bt n V s one. Fgure 4 shows how the data synopses are used to determne whether a document s lkely to satsfy the query Q (here M = 2). Frst, we do a contanment flterng between the ntal postonal flter F 2 and the bt vector H 4 [ Dallas ]. In the resultng postonal flter A, only 5 one-bt ranges out of 7 n F2 are left. Countng from bottom to top, the 2nd and 4th one-bt ranges n F2 are dscarded n A because there s no any onebt range n H 4 [ Dallas ] that ntersects wth the 2nd or 4th range, whch means that nether the 2nd nor 4th tem element contans a locaton element that contans the term Dallas. Smlarly, we can do contanment flterng between A and the resultng bt vector derved from the 200 Journal of Dgtal Informaton Management Volume 9 Number 5 October 2011

3 calculatng the weght of a term, we consder a path-term par as the unt for content scorng. 1) TF*IDF Calculaton: We frst gve the defntons of TF and IDF scores of a path-term par, and then use an example to llustrate ther calculatons. Defnton 1: TF Score of a Path-Term Par. Let D be an XML document assocated wth the par (p, t), where p s a full text label path from ts structural summary and t s a term. Let paths(d) be the set of tuples (t x, p x, b x, e x, x ) for all terms t x n D, where b x /e x s the begn/end poston of the element that drectly contans t x, p x s a full label path that reaches t x, and x s the document poston of t x n D. The TF score of (p, t) relevant to D s defned as: TF D (p, t) = { x (t x, p x, b x, e x, x ) paths(d) p = p x t = t x } (1) Fgure 4. Testng query Q Usng Data Synopses btwse andng between H 3 [ mountan ] and H 3 [ bcycle ]. The 3 one-bt ranges n B ndcate that 3 tems out of 7 n F 2 satsfy all element contanment constrants n the query. Thus, the document s consdered to satsfy the query. 3. Query Processng Due to the the space lmtaton, we only brefly descrbe the query processng n our framework n ths secton. As a XPath query s evaluated, a query footprnt s derved from the query. A query footprnt(qf) captures the essental structural components and all the entry ponts assocated wth the search predcates. For example, the query footprnt of Q s: //aucton//tem:1[locaton:2] [descrpton:3]/prce The numbers 1, 2, and 3 are the numbers of entry ponts n QF that ndcate the places where data synopses are needed for query evaluaton. Based on the query footprnt, all the matchng structural summares and dstnct label path combnatons correspondng to the query search predcates are retreved. Based on the retreved label path combnatons, documents that have data synopses assocated wth those label paths are found and qualfed documents are fltered out usng contanment flterng. The scores of documents are also calculated durng the query evaluaton and the ranked document lst s returned to the clent. 4. Extendng Vector Space Model For Rankng Snce a query footprnt may match a large number of structural summares and there may be a large number of documents that match each structural summary, t s desrable to rank all the qualfed documents usng a scorng functon. 4.1 Extended Vector Space Model In the tradtonal vector space model used n IR systems, the TF*IDF rankng for keyword queres s defned over a document collecton [19]. Ths rankng mechansm consders two factors: (1) TF, the term frequency, that measures the relatve mportance of a query keyword n the canddate document, and (2) IDF, the nverse document frequency, that measures the global mportance of a query keyword n the entre collecton of documents. We extend the vector space model to measure the content smlarty of an XML document D to a query Q. Due to the nherent herarchcal structure of XML data, nstead of Bascally, TF D (p, t) counts the number of (p, t) pars n the document D. Notce that, f t occurs n tmes n the same element reachable by p n D, then (p, t) wll be counted n tmes. Defnton 2: IDF Score of a Path-Term Par. Let N be the total number of documents n the corpus. Let D j 1 j N, be an XML document assocated wth the par (p, t), where p s a full text label path derved from structural summary matchng and t s the correspondng term n the query. The IDF score of (p, t) s defned as: N ( ) = p IDF p, t log Np,t ( ) (2) where N p and N(p, t) are calculated as follows: N { ( x, x, x, x, x) paths( j) x} (3) j1 Np j t p b e D p p N Npt tx, pxtx, px, bx, ex, x) pathsdj j1 p px t tx Bascally, N p counts the total number of documents that contan path p and N(p, t) counts the total number of documents that contan the path-term par (p, t) n the corpus. We now walk through an example to llustrate TF and IDF score calculaton. Suppose we have three documents n the corpus only, D 1, D 2, and D 3. The path-term par (p, t) s (/aucton/tem/ locaton, Dallas ). Suppose D 1 contans 30 paths /aucton/tem/ locaton that can reach the term Dallas and D 1 contans 100 paths /aucton/tem/descrpton that can reach the term bcycle. Suppose that all the three documents contan the path /aucton/ tem/locaton, but only the paths /aucton/tem/locaton n D 1 and D 2 can reach the term Dallas. Then the TF and IDF scores can be calculated as follows: TF D 1(/aucton/tem/locaton, Dallas )=30 IDF(/aucton/tem/locaton, Dallas )=log(3/2)=log Enhanced Scorng wth Postonal Weght A path-term par (p,t) corresponds to a postonal bt vector V n the content synopss assocated wth p. A one-bt range n V represents an element that contans t and s reachable by p. For nstance, n Fgure 4, the bt vector H 3 [ mountan ] corresponds to the par (/aucton/tem/descrpton, mountan ) and t contans 5 bt-on ranges. The number of one-bt ranges n the vector reflects the TF score of the par (/aucton/ tem/ descrpton, mountan ). However, after the btwse andng (4) Journal of Dgtal Informaton Management Volume 9 Number 5 October

4 operaton, only 3 one-bt ranges are left n the resultng bt vector, whch ndcates that among those 5 descrpton elements, only 3 of them contan both mountan and bcycle. Smlarly, after the contanment flterng between the postonal flter of tem(f 2 ) and H 4 [ Dallas ], only 5 tem elements are left out of 7 n the resultng postonal flter(a). Thus, t s desrable to take nto account ths percentage of qualfed elements as we calculate the weght of a path-term par. To make the weght calculaton of a path-term par more accurate, we ntroduce the postonal weght, whch s the percentage of qualfed pathterm pars found durng the contanment flterng or btwse andng operaton, and s used to calculate the weght of the path-term par. Defnton 3: Postonal Weght of a Path-Term Par. Let D be an XML document, PF D (p, t) be the postonal flter assocated wth (p, t), and PF D (p, t) be the result from contanment 0 flterng or btwse andng operaton. In addton, let N PF D (p, t) 0 be the number of one-bt ranges n PF D (p, t) and PF D (p, t) be 0 the number of one-bt ranges n PF D (p, t). The postonal weght of (p, t) n D s defned as: N D D PF ( p, t) PW ( p, t) (5) N PF D O ( p, t) To llustrate the calculaton of the postonal weght of a pathterm par, we refer to our runnng example. In Fgure 4, before the contanment flterng, there are 7 one-bt ranges n F 2. After the contanment flterng, 5 one-bt ranges reman n A. Smlarly, before the btwse andng operaton, there are 5 and 4 one-bt ranges n H [ mountan ] and H 3 [ bcycle ] respectvely. After ths operaton, 3 one-bt ranges are left. Thus, the postonal weght of each path-term par s calculated as follows: PW D (/aucton/tem/locaton, Dallas )=5/7=0.71 PW D (/aucton/tem/descrpton, mountan )=3/5=0.6 PW D (/aucton/tem/descrpton, bcycle )=3/4=0.75 Combnng the TF score, IDF score, and the postonal weght, the defnton of the weght of (p,t) n D s determned by the followng equaton: D D D W ptpw pttf pt IDFpt (6) Fnally, we gve the defnton of the enhanced content score of D relevant to Q. Defnton 4: Enhanced Content Score of a Document. Let Q be the query and D be an XML document. Let W Q ( p Q, t Q ) be the weght of the path-term par (p Q, t Q ) n Q and W D (p D, t D ) be the weght of the correspondng pathterm par (p D, t D ) n the document D. The content score of D relevant to Q s defned as n Q Q Q 2 D D D 2 W ( p, t ) W ( p, t ) ECS( D, Q) 1 n n Q Q Q 2 D D D 2 W ( p, t ) W ( p, t ) 1 1 where n s the number of path-term pars. In order to calculate the TF * IDF score of a pathterm par, we construct two addtonal synopses named TF Synopss and IDF Synopss. When a document D s ndexed, for each text label path p n the structural summary of D, a TF synopss s constructed to summarze the frequences of all the terms assocated wth p n D. More specfcally, a TF synopss s a hash table that maps a term to a par consstng of DstnctTerms that ndcates the number of dstnct terms n D reachable by p, and Frequences that counts the total number of term occurrences reachable by (7) p n D. In our mplementaton, a TF synopss s attached to the content synopss assocated wth p. To obtan the TF score of the path-term par (p, t) durng query evaluaton, path p s used as the key to retreve the correspondng content synopss. Then term t s hashed nto some bucket over the attached TF synopss, usng the value of Frequences/DstnctTerms as the term frequency of (p, t). To acheve IDF score of a pathterm par, for each text label path p, we construct an IDF synopss, whch contans two members named Documents and Synopss, where Documents counts the total number of documents contanng the path p, and Synopss s a hash table that maps the term t to the total number of documents contanng (p, t). 5. Experments We have buld a prototype system named XQR usng Java (J2SE 6.0) and Berkeley DB Java Edton [13] was employed as the storage manager. We conducted extensve experments to evaluate the effcency of our ndexng scheme and the effectveness of our rankng scheme. We used two XML benchmark datasets XMark [22] and XBench [21] as our datasets. The man characterstcs of our datasets and data synopses sze are summarzed n Table 1. For each dataset, our query workload s 10 full-text XPath queres exhbtng dfferent szes, query structures and search predcates. Data Set Data Sze Fles Avg. Fle Avg. SS Avg. CS Avg. PF (MB) Sze (KB) Sze (KB) Sze (KB) Sze (KB) XMark XMark XBench XBench Table 1. Data set characterstcs and data synopses sze 5.1 Evaluaton of Indexng Scheme To demonstrate the effcency of our Data Synopses Indexng( DSI) scheme, we mplemented the tradtonal Inverted Lst Indexng(ILI) scheme n Berkeley DB and compared t wth our ndexng scheme. We chose XMark2 and XBench2 as datasets for ths experment because the dataset sze s the key factor for the ndexng scheme comparson experment. Fgure 5 shows that for XMark dataset, ILI consumes over 2 tmes dsk space and query response tme than DSI. For XBench dataset, DSI s much more effcent than ILI because XBench data s text-centrc data generated from Text-CentrcMultple Document(TC/MD) class, whch s more sutable for the ndex comparson experments. In fact, the ndex sze of DSI s about 3% of that of ILI. The query response tme of DSI s over 40 tmes faster than ILI. (a) Index Sze (b) Query Response Tme Fgure 5. Comparson between ILI and DSI 202 Journal of Dgtal Informaton Management Volume 9 Number 5 October 2011

5 5.2 Evaluaton of Rankng Scheme To examne the effectveness of our rankng functon, we used two wdely accepted metrcs, precson and recall. We evaluated all the XMark(XBench) queres over each XMark(XBench) dataset to measure the average precson and recall. To construct the accurate relevant set for each query, we exploted Qzx/open [20] to evaluate the query over each dataset to obtan the strct relevant set. 1) Varyng bestk value: We frst fxed the heght and wdth of data synopses and vared the bestk value,.e., the number of documents returned, to measure the average precson and recall of a query. Note that the meanng of bestk value s dfferent from that n the classcal topk algorthms, such as Fagn algorthm [18]. The bestk value here s just the number of returned documents for relevance rankng. The results n Fgure 6(a) show that as the bestk value vares from 10 to 100, the average precson of a query over each dataset drops smoothly, whch ndcates that our scorng functon can effectvely rank the relevant documents on the top of the ranked lst so that the precson does not drop too much when the number of returned documents ncreases. Snce the average sze of the relevant set for a query over each XBench dataset s relatvely small, the average precson of a query over each XBench dataset s a lttle lower. Fgure 6(b) shows the recall over XMark1 s greater than that over XMark2 and the recall over XBench1 s larger than that over XBench2. The reason s that when the number of documents ncreases, the sze of relevant set ncreases, whch leads to a lower recall. (a) Average Precson (b) Average Recall Fgure 6. Varyng BestK Value 2) Impact of Heght Factor: In the second set of experments, we fxed the bestk value and the wdth of data synopses to see the mpact of dfferent heght factors on precson and recall. As we can see from Fgure 7(a) and Fgure 7(b), as the heght factor vares from 0.1 to 0.5, the precson and recall almost reman at the same value for each dataset. Ths was expected because when the heght of data synopses s reduced, a query may get more false postves, but our rankng functon can effectvely rank the most relevant documents close to the top, whle movng false postves near the bottom of the answer set so that the precson and recall almost do not change. (a) Impact of Wdth Factor on (b) Impact of Wdth Factor Precson on Recall Fgure 8. Impact of Wdth Factor 3) Impact of Wdth Factor: Fnally, we fxed the sze of the bestk value and the heght of data synopses to see the mpact of dfferent wdth factors on precson and recall. The results are shown n Fgure 8(a) and Fgure 8(b). As we can see, the precson and recall change a lttle more than those n Fgure 7(a) and Fgure 7(b). Ths result mples that f we want to decrease the sze of data synopses to reduce the data storage overhead but stll keep hgh precson, the heght factor should be adjusted frst. 5.3 Expermental Comparson wth Lucene We also compared our framework wth Lucene [14], whch s a full-featured text search engne lbrary wrtten entrely n Java. Lucene s the closest to our framework because t returns ranked XML document hts as query answers and t supports boolean queres that may contan feld-term pars such as ttle: XML AND author: Sucu. We conducted the comparatve experments over XMark2 and XBench2 datasets. Snce Lucene does not drectly support XPath queres, we converted the XPath queres over two data sets to the correspondng queres that are supported n Lucene. We frst compared our rankng scheme wth Lucene over XMark data. We vared the top K value and measured the average precson and recall. The result s shown n Fgure 9. Fgure 9(a) shows that for the same top K value, the average precson of XQR s much hgher than that of Lucene. In addton, the average precson of Lucene drops rapdly when the top K value s ncreased because more false postves are returned n the answer set. On the contrary, the average precson of XQR does not decrease sharply when the top K value s ncreased because our rankng scheme can effectvely rank the relevant documents on the top of the ranked lst. From Fgure 9(b), we can see that the average recall of Lucene s not only much lower than that of XQR, but also s lower than 2%. The reason s that the sze of a relevant set s relatvely larger for XMark data and there are more false postves n the answer set of Lucene, whch leads to a very low average recall. (a) Impact of Heght Factor on Precson Fgure 7. Impact of Heght Factor (b) Impact of Heght Factor on Recall (a) Average Precson (b) Average Recall Fgure 9. Rankng Comparson wth Lucene over XMark Data Journal of Dgtal Informaton Management Volume 9 Number 5 October

6 (a) Average Precson (b) Average Recall Fgure 10. Rankng Comparson wth Lucene over XBench Data We then compared our rankng scheme wth Lucene over XBench data. We vared the top K value and measured the average precson and recall. The result s shown n Fgure 10. From Fgure 10, we can see that both average precson and recall of Lucene are much lower than those of XQR and are close to 0. The s because the queres over XBench data are more selectve and Lucene returns very few relevant document hts n the top K answer set for most queres, thus leads to close-to-zero average precson and recall. 6. Related Work Wth the popularty of XML as the data format for a wde varety of web data repostores, extensve work has been motvated on desgnng powerful query languages, developng effcent ndexng and query evaluaton algorthms, and proposng effectve rankng schemes over XML data [1], [2], [3], [15], [17]. The related work can be classfed nto three categores. Approaches n the frst category are focused on extendng exstng complex XML query languages, such as XQuery, wth IR search predcates and ncorporatng smple IR scorng methods nto the query evaluaton. Khalfa et al [1] proposes bulk algebra called TIX, whch ntegrates smple IR scorng schemes nto a tradtonal ppelned query evaluator for an XML database. More specfcally, new operators and effcent access methods are proposed to generate and manpulate scores of XML fragments durng query evaluaton. TeXQuery [2] supports a powerful set of fully composable full-text search prmtves, whch can be seamlessly ntegrated nto the XQuery language. It mostly focuses on the TeXQuery language desgn and the underlyng formal model, but also provdes a costng model [5]. Strateges n the second category are dedcated to extendng tradtonal IR models for scorng smple XPath-lke queres wth full-text extenson. In [6], the authors present a framework that relaxes a full-text XPath query by droppng some predcates from ts closure and scorng the approxmate answers usng predcate penaltes. They further propose novel scorng methods by extendng TF*IDF rankng to account for both structure and content whle consderng query relaxatons [3]. Sgurbjornsson et al, [7] propose a framework that decomposes the query nto several pathterm pars, evaluates these pars separately and produces a fnal rankng that takes nto account the scores of dfferent sources of evdence. In the thrd category, efforts are made to address the problem of effcently producng ranked results for keyword search queres over XML documents. XRank [16] extends Google-lke keyword search to XML. The authors propose an algorthm for scorng XML elements that takes nto account both hyperlnk and contanment edges. New nverted lst ndex structures based on Dewey IDs and assocated query processng algorthms are also presented. XSEarch [15] s a semantc search engne that extends smple keyword search by ncorporatng keyword context nformaton nto the query,.e., each query term s a keyword-label par nstead of a sngle keyword. XSEarch extends the TF*IDF rankng scheme to rank the results. A recent work, XKSearch [8], ntroduces the concept of smallest lowest common ancestors (SLCAs) and proposes two effcent algorthms, Indexed Lookup Eager and Scan Eager, for keyword search n XML documents accordng to the SLCA semantcs. Note that all the above proposals consder queryng and rankng orgnal XML documents, and returnng XML fragments as query answers. None of them evaluates a full-text query over XML meta-data to return a ranked lst of document locatons to the clent. Snce our queryng and rankng schemes are based on XML meta-data, our work s also related to XML summarzaton technques n the lterature. Polyzots et al [9] propose an XSKETCH synopss model that explots localzed stablty and value-dstrbuton summares to accurately capture the complex correlaton patterns that exst between and across path structure and element values n the XML data graph. In [10], the authors further propose a novel class of XML synopses, termed XCLUSTERs, that provdes a unfed summarzaton framework to address the key problem of XML summarzaton n the context of heterogeneous value content. Smlarly, [11] presents a novel data structure, the bloom hstogram, to approxmate XML path frequency dstrbuton wthn a small space budget and to accurately estmate the path selectvty. A dynamc summary layer s used to keep exact or more detaled XML path nformaton to accommodate frequently changed data. However, all these proposed XML synopses and summares are manly used for selectvty estmaton, rather than for locatng and rankng XML documents, as s n our framework. A recent work by Cho et al [12] addresses the meta-data ndexng problem of effcently dentfyng XML elements along each locaton step n an XPath query, that satsfy range constrants on the ordered meta-data. the authors develop a meta-data ndex structure named full meta-data ndex(fmi) by enhancng the R-tree ndex proposed for XPath locaton steps. The FMI can quckly dentfy XML elements that are reachable from a gven element usng a specfed XPath axs and satsfy the meta-data range constrant.to reduce the metadata update overhead, another R-tree-based ndex structure named nhertance meta-data ndex(imi) s also proposed to enable effcent lookup n the ndex structure, whle keepng the update cost manageable. 7. Concluson We presented a framework for ndexng and rankng XML documents based on concse data synopses extracted from orgnal XML documents. We frst proposed a novel meta-data ndexng scheme sutable for full-text XPath query evaluaton. We then extended vector space model to rank the query results over the ndexed meta-data. The experments demonstrate that our ndexng scheme s much effcent than tradtonal nvertedlst based ndexng scheme. Our rankng scheme s effectve n terms of precson and recall and superor than the exstng XML search engne Lucene. 8. Acknowledgment Ths work s supported by the Faculty & Program Development Grant from the Unversty of Wsconsn-Stevens Pont, USA, Journal of Dgtal Informaton Management Volume 9 Number 5 October 2011

7 References [1] Al-Khalfa, S., Yu, C., Jagadsh, H. V. (2003). Queryng Structured Text n an XML Database, In: SIGMOD 03. [2] Amer-Yaha, S., Botev, C., Shanmugasundaram, S. (2004). TeXQuery: A Full-Text Search Extenson to XQuery, In: WWW 04. [3] Amer-Yaha, S., Koudas, N., Maran, A., Srvastava, D., Toman, D. (2005). Structure and Content Scorng for XML, In: VLDB 05. [4] Amer-Yaha, S., Curtmola, E., Deutsch, A. (2006). Flexble and Effcent XML Search wth Complex Full-Text Predcates, In: Proceedngs of SIGMOD 06. [5] Maran, A., Amer-Yaha, S., Koudas, N., Srvastava, D. (2005). Adaptve Processng of Top-K Queres n XML, In: Proceedngs of ICDE 05. [6] Amer-Yaha, S., Lakshmanan, L. V.S., Pandt, S. (2004). FleXPath: Flexble Structure and Full-Text Queryng for XML, In: Proceedngs of SIGMOD 04. [7] Sgurbjornsson, B., Kamps, J., Rjke, M. D. (2004). Processng Content- Orented XPath Queres, In: Proceedngs of CIKM 04. [8] Xu, Y., Papakonstantnou, Y. (2005). Effcent Keyword Search for Smallest LCAs n XML Databases. SIGMOD [9] Polyzots, N., Garofalaks, M. (2002). Structure and Value Synopses for XML Data Graphs, In: Proceedngs of VLDB 02. [10] Polyzots, N., Garofalaks, M. (2006). XCluster Synopses for Structured XML Content, In: Proceedngs of ICDE 06. [11] Wang, W., Jang, H., Lu, H., Yu, J.H. (2004). Bloom Hstogram: Path Selectvty Estmaton for XML Data wth Updates, In: Proceedngs of VLDB 04. [12]. Cho, S., Koudas, N., Srvastava, D. (2006). Metadata Indexng for XPath Locaton Steps, In: Proceedngs of SIGMOD 06. [13] Berkeley DB. [14] Lucene. [15] Cohen, S., Mamou, J., Kanza, Y., Sagv, Y. (2003). XSEarch: A Semantc Search Engne for XML, In: VLDB 03. [16] Guo, L., Shao, F., Botev, C., Shanmugasundaram, J. (2003). XRANK: Ranked Keyword Search over XML Documents, In: SIGMOD 03. [17] Chen, L., Papakonstantnoul Y. (2010). Supportng Top-K Keyword Search n XML Databases, In: Proceedngs of ICDE 10. [18] Fagn, R., Lotem, A., Naor, M. (2003). Optmal aggregaton algorthms for mddleware, In: Journal of Computer Systems and Scence, 66 (4) [19] Yates, R. B., Neto, B.R. (1999). Modern Informaton Retreval, ACM Press. [20] Qzx/open. [21] XBench. [22] XMark. [23] XML Path Language (XPath) xpath20/ Journal of Dgtal Informaton Management Volume 9 Number 5 October

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Indexing and Searching XML Documents based on Content and Structure Synopses

Indexing and Searching XML Documents based on Content and Structure Synopses Indexing and Searching XML Documents based on Content and Structure Synopses Weimin He, Leonidas Fegaras, and David Levine University of Texas at Arlington, CSE Arlington, TX 76019-0015 {weiminhe,fegaras,levine}@cse.uta.edu

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Efficient Processing of Ordered XML Twig Pattern

Efficient Processing of Ordered XML Twig Pattern Effcent Processng of Ordered XML Twg Pattern Jaheng Lu, Tok Wang Lng, Tan Yu, Changqng L, and We N School of computng, Natonal Unversty of Sngapore {lujahen, lngtw, yutan, lchangq, nwe}@comp.nus.edu.sg

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng, Natonal Unversty of Sngapore {shva, phanquyt, tancl }@comp.nus.edu.sg

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection 2009 10th Internatonal Conference on Document Analyss and Recognton A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng,

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

SAO: A Stream Index for Answering Linear Optimization Queries

SAO: A Stream Index for Answering Linear Optimization Queries SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL

COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL COMPLEX WAVELET TRANSFORM-BASED COLOR INDEXING FOR CONTENT-BASED IMAGE RETRIEVAL Nader Safavan and Shohreh Kasae Department of Computer Engneerng Sharf Unversty of Technology Tehran, Iran skasae@sharf.edu

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Application of VCG in Replica Placement Strategy of Cloud Storage

Application of VCG in Replica Placement Strategy of Cloud Storage Internatonal Journal of Grd and Dstrbuted Computng, pp.27-40 http://dx.do.org/10.14257/jgdc.2016.9.4.03 Applcaton of VCG n Replca Placement Strategy of Cloud Storage Wang Hongxa Computer Department, Bejng

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more

More information

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION 1 FENG YONG, DANG XIAO-WAN, 3 XU HONG-YAN School of Informaton, Laonng Unversty, Shenyang Laonng E-mal: 1 fyxuhy@163.com, dangxaowan@163.com, 3 xuhongyan_lndx@163.com

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Exploring Image, Text and Geographic Evidences in ImageCLEF 2007

Exploring Image, Text and Geographic Evidences in ImageCLEF 2007 Explorng Image, Text and Geographc Evdences n ImageCLEF 2007 João Magalhães 1, Smon Overell 1, Stefan Rüger 1,2 1 Department of Computng Imperal College London South Kensngton Campus London SW7 2AZ, UK

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Efficient Semantically Equal Join on Strings in Practice

Efficient Semantically Equal Join on Strings in Practice Thammasat Int. J. Sc. Tech., Vol. 4, No., Aprl-June 009 Effcent Semantcally Equal Jon on Strngs n Practce Juggapong Natwcha Computer Engneerng Department, Faculty of Engneerng Chang Ma Unversty, Chang

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES

TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES The Fourth Internatonal Conference on e-learnng (elearnng-03), 6-7 September 03, Belgrade, Serba TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES VALENTINA PAUNOVIĆ Belgrade Metropoltan

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

AN INDEXING METHOD FOR SUPPORTING SPATIAL QUERIES IN STRUCTURED PEER-TO-PEER SYSTEMS

AN INDEXING METHOD FOR SUPPORTING SPATIAL QUERIES IN STRUCTURED PEER-TO-PEER SYSTEMS AN INDEXING METHOD FOR SUPPORTING SPATIAL QUERIES IN STRUCTURED PEER-TO-PEER SYSTEMS Lngku Meng a, Wenun Xe a, *, Dan Lu a, b a School of Remote Sensng and Informaton Engneerng, Wuhan Unversty, Wuhan 4379,

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information