Using Language Models for Flat Text Queries in XML Retrieval

Paul Ogilvie, Jamie Callan
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA
{pto,callan}@cs.cmu.edu

ABSTRACT
This paper presents a language modeling system for ranking flat text queries against a collection of structured documents. The system, built using Lemur, produces probability estimates that arbitrary document components generated the query. This paper describes storage mechanisms and retrieval algorithms for the evaluation of unstructured queries over XML documents. The paper includes retrieval experiments using a generative language model on the content-only topics of the INEX testbed, demonstrating the strengths and flexibility of language modeling for a variety of problems. We also describe index characteristics, running times, and the effectiveness of the retrieval algorithm.

1. INTRODUCTION
Language modeling has been studied extensively in standard information retrieval in the last few years. Researchers have demonstrated that the framework provided by language models is powerful and flexible enough to provide strong solutions to numerous problems, including ad-hoc information retrieval, known-item finding on the Internet, filtering, distributed information retrieval, and clustering. With the success of language modeling for this wide variety of tasks and the increasing interest in studying structured document retrieval, it is natural to apply the language modeling framework to XML retrieval. This paper describes and presents experiments using one way the generative language model can be extended to model and support queries on structured documents. We model documents using a tree-based language model. This is similar to many previous models for structured document retrieval [1][2][3][6][7][10], but differs in that language modeling provides some guidance in combining information from nodes in the tree and in estimating term weights.
This work is also similar to other work using language models for XML retrieval [5][9], but differs in that we also present context-sensitive language model smoothing and an implementation using information retrieval style inverted lists rather than a database. The next section provides background on language modeling in information retrieval. In Section 3 we present our approach to modeling structured documents. Section 4 describes querying the tree-based language models presented in the previous section. In Section 5 we describe the indexes required to support retrieval and the retrieval algorithms. We describe the experimental setup and indexes used for INEX 2003 in Section 6. Section 7 describes experimental results. We discuss relationships to other approaches to structured document retrieval in Section 8, and Section 9 concludes the paper.

2. LANGUAGE MODELS FOR DOCUMENT RETRIEVAL
Language modeling applied to information retrieval problems typically models text using unigram language models. Unigram language models are similar to bag-of-words representations, as word order is ignored. The unigram language model estimates the probability of a word given some text. Document ranking is typically done in one of two ways: by measuring how much a query language model diverges from document language models [8], or by estimating the probability that each document generated the query string. Since we use the generative language model for our experiments, we will not describe the divergence-based approaches here.

2.1 The Generative Language Model
The generative method ranks documents by directly estimating the probability of the query using each text's language model [12][4][14][15]:

P(Q | θ_T) = ∏_{w ∈ Q} P(w | θ_T)^freq(w, Q)

where Q is the query string, θ_T is the language model estimated for the text T, and freq(w, Q) is the query term frequency of the term w. Texts more likely to have produced the query are ranked higher.
It is common to rank by the log of the generative probability, as there is less danger of underflow and it produces the same orderings:

log P(Q | θ_T) = ∑_{w ∈ Q} freq(w, Q) · log P(w | θ_T)

Under the assumptions that query terms are generated independently and that the query language model used in KL-divergence is the maximum-likelihood estimate, the generative model and KL divergence produce the same rankings [11].

2.2 The Maximum-Likelihood Estimate of a Language Model
The most direct way to estimate a language model given some observed text is to use the maximum-likelihood estimate, assuming an underlying multinomial model. In this case, the maximum-likelihood estimate is also the empirical distribution. An advantage of this estimate is that it is easy to compute. It is very good at estimating the probability distribution for the language model when the size of the observed text is very large. It is given by:

P_MLE(w | T) = freq(w, T) / |T|

where T is the observed text, freq(w, T) is the number of times the word w occurs in T, and |T| is the length in words of T. The maximum-likelihood estimate is not good at estimating low-frequency terms for short texts, as it will assign zero probability to those words. This creates a problem for estimating document language models in both the KL-divergence and generative language model approaches to ranking documents, as the log of zero is negative infinity. The solution to this problem is smoothing.

2.3 Smoothing
Smoothing is the re-estimation of the probabilities in a language model. Smoothing is motivated by the fact that many of the language models we estimate are based on a small sample of the true probability distribution. Smoothing improves the estimates by leveraging known patterns of word usage in language and other language models based on larger samples. In information retrieval smoothing is very important [15], because the language models tend to be constructed from very small amounts of text. How we estimate low-probability words can have large effects on the document scores. In addition to the problem of zero probabilities mentioned for maximum-likelihood estimates, much care is required if this probability is close to zero. Small changes in the probability will have large effects on the logarithm of the probability, in turn having large effects on the document scores. Smoothing also has an effect similar to inverse document frequency [4], which is used by many retrieval algorithms. The smoothing technique most commonly used is linear interpolation. Linear interpolation is a simple approach to combining estimates from different language models:

P(w | θ) = ∑_{i=1}^{k} λ_i · P(w | θ_i)

where k is the number of language models we are combining, and λ_i is the weight on model i. To ensure that this is a valid probability distribution, we must place these constraints on the lambdas:

∑_{i=1}^{k} λ_i = 1, and λ_i ≥ 0 for 1 ≤ i ≤ k

One use of linear interpolation is to smooth a document's language model with a collection language model. This new model would then be used as the smoothed document language model in either the generative or KL-divergence ranking approach.
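As a concrete sketch of Sections 2.1-2.3 (illustrative code, not the Lemur implementation; the tokenization, toy texts, and the choice ω = 0.5 are our assumptions), the following scores texts by the log generative probability of a query under linear-interpolation (Jelinek-Mercer) smoothing with a collection model:

```python
import math
from collections import Counter

def mle(text):
    """Maximum-likelihood (empirical) unigram model of a token list."""
    counts = Counter(text)
    total = len(text)
    return {w: c / total for w, c in counts.items()}

def log_generative_score(query, text, collection_model, omega=0.5):
    """log P(Q | text) with Jelinek-Mercer smoothing:
    P(w) = (1 - omega) * P_MLE(w | text) + omega * P(w | collection)."""
    text_model = mle(text)
    score = 0.0
    for w in query:  # term independence: sum of per-term log probabilities
        p = (1 - omega) * text_model.get(w, 0.0) + omega * collection_model.get(w, 0.0)
        score += math.log(p)
    return score

# Toy collection of two "documents"; d1 contains both query terms.
d1 = "gregorian chant is sung".split()
d2 = "pop music is loud".split()
collection = mle(d1 + d2)
query = "gregorian chant".split()
assert log_generative_score(query, d1, collection) > log_generative_score(query, d2, collection)
```

Note that without the collection component, d2 would receive log 0 for "gregorian"; smoothing is what keeps the score finite.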
2.4 Another Characterization
When we take a simple linear interpolation of the maximum-likelihood model estimated from a text T and a collection model, we can also characterize the probability estimates as:

P(w | θ_T) = P_seen(w | θ_T)           if w occurs in T
             α_T · P(w | collection)   otherwise

where P_seen(w | θ_T) = (1 − ω) · P_MLE(w | T) + ω · P(w | collection) and α_T = ω.

This notation distinguishes the probability estimates for cases where the word has been seen in the sample text and where the word has not been seen in the sample text. We will use this notation later when describing the retrieval algorithm, as it simplifies the description and is similar to the notation used in previous literature [15]. The simple form of linear interpolation where ω is a fixed constant is often referred to as Jelinek-Mercer smoothing.

3. STRUCTURED DOCUMENTS AND LANGUAGE MODELS
The previous section described how language modeling is used in unstructured document retrieval. With structured documents such as XML or HTML, we believe that the information contained in the structure of the document can be used to improve document retrieval. In order to leverage this information, we need to model document structure in the language models. We model structured documents as trees. The nodes in the tree correspond directly with tags present in the document. A partial tree for a document might look like:

document
├── title
├── abstract
├── body
│   ├── section 1
│   └── section 2
└── references

Nodes in the document tree correspond directly to XML tags in the document. For each node in the document tree, we estimate a language model. The language models for leaf nodes with no children can be estimated from the text of the node. The language models for other nodes are estimated by taking a linear interpolation of a language model formed from the text in the node (but not in any of its children) and the language models formed from the children. We have not specified how the linear interpolation parameters for combining language models in the document tree should be chosen. This could be task specific, and training may be required. The approach we will adopt in this paper is to set the weight on a child as the accumulated length of the text in the child divided by the accumulated length of the node.
By accumulated length we mean the number of words directly in the node plus the accumulated lengths of the node's children. Setting the parameters in this manner assumes that a word in a node of one type is no more important than a word in a node of any other type; it is the accumulated length of the text in the node that determines how much information is contained in the node. We also wish to smooth the maximum-likelihood models that are estimated directly from the text with a collection language model. In this work, we will combine the maximum-likelihood models with the collection model using a linear interpolation with fixed weights. The collection model may be specific to the node type, giving context-sensitive smoothing, or the collection model may be one large model estimated from everything in the corpus, giving a larger sample size.
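The accumulated-length weighting can be sketched as follows (illustrative code, not the Lemur implementation; the node layout is a made-up example). With these weights, a node's tree-based model reduces to the maximum-likelihood model of all text under the node:

```python
from collections import Counter

class Node:
    def __init__(self, text, children=()):
        self.text = text.split()          # words directly in this node
        self.children = list(children)

def accumulated_length(node):
    """Words directly in the node plus the accumulated lengths of its children."""
    return len(node.text) + sum(accumulated_length(c) for c in node.children)

def node_model(node):
    """Tree-based model: interpolate the node's own MLE model with the children's
    models, each weighted by accumulated length / the node's accumulated length."""
    total = accumulated_length(node)
    own = Counter(node.text)
    # Own-text weight is length(node)/total, so each word contributes count/total.
    model = {w: c / total for w, c in own.items()}
    for child in node.children:
        weight = accumulated_length(child) / total
        for w, p in node_model(child).items():
            model[w] = model.get(w, 0.0) + weight * p
    return model

title = Node("xml retrieval")
body = Node("", [Node("language models"), Node("xml components")])
doc = Node("", [title, body])
m = node_model(doc)
# "xml" occurs twice among the six words under the root, so it gets 2/6.
assert abs(m["xml"] - 2 / 6) < 1e-9
```

Smoothing with a collection model (Section 2.3) would then be applied on top of these models; it is omitted here to keep the weighting step visible.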

When the parameters are set proportional to the text length and a single collection model is used, this results in a special case that is very similar to the models used in [5][9]. The tree-based language model estimated using these parameter settings will be identical to a language model estimated by taking a simple linear interpolation of a maximum-likelihood estimate from the text in the node and its ancestors and the collection model.

4. RANKING THE TREE MODELS
In a retrieval environment for structured documents, it is desirable to provide support for both structured queries and unstructured, free-text queries. It is easier to adapt the generative language model to structured documents, so we only consider that model in this paper. It is simpler to support unstructured queries, so we will describe retrieval for them first.

4.1 Unstructured Queries
To rank document components for unstructured queries, we use the generative language modeling approach for IR described in Section 2. For full document retrieval, we need only compute the probability that the document language model generated the query. If we wish to return arbitrary document components, we need to compute the probability that each component generated the query. Allowing the system to return arbitrary document components may result in the system stuffing the results list with many components from a single document. This behavior is undesirable, so a filter on the results is necessary. One filter we employ takes a greedy approach to preventing overlap among components in the results list. Each result is thrown out of the results list if any component higher in the ranking is an ancestor or descendant of the document component under consideration.

4.2 Structured Queries
Our previous paper on this subject [11] discusses how some structural query operators could be included in the model. We do not currently support any of these operators in our system, so we will not discuss them in depth here.
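The greedy overlap filter of Section 4.1 can be sketched as follows (illustrative code; representing components as (document id, path) pairs is our assumption, with one path being a prefix of another exactly when the components are ancestor and descendant):

```python
def filter_overlap(ranked):
    """Greedy overlap filter: walk the ranking top-down and drop any component
    that is an ancestor or descendant of a higher-ranked kept component.
    Each result is (doc_id, path), path being a tuple of child indices."""
    def related(a, b):
        # ancestor/descendant iff same document and one path is a prefix of the other
        return a[0] == b[0] and (a[1][:len(b[1])] == b[1] or b[1][:len(a[1])] == a[1])
    kept = []
    for comp in ranked:
        if not any(related(comp, k) for k in kept):
            kept.append(comp)
    return kept

ranked = [("d1", (0,)),        # e.g. the first section of d1
          ("d1", (0, 1)),      # descendant of the first result: dropped
          ("d2", ()),          # whole document d2: kept
          ("d1", (1,))]        # sibling section of d1: kept
assert filter_overlap(ranked) == [("d1", (0,)), ("d2", ()), ("d1", (1,))]
```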
However, we will note that the retrieval framework can support most desired structural query operators as relatively easy-to-implement query nodes.

4.3 Prior Probabilities
Given relevance assessments from past topics, we can estimate the prior probability that a document component is relevant given its type. Another example prior may depend on the length of the text in the node. A way to incorporate this information is to rank by the probability of the document component given the query. Using Bayes' rule, this allows us to incorporate the priors on the nodes. The prior for the node being ranked is used, and the system multiplies the probability that the node generated the query by the prior:

P(N | Q) = P(Q | N) · P(N) / P(Q) ∝ P(Q | N) · P(N)

This results in ranking by the probability of the document component given the query, rather than the other way around.

5. STORAGE AND ALGORITHMS
This section describes how we support structured retrieval in the Lemur toolkit. We first describe the indexes built to support retrieval. Then we describe how the indexes are used by the retrieval algorithm. We also present formulas for the computation of the generative probabilities we estimate for retrieval.

5.1 Index Support
There are two main storage structures in Lemur that provide the support necessary for the retrieval algorithm. Lemur stores inverted indexes containing document and node occurrences, and document structure information.

5.1.1 Inverted Indexes
The basic idea behind storing structured documents in Lemur for retrieval is to use a modified inverted list. Similar to storing term locations for a document entry in an inverted list, we store the nodes and the term frequencies of the term in the nodes in the document entries of the inverted list. The current implementation of the structured document index does not store term locations, but could be adapted to store term locations in the future.
The inverted lists are keyed by term, and each list contains the following:

- document frequency of the term
- a list of document entries, each entry containing
  o document id
  o term frequency (count of the term in the document)
  o number of nodes the term occurs in
  o a list of node entries, each entry containing
    - node id
    - term frequency (count of the term in the node)

When read into memory, the inverted lists are stored in an array of integers. The lists are stored on disk using restricted-variable length compression, and delta-encoding is applied to document ids and node ids. In the document entry lists, the document entries are stored in order of ascending document id. The node entry lists are similarly stored in order of increasing node id. Document entries and node entries are only stored in the list when the term frequency is greater than zero. Access to the lists on disk is facilitated with an in-memory lookup table for vocabulary terms. There is also an analogous set of inverted lists for attribute name/value pairs associated with tags. For example, if the document contained the text <date calendar="Gregorian">, the index would have an inverted list keyed by the triple date/calendar/Gregorian. The structure and information stored in the inverted lists for the attribute name/value pairs is identical to that in the inverted lists for terms.

5.1.2 Document Structure
The document structure is stored compressed in memory using restricted variable-length compression. A lookup table keyed by document id provides quick access to the block of compressed memory for a document. We choose to store the document structure in memory because it will be requested

often during retrieval. For each document, a list of information about the document's nodes is stored. For each node, we store:

- parent of the node
- type of the node
- length of the node (number of words)

Since this list of information about the document structure is compressed using a variable-length encoding, we must decompress the memory to provide efficient access to information about the nodes. When the document structure for a document is being decompressed, we also compute:

- accumulated length of the node (length of the text directly in the node + accumulated lengths of the children)
- number of children of the node
- a list of the node's children

This decompression and computation of other useful information about the document structure is done in time linear in the number of nodes in the document being decompressed.

5.2 Retrieval
We construct a query tree to process and rank document components. A typical query tree is illustrated below. The leaf nodes of the query tree are term nodes, which read the inverted lists for a term off of disk and create result objects for document components containing the term. The term nodes are also responsible for propagating the term scores up the document tree. The sum node merges the result lists returned by each of the term nodes, combining the score estimates. The score adjuster node adjusts the score estimates to get the generation probabilities and also applies any priors. The heap node maintains a list of the top n ranked objects and returns a sorted result list.

Heap
└── Score adjuster
    └── Sum
        ├── Term: gregorian
        └── Term: chant

Efficient retrieval is achieved using a document-at-a-time approach. This requires that the query tree be walked many times during the evaluation of a query, but results in a large saving of memory, as only the result objects for a document and the top n result objects in the heap must be stored at any point in time. A more detailed description of each of the query nodes follows. When each query node is called, it is passed a document id to evaluate. In order to know which document should be processed next, the term nodes pass up the next document id in the inverted list.
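The linear-time expansion of the compressed document structure described in Section 5.1.2 can be sketched as follows (illustrative code; the parent-array representation is our assumption, relying on the index property that parent ids are smaller than child ids):

```python
def expand_structure(parents, lengths):
    """Given, for each node id, its parent id (-1 for the root) and the number of
    words directly in it, compute children lists and accumulated lengths.
    Because parent ids are smaller than child ids, a single reverse pass
    accumulates lengths bottom-up in time linear in the number of nodes."""
    n = len(parents)
    children = [[] for _ in range(n)]
    acc = list(lengths)
    for node in range(n - 1, 0, -1):   # children have larger ids than parents
        children[parents[node]].append(node)
        acc[parents[node]] += acc[node]
    return children, acc

# Node 0 = document root; 1 = title (3 words); 2 = body (0 words directly);
# 3, 4 = sections inside the body (10 and 7 words).
parents = [-1, 0, 0, 2, 2]
lengths = [0, 3, 0, 10, 7]
children, acc = expand_structure(parents, lengths)
assert acc[0] == 20 and acc[2] == 17
```

These accumulated lengths are exactly the quantities λ(node, parent) needs during score propagation.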
For other query nodes, the minimum next document id among a node's children gets passed up the query tree with the results list. We will describe the query nodes bottom up, as that is how the scores are computed. We first note that we can rewrite the log of the probability that node N generated the query as

log P(Q | θ_N) = ∑_{w ∈ Q seen in N} freq(w, Q) · log( P_seen(w | θ_N) / (α_N · P(w | C_N)) )
               + ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) )

as shown in [15], where C_N is the collection model used for node N. This allows us to easily compute each item in the first sum using term nodes, combine these components of the score using a sum node, and then add on the rest using a score adjustment node.

5.2.1 Term Node
The term nodes read in the inverted lists for a term w and create results where the score for a result is initialized to the node's own-text contribution to P_seen(w | θ_N). The term node assumes that the parent id of a node is smaller than the node's id. It also assumes that the document entries in the inverted lists are organized in increasing document id order and the node entries are organized in increasing node id order. The structured document index we built is organized this way. In the following algorithm description, indentation is used to denote the body of a loop.

1 Seek to the next entry in the inverted list where the document id is at least as large as the requested document
2 If the document id of the next entry is the requested document
3   Decompress the document structure information for the document
4   Read in the node entries from the inverted list
5   Create the result objects for the leaf nodes. For each node n that contains the term:
6     Initialize the score for the result to the own-text part of the probability,
      (1 − ω) · (freq(w, n) / length(n)) · λ(n, n), where λ(n, n) = length(n) / accumulated length(n)
      and ω will be used to set the influence of the collection models.
7     Push the node id onto the candidate heap
8     Store the result object in an array indexed by node id for fast access
9   While the candidate heap isn't empty:
10    Pop the top node id off of the heap (the largest node id), set it to the current node
11    Look up the result from the result array
12    Look up the node id for the parent of the current node
13    Look up the parent's result
14    If the parent's result object is NULL:
15      Create a new result object for the parent and put it in the result array, initializing the score to 0
16      Push the parent's node id onto the candidate heap

17    Propagate the part of the score from the current node n to the parent, setting the parent's part to
        s(parent) + λ(n, parent) · s(n), where λ(n, parent) = accumulated length(n) / accumulated length(parent)
18    Push the result onto the front of the results list
19    Set the result in the result array for the node to NULL (initializing the result array for the next document)
    [Now each node that contains the query term (or has a child containing the term) has a result in the results list where the score is the seen-probability part for the query term]
20  For each node n in the result list
21    Compute the unseen part α_n · P(w | C_n) of the generative probability for the node. For linear interpolation with a constant ω and one single type-independent collection model, this is
        ω · P(w | collection)
      For linear interpolation with a constant ω and type-specific collection models, this can be computed recursively as
        λ(n, n) · ω · P(w | C_type(n)) + ∑_{child ∈ children(n)} λ(child, n) · (unseen part of the child)
22    Set the score for the result to
        log( (s(n) + α_n · P(w | C_n)) / (α_n · P(w | C_n)) )
23  Return the result list and the next document id in the inverted list

The result list now contains results for a single document where the score is the first-sum component for the term w, and the list is ordered by increasing node id.

5.2.2 Sum Node
The sum node maintains an array of result lists, with one result list for each of the node's children. It seeks to the next entry in each of the child result lists where the document id is at least as large as the requested document. If necessary, it calls the children nodes to get their next result lists. For the requested document, the sum node merges results from the result lists of the children, setting the score of the new result equal to the sum of the children's results with the same document and node id. This assumes that results in a result list are ordered by increasing document id, then increasing node id. The results returned by this component have the score of the first sum of the rewritten log probability, and the minimum document id returned by the children is returned.

5.2.3 Score Adjustment Node
The score adjustment node adds ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) ) to each of the results, with α_N and C_N as defined for the term node. If there is a prior probability for the node, the score adjustment node also adds on the log of the prior.
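The rewrite that the term, sum, and score adjustment nodes jointly implement can be checked numerically. A minimal sketch, assuming Jelinek-Mercer smoothing with a single collection model and made-up probabilities:

```python
import math
from collections import Counter

# Toy node text, query, and collection model (all made up).
node_text = "xml retrieval with language models".split()
query = "xml ranking".split()
collection = {"xml": 0.2, "ranking": 0.1, "retrieval": 0.1, "with": 0.2,
              "language": 0.2, "models": 0.2}
omega = 0.4
counts = Counter(node_text)

def p(w):
    """Smoothed model: (1 - omega) * MLE + omega * collection."""
    return (1 - omega) * counts[w] / len(node_text) + omega * collection[w]

# Direct computation of log P(Q | node).
direct = sum(math.log(p(w)) for w in query)

# Rewritten form: seen terms contribute log(p(w) / (omega * P(w|C))),
# and every query term contributes log(omega * P(w|C)).
seen_part = sum(math.log(p(w) / (omega * collection[w])) for w in query if counts[w] > 0)
unseen_part = sum(math.log(omega * collection[w]) for w in query)
assert abs(direct - (seen_part + unseen_part)) < 1e-9
```

The seen_part is what the term and sum nodes accumulate; the unseen_part is what the score adjustment node adds on.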
The results in the list now have the score

∑_{w ∈ Q seen in N} freq(w, Q) · log( P_seen(w | θ_N) / (α_N · P(w | C_N)) ) + ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) ) + log P(N)

which is the log of the score by which we wish to rank document components.

5.2.4 Heap Node
The heap node repeatedly calls its child for result lists until the document collection has been ranked. The next document id it asks its child to process is the document id returned by the child in the previous evaluation call. It maintains a heap of the top n results. After the document collection has been ranked, it sorts the results by decreasing score and stores them in a result list that is returned.

5.2.5 Other Nodes
There are many other nodes that could be useful for retrieval. One example is a node that filters the result lists so that the XML path of the node in the document tree satisfies some requirements. Another example is a node that throws out all but the top n components of a document.

6. EXPERIMENT SETUP
The index we created used the Krovetz stemmer and the InQuery stopword list. Topics are similarly processed, and all of our queries are constructed from the title, description, and keywords fields. All words in the title, description, and keywords fields of the topic are given equal weight in the query. Table 3 shows the size of the components created to support retrieval on the INEX document collection. The total index size, including the information needed to do context-sensitive smoothing, is about 70% of the size of the original document collection. A better compression ratio could be achieved by compressing the context-sensitive smoothing support files. Note that the document term file, which is 100 MB, is not necessary for the retrieval algorithms described above.

Topic Fields | Context | Prior | Path
TDK          | YES     | NO    | NO
TDK          | YES     | YES   | NO
TDK          | NO      | NO    | NO
TDK          | NO      | YES   | NO

Table 1: Performance of the retrieval system on INEX 2002 CO topics (inex_eval, strict and generalized). Context refers to context-sensitive smoothing, prior refers to the document component type priors, and path refers to the overlapping path filter.

Official Run Name        | Topic Fields | Context | Prior | Path
LM_context_TDK           | TDK          | YES     | NO    | NO
-                        | TDK          | YES     | YES   | NO
LM_context_typr_path_TDK | TDK          | YES     | YES   | YES
-                        | TDK          | NO      | NO    | NO
-                        | TDK          | NO      | YES   | NO
LM_base_typr_path_TDK    | TDK          | NO      | YES   | YES

Table 2: Summary of runs and results for INEX 2003 CO topics (inex_eval and inex_eval_ng, with and without overlap, strict and generalized). Rows marked with a dash correspond to runs not yet evaluated.

Component                                                        | Size (MB)
Inverted file                                                    | 100
Document term file (allows iteration over terms in a document)   | 100
Document structure                                               | 30
Attributes inverted file                                         | 23
Smoothing: single collection model                               | 4
Smoothing: context-sensitive models (not compressed)             | 81
Other files (lookup tables, vocabulary, table of contents, etc.) | 12
Total                                                            | 350

Table 3: Lemur structured index component sizes

Table 4 shows approximate running times for index construction and retrieval. The retrieval time for context-insensitive smoothing is reasonable at less than 20 seconds per query, but we would like to lower the average query time even more. We feel we can do this with some simple data structure optimizations that will increase memory reuse.

Action                                                              | Time (mins)
Indexing                                                            | 25
Retrieval of 36 INEX 2003 CO topics, context-insensitive smoothing  | 10
Retrieval of 36 INEX 2003 CO topics, context-sensitive smoothing    | 45

Table 4: Indexing and retrieval times using Lemur

The higher retrieval time for the context-sensitive retrieval algorithm is due to the recursive computation of the unseen component of the score, as described in Step 21 of Section 5.2.1. Clever redesign of the algorithm may reduce the time somewhat. However, all of the descendant nodes in the document's tree must be visited regardless of whether the descendant nodes contain any of the query terms.
This means that the computation of the unseen component of the scores is linear in the number of nodes in the document tree, rather than the typically sub-linear case for computation of the seen score components. If the smoothing functions and their parameters are known, it is possible to precompute and store the necessary information to reduce the running time to something only slightly larger than the context-insensitive version. However, our implementation is meant for research, so we prefer that these parameters remain easily changeable.

7. EXPERIMENT RESULTS
We submitted three official runs, as described in Table 2. All of our runs used the title, description, and keyword fields of the topics. Unfortunately, two of our runs performed rather poorly. This is either an error in our path filter or a problem with the component type priors. We would also like to evaluate the additional runs corresponding to the dashes in the table, but we have not been able to do these experiments yet. The LM_context_TDK run has good performance across all measures. This is our basic language modeling system using context-sensitive smoothing. The strong performance of the context-sensitive language modeling approach speaks well for the flexibility of language modeling. Unfortunately, we have not been able to do a thorough evaluation of variations of the system to figure out which additional components are helpful. We have done some experiments on the INEX 2002 content-only topics. The summary of our runs for the 2002 topics is given in Table 1. There is little for us to conclude from the 2002 topics. It is not clear that context-sensitive smoothing makes any significant difference. The priors may give a small boost, but the priors were estimated directly from the relevance assessments for the 2002 CO topics. We would like to answer the questions of whether context-sensitive smoothing is helpful, whether a component type prior helps, and whether component retrieval for this task performs better than standard document retrieval.

8. RELATED WORK
There exists a large and growing body of work on retrieving information from XML documents.
Some of this work is described in our previous paper [11], and much of the more recent work is also described in the INEX 2002 proceedings [13]. With that in mind, we will focus our discussion of related work on language modeling approaches for structured document retrieval. In [5] a generative language modeling approach for content-only queries is described where a document component's

language model is estimated by taking a linear interpolation of the maximum-likelihood model from the text of the node and its ancestors and a collection model. This corresponds to a special case of our approach. Our model is more flexible in that it allows context-sensitive smoothing and different weighting of the text in children nodes. The authors of [9] also present a generative language model for content-only queries in structured document retrieval. They estimate the collection model in a different way, using document frequencies instead of collection term frequencies. As with [5], this model can be viewed as a special case of the language modeling approach presented here.

9. CLOSING REMARKS
We presented experiments using a hierarchical language model. The strong performance of the language modeling algorithms demonstrates the flexibility and ease of adapting language models to the problem. In our preliminary experiments, context-sensitive smoothing did not give much different performance than using a single collection model. We described data structures and retrieval algorithms to support retrieval of arbitrary XML document components within the Lemur toolkit. We are reasonably pleased with the efficiency of the algorithms for a research system, but we will strive to improve the algorithms and data structures to reduce retrieval times even further. In our future work, we would like to compare component retrieval to standard document retrieval. We would also like to investigate query expansion using XML document components. Additionally, we would like to explore different ways of setting the weights on the nodes' language models, as we believe that words in some components may convey more useful information than words in other components.

10. ACKNOWLEDGMENTS
This work was sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors, and do not necessarily reflect those of the sponsor.
11. REFERENCES
[1] Fuhr, N. and K. Großjohann. XIRQL: A query language for information retrieval in XML documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.
[2] Grabs, T. and H.-J. Schek. Generating vector spaces on-the-fly for flexible XML retrieval. In Proceedings of the 25th Annual International ACM SIGIR Workshop on XML Information Retrieval (2002), ACM.
[3] Hatano, K., H. Kinutani, M. Yoshikawa, and S. Uemura. Information retrieval system for XML documents. In Proceedings of Database and Expert Systems Applications (DEXA 2002), Springer.
[4] Hiemstra, D. Using language models for information retrieval. Ph.D. Thesis (2001), University of Twente.
[5] Hiemstra, D. A database approach to context-based XML retrieval. In [13].
[6] Kazai, G., M. Lalmas, and T. Rölleke. A model for the representation and focused retrieval of structured documents based on fuzzy aggregation. In The 8th Symposium on String Processing and Information Retrieval (SPIRE 2001), IEEE.
[7] Kazai, G., M. Lalmas, and T. Rölleke. Focussed structured document retrieval. In Proceedings of the 9th Symposium on String Processing and Information Retrieval (SPIRE 2002), Springer.
[8] Lafferty, J. and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.
[9] List, J. and A. P. de Vries. CWI at INEX 2002. In [13].
[10] Myaeng, S. H., D.-H. Jang, M.-S. Kim, and Z.-C. Zhoo. A flexible model for retrieval of SGML documents. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), ACM Press.
[11] Ogilvie, P. and J. Callan. Language models and structured document retrieval. In [13].
[12] Ponte, J. and W. B. Croft. A language modeling approach to information retrieval.
In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), ACM Press.
[13] Proceedings of the First Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), 2003, DELOS.
[14] Westerveld, T., W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs, and anchors. In Proceedings of the Tenth Text REtrieval Conference, TREC 2001, NIST Special Publication (2002).
[15] Zhai, C. and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.


Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Web-supported Matching and Classification of Business Opportunities

Web-supported Matching and Classification of Business Opportunities Web-supported Matchng and Classfcaton of Busness Opportuntes. DIRO Unversté de Montréal C.P. 628, succursale Centre-vlle Montréal, Québec, H3C 3J7, Canada Jng Ba, Franços Parads,2, Jan-Yun Ne {bajng, paradfr,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Summarizing Data using Bottom-k Sketches

Summarizing Data using Bottom-k Sketches Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel

More information

Isosurface Extraction in Time-varying Fields Using a Temporal Hierarchical Index Tree

Isosurface Extraction in Time-varying Fields Using a Temporal Hierarchical Index Tree Isosurface Extracton n Tme-varyng Felds Usng a Temporal Herarchcal Index Tree Han-We Shen MRJ Technology Solutons / NASA Ames Research Center Abstract Many hgh-performance sosurface extracton algorthms

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Random Varables and Probablty Dstrbutons Some Prelmnary Informaton Scales on Measurement IE231 - Lecture Notes 5 Mar 14, 2017 Nomnal scale: These are categorcal values that has no relatonshp of order or

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Automatic selection of reference velocities for recursive depth migration
