Using Language Models for Flat Text Queries in XML Retrieval

Paul Ogilvie, Jamie Callan
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA
{pto,callan}@cs.cmu.edu

ABSTRACT
This paper presents a language modeling system for ranking flat text queries against a collection of structured documents. The system, built using Lemur, produces probability estimates that arbitrary document components generated the query. This paper describes storage mechanisms and retrieval algorithms for the evaluation of unstructured queries over XML documents. The paper includes retrieval experiments using a generative language model on the content-only topics of the INEX testbed, demonstrating the strengths and flexibility of language modeling for a variety of problems. We also describe index characteristics, running times, and the effectiveness of the retrieval algorithm.

1. INTRODUCTION
Language modeling has been studied extensively in standard information retrieval in the last few years. Researchers have demonstrated that the framework provided by language models is powerful and flexible enough to provide strong solutions to numerous problems, including ad-hoc information retrieval, known-item finding on the Internet, filtering, distributed information retrieval, and clustering. With the success of language modeling for this wide variety of tasks and the increasing interest in studying structured document retrieval, it is natural to apply the language modeling framework to XML retrieval. This paper describes and presents experiments using one way the generative language model can be extended to model and support queries on structured documents. We model documents using a tree-based language model. This is similar to many previous models for structured document retrieval [1][2][3][6][7][10], but differs in that language modeling provides some guidance in combining information from nodes in the tree and in estimating term weights.
This work is also similar to other work using language models for XML retrieval [5][9], but differs in that we also present context-sensitive language model smoothing and an implementation using information retrieval style inverted lists rather than a database. The next section provides background on language modeling in information retrieval. In Section 3 we present our approach to modeling structured documents. Section 4 describes querying the tree-based language models presented in the previous section. In Section 5 we describe the indexes required to support retrieval and the retrieval algorithms. We describe the experimental setup and indexes used for INEX 2003 in Section 6. Section 7 describes experimental results. We discuss relationships to other approaches to structured document retrieval in Section 8, and Section 9 concludes the paper.

2. LANGUAGE MODELS FOR DOCUMENT RETRIEVAL
Language modeling applied to information retrieval problems typically models text using unigram language models. Unigram language models are similar to bag-of-words representations, as word order is ignored. The unigram language model estimates the probability of a word given some text. Document ranking is typically done in one of two ways: by measuring how much a query language model diverges from document language models [8], or by estimating the probability that each document generated the query string. Since we use the generative language model for our experiments, we will not describe the divergence-based approaches here.

2.1 The Generative Language Model
The generative method ranks documents by directly estimating the probability of the query using each text's language model [12][4][14][15]:

P(Q | θ_T) = ∏_{w ∈ Q} P(w | θ_T)^freq(w, Q)

where Q is the query string, θ_T is the language model estimated for the text T, and freq(w, Q) is the query term frequency of the term w. Texts more likely to have produced the query are ranked higher.
It is common to rank by the log of the generative probability, as there is less danger of underflow and it produces the same orderings:

log P(Q | θ_T) = ∑_{w ∈ Q} freq(w, Q) · log P(w | θ_T)

Under the assumptions that query terms are generated independently and that the query language model used in KL-divergence is the maximum-likelihood estimate, the generative model and KL divergence produce the same rankings [11].

2.2 The Maximum-Likelihood Estimate of a Language Model
The most direct way to estimate a language model given some observed text is to use the maximum-likelihood estimate, assuming an underlying multinomial model. In this case, the maximum-likelihood estimate is also the empirical distribution. An advantage of this estimate is that it is easy to compute. It is very good at estimating the probability distribution for the language model when the size of the observed text is very large. It is given by:

P_MLE(w | T) = freq(w, T) / |T|

where T is the observed text, freq(w, T) is the number of times the word w occurs in T, and |T| is the length in words of T. The maximum-likelihood estimate is not good at estimating low-frequency terms for short texts, as it will assign zero probability to those words. This creates a problem for estimating document language models in both the KL-divergence and generative language model approaches to ranking documents, as the log of zero is negative infinity. The solution to this problem is smoothing.

2.3 Smoothing
Smoothing is the re-estimation of the probabilities in a language model. Smoothing is motivated by the fact that many of the language models we estimate are based on a small sample of the true probability distribution. Smoothing improves the estimates by leveraging known patterns of word usage in language and other language models based on larger samples. In information retrieval smoothing is very important [15], because the language models tend to be constructed from very small amounts of text. How we estimate low-probability words can have large effects on the document scores. In addition to the problem of zero probabilities mentioned for maximum-likelihood estimates, much care is required if this probability is close to zero. Small changes in the probability will have large effects on the logarithm of the probability, in turn having large effects on the document scores. Smoothing also has an effect similar to inverse document frequency [4], which is used by many retrieval algorithms. The smoothing technique most commonly used is linear interpolation. Linear interpolation is a simple approach to combining estimates from different language models:

P(w | θ) = ∑_{i=1}^{k} λ_i · P(w | θ_i)

where k is the number of language models we are combining, and λ_i is the weight on model i. To ensure that this is a valid probability distribution, we must place these constraints on the lambdas:

∑_{i=1}^{k} λ_i = 1, and λ_i ≥ 0 for 1 ≤ i ≤ k

One use of linear interpolation is to smooth a document's language model with a collection language model. This new model would then be used as the smoothed document language model in either the generative or KL-divergence ranking approach.
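As a concrete sketch of Sections 2.1-2.3 (illustrative code, not the Lemur implementation; the tokenization, toy texts, and the choice ω = 0.5 are our assumptions), the following scores texts by the log generative probability of a query under linear-interpolation (Jelinek-Mercer) smoothing with a collection model:

```python
import math
from collections import Counter

def mle(text):
    """Maximum-likelihood (empirical) unigram model of a token list."""
    counts = Counter(text)
    total = len(text)
    return {w: c / total for w, c in counts.items()}

def log_generative_score(query, text, collection_model, omega=0.5):
    """log P(Q | text) with Jelinek-Mercer smoothing:
    P(w) = (1 - omega) * P_MLE(w | text) + omega * P(w | collection)."""
    text_model = mle(text)
    score = 0.0
    for w in query:  # term independence: sum of per-term log probabilities
        p = (1 - omega) * text_model.get(w, 0.0) + omega * collection_model.get(w, 0.0)
        score += math.log(p)
    return score

# Toy collection of two "documents"; d1 contains both query terms.
d1 = "gregorian chant is sung".split()
d2 = "pop music is loud".split()
collection = mle(d1 + d2)
query = "gregorian chant".split()
assert log_generative_score(query, d1, collection) > log_generative_score(query, d2, collection)
```

Note that without the collection component, d2 would receive log 0 for "gregorian"; smoothing is what keeps the score finite.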
2.4 Another Characterization
When we take a simple linear interpolation of the maximum-likelihood model estimated from a text T and a collection model, we can also characterize the probability estimates as:

P(w | θ_T) = P_seen(w | θ_T)           if w occurs in T
             α_T · P(w | collection)   otherwise

where P_seen(w | θ_T) = (1 − ω) · P_MLE(w | T) + ω · P(w | collection) and α_T = ω.

This notation distinguishes the probability estimates for cases where the word has been seen in the sample text and where the word has not been seen in the sample text. We will use this notation later when describing the retrieval algorithm, as it simplifies the description and is similar to the notation used in previous literature [15]. The simple form of linear interpolation where ω is a fixed constant is often referred to as Jelinek-Mercer smoothing.

3. STRUCTURED DOCUMENTS AND LANGUAGE MODELS
The previous section described how language modeling is used in unstructured document retrieval. With structured documents such as XML or HTML, we believe that the information contained in the structure of the document can be used to improve document retrieval. In order to leverage this information, we need to model document structure in the language models. We model structured documents as trees. The nodes in the tree correspond directly with tags present in the document. A partial tree for a document might look like:

document
├── title
├── abstract
├── body
│   ├── section 1
│   └── section 2
└── references

Nodes in the document tree correspond directly to XML tags in the document. For each node in the document tree, we estimate a language model. The language models for leaf nodes with no children can be estimated from the text of the node. The language models for other nodes are estimated by taking a linear interpolation of a language model formed from the text in the node (but not in any of its children) and the language models formed from the children. We have not specified how the linear interpolation parameters for combining language models in the document tree should be chosen. This could be task specific, and training may be required. The approach we will adopt in this paper is to set the weight on a child as the accumulated length of the text in the child divided by the accumulated length of the node.
By accumulated length we mean the number of words directly in the node plus the accumulated lengths of the node's children. Setting the parameters in this manner assumes that a word in a node of one type is no more important than a word in a node of any other type; it is the accumulated length of the text in the node that determines how much information is contained in the node. We also wish to smooth the maximum-likelihood models that are estimated directly from the text with a collection language model. In this work, we will combine the maximum-likelihood models with the collection model using a linear interpolation with fixed weights. The collection model may be specific to the node type, giving context-sensitive smoothing, or the collection model may be one large model estimated from everything in the corpus, giving a larger sample size.
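The accumulated-length weighting can be sketched as follows (illustrative code, not the Lemur implementation; the node layout is a made-up example). With these weights, a node's tree-based model reduces to the maximum-likelihood model of all text under the node:

```python
from collections import Counter

class Node:
    def __init__(self, text, children=()):
        self.text = text.split()          # words directly in this node
        self.children = list(children)

def accumulated_length(node):
    """Words directly in the node plus the accumulated lengths of its children."""
    return len(node.text) + sum(accumulated_length(c) for c in node.children)

def node_model(node):
    """Tree-based model: interpolate the node's own MLE model with the children's
    models, each weighted by accumulated length / the node's accumulated length."""
    total = accumulated_length(node)
    own = Counter(node.text)
    # Own-text weight is length(node)/total, so each word contributes count/total.
    model = {w: c / total for w, c in own.items()}
    for child in node.children:
        weight = accumulated_length(child) / total
        for w, p in node_model(child).items():
            model[w] = model.get(w, 0.0) + weight * p
    return model

title = Node("xml retrieval")
body = Node("", [Node("language models"), Node("xml components")])
doc = Node("", [title, body])
m = node_model(doc)
# "xml" occurs twice among the six words under the root, so it gets 2/6.
assert abs(m["xml"] - 2 / 6) < 1e-9
```

Smoothing with a collection model (Section 2.3) would then be applied on top of these models; it is omitted here to keep the weighting step visible.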

When the parameters are set proportional to the text length and a single collection model is used, this results in a special case that is very similar to the models used in [5][9]. The tree-based language model estimated using these parameter settings will be identical to a language model estimated by taking a simple linear interpolation of a maximum-likelihood estimate from the text in the node and its ancestors and the collection model.

4. RANKING THE TREE MODELS
In a retrieval environment for structured documents, it is desirable to provide support for both structured queries and unstructured, free-text queries. It is easier to adapt the generative language model to structured documents, so we only consider that model in this paper. It is simpler to support unstructured queries, so we will describe retrieval for them first.

4.1 Unstructured Queries
To rank document components for unstructured queries, we use the generative language modeling approach for IR described in Section 2. For full document retrieval, we need only compute the probability that the document language model generated the query. If we wish to return arbitrary document components, we need to compute the probability that each component generated the query. Allowing the system to return arbitrary document components may result in the system stuffing the results list with many components from a single document. This behavior is undesirable, so a filter on the results is necessary. One filter we employ takes a greedy approach to preventing overlap among components in the results list. Each result is thrown out of the results list if any component higher in the ranking is an ancestor or descendant of the document component under consideration.

4.2 Structured Queries
Our previous paper on this subject [11] discusses how some structural query operators could be included in the model. We do not currently support any of these operators in our system, so we will not discuss them in depth here.
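The greedy overlap filter of Section 4.1 can be sketched as follows (illustrative code; representing components as (document id, path) pairs is our assumption, with one path being a prefix of another exactly when the components are ancestor and descendant):

```python
def filter_overlap(ranked):
    """Greedy overlap filter: walk the ranking top-down and drop any component
    that is an ancestor or descendant of a higher-ranked kept component.
    Each result is (doc_id, path), path being a tuple of child indices."""
    def related(a, b):
        # ancestor/descendant iff same document and one path is a prefix of the other
        return a[0] == b[0] and (a[1][:len(b[1])] == b[1] or b[1][:len(a[1])] == a[1])
    kept = []
    for comp in ranked:
        if not any(related(comp, k) for k in kept):
            kept.append(comp)
    return kept

ranked = [("d1", (0,)),        # e.g. the first section of d1
          ("d1", (0, 1)),      # descendant of the first result: dropped
          ("d2", ()),          # whole document d2: kept
          ("d1", (1,))]        # sibling section of d1: kept
assert filter_overlap(ranked) == [("d1", (0,)), ("d2", ()), ("d1", (1,))]
```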
However, we will note that the retrieval framework can support most desired structural query operators as relatively easy-to-implement query nodes.

4.3 Prior Probabilities
Given relevance assessments from past topics, we can estimate the prior probability that a document component is relevant given its type. Another example prior may depend on the length of the text in the node. A way to incorporate this information is to rank by the probability of the document component given the query. Using Bayes' rule, this allows us to incorporate the priors on the nodes. The prior for the node being ranked is used, and the system multiplies the probability that the node generated the query by the prior:

P(N | Q) = P(Q | N) · P(N) / P(Q) ∝ P(Q | N) · P(N)

This results in ranking by the probability of the document component given the query, rather than the other way around.

5. STORAGE AND ALGORITHMS
This section describes how we support structured retrieval in the Lemur toolkit. We first describe the indexes built to support retrieval. Then we describe how the indexes are used by the retrieval algorithm. We also present formulas for the computation of the generative probabilities we estimate for retrieval.

5.1 Index Support
There are two main storage structures in Lemur that provide the support necessary for the retrieval algorithm. Lemur stores inverted indexes containing document and node occurrences, and document structure information.

5.1.1 Inverted Indexes
The basic idea behind storing structured documents in Lemur for retrieval is to use a modified inverted list. Similar to storing term locations for a document entry in an inverted list, we store the nodes and the term frequencies of the term in the nodes in the document entries of the inverted list. The current implementation of the structured document index does not store term locations, but could be adapted to store term locations in the future.
The inverted lists are keyed by term, and each list contains the following:

- document frequency of the term
- a list of document entries, each entry containing
  o document id
  o term frequency (count of the term in the document)
  o number of nodes the term occurs in
  o a list of node entries, each entry containing
    - node id
    - term frequency (count of the term in the node)

When read into memory, the inverted lists are stored in an array of integers. The lists are stored on disk using restricted-variable length compression, and delta-encoding is applied to document ids and node ids. In the document entry lists, the document entries are stored in order of ascending document id. The node entry lists are similarly stored in order of increasing node id. Document entries and node entries are only stored in the list when the term frequency is greater than zero. Access to the lists on disk is facilitated with an in-memory lookup table for vocabulary terms. There is also an analogous set of inverted lists for attribute name/value pairs associated with tags. For example, if the document contained the text <date calendar="Gregorian">, the index would have an inverted list keyed by the triple date/calendar/Gregorian. The structure and information stored in the inverted lists for the attribute name/value pairs is identical to that in the inverted lists for terms.

5.1.2 Document Structure
The document structure is stored compressed in memory using restricted variable-length compression. A lookup table keyed by document id provides quick access to the block of compressed memory for a document. We choose to store the document structure in memory because it will be requested

often during retrieval. For each document, a list of information about the document's nodes is stored. For each node, we store:

- parent of the node
- type of the node
- length of the node (number of words)

Since this list of information about the document structure is compressed using a variable-length encoding, we must decompress the memory to provide efficient access to information about the nodes. When the document structure for a document is being decompressed, we also compute:

- accumulated length of the node (length of the text directly in the node + accumulated lengths of the children)
- number of children of the node
- a list of the node's children

This decompression and computation of other useful information about the document structure is done in time linear in the number of nodes in the document being decompressed.

5.2 Retrieval
We construct a query tree to process and rank document components. A typical query tree is illustrated below. The leaf nodes of the query tree are term nodes, which read the inverted lists for a term off of disk and create result objects for document components containing the term. The term nodes are also responsible for propagating the term scores up the document tree. The sum node merges the result lists returned by each of the term nodes, combining the score estimates. The score adjuster node adjusts the score estimates to get the generation probabilities and also applies any priors. The heap node maintains a list of the top n ranked objects and returns a sorted result list.

Heap
└── Score adjuster
    └── Sum
        ├── Term: gregorian
        └── Term: chant

Efficient retrieval is achieved using a document-at-a-time approach. This requires that the query tree be walked many times during the evaluation of a query, but results in a large saving of memory, as only the result objects for a document and the top n result objects in the heap must be stored at any point in time. A more detailed description of each of the query nodes follows. When each query node is called, it is passed a document id to evaluate. In order to know which document should be processed next, the term nodes pass up the next document id in the inverted list.
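The linear-time expansion of the compressed document structure described in Section 5.1.2 can be sketched as follows (illustrative code; the parent-array representation is our assumption, relying on the index property that parent ids are smaller than child ids):

```python
def expand_structure(parents, lengths):
    """Given, for each node id, its parent id (-1 for the root) and the number of
    words directly in it, compute children lists and accumulated lengths.
    Because parent ids are smaller than child ids, a single reverse pass
    accumulates lengths bottom-up in time linear in the number of nodes."""
    n = len(parents)
    children = [[] for _ in range(n)]
    acc = list(lengths)
    for node in range(n - 1, 0, -1):   # children have larger ids than parents
        children[parents[node]].append(node)
        acc[parents[node]] += acc[node]
    return children, acc

# Node 0 = document root; 1 = title (3 words); 2 = body (0 words directly);
# 3, 4 = sections inside the body (10 and 7 words).
parents = [-1, 0, 0, 2, 2]
lengths = [0, 3, 0, 10, 7]
children, acc = expand_structure(parents, lengths)
assert acc[0] == 20 and acc[2] == 17
```

These accumulated lengths are exactly the quantities λ(node, parent) needs during score propagation.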
For other query nodes, the minimum next document id among a node's children gets passed up the query tree with the results list. We will describe the query nodes bottom up, as that is how the scores are computed. We first note that we can rewrite the log of the probability that node N generated the query as

log P(Q | θ_N) = ∑_{w ∈ Q seen in N} freq(w, Q) · log( P_seen(w | θ_N) / (α_N · P(w | C_N)) )
               + ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) )

as shown in [15], where C_N is the collection model used for node N. This allows us to easily compute each item in the first sum using term nodes, combine these components of the score using a sum node, and then add on the rest using a score adjustment node.

5.2.1 Term Node
The term nodes read in the inverted lists for a term w and create results where the score for a result is initialized to the node's own-text contribution to P_seen(w | θ_N). The term node assumes that the parent id of a node is smaller than the node's id. It also assumes that the document entries in the inverted lists are organized in increasing document id order and the node entries are organized in increasing node id order. The structured document index we built is organized this way. In the following algorithm description, indentation is used to denote the body of a loop.

1 Seek to the next entry in the inverted list where the document id is at least as large as the requested document
2 If the document id of the next entry is the requested document
3   Decompress the document structure information for the document
4   Read in the node entries from the inverted list
5   Create the result objects for the leaf nodes. For each node n that contains the term:
6     Initialize the score for the result to the own-text part of the probability,
      (1 − ω) · (freq(w, n) / length(n)) · λ(n, n), where λ(n, n) = length(n) / accumulated length(n)
      and ω will be used to set the influence of the collection models.
7     Push the node id onto the candidate heap
8     Store the result object in an array indexed by node id for fast access
9   While the candidate heap isn't empty:
10    Pop the top node id off of the heap (the largest node id), set it to the current node
11    Look up the result from the result array
12    Look up the node id for the parent of the current node
13    Look up the parent's result
14    If the parent's result object is NULL:
15      Create a new result object for the parent and put it in the result array, initializing the score to 0
16      Push the parent's node id onto the candidate heap

17    Propagate the part of the score from the current node n to the parent, setting the parent's part to
        s(parent) + λ(n, parent) · s(n), where λ(n, parent) = accumulated length(n) / accumulated length(parent)
18    Push the result onto the front of the results list
19    Set the result in the result array for the node to NULL (initializing the result array for the next document)
    [Now each node that contains the query term (or has a child containing the term) has a result in the results list where the score is the seen-probability part for the query term]
20  For each node n in the result list
21    Compute the unseen part α_n · P(w | C_n) of the generative probability for the node. For linear interpolation with a constant ω and one single type-independent collection model, this is
        ω · P(w | collection)
      For linear interpolation with a constant ω and type-specific collection models, this can be computed recursively as
        λ(n, n) · ω · P(w | C_type(n)) + ∑_{child ∈ children(n)} λ(child, n) · (unseen part of the child)
22    Set the score for the result to
        log( (s(n) + α_n · P(w | C_n)) / (α_n · P(w | C_n)) )
23  Return the result list and the next document id in the inverted list

The result list now contains results for a single document where the score is the first-sum component for the term w, and the list is ordered by increasing node id.

5.2.2 Sum Node
The sum node maintains an array of result lists, with one result list for each of the node's children. It seeks to the next entry in each of the child result lists where the document id is at least as large as the requested document. If necessary, it calls the children nodes to get their next result lists. For the requested document, the sum node merges results from the result lists of the children, setting the score of the new result equal to the sum of the children's results with the same document and node id. This assumes that results in a result list are ordered by increasing document id, then increasing node id. The results returned by this component have the score of the first sum of the rewritten log probability, and the minimum document id returned by the children is returned.

5.2.3 Score Adjustment Node
The score adjustment node adds ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) ) to each of the results, with α_N and C_N as defined for the term node. If there is a prior probability for the node, the score adjustment node also adds on the log of the prior.
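The rewrite that the term, sum, and score adjustment nodes jointly implement can be checked numerically. A minimal sketch, assuming Jelinek-Mercer smoothing with a single collection model and made-up probabilities:

```python
import math
from collections import Counter

# Toy node text, query, and collection model (all made up).
node_text = "xml retrieval with language models".split()
query = "xml ranking".split()
collection = {"xml": 0.2, "ranking": 0.1, "retrieval": 0.1, "with": 0.2,
              "language": 0.2, "models": 0.2}
omega = 0.4
counts = Counter(node_text)

def p(w):
    """Smoothed model: (1 - omega) * MLE + omega * collection."""
    return (1 - omega) * counts[w] / len(node_text) + omega * collection[w]

# Direct computation of log P(Q | node).
direct = sum(math.log(p(w)) for w in query)

# Rewritten form: seen terms contribute log(p(w) / (omega * P(w|C))),
# and every query term contributes log(omega * P(w|C)).
seen_part = sum(math.log(p(w) / (omega * collection[w])) for w in query if counts[w] > 0)
unseen_part = sum(math.log(omega * collection[w]) for w in query)
assert abs(direct - (seen_part + unseen_part)) < 1e-9
```

The seen_part is what the term and sum nodes accumulate; the unseen_part is what the score adjustment node adds on.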
The results in the list now have the score

∑_{w ∈ Q seen in N} freq(w, Q) · log( P_seen(w | θ_N) / (α_N · P(w | C_N)) ) + ∑_{w ∈ Q} freq(w, Q) · log( α_N · P(w | C_N) ) + log P(N)

which is the log of the score by which we wish to rank document components.

5.2.4 Heap Node
The heap node repeatedly calls its child for result lists until the document collection has been ranked. The next document id it asks its child to process is the document id returned by the child in the previous evaluation call. It maintains a heap of the top n results. After the document collection has been ranked, it sorts the results by decreasing score and stores them in a result list that is returned.

5.2.5 Other Nodes
There are many other nodes that could be useful for retrieval. One example is a node that filters the result lists so that the XML path of the node in the document tree satisfies some requirements. Another example is a node that throws out all but the top n components of a document.

6. EXPERIMENT SETUP
The index we created used the Krovetz stemmer and the InQuery stopword list. Topics are similarly processed, and all of our queries are constructed from the title, description, and keywords fields. All words in the title, description, and keywords fields of the topic are given equal weight in the query. Table 3 shows the size of the components created to support retrieval on the INEX document collection. The total index size, including the information needed to do context-sensitive smoothing, is about 70% of the size of the original document collection. A better compression ratio could be achieved by compressing the context-sensitive smoothing support files. Note that the document term file, which is 100 MB, is not necessary for the retrieval algorithms described above.

Topic Fields | Context | Prior | Path
TDK          | YES     | NO    | NO
TDK          | YES     | YES   | NO
TDK          | NO      | NO    | NO
TDK          | NO      | YES   | NO

Table 1: Performance of the retrieval system on INEX 2002 CO topics (inex_eval, strict and generalized). Context refers to context-sensitive smoothing, prior refers to the document component type priors, and path refers to the overlapping path filter.

Official Run Name        | Topic Fields | Context | Prior | Path
LM_context_TDK           | TDK          | YES     | NO    | NO
-                        | TDK          | YES     | YES   | NO
LM_context_typr_path_TDK | TDK          | YES     | YES   | YES
-                        | TDK          | NO      | NO    | NO
-                        | TDK          | NO      | YES   | NO
LM_base_typr_path_TDK    | TDK          | NO      | YES   | YES

Table 2: Summary of runs and results for INEX 2003 CO topics (inex_eval and inex_eval_ng, with and without overlap, strict and generalized). Rows marked with a dash correspond to runs not yet evaluated.

Component                                                        | Size (MB)
Inverted file                                                    | 100
Document term file (allows iteration over terms in a document)   | 100
Document structure                                               | 30
Attributes inverted file                                         | 23
Smoothing: single collection model                               | 4
Smoothing: context-sensitive models (not compressed)             | 81
Other files (lookup tables, vocabulary, table of contents, etc.) | 12
Total                                                            | 350

Table 3: Lemur structured index component sizes

Table 4 shows approximate running times for index construction and retrieval. The retrieval time for context-insensitive smoothing is reasonable at less than 20 seconds per query, but we would like to lower the average query time even more. We feel we can do this with some simple data structure optimizations that will increase memory reuse.

Action                                                              | Time (mins)
Indexing                                                            | 25
Retrieval of 36 INEX 2003 CO topics, context-insensitive smoothing  | 10
Retrieval of 36 INEX 2003 CO topics, context-sensitive smoothing    | 45

Table 4: Indexing and retrieval times using Lemur

The higher retrieval time for the context-sensitive retrieval algorithm is due to the recursive computation of the unseen component of the score, as described in Step 21 of Section 5.2.1. Clever redesign of the algorithm may reduce the time somewhat. However, all of the descendant nodes in the document's tree must be visited regardless of whether the descendant nodes contain any of the query terms.
This means that the computation of the unseen component of the scores is linear in the number of nodes in the document tree, rather than the typically sub-linear case for computation of the seen score components. If the smoothing functions and their parameters are known, it is possible to precompute and store the necessary information to reduce the running time to something only slightly larger than the context-insensitive version. However, our implementation is meant for research, so we prefer that these parameters remain easily changeable.

7. EXPERIMENT RESULTS
We submitted three official runs, as described in Table 2. All of our runs used the title, description, and keyword fields of the topics. Unfortunately, two of our runs performed rather poorly. This is either an error in our path filter or a problem with the component type priors. We would also like to evaluate the additional runs corresponding to the dashes in the table, but we have not been able to do these experiments yet. The LM_context_TDK run has good performance across all measures. This is our basic language modeling system using context-sensitive smoothing. The strong performance of the context-sensitive language modeling approach speaks well for the flexibility of language modeling. Unfortunately, we have not been able to do a thorough evaluation of variations of the system to figure out which additional components are helpful. We have done some experiments on the INEX 2002 content-only topics. The summary of our runs for the 2002 topics is given in Table 1. There is little for us to conclude from the 2002 topics. It is not clear that context-sensitive smoothing makes any significant difference. The priors may give a small boost, but the priors were estimated directly from the relevance assessments for the 2002 CO topics. We would like to answer the questions of whether context-sensitive smoothing is helpful, whether a component type prior helps, and whether component retrieval for this task performs better than standard document retrieval.

8. RELATED WORK
There exists a large and growing body of work on retrieving information from XML documents.
Some of this work is described in our previous paper [11], and much of the more recent work is also described in the INEX 2002 proceedings [13]. With that in mind, we will focus our discussion of related work on language modeling approaches for structured document retrieval. In [5] a generative language modeling approach for content-only queries is described where a document component's

language model is estimated by taking a linear interpolation of the maximum-likelihood model from the text of the node and its ancestors and a collection model. This corresponds to a special case of our approach. Our model is more flexible in that it allows context-sensitive smoothing and different weighting of the text in children nodes. The authors of [9] also present a generative language model for content-only queries in structured document retrieval. They estimate the collection model in a different way, using document frequencies instead of collection term frequencies. As with [5], this model can be viewed as a special case of the language modeling approach presented here.

9. CLOSING REMARKS
We presented experiments using a hierarchical language model. The strong performance of the language modeling algorithms demonstrates the flexibility and ease of adapting language models to the problem. In our preliminary experiments, context-sensitive smoothing did not give much different performance than using a single collection model. We described data structures and retrieval algorithms to support retrieval of arbitrary XML document components within the Lemur toolkit. We are reasonably pleased with the efficiency of the algorithms for a research system, but we will strive to improve the algorithms and data structures to reduce retrieval times even further. In our future work, we would like to compare component retrieval to standard document retrieval. We would also like to investigate query expansion using XML document components. Additionally, we would like to explore different ways of setting the weights on the nodes' language models, as we believe that words in some components may convey more useful information than words in other components.

10. ACKNOWLEDGMENTS
This work was sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors, and do not necessarily reflect those of the sponsor.
11. REFERENCES
[1] Fuhr, N. and K. Großjohann. XIRQL: A query language for information retrieval in XML documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.
[2] Grabs, T. and H.-J. Schek. Generating vector spaces on-the-fly for flexible XML retrieval. In Proceedings of the 25th Annual International ACM SIGIR Workshop on XML Information Retrieval (2002), ACM.
[3] Hatano, K., H. Kinutani, M. Yoshikawa, and S. Uemura. Information retrieval system for XML documents. In Proceedings of Database and Expert Systems Applications (DEXA 2002), Springer.
[4] Hiemstra, D. Using language models for information retrieval. Ph.D. Thesis (2001), University of Twente.
[5] Hiemstra, D. A database approach to context-based XML retrieval. In [13].
[6] Kazai, G., M. Lalmas, and T. Rölleke. A model for the representation and focused retrieval of structured documents based on fuzzy aggregation. In The 8th Symposium on String Processing and Information Retrieval (SPIRE 2001), IEEE.
[7] Kazai, G., M. Lalmas, and T. Rölleke. Focussed structured document retrieval. In Proceedings of the 9th Symposium on String Processing and Information Retrieval (SPIRE 2002), Springer.
[8] Lafferty, J. and C. Zhai. Document language models, query models, and risk minimization for information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.
[9] List, J. and A. P. de Vries. CWI at INEX 2002. In [13].
[10] Myaeng, S. H., D.-H. Jang, M.-S. Kim, and Z.-C. Zhoo. A flexible model for retrieval of SGML documents. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), ACM Press.
[11] Ogilvie, P. and J. Callan. Language models and structured document retrieval. In [13].
[12] Ponte, J. and W. B. Croft. A language modeling approach to information retrieval.
In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), ACM Press.
[13] Proceedings of the First Workshop of the Initiative for the Evaluation of XML Retrieval (INEX), 2003, DELOS.
[14] Westerveld, T., W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs, and anchors. In Proceedings of the Tenth Text REtrieval Conference, TREC 2001, NIST Special Publication (2002).
[15] Zhai, C. and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2001), ACM Press.


Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Web-supported Matching and Classification of Business Opportunities

Web-supported Matching and Classification of Business Opportunities Web-supported Matchng and Classfcaton of Busness Opportuntes. DIRO Unversté de Montréal C.P. 628, succursale Centre-vlle Montréal, Québec, H3C 3J7, Canada Jng Ba, Franços Parads,2, Jan-Yun Ne {bajng, paradfr,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Summarizing Data using Bottom-k Sketches

Summarizing Data using Bottom-k Sketches Summarzng Data usng Bottom-k Sketches Edth Cohen AT&T Labs Research 8 Park Avenue Florham Park, NJ 7932, USA edth@research.att.com Ham Kaplan School of Computer Scence Tel Avv Unversty Tel Avv, Israel

More information

Isosurface Extraction in Time-varying Fields Using a Temporal Hierarchical Index Tree

Isosurface Extraction in Time-varying Fields Using a Temporal Hierarchical Index Tree Isosurface Extracton n Tme-varyng Felds Usng a Temporal Herarchcal Index Tree Han-We Shen MRJ Technology Solutons / NASA Ames Research Center Abstract Many hgh-performance sosurface extracton algorthms

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Random Varables and Probablty Dstrbutons Some Prelmnary Informaton Scales on Measurement IE231 - Lecture Notes 5 Mar 14, 2017 Nomnal scale: These are categorcal values that has no relatonshp of order or

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Automatic selection of reference velocities for recursive depth migration
