Cross-Language Information Retrieval


Feature Article, IEEE Computational Intelligence Bulletin, June 2003, Vol. 2, No. 1

Jian-Yun Nie (footnote 1)

Abstract

A research group at the University of Montreal has worked on the problem of cross-language information retrieval (CLIR) for several years. A method that exploits parallel texts for query translation is proposed. This method is shown to allow for retrieval effectiveness comparable to the state of the art. A major problem of this approach is the unavailability of large parallel corpora. To solve this problem, a mining system is constructed to automatically gather parallel Web pages. The mining results are used to train statistical translation models. When a query is translated word by word, the accuracy may be low. In order to increase the translation accuracy, compound terms are extracted and incorporated into the translation models, so that compounds can be translated as a unit, rather than as separate words. Our experiments show that this can further increase the CLIR effectiveness.

I. INTRODUCTION

Information retrieval (IR) tries to identify relevant documents for an information need, expressed as a query. The problems that an IR system should deal with include document indexing (which tries to extract important indexes from a document and weigh them), query analysis (similar to document indexing), and query evaluation (i.e. matching the query with the documents). Each of these problems has been the subject of many studies in IR.

Traditional IR identifies relevant documents in the same language as the query. This problem is referred to as monolingual IR. Cross-language information retrieval (CLIR) tries to identify relevant documents in a language different from that of the query. This problem is increasingly acute for IR on the Web, because the Web is a truly multilingual environment. In addition to the problems of monolingual IR, CLIR is faced with the problem of language differences between queries and documents. The key problem is query translation (or document translation).
This translation raises two particular problems [6]: the selection of the appropriate translation terms/words, and the proper weighting of them.

In the last few years, researchers have worked on these problems intensively. Three main techniques for query translation have been proposed and tested:
- With an off-the-shelf machine translation (MT) system;
- With a bilingual dictionary;
- Or with a set of parallel texts.

The first two approaches are quite straightforward. We will not give details about them. Our research efforts have been concentrated on the third approach. This approach is promising because it does not require extensive manual preparation (in comparison with the construction of an MT system), and its translation is usually more appropriate than with a bilingual dictionary. The major advantages of this approach are the following:
- The training of a translation model can be completely automatic. No (or little) manual preparation is required.
- The resulting translation model reflects well the word usage in the training corpus. This offers the possibility to train specialized and up-to-date translation models.

In this paper, we describe our approach to CLIR based on parallel texts, as well as some experiments. The paper is organized as follows. In Section II, we first briefly describe the training process of statistical translation models on a set of parallel texts. Then we describe in Section III the IR system we use for our experiments. Section IV describes our experiments with the translation models trained on a manually prepared parallel corpus. Section V describes our approach to mining parallel Web pages, as well as their utilization for CLIR. Section VI presents our utilization of compound terms in CLIR. Finally, we present our conclusions in Section VII.

Footnote 1: Département d'informatique et Recherche opérationnelle, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, H3C 3J7 Canada, nie@iro.umontreal.ca

II. TRAINING STATISTICAL TRANSLATION MODELS ON PARALLEL TEXTS

Let us first briefly describe the training of statistical translation models on a set of parallel texts. These models will be used in our experiments.

Statistical translation models are trained on parallel texts. A pair of parallel texts is two texts that are translations of one another. Model training tries to extract the translation relationships between elements of the two languages (usually words) by observing their occurrences in parallel texts.

Most work on training statistical translation models follows the models (called IBM models) proposed by Brown et al. [1]. In our case we use IBM model 1. This model does not consider word order in sentences. Each sentence is considered as a bag of words. Any word in a corresponding target sentence is considered as a potential translation word of any source word. This consideration is oversimplified for the purpose of machine translation. However, for IR, as the goal of query translation is to identify the most probable words without considering the syntactic features, this simple translation model may suffice.

In order to train a translation model, parallel texts are usually decomposed into aligned sentences, i.e. for each sentence in a text, we determine its translation sentence(s) in the other language. The primary goal of producing sentence alignment is to reduce the scope of translation relationships between words: instead of considering a word in a source text to correspond potentially to every word in the target text, one can limit this relationship to the corresponding sentences. This allows us to take full advantage of the parallel texts and to produce a more accurate translation model.

A. Sentence Alignment

Sentence alignment tries to create translation relationships between sentences. Sentences are not always aligned into 1:1 pairs. In some cases, one sentence can be translated into several sentences, and a sentence may even be deleted or a new sentence added in the translation. This adds some difficulties in sentence alignment.

Gale & Church [5] propose an algorithm based on sentence length. It has been shown that this algorithm can successfully align the Canadian Hansard corpus (the debates in the Canadian House of Commons in both English and French), which is rather clean and easy to align. However, as pointed out by Simard et al. [12] and Chen [3], when aligning noisier corpora, the methods based solely on sentence length are not robust enough to cope with the above-mentioned difficulties.

Simard et al. proposed a method that uses lexical information, cognates, to help with alignment [12]. Cognates are pairs of tokens of different languages that share obvious phonological or orthographic and semantic properties, with the result that they are likely to be used as mutual translations. Examples are generation/génération and financed/financé for English/French. In a wider sense, cognates can also include numerical expressions and punctuation. Instead of defining a specific list of cognates for each language pair, Simard et al. gave language-independent definitions of cognates.
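As a rough illustration of how such a language-independent cognate test might look, the sketch below applies the four-identical-letters rule that this section mentions for English and French, and treats numbers and punctuation as cognates only when they are identical. The function names, the accent stripping, and the greedy one-to-one counting are assumptions of this sketch, not details taken from Simard et al. [12].

```python
import unicodedata


def normalize(token: str) -> str:
    """Lowercase and strip accents (an assumption of this sketch)."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def is_cognate(a: str, b: str, prefix_len: int = 4) -> bool:
    """Four-identical-letters rule for words, exact match for other tokens."""
    a, b = normalize(a), normalize(b)
    if not (a.isalpha() and b.isalpha()):
        # Numerical expressions and punctuation count only when identical.
        return a == b
    return (
        len(a) >= prefix_len
        and len(b) >= prefix_len
        and a[:prefix_len] == b[:prefix_len]
    )


def cognate_count(src_tokens, tgt_tokens) -> int:
    """Count source tokens that can be paired with a distinct target cognate."""
    remaining = list(tgt_tokens)
    count = 0
    for s in src_tokens:
        for t in remaining:
            if is_cognate(s, t):
                remaining.remove(t)  # each target token is used at most once
                count += 1
                break
    return count
```

A sentence aligner could use a count of this kind as one cue among others: candidate sentence pairs that share many cognates are more likely to be translations of each other.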
Cognates are recognized on the fly according to a series of rules. For example, words starting with 4 identical letters in English and French are considered as cognates.

Another method incorporates a dictionary [3]. The translations contained in the dictionary serve as cues to sentence alignment: a sentence is likely to align with another sentence if the latter contains several dictionary translations of the words of the former. In our implementation, we use the approach of Simard et al. [12].

B. Model Training

The principle of model training is: in a set of aligned sentences, if a target word f often co-occurs with a source word e in the aligned sentences, then there is a high chance that f is a translation of e, i.e. the translation probability t(f|e) is high. The training algorithm uses dynamic programming to determine a probability function t(f|e) such that it maximizes the expectation of the given sentence alignments (see [1] for details). We briefly describe the training for IBM model 1 as follows.

The translation probability function t is determined so as to maximize the probability of the given sentence alignments A of the training corpus. Suppose a sentence alignment e ↔ f, and that the sentences e and f are composed of sets of words as follows:

\[
e = \{e_1, e_2, e_3, \ldots, e_l\}, \qquad f = \{f_1, f_2, f_3, \ldots, f_m\}
\]

where l and m are respectively the lengths of these sentences. Then the function t is determined as follows:

\[
t = \arg\max_t \, p(A)
  = \arg\max_t \prod_{e \leftrightarrow f} p(f \mid e)
  = \arg\max_t \prod_{e \leftrightarrow f} \varepsilon \, (1 + l)^{-m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\]

where ε is the probability that an English sentence of length l can be translated into a French sentence of length m, and t(f_j|e_i) is the word translation probability of e_i by f_j. The probability t can be determined by applying the iterative EM (Expectation Maximization) algorithm. We do not give details here. Interested readers can refer to [1].

IBM model 1 considers every word in the target sentence to be an equally possible translation of any word in the source sentence, regardless of their position and of the fertility of each word (e.g.
an English word may be translated by one or more French words). It is obvious that the translation model does not learn syntactic information from the training source and thus cannot be used to obtain syntactically correct translations. However, the model is able to determine the word translation probability t between words, and this fits the need of cross-language information retrieval, which is to find the most important translation words.

III. IR SYSTEM

In our experiments we use the SMART system. SMART is an IR system developed at Cornell University [2]. The indexing process considers every token as an index. Indexes are weighted according to the tf*idf weighting scheme (tf = term frequency, and idf = inverse document frequency). This is a common way to weigh the importance and specificity of a term in a document. The principle is as follows: 1) The more a word occurs in a document, the more important it is. This is the tf factor. 2) On the other hand, the more documents contain the word, the less the word is specific to one particular document. In other words, the word does not allow distinguishing a document from the others. Therefore, the weight of the word is lowered. This is the idf factor. More precisely, the two factors are measured as follows:

\[
tf(t, D) = \log(freq(t, D) + 1); \qquad idf(t) = \log\frac{N}{n(t)}
\]

where freq(t, D) is the frequency of occurrences of the word/term t in the document D; N is the total number of documents in the collection; n(t) is the number of documents containing t.

The retrieval process follows the vector space model [2]. In this model, a vector space is defined by all the tokens (words or terms) encountered in the documents. Each word/term represents a distinct dimension in this space. Then a document, as well as a query, is represented as a vector in this space. The weight in a dimension represents the importance of the corresponding word/term in the document or query (the tf*idf weight). The degree of correspondence between a document and a query is estimated by the similarity of their vectors. One of the commonly used similarity measures is as follows:

\[
sim(D, Q) = \frac{\sum_i d_i \, q_i}{\sqrt{\sum_i d_i^2 \cdot \sum_i q_i^2}}
\]

where d_i and q_i are respectively the weights of a term in the document D and in the query Q.

IV. EXPERIMENTS WITH THE HANSARD MODELS

There are a few manually constructed parallel corpora. The best known is the Canadian Hansard, which contains the debates of the Canadian parliament during 7 years, in both French and English. It contains tens of millions of words in each language. Such a parallel corpus is a valuable resource that contains word/term translations.

Our first experiments are carried out with translation models trained on the Hansard corpus; we call the resulting models the Hansard models. We used two test collections developed in TREC (footnote 3), one in English (AP) and the other in French (SDA). Both collections contain newspaper articles. The SDA collection contains 141,656 documents, and AP 242,918 documents. We use two sets of about 30 queries, available in both French and English. These queries have been used in TREC6 and TREC7 for French-English CLIR. The queries have been manually evaluated (i.e. we know their relevant documents). Table I shows the CLIR effectiveness obtained with these translation models.
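The retrieval side of these experiments rests on the tf*idf weighting and the vector similarity given in Section III. The following is a minimal sketch of those two formulas on a toy collection; it is not SMART's implementation, and the function names, the dict-based sparse vectors, and the choice to ignore query terms absent from the collection are assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """n(t): number of documents containing term t."""
    return Counter(term for doc in docs for term in set(doc))


def weigh(tokens, df, n_docs):
    """tf*idf vector with tf = log(freq + 1) and idf = log(N / n(t))."""
    return {
        term: math.log(freq + 1) * math.log(n_docs / df[term])
        for term, freq in Counter(tokens).items()
        if df.get(term, 0) > 0  # terms unseen in the collection are ignored
    }


def similarity(d, q):
    """Cosine of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * q.get(term, 0.0) for term, w in d.items())
    norm = math.sqrt(
        sum(w * w for w in d.values()) * sum(w * w for w in q.values())
    )
    return dot / norm if norm > 0 else 0.0


def rank(docs, query):
    """Return (document index, similarity) pairs, best match first."""
    df = document_frequencies(docs)
    q = weigh(query, df, len(docs))
    scores = [
        (i, similarity(weigh(doc, df, len(docs)), q))
        for i, doc in enumerate(docs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling rank(docs, query) with tokenized documents and a tokenized (translated) query returns the documents ordered by decreasing similarity. Note that a term occurring in every document receives an idf of zero and therefore does not contribute to the score.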
F-E means using French queries to retrieve English documents, i.e. the French queries are first translated into English, then the English translation is used to match the documents. In all our experiments, we select the 25 most probable translation words as the translation of a query.

In Table I, the effectiveness is measured by average precision, i.e. the average of the precisions over 11 points of recall. This is a standard measure used in IR. We also show the percentage of the CLIR effectiveness with respect to the monolingual IR effectiveness (%mono). In comparison with the state-of-the-art effectiveness, which is usually around 80-90% of the monolingual effectiveness (see the reports of TREC), the results we obtained are quite comparable.

Footnote 3: TREC: Text Retrieval Conference, a series of conferences aiming to test IR systems with large document collections.

TABLE I. AVERAGE PRECISION USING HANSARD MODEL

          F-E (%mono)    E-F (%mono)
Trec      (74.8%)        (67.9%)
Trec      (97.6%)        (93.6%)

V. MINING OF PARALLEL WEB PAGES

A major obstacle to using parallel texts is often the unavailability of large parallel corpora. In order to obtain such corpora, we constructed a mining system, PTMiner [4], to automatically gather parallel Web pages. Although many parallel Web pages exist on the Web, it is not easy to identify them and to confirm that a pair of pages is truly parallel. In our mining approach, we exploit several heuristic features. For example, if an English page points to another page with an anchor text "French version" or "version française", this is a useful indication that the second page is a French version of the first page. Although these indications are not fully accurate, and they can produce errors, we will show later in our experiments that a noisy parallel corpus is still useful for query translation in CLIR. In the following subsections, we briefly describe our mining approach.

A. Automatic Mining

Parallel web pages often are not published in isolation. Most of the time, they are connected in some way.
For example, Resnik [11] observed that parallel Web pages often are referenced in the same parent index web page. In addition, the anchor text of such links usually identifies the language. For example, if a home page index.html contains links to both English and French versions of the next page, and the anchor texts of the links are respectively "English version" and "French version", then the referenced pages are parallel. In addition, Resnik assumes that parallel Web pages have been indexed by large search engines existing on the Web. Therefore, in his approach, a query of the following form is sent to AltaVista in order to first retrieve the common index page:

    anchor: english AND anchor: French

Then the referenced pages in both languages are retrieved and considered to be parallel pages.

We notice that only a small number of web sites are organized in this way. Many other parallel pages do not satisfy this condition. Our mining strategy uses different criteria. In addition, we also incorporate an exploration process (host crawler) in order to discover more web pages

that have not been indexed by the existing search engines.

Our mining process is separated into two main steps: first identify as many candidate parallel pages as possible, then verify external features and contents to determine if they are parallel. Our mining system is called PTMiner (for Parallel Text Miner). The whole process is organized into the following steps:
1. Determining candidate sites: This step tries to identify the Web sites where there may be parallel pages.
2. File name fetching: It identifies a set of Web pages from each candidate site that are indexed by search engines.
3. Host crawling: It uses the URLs collected in the last step as seeds to further crawl each candidate site for more URLs.
4. Pair scanning by names: It pairs the Web pages according to the similarity of their URLs.

IDENTIFICATION OF CANDIDATE WEB SITES

To determine candidate sites, we assume that a candidate site contains at least one page that refers to another version of the page, and the anchor text of the reference clearly identifies the language. For example, an English Web page contains a link to the French version, and the anchor text is "French version", "in French", "en français" and so on. So to determine the candidate sites, we send a particular request to search engines asking for English pages that contain a link with an anchor text identifying another language, such as:

    anchor: french version, [in french, ...] language: English

The host addresses we extract from the resulting Web pages correspond to the candidate sites.

FILE NAME FETCHING

To search for parallel pairs from each candidate site, PTMiner first asks the search engines for all the Web pages from this site they have indexed. This is done by a query of the following form:

    host: <hostname>

However, a search engine may not index all the Web pages on a site. To obtain a more complete list of URLs from a site, we need to explore the sites more thoroughly with a host crawler.

HOST CRAWLING

A host crawler is slightly different from a Web crawler or a robot [10] in that a host crawler only explores one Web site.
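A host crawler of this kind can be sketched as a breadth-first traversal that starts from the URLs already known for a site and never leaves that site. The sketch below is illustrative only and is not PTMiner's code: the function name crawl_host, the max_pages limit, and the fetch_links callable (which stands in for downloading a page and extracting its links) are hypothetical.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_host(seeds, fetch_links, max_pages=1000):
    """Breadth-first exploration restricted to the host(s) of the seed URLs.

    seeds       -- URLs already known for the candidate site
    fetch_links -- hypothetical callable: URL -> iterable of href strings
    """
    hosts = {urlparse(url).netloc for url in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links
            # Only unexplored documents on the same site are queued.
            if urlparse(link).netloc in hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the link extractor in as a parameter keeps the traversal logic separate from network access, so the same function can be exercised on an in-memory description of a site.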
A breadth-first crawling algorithm is used in this step. The principle is that if a retrieved Web page contains a link to an unexplored document on the same site, this document is added to a list that will be explored later. This crawling step allows us to obtain more web pages from the candidate sites.

PAIR SCANNING BY NAMES

We observe that many parallel pages have very similar file names. For example, an English web page with the file name index.html often corresponds to a French translation with the file name index_f.html, index_fr.html, and so on. The only difference between the two file names is a segment that identifies the language of the file. This same observation also applies to URL paths. In some cases, the two versions of the web page are stored in two different directories. So in general, a similarity in the URLs of two files is a good indication of their parallelism. This similarity is used to make a preliminary selection of candidate pairs.

FILTERING AFTER DOWNLOADING

The remaining file pairs are downloaded for further content verification according to the following criteria:
- Length of the pages: A pair of parallel pages usually has similar file lengths. A simple verification is then to compare the lengths of the two files. Note that the length ratio changes between different language pairs.
- HTML structure: Parallel web pages are usually designed to look similar. This often means that the two parallel pages have similar HTML structures. Therefore, the similarity in HTML tags is another filtering criterion.

The pair-scanning criterion we used only exploits the name similarity of parallel pages. This is not a fully reliable criterion. Files with a segment en_ may not be in English. Therefore, a further verification is needed to confirm that the files are in the required languages. In our system, we use the SILC system for automatic language and encoding identification.

With PTMiner, we have been able to collect several parallel corpora from the Web. Table II shows some of them.

TABLE II. SIZES OF THE WEB CORPORA

                     FR-EN    DE-EN    IT-EN
# Text Pairs
Raw data (MB)
Cleaned data (MB)

In our further description, we will concentrate on the French-English pair.

B. CLIR With the Web Models

Translation models are trained on the set of parallel Web pages as described in Section II, except that some preprocessing has to be performed on these pages in order to remove HTML tags. Once translation models (in both directions) are trained, they are used to produce the 25 most probable translation words, which are considered as the translation of a query. Table III describes the CLIR effectiveness with the Web models.

TABLE III. AVERAGE PRECISION USING WEB MODEL

          F-E (%mono)    E-F (%mono)
Trec      (72.6%)        (70.4%)
Trec      (74.3%)        (71.5%)

In comparison with the Hansard model, we see that the Web models perform slightly worse. However, considering the noise that this training corpus may contain, this effectiveness is quite good. It is still close to the state-of-the-art effectiveness. This test shows that the automatically mined parallel Web pages are very useful for CLIR.

VI. INCORPORATING COMPOUND TERMS IN TRANSLATION MODELS

In the previous approach, parallel texts have been exploited to find translations between single words. The most obvious problem we can see is that by taking words one by one, many of them become ambiguous. The translation model will then suggest several translations corresponding to different meanings of the word. For example, the word information (in French) will have many possible translations because 1) the word denotes several meanings; 2) it appears very frequently in the parallel corpus. Among the possible translations, there are information, intelligence, espionage, etc. However, if the term we intend to translate is système d'information (information system), and if the term is translated as a whole, then many of the meanings of information can be eliminated. The most probable translation of this term will be the correct term information system.

Through this example, we can see that a translation model that integrates the translation of compound terms can be much more precise. This is the goal of our utilization of compounds during query translation.

To do this, we have to train a translation model that incorporates compound terms as additional translation units alongside words. So compound terms are first extracted from the training parallel corpus, and added to the original sentences. Then the same translation process is launched. The resulting model contains the translations for both single words and compound terms.
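The training step referred to here is the IBM model 1 estimation of Section II.B, run on sentences to which compound tokens have been appended. The following is a minimal sketch of that EM loop, not the implementation used in the experiments. The function names, the NULL token (standing for the empty source word that the i = 0 term of the formula refers to in Brown et al. [1]), the uniform initialization, and the underscore-joined compound tokens in the usage example are assumptions for illustration.

```python
from collections import defaultdict

NULL = "<null>"  # empty source word, the i = 0 term of the model


def train_ibm1(pairs, iterations=10):
    """Estimate t(f|e) by EM; pairs is a list of (source_tokens, target_tokens).

    The result maps (f, e) to the estimated probability t(f|e).
    """
    tgt_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        for es, fs in pairs:
            es = [NULL] + list(es)
            for f in fs:
                # E-step: share one unit of f among all source words.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def top_translations(t, source_word, n=25):
    """The n most probable target words for one source word."""
    cands = [(f, p) for (f, e), p in t.items() if e == source_word]
    return sorted(cands, key=lambda fp: fp[1], reverse=True)[:n]
```

As a usage example, a toy corpus could contain one sentence pair in which the hypothetical compound tokens système_information and information_system have been appended, plus two single-word pairs that tie système to system and information to information. After training, top_translations(t, "système_information", 1) should put information_system first, because the single words are already explained by their own translations.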
To identify compound terms, we use both a large terminology database containing almost 1 million words and terms, and an automatic extractor of compound terms. The extractor uses syntactic structures, together with a statistical analysis. First, word sequences corresponding to predefined syntactic templates are extracted as candidates. If the frequency of occurrences of a candidate is above a certain threshold, then the sequence is considered as a compound term.

The first problem is the definition of the syntactic templates. This is done manually according to the general knowledge on syntactic structures of a language. Usually the extraction is restricted to noun phrases. For example, the following template is used in the tool we used, Exterm:

    ((NC | AJ) )*((NC | AJ) | NC PP) ((NC | AJ) )*NC

where NC means a common noun, AJ an adjective, and PP a preposition.

Of course, POS (Part-Of-Speech) tagging is necessary in order to recognize the syntactic category of each word. The tagger we used is a statistical tagger trained on the Penn Treebank. It tries to determine the most probable syntactic categories that best fit the words of a sentence. Details on the training of such a tagger can be found in [7].

All the terms and words in documents, queries and the training parallel corpus are submitted to a standardization process on words, as follows:
- Nouns in plural are transformed into singular form (e.g. systems → system);
- Verbs are changed into infinitive form (e.g. retrieves → retrieve, retrieving → retrieve);
- Articles in a term are removed (e.g. the database system → database system).

For example, the expression adjusted the earnings will be transformed into adjust earning.

Once a compound term is recognized in a document or a query, it is added into the document or query.
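The template-plus-threshold extraction and the appending of recognized compounds can be sketched as follows. This is an illustrative sketch and not the Exterm tool: it assumes the text is already POS-tagged and standardized, encodes each tag as one letter so that the template becomes a regular expression, and uses hypothetical names (candidates, extract_compounds, append_compounds) as well as a hypothetical max_len limit and an underscore to join the words of a compound.

```python
import re
from collections import Counter

# One letter per tag so that the template becomes a plain regular expression.
TAG_CODES = {"NC": "N", "AJ": "A", "PP": "P"}
# ((NC | AJ) )*((NC | AJ) | NC PP) ((NC | AJ) )*NC
TEMPLATE = re.compile(r"[NA]*(?:[NA]|NP)[NA]*N")


def candidates(tagged, max_len=5):
    """Yield every word sequence whose tag sequence matches the template."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    for start in range(len(tagged)):
        last = min(len(tagged), start + max_len)
        for end in range(start + 2, last + 1):
            if TEMPLATE.fullmatch(codes[start:end]):
                yield tuple(word for word, _ in tagged[start:end])


def extract_compounds(tagged_sentences, min_freq=2):
    """Keep the candidates that occur at least min_freq times."""
    freq = Counter(c for sent in tagged_sentences for c in candidates(sent))
    return {c for c, n in freq.items() if n >= min_freq}


def append_compounds(tokens, compounds):
    """Append each recognized compound to the text as a single token."""
    found = []
    for i in range(len(tokens)):
        for comp in sorted(compounds):
            if tuple(tokens[i:i + len(comp)]) == comp:
                found.append("_".join(comp))
    return list(tokens) + found
```

Enumerating every span, rather than taking only the longest match, lets a short compound be counted even when it is embedded in a longer noun phrase; the frequency threshold then discards the longer sequences that occur only once.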
For example, consider a preprocessed text as follows:

    arm dealer prepare relief supply to soviet union

From this segment, we can extract two stored terms, arm dealer and soviet union. So the following terms are appended to the original text:

    arm_dealer soviet_union

Once compound terms are extracted from the training texts, the corpus is submitted to the training process of translation models described in Section II. However, as compounds are considered as units of the texts, the resulting translation models will also contain translations for the compounds, which are usually more accurate than their word-by-word translations.

A. Experiments on CLIR

Table IV shows the CLIR results with both types of translation model. These results are obtained on the same document collection as the one used earlier, but the query set is different.

In these experiments, we separate single words and compound terms into two separate vectors. SMART has the flexibility of building multiple vectors for a document and for a query. Then the global similarity between the document and the query is determined by the weighted sum of the similarities between the vectors. One can assign a relative weight to different vectors of the query to balance their importance in the global similarity.

In our experiments, we tested several values for the relative weights of the single-word vector and the compound-term vector. The results in Table IV are obtained with a relative importance of 0.3 for the compound-term vector, and 1 for the single-word vector. This assignment gives the best result.

We can see a great improvement in CLIR effectiveness once the translation model incorporates compound terms, especially for the F-E case. We have not applied the same approach to the Web corpus. However, we could expect similar improvements with the Web corpus when compound terms are incorporated.

TABLE IV. THE CLIR EFFECTIVENESS WITH DIFFERENT MODELS

                       Word    Compounds (change)
F-E on AP data set             (+76.86%)
E-F on SDA data set            (+26.72%)

VII. CONCLUSIONS

In this paper, we described an approach based on parallel texts that has been used for CLIR at the University of Montreal. Globally, our experiments show that the statistical translation models trained on parallel texts are highly useful for CLIR. They can achieve effectiveness comparable to the state-of-the-art approaches.

Our further tests with the parallel Web pages mined automatically show that we can arrive at a reasonable level of effectiveness despite the relatively high rate of noise in the training parallel Web pages. This series of experiments shows that our method based on parallel Web pages is suitable for CLIR.

Nevertheless, we also observe several aspects that require improvements:
- We encounter problems in translating proper names. Proper names are often treated as unknown words, and are added into the translation as they are. For some names, the spellings in all the European languages are the same, which does not raise particular problems. For some others with different spellings (e.g. Bérégovoy in French, but Beregovoy in some English documents), this simple approach does not solve the problem.
- Translation by common words that are not stop words: Very often, among the top translation words, common words such as prendre and donner (take and give in French) appear with quite strong probability.
- The mined parallel Web pages contain a certain amount of noise. To improve the translation accuracy, a further filtering of noise is necessary.
We are currently investigating these problems.

REFERENCES

[1] P. F. Brown, S. A. D. Pietra, V. D. J. Pietra, and R. L. Mercer, The mathematics of machine translation: Parameter estimation, Computational Linguistics, vol. 19 (1993).
[2] C. Buckley, Implementation of the SMART information retrieval system, Cornell University, Tech. report (1985).
[3] S. F. Chen, Aligning sentences in bilingual corpora using lexical information, Proc. ACL, pp. 9-16.
[4] J. Chen, J.Y. Nie, Automatic construction of parallel English-Chinese corpus for cross-language information retrieval, Proc. ANLP, Seattle (2000).
[5] W. A. Gale, K. W. Church, A program for aligning sentences in bilingual corpora, Computational Linguistics, 19(1) (1993).
[6] G. Grefenstette, The Problem of Cross-Language Information Retrieval, in Cross-language Information Retrieval, Kluwer Academic Publishers, pp. 1-9 (1998).
[7] C. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press (1999).
[8] J.Y. Nie, P. Isabelle, M. Simard, R. Durand, Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web, ACM-SIGIR conference, Berkeley, CA (1999).
[9] J.Y. Nie, J.F. Dufort, Combining Words and Compound Terms for Monolingual and Cross-Language Information Retrieval, Information 2002, Beijing, July.
[10] J. Prosise, Crawling the Web: A guide to robots, spiders, and other shadowy denizens of the Web, PC Magazine, July 1996.
[11] P. Resnik, Parallel strands: A preliminary investigation into mining the Web for bilingual text, AMTA'98, Lecture Notes in Artificial Intelligence, 1529, October (1998).
[12] M. Simard, G. Foster, P. Isabelle, Using Cognates to Align Sentences in Parallel Corpora, Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation, Montreal (1992).


More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information
