Inducing translations from officially published materials in Canadian government websites

Size: px
Start display at page:

Download "Inducing translations from officially published materials in Canadian government websites"

Transcription

1 Inducng translatons from offcally publshed materals n Canadan government webstes Qbo Zhu Statstcs Canada & Insttute of Cogntve Scence, Carleton Unversty 1125 Colonel By Drve, Ottawa, Ontaro, Canada K1S 5B6 Qbo.Zhu@statcan.ca Dana Inkpen School of Informaton Technology & Engneerng, Unversty of Ottawa 800 Kng Edward, Ottawa, Ontaro, Canada K1N 6N5 dana@ste.uottawa.ca Ash Asudeh Insttute of Cogntve Scence & School of Lngustcs and Language Studes, Carleton Unversty 1125 Colonel By Drve, Ottawa, Ontaro, Canada K1S 5B6 ash_asudeh@carleton.ca Abstract Btexts collected from web-based materals that are offcally publshed on government webstes can be used for a varety of purposes n language analyss and natural language processng. Mnng offcally publshed web pages can thus be an nvaluable undertakng for translators n government departments who are producng the translatons, and for machne translaton researchers who are studyng how those translatons are produced. In ths paper, we present the StatCan Daly Translaton Extracton System (SDTES) and demonstrate how t s used to nduce translatons from offcally publshed blngual materals from government webstes n Canada. New evaluaton results show that SDTES s a very effectve system for dentfyng and extractng sentences that are translaton pars from most of the federal government web pages whch are currently under the CLF2 (Common Look and Feel for the Internet 2.0) framework. 1 Introducton Well-algned blngual materals are a useful source of translatons that can be used for translaton studes, translaton recyclng, tranng data n machne learnng, translaton memores, nformaton retreval, machne translaton, machne aded translaton, and natural language processng (Jutras, 2000; Moore, 2002; Callson-Burch et al., 2005; Hutchns, 2005; Deng et al., 2006). In the past decade or so, an emergng trend n the use of blngual texts has become rather notceable: use of the web as a blngual corpus (Resnk, 1999; Chen and Ne, 2000). Wth the ncreasngly wdespread avalablty of vast quanttes of web-based blngual texts, more and more researchers are explorng ways to collect, from the web, blngual texts of such language pars as Englsh-French, Englsh- German, Englsh-Italan, Englsh-Chnese, Englsh-Arabc and others (Kraaj et al., 2003; Resnk and Smth, 2003). Web-based materals that are offcally-publshed blngual materals on Canadan government webstes contan exemplary translatons, and mnng these web-based blngual materals can be benefcal to translators, lngusts and researchers n many felds. The StatCan Daly Translaton Extracton System (SDTES) s a system that automatcally extracts translatons from the Daly news release texts of Statstcs Canada (Zhu et al., 2007). In ths paper, we wll descrbe the algorthms and procedures of SDTES and present new results that show that SDTES can be effectvely used to nduce translatons from other government webstes n Canada. The paper s dvded nto 4 parts. The frst part s ths ntroducton. In the second part, we wll

2 analyze the general characterstcs of the blngual texts from Canadan government webstes. The thrd part contans the algorthms and procedures we used to nduce translatons from web-based blngual materals. In the fourth part, we wll present evaluaton metrcs and results. The ffth part s the concluson. 2 Features of web-based materals n Canadan government webstes Here are some general features that we observed of offcally-publshed web materals on government webstes n Canada. 1. It s requred that federal government web pages n Canada be blngual. For example, t s relatvely rare for us to fnd a statc HTML Englsh page wthout a translated French page. Dfferent departments or agences may have dfferent fle namng conventons, but we generally do not have cases where two Englsh pages correspond to one French page or vce versa. 2. Texts are normally translated by government employees who are traned n translaton or text edtng. There are rules and gudelnes for them to observe about translatng, edtng and proofreadng texts to be offcally publshed. For example, wthout a good reason, edtors are not allowed to add texts n one language to convey messages that are not present n the other language. Ths means that nsertons and deletons at the sentence level wll be very rare, f any. 3. Web pages have to be presented n the same way for the two languages. For example, f we use a headng (h1) for the ttle n the Englsh page, we should do the same for the correspondng French page. Currently, almost all the federal government web stes are requred to be under the CLF2 framework. Therefore we can expect to fnd many consstent correspondences n some man HTML markups such as h1, h2, h3, table, etc. 4. Important modules and templates of web pages normally go through usablty tests before they are used for offcally publshed translaton products. One suggeston that often surfaces from users s that extra long web pages should be avoded when we can. If possble, long web pages should be ether splt nto shorter pages, or rewrtten to accommodate easy navgaton and browsng. Therefore web pages on government webstes are usually not very long. 5. Most texts are doman specfc or text-genre specfc, and use standard termnology of the specfc department. The texts are more of a standard wrtten style than of a spoken language style. Texts are mostly expostory data and the translaton style s typcally not free translaton. All these gve rse to frequent translaton correspondences at the word level, phrase level or sentence level. Some stes, such as the stes that publsh data for statstcs, banks, and fnances, contan a lot of numercal data. Numbers can serve as mportant anchors for text algnment. We used these observatons as basc assumptons about the web pages of government webstes n Canada, and desgned algorthms and procedures to algn the web pages and to detect potental errors n translaton and n algnment. 3 Algorthms and procedures SDTES contans two major components: one for blngual text algnment and the other for msalgnment detecton. In ths part, we present a set of protocols and algorthms that are desgned for the two components. 3.1 Text mappng usng the Gale-Church statstcal model The algnment component of SDTES s mostly based on the Gale and Church (1993) length model. The basc assumpton of ths model s that there s a strong lkelhood that, for example, a long sentence n Englsh wll correspond to a long sentence n French; smlarly a short sentence n one language wll correspond to a short sentence n the other. Roughly speakng, f the average lengths of sentences n French and Englsh are known, t s possble to set up a dstrbuton of algnment possbltes from the sentence length nformaton. Suppose we have parallel texts S and T whch can be splt nto n segments each. Each segment n S (s ) s the translaton of a segment n T (t ), and they are algned to form an algnment segment par a. For each, 1 n. A can be defned as the algnment of S and T that conssts of a seres of algned segment pars: A a1,..., an. For the set B of all possble algnments ( A B), the goal s to fnd the mum-lkelhood algnment:

3 A = arg Pr( A S, T ) (1) It s assumed that the probablty of any algned segment par s ndependent of any other segment par: B = arg Pr( a s ) = 1 A (2) The next assumpton s that the length dfference functon d ( s ) s the only feature nfluencng the algnment probablty: B = arg Pr( a d( s )) = 1 A (3) By Bayes theorem, equaton (3) becomes: = 1 P( N M ) P( M ) P ( M N) =, P( N) B Pr( d ( s ) a ) Pr( a ) A = arg (4) Pr( d( s, t )) where the dstrbutons Pr( d ( s, t ) a ) and Pr( a ) can be estmated usng hand-algned data. When the normalzng constant Pr( d ( s )) s gnored and the logarthm s used, B = arg log Pr( d( s ) a )Pr( a ) = 1 A (5) Gale and Church employed dynamc programmng for establshng the optmal algnment to solve Equaton (5). When applyng the Gale and Church algorthm n SDTES, we adopted a two-parse algnment procedure. Pror to the procedure, SDTES moves table contents and mage contents to the end of the HTML page because n many cases these floatng elements n HTML pages can end up n dfferent paragraph postons n dfferent languages. In some cases these elements can be the source of massve msalgnment for the Gale-Church algorthm. In the frst parse, SDTES counts the few man HTML elements such as h2tle, and others to see f the par of text fles has the same number of man HTML features n them. If the numbers were the same, the system splts the texts nto blocks separated by these feature HTML tags. SDTES frst locates the paragraph boundares n the texts and assgn sentence delmters to them. Then t assgns paragraph delmters to text blocks that are dvded by the man HTML tags. If the numbers of major HTML elements are dfferent, the system treats the whole text document as a sngle text block, and assgns a paragraph delmter to t. When the paragraph delmters and the sentence delmters are all assgned, SDTES does the frst round of text algnment by the Gale-Church algorthm. In the second parse, the system automatcally reconstructs the Englsh document and the French document from the pared paragraphs that are algned n the frst parse. SDTES locates the sentence boundares n the algned texts and assgn sentence delmters to them. Then SDTES separates the two halves n each algned par, and assgns a paragraph delmter to each of them. After ths, texts of dfferent languages are collected and reassembled n two fles, one for Englsh and one for French. By reorganzng the text structures on the bass of the paragraph parng results n the frst around of text algnment, SDTES was able to reset the Englsh and French documents to the orgnal text format pror to the ntal algnment. The two nput fles are processed wth the newly assgned boundary symbols by the Gale-Church algorthm. The second parse produces text algnment at a more fne-graned text segment level. 3.2 Msalgnment detecton usng anchor nformaton Anchor nformaton ncludes postons or propertes n one text whch seem to match up wth those n a parallel text. The nformaton can be about structural features n the text or delmters and markers that ndcate hard and soft boundares (Gale and Church, 1993:89), or true ponts of correspondence (Melamed, 1999:107) n algnment. Usng anchor nformaton, regons of text can be dentfed and algnments of text segments can be sought. Most algnment methods make use of anchor nformaton n one way or another. In STDES, we use the followng structural features as anchors to assst msalgnment dentfcaton.

4 HTML markups: Anchor nformaton can be markups that go wth the text and that reveal metanformaton or style nformaton about the text. For example some commonly used HTML tags such as h1, h2, h3, br, p, hr, table,, pre, form, mg, and a can be good anchors n blngual text algnment. For a more detaled lst of structural tags, format tags, content tags and rrelevant tags that can be used n algnment, see Sanchez-Vllaml et al. (2006). As ndcated earler, n Canadan government webstes, we are lkely to have a prolferaton of such HTML markups. Usually, f a text n French contans a markup for a secton n talcs, then the correspondng secton n Englsh s lkely to have the same markup. We dd some HTML style unfcaton formattng so that some parts of the HTML codes are hghlghted, whle some are gnored. For example, the code <a > becomes <a alnk> after the unfcaton formattng, because otherwse the lnk to an Englsh webpage and the lnk to a French webpage mght have been dfferent. Lexcal unts: Specfc lexcal unts such as words or phrases can serve as anchor ponts (Brown et al., 1991). In SDTES, we manly use cognate words as anchors to help locate traces of algnment devatons. Identfcaton of cognates n SDTES s a two-step operaton. The frst step s to produce canddate cognate lsts usng the K-vec algorthm (Fung and Church, 1994). The man objectve s to fnd cognate canddates wthn an acceptable text-regon range and lmt the number of words to be consdered as cognate pars. The K-vec method was developed as a means of generatng a quck-and-drty estmate of a blngual lexcon that could be used as a startng pont for a more detaled algnment algorthm (Fung and Church, 1994). We use the K-vec++ package (Pedersen and Varma, 2002) for the mplementaton of the K-vec algorthm. The package s called the K-vec++ package because t extends the K-vec algorthm n a number of ways. Usng the Perl programs n the K-vec++ package, SDTES was able to obtan a very rough lst that mght contan cognates for each par of documents, although there s a lot of nose n the lst too. In the second step, we apply an algorthm called Acceptable Matchng Sequence (AMS) to dentfy true cognates from the canddate cognate lsts generated by the K-vec algorthm. Ths s an algorthm that we desgned and developed n SDTES. An AMS has two non-overlappng substrngs that can be matched n the same order n both of the words n a cognate par. The algorthm extracts two substrngs (θ a and β b ) from a source word, say a French word (W 1 ), wth a length threshold (T) for the two substrngs combned. The purpose s to fnd f the target Englsh word (W 2 ) has the two substrngs θ a and β b n the same order n the strng sequence. Suppose x = mn(length(w 1 ), length(w 2 )), and y=(length(w 1 ), length(w 2 )), and z s the length dfference threshold. length(w 1 ) - length(w 2 ) z. We use the length dfference threshold for ntal flterng: f y>10 then 0 z 4, otherwse 0 z 3. SDTES determnes the T parameter wth reference to y. Snce T s the combned length of θ a and β b, 0 a T, 0 b T, a+b=t. The system dscards word pars wth y<4. SDTES sets T=8 f y>10. For all the rest, SDTES uses the smple lnear regresson model T=0.5y+1.8 to compute the threshold value. Ths regresson model s derved from the regresson analyss of sample cognate pars we pcked from the StatCan Daly releases. When matchng θ a and β b n W 2, skppng some characters s acceptable before, after or between the two substrngs; we call them Don t Care Characters (DCCs). The ntal value a n θ a s set to 0, and b n β b to T. The system starts to match the two substrngs θ a and β b n W 2. If a match s found for θ a and β b n W 2 n the rght order, W 1 and W 2 are a cognate par, f not, one substrng θ a s ncreased n length (a=a+1) whle the other substrng β b gets decreased (b=b-1). The search contnues tll a two-substrng match s found or a>t or b<0. The man advantage of usng K-vec wth AMS for cognate dentfcaton s that K-vec does not rely on sentence boundary nformaton. In ths way, t can help detect errors n algnment that depend on sentence boundares. At the same tme, AMS has the straghtforwardness of the naïve matchng algorthm of Smard et al. (1992), but avods problems caused by common prefxes or by the requrement that the frst four letters have to match. Ths mprovement can ncrease the number of correct cognate pars dentfed. At the same tme, AMS nherts the strength of no-crossnglnks constrant n the Longest Common Subsequence Rato (LCSR) algorthm by Melamed (1999). Also, n lmtng the number of substrngs

5 to be matched, AMS overcomes the nherent weakness of LCSR n postng non-ntutve lnks because of lack of context senstvty, as noted n a recent study by Kondrak and Dorr (2004). Ths can help reduce the number of false postves for SDTES such as courters/computers, mensuels/results, and paruton/startng. We found that the K-vec technque together wth AMS algorthm s a good ft for cognate dentfcaton for the msalgnment detecton purpose n SDTES. Numbers and punctuatons: Smlarly, numbers n texts can serve as anchors n algnment. They are good ndcatons of correspondence because a number n one language s usually nterpreted as a number n the other language. Some punctuaton marks can also be anchor ponts. For example, f there s a queston mark n Englsh, normally we are expectng a correspondng queston mark n ts translaton text n French. When detectng msalgned portons of texts, what SDTES does s to parse every algned text segment par, and compares the features n each half of the par before t arrves at one of the two decsons: pass and problem. The detectng process starts from two pror flterng mechansms. One of them s the length rato crtera. If a text segment n one language s more than 3 tmes longer than the correspondng text segment n the other language, the par s marked as a problem par. The second crteron s matchng type. Because the extracted translatons are ndependent translaton pars that wll be used for translaton memory systems and cross-language nformaton retreval systems, matchng types lke 1:0 and 0:1 have to be dscarded. When these two crtera have been checked, SDTES compares the structural and lexcal clues of the HTML text segments for further detecton. These clues nclude selected cognates, punctuaton marks, numbers and HTML tags. Fnally, we have a no-match-tolerance prncple: f there are no structural and textual clues present, and f the two pror flterng crtera (length and matchng type) are checked, we mark the segment as pass. When the msalgnment detecton process s completed, SDTES apples a flterng mechansm to the lst of stored translatons to elmnate the followng types of algned pars: 1) Pars that contan only meta-nformaton codng or codes that are derved from the HTML codng unfcaton process. 2) Algned segments that nclude only the numercal nformaton. 3) Duplcate sentences or smlar constructons that have been seen more than once n the collecton of texts. Sometmes ths nvolves unfyng or dscardng some nformaton such as numbers and tags n the text. 4 Results and performance evaluaton For evaluaton, we measure the performance of the two man components of SDTES: one s the text algnng component and the other s the msalgnment detecton component. We used the fles collected from for evaluaton. These web pages contan 2,682 offcally publshed news releases from Aprl of 2000 to June of We assgned each page a fle name begnnng wth a sequental number from 1 to 9. We grouped these fles accordng to the ntal number n the fle name such as 1, 2, 3,, 9, and randomly chose two fles from each of the 9 groups. Then we manually algned these 18 selected fles to buld the reference collecton. For the evaluaton of the algnment component, we defne M as the set of segments n the manually algned reference collecton, A as the set of machne algned segments before the msalgnment detecton devce s appled. Precson (P) and recall (R) are as follows: A M A M P = R = A M We used the same randomly selected fles to evaluate the performance of the msalgnment detecton component. Here, the system proposed algnments (A') nclude those remanng pars after the system has fltered what t dentfes as msalgned segments. M s the reference set,.e., the algned translaton unts that the human evaluator thnks the system should have reported. The recall (R) represents the proporton of system-proposed translaton segments (A') that are rght wth respect to the reference (M), and the precson (P) s the proporton of correctly proposed algnment segments wth respect to the total of those proposed (A'). A' M A' M P = R = A' M As we can see from Table 1, good results have been reported of the text algnment component. For the 18 algned fles used for evaluaton, the overall

6 algnment precson for the algned translaton pars s 0.967; the recall s Table 1 also ndcates that SDTES s accurate n detectng msalgned pars (P=0.988 and R=0.974). A comparson of precson and recall for the data sets before and after the msalgnment flterng shows the effect of the msalgnment detecton algorthm. Precson mproved from to wth a neglgble loss of recall from to Text algnment Msalgnment detecton Fle P R P R Overall * Table 1. Performance evaluaton of the two man components n SDTES Whle the SDTES model has acheved good results for offcally publshed government blngual text data on the web, there are also lmtatons. For the text algnment part, we stll fnd some examples, although the number s very small, of chans of msalgned sentences. For about a dozen of so fles n the collecton, chan msalgnment segments can be found where the Gale and Church algorthm fnds t hard to come back to the rght track. Although most of the msalgned pars can be fltered by the msalgnment detecton mecha- * Computed over all the mstakes n all the documents. nsm, t s worthwhle to analyze these fles to trace the causes of the msalgnments. There are four types of texts that can trgger massve msalgnment for the Gale and Church algorthm. Alphabetcal lsts: In some government texts, there are lsts of names, commttees, or terms and defntons. Items n these lsts usually land n dfferent postons when sorted alphabetcally n dfferent languages. Because of the poston changes, translatons n these lsts can hardly match on a lne-to-lne bass. Swapped paragraphs: When paragraphs have rearranged postons because of blocks of texts such as note to readers sectons or nserted menu tems that are ntroduced by dv and other HTML tags, the correct algnment chan can be dsrupted. Deletons and nsertons: For any accdental deletons and nsertons, once the system starts to algn the wrong text segments, t usually contnues to msalgn a number of text segments before t can fnally correct tself. Algnment types: We also found a few examples where the exstng 6 algnment types such as 0:1, 1:0, 1:1, 1:2, 2:1 or 2:2 were not enough to cover the rght algnment. We need somethng lke 3:1 or 3:2 to algn them properly. When we gve t an alternatve algnment pattern, t would establsh the wrong lnks wth the followng text segments and cause chan msalgnment. In assessng the performance of the msalgnment detecton devce, we have to take nto account two types of errors. One type nvolves some msalgned pars that stll go undetected. We call them overlooked pars. The other type ncludes correct algnments that are wrongly labeled as msalgnments and are thus mstakenly fltered n the process. We call these overdone pars. As can be seen from Table 1, the number of overlooked pars that are not captured by the system was mnmal: 98.8% of the dentfed correct algnment pars are truly relable translaton pars. Mostly, the overlooked pars are pars of partally correct algnments such as those close to pars wth 1:0 or 0:1 algnment patterns. Generally speakng, overdone pars are very few. When we examne these pars, we found that nconsstent use of HTML codes and varous ways of nterpretng numbers were two notable sources of errors n msalgnment detecton. Inconsstent use of HTML tags: In analyzng the results of the evaluaton, we encountered some

7 examples n whch HTML codes are not used consstently n the two halves of the algned translaton segments. For example, n some Englsh sentences, we found the span tag, but the correspondng tag n French s font. In some cases, there are some HTML markups that are present n only one language, but not n the other. For example, we have some French sentences wth the HTML tag sup, but the same tag s nowhere to be found n ts correspondng translaton segment n Englsh. Number nterpretatons: There are cases n whch we have numbers on one sde of the algned par, but not necessarly on the other sde. In some cases, n Englsh we fnd the number 10, but n French, the correspondng number s X. In other cases, numbers are found n both texts, but the numbers are not the same. For example n Fgure 1, usng dfferent numbers (1990 vs. 90) s not necessarly wrong, but t confuses SDTES and leads the system to msjudge correct translaton pars as overdone pars. e 1. In the early 1990s, despte larger declnes n earnngs n the North than n Canada, employment ncome remaned hgher. f 1. Au début des années 90, malgré une basse des gans plus prononcée dans le Nord que dans l'ensemble du Canada, le revenu d'emplo y est demeuré plus élevé. Fgure 1. Dstnct numbers are used n the algned translaton par SDTES was ntally bult on the bass of the Daly releases of Statstcs Canada publshed between 2004 and Then we extended the applcaton to data collecton of the Daly releases from 1995 to Altogether, we assembled 32,276 Daly release fles (16,138 for each language). After flterng and formattng, SDTES generated 488,646 algned text segments. In ths study, we are usng the model for fve other government webstes of whch the Canadan Forces webste s one, and more than 200,000 pars of translatons were generated. The algned text segments are organzed n an XML format for easy exportablty nto the translaton memory system and other applcaton systems. Meta nformaton tems about each of the algned pars were recorded, such as the length nformaton (before the HTML codes are strpped), the source of the matched strngs, and the matchng patterns. Fgure 2 ncludes a sample record of the fnal algned segments. For the sake of presentaton clarty, lne breaks are added to dfferent levels of XML elements. <bead> <en>it wll ensure that our Canadan Forces n Afghanstan receve the supples and equpment they need to get the job done.</en> <fr>il fera en sorte que les Forces canadennes en Afghanstan reçovent l'approvsonnement et le matérel dont elles ont beson pour fare leur traval.</fr> <pa>1:1</pa> <d>2185:22</d> <re>pas</re> <le>120=150</le> </bead> Fgure 2. SDTES XML output (modfed verson) These algned pars of translatons are clean and consstent, and most of them are free of translaton errors. They are ready to be used n many applcatons and systems such as translaton memory systems, nformaton retreval templates, and machne translaton systems. 5 Concluson In ths paper, we presented the algorthms and procedures of the StatCan Daly Translaton Extracton System (SDTES) and demonstrated how t s used to nduce translatons from offcally publshed materals n Canadan government webstes. SDTES contans two man components: one for text algnment and the other for msalgnment detecton that we use to flter out possble translaton errors. For the text algnment component, we used the Gale-Church algorthm for a two-parse algnment at the paragraph level and the text segment level. For msalgnment detecton, we used a few structural features as anchors for btext comparson. The structural features nclude cognate words, numbers, HTML markups, punctuaton marks, etc. In extractng cognates, we employed the K-vec algorthm n combnaton wth our AMS algorthm. Our evaluaton shows that SDTES has acheved good results and that t s a good model for nducng translatons from offcally-publshed materals on federal government webstes n Canada beyond

8 the Statstcs Canada webstes for whch t was ntally desgned. It would have been good f we could set the orgnal Gale and Church algorthm (wthout the two-parse procedure) as the baselne, and compare t wth the algnment method we adopted n SDTES to evaluate the effects of the two-parse procedure. However, ths attempt was deterred by problems n dentfyng the paragraph boundares wthout the help of HTML tags n the source fles. If we run the algorthm wthout paragraph detecton, the results would be very poor. We leave ths to future work. Acknowledgments We thank Professor Davd Sankoff, Dr. Joel Martn, Professor Charles J. Fllmore, Professor Andrew Brook, and Professor Jm Daves for ther nvaluable nput on varous parts of the paper and for dscussng the StatCan Daly Translaton Extracton System wth us. We also thank the three anonymous revewers for ther nsghtful comments. References Brown, P. F., J. C. La, and R. L. Mercer Algnng sentences n parallel corpora. In Proceedngs of the 29th Annual Meetng of the Assocaton for Computatonal Lngustcs, pages , Berkeley, CA. Callson-Burch, C., C. Bannard, and Schroeder, J A compact data structure for searchable translaton memores. In 10th European Assocaton for Machne Translaton Conference: Buldng Applcatons of Machne Translaton, pages Chen, J. and J. Y. Ne Parallel web text mnng for cross-language IR. In Proceedngs of RIAO 2000: Content-Based Multmeda Informaton Access, vol. 1, pages 62-78, Pars, France. Deng, Y., S. Kumar, and W. Byrne Segmentaton and algnment of parallel text for statstcal machne translaton. Natural Language Engneerng, 12(4): Fung, P. and K. W. Church K-vec: A new approach for algnng parallel texts. In Proceedngs of the 15th Internatonal Conference on Computatonal Lngustcs, pages 1,096-1,102, Kyoto, Japan. Gale, W. A. and K. W. Church A program for algnng sentences n blngual corpora. Computatonal Lngustcs, 19(1): Hutchns, J Towards a defnton of examplebased machne translaton. In MT Summt X. Workshop: Second Workshop on Example-Based Machne Translaton, pages 63-70, Phuket, Thaland. Jutras, J-M An automatc revser: the TransCheck system. In Proceedngs of Appled Natural Language Processng, pages , Seattle, WA. Kondrak G. and B. Dorr Identfcaton of confusable drug names: a new approach and evaluaton methodology. In Proceedngs of the 20th Internatonal Conference on Computatonal Lngustcs, vol. II, pages , Geneva, Swtzerland. Kraaj, W., J.-Y. Ne. and M. Smard Embeddng web-based statstcal translaton models n crosslanguage nformaton retreval. Computatonal Lngustcs, 29(3): Melamed, I. Dan Btext maps and algnment va pattern recognton. Computatonal Lngustcs, 25(1): Moore, Robert C Fast and accurate sentence algnment of blngual corpora. In Machne Translaton: From Research to Real Users, 5th Conference of the Assocaton for Machne Translaton n the Amercas, pages , Sprnger-Verlag, Berln. Pedersen, Ted and N. Varma K-vec++: Approach for fndng word correspondences. Avalable onlne: ~tpederse/parallel.html. Resnk, P Mnng the web for blngual text. In Proceedngs of the 37th Meetng of ACL, pages , College Park, MD. Resnk, P. and Noah A. Smth The web as a parallel corpus. Computatonal Lngustcs, 29 (3): Sanchez-Vllaml, E., S. Santos-Anton, S. Ortz-Rojas and M. L. Forcada Evaluaton of algnment methods for HTML parallel text. In Advances n Natural Language Processng, Proceedngs of Fn- TAL 2006, 5th Internatonal Conference on Natural Language Processng, pages , Turku, Fnland, LNCS Sprnger, Berln. Smard, M., G. Foster and P. Isabelle Usng cognates to algn sentences n blngual corpora. In Proceedngs of the Fourth Internatonal Congress on Theoretcal and Methodologcal Issues n Machne Translaton, pages 67-81, Montreal, Canada. Zhu, Q., D. Inkpen and A. Asudeh Automatc extracton of translatons from web-based blngual materals. Machne Translaton, 21 (3):

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

CS1100 Introduction to Programming

CS1100 Introduction to Programming Factoral (n) Recursve Program fact(n) = n*fact(n-) CS00 Introducton to Programmng Recurson and Sortng Madhu Mutyam Department of Computer Scence and Engneerng Indan Insttute of Technology Madras nt fact

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Intrinsic Plagiarism Detection Using Character n-gram Profiles

Intrinsic Plagiarism Detection Using Character n-gram Profiles Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

AP PHYSICS B 2008 SCORING GUIDELINES

AP PHYSICS B 2008 SCORING GUIDELINES AP PHYSICS B 2008 SCORING GUIDELINES General Notes About 2008 AP Physcs Scorng Gudelnes 1. The solutons contan the most common method of solvng the free-response questons and the allocaton of ponts for

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES 1 Fetosa, R.Q., 2 Merelles, M.S.P., 3 Blos, P. A. 1,3 Dept. of Electrcal Engneerng ; Catholc Unversty of

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

11. HARMS How To: CSV Import

11. HARMS How To: CSV Import and Rsk System 11. How To: CSV Import Preparng the spreadsheet for CSV Import Refer to the spreadsheet template to ad algnng spreadsheet columns wth Data Felds. The spreadsheet s shown n the Appendx, an

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Ontology Mapping: As a Binary Classification Problem

Ontology Mapping: As a Binary Classification Problem Fourth Internatonal Conference on Semantcs, Knowledge and Grd Ontology Mappng: As a Bnary Classfcaton Problem Mng Mao SAP Research mng.mao@sap.com Yefe Peng Yahoo! ypeng@yahoo-nc.com Mchael Sprng U. of

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Signature and Lexicon Pruning Techniques

Signature and Lexicon Pruning Techniques Sgnature and Lexcon Prunng Technques Srnvas Palla, Hansheng Le, Venu Govndaraju Centre for Unfed Bometrcs and Sensors Unversty at Buffalo {spalla2, hle, govnd}@cedar.buffalo.edu Abstract Handwrtten word

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Bootstrapping Structured Page Segmentation

Bootstrapping Structured Page Segmentation Bootstrappng Structured Page Segmentaton Huanfeng Ma and Davd Doermann Laboratory for Language and Meda Processng Insttute for Advanced Computer Studes (UMIACS) Unversty of Maryland, College Park, MD {hfma,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR Judth Aronow Rchard Jarvnen Independent Consultant Dept of Math/Stat 559 Frost Wnona State Unversty Beaumont, TX 7776 Wnona, MN 55987 aronowju@hal.lamar.edu

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information