Discovering Word Senses from Text


Patrick Pantel and Dekang Lin
University of Alberta
Department of Computing Science
Edmonton, Alberta T6H 2E1 Canada
{ppantel,

ABSTRACT
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval---Clustering.

General Terms
Algorithms, Measurement, Experimentation.

Keywords
Word sense discovery, clustering, evaluation, machine learning.

1. INTRODUCTION
Using word senses versus word forms is useful in many applications such as information retrieval [20], machine translation [5] and question-answering [16]. In previous approaches, word senses are usually defined using a manually constructed lexicon. There are several disadvantages associated with these word senses. First, manually created lexicons often contain rare senses. For example, WordNet 1.5 [15] (hereon referred to as WordNet) included a sense of computer that means "the person who computes".
Using WordNet to expand queries to an information retrieval system, the expansion of computer includes words like estimator and reckoner. The second problem with these lexicons is that they miss many domain-specific senses. For example, WordNet misses the user-interface-object sense of the word dialog (as often used in software manuals).

The meaning of an unknown word can often be inferred from its context. Consider the following sentences:

A bottle of tezgüino is on the table.
Everyone likes tezgüino.
Tezgüino makes you drunk.
We make tezgüino out of corn.

The contexts in which the word tezgüino is used suggest that tezgüino may be a kind of alcoholic beverage. This is because other alcoholic beverages tend to occur in the same contexts as tezgüino. The intuition is that words that occur in the same contexts tend to be similar. This is known as the Distributional Hypothesis [3]. There have been many approaches to compute the similarity between words based on their distribution in a corpus [4][8][12]. The output of these programs is a ranked list of similar words to each word. For example, [12] outputs the following similar words for wine and suit:

wine: beer, white wine, red wine, Chardonnay, champagne, fruit, food, coffee, juice, Cabernet, cognac, vinegar, Pinot noir, milk, vodka, ...
suit: lawsuit, jacket, shirt, pant, dress, case, sweater, coat, trouser, claim, business suit, blouse, skirt, litigation, ...

The similar words of wine represent the meaning of wine. However, the similar words of suit represent a mixture of its clothing and litigation senses. Such lists of similar words do not distinguish between the multiple senses of polysemous words.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGKDD 02, July 23-26, 2002, Edmonton, Alberta, Canada.
Copyright 2002 ACM X/02/0007 $5.00.
The algorithm we present in this paper automatically discovers word senses by clustering words according to their distributional similarity. Each cluster that a word belongs to corresponds to a sense of the word. Consider the following sample outputs from our algorithm:

(suit
    Nq34 … (blouse, slack, legging, sweater)
    Nq137 … (lawsuit, allegation, case, charge)
)
(plant
    Nq… … (plant, factory, facility, refinery)
    Nq… … (shrub, ground cover, perennial, bulb)
)
(heart
    Nq… … (kidney, bone marrow, marrow, liver)
    Nq… … (psyche, consciousness, soul, mind)
)

Each entry shows the clusters to which the headword belongs. Nq34, Nq137, etc. are automatically generated names for the clusters. The number after each cluster name is the similarity between the cluster and the headword (i.e. suit, plant and heart). The lists of words are the top-4 most similar members to the cluster centroid. Each cluster corresponds to a sense of the headword. For example, Nq34 corresponds to the clothing sense of suit and Nq137 corresponds to the litigation sense of suit.

In this paper, we present a clustering algorithm, CBC (Clustering By Committee), in which the centroid of a cluster is constructed by averaging the feature vectors of a subset of the cluster members. The subset is viewed as a committee that determines which other elements belong to the cluster. By carefully choosing committee members, the features of the centroid tend to be the more typical features of the target class.

We also propose an automatic evaluation methodology for senses discovered by clustering algorithms. Using the senses in WordNet, we measure the precision of a system's discovered senses and the recall of the senses it should discover.

2. RELATED WORK
Clustering algorithms are generally categorized as hierarchical and partitional. In hierarchical agglomerative algorithms, clusters are constructed by iteratively merging the most similar clusters. These algorithms differ in how they compute cluster similarity. In single-link clustering, the similarity between two clusters is the similarity between their most similar members, while complete-link clustering uses the similarity between their least similar members. Average-link clustering computes this similarity as the average similarity between all pairs of elements across clusters. The complexity of these algorithms is O(n² log n), where n is the number of elements to be clustered [6].

Chameleon is a hierarchical algorithm that employs dynamic modeling to improve clustering quality [7]. When merging two clusters, one might consider the sum of the similarities between pairs of elements across the clusters (e.g. average-link clustering).
A drawback of this approach is that the existence of a single pair of very similar elements might unduly cause the merger of two clusters. An alternative considers the number of pairs of elements whose similarity exceeds a certain threshold [3]. However, this may cause undesirable mergers when there are a large number of pairs whose similarities barely exceed the threshold. Chameleon clustering combines the two approaches.

K-means clustering is often used on large data sets since its complexity is linear in n, the number of elements to be clustered. K-means is a family of partitional clustering algorithms that iteratively assigns each element to one of K clusters according to the centroid closest to it and recomputes the centroid of each cluster as the average of the cluster's elements. K-means has complexity O(K×T×n), where T is the number of iterations, and is efficient for many clustering tasks. Because the initial centroids are randomly selected, the resulting clusters vary in quality. Some sets of initial centroids lead to poor convergence rates or poor cluster quality.

Bisecting K-means [19], a variation of K-means, begins with a set containing one large cluster consisting of every element and iteratively picks the largest cluster in the set, splits it into two clusters and replaces it by the split clusters. Splitting a cluster consists of applying the basic K-means algorithm α times with K = 2 and keeping the split that has the highest average element-centroid similarity.

Hybrid clustering algorithms combine hierarchical and partitional algorithms in an attempt to have the high quality of hierarchical algorithms with the efficiency of partitional algorithms. Buckshot [1] addresses the problem of randomly selecting initial centroids in K-means by combining it with average-link clustering. Cutting et al. claim its clusters are comparable in quality to hierarchical algorithms but with a lower complexity. Buckshot first applies average-link to a random sample of √(Kn) elements to generate K clusters. It then uses the centroids of the clusters as the initial K centroids of K-means clustering. The sample size counterbalances the quadratic running time of average-link to make Buckshot efficient: O(K×T×n + n log n).
The parameters K and T are usually considered to be small numbers.

CBC is a descendant of UNICON [13], which also uses small and tight clusters to construct initial centroids. We compare them in Section 4.4 after presenting the CBC algorithm.

3. WORD SIMILARITY
Following [12], we represent each word by a feature vector. Each feature corresponds to a context in which the word occurs. For example, "sip __" is a verb-object context. If the word wine occurred in this context, the context is a feature of wine. The value of the feature is the pointwise mutual information [14] between the feature and the word. Let c be a context and $F_c(w)$ be the frequency count of a word w occurring in context c. The pointwise mutual information, $mi_{w,c}$, between c and w is defined as:

$$ mi_{w,c} = \log \frac{F_c(w)/N}{\frac{\sum_i F_i(w)}{N} \times \frac{\sum_j F_c(j)}{N}} \tag{1} $$

where $N = \sum_i \sum_j F_i(j)$ is the total frequency count of all words and their contexts. A well-known problem with mutual information is that it is biased towards infrequent words/features. We therefore multiplied $mi_{w,c}$ with a discounting factor:

$$ \frac{F_c(w)}{F_c(w) + 1} \times \frac{\min\left(\sum_i F_i(w), \sum_j F_c(j)\right)}{\min\left(\sum_i F_i(w), \sum_j F_c(j)\right) + 1} \tag{2} $$

We compute the similarity between two words $w_i$ and $w_j$ using the cosine coefficient [17] of their mutual information vectors:

$$ sim(w_i, w_j) = \frac{\sum_c mi_{w_i c} \times mi_{w_j c}}{\sqrt{\sum_c mi_{w_i c}^2 \times \sum_c mi_{w_j c}^2}} \tag{3} $$
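To make Equations (1) to (3) concrete, here is a minimal Python sketch that turns a table of (word, context) counts into discounted mutual information vectors and compares them with the cosine coefficient. It is an illustration under stated assumptions, not the authors' code: the input format (a dict keyed by (word, context) pairs), the helper names `mi_vectors` and `cosine`, the context labels such as `"drink:obj"`, and the use of the natural logarithm are our own choices. The text does not fix the log base, but rescaling every weight by the same constant leaves the cosine in Eq. (3) unchanged.

```python
import math
from collections import defaultdict


def mi_vectors(counts):
    """Discounted pointwise mutual information vectors (Eqs. 1 and 2).

    counts: dict mapping (word, context) -> frequency F_c(w).
    Returns: dict word -> {context: discounted mi_{w,c}}.
    """
    N = sum(counts.values())            # total count of all words and contexts
    word_total = defaultdict(float)     # sum_i F_i(w)
    ctx_total = defaultdict(float)      # sum_j F_c(j)
    for (w, c), f in counts.items():
        word_total[w] += f
        ctx_total[c] += f

    vectors = defaultdict(dict)
    for (w, c), f in counts.items():
        # Eq. 1: observed co-occurrence probability over the independence estimate
        mi = math.log((f / N) / ((word_total[w] / N) * (ctx_total[c] / N)))
        # Eq. 2: shrink the score when the pair, the word or the context is rare
        m = min(word_total[w], ctx_total[c])
        discount = (f / (f + 1)) * (m / (m + 1))
        vectors[w][c] = mi * discount
    return dict(vectors)


def cosine(u, v):
    """Eq. 3: cosine coefficient of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(c, 0.0) for c, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical counts: wine and beer share the context "drink:obj", car shares nothing
counts = {
    ("wine", "sip:obj"): 1,
    ("wine", "drink:obj"): 2,
    ("beer", "drink:obj"): 2,
    ("car", "drive:obj"): 3,
}
vecs = mi_vectors(counts)
print(round(cosine(vecs["wine"], vecs["beer"]), 3))
print(cosine(vecs["wine"], vecs["car"]))  # 0.0
```

Storing each vector as a dict keeps only the contexts a word actually occurs in, which mirrors the sparsity that Phase I of CBC exploits.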

4. ALGORITHM
CBC consists of three phases. In Phase I, we compute each element's top-k similar elements. In our experiments, we used k = 10. In Phase II, we construct a collection of tight clusters, where the elements of each cluster form a committee. The algorithm tries to form as many committees as possible on the condition that each newly formed committee is not very similar to any existing committee. If the condition is violated, the committee is simply discarded. In the final phase of the algorithm, each element e is assigned to its most similar clusters.

4.1 Phase I: Find top-similar elements
Computing the complete similarity matrix between pairs of elements is obviously quadratic. However, one can dramatically reduce the running time by taking advantage of the fact that the feature vector is sparse. By indexing the features, one can retrieve the set of elements that have a given feature. To compute the top similar elements of an element e, we first sort the features according to their pointwise mutual information values and then only consider a subset of the features with highest mutual information. Finally, we compute the pairwise similarity between e and the elements that share a feature from this subset. Since high mutual information features tend not to occur in many elements, we only need to compute a fraction of the possible pairwise combinations. Using this heuristic, similar words that share only low mutual information features will be missed by our algorithm. However, in our experiments, this had no visible impact on cluster quality.

4.2 Phase II: Find committees
The second phase of the clustering algorithm recursively finds tight clusters scattered in the similarity space. In each recursive step, the algorithm finds a set of tight clusters, called committees, and identifies residue elements that are not covered by any committee. We say a committee covers an element if the element's similarity to the centroid of the committee exceeds some high similarity threshold. The algorithm then recursively attempts to find more committees among the residue elements. The output of the algorithm is the union of all committees found in each recursive step.
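The recursion just described, with its individual steps taken from Figure 1, can be sketched in Python as below. This is a simplified illustration, not the authors' implementation, and it rests on several assumptions: vectors are sparse dicts compared with the cosine of Eq. (3); a committee centroid is the plain average of its members' vectors (Figure 1 instead averages frequency vectors and recomputes mutual information); the average-link step stops merging once the best average similarity drops below a hypothetical `link_stop` parameter, which the text does not specify; Phase I is replaced by a brute-force `top_similar` helper rather than the indexed heuristic of Section 4.1; and the guard that ends the recursion when no element is covered is our addition to guarantee termination. The thresholds in the example are illustrative only.

```python
import math
from itertools import combinations


def cosine(u, v):
    # Cosine coefficient of two sparse vectors stored as dicts (Eq. 3)
    dot = sum(x * v.get(f, 0.0) for f, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    # ASSUMPTION: plain average of member vectors (Figure 1 recomputes MI instead)
    total = {}
    for vec in vectors:
        for f, x in vec.items():
            total[f] = total.get(f, 0.0) + x
    return {f: x / len(vectors) for f, x in total.items()}


def top_similar(vecs, k):
    # Brute-force stand-in for Phase I: top-k neighbours with positive similarity
    out = {}
    for e in vecs:
        scored = sorted(((cosine(vecs[e], vecs[u]), u) for u in vecs if u != e),
                        reverse=True)
        out[e] = [u for s, u in scored[:k] if s > 0]
    return out


def avgsim(c, vecs):
    # Average pairwise similarity between the members of cluster c
    pairs = list(combinations(c, 2))
    return sum(cosine(vecs[a], vecs[b]) for a, b in pairs) / len(pairs) if pairs else 0.0


def average_link(elements, vecs, link_stop):
    # Average-link agglomerative clustering.
    # ASSUMPTION: merging stops once the best average similarity is < link_stop.
    clusters = [[e] for e in elements]
    while len(clusters) > 1:
        best, i, j = max(
            (sum(cosine(vecs[a], vecs[b]) for a in clusters[p] for b in clusters[q])
             / (len(clusters[p]) * len(clusters[q])), p, q)
            for p, q in combinations(range(len(clusters)), 2))
        if best < link_stop:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


def find_committees(E, vecs, S, theta1, theta2, link_stop=0.1):
    # Step 1: each element contributes its highest-scoring cluster, score |c| * avgsim(c)
    L = []
    for e in E:
        clusters = average_link(S[e], vecs, link_stop)
        if clusters:
            best = max(clusters, key=lambda c: len(c) * avgsim(c, vecs))
            L.append((len(best) * avgsim(best, vecs), best))
    # Step 2: sort the stored clusters by score, highest first
    L.sort(key=lambda item: item[0], reverse=True)
    # Step 3: keep a cluster only if its centroid is below theta1 to every kept committee
    C, cents = [], []
    for _, c in L:
        cen = centroid([vecs[u] for u in c])
        if all(cosine(cen, other) < theta1 for other in cents):
            C.append(c)
            cents.append(cen)
    # Step 4: no committee found in this step
    if not C:
        return C
    # Step 5: residues are elements below theta2 to every committee centroid
    R = [e for e in E if all(cosine(vecs[e], cen) < theta2 for cen in cents)]
    # Step 6: recurse on the residues; the len(R) == len(E) guard is our addition
    if not R or len(R) == len(E):
        return C
    return C + find_committees(R, vecs, S, theta1, theta2, link_stop)


# Hypothetical toy vectors: three clothing words and three litigation words
vecs = {
    "shirt": {"wear": 2.0, "iron": 1.0},
    "dress": {"wear": 2.0, "iron": 0.8},
    "coat": {"wear": 1.8, "iron": 1.2},
    "lawsuit": {"file": 2.0, "settle": 1.0},
    "case": {"file": 1.7, "settle": 1.1},
    "claim": {"file": 2.0, "settle": 0.7},
}
S = top_similar(vecs, k=10)
committees = find_committees(list(vecs), vecs, S, theta1=0.5, theta2=0.3)
print(committees)  # [['shirt', 'dress'], ['lawsuit', 'case']]
```

In the toy run, several near-duplicate clothing clusters are proposed in Step 1, but the θ1 test in Step 3 keeps only the highest-scoring one, and every element is then covered, so the recursion stops after one level.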
The details of Phase II are presented in Figure 1. In Step 1, the score reflects a preference for bigger and tighter clusters. Step 2 gives preference to higher quality clusters in Step 3, where a cluster is only kept if its similarity to all previously kept clusters is below a fixed threshold. In our experiments, we set θ1 = …. Step 4 terminates the recursion if no committee is found in the previous step. The residue elements are identified in Step 5 and if no residues are found, the algorithm terminates; otherwise, we recursively apply the algorithm to the residue elements. Each committee that is discovered in this phase defines one of the final output clusters of the algorithm.

Input:   A list of elements E to be clustered, a similarity database S from Phase I, thresholds θ1 and θ2.
Step 1:  For each element e ∈ E
             Cluster the top similar elements of e from S using average-link clustering.
             For each cluster discovered c, compute the following score: |c| × avgsim(c), where |c| is the number of elements in c and avgsim(c) is the average pairwise similarity between elements in c.
             Store the highest-scoring cluster in a list L.
Step 2:  Sort the clusters in L in descending order of their scores.
Step 3:  Let C be a list of committees, initially empty.
         For each cluster c ∈ L in sorted order
             Compute the centroid of c by averaging the frequency vectors of its elements and computing the mutual information vector of the centroid in the same way as we did for individual elements.
             If c's similarity to the centroid of each committee previously added to C is below a threshold θ1, add c to C.
Step 4:  If C is empty, we are done and return C.
Step 5:  For each element e ∈ E
             If e's similarity to every committee in C is below threshold θ2, add e to a list of residues R.
Step 6:  If R is empty, we are done and return C.
         Otherwise, return the union of C and the output of a recursive call to Phase II using the same input except replacing E with R.
Output:  a list of committees.

Figure 1. Phase II of CBC.

4.3 Phase III: Assign elements to clusters
In Phase III, each element e is assigned to its most similar clusters in the following way:

    let C be a list of clusters initially empty
    let S be the top-200 similar clusters to e
    while S is not empty {
        let c ∈ S be the most similar cluster to e
        if the similarity(e, c) < σ
            exit the loop
        if c is not similar to any cluster in C {
            assign e to c
            add c to C
            remove from e its features that overlap with the features of c
        }
        remove c from S
    }

When computing the similarity between a cluster and an element (or another cluster), we use the centroid of committee members as the representation for the cluster. This phase resembles K-means in that elements are assigned to their closest centroids. Unlike K-means, the number of clusters is not fixed and the centroids do not change (i.e. when an element is added to a cluster, it is not added to the committee of the cluster).

The key to the algorithm for discovering senses is that once an element e is assigned to a cluster c, the intersecting features between e and c are removed from e. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses.

4.4 Comparison with UNICON
UNICON [13] also constructs cluster centroids using a small set of similar elements, like the committees in CBC. One of the main differences between UNICON and CBC is that UNICON only guarantees that the committees do not have overlapping members. However, the centroids of two committees may still be quite similar. UNICON deals with this problem by merging such clusters. In contrast, Step 3 in Phase II of CBC only outputs a committee if its centroid is not similar to any previously output committee.

Another main difference between UNICON and CBC is in Phase III of CBC. UNICON has difficulty discovering senses of a word when this word has a dominating sense. For example, in the newspaper corpus that we used in our experiments, the factory sense of plant is used much more frequently than its life sense. Consequently, the majority of the features of the word plant are related to its factory sense. This is evidenced in the following top-30 most similar words of plant:

facility, factory, reactor, refinery, power plant, site, manufacturing plant, tree, building, complex, landfill, dump, project, mill, airport, station, farm, operation, warehouse, company, home, center, lab, store, industry, park, house, business, incinerator

All of the above, except the word tree, are related to the factory sense. Even though UNICON generated a cluster {ground cover, perennial, shrub, bulb, annual, wildflower, shrubbery, fern, grass, ...}, the similarity between plant and this cluster is very low. On the other hand, CBC removes the factory-related features from the feature vector of plant after it is assigned to the factory cluster. As a result, the similarity between the {ground cover, perennial, ...} cluster and the revised feature vector of plant becomes much higher.

5. EVALUATION METHODOLOGY
To evaluate our system, we compare its output with WordNet, a manually created lexicon.

5.1 WordNet
WordNet [15] is an electronic dictionary organized as a graph.
Each node, called a synset, represents a set of synonymous words. The arcs between synsets represent hyponym/hypernym (subclass/superclass) relationships¹. Figure 2 shows a fragment of WordNet. The number attached to a synset s is the probability that a randomly selected noun refers to an instance of s or any synset below it. These probabilities are not included in WordNet. We use the frequency counts of synsets in the SemCor [9] corpus to estimate them. Since SemCor is a fairly small corpus (200K words), the frequency counts of the synsets in the lower part of the WordNet hierarchy are very sparse. We smooth the probabilities by assuming that all siblings are equally likely given the parent.

¹ WordNet also contains other semantic relationships such as meronyms (part-whole relationships) and antonyms; however, we do not use them here.

[Figure 2. Example hierarchy of synsets in WordNet along with each synset's probability. Nodes: entity, inanimate-object, natural-object, geological-formation, natural-elevation, hill, shore, coast.]

Lin [11] defined the similarity between two WordNet synsets $s_1$ and $s_2$ as:

$$ sim(s_1, s_2) = \frac{2 \times \log P(s)}{\log P(s_1) + \log P(s_2)} \tag{4} $$

where s is the most specific synset that subsumes $s_1$ and $s_2$. For example, using Figure 2, if $s_1$ = hill and $s_2$ = shore, then s = geological-formation and sim(hill, shore) = 2 × log P(geological-formation) / (log P(hill) + log P(shore)).

5.2 Precision
For each word, CBC outputs a list of clusters to which the word belongs. Each cluster should correspond to a sense of the word. The precision of the system is measured by the percentage of output clusters that actually correspond to a sense of the word.

To compute the precision, we must define what it means for a cluster to correspond to a correct sense of a word. To determine this automatically, we map clusters to WordNet senses. Let S(w) be the set of WordNet senses of a word w (each sense is a synset that contains w). We define simW(s, u), the similarity between a synset s and a word u, as the maximum similarity between s and a sense of u:

$$ simW(s, u) = \max_{t \in S(u)} sim(s, t) \tag{5} $$

Let $c_k$ be the top-k members of a cluster c, where these are the k most similar members to the committee of c.
We define the similarity between s and c, simC(s, c), as the average similarity between s and the top-k members of c:

$$ simC(s, c) = \frac{\sum_{u \in c_k} simW(s, u)}{k} \tag{6} $$

Suppose a clustering algorithm assigns the word w to cluster c. We say that c corresponds to a correct sense of w if

$$ \max_{s \in S(w)} simC(s, c) \geq \theta \tag{7} $$

In our experiments, we set k = 4 and varied the θ values. The WordNet sense of w that corresponds to c is then:

$$ \arg\max_{s \in S(w)} simC(s, c) \tag{8} $$

It is possible that multiple clusters will correspond to the same WordNet sense. In this case, we only count one of them as correct. We define the precision of a word w as the percentage of correct clusters to which it is assigned. The precision of a clustering algorithm is the average precision of all the words.

5.3 Recall
The recall (completeness) of a word w measures the ratio between the correct clusters to which w is assigned and the actual number of senses in which w was used in the corpus. Clearly, there is no way to know the complete list of senses of a word in any nontrivial corpus. To address this problem, we pool the results of several clustering algorithms to construct the target senses. For a given word w, we use the union of the correct clusters of w discovered by the algorithms as the target list of senses for w. While this recall value is likely not the true recall, it does provide a relative ranking of the algorithms used to construct the pool of target senses. The overall recall is the average recall of all words.

5.4 F-measure
The F-measure [18] combines precision and recall aspects:

$$ F = \frac{2RP}{R + P} \tag{9} $$

where R is the recall and P is the precision. F weights low values of precision and recall more heavily than higher values. It is high when both precision and recall are high.

6. EXPERIMENTAL RESULTS
In this section, we describe our experimental setup and present evaluation results of our system.

6.1 Setup
We used Minipar [10], a broad-coverage English parser, to parse about 1GB (144M words) of newspaper text from the TREC collection (1988 AP Newswire, LA Times, and 1991 San Jose Mercury) at a speed of about 500 words/second on a PIII-750 with 512MB memory. We collected the frequency counts of the grammatical relationships (contexts) output by Minipar and used them to compute the pointwise mutual information values from Section 3. The test set is constructed by intersecting the words in WordNet with the nouns in the corpus whose total mutual information with all of its contexts exceeds a threshold (we used 50). Since WordNet has a low coverage of proper names, we removed all capitalized nouns. The resulting test set consists of 13,403 words. The average number of features per word is ….

[Table 1. Precision, Recall and F-measure on the data set for various algorithms with σ = 0.18 and θ = 0.25. Columns: ALGORITHM, PRECISION (%), RECALL (%), F-MEASURE (%). Rows: CBC, UNICON, Buckshot, K-means, Bisecting K-means, Average-link.]

We modified the average-link, K-means, Bisecting K-means and Buckshot algorithms of Section 2 since these algorithms only assign each element to a single cluster. For each of these algorithms, the modification is as follows:

    Apply the algorithm as described in Section 2
    For each cluster c returned by the algorithm
        Create a centroid for c using all elements assigned to it
    Apply MK-means using the above centroids

where MK-means is the K-means algorithm, using the above centroids as initial centroids, except that each element is assigned to its most similar cluster plus all other clusters with which it has similarity greater than σ. We then use these modified algorithms to discover senses. These clustering algorithms were not designed for sense discovery. Like UNICON, when assigning an element to a cluster, they do not remove the overlapping features from the element. Thus, a word is often assigned to multiple clusters that are similar.

6.2 Word Sense Evaluation
We ran CBC and the modified clustering algorithms described in the previous subsection on the data set and applied the evaluation methodology from Section 5. Table 1 shows the results. For Buckshot and K-means, we set the number of clusters to 1000 and the maximum number of iterations to 5. For the Bisecting K-means algorithm, we applied the basic K-means algorithm twice (α = 2 in Section 2) with a maximum of 5 iterations per split.
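The MK-means step used to adapt the baseline algorithms in Section 6.1 can be sketched as follows. This is our illustration under stated assumptions, not the code used in the experiments: vectors are sparse dicts compared with the cosine of Eq. (3), each centroid is the plain average of the vectors currently assigned to it, an empty cluster keeps its previous centroid, and the loop stops when the centroids stop changing or after `max_iter` passes (5 here, mirroring the iteration cap reported for K-means and Buckshot). The function name `mk_means` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine coefficient of two sparse dict vectors (Eq. 3)
    dot = sum(x * v.get(f, 0.0) for f, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    # Plain average of the member vectors
    total = {}
    for vec in vectors:
        for f, x in vec.items():
            total[f] = total.get(f, 0.0) + x
    return {f: x / len(vectors) for f, x in total.items()}


def mk_means(vecs, init_centroids, sigma, max_iter=5):
    """K-means started from given centroids, except that each element joins its
    most similar cluster plus every other cluster with similarity > sigma."""
    centroids = [dict(c) for c in init_centroids]
    members = [[] for _ in centroids]
    for _ in range(max_iter):
        members = [[] for _ in centroids]
        for e, v in vecs.items():
            sims = [cosine(v, c) for c in centroids]
            best = max(range(len(sims)), key=sims.__getitem__)
            for i, s in enumerate(sims):
                if i == best or s > sigma:
                    members[i].append(e)
        # Recompute centroids; an empty cluster keeps its previous centroid
        new = [centroid([vecs[e] for e in m]) if m else centroids[i]
               for i, m in enumerate(members)]
        if new == centroids:  # assignments can no longer change
            break
        centroids = new
    return members


vecs = {"a": {"x": 1.0}, "c": {"y": 1.0}, "d": {"x": 1.0, "y": 1.0}}
print(mk_means(vecs, [{"x": 1.0}, {"y": 1.0}], sigma=0.6))
# [['a', 'd'], ['c', 'd']]
```

The element d lies between both initial centroids, so with σ = 0.6 it joins both clusters; with a stricter σ it would stay only in its single most similar cluster, which matches the observation below that lowering σ assigns words to more clusters.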
CBC returned 941 clusters and outperformed the next best algorithm by 7.5% on precision and 5.3% on recall.

In Section 5.2 we stated that a cluster corresponds to a correct sense of a word w if its maximum simC similarity with any synset in S(w) exceeds a threshold θ (Eq. 7). Figure 2 shows our experiments using different values of θ. The higher the θ value, the stricter we are in defining correct senses. Naturally, the systems' F-measures decrease when θ increases. The relative ranking of the algorithms is not sensitive to the choice of θ values. CBC has higher F-measure for all θ thresholds.

For all sense discovery algorithms, we assign an element to a cluster if their similarity exceeds a threshold σ. The value of σ does not affect the first sense returned by the algorithms for each word because each word is always assigned to its most similar cluster. We experimented with different values of σ and present the results in Figure 3. With a lower σ value, words are assigned to more clusters. Consequently, the precision goes down while recall goes up. CBC has higher F-measure for all σ thresholds.

6.3 Manual Evaluation
We manually evaluated a 1% random sample of the test data consisting of 133 words with 168 senses. Here is a sample of the instances that are manually judged for the words aria, capital and device:

aria S1: song, ballad, folk song, tune
capital S1: money, donation, funding, honorarium
capital S2: camp, shantytown, township, slum
device S1: camera, transmitter, sensor, electronic device
device S2: equipment, test equipment, microcomputer, video equipment

For each discovered sense of a word, we include its top-4 most similar words. The evaluation consists of assigning a tag to each sense as follows:

C: The list of top-4 words describes a sense of the word that has not yet been seen
+: The list of top-4 words describes a sense of the word that has already been seen (duplicate sense)
I: The list of top-4 words does not describe a sense of the word

The S2 sense of device is an example of a sense that is evaluated with the duplicate sense tag. Table 2 compares the agreements/disagreements between our manual and automatic evaluations. Our manual evaluation agreed with the automatic evaluation 88.1% of the time. This suggests that the evaluation methodology is reliable.

[Table 2. Comparison of manual and automatic evaluations of a 1% random sample of the data set. Headings: AUTOMATIC, MANUAL.]
Most of the disagreements (17 out of 20) were on senses that were incorrect according to the automatic evaluation but correct in the manual evaluation. The automatic evaluation misclassified these because sometimes WordNet misses a sense of a word and because of the organization of the WordNet hierarchy. Some words in WordNet should have high similarity (e.g. elected official and legislator) but they are not close to each other in the hierarchy. Our manual evaluation of the sample gave a precision of 72.0%. The automatic evaluation of the same sample gave 63.1% precision. Of the 13,403 words in the test data, CBC found 869 of them polysemous.

[Figure 2. F-measure of several algorithms with σ = 0.18 and varying θ thresholds from Eq. 7. Series: CBC, UNICON, Buckshot, K-means, BK-means, Average-link; x-axis: θ; y-axis: F-measure.]

[Figure 3. F-measure of several algorithms with θ = 0.25 and varying σ thresholds. Series: CBC, UNICON, Buckshot, K-means, BK-means, Average-link; x-axis: σ; y-axis: F-measure.]

7. DISCUSSION
We computed the average precision for each cluster, which is the percentage of elements in a cluster that correctly correspond to a WordNet sense according to Eq. 7. We inspected the low-precision clusters and found that they were low for three main reasons.

First, some clusters suffer from part-of-speech confusion. Many of the nouns in our data set can also be used as verbs and adjectives. Since the feature vector of a word is constructed from all instances of that word (including its noun, verb and adjective usage), CBC outputs contain clusters of verbs and adjectives. For example, the following cluster contains adjectives:

weird, stupid, silly, old, bad, simple, normal, wrong, wild, good, romantic, tough, special, small, real, smart, ...

The noun senses of all of these words in WordNet are not similar. Therefore, the cluster has a very low precision. In hindsight, we should have removed the verb and adjective usage features.

Secondly, CBC outputs some clusters of proper names. If a word that first occurs as a common noun also has a proper-noun usage, it will not be removed from the test data. For the same reasons as the part-of-speech confusion problem, CBC discovers proper name clusters but gets them evaluated as if they were common nouns (since WordNet contains few proper nouns). For example, the following cluster has an average precision of 10%:

blue jay, expo, angel, mariner, cub, brave, pirate, twin, athletics, brewer

Finally, some concepts discovered by CBC are completely missing from WordNet. For example, the following cluster of government departments has a low precision of 3.3% because WordNet does not have a synset that subsumes these words:

public works, city planning, forestry, finance, tourism, agriculture, health, affair, social welfare, transport, labor, communication, environment, immigration, public service, transportation, urban planning, fishery, aviation, telecommunication, mental health, procurement, intelligence, custom, higher education, recreation, preservation, lottery, correction, scouting

Somewhat surprisingly, all of the low-precision clusters that we inspected are reasonably good. At first sight, we thought the following cluster was bad:

shamrock, nestle, dart, partnership, haft, consortium, blockbuster, whirlpool, delta, hallmark, rosewood, odyssey, bass, forte, cascade, citadel, metropolitan, hooker

By looking at the features of the centroid of this cluster, we realized that it is mostly a cluster of company names.

8. CONCLUSION
We presented a clustering algorithm, CBC, that automatically discovers word senses from text. We first find well-scattered tight clusters called committees and use them to construct the centroids of the final clusters. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element.
This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also presented an evaluation methodology for automatically measuring the precision and recall of discovered senses. In our experiments, we showed that CBC outperforms several well-known hierarchical, partitional, and hybrid clustering algorithms. Our manual evaluation of sample CBC outputs agreed with 88.1% of the decisions made by the automatic evaluation.

9. ACKNOWLEDGEMENTS
The authors wish to thank the reviewers for their helpful comments. This research was partly supported by Natural Sciences and Engineering Research Council of Canada grant OGP121338 and scholarship PGSB.

REFERENCES
[1] Cutting, D. R.; Karger, D.; Pedersen, J.; and Tukey, J. W. Scatter/Gather: A cluster-based approach to browsing large document collections. In Proceedings of SIGIR-92. Copenhagen, Denmark.
[2] Guha, S.; Rastogi, R.; and Kyuseok, S. ROCK: A robust clustering algorithm for categorical attributes. In Proceedings of ICDE 99. Sydney, Australia.
[3] Harris, Z. Distributional structure. In: Katz, J. J. (ed.) The Philosophy of Linguistics. New York: Oxford University Press.
[4] Hindle, D. Noun classification from predicate-argument structures. In Proceedings of ACL-90. Pittsburgh, PA.
[5] Hutchins, J. and Somers, H. Introduction to Machine Translation. Academic Press.
[6] Jain, A. K.; Murty, M. N.; and Flynn, P. J. Data clustering: A review. ACM Computing Surveys 31(3).
[7] Karypis, G.; Han, E.-H.; and Kumar, V. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer: Special Issue on Data Analysis and Mining 32(8).
[8] Landauer, T. K., and Dumais, S. T. A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104.
[9] Landes, S.; Leacock, C.; and Tengi, R. I. Building semantic concordances. In WordNet: An Electronic Lexical Database, edited by C. Fellbaum. MIT Press.
[10] Lin, D. Principar - an efficient, broad-coverage, principle-based parser. Proceedings of COLING-94. Kyoto, Japan.
[11] Lin, D. Using syntactic dependency as local context to resolve word sense ambiguity. In Proceedings of ACL-97. Madrid, Spain.
[12] Lin, D. Automatic retrieval and clustering of similar words. Proceedings of COLING/ACL-98. Montreal, Canada.
[13] Lin, D. and Pantel, P. Induction of semantic classes from natural language text. In Proceedings of SIGKDD-01. San Francisco, CA.
[14] Manning, C. D. and Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press.
[15] Miller, G. WordNet: An online lexical database. International Journal of Lexicography.
[16] Pasca, M. and Harabagiu, S. The informative role of WordNet in Open-Domain Question Answering. In Proceedings of NAACL-01 Workshop on WordNet and Other Lexical Resources. Pittsburgh, PA.
[17] Salton, G. and McGill, M. J. Introduction to Modern Information Retrieval. McGraw Hill.
[18] Shaw Jr, W. M.; Burgin, R.; and Howell, P. Performance standards and evaluations in IR test collections: Cluster-based retrieval methods. Information Processing and Management 33:1-14.
[19] Steinbach, M.; Karypis, G.; and Kumar, V. A comparison of document clustering techniques. Technical Report, Department of Computer Science and Engineering, University of Minnesota.
[20] Voorhees, E. M. Using WordNet for text retrieval. In WordNet: An Electronic Lexical Database, edited by C. Fellbaum. MIT Press.


More information

A Real-Time Detecting Algorithm for Tracking Community Structure of Dynamic Networks

A Real-Time Detecting Algorithm for Tracking Community Structure of Dynamic Networks A Real-Tme Detetng Algorthm for Trakng Communty Struture of Dynam Networks Jaxng Shang*, Lanhen Lu*, Feng Xe, Zhen Chen, Jaa Mao, Xueln Fang, Cheng Wu* Department of Automaton, Tsnghua Unversty, Beng,,

More information

Time Synchronization in WSN: A survey Vikram Singh, Satyendra Sharma, Dr. T. P. Sharma NIT Hamirpur, India

Time Synchronization in WSN: A survey Vikram Singh, Satyendra Sharma, Dr. T. P. Sharma NIT Hamirpur, India Internatonal Journal of Enhaned Researh n Sene Tehnology & Engneerng, ISSN: 2319-7463 Vol. 2 Issue 5, May-2013, pp: (61-67), Avalable onlne at: www.erpublatons.om Tme Synhronzaton n WSN: A survey Vkram

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Cluster ( Vehicle Example. Cluster analysis ( Terminology. Vehicle Clusters. Why cluster?

Cluster (  Vehicle Example. Cluster analysis (  Terminology. Vehicle Clusters. Why cluster? Why luster? referene funton R R Although R and R both somewhat orrelated wth the referene funton, they are unorrelated wth eah other Cluster (www.m-w.om) A number of smlar ndvduals that our together as

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Session 4.2. Switching planning. Switching/Routing planning

Session 4.2. Switching planning. Switching/Routing planning ITU Semnar Warsaw Poland 6-0 Otober 2003 Sesson 4.2 Swthng/Routng plannng Network Plannng Strategy for evolvng Network Arhtetures Sesson 4.2- Swthng plannng Loaton problem : Optmal plaement of exhanges

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

TAR based shape features in unconstrained handwritten digit recognition

TAR based shape features in unconstrained handwritten digit recognition TAR based shape features n unonstraned handwrtten dgt reognton P. AHAMED AND YOUSEF AL-OHALI Department of Computer Sene Kng Saud Unversty P.O.B. 578, Ryadh 543 SAUDI ARABIA shamapervez@gmal.om, yousef@s.edu.sa

More information

Boosting Weighted Linear Discriminant Analysis

Boosting Weighted Linear Discriminant Analysis . Okada et al. / Internatonal Journal of Advaned Statsts and I&C for Eonoms and Lfe Senes Boostng Weghted Lnear Dsrmnant Analyss azunor Okada, Arturo Flores 2, Marus George Lnguraru 3 Computer Sene Department,

More information

Pattern Classification: An Improvement Using Combination of VQ and PCA Based Techniques

Pattern Classification: An Improvement Using Combination of VQ and PCA Based Techniques Ameran Journal of Appled Senes (0): 445-455, 005 ISSN 546-939 005 Sene Publatons Pattern Classfaton: An Improvement Usng Combnaton of VQ and PCA Based Tehnques Alok Sharma, Kuldp K. Palwal and Godfrey

More information

International Journal of Pharma and Bio Sciences HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS ABSTRACT

International Journal of Pharma and Bio Sciences HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS ABSTRACT Int J Pharm Bo S 205 Ot; 6(4): (B) 799-80 Researh Artle Botehnology Internatonal Journal of Pharma and Bo Senes ISSN 0975-6299 HYBRID CLUSTERING ALGORITHM USING POSSIBILISTIC ROUGH C-MEANS *ANURADHA J,

More information

LOCAL BINARY PATTERNS AND ITS VARIANTS FOR FACE RECOGNITION

LOCAL BINARY PATTERNS AND ITS VARIANTS FOR FACE RECOGNITION IEEE-Internatonal Conferene on Reent Trends n Informaton Tehnology, ICRTIT 211 MIT, Anna Unversty, Chenna. June 3-5, 211 LOCAL BINARY PATTERNS AND ITS VARIANTS FOR FACE RECOGNITION K.Meena #1, Dr.A.Suruland

More information

Bit-level Arithmetic Optimization for Carry-Save Additions

Bit-level Arithmetic Optimization for Carry-Save Additions Bt-leel Arthmet Optmzaton for Carry-Sae s Ke-Yong Khoo, Zhan Yu and Alan N. Wllson, Jr. Integrated Cruts and Systems Laboratory Unersty of Calforna, Los Angeles, CA 995 khoo, zhanyu, wllson @sl.ula.edu

More information

Connectivity in Fuzzy Soft graph and its Complement

Connectivity in Fuzzy Soft graph and its Complement IOSR Journal of Mathemats (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volume 1 Issue 5 Ver. IV (Sep. - Ot.2016), PP 95-99 www.osrjournals.org Connetvty n Fuzzy Soft graph and ts Complement Shashkala

More information

Computing Cloud Cover Fraction in Satellite Images using Deep Extreme Learning Machine

Computing Cloud Cover Fraction in Satellite Images using Deep Extreme Learning Machine Computng Cloud Cover Fraton n Satellte Images usng Deep Extreme Learnng Mahne L-guo WENG, We-bn KONG, Mn XIA College of Informaton and Control, Nanjng Unversty of Informaton Sene & Tehnology, Nanjng Jangsu

More information

Color Texture Classification using Modified Local Binary Patterns based on Intensity and Color Information

Color Texture Classification using Modified Local Binary Patterns based on Intensity and Color Information Color Texture Classfaton usng Modfed Loal Bnary Patterns based on Intensty and Color Informaton Shvashankar S. Department of Computer Sene Karnatak Unversty, Dharwad-580003 Karnataka,Inda shvashankars@kud.a.n

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Progressive scan conversion based on edge-dependent interpolation using fuzzy logic

Progressive scan conversion based on edge-dependent interpolation using fuzzy logic Progressve san onverson based on edge-dependent nterpolaton usng fuzzy log P. Brox brox@mse.nm.es I. Baturone lum@mse.nm.es Insttuto de Mroeletróna de Sevlla, Centro Naonal de Mroeletróna Avda. Rena Meredes

More information

A Robust Algorithm for Text Detection in Color Images

A Robust Algorithm for Text Detection in Color Images A Robust Algorthm for Tet Deteton n Color Images Yangng LIU Satosh GOTO Takesh IKENAGA Abstrat Tet deteton n olor mages has beome an atve researh area sne reent deades. In ths paper we present a novel

More information

Clustering Data. Clustering Methods. The clustering problem: Given a set of objects, find groups of similar objects

Clustering Data. Clustering Methods. The clustering problem: Given a set of objects, find groups of similar objects Clusterng Data The lusterng problem: Gven a set of obets, fnd groups of smlar obets Cluster: a olleton of data obets Smlar to one another wthn the same luster Dssmlar to the obets n other lusters What

More information

Microprocessors and Microsystems

Microprocessors and Microsystems Mroproessors and Mrosystems 36 (2012) 96 109 Contents lsts avalable at SeneDret Mroproessors and Mrosystems journal homepage: www.elsever.om/loate/mpro Hardware aelerator arhteture for smultaneous short-read

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Clustering incomplete data using kernel-based fuzzy c-means algorithm

Clustering incomplete data using kernel-based fuzzy c-means algorithm Clusterng noplete data usng ernel-based fuzzy -eans algorth Dao-Qang Zhang *, Song-Can Chen Departent of Coputer Sene and Engneerng, Nanjng Unversty of Aeronauts and Astronauts, Nanjng, 210016, People

More information

A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams

A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams A MPAA-Based Iteratve Clusterng Algorthm Augmented by Nearest Neghbors Searh for Tme-Seres Data Streams Jessa Ln 1, Mha Vlahos 1, Eamonn Keogh 1, Dmtros Gunopulos 1, Janwe Lu 2, Shouan Yu 2, and Jan Le

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Interval uncertain optimization of structures using Chebyshev meta-models

Interval uncertain optimization of structures using Chebyshev meta-models 0 th World Congress on Strutural and Multdsplnary Optmzaton May 9-24, 203, Orlando, Florda, USA Interval unertan optmzaton of strutures usng Chebyshev meta-models Jngla Wu, Zhen Luo, Nong Zhang (Tmes New

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

OSSM Ordered Sequence Set Mining for Maximal Length Frequent Sequences A Hybrid Bottom-Up-Down Approach

OSSM Ordered Sequence Set Mining for Maximal Length Frequent Sequences A Hybrid Bottom-Up-Down Approach Global Journal of Computer Sene and Tehnology Volume 12 Issue 7 Verson 1.0 Aprl 2012 Type: Double Blnd Peer Revewed Internatonal Researh Journal Publsher: Global Journals In. (USA) Onlne ISSN: 0975-4172

More information

Steganalysis of DCT-Embedding Based Adaptive Steganography and YASS

Steganalysis of DCT-Embedding Based Adaptive Steganography and YASS Steganalyss of DCT-Embeddng Based Adaptve Steganography and YASS Qngzhong Lu Department of Computer Sene Sam Houston State Unversty Huntsvlle, TX 77341, U.S.A. lu@shsu.edu ABSTRACT Reently well-desgned

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Performance Analysis of Hybrid (supervised and unsupervised) method for multiclass data set

Performance Analysis of Hybrid (supervised and unsupervised) method for multiclass data set IOSR Journal of Computer Engneerng (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 4, Ver. III (Jul Aug. 2014), PP 93-99 www.osrjournals.org Performane Analyss of Hybrd (supervsed and

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning

Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning Language Understandng n the Wld: Combnng Crowdsourng and Mahne Learnng Edwn Smpson Unversty of Oxford, UK edwn@robots.ox.a.uk Pushmeet Kohl Mrosoft Researh, Cambrdge, UK pkohl@mrosoft.om Matteo Venanz

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Topic 5: semantic analysis. 5.5 Types of Semantic Actions

Topic 5: semantic analysis. 5.5 Types of Semantic Actions Top 5: semant analyss 5.5 Types of Semant tons Semant analyss Other Semant tons Other Types of Semant tons revously, all semant atons were for alulatng attrbute values. In a real ompler, other types of

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Minimize Congestion for Random-Walks in Networks via Local Adaptive Congestion Control

Minimize Congestion for Random-Walks in Networks via Local Adaptive Congestion Control Journal of Communatons Vol. 11, No. 6, June 2016 Mnmze Congeston for Random-Walks n Networks va Loal Adaptve Congeston Control Yang Lu, Y Shen, and Le Dng College of Informaton Sene and Tehnology, Nanjng

More information

On the End-to-end Call Acceptance and the Possibility of Deterministic QoS Guarantees in Ad hoc Wireless Networks

On the End-to-end Call Acceptance and the Possibility of Deterministic QoS Guarantees in Ad hoc Wireless Networks On the End-to-end Call Aeptane and the Possblty of Determnst QoS Guarantees n Ad ho Wreless Networks S. Srram T. heemarjuna Reddy Dept. of Computer Sene Dept. of Computer Sene and Engneerng Unversty of

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University Approxmate All-Pars shortest paths Approxmate dstance oracles Spanners and Emulators Ur Zwck Tel Avv Unversty Summer School on Shortest Paths (PATH05 DIKU, Unversty of Copenhagen All-Pars Shortest Paths

More information

Adaptive Class Preserving Representation for Image Classification

Adaptive Class Preserving Representation for Image Classification Adaptve Class Preservng Representaton for Image Classfaton Jan-Xun M,, Qankun Fu,, Wesheng L, Chongqng Key Laboratory of Computatonal Intellgene, Chongqng Unversty of Posts and eleommunatons, Chongqng,

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Semi-analytic Evaluation of Quality of Service Parameters in Multihop Networks

Semi-analytic Evaluation of Quality of Service Parameters in Multihop Networks U J.T. (4): -4 (pr. 8) Sem-analyt Evaluaton of Qualty of Serve arameters n Multhop etworks Dobr tanassov Batovsk Faulty of Sene and Tehnology, ssumpton Unversty, Bangkok, Thaland bstrat

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Informaton Search and Management Prof. Chrs Clfton 15 September 2017 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group Retreval Models Informaton Need Representaton

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Multi-Collaborative Filtering Algorithm for Accurate Push of Command Information System

Multi-Collaborative Filtering Algorithm for Accurate Push of Command Information System Mult-Collaboratve Flterng Algorthm for Aurate Push of Command Informaton System Cu Xao-long,, Du Bo, Su Guo-Png, Yu Yan Urumq Command College of CPAPF, Urumq 83000, Chna XnJang Tehnal Insttute of Physs

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

AVideoStabilizationMethodbasedonInterFrameImageMatchingScore

AVideoStabilizationMethodbasedonInterFrameImageMatchingScore Global Journal of Computer Sene and Tehnology: F Graphs & vson Volume 7 Issue Verson.0 Year 207 Type: Double Blnd Peer Revewed Internatonal Researh Journal Publsher: Global Journals In. (USA) Onlne ISSN:

More information

Design Level Performance Modeling of Component-based Applications. Yan Liu, Alan Fekete School of Information Technologies University of Sydney

Design Level Performance Modeling of Component-based Applications. Yan Liu, Alan Fekete School of Information Technologies University of Sydney Desgn Level Performane Modelng of Component-based Applatons Tehnal Report umber 543 ovember, 003 Yan Lu, Alan Fekete Shool of Informaton Tehnologes Unversty of Sydney Ian Gorton Paf orthwest atonal Laboratory

More information

Parallel Grammatical Evolution for Circuit Optimization

Parallel Grammatical Evolution for Circuit Optimization Proeedngs of the World Congress on Engneerng and Computer Sene 2009 Vol II WCECS 2009, Otober 20-22, 2009, San Franso, USA Parallel Grammatal Evoluton for Crut Optmzaton 2OGLFK Kratohvíl, Pavel Ošmera,

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

MULTIPLE OBJECT DETECTION AND TRACKING IN SONAR MOVIES USING AN IMPROVED TEMPORAL DIFFERENCING APPROACH AND TEXTURE ANALYSIS

MULTIPLE OBJECT DETECTION AND TRACKING IN SONAR MOVIES USING AN IMPROVED TEMPORAL DIFFERENCING APPROACH AND TEXTURE ANALYSIS U.P.B. S. Bull., Seres A, Vol. 74, Iss. 2, 2012 ISSN 1223-7027 MULTIPLE OBJECT DETECTION AND TRACKING IN SONAR MOVIES USING AN IMPROVED TEMPORAL DIFFERENCING APPROACH AND TEXTURE ANALYSIS Tudor BARBU 1

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

The Simulation of Electromagnetic Suspension System Based on the Finite Element Analysis

The Simulation of Electromagnetic Suspension System Based on the Finite Element Analysis 308 JOURNAL OF COMPUTERS, VOL. 8, NO., FEBRUARY 03 The Smulaton of Suspenson System Based on the Fnte Element Analyss Zhengfeng Mng Shool of Eletron & Mahanal Engneerng, Xdan Unversty, X an, Chna Emal:

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Improving Web Image Search using Meta Re-rankers
