This excerpt from Foundations of Statistical Natural Language Processing, Christopher D. Manning and Hinrich Schütze, The MIT Press, is provided in screen-viewable form for personal use only by members of MIT CogNet. Unauthorized use or dissemination of this information is expressly forbidden. If you have any questions about this material, please contact cognetadmin@cognet.mit.edu.

14 Clustering

Clustering algorithms partition a set of objects into groups or clusters. Figure 14.1 gives an example of a clustering of 22 high-frequency words from the Brown corpus. The figure is an example of a dendrogram, a branching diagram where the apparent similarity between nodes at the bottom is shown by the height of the connection which joins them. Each node in the tree represents a cluster that was created by merging two child nodes. For example, in and on form a cluster and so do with and for. These two subclusters are then merged into one cluster with four objects. The height of the node corresponds to the decreasing similarity of the two clusters that are being merged (or, equivalently, to the order in which the merges were executed). The greatest similarity between any two clusters is the similarity between in and on, corresponding to the lowest horizontal line in the figure. The least similarity is between be and the cluster with the 21 other words, corresponding to the highest horizontal line in the figure.

While the objects in the clustering are all distinct as tokens, normally objects are described and clustered using a set of features and values (often known as the data representation model), and multiple objects may have the same representation in this model, so we will define our clustering algorithms to work over bags: objects like sets except that they allow multiple identical items. The goal is to place similar objects in the same group and to assign dissimilar objects to different groups.

What is the notion of similarity between words being used here? First, the left and right neighbors of tokens of each word in the Brown corpus were tallied. These distributions give a fairly true implementation of Firth's idea that one can categorize a word by the words that occur around it. But now, rather than looking for distinctive collocations, as in chapter 5, we are capturing and using the whole distributional pattern of the word.

[Figure 14.1 A single-link clustering of 22 frequent English words represented as a dendrogram. The words clustered are: be, not, he, I, it, this, the, his, a, and, but, in, on, with, for, at, from, of, to, as, is, was.]

Word similarity was then measured as the degree of overlap in the distributions of these neighbors for the two words in question. For example, the similarity between in and on is large because both words occur with similar left and right neighbors (both are prepositions and tend to be followed by articles or other words that begin noun phrases, for instance). The similarity between is and he is small because they share fewer immediate neighbors due to their different grammatical functions. Initially, each word formed its own cluster, and then at each step in the clustering, the two clusters that are closest to each other are merged into a new cluster.

There are two main uses for clustering in Statistical NLP. The figure demonstrates the use of clustering for exploratory data analysis (EDA). Somebody who does not know English would be able to derive a crude grouping of words into parts of speech from figure 14.1, and this insight may make subsequent analysis easier. Or we can use the figure to evaluate neighbor overlap as a measure of part-of-speech similarity, assuming we know what the correct parts of speech are. The clustering makes apparent both strengths and weaknesses of a neighbor-based representation. It works well for prepositions (which are all grouped together), but seems inappropriate for other words such as this and the, which are not grouped together with grammatically similar words.

Exploratory data analysis is an important activity in any pursuit that deals with quantitative data. Whenever we are faced with a new problem and want to develop a probabilistic model or just understand the basic characteristics of the phenomenon, EDA is the first step. It is always a mistake to not first spend some time getting a feel for what the data at hand look like. Clustering is a particularly important technique for EDA in Statistical NLP because there is often no direct pictorial visualization for linguistic objects. Other fields, in particular those dealing with numerical or geographic data, often have an obvious visualization, for example, maps of the incidence of a particular disease in epidemiology. Any technique that lets one visualize the data better is likely to bring to the fore new generalizations and to stop one from making wrong assumptions about the data. There are other well-known techniques for displaying a set of objects in a two-dimensional plane (such as pages of books); see section 14.3 for references. When used for EDA, clustering is thus only one of a number of techniques that one might employ, but it has the advantage that it can produce a richer hierarchical structure. It may also be more convenient to work with, since visual displays are more complex: one has to worry about how to label objects shown on the display, and, in contrast to clustering, one cannot give a comprehensive description of the object next to its visual representation.

The other main use of clustering in NLP is for generalization. We referred to this as forming bins or equivalence classes in section 6.1. But there we grouped data points in certain predetermined ways, whereas here we induce the bins from data.

As an example, suppose we want to determine the correct preposition to use with the noun Friday for translating a text from French into English. Suppose also that we have an English training text that contains the phrases on Sunday, on Monday, and on Thursday, but not on Friday. That on is the correct preposition to use with Friday can be inferred as follows. If we cluster English nouns into groups with similar syntactic and semantic environments, then the days of the week will end up in the same cluster. This is because they share environments like "until day-of-the-week," "last day-of-the-week," and "day-of-the-week morning." Under the assumption that an environment that is correct for one member of the cluster is also correct for the other members of the cluster, we can infer the correctness of on Friday from the presence of on Sunday, on Monday and on Thursday. So clustering is a way of learning. We group objects into clusters and generalize from what we know about some members of the cluster (like the appropriateness of the preposition on) to others.

Another way of partitioning objects into groups is classification, which is the subject of chapter 16. The difference is that classification is supervised and requires a set of labeled training instances for each group. Clustering does not require training data and is hence called unsupervised because there is no teacher who provides a training set with class labels. The result of clustering only depends on natural divisions in the data, for example the different neighbors of prepositions, articles and pronouns in the above dendrogram, not on any pre-existing categorization scheme. Clustering is sometimes called automatic or unsupervised classification, but we will not use these terms in order to avoid confusion.

There are many different clustering algorithms, but they can be classified into a few basic types. There are two types of structures produced by clustering algorithms, hierarchical clusterings and flat or non-hierarchical clusterings. Flat clusterings simply consist of a certain number of clusters and the relation between clusters is often undetermined. Most algorithms that produce flat clusterings are iterative. They start with a set of initial clusters and improve them by iterating a reallocation operation that reassigns objects. A hierarchical clustering is a hierarchy with the usual interpretation that each node stands for a subclass of its mother's node. The leaves of the tree are the single objects of the clustered set. Each node represents the cluster that contains all the objects of its descendants. Figure 14.1 is an example of a hierarchical cluster structure.

Another important distinction between clustering algorithms is whether they perform a soft clustering or hard clustering. In a hard assignment, each object is assigned to one and only one cluster. Soft assignments allow degrees of membership and membership in multiple clusters. In a probabilistic framework, an object x_i has a probability distribution P(· | x_i) over clusters c_j, where P(c_j | x_i) is the probability that x_i is a member of c_j. In a vector space model, degree of membership in multiple clusters can be formalized as the similarity of a vector to the center of each cluster. In a vector space, the center of the M points in a cluster c, otherwise known as the centroid or center of gravity, is the point:

(14.1)  \vec{\mu} = \frac{1}{M} \sum_{\vec{x} \in c} \vec{x}

In other words, each component of the centroid vector is simply the average of the values for that component in the M points in c.

In hierarchical clustering, assignment is usually hard. In non-hierarchical clustering, both types of assignment are common. Even most soft assignment models assume that an object is assigned to only one cluster. The difference from hard clustering is that there is uncertainty about which cluster is the correct one. There are also true multiple assignment models, so-called disjunctive clustering models, in which an object can truly belong to several clusters. For example, there may be a mix of syntactic and semantic categories in word clustering, and book would fully belong to both the semantic "object" and the syntactic "noun" category. We will not cover disjunctive clustering models here. See (Saund 1994) for an example of a disjunctive clustering model.

Nevertheless, it is worth mentioning at the beginning the limitations that follow from the assumptions of most clustering algorithms. A hard clustering algorithm has to choose one cluster to which to assign every item. This is rather unappealing for many problems in NLP. It is a commonplace that many words have more than one part of speech. For instance play can be a noun or a verb, and fast can be an adjective or an adverb. And many larger units also show mixed behavior. Nominalized clauses show some verb-like (clausal) behavior and some noun-like (nominalization) behavior. And we suggested in chapter 7 that several senses of a word were often simultaneously activated. Within a hard clustering framework, the best we can do in such cases is to define additional clusters corresponding to words that can be either nouns or verbs, and so on. Soft clustering is therefore somewhat more appropriate for many problems in NLP, since a soft clustering algorithm can assign an ambiguous word like play partly to the cluster of verbs and partly to the cluster of nouns.
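Equation (14.1) is just a componentwise average, and the contrast between hard and soft assignment can be made concrete with a few lines of code. The following sketch is not from the book: centroid implements (14.1) directly, while soft_memberships is a purely hypothetical illustration of graded membership that uses an arbitrary 1/(1 + distance) weighting, not any model discussed in this chapter.

    # Sketch of equation (14.1) plus a toy illustration of soft membership.

    def centroid(cluster):
        """Centroid (center of gravity) of a list of equal-length vectors, eq. (14.1)."""
        M = len(cluster)                          # number of points in the cluster
        m = len(cluster[0])                       # dimensionality of the space
        return [sum(x[i] for x in cluster) / M for i in range(m)]

    def euclidean(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    def soft_memberships(x, centers):
        """Toy graded membership: closer centers get larger, normalized weights."""
        weights = [1.0 / (1.0 + euclidean(x, c)) for c in centers]
        total = sum(weights)
        return [w / total for w in weights]

    if __name__ == "__main__":
        verbs_like = [[0.9, 0.1], [0.8, 0.2]]     # invented 2-d feature vectors
        nouns_like = [[0.1, 0.9], [0.2, 0.8]]
        centers = [centroid(verbs_like), centroid(nouns_like)]
        # An ambiguous point sits between the two centers rather than in one cluster.
        print(soft_memberships([0.5, 0.5], centers))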

Hierarchical clustering:
- Preferable for detailed data analysis
- Provides more information than flat clustering
- No single best algorithm (each of the algorithms we describe has been found to be optimal for some application)
- Less efficient than flat clustering (for n objects, one minimally has to compute an n × n matrix of similarity coefficients, and then update this matrix as one proceeds)

Non-hierarchical clustering:
- Preferable if efficiency is a consideration or data sets are very large
- K-means is the conceptually simplest method and should probably be used first on a new data set because its results are often sufficient
- K-means assumes a simple Euclidean representation space, and so cannot be used for many data sets, for example, nominal data like colors
- In such cases, the EM algorithm is the method of choice. It can accommodate definition of clusters and allocation of objects based on complex probabilistic models.

Table 14.1 A summary of the attributes of different clustering algorithms.

The remainder of the chapter looks in turn at various hierarchical and non-hierarchical clustering methods, and some of their applications in NLP. In table 14.1, we briefly characterize some of the features of clustering algorithms for the reader who is just looking for a quick solution to an immediate clustering need. For a discussion of the pros and cons of different clustering algorithms see Kaufman and Rousseeuw (1990). The main notations that we will use in this chapter are summarized in table 14.2.

14.1 Hierarchical Clustering

The tree of a hierarchical clustering can be produced either bottom-up, by starting with the individual objects and grouping the most similar ones, or top-down, whereby one starts with all the objects and divides them into groups so as to maximize within-group similarity.

Notation                        Meaning
X = {x_1, ..., x_n}             the set of n objects to be clustered
C = {c_1, ..., c_j, ..., c_k}   the set of clusters (or cluster hypotheses)
P(X)                            powerset (set of subsets) of X
sim(·,·)                        similarity function
S(·)                            group average similarity function
m                               dimensionality of vector space R^m
M_j                             number of points in cluster c_j
s(c_j)                          vector sum of vectors in cluster c_j
N                               number of word tokens in training corpus
w_{i,...,j}                     tokens i through j of the training corpus
π(·)                            function assigning words to clusters
C(w^1 w^2)                      number of occurrences of string w^1 w^2
C(c^1 c^2)                      number of occurrences of string w^1 w^2 s.t. π(w^1) = c^1, π(w^2) = c^2
µ_j                             centroid for cluster c_j
Σ_j                             covariance matrix for cluster c_j

Table 14.2 Symbols used in the clustering chapter.

Figure 14.2 describes the bottom-up algorithm, also called agglomerative clustering. Agglomerative clustering is a greedy algorithm that starts with a separate cluster for each object (3, 4). In each step, the two most similar clusters are determined (8), and merged into a new cluster (9). The algorithm terminates when one large cluster containing all objects of S has been formed, which then is the only remaining cluster in C (7).

Let us flag one possibly confusing issue. We have phrased the clustering algorithm in terms of similarity between clusters, and therefore we join things with maximum similarity (8). Sometimes people think in terms of distances between clusters, and then you want to join things that are the minimum distance apart. So it is easy to get confused between whether you are taking maximums or minimums. It is straightforward to produce a similarity measure from a distance measure d, for example by sim(x, y) = 1/(1 + d(x, y)).

Figure 14.3 describes top-down hierarchical clustering, also called divisive clustering (Jain and Dubes 1988: 57). Like agglomerative clustering, it is a greedy algorithm.

 1  Given: a set X = {x_1, ..., x_n} of objects
 2         a function sim: P(X) × P(X) → R
 3  for i := 1 to n do
 4      c_i := {x_i}
    end
 5  C := {c_1, ..., c_n}
 6  j := n + 1
 7  while |C| > 1
 8      (c_{n1}, c_{n2}) := arg max_{(c_u, c_v) ∈ C × C} sim(c_u, c_v)
 9      c_j = c_{n1} ∪ c_{n2}
10      C := (C \ {c_{n1}, c_{n2}}) ∪ {c_j}
11      j := j + 1

Figure 14.2 Bottom-up hierarchical clustering.

 1  Given: a set X = {x_1, ..., x_n} of objects
 2         a function coh: P(X) → R
 3         a function split: P(X) → P(X) × P(X)
 4  C := {X} (= {c_1})
 5  j := 1
 6  while ∃ c_i ∈ C s.t. |c_i| > 1
 7      c_u := arg min_{c_v ∈ C} coh(c_v)
 8      (c_{j+1}, c_{j+2}) = split(c_u)
 9      C := (C \ {c_u}) ∪ {c_{j+1}, c_{j+2}}
10      j := j + 2

Figure 14.3 Top-down hierarchical clustering.

Starting from a cluster with all objects (4), each iteration determines which cluster is least coherent (7) and splits this cluster (8). Clusters with similar objects are more coherent than clusters with dissimilar objects. For example, a cluster with several identical members is maximally coherent.

Hierarchical clustering only makes sense if the similarity function is monotonic:

(14.2)  Monotonicity: \forall c, c', c'' \subseteq S: \min(\mathrm{sim}(c, c'), \mathrm{sim}(c, c'')) \ge \mathrm{sim}(c, c' \cup c'')

In other words, the operation of merging is guaranteed to not increase similarity. A similarity function that does not obey this condition makes the hierarchy uninterpretable, since dissimilar clusters, which are placed far apart in the tree, can become similar in subsequent merging, so that closeness in the tree does not correspond to conceptual similarity anymore.
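To make the procedure in figure 14.2 concrete, here is a small sketch (not the book's code) of bottom-up agglomerative clustering over one-dimensional points, using a single-link similarity derived from distance via sim = 1/(1 + d) as suggested above. The step numbers in the comments refer to figure 14.2; the data and the similarity function are illustrative assumptions.

    # A sketch of bottom-up (agglomerative) clustering in the style of figure 14.2.

    def single_link_sim(cu, cv):
        # Similarity of the two closest members; 1/(1 + d) turns a distance into a similarity.
        return max(1.0 / (1.0 + abs(x - y)) for x in cu for y in cv)

    def agglomerative(points, sim=single_link_sim):
        clusters = [(x,) for x in points]          # steps 3-5: one cluster per object
        merges = []                                # record merges to reconstruct the dendrogram
        while len(clusters) > 1:                   # step 7
            # step 8: find the pair of distinct clusters with maximum similarity
            u, v = max(((i, j) for i in range(len(clusters))
                               for j in range(i + 1, len(clusters))),
                       key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]))
            merged = clusters[u] + clusters[v]     # step 9: new cluster c_j
            merges.append((clusters[u], clusters[v]))
            # step 10: remove the two old clusters and add the merged one
            clusters = [c for i, c in enumerate(clusters) if i not in (u, v)] + [merged]
        return merges                              # earlier merges correspond to higher similarity

    if __name__ == "__main__":
        for left, right in agglomerative([1.0, 1.2, 5.0, 5.3, 9.0]):
            print(left, "+", right)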

Most hierarchical clustering algorithms follow the schemes outlined in figures 14.2 and 14.3. The following sections discuss specific instances of these algorithms.

14.1.1 Single-link and complete-link clustering

Function        Definition
single link     similarity of two most similar members
complete link   similarity of two least similar members
group-average   average similarity between members

Table 14.3 Similarity functions used in clustering. Note that for group-average clustering, we average over all pairs, including pairs from the same cluster. For single-link and complete-link clustering, we quantify over the subset of pairs from different clusters.

Table 14.3 shows three similarity functions that are commonly used in information retrieval (van Rijsbergen 1979: 36ff). Recall that the similarity function determines which clusters are merged in each step in bottom-up clustering. In single-link clustering the similarity between two clusters is the similarity of the two closest objects in the clusters. We search over all pairs of objects that are from the two different clusters and select the pair with the greatest similarity. Single-link clusterings have clusters with good local coherence since the similarity function is locally defined. However, clusters can be elongated or "straggly," as shown in figure 14.6.

To see why single-link clustering produces such elongated clusters, observe first that the best moves in figure 14.4 are to merge the two top pairs of points and then the two bottom pairs of points, since the similarities a/b, c/d, e/f, and g/h are the largest for any pair of objects. This gives us the clusters in figure 14.5. The next two steps are to first merge the top two clusters, and then the bottom two clusters, since the pairs b/c and f/g are closer than all others that are not in the same cluster (e.g., closer than b/f and c/g). After doing these two merges we get figure 14.6. We end up with two clusters that are locally coherent (meaning that close objects are in the same cluster), but which can be regarded as being of bad global quality. An example of bad global quality is that a is much closer to e than to d, yet a and d are in the same cluster whereas a and e are not.
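The three similarity functions of table 14.3 can be written down directly. The sketch below is an illustration rather than the book's code; it assumes some pairwise similarity sim(x, y) is given, and, following the note under table 14.3, the group-average variant averages over all pairs drawn from the union of the two clusters, including pairs from the same cluster.

    # The cluster-cluster similarity functions of table 14.3 for a given pairwise sim.

    def single_link(c1, c2, sim):
        # Similarity of the two most similar members (one from each cluster).
        return max(sim(x, y) for x in c1 for y in c2)

    def complete_link(c1, c2, sim):
        # Similarity of the two least similar members (one from each cluster).
        return min(sim(x, y) for x in c1 for y in c2)

    def group_average(c1, c2, sim):
        # Average similarity over all ordered pairs of distinct objects in the union.
        union = list(c1) + list(c2)
        pairs = [(x, y) for i, x in enumerate(union)
                        for j, y in enumerate(union) if i != j]
        return sum(sim(x, y) for x, y in pairs) / len(pairs)

    if __name__ == "__main__":
        sim = lambda x, y: 1.0 / (1.0 + abs(x - y))   # toy similarity on numbers
        left, right = [1.0, 1.5], [5.0, 5.5]
        print(single_link(left, right, sim),
              complete_link(left, right, sim),
              group_average(left, right, sim))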

[Figure 14.4 A cloud of points in a plane (eight points a through h).]

[Figure 14.5 Intermediate clustering of the points in figure 14.4.]

The tendency of single-link clustering to produce this type of elongated cluster is sometimes called the chaining effect, since we follow a chain of large similarities without taking into account the global context.

Single-link clustering is closely related to the minimum spanning tree (MST) of a set of points. The MST is the tree that connects all objects with edges that have the largest similarities. That is, of all trees connecting the set of objects, the sum of the lengths of the edges of the MST is minimal.

[Figure 14.6 Single-link clustering of the points in figure 14.4.]

[Figure 14.7 Complete-link clustering of the points in figure 14.4.]

A single-link hierarchy can be constructed top-down from an MST by removing the longest edge in the MST so that two unconnected components are created, corresponding to two subclusters. The same operation is then recursively applied to these two subclusters (which are also MSTs).

Complete-link clustering has a similarity function that focuses on global cluster quality (as opposed to locally coherent clusters, as in the case of single-link clustering). The similarity of two clusters is the similarity of their two most dissimilar members. Complete-link clustering avoids elongated clusters. For example, in complete-link clustering the two best merges in figure 14.5 are to merge the two left clusters, and then the two right clusters, resulting in the clusters in figure 14.7.

Here, the minimally similar pair for the left clusters (a/f or b/e) is "tighter" than the minimally similar pair of the two top clusters (a/d).

So far we have made the assumption that "tight" clusters are better than "straggly" clusters. This reflects an intuition that a cluster is a group of objects centered around a central point, and so compact clusters are to be preferred. Such an intuition corresponds to a model like the Gaussian distribution (section 2.1.9), which gives rise to sphere-like clusters. But this is only one possible underlying model of what a good cluster is. It is really a question of our prior knowledge about and model of the data which determines what a good cluster is. For example, the Hawaiian islands were produced (and are being produced) by a volcanic process which moves along a straight line and creates new volcanoes at more or less regular intervals. Single-link is a very appropriate clustering model here since local coherence is what counts and elongated clusters are what we would expect (say, if we wanted to group several chains of volcanic islands). It is important to remember that the different clustering algorithms that we discuss will generally produce different results which incorporate the somewhat ad hoc biases of the different algorithms. Nevertheless, in most NLP applications, the sphere-shaped clusters of complete-link clustering are preferable to the elongated clusters of single-link clustering.

The disadvantage of complete-link clustering is that it has time complexity O(n³) since there are n merging steps and each step requires O(n²) comparisons to find the smallest similarity between any two objects for each cluster pair (where n is the number of objects to be clustered).¹ In contrast, single-link clustering has complexity O(n²). Once the n × n similarity matrix for all objects has been computed, it can be updated after each merge in O(n): if clusters c_u and c_v are merged into c_j = c_u ∪ c_v, then the similarity of the merge with another cluster c_k is simply the maximum of the two individual similarities:

sim(c_j, c_k) = max(sim(c_u, c_k), sim(c_v, c_k))

Each of the n − 1 merges requires at most n constant-time updates. Both merging and similarity computation thus have complexity O(n²) in single-link clustering, which corresponds to an overall complexity of O(n²).

¹ O(n³) is an instance of "Big Oh" notation for algorithmic complexity. We assume that the reader is familiar with it, or else is willing to skip issues of algorithmic complexity. It is defined in most books on algorithms, including (Cormen et al. 1990). The notation describes just the basic dependence of an algorithm on certain parameters, while ignoring constant factors.
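The single-link update rule sim(c_j, c_k) = max(sim(c_u, c_k), sim(c_v, c_k)) is what keeps the algorithm quadratic overall: after merging c_u and c_v, one row of the similarity matrix can be refreshed in O(n). The sketch below illustrates that update on a dictionary-of-dictionaries similarity matrix; the data structure and example values are assumptions for illustration, not the book's.

    # Updating a similarity matrix in O(n) after merging clusters u and v (single link).

    def merge_update(sim, u, v, j):
        """sim is a dict of dicts: sim[a][b] is the similarity of clusters a and b.
        After merging u and v into a new cluster j, each remaining cluster k gets
        sim[j][k] = max(sim[u][k], sim[v][k]); u and v are then removed."""
        others = [k for k in sim if k not in (u, v)]
        sim[j] = {}
        for k in others:
            s = max(sim[u][k], sim[v][k])
            sim[j][k] = s
            sim[k][j] = s
        for k in (u, v):
            del sim[k]
        for k in others:
            sim[k].pop(u, None)
            sim[k].pop(v, None)
        return sim

    if __name__ == "__main__":
        sim = {"a": {"b": 0.9, "c": 0.2},
               "b": {"a": 0.9, "c": 0.4},
               "c": {"a": 0.2, "b": 0.4}}
        print(merge_update(sim, "a", "b", "ab"))   # sim["ab"]["c"] == 0.4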

Single-link and complete-link clustering can be graph-theoretically interpreted as finding a maximally connected and maximally complete graph (or clique), respectively, hence the term "complete link" for the latter. See (Jain and Dubes 1988: 64).

14.1.2 Group-average agglomerative clustering

Group-average agglomerative clustering is a compromise between single-link and complete-link clustering. Instead of the greatest similarity between elements of clusters (single-link) or the least similarity (complete link), the criterion for merges is average similarity. We will see presently that average similarity can be computed efficiently in some cases so that the complexity of the algorithm is only O(n²). The group-average strategy is thus an efficient alternative to complete-link clustering while avoiding the elongated and straggly clusters that occur in single-link clustering.

Some care has to be taken in implementing group-average agglomerative clustering. The complexity of computing average similarity directly is O(n²). So if the average similarities are computed from scratch each time a new group is formed, that is, in each of the n merging steps, then the algorithm would be O(n³). However, if the objects are represented as length-normalized vectors in an m-dimensional real-valued space and if the similarity measure is the cosine, defined as in (14.3):

(14.3)  \mathrm{sim}(\vec{v}, \vec{w}) = \frac{\sum_{i=1}^{m} v_i w_i}{\sqrt{\sum_{i=1}^{m} v_i^2}\,\sqrt{\sum_{i=1}^{m} w_i^2}}

then there exists an algorithm that computes the average similarity of a cluster in constant time from the average similarity of its two children. Given the constant time for an individual merging operation, the overall time complexity is O(n²).

We write X for the set of objects to be clustered, each represented by an m-dimensional vector: X ⊆ R^m.

For a cluster c_j ⊆ X, the average similarity S between vectors in c_j is defined as follows. (The factor |c_j|(|c_j| − 1) calculates the number of (non-zero) similarities added up in the double summation.)

(14.4)  S(c_j) = \frac{1}{|c_j|(|c_j| - 1)} \sum_{\vec{x} \in c_j} \sum_{\vec{y} \in c_j,\, \vec{y} \neq \vec{x}} \mathrm{sim}(\vec{x}, \vec{y})

Let C be the set of current clusters. In each iteration, we identify the two clusters c_u and c_v which maximize S(c_u ∪ c_v). This corresponds to step 8 in figure 14.2. A new, smaller, partition C' is then constructed by merging c_u and c_v (step 10 in figure 14.2):

C' = (C \ {c_u, c_v}) ∪ {c_u ∪ c_v}

For cosine as the similarity measure, the inner maximization can be done in linear time (Cutting et al. 1992: 328). One can compute the average similarity between the elements of a candidate pair of clusters in constant time by precomputing for each cluster the sum of its members, s(c_j):

(14.5)  \vec{s}(c_j) = \sum_{\vec{x} \in c_j} \vec{x}

The sum vector s(c_j) is defined in such a way that: (i) it can be easily updated after a merge (namely by simply summing the sum vectors of the clusters that are being merged), and (ii) the average similarity of a cluster can be easily computed from it. This is so because the following relationship between s(c_j) and S(c_j) holds:

\vec{s}(c_j) \cdot \vec{s}(c_j) = \sum_{\vec{x} \in c_j} \vec{x} \cdot \vec{s}(c_j) = \sum_{\vec{x} \in c_j} \sum_{\vec{y} \in c_j} \vec{x} \cdot \vec{y} = |c_j|(|c_j| - 1) S(c_j) + \sum_{\vec{x} \in c_j} \vec{x} \cdot \vec{x} = |c_j|(|c_j| - 1) S(c_j) + |c_j|

Thus,

(14.6)  S(c_j) = \frac{\vec{s}(c_j) \cdot \vec{s}(c_j) - |c_j|}{|c_j|(|c_j| - 1)}

Therefore, if s(·) is known for two groups c_i and c_j, then the average similarity of their union can be computed in constant time as follows:

S(c_i \cup c_j) = \frac{(\vec{s}(c_i) + \vec{s}(c_j)) \cdot (\vec{s}(c_i) + \vec{s}(c_j)) - (|c_i| + |c_j|)}{(|c_i| + |c_j|)(|c_i| + |c_j| - 1)}
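The point of (14.5) and (14.6) is that, for length-normalized vectors under the cosine (dot product) similarity, the average pairwise similarity of a cluster, and of any candidate merge, can be read off the sum vector of its members in constant time. The sketch below is an illustration of that bookkeeping under those assumptions, not the book's implementation.

    # Constant-time average similarity from sum vectors, following (14.5)-(14.6).

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def normalize(v):
        norm = sum(a * a for a in v) ** 0.5
        return [a / norm for a in v]

    def sum_vector(cluster):
        # s(c_j): componentwise sum of the cluster's member vectors, eq. (14.5).
        return [sum(components) for components in zip(*cluster)]

    def average_similarity(s, size):
        # Equation (14.6): S(c_j) = (s . s - |c_j|) / (|c_j| (|c_j| - 1)).
        return (dot(s, s) - size) / (size * (size - 1))

    def merge_quality(s_i, n_i, s_j, n_j):
        # Average similarity of the union of c_i and c_j from the two sum vectors alone.
        s = [a + b for a, b in zip(s_i, s_j)]
        return average_similarity(s, n_i + n_j)

    if __name__ == "__main__":
        c1 = [normalize(v) for v in ([1.0, 0.1], [0.9, 0.2])]
        c2 = [normalize(v) for v in ([0.1, 1.0], [0.2, 0.8])]
        print(merge_quality(sum_vector(c1), len(c1), sum_vector(c2), len(c2)))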

Given this result, this approach to group-average agglomerative clustering has complexity O(n²), reflecting the fact that initially all pairwise similarities have to be computed. The following step that performs n mergers (each in linear time) has quadratic complexity, so that the overall complexity is quadratic. This form of group-average agglomerative clustering is efficient enough to deal with a large number of features (corresponding to the dimensions of the vector space) and a large number of objects.

Unfortunately, the constant-time computation for merging two groups (by making use of the quantities s(c_j)) depends on the properties of vector spaces. There is no general algorithm for group-average clustering that would be efficient independent of the representation of the objects that are to be clustered.

14.1.3 An application: Improving a language model

Now that we have introduced some of the best known hierarchical clustering algorithms, it is time to look at an example of how clustering can be used for an application. The application is building a better language model. Recall that language models are useful in speech recognition and machine translation for choosing among several candidate hypotheses. For example, a speech recognizer may find that President Kennedy and precedent Kennedy are equally likely to have produced the acoustic observations. However, a language model can tell us what are a priori likely phrases of English. Here it tells us that President Kennedy is much more likely than precedent Kennedy, and so we conclude that President Kennedy is probably what was actually said.

This reasoning can be formalized by the equation for the noisy channel model, which we introduced in section 2.2.4. It says that we should choose the hypothesis H that maximizes the product of the probability given by the language model, P(H), and the conditional probability of observing the speech signal D (or the foreign language text in machine translation) given the hypothesis, P(D|H):

\hat{H} = \arg\max_H P(H \mid D) = \arg\max_H \frac{P(D \mid H) P(H)}{P(D)} = \arg\max_H P(D \mid H) P(H)

Clustering can play an important role in improving the language model (the computation of P(H)) by way of generalization. As we saw in chapter 6, there are many rare events for which we do not have enough training data for accurate probabilistic modeling. If we mediate probabilistic inference through clusters, for which we have more evidence in the training set, then our predictions for rare events are likely to be more accurate.

This approach was taken by Brown et al. (1992c). We first describe the formalization of the language model and then the clustering algorithm.

The language model

The language model under discussion is a bigram model that makes a first order Markov assumption that a word depends only on the previous word. The criterion that we optimize is a decrease in cross entropy or, equivalently, perplexity (section 2.2.8), the amount by which the language model reduces the uncertainty about the next word. Our aim is to find a function π that assigns words to clusters which decreases perplexity compared to a simple word bigram model.

We first approximate the cross entropy of the corpus L = w_1 ... w_N for the cluster assignment function π by making the Markov assumption that a word's occurrence only depends on its predecessor:

(14.7)  H(L, \pi) = -\frac{1}{N} \log P(w_{1,\ldots,N})
(14.8)  \approx -\frac{1}{N} \sum_{i=2}^{N} \log P(w_i \mid w_{i-1})
(14.9)  \approx -\frac{1}{N-1} \sum_{w^1 w^2} C(w^1 w^2) \log P(w^2 \mid w^1)

Now we make the basic assumption of cluster-based generalization that the occurrence of a word from cluster c^2 only depends on the cluster c^1 of the preceding word:²

(14.10)  H(L, \pi) \approx -\frac{1}{N-1} \sum_{w^1 w^2} C(w^1 w^2) \log P(c^2 \mid c^1) P(w^2 \mid c^2)

² One can observe that this equation is very similar to the probabilistic models used in tagging, which we discuss in chapter 10, except that we induce the word classes from corpus evidence instead of taking them from our linguistic knowledge about parts of speech.

Formula (14.10) can be simplified as follows:

(14.11)  H(L, \pi) \approx -\sum_{w^1 w^2} \frac{C(w^1 w^2)}{N-1} \left[ \log P(w^2 \mid c^2) + \log P(c^2) \right] - \sum_{w^1 w^2} \frac{C(w^1 w^2)}{N-1} \left[ \log P(c^2 \mid c^1) - \log P(c^2) \right]

(14.12)  = -\sum_{w^1 w^2} \frac{C(w^1 w^2)}{N-1} \log P(w^2 \mid c^2) P(c^2) - \sum_{c^1 c^2} \frac{C(c^1 c^2)}{N-1} \log \frac{P(c^2 \mid c^1)}{P(c^2)}

(14.13)  \approx -\sum_{w} P(w) \log P(w) - \sum_{c^1 c^2} P(c^1 c^2) \log \frac{P(c^1 c^2)}{P(c^1) P(c^2)}

(14.14)  = H(w) - I(c^1; c^2)

In (14.13) we rely on the approximations C(w^2)/(N−1) ≈ P(w^2) and C(c^1 c^2)/(N−1) ≈ P(c^1 c^2), which hold for large N. In addition, P(w^2|c^2) P(c^2) = P(w^2 c^2) = P(w^2) holds since π(w^2) = c^2. Equation (14.14) shows that we can minimize the cross entropy by choosing the cluster assignment function π such that the mutual information between adjacent clusters I(c^1; c^2) is maximized. Thus we should get the optimal language model by choosing clusters that maximize this mutual information measure.

Clustering

The clustering algorithm is bottom-up with the following merge criterion, which maximizes the mutual information between adjacent classes:

(14.15)  \text{MI-loss}(c_i, c_j) = \sum_{c_k \in C \setminus \{c_i, c_j\}} \left[ I(c_k; c_i) + I(c_k; c_j) - I(c_k; c_i \cup c_j) \right]

In each step, we select the two clusters whose merge causes the smallest loss in mutual information. In the description of bottom-up clustering in figure 14.2, this would correspond to the following selection criterion for the pair of clusters that is to be merged next:

(c_{n1}, c_{n2}) := \arg\min_{(c_i, c_j) \in C \times C} \text{MI-loss}(c_i, c_j)

The clustering is stopped when a pre-determined number k of clusters has been reached (k = 1000 in (Brown et al. 1992c)). Several shortcuts are necessary to make the computation of the MI-loss function and the clustering of a large vocabulary efficient. In addition, the greedy algorithm ("do the merge with the smallest MI-loss") does not guarantee an optimal clustering result.
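The quantity that drives this clustering, the mutual information I(c^1; c^2) between the cluster of a word and the cluster of the word following it, is easy to estimate from cluster bigram counts. The sketch below is only an illustration of that estimate on invented toy data; it is not Brown et al.'s algorithm, which additionally needs the MI-loss bookkeeping of (14.15) and the shortcuts mentioned above.

    # Estimating I(c1; c2) between adjacent clusters from cluster bigram counts.

    from collections import Counter
    from math import log

    def cluster_bigram_counts(tokens, pi):
        """Count adjacent cluster pairs C(c1 c2), mapping each word through pi."""
        return Counter((pi[w1], pi[w2]) for w1, w2 in zip(tokens, tokens[1:]))

    def mutual_information(counts):
        total = sum(counts.values())
        left, right = Counter(), Counter()
        for (c1, c2), n in counts.items():
            left[c1] += n
            right[c2] += n
        mi = 0.0
        for (c1, c2), n in counts.items():
            p12 = n / total                               # P(c1 c2)
            p1, p2 = left[c1] / total, right[c2] / total  # marginals P(c1), P(c2)
            mi += p12 * log(p12 / (p1 * p2), 2)
        return mi

    if __name__ == "__main__":
        pi = {"on": "P", "in": "P", "Monday": "D", "Friday": "D", "the": "A", "ballot": "N"}
        tokens = ["on", "Monday", "in", "the", "ballot", "on", "Friday", "in", "Monday"]
        print(mutual_information(cluster_bigram_counts(tokens, pi)))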

The clusters can be (and were) improved by moving individual words between clusters. The interested reader can look up the specifics of the algorithm in (Brown et al. 1992c). Here are three of the 1000 clusters found by Brown et al. (1992c):

plan, letter, request, memo, case, question, charge, statement, draft
day, year, week, month, quarter, half
evaluation, assessment, analysis, understanding, opinion, conversation, discussion

We observe that these clusters are characterized by both syntactic and semantic properties, for example, nouns that refer to time periods.

The perplexity for the cluster-based language model was 277 compared to a perplexity of 244 for a word-based model (Brown et al. 1992c: 476), so no direct improvement was achieved by clustering. However, a linear interpolation (see section 6.3.1) between the word-based and the cluster-based model had a perplexity of 236, which is an improvement over the word-based model (Brown et al. 1992c: 476). This example demonstrates the utility of clustering for the purpose of generalization.

We conclude our discussion by pointing out that clustering and cluster-based inference are integrated here. The criterion we optimize on in clustering, the minimization of H(L, π) = H(w) − I(c^1; c^2), is at the same time a measure of the quality of the language model, the ultimate goal of the clustering. Other researchers first induce clusters and then use these clusters for generalization in a second, independent step. An integrated approach to clustering and cluster-based inference is preferable because it guarantees that the induced clusters are optimal for the particular type of generalization that we intend to use the clustering for.

14.1.4 Top-down clustering

Hierarchical top-down clustering as described in figure 14.3 starts out with one cluster that contains all objects. The algorithm then selects the least coherent cluster in each iteration and splits it. The functions we introduced in table 14.3 for selecting the best pair of clusters to merge in bottom-up clustering can also serve as measures of cluster coherence in top-down clustering. According to the single-link measure, the coherence of a cluster is the smallest similarity in the minimum spanning tree for the cluster; according to the complete-link measure, the coherence is the smallest similarity between any two objects in the cluster; and according to the group-average measure, coherence is the average similarity between objects in the cluster.

All three measures can be used to select the least coherent cluster in each iteration of top-down clustering.

Splitting a cluster is also a clustering task, the task of finding two subclusters of the cluster. Any clustering algorithm can be used for the splitting operation, including the bottom-up algorithms described above and non-hierarchical clustering. Perhaps because of this recursive need for a second clustering algorithm, top-down clustering is less often used than bottom-up clustering. However, there are tasks for which top-down clustering is the more natural choice. An example is the clustering of probability distributions using the Kullback-Leibler (KL) divergence. Recall that KL divergence, which we introduced in section 2.2.5, is defined as follows:

(14.16)  D(p \parallel q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}

This dissimilarity measure is not defined for p(x) > 0 and q(x) = 0. In cases where individual objects have probability distributions with many zeros, one cannot compute the matrix of similarity coefficients for all objects that is required for bottom-up clustering.

An example of such a constellation is the approach to distributional clustering of nouns proposed by (Pereira et al. 1993). Object nouns are represented as probability distributions over verbs, where q_n(v) is estimated as the relative frequency that, given the object noun n, the verb v is its predicate. So, for example, for the noun apple and the verb eat, we will have q_n(v) = 0.2 if one fifth of all occurrences of apple as an object noun are with the verb eat. Any given noun only occurs with a limited number of verbs, so we have the above-mentioned problem with singularities in computing KL divergence here, which prevents us from using bottom-up clustering. To address this problem, distributional noun clustering instead performs top-down clustering. Cluster centroids are computed as (weighted and normalized) sums of the probability distributions of the member nouns. This leads to cluster centroid distributions with few zeros that have a defined KL divergence with all their members. See Pereira et al. (1993) for a complete description of the algorithm.
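A small sketch of (14.16) and of the singularity problem just described: KL divergence between one noun's verb distribution and another noun's is typically undefined because of zeros, while a centroid distribution obtained by averaging member distributions has support wherever its members do. The distributions below are invented toy data, not Pereira et al.'s.

    # KL divergence (14.16) and a simple averaged centroid distribution.

    from math import log

    def kl_divergence(p, q):
        """D(p || q) = sum_x p(x) log(p(x) / q(x)); undefined if p(x) > 0 but q(x) = 0."""
        total = 0.0
        for x, px in p.items():
            if px == 0.0:
                continue
            qx = q.get(x, 0.0)
            if qx == 0.0:
                raise ValueError("undefined: p(%r) > 0 but q(%r) = 0" % (x, x))
            total += px * log(px / qx, 2)
        return total

    def centroid_distribution(members):
        """Unweighted average of member distributions; nonzero wherever any member is."""
        support = set().union(*members)
        return {x: sum(d.get(x, 0.0) for d in members) / len(members) for x in support}

    if __name__ == "__main__":
        apple = {"eat": 0.8, "peel": 0.2}          # toy P(verb | object noun)
        idea = {"have": 0.6, "discuss": 0.4}
        center = centroid_distribution([apple, idea])
        print(kl_divergence(apple, center))        # defined, unlike kl_divergence(apple, idea)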

14.2 Non-Hierarchical Clustering

Non-hierarchical algorithms often start out with a partition based on randomly selected seeds (one seed per cluster), and then refine this initial partition. Most non-hierarchical algorithms employ several passes of reallocating objects to the currently best cluster, whereas hierarchical algorithms need only one pass. However, reallocation of objects from one cluster to another can improve hierarchical clusterings too. We saw an example in section 14.1.3, where after each merge objects were moved around to improve global mutual information.

If the non-hierarchical algorithm has multiple passes, then the question arises when to stop. This can be determined based on a measure of goodness or cluster quality. We have already seen candidates for such a measure, for example, group-average similarity and mutual information between adjacent clusters. Probably the most important stopping criterion is the likelihood of the data given the clustering model, which we will introduce below. Whichever measure we choose, we simply continue clustering as long as the measure of goodness improves enough in each iteration. We stop when the curve of improvement flattens or when goodness starts decreasing.

The measure of goodness can address another problem: how to determine the right number of clusters. In some cases, we may have some prior knowledge about the right number of clusters (for example, the right number of parts of speech in part-of-speech clustering). If this is not the case, we can cluster the data into n clusters for different values of n. Often the goodness measure improves with n. For example, the more clusters the higher the maximum mutual information that can be attained for a given data set. However, if the data naturally fall into a certain number k of clusters, then one can often observe a substantial increase in goodness in the transition from k − 1 to k clusters and a small increase in the transition from k to k + 1. In order to automatically determine the number of clusters, we can look for a k with this property and then settle on the resulting k clusters.

A more principled approach to finding an optimal number of clusters is the Minimum Description Length (MDL) approach in the AUTOCLASS system (Cheeseman et al. 1988). The basic idea is that the measure of goodness captures both how well the objects fit into the clusters (which is what the other measures we have seen do) and how many clusters there are. A high number of clusters will be penalized, leading to a lower goodness value.

In the framework of MDL, both the clusters and the objects are specified by code words whose length is measured in bits. In order to encode an object, we only encode the difference between it and the cluster it belongs to. If there are more clusters, the clusters describe objects better, and we need fewer bits to describe the difference between objects and clusters. However, more clusters obviously take more bits to encode. Since the cost function captures the length of the code for both data and clusters, minimizing this function (which maximizes the goodness of the clustering) will determine both the number of clusters and how to assign objects to clusters.³

³ AUTOCLASS can be downloaded from the internet. See the website.

It may appear that it is an advantage of hierarchical clustering that the number of clusters need not be determined. But the full cluster hierarchy of a set of objects does not define a particular clustering since the tree can be cut in many different ways. For a usable set of clusters in hierarchical clustering one often needs to determine a desirable number of clusters or, alternatively, a value of the similarity measure at which links of the tree are cut. So there is not really a difference between hierarchical and non-hierarchical clustering in this respect. For some non-hierarchical clustering algorithms, an advantage is their speed.

We cover two non-hierarchical clustering algorithms in this section, K-means and the EM algorithm. K-means clustering is probably the simplest clustering algorithm and, despite its limitations, it works sufficiently well in many applications. The EM algorithm is a general template for a family of algorithms. We describe its incarnation as a clustering algorithm first and then relate it to the various instantiations that have been used in Statistical NLP, some of which, like the inside-outside algorithm and the forward-backward algorithm, are more fully treated in other chapters of this book.

14.2.1 K-means

K-means is a hard clustering algorithm that defines clusters by the center of mass of their members. We need a set of initial cluster centers in the beginning. Then we go through several iterations of assigning each object to the cluster whose center is closest. After all objects have been assigned, we recompute the center of each cluster as the centroid or mean µ of its members (see figure 14.8), that is, µ = (1/|c_j|) Σ_{x ∈ c_j} x.

 1  Given: a set X = {x_1, ..., x_n} ⊆ R^m
 2         a distance measure d: R^m × R^m → R
 3         a function for computing the mean µ: P(R^m) → R^m
 4  Select k initial centers f_1, ..., f_k
 5  while stopping criterion is not true do
 6      for all clusters c_j do
 7          c_j := {x_i | ∀ f_l: d(x_i, f_j) ≤ d(x_i, f_l)}
 8      end
 9      for all means f_j do
10          f_j := µ(c_j)
11      end
12  end

Figure 14.8 The K-means clustering algorithm.

The distance function is Euclidean distance. A variant of K-means is to use the L_1 norm instead (section 8.5.2):

L_1(\vec{x}, \vec{y}) = \sum_{l} |x_l - y_l|

This norm is less sensitive to outliers. K-means clustering in Euclidean space often creates singleton clusters for outliers. Clustering in L_1 space will pay less attention to outliers, so that there is higher likelihood of getting a clustering that partitions objects into clusters of similar size. The L_1 norm is often used in conjunction with medoids as cluster centers. The difference between medoids and centroids is that a medoid is one of the objects in the cluster, a prototypical class member. A centroid, the average of a cluster's members, is in most cases not identical to any of the objects.

The time complexity of K-means is O(n) since both steps of the iteration are O(n) and only a constant number of iterations is computed.

Figure 14.9 shows an example of one iteration of the K-means algorithm. First, objects are assigned to the cluster whose mean is closest. Then the means are recomputed. In this case, any further iterations will not change the clustering since an assignment to the closest center does not change the cluster membership of any object, which in turn means that no center will be changed in the recomputation step. But this is not the case in general. Usually several iterations are required before the algorithm converges.
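Figure 14.8 translates almost line for line into code. The sketch below is an illustration with Euclidean distance and randomly sampled initial centers, using a fixed number of iterations as the stopping criterion; none of these choices come from the book beyond what the figure itself specifies.

    # A compact version of the K-means iteration of figure 14.8.

    import random

    def euclidean(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    def mean(points):
        return [sum(column) / len(points) for column in zip(*points)]

    def kmeans(points, k, iterations=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)                     # step 4: initial centers
        clusters = [[] for _ in range(k)]
        for _ in range(iterations):                            # step 5: stopping criterion
            clusters = [[] for _ in range(k)]
            for x in points:                                   # steps 6-8: assignment
                j = min(range(k), key=lambda j: euclidean(x, centers[j]))
                clusters[j].append(x)
            # steps 9-11: recompute means (keep the old center if a cluster is empty)
            centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        return centers, clusters

    if __name__ == "__main__":
        data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9]]
        centers, clusters = kmeans(data, k=2)
        print(centers)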

[Figure 14.9 One iteration of the K-means algorithm (assignment, then recomputation of means). The first step assigns objects to the closest cluster mean. Cluster means are shown as circles. The second step recomputes cluster means as the center of mass of the set of objects that are members of the cluster.]

One implementation problem that the description in figure 14.8 does not address is how to break ties in cases where there are several centers with the same distance from an object. In such cases, one can either assign objects randomly to one of the candidate clusters (which has the disadvantage that the algorithm may not converge) or perturb objects slightly so that their new positions do not give rise to ties.

Here is an example of how to use K-means clustering. Consider these twenty words from the New York Times corpus in chapter 5:

Barbara, Edward, Gov, Mary, NFL, Reds, Scott, Sox, ballot, finance, inning, payments, polls, profit, quarterback, researchers, science, score, scored, seats

Cluster  Members
1        ballot (0.28), polls (0.28), Gov (0.30), seats (0.32)
2        profit (0.21), finance (0.21), payments (0.22)
3        NFL (0.36), Reds (0.28), Sox (0.31), inning (0.33), quarterback (0.30), scored (0.30), score (0.33)
4        researchers (0.23), science (0.23)
5        Scott (0.28), Mary (0.27), Barbara (0.27), Edward (0.29)

Table 14.4 An example of K-means clustering. Twenty words represented as vectors of co-occurrence counts were clustered into 5 clusters using K-means. The distance from the cluster centroid is given after each word.

Table 14.4 shows the result of clustering these words using K-means with k = 5. We used the data representation from chapter 8 that is also the basis of table 8.8 on page 302. The first four clusters correspond to the topics government, finance, sports, and research, respectively. The last cluster contains names. The benefit of clustering is obvious here. The clustered display of the words makes it easier to understand what types of words occur in the sample and what their relationships are.

Initial cluster centers for K-means are usually picked at random. It depends on the structure of the set of objects to be clustered whether the choice of initial centers is important or not. Many sets are well-behaved and most initializations will result in clusterings of about the same quality. For ill-behaved sets, one can compute good cluster centers by first running a hierarchical clustering algorithm on a subset of the objects. This is the basic idea of the Buckshot algorithm. Buckshot first applies group-average agglomerative clustering (GAAC) to a random sample of the data that has size square root of the complete set. GAAC has quadratic time complexity, but since (√n)² = n, applying GAAC to this sample results in overall linear complexity of the algorithm. The K-means reassignment step is also linear, so that the overall complexity is O(n).

14.2.2 The EM algorithm

One way to introduce the EM algorithm is as a soft version of K-means clustering. Figure 14.10 shows an example. As before, we start with a set of random cluster centers, c_1 and c_2. In K-means clustering we would arrive at the final centers shown on the right side in one iteration.

[Figure 14.10 An example of using the EM algorithm for soft clustering: initial state, after iteration 1, after iteration 2.]

The EM algorithm instead does a soft assignment, which, for example, makes the lower right point mostly a member of c_2, but also partly a member of c_1. As a result, both cluster centers move towards the centroid of all three objects in the first iteration. Only after the second iteration do we reach the stable final state.

An alternative way of thinking of the EM algorithm is as a way of estimating the values of the hidden parameters of a model. We have seen some data X, and can estimate P(X | p(Θ)), the probability of the data according to some model p with parameters Θ. But how do we find the model which maximizes the likelihood of the data? This point will be a maximum in the parameter space, and therefore we know that the probability surface will be flat there. So for each model parameter θ, we want to set ∂ log P(...)/∂θ = 0 and solve for the θ. Unfortunately this (in general) gives a non-linear set of equations for which no analytical methods of solution are known. But we can hope to find the maximum using the EM algorithm.

In this section, we will first introduce the EM algorithm for the estimation of Gaussian mixtures, the soft clustering algorithm that figure 14.10 is an example of. Then we will describe the EM algorithm in its most general form and relate the general form to specific instances like the inside-outside algorithm and the forward-backward algorithm.

EM for Gaussian mixtures

In applying EM to clustering, we view clustering as estimating a mixture of probability distributions. The idea is that the observed data are generated by several underlying causes. Each cause contributes independently to the generation process, but we only see the final mixture, without information about which cause contributed what. We formalize this notion by representing the data as a pair. There is the observable data X = {x_i}, where each x_i = (x_{i1}, ..., x_{im})^T is simply the vector that corresponds to the i-th data point. And then there is the unobservable data Z = {z_i}, where, within each z_i = (z_{i1}, ..., z_{ik}), the component z_{ij} is 1 if object i is a member of cluster j (that is, it is assumed to be generated by that underlying cause) and 0 otherwise.

We can cluster with the EM algorithm if we know the type of distribution of the individual clusters (or causes). When estimating a Gaussian mixture, we make the assumption that each cluster is a Gaussian. The EM algorithm then determines the most likely estimates for the parameters of the distributions (in our case, the mean and variance of each Gaussian), and the prior probability (or relative prominence or weight) of the individual causes. So, in sum, we are supposing that the data to be clustered consist of n m-dimensional objects X = {x_1, ..., x_n} ⊆ R^m generated by k Gaussians n_1, ..., n_k. Once the mixture has been estimated, we can view the result as a clustering by interpreting each cause as a cluster. For each object x_i, we can compute the probability P(ω_j | x_i) that cluster j generated x_i. An object can belong to several clusters, with varying degrees of confidence.

Multivariate normal distributions. The (multivariate) m-dimensional Gaussian family is parameterized by a mean or center µ_j and an m × m invertible positive definite symmetric matrix, the covariance matrix Σ_j. The probability density function for a Gaussian is given by:

(14.17)  n_j(\vec{x}; \vec{\mu}_j, \Sigma_j) = \frac{1}{\sqrt{(2\pi)^m |\Sigma_j|}} \exp\left[ -\frac{1}{2} (\vec{x} - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x} - \vec{\mu}_j) \right]

Since we are assuming that the data are generated by k Gaussians, we wish to find the maximum likelihood model of the form:

(14.18)  \sum_{j=1}^{k} \pi_j \, n(\vec{x}; \vec{\mu}_j, \Sigma_j)

In this model, we need to assume a prior or weight π_j for each Gaussian, so that the integral of the combined Gaussians over the whole space is 1.
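For a first feel of how such a mixture assigns objects softly to clusters, the density (14.17) and the resulting cluster membership probabilities can be sketched as follows. The code is restricted to diagonal covariance matrices (as in the table 14.5 example, which uses Σ_j = 0.05 I), and all parameters in the example are invented; it shows only the E-step style computation of membership probabilities, not the full EM re-estimation.

    # Gaussian density with diagonal covariance and soft cluster membership probabilities.

    from math import exp, pi, sqrt

    def diag_gaussian(x, mu, var):
        """Equation (14.17) specialized to a diagonal covariance matrix diag(var)."""
        density = 1.0
        for xi, mi, vi in zip(x, mu, var):
            density *= exp(-0.5 * (xi - mi) ** 2 / vi) / sqrt(2.0 * pi * vi)
        return density

    def memberships(x, priors, means, variances):
        """P(cluster j | x): normalized prior-weighted densities (a soft assignment)."""
        weighted = [p * diag_gaussian(x, m, v)
                    for p, m, v in zip(priors, means, variances)]
        total = sum(weighted)
        return [w / total for w in weighted]

    if __name__ == "__main__":
        priors = [0.5, 0.5]                              # uniform cluster weights
        means = [[0.0, 0.0], [3.0, 3.0]]                 # invented cluster centers
        variances = [[0.05, 0.05], [0.05, 0.05]]         # uniform diagonal covariance
        print(memberships([0.2, 0.1], priors, means, variances))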

Table 14.5 gives an example of a Gaussian mixture, using the centroids from the K-means clustering in table 14.4 as cluster centroids µ_j (this is a common way of initializing EM for Gaussian mixtures).

[Table 14.5 An example of a Gaussian mixture, listing the cluster membership probabilities of each of the twenty words in each of the five clusters. The five cluster centroids from table 14.4 are the means µ_j of the five clusters. A uniform diagonal covariance matrix Σ_j = 0.05 I and uniform priors π_j = 0.2 were used. The posterior probabilities P(w | c_j) can be interpreted as cluster membership probabilities.]

For each word, the cluster from table 14.4 is still the dominating cluster. For example, ballot has a higher membership probability in cluster 1 (its cluster from the K-means clustering) than in other clusters. But each word also has some non-zero membership in all other clusters. This is useful for assessing the strength of association between a word and a topic. Comparing two


More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b)

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b) 8 Clusterng 8.1 Some Clusterng Examples Clusterng comes up n many contexts. For example, one mght want to cluster journal artcles nto clusters of artcles on related topcs. In dong ths, one frst represents

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

What s Next for POS Tagging. Statistical NLP Spring Feature Templates. Maxent Taggers. HMM Trellis. Decoding. Lecture 8: Word Classes

What s Next for POS Tagging. Statistical NLP Spring Feature Templates. Maxent Taggers. HMM Trellis. Decoding. Lecture 8: Word Classes Statstcal NLP Sprng 2008 Lecture 8: Word Classes Dan Klen UC Berkeley What s Next for POS Taggng Better features! RB PRP VBD IN RB IN PRP VBD. They left as soon as he arrved. We could fx ths wth a feature

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Hierarchical agglomerative. Cluster Analysis. Christine Siedle Clustering 1

Hierarchical agglomerative. Cluster Analysis. Christine Siedle Clustering 1 Herarchcal agglomeratve Cluster Analyss Chrstne Sedle 19-3-2004 Clusterng 1 Classfcaton Basc (unconscous & conscous) human strategy to reduce complexty Always based Cluster analyss to fnd or confrm types

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Clustering. A. Bellaachia Page: 1

Clustering. A. Bellaachia Page: 1 Clusterng. Obectves.. Clusterng.... Defntons... General Applcatons.3. What s a good clusterng?. 3.4. Requrements 3 3. Data Structures 4 4. Smlarty Measures. 4 4.. Standardze data.. 5 4.. Bnary varables..

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Understanding K-Means Non-hierarchical Clustering

Understanding K-Means Non-hierarchical Clustering SUNY Albany - Techncal Report 0- Understandng K-Means Non-herarchcal Clusterng Ian Davdson State Unversty of New York, 1400 Washngton Ave., Albany, 105. DAVIDSON@CS.ALBANY.EDU Abstract The K-means algorthm

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Machine Learning. K-means Algorithm

Machine Learning. K-means Algorithm Macne Learnng CS 6375 --- Sprng 2015 Gaussan Mture Model GMM pectaton Mamzaton M Acknowledgement: some sldes adopted from Crstoper Bsop Vncent Ng. 1 K-means Algortm Specal case of M Goal: represent a data

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

APPLIED MACHINE LEARNING

APPLIED MACHINE LEARNING Methods for Clusterng K-means, Soft K-means DBSCAN 1 Objectves Learn basc technques for data clusterng K-means and soft K-means, GMM (next lecture) DBSCAN Understand the ssues and major challenges n clusterng

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information