Context-Specific Bayesian Clustering for Gene Expression Data


Yoseph Barash, School of Computer Science & Engineering, Hebrew University, Jerusalem, 91904, Israel
Nir Friedman, School of Computer Science & Engineering, Hebrew University, Jerusalem, 91904, Israel

Abstract

The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real-life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.

1 Introduction

A central goal of molecular biology is to understand the regulation of protein synthesis. With the advent of genome sequencing projects, we have access to DNA sequences of the promoter regions that contain the binding sites of transcription factors that regulate gene expression. In addition, the development of microarrays allows researchers to measure the abundance of thousands of mRNA targets simultaneously, providing a genomic viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer et al. 1999, Spellman et al. 1998). The combination of these two important data sources can lead to a better understanding of gene regulation (Bittner et al. 1999, Brazma & Vilo 2000). The main biological hypothesis underlying most of these analyses is: genes with a common functional role have similar expression patterns across different experiments. This similarity of expression patterns is due to co-regulation of genes in the same functional group by specific transcription factors. Clearly, this assumption is only a first-order approximation of biological reality. There are gene functions for which this assumption definitely does not hold, and there are co-expressed genes that are not co-regulated. Nonetheless, this assumption is useful in finding the strong signals in the data.

Based on the above assumption, one can cluster genes by their expression levels, and then search for short DNA strings that appear in significant over-abundance in the promoter regions of these genes (Roth et al. 1998, Tavazoie et al. 1999, Vilo et al. 2000). Such an approach can discover new binding sites in promoter regions. Our aim here is complementary to this approach. Instead of discovering new binding sites, we focus on characterizing groups of genes based on their expression levels in different experiments and the presence of putative binding sites within their promoter regions. The biological hypothesis we described suggests that genes within a functional group will be similar with respect to both types of attributes. We treat expression level measurements and information on promoter binding sites in a symmetric fashion, and cluster genes based on both types of data. In doing so, our method characterizes the attributes that distinguish each cluster.

Due to the stochastic nature of biological processes and experimental artifacts, gene expression measurements are inherently noisy. In addition, identification of putative binding sites is also noisy, and can suffer from both false positive and false negative errors. All of this indicates that using a probabilistic model in which we treat both expression and pattern identification as random variables might lead to a better understanding of the biological mechanism as well as improve gene functional characterization and transcription site identification.

Using this probabilistic approach, we develop a class of clustering models that cluster genes based on random variables of two types. Random variables of the first type describe the expression level of the gene, or more precisely its mRNA transcript, in an experiment (microarray hybridization). Each experiment is denoted by a different random variable whose value is the expression level of the gene in that particular experiment. Random variables of the second type describe occurrences of putative binding sites in the promoter region of the genes. Again, each binding site is denoted by a random variable, whose value is the number of times the binding site was detected in the gene's promoter region. Our method clusters genes with similar expression patterns and promoter regions. In addition, the learned model provides insight on the regulation of genes within each cluster.

The key features of our approach are: (1) automatic detection of the number of clusters; (2) automatic detection of random variables that are irrelevant to the clusters; (3) robust clustering in the presence of many such random variables; and (4) a context-dependent representation that describes which clusters each attribute depends on. This allows us to discover the attributes (random variables) that characterize each cluster and distinguish it from the rest. We learn these cluster models using a Bayesian approach that uses structural EM (Friedman 1997, Friedman 1998), an efficient search method over different models. We evaluate the resulting method on synthetic data, and apply it to real-life data. Finally, we also demonstrate the applicability and generality of the method to other problems and data sources by introducing into the model data from phylogenetic profiling and by clustering experiments by their gene expression profiles.

In Section 2 we introduce the class of probabilistic models that we call Context-Specific Clustering models. In Section 3 we discuss how to score such models based on data. In Section 4 we describe our approach for finding a high-scoring clustering model. In Section 5 we evaluate the learning procedure on synthetic and real-life data. We conclude with a discussion of related work and possible extensions in Section 6.

2 Context-Specific Clustering

In this section we describe the class of probabilistic models we aim to learn from data. We develop the models in a sequence of steps, starting from a fairly well known model for Bayesian clustering, and refining the representation to explicitly capture the structures we want to learn. We stress that at this stage we are focusing on what can be represented by the class of models, and we examine how to learn them in subsequent sections.

2.1 Naive Bayesian Clustering

Let X_1, ..., X_N be random variables. In our main application, these random variables denote the attributes of a particular gene: the expression level of this gene in each of the experiments, and the numbers of occurrences of each binding site in the promoter region. Suppose that we receive a dataset D that consists of M joint instances of the random variables. The m-th instance is a joint assignment x_1[m], ..., x_N[m] to X_1, ..., X_N. In our application, instances correspond to genes: each gene is described by the values of the random variables. In modeling such data we assume that there is an underlying joint distribution P(X_1, ..., X_N) from which the training instances were sampled. The estimation task is to approximate this joint distribution based on the data set D. Such an estimate can help us understand the interactions between the variables.

A typical approach for estimating such a joint distribution is to define a probabilistic model that defines a set of distributions that can be described in a parametric form, and then find the particular parameters for the model that best fit the data in some sense. A simple model that is often used in data analysis is the naive Bayes model. In this model we assume that there is an unobserved random variable C that takes values 1, ..., K, and describes which cluster the example belongs to. We then assume that if we know the value of C, all the observed variables become independent of one another. That is, the form of the distribution is:

P(X_1, ..., X_N) = \sum_k P(C = k) P(X_1 | C = k) \cdots P(X_N | C = k)    (1)

In other words, we estimate a mixture of product distributions. One must also bear in mind that such models are not necessarily a representation of real biological structure but rather a mathematical model that can give us insights into the biological connections between variables.

The independence assumptions we make are conditional ones. For example, we assume that given the model, the genes are independent. That is, after we know the model, observing the expression levels of a single gene does not help predict better the expression levels of another gene. Similarly, we assume that expression levels of the same gene in different conditions are independent given the cluster the gene belongs to. This assumption states that the cluster captures the first-order description of the gene's behavior, and we treat (in the model) all other fluctuations as noise that is independent in each measurement. We attempt to be precise and explicit about the independence assumptions we make. However, we note that most clustering approaches we know of treat (explicitly or implicitly) genes as being independent of each other, and quite often also treat different measurements of the same gene as independent observations of the cluster.

The naive Bayes model is attractive for several reasons. First, from an estimation point of view we need to estimate relatively few parameters: the mixture coefficients P(C = k), and the parameters

of the conditional distributions P(X_i | C = k). Second, the estimated model can be interpreted as modeling the data by K clusters (one for each value k = 1, ..., K), such that the distributions of the different variables within each cluster are independent. Thus, dependencies between the observed variables are represented by the cluster variable. Finally, this model allows us to use fairly efficient learning algorithms, such as expectation maximization (EM) (Dempster et al. 1977).

The distribution form in Eq. (1) specifies the global structure of the naive Bayesian distribution. In addition, we also have to specify how to represent the conditional distributions. For this purpose we use parametric families. There are several families of conditional distributions we can use for modeling P(X_i | C = k). In this paper, we focus on two such families. If X_i is a discrete variable that takes a finite number of values (e.g., a variable that denotes the number of binding sites in a promoter region), we represent the conditional probability as a multinomial distribution P(X_i | C = k) ~ Multinomial({\theta_{x_i|k} : x_i \in Val(X_i)}): for each value x_i of X_i we have a parameter \theta_{x_i|k} that denotes the probability that X_i = x_i when C = k. These parameters must be non-negative, and satisfy \sum_{x_i} \theta_{x_i|k} = 1 for each k. If X_i is a continuous variable (e.g., a variable that denotes the expression level of a gene in a particular experiment), we use a Gaussian distribution P(X_i | C = k) ~ N(\mu_{X_i|k}, \sigma^2_{X_i|k}), such that

P(x_i | C = k) = \frac{1}{\sqrt{2\pi} \sigma_{X_i|k}} \exp\left( -\frac{(x_i - \mu_{X_i|k})^2}{2 \sigma^2_{X_i|k}} \right).

We use the Gaussian model in this situation for two reasons. First, as usual, the Gaussian distribution is one of the simplest continuous density models and allows efficient estimation. Second, when we use as observations the logarithm of the expression level (or logarithms of ratios of expression between a sample and a common control sample), gene expression has roughly Gaussian noise characteristics. We note, however, that most of the developments in this paper can be achieved with more detailed (and realistic) noise models for gene expression.

Once we have estimated the conditional probabilities, we can compute the probability of an example belonging to a cluster:

P(C = k | x_1, ..., x_N) \propto P(C = k) P(x_1 | C = k) \cdots P(x_N | C = k)

If the clusters are well-separated, then this conditional probability will assign each example to one cluster with high probability. However, it is possible that clusters overlap, and some examples are assigned to several clusters. If we compare the probabilities of two clusters, then

\log \frac{P(C = k | x_1, ..., x_N)}{P(C = k' | x_1, ..., x_N)} = \log \frac{P(C = k)}{P(C = k')} + \sum_i \log \frac{P(x_i | C = k)}{P(x_i | C = k')}    (2)

Thus, we can view the decision boundary between any two clusters as the sum of terms that represent the contribution of each attribute to this decision. The ratio P(x_i | C = k) / P(x_i | C = k') is the relative support that x_i gives to k versus k'.
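As an illustration of Eqs. (1) and (2), the following Python sketch computes the cluster posterior for a naive Bayes model with one Gaussian expression attribute and one multinomial binding-site attribute, working in log space for numerical stability. The sketch is only illustrative: the toy parameters and names are hypothetical and are not taken from the models learned in this paper.

```python
import numpy as np

# Toy naive Bayes clustering model with K = 2 clusters (illustrative parameters only).
log_prior = np.log(np.array([0.6, 0.4]))                  # log P(C = k)
mu = np.array([1.2, -0.8])                                # Gaussian means, one per cluster
sigma = np.array([0.5, 0.7])                              # Gaussian standard deviations
theta_site = np.array([[0.7, 0.2, 0.1],                   # P(site count | C = 1)
                       [0.1, 0.3, 0.6]])                  # P(site count | C = 2)

def log_gaussian(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def cluster_posterior(expr, site_count):
    """P(C = k | expr, site_count): log prior plus per-attribute log-likelihoods."""
    log_joint = (log_prior
                 + log_gaussian(expr, mu, sigma)           # Gaussian expression attribute
                 + np.log(theta_site[:, site_count]))      # multinomial binding-site attribute
    log_joint -= log_joint.max()                           # stabilize before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()

print(cluster_posterior(expr=1.0, site_count=2))           # posterior over the two clusters
```

Taking the difference of two entries of log_joint recovers exactly the per-attribute decomposition of the decision boundary in Eq. (2).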

2.2 Selective Naive Bayesian Models

The naive Bayes model gives all variables equal status. This is a potential source of problems for two reasons. First, some variables should be considered as noise since they have no real interactions with the other variables. Suppose that X_1 is independent from the rest of the variables. By learning K conditional probability models P(X_1 | C = 1), ..., P(X_1 | C = K), we are increasing the variability of the estimated model. Second, since we are dealing with a relatively small number of training examples, if we fail to recognize that X_1 is independent of the rest, the observations of X_1 can bias our choice of clusters. Thus, a combination of many irrelevant variables might lead us to overlook the relevant ones. As a consequence, the learned model discriminates clusters by the values of the irrelevant variables. Such clusters suffer from high variability (because of their noisy character).

If we know that X_1 is independent from the rest, we can use the fact that P(X_1 | C) = P(X_1) and rewrite the model in a simpler form:

P(X_1, ..., X_N) = P(X_1) \sum_k P(C = k) P(X_2 | C = k) \cdots P(X_N | C = k).

This representation of the joint probability requires fewer parameters and thus the estimation of these parameters is more robust. More importantly, the structure of this model explicitly captures the fact that X_1 is independent of the other variables: its distribution does not depend on the cluster variable. Note that in this model, as expected, the value of X_1 does not impact the probability of the class C.

In our biological domain, we expect to see many variables that are independent (or almost independent) of the classification. For example, not all binding sites of transcription factors play an active role in the conditions in which expression levels were measured. Another example is a putative binding site (suggested by some search method or other) that does not correspond to a biological function. Thus, learning that these sites are independent of the measured expression levels is an important aspect of the data analysis process.

Based on this discussion, we want to consider models where several of the variables do not depend on the hidden class. Formally, we can describe these dependencies by specifying a set G \subseteq {X_1, ..., X_N} that represents the set of variables that depend on the cluster variable C. The joint distribution then takes the form

P(X_1, ..., X_N | G) = \left( \prod_{i : X_i \notin G} P(X_i) \right) \sum_k P(C = k) \prod_{i : X_i \in G} P(X_i | C = k)

We note that this class of models is essentially a special subclass of Bayesian networks (Pearl 1988). Similar models were considered for a somewhat different application in supervised learning by Langley and Sage (1994).

We note again that when we compare the posterior probabilities of two clusters, as in Eq. (2), we only need to consider variables that are not independent of C. That is,

\log \frac{P(C = k | x_1, ..., x_N)}{P(C = k' | x_1, ..., x_N)} = \log \frac{P(C = k)}{P(C = k')} + \sum_{i : X_i \in G} \log \frac{P(x_i | C = k)}{P(x_i | C = k')}.

This formally demonstrates the intuition that variables outside of G do not influence the choice of clusters.
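The cancellation of attributes outside G can be made concrete with a small sketch. The code below is ours and purely illustrative (the random tables and the set G are hypothetical); it computes the cluster log-odds for a selective model and shows that only attributes in G contribute.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative selective naive Bayes: K = 3 clusters, N = 4 binary attributes,
# of which only those in G depend on the cluster variable C.
K, N = 3, 4
G = {0, 2}
log_p_dep = np.log(rng.dirichlet([1.0, 1.0], size=(N, K)))  # P(X_i | C = k), shape (N, K, 2)
log_p_ind = np.log(rng.dirichlet([1.0, 1.0], size=N))       # P(X_i) for attributes outside G
log_prior = np.log(np.full(K, 1.0 / K))

def cluster_log_odds(x, k, k2):
    """log P(C = k | x) - log P(C = k2 | x): only attributes in G contribute."""
    total = log_prior[k] - log_prior[k2]
    for i in G:
        total += log_p_dep[i, k, x[i]] - log_p_dep[i, k2, x[i]]
    # attributes outside G would add log_p_ind[i, x[i]] to both terms and cancel
    return total

print(cluster_log_odds([1, 0, 1, 1], k=0, k2=1))
```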

(a) explicit table representation

          X=0    X=1    X=2    X=3
  C=1     0.1    0.1    0.5    0.3
  C=2     0.1    0.1    0.2    0.6
  C=3     0.7    0.2    0.05   0.05
  C=4     0.7    0.2    0.05   0.05
  C=5     0.7    0.2    0.05   0.05
  C=6     0.7    0.2    0.05   0.05

(b) default table

          X=0    X=1    X=2    X=3
  C=1     0.1    0.1    0.5    0.3
  C=2     0.1    0.1    0.2    0.6
  default 0.7    0.2    0.05   0.05

Figure 1: Example of two representations of the same conditional distribution P(X | C).

2.3 Context-Specific Independence

Suppose that a certain binding site, whose presence is denoted by the variable X_1, regulates genes in two functional categories. We would then expect this site to be present with high probability in promoter regions of genes in these two categories, and to have low probability of appearing in the promoter region of all other genes. Since X_1 is relevant to the expression level of (some) genes, it is not independent of the other variables, and so we would prefer models where X_1 \in G. In such a model, we need to specify P(X_1 | C = k) for k = 1, ..., K. That is, for each functional category, we learn a different probability distribution over X_1. However, since X_1 is relevant only for two classes, say 1 and 2, this introduces unnecessary complexity: once we know that C is not one of the two relevant functional classes (i.e., C > 2), we can predict P(X_1 | C) using a single distribution.

To capture such distinctions, we need to introduce a language that refines the ideas of selective naive Bayesian models. More precisely, we want to describe additional structure within the conditional distribution P(X_1 | C). The intuition here is that we need to specify context-specific independencies (CSI): once we know that C \notin {1, 2}, then X_1 is independent of C. This issue has received much attention in the probabilistic reasoning community (Boutilier et al. 1996, Chickering et al. 1997, Friedman & Goldszmidt 1998). Here, we choose a fairly simple representation of CSI that Friedman & Goldszmidt (1998) term default tables.

This representation is as follows. The structure of the distribution P(X_i | C) is represented by an object L_i = {k_1, ..., k_l} where k_j \in {1, ..., K}. Each k_j represents a case that has an explicit conditional probability. All other cases are treated by a special default conditional probability. Formally, the conditional probability has the form:

P(X_i | C = k) = P(X_i | C = k_j)   if k = k_j \in L_i
P(X_i | C = k) = P(X_i | default)   if k \notin L_i

It will be convenient for us to think of L_i as defining a random variable, which we will denote L_i, with l+1 values. This random variable is the characteristic function of C, such that L_i = j if C = k_j \in L_i, and L_i = l+1 if C = k \notin L_i. Then, P(X_i | C) is replaced by P(X_i | L_i). This representation requires l+1 different distributions rather than K different ones. Note that each of these conditional distributions can be multinomial, Gaussian, or any other parametric family we might choose to use.

Returning to our example above, instead of representing the probability P(X_1 | C) as a complete table, as in Figure 1(a), we can represent it using a more succinct table with the cases 1, 2, and

the default {3, ..., K}, as shown in Figure 1(b). This requires estimating a different probability of X_1 in each of the first two clusters, and one probability of X_1 in the remaining clusters.

We note that in the extreme case, when L_i is empty, we are rendering X_i independent of C. To see this, note that L_i has a single value in this situation, and thus P(X_i | C) is the same for all values of C. Thus, since CSI is a refinement of selective Bayesian models, it suffices to specify the choice of L_i for each variable.

Finally, we consider classifying a gene given a model. As in Eq. (2), the decision between two clusters is a sum of terms of the form P(x_i | C = k) / P(x_i | C = k'). Now, if both k and k' fall in the default category of L_i, then they map to the same value of L_i, and thus define the same conditional probability over X_i. In such a situation, the observation x_i does not contribute to the distinction between k and k'. On the other hand, we will say that X_i distinguishes a cluster k_j if k_j \in L_i, indicating a unique conditional distribution for X_i given the cluster k_j.

3 Scoring CSI Clustering

We want to learn CSI Clustering from data. By learning, we mean selecting the number of clusters K, the set of dependent random variables G, the corresponding local structures L_i, and, in addition, estimating the parameters of the conditional distributions in the model. We reiterate that CSI clustering is a special sub-class of Bayesian networks with default tables. Thus, we adopt standard learning approaches for Bayesian networks (Friedman 1998, Friedman & Goldszmidt 1998, Heckerman 1998) and specialize them for this class of models. In particular, we use a Bayesian approach for learning probabilistic models. In this approach learning is posed as an optimization problem of some scoring function. In this section we review the scoring functions over different choices of clustering models (including both structure and parameters). In the next section, we consider methods for finding high-scoring clustering models. That is, we describe computational procedures for searching the vast space of possible structures efficiently.

3.1 The Bayesian Score

We assume that the set of variables X_1, ..., X_N is fixed. We define a CSI Clustering model to be a tuple M = \langle K, {L_i} \rangle, where K specifies the number of values of the latent class and L_i specifies the choice of local structure for X_i. (Recall that X_i does not depend on C if L_i = \emptyset.) A model M is parameterized by a vector \theta_M of parameters. These include the mixture parameters \theta_k = P(C = k), and the parameters \theta_{X_i|l} of P(X_i | L_i = l). As input for the learning problem, we are given a dataset D that consists of M samples; the m-th sample specifies a joint assignment x_1[m], ..., x_N[m] to X_1, ..., X_N.

In the Bayesian approach, we compute the posterior probability of a model, given the particular data set D:

P(M | D) \propto P(D | M) P(M)

The term P(M) is the prior probability of the model M, and P(D | M) is the marginal likelihood of the data, given the model M.
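The model M = \langle K, {L_i} \rangle just defined is easy to represent directly. The following sketch is our own illustration (class and field names are hypothetical): each local structure is stored as a default table, and the characteristic function l_i(k) of Section 2.3 maps a cluster to the conditional distribution it uses.

```python
from dataclasses import dataclass, field

@dataclass
class DefaultTable:
    """Local structure L_i: clusters listed in `cases` get their own conditional
    distribution; all remaining clusters share a single default distribution."""
    cases: tuple = ()      # e.g. (1, 2) as in Figure 1(b); () means X_i is independent of C

    def l_of(self, k):
        """Characteristic function of C: index of the explicit case used by cluster k,
        or len(cases) for the shared default case."""
        return self.cases.index(k) if k in self.cases else len(self.cases)

    def n_distributions(self):
        return len(self.cases) + 1   # one distribution per explicit case plus the default

@dataclass
class CSIClusteringModel:
    K: int                                         # number of clusters
    local: dict = field(default_factory=dict)      # attribute index -> DefaultTable

# Figure 1(b): X_1 has explicit cases for clusters 1 and 2 and a default for clusters 3..6.
model = CSIClusteringModel(K=6, local={1: DefaultTable(cases=(1, 2))})
print([model.local[1].l_of(k) for k in range(1, 7)])   # -> [0, 1, 2, 2, 2, 2]
```

With cases (1, 2) and K = 6, clusters 3 through 6 all map to the default case, reproducing the structure of Figure 1(b).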

In this paper, we use a fairly simple class of priors over models, in which the model prior decomposes into several independent components, as suggested by Friedman & Goldszmidt (1998):

P(M) \propto P(K) P(G) \prod_i P(L_i).

We assume that P(K) \propto \gamma^K is a geometric distribution with parameter \gamma, which is fairly close to 1. The prior over G is designed to penalize dependencies; thus P(G) \propto \alpha^{|G|} for some parameter \alpha < 1. (Recall that G = {i : L_i \neq \emptyset}.) Finally, the prior distribution over local models is set so that P(L_i) \propto \binom{K}{|L_i|}^{-1}, normalized over cardinalities. That is, we set a uniform prior over the number of cases in L_i, and then put a uniform prior over all local structures with this cardinality. We choose these priors for their mathematical simplicity (which makes some of the computations below easier) and since they slightly favor simpler models.

We now consider the marginal likelihood term. This term evaluates the probability of generating the data set D from the model M. This probability requires averaging over all possible parameterizations of M:

P(D | M) = \int P(D | M, \theta_M) P(\theta_M | M) \, d\theta_M    (3)

where P(\theta_M | M) is the prior density over the parameters \theta_M, and P(D | M, \theta_M) is the likelihood of the data:

P(D | M, \theta_M) = \prod_m \sum_k P(C = k | M, \theta_M) \prod_i P(x_i[m] | l_i(k), M, \theta_M)    (4)

where l_i(k) is the value of L_i when C = k.

In this work we follow a standard approach to learning graphical models and use decomposable priors for a given model's parameters \theta_M that have the form

P(\theta_M | M) = P(\theta_C) \prod_i \prod_{l \in Val(L_i)} P(\theta_{X_i|l})

For multinomial X_i and for C, we use a Dirichlet (DeGroot 1970) prior over the parameters, and for normal X_i, we use a normal-gamma prior (DeGroot 1970). We review the details of both families of priors in Appendix A.

We stress that the Bayesian method is different from the maximum likelihood method. In the latter, one evaluates each model by the likelihood it achieves with the best parameters. That can be misleading, since poor models might have specific parameters that give the data high likelihood. Bayesian approaches avoid such over-fitting by averaging over all possible parameterizations. This averaging regularizes the score. In fact, a general theorem (Schwarz 1978) shows that for large data sets (i.e., as M \to \infty)

\log P(D | M) = \log P(D | M, \hat{\theta}_M) - \frac{1}{2} \log M \cdot \dim(M) + O(1)    (5)

where \hat{\theta}_M are the maximum a posteriori probability (MAP) parameters that maximize P(D | M, \theta_M) P(\theta_M | M), and \dim(M) is the dimensionality of the model M (the number of degrees of freedom in the parameterization of M). Thus, in the limit the Bayesian score behaves like a penalized maximum likelihood score, where the penalty depends on the complexity of the model.¹ Note that this approximation is closely related to the minimum description length (MDL) principle (Rissanen 1978).

¹ Note that as M \to \infty the maximum likelihood parameters and the MAP parameters converge to the same values.
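A small sketch of the penalized-likelihood form in Eq. (5) follows. It is our own illustration and assumes the usual parameter count, which the text does not spell out here: K-1 free mixture weights, |Val(X_i)|-1 free parameters per multinomial case, and a mean and variance per Gaussian case.

```python
import math

def csi_dim(K, local_structures, attr_info):
    """Degrees of freedom dim(M) of a CSI clustering model (standard count, our sketch).

    local_structures: attr -> number of explicit cases |L_i| (0 means X_i is independent of C).
    attr_info: attr -> ('multinomial', n_values) or ('gaussian',).
    """
    dim = K - 1                                   # mixture weights P(C = k)
    for i, n_cases in local_structures.items():
        n_dists = n_cases + 1                     # explicit cases plus the shared default
        if attr_info[i][0] == 'multinomial':
            dim += n_dists * (attr_info[i][1] - 1)
        else:                                     # Gaussian: mean and variance per case
            dim += n_dists * 2
    return dim

def bic_score(log_likelihood, n_samples, dim):
    """Eq. (5): penalized log-likelihood approximation of log P(D | M)."""
    return log_likelihood - 0.5 * math.log(n_samples) * dim

dim = csi_dim(K=5,
              local_structures={'expr1': 5, 'site1': 2, 'site2': 0},
              attr_info={'expr1': ('gaussian',),
                         'site1': ('multinomial', 3),
                         'site2': ('multinomial', 3)})
print(dim, bic_score(log_likelihood=-1234.5, n_samples=500, dim=dim))
```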

3.2 Complete Data

We briefly discuss the evaluation of the marginal likelihood in the case where the data is complete. This setting is easier than the setting we need to deal with; however, the developments here are needed for the ones below. In the complete data case we assume that we are learning from a data set D_c that contains M samples, each of which specifies values x_1[m], ..., x_N[m], c[m] for X_1, ..., X_N and C. (In this case, we also fix in advance the number of values of C.) For such data sets, the likelihood term P(D_c | M, \theta_M) can be decomposed into a product of local terms:

P(D_c | M, \theta_M) = L_{local}(C, S_C, \theta_C) \prod_i \prod_{l \in Val(L_i)} L_{local}(X_i, S_{X_i|l}, \theta_{X_i|l})    (6)

where the L_{local} terms denote the likelihood that depends on each conditional probability distribution and the associated sufficient statistics vectors S_C and S_{X_i|l}. These statistics are cumulative functions over the training samples. These include counts of the number of times a certain event occurred, or sums of the values of X_i, or of X_i^2, in the samples where L_i = l. The particular details of these likelihoods and sufficient statistics are less crucial for the developments below, and so we defer them to Appendix A. An important property of the sufficient statistics is that once we compute the statistics for the case in which |L_i| = |C|, i.e., we have a separate conditional distribution for each cluster in a node X_i, we can easily get statistics for other local structures as a sum over the relevant statistics for each l \in Val(L_i):

S_{X_i|l} = \sum_k P(L_i = l | C = k) S_{X_i|C=k}

(Note that since L_i is a deterministic function of C, P(L_i = l | C = k) is either 0 or 1.)

The important consequence of the decomposition of Eq. (6), and the corresponding decomposition of the prior, is that the marginal likelihood term also decomposes (see (Friedman & Goldszmidt 1998, Heckerman 1998)):

P(D_c | M) = S_{local}(C, S_C) \prod_i \prod_{l \in Val(L_i)} S_{local}(X_i, S_{X_i|l})    (7)

where

S_{local}(X_i, S_{X_i|l}) = \int L_{local}(X_i, S_{X_i|l}, \theta_{X_i|l}) P(\theta_{X_i|l} | M) \, d\theta_{X_i|l}

The decomposition of the marginal likelihood suggests that we can easily find the best model in the case of complete data. The intuition is that the observation of C decouples the modeling choices for each X_i from the other variables. Formally, we can easily see that changing L_i for X_i changes only the prior associated with that L_i and the marginal likelihood term \prod_{l \in Val(L_i)} S_{local}(X_i, S_{X_i|l}). Thus, we can optimize the choice of each L_i separately of the others.

Note that there are 2^K possible choices of L_i. For each such choice we compute the sufficient statistics and evaluate the score of the model. When K is small we can exhaustively evaluate all these choices. In such a situation we find the optimal model given the data. In most learning scenarios, however, K is large enough to make such enumeration infeasible. Thus, instead, we construct L_i by a greedy procedure (Friedman & Goldszmidt 1998) that at each iteration finds the best k to separate from the default case, until no improvement is made to the score.

To summarize, when we have complete data the problem of learning a CSI clustering model is straightforward: we collect the sufficient statistics S_{X_i|C=k} for every X_i and k = 1, ..., K, and then

we can efficiently evaluate every possible model. Moreover, we can choose the one with the highest posterior without explicitly enumerating all possible models. Instead, we simply decide what is the best L_i for each X_i, independently of the decisions made for the other variables.

3.3 Incomplete Data

We now return to the case of interest to us, where we do not observe the class labels. Such a learning problem is said to have incomplete data. In this learning scenario, the evaluation of the marginal likelihood Eq. (3) is problematic, as we need to sum over all completions of the missing data. We denote the missing part of the data as D_H. In our case, this consists of assignments to clusters for the M samples. Using this notation, we can write Eq. (3) as:

P(D | M) = \int \sum_{D_H} P(D, D_H | M, \theta_M) P(\theta_M | M) \, d\theta_M

Although P(D, D_H | M, \theta_M) is a product of local terms, we cannot decompose the marginal likelihood. Moreover, unlike the complete data case, we cannot learn the structure of P(X_i | C) independently of learning the structure of the other conditional probabilities. Since we do not observe the values of the cluster variables, these choices interact. As a consequence, we cannot compute the marginal likelihood in an analytical form. Instead, we need to resort to approximations. We refer the reader to Chickering & Heckerman (1997) for an overview of methods for approximating the marginal likelihood. In this paper we use two such approximations to the logarithm of the marginal likelihood.

The first is the Bayesian Information Criterion (BIC) approximation of Schwarz (1978) (see Eq. (5)):

BIC(M, \theta_M) = \log P(D | M, \theta_M) - \frac{1}{2} \log M \cdot \dim(M)

To evaluate this score, we perform expectation maximization (EM) iterations to find the MAP parameters (Lauritzen 1995); see also (Chickering & Heckerman 1997, Heckerman 1998). The benefit of this score is that once we find the MAP parameters, it is fairly easy to evaluate. Unfortunately, this score is only asymptotically correct, and can over-penalize models for complexity in practice.

Another possible approximation is the Cheeseman-Stutz (CS) score (Cheeseman & Stutz 1995); see also (Chickering & Heckerman 1997). This score approximates the marginal likelihood as:

CS(M, \theta_M) = \log P(D | M, \theta_M) - \log P(D_c | M, \theta_M) + \log P(D_c | M)

where D_c is a fictitious data set that is represented by a set of sufficient statistics. The computation of P(D_c | M, \hat{\theta}_M) and P(D_c | M) is then performed as though the data were complete. This simply amounts to evaluating Eq. (6) and Eq. (7) using the sufficient statistics for D_c. The choice of D_c is such that its sufficient statistics will match the expected sufficient statistics given M and \theta_M. These are defined by averaging over all possible completions D_c of the data:

E[S_{X_i|C=k} | M, \theta_M] = \sum_{D_c} S^{D_c}_{X_i|C=k} P(D_c | D, M, \theta_M)    (8)

where D_c represents a potential completion of the data (i.e., an assignment of a cluster value to each example) and S^{D_c}_{X_i|C=k} is the sufficient statistic for X_i given C = k evaluated on D_c. Using the

linearity of expectation, this term can be efficiently computed (Chickering & Heckerman 1997, Friedman 1998). Thus, to compute D_c, we find the MAP parameters \theta_M, and then compute the expected sufficient statistics given M, \theta_M. We then use these within Eq. (6) and Eq. (7) as the sufficient statistics of the fictional data set D_c.

4 Learning CSI Clustering

4.1 Structural EM

Once we set our prior probabilities, and decide on the type of approximation we use (either BIC or CS), we implicitly induce a score over all possible models. Our goal is to identify the model M that attains the highest score. Unfortunately, for a fixed K, there are O(2^{NK}) choices of models with K clusters and N variables; therefore we cannot exhaustively evaluate the score on all models. The typical way to handle this difficulty is by resorting to a heuristic search procedure. Local search procedures traverse the space of models by performing local changes (e.g., changing one of the L_i by adding or removing a case in the default table). The main computational cost of such a search is evaluating candidate models. Remember that since we have incomplete data, we cannot directly score a candidate model. Instead, for each candidate model we want to score, we perform another search in the parameter space (using techniques such as EM) to find the MAP parameters and then use these parameters for computing the score. Thus, the search procedure spends non-negligible computation per candidate. This severely limits the set of candidates that it can explore.

To avoid such expensive evaluations of candidates, we use the framework of Bayesian structural EM (Friedman 1998). In this framework, we use our current candidate to complete the missing values (i.e., cluster assignments). We then perform structure learning as though we have complete data, searching (efficiently) for a better model. This results in a new best model (with its optimized parameters). This new model forms the basis for the next iteration, and so on. This procedure has the benefit that structure selection is done in a situation that resembles complete data. In addition, each iteration can find a model that is quite different from the model at the beginning of the iteration. In this sense, the local moves of standard search procedures are replaced by global moves. Finally, the procedure is proven to improve the structure in each iteration.

More specifically, the Structural EM procedure consists of repeated iterations. We initialize the process with a model M^0, \theta^0. We discuss below the choice of this starting point. Then at the (\ell+1)-th iteration we start with the pair M^\ell, \theta^\ell of the previous iteration and construct a new pair M^{\ell+1}, \theta^{\ell+1}. This iteration consists of three steps.

E-Step: Compute expected sufficient statistics \tilde{S}^\ell_{X_i|C=k} = E[S_{X_i|C=k} | M^\ell, \theta^\ell] for each i = 1, ..., N and each k = 1, ..., K using Eq. (8).

M-Step: Learn a model M^{\ell+1} and parameters \theta^{\ell+1} using these expected sufficient statistics, as though they were observed in a complete data set. For each X_i, choose the best-scoring CSI local structure L_i with respect to the sufficient statistics. This is done independently for each of the variables.

Postprocessing-Step: Maximize the parameters for M^{\ell+1} by running parametric EM. This optimization is initialized by the MAP parameters given the expected sufficient statistics.
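The per-attribute structure selection in the M-Step can be illustrated with a runnable simplification. The sketch below is ours: for a single multinomial attribute with expected counts E[S_{X_i|C=k}], clusters are greedily separated from the default case, as in the greedy procedure of Section 3.2, using a BIC-style complete-data score as a stand-in for the Bayesian local scores actually used.

```python
import numpy as np

def attr_loglik(counts):
    """Complete-data log-likelihood of multinomial counts under their MLE."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts / total
    nz = counts > 0
    return float((counts[nz] * np.log(p[nz])).sum())

def default_table_score(expected_counts, explicit, n_samples):
    """BIC-style score of a default table: explicit clusters get their own
    multinomial; the remaining clusters share a pooled default distribution."""
    K, V = expected_counts.shape
    ll = sum(attr_loglik(expected_counts[k]) for k in explicit)
    rest = [k for k in range(K) if k not in explicit]
    if rest:
        ll += attr_loglik(expected_counts[rest].sum(axis=0))
    n_dists = len(explicit) + (1 if rest else 0)
    return ll - 0.5 * np.log(n_samples) * n_dists * (V - 1)

def greedy_default_table(expected_counts, n_samples):
    """Greedily move clusters out of the default case while the score improves."""
    explicit = []
    best = default_table_score(expected_counts, explicit, n_samples)
    K = expected_counts.shape[0]
    improved = True
    while improved:
        improved = False
        for k in set(range(K)) - set(explicit):
            s = default_table_score(expected_counts, explicit + [k], n_samples)
            if s > best:
                best, best_k, improved = s, k, True
        if improved:
            explicit.append(best_k)
    return explicit, best

# Expected counts E[S_{X|C=k}] for a 4-valued attribute in a 6-cluster model
# (illustrative numbers shaped like Figure 1: clusters 0 and 1 differ, 2..5 look alike).
counts = np.array([[10, 10, 50, 30], [10, 10, 20, 60],
                   [70, 20, 5, 5], [70, 20, 5, 5], [70, 20, 5, 5], [70, 20, 5, 5]], float)
print(greedy_default_table(counts, n_samples=600))
```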

These iterations are reminiscent of the standard EM algorithm. The main difference is that in the standard approach the M-Step involves re-estimating parameters, while in Structural EM we also relearn the structure. More precisely, Structural EM enables us to evaluate each possible new L_i based on the sufficient statistics computed with the current L_i, instead of doing an expensive EM procedure for each such candidate.

In applying this procedure, we can use different scores in choosing models at the M-Step. This depends on the approximation we set out to use on the incomplete data. Above we discussed two different scores. The first one is the BIC approximation. In this case, we simply evaluate structures in the M-step using BIC on complete data (the likelihood in this case decomposes, and the complexity penalty remains the same). The second one is the CS approximation. In this case, note that CS applied to complete data is simply the Bayesian score (since \log P(D | M, \hat{\theta}_M) and \log P(D_c | M, \hat{\theta}_M) cancel out). Thus, in this case we use the exact Bayesian score with respect to the expected sufficient statistics.

These iterations are guaranteed to improve the score in the following sense. Each iteration finds a candidate that has a better score (with respect to the incomplete training data) than the previous one. More precisely, if we use the BIC score (with respect to the expected sufficient statistics) in the M-step, then results of Friedman (1997) show that the BIC score of M^{\ell+1}, \theta^{\ell+1} is greater than the BIC score of M^\ell, \theta^\ell, unless the procedure converged, in which case the two scores will be equal. Thus, each step improves the score we set out to maximize, and at some point the procedure will reach a (local) maximum. When we use the CS score, the situation is more complicated. The results of Friedman (1998) show that each iteration is an approximate version of a procedure that does improve the Bayesian score on the incomplete data. In practice, most iterations do improve the CS score.

We use two different methods for initializing the structural EM procedure. In the first one, we start with the full model (where |L_i| = |C| for every variable X_i). This model is the most expressive in the class we consider, and thus allows the starting point to capture any type of trend in the data. The second initialization method is to use a random model, where G (i.e., the set of variables dependent on the hidden cluster variable) is chosen at random. In both cases, we apply aggressive parametric optimization to find the initial parameters. This is done by using 100 random starting points for parametric EM, and returning the parameter vector that achieves the highest score.

4.2 Escaping Local Maxima

The structural EM procedure, as described above, can get trapped in local maxima. That is, it can reach sub-optimal convergence points. This can be a serious problem, since some of these convergence points are much worse than the optimal model, and thus lead to a poor clustering. A naive way to avoid this problem is by multiple restarts. However, when the number of local maxima is large, such multiple restarts have limited utility. Instead, we want strategies for escaping local maxima that improve on the solution found by earlier iterations.

We implemented two approaches for escaping local maxima. In the first approach, we apply a directed search once the structural EM procedure converges. More specifically, assume that M^\ell is the convergence point of structural EM. Starting from this model, we apply a local search procedure that attempts to add and remove variables to the model. As explained above, such a procedure is costly since it has to separately evaluate each candidate it proposes.
To avoid evaluating all moves from the current model, we apply randomized moves

and evaluate each one. Once a candidate with a score higher than that of M^\ell is found, we restart structural EM from that point. If after a fixed number of random trials no improvement was found, the procedure terminates the search and returns the best model found so far.

In the second approach, we use an annealing-like procedure to introduce randomness at each step of the process. This randomness needs to serve two purposes. On the one hand, a certain amount of randomness will allow the procedure to escape convergence points of structural EM. On the other hand, we want our steps to exploit the sufficient statistics computed in the E-step to choose models that build on information learned in previous iterations. We achieve this goal by using a variant of Structural EM recently suggested by Elidan et al. (2001) and Friedman et al. (2002). The idea is simple: at each iteration of Structural EM, we perform a random reweighting of the training samples. More precisely, for each sample m, we sample a weight w^\ell_m from a Gamma distribution with mean 1 and variance \tau_\ell, where \tau_\ell is an additional parameter that controls the temperature of the search. In the modified E-step we compute weighted sufficient statistics

E[S_{X_i|C=k} | W^\ell, M, \theta_M] = \sum_{D_c} \sum_m w^\ell_m S^{D_c}_{X_i|C=k}[m] P(D_c | D, M, \theta_M)

where S^{D_c}_{X_i|C=k}[m] denotes the contribution of sample m to the statistic under the completion D_c. We then apply the M-Step with respect to these reweighted expected sufficient statistics. Additionally, we set \tau_{\ell+1} to be \delta \tau_\ell, where \delta < 1 is a decay factor. The search is terminated once \tau_\ell reaches a certain predetermined threshold. In our experiments, the annealed approach dominated in performance the approach described above.

5 Evaluation

5.1 Simulation Studies

To evaluate the applicability of our clustering method, we started by performing tests on synthetic data sets. These data sets were sampled from a known clustering model (which determined the number of clusters, which variables depend on which cluster value, and the conditional probabilities). Since we know the model that originated the data, we can measure the performance of our procedure. We examined two aspects. First, how well the procedure recovers the structure of the real model (number of clusters, false positive and false negative edges in the model). Second, how well the procedure recovers the original clustering. That is, how well the model classifies a new sample (gene). The aim of these tests is to understand how the performance of the method depends on various parameters of the learning problem. We will review our techniques for evaluating the learning process results and then turn to describe the details of our artificial data set generation, followed by a summary of the results.

We first address the issue of evaluating the classification success, which can be measured with many different techniques. We use the following criterion. A clustering model M defines a conditional probability distribution over clusters given a sample. Let M_t denote the true model, and let M_e denote the estimated model. Both define conditional distributions over clusters. We want to compare these two conditional distributions. We will denote the clusters of the true model as C_t and the clusters of the estimated model as C_e. Then, we can define a joint distribution over these two clusterings:

P(C_t, C_e) = \sum_x P(x | M_t) P(C_t | x, M_t) P(C_e | x, M_e)

Table 1: Summary of results on synthetic data. The results summarize the performance of the procedure on data generated from a model with 5 true clusters and additional background noise. We report: the number of clusters learned, the logarithm of the likelihood ratio between the learned model and the true model on training data (with noisy samples) and test data (unseen samples, without noise), the information the learned clusters contain about the original clusters (see text), the fraction of edges not recovered (# false negative edges / # edges in the true model), and the fraction of false edges recovered (# false positive edges / # edges in the learned model). For each figure of merit, we report the mean value and the standard deviation from results from 10 datasets (see text).

[Table 1 layout: rows for 10% and 30% noise, each with the BIC and CS scores and the training-set size N; columns for Cluster #, Likelihood (train), Likelihood (test), I(C_t; C_e)/H(C_t), #FalseNegatives/#TrueEdges, and #FalsePositives/#LearnedEdges, each reported as Mean and Std.]

Figure 2: Graphs comparing the scores for different cluster numbers. The x-axis denotes the number of clusters, and the y-axis denotes the score per sample (the logarithm of the BIC score divided by the number of samples). (a) Comparison of the directed search and the weight annealing search on training data with 30% noise and 500 training samples. (b) Comparison of the weight annealing search on training data with 10% noise, with 200, 500, and 800 training samples. Each point is the average of 5 data sets, and error bars denote one standard deviation.

where the sum is over all possible joint assignments to X. In practice we cannot sum over all these joint assignments, and thus we estimate this distribution by sampling from P(X | M_t). Once we have the joint distribution we can compute the mutual information

I(C_t; C_e) = \sum_{c_t, c_e} P(c_t, c_e) \log \frac{P(c_t, c_e)}{P(c_t) P(c_e)}

between the two clustering variables (Cover & Thomas 1991). This term denotes the number of bits one clustering carries about the other. In the table below we report the information ratio I(C_t; C_e) / H(C_t), which measures how much information C_e provides about C_t relative to the maximum possible (which is the entropy of C_t, since I(C_t; C_t) = H(C_t)).

We now turn to the second issue of evaluating the structure learning. We measure several aspects of the learned structure. To evaluate the selected number of clusters, we record both the number of clusters in the model and the number of identified clusters in the model. These are clusters for which there is at least one training sample that is assigned to them. For the CSI structure evaluation we recorded the number of false positive and false negative edges in the implied graph. Recall that an edge corresponds to an informative attribute in the discussion above.

We generated synthetic data from a model learned from the Gasch et al. dataset we describe below. This model had 5 clusters, 93 continuous variables, and 25 discrete nodes. As described in Section 5.2, this model (as most of the ones we learned from real data) had several characteristics. The continuous attributes were mostly informative (usually about several clusters). On the other hand, most discrete attributes were uninformative and the remaining ones distinguished mostly one cluster. From this model, we sampled 5 training sets of sizes 200, 500, and 800 samples (15 training sets in total), and a test set of 1000 samples.

We expect that biological data sets will contain many samples that do not fit into clusters. Thus, we want to ensure that our procedure is robust to the presence of such noise. To estimate this robustness, we injected additional noise into the training sets. This was done by adding samples whose values were sampled uniformly from the range of values each attribute had in the real samples we already had at hand. These obscure the clustering in the original model. We ran our procedure on the sampled data sets that we obtained by adding 10% or 30% additional noise samples to the original training data. The procedure was initialized with random starting points, and for each training data set we searched for the best scoring model with the number of clusters in the range K = 3, ..., 7. We then chose the model with the best score among these. Table 1 summarizes the average performance over the 5 training sets in each parameter setting (200, 500, or 800 samples with 10% or 30% noise) using the learning procedure with the two scoring methods. We briefly summarize the highlights of the results.

Search procedure: We compared the performance of the two variants of the search procedure. The directed approach applies structural EM iterations, and attempts to escape from local maxima by attempting stochastic moves and evaluating each one. The annealed approach applies structural EM iterations where in each iteration the samples are reweighted. In our experiments, we started the annealing procedure with initial temperature (variance of the Gamma distribution) \tau_0 = 2, and each iteration cooled the temperature by a factor of \delta = 0.9. In particular, in Figure 2(a) we see that the annealed search procedure clearly outperforms the directed search on a particular setting. This behavior was consistently observed in all settings, and we do not report it here.
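The reweighted E-step of the annealed search just described is simple to sketch. The code below is ours and only illustrative (a single discrete attribute, made-up posteriors): per-sample weights are drawn from a Gamma distribution with mean 1 and variance \tau, folded into the expected sufficient statistics, and the temperature is cooled by the factor \delta = 0.9 used in the experiments above.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_weights(n_samples, tau, rng):
    """Per-sample weights with mean 1 and variance tau (shape 1/tau, scale tau)."""
    return rng.gamma(shape=1.0 / tau, scale=tau, size=n_samples)

def weighted_expected_counts(posteriors, x, n_values, weights):
    """Weighted expected sufficient statistics for a discrete attribute:
    counts[k, v] = sum_m w_m * P(C = k | sample m) * 1[x_m = v]."""
    K = posteriors.shape[1]
    counts = np.zeros((K, n_values))
    for k in range(K):
        for v in range(n_values):
            counts[k, v] = np.sum(weights * posteriors[:, k] * (x == v))
    return counts

# Toy setting: 200 samples, K = 3 clusters, a 4-valued attribute.
M, K, V = 200, 3, 4
posteriors = rng.dirichlet(np.ones(K), size=M)   # stand-in for P(C = k | x[m], current model)
x = rng.integers(0, V, size=M)

tau, delta = 2.0, 0.9                            # initial temperature and decay factor
for _ in range(5):
    w = gamma_weights(M, tau, rng)
    stats = weighted_expected_counts(posteriors, x, V, w)
    # ... the M-step / structure selection would use `stats` here ...
    tau *= delta                                 # cool the temperature
print(stats.round(1))
```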
Cluster number: In all runs, models learned with fewer clusters than the original model were sharply penalized. On the other hand, models learned with additional clusters got scores that were

close to the score received when learning with 5 or 6 clusters; see Figure 2(b). Most runs added another cluster that captured the noise samples we added in constructing the training data, and thus, most of the runs pick 6 or slightly more clusters (see Table 1). In general, runs with the BIC score had a stronger penalty for additional clusters, which resulted in choosing 6 clusters as the best scoring model more often. Runs with the CS score sometimes added more clusters. Additionally, as one might expect, the number of chosen clusters tends to grow with stronger noise and with larger sample size.

Likelihood: As expected, training likelihood is higher than that of the true model. This occurs both because the procedure fits the training data better, and because of the additional noisy samples in the training data. On the other hand, the learned models are always worse (as expected) on the test data. Additionally, the test data likelihood improves as the number of training samples increases, even in noisy data that also has additional noise samples. As expected, models trained with noisier data are somewhat worse than models learned from cleaner data. As a general trend, the training data likelihood of models learned with the CS score is as good as or better than that of models learned with the BIC score. This difference is significant mainly in the noisier data sets. The test set performance of both scores is roughly the same when learning with 10% noise (with BIC slightly better), and CS is better with 30% noise.

Structure accuracy: We measured the percentage of additional dependencies in the learned graph G when compared to the true structure (false positives) and of missing ones in the learned graph G (false negatives). In general, the procedure (using both the BIC and CS scores) tended to have a very small ratio of false negatives, which diminishes as more training samples are available. This shows the procedure is good at recognizing relevant attributes. On the other hand, the procedure had a nontrivial number of false positive dependencies, about 13%-17% depending on the sample size, the scoring function, and the percentage of noise. In general, when using the CS score, the procedure has a slightly higher ratio of false positives. Similarly, the presence of higher noise levels also increased the number of false positive dependencies.

Mutual Information Ratio: In this category all the runs with 800 training samples achieved the maximal information gain. Runs with 200 samples achieved an information gain of 94% and above. Runs with 500 samples had varying results that depended on the level of noise. For 10% noise we got maximal information gain, while results on the noisier data set reached 97% information gain. As with the likelihood of the data, the CS score had slightly better results compared to the BIC score. These results show that the learned clusters were informative about the original clusters.

Clearly, these simulations only explore a small part of the space of possible parameters. However, they show that on a model that has statistical characteristics similar to real-life datasets, our procedure can perform in a robust manner and discover clusterings that are close to the original one, even in the presence of noise.

5.2 Biological Data

We evaluated our procedure on two biological data sets of budding yeast gene expression. The first data set is from Spellman et al. (1998), who measured expression levels of genes during different cell-cycle stages. We examined the expression of the 800 genes that Spellman et al. identify as cell-cycle related in 77 experiments. The second data set is from Gasch et al. (2000), who measured expression levels of genes in response to different environmental changes.
Gasch et al. identified a cluster of genes that have a generic response to stress conditions. In addition, they identified

[Figure 3 layout: expression matrices for the alpha, cdc15, cdc28, and elu experiment series, shown with a color scale from 2x induced to 2x repressed.]

Figure 3: The clustering found for the cell-cycle data of Spellman et al. Light pixels correspond to over-expressed genes, and dark ones correspond to under-expressed genes. The clusters shown here were also characterized by the existence of the following binding sites. Clusters 2 and 5: STUAP (Aspergillus Stunted protein), Cluster 3: QA1 (DNA-binding protein with repressor and activator activities, also involved in silencing at telomeres and silent mating type loci), Clusters 4 and 6: HSF (Heat shock transcription factor).

[Figure 4 layout: panels (a) and (b) with rows labeled Schematics, CSI Mask, and Data; columns for Expression and TFs in (a), and Expression, TFs, and Phylogenetic fingerprint in (b); color scale from 4x induced to 4x repressed.]

Figure 4: Representation of the clustering found in the stress data of Gasch et al. (a) Clustering based on gene expression and TF putative binding sites. (b) Clustering based also on phylogenetic profiles. The top row contains a schematic representation of the clustering. The second row contains a CSI mask plot that hides all expression features that were considered uninformative by the model. The bottom row shows figures of all the genes, sorted by cluster identity. The following clusters were also characterized by putative binding sites: Clusters 6(a) and 8(b): GCN4 (transcription factor of the basic leucine zipper (bZIP) family, regulates general control in response to amino acid or purine starvation) and CBF1, Cluster 2(a): HAP234, Cluster 7(b): GCN4.

clusters of genes that responded to particular stress conditions, but not in a generic manner. Our data set consists of the 950 genes, selected by Segal et al. (2001), that responded to some stress conditions but are not part of the generic stress response. Both data sets are based on cDNA array technology, and the expression value of each gene is reported as the logarithm (base 2) of the ratio of expression in the sample compared to the expression of the same gene in a common baseline ("control sample").

In addition to the expression levels from these two data sets, we recorded for each gene the number of putative binding sites in the 1000bp upstream of the ORF. These were generated by using the fungi matrices in the TRANSFAC 5.1 database (Wingender et al. 2000, Wingender et al. 2001). We used the MAST program (Bailey & Gribskov 1998) to scan the upstream regions. We used these matches to count the number of putative sites in the 1000bp upstream region. This generated discrete valued random variables (with values 0, 1, and >= 2) that correspond to each putative site (either a whole promoter region or a sub-region).

We start by describing the parameters used in the algorithm that were reviewed in previous sections. We applied the annealed search procedure with the following parameters: \tau_0 = 2, 4 and \delta = 0.5, 0.75, 0.9, 0.95. Best results were obtained with \delta = 0.9 or 0.95, with either \tau_0 setting. For other variations, such as the technique for choosing the initial model structure, no technique clearly dominated the others.

We now discuss the results; they also appear, with the full data file and a description of the clusters, online. There were several common trends in the results on both expression data sets, when used with the MAST TF binding sites. First, the expression measurements were considered informative by the clustering. Most of the expression variables had an impact on many of the clusters. Second, most binding site measurements were considered non-informative. The learning procedure decided for most of these that they have no effect on any of the clusters. Those that were considered relevant usually had only 1 or 2 distinct contexts in their local structure. This can potentially be due to the fact that some of these factors were truly irrelevant, or to a large number of errors made by the binding site prediction programs that mask the informative signal in these putative sites. In any case this means these attributes had relatively small influence on the clustering results, and only the ones that seem to be correlated with a clear gene expression profile of one of the clusters were chosen by the model.

To illustrate the type of clusters we found, we show in Figures 3 and 4(a) two of the clusterings we learned. These describe qualitative cluster profiles that help see which experiments distinguish each cluster, and the general trend of expression in each cluster's experiments. Note the schematic illustration of the masks that denote the expression attributes that characterize each cluster. As we can see, these capture quite well the experiments in which genes in the cluster deviate from the average expression. Another clear observation is that clusters learned from the cell-cycle data all show periodic behavior. This can be expected since the 800 genes are all correlated with the cell cycle. However, the clusters differ in their phase. Such cluster profiles are characteristic of many of the models we found for the cell-cycle data. In the Gasch et al. data, the best scoring models had twelve clusters.
In the model shown in Figure 4(a), we see two clusters of genes that are under-expressed in stress conditions (Clusters 1 and 2), seven clusters of genes that are over-expressed in these conditions (Clusters 3, 5, 6, 7, 8, 10, and 12), two clusters of genes that are over-expressed in some stress conditions and under-expressed

in others (Clusters 4 and 9), and two clusters of genes with an undetermined response to stress conditions (Clusters 8 and 11). These latter clusters have high variance, while the others have relatively tight variance in most experiments. Some of the clusters correspond to clear biological functions. For example, Cluster 7 contains genes that are over-expressed in amino-acid starvation and nitrogen depletion. Examining the MIPS (Mewes et al. 1999) functional annotation of these genes suggests that many of the genes are involved in amino-acid biosynthesis and in transport. Another example is Cluster 2, which contains genes that are under-expressed in late stages of nitrogen depletion, diauxic shift, and under YPD growth medium. This cluster is associated with frequent occurrences of the HAP234 binding site. This binding site (of the complex HAP-2, HAP-3, and HAP-4) is associated with the control of gene expression under nonfermentative growth conditions. Many genes in this cluster are associated with mitochondrial organization and transport, respiration, and ATP transport. The association of the cluster with the HAP234 binding site strengthens the hypothesis that genes with unknown function in this cluster might be related to these pathways.

We suspect that one of the reasons few clusters are associated with transcription factor binding sites is the noisy prediction of these sites. To evaluate the effect of more informative sequence motif identification in the upstream region, we performed the following experiment. We applied our algorithm using expression values from the Gasch et al. data set. Then, we applied the procedure of Barash et al. (2001) to each of the clusters we identified. This procedure searches for motifs that discriminatively appear in the upstream regions of genes in particular clusters and are uncommon in other genes in the genome. We then annotated each gene with the set of motifs we found, and used these annotations as additional input to a new run of our algorithm. Although we applied a fairly simple unsupervised sequence motif identification algorithm, its impact on the learning algorithm was clear. Several hundred genes changed their hard assignment from the initial assignment made when clustering with only expression data, 26 out of 28 motifs were considered informative to the final clustering, and 2 motifs became relevant for 2 different clusters. When we fed the new hard assignments of genes to clusters back into the motif finding algorithm we got a general improvement in motif identification in clusters.

Next, in order to demonstrate the model's ability to integrate relevant biological data from different sources, we considered adding additional attributes extracted from the COG database (Tatusov et al. 2001). This database associates each yeast gene with orthologous genes in 43 other genomes. Thus, we create for each gene a phylogenetic pattern that denotes whether there is an orthologous gene in each of the 43 genomes. When we include these additional features, the clusters learned change. In general, we note that most of the phylogenetic patterns were considered informative by the model but still context specific. For example, we see pairs of clusters (e.g., Clusters 5 and 10) that are similar in terms of expression, yet have distinct phylogenetic profiles. One cluster contains genes that do not have orthologs, while the other cluster contains genes that have orthologs in many bacterial genomes. Phylogenetic patterns also allow us to gain additional insight into the functional aspects of the clusters. For example, Cluster 8 contains genes that are highly over-expressed in amino-acid starvation and nitrogen depletion.
Next, in order to demonstrate the model's ability to incorporate relevant biological data from different sources, we considered adding additional attributes extracted from the COG database (Tatusov et al. 2001). This database associates each yeast gene with orthologous genes in 43 other genomes. Thus, we create for each gene a phylogenetic pattern that denotes whether there is an orthologous gene in each of the 43 genomes. When we include these additional features, the learned clusters change. In general, we note that most of the phylogenetic patterns were considered informative by the model but still context-specific. For example, we see pairs of clusters (e.g., Clusters 5 and 10) that are similar in terms of expression, yet have distinct phylogenetic profiles. One cluster contains genes that do not have orthologs, while the other cluster contains genes that have orthologs in many bacterial genomes. Phylogenetic patterns also allow us to gain additional insight into the functional aspects of the clusters. For example, Cluster 8 contains genes that are highly over-expressed in amino-acid starvation and nitrogen depletion. It is characterized by occurrences of the binding sites of GCN4 and CBF1, and genes in it have a typical profile with orthologs in C. jejuni, P. multocida, Halobacterium sp. NRC-1, P. aeruginosa, M. tuberculosis, A. aeolicus, C. crescentus, H. pylori J99, M. leprae, D. radiodurans, T. volcanium, and T. acidophilum, and no orthologs in M. genitalium, B. burgdorferi, C. pneumoniae, C. trachomatis, S. pyogenes, T. pallidum, R. prowazekii, U. urealyticum, and Buchnera sp. APS. This cluster description suggests that this group of genes has common phylogenetic origins as well as common function and regulation.
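For concreteness, a minimal sketch of how these phylogenetic pattern attributes could be assembled is shown below; the `orthologs_of` mapping is a hypothetical stand-in for a parsed version of the COG ortholog assignments, not an actual COG interface.

```python
def phylogenetic_patterns(genes, genomes, orthologs_of):
    """Build one binary attribute per genome: 1 if the gene has an ortholog there.

    genes:        iterable of yeast gene names
    genomes:      list of the 43 genome identifiers taken from the COG database
    orthologs_of: dict gene -> set of genome identifiers with an ortholog
                  (assumed to be parsed from COG; hypothetical, not a COG API)
    """
    return {
        gene: [1 if genome in orthologs_of.get(gene, set()) else 0
               for genome in genomes]
        for gene in genes
    }
```

These binary vectors are appended to each gene's attribute list, in the same way the motif annotations were, before re-running the clustering.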

Figure 5: Clustering of arrays in the stress data set of Gasch et al. The left panel ("Data") shows the data rearranged according to the clustering; the right panel ("CSI Masked Data") shows only the positions that are informative in the learned model (note that cluster 1 is totally masked in this model).

As we noted in the introduction, our method can be used for other clustering tasks. As an example, we clustered the 92 samples in the stress data. In this clustering, we reversed the roles of conditions and genes: we now consider each condition as an (independent) sample, and each gene as a (continuous) attribute of the sample. The result of the clustering is a set of groups of samples; for each cluster we have the list of informative genes. Not surprisingly, this clustering recovered quite well the groups of samples with the same treatments. Table 2 shows the composition of each cluster, in a run with 10 clusters, in terms of the original treatments. Each of the following treatments was recovered in a separate cluster: DTT, diamide, YP, and steady state. In addition, the nitrogen depletion time course was split into two clusters. The earlier samples (30 minutes to 4 hours) appeared in a cluster with the amino acid starvation samples, while the later samples (8 hours to 5 days) were clustered separately. This is consistent with the clusters we learned over genes, which showed that some genes had distinct behavior in later parts of the nitrogen depletion time course. A similar phenomenon occurs with the H2O2 samples. Earlier samples (10 minutes to 50 minutes) are clustered with the menadione samples. Later H2O2 samples (60 minutes to 80 minutes, and also 40 minutes) were clustered with the sorbitol samples. Finally, both heat shock time courses (fixed temperature and variable temperature) were mostly clustered in one cluster, although some of the heat shock samples
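A minimal sketch of this reversal, assuming a generic clustering callable as a stand-in for our algorithm, is the following; it simply transposes the expression matrix so that conditions become the samples and genes become their continuous attributes.

```python
import numpy as np

def cluster_conditions(expression_matrix, condition_names, cluster_fn, k=10):
    """Cluster the arrays (conditions) rather than the genes.

    expression_matrix: genes x conditions array of expression levels
    condition_names:   column labels (e.g., treatment and time point)
    cluster_fn:        callable(samples, k) -> list of cluster labels;
                       a stand-in for the clustering algorithm in the text
    """
    samples = np.asarray(expression_matrix).T  # now conditions x genes
    labels = cluster_fn(samples, k)

    # Collect the conditions assigned to each cluster, as summarized in Table 2.
    groups = {}
    for name, label in zip(condition_names, labels):
        groups.setdefault(label, []).append(name)
    return groups
```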
