Joint Probabilistic Curve Clustering and Alignment


Scott Gaffney and Padhraic Smyth
School of Information and Computer Science
University of California, Irvine, CA 92697-3425
{sgaffney,smyth}@ics.uci.edu

Abstract

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for the joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature-vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM-type learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and for joint clustering and alignment of gene expression time-course data.

1 Introduction

We introduce a novel methodology for the clustering and prediction of sets of smoothly varying curves while jointly allowing for the learning of sets of continuous curve transformations. Our approach is to formulate models for both the clustering and alignment sub-problems and integrate them into a unified probabilistic framework that allows for the derivation of consistent learning algorithms. The alignment sub-problem is handled with the introduction of a novel curve alignment procedure employing model priors over the set of possible alignments, leading to the derivation of EM learning algorithms that formalize the so-called Procrustes approach for curve data [1]. These alignment models are then integrated into a finite mixture model setting in which the clustering is carried out. We make use of both polynomial and spline regression mixture models to complete the joint clustering-alignment framework.
The following simple illustrative example demonstrates the importance of jointly handling the clustering-alignment problem, as opposed to treating alignment and clustering separately. Figure 1(a) shows a simulated set of curves which have been subjected to random translations in time. The underlying generative model contains three clusters, each described by a cubic polynomial (not shown). Figure 1(b) shows the output of the EM algorithm described later in this paper, in which curves have been simultaneously aligned and clustered. This figure is virtually identical

[Figure 1 shows four panels of curves (Y-axis vs. time): (a) simulated data, (b) joint EM results, (c) cluster first, (d) align second.]

Figure 1: Comparison of joint EM and sequential clustering-alignment: (top row) unlabelled simulated data with hidden alignments, and the solution recovered by EM; (bottom row) cluster first and then align.

to that of the original data (with cluster labels and no misalignment). Figure 1(c) shows the result of first clustering the unaligned data, and Figure 1(d) then shows the result of aligning within each cluster. The sequential approach results in significant misclassification and incorrect alignment, demonstrating that a two-stage approach can be quite suboptimal when compared to a joint clustering-alignment methodology. (Similar results, not shown, are obtained when the curves are first aligned and then clustered; see [2] for full details.)

There has been little prior work on the specific problem of joint curve clustering and alignment, but there is related work in other areas. For example, clustering of gene-expression time profiles with mixtures of splines is addressed in [3]. However, alignment is only considered there as a post-processing step to compare cluster results among related datasets. In image analysis, the transformed mixture of Gaussians (TMG) model uses a probabilistic framework and an EM algorithm to jointly learn clustering and alignment of image patches subject to various forms of linear transformations [4]. However, this model only considers sets of transformations in discrete pixel space, whereas we are focused on curve modelling that allows for arbitrary continuous alignment in time and space. Another branch of work in image analysis focuses on the problem of estimating correspondences of points across images [5] (or vertices across graphs [6]), using EM or deterministic annealing algorithms.
The results we describe here differ primarily in that (a) we focus specifically on sets of curves rather than image data (generally making the problem more tractable), (b) we focus on clustering and alignment rather than just alignment, (c) we allow continuous affine transformations in time and measurement space, and (d) we have a fully generative probabilistic framework allowing for (for example) the incorporation of informative priors on transformations if such prior information exists. In earlier related work we developed general techniques for curve clustering (e.g., [7]) and also proposed techniques for transformation-invariant curve clustering with discrete time alignment and Gaussian mixture models for curves [8, 9]. In this paper we provide

a much more general framework that allows for continuous alignment in both time and measurement space for a general class of cluster shape models, including polynomials and splines.

2 Joint clustering and alignment

It is useful to represent curves as variable-length vectors. In this case, y_i is a curve that consists of a sequence of n_i observations or measurements. The j-th measurement of y_i is denoted by y_ij and is usually taken to be univariate (the generalization to multivariate observations is straightforward). The associated covariate of y_i is written as x_i in the same manner; x_i is often thought of as time, so that x_ij gives the time at which y_ij was observed.

Regression mixture models can be effectively used to cluster this type of curve data [10]. In the standard setup, y_i is modelled using a normal (Gaussian) regression model in which y_i = X_i β + ε_i, where β is a (p+1)-dimensional coefficient vector, ε_i is a zero-mean Gaussian noise vector, and X_i is the regression matrix. The form of X_i depends on the type of regression model employed. For polynomial regression, X_i is often associated with the standard Vandermonde matrix; for spline regression, X_i takes the form of a spline-basis matrix (see, e.g., [7] for more details). The mixture model is completed by repeating this model over K clusters and indexing the parameters by k so that, for example, y_i = X_i β_k + ε_i gives the regression model for y_i under the k-th cluster.

B-splines [11] are particularly efficient for computational purposes due to the block-diagonal basis matrices that result. Using B-splines, the curve point y_ij can be represented as the linear combination y_ij = B_ij c_i, in which the vector B_ij gives the vector of B-spline basis functions evaluated at x_ij, and c_i gives the spline coefficient vector [11]. The full curve y_i can then be written compactly as y_i = B_i c_i, in which the spline basis matrix takes the form B_i = [B_i1 · · · B_in_i]'. Spline regression models can be easily integrated into the regression mixture model framework by equating the regression matrix X_i with the spline basis matrix B_i.
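As a concrete sketch of the regression setup above, the following toy example uses the polynomial (Vandermonde) choice of X_i and recovers one cluster's coefficient vector by ordinary least squares; the function name and data are illustrative, not from the paper's implementation. In the full mixture model this step would be responsibility-weighted and repeated for each of the K clusters.

```python
import numpy as np

def regression_matrix(x, p=3):
    """Vandermonde regression matrix [1, x, x^2, ..., x^p] (cubic by default)."""
    return np.vander(x, N=p + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 30)                  # observation times x_i
beta = np.array([1.0, -2.0, 0.5, 0.1])         # cluster coefficient vector beta_k
y = regression_matrix(x) @ beta + 0.05 * rng.standard_normal(x.size)

# Ordinary least squares recovers beta_k from the noisy curve.
beta_hat, *_ = np.linalg.lstsq(regression_matrix(x), y, rcond=None)
```

Swapping `regression_matrix` for a B-spline basis matrix leaves the rest of the computation unchanged, which is the point of equating X_i with B_i.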
In what follows, we use the more general notation X_i in favor of the more specific B_i.

2.1 Joint model definition

The joint clustering-alignment model definition is based on a regression mixture model that has been augmented with up to four individual random transformation parameters or variables (a_i, b_i, c_i, d_i). The a_i and b_i allow for scaling and translation in time, while the c_i and d_i allow for scaling and translation in measurement space. The model definition takes the form

    y_i = c_i [a_i x_i − b_i] β_k + d_i + ε_i,    (1)

in which [a_i x_i − b_i] represents the regression matrix X_i (either spline or polynomial) evaluated at the transformed time a_i x_i − b_i. Below we use the matrix X_i to denote [a_i x_i − b_i] when parsimony is required. It is assumed that ε_i is a zero-mean Gaussian vector with covariance σ_k^2 I. The conditional density

    p_k(y_i | a_i, b_i, c_i, d_i) = N(y_i | c_i [a_i x_i − b_i] β_k + d_i, σ_k^2 I)    (2)

gives the probability density of y_i when all the transformation parameters (as well as cluster membership) are known. (Note that the density on the left is implicitly conditioned on an appropriate set of parameters; this is always assumed in what follows.) In general, the values for the transformation parameters are unknown. Treating this as a standard hidden-data problem, it is useful to think of each of the transformation parameters as random variables that are curve-specific but with population-level prior probability distributions. In this
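A minimal numerical sketch of the transformation model y_i = c_i [a_i x_i − b_i] β_k + d_i + ε_i and its conditional density, again with the Vandermonde choice of X_i; the helper names are hypothetical:

```python
import numpy as np

def transformed_design(x, a, b, p=3):
    """The regression matrix [a*x - b], i.e. X_i evaluated at the transformed time."""
    return np.vander(a * x - b, N=p + 1, increasing=True)

def log_cond_density(y, x, beta_k, a, b, c, d, sigma_k, p=3):
    """Log of the conditional density p_k(y | a, b, c, d): a spherical Gaussian
    centred on the transformed curve c * [a*x - b] @ beta_k + d."""
    mu = c * (transformed_design(x, a, b, p) @ beta_k) + d
    resid = y - mu
    n = y.size
    return -0.5 * (n * np.log(2.0 * np.pi * sigma_k**2) + resid @ resid / sigma_k**2)
```

For a curve generated at the identity transformation (a = 1, b = 0, c = 1, d = 0), this conditional density peaks at the identity, which is exactly what the identity-centred priors of the next subsection encode.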

way, the transformation parameters and the model parameters can be learned simultaneously in an efficient manner using EM.

2.2 Transformation priors

Priors are attached to each of the transformation variables in such a way that the identity transformation is the most likely transformation. A useful prior for this is the Gaussian density N(µ, σ^2) with mean µ and variance σ^2. The time transformation priors are specified as

    a_i ∼ N(1, r_k^2),    b_i ∼ N(0, s_k^2),    (3)

and the measurement-space priors are given as

    c_i ∼ N(1, u_k^2),    d_i ∼ N(0, v_k^2).    (4)

Note that the identity transformation is indeed the most likely. All of the variance parameters are cluster-specific in general; however, any subset of these parameters can be tied across clusters if desired in a specific application. Note that these priors technically allow for negative scaling in time and in measurement space. In practice this is typically not a problem, though one can easily specify other priors (e.g., log-normal) to strictly disallow this possibility. It should be noted that each of the prior variance parameters is learned from the data in the ensuing EM algorithm. However, below we do not make use of hyperpriors for these prior parameters (this can be integrated in a straightforward manner if desired).

2.3 Full probability model

The joint density of y_i and the set of transformation variables Φ_i = {a_i, b_i, c_i, d_i} can be written succinctly as

    p_k(y_i, Φ_i) = p_k(y_i | Φ_i) p_k(Φ_i),    (5)

where p_k(Φ_i) = N(a_i | 1, r_k^2) N(b_i | 0, s_k^2) N(c_i | 1, u_k^2) N(d_i | 0, v_k^2). The space transformation parameters can be integrated out of (5), resulting in the marginal of y_i conditioned only on the time transformation parameters. This conditional marginal takes the form

    p_k(y_i | a_i, b_i) = ∫∫ p_k(y_i, c_i, d_i | a_i, b_i) dc_i dd_i = N(y_i | X_i β_k, U_k + V_k + σ_k^2 I),    (6)

with U_k = u_k^2 X_i β_k β_k' X_i' and V_k = v_k^2 1 1'. The unconditional (though still cluster-dependent) marginal for y_i cannot be computed analytically, since a_i, b_i cannot be analytically integrated out. Instead, we use numerical Monte Carlo integration for this task.
The resulting unconditional marginal for y_i can be approximated by

    p_k(y_i) = ∫∫ p_k(y_i | a_i, b_i) p_k(a_i) p_k(b_i) da_i db_i ≈ (1/M) Σ_m p_k(y_i | a_i^(m), b_i^(m)),    (7)

where the M Monte Carlo samples are taken according to

    a_i^(m) ∼ N(1, r_k^2), and b_i^(m) ∼ N(0, s_k^2), for m = 1, ..., M.    (8)

A mixture results when cluster membership is unknown:

    p(y_i) = Σ_k α_k p_k(y_i).    (9)

The log-likelihood of all n curves Y = {y_i} follows directly from this approximation and takes the form

    log p(Y) ≈ Σ_i log Σ_{m,k} α_k p_k(y_i | a_i^(m), b_i^(m)) − n log M.    (10)
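The Monte Carlo approximation of Equations (7)-(8) can be sketched as follows. For brevity this toy version fixes c_i and d_i at the identity (as if u_k = v_k = 0), so only the time transformations are integrated out; the function name is illustrative:

```python
import numpy as np

def mc_log_marginal(y, x, beta_k, sigma_k, r_k, s_k, M=500, seed=0, p=3):
    """Monte Carlo estimate of log p_k(y): draw a^(m) ~ N(1, r_k^2) and
    b^(m) ~ N(0, s_k^2), then average the conditional likelihoods."""
    rng = np.random.default_rng(seed)
    a = rng.normal(1.0, r_k, size=M)
    b = rng.normal(0.0, s_k, size=M)
    ll = np.empty(M)
    for m in range(M):
        X = np.vander(a[m] * x - b[m], N=p + 1, increasing=True)
        resid = y - X @ beta_k
        ll[m] = -0.5 * (y.size * np.log(2.0 * np.pi * sigma_k**2)
                        + resid @ resid / sigma_k**2)
    # log of the sample mean of exp(ll), computed stably (log-sum-exp)
    top = ll.max()
    return top + np.log(np.mean(np.exp(ll - top)))
```

Weighting these per-cluster estimates by the mixing proportions α_k and summing then gives the mixture likelihood of Equation (9).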

2.4 EM algorithm

We derive an EM algorithm that simultaneously allows the learning of both the model parameters and the transformation variables Φ_i, with time complexity that is linear in the total number of data points N = Σ_i n_i. First, let z_i give the cluster membership for curve y_i. Now, regard the transformation variables {Φ_i} as well as the cluster memberships {z_i} as being hidden. The complete-data log-likelihood function is defined as the joint log-likelihood of Y and the hidden data {Φ_i, z_i}. This can be written as the sum over all n curves of the log of the product of α_{z_i} and the cluster-dependent joint density in (5). This function takes the form

    L_c = Σ_i log [ α_{z_i} p_{z_i}(y_i | Φ_i) p_{z_i}(Φ_i) ].    (11)

In the E-step, the posterior p(Φ_i, z_i | y_i) is calculated and then used to take the posterior expectation of Equation (11). This expectation is then used in the M-step to calculate the re-estimation equations for updating the model parameters {β_k, σ_k^2, r_k^2, s_k^2, u_k^2, v_k^2}.

2.5 E-step

The posterior p(Φ_i, z_i | y_i) can be factorized as p_{z_i}(Φ_i | y_i) p(z_i | y_i). The second factor is the membership probability w_ik that y_i was generated by cluster k. It can be rewritten as p(z_i = k | y_i) ∝ α_k p_k(y_i) and evaluated using Equation (7). The first factor requires a bit more work. Further factoring reveals that p_{z_i}(Φ_i | y_i) = p_{z_i}(c_i, d_i | a_i, b_i, y_i) p_{z_i}(a_i, b_i | y_i). The new first factor p_{z_i}(c_i, d_i | a_i, b_i, y_i) can be solved for exactly by noting that it is proportional to a bivariate normal distribution for each z_i [2]. The new second factor p_{z_i}(a_i, b_i | y_i) cannot, in general, be solved for analytically, so instead we use an approximation. The fact that posterior densities tend towards highly peaked Gaussian densities has been widely noted (e.g., [12]) and leads to the normal approximation of posterior densities. To make the approximation here, the vector (â_ik, b̂_ik) representing the multi-dimensional mode of p_k(a_i, b_i | y_i), the covariance matrix V_ab^(ik) for (â_ik, b̂_ik), and the separate variances V_a_ik, V_b_ik must be found. These can readily be estimated using a Nelder-Mead optimization method.
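The mode-finding step can be sketched with scipy's Nelder-Mead routine. This toy version again fixes c_i and d_i at the identity and optimizes the negative log-posterior of (a_i, b_i) for one cluster; the function name is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def time_transform_mode(y, x, beta_k, sigma_k, r_k, s_k, p=3):
    """Posterior mode (a_hat, b_hat) of p_k(a, b | y), found by Nelder-Mead;
    the E-step's Gaussian approximation is then centred at this mode."""
    def neg_log_post(ab):
        a, b = ab
        X = np.vander(a * x - b, N=p + 1, increasing=True)
        resid = y - X @ beta_k
        return (resid @ resid / (2.0 * sigma_k**2)
                + (a - 1.0) ** 2 / (2.0 * r_k**2)   # prior a ~ N(1, r_k^2)
                + b ** 2 / (2.0 * s_k**2))          # prior b ~ N(0, s_k^2)
    res = minimize(neg_log_post, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
    return res.x
```

Starting the search at the identity transformation (1, 0), the prior's mode, is a natural choice and typically converges in a handful of simplex iterations for the smooth, nearly quadratic objectives that arise here.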
Experiments have shown that this approximation works well across a variety of experimental and real-world data sets [2]. The above calculations of the posterior p(Φ_i, z_i | y_i) allow the posterior expectation of the complete-data log-likelihood in Equation (11) to be solved for. This expectation results in the so-called Q-function which is maximized in the M-step. Although the derivation is quite complex, the Q-function can be calculated exactly for polynomial regression [2]; for spline regression, the basis functions do not afford an exact formula for the solution of the Q-function. However, in the spline case, removal of a few problematic variance terms gives an efficient approximation (the interested reader is referred to [2] for more details).

2.6 M-step

The M-step is straightforward since most of the hard work is done in the E-step. The Q-function is maximized over the set of parameters {β_k, σ_k^2, r_k^2, s_k^2, u_k^2, v_k^2} for 1 ≤ k ≤ K. The derived solutions are as follows:

    r̂_k^2 = (1/Σ_i w_ik) Σ_i w_ik [ (â_ik − 1)^2 + V_a_ik ],    ŝ_k^2 = (1/Σ_i w_ik) Σ_i w_ik [ b̂_ik^2 + V_b_ik ],

    û_k^2 = (1/Σ_i w_ik) Σ_i w_ik [ (ĉ_ik − 1)^2 + V_c_ik ],    v̂_k^2 = (1/Σ_i w_ik) Σ_i w_ik [ d̂_ik^2 + V_d_ik ],
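The prior-variance updates above reduce to responsibility-weighted averages. A sketch for one cluster's time-transformation priors, assuming the identity-centred priors of Equations (3)-(4) (the helper name is illustrative):

```python
import numpy as np

def mstep_time_prior_variances(w_k, a_hat, b_hat, V_a, V_b):
    """Update r_k^2 and s_k^2 for one cluster: a responsibility-weighted
    average of the squared deviations from the prior means (1 for a, 0 for b)
    plus the posterior variances carried over from the E-step."""
    W = w_k.sum()
    r2 = np.sum(w_k * ((a_hat - 1.0) ** 2 + V_a)) / W
    s2 = np.sum(w_k * (b_hat ** 2 + V_b)) / W
    return r2, s2
```

The posterior-variance terms V_a, V_b keep the update from collapsing the priors toward point masses when the per-curve modes happen to sit exactly at the prior means.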

    β̂_k = [ Σ_i w_ik ( ĉ_ik^2 X̂_ik' X̂_ik + V_xx ) ]^{-1} Σ_i w_ik [ ĉ_ik X̂_ik' (y_i − d̂_ik 1) + V_xy − V_xcd ], and

    σ̂_k^2 = (1/Σ_i w_ik n_i) Σ_i w_ik [ y_i'y_i − 2 ĉ_ik β̂_k' X̂_ik' y_i − 2 d̂_ik 1'y_i − 2 V_xy' β̂_k + β̂_k' V_xx β̂_k + 2 β̂_k' V_xcd + n_i V_d_ik ],

where X̂_ik = [â_ik x_i − b̂_ik], and V_xx, V_xy, V_xcd are special variance matrices whose components are functions of the posterior expectations of Φ_i calculated in the E-step (the exact forms of these matrices can be found in [2]).

[Figure 2 shows two panels of height-acceleration curves plotted against age.]

Figure 2: Curves measuring the height acceleration for 39 boys; (left) smoothed versions of the raw observations, (right) automatically aligned curves.

3 Experimental results and conclusions

The results of a simple demonstration of EM-based alignment (using splines and the learning algorithm of the previous section, but with no clustering) are shown in Figure 2. In the left plot are a set of smoothed curves representing the acceleration of height for each of 39 boys whose heights were measured at 29 observation times over the ages of 1 to 18 [1]. Notice that the curves share a similar shape but seem to be misaligned in time due to individual growth dynamics. The right plot shows the same acceleration curves after processing by our spline alignment model (allowing only for translations in time). The x-axis in this plot can be seen as canonical (or "average") age. The aligned curves in the right plot of Figure 2 represent the average behavior in a much clearer way. For example, it appears there is an interval of 3.5 years from peak (age 11.5) to trough (age 15) that describes the average cycle that all boys go through. The results demonstrate that it is common for important features of curves to be randomly translated in time and that it is possible to use the data to recover these underlying hidden transformations using our alignment models.

Next we briefly present an application of the joint clustering-alignment model to the problem of gene expression clustering. We analyze the alpha arrest data described in [3], which capture gene expression levels at 7-minute intervals over two consecutive cell cycles (totaling 17 measurements per gene).
Clustering is often used in gene expression analysis to reveal groups of genes with similar profiles that may be physically related to the same underlying biological process (e.g., [3]). It is well known that time delays play an important role in gene regulation, and thus curves measured over time which represent the same process may often be misaligned from each other [14].

[Figure 3 shows, for each of three clusters, expression curves plotted against canonical time (left column) and against time (right column).]

Figure 3: Three clusters for the time-translation alignment model (left) and the non-alignment model (right).

Since these gene expression data are already normalized, we did not allow for transformations in measurement space. We only allow for translations in time, since experts do not expect scaling in time to be a factor in these data. Due to limited space, we present a single case of comparison between a standard spline regression mixture model (SRM) and an SRM that jointly allows for time translations. Ten random starts of EM were allowed for each algorithm, with the highest-likelihood model selected for comparison for each algorithm. It is common to assume that there are five distinct clusters of genes in these data; as such, we set K = 5 for each algorithm [3]. Three of the resulting clusters from the two methods are shown in Figure 3. The left column of the figure shows the output from the joint clustering-alignment model, while the right column shows the output from the standard cluster model. It is immediately obvious that the time-aligned clusters represent the mean behavior in a much clearer way. The overall cluster variance is much lower than in the non-aligned clustering. The results also demonstrate the appearance of cluster-dependent alignment effects. For example, the first two clusters show large within-cluster misalignment, whereas the third cluster does not. Interestingly, the actual clustering is considerably different between the two methods. In fact, only 57% of the expression profiles are assigned to common clusters between the two methods. Out-of-sample experiments (not shown here) show that the joint model produces better predictive models than the standard clustering method. Experimental results on a variety of other data sets are provided in [2], including applications to clustering of cyclone trajectories.

4 Conclusions

We proposed a general probabilistic framework for joint clustering and alignment of sets of curves. The experimental results indicate that the approach provides a new and useful tool for curve analysis in the face of underlying hidden transformations. The resulting EM-based learning algorithms have time complexity that is linear in the number of measurements; in contrast, many existing curve alignment algorithms are themselves O(n^2) (e.g., dynamic time warping) without regard to clustering. The incorporation of splines gives the method an overall non-parametric freedom which leads to general applicability.

References

[1] J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer-Verlag, New York, NY, 1997.

[2] S. J. Gaffney. Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models. Ph.D. dissertation, University of California, Irvine, 2004.

[3] Z. Bar-Joseph et al. A new approach to analyzing gene expression time series data. Journal of Computational Biology, 10(3-4):341-356, 2003.

[4] B. J. Frey and N. Jojic. Transformation-invariant clustering using the EM algorithm. IEEE Trans. PAMI, 25(1):1-17, January 2003.

[5] H. Chui, J. Zhang, and A. Rangarajan. Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. PAMI, 26(2):160-172, February 2004.

[6] A. D. J. Cross and E. R. Hancock. Graph matching with a dual-step EM algorithm. IEEE Trans. PAMI, 20(11):1236-1253, November 1998.

[7] S. J. Gaffney and P. Smyth. Curve clustering with random effects regression mixtures. In C. M. Bishop and B. J. Frey, editors, Proc. Ninth Inter. Workshop on Artificial Intelligence and Statistics, Key West, FL, January 3-6, 2003.

[8] D. Chudova, S. J. Gaffney, and P. J. Smyth. Probabilistic models for joint clustering and time-warping of multi-dimensional curves. In Proc. of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI-03), Acapulco, Mexico, August 2003.

[9] D. Chudova, S. J. Gaffney, E. Mjolsness, and P. J. Smyth. Translation-invariant mixture models for curve clustering. In Proc. Ninth ACM SIGKDD Inter. Conf.
on Knowledge Discovery and Data Mining, Washington, D.C., August 24-27, 2003. ACM Press, New York.

[10] S. Gaffney and P. Smyth. Trajectory clustering with mixtures of regression models. In Surajit Chaudhuri and David Madigan, editors, Proc. Fifth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining, August 15-18, pages 63-72, New York, 1999. ACM Press.

[11] P. H. C. Eilers and B. D. Marx. Flexible smoothing with B-splines and penalties. Statistical Science, 11(2):89-121, 1996.

[12] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall, New York, NY, 1995.

[13] P. T. Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12):3273-3297, December 1998.

[14] J. Aach and G. M. Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17(6):495-508, 2001.