Including Spatial Information in Clustering of Multi-Channel Images


Including Spatial Information in Clustering of Multi-Channel Images

A scientific essay in the field of Natural Sciences, Mathematics and Computer Science.

DOCTORAL THESIS

to obtain the degree of doctor from Radboud University Nijmegen, on the authority of the Rector Magnificus, prof. dr. C.W.P.M. Blom, according to the decision of the Council of Deans, to be defended in public on Monday 21 November 2005, at 3.30 p.m. precisely, by Thanh Ngoc Tran, born on 25 July 1973 in Hanoi, Vietnam.

Promotor: Prof. dr. Lutgarde M.C. Buydens
Copromotor: Dr. Ron Wehrens
Manuscript committee:
Prof. dr. Piet van Espen, University of Antwerp, Belgium
Prof. dr. Freek van der Meer, International Institute for Geo-Information Science and Earth Observation (ITC)
Dr. Dirk H. Hoekman, Wageningen University
Print: Print Partners Ipskamp
ISBN
Cover photo: C-band polarimetric SAR image of Flevoland in the Netherlands (source: Wageningen University)

CONTENTS
1. General introduction
2. Introduction to clustering multi-spectral images: a tutorial
   Introduction; Problems for clustering multivariate images; Example images; Similarity measures; Clustering techniques; Pre- and post-processing; Conclusion
3. KNN-kernel density-based clustering for high-dimensional multivariate data
   Introduction; KNN-kernel density estimation; Results; Summary
4. SpaRef: a clustering algorithm for multi-spectral images
   Introduction; Notation; Description of SPAREF; Software; Segmentation; Experiments; Conclusion
5. Initialization of Markov random field clustering of large remote sensing images
   Introduction; Basic elements in mixture models and Markov random field clustering; The proposed method; Application to SAR data; Conclusion and discussion
6. Strategies for mixture model clustering of multivariate images
   Introduction; Previous work; Strategy I; Strategy II; Results; Conclusions and discussion
7. Conclusion, discussion and future prospects
Summary
Samenvatting
Acknowledgement
Curriculum Vitae


CHAPTER 1

MOTIVATION, OBJECTIVE AND OVERVIEW OF THE THESIS

This thesis is the result of work on clustering of multivariate/multi-spectral images at the Department of Analytical Chemistry, Institute of Molecules and Materials (IMM), Radboud University Nijmegen, The Netherlands. In this chapter, the motivation and objectives of this thesis are presented, followed by an overview of its content.

1.1 Motivation

Nowadays, high-resolution images are measured in many imaging systems, and clustering has become an important tool for revealing the underlying structure in images for various applications. For example, remote sensing images have made it possible to map remote areas and to update existing information efficiently and cheaply at both global and regional scales. Advances in spatial resolution allow us to work on even very small scales. In applications such as daily monitoring of agricultural objects by creating agricultural block-maps, or maps of urban, wetland and flooded areas, most of the interpretation is still made by human experts on aerial photos (Rydberg, 2001). This is an expensive procedure, and in many cases impossible for the huge numbers of images that have been collected over several years over large study areas. Hence, an automated classification method would reduce costs significantly and make many previously impractical applications feasible.

Supervised classification is preferable when training samples are available. However, collecting training samples again consumes much time and effort; sometimes it is even impossible because of the size or accessibility of the research area. Clustering (i.e. unsupervised classification), on the other hand, works without the need for prior knowledge in the form of training samples. Human experts are still useful to verify clustering results and to select the clustering method that is most appropriate for the dataset at hand. This can normally be done on a smaller sample dataset, and the choice can then be extended to the larger set or to another dataset of the same type.
Not only in remote sensing applications, but also in many other fields, clustering techniques play an important role. Clustering of Magnetic Resonance Images (MRI) and X-ray images has been applied to quality inspection of food, vegetables and postharvest products (Abbott, 1999, Hall et al., 1998, and Noordam, 2005), in which, without a priori information, clustering is used to detect small defects or abnormalities in the image of the inspected object. In medical applications, with the recent development of Magnetic Resonance Spectroscopic Imaging (MRSI), clustering of the combination of MRI and MRSI data brings more reliable and non-invasive brain tumor diagnosis (Simonetti, 2004).

Clustering

Clustering normally works with no prior knowledge about the classes that are present. There are many ways to define clustering:

- clustering, in which each member of a cluster is in some way similar to, and different from, the members of other clusters (Kaufman, 1990);

- clustering is used to classify objects, characterized by the values of a set of variables, into groups (Vandeginste et al., 1998);
- clustering helps to understand the relationships of objects by similarity (Tran, 2004).

A fundamental issue in clustering is the definition of the similarity of objects that form a natural ("homogeneous") group. Because the concept of similarity is adaptive, it is too much to expect a single method to be optimal for all cases. For example, in remote sensing, land cover types within the urban environment have a very complex nature and diverse composition; hence, homogeneity is also diverse. Moreover, clusters can have different shapes, sizes, populations, or distributions. A huge number of clustering methods has therefore been developed during the last decades, and it is necessary to know which clustering method does best in which cases (Chapter 2).

Partitional clustering methods, such as K-means, fuzzy C-means (Bezdek, 1981), ISODATA (Ball and Hall, 1965) and mixture modelling (McLachlan and Peel, 2000) by Expectation Maximization (Dempster et al., 1977), are the most often-used methods for moderate and large datasets, due to their time-efficient computation. Especially mixture model clustering, which models a statistical distribution by a finite mixture of other distributions, is becoming more and more popular in remote sensing applications (Ichoku and Karnieli, 1996, Brown et al., 2000) and many other fields; see, e.g., (Yeung et al., 2001, Alexandridis et al., 2004) for clustering gene expression data in genomics. However, the initialization is critical for determining the right input parameters (Fraley and Raftery, 2002). Some other methods, based on a hierarchical clustering scheme and mixture modelling, e.g. model-based clustering (Fraley and Raftery, 2002), provide a better way to identify the number of clusters and the corresponding input parameters; however, applying them to large datasets is difficult.
It often happens that clusters overlap; the feature information of objects in the overlapping area may be very similar, and it is hard or even impossible to separate these objects.

Clustering multi-spectral images

Image data differ from ordinary spectrum-only data because of the availability of spatial information: the spatial relations between pixels in the image. This is important information, which can improve the performance of clustering methods. However, most image clustering methods are pixel-based approaches, taking pixel by pixel without paying attention to the spatial information. In this thesis, specific research questions were addressed:

Q1. How can we use spatial information to derive better clustering algorithms?
Q2. Can we make the algorithms efficient for very large multi-spectral images?
Q3. Can we apply the algorithms to images of a very high spectral dimension?
Q4. Can we automatically identify the number of clusters in the image?

A detailed discussion of the problems, and guidelines for clustering multivariate images, are given in Chapter 2.

1.2 Objective

The objective of this thesis is to study the possibility of extending clustering techniques, especially mixture modelling, to moderate and large multivariate/multi-spectral images

taking advantage of spatial information. The main interest is to improve the robustness of clustering methods (with respect to input parameters and the number of classes) and the total accuracy, by reducing the influence of the problems of overlapping clusters and noise in (but not limited to) remotely sensed images.

1.3 Overview of the thesis

This thesis comprises research papers that were written during participation in the doctoral program at the Department of Analytical Chemistry, Radboud University Nijmegen.

Chapter 2 presents a detailed introduction to the major types of clustering techniques and their problems. Particular attention is devoted to extensions that take into account both the spectral and the spatial information of the image data. General guidelines for the optimal use of these algorithms are given.

Chapter 3 focuses on the automatic determination of the number of clusters (Question four) in a high-dimensional data set (Question three). The proposed method, KNNCLUST, is based on nonparametric density-based clustering. It has major advantages over traditional density-based methods in dealing with clusters of widely different densities. Spatial information is not used by this method; that will be studied intensively from the next chapter onwards. Because of its fairly high computational complexity, KNNCLUST is most useful for small datasets with the problem of different cluster densities.

The thesis pays particular attention to solutions to Question one. Spatial information can be used at different places in the clustering process: at the beginning, to identify good initial parameters; during clustering, by introducing a weight function into the ordinary similarity function; or at the final stage, to filter the clustering result and so improve the performance of clustering algorithms.
In particular, the influence of overlapping clusters, noise/artefacts, and mixed pixels is reduced by modifying the similarity function with a weight function, so that pixels in the overlapping area are closer if they form a spatial region, and otherwise are farther apart. This is illustrated in Chapters 2, 5, and 6. Noise can be treated in the same way.

In Chapter 4, a simple combination of the often-used K-means clustering and Ward's hierarchical clustering is presented. The refinement step, introduced at the end of the clustering algorithm, uses spatial information. It leads to an improvement of clustering performance on a remote sensing Compact Airborne Spectrographic Imager (CASI) image of an area in the Klompenwaard, the Netherlands.

Mixture model clustering is becoming more and more popular, and a central issue is determining the number of components (clusters) and their initial parameters. For a large and complex image, it is often very hard to apply mixture model clustering to the entire image (Fraley and Raftery, 2002, Murat Dundar and Landgrebe, 2002). These situations are investigated in Chapters 5 and 6, where spatial information is used to deal with these problems (Question one). Briefly, Chapter 5 uses a combination of statistical testing and hierarchical clustering to produce the initial parameters for clusters. In Chapter 6, two novel strategies for mixture model clustering of multivariate images are proposed. One strategy is intended for the normal situation of mixture modelling, where the density of a cluster is modelled by a single normal distribution; the second is designed for a more complex situation, where the density of an individual cluster is a mixture of several normal sub-clusters.

The main part of both strategies is the estimation of the initial parameters, based on the combination of simple region-growing segmentation and model-based hierarchical clustering (Fraley, 1998). Since the number of regions is much smaller than the number of pixels, the algorithm can work very fast (Questions one and two). In the case where one cluster is modelled by several Gaussians, an additional merge is performed to join clusters that are overlapping; these can be regarded as sub-clusters. The final classification step extends the classification to the entire image. Again, spatial information can optionally be used to improve clustering by using a Markov Random Field (Question one). The clustering procedure is fast enough to be used for moderate-size and large multivariate images (Question two). In Chapters 5 and 6, the best model is identified by the Pseudolikelihood Information Criterion (PLIC) (Stanford and Raftery, 2002) and the Bayesian Information Criterion (BIC) (Schwarz, 1978), respectively (Question four).

In Chapter 7, we summarize our conclusions from the preceding chapters and discuss directions for future research.

References

Abbott, J.A. (1999). Quality measurement of fruits and vegetables. Postharvest Biology and Technology, 15(3).
Alexandridis, R., Lin, S. and Irwin, M. (2004). Class discovery and classification of tumor samples using mixture modeling of gene expression data - a unified approach. Bioinformatics, 20(16).
Ball, G.H. and Hall, D.J. (1965). ISODATA, a novel method of data analysis and pattern classification. Techn. Rep., Stanford Research Institute, Menlo Park, CA.
Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York.
Brown, M., Lewis, H.G. and Gunn, S.R. (2000). Linear spectral mixture models and support vector machines for remote sensing. IEEE Trans. on Geoscience and Remote Sensing, 38(5).
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, (39).
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM J. Sci. Comput., (20).
Fraley, C. and Raftery, A.E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc., (97).
Hall, L., Evans, S. and Nott, K. (1998). Measurement of textural changes of food by MRI relaxometry. Magnetic Resonance Imaging, 14(5/6).
Ichoku, C. and Karnieli, A. (1996). A review of mixture modeling techniques for sub-pixel land cover estimation. Remote Sensing Reviews, 13.
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley Series in Probability and Statistics, Canada.
Murat Dundar, M. and Landgrebe, D. (2002). A model-based mixture-supervised classification approach in hyperspectral data analysis. IEEE Trans. Geosci. Remote Sensing, 40(12).
Noordam, J.C. (2005). Chemometrics in multispectral imaging for quality inspection of postharvest products. PhD thesis, Radboud University Nijmegen.
Rydberg, A. (2001). Multispectral image analysis for extraction of remotely sensed features in agricultural fields. PhD thesis, Swedish University of Agricultural Sciences.
Tran, T.N., Wehrens, R. and Buydens, L.M.C. (2005). Clustering multi-spectral images: a tutorial. To appear in Chemom. Intell. Lab. Syst.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, (6).
Simonetti, A. (2004). Investigation of brain tumor classification and its reliability using chemometrics on MR spectroscopy and MR imaging data. PhD thesis, Radboud University Nijmegen.
Stanford, D.C. and Raftery, A.E. (2002). Approximate Bayes factors for image segmentation: the Pseudolikelihood Information Criterion (PLIC). IEEE Trans. on Pattern Anal. Mach. Intell., (24).
Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C., de Jong, S., Lewi, P.J. and Smeyers-Verbeke, J. (1998). Handbook of Chemometrics and Qualimetrics, Part B. Elsevier.


CHAPTER 2

CLUSTERING MULTI-SPECTRAL IMAGES: A TUTORIAL

Abstract

A huge number of clustering methods have been applied to many different kinds of data sets, including multivariate images such as magnetic resonance images and remote sensing images. However, not many methods include the spatial information of the image data. In this tutorial, the major types of clustering techniques are summarized. Particular attention is devoted to extensions of clustering techniques that take into account both the spectral and the spatial information of the multivariate image data. General guidelines for the optimal use of these algorithms are given. The application of pre- and post-processing methods is also discussed.

Keywords: Pattern recognition; Unsupervised classification.

T.N. Tran, R. Wehrens and L.M.C. Buydens, Chemometrics and Intelligent Laboratory Systems, vol. 77/1-2, pp. 3-17, 2005.

1. Introduction

Automatic grouping of pixels having similar characteristics in a multivariate image is an important problem in a variety of research areas, such as biology, chemistry, medicine, and computer vision. In spite of several decades of research, the task is still challenging, due to the dramatic improvement of imaging technology in recent years. Examples are magnetic resonance imaging (MRI), which has become a standard tool in medicine, and remote sensing of the earth's surface from satellite or airborne scanners. In both examples, a huge number of multivariate images, often with a very high spectral and spatial resolution, are generated routinely. If there is no prior information about the classes, the grouping of pixels has to be done in an unsupervised way. This is called clustering [1][2][3]. In general, clustering groups objects, characterized by the values of a set of variables, into separate groups (clusters), based on their similarities. This may help to understand the relationships that may exist among them.

Examples of the application of clustering techniques to non-image data in chemometrics are the exploration of the structure of environmental data representing physical and chemical parameters [4], computational analysis of microarray gene expression profiles [5], and electron probe X-ray microanalysis [6]. In these cases, the clustering method is integrated with a visual display, allowing direct interpretation of the internal structure of the data. Another application is identifying chemical compounds for combinatorial chemistry [7], where clustering was studied on a data set of alcohols and the interpretation of the results was consistent with chemistry. Clustering can also be combined with other methods, such as genetic algorithms for molecular descriptor selection [8]. And last but not least, clustering can be applied to process monitoring [9][10][11]. In this case, cluster centers are updated automatically by the method according to changes due to, e.g., process drifts caused by seasonal fluctuations [9].
Clustering helps to interpret the model and to study both short-term changes and long-term changes due to drifting [10].

Clustering techniques can also be applied to multivariate images. In general, a multivariate image is defined as a stack of images, where each image represents a different variable. Many physical characteristics can be used in multivariate images, such as temperature, mass, wavelength, polarization, etc. As an example, MRI T1- and T2-weighted images, corresponding to different relaxation times, are often used in clinical decision making. More generally, a variable can also be a latent variable, e.g. principal components (PCs). These (latent) variables form the so-called feature information of the pixels in the multivariate image. A major difference with non-image data is that spatial information, in the form of X and Y coordinates, is available besides the pixel information in the feature space. In general, we expect that classes form spatially continuous regions. This is sometimes called a spatial relation of neighboring pixels, local characteristics, or local dependency [12][13]. Spatial information is usually ignored; in most cases, taking it into account will improve the clustering result significantly. Examples of the application of clustering of multivariate images in chemometrics are the localization of clusters of brain tumours in MRI images [14][15], or identifying clusters of pixels having similar ground cover types in remote sensing images [16].

In this tutorial, the main types of problems for clustering of multivariate images are discussed in detail in section 2. In the following sections, the major types of clustering techniques are reviewed, and possible extensions taking into account spatial information [14][15][16] are evaluated. Preprocessing of multivariate images and post-processing of clustering results are treated in the last section.
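The distinction above between feature information and spatial information can be made concrete with a small sketch. Assuming NumPy, and with all variable names and the random data purely illustrative (nothing here is prescribed by the tutorial), a multivariate image can be held as an N x d matrix of pixel spectra together with an N x 2 matrix of pixel coordinates:

```python
import numpy as np

# A hypothetical multivariate image: 40 x 40 pixels, d = 3 variables (bands).
rows, cols, d = 40, 40, 3
rng = np.random.default_rng(0)
image = rng.normal(size=(rows, cols, d))

# Feature information: flatten to an N x d matrix of pixel spectra.
X = image.reshape(-1, d)                             # N = rows * cols objects

# Spatial information: the (row, col) coordinate of every pixel.
coords = np.indices((rows, cols)).reshape(2, -1).T   # N x 2
```

Keeping the two matrices row-aligned (pixel i has spectrum `X[i]` and position `coords[i]`) is what allows a clustering method to combine feature similarity with neighborhood information later on.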

2. Problems for clustering multivariate images

We consider a multivariate image containing N pixels (objects) in a d-dimensional multivariate space (also called a feature space). In other words, a pixel (an object) is described by d variables, corresponding to the d-dimensional feature space. The main problems encountered when clustering multivariate images are listed below.

Image size: The improvement in image sensor sensitivity has drastically increased the resolution of multivariate images in the spatial domain. As a result, the size of the images has increased too: a typical data set can easily reach millions of pixels. For many clustering algorithms, especially the ones that use a distance matrix, such as hierarchical methods, this is prohibitive in terms of memory and processing time.

Feature dimension: The improvement in image sensors yields not only a large number of pixels but also a large number of variables. In many cases, the inverse of the covariance matrix of a cluster has to be computed during clustering. This is very expensive, and for very small clusters it may not be possible at all, because of singularity.

Noise: Many image scanners produce noise/outliers in images due to limited sensor sensitivity, statistical variation, or signal interference (cf. the speckle in the SAR Flevoland image data in Figure 3a). Not only can noise make the result very difficult to interpret, it can also lead to a completely wrong solution.

Mixed pixels: Despite the increase in scanner resolution, pixels often contain the spectral response of several components. These pixels are not easily classified into one cluster. Some clustering methods, such as fuzzy C-means, allow a mixed pixel to be classified into more than one cluster. Another approach is to use spatial information.

In addition, several general problems are also relevant to clustering images.

Overlapping clusters: More often than not, clusters overlap in the feature domain; even though two objects may belong to different clusters, they may have features that are very similar.
Then, if a clustering algorithm uses only feature information, it will not lead to a good result.

Number of clusters: In many cases there is no clear prior reason to favor a particular number of clusters. The clustering method then has to find the best number of clusters from the data. This is often very difficult.

Unequal cluster density: The density of a cluster at a particular point in the feature space is the number of pixels contained in a unit of the data space. Clustering methods based on density often have problems with clusters of very different densities, e.g. the river and lake clusters in [17].

Unequal cluster size: If cluster populations are very different, this can influence the clustering results. Sometimes a small cluster can be very important, but it is often not found because the larger clusters determine the clustering result. For example, in an image of a St. Paulia flower, it is difficult to recognize the pistil in the image [18]. This problem differs from the unequal-density problem when the densities remain the same but the extents in the feature space are very different.

In summary, the image size and feature dimension problems often make a method unsuitable because of computation time and computer memory. The other problems affect the accuracy of a clustering method rather than its feasibility. They will be discussed in more detail in sections 5 and 6. In many cases, clustering taking into account spatial information can reduce the influence of these problems on clustering accuracy [12][13][19].

3. Example images

In this tutorial, three experimental setups are used for demonstration purposes.

Experiment 1 (SYN): A synthetic image of size 40 x 40, consisting of two overlapping Gaussian clusters in one dimension, is generated. The pixels of the two Gaussian clusters are distributed in the image as shown in Figure 1a: one cluster is in the centre of the image and the other is around it. The density functions of the two distributions are plotted in Figure 1b; they illustrate the overlap of the two clusters in the feature domain.

Figure 1. SYN image. (a) Gray image of size 40 x 40. (b) Gaussian distribution functions of the two clusters.

Experiment 2 (MEAT): A multivariate image of minced meat was recorded with the ImSpector V7 imaging spectrograph (Spectral Imaging, Oulu, Finland), as described in [20]. The image size is 318 x 318, with 257 variables (bands) from 396 nm to 736 nm (1.3 nm per band). The incoming light is split and captured by a Sony CCD camera to obtain a color image, which is used as the reference image for the clustering result. The CCD color image and the plot of representative spectra of the clusters are shown in Figures 2a and b, respectively. To reduce computation time, the full spectral image, with its large number of variables, is pre-processed by an averaging technique to an 11-plane (band) image.

Figure 2. (a) Meat CCD color image of size 318 x 318. (b) Representative spectra for the clusters: a fat spectrum located at (119, 134), a dark meat spectrum at (32, 119) and a light meat spectrum at (78, 94).

The CCD image shows a petri dish filled with a piece of minced meat. It contains four classes: the petri dish, dark meat, light meat and fat. The difference between dark meat and light meat is caused by the amount of blood in the meat. The dark pixels represent the dark meat class and the white spots represent the fat class. The fat class is quite well separated from the other classes. The light meat class surrounds the fat class and gradually turns into the dark meat class. This causes the overlap problem between the dark meat and light meat classes [20].
The large number of variables and the overlap of clusters are the problems for clustering this image.

Experiment 3 (SAR): An area of 400 x 400 pixels of a remote sensing SAR image was taken over Flevoland, an agricultural area in The Netherlands, by the NASA/Jet Propulsion Laboratory (JPL) AirSAR on 3 July. The image used here is in C- and L-band full polarimetry and contains 18 intensities. Figure 3a shows a false-color image of the first three intensities of the image data. Ideally, one would like to obtain a clustering that corresponds to the seven expected crop types [21], as shown in Figure 3b.

Figure 3. (a) False-color image of the first three intensities in C-band, 400 x 400 pixels. (b) Map of the seven crop types (ground truth). The yellow color is a mask where the ground truth is uncertain: these pixels are predicted, but are not taken into account when calculating the prediction accuracy of the clustering result.

The heavily overlapping Barley (green) and Winter Wheat (magenta) clusters are shown in Figure 4. Noise is also present in the data set, due to statistical variation of the signal (speckle). Noise and cluster overlap are the two main problems for this image.

Figure 4. (a) Spectra of 50 objects for each of the three classes, Barley (green), Winter Wheat (magenta), and Rapeseed (brown). (b) Score plot of the first two PCs of all pixels in the three classes.

4. Similarity Measures

A measure of similarity is essential to clustering. It can be a distance in deterministic clustering or a likelihood in probabilistic clustering. Both are called the similarity function in this tutorial and are indicated by $\delta$.

4.1 Similarity measures with no spatial information

The similarity function with no spatial information uses only information in the feature space. It can be calculated between two pixels, two clusters, or between a pixel and a cluster. In the deterministic case, the most popular measure of dissimilarity between pixels $x_i$ and $x_j$ is the Euclidean distance, $\delta_{eucl}(x_i, x_j)$, which is the special case of the Minkowski distance with p = 2. This is given by:

$$\delta_{minkowski}(x_i, x_j) = \left( \sum_{l=1}^{d} |x_{il} - x_{jl}|^p \right)^{1/p} \qquad (1)$$

where $x_i = \{x_{i1}, \ldots, x_{id}\}$. The Minkowski distance with p = 1 is called the Manhattan distance. These distances can also be applied for measuring the dissimilarity between a pixel $x_i$ and a cluster $\omega_j$, where the mean of cluster $\omega_j$, $\mu_{\omega_j}$, is used instead of the pixel $x_j$. However, the covariance $C_{\omega_j}$ of the cluster is then not taken into account. The Mahalanobis distance, on the other hand, does use the covariance:

$$\delta_{mahalanobis}(x_i, \omega_j) = (x_i - \mu_{\omega_j})^T C_{\omega_j}^{-1} (x_i - \mu_{\omega_j}) \qquad (2)$$

The Bhattacharyya distance is a distance between two clusters, $\omega_1$ and $\omega_2$, both having a normal distribution:

$$\delta_{bhattacharyya}(\omega_1, \omega_2) = \frac{1}{8} (\mu_{\omega_1} - \mu_{\omega_2})^T \left( \frac{C_{\omega_1} + C_{\omega_2}}{2} \right)^{-1} (\mu_{\omega_1} - \mu_{\omega_2}) + \frac{1}{2} \ln \frac{\left| \frac{C_{\omega_1} + C_{\omega_2}}{2} \right|}{\sqrt{|C_{\omega_1}| \, |C_{\omega_2}|}} \qquad (3)$$

where $\mu$ and $C$ again indicate means and covariances, respectively. The first part of the Bhattacharyya distance is dominated by the difference in means, and the second part by the difference in covariances.

In the probabilistic case, where clusters are explicitly modeled as a distribution, such as a t-distribution or a normal distribution, the likelihood is used as the similarity function [22][23]. More details are discussed in section 5.

4.2 Including spatial information in the similarity measure

In (multivariate) images, the spatial information of a pixel $x_i$ consists of the cluster information of the neighboring pixels. Many neighborhood schemes, $N_i$, can be used; an often-used one is a square window centered at the pixel $x_i$. In principle, it is possible to define a similarity function that takes into account not only information in the feature domain, but also the clustering information of neighboring pixels. This can be done with a weight function $w(x_i, N_i, \omega_j)$ for the cluster $\omega_j$. Such a similarity function for comparing a pixel and a cluster can be expressed in two general forms:

Addition form:
$$\tilde{\delta}(x_i, \omega_j) = \delta(x_i, \omega_j) + w(x_i, N_i, \omega_j) \qquad (4)$$

Multiplication form:
$$\tilde{\delta}(x_i, \omega_j) = w(x_i, N_i, \omega_j) \cdot \delta(x_i, \omega_j) \qquad (5)$$

The spatial weight function $w(x_i, N_i, \omega_j)$ is defined differently depending on the particular clustering method, as discussed in more detail in section 5.
Similar expressions could be set up to compare two pixels or two clusters, but this has not appeared in the literature.
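The deterministic distances of section 4.1 are straightforward to implement. Below is a minimal sketch of Eqs. (1)-(3) in Python/NumPy; the function names are illustrative and the tutorial itself prescribes no implementation:

```python
import numpy as np

def minkowski(xi, xj, p=2):
    """Eq. (1): Minkowski distance; p = 2 gives the Euclidean distance,
    p = 1 the Manhattan distance."""
    return np.sum(np.abs(xi - xj) ** p) ** (1.0 / p)

def mahalanobis(xi, mu, cov):
    """Eq. (2): (squared) Mahalanobis distance of pixel xi to a cluster
    with mean mu and covariance matrix cov."""
    diff = xi - mu
    return float(diff @ np.linalg.inv(cov) @ diff)

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Eq. (3): Bhattacharyya distance between two normal clusters;
    the first term reflects the difference in means, the second the
    difference in covariances."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term_mean = diff @ np.linalg.inv(cov) @ diff / 8.0
    term_cov = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term_mean + term_cov)
```

As a sanity check, two identical normal clusters have Bhattacharyya distance zero, and `minkowski` with `p=2` reproduces the ordinary Euclidean distance.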

5. Clustering techniques

5.1 General ideas

One often sees clustering in a hard form, which assigns each pixel $x_i$ to one and only one cluster. A soft or fuzzy technique, on the other hand, assigns to each pixel $x_i$ a fractional degree of membership $u_{ij} \in [0,1]$ for each cluster. The higher the degree of membership $u_{ij}$, the more probable it is that pixel $x_i$ belongs to cluster j. A deterministic similarity is often used in hard clustering, and a probabilistic distance (or a fuzzy variant) in soft clustering. A soft clustering contains more information than a hard clustering, and it can be converted to a hard clustering.

Clustering techniques can in general be categorized into three main types: partitional clustering, hierarchical clustering, and density-based clustering, as illustrated in Figure 5. Each can be further subdivided; the best-known clustering algorithms in chemometrics are K-means, fuzzy C-means [9][19], hierarchical agglomerative [4][5], model-based [14] (or mixture modeling), and density-based [24] clustering methods.

Figure 5. A taxonomy of clustering methods: partitional (distance-based, e.g. K-means and C-means, or model-based), hierarchical (single-, average- and complete-linkage, or model-based), and density-based (mode-seeking or graph-based).

5.2 Partitional clustering

Ordinary partitional clustering without spatial information

Given a number of clusters, g, a partitional clustering technique seeks an organization of the pixels which optimizes a target function E. This can be a minimum or a maximum, depending on the clustering method; e.g., in K-means a compactness function is minimized, and in model-based clustering the log-likelihood is maximized. E can be written as:

$$E = \sum_{j=1}^{g} \sum_{x_i \in C_j} u_{ij} \, \delta(x_i, \omega_j) \qquad (6)$$

where $u_{ij}$ is the degree of membership of pixel $x_i$ in cluster $\omega_j$. Again, $u_{ij}$ is either 0 or 1 in hard partitional clustering methods. In fuzzy clustering, $u_{ij}$ is replaced by $u_{ij}^q$ with q > 1; often, q = 2 is used. An optimal solution to this clustering problem would require an exhaustive combinatorial search, which is not possible in practice. It is therefore estimated by an iterative process:

Algorithm:
(1) Start: The algorithm starts with an initial guess of the memberships $u_{ij} \in [0,1]$, often random.
(2) Iteration: A number of iterations are performed to improve the target function (Eq. 6) by updating the degrees of membership $u_{ij}$ according to the new centroids of the clusters. In hard partitional clustering, this means assigning each pixel to the cluster with the smallest $\delta(x_i, \omega_j)$: the membership is updated as $u_{ij} = 1$ if $\delta(x_i, \omega_j) = \min_k \delta(x_i, \omega_k)$, and $u_{ij} = 0$ otherwise.
(3) End: The algorithm ends if a stop criterion holds; otherwise it is repeated from step 2. The stop criterion can be a number of iterations, a threshold on the target function, or convergence of the solution. The algorithm basically provides a better solution with more iterations and more processing time.

A big advantage of partitional clustering is the computation time. The complexity is only N log(N), where N is the number of pixels. This makes it possible to apply the algorithm to even very large data sets. However, partitional clustering has several drawbacks:
- The number of clusters needs to be defined beforehand: the number-of-clusters problem.
- Most partitional clustering methods depend heavily on the initial guess. This may lead to very different results upon repeated application; a locally optimal solution is often obtained instead of the global optimum of the target function.
- The unequal-cluster-size problem may influence the clustering result, because the centre of a smaller cluster often tends to drift towards an adjacent larger cluster.
- Noise present in the data also interferes with the result of partitional clustering, by influencing the calculation of the new cluster centers. It is less influential in soft/fuzzy clustering, because pixels far from the center of a cluster, such as noise/outliers, are assigned a lower degree of membership.
Partitional clustering methods can be divided into deterministic and model-based approaches.

Deterministic partitional clustering

A deterministic partitional clustering is a partitional clustering in which the similarity function d(x_i, ω_j) is a distance. Different deterministic partitional clustering algorithms differ in the definition of the distance d(x_i, ω_j) and in the way the membership degrees u_ij are updated. The most popular hard deterministic partitional clustering is K-means, where d(x_i, ω_j) is the Euclidean distance d_eucl(x_i, ω_j). Some variants of the K-means algorithm select a different distance function, for instance the Mahalanobis distance [25], but the algorithm then tends to produce unusually large or unusually small clusters. Another variant of K-means is ISODATA clustering [26], designed to address the number-of-clusters problem. ISODATA starts with a high number of clusters and, unlike the ordinary partitional method, permits splitting a big cluster, merging two close clusters, and deleting a very small cluster. In this way the number of clusters is identified by the method itself. However, thresholds for cluster variance and cluster size need to be defined, which are difficult to control in practice.

Nowadays, much attention is paid to soft or fuzzy deterministic partitional clustering. Fuzzy C-means (FCM), or fuzzy K-means (FKM), is a famous example of this type [9][15][27][28][29]. During the iterations, the fuzzy membership u_ij is updated as a function of the distances to the clusters:

u_{ij} = 1 \Big/ \sum_{c=1}^{g} \left( \frac{d_{c\text{-means}}(x_i, \omega_j)}{d_{c\text{-means}}(x_i, \omega_c)} \right)^{\frac{1}{q-1}}   (7)

where q > 1 is the fuzziness index; normally q = 2. The similarity function is given by

d_{c\text{-means}}(x_i, \omega_j) = (x_i - \mu_{\omega_j})^T A \, (x_i - \mu_{\omega_j})   (8)

where A is a d × d symmetric, positive definite matrix and d is the feature dimension of the data set. The distance d_c-means(x_i, ω_j) equals d_mahalanobis(x_i, ω_j) when A is the inverse of the covariance matrix, or d_eucl(x_i, ω_j) when A is the identity matrix; the latter is the usual form of d_c-means(x_i, ω_j). The similarity function is defined differently in the FMLE (Fuzzy Modification of the Maximum Likelihood Estimation) algorithm [28] and in p-norm FCM [29], where an exponential distance and the Minkowski distance are employed, respectively.

A nice feature of fuzzy deterministic partitional clustering is that a pixel in an area of overlapping clusters is not assigned a very high membership to any cluster, so it does not influence the cluster parameters very much. In other words, such a pixel always carries a larger uncertainty. The same holds for outliers/noise. As an example, the results of clustering the SYN image into two clusters, corresponding to the white and black areas, by the K-means and FCM algorithms are plotted in Figure 6. Many pixels in the overlap area are misclassified.

Figure 6. Clustering result of the SYN image into two clusters (white and black): a) K-means, b) fuzzy C-means, c) hard clustering derived from the fuzzy C-means result. The gray pixels indicate fuzzy memberships in the fuzzy C-means result.

The results of clustering the MEAT image by K-means and the hard result based on FCM are given in Figure 7. Due to the overlapping clusters, both show quite similar problems.
Many dark-meat areas are replaced by light meat, and the fat spots extend over the light-meat regions. However, the problem is smaller with FCM.
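The membership update of Eq. 7 with the Euclidean distance (A = I in Eq. 8) can be sketched as below; this is an illustrative helper for this tutorial, and the clamping of zero distances is our own numerical safeguard.

```python
import numpy as np

def fcm_memberships(X, centroids, q=2.0):
    """Fuzzy C-means membership update (Eq. 7), Euclidean distance case.
    X: (N, d) data, centroids: (g, d). Returns (N, g) memberships."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)           # avoid division by zero at a centroid
    # ratio[i, j, c] = d(x_i, w_j) / d(x_i, w_c); squared distances enter
    # Eq. 7 with power 1/(q-1), i.e. 2/(q-1) on the norms
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (q - 1.0))
    return 1.0 / ratio.sum(axis=2)     # rows sum to 1
```

A pixel midway between two centroids receives memberships near 0.5/0.5, which is exactly the "larger uncertainty" behaviour described above.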

Figure 7. Clustering of the MEAT image by (a) K-means, (b) fuzzy C-means.

Model-based clustering (MBC)

Model-based clustering, sometimes also called mixture modeling, is a soft partitional clustering based on a statistical approach [23][30]. Every cluster c is described by a multivariate distribution f with parameters θ_c. For example, for the Gaussian distribution, the one most often used, θ_c contains the mean µ_c and covariance C_c. The total data set is described by a linear combination of the individual clusters, where the coefficients correspond to the mixture proportions π_c. The probability density function of pixel x_i under a g-component (cluster) mixture is given by

f(x_i; \Psi) = \sum_{c=1}^{g} \pi_c \, f(x_i; \theta_c)   (9)

The probabilistic likelihood function is then given by

L(\Psi) = \prod_{i=1}^{n} f(x_i; \Psi)   (10)

where Ψ contains all cluster parameters and mixture proportions. The aim of model-based clustering is to obtain a configuration Ψ that maximizes the log-likelihood log L(Ψ). This is equivalent to optimizing

\log L(\Psi) = \sum_{i=1}^{n} \sum_{c=1}^{g} u_{ic} \log\left( \pi_c \, f(x_i; \theta_c) \right)   (11)

where u_ic corresponds to the conditional probability that object x_i belongs to cluster c. The maximization of the log-likelihood is analogous to the optimization of the compactness function (Eq. 6). It is usually performed by the EM (Expectation-Maximization) algorithm [31]. Following the general procedure for partitional clustering, step 2 is split into two sub-steps in the EM algorithm: the M-step (maximization step), maximizing over π_c and θ_c, and the E-step (conditional expectation step), estimating u_ic. The E- and M-steps are iterated until convergence, or until the number of iterations exceeds a certain threshold.

Partitional clustering with spatial information

Including spatial information, i.e. class information of neighboring pixels, may enable a clustering method to distinguish two clusters that are close together in feature space, but

far apart in the image. Moreover, it will smoothen the result. Although in many cases a somewhat noisy classified image may be very well interpretable by an expert, there are also cases where the noise seriously decreases the quality of the clustering. Furthermore, automatic assessment of the areas of the different clusters (by counting pixels) will be less reliable in the presence of noise or outliers. In all cases, taking spatial information into account reduces the overlap problem in clustering.

Spatial information in deterministic partitional clustering

The spatial information of a multivariate image can be taken into account by using appropriate distances d̃(x_i, ω_j), as in Eqs. 1 and 2. The compactness function then becomes

\tilde{E} = \sum_{j=1}^{g} \sum_{x_i \in C} (u_{ij})^q \, \tilde{d}(x_i, \omega_j)   (12)

In general, many spatial weight functions are possible. This concept has been applied in [19][32] for fuzzy C-means. As an example of the additive inclusion of spatial information, in robust fuzzy C-means (RFCM) [32] the distance function d̃(x_i, ω_j) is defined as

\tilde{d}(x_i, \omega_j) = d(x_i, \omega_j) + \frac{\beta}{2} \sum_{l \in N_i} \sum_{m \in C \setminus \omega_j} (u_{lm})^q   (13)

where u_lm is the conditional probability that pixel x_l in the neighbor scheme N_i belongs to cluster m, which is not ω_j. The parameter β is a positive spatial dependency parameter: larger values of β encourage neighbors to be in the same cluster. RFCM is identical to standard FCM when β = 0.

As an example of the multiplicative inclusion of spatial information (Eq. 5), a spatial weight function is defined in this paper as

w(x_i, N_i, \omega_j) = \frac{\exp\left(\beta \sum_{l \in N_i} u_{lj}\right)}{\sum_{c=1}^{g} \exp\left(\beta \sum_{l \in N_i} u_{lc}\right)}   (14)

The parameter β is again a positive spatial dependency parameter, and larger values encourage neighbors to be in the same cluster. The resulting modification of standard fuzzy C-means clustering is called Spatial Conditional FCM (SCFCM) clustering. Figure 8 illustrates the effectiveness of integrating spatial information into the clustering of the SYN and MEAT data by SCFCM. The SYN result (Figure 8a) is consistent with the design of the image data.
In Figure 8b, the result on the MEAT data, the dark-meat regions are larger and the fat regions coincide with the regions of light spots in the original image (Figure 2).
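A weight of the form of Eq. 14 can be sketched as follows. The 4-connected neighbor scheme and the periodic treatment of the image edges (via `np.roll`) are simplifications made for this illustration, not part of the SCFCM definition.

```python
import numpy as np

def spatial_weights(U, h, w, beta=1.0):
    """Spatial weight in the spirit of Eq. 14: a softmax, per pixel, over the
    summed memberships of its 4-connected neighbours.
    U: (h*w, g) fuzzy memberships on an h x w grid; beta > 0 encourages
    neighbours to share a cluster. Returns (h*w, g) weights."""
    g = U.shape[1]
    Ug = U.reshape(h, w, g)
    S = np.zeros_like(Ug)
    # sum the memberships of the four neighbours (edges wrap, a simplification)
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        S += np.roll(np.roll(Ug, dy, axis=0), dx, axis=1)
    E = np.exp(beta * S)
    W = E / E.sum(axis=2, keepdims=True)   # normalize over the clusters
    return W.reshape(-1, g)
```

With β = 0 the weights are uniform (1/g), recovering standard FCM behaviour; larger β pulls each pixel towards the majority class of its neighbours.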

Figure 8. Clustering results using Spatial Conditional FCM (SCFCM) with spatial information: (a) the SYN data; (b) the MEAT data.

GGC-FCM (Geometrically Guided Conditional FCM) is another example [19]. It follows the general multiplicative form d̃(x_i, ω_j) = k_i d(x_i, ω_j) described in [33], where k_i corresponds to the condition for pixel x_i, which is equivalent to the spatial weight w(x_i, N_i, ω_j). This condition value is determined by the majority class of the neighboring pixels in N_i. More discussion of the condition value can be found in [19].

Spatial information in model-based clustering

As in deterministic partitional methods, a spatial continuity weight function w(x_i, N_i, ω_j) can also be included in model-based approaches. Probably the most often used weight function is based on Markov random field (MRF) theory [13][34]. Given the neighbor scheme N_i, the simplest weight function for a model-based clustering can be defined as

w(x_i, N_i, \omega_j) = \frac{\exp\left(\beta \sum_{l \in N_i} u_{lj}\right)}{\sum_{c=1}^{g} \exp\left(\beta \sum_{l \in N_i} u_{lc}\right)}   (15)

where β is the spatial continuity parameter; more positive values encourage neighbors to belong to the same cluster. The new similarity function d̃(x_i, ω_j) is then formed, usually as in Eq. 5, so that the product of the weight w and the likelihood is maximized. The weight w approaches 1 if all neighboring pixels are in the same class as x_i; otherwise it is smaller.

Research in this field became very active after the work in [35]. The same author proposed the famous ICM (Iterated Conditional Modes) algorithm [13]. ICM estimates the maximum of the marginal probabilities, which is equivalent to optimizing the log-likelihood L(Ψ). This is in fact the conventional EM algorithm using conditional probabilities estimated with the spatial information taken into account. More detail on the modification of the EM algorithm with respect to the changed posterior probabilities is given in [23].

The neighbor scheme N_i and the smoothness parameter β are often chosen manually; automatic adjustment is also possible [13][34]. It is not very difficult to find good setting values for a small image or a small area.
However, for a large image presenting many different types of objects or structures, there may be no single parameter value for which good results are obtained. In such a case, many different local parameter values may be needed, and a multi-scale or multi-resolution approach is required [36][37].
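ICM amounts to the conventional EM iteration with the conditional probabilities re-weighted by w. A minimal EM sketch for the plain mixture model of Eqs. 9-11, restricted here to spherical Gaussians for brevity (the spatial weight of Eq. 15 would multiply the densities in the E-step), could look like this; the optional explicit initialization is our own addition:

```python
import numpy as np

def em_gmm(X, g, mu0=None, n_iter=50, seed=0):
    """EM for a mixture of spherical Gaussians (Eqs. 9-11), a minimal sketch.
    Returns the memberships u, the cluster means and the mixing proportions."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = (np.asarray(mu0, dtype=float) if mu0 is not None
          else X[rng.choice(N, g, replace=False)].astype(float))
    pi = np.full(g, 1.0 / g)
    var = np.full(g, X.var())
    for _ in range(n_iter):
        # E-step: u_ic = pi_c f(x_i; theta_c) / f(x_i; Psi)      (Eq. 9)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        dens = pi * (2 * np.pi * var) ** (-d / 2) * np.exp(-sq / (2 * var))
        u = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximize the log-likelihood (Eq. 11) over pi_c, theta_c
        nc = u.sum(axis=0)
        pi = nc / N
        mu = (u.T @ X) / nc[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (u * sq).sum(axis=0) / (d * nc) + 1e-9
    return u, mu, pi
```

The soft memberships u play exactly the role of u_ic in Eq. 11; hard labels follow by taking the cluster of maximal membership for each pixel.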

For demonstration purposes, clustering results using ordinary MBC (using no spatial information) and ICM are reported in Figure 9. The clustering results are compared with the reference information in Figure 3b (excluding the yellow area). The ICM algorithm, which takes the spatial information of the image into account, shows better results: not only is the agreement with the ground truth higher, but the image also looks much smoother. The parameters used in ICM are β = 0.2 and, for N_i, a square window of 11 × 11 pixels centered at x_i.

Figure 9. Clustering results for the SAR image: a) the best MBC after 50 random initializations, with 71% accuracy; b) the best ICM after 50 random initializations with β = 0.2, with an accuracy of 81%, on the area having reference information (excluding the yellow area in Figure 3b).

Agglomerative hierarchical clustering

Agglomerative hierarchical clustering (AHC) mostly refers to hard deterministic hierarchical clustering. It yields a hierarchical structure of clusters, representing how cluster pairs are joined. Conceptually, it is a simple idea that follows naturally from the concepts of distance and similarity [4][9][15][28][29]. In principle, the algorithm is as follows:

Algorithm:
(1) Start: assign each pixel to an individual cluster, yielding N clusters.
(2) Iteration: the similarities d(ω_i, ω_j) between all cluster pairs i and j are calculated and the two closest clusters are merged.
(3) End: the algorithm ends when only one cluster remains.

Several variants of AHC exist: single linkage, complete linkage, average linkage, centroid linkage, and Ward's clustering, depending on the definition of the distance between clusters. In single linkage, the distance between two clusters is the distance between their two nearest points. Analogously, the distance is the maximal distance between points, the average distance between points, and the distance between the centers of mass in complete linkage, average linkage, and centroid linkage, respectively. Again, the distances can be Euclidean, Manhattan, or more generally Minkowski distances.
In Ward's clustering, the distance between two clusters i and j is a weighted version of the squared Euclidean distance between the cluster mean vectors:

d(\omega_i, \omega_j) = \frac{n_i n_j}{n_i + n_j} \, d_{eucl}(\mu_i, \mu_j)^2   (16)

where n_i, n_j and µ_i, µ_j are the numbers of points and the means of clusters i and j, respectively.
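The AHC algorithm above, with the Ward criterion of Eq. 16 as the between-cluster distance, can be sketched naively as follows; the quadratic-to-cubic cost makes this fine for a toy example but, as discussed below, impractical for image-sized data.

```python
import numpy as np

def ward_ahc(X, g):
    """Naive agglomerative clustering with the Ward criterion (Eq. 16):
    start from singletons and repeatedly merge the pair of clusters with the
    smallest (n_i * n_j / (n_i + n_j)) * ||mu_i - mu_j||^2.
    Returns the index sets of the g remaining clusters."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > g:
        means = [X[c].mean(axis=0) for c in clusters]
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d = na * nb / (na + nb) * np.sum((means[a] - means[b]) ** 2)
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters
```

Stopping at g clusters corresponds to cutting the dendrogram at the level that yields g branches.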

The result of AHC is a dendrogram, representing the nested clusters and the similarity levels at which clusters are joined. The dendrogram can be cut at several levels in order to obtain any number of clusters. This property makes it easy to compare many different numbers of clusters; however, determining a good number of clusters remains difficult. Several criteria will be mentioned below. Visualization of a dendrogram is only useful for a small data set, although in the field of microarray data analysis large dendrograms are often shown, e.g. in the analysis of gene expression profiles in [5][38].

In contrast to partitional clustering, AHC methods are very stable, for two reasons. First, the clustering is always initialized in the same way. Second, the algorithm considers only clusters that were obtained in the previous step. This means that once a point has been merged into a cluster, it cannot be considered for joining other clusters in later iterations. In some cases this is an advantage, but it also decreases flexibility, a drawback of AHC. Chaining, or "friends-of-friends", is a typical problem of single-linkage AHC, in which series of smaller clusters are merged into an elongated chain.

AHC works on the distance matrix at every iteration. The size of the distance matrix, the square of the number of objects, can be very large; AHC is therefore very susceptible to the image-size problem. Because of this, AHC is rarely applied directly to an image data set. If the data set contains noise or outliers, these are kept in separate clusters and do not influence the other clusters. In that case, the real number of clusters can only be determined after the clusters containing noise/outliers, which are normally very small, have been eliminated [16].

These characteristics can be demonstrated by applying AHC to the SYN image. As expected, single linkage produces one very small cluster containing only outliers, shown in Figure 10a. The results of complete linkage and average linkage are given in Figures 10b and c.
The AHC concept can be extended to a model-based variant in which the classification likelihood is used [22][30]:

L_{CL} = \prod_{i=1}^{n} f(x_i; \theta_{c_i})   (17)

where f is a multivariate distribution with parameters θ_{c_i} for the cluster c_i to which x_i is assigned. Model-based agglomerative hierarchical clustering operates by successively merging the pair of clusters corresponding to the greatest increase in the classification likelihood L_CL. This method is equivalent to Ward's AHC method when f is multivariate normal with uniform spherical covariance [30].

Just as spatial and spectral information can be integrated in partitional clustering, spatial information can conceptually also be used in AHC: distances could be estimated by d̃(x_i, ω_j) at any iteration, taking the spatial weight function into account. So far, however, no such work on AHC has been reported.

Figure 10. AHC clustering applied to the SYN image: a) single linkage, b) complete linkage, c) average linkage. The horizontal line indicates the cutting level used to obtain two clusters.

Density-based methods

Besides the hierarchical and partitional approaches, density-based clustering methods, such as Denclust [39], CLUPOT [40] and DBSCAN [41], form a third clustering type. Density-based clustering estimates densities around individual objects. It is basically a hill-climbing procedure towards a local density maximum [42]. Each local maximum then constitutes a cluster, and the cluster boundaries are given by the low-density areas (valleys), which are determined by a density threshold. This threshold, together with the size of the volume for which the local density is estimated, are the two main parameters of the method. Once these parameters are set, the number of clusters follows automatically. Density-based clustering was first presented in the form of mean-shift or mode-seeking methods,

based on an estimation of the gradient of local density functions, proposed in [43] and further improved in [42]. Basically, the density estimate for a particular point x_i is given by the number of points in a particular volume around that point, V_{x_i}. Variable-kernel methods use a kernel function K(x_i) to give more weight to points close to x_i; Gaussian and triangular kernels are often used. A good review of non-parametric density estimation methods can be found in [1].

Density-based clustering was originally designed to detect clusters of arbitrary shape and to isolate noise, and in these respects it has advantages over other clustering methods. However, it has been shown that current density-based clustering fails to identify both large and small clusters simultaneously when their densities are very different [17]: low-density clusters tend to be assigned as noise/outliers. The method spends most of its computation time on evaluating the density estimation function for each object, which is very demanding. This makes density-based methods hard to apply to multivariate image data, in contradiction to conclusions from [24]. Moreover, determining the right parameters for a density-based clustering method can be challenging: there is no good way to identify them automatically, and in practice a trial-and-error strategy is used.

Density-based clustering in general also has problems with overlapping clusters. The area of overlap often has a higher density than the neighboring areas. This prevents density-based clustering from separating two overlapping clusters; it tends instead to merge them, or to create a new cluster for the overlap region. These are the main reasons why density-based clustering is not widely used for multivariate images.

Graph-based clustering [44][45] is a special case of density-based clustering. In graph-based clustering, pixels, the nodes in a graph, are connected based on a neighborhood function. A weak link is defined by a low number of neighbor links.
The clustering process is then a spanning process that identifies groups of connected nodes after all weak links have been broken (disconnected). The strength of a link is analogous to the density function.

DBSCAN and OPTICS [41] are well-known density-based clustering methods that have recently been applied in chemometrics [24]. Denclust [39] is a generalization of DBSCAN using a Gaussian kernel. DBSCAN uses a fixed volume, and the density threshold is defined by two parameters, ε and minPts: the radius of the volume centered at a particular point, and the minimum number of points in that volume, respectively.

The properties of density-based clustering are illustrated by a simple example of applying DBSCAN to the SYN image data in Figure 11. Figure 11a shows the result and the density plot (the value is the number of points in the volume, without normalization to the total volume) when DBSCAN is applied with a high density threshold, ε = 0.27 and minPts = 100: pixels in the second cluster are classified as noise. The best clustering result is obtained with a lower density threshold, ε = 0.19 and minPts = 70, which yields two clusters (Figure 11b). If the density threshold is even lower, ε = 0.13 and minPts = 50, three clusters are obtained, the overlap area showing a peak that forms a separate cluster (Figure 11c).

There is no report of integrating spatial information with density-based clustering. In general this is harder, because density-based clustering does not use distances that can be extended with spatial information. To include spatial information, the density function would have to be estimated in a new feature space that extends the original one; the parameters would then be even more difficult to estimate.
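The core of DBSCAN, with its two parameters ε and minPts, can be sketched as follows. This toy version builds the full N × N distance matrix, which itself illustrates the computational burden discussed above; a practical implementation would use a spatial index instead.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch. The local density of a point is the number of
    points within radius eps; points with fewer than min_pts neighbours that
    are not reachable from any core point are labelled noise (-1)."""
    N = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neigh = [np.flatnonzero(dist[i] <= eps) for i in range(N)]
    labels = np.full(N, -1)
    cid = 0
    for i in range(N):
        if labels[i] != -1 or len(neigh[i]) < min_pts:
            continue
        labels[i] = cid                 # i is a core point: start a cluster
        stack = list(neigh[i])
        while stack:                    # expand over density-reachable points
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid
                if len(neigh[j]) >= min_pts:
                    stack.extend(neigh[j])
        cid += 1
    return labels
```

Raising ε or lowering minPts lowers the density threshold, which is exactly the knob varied in the Figure 11 example.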

Figure 11. Application of DBSCAN to the SYN data set. The left side shows the results in the 1-D feature space (red: first cluster, pink: second cluster, black: third cluster, blue: noise); the right side shows the density plot (the value is the number of points in the volume, without normalization to the total volume): (a) one cluster obtained with ε = 0.27 and minPts = 100 (the red line), (b) two clusters obtained with ε = 0.19 and minPts = 70, and (c) three clusters obtained with ε = 0.13 and minPts = 50.

Choosing a good number of clusters

Not many clustering algorithms can determine a good number of clusters automatically. In many cases the user needs to define the number of clusters, either directly, in partitional clustering, or indirectly, in hierarchical and density-based clustering. In general, a good number of clusters can be found by running an algorithm many times with different numbers of clusters and comparing the results with a criterion. The most popular criteria for deterministic partitional clustering are the Davies-Bouldin, Dunn, C-, and Goodman-Kruskal indices [46]. In model-based clustering, on the other hand, the optimal number of clusters corresponds to the best fit of the data. AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) are the most popular criteria for mixture-model clustering [23]. For example, in hierarchical model-based clustering [30], the BIC criterion is used to find an appropriate cutting level and, as a result, the best number of clusters. If the clusters can be described by normal distributions, these indices often perform very well. For density-based clustering there is no useful criterion to determine the number of clusters.

In many cases, the number of clusters in a multivariate image can also be determined by visualizing the clustering result, using some prior knowledge about the structures present in the image surface. This technique has been applied frequently, e.g. in the detection of brain tumors in MRI images [14].
The only criterion to date that takes spatial information into account is the PLIC criterion [47], an extension of the BIC criterion. The conditional likelihood is estimated locally, i.e. from the immediate spatial neighborhood of each pixel. This criterion can be used for model-based clustering such as ICM clustering.
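For concreteness, BIC in the form commonly used for mixture-model selection (the 2 log L − m log n convention of model-based clustering, where m is the number of free parameters; note that sign conventions vary across references) can be written as:

```python
import numpy as np

def bic(log_likelihood, n_params, n_obs):
    """BIC for mixture-model selection: 2*logL - m*log(n).
    Under this convention, higher values indicate a better model."""
    return 2.0 * log_likelihood - n_params * np.log(n_obs)
```

In practice the mixture model is fitted for a range of g values and the g with the highest BIC is retained; the log(n) penalty discourages spuriously many clusters.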

Application to image data

Image data is normally quite large and contains noise/outliers and overlapping clusters. A soft partitional clustering is a good option if the number of clusters and a good initial set u_ij ∈ [0, 1] are available. Unfortunately, this is often not the case, and the result may depend strongly on the random initial state. AHC, on the other hand, is more stable but hard to apply to image data directly because of the image-size problem. Hence, AHC is normally used to estimate the correct number of clusters and their parameters from a small representative subset of the image data [30]. The problem then becomes how to obtain this subset. The simplest solution is to draw it randomly from the whole data set; however, there is a real danger of missing a small cluster, and more complicated methods may be needed [48]. Another option is to apply a partitional clustering to obtain a large number of cells, which are then joined together by AHC. This is much cheaper than starting AHC from singletons [16].

Both density-based clustering and AHC suffer from time complexity and computer resource problems. In most cases they are preferred for small multivariate images, such as MRI images [14][15], although AHC is still applied to larger images, e.g. in the analysis of gene expression profiles [5][38] or in electron probe X-ray microanalysis [6].

6. Pre- and post-processing

In many cases, accuracy may improve by pre-processing the raw image data or post-processing the clustering result. The effectiveness, however, depends on the clustering method and the particular image data. Pre-processing methods include smoothing techniques, to decrease the amount of noise in the image data, and dimension reduction techniques, to decrease the computational demands. In post-processing, the most often used technique is smoothing, performed by a majority-voting procedure.
Again, this mainly serves to decrease noise in the clustering result.

6.1. Noise/outliers

One of the most often used pre-processing techniques is the removal of unwanted noise/outliers from the raw multivariate image. This is an important task, necessary for clustering methods that are sensitive to noise/outliers, such as K-means. It is normally called spatial filtering (low-pass filtering) or smoothing. Many spatial filtering techniques have been proposed for gray-scale images, based on local averaging of the mean intensity value over a local neighborhood at each image pixel. Smoothing filters, such as mean-value smoothing and median filtering, are the most popular methods. Readers are referred to [49] for a complete review of image filtering methods. In the simplest case, these filtering techniques can be extended to multivariate images by filtering each variable (parameter) individually. However, because they rely only on the raw data, without any knowledge of the underlying structure, these techniques tend to displace structures and blur their boundaries. This side effect critically influences many clustering methods. Thus, this kind of filtering is recommended only when the image does not contain many boundaries; in that case a simple method such as median filtering can do the job. Otherwise, it is recommended to use clustering methods that can deal with noise, such as MRF model-based clustering. As an example, the SYN data is filtered by median filtering and clustered using the fuzzy C-means algorithm in Figure 12a-b. The result is much better than the result on the original raw image data (Figure 6b).
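Channel-wise median filtering of a multivariate image can be sketched as follows; the reflected padding at the image edges is our own choice for this illustration (other border treatments are equally common).

```python
import numpy as np

def median_filter(img, k=3):
    """Median smoothing of a multivariate image, channel by channel.
    img: (h, w, c) array; k: odd window size. Edges use a reflected pad."""
    h, w, c = img.shape
    p = k // 2
    pad = np.pad(img, ((p, p), (p, p), (0, 0)), mode="reflect")
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            # per-channel median over the k x k window centred at (y, x)
            out[y, x] = np.median(pad[y:y + k, x:x + k], axis=(0, 1))
    return out
```

An isolated outlier pixel is replaced by the median of its neighbourhood, while large homogeneous regions are left untouched; this is also where the boundary-blurring side effect originates, since the median mixes values from both sides of an edge.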

6.2. Dimension reduction

The dimensionality has to be kept as small as possible to improve the clustering performance, because of the high-feature-dimension problem. In many cases not all feature variables are important for clustering, and one may select a subset of variables that together still capture most of the information in the image data. Moreover, calculating distances over many uninformative variables may totally obscure the cluster structure. Prior knowledge may help to decide which wavelengths to use. In other cases, feature selection is an optimization problem, for which methods such as SA (simulated annealing), GA (genetic algorithms), or tabu search can be applied [50][51].

Projection methods form an alternative to feature selection. Linear transformations, such as PCA (principal component analysis) and ICA (independent component analysis), and non-linear mappings, such as SOM (self-organizing maps) and non-linear PCA, have been widely used [30][50]. The original feature space is then mapped to a latent space, in which the number of latent features is small and suitable for the clustering algorithm. However, the structure of the clusters may change, sometimes in such a way that clusters disappear or start to overlap [52]. Note that neither feature selection nor projection methods take spatial information into account.

6.3. Filtering of the clustering result

Filtering is not only used as a pre-processing step but can also be applied in post-processing of a clustering result. The only difference is that pre-processing uses the full information in the raw image data, whereas post-processing considers only the clustering result: point intensities are replaced by cluster labels. Noise/outliers are then also considered members of clusters. Hence, depending on the clustering algorithm, if noise/outliers are still present in the image, they will be smoothed out by this filtering.
For mixed points in an area of cluster overlap, such as in the SYN data set, intensities are not changed much by a pre-processing filter when the spatial neighbors are also in the overlap area, which is often the case when the window size is small. This problem is smaller for filtering applied as a post-processing method, where the neighborhood intensities have been replaced by the cluster labels. These properties are illustrated on the SYN data set in Figure 12a-c. Figure 12a shows the pre-processed image, and the subsequent fuzzy C-means clustering result is shown in Figure 12b. The result is compared with the situation where fuzzy C-means is applied directly to the original SYN image and the clustering result is post-processed by filtering, as in Figure 12c. As expected, this clustering result is better than the result obtained using filtering as pre-processing.

Figure 12. Pre- and post-processing on the SYN image: (a) image data after pre-processing by median filtering with a 3 × 3 window, (b) the fuzzy C-means result of clustering the pre-processed data, (c) post-processing result: fuzzy C-means applied to the original SYN image, followed by median filtering of the clustering result with a 3 × 3 window.
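The majority-voting post-processing mentioned above operates on labels rather than intensities; a sketch, again with a reflected pad at the edges as our own simplification:

```python
import numpy as np
from collections import Counter

def majority_vote(labels, k=3):
    """Post-processing smoothing: replace each pixel's cluster label by the
    majority label in its k x k window (edges use a reflected pad).
    labels: (h, w) integer label image."""
    h, w = labels.shape
    p = k // 2
    pad = np.pad(labels, p, mode="reflect")
    out = np.empty_like(labels)
    for y in range(h):
        for x in range(w):
            win = pad[y:y + k, x:x + k].ravel()
            out[y, x] = Counter(win).most_common(1)[0][0]
    return out
```

Because the vote is taken over discrete labels instead of raw intensities, an isolated misclassified pixel in an overlap area is flipped to the surrounding class, which is why post-processing outperforms pre-processing in the Figure 12 comparison.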

The same scenario is performed on the SAR image in Figure 13a-c. Median filtering with a 5 × 5 window is used for both pre- and post-processing. Fuzzy C-means with post-processing of the clustering result is better than with only the pre-processing technique: the accuracies are 77% and 64%, respectively. Fuzzy C-means without any pre- or post-processing achieves only 51%.

Figure 13. Pre- and post-processing on the SAR image: (a) false-color image of the SAR data after pre-processing using median filtering with a 5 × 5 window, (b) fuzzy C-means applied to the filtered image data, yielding an accuracy of 64%, (c) the result after post-processing filtering of fuzzy C-means applied to the original image data, with an accuracy of 77%, (d) the fuzzy C-means result applied to the original image data, with an accuracy of only 51%.

7. Conclusion

This tutorial provides a broad survey of the most basic clustering techniques for multivariate images. It gives guidelines for determining the most relevant clustering for a particular multivariate image data set, depending on the list of image data problems. In many cases, partitional clustering techniques that take spatial information into account form the best option for a large image, provided the number of clusters is known or can easily be estimated. The situation is more difficult if this information is unknown; then a process of trial and error using statistical criteria and visualization is an option. Careful pre- and post-processing can reduce the effect of noise/outliers and overlapping clusters. However, incorrect use of these techniques can disturb or blur structures in the image; clustering techniques that take spatial information into account can deal better with these situations.

Some problems remain for clustering multivariate images. A good clustering of a particular image using spatial information needs a good setting of the parameters. Automatic settings do not always give a good result; in many cases the setting has to be obtained by a trial-and-error strategy and personal experience.
This is even more difficult for a larger image, where more than one set of parameters may be required.

Furthermore, clustering multivariate images always has to deal with the huge-data problem, because the development of image scanner technology is at the moment often faster than that of computer technology. Validating the clustering result is another problem, due to the lack of reference information.

8. Acknowledgements

We thank Jacco C. Noordam, Department of Production & Control Systems, Agrotechnological Research Institute (ATO), and Dirk H. Hoekman, Department of Environmental Sciences, Wageningen University, for sharing the data sets.

References

[1] A. Webb, Statistical Pattern Recognition, Wiley, Malvern, UK.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, London.
[3] A. K. Jain and M. N. Murty, ACM Computing Surveys, 31 (1999).
[4] A. Smolinski, B. Walczak and J. W. Einax, Chemom. Intell. Lab. Syst., 64 (2002).
[5] J. Liang and S. Kachalo, Chemom. Intell. Lab. Syst., 62 (2002).
[6] I. Bondarenko, H. Van Malderen, B. Treiger, P. Van Espen and R. Van Grieken, Chemom. Intell. Lab. Syst., 22 (1994).
[7] A. Linusson, S. Wold and B. Nordén, Chemom. Intell. Lab. Syst., 44 (1998).
[8] F. Ros, M. Pintore and J. R. Chrétien, Chemom. Intell. Lab. Syst., 63 (2002).
[9] P. Teppola, S.-P. Mujunen and P. Minkkinen, Chemom. Intell. Lab. Syst., 45 (1999).
[10] P. Teppola, S.-P. Mujunen and P. Minkkinen, Chemom. Intell. Lab. Syst., 41 (1998).
[11] U. Thissen, H. Swierenga, A.P. de Weijer, R. Wehrens, W.J. Melssen and L.M.C. Buydens, Multivariate Statistical Process Control Using Mixture Modelling, submitted for publication (2004).
[12] Stan Z. Li, Markov Random Field Modeling in Image Analysis, Springer-Verlag, Tokyo.
[13] J. Besag, J. R. Statist. Soc. B, 48 (1986).
[14] R. Wehrens, A. W. Simonetti and L. M. C. Buydens, J. Chemom., 16 (2002).
[15] D.L. Pham and J. L. Prince, IEEE Trans. on Medical Imaging, 18 (1999).
[16] T. N. Tran, R. Wehrens and L. M. C. Buydens, Anal. Chim. Acta, 490 (2003).
[17] T.N. Tran, R. Wehrens and L.M.C.
Buydens, 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas (URBAN 2003), Proceedings of the Conference, May 2003, Berlin, Germany, 2003.
[18] R. Wehrens, L.M.C. Buydens, C. Fraley and A.E. Raftery, Model-based clustering for image segmentation and large datasets via sampling, Techn. Report no. 424, Dept. of Statistics, University of Washington.
[19] J. C. Noordam and W. H. A. M. van den Broek, J. Chemom., 16 (2002).
[20] J.C. Noordam, W.H.A.M. van den Broek and L.M.C. Buydens, Chemom. Intell. Lab. Syst., 64 (2002).
[21] D.H. Hoekman and M.A.M. Vissers, IEEE Trans. on Geoscience and Remote Sensing, 41 (2003).
[22] C. Fraley, SIAM J. Sci. Comput., 20 (1998).
[23] G. McLachlan and D. Peel, Finite Mixture Models, Wiley Series in Probability and Statistics, John Wiley & Sons, Canada, 2000.
[24] M. Daszykowski, B. Walczak and D. L. Massart, Chemom. Intell. Lab. Syst., 56 (2001).
[25] J. Mao and A.K. Jain, IEEE Trans. on Neural Networks, 7 (1996).
[26] G.H. Ball and D.J. Hall, ISODATA, a novel method of data analysis and pattern classification, Techn. Rep., Stanford Research Institute, Menlo Park, CA.

32 CHAPTER 2 [27] J.C. Bezdek, Pattern Recognton wth Fuzzy Objectve Functon Algorthms, Plenum, New York, [28] I. Gath and A. B. Geva, IEEE Trans. on Pattern Anal. Mach. Intell., 11 (1989) [29] L. Bobrowsk and J.C. Bejdek, IEEE Trans. on Systems Man and Cybernetcs, 21 (1991) [30] C. Fraley and A. E. Raftery, J. Am. Stat. Assoc., 97 (2002) [31] A.P. Dempster, N.M. Lard and D.B. Rubn, J. R. Statst. Soc. B, 39 (1977) [32] D.L. Pham, Computer Vson and Image Understandng, 84 (2001) [33] W. Pedrycz, Pattern Recognton Letters, 17 (1996) [34] D. Geman and S. Geman, IEEE Trans. Pattern Mach. Intell., 6 (1984) [35] J. Besag, The Statstcan, 24 (1975) [36] W. Qan and D.M. Ttterngton, Phlosophcal Trans. R. Soc. of London A, 337 (1991) [37] I.V. Cadez and P. Smyth, Modelng of nhomogeneous Markov Random Felds wth applcatons to cloud screenng, Techncal Report No Irvne: Department of Informaton Scence, Unversty of Calforna Irvne, [38] M. Esen, P. Spellman, P. Brown and D. Botsten, PNAS, 95 (1998) [39] A. Hnneburg and D. A. Kem, Knowledge Dscovery and Data Mnng (1998). Proceedngs of the Conference, 1998, pp [40] D. Coomans and D. L. Massart, Anal. Chm. Acta, 133 (1981) [41] M Ester., H.-P. Kregel, J. Sander and X. Xu, Knowledge Dscovery and Data Mnng, Processdngs of the Conference, 1996, pp [43] K. Fukunaga and L.D. Hostetler, IEEE Trans. Info. Theory, 21 (1975) [42] Yzong Cheng, IEEE Trans. Pattern Anal. Mach. Intell., 17 (1995) [44] G. Karyps, E.-H. Han and V. Kumar, IEEE Computer, 32 (1999) [45] S. Guha, R. Rastog, K. Shm, Informaton Systems 25 (2000) [46] S. Günter and H. Bunke, Patt. Recog. Lett., 24(2003) [47] D.C. Stanford, A. E. Raftery, IEEE Trans. on Pattern Anal. and Mach. Intell., 24 (2002) [48] C. Fraley, A. E. Raftery and R. Wehrens, Incremental Model-Based Clusterng for Large Datasets wth Small Clusters, Techn. Rep. no. 439, Dept. of Statstcs, Unversty of Washngton, Dec [49] P. Gelad, H. Grahn, Multvarate mage analyss, Wley, New York, [50] A.K. Jan and D. 
Zongker, IEEE Trans. Pattern Anal. Mach. Intell., 19 (1997) [51] J.A. Hageman, R. Wehrens, H.A. van Sprang and L.M.C. Buydens, Anal. Chm. Acta, 490 (2003) [52] W. C. Chang, Appled Statstcs 32 (1983)

CHAPTER 3

KNN-KERNEL DENSITY-BASED CLUSTERING FOR HIGH DIMENSIONAL MULTIVARIATE DATA

Abstract

Density-based clustering algorithms for multivariate data often have difficulties with high dimensional data and clusters of very different densities. A new density-based clustering algorithm, called KNNCLUST, is presented in this paper that is able to tackle these situations. It is based on the combination of nonparametric k-nearest-neighbour (knn) and kernel (knn-kernel) density estimation. The knn-kernel density estimation technique makes it possible to model clusters of different densities in high-dimensional datasets. Moreover, the number of clusters is identified automatically by the algorithm. KNNCLUST is tested using simulated data and applied to a multispectral Compact Airborne Spectrographic Imager (CASI) image of a floodplain in the Netherlands to illustrate the characteristics of the method.

Keywords: Multivariate data; classification; clustering

T.N. Tran, R. Wehrens and L.M.C. Buydens, Proc. 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, URBAN_2003, May 2003, Berlin, Germany. Revised for Journal of Computational Statistics and Data Analysis.

1. Introduction

Clustering of multispectral data [1] groups objects, characterized by the values of a set of variables, into separate groups (clusters) with respect to a distance or, equivalently, a similarity measure. Its objective is to assign to the same cluster objects that are closer (more similar) to each other than to objects from different clusters, which may help to understand relationships that may exist among the objects. Examples are the exploration of environmental data representing physical and chemical parameters [2], computational analysis of microarray gene expression profiles [3], electron probe X-ray microanalysis [4], process monitoring [5], and many others. However, the successful application of clustering to multispectral datasets is not a straightforward task. It depends on an understanding of the dataset and a good choice of the clustering algorithm.

Several types of clustering methods can be distinguished, among which partitional and hierarchical approaches are the most common [1]. Density-based clustering methods, such as CLUPOT [6], DBSCAN [7], and Denclust [8], form a third clustering type. Density-based clustering uses a local cluster criterion, in which clusters are defined as regions in the data space where the objects are dense, separated from one another by low-density regions. Non-parametric density-based clustering is based on the estimation of a local non-parametric density function, proposed by Fukunaga and Hostetler [9] and further improved in [10][11]. Density-based clustering has advantages over partitional and hierarchical clustering methods in discovering clusters of arbitrary shapes and sizes, and it is often used in data mining for knowledge discovery. However, it has been shown that current density-based clustering may have difficulties with complex data sets containing clusters of very different densities [11]. In this case, it often identifies the very low density classes as noise [1]. Moreover, the high dimensionality of many multivariate data sets is another problem for density-based clustering. In this case, the volume of the data space grows dramatically with the dimension, while the number of objects remains the same.

One solution for the dimensionality problem is proposed in [12][13], using a k-nearest-neighbour density estimation technique. Instead of applying a threshold to the local density function, low-density regions (valleys) separating two clusters can be detected by counting shared neighbours: if the number of shared neighbours of two adjacent objects is below a threshold (a number of objects), there is a gap, a valley, in between, and hence the two objects belong to two different clusters. In this way, the method does not have to take into account the volume of the high-dimensional search space. However, this clustering method still requires a density threshold to be defined, which is very difficult for a real dataset [14].

In this paper, a new density-based clustering algorithm, called KNNCLUST, is developed. The proposed method is based on a combination of nonparametric k-nearest-neighbour (knn) and kernel density estimation methods (knn-kernel). It will be shown later in the text that the knn-kernel is not a good solution for estimating the true density of a distribution, due to an overestimate of the density in the tails of the distribution. However, the knn-kernel has attractive properties for clustering, shown for the first time in this paper. KNNCLUST has been implemented in MATLAB 6.5 and the toolbox is available on the web [15].

We review nonparametric density estimation techniques in section 2. The knn-kernel class-condition rule for clustering and the description of the new knn-kernel density-based clustering algorithm, KNNCLUST, are given in section 3. In section 4, its properties are illustrated

using a multispectral remote sensing image and compared to the results from DBSCAN. Finally, the work is summarized in section 5.

2. Knn-Kernel Density Estimation

An unknown probability density function of a data set can be estimated by a nonparametric kernel density estimation method. Consider an $N \times d$-dimensional data set. The $d$-dimensional space can be partitioned into a number of equal bins (volumes) $V$, e.g. hyperrectangles. The multivariate kernel density estimate obtained at the object $x$ with kernel $K$ is defined as [16]:

$$\hat{f}(x) = \frac{1}{NV} \sum_{i=1}^{N} K\big((x - x_i)./H\big) \qquad (1)$$

The size of the bin is given by a scale vector $H = [h_1 \ldots h_d]$ in $d$-dimensional space, and the matrix operation $./$ is the element-by-element division of two equal-sized matrices or vectors. The bin volume is $V = \prod_{i=1}^{d} h_i$. A list of common kernels is given in Table 1. A triangular or Gaussian kernel function is normally used (Fig. 1).

Figure 1. Triangular and Gaussian kernels.

In Eq. 1, the bin $V$ is fixed in size. If the bin $V$ is defined as in k-nearest-neighbour (knn) density estimation [16], where the volume around object $x$, $V_x$, is adjusted to include the $k$ nearest neighbour objects, the method is called knn-kernel and given by:

$$\hat{f}(x) = \frac{1}{NV_x} \sum_{i=1}^{N} K\big((x - x_i)./H_x\big) \qquad (2)$$

where $H_x$ is a scale vector $[h^x_1 \ldots h^x_d]$ of the volume $V_x$ in $d$-dimensional space.

Table 1. Commonly used kernels, where $z = (x - x_i)./H$.

Rectangular: $\tfrac{1}{2}$ if $z^T z < 1$, 0 otherwise
Triangular: $1 - |z|$ if $z^T z < 1$, 0 otherwise
Biweight: $\tfrac{15}{16}(1 - z^T z)^2$ if $z^T z < 1$, 0 otherwise
Gaussian: $\tfrac{1}{\sqrt{2\pi}} \exp(-z^T z / 2)$
Bartlett-Epanechnikov: $\tfrac{3}{4\sqrt{5}}(1 - z^T z / 5)$ if $z^T z < 5$, 0 otherwise

The idea of the knn-kernel was first introduced by Loftsgaarden and Quesenberry [17] and then generalized by Terrell and Scott [18], where the Euclidean or Mahalanobis distance to the $k$-th nearest neighbour is used. Here, we use the volume $V_x$, which is more
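To make Eqs. 1 and 2 concrete, the following is a minimal 1-D sketch of both estimators with the triangular kernel. It is our own illustration, not the thesis' MATLAB toolbox; the function names and parameter values are arbitrary.

```python
import numpy as np

def triangular(z):
    """Triangular kernel: 1 - |z| inside the unit interval, 0 outside."""
    return np.where(np.abs(z) < 1.0, 1.0 - np.abs(z), 0.0)

def kde_fixed(x, data, h):
    """Fixed-bandwidth kernel density estimate (Eq. 1), 1-D case:
    every point contributes through the same scale h."""
    return triangular((x - data) / h).sum() / (len(data) * h)

def kde_knn(x, data, k):
    """knn-kernel density estimate (Eq. 2), 1-D case: the bandwidth at x
    is the distance to the k-th nearest neighbour, so the kernel widens
    in sparse regions and narrows in dense ones."""
    h_x = np.sort(np.abs(x - data))[k - 1]  # adaptive bandwidth H_x
    return triangular((x - data) / h_x).sum() / (len(data) * h_x)
```

On a standard-normal sample both estimators return roughly the theoretical density $1/\sqrt{2\pi} \approx 0.399$ at $x = 0$; the knn-kernel version additionally stays smooth in the sparse tails, which is the property exploited for clustering.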

general. The knn-kernel can also be seen as a special case of variable kernel density estimation methods [18][19]. Knn itself is obviously a simple case of knn-kernel density estimation in which the uniform (rectangular) kernel is used. Readers are referred to [16] for a complete overview of nonparametric kernel density estimation methods.

The knn-kernel method has two advantages over other methods. The first arises from the kernel: without it, the knn density estimate is non-smooth, and using a kernel makes the knn-kernel estimator smooth. The second advantage is the result of the application of knn and allows for an adaptive kernel width: a broader kernel in low-density regions and a narrower kernel in high-density regions. With fixed-kernel-width methods, in contrast, abnormal small density peaks appear in low-density regions (e.g. in Figure 3a), which result in many small clusters being found with ordinary density-based clustering. Hence, the knn-kernel method is useful for clustering, even though it is not better than the fixed-kernel scheme for the purpose of estimating a density, due to an overestimate of the density in the tails of the distribution [18]. These features are demonstrated in Figure 2, where the knn and the knn-kernel methods are applied to a synthetic data set of 500 objects generated from one Gaussian distribution.

Figure 2. Knn and knn-kernel estimation on the sample data set containing 500 samples generated from one Gaussian distribution (mean = 0 and s = 1), with k = 100: (A) knn; (B) knn with triangular kernel; (C) knn with Gaussian kernel. The dotted line is the theoretical pdf for the data set.

The other simple example in Figure 3 shows the advantage of the knn-kernel on a data set containing two classes of different densities. Class one is a high-density class containing 500 objects generated from one Gaussian distribution (mean = 0 and s = 1). Class two is a low-density class containing 150 objects generated from one Gaussian distribution (mean = 100 and s = 10). The kernel-based estimation method (Eq. 1) provides a smooth estimate for the first class but a bad estimate for the second class, showing many sharp peaks, due to the aforementioned problem of the kernel-based method (Figure 3A). In contrast, the knn-kernel method (Eq. 2) with k = 100 provides a smooth density estimate for both classes (Figure 3B).

In general, nonparametric methods are sensitive to the choice of the smoothing parameter. If it is too small, the density estimate is too detailed, showing many sharp peaks (as in Figure 3A, the kernel method for cluster 2). If it is too large, the structure of the density function is lost. Hand [20] showed that the smoothing parameter can be estimated from the average distance of the k nearest neighbours. The knn-kernel method, on the other hand, forms a flexible way to deal with a complex data set where densities can be very different between clusters: the smoothing parameter values are adapted locally for different clusters.

Figure 3. Density estimation functions for the data set of two classes of different densities: (A) triangular kernel; (B) knn-triangular kernel.

3. Knn-Kernel Density-Based Clustering

3.1 Classification rule based on knn-kernel density estimates

The most common ways to assign objects to clusters, also called classification rules, are based on the Bayes decision rule:

$$p(x|\omega_i)\,p(\omega_i) > p(x|\omega_j)\,p(\omega_j), \qquad \forall j \neq i \qquad (3)$$

where $p(x|\omega_i)$ is the class-conditional density function at $x$ of each class $\omega_i$ and $p(\omega_i)$ is the prior probability function. The class-conditional density function can be estimated by the nonparametric knn-kernel method mentioned earlier:

$$\hat{p}(x|\omega_i) = \frac{1}{n_i V_x} \sum_{x_j \in \omega_i} K\big((x - x_j)./H_x\big) \qquad (4)$$

where $n_i$ is the size of cluster $\omega_i$, and $\sum_i n_i = N$. The Bayes knn-kernel class-condition can be rewritten as:

$$\frac{1}{n_i V_x} \sum_{x_l \in \omega_i} K\big((x - x_l)./H_x\big)\, p(\omega_i) > \frac{1}{n_j V_x} \sum_{x_l \in \omega_j} K\big((x - x_l)./H_x\big)\, p(\omega_j), \qquad \forall j \neq i \qquad (5)$$

The prior probability functions $p(\omega_i)$ and $p(\omega_j)$ are normally estimated by $n_i/N$ and $n_j/N$, respectively. Then the knn-kernel Bayes class-condition can be simplified to:

$$\sum_{x_l \in \omega_i} K\big((x - x_l)./H_x\big) > \sum_{x_l \in \omega_j} K\big((x - x_l)./H_x\big), \qquad \forall j \neq i \qquad (6)$$

Thus, the decision rule used here is the same as the one in the knn classifier [16] in supervised classification, but the density estimation is replaced by the knn-kernel. The advantage of this for clustering is illustrated in the following section.
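For a single candidate point, Eq. 6 reduces to comparing sums of kernel weights over each cluster's members, using one shared adaptive bandwidth $H_x$. This 1-D helper (our own sketch; names and toy data are ours) makes that comparison explicit.

```python
import numpy as np

def classify_knn_kernel(x, data, labels, k):
    """Assign x to the cluster with the largest simplified Bayes
    class-condition (Eq. 6): the sum of triangular-kernel weights over
    each cluster's members, where the bandwidth H_x is the distance
    from x to its k-th nearest neighbour."""
    d = np.abs(x - data)
    h_x = np.sort(d)[k - 1]                    # adaptive bandwidth H_x
    w = np.where(d < h_x, 1.0 - d / h_x, 0.0)  # triangular kernel weights
    scores = {c: w[labels == c].sum() for c in np.unique(labels)}
    return max(scores, key=scores.get)
```

Because the priors cancel against the $1/n_i$ factors, no per-cluster normalization is needed: raw kernel sums decide the assignment.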

3.2 The KNNCLUST algorithm

We propose in this section KNNCLUST as a hard clustering algorithm, which assigns each object $x_i$ to one and only one cluster. Just like partitional clustering [1], which seeks an organization of objects that optimizes a target function, KNNCLUST forms clusters in order to maximize the total class-conditional density over all objects, defined by:

$$D = \sum_{i=1}^{N} \hat{p}(x_i | c_i) \qquad (7)$$

where point $x_i$ is assigned to cluster $c_i$. The framework of KNNCLUST is as follows:

1. Start: $N$ singleton clusters, the number of neighbours $k$, and the knn table $T$ of size $N \times k$, the list of the $k$ nearest neighbours of all samples.
2. Iteration: re-calculate the cluster memberships of all points using the class-condition (Eq. 6) in order to maximize the function $D$. STOP if no, or only a few, cluster memberships change (stop-condition); otherwise start a new iteration (step 2).

Using the knn-kernel Bayes class-condition (Eq. 6) in step 2, $\hat{p}(x_i|c_i)$ is replaced by $\hat{p}(x_i|d_i) = \max_{j \in C} \hat{p}(x_i|j)$ for all points: the old membership $c_i$ is replaced by the new membership $d_i$ of object $x_i$. At the end of an iteration, a cluster may be empty because all its points were moved to other clusters. Such a cluster is removed and the total number of clusters is decreased by one. The algorithm ends when the stop-condition is fulfilled. Note that $D$ never decreases at any stage; therefore, eventual convergence is assured.

In KNNCLUST, only the triangular kernel is recommended for the kernel function $K$ in the knn-kernel Bayes class-condition (Eq. 6), to reduce computation time. Using the Gaussian kernel gives similar results but is more time consuming. The rectangular kernel (equivalent to the well-known knn class-condition often used in supervised classification) is not used here: it leads to problems in the initial state, where the knn estimated density values at any point are equal for all clusters.

A simple example in Figure 4 shows how KNNCLUST performs on a simple data set of eight object values in a 1D space with k = 2. Each row in Figure 4 plots the objects at one particular step of the process when a membership changes. For example, iteration one starts with object one, $x_1$. Because k = 2, the width of the bin around $x_1$ is given by $H_{x_1} = |x_3 - x_1|$. By applying the triangular kernel we obtain

$$\hat{p}(x_1|*) = 1 - |x_1 - x_2|/H_{x_1}, \qquad \hat{p}(x_1|\circ) = 1 - |x_1 - x_3|/H_{x_1}$$

It is obvious that $\hat{p}(x_1|*) > \hat{p}(x_1|\circ)$ (the class-condition, Eq. 6), so $x_1$ is assigned to the cluster of object $x_2$, indicated with the symbol *. The process is repeated for all other objects in turn; this concludes one iteration. The order in this case is random, e.g., object six is considered at step three of

iteration one. Only two iterations are needed for clustering the dataset into two clusters.

Figure 4. A simple example showing how KNNCLUST works. The symbols *, o, x, etc. stand for cluster membership: pixels belonging to the same cluster have the same symbol.

In general, the object order in which the objects are considered may influence the result of the algorithm. One may order the objects by their densities, taking higher-density objects before lower-density ones. However, the density values change during an iteration, and reordering at every step takes a lot of computation time. In practice, objects may be processed in any convenient order; we have not seen any performance degradation.

3.3 Computational complexity

The computational complexity of KNNCLUST depends mainly on the calculation of the knn table, the list of the k nearest neighbours of all objects, which is very expensive. For example, if we run the knn query for each object independently, the simplest way is to sort all distances from this object to the other objects, which leads to a complexity of O(N log N) per object. However, there are many ways to make this more efficient, e.g. by integrating information over all queries (see [21] for a summary). The R-tree indexing technique is often utilized, e.g. in DBSCAN.

3.4 User-defined parameters

Apart from the choice of the kernel, the algorithm requires only one parameter, the number of neighbourhood points, k. The smaller k, the more detail there is in the clustering and the more clusters can be obtained. In contrast, with a higher value of k, the clustering result is smoother and a smaller number of clusters is obtained. In all cases, k should be smaller than the size of the smallest cluster, because this cluster will otherwise be missed. It may be difficult to find an optimal value of k for a dataset which has clusters of very different sizes. It is recommended to use several values of k, and to pick the one that captures the relevant features of the data best. However, as will be shown below, in practice there will be a range of k values that give quite similar results.

3.5 Comparison of KNNCLUST to other clustering methods

KNNCLUST is not an agglomerative hierarchical clustering algorithm [1], in which pairs of clusters are merged based on the similarity between them. KNNCLUST is more like partitional clustering [1], with the probability density function (pdf) used instead of the normally used distances, e.g. Euclidean or Mahalanobis distances. In this type of clustering, objects are allowed to be reassigned to other clusters. However, the number of clusters needs to be defined beforehand in partitional clustering methods, whereas it is determined automatically by KNNCLUST. Partitional clustering, such as Fuzzy C-means or mixture modelling by Expectation-Maximization (EM), is sensitive to the initial choice of cluster centers and to noise/outliers present in the data set. This is not the case for KNNCLUST. Moreover, unlike mixture-model clustering by EM, KNNCLUST does not require clusters to have a certain statistical distribution; e.g. the Gaussian distribution is often used in EM. KNNCLUST also differs from ordinary density-based clustering by constructing the class-condition instead of using a density estimation function to detect the low-density valleys separating clusters. As a consequence, KNNCLUST is less suited for finding very elongated clusters or clusters with strange shapes, something that is possible with ordinary density-based clustering. On the other hand, it can be used in cases where clusters have very different densities, where other density-based methods cannot. Last but not least, KNNCLUST can work well with data in high-dimensional feature spaces, which is difficult for many clustering algorithms, such as the EM method.

4. Results

In this section, we demonstrate the effectiveness of KNNCLUST on two datasets, a simulated dataset and a remote sensing Compact Airborne Spectrographic Imager (CASI) image. The 2D simulated dataset in Figure 5 contains four classes of 600, 400, 200 and 200 objects. To make the simulated dataset more realistic, class one is constructed from two overlapping Gaussians.
The other three classes are generated from three single Gaussian distributions with very different cluster densities: the variances of clusters three and four are ten times smaller than those of clusters one and two, respectively. The Gaussians are illustrated by the ellipses shown in Figure 5. In the plot, classes two and three (in the middle-right of Figure 5) are located in very small areas and are difficult to distinguish.

Figure 5. The simulated dataset. Class one is a mixture of two Gaussians; the other three classes are generated from three single Gaussian distributions with very different cluster densities.

Using KNNCLUST, the four-cluster result can be obtained using k values in the range [180, ..., 220], with a total accuracy of more than 95 % (by counting the misclassified objects). As an example, the result of KNNCLUST with k = 180 is given in Figure 6.

Figure 6. Clustering result of KNNCLUST with k = 180; the total accuracy is 95.9 %.

The often-used density-based clustering method DBSCAN [7] was applied to the dataset as well. The clustering result (in Figure 7) is very poor, as expected, since the clusters have very different densities. The best results of DBSCAN in two situations are discussed hereafter. In order to recognize classes three and four, a very high density threshold with min_points = 10 and ε = 20 is set, leading to the objects of classes one and two being classified as noise (Figure 7a). In the opposite situation, using a low density threshold with min_points = 20 and ε = 950, classes three and four are merged (Figure 7b).

Figure 7. DBSCAN results: (a) min_points = 10, ε = 20; (b) min_points = 20, ε = 950.

We also compared KNNCLUST with the state-of-the-art mixture-model clustering by EM on this dataset. The EM algorithm is very sensitive to initialization [22][23]; a random initialization strategy is normally used. We ran EM with four clusters 100 times, and the best clustering result in terms of the maximum-likelihood criterion is shown in Figure 8a. Gaussian mixture-model clustering assumes clusters to have a normal distribution. Because of the mixture of two Gaussians in class one, EM needs two Gaussians to describe that class, and classes three and four are merged together. EM works better with five clusters, where classes three and four can be recognized; however, cluster one is still divided into two parts (Figure 8b). Together with the difficulty of initialization and of identifying the number of clusters, this means KNNCLUST works better than EM for this dataset.
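The failure mode discussed above — one global (min_points, ε) pair cannot fit clusters of very different densities — is easy to reproduce. Below is a deliberately naive 1-D DBSCAN (our own toy implementation, not the reference one) run on a dense and a sparse group: a tight ε turns the sparse group into noise, and only a loose ε recovers both groups.

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    """Naive 1-D DBSCAN: points with at least min_pts neighbours within
    eps are core points; clusters are grown from core points by
    breadth-first expansion; unreachable points stay labelled -1 (noise)."""
    n = len(X)
    d = np.abs(X[:, None] - X[None, :])
    neigh = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cid
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if not core[j]:
                continue           # border points are not expanded
            for m in neigh[j]:
                if labels[m] == -1:
                    labels[m] = cid
                    queue.append(m)
        cid += 1
    return labels

dense = np.arange(0, 5, 0.1)      # 50 points, spacing 0.1
sparse = np.arange(50, 90, 2.0)   # 20 points, spacing 2.0
X = np.concatenate([dense, sparse])
tight = dbscan(X, eps=0.15, min_pts=3)  # sparse group becomes noise
loose = dbscan(X, eps=2.5, min_pts=3)   # both groups recovered
```

With `eps=0.15` every sparse point sees only itself, so the whole sparse group is noise; with `eps=2.5` both groups form clusters — exactly the dilemma of a single global density threshold.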

Figure 8. The best of 100 runs of EM with (a) 4 clusters; (b) 5 clusters.

The second experiment is done on a multispectral remote sensing image recorded by a CASI scanner from the Natural Environment Research Council (NERC). The image was taken at 1536 m over an area in the Klompenwaard, the Netherlands, during August. The data set for this study contains 10 bands from 437 nm to 890 nm, with bandwidths of 10 nm, except for band 9 with 8 nm. The study area has a size of 30 x 255 pixels with 3 m resolution. Principal Component Analysis (PCA) is used for reducing the complexity and for visualization of the results. The original multispectral data were scaled to zero mean and unit variance and compressed via PCA to the first four principal components, which account for more than 99.8 % of the spectral variance. KNNCLUST was applied to both the original 10-band dataset and the four-component compressed dataset; the results show no difference between the two cases. For convenience, the results in this paper are shown in PCA space.

Figure 9. (a) The gray-scale images of the first two principal components (PC1 on the left, PC2 on the right), and the six main object classes that have been identified in the area: A, river; B, lake; C, land & clay; D, sand and vegetation; E, open vegetation; F, grass or bush. (b) The score plot of PC1 and PC2.

Figure 9a shows the gray-scale images of the first and second principal components, explaining 71 % and 27 % of the variance in the data, respectively. Six main object patterns have been estimated for the area from the work of Van den Berg [24].
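The preprocessing described here — autoscaling each band and compressing to the first principal components — can be sketched with plain numpy. The function name and the synthetic test data are ours; the thesis used the actual 10-band CASI cube.

```python
import numpy as np

def pca_compress(X, n_comp):
    """Autoscale (zero mean, unit variance per band) and project onto
    the first n_comp principal components via SVD. Returns the scores
    and the fraction of variance explained by each kept component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / (s**2).sum()
    return Z @ Vt[:n_comp].T, explained[:n_comp]
```

On a 10-band image whose pixel spectra really live in a low-dimensional subspace, a handful of components captures nearly all spectral variance, just as four components captured 99.8 % for the CASI image.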

The clusters differ in density, as can be seen clearly in Figure 9b; e.g., the clusters of the river (A) and the lake (B) are very dense, with a long narrow shape containing approximately 1000 points, compared to the large cluster corresponding to sand and vegetation (D) of 1440 points.

First, we apply the often-used density-based clustering method DBSCAN [7]. DBSCAN clustering is a spanning process, grouping points connected by high-density cells and dividing points separated by low-density cells. The density threshold is a user-settable parameter, ε. The second parameter that should be set is min-points, the minimum number of objects in the neighbourhood. The number of clusters is found automatically by DBSCAN. For this data set, many values for both user parameters have been tried, but none of them gave good results. Some examples are shown in Figure 10 (a-d). This is caused by the absence of a global density threshold for the whole data set. If the density parameter is adjusted to identify low-density regions such as the sand-and-vegetation cluster (D), then it is too high to distinguish between clusters B and C, as well as between clusters E and F (Figure 10c and d). In other settings, cluster D could not be recognized due to its low density (Figure 10a and b).

Figure 10. Score plots of the first two PCs for DBSCAN with min-points = 25: (a) 8 clusters found with ε = 300; (b) 6 clusters found with ε = 400; (c) 4 clusters found with ε = 500; and (d) 2 clusters found with ε = 900.

KNNCLUST was applied using the following values of k: [450, 500, 550, 600, 650]. In all cases, six clusters are found. The score plot of the first two PCs, showing the clusters obtained with k = 550, is given in Figure 11c. The cluster sizes range from 950 to 1800 points. Seven and five clusters are found for k = 300 and k = 700, respectively. The method is also compared with K-means and EM; the best results after 100 runs with random initialization are shown in Figure 11a and b, respectively.
In this case, the image results of KNNCLUST (Figure 11c) and EM are comparable and look much smoother

than the one obtained by K-means, mainly because of the vegetation area (D). K-means incorrectly joins the lake (B) and the river (A), and divides the vegetation area D into two clusters.

Figure 11. Score plots of the first two PCs and result images of six clusters obtained by (a) K-means (the best of 100 runs); (b) EM (the best of 100 runs); and (c) KNNCLUST with k = 550.

The stability and the compactness of the clustering result can also be studied using an index which measures the ratio of within-cluster variation to between-cluster variation [25]. A lower value indicates a higher compactness. This index is not designed for a data set with clusters of different shapes; nevertheless, it might provide an idea of the stability and compactness of the clustering results.

Figure 12. Compactness index of K-means in 100 runs compared to the index of the KNNCLUST result.

Figure 12 shows the compactness index values of 100 replicated runs of K-means. It shows that K-means is not stable, with a considerable gap between the minimum and maximum values of the compactness index. Also shown in the figure are the smallest and largest compactness values for KNNCLUST using all five values of k that lead to six clusters: the smallest index value is obtained for k = 650 and the largest for k = 450. They are comparable to the best case obtained by K-means. The small variance of the compactness index indicates that KNNCLUST is not very sensitive to the values of k in the selected range.

5. Summary

Many clustering algorithms for multivariate data, such as EM or most density-based methods, suffer from problems with clusters in a high-dimensional feature space with

different densities. This is not the case for our newly proposed algorithm, KNNCLUST, which makes use of a knn-kernel density estimator with the triangular kernel. For a given kernel function, KNNCLUST has only one parameter, k, the number of neighbours. In most cases, it is not difficult to find a range of k for which the clustering results are stable. The number of clusters is automatically determined by the algorithm upon convergence. The computational complexity of the algorithm is quite high, mainly caused by the calculation of the knn distance matrix; however, indexing techniques [21] could be used to improve the situation for larger data sets. KNNCLUST is less suited for finding very elongated clusters or clusters with strange shapes, something that is possible with ordinary density-based clustering. However, KNNCLUST can detect more natural clusters, which are not required to follow any type of statistical distribution as in mixture-model clustering. In conclusion, it is a very good tool for clustering moderately-sized multivariate data sets where the clusters are very different in densities.

6. Acknowledgements

We thank Gertjan Geerling, Department of Environmental Studies, for sharing the data and for stimulating discussions.

References

[1] T.N. Tran, R. Wehrens and L.M.C. Buydens, Clustering multispectral images: a tutorial, Chemom. Intell. Lab. Syst., in press.
[2] A. Smolinski, B. Walczak and J.W. Einax, Hierarchical clustering extended with visual complements of environmental data set, Chemom. Intell. Lab. Syst., vol. 64, 2002.
[3] J. Liang and S. Kachalo, Computational analysis of microarray gene expression profiles: clustering, classification, and beyond, Chemom. Intell. Lab. Syst., vol. 62, 2002.
[4] I. Bondarenko, H. Van Malderen, B. Treiger, P. Van Espen and R. Van Grieken, Hierarchical cluster analysis with stopping rules built on Akaike's information criterion for aerosol particle classification based on electron probe X-ray microanalysis, Chemom. Intell. Lab. Syst., vol. 22, 1994.
[5] P. Teppola, S.-P. Mujunen and P. Minkkinen, Adaptive Fuzzy C-Means clustering in process monitoring, Chemom. Intell. Lab. Syst., vol. 45, 1999.
[6] D. Coomans and D.L. Massart, Potential methods in pattern recognition. Part 2: CLUPOT, an unsupervised pattern recognition technique, Analytica Chimica Acta, vol. 133, 1981.
[7] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in Proc. Knowledge Discovery and Data Mining, 1996.
[8] A. Hinneburg and D.A. Keim, An efficient approach to clustering in large multimedia databases with noise, in Proc. Knowledge Discovery and Data Mining, 1998.
[9] K. Fukunaga and L.D. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Trans. Inform. Theory, vol. 21, 1975.
[10] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 8, Aug. 1995.
[11] D. Comaniciu and P. Meer, Distribution free decomposition of multivariate data, Pattern Analysis & Applications, vol. 2, 1999.
[12] L. Ertoz, M. Steinbach and V. Kumar, A new shared nearest neighbor clustering algorithm and its applications, Proc. Workshop on Clustering High Dimensional Data and its Applications, Arlington, VA, USA, 2002.

[13] R.A. Jarvis and E.A. Patrick, Clustering using a similarity measure based on shared nearest neighbors, IEEE Transactions on Computers, vol. C-22, no. 11, November 1973.
[14] Z. Su, Q. Yang, H. Zhang, X. Xu and Y.-H. Hu, Correlation-based web-document clustering for adaptive web-interface, Knowledge and Information Systems, vol. 4, 2002.
[15] T.N. Tran, KNNCLUST in Matlab.
[16] A. Webb, Statistical Pattern Recognition, Wiley, Malvern, UK.
[17] D.O. Loftsgaarden and C.P. Quesenberry, A nonparametric estimate of a multivariate density function, Ann. Math. Statist., vol. 36, 1965.
[18] G.R. Terrell and D.W. Scott, Variable kernel density estimation, The Annals of Statistics, vol. 20, 1992.
[19] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, 1986.
[20] D.J. Hand, Discrimination and Classification, Wiley, New York, 1981.
[21] B.V. Dasarathy, Nearest Neighbour Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA, 1991.
[22] W. Seidel, K. Mosler and M. Alker, A cautionary note on likelihood ratio tests in mixture models, Ann. Inst. Statist. Math., vol. 52, no. 3, 2000.
[23] G. McLachlan and D. Peel, Finite Mixture Models, Wiley Series in Probability and Statistics, Canada, 2000.
[24] G.J. van den Berg, Classificatie CASI-beeld: een vegetatiekaart van de Klompenwaard, Advies en Onderzoek Remote Sensing en Fotogrammetrie (GAR).
[25] R.G. Brereton, Multivariate Pattern Recognition in Chemometrics, Illustrated by Case Studies, Elsevier, 1992.

CHAPTER 4

SPAREF: A CLUSTERING ALGORITHM FOR MULTI-SPECTRAL IMAGES

Abstract

Multi-spectral images, such as multi-spectral chemical images or multi-spectral satellite images, provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multi-spectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results. In this work, SpaRef, a new clustering algorithm for multi-spectral images, is presented. Spatial information is integrated with the partitional and agglomerative clustering processes. The number of clusters is identified automatically. SpaRef is compared with a set of well-known clustering methods on a CASI image over an area in the Klompenwaard, the Netherlands. The clusters obtained show improved results. Applying SpaRef to multi-spectral chemical images would be a straightforward step.

Keywords: clustering algorithm; multi-spectral image segmentation; spatial information.

T.N. Tran, R. Wehrens and L.M.C. Buydens, Anal. Chim. Acta, 490 (2003).

1. Introduction
Clustering is the organization of a data set into homogeneous and/or well-separated groups with respect to a distance or, equivalently, a similarity measure. Its objective is to assign to the same cluster data that are closer (more similar) to each other than they are to data in different clusters [1]. In multi-spectral satellite images, organizing the data pixels into classes, also called image segmentation, can reveal the underlying structure of the images, i.e. spectrally homogeneous characteristics. This information can be used in a number of ways, e.g. to obtain optimal information for the selection of training regions for subsequent supervised land-use segmentation [2]. In vegetation areas, the gradient may change very slowly from one vegetation type to another. This makes it very difficult to identify a border between clusters, leading to clusters scattered in the spatial domain, which makes interpretation very difficult. This is also true for multi-spectral chemical images. What is needed is a clustering method that takes both spectral and spatial information into account.
Clustering methods fall into two types: partitional and hierarchical approaches [1]. Variants of K-clustering, such as K-means, ISODATA [3], and Fuzzy C-means [1], are the partitional clustering methods most widely used for satellite images. K-clustering is computationally attractive, which makes it applicable to large data sets, but it is very sensitive to small clusters and outliers, i.e. noise or mixed pixels (pixels containing information from two or more classes) [4]. Agglomerative hierarchical clustering works well with small data sets and can handle outliers very well, but its computation is very expensive, and it is therefore not feasible for large data sets. Moreover, it also suffers from a chaining problem for complex data sets [5]. In several papers these clustering methods are compared [2, 6], but the fundamental problems remain.
In other research, agglomerative hierarchical clustering is performed on a number of homogeneous classes under the assumption of uniform neighbourhoods in the data set, in order to avoid the limitations of agglomerative hierarchical clustering; this assumption does not hold in general [7]. In this study, K-clustering and agglomerative hierarchical clustering are analysed, and their advantages as well as their limitations are illustrated. A new clustering algorithm, SpaRef (Spatial Refinement clustering), is designed to take advantage of the characteristics of both clustering methods and to eliminate their potential limitations. SpaRef can work with complex and large data sets, including small objects and outliers. Briefly, the SpaRef method works as follows. First, a high number of small, homogeneous clusters is identified by K-means. These so-called cells are clustered using agglomerative hierarchical clustering, and the optimal number of clusters is identified based on the ratio of the within- and between-cluster variation. Our main contribution, the refinement process, is introduced at the last stage. It reallocates misassigned points using the information of points in the spatial domain.
First, we will discuss relevant characteristics of K-clustering and hierarchical clustering methods in more detail. Then we will discuss several ways to pick the optimal number of clusters and to validate the results of an image segmentation. We proceed by describing the SpaRef method in more detail, and apply it to a real-world multi-spectral image. SpaRef is compared with K-means, ISODATA and a hierarchical clustering, and shows better results.

2. Notation
We will consider an image consisting of N pixels, where each pixel is characterised by D variables (reflectance values).

We will use the following notation:
- K is the number of clusters and k is the index of a cluster.
- M is the number of cells (clusters of high homogeneity), M << N.
- Minsize is the minimum size of a normal cluster (so normal clusters should contain at least Minsize pixels).
- B_c is the number of boundary points of cluster c in the spatial domain. A boundary point of cluster c is defined as a point which has at least one adjacent point belonging to another cluster d (d ≠ c).
- C_k is the set of point indices that belong to cluster k.

$c_k = \frac{1}{n_k}\sum_{i \in C_k} x_i$ (1)

is the centre of the $k$-th cluster in spectral space.

$c = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{k=1}^{K} n_k c_k$ (2)

is the mean centre of the entire data set in spectral space.

$d(x_i, x_j) = \sum_{l=1}^{D} (x_{il} - x_{jl})^2 = \|x_i - x_j\|^2$ (3)

is the squared Euclidean distance between two points $x_i$ and $x_j$.

$W_k = \frac{1}{n_k}\sum_{i \in C_k} d(x_i, c_k)$ (4)

is the within-cluster inertia of cluster $k$.

$B_{kj} = d(c_k, c_j)$ (5)

is the between-cluster inertia of clusters $k$ and $j$.

2.1. K-clustering
K-means and ISODATA [8] are among the most popular, well-known hard partitional clustering algorithms, in which each point is assigned to exactly one cluster. K-means produces a clustering by optimising the sum-of-squares criterion E:

$E = \sum_{k=1}^{K}\sum_{i \in C_k} d(x_i, c_k)$ (6)

The algorithm directly addresses the problem of dividing a set of data into several homogeneous groups. For a given number of clusters K, the algorithm starts by choosing K cluster centres (randomly or by some heuristic process) [8]. The Euclidean distances between all points and the cluster centres are calculated, and each point is assigned to the closest cluster centre. Cluster centres are then recalculated, and the process is repeated until a convergence criterion is met. A major disadvantage of K-means clustering is that one must specify the number of clusters K in advance. Moreover, the algorithm is very sensitive to noise, mixed pixels and outliers in the data set [4], all situations that occur frequently with satellite images.
Furthermore, the algorithm easily gets stuck in a local optimum of the sum-of-squares error surface. For these reasons, K-means clustering results are not stable, i.e., they depend heavily on the choice of the initial cluster centres. ISODATA [3] is a modification of K-means that starts with a high number of clusters and permits splitting of a cluster when its variance is above a pre-specified threshold, or merging of clusters when the distances between them are small, below another threshold. Starting
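The K-means procedure just described can be sketched as follows. This is a minimal NumPy sketch for illustration; the function name and the optional `init` argument are my choices and not part of the thesis software described later.

```python
import numpy as np

def kmeans(X, K, init=None, n_iter=100, seed=0):
    """Plain K-means: assign each point to the nearest centre (eq. 3),
    recompute centres, repeat until the centres no longer move."""
    rng = np.random.default_rng(seed)
    centres = (np.asarray(init, dtype=float) if init is not None
               else X[rng.choice(len(X), K, replace=False)].astype(float))
    for _ in range(n_iter):
        # squared Euclidean distance of every point to every centre
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centres; keep the old centre for an empty cluster
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centres[k] for k in range(K)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres
```

Deliberately starting from poor centres shows the sensitivity to initialisation that the text warns about: different random seeds can yield different partitions and different values of the criterion E.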

with a higher number of clusters, ISODATA is more stable, but the algorithm requires many input parameters that can be difficult to find. Fuzzy c-means [1], on the other hand, is a soft partitional K-clustering method which assigns each point $x_i$ to several clusters, depending on the degree of fuzzy membership $u_{ik} \in [0,1]$, in order to optimise the sum-of-squares criterion $E_f$:

$E_f = \sum_{k=1}^{K}\sum_{i \in C_k} u_{ik}\, d(x_i, c_k)$ (7)

The algorithm works similarly to K-means. In most cases, if one has no interest in fuzzy memberships, the Fuzzy c-means result (the membership matrix U) is converted to a hard membership matrix by thresholding the fuzzy membership values, which yields a result similar to that of a hard clustering.

2.2. Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering yields a hierarchical structure of clusters, representing how cluster pairs are joined. In principle, the algorithm starts by assigning each pixel to an individual cluster. At each iterative step, the proximity matrix is calculated for all cluster pairs, and the two closest clusters are merged. The process continues until there is only one cluster. Depending on the definition of the distance between clusters, agglomerative hierarchical clustering comes in the variants single linkage [9], complete linkage [10], average linkage and Ward's [11] algorithm. In single linkage, the distance between two clusters is the distance between their two nearest points. Similarly, the distance is the maximal distance between points in different clusters in complete linkage, and the average distance between points in average linkage clustering. The distance in Ward's method is defined as the squared Euclidean distance between the cluster mean vectors. Hence, Ward's method is related to K-means through the minimum-variance criterion. In this paper, Agglomerative Hierarchical Clustering (AHC) with Ward's distance measure is used. A dendrogram is produced, representing nested clusters and the similarity levels at which clusters are joined. The dendrogram can be cut at several levels in order to obtain an arbitrary number of clusters.
This circumvents the problem of the pre-defined number of clusters K in K-clustering algorithms. By starting with each pixel assigned to an individual cluster, the algorithm is not sensitive to outliers [5]: outliers will be kept in separate clusters, not influencing the other clusters. However, agglomerative hierarchical clustering considers only the clusters that were obtained in the previous step. This means that once a point has been merged into a cluster, it cannot be considered for joining another cluster in later iterations. This rule is not optimal for complex data sets where cluster homogeneity levels are low or not uniform [5]. The algorithm also requires calculation, storage and sorting of the proximity matrix, with a maximum size of N². If N is large, this matrix becomes huge and handling it is sometimes not feasible [5].

2.3. Number of clusters
Determining the number of clusters is a difficult problem in all clustering algorithms. Many criteria have been developed [12-13], often based on measures of spread within and between clusters. The within-cluster inertia, W, is defined as the variation of individual points around their cluster centres, and the between-cluster inertia, B, as the variation of the cluster centres around the overall mean.
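The Ward-based AHC used in this chapter can be illustrated with SciPy's hierarchical clustering routines (an assumed dependency; the toy data are invented for the example):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two tight groups in a 2-D "spectral" space.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Ward linkage builds the dendrogram by always merging the pair of
# clusters that gives the smallest increase in within-cluster variance.
Z = linkage(X, method="ward")

# Cutting the dendrogram at K = 2 recovers the two groups.
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the same dendrogram `Z` at different values of `t` gives partitions with any number of clusters without re-running the clustering, which is exactly the property SpaRef exploits when scanning the index function over merge levels.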

$W = \frac{1}{N}\sum_{k=1}^{K}\sum_{i \in C_k} d(x_i, c_k)$ (8)

$B = \frac{1}{N}\sum_{k=1}^{K} n_k\, d(c_k, c)$ (9)

Clustering algorithms minimizing the sum-of-squares criterion (eq. 6) thus minimize W. By keeping track of the within-cluster inertia (or other criteria based on it) for a varying number of clusters, one can often observe a sharp increase at a certain level. Just before this increase, the spread of the clusters is minimal, and there the optimal number of clusters can be found. Many criteria [12-13] illustrate this situation in one way or another [4], for example the minimum of the Dunn or Davies-Bouldin indices [13-14]. Dunn and generalized Dunn indices are also used in some cases [14], but they are computationally very expensive and not suitable for data sets with a large number of points [15]. The Davies-Bouldin index is a function R of within-cluster scatter and between-cluster separation:

$R_k = \max_{j \neq k} \frac{W_k + W_j}{B_{kj}}$ (10)

$R = \frac{1}{K}\sum_{k=1}^{K} R_k$ (11)

Here, we simply use the ratio of within-cluster to between-cluster inertia, I, to determine the optimal number of clusters, at the point where there is a sharp change:

$I = W / B$ (12)

Using this ratio allows us to see the change in homogeneity more clearly, and it does not depend on a particular clustering algorithm.

2.4. Cluster Validity
It is notoriously difficult to assess the results of clustering algorithms in remote sensing. Usually qualitative, subjective criteria are applied, such as the homogeneity in the spectral domain (compactness) of the segments, and the degree of fragmentation (dispersion) of the segments in the spatial domain. The index function I and the Davies-Bouldin index can also be used for cluster validation; small values of these indices correspond with better results. For validating a clustering result in terms of the dispersion of points in the spatial domain, we introduce $D_c$, the dispersion index for cluster c, as the ratio of the number of boundary points of cluster c, $B_c$, to the total number of points of cluster c, $n_c$.
A boundary point of cluster c is defined as a point of which at least one adjacent point belongs to another cluster d (d ≠ c).

$D_c = B_c / n_c$ (13)

D, the average dispersion degree over the image, is equal to the ratio of the total number of boundary points to the total number of points in the image. A fuzzy image will have a higher dispersion degree than an image containing large continuous areas with sharp, straight edges.
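Both validity measures can be computed directly from a labelling. The sketch below assumes NumPy; `inertia_ratio` and `dispersion` are illustrative names, and boundary points are taken with respect to a 4-neighbourhood:

```python
import numpy as np

def inertia_ratio(X, labels):
    """I = W / B (eqs. 8, 9, 12): within- over between-cluster inertia."""
    c_all = X.mean(axis=0)
    W = B = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        ck = Xk.mean(axis=0)
        W += ((Xk - ck) ** 2).sum()               # squared distances to centre
        B += len(Xk) * ((ck - c_all) ** 2).sum()  # weighted centre spread
    return (W / len(X)) / (B / len(X))

def dispersion(label_img):
    """Average dispersion D: the fraction of pixels with at least one
    4-neighbour in another cluster (boundary points, eq. 13)."""
    lab = label_img
    boundary = np.zeros(lab.shape, dtype=bool)
    boundary[1:, :] |= lab[1:, :] != lab[:-1, :]
    boundary[:-1, :] |= lab[:-1, :] != lab[1:, :]
    boundary[:, 1:] |= lab[:, 1:] != lab[:, :-1]
    boundary[:, :-1] |= lab[:, :-1] != lab[:, 1:]
    return boundary.mean()
```

A compact, well-separated clustering gives a small `inertia_ratio`; a spatially fragmented labelling gives a `dispersion` value close to 1.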

$D = \frac{1}{N}\sum_{c=1}^{K} B_c$ (14)

3. Description of SpaRef
SpaRef is designed as a combination of K-clustering and agglomerative hierarchical clustering (AHC), taking advantage of the characteristics of both clustering methods and eliminating their potential limitations by introducing a refinement process that uses spatial information. In order to prevent the (expensive) application of AHC to a large data set, the data are first pre-processed by K-means with a high number of classes, M. When M is high enough, these clusters can be considered highly homogeneous classes. They form the input to the agglomerative hierarchical process. The number of classes M is much smaller than the total number of points N, typically in the order of 100.
Determining the number of clusters in a data set by using the index function I (eq. 12) is very time consuming with K-clustering, where the algorithm has to be run for each number of clusters K. It is much easier with agglomerative hierarchical clustering, where the index function can be calculated at each merge level; this is what is used in SpaRef. For each level in the dendrogram, the clustering index I is calculated, and the best choice for the number of clusters K is identified where there is a sharp change at level K.
In a data set that also contains noise, mixed pixels or outliers, we often find a number of very small clusters with abnormal cluster sizes, the set S, which are well separated from the normal clusters by the threshold Minsize. They are stable, isolated and highly homogeneous [16]. They may contain noise, mixed pixels, outliers and very small objects. Noise pixels must be rejected, mixed pixels have to be considered for merging with the spatially closest neighbour cluster, and small objects may be identified using a priori information. How to discriminate between the different types of small classes is the subject of further study. Here, we will remove these classes from the data set and concentrate on the larger clusters. Let O be the set of other clusters, K\S. These clusters are large, probably less well separated, and quite disperse.
Agglomerative hierarchical clustering may have problems separating these clusters, because of the lack of flexibility imposed by the hierarchical structure. To deal with this problem, we introduce a refinement process applied to all boundary points of the clusters in the spatial domain. We assume that if there are misassigned points in the clusters, they will first appear at the cluster boundaries. Therefore, boundary points are re-assigned to the closest adjacent clusters, and the cluster boundaries are redrawn. The refinement process iterates until there is no more change in the border-point classification. This leads to a smoothing in the spatial domain, while still taking the information from the spectral domain into account. The flowchart of SpaRef is given in figure 1.
SpaRef alleviates the inflexibility of agglomerative hierarchical clustering. By limiting the refinement to boundary points only, the clustering is expected to have a high continuity. At any iteration, let x be a point in cluster c but not a border point. Even if there exists a cluster d such that $d(x, c_d) < d(x, c_c)$, x is not considered for reassignment to cluster d. It will only be joined to cluster d when it is at the boundary of cluster c. Therefore, SpaRef is fast, since only a limited number of reallocations have to be considered.
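The boundary-refinement loop can be sketched as follows. This is illustrative only and assumes NumPy; the thesis software is written in C, and the 4-neighbourhood and the centre-based reassignment rule used here are simplifying assumptions:

```python
import numpy as np

def refine(label_img, X_img, centres, n_iter=50):
    """Iteratively re-assign each boundary pixel to the spectrally closest
    cluster among its own cluster and the clusters of its 4-neighbours."""
    lab = label_img.copy()
    rows, cols = lab.shape
    for _ in range(n_iter):
        changed = 0
        for r in range(rows):
            for c in range(cols):
                nbr = {lab[r, c]}
                if r > 0:        nbr.add(lab[r - 1, c])
                if r < rows - 1: nbr.add(lab[r + 1, c])
                if c > 0:        nbr.add(lab[r, c - 1])
                if c < cols - 1: nbr.add(lab[r, c + 1])
                if len(nbr) == 1:      # interior point: never reassigned
                    continue
                # closest adjacent cluster centre in spectral space
                best = min(nbr, key=lambda k: ((X_img[r, c] - centres[k]) ** 2).sum())
                if best != lab[r, c]:
                    lab[r, c] = best
                    changed += 1
        if changed == 0:               # no border point changed: converged
            break
    return lab
```

Note how interior points are skipped entirely, which is what keeps the number of reallocations per iteration small.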

Figure 1. Flowchart of the SpaRef method: the input image (N points) is reduced to M homogeneous classes by K-means; AHC merges these while the I index is calculated, identifying K = O ∪ S clusters; spatial refinement is applied to O, and outliers, mixed pixels and real objects are identified in S, yielding the final F clusters as output.

SpaRef depends on two main input parameters, M and Minsize. M, the number of cells, depends on the image type: images with a higher degree of complexity require a higher setting of M. The main purposes of defining the number of cells M are to separate noise, mixed-pixel classes and small objects, and to stabilize the clustering result. Therefore, with a high enough setting of M, the clustering method is not significantly affected by the exact setting. In most cases, after the AHC merging stage, very small clusters are well separated from the normal and large clusters; the setting for Minsize is thus easily defined in practice. The total complexity of SpaRef is O(M log M) + O(M²). For a large data set, when the number of points N is big, the complexity of SpaRef is much less than the O(N²) of AHC.

4. Software
Software has been developed in C (GCC) on the SunOS operating system. Pre- and post-processing of the image is done in Matlab. MultiSpec (Purdue Research Foundation), a multi-spectral image data analysis system [17], and the ERDAS IMAGINE product [18] are used for image manipulation and clustering comparison.

5. Segmentation Experiments
5.1. Data
As an example, we will use a multi-spectral satellite image recorded by a Compact Airborne Spectrographic Imager (CASI) scanner from the Natural Environment Research Council (NERC), taken at 1536 m over an area in the Klompenwaard, the Netherlands, during August. The CASI provided 10 bands for this study, from 437 nm to 890 nm, with bandwidths of 10 nm, except for band 9 with 8 nm. The area has a size of 211 × 301 pixels (63,511 pixels) with 3 m resolution, covering 633 × 903 m²

(Figure 2). The original multi-spectral data were mean-centered and compressed via a principal component analysis in order to reduce computation time. The clustering methods were all performed on the first four principal components, which account for more than 99.8% of the spectral variance. Next, the application of four clustering methods to these data is described: SpaRef, K-means, ISODATA and Ward's clustering.

Figure 2. Grayscale image of the first principal component (PC1) of the image. The white band on the right corresponds with the river; other white areas show small lakes. Dikes and roads are visible (e.g. parallel to the river). Vegetation and woodland are visible as darker shades of gray.

5.2. Application of SpaRef
M is set to 300. K-means clustering was first applied to the image to obtain 300 classes (cells). Agglomerative hierarchical clustering of these 300 classes was then performed and the index function was calculated. Figure 3 shows the plot of the index function against the number of classes. The figure shows the best choice for the number of clusters to be 39, where there is a sharp change.

Figure 3. The I index while applying AHC to the 300 homogeneous clusters. The optimal number of clusters is identified as 39, where there is a sharp change.

This data set is expected to contain noise and mixed pixels as well, and hence Minsize is set to 100 (otherwise, Minsize is zero). Fourteen clusters with sizes smaller than 100 pixels, containing in total 765 points (1%), have been rejected (Figure 4). Indeed, comparison with ground-truth information shows that those points are shadow areas and small objects (buildings, structures of a boat, etc.). The remaining 25 normal classes, comprising 99% of the points, were subjected to the refinement process described earlier.
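The pre-processing of Section 5.1 — mean-centring followed by projection onto the first principal components — can be sketched as follows (assuming NumPy; an SVD-based PCA, with illustrative names):

```python
import numpy as np

def pca_compress(X, n_components=4):
    """Mean-centre the spectra (rows of X) and project onto the first
    principal components; also return the fraction of variance kept."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return scores, explained
```

For the CASI data, four components retain more than 99.8% of the spectral variance, so clustering the four-dimensional scores instead of the ten original bands loses almost nothing.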

Figure 4. Unclassified points in the 14 very small classes (765 points, 1% of the total). These points are shadow areas and small objects (buildings, structures of a boat, etc.).

5.3. Application of K-means
K-means is sensitive to the choice of the initial centre points, so we performed K-means 100 times with random initialisation. The (non)compactness I, dispersion and DB indices are illustrated in figure 5. Clearly, for the 100 runs, the variability in all three indices is quite large.

Figure 5. Indices of K-means with 25 clusters for 100 runs: (a) (non)compactness index I; (b) dispersion index; (c) Davies-Bouldin index.

5.4. Application of ISODATA
The data set has also been clustered by the ISODATA algorithm (Figure 6a), as implemented in the MultiSpec software, a multi-spectral image data analysis system for interactively analyzing Earth observational multi-spectral images [17]. With the prior information about the number of clusters and the maximum cluster size, a trial-and-error strategy was applied in order to find settings leading to 25 clusters. A good setting of the convergence stop-criterion is 99%. The algorithm is more accurate, but takes more

computation time if the stop-criterion is high. The minimum cluster size is 10, the distance threshold used in deciding whether two clusters should be merged is 990, and a further threshold determines whether a cluster should be split. The number of clusters would not be 25 otherwise: a lower split threshold or a lower distance threshold leads to more clusters. It is very difficult to find good settings for the ISODATA algorithm without prior information about the data set; this is also the main limitation of ISODATA.

5.5. Application of Ward's clustering
For convenience, the process of agglomerative merging of 300 cells instead of individual pixels is considered a modification of Ward's method (M-Ward). This is actually our method without the refinement process. The modified Ward's method is expected to have slightly lower values of the (non)compactness I and DB indices, and a slightly higher value of the dispersion index D, compared to the original Ward method. This difference is not significant when M is high enough.

5.6. Results
Four clusters for ISODATA, the first of the 100 runs of K-means, and SpaRef, and 3 clusters for the M-Ward method, in total covering roughly the same area, were chosen from the clustering results in order to present the results as gray-scale images. These clusters are shown in Figure 6(a), (b), (c) and (d) for ISODATA, the first run of K-means, M-Ward and SpaRef, respectively. In all cases, small clusters are excluded. By this setup, ISODATA and K-means have inherited the advantage of not considering small classes and noise, which would otherwise degrade their performance.

Figure 6. Four clusters of the clustering results: (a) ISODATA; (b) K-means; (c) M-Ward; and (d) SpaRef. The colour white in (a-d) signifies the other clusters; there are only four shades of gray in (a).

We compare the clustering results of SpaRef to K-means, ISODATA and M-Ward clustering using the I, Davies-Bouldin and dispersion indices. Table 1 shows the (non)compactness I, dispersion D and DB indices of the different methods.
For the (non)compactness index I, ISODATA leads to values comparable with the average value obtained from the 100 runs of K-means. M-Ward clustering, with the highest I value, is worse than any clustering obtained with K-means [19]. The result of SpaRef is comparable

to the best case obtained from K-means. The table also shows that the DB index gives the same picture as the I index.
For the dispersion index D, the M-Ward method gives the lowest value (the highest continuity degree). This is because of the nearest-neighbourhood rule acting on the spatial domain, and the value may thus be lower than the expected true response of the D index. K-means obtains bad values in all cases; ISODATA gives better results than K-means. Lastly, SpaRef obtains a lower D index than ISODATA and K-means. It is higher than the value for the M-Ward method but, as mentioned, it may be closer to the true value of the D index. Indeed, in figure 6c, the clustering result of the M-Ward method, a large cluster in the middle-bottom area and on the right side along the river has a very low dispersion degree (a high continuity degree). In the same area in figure 6d, the result from SpaRef, the boundaries of this cluster with other clusters are curtailed, and hence the dispersion degree of this cluster is higher. In figures 6a and 6b, the clustering results from ISODATA and K-means respectively, the study area is dispersed and shared with other clusters; the dispersion degrees of these clustering results are thus very high. Overall, SpaRef does very well on all criteria simultaneously.

6. Conclusion
This paper presents a new clustering algorithm, SpaRef, for hyperspectral images. The proposed clustering method, using spatial information, has the advantage of being stable, and leads to clusters with a high degree of compactness and continuity. Moreover, SpaRef can work with a large data set, by applying the agglomerative merging process to a moderate number of highly homogeneous classes, instead of to a very high number of points. Potential shortcomings of agglomerative hierarchical clustering are corrected by introducing a refinement process on points in the spatial domain. The SpaRef method has given good results on the Klompenwaard CASI image, where it has been compared with K-means, ISODATA and Ward's method.
It would be a straightforward step to apply the algorithm to multi-spectral chemical images. Noise, mixed pixels and very small objects are not taken into account by SpaRef; future work on the categorisation of very small classes is necessary in order to cluster a complete image.

7. Acknowledgements
We thank Gertjan Geerling for sharing the data and stimulating discussions.

References
[1] A.K. Jain, M.N. Murty, and P.J. Flynn, ACM Computing Surveys, 31:3 (September 1999).
[2] T. Duda, M. Canty, Int. J. Remote Sensing, 23:11 (2002).
[3] G.H. Ball, D.J. Hall, ISODATA, a novel method of data analysis and pattern classification, Springfield, Stanford.
[4] M. Halkidi, Y. Batistakis, M. Vazirgiannis, SIGMOD Record, 31:2 (June 2002).
[5] R. G. Brereton, Multivariate Pattern Recognition in Chemometrics, Illustrated by Case Studies, Elsevier, 1992.
[6] A. El-Hamdouchi, P. Willett, The Computer Journal, 32 (1989).
[7] M. Amadasun, R. A. King, Pattern Recognition, 21:3 (1988).
[8] M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York.

[9] P.H.A. Sneath and R. R. Sokal, Numerical Taxonomy, Freeman, London, UK.
[10] B. King, J. Am. Stat. Assoc., 69 (1967).
[11] J. H. Ward, J. of the American Statistical Association, 58 (1963).
[12] J.C. Dunn, J. Cybern., 4 (1974).
[13] D. L. Davies and D. W. Bouldin, IEEE Trans. on Pattern Anal. and Mach. Intell., 1:2 (April 1979).
[14] R. Kothari, D. Pitts, Pattern Recognition Letters, 20 (1999).
[15] J. C. Bezdek, N. R. Pal, IEEE Trans. on Syst., Man, and Cybern., Part B, 28:3 (June 1998).
[16] J.A. Richards, X. Jia, Remote Sensing Digital Image Analysis, Springer.
[17] MultiSpec, Purdue Research Foundation.
[18] ERDAS IMAGINE.
[19] C. Goutte, P. Toft, E. Rostrup, F. Nielsen, L. Hansen, NeuroImage, 9 (1999).

CHAPTER 5
INITIALIZATION OF MARKOV RANDOM FIELD CLUSTERING OF LARGE REMOTE SENSING IMAGES

Abstract
Markov Random Field (MRF) clustering, utilizing both spectral and spatial inter-pixel dependency information, often improves classification accuracy for remote sensing images, such as multi-channel polarimetric Synthetic Aperture Radar (SAR) images. However, it is heavily sensitive to initial conditions such as the choice of the number of clusters and their parameters. In this paper, an initialization scheme for MRF clustering approaches to remote sensing images is suggested. The proposed method derives suitable initial cluster parameters from a set of homogeneous regions, and estimates the number of clusters using the Pseudolikelihood Information Criterion (PLIC). The method works best for images consisting of many large homogeneous regions, such as agricultural crop areas. It is illustrated using a well-known polarimetric SAR image of Flevoland in the Netherlands. The experiments show superior performance compared to several other methods, such as fuzzy C-means and Iterated Conditional Modes (ICM) clustering.

Keywords: image clustering; spatial information; parameter estimation; ICM.

T.N. Tran, R. Wehrens, D.H. Hoekman and L.M.C. Buydens, IEEE Trans. on Geosci. Remote Sensing, vol. 43, iss. 8, 2005.

1. Introduction
Clustering is an important tool in multi-spectral/channel image analysis. Most clustering methods do not take into account the spatial information of the image, the inter-pixel dependency over the image surface. Markov Random Field (MRF) clustering, first discussed by Besag [1][2] and later improved by Qian and Titterington [3], provides a way to integrate spatial information with a model-based clustering approach [4][5]. In many cases, this reduces the possible overlap of clusters and the effect of noise on the clustering result [6]. MRF clustering has also been applied to remote sensing [7][8][9][10].
In MRF clustering approaches, of which iterated conditional modes (ICM) clustering is an example, the class probability of a pixel is locally dependent on its spatial neighbour clusters. In operation, just as in ordinary model-based clustering, the method assumes a mixture of all components (clusters). Starting from an initial guess of the model, an iterative method is used to fit the model to the data set. The most common way is maximum likelihood via an expectation-maximization (EM) algorithm. However, different from ordinary model-based clustering, the only classes considered for a pixel are the classes present among its neighbouring pixels [3][4]. We refer the reader to McLachlan and Peel [4] for an extensive review of MRF mixture models.
Since the convergence of MRF clustering methods is local, their accuracy depends much more on the initial guess of the cluster parameters than that of the classical model-based clustering algorithm. They typically work well in supervised mode, where the number of clusters and their associated parameters are known or can be estimated [10]; MRF clustering methods then tend to converge rapidly [1]. If the estimation of the initial parameters fails, classification results can be very poor [11], and a locally optimal solution is often obtained instead of a global one. Thus, the initial parameters should be quite close to the true parameters.
The initialization scheme is often simply random, or sometimes it is obtained from other clustering techniques, such as k-means [11] or fuzzy c-means. As an alternative, an agglomerative hierarchical clustering (AHC) framework can also be used as an initialization scheme, either in the form of single- and complete-linkage methods, or in a model-based form [12]. The method provides a dendrogram, representing nested clusters. Initial parameters for model-based clustering, as well as for MRF clustering in this case, can easily be extracted for different cluster models. However, ordinary AHC initialization starts with singleton clusters [6], which makes it impractical for large data sets.
In this paper, a new AHC initialization framework for MRF clustering, suitable for large remote sensing images, e.g. Synthetic Aperture Radar (SAR) images, is proposed. Instead of starting with singleton clusters, a limited number of homogeneous regions, obtained from a simple segmentation method (using a so-called "multi-level homogeneity test"), is used for building up the dendrogram. In general, many merging criteria, or distances between two clusters, can be used in AHC [11]. Here, a deterministic Bhattacharyya distance and a probabilistic likelihood are used. In this study, an example of MRF clustering with the new initialization framework is evaluated on a polarimetric SAR image of an area in Flevoland in the Netherlands.
The paper is organized as follows. In Section II, we introduce the basic elements of model-based and MRF clustering. The proposed clustering strategy, using the hierarchical agglomeration initialization scheme, is described in Section III. The problems of determining the number of clusters and dealing with outliers are also discussed. Section IV shows the

application to the polarimetric SAR image of Flevoland. Our conclusions and discussion are given in Section V.

2. Basic Elements in Mixture Models and Markov Random Field Clustering

A. Mixture models
An image of size $n$ in a $d$-dimensional feature space contains a set of pixels $X = (x_1^T, \ldots, x_n^T)^T$, where $x_i$ is the vector of pixel values in the spectral domain. In model-based clustering [4][5], each cluster c is described by a multivariate distribution $f$ with parameters $\theta_c$. Most commonly, $f$ is the multivariate Gaussian distribution, and $\theta_c$ contains the mean $\mu_c$ and covariance $\Sigma_c$. The total data set is described by a linear combination of the individual clusters and their corresponding mixture proportions $\pi_c$. The probability density function $f(x_i;\Psi)$ of the pixel $x_i$ under a g-component (cluster) mixture is then given by:

$f(x_i;\Psi) = \sum_{c=1}^{g} \pi_c f_c(x_i;\theta_c)$ (1)

where g is the total number of clusters, and $\Psi$ contains all cluster parameters and mixture proportions. The probabilistic likelihood function $L(\Psi)$ is given by:

$L(\Psi) = \prod_{i=1}^{n} f(x_i;\Psi)$ (2)

Clustering can be seen as an incomplete-data problem, in which $u_{ic}$ is defined as the conditional probability of object $x_i$ belonging to cluster c. The complete data $X_c$ are therefore declared to be [4]

$X_c = (X^T, u^T)^T$ (3)

where the matrix $u$ contains the values $u_{ic}$. The complete-data log-likelihood function is then derived (refer to [4] for the complete derivation):

$\log L_c(\Psi) = \sum_{c=1}^{g}\sum_{i=1}^{n} u_{ic} \log\big(\pi_c f_c(x_i;\theta_c)\big)$ (4)

The aim of model-based clustering is to obtain a configuration $\Psi$ that maximizes the log-likelihood function $\log L_c(\Psi)$. This is usually performed by the EM (Expectation-Maximization) algorithm [13]. At each iteration k, EM consists of two sub-steps: the M-step (Maximization step), maximizing over $\pi_c$ and $\theta_c$, and the E-step (conditional Expectation step), estimating $u_{ic}$ by the normalized posterior probability. The updates are given by:

$\pi_c^k = \frac{1}{n}\sum_{i=1}^{n} u_{ic}^{k-1}$ (5)

$\mu_c^k = \frac{\sum_{i=1}^{n} u_{ic}^{k-1} x_i}{\sum_{i=1}^{n} u_{ic}^{k-1}} \quad\text{and}\quad \Sigma_c^k = \frac{\sum_{i=1}^{n} u_{ic}^{k-1}\,(x_i - \mu_c^k)(x_i - \mu_c^k)^T}{\sum_{i=1}^{n} u_{ic}^{k-1}}$ (6)

$u_{ic}^k = P(x_i \in c) = \frac{\pi_c^k f_c(x_i;\theta_c^k)}{\sum_{d=1}^{g} \pi_d^k f_d(x_i;\theta_d^k)}$ (7)

EM starts with an initial guess $\Psi_0$ and iterates until convergence, or until the number of iterations exceeds a certain threshold.
An advantage of model-based clustering methods is that the classification results are in a soft form, a conditional probability, instead of a hard form, as e.g. in the K-means or ISODATA methods. The soft form of the clustering result is more flexible for modelling remote sensing images, where there are mixtures of ground-cover types within a resolution cell, noise due to limited sensor sensitivity or, in the case of radar, statistical variation because of speckle. Outliers or noise pixels normally show a low membership value for all clusters. Moreover, the method is computationally efficient. However, just like many other clustering methods using only spectral information, the method suffers from the problem of overlapping clusters [6]. This problem can be reduced by taking spatial information into account.

B. Markov Random Field and model-based clustering
Model-based clustering can be combined with a Markov Random Field (MRF) to take into account the spatial relations between pixels. In the literature, the MRF model on model-based clustering was first applied for the restoration of dirty images [1]; it can be seen as a smoothing technique which gives more weight to the fuzzy class memberships of spatially neighbouring clusters. It is assumed that the class probability of a pixel depends only on the class memberships of its (spatial) neighbour clusters, which reduces the possible influence of noise and overlapping clusters [6]. Practical examples in remote sensing applications show improvement of the separation of various ground-cover classes [7][8][9][10]. More precisely, the w-th order neighbourhood system for a particular pixel i, called $\partial i$, is defined as the set of neighbouring pixels belonging to a rectangular window of size w, centered at the pixel.
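Before the MRF extension is added below, the plain (non-spatial) EM iteration of eqs. 5-7 can be sketched as follows. This is a minimal illustration assuming NumPy and SciPy; the function name, the random-point initialization, and the small covariance regularization term are my choices, not the thesis implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, g, n_iter=100, seed=0, tol=1e-8):
    """EM for a g-component Gaussian mixture (eqs. 1, 5-7): the E-step
    computes posteriors u_ic, the M-step re-estimates pi, mu, Sigma."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(g, 1.0 / g)
    mu = X[rng.choice(n, g, replace=False)].astype(float)
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * g)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: u_ic proportional to pi_c * f_c(x_i) (eq. 7)
        dens = np.column_stack([pi[c] * multivariate_normal.pdf(X, mu[c], Sigma[c])
                                for c in range(g)])
        u = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update proportions, means and covariances (eqs. 5-6)
        nk = u.sum(axis=0)
        pi = nk / n
        mu = (u.T @ X) / nk[:, None]
        for c in range(g):
            Xc = X - mu[c]
            Sigma[c] = (u[:, c, None] * Xc).T @ Xc / nk[c] + 1e-6 * np.eye(d)
        ll = np.log(dens.sum(axis=1)).sum()   # observed-data log-likelihood
        if ll - prev < tol:                   # converged
            break
        prev = ll
    return pi, mu, Sigma, u
```

The returned `u` is exactly the soft membership matrix discussed above: each row sums to one, and noise pixels tend to show low membership for every component.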
The conditional probability of pixel x_i belonging to cluster c under the neighborhood system ∂i is estimated by [1]:

P(c_i = c) = (1/Z) exp(β Σ_{j∈∂i} u_jc)    (8)

where Z is a normalization constant and β is a spatial smoothness parameter. A higher (positive) β corresponds to a higher spatial dependency between neighboring pixels. The EM algorithm is then adapted, leading to the log-likelihood criterion:

log L_MRF(Ψ) = Σ_{c=1}^{g} Σ_{i=1}^{N} u_ic log(π_ic f(x_i; θ_c))    (9)

The mixture proportion π_c is now replaced by the transition probability π_ic [4]:

π_ic = exp(β Σ_{j∈∂i} u_jc) / Σ_{h=1}^{g} exp(β Σ_{j∈∂i} u_jh)    (10)

The algorithm not only tends to converge faster; more importantly, since EM tends to converge to a local optimum, the clustering accuracy depends heavily on the initial guess Ψ_0 and on the chosen number of clusters [11][1]. Obtaining a good Ψ_0 is therefore a key element of MRF clustering.

3. The proposed method

We propose a new initialization framework, based on agglomerative hierarchical clustering (AHC), which is suitable for remote sensing images. Ordinary AHC usually starts from N singleton clusters. Iteratively, the similarities between all cluster pairs i and j are calculated and the two closest clusters are merged. The algorithm ends when only one cluster remains. Variants differ mainly in the criterion for optimality, i.e. the cluster similarity. Single-linkage, complete-linkage and average-linkage are classical agglomerative methods with the merging criterion of nearest, farthest and average neighbor, respectively. In hierarchical model-based clustering [12], a probabilistic likelihood similarity is used, and a maximum-likelihood pair is merged at each stage according to a specific model. The AHC algorithm yields a dendrogram, representing the nested clusters and the similarity levels at which clusters are joined. To obtain initial parameters for a model with a particular number of clusters, the dendrogram is cut at the corresponding level. The equivalent parameters are extracted and can be used for initialization of MRF model-based clustering [12]. However, this approach is suitable only for very small data sets, because it demands computation time and computational resources proportional to the square of the number of pixels. To reduce the computation time, one solution is to apply the method to a small representative subset of the pixels in the image. Usually, random samples are taken [14]. Iterative procedures may also be used [15][16]. An alternative is segmentation of the image into a number of homogeneous regions.
The AHC clustering is then applied to these homogeneous regions rather than to the whole image. The minimum spanning tree and K-means, for example, are used to create such regions in [15] and [17], respectively. In this study, a simple segmentation method, the so-called multi-level homogeneity test, is used to obtain homogeneous regions. The full proposed clustering procedure is summarized below:

The algorithm:
Step 1 (Representative regions): Obtain an over-segmented image by applying the multi-level homogeneity test.
Step 2 (Parameter estimation): Apply agglomerative hierarchical clustering to the over-segmented image to obtain the initial parameters for M predefined models.
Step 3 (Actual clustering): Apply MRF clustering for each model, using the initial parameters obtained in the previous step. The final solution is selected from the M models using the Pseudolikelihood Information Criterion (PLIC).
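The transition probability of Eq. (10), used in the MRF clustering of step 3, can be sketched as follows (a minimal sketch; the membership-array layout and names are our own choices):

```python
import numpy as np

def mrf_transition(u, i, j, beta=1.0, w=2):
    """pi_ic of Eq. (10) for pixel (i, j).

    u    : (H, W, g) array of current fuzzy memberships u_jc
    beta : spatial smoothness parameter
    w    : half-width of the neighborhood window (w=2 gives 5x5)
    """
    H, W, g = u.shape
    win = u[max(0, i - w):i + w + 1, max(0, j - w):j + w + 1, :]
    s = win.sum(axis=(0, 1)) - u[i, j, :]   # neighbor memberships, pixel itself excluded
    e = np.exp(beta * s)
    return e / e.sum()                      # normalized over the g classes
```

A pixel surrounded by neighbors firmly assigned to one class thus receives a transition probability close to 1 for that class, which is the smoothing effect described above.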

The image is first divided into a number of regions. A region r is defined as a group of pixels forming a contiguous region in the spatial domain, e.g. a rectangle or an ellipse. Given a region r and a set of sub-regions r_i, the region r is said to be totally homogeneous at significance level α (for instance 0.05) if for all pairs of sub-regions <r_i, r_j> the test of complete homogeneity is not rejected at significance level α. The test of complete homogeneity is defined as follows. Given two groups of pixels, A and B, with numbers of pixels, mean vectors and covariance matrices {n_A, µ_A, Σ̂_A} and {n_B, µ_B, Σ̂_B}, respectively, let Σ̂_<AB> be the covariance matrix of the union of A and B. The test of complete homogeneity of the two groups under the hypothesis H_c: µ_A = µ_B and Σ_A = Σ_B is the likelihood ratio test λ = L_c / L, where L_c and L are the maximized likelihood under the hypothesis H_c and the unconstrained maximum likelihood, respectively. The statistic

−2 log λ = (n_A + n_B) log|Σ̂_<AB>| − (n_A log|Σ̂_A| + n_B log|Σ̂_B|)    (11)

has an asymptotic chi-squared distribution with d(d+3)/2 degrees of freedom [18], where d is the number of input bands of the image data. Hence, two groups of pixels A and B are said to be completely homogeneous at significance level α if and only if −2 log λ does not exceed the critical value provided by the chi-squared distribution.

The homogeneity test can also be applied recursively to all sub-regions r_i, as in Figure 1. It is then called the multi-level homogeneity test. In this study, the two-level homogeneity test is used, in which the test is repeated once for all sub-regions. Obviously, fewer homogeneous regions are obtained for higher-level tests, because these are stronger than the lower-level tests.

Figure 1. Multi-level homogeneity test for a spatial region. Region R consists of r_1, r_2, ..., r_4. Again, each sub-region r_i consists of r_i1, ..., r_i4.

In principle, the size of the regions determines the total number of regions obtained. The choice is a trade-off, and a trial-and-error strategy is normally applied.
The test is less biased for larger regions, but regions should not be too large, because very large regions may contain more than one component, so that the homogeneity test fails; moreover, small homogeneous areas may then not be recognized. On the other hand, the size should not be too small either, because this increases the bias of the test, even when it is performed on an area belonging to a single component. This would result in fewer homogeneous regions and less reliable estimates of the parameters. In practice, we found that a good setting for the region size lies around 10 x 10 pixels for a two-level method. The choice, of course, also depends on the image resolution; the task is easier for high-resolution images.
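The statistic of Eq. (11) can be computed directly from two groups of pixel spectra. A sketch (using maximum-likelihood covariance estimates, and leaving the chi-squared critical value at d(d+3)/2 degrees of freedom to the caller; names are ours):

```python
import numpy as np

def neg2_log_lambda(A, B):
    """-2 log(lambda) of Eq. (11) for pixel groups A and B, each (n, d).

    The value is compared with the chi-squared critical value at
    d(d+3)/2 degrees of freedom to decide complete homogeneity.
    """
    def logdet_ml_cov(X):
        S = np.cov(X, rowvar=False, bias=True)   # ML covariance estimate
        return np.linalg.slogdet(np.atleast_2d(S))[1]
    nA, nB = len(A), len(B)
    return ((nA + nB) * logdet_ml_cov(np.vstack([A, B]))
            - (nA * logdet_ml_cov(A) + nB * logdet_ml_cov(B)))
```

Two groups drawn from the same distribution give a small statistic; a mean or covariance difference inflates the pooled determinant and hence the statistic.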

The agglomerative hierarchical process is used in step 2 to merge homogeneous regions, yielding a dendrogram, from which the statistical parameters can be extracted for each cluster model. The merging criterion in AHC can be deterministic, e.g. the Euclidean distance, or a probabilistic likelihood similarity, as used in model-based hierarchical clustering [5][12]. In this study, the deterministic Bhattacharyya distance between two Gaussian regions r1 and r2 is used [19]:

B(r1, r2) = (1/8) (µ_r1 − µ_r2)^T [(Σ_r1 + Σ_r2)/2]^{-1} (µ_r1 − µ_r2) + (1/2) ln( |(Σ_r1 + Σ_r2)/2| / √(|Σ_r1| |Σ_r2|) )    (12)

The Bhattacharyya distance consists of two terms, dominated by the differences in means and covariances, respectively. It is very close to the Bayes error of two clusters [19]. In many cases, outliers are also present in the regions obtained in step 1. By the hierarchical mechanism, they become trapped in isolated singleton clusters. The real number of clusters can thus be defined after these singleton clusters (outliers) are eliminated [17]. At this point, a list of solutions is extracted from the dendrogram for M interesting models, and the statistical parameters for each cluster model can be calculated. In the last step, the actual MRF clustering is performed on the entire image for all selected models.

A. Determining the best model

One way to determine the best model in model-based clustering is by using an approximate Bayes factor [20]. The Bayesian Information Criterion (BIC) is often used for traditional model-based clustering [5]:

BIC = 2 log L(Ψ_k) − d_k log(n)    (13)

where d_k is the number of parameters of the model. The Pseudolikelihood Information Criterion (PLIC) is adapted from BIC for MRF modelling [21]:

PLIC = 2 log L_MRF(Ψ_k) − d_k log(n)    (14)

where the MRF log-likelihood function log L_MRF(Ψ_k) is used instead of the ordinary log-likelihood function. PLIC is used in this study. The best model normally corresponds to the highest BIC or PLIC value.
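The Bhattacharyya distance of Eq. (12) and the criteria of Eqs. (13)-(14) can be sketched as follows (a minimal sketch; the log-likelihood is supplied by the caller, and names are ours):

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Eq. (12): first term driven by the mean difference, second by the
    covariance difference."""
    S = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.inv(S) @ diff
    cov_term = 0.5 * np.log(np.linalg.det(S)
                            / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(mean_term + cov_term)

def information_criterion(loglik, d_k, n):
    """Eqs. (13)-(14): BIC (ordinary log-likelihood) or PLIC (MRF
    log-likelihood) = 2 log L - d_k log n; higher is better."""
    return 2.0 * loglik - d_k * np.log(n)
```

Identical Gaussians give distance zero; at equal likelihood, the model with fewer parameters gets the higher criterion value.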
For a complex data set, where several Gaussians are needed to fit one class, the most significant increase in BIC or PLIC value is used instead. Apart from the best model suggested by measures like BIC or PLIC, visualization of the list of M clusterings gives additional information for choosing a good number of clusters. This is a useful feature of the approach in practice, e.g. in remote sensing, as in the example in Section 4.

B. Model Outliers

MRF clustering yields z_ic = P(x_i ∈ c), the posterior probability of pixel x_i for component c. A hard clustering interprets this by assigning pixel x_i to cluster c if z_ic ≥ z_id for all d. This interpretation is only valid if the pixel belongs to at least one cluster. That is not the case in this study, where the initialization process does not provide a complete list of the clusters, but a group of major clusters. This means that there will be a set of pixels, the so-called set O, that are not close to any of those major clusters. Put differently, these pixels are poorly fitted by the current model. In this case, the pixels in set

O can be identified as having very low probability densities for all identified clusters, and set O can be seen as containing the outliers to the model. One way to identify set O is to compare the Mahalanobis distances of a pixel to all clusters with the Hotelling T² distribution. Thus:

O = { x_i | Mah²(x_i, c) > T²_c for each cluster c }

with

Mah²(x_i, c) = (x_i − µ_c)^T Σ_c^{-1} (x_i − µ_c),  and  T²_c = [m(n_c − 1)(n_c + 1) / (n_c(n_c − m))] F_{υ,m,n_c−m}

where n_c is the number of pixels in cluster c, m is the number of dimensions, υ is the significance level, 1 − υ is the confidence level, e.g. 95%, and F is the F-statistic [18]. The set O may contain pixels from a cluster that is small in size or isolated in the spatial domain. One can further process this set with any spectral-only clustering method. The suggestion here is to use incremental model-based clustering, described in [16]. This method builds a model taking into account the current model and iteratively adding new clusters to describe set O.

C. Computational analysis

The proposed method is fast. The statistical test in step 1 has a complexity of O(n·w), where w is the size of the region. In step 2, only part of the image is taken into account: pixels in heterogeneous regions, rejected by the homogeneity test, are skipped and only homogeneous regions are considered. The maximum number of operations is O(s²), where s is the total number of homogeneous regions. Lastly, the complexity of MRF clustering is O(n log n). Hence the total complexity of the system is O(n·w + s² + n log n), which is acceptable for a remote sensing image of normal size.

4. Application to SAR Data

We investigate our method by applying it to a well-known SAR image of Flevoland, an agricultural area in the Netherlands, acquired by the NASA/JPL AirSAR system (C-, L- and P-band polarimetric) on 3 July. The polarimetric backscatter behaviour of homogeneous fields can be described by the Wishart distribution or its marginal distributions [22][23][24][25].
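The outlier test of Section 3B above can be sketched as follows (a minimal sketch; the critical value T² is passed in, e.g. computed from the F-distribution as in the text, and names are ours):

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of pixel x to a cluster (mu, cov)."""
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d)

def in_set_O(x, clusters, t2_crit):
    """Pixel x belongs to the outlier set O if its squared Mahalanobis
    distance exceeds the critical value T^2 for every cluster.

    clusters : list of (mu, cov) pairs
    t2_crit  : critical value, derived in the text from the
               F-distribution via the Hotelling T^2 statistic
    """
    return all(mahalanobis_sq(x, mu, cov) > t2_crit for mu, cov in clusters)
```

A pixel far from every cluster lands in O; a pixel close to at least one cluster does not.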
The characteristics of the physical scattering mechanisms are employed for classification in [26][27][28]. They may also be exploited in the initialization phase of model-based clustering [29][30], and in fact they can also be applied to MRF clustering. However, it is outside the scope of this paper to discuss and compare these in detail. In [31], the full polarimetric information is transformed to a log-normal distribution, and the validity of this is demonstrated for the Flevoland data set used in this paper. For practical applications, it is important to note that when intensities have a log-normal distribution, this distribution transforms into a normal distribution after logarithmic scaling. The classification can then be performed directly on these logarithmically scaled (dB-value) intensity images under a multivariate Gaussian distribution assumption. Note that for an individual homogeneous field the complex Wishart distribution and its marginal distributions are appropriate. However, for classification of a complex scene, featuring between-field variations, the class distributions (i.e. the values of all pixels belonging to a certain class) are the ones of primary importance. For a collection of fields from

the same class, which typically show slightly different mean radar backscatter values, the signals are shown to conform well to log-normal distributions. In this study, 18 intensity bands from the C- and L-band of the full polarimetry model (see reference [31]) are used. In that reference, only supervised classification, where the class statistics are known from a training set, is used. The study area has a size of 400 x 400 pixels and is taken from the original data without any aggregation. This is a reasonable size for demonstration purposes in this case. Even then, it still requires good initial parameters for clustering in order to obtain good results for the area. The clustering process takes only a few minutes using Matlab on a Pentium IV PC.

Figure 2. (a) The false-color image and (b) the ground-truth information of the site.

Fig. 2a shows the false-color image of the first three intensities of the C band. The crop-type map, which is the ground truth for the clustering, is shown in Fig. 2b. The yellow color is a mask where the ground truth is uncertain or not recorded. Heavily overlapping clusters, between the Barley (green) and Winter Wheat (magenta) clusters, are shown in Fig. 3. Together with sensor speckle, these are the two main problems in this image.

Figure 3. (a) Mean spectra of objects in each of the three classes; (b) score plot of the first two PCs of all pixels in the three classes.

For the analysis, the image is first divided into 3136 square windows (regions) with a size of 15 x 15 pixels. This is done such that two adjacent regions overlap by 50 percent, i.e. the center of one region is on the edge of the other. This produces more regions for the test, increasing the probability that a region corresponds exactly to one crop type and thus leading to more homogeneous regions. Indeed, after a two-level homogeneity test, the over-segmented image contains 227 homogeneous regions, as shown in Fig. 4.

Figure 4. Homogeneous regions.

Figure 5. AHC result on homogeneous regions using the Bhattacharyya distance, for the 5- to 10-cluster models.

Subsequently, the homogeneous regions are combined with AHC using the Bhattacharyya distance. Statistical parameters are extracted for seven models, corresponding to 4 to 10 clusters. As an illustration, six of these models are shown in Fig. 5. Then, at the final step,

MRF clustering is performed for each cluster model with β = 1 and a 5 x 5 neighborhood window. The clustering results of four selected models are shown in Fig. 6. In this case, we know that the correct number of classes equals seven. The PLIC values for the seven models considered in step 2 are plotted in Fig. 7. In this complex data set, the 6-cluster model shows the largest increase in PLIC value. This is to be expected, since there is a large overlap between the Barley and Winter Wheat clusters (Fig. 3). The seven-cluster model obtains more than 97% accuracy on the area for which reference information is available (the non-yellow area in Fig. 2b). The accuracies of the separate clusters are 96, 95, 100, 98, 98.5, 100 and 91%, respectively. Since the clustering result is unlabelled, the clustering accuracy is calculated from the most overlapping cluster-class combination.

Figure 6. MRF clustering results for the (a) 6-, (b) 7-, (c) 8-, and (d) 10-cluster models.

Figure 7. Plot of PLIC values for the 4- to 10-cluster models.

The method was also compared with other often-used initialization methods, such as random initialization, K-means and fuzzy C-means clustering. The corresponding results are shown in Fig. 8a-c; the maximal total accuracies after 50 runs are 81%, 85% and 79%, respectively. A comparison has also been made with ordinary fuzzy C-means, which reaches only 44% accuracy (Fig. 8d).

Figure 8. MRF clustering with (a) random initialization, (b) initialization by K-means, and (c) initialization by fuzzy C-means; (d) fuzzy C-means clustering.

We performed a further test using a supervised maximum-likelihood classification approach [31]. This obtained only 78.4% total accuracy. If segmentation is used as a preprocessing step, the classification accuracy increases to 96.3%. This result is quite comparable to that of our unsupervised method, in which the class signatures are unknown beforehand. As expected, not all pixels are well described by the clusters that are found. The O-image in Fig. 9 shows the outliers of the model. They partly consist of pixels from unknown classes (e.g. pixels in the upper-right region or road structures) or sensor speckle. One can further process these pixels by using, for example, the incremental model-based clustering method described in [16] to identify additional classes and noise. That method takes into account the current model and adds new clusters for the set O. However, further discussion is outside the scope of this study.

Figure 9. The O-image shows the outliers of the seven-cluster model.

In order to improve classification results, speckle is normally reduced in the original image by de-noising schemes, such as moving-average filtering or dedicated speckle

filtering. The drawback of filtering techniques is that the structure in the data may be affected. Our proposed algorithm, on the other hand, works directly on the original image. Outliers in the classification results caused by speckle can be identified afterwards.

5. Conclusion and discussion

We have proposed a fairly simple initialization method, which makes MRF clustering more robust and applicable to the clustering of large remote sensing images, a very difficult task for any unsupervised classification method. The method works best for an image consisting of many large homogeneous regions, such as agricultural crop areas. Small and isolated clusters may not be recognized by the method; in that case, incremental model-based clustering is suggested as a post-processing step. In many cases, a good choice of the number of clusters can be identified by the use of PLIC. Prior information can also be used to determine the optimal model. The proposed method does not need preprocessing of the original image data. The method is totally unsupervised, which is a big advantage since in many cases ground truth is not available. In this work, the method was applied to a polarimetric SAR image, utilizing the full polarimetric information content through a transformation described in [31]. The method shows excellent results. Our future work will focus on using other segmentation methods, such as region growing, in the first step. This can overcome the limitation of the current method in identifying small and isolated clusters, and will significantly increase the possibilities of the proposed approach in remote sensing applications.

Acknowledgements

We thank Simon Dodds for helping us to improve the English.

References

[1] J. Besag, On the statistical analysis of dirty pictures, J. R. Statist. Soc. B.
[2] J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Statist. Soc. B, vol. 36.
[3] W. Qian, D.M.
Titterington, Estimation of parameters in hidden Markov models, Philosophical Transactions of the Royal Society of London A, vol. 337.
[4] G. McLachlan and D. Peel, Finite Mixture Models, Wiley Series in Probability and Statistics, Canada.
[5] C. Fraley and A.E. Raftery, Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc., vol. 97.
[6] T.N. Tran, R. Wehrens and L.M.C. Buydens, Clustering multispectral images: a tutorial, Chemom. Intell. Lab. Syst., in press.
[7] A.H.S. Solberg, T. Taxt, A.K. Jain, A Markov random field model for classification of multisource satellite imagery, IEEE Trans. Geosci. Remote Sensing, vol. 34, Jan.
[8] P.C. Smits, S.G. Dellepiane, Synthetic aperture radar image segmentation by a detail preserving Markov random field approach, IEEE Trans. Geosci. Remote Sensing, vol. 35, Jul.
[9] Q. Jackson, D.A. Landgrebe, Adaptive Bayesian contextual classification based on Markov random fields, IEEE Trans. Geosci. Remote Sensing, vol. 40, Nov.
[10] A. Sarkar, M. Kumar Biswas, B. Kartikeyan, V. Kumar, K.L. Majumder, D.K. Pal, A MRF model-based segmentation approach to classification for multispectral imagery, IEEE Trans. Geosci. Remote Sensing, vol. 40, May.

[11] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields, IEEE Trans. Geosci. Remote Sensing, vol. 41, Mar.
[12] C. Fraley, Algorithms for model-based Gaussian hierarchical clustering, SIAM J. Sci. Comput., vol. 20.
[13] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Statist. Soc. B, vol. 39, pp. 1-38.
[14] R. Wehrens, L.M.C. Buydens, C. Fraley and A.E. Raftery, Model-based clustering for image segmentation and large datasets via sampling, Tech. Report no. 424, Dept. of Statistics, University of Washington.
[15] C. Posse, Hierarchical model-based clustering for large datasets, Journal of Computational and Graphical Statistics, vol. 10.
[16] C. Fraley, A.E. Raftery and R. Wehrens, Incremental model-based clustering for large datasets with small clusters, Tech. Report no. 439, Dept. of Statistics, University of Washington, Dec.
[17] T.N. Tran, R. Wehrens and L.M.C. Buydens, SpaRef: a clustering algorithm for satellite imagery, Anal. Chim. Acta, vol. 490.
[18] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis. London, Academic Press.
[19] D.J. Hand, Discrimination and Classification. John Wiley & Sons.
[20] R.E. Kass and A.E. Raftery, Bayes factors and model uncertainty, J. Amer. Statist. Assoc., vol. 90.
[21] D.C. Stanford, A.E. Raftery, Approximate Bayes factors for image segmentation: the Pseudolikelihood Information Criterion (PLIC), IEEE Trans. Pattern Anal. Mach. Intell., vol. 24.
[22] H.A. Yueh, A.A. Swartz, J.A. Kong, R.T. Shin, and L.M. Novak, Bayes classification of terrain cover using normalized polarimetric data, J. Geophys. Res. B-12, vol. 93.
[23] H.H. Lim et al., Classification of earth terrain using polarimetric SAR images, J. Geophys. Res., vol. 94.
[24] J.J. van Zyl and C.F. Burnette, Bayesian classification of polarimetric SAR images using adaptive a priori probabilities, Int. J. Remote Sens., vol.
13, no. 5.
[25] J.S. Lee, M.R. Grunes, and R. Kwok, Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution, Int. J. Remote Sens., vol. 15, no. 11, 1994.
[26] J.J. van Zyl, Unsupervised classification of scattering mechanisms using radar polarimetry data, IEEE Trans. Geosci. Remote Sensing, vol. 27, Jan.
[27] S.R. Cloude and E. Pottier, An entropy based classification scheme for land applications of polarimetric SAR, IEEE Trans. Geosci. Remote Sensing, vol. 35, Jan.
[28] A. Freeman and S.L. Durden, A three-component scattering model for polarimetric SAR data, IEEE Trans. Geosci. Remote Sensing, vol. 36, May.
[29] J.S. Lee, M.R. Grunes, T.L. Ainsworth, L.J. Du, D.L. Schuler, and S.R. Cloude, Unsupervised classification using polarimetric decomposition and the complex Wishart classifier, IEEE Trans. Geosci. Remote Sensing, vol. 37, Sept.
[30] J.S. Lee, M.R. Grunes, E. Pottier, and L. Ferro-Famil, Unsupervised terrain classification preserving polarimetric scattering characteristics, IEEE Trans. Geosci. Remote Sensing, vol. 42.
[31] D.H. Hoekman and M.A.M. Vissers, A new polarimetric classification approach evaluated for agricultural crops, IEEE Trans. Geosci. Remote Sensing, vol. 41, Dec.

CHAPTER 6

STRATEGIES FOR MIXTURE MODEL CLUSTERING OF MULTIVARIATE IMAGES

Abstract

Two novel strategies for mixture model clustering of multivariate images have been developed. Most other approaches require good guesses of the number of components (clusters) and their initial statistical parameters. In our approach, the initial parameters of mixture model clustering are determined by agglomerative clustering on homogeneous regions, identified by region growing segmentation. One strategy is developed for the normal situation of mixture modelling, where the density of a cluster is modeled by a single normal distribution; the other is designed for a more complex situation, where the density of a single cluster is a mixture of several normal sub-clusters. The method is very robust to noise/outliers and overlapping clusters. It is also reasonably fast and suitable for moderate to large images. Experiments on both simple and complex data sets are presented.

Keywords: Mixture models; Clustering; Number of clusters; Spatial information

T.N. Tran, R. Wehrens, and L.M.C. Buydens, revised for Journal of Chemometrics.

1. Introduction

The mixture modelling approach to clustering plays a major role in exploratory data analysis when searching for groupings in the data [1][2]. The data to be clustered are usually described by a mixture of a number of Gaussian components, and the clustering uses the Expectation-Maximization (EM) algorithm to fit the finite mixture model to the data set [3][4]. However, the EM method is very sensitive to the initial estimate of the number of components and their statistical parameters (means and covariances) [5]. Several solutions in the literature have addressed this problem. Fraley et al. [6][2] suggested obtaining the initial parameter values via model-based agglomerative clustering. However, direct application of this initialization method to large data sets is often prohibitively expensive in terms of computer time and memory [7]; even a few thousand pixels may already be too many for convenient processing. As a way out, estimates of the statistical parameters can be derived from a small sample of the data, but obtaining a representative sample is quite difficult in many cases [7]. Quite recently, for image data, the agglomerative process has been sped up using segmentation techniques [8][9]: an over-segmented image is produced as input to the agglomerative process. However, the final clustering result, obtained right after the agglomerative process, suffers from the flexibility problem [10]: once a pixel has been assigned to a cluster, it will not be considered for joining other clusters in later iterations.

Two new strategies are proposed in this paper to improve mixture model clustering. The strategies combine agglomerative clustering and segmentation to obtain initial estimates for the subsequent mixture model clustering. Moreover, to deal with overlapping clusters and noise, the clustering result is filtered by a Markov Random Field (MRF)-based technique in the final step. The basic strategy (Strategy I) is used for data where clusters are normally distributed. However, it is frequently encountered in practice that cluster densities are non-normal.
Detectng non-gaussan classes s a challengng task usng Gaussan mxture model clusterng. As suggested n [1] and recently n [17], non-gaussan classes could be modeled by several Gaussan dstrbutons. For ths reason, we develop Strategy II for ths complex stuaton. It ams to group Gaussan subclusters to form a complete component, and at the same tme to retan very small clusters. Examples are gven for two real-world cases: a multspectral mage for mnced meat, and an RGB mage of St. Paula flowers. The results are compared to other methods such as Fuzzy C-means [11] and mxture modelng clusterng. In these cases, the spatal relatons between pxels are gnored. 2. Prevous works 2.1 Mxture model clusterng In bref, n mxture model clusterng, the probablty densty functon of the pxel x s gven by: f g ( x ; Ψ) = f ( x ; θ ) c= 1 π (1) c c θ contans the means and covarances ( ) where g s the number of components, c µ, Σ of c c cluster c, π c s the mxture proporton, and Ψ contans all cluster parameters (θ ) and mxture proportons (π ). The form of the multvarate dstrbuton functon f s chosen 72

according to the underlying distribution of the data set; usually a multivariate normal distribution is used. If the data are not normally distributed, a mixture of normal distributions can still describe the cluster shape quite well [1]. We consider both situations in this paper. For a data set of n pixels, the mixture model clustering algorithm maximizes the complete-data log-likelihood function:

log L(Ψ) = Σ_{c=1}^{g} Σ_{i=1}^{n} u_ic log(π_c f(x_i; θ_c))    (2)

where u_ic corresponds to the conditional probability of object x_i belonging to cluster c. The Expectation-Maximization (EM) algorithm [3][4] is usually used to fit the finite mixture model to the data set. At each iteration k, EM consists of two sub-steps, called an E-step and an M-step. The E-step (conditional Expectation step) estimates the conditional probability u_ic:

u_ic^k = P(x_i ∈ c) = π_c^k f_c(x_i; θ_c^k) / Σ_{d=1}^{g} π_d^k f_d(x_i; θ_d^k)    (3)

In the M-step (Maximization step), the statistical parameters π_c and θ_c are estimated from the data [1]. Usually, several different models are fitted. To find the one that fits the data best, many different criteria can be used (see, e.g., [1]). One of the most popular is the Bayesian Information Criterion (BIC) [12][13]:

BIC = 2 log L(Ψ) − d log(n)    (4)

where d is the number of parameters of the model. The best model is indicated by a maximal BIC value. This corresponds to a model with few parameters that nevertheless fits well (high likelihood).

2.2 Estimation of initial cluster parameters by model-based agglomerative clustering

The quality of the clustering result of EM critically depends on the initial values, i.e. the number of clusters and their parameters. If a poor choice of initial values is made, the convergence of EM may be very slow [14], which is impractical for large image data sets. Again, many different strategies have been proposed, e.g. several random starts, but the most reliable option seems to be model-based agglomerative clustering (MAC) [6].
This has the added advantage that once the cluster tree has been established, several numbers of clusters can be assessed at very low computational cost. MAC starts from singleton clusters, each containing a single pixel. The parameters θ_c, i.e. the mean and covariance (µ_c, Σ_c) of cluster c, are initialized by the spectrum of pixel i and the identity matrix I. The algorithm then continues to join those pairs of clusters which lead to the greatest increase in the classification likelihood L_CL [2][6], given by:

L_CL = Π_{i=1}^{n} f(x_i; θ_{c_i})    (5)

Again, f is a multivariate Gaussian with parameters θ_{c_i} for the cluster c_i to which x_i is assigned. The number of clusters decreases by one after each iteration, and the process continues until only one cluster remains. This yields a dendrogram, showing how cluster pairs are joined. The initial statistical parameters for mixture model clustering for several interesting models can be extracted by cutting the dendrogram at the appropriate levels. Model-based clustering [2], proposed by Fraley and Raftery, essentially comprises these two main steps: initialization of the statistical parameters using MAC in step one, and performing EM for the interesting models and selecting the best model using BIC in step two. The initialization step can of course also be done with ordinary agglomerative clustering, such as single linkage, average linkage or complete linkage. However, these have no known associated statistical model [2] that allows good estimates of the statistical parameters and the number of clusters, which is the main goal of our study.

3. Strategy I

The major drawback of initialization by agglomerative methods is that the computational demands increase rapidly with the number of samples, which makes them impractical for large data sets. In our research, we propose a solution particular to images. Instead of starting from individual pixels, the agglomerative initialization starts from a much smaller number of homogeneous areas. These areas are obtained by a simple region growing segmentation (RGS). Strategy I extracts cluster parameters for several selected models, e.g. between 5 and 25 clusters, which are used as starting points for the EM iterations. The aim is only to reduce the computation time by discarding all models outside this range. Normally, without any prior knowledge, the range can be set as wide as the capacity of the computer system allows; with prior knowledge about the data, it can be much narrower. Then, the best model is picked on the basis of the BIC criterion.
Finally, the clustering of the complete image (pixels inside and outside the homogeneous areas) is obtained by MRF classification, the second new element in our strategy. It uses distances to the individual clusters as well as class information of neighboring pixels. The steps of Strategy I are given below.

Clustering Strategy I:
Step 1: Obtain homogeneous regions by the region growing segmentation method.
Step 2: Estimate cluster parameters for selected models by agglomerative clustering.
Step 3: Run EM for each selected model on the homogeneous pixels; the best model is selected using BIC.
Step 4: Log-likelihood classification or Markov Random Field (MRF) classification on the entire image.

Note that from now on, if the objects are not mentioned explicitly at any step of the algorithm, the algorithm is applied to the pixels belonging to the homogeneous regions.
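The log-likelihood classification of step 4 can be sketched as follows (a minimal sketch; the MRF variant additionally weights these scores with neighbor memberships, and names are ours):

```python
import numpy as np

def loglik_classify(X, pis, mus, covs):
    """Assign each pixel to the cluster maximizing log(pi_c f(x; theta_c))."""
    n, g = len(X), len(pis)
    scores = np.empty((n, g))
    for c, (p, mu, cov) in enumerate(zip(pis, mus, covs)):
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        diff = X - mu
        maha = np.einsum('id,de,ie->i', diff, inv, diff)   # squared Mahalanobis distances
        scores[:, c] = np.log(p) - 0.5 * (maha + logdet + len(mu) * np.log(2 * np.pi))
    return scores.argmax(axis=1)
```

Pixels outside the homogeneous areas are classified here as well, using the parameters estimated from the homogeneous pixels in steps 2-3.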

STRATEGIES FOR MIXTURE MODEL CLUSTERING OF MULTIVARIATE IMAGES

Obtaining homogeneous regions

By definition, image segmentation is a process of partitioning the image into non-intersecting regions such that each region is homogeneous and the union of no two adjacent regions is homogeneous [15]. RGS starts with a number of initial seed pixels and creates homogeneous regions by grouping adjacent pixels (or regions) if their distance (based on the intensities in the channels) is below a predefined threshold. Defining this threshold is the most problematic issue in RGS: an over-fragmented image, containing many more regions than expected, is easily obtained. One reason for this is that object homogeneity is not a well-defined concept and may be described by different statistics. In reality, cluster homogeneity can be influenced by many external factors, such as temperature or experimental conditions. In this work, over-segmentation is not a problem because regions will be merged later. Hence, determining a perfect threshold is not necessary.

Several basic variants of RGS exist, depending on the definition of the distance between neighbor pixels and the segment of a current seed pixel. Here, for simplicity, we use the simple average-linkage RGS, in which the distance between neighbor pixels and the mean of the current segment is used. The RGS algorithm employed here uses one parameter, the minimal size of a region (MINSIZE), and is described below:

The RGS algorithm:
Step 1. (New segment) A still unlabeled pixel (i.e. one not associated with any segment) is used as a seed pixel to initialize the set of seed pixels; go to step 2. If all pixels are labeled, discard all very small regions with size < MINSIZE, and STOP. In general, any unlabeled pixel can be used as seed pixel in this step. However, to speed up the process, it is chosen as the first unlabeled pixel encountered when reading the image (in row or column order).
Step 2. (Iterative growth) If the set of seed pixels is not empty, take the one at the top of the set as the current seed pixel.
All boundary pixels are eligible for merging (in reading order): they are joined to the current segment (i.e. they are labeled) AND appended to the end of the set of seed pixels if:
1. they are unlabeled pixels, and
2. their distances to the mean of the current segment are below the variance of the current segment.
If the set of seed pixels is empty, go back to step 1; otherwise loop to the beginning of step 2.

Very small regions, with sizes smaller than MINSIZE, may well contain noise, artefacts or spatially isolated pixels. These pixels are not important for parameter estimation purposes, and therefore they are discarded from the process until the last step of the clustering strategy; otherwise, they may influence the clustering process. The smaller MINSIZE, the more homogeneous the regions found. It should be kept small, but larger than the number of dimensions. As a rule of thumb, MINSIZE may be taken as twice the number of spectral variables in the data set; in our experience, this works well for different data sets. Artefacts and noise are typically not present in homogeneous areas, which improves the quality of the estimated statistical parameters in later stages of the algorithm. Note that there is a chance that some clusters are not found; this may be the case if no homogeneous areas corresponding to these clusters are identified. However, this may happen in any clustering, albeit for different reasons. The chance is very small if MINSIZE is small enough.

Model-based clustering of homogeneous regions

Steps 2 and 3 of the algorithm are in fact model-based clustering [2] using the most general model (variable in volume, shape and orientation - VVV). The initialization is performed using the homogeneous regions obtained in step 1 rather than singleton pixels. In brief, after MAC on the homogeneous regions, the statistical parameters for the range of interesting models can be obtained. These are used to start EM for the selected models. Then, in step 3, the BIC values for all selected models are calculated and the best model is identified by the highest BIC.

MRF classification of the entire image

At this point (step 4), the best model has been identified, together with the statistical parameters of all clusters. We can use these to obtain a classification for all pixels in the image, not only the pixels in the homogeneous areas, by maximum likelihood classification (one E-step) [2]. However, the result may be improved further by taking the spatial relations between pixels into account. Therefore, we propose to apply an MRF step to deal with overlapping clusters and noise [1][10][16]. Basically, in MRF clustering, the conditional probability of point x_i belonging to cluster c, P(c_i = c), under the neighborhood system N_i (usually a 3 x 3 or a 5 x 5 rectangular window) is estimated by [8]:

$$P(c_i = c) = \frac{1}{Z} \exp\Big(\beta \sum_{j \in N_i} u_{jc}\Big) \qquad (8)$$

where Z is a normalization constant and β is a spatial smoothness parameter. More details can be found in [1][16]. A higher (positive) β corresponds to a higher spatial dependency between neighboring pixels. In practice, it is normally set in the range [0.1, ..., 4]. The more positive the value, the smoother the resulting image; however, over-smoothed images may lose small and isolated parts of clusters. Therefore, the user has to find a compromise. The EM algorithm is then adapted, leading to the complete-data log-likelihood criterion:

$$\log L_{MRF}(\Psi) = \sum_{i=1}^{N} \sum_{c=1}^{g} u_{ic} \log\big(\tilde{\pi}_{ic}\, f(x_i; \theta_c)\big) \qquad (9)$$

where the mixture proportions π_c (Eq. 2) are now replaced by the transition probabilities $\tilde{\pi}_{ic}$ [1]:

$$\tilde{\pi}_{ic} = \exp\Big(\beta \sum_{j \in N_i} u_{jc}\Big) \Big/ \sum_{h=1}^{g} \exp\Big(\beta \sum_{j \in N_i} u_{jh}\Big) \qquad (10)$$

Here, we need only the E-step: the conditional probabilities u_ic and the $\tilde{\pi}_{ic}$, which take spatial information into account, are updated. This differs from the MRF concept in [10], where MRF is integrated directly into full E- and M-steps. Here, the statistical parameters θ_c are kept constant. Convergence is usually obtained in very few iterations; in most cases two iterations suffice.
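The transition probabilities of Eqs. (8) and (10) amount to a neighborhood sum followed by a softmax over clusters. The following is a minimal numpy sketch, assuming u is an H x W x g membership array and a 3 x 3 neighborhood; the function and variable names are ours, not those of the thesis software.

```python
import numpy as np

def mrf_priors(u, beta):
    """Transition probabilities (Eq. 10): for each pixel and cluster c,
    exp(beta * sum of neighbor memberships u_jc), normalized over clusters.
    u: (H, W, g) soft memberships; 3x3 neighborhood, pixel itself excluded."""
    H, W, g = u.shape
    padded = np.pad(u, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Sum memberships over the 8 neighbors of every pixel.
    s = np.zeros_like(u)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            s += padded[1 + di:1 + di + H, 1 + dj:1 + dj + W, :]
    e = np.exp(beta * s)
    # Normalizing over clusters plays the role of Z in Eq. (8).
    return e / e.sum(axis=2, keepdims=True)

# A pixel whose neighbors all favor cluster 0 receives a high prior there:
u = np.zeros((3, 3, 2))
u[..., 0] = 1.0
pri = mrf_priors(u, beta=1.0)
```

In the E-step these priors replace the global mixture proportions, so a noisy pixel surrounded by one class is pulled toward that class even when its spectrum is ambiguous.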

Two clusters that overlap in the spectral domain but lie in different regions of the image may be separated easily in this way. Moreover, isolated noise pixels are classified to one of the classes present in their neighborhood, which leads to a much smoother clustering result [10].

4. Strategy II

In practice, the classes to be identified by the clustering are often not normally distributed, but can still be described very well by a mixture of several normal distributions [1][17]. In this situation, it is hard to determine the best cluster model using the BIC criterion, because no clear maximum may be present. This can result in many more clusters than expected, especially for a data set containing one large non-normal cluster as well as several small clusters. Under the likelihood criterion in agglomerative clustering (step 2) [2][6], small clusters are likely to be merged into other clusters very early. This explains the problem of detecting small clusters in large data sets using model-based clustering [7]. On the other hand, it is very hard to join the sub-clusters of a big cluster into one component. Hence, we propose Strategy II, which extends Strategy I only in step 3. Step 3 is extended to better identify the number of clusters and their statistical parameters for the best cluster model; step 3 of Strategy I becomes step 3.1 in Strategy II. After step 3.1, an intermediate best cluster model containing many clusters is obtained. At this point, the sub-clusters of each component should be merged. We start from the assumption that if a component contains many normal clusters, these must be highly overlapping (very similar). We then merge by examining the degree of overlap of all pairs of clusters. One way to measure the overlap is via the Bayes error.
The Bayes error is often modeled by the Bhattacharyya distance, Bha [15], given below:

$$Bha_{i,j} = \frac{1}{8}(\mu_i-\mu_j)^T\left(\frac{\Sigma_i+\Sigma_j}{2}\right)^{-1}(\mu_i-\mu_j) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_i+\Sigma_j}{2}\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}} \qquad (12)$$

where i and j are two clusters with means µ_i, µ_j and covariance matrices Σ_i, Σ_j, respectively. The distance is a positive number. A higher overlap leads to a higher Bayes error and therefore a lower Bhattacharyya distance. Step 3 of Strategy I is then replaced by:

Extension of step 3:
3.1. Do EM for each selected model (usually models with a large number of clusters); the best intermediate model is selected using the BIC plot, and for a non-Gaussian data set this usually yields a very high number of clusters. This is step 3 of Strategy I, aimed at a high number of clusters.
3.2. Apply agglomerative clustering to the intermediate model, using the Bhattacharyya distance, to obtain cluster parameters for a number of interesting models.
3.3. Do EM for each model; again, the best model is selected using the BIC plot.

Since the number of expected clusters is far smaller than the number of normal distributions needed for the complex data set, the final cluster model of Strategy II is not the best fit to the data, and its BIC value is not the highest. However, instead of focusing on the exact description of one or two large clusters, Strategy II also tries to model the smaller clusters, which do not influence the likelihood that much.
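Eq. (12) translates directly into a short numpy function; the following is a minimal sketch whose function and variable names are ours.

```python
import numpy as np

def bhattacharyya(mu_i, sigma_i, mu_j, sigma_j):
    """Bhattacharyya distance between two Gaussian clusters (Eq. 12)."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    sigma_i, sigma_j = np.asarray(sigma_i, float), np.asarray(sigma_j, float)
    sigma = (sigma_i + sigma_j) / 2.0          # pooled covariance
    diff = mu_i - mu_j
    # Mahalanobis-type term: (1/8) diff^T sigma^{-1} diff
    term1 = diff @ np.linalg.solve(sigma, diff) / 8.0
    # Covariance-shape term: (1/2) ln( |sigma| / sqrt(|Sigma_i| |Sigma_j|) )
    term2 = 0.5 * np.log(np.linalg.det(sigma)
                         / np.sqrt(np.linalg.det(sigma_i) * np.linalg.det(sigma_j)))
    return term1 + term2

# Identical clusters give distance 0; the distance grows with separation.
d_same = bhattacharyya([0, 0], np.eye(2), [0, 0], np.eye(2))
d_near = bhattacharyya([0, 0], np.eye(2), [1, 0], np.eye(2))
d_far = bhattacharyya([0, 0], np.eye(2), [3, 0], np.eye(2))
```

In step 3.2, this distance would be evaluated for all pairs of intermediate clusters and the closest (most overlapping) pair merged first.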

Strategy II differs from Strategy I only in step 3. In general, Strategy II can be applied directly to a data set without prior information about the class distributions. The BIC plot in step 3.1 determines the next steps: whether to continue with Strategy II or to use Strategy I instead. If the plot already shows the maximal BIC values at a very low number of clusters, the classes are expected to be Gaussian and Strategy I can be used.

5. Results

5.1. Minced meat data set

The first example is a multivariate image of minced meat of 318 x 318 pixels with 257 variables (bands) from 396 nm to 736 nm (1.3 nm per band), recorded with the ImSpector V7 imaging spectrograph (Spectral Imaging, Oulu, Finland) [18]. The incoming light is split and captured by a Sony CCD camera to obtain a color image, which is used as the reference image (Fig. 1a). To reduce computation time, the number of variables is reduced to 11 bands by averaging [10]. The data set contains 4 classes: the petri dish, dark meat, light meat and fat. The difference between dark meat and light meat is caused by the amount of blood in the meat; the dark pixels represent the dark meat class and the white spots represent the fat class. The clustering of the original image is reported in [10]. Here we demonstrate the ability of the method to deal with noise: white Gaussian noise with a standard deviation of 50% of the average standard deviation of the entire image is added to the spectra of the image (Fig. 1b).

Figure 1. (a) The reference CCD color image; (b) the composite image (bands 2, 3 and 9) of the noisy image.

Noise and overlapping clusters are the two main problems of this data set. Indeed, the results obtained with fuzzy C-means and with mixture modelling by EM (with a random initialisation, repeated 50 times, an unconstrained VVV model, and ignoring spatial information) are very poor, as shown in Figures 2a and 2b, respectively.

Figure 2. Clustering of the noisy image by (a) fuzzy C-means; (b) mixture model clustering by EM.
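The noise addition described above (standard deviation equal to 50% of the average per-band standard deviation of the image) can be reproduced in a few lines. A sketch with a random stand-in for the 11-band image; the array names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 318 x 318 pixel, 11-band minced meat image.
image = rng.random((318, 318, 11))

# Noise level: 50% of the average (per-band) standard deviation of the image.
sigma = 0.5 * image.std(axis=(0, 1)).mean()

# Add white Gaussian noise with that standard deviation to every band.
noisy = image + rng.normal(0.0, sigma, size=image.shape)
```

The empirical standard deviation of `noisy - image` matches `sigma` closely, since the noise is drawn independently per pixel and band.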

A standard solution to the noise problem is to preprocess the image with smoothing or filtering techniques. However, this tends to increase the overlap problem [10]. This is illustrated in Figure 3a, where the noisy image is filtered with the often-used median filtering technique, using a 3-by-3 neighborhood. The clustering results for the filtered data by fuzzy C-means (Fig. 3b) and the EM algorithm (Fig. 3c) are still not very good: in fuzzy C-means, the fat spots and dark meat regions cover much of the light meat regions, and the regular EM algorithm mixes the dark meat with other classes.

Figure 3. (a) The filtered image using median filtering; and clustering of the filtered image by (b) fuzzy C-means; (c) mixture model clustering by EM.

Figure 4. (a) Step 1: 86 regions obtained by RGS (black regions); (b) Step 2: the four-cluster model after MAC; (c) Step 3: the four-cluster model after EM; (d) the four-cluster model after MRF classification extended to the entire image.

The first step of the strategies was applied to the noisy image. 86 homogeneous regions are obtained by RGS with MINSIZE = 22 (twice the number of feature dimensions). The image of these regions is plotted in Figure 4a (black areas); the white areas represent pixels outside the homogeneous regions. In step two, MAC is applied to the homogeneous regions and seven interesting models (ranging from 2 to 8 clusters) are extracted from the dendrogram. The four-cluster model is shown in Figure 4b. Then EM is applied to all selected models in step three to obtain the statistical parameters. The BIC values of several models are shown in Figure 5. Beyond the seven-cluster model, the BIC values decrease. The plot shows the maximal BIC values already at a four-cluster model, so the classes are expected to be Gaussian and Strategy I is suitable for this data set. The four-cluster model has the highest BIC value, which is in agreement with the reference information. Upon obtaining the best model and the corresponding cluster parameters, the final clustering result is obtained after MRF classification of the entire image, using β = 0.3 and a 5 x 5 neighborhood system (Figure 4d). The clustering result is very good, considering the amount of noise in the image. In particular, it is important that the fat regions coincide with the regions of light spots in the reference image (Figure 1a).

Figure 5. The BIC plot of Strategy I on the minced meat data set.

5.2. St. Paulia flower image data

The RGB (3-band) image (304 x 268 pixels) of a St. Paulia flower is shown in Figure 8a. Since the yellow centers of the flowers are very small (many yellow spots are smaller than 4 pixels), detecting them by clustering with a small number of clusters is a challenging task. Incremental model-based clustering was proposed in [7] as one solution to this problem. Here, Strategy II is used for this data set. In step one, 419 homogeneous regions are obtained by RGS with MINSIZE = 6 (twice the number of feature dimensions). In step two, MAC is applied to the homogeneous regions and a wide range of numbers of clusters, [2, ..., 80], is selected to start step three. EM is applied to all selected models. The BIC values are shown in Figure 6.
The maximal BIC value lies at a high number of clusters and the best model is very difficult to determine, confirming that this is a complex data set. An intermediate model of 61 clusters is selected (step 3.1), corresponding to the highest BIC value. Steps 3.2 and 3.3 are then applied to the intermediate model for a range of [2, ..., 30] clusters. Figure 7 plots the BIC values for this step (Strategy II) against the values found in step 2 (equivalent to Strategy I). The best model can be recognized by a sharp increase of the BIC value. Three suitable options for Strategy II are at locations A, B and C in the BIC plot, corresponding to models of 15, 22 and 25 clusters, while in the same range the models located at D and E are suitable candidates for the best model of Strategy I, corresponding to models of 21 and 29 clusters.

Figure 6. BIC values for cluster models from 2 to 80 clusters in step 3.1 (Strategy I) on the image of the St. Paulia flower.

Figure 7. BIC values of Strategy II against those of Strategy I on the image of the St. Paulia flower.

In the final step, the statistical parameters of the clusters are extracted for each option and maximum likelihood classification (one E-step) is used to obtain the corresponding clustering result. Some results for the chosen models are plotted in Figure 8. The yellow centers are revealed well only by Strategy II with the B and C models, corresponding to 22 and 25 clusters. The results of two models, Strategy II with 22 clusters (option B) and Strategy I with 29 clusters (option E), are shown in Figures 8b and 8c, respectively. The clustering was also performed with fuzzy C-means using 30 clusters (Figure 8d), which cannot reveal the yellow centers either.

Figure 8. (a) RGB image of the St. Paulia flower; (b) Strategy II with 22 clusters (option B); (c) Strategy I with 29 clusters (option E); (d) fuzzy C-means with 30 clusters.

6. Conclusions and discussion

Two strategies (Strategy I and Strategy II) for mixture model clustering of multivariate images have been developed in this study. Strategy I is intended for image data where each cluster is normally distributed; Strategy II is for the situation where a cluster is a mixture of several Gaussian distributions. The methods minimize the need for human interaction in selecting values of input parameters. Spatial information is used effectively in two ways. Firstly, the first stages consider only the homogeneous regions formed by RGS, which not only makes the process faster but also allows for reliable estimation of the cluster parameters. Secondly, by employing MRF classification in the final step, the effects of noise and artifacts and the cluster overlap problem, often present in real-world data sets, are reduced. Since the number of homogeneous regions is much smaller than the total number of pixels, the MAC process is fast.


More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Lecture 13: High-dimensional Images

Lecture 13: High-dimensional Images Lec : Hgh-dmensonal Images Grayscale Images Lecture : Hgh-dmensonal Images Math 90 Prof. Todd Wttman The Ctadel A grayscale mage s an nteger-valued D matrx. An 8-bt mage takes on values between 0 and 55.

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2512-2520 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 Communty detecton model based on ncremental EM clusterng

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

A new segmentation algorithm for medical volume image based on K-means clustering

A new segmentation algorithm for medical volume image based on K-means clustering Avalable onlne www.jocpr.com Journal of Chemcal and harmaceutcal Research, 2013, 5(12):113-117 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCRC5 A new segmentaton algorthm for medcal volume mage based

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

A NEW FUZZY C-MEANS BASED SEGMENTATION STRATEGY. APPLICATIONS TO LIP REGION IDENTIFICATION

A NEW FUZZY C-MEANS BASED SEGMENTATION STRATEGY. APPLICATIONS TO LIP REGION IDENTIFICATION A NEW FUZZY C-MEANS BASED SEGMENTATION STRATEGY. APPLICATIONS TO LIP REGION IDENTIFICATION Mhaela Gordan *, Constantne Kotropoulos **, Apostolos Georgaks **, Ioanns Ptas ** * Bass of Electroncs Department,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Hierarchical agglomerative. Cluster Analysis. Christine Siedle Clustering 1

Hierarchical agglomerative. Cluster Analysis. Christine Siedle Clustering 1 Herarchcal agglomeratve Cluster Analyss Chrstne Sedle 19-3-2004 Clusterng 1 Classfcaton Basc (unconscous & conscous) human strategy to reduce complexty Always based Cluster analyss to fnd or confrm types

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Title: A Novel Protocol for Accuracy Assessment in Classification of Very High Resolution Images

Title: A Novel Protocol for Accuracy Assessment in Classification of Very High Resolution Images 2009 IEEE. Personal use of ths materal s permtted. Permsson from IEEE must be obtaned for all other uses, n any current or future meda, ncludng reprntng/republshng ths materal for advertsng or promotonal

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

IMAGE FUSION TECHNIQUES

IMAGE FUSION TECHNIQUES Int. J. Chem. Sc.: 14(S3), 2016, 812-816 ISSN 0972-768X www.sadgurupublcatons.com IMAGE FUSION TECHNIQUES A Short Note P. SUBRAMANIAN *, M. SOWNDARIYA, S. SWATHI and SAINTA MONICA ECE Department, Aarupada

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Applying EM Algorithm for Segmentation of Textured Images

Applying EM Algorithm for Segmentation of Textured Images Proceedngs of the World Congress on Engneerng 2007 Vol I Applyng EM Algorthm for Segmentaton of Textured Images Dr. K Revathy, Dept. of Computer Scence, Unversty of Kerala, Inda Roshn V. S., ER&DCI Insttute

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers

Investigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information