SSDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory

Available online at Pelagia Research Library

Advances in Applied Science Research, 2011, 2 (3):
ISSN:
CODEN (USA): AASRFC

B. K. Tripathy and *Adhir Ghosh
School of Computer Science and Engineering, VIT University, Vellore, Tamil Nadu, India

ABSTRACT

Many clustering algorithms are available today for grouping objects with similar characteristics, but many of them are difficult to apply to categorical data. Some of the available algorithms cannot handle categorical data, while others cannot handle uncertainty. Many of them also have stability problems and efficiency issues. This necessitated the development of algorithms for clustering categorical data that also deal with uncertainty. In 2007, an algorithm termed MMR was proposed [3], which uses rough set theory concepts to deal with the above problems in clustering categorical data. Later, in 2009, this algorithm was improved to develop the algorithm MMeR [2], which can handle hybrid data. Very recently, in 2011, MMeR was in turn improved to develop an algorithm called SDR [22], which can also handle hybrid data. The last two algorithms handle uncertainty and categorical data at the same time, and SDR is more efficient than MMeR and MMR. In this paper we propose a new algorithm in this sequence, which is better than all its predecessors MMR, MMeR and SDR; we call it the SSDR (Standard deviation of Standard Deviation Roughness) algorithm. It handles numerical and categorical data simultaneously and also takes care of uncertainty. It also gives better performance when tested on well-known datasets.

Keywords: Clustering, MMeR, MMR, SDR, SSDR, uncertainty.

INTRODUCTION

The basic objective of clustering is to group data or objects having similar characteristics in the same cluster, dissimilar from the objects in other clusters.
It has been used in data mining tasks such as unsupervised classification and data summation. It is also used in the segmentation of large heterogeneous data sets into smaller homogeneous subsets which are easily managed, separately modeled and analyzed [8]. The basic goal in cluster analysis is to discover natural groupings of objects [11]. Clustering techniques are used in many areas such as manufacturing, medicine, nuclear science, radar scanning, and research and development. For example, Wu et al. [21] developed a clustering algorithm specifically designed for handling the complexity of gene data. Jiang et al. [13] analyze a variety of cluster techniques which can be applied to gene expression data. Wong et al. [16] presented an approach used to segment tissues in a nuclear medical imaging method known as positron emission tomography (PET). Haimov et al. [20] used cluster analysis to segment radar signals in scanning land and marine objects. Finally, Mathieu and Gibson [19] used cluster analysis as part of a decision support tool for large scale research and development planning, to identify programs to participate in and to determine resource allocation.

The problem with all the above mentioned algorithms is that they mostly deal with numerical data sets, that is, databases having attributes with numeric domains. The basic reason for dealing with numerical attributes is that they are very easy to handle and it is easy to define similarity on them. But categorical data have multi-valued attributes. Thus, similarity can be defined as common objects, common values for the attributes, and the association between the two. In such cases horizontal co-occurrences (common value for the objects) as well as vertical co-occurrences (common value for the attributes) can be examined [2]. Other algorithms that can handle categorical data have been proposed, including work by Huang[3], Gibson et al. [4], Guha et al. [3] and Dempster et al. [1]. While these algorithms are very helpful for forming clusters from categorical data, they have the disadvantage that they cannot deal with uncertainty. However, in real world applications it has been found that there is often no sharp boundary between clusters. Recently some work has been done by Huang [8] and Kim et al. [14], who developed clustering algorithms using fuzzy sets that can handle categorical data.
But these algorithms suffer from the stability problem, as they do not provide satisfactory values due to the multiple runs of the algorithms. Therefore, there is a need for a robust algorithm that can handle uncertainty and categorical data together. In this sequence S. Parmar et al. [3] in 2007, and B. K. Tripathy et al. [2] in 2009 and [22] in 2011, proposed three algorithms which can deal with both uncertainty and categorical attributes together. But efficiency and stability come into play when the purity ratio is measured. The purity ratios of MMR, MMeR and SDR are in increasing order. In this paper, a new algorithm called the Standard Deviation of Standard Deviation Roughness (SSDR) algorithm is proposed, which has a higher purity ratio than all the previous algorithms in this series and those prior to it. We establish the superiority of this algorithm over the others by testing them on a familiar data base, the zoo data set taken from the UCI repository.

MATERIALS AND METHODS

2.1 Materials

In this section we first present the literature review as the basis of the proposed work, then the definitions of the concepts to be used in the work, and the notations to be used.

2.1.1 Literature Review

In this section we present the literature on existing categorical clustering algorithms. Dempster et al. [1] present a partitional clustering method called the Expectation-Maximization (EM) algorithm. EM first randomly assigns different probabilities to each class or category, for each cluster. These probabilities are then successively adjusted to maximize the likelihood of the data given the specified number of clusters. Since the EM algorithm computes the classification probabilities, each observation belongs to each cluster with a certain probability. The actual assignment of observations to a cluster is determined by the largest classification probability. After a large number of iterations, EM terminates at a locally optimal solution.

Han et al. [26] propose a clustering algorithm to cluster related items in a market database based on an association rule hypergraph. A hypergraph is used as a model for relatedness. The approach targets binary transactional data. It assumes that the item sets that define clusters are disjoint and that there is no overlap amongst them. However, this assumption may not hold in practice, as transactions in different clusters may have a few common items.

K-modes [8] extends K-means and introduces a new dissimilarity measure for categorical data. The dissimilarity between two objects is calculated as the number of attributes whose values do not match. The K-modes algorithm then replaces the means of clusters with modes, using a frequency based method to update the modes in the clustering process so as to minimize the clustering cost function. One advantage of K-modes is that it is useful in interpreting the results [8]. However, K-modes generates locally optimal solutions that depend on the initial modes and on the order of objects in the data set. K-modes must be run multiple times with different starting values of modes to test the stability of the clustering solution.

Ralambondrainy [15] proposes a method to convert multiple-category attributes into binary attributes, using 0 and 1 to represent either the absence or the presence of a category, and to treat the binary attributes as numeric in the K-means algorithm. Huang [8] also proposes the K-prototypes algorithm, which allows clustering of objects described by a combination of numeric and categorical data. CACTUS (Clustering Categorical Data Using Summaries) [23] is a summarization based algorithm.
In CACTUS, the authors cluster categorical data by generalizing the definition of a cluster for numerical attributes. Summary information constructed from the data set is assumed to be sufficient for discovering well-defined clusters. CACTUS finds clusters in subsets of all attributes and thus performs a subspace clustering of the data.

Guha et al. [6] propose a hierarchical clustering method termed ROCK (Robust Clustering using Links), which can measure the similarity or proximity between a pair of objects. In ROCK, the number of links is computed as the number of common neighbors between two objects. An agglomerative hierarchical clustering algorithm is then applied: first, the algorithm assigns each object to a separate cluster; clusters are then merged repeatedly according to the closeness between clusters, where the closeness is defined as the sum of the number of links between all pairs of objects.

Gibson et al. [4] propose an algorithm called STIRR (Sieving Through Iterated Relational Reinforcement), a generalized spectral graph partitioning method for categorical data. STIRR is an iterative approach which maps categorical data to non-linear dynamic systems. If the dynamic system converges, the categorical data can be clustered. Clustering naturally lends itself to combinatorial formulation. However, STIRR requires a nontrivial post-processing step to identify sets of closely related attribute values [23]. Additionally, certain classes of clusters are not discovered by STIRR [23]. Moreover, Zhang et al. [24] argue that STIRR cannot guarantee convergence and therefore propose a revised dynamic system algorithm that assures convergence.

He et al. [7] propose an algorithm called Squeezer, which is a one-pass algorithm. Squeezer puts the first tuple in a cluster, and each subsequent tuple is then either put into an existing cluster or rejected to form a new cluster, based on a given similarity function. He et al. [25] explore the categorical data clustering (CDC) and link clustering (LC) problems, propose LCBCDC (Link Clustering Based Categorical Data Clustering), and compare the results with Squeezer and K-modes.

In reviewing these algorithms, we see that some of the methods, such as STIRR and EM, cannot guarantee convergence, while others have scalability issues. In addition, all of the algorithms share one common assumption: each object can be classified into only one cluster, and all objects have the same degree of confidence when grouped into a cluster [5]. However, in real world applications it is difficult to draw clear boundaries between the clusters. Therefore, the uncertainty of the objects belonging to the cluster needs to be considered.

One of the first attempts to handle uncertainty is fuzzy K-means [9]. In this algorithm, each pattern or object is allowed to have membership functions to all clusters rather than a distinct membership to exactly one cluster. Krishnapuram and Keller [18] propose a probabilistic approach to clustering in which the membership of a feature vector in a class has nothing to do with its membership in other classes, and modified clustering methods are used to generate membership distributions. Krishnapuram et al. [17] present several fuzzy and probabilistic algorithms to detect linear and quadratic shell clusters. Note that the initial work in handling uncertainty was based on numerical data.

Huang [8] proposes a fuzzy K-modes algorithm with a new procedure to generate the fuzzy partition matrix from categorical data within the framework of the fuzzy K-means algorithm. The method finds fuzzy cluster modes when a simple matching dissimilarity measure is used for categorical objects. By assigning confidence to objects in different clusters, the core and boundary objects of the clusters can be decided. This provides more useful information for dealing with boundary objects. More recently, Kim et al. [14] have extended the fuzzy K-modes algorithm by using fuzzy centroids to represent the clusters of categorical data instead of the hard-type centroids used in the fuzzy K-modes algorithm. The use of fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data.

However, the fuzzy K-modes and fuzzy centroid algorithms suffer from the same problem as K-modes: they require multiple runs with different starting values of modes to test the stability of the clustering solution. In addition, these algorithms have to adjust one control parameter for membership fuzziness to obtain better solutions.
This necessitates multiple runs of these algorithms to determine an acceptable value of this parameter. Therefore, there is a need for a categorical data clustering method that can handle uncertainty in the clustering process while providing stable results. One methodology with potential for handling uncertainty is Rough Set Theory (RST), which has received considerable attention in the computational intelligence literature since its development by Pawlak in the 1980s. Unlike fuzzy set based approaches, rough sets do not require domain expertise to assign the fuzzy membership. Still, they may provide satisfactory results for rough clustering. The objective of the proposed algorithm is to develop a rough set based approach for categorical data clustering. The approach, termed Standard deviation of Standard deviation roughness (SSDR), is presented and its performance is evaluated on large scale data sets.

2.1.2 Basics of rough sets

Most of our traditional tools for formal modeling, reasoning and computing are deterministic and precise in character. Real situations are very often not deterministic and cannot be described precisely. A complete description of a real system would often require far more detailed data than a human being could ever recognize, process and understand simultaneously. This observation led to the extension of the basic concept of sets so as to model imprecise data, which can enhance their modeling power. The fundamental concept of sets has been extended in many directions in the recent past. The notion of fuzzy sets, introduced by Zadeh [10], deals with approximate membership, and the notion of rough sets, introduced by Pawlak [12], captures the indiscernibility of the elements in a set. These two theories have been found to complement each other instead of being rivals. The idea of a rough set consists of the approximation of a set by a pair of sets, called the lower and upper approximations of the set. The basic assumption in rough sets is that knowledge depends upon the classification capabilities of human beings.
Since every classification (or partition) of a universe and the concept of an equivalence relation are interchangeable notions, the definition of rough sets depends upon equivalence relations as its mathematical foundation [12].

Let $U (\neq \emptyset)$ be a finite set of objects, called the universe, and let $R$ be an equivalence relation over $U$. By $U/R$ we denote the family of all equivalence classes of $R$ (or classification of $U$), referred to as categories or concepts of $R$, and $[x]_R$ denotes the category in $R$ containing an element $x \in U$. By a knowledge base we understand a relational system $K = (U, \mathcal{R})$, where $U$ is as above and $\mathcal{R}$ is a family of equivalence relations over $U$. For any subset $P (\neq \emptyset) \subseteq \mathcal{R}$, the intersection of all equivalence relations in $P$ is denoted by $IND(P)$ and is called the indiscernibility relation over $P$. The equivalence classes of $IND(P)$ are called $P$-basic knowledge about $U$ in $K$. For any $Q \in \mathcal{R}$, $Q$ is called a $Q$-elementary knowledge about $U$ in $K$, and the equivalence classes of $Q$ are called $Q$-elementary concepts of knowledge $\mathcal{R}$. The family of $P$-basic categories for all $\emptyset \neq P \subseteq \mathcal{R}$ will be called the family of basic categories in the knowledge base $K$. By $IND(K)$ we denote the family of all equivalence relations defined in $K$. Symbolically, $IND(K) = \{IND(P) : \emptyset \neq P \subseteq \mathcal{R}\}$.

For any $X \subseteq U$ and an equivalence relation $R \in IND(K)$, we associate two subsets,

$\underline{R}X = \bigcup \{Y \in U/R : Y \subseteq X\}$ and $\overline{R}X = \bigcup \{Y \in U/R : Y \cap X \neq \emptyset\}$,

called the $R$-lower and $R$-upper approximations of $X$ respectively. The $R$-boundary of $X$ is denoted by $BN_R(X)$ and is given by $BN_R(X) = \overline{R}X - \underline{R}X$. The elements of $\underline{R}X$ are those elements of $U$ which can certainly be classified as elements of $X$ employing the knowledge of $R$. The borderline region is the undecidable area of the universe. We say $X$ is rough with respect to $R$ if and only if $\underline{R}X \neq \overline{R}X$, equivalently $BN_R(X) \neq \emptyset$. $X$ is said to be $R$-definable if and only if $\underline{R}X = \overline{R}X$, or $BN_R(X) = \emptyset$. So, a set is rough with respect to $R$ if and only if it is not $R$-definable.

2.1.3 Definitions

Definition (Indiscernibility relation (Ind(B))): $Ind(B)$ is a relation on $U$. Given two objects $x_i, x_j \in U$, they are indiscernible by the set of attributes $B$ in $A$ if and only if $a(x_i) = a(x_j)$ for every $a \in B$. That is, $(x_i, x_j) \in Ind(B)$ if and only if $\forall a \in B$, where $B \subseteq A$, $a(x_i) = a(x_j)$.

Definition (Equivalence class ($[x_i]_{Ind(B)}$)): Given $Ind(B)$, the set of objects $x_i$ having the same values for the set of attributes in $B$ constitutes an equivalence class, $[x_i]_{Ind(B)}$. It is also known as an elementary set with respect to $B$.

Definition (Lower approximation): Given the set of attributes $B$ in $A$ and a set of objects $X$ in $U$, the lower approximation of $X$ is defined as the union of all the elementary sets which are contained in $X$. That is,

$\underline{X_B} = \bigcup \{x_i \mid [x_i]_{Ind(B)} \subseteq X\}$.

Definition (Upper approximation): Given the set of attributes $B$ in $A$ and a set of objects $X$ in $U$, the upper approximation of $X$ is defined as the union of the elementary sets which have a nonempty intersection with $X$. That is,

$\overline{X_B} = \bigcup \{x_i \mid [x_i]_{Ind(B)} \cap X \neq \emptyset\}$.
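The definitions of the equivalence classes and of the lower and upper approximations can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the data are the SIZE and ANIMALITY columns of the animal table used in the experimental part of this paper.

```python
from collections import defaultdict

# SIZE and ANIMALITY columns of the animal table from the experimental part.
TABLE = {
    "A1": {"SIZE": "Small", "ANIMALITY": "Bear"},
    "A2": {"SIZE": "Medium", "ANIMALITY": "Bear"},
    "A3": {"SIZE": "Large", "ANIMALITY": "Dog"},
    "A4": {"SIZE": "Small", "ANIMALITY": "Cat"},
    "A5": {"SIZE": "Medium", "ANIMALITY": "Horse"},
    "A6": {"SIZE": "Large", "ANIMALITY": "Horse"},
    "A7": {"SIZE": "Large", "ANIMALITY": "Horse"},
}


def equivalence_classes(table, attrs):
    """Group object names that agree on every attribute in attrs (classes of Ind(B))."""
    groups = defaultdict(set)
    for name, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(name)
    return list(groups.values())


def lower_approximation(table, attrs, target):
    """Union of the elementary sets that are contained in target."""
    classes = equivalence_classes(table, attrs)
    return set().union(*[c for c in classes if c <= target])


def upper_approximation(table, attrs, target):
    """Union of the elementary sets that intersect target."""
    classes = equivalence_classes(table, attrs)
    return set().union(*[c for c in classes if c & target])


# X = objects whose SIZE is Large, approximated with B = {ANIMALITY}.
large = {name for name, row in TABLE.items() if row["SIZE"] == "Large"}
lower = lower_approximation(TABLE, ["ANIMALITY"], large)
upper = upper_approximation(TABLE, ["ANIMALITY"], large)
print(sorted(lower))  # ['A3']
print(sorted(upper))  # ['A3', 'A5', 'A6', 'A7']
```

Here the boundary region is the difference of the two sets, {A5, A6, A7}: the Horse class contains both Large and non-Large animals, so ANIMALITY alone cannot decide membership for those objects.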

Definition (Roughness): The ratio of the cardinality of the lower approximation to the cardinality of the upper approximation is defined as the accuracy of estimation; one minus this ratio is a measure of roughness. It is presented as

$R_B(X) = 1 - \frac{|\underline{X_B}|}{|\overline{X_B}|}$

If $R_B(X) = 0$, $X$ is crisp with respect to $B$; in other words, $X$ is precise with respect to $B$. If $R_B(X) > 0$, $X$ is rough with respect to $B$; that is, $B$ is vague with respect to $X$.

Definition (Relative roughness): Given $a_i \in A$, let $X$ be the subset of objects having one specific value $\alpha$ of attribute $a_i$, and let $\underline{X_{a_j}}(a_i = \alpha)$ and $\overline{X_{a_j}}(a_i = \alpha)$ refer to the lower and upper approximations of $X$ with respect to $\{a_j\}$. Then $R_{a_j}(X)$ is defined as the roughness of $X$ with respect to $\{a_j\}$, that is

$R_{a_j}(X \mid a_i = \alpha) = 1 - \frac{|\underline{X_{a_j}}(a_i = \alpha)|}{|\overline{X_{a_j}}(a_i = \alpha)|}$, where $a_i, a_j \in A$ and $a_i \neq a_j$.

Definition (Mean roughness): Let $A$ have $n$ attributes and $a_i \in A$. Let $X$ be the subset of objects having a specific value $\alpha$ of the attribute $a_i$. Then we define the mean roughness for the equivalence class $a_i = \alpha$, denoted by $MeR(a_i = \alpha)$, as

$MeR(a_i = \alpha) = \frac{1}{n-1} \sum_{j=1, j \neq i}^{n} R_{a_j}(X \mid a_i = \alpha)$.

Definition (Standard deviation): After calculating the mean for each $a_i \in A$, we apply the standard deviation to each $a_i$ by the formula

$SD(a_i = \alpha) = \sqrt{\frac{1}{n-1} \sum_{j=1, j \neq i}^{n} \left( R_{a_j}(X \mid a_i = \alpha) - MeR(a_i = \alpha) \right)^2}$.

Definition (Distance of relevance): Given two objects $B$ and $C$ of categorical data with $n$ attributes, the distance of relevance $DR$ of the objects is defined as follows:

$DR(B, C) = \sum_{i=1}^{n} DR(b_i, c_i)$.

Here, $b_i$ and $c_i$ are the values of objects $B$ and $C$ respectively under the $i$-th attribute $a_i$. Also, we have

1. $DR(b_i, c_i) = 1$ if $b_i \neq c_i$
2. $DR(b_i, c_i) = 0$ if $b_i = c_i$
3. $DR(b_i, c_i) = \frac{|eq_B - eq_C|}{no}$ if $a_i$ is a numerical attribute, where $eq_B$ is the number assigned to the equivalence class that contains $b_i$, $eq_C$ is similarly defined, and $no$ is the total number of equivalence classes in the numerical attribute $a_i$.
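The roughness-related definitions above (relative roughness, mean roughness and standard deviation) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the data are the animal table from the experimental part (with the age of A2 read as 16), each distinct AGE value is treated as its own equivalence class, and the divisor n − 1 equals the number of attributes other than a_i.

```python
import math
from collections import defaultdict

ATTRIBUTES = ["SIZE", "ANIMALITY", "COLOR", "AGE"]
ROWS = {
    "A1": ("Small", "Bear", "Black", 25),
    "A2": ("Medium", "Bear", "Black", 16),  # age of A2 assumed to be 16
    "A3": ("Large", "Dog", "Brown", 9),
    "A4": ("Small", "Cat", "Black", 30),
    "A5": ("Medium", "Horse", "Black", 28),
    "A6": ("Large", "Horse", "Black", 5),
    "A7": ("Large", "Horse", "Brown", 7),
}
TABLE = {name: dict(zip(ATTRIBUTES, values)) for name, values in ROWS.items()}


def classes_of(table, attr):
    """Equivalence classes induced by a single attribute."""
    groups = defaultdict(set)
    for name, row in table.items():
        groups[row[attr]].add(name)
    return list(groups.values())


def relative_roughness(table, a_i, alpha, a_j):
    """R_{a_j}(X | a_i = alpha); assumes alpha occurs at least once in column a_i."""
    target = {name for name, row in table.items() if row[a_i] == alpha}
    classes = classes_of(table, a_j)
    lower = sum(len(c) for c in classes if c <= target)
    upper = sum(len(c) for c in classes if c & target)
    return 1 - lower / upper


def mean_roughness(table, a_i, alpha):
    """MeR(a_i = alpha): average relative roughness over the other attributes."""
    others = [a for a in ATTRIBUTES if a != a_i]
    return sum(relative_roughness(table, a_i, alpha, a_j) for a_j in others) / len(others)


def sd_roughness(table, a_i, alpha):
    """SD(a_i = alpha): spread of the relative roughness values around MeR."""
    others = [a for a in ATTRIBUTES if a != a_i]
    mean = mean_roughness(table, a_i, alpha)
    squared = sum((relative_roughness(table, a_i, alpha, a_j) - mean) ** 2 for a_j in others)
    return math.sqrt(squared / len(others))


mer = mean_roughness(TABLE, "SIZE", "Medium")
sd = sd_roughness(TABLE, "SIZE", "Medium")
print(round(mer, 4), round(sd, 4))  # 0.6667 0.4714
```

For SIZE = Medium the relative roughness is 1 with respect to ANIMALITY and COLOR and 0 with respect to AGE, which gives the mean 2/3 and the standard deviation sqrt(2/9) printed above.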

Definition (Purity ratio): In order to compare SDR with MMeR and MMR and all the other algorithms which have taken the initiative to handle categorical data, we developed an implementation. The traditional approach for calculating the purity of a cluster is given below.

$Purity(i) = \frac{\text{the number of data occurring in both the } i\text{-th cluster and its corresponding class}}{\text{the number of data in the data set}}$

$Overall\ Purity = \frac{\sum_{i=1}^{\#\,of\,clusters} Purity(i)}{\#\,of\,clusters}$

METHODS

In this section we present the main algorithm of the paper; the experimental part deals with an example.

Proposed Algorithm

In this section we present our algorithm, which we call SSDR. The notations and definitions of concepts have been discussed in the previous section.

1. Procedure SSDR(U, k)
2. Begin
3. Set current number of cluster CNC = 1
4. Set ParentNode = U
5. Loop:
6. If CNC < k and CNC ≠ 1 then
7. ParentNode = Proc ParentNode (CNC)
8. End if // Clustering the ParentNode
9. For each $a_i \in A$ (i = 1 to n, where n is the number of attributes in A)
10. Determine $[X_m]_{Ind(a_i)}$ (m = 1 to number of objects)
11. For each $a_j \in A$ (j = 1 to n, where n is the number of the attributes in A, j ≠ i)
12. Calculate $Rough_{a_j}(a_i)$
13. Next
14. $MeR(a_i = \alpha) = \frac{1}{n-1} \sum_{j=1, j \neq i}^{n} R_{a_j}(X \mid a_i = \alpha)$
15. Next
16. Apply standard deviation $SD(a_i = \alpha) = \sqrt{\frac{1}{n-1} \sum_{j=1, j \neq i}^{n} (R_{a_j}(X \mid a_i = \alpha) - MeR(a_i = \alpha))^2}$
17. Next
18. Set SDR = SD{min{$SD(a_i = \alpha_1), \ldots, SD(a_i = \alpha_{k_j})$}}, where $k_j$ is the number of equivalence classes in $Dom(a_i)$
19. Determine splitting attribute $a_i$ corresponding to the Standard deviation-Roughness
20. Do binary split on the splitting attribute $a_i$
21. CNC = the number of leaf nodes
22. Go to Loop:
23. End
24. Proc ParentNode (CNC)
25. Begin
26. Set i = 1
27. Do until i < CNC
28. If Avg-distance of cluster i is calculated
29. Goto label
30. else
31. n = Count (Set of Elements in Cluster i)
32. Avg-distance (i) = $2 \cdot \left( \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} (\text{Distance of relevance between objects } a_j \text{ and } a_k) \right) / (n \cdot (n-1))$
33. label :
34. increment i
35. Loop
36. Determine Max (Avg-distance (i))
37. Return (Set of Elements in cluster i) corresponding to Max (Avg-distance (i))
38. End

Experimental Part

In this section we present the experimental hybrid table, which characterizes various animals in terms of size, animality, color and age. In a later section we show the efficiency of this algorithm. The experimental table is as follows:

Table 1

ANIMAL NAME   SIZE     ANIMALITY   COLOUR   AGE
A1            Small    Bear        Black    25
A2            Medium   Bear        Black    16
A3            Large    Dog         Brown    9
A4            Small    Cat         Black    30
A5            Medium   Horse       Black    28
A6            Large    Horse       Black    5
A7            Large    Horse       Brown    7

Let us take k = 3, which means the number of clusters will be 3. Initially the value of CNC is 1 and the value of ParentNode is U, that is, the initial ParentNode is the whole table. So, we need to apply our algorithm three times to get the desired clusters.

Computational Part

Initially the condition CNC < k and CNC ≠ 1 is false, so the average distance of the parent node is not calculated: there is only one table, and we directly calculate the roughness of each attribute relative to the rest of the attributes, which is known as relative roughness.

When i = 1, the attribute $a_i$ is SIZE. This attribute has three distinct values, Small, Medium and Large. Considering $\alpha$ = Small first, we get X = {A1, A4} (where X is the subset of objects having the specific value $\alpha$ of attribute $a_i$), and considering j = 2 (as j ≠ i) we get $a_j$ = ANIMALITY. The equivalence classes of $a_j$ are {(A1, A2), A3, A4, (A5, A6, A7)}. The lower approximation of X is $\underline{X_{a_j}}(a_i = \alpha)$ = {A4}, since the class {A4} is the only one contained in X, and the upper approximation of X is $\overline{X_{a_j}}(a_i = \alpha)$ = {A1, A2, A4}. So, the roughness of $a_i$ (when $a_i$ = SIZE and $\alpha$ = Small) is given by

$R_{a_j}(X \mid a_i = \alpha) = 1 - \frac{|\underline{X_{a_j}}(a_i = \alpha)|}{|\overline{X_{a_j}}(a_i = \alpha)|} = 1 - \frac{1}{3} = \frac{2}{3}$

Now, by changing the value of j (j = 3, 4) and keeping $a_i$ (= SIZE) and $\alpha$ (= Small) constant, we find the roughness of $a_i$ relative to the attributes COLOR (when j = 3) and AGE (when j = 4):

$R_{a_j}(X \mid a_i = \alpha) = 1 - \frac{0}{5} = 1$ when j = 3 and $a_j$ = COLOR

$R_{a_j}(X \mid a_i = \alpha) = 1 - \frac{2}{2} = 0$ when j = 4 and $a_j$ = AGE

Now, to get the standard deviation of $a_i$ (= SIZE) when $\alpha$ = Small, we need the mean of these values, which is $(\frac{2}{3} + 1 + 0)/3 = \frac{5}{9}$. Applying the standard deviation formula we get the value, which is stored in a variable. The same process is repeated for the other values of $\alpha$ (Medium and Large) while keeping $a_i$ constant, so that we finally get one standard deviation value for each $\alpha$; these values are again stored in a variable. After calculating the SD (standard deviation) for each $\alpha$, we take the minimum of these values and store it in another variable.

The above procedure is repeated for each remaining $a_i$ (for $a_i$ = ANIMALITY, COLOR and AGE when i = 2, 3 and 4) and the corresponding minimum values are stored. After completing this step we take those minimum values for the next calculation: we apply the SD (standard deviation) to those minimum values to get the splitting attribute.
If the value of SD does not match any of the minimum values, we take the attribute with the nearest minimum value as the splitting attribute and do the binary split, that is, we divide the table into two clusters. Suppose that after splitting we have two clusters C1 and C2, where C1 contains 2 elements and C2 contains 5 elements. Now we need to calculate the average distance in order to choose the cluster for further calculation. This is done by applying the distance of relevance formula. Let us see how the DR (distance of relevance) is calculated. For example, take the two tuples A4 and A6, which are as follows:

Table 2

ANIMAL NAME   SIZE     ANIMALITY   COLOR   AGE
A4            Small    Cat         Black   30
A6            Large    Horse       Black   5

Here B = A4 and C = A6, and DR(B, C) is defined as

$DR(B, C) = \sum_{i=1}^{n} DR(b_i, c_i) = DR(b_{size}, c_{size}) + DR(b_{animality}, c_{animality}) + DR(b_{color}, c_{color}) + DR(b_{age}, c_{age})$

So, following the definition of the distance of relevance,

$DR(b_{size}, c_{size}) = 1$ as $b_{size} \neq c_{size}$
$DR(b_{animality}, c_{animality}) = 1$ as $b_{animality} \neq c_{animality}$
$DR(b_{color}, c_{color}) = 0$ as $b_{color} = c_{color}$

For $DR(b_{age}, c_{age})$ we need a different method, as AGE is a numerical attribute. To calculate the DR of a numerical attribute we exclude that attribute from the table and find the average number of equivalence classes over the remaining attributes. So, in this case we first exclude the attribute AGE and then find the average number of equivalence classes, which is (3 + 4 + 2)/3 = 3. In this case we get an integer value; when the average is a fraction we take either its floor or its ceiling. Now we sort the values of the attribute AGE. After sorting in ascending order we get {5, 7, 9, 16, 25, 28, 30}. We distribute these numbers into three sets as follows:

Set 1 = {5, 7}
Set 2 = {9, 16}
Set 3 = {25, 28, 30}

Now we calculate $DR(b_{age}, c_{age})$. In our case $b_{age}$ = 30 and $c_{age}$ = 5. So, we put 3 and 1 in place of 30 and 5, as 30 belongs to Set 3 and 5 belongs to Set 1. So,

$DR(b_{age}, c_{age}) = \frac{|3 - 1|}{total\_number\_of\_sets} = \frac{2}{3}$

Finally,

$DR(B, C) = DR(b_{size}, c_{size}) + DR(b_{animality}, c_{animality}) + DR(b_{color}, c_{color}) + DR(b_{age}, c_{age}) = 1 + 1 + 0 + \frac{2}{3} = 2\frac{2}{3}$

In this way we calculate the average distance of C1 and C2, and the cluster having the larger average distance is taken as the input for further calculation.
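The distance of relevance computation for the tuples A4 and A6 can be sketched as follows, using the case definitions of DR (1 when categorical values differ, 0 when they are equal, and the normalized difference of set numbers for a numerical attribute). This is a hedged illustration, not the authors' code: the function names are hypothetical, and the rule for distributing the sorted values into sets (near-equal sizes, with any surplus going to the last sets) is an assumption chosen because it reproduces the three sets of the example; the paper does not state the rule.

```python
from itertools import combinations


def assign_sets(values, n_sets):
    """Sort the distinct numeric values and map each one to a 1-based set number.

    Assumption: sets are near-equal in size and surplus values go to the last sets.
    """
    ordered = sorted(set(values))
    base, extra = divmod(len(ordered), n_sets)
    numbering, start = {}, 0
    for k in range(n_sets):
        size = base + (1 if k >= n_sets - extra else 0)
        for value in ordered[start:start + size]:
            numbering[value] = k + 1
        start += size
    return numbering


def distance_of_relevance(b, c, numeric):
    """DR(B, C); numeric maps a numerical attribute name to (numbering, n_sets)."""
    total = 0.0
    for attr in b:
        if attr in numeric:
            numbering, n_sets = numeric[attr]
            total += abs(numbering[b[attr]] - numbering[c[attr]]) / n_sets
        elif b[attr] != c[attr]:
            total += 1.0
    return total


def average_distance(cluster, numeric):
    """Mean DR over all unordered pairs; needs at least two objects in the cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(distance_of_relevance(b, c, numeric) for b, c in pairs) / len(pairs)


ages = [25, 16, 9, 30, 28, 5, 7]
n_sets = round((3 + 4 + 2) / 3)  # average number of classes of SIZE, ANIMALITY, COLOR
numeric = {"AGE": (assign_sets(ages, n_sets), n_sets)}

a4 = {"SIZE": "Small", "ANIMALITY": "Cat", "COLOR": "Black", "AGE": 30}
a6 = {"SIZE": "Large", "ANIMALITY": "Horse", "COLOR": "Black", "AGE": 5}
dr = distance_of_relevance(a4, a6, numeric)
print(round(dr, 4))  # 2.6667
```

Dividing the pair sum by the number of unordered pairs is the same as the factor 2/(n(n − 1)) in the Avg-distance step of the algorithm, so `average_distance` can be applied to each candidate cluster to pick the one with the larger value.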

In this fashion we apply the algorithm until we get the desired number of clusters. In our case we stop when we get C3, because the total number of clusters is 3.

RESULTS AND DISCUSSION

In this section we present the original result obtained on the ZOO dataset, which was also used for the MMR, MMeR and SDR algorithms. The ZOO data has 18 attributes, of which 15 are Boolean, 2 are numeric and 1 is the animal name, and it has 101 objects. The objects are divided into seven classes, so we stop when we get seven clusters. Taking the ZOO dataset as the input, we obtained the following output:

Table 3

Cluster Number   Class I   Class II   Class III   Class IV   Class V   Class VI   Class VII   Purity Ratio
Overall Purity

Comparison of SSDR with MMeR, MMR, SDR and Algorithms based on FUZZY Set Theory

Till the development of MMR, the only algorithms which aimed at handling uncertainty in the clustering process were based upon fuzzy set theory [26]. These algorithms include fuzzy K-modes and fuzzy centroids. The K-modes algorithm replaces the means of the clusters (K-means) with modes and uses a frequency based method to update the modes in the clustering process so as to minimize the clustering cost function. Fuzzy K-modes generates a fuzzy partition matrix from categorical data. By assigning a confidence to objects in different clusters, the core and boundary objects of the clusters are determined for clustering purposes. The fuzzy centroids algorithm uses the concept of fuzzy set theory to derive fuzzy centroids and so create clusters of objects which have categorical attributes.
MMR, MMeR and SDR, in contrast, are built on rough set concepts. In terms of efficiency, MMeR is more efficient than MMR and less efficient than SDR, while SSDR is much more efficient than the others.

Empirical Analysis

The earlier algorithms for classification with uncertainty, K-modes, fuzzy K-modes and fuzzy centroids on one hand and MMR, MMeR and SDR on the other, were applied to the ZOO data set. Table 4 below provides the comparison of purity for these algorithms on this data set. It is observed that SSDR has a better purity than all the other algorithms when applied to the zoo data set. As mentioned earlier, all the fuzzy set based algorithms face a challenging problem, namely the problem of stability. These algorithms require great effort to adjust the parameter which is used to control the fuzziness of the membership of each data point. At each value of this parameter, the algorithms need to be run multiple times to achieve a stable solution.

MMR, MMeR and SDR, on the other hand, have no such problem. SSDR retains the advantages of MMR, MMeR and SDR over the other algorithms mentioned above. But it has higher purity than MMR, MMeR and SDR, which establishes its superiority over MMR, MMeR and SDR.

Table 4

DATA SET   K-modes   Fuzzy K-modes   Fuzzy centroids   MMR   MMeR   SDR   SSDR
ZOO                                                                       *

*In this case we have got the same purity ratio as SDR, but as the standard deviation has better central tendency than the mean or the minimum, it will give better results for other data sets. It has been checked manually for a small data set that it gives much better results than MMR, MMeR and SDR.

CONCLUSION

In this paper we proposed a new algorithm called SSDR, which is more efficient than most of the earlier algorithms, including MMR, MMeR and SDR, which are recent algorithms developed in this direction. It handles uncertain data using rough set theory. Firstly, we have provided a method by which both numerical and categorical data can be handled; secondly, by using the distance of relevance we get much better results than MMR, which chooses the table to be clustered according to the number of objects. The comparison of purity ratios shows its superiority over MMeR. Future enhancements of this algorithm may be possible by considering hybrid techniques like rough-fuzzy clustering or fuzzy-rough clustering.

REFERENCES

[1] A. Dempster, N. Laird, D. Rubin, Journal of the Royal Statistical Society 39 (1) (1977) 1-38.
[2] B. K. Tripathy and M. S. Prakash Kumar Ch., International Journal of Rapid Manufacturing (special issue on Data Mining) (Switzerland), vol. 1, no. 2, (2009), pp.
[3] D. Parmar, Teresa Wu, Jennifer B., Data & Knowledge Engineering (2007)
[4] D. Gibson, J. Kleinberg, P. Raghavan, The Very Large Data Bases Journal 8 (3-4) (2000)
[5] M. Halkidi, Y. Batistakis, M. Vazirgiannis, Journal of Intelligent Information Systems 17 (2-3) (2001)
[6] S. Guha, R. Rastogi, K. Shim, Information Systems 25 (5) (2000)
[7] Z. He, X. Xu, S. Deng, Journal of Computer Science & Technology 17 (5) (2002)
[8] Z. Huang, Data Mining and Knowledge Discovery 2 (3) (1998)
[9] E. Ruspini, Information Control 15 (1) (1969)
[10] L. A. Zadeh, Information and Control, (1965), pp.
[11] R. Johnson, W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, New York,
[12] Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data. Norwell: Kluwer Academic Publishers, (1992).
[13] D. Jiang, C. Tang, A. Zhang, IEEE Transactions on Knowledge and Data Engineering 16 (11) (2004)
[14] D. Kim, K. Lee, D. Lee, Pattern Recognition Letters 25 (11) (2004)
[15] H. Ralambondrainy, Pattern Recognition Letters 16 (11) (1995)
[16] K. Wong, D. Feng, S. Meikle, M. Fulham, IEEE Transactions on Nuclear Science 49 (1) (2002)
[17] R. Krishnapuram, H. Frigui, O. Nasraoui, IEEE Transactions on Fuzzy Systems 3 (1) (1995)
[18] R. Krishnapuram, J. Keller, IEEE Transactions on Fuzzy Systems 1 (2) (1993)
[19] R. Mathieu, J. Gibson, IEEE Transactions on Engineering Management 40 (3) (2004)
[20] S. Haimov, M. Michalev, A. Savchenko, O. Yordanov, IEEE Transactions on Geo Science and Remote Sensing 8 () (1989)
[21] S. Wu, A. Liew, H. Yan, M. Yang, IEEE Transactions on Information Technology in BioMedicine 8 (1) (2004) 5-15.
[22] B. K. Tripathy and A. Ghosh, SDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory, communicated to the International IEEE conference to be held in Kerala, (2011).
[23] V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS: clustering categorical data using summaries, in: Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1999), pp.
[24] Y. Zhang, A. Fu, C. Cai, P. Heng, Clustering categorical data, in: Proceedings of the 16th International Conference on Data Engineering, (2000), pp.
[25] Z. He, X. Xu, S. Deng, A link clustering based approach for clustering categorical data, Proceedings of the WAIM Conference, (2004).
[26] E. Han, G. Karypis, V. Kumar, B. Mobasher, Clustering based on association rule hypergraphs, in: Workshop on Research Issues on Data Mining and Knowledge Discovery, (1997), pp.

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

A Combined Approach for Mining Fuzzy Frequent Itemset

A Combined Approach for Mining Fuzzy Frequent Itemset A Combned Approach for Mnng Fuzzy Frequent Itemset R. Prabamaneswar Department of Computer Scence Govndammal Adtanar College for Women Truchendur 628 215 ABSTRACT Frequent Itemset Mnng s an mportant approach

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 2 Sofa 2016 Prnt ISSN: 1311-9702; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-2016-0017 Hybrdzaton of Expectaton-Maxmzaton

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS J.H.Guan, F.B.Zhu, F.L.Ban a School of Computer, Spatal Informaton & Dgtal Engneerng Center, Wuhan Unversty, Wuhan, 430079,

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Bridges and cut-vertices of Intuitionistic Fuzzy Graph Structure

Bridges and cut-vertices of Intuitionistic Fuzzy Graph Structure Internatonal Journal of Engneerng, Scence and Mathematcs (UGC Approved) Journal Homepage: http://www.jesm.co.n, Emal: jesmj@gmal.com Double-Blnd Peer Revewed Refereed Open Access Internatonal Journal -

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

A new paradigm of fuzzy control point in space curve

A new paradigm of fuzzy control point in space curve MATEMATIKA, 2016, Volume 32, Number 2, 153 159 c Penerbt UTM Press All rghts reserved A new paradgm of fuzzy control pont n space curve 1 Abd Fatah Wahab, 2 Mohd Sallehuddn Husan and 3 Mohammad Izat Emr

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Clustering Algorithm of Similarity Segmentation based on Point Sorting Internatonal onference on Logstcs Engneerng, Management and omputer Scence (LEMS 2015) lusterng Algorthm of Smlarty Segmentaton based on Pont Sortng Hanbng L, Yan Wang*, Lan Huang, Mngda L, Yng Sun, Hanyuan

More information

From Comparing Clusterings to Combining Clusterings

From Comparing Clusterings to Combining Clusterings Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,

More information

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Parameter estimation for incomplete bivariate longitudinal data in clinical trials Parameter estmaton for ncomplete bvarate longtudnal data n clncal trals Naum M. Khutoryansky Novo Nordsk Pharmaceutcals, Inc., Prnceton, NJ ABSTRACT Bvarate models are useful when analyzng longtudnal data

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information
