Semi-supervised feature extraction for population structure identification using the Laplacian linear discriminant Usman Roshan 1,* 1

Size: px
Start display at page:

Download "Semi-supervised feature extraction for population structure identification using the Laplacian linear discriminant Usman Roshan 1,* 1"

Transcription

1 Population and Geneti Analysis Semi-supervised feature extration for population struture identifiation using the Laplaian linear disriminant Usman Roshan 1,* 1 Department of Computer Siene, New Jersey Institute of Tehnology, GITC 4400, University Heights, Newark, NJ 07102, USA ABSTRACT Motivation: The identifiation of population struture from genomewide SNP data is of signifiant interest in the population and medial genetis ommunity. A popular solution is to perform unsupervised feature extration using prinipal omponent analysis. Prinipal omponent analysis, however, relies only on global properties of the data. Results: The Laplaian linear disriminant takes into onsideration loal properties of the data as well and, as we show in this study, it an be extended to the semi-supervised setting. This an then be applied to extrat features for identifying population struture when the anestry of some individuals in some sub-populations of the admixture is known. Using real data we simulate suh semisupervised senarios and extrat features using the Laplaian linear disriminant, kernel prinipal omponent analysis, and two reent semi-supervised feature extrators. We show that there is a statistially signifiant improvement in auray when the nearest mean lassifier or k-means lustering is applied on the Laplaian linear disriminant features ompared to kernel prinipal omponent analysis and the other methods. Availability: All neessary software and data for reproduibility purposes is at Contat: usman@s.njit.edu 1 INTRODUCTION The problem of lustering humans into groups of similar geographial anestry arises in the fields of medial and population genetis. In medial genetis disease assoiation studies an lead to misleading results if the underlying population struture is not taken into aount (Ziv and Burhard, 2003; Marhini et. al., 2004; Xu and Shete, 2005; Devlin and Roeder, 1999). In population genetis this an be used to unover demographi history and address related sientifi questions (Cavalli-Sforza and Feldman, 2003). Consequently several methods have been developed for identifying struture from genome wide single nuleotide polymorphism (SNP) data (Tsai et. al., 2005). The two prevailing methods are prinipal omponent analysis (PCA) (Pashou et. al., 2007), whih is an unsupervised feature extration method, and the model-based Markov Chain Monte Carlo method implemented in STRUCTURE (Prithard et. al., 2000), where prior anestry an be speified if available. PCA is very fast and has been shown to separate inter and intra-ontinental admixtures with high auray when followed by the simple k- means lustering algorithm (Pashou et. al., 2007). The model- based approah of STRUCTURE is also onsidered aurate. However it is very slow and thus prohibitive on admixtures with several thousand SNPs or hundreds of individuals. In this study onsider the following problem. Assume that some individuals in the population have known anestry. In pratie this an be obtained using urrent benhmarks suh as the HAPMAP projet ( How an we then extrat features for struture identifiation while taking the prior anestry into onsideration? We all this semi-supervised feature extration sine the data of individuals with unknown anestry is also utilized for extrating features. We propose a semi-supervised Laplaian linear disriminant (LLDA) (Tang et. al., 2006) for solving this problem. LLDA an be onsidered to be a more general feature extrator than PCA. PCA uses only global properties of the data sine it is based on the total satter matrix. LLDA, whih is similar to the maximum margin riterion disriminant (Li et. al., 2006), uses both the total satter matrix, whih aptures global properties of the data, as well as the within lass satter matrix, whih aptures loal properties. Although the within lass satter matrix is normally defined in a supervised framework, it an also be omputed in an unsupervised one using Laplaian matries (Nijima and Okuno, 2007). In this study we extend it to a semi-supervised setting and then extrat features with it. In related work Zhang et. al., 2007, propose a Laplaian based approah that uses must-link and annot-link onstraints for extrating features. Sugiyama et. al., 2007, present a semi-supervised loal Fisher disriminant for feature extration. We ontrast our approah with theirs in the Methods Setion. Using real benhmarks we simulate different semi-supervised senarios and ompare our proposed LLDA approah to kernel PCA and the two reent semi-supervised feature extrators. We ompare the error of the nearest mean lassifier and the k-means lustering algorithm on all the projetions. We show that LLDA attains statistially signifiant higher auraies than kernel PCA even with a small number of individuals with known anestry. These auraies improve as the perent of individuals with known anestry inreases. The two other semi-supervised methods, however, have higher error than our LLDA approah and also kernel PCA on the data onsidered here. 2 METHODS Throughout we assume that n represents the total number of individuals in the population and k represents the number of sub-populations in it. * To whom orrespondene should be addressed. Oxford University Press

2 U. Roshan 2.1 Bakground Enoding the data We assume that for eah individual in the population bialleli SNPs have been assayed. We use the following enoding sheme to onvert eah individual s set of sequened SNPs into numerial vetors on whih PCA an then be performed. Let g be an individual s set of sequened SNPs and let g i represent the i th SNP. g i an be AA, AB, or BB where A and B are alphabetially ordered SNP bases. Following the enoding of smartpa (Patterson et. al., 2006) and similar to the one in (Pashou et. al., 2007) we define the feature vetor x for g by setting x i to 0 if g i is AA, 1 if AB, and 2 if BB. Let eah suh vetor x for the i th individual be the i th olumn of the data matrix X, i.e. X = [x 1,, x n] Prinipal omponent analysis (PCA) Let x be a random variable that represents the feature vetor of an individual s SNP genotype as defined above. Suppose we are interested in omputing a projetion of x onto one dimension suh that the variane (i.e. spread) of the projeted data is maximized. In other words we want to find w suh that Variane(w T x) is maximized subjet to w T w=1. This yields the optimization problem maxw!' T w subjet to w T w =1 (1) w where Σ=E((x-µ)(x-µ) T ) is the ovariane of x and µ=e(x) is the expeted value of x. Using Lagrange multipliers one an show that the eigenvetor of Σ with the largest eigenvalue is the solution to problem (1). The eigenvetor with the next largest eigenvalue yields a projetion orthogonal to the first one and of maximum variane (Alpaydin, 2004). In pratie we use the sample ovariane matrix instead of Σ. Given the data matrix X = [x 1,, x n] (as desribed in the enoding) we define X = [x 1-m,, x n-m], where x i is the feature vetor of the i th individual and m=σx i/n is the mean feature vetor of the population. The sample ovariane matrix is then defined as X X T. This is also alled total satter matrix and is equivalent to S t = 1 n n i=1 (x i! m)(x i! m) T The eigenvetor of S t with the largest eigenvalue (also alled the first prinipal omponent vetor) gives the projetion of maximum variane in the sample. The eigenvetor with the next largest eigenvalue yields a projetion orthogonal to the first one and also of maximum variane. In unsupervised senarios PCA an be very helpful in eluidating lusters in the data Maximum margin disriminant analysis (MMC) In a supervised senario one an use the standard Fisher disriminant (Alpaydin, 2004). Another supervised method is the reent maximum margin disriminant (MMC) (Li et. al., 2006) that does not suffer from singularity problems like its Fisher ounterpart. Define the within lass satter matrix S i for lass i as n i S i = 1 (x (i) j! m (i) )(x (i) j! m (i) ) T n i j =1 the overall within lass satter matrix as S w = 1! n k S k = 1 n n n k!! j =1 and the between lass satter matrix as S b = 1 n n k (m (k )! m)(m (k)! m) T (x (k) j m (k) (k )(x ) j m (k ) ) T where x j (i) is the j th feature vetor of lass i, is the number of lasses, n i is the size of lass i, m (i) is the mean feature vetor of lass i, and m is the mean feature vetor of the entire dataset. The maximum margin riterion for feature extration is defined as (Li et. al., 2006) J = 1 2!! i=1 j =1 p i p j d(c i,c j ) where p i and p j are lass prior probabilities and d(c i,c j) is the interlass distane. Define the interlass distane d(c i,c j) as d(c i,c j)=d(m i,m j)-tr(s i)- tr(s j) where d(m i,m j) is the Eulidean distane between the mean vetors m i and m j and tr(s i) is the overall variane of lass S i. Then if p i=n i/n and d(c i,c j)=d(m i,m j)-tr(s i)-tr(s j), it an be shown that J=tr(S b-s w) (Li et. al., 2006). The linear MMC disriminant aims to find a matrix W = [w 1, w 2,, w d] that maximizes tr(w T (S b! S w )W ) subjet to (w k T w ) for all k [1,d]. This is equivalent to solving max d w 1 w d w k T (S b! S w )w k subjet to w k T w k =1 for k =1,,d Using Lagrange multipliers it an be shown that the d largest eigenvetor of S b-s w are the solution to W. The projetion of the data matrix X is then given by W T X. Sine S t=s w+s b, S b-s w an be rewritten as S t-2s w (Nijima and Okuno, 2007). Thus MMC also takes into onsideration loal properties of the data (by onsidering S w) whereas PCA only onsiders S t Laplaian linear disriminant analysis (LLDA) In order to desribe this disriminant we first define the Laplaian matrix of a weighted graph. Suppose we are given a weighted graph G with n nodes and its assoiated weight matrix W = {w ij : i,j [1,n]}. Then the Laplaian L of G is defined as (Tang et. al., 2006) $!w ij i j( & n & L[i, j] = % ) & d i = w ik i = j& ' * Now we desribe LLDA. First note that the matries S t and S w defined earlier an also be written as (Nijima and Okuno, 2007) S t = 1 n X(I! 1 n eet )X T = 1 n X(I!W g)x T = 1 n XL g X T (3) S w = 1 n X(I! 1 e (k ) e (k )T )X T = 1 n k n X(I!W l )X T = 1 n XL l X T k= l (2) 2

3 Semi-supervised feature extration for population struture identifiation using the Laplaian linear disriminant where e is n dimensional with all entries set to 1 and the i th entry of e (k) is set to 1 if x i belongs to lass k and 0 otherwise. I-W g an be viewed as the Laplaian L g of a global graph where all verties are onneted and eah edge has weight 1/n. Similarly I-W l is the Laplaian L l of a loal graph suh that all verties belonging to the same lass are onneted with weight 1/n k. In terms of the graph Laplaians we an represent S t-2s w as (1/n)X T (L g- 2L l)x using (3). MMC represented in this form is known as Laplaian linear disriminant analysis (Nijima and Okuno, 2007). The d leading eigenvetors of (1/n)X T (L g-2l l)x form the olumns of W and the projetion is then given by W T X. Note that so far the Laplaian of the loal graph is welldefined only under a supervised senario. However, it an also be used in an unsupervised setting (Nijima and Okuno, 2007) and a semi-supervised one as we show below Related work Zhang et. al., 2007, also use a Laplaian graph based approah for semi supervised dimensionality redution. They speify must-link and annotlink pairwise onstraints in the weight matrix and then proeed to ompute its Laplaian. The largest eigenvetors of the Laplaian are then used to projet the data. Our approah, however, is based on the LLDA, whih in turn is based on the MMC riterion. The LLDA approah takes both the global and loal properties into onsideration in separate Laplaians. Furthermore, the Laplaians in our ase are onstruted from nearest neighbor graphs that also inorporate prior information (see Subsetion 2.2.1). The approah of Zhang et. al., 2007, on the other hand, does not use nearest neighbor graphs for onstrution of the weight matrix. Sugiyama et. al., 2007, propose a semi-supervised loal Fisher disriminant for dimensionality redution. They use regularized between-lass and within-lass satter matries and define a trade-off parameter to ontrol the regularization. Their approah is based on the loal Fisher disriminant whereas LLDA (and subsequently our method) is loser to the MMC riterion. Li et. al., 2006, have shown MMC to produe lower error than the Fisher disriminant on faial reognition benhmarks. 2.2 Our ontribution Semi-supervised Laplaian linear disriminant In this study we desribe a semi-supervised extension. Suppose that the anestry of some individuals from some sub-populations in the admixture is known. In order to take this into aount for feature extration in an LLDA framework we proeed as follows. First we define the weight matries for the loal and global graphs as follows. Our definitions of weight matries are similar to the ones in Nijima and Okuno, 2007, exept that we inorporate prior knowledge.! 100p ij if anestry of i and j are speified and l(i) = l( j) % W l [i, j] = k(i, j) if i is among m nearest neighbors of j or vie - versa& $ 0 otherwise ' k(i, j) if i! j% W g [i, j] = & $ 0 if i = j ' where k(i,j) is the similarity of individuals i and j, and l(i) returns the lass (anestry) of individual i. p ij is defined as p ij =! k q ik q jk where q ik is the probability that individual i belongs to sub-population k of the admixture. Current benhmarks are obtained from individuals where both parents and grandparents of an individual are from the same anestry (e.g. Thus q ik is usually 1 or 0. However if an individual has different anestry from the mother and father then q ik will be a probability between 0 and 1. We use prior knowledge also in the onstrution of the nearest neighbor graph. For a given individual i whose anestry is speified we alulate its m nearest neighbors as follows. Let C be the set of individuals with the same speified anestry as that of i. We onsider all individuals in C to be loser to i even if they are onsidered far under a given distane metri. Thus, we add them to the nearest neighbors of i sorted by distane to i. If C < m then we sort the remaining individuals, i.e. the total population without C, by their distane to i and add the losest m- C ones to the nearest neighbor set of i. If the prior anestry of i is not speified we alulate its distane to all other individuals in the admixture and inlude the m nearest ones in the nearest neighbor set of i. After omputing the matries W g and W l we alulate the global and loal Laplaians aording to (2). We then form the matrix (1/n)X T (L g-2l l)x and ompute the d leading eigenvetors as the solution to W, where d is the desired number of redued dimensions. Note that the semi-supervised LLDA as well as the unsupervised one in Nijima and Okuno, 2007 both inlude MMC as a speial ase. If we set W g[i,j]=1/n for all i,j, and W l[i,j]=1/n k if i and j are both in the same lass k and 0 otherwise, we then obtain MMC Our implementation: Kernel-PCA + LLDA The dimensions of (1/n)X T (L g-2l l)x an be very large when thousands of SNPs are given. This an onsiderably slow down the eigenvetor omputations. To overome this we follow the spirit of PCA+LDA algorithms for image reognition (Yang et. al., 2005). We first ompute the full PCA projetion of the SNP data. This an be done effiiently using kernel PCA (Sholkopf and Smola, 2002). Kernel PCA also allows us to model nonlinear relationships in the data. After the kernel PCA transformation eah individual is now represented by an n dimensional vetor (where n is the number of individuals). This is a signifiant redution in dimension. SNPs an be in the order of hundreds of thousands or millions whereas the number of individuals in the population may be muh smaller. We treat the kernel PCA projetion as the new data matrix, i.e. olumn i of the data matrix X is replaed with the kernel PCA projeted vetor of the i th individual. We then apply LLDA on this matrix. Although our method is tehnially kernel-pca + semi-supervised LLDA we refer it to as LLDA hereon. All software and the datasets (inluding the kernel PCA projetions) are available from In partiular, sripts and data required to reprodue the results shown in this study are also provided Parameters In the omputation of W l we set k(i,j) to 1 everywhere, although note that Eulidean, dot produt, or Gaussian kernels may also be used (Nijima and Okuno, 2007). For the nearest neighbor searh we use the Eulidean distane between the kernel-pca projeted vetors. The number of dimensions in the PCA projeted vetors used for the Eulidean distane alula- 3

4 U. Roshan tion is set to 10 and the parameter m (in the definition of W l) is also set to 10. We experimented with fewer dimensions and smaller values of m but observed negligible differenes in error. Sine we use benhmarks where both maternal and paternal parents and grandparents have the same anestry, q ik is set to 1 for the appropriate subpopulation k and 0 for remaining ones if i th s prior anestry is available. Thus p ij is always 1 when both i and j have the same anestry. We used the kernel PCA program gist-kpa of the GIST software suite ( for omputing kernel-pca projetions with the polynomial degree two kernel. We experimented with Gaussian and higher order polynomials but did not observe a signifiant differene in error. We implemented LLDA using Perl and employed the Perl Math Cephes pakage ( for matrix operations. 3 RESULTS In order to study the performane of our proposed approah we simulate two different semi-supervised senarios. In the first senario we assume that for a given admixture some individuals from eah sub-population have known anestry. In the seond one some individuals from only some sub-populations have known anestry. Note that the semi-supervised senario does not affet the PCA projetion sine that is an unsupervised method. However, it produes different LLDA or other semi-supervised projetions. In order to ompare different projetions under the first senario we apply the nearest mean lassifier (Alpaydin, 2004). Sine training samples are available from eah sub-population in this ase it is straightforward to apply this lassifier. The proedure is simple: first alulate the training means of eah sub-population and then assign eah individual to the sub-population with the shortest Eulidean distane to its mean. In the seond senario we use the k-means lustering algorithm (Alpaydin, 2004) on both the projetions. In this situation only some individuals of some subpopulations have known anestry and therefore it is not possible to apply a lassifier. When applying the nearest means or k-means methods on a projetion we onsider only the leading k dimensions, where k is the number of sub-populations in the admixture. Note that this number is not required to ompute the LLDA projetion. We only use it in order to ompare the projetions under the nearest means and k- means algorithms. In pratie the number k an be estimated using various methods (Sanguinetti et. al., 2005). Given a true and estimated lassifiation we use the lassifiation error rate for determining its error. Let T[i] [1,k] be the true subpopulation ID of individual i and similarly let C[i] be the estimated sub-population ID. The error rate is then ( 100! { j : T[ j] C[ j], j [1,n] } ) n If a lustering is given then the above approah is not appliable. Instead we first have to make sure that the estimated subpopulation IDs math the true ones by the following method: for eah estimated sub-population we assign it the ID that is maximum among the true population IDs of all individuals in it. We then proeed with the same formula. We measure the statistial signifiane of the differene in errors between two methods using the Wiloxon signed rank test (Kanji, 1999). As shown in earlier studies PCA followed by k-means an separate sub-populations of the HAPMAP dataset (suh as Chinese and Japanese individuals) with high auray (Pashou et. al., 2007). We fous here on the reent large dataset from Noah Rosenberg s lab (Jakobsson et. al., 2008). This ontains 525,9210 genome-wide SNPs of 485 individuals from 29 populations. From this large dataset we extrat individuals from three different regions as speified in the dataset: East Asia: 10 from Cambodia, 15 from Siberia, 49 from China, and 16 from Japan; 459,188 SNPs without missing entries Afria: 32 Biaka Pygmy individuals, 15 Mbuti Pygmy, 24 Mandenka, 25 Yoruba, 7 San from Namibia, 8 Bantu of South Afria, and 12 Bantu of Kenya; 454,732 SNPs without missing entries. Middle East: 43 Druze from Israel-Carmel, 47 Bedouins from Israel-Negev, 26 Palestinians from Israel-Central, and 30 Mozabite from Algeria-Mzab; 438,596 SNPs without missing entries. We onduted several preliminary studies to determine whih methods to present in the main results. We first ompared kernel- PCA to the implementation of the smartpa program (Patterson et. al., 2006) and found it to have lower error when evaluated under k- means. We also ompared the SSDR method of Zhang et. al., 2007, and the semi-supervised loal Fisher disriminant (SSLFDA) of Sugiyama et. al., 2007, to LLDA under the first semi-supervised senario. We implemented both of these methods in Perl and provide them in our software distribution at In line with our semi-supervised LLDA implementation (kernel-pca + LLDA) we apply both SSDR and SSLFDA on the full kernel PCA projetion. We found both SSDR and SSLFDA to have higher errors than LLDA and even the original kernel PCA on the data onsidered here. Subsequently we omit them from the remaining results. 3.1 Random number of individuals with known anestry from eah sub-population of the admixture In the first semi-supervised senario we assume that some individuals from eah sub-population of the admixture have known anestry. In order to apply LLDA here we would need a minimum of two individuals from eah sub-population with known anestry Between 2 and 4 random individuals of eah sub-population with known anestry (for Middle Eastern admixture between 2 and 8) 4

5 Semi-supervised feature extration for population struture identifiation using the Laplaian linear disriminant We simulate datasets with known anestry as follows. Assume that the admixture has p sub-populations. For eah sub-population j [1,p] we generate a random r j [2,4] and randomly selet r j individuals from sub-population j to have known anestry. For the Middle Eastern admixture, however, we use the range [2,8]. We generate 200 suh random datasets. On eah suh dataset we extrat LLDA features. The kernel-pca features are unsupervised and so independent of the prior anestry. The number of random individuals with known anestry is muh smaller than the total in the admixture. For the datasets in this subsetion the mean number of individuals with known anestries in the East Asian, Afrian, and Middle Eastern admixtures are 11.0, 19.1, and 18.3 respetively. These onstitute 12.2%, 15.5%, and 12.5% of the three admixtures respetively. Sine eah sub-population has a minimum of two individuals with known anestry we an apply the simple nearest means lassifier on the two projetions and measure their error. In Table 1 we report the mean error on both the projetions (averaged over the 200 random trials). On all three admixtures the LLDA error is lower by a statistially signifiant margin. Table 1. Error of nearest means lassifier. * denotes Wiloxon signed rank test p-value < Admixture Kernel PCA Laplaian linear disriminant East Asia * Afria * Middle East * Between 2 and 20% random individuals of eah subpopulation with known anestry We simulate 200 random datasets with known anestry in a manner similar to the one desribed in the previous setion. However, the range for eah r j is [2,0.2n j ] where n j is the number of individuals in sub-population j of the admixture. The mean perent of individuals with known anestry is 13.5%, 15.1%, and 12% for the three admixtures respetively. LLDA still performs signifiantly better as Table 2 shows. Table 2. Caption as Table 1. Admixture Kernel PCA Laplaian linear disriminant East Asia * Afria * Middle East * 3.2 Random number of individuals with known anestry but only from some sub-populations of the admixture In the seond semi-supervised senario we assume that only some individuals from some sub-populations have known anestry. This presents a more diffiult senario sine now a lassifiation method annot be applied. In order to ompare the two projetions in this senario we apply the popular k-means lustering method. This may not neessarily the best way of omparison but it still gives us some idea of how well separated the two projetions are. We run k-means 100 times on eah input and report the error of the lustering with the highest objetive funtion value Between 2 and 4 random individuals of some subpopulations with known anestry (for Middle Eastern admixture between 2 and 8) Assume that the admixture has p sub-populations. We first selet a random number p [1,p] and then pik p random sub-population IDs (from 1 through p) using Fisher-Yates sampling (Knuth, 1998). For eah seleted sub-population j we generate a random r j [2,4] and randomly selet r j individuals from sub-population j to have known anestry (as before). Again for the Middle Eastern admixture we use the range [2,8]. We generate 200 suh random datasets. On eah suh dataset we extrat LLDA features. The kernel-pca features are unsupervised and so independent of the prior anestry. The number of seleted sub-populations p and number of individuals per seleted sub-population varies aross these datasets. For example for the 200 datasets in this subsetion the mean values of p for the East Asian, Afrian, and Middle Eastern admixtures are 2.6, 3.9, and 2.5. These values are very similar for the data in the proeeding subsetions. The number of individuals with known anestry is also muh smaller than the total admixture sizes. For the data in this subsetion the mean number of individuals with known anestry are 7.1, 10.8, and 11.8 for the East Asian, Afrian, and Middle Eastern admixtures. These onstitute 7.8%, 8.8%, and 8.1% of the respetive admixtures. We apply the k-means lustering algorithm on both the kernel- PCA and LLDA projetions. K-means on the kernel-pca projetion has the same error aross the different datasets beause the prior anestry does not affet the projetion. In Table 3 we report the k-means error averaged over the 200 random trials. Exept for the East Asian admixture LLDA has statistially signifiant lower errors. Table 3. Error of k-means. * denotes Wiloxon signed rank test p-value < Admixture Kernel PCA Laplaian linear disriminant East Asia Afria * Middle East * 5

6 U. Roshan Between 2 and 20% random individuals of some subpopulations with known anestry We proeed as in the previous subsetion. We first selet p random population IDs. However, this time we selet r j randomly from [2,0.2n j ] where n j is the number of individuals in the seleted sub-population j of the admixture. The mean perent of individuals with known anestry is 9%, 9.1%, and 7.4% for the three admixtures respetively. As Table 4 shows LLDA performs signifiantly better than kernel-pca. Table 4. Caption as Table 3. Admixture Kernel PCA Laplaian linear disriminant East Asia * Afria * Middle East * On the HAPMAP dataset PCA followed by k-means separates the Japanese and Chinese sub-populations with 1% error (Pashou et. al., 2007). We observed the same error with kernel PCA. On the admixtures onsidered here however, PCA error is onsiderably higher. For example, on the Middle Eastern admixture kernel PCA followed by k-means reahes an error of 29.5%. This admixture is partiularly hard beause of the losely related Bedouin and Palestinian sub-populations. LLDA gives an improvement of almost 10% when mean perent of known anestry is 16.8% over randomly seleted sub-populations of this admixture (see Table 6). The perent of individuals with known anestry is intentionally kept small throughout our experiments in order to simulate hard senarios. In pratie prior anestry of many individuals may be available whih in turn will provide a greater advantage with LLDA. This trend an be observed from Tables 4 through 6: as the mean number of individuals with known anestry inreases the LLDA error drops. Our LLDA implementation is also very fast: for any of the three given admixtures it finishes in a few seonds Between 2 and 35% random individuals of some subpopulations with known anestry We proeed as previously but selet r j randomly from [2,0.35n j ]. The mean perent of individuals with known anestry is 12.2%, 11.9%, and 12% for the three admixtures respetively. Table 5 shows the improvements gained from LLDA. Table 5. Caption as Table 3. Admixture Kernel PCA Laplaian linear disriminant East Asia * Afria * Middle East * Between 2 and 50% random individuals of some subpopulations with known anestry Finally we selet r j randomly from [2,0.5n j ]. The mean number of individuals with known anestry (over the 200 trials) is still small for the data in this subsetion: 15.5 for East Asian admixture, 19.3 for Afrian, and 24.6 for the Middle Eastern. These onstitute 17.2%, 15.7%, and 16.8% of the three admixtures respetively. We summarize the results in Table 6. Table 6. Caption as Table 3. Admixture Kernel PCA Laplaian linear disriminant East Asia * Afria * Middle East * 4 CONCLUSIONS We proposed a semi-supervised Laplaian linear disriminant for extrating features for identifying population struture when the anestry of some individuals is known in advane. Using real benhmarks we simulate various semi-supervised senarios and show that LLDA outperforms kernel PCA by a statistially signifiant margin. In omparison to two reent semi-supervised feature extrators, LLDA and kernel PCA have lower error on the data onsidered here. The proposed LLDA method is fast, an be easily implemented, and aommodates mixed prior anestries. ACKNOWLEDGEMENTS Computational experiments were performed on the CIPRES luster supported by NSF award We thank system administrators at New Jersey Institute of Tehnology and the San Diego Superomputing Center for their support. REFERENCES Alpaydin, E. (2004) Introdution to Mahine Learning, MIT Press Cavalli-Sforza L., Feldman M. (2003) The appliation of moleular geneti approahes to the study of human evolution, Nature Genetis 33: Devlin B, Roeder K (1999) Genomi ontrol for assoiation studies. Biometris 55: Jakobsson M., Sholz S.W., Sheet P., Gibbs J.R., VanLiere J.M., Fung H-C., Szpieh Z.A., Degnan J.H., Wang K., Guerreiro R., Bras J.M., Shymik J.C., Hernandez D.G., Traynor B.J., Simon-Sanhez J., Matarin M., Britton A., van de Leemput J., Rafferty I., Buan M., Cann H.M., Hardy J.A., Rosenberg N.A., Singleton A.B. (2008) Genotype, haplotype and opy-number variation in worldwide human populations. Nature 451: Kanji G. K. (1999) 100 Statistial Tests, London, U.K., Sage Publiations Knuth D. E. (1998) The Art of Computer Programming volume 2, 3 rd edition, Li H., Tiang J., Zhang J. (2006) Effiient and robust feature extration by maximum margin riterion, IEEE Trans. Neural Networks 17(1): Marhini J, Cardon L, Phillips M, Donnelly P (2004) The effets of human population struture on large geneti assoiation studies. Nature Genetis 36: Nijima S., Okuno Y. (2007) Laplaian linear disriminant analysis approah to unsupervised feature seletion, IEEE/ACM Transations on Computational Biology and Bioinformatis PP(99):1-1, doi: /tcbb Disussion 6

7 Semi-supervised feature extration for population struture identifiation using the Laplaian linear disriminant Pashou P, Ziv E, Burhard EG, Choudhry S, Rodriguez-Cintron W, et al. (2007) PCA-Correlated SNPs for Struture Identifiation in Worldwide Human Populations. PLoS Genetis 3(9): e160 doi: /journal.pgen Patterson N, Prie A, Reih D (2006) Population struture and eigenanalysis. PLoS Genetis 2: e190. doi: /journal.pgen Prithard J, Stephens M, Donnelly P (2000) Inferene of population struture using multilous genotype data. Genetis 155: Sanguinetti G., Laidler J., Lawrene N. D. (2005) Automati determination of the number of lusters using spetral algorithms, Proeedings of the IEEE Workshop on Mahine Learning for Signal Proessing, Sholkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge, MA, MIT Press. Sugiyama M., Ide T., Nakajimi S., Sese J. (2007) Semi-supervised loal Fisher disriminant analysis for dimensionality redution, Proeedings of the Workshop on Information-Based Indution Sienes, Tokyo, Japan Tang H, Fang T., Shi P-F. (2006) Laplaian linear disriminant analysis, Pattern Reognition, 39: Tsai H, Choudhry S, Naqvi M, Rodriguez-Cintron W, Burhard E, et al. (2005) Comparison of three methods to estimate geneti anestry and ontrol for stratifiation in geneti assoiation studies among admixed populations, Hum Genet 118: Xu H., Shete S. (2005) Effets of population struture on geneti assoiation studies. BMC Genetis 6(Suppl 1): S109 Yang J., Frangi A.F.,Yang J-Y., Zhang D., Zhong J. (2005) KPCA plus LDA: a omplete kernel Fisher disriminant framework for feature extration and reognition, IEEE Transations on Pattern Analysis and Mahine Intelligene 27(2): Zhang D., Zhou Z-H., Chen S., (2007) Semi-supervised dimensionality redution, Proeedings of the 2007 SIAM International Conferene on Data Mining, Minneapolis, Minnesota, USA Ziv E, Burhard E (2003) Human population struture and geneti assoiation studies. Pharmaogenomis 4:

the data. Structured Principal Component Analysis (SPCA)

the data. Structured Principal Component Analysis (SPCA) Strutured Prinipal Component Analysis Kristin M. Branson and Sameer Agarwal Department of Computer Siene and Engineering University of California, San Diego La Jolla, CA 9193-114 Abstrat Many tasks involving

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

Cluster-Based Cumulative Ensembles

Cluster-Based Cumulative Ensembles Cluster-Based Cumulative Ensembles Hanan G. Ayad and Mohamed S. Kamel Pattern Analysis and Mahine Intelligene Lab, Eletrial and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1,

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

Evolutionary Feature Synthesis for Image Databases

Evolutionary Feature Synthesis for Image Databases Evolutionary Feature Synthesis for Image Databases Anlei Dong, Bir Bhanu, Yingqiang Lin Center for Researh in Intelligent Systems University of California, Riverside, California 92521, USA {adong, bhanu,

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules Improved Vehile Classifiation in Long Traffi Video by Cooperating Traker and Classifier Modules Brendan Morris and Mohan Trivedi University of California, San Diego San Diego, CA 92093 {b1morris, trivedi}@usd.edu

More information

Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the Convergence of ICP Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome,

More information

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition A Coarse-to-Fine Classifiation Sheme for Faial Expression Reognition Xiaoyi Feng 1,, Abdenour Hadid 1 and Matti Pietikäinen 1 1 Mahine Vision Group Infoteh Oulu and Dept. of Eletrial and Information Engineering

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauer and Charles A. Bouman Department of Eletrial Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 Shool of

More information

New Fuzzy Object Segmentation Algorithm for Video Sequences *

New Fuzzy Object Segmentation Algorithm for Video Sequences * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 521-537 (2008) New Fuzzy Obet Segmentation Algorithm for Video Sequenes * KUO-LIANG CHUNG, SHIH-WEI YU, HSUEH-JU YEH, YONG-HUAI HUANG AND TA-JEN YAO Department

More information

Exploiting Enriched Contextual Information for Mobile App Classification

Exploiting Enriched Contextual Information for Mobile App Classification Exploiting Enrihed Contextual Information for Mobile App Classifiation Hengshu Zhu 1 Huanhuan Cao 2 Enhong Chen 1 Hui Xiong 3 Jilei Tian 2 1 University of Siene and Tehnology of China 2 Nokia Researh Center

More information

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs?

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs? One Against One or One Against All : Whih One is Better for Handwriting Reognition with SVMs? Jonathan Milgram, Mohamed Cheriet, Robert Sabourin To ite this version: Jonathan Milgram, Mohamed Cheriet,

More information

KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION

KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION Cuiui Kang 1, Shengai Liao, Shiming Xiang 1, Chunhong Pan 1 1 National Laboratory of Pattern Reognition, Institute of Automation, Chinese

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

Boosted Random Forest

Boosted Random Forest Boosted Random Forest Yohei Mishina, Masamitsu suhiya and Hironobu Fujiyoshi Department of Computer Siene, Chubu University, 1200 Matsumoto-ho, Kasugai, Aihi, Japan {mishi, mtdoll}@vision.s.hubu.a.jp,

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes Deteting Outliers in High-Dimensional Datasets with Mixed Attributes A. Koufakou, M. Georgiopoulos, and G.C. Anagnostopoulos 2 Shool of EECS, University of Central Florida, Orlando, FL, USA 2 Dept. of

More information

A Fast Kernel-based Multilevel Algorithm for Graph Clustering

A Fast Kernel-based Multilevel Algorithm for Graph Clustering A Fast Kernel-based Multilevel Algorithm for Graph Clustering Inderjit Dhillon Dept. of Computer Sienes University of Texas at Austin Austin, TX 78712 inderjit@s.utexas.edu Yuqiang Guan Dept. of Computer

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method 3537 Multiple-Criteria Deision Analysis: A Novel Rank Aggregation Method Derya Yiltas-Kaplan Department of Computer Engineering, Istanbul University, 34320, Avilar, Istanbul, Turkey Email: dyiltas@ istanbul.edu.tr

More information

Weak Dependence on Initialization in Mixture of Linear Regressions

Weak Dependence on Initialization in Mixture of Linear Regressions Proeedings of the International MultiConferene of Engineers and Computer Sientists 8 Vol I IMECS 8, Marh -6, 8, Hong Kong Weak Dependene on Initialization in Mixture of Linear Regressions Ryohei Nakano

More information

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors Eurographis Symposium on Geometry Proessing (003) L. Kobbelt, P. Shröder, H. Hoppe (Editors) Rotation Invariant Spherial Harmoni Representation of 3D Shape Desriptors Mihael Kazhdan, Thomas Funkhouser,

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

Segmentation of brain MR image using fuzzy local Gaussian mixture model with bias field correction

Segmentation of brain MR image using fuzzy local Gaussian mixture model with bias field correction IOSR Journal of VLSI and Signal Proessing (IOSR-JVSP) Volume 2, Issue 2 (Mar. Apr. 2013), PP 35-41 e-issn: 2319 4200, p-issn No. : 2319 4197 Segmentation of brain MR image using fuzzy loal Gaussian mixture

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

A Comprehensive Review of Overlapping Community Detection Algorithms for Social Networks

A Comprehensive Review of Overlapping Community Detection Algorithms for Social Networks International Journal of Engineering Researh and Appliations (IJERA) ISSN: 2248-9622 National Conferene on Advanes in Engineering and Tehnology (AET- 29th Marh 2014) RESEARCH ARTICLE OPEN ACCESS A Comprehensive

More information

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

Diffusion Kernels on Graphs and Other Discrete Structures

Diffusion Kernels on Graphs and Other Discrete Structures Diffusion Kernels on Graphs and Other Disrete Strutures Risi Imre Kondor ohn Lafferty Shool of Computer Siene Carnegie Mellon University Pittsburgh P 523 US KONDOR@CMUEDU LFFERTY@CSCMUEDU bstrat The appliation

More information

A scheme for racquet sports video analysis with the combination of audio-visual information

A scheme for racquet sports video analysis with the combination of audio-visual information A sheme for raquet sports video analysis with the ombination of audio-visual information Liyuan Xing a*, Qixiang Ye b, Weigang Zhang, Qingming Huang a and Hua Yu a a Graduate Shool of the Chinese Aadamy

More information

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization Self-Adaptive Parent to Mean-Centri Reombination for Real-Parameter Optimization Kalyanmoy Deb and Himanshu Jain Department of Mehanial Engineering Indian Institute of Tehnology Kanpur Kanpur, PIN 86 {deb,hjain}@iitk.a.in

More information

Hyperspectral Images Classification Using Energy Profiles of Spatial and Spectral Features

Hyperspectral Images Classification Using Energy Profiles of Spatial and Spectral Features journal homepage: www.elsevier.om Hyperspetral Images Classifiation Using Energy Profiles of Spatial and Spetral Features Hamid Reza Shahdoosti a a Hamedan University of ehnology, Department of Eletrial

More information

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM M. Murugeswari 1, M.Gayathri 2 1 Assoiate Professor, 2 PG Sholar 1,2 K.L.N College of Information

More information

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW FOREGROUND OBJECT EXTRACTION USING FUZZY C EANS WITH BIT-PLANE SLICING AND OPTICAL FLOW SIVAGAI., REVATHI.T, JEGANATHAN.L 3 APSG, SCSE, VIT University, Chennai, India JRF, DST, Dehi, India. 3 Professor,

More information

Multi-modal Clustering for Multimedia Collections

Multi-modal Clustering for Multimedia Collections Multi-modal Clustering for Multimedia Colletions Ron Bekkerman and Jiwoon Jeon Center for Intelligent Information Retrieval University of Massahusetts at Amherst, USA {ronb jeon}@s.umass.edu Abstrat Most

More information

DOMAIN ADAPTATION BY ITERATIVE IMPROVEMENT OF SOFT-LABELING AND MAXIMIZATION OF NON-PARAMETRIC MUTUAL INFORMATION. M.N.A. Khan, Douglas R.

DOMAIN ADAPTATION BY ITERATIVE IMPROVEMENT OF SOFT-LABELING AND MAXIMIZATION OF NON-PARAMETRIC MUTUAL INFORMATION. M.N.A. Khan, Douglas R. DOMAIN ADAPTATION BY ITERATIVE IMPROVEMENT OF SOFT-LABELING AND MAXIMIZATION OF NON-PARAMETRIC MUTUAL INFORMATION M.N.A. Khan, Douglas R. Heisterkamp Department of Computer Siene Oklahoma State University,

More information

Fast Rigid Motion Segmentation via Incrementally-Complex Local Models

Fast Rigid Motion Segmentation via Incrementally-Complex Local Models Fast Rigid Motion Segmentation via Inrementally-Complex Loal Models Fernando Flores-Mangas Allan D. Jepson Department of Computer Siene, University of Toronto {mangas,jepson}@s.toronto.edu Abstrat The

More information

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results An Alternative Approah to the Fuzziier in Fuzzy Clustering to Obtain Better Clustering Results Frank Klawonn Department o Computer Siene University o Applied Sienes BS/WF Salzdahlumer Str. 46/48 D-38302

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

Transition Detection Using Hilbert Transform and Texture Features

Transition Detection Using Hilbert Transform and Texture Features Amerian Journal of Signal Proessing 1, (): 35-4 DOI: 1.593/.asp.1.6 Transition Detetion Using Hilbert Transform and Texture Features G. G. Lashmi Priya *, S. Domni Department of Computer Appliations, National

More information

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

Figure 1. LBP in the field of texture analysis operators.

Figure 1. LBP in the field of texture analysis operators. L MEHODOLOGY he loal inary pattern (L) texture analysis operator is defined as a gray-sale invariant texture measure, derived from a general definition of texture in a loal neighorhood. he urrent form

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

Polyhedron Volume-Ratio-based Classification for Image Recognition

Polyhedron Volume-Ratio-based Classification for Image Recognition Polyhedron Volume-Ratio-based Classifiation for Image Reognition Qingxiang Feng, Jeng-Shyang Pan, Senior Member, IEEE, Jar-Ferr Yang, Fellow, IEEE and Yang-ing Chou Abstrat In this paper, a novel method,

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking Algorithms for External Memory Leture 6 Graph Algorithms - Weighted List Ranking Leturer: Nodari Sithinava Sribe: Andi Hellmund, Simon Ohsenreither 1 Introdution & Motivation After talking about I/O-effiient

More information

An Efficient and Scalable Approach to CNN Queries in a Road Network

An Efficient and Scalable Approach to CNN Queries in a Road Network An Effiient and Salable Approah to CNN Queries in a Road Network Hyung-Ju Cho Chin-Wan Chung Dept. of Eletrial Engineering & Computer Siene Korea Advaned Institute of Siene and Tehnology 373- Kusong-dong,

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

Cluster Centric Fuzzy Modeling

Cluster Centric Fuzzy Modeling 10.1109/TFUZZ.014.300134, IEEE Transations on Fuzzy Systems TFS-013-0379.R1 1 Cluster Centri Fuzzy Modeling Witold Pedryz, Fellow, IEEE, and Hesam Izakian, Student Member, IEEE Abstrat In this study, we

More information

An Edge-based Clustering Algorithm to Detect Social Circles in Ego Networks

An Edge-based Clustering Algorithm to Detect Social Circles in Ego Networks JOURNAL OF COMPUTERS, VOL. 8, NO., OCTOBER 23 2575 An Edge-based Clustering Algorithm to Detet Soial Cirles in Ego Networks Yu Wang Shool of Computer Siene and Tehnology, Xidian University Xi an,77, China

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

Gradient based progressive probabilistic Hough transform

Gradient based progressive probabilistic Hough transform Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti

More information

Performance Benchmarks for an Interactive Video-on-Demand System

Performance Benchmarks for an Interactive Video-on-Demand System Performane Benhmarks for an Interative Video-on-Demand System. Guo,P.G.Taylor,E.W.M.Wong,S.Chan,M.Zukerman andk.s.tang ARC Speial Researh Centre for Ultra-Broadband Information Networks (CUBIN) Department

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification

Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification Spatial-Aware Collaborative Representation for Hyperspetral Remote Sensing Image ifiation Junjun Jiang, Member, IEEE, Chen Chen, Member, IEEE, Yi Yu, Xinwei Jiang, and Jiayi Ma Member, IEEE Representation-residual

More information

Defect Detection and Classification in Ceramic Plates Using Machine Vision and Naïve Bayes Classifier for Computer Aided Manufacturing

Defect Detection and Classification in Ceramic Plates Using Machine Vision and Naïve Bayes Classifier for Computer Aided Manufacturing Defet Detetion and Classifiation in Cerami Plates Using Mahine Vision and Naïve Bayes Classifier for Computer Aided Manufaturing 1 Harpreet Singh, 2 Kulwinderpal Singh, 1 Researh Student, 2 Assistant Professor,

More information

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification A New RBFNDDA-KNN Network and Its Appliation to Medial Pattern Classifiation Shing Chiang Tan 1*, Chee Peng Lim 2, Robert F. Harrison 3, R. Lee Kennedy 4 1 Faulty of Information Siene and Tehnology, Multimedia

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator

Mean Deviation Similarity Index: Efficient and Reliable Full-Reference Image Quality Evaluator 1 Mean Deviation Similarity Index: Effiient and Reliable Full-Referene Image Quality Evaluator Hossein Ziaei Nafhi, Atena Shahkolaei, Rahid Hedjam, and Mohamed Cheriet, Senior Member, IEEE two images of

More information

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by Problem Set no.3.a) The ideal low-pass filter is given in the frequeny domain by B ideal ( f ), f f; =, f > f. () And, the (low-pass) Butterworth filter of order m is given in the frequeny domain by B

More information

Define - starting approximation for the parameters (p) - observational data (o) - solution criterion (e.g. number of iterations)

Define - starting approximation for the parameters (p) - observational data (o) - solution criterion (e.g. number of iterations) Global Iterative Solution Distributed proessing of the attitude updating L. Lindegren (21 May 2001) SAG LL 37 Abstrat. The attitude updating algorithm given in GAIA LL 24 (v. 2) is modified to allow distributed

More information

The goal of machine learning

The goal of machine learning [appliations CORNER] Kilian Q. Weinberger, Fei Sha, and Lawrene K. Saul Convex Optimizations for Distane Metri Learning and Pattern Classifiation The goal of mahine learning is to build automated systems

More information

Learning Discriminative and Shareable Features. Scene Classificsion

Learning Discriminative and Shareable Features. Scene Classificsion Learning Disriminative and Shareable Features for Sene Classifiation Zhen Zuo, Gang Wang, Bing Shuai, Lifan Zhao, Qingxiong Yang, and Xudong Jiang Nanyang Tehnologial University, Singapore, Advaned Digital

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

An Interactive-Voting Based Map Matching Algorithm

An Interactive-Voting Based Map Matching Algorithm Eleventh International Conferene on Mobile Data Management An Interative-Voting Based Map Mathing Algorithm Jing Yuan* University of Siene and Tehnology of China Hefei, China yuanjing@mail.ust.edu.n Yu

More information

Color Image Fusion for Concealed Weapon Detection

Color Image Fusion for Concealed Weapon Detection In: E.M. Carapezza (Ed.), Sensors, and ommand, ontrol, ommuniations, and intelligene (C3I) tehnologies for homeland defense and law enforement II, SPIE-571 (pp. 372-379). Bellingham, WA., USA: The International

More information

Machine Vision. Laboratory Exercise Name: Student ID: S

Machine Vision. Laboratory Exercise Name: Student ID: S Mahine Vision 521466S Laoratory Eerise 2011 Name: Student D: General nformation To pass these laoratory works, you should answer all questions (Q.y) with an understandale handwriting either in English

More information

Discrete sequential models and CRFs. 1 Case Study: Supervised Part-of-Speech Tagging

Discrete sequential models and CRFs. 1 Case Study: Supervised Part-of-Speech Tagging 0-708: Probabilisti Graphial Models 0-708, Spring 204 Disrete sequential models and CRFs Leturer: Eri P. Xing Sribes: Pankesh Bamotra, Xuanhong Li Case Study: Supervised Part-of-Speeh Tagging The supervised

More information

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing デンソーテクニカルレビュー Vol. 15 2010 特集 Road Border Reognition Using FIR Images and LIDAR Signal Proessing 高木聖和 バーゼル ファルディ Kiyokazu TAKAGI Basel Fardi ヘンドリック ヴァイゲル Hendrik Weigel ゲルド ヴァニーリック Gerd Wanielik This paper

More information

MATH STUDENT BOOK. 12th Grade Unit 6

MATH STUDENT BOOK. 12th Grade Unit 6 MATH STUDENT BOOK 12th Grade Unit 6 Unit 6 TRIGONOMETRIC APPLICATIONS MATH 1206 TRIGONOMETRIC APPLICATIONS INTRODUCTION 3 1. TRIGONOMETRY OF OBLIQUE TRIANGLES 5 LAW OF SINES 5 AMBIGUITY AND AREA OF A TRIANGLE

More information

Bayesian Belief Networks for Data Mining. Harald Steck and Volker Tresp. Siemens AG, Corporate Technology. Information and Communications

Bayesian Belief Networks for Data Mining. Harald Steck and Volker Tresp. Siemens AG, Corporate Technology. Information and Communications Bayesian Belief Networks for Data Mining Harald Stek and Volker Tresp Siemens AG, Corporate Tehnology Information and Communiations 81730 Munih, Germany fharald.stek, Volker.Trespg@mhp.siemens.de Abstrat

More information

The Implementation of RRTs for a Remote-Controlled Mobile Robot

The Implementation of RRTs for a Remote-Controlled Mobile Robot ICCAS5 June -5, KINEX, Gyeonggi-Do, Korea he Implementation of RRs for a Remote-Controlled Mobile Robot Chi-Won Roh*, Woo-Sub Lee **, Sung-Chul Kang *** and Kwang-Won Lee **** * Intelligent Robotis Researh

More information

Alleviating DFT cost using testability driven HLS

Alleviating DFT cost using testability driven HLS Alleviating DFT ost using testability driven HLS M.L.Flottes, R.Pires, B.Rouzeyre Laboratoire d Informatique, de Robotique et de Miroéletronique de Montpellier, U.M. CNRS 5506 6 rue Ada, 34392 Montpellier

More information

SURVEY ON MEDICAL IMAGE SEGMENTATION USING ENHANCED K-MEANS AND KERNELIZED FUZZY C- MEANS

SURVEY ON MEDICAL IMAGE SEGMENTATION USING ENHANCED K-MEANS AND KERNELIZED FUZZY C- MEANS SURVEY ON MEDICAL IMAGE SEGMENTATION USING ENHANCED K-MEANS AND KERNELIZED FUZZY C- MEANS Gunwanti S. Mahajan & Kanhan S. Bhagat. Dept of E &TC, J. T. Mahajan C.o.E Faizpur, India ABSTRACT Diagnosti imaging

More information

TOWARD HYBRID VARIANT/GENERATIVE PROCESS PLANNING

TOWARD HYBRID VARIANT/GENERATIVE PROCESS PLANNING Proeedings of DETC 97: 1997 ASME Design Engineering Tehnial Conferenes September 14-17,1997, Saramento, California DETC97/DFM-4333 TOWARD HYBRID VARIANT/GENERATIVE PROCESS PLANNING Alexei Elinson Dept.

More information

represent = as a finite deimal" either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to

represent = as a finite deimal either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to Sientifi Computing Chapter I Computer Arithmeti Jonathan Goodman Courant Institute of Mathemaial Sienes Last revised January, 00 Introdution One of the many soures of error in sientifi omputing is inexat

More information

IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION

IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION Journal of Theoretial and Applied Information Tehnology IMPROVED FUZZY CLUSTERING METHOD BASED ON INTUITIONISTIC FUZZY PARTICLE SWARM OPTIMIZATION V.KUMUTHA, 2 S. PALANIAMMAL D.J. Aademy For Managerial

More information

COMBINATION OF INTERSECTION- AND SWEPT-BASED METHODS FOR SINGLE-MATERIAL REMAP

COMBINATION OF INTERSECTION- AND SWEPT-BASED METHODS FOR SINGLE-MATERIAL REMAP Combination of intersetion- and swept-based methods for single-material remap 11th World Congress on Computational Mehanis WCCM XI) 5th European Conferene on Computational Mehanis ECCM V) 6th European

More information

FUZZY WATERSHED FOR IMAGE SEGMENTATION

FUZZY WATERSHED FOR IMAGE SEGMENTATION FUZZY WATERSHED FOR IMAGE SEGMENTATION Ramón Moreno, Manuel Graña Computational Intelligene Group, Universidad del País Vaso, Spain http://www.ehu.es/winto; {ramon.moreno,manuel.grana}@ehu.es Abstrat The

More information

Fitting conics to paracatadioptric projections of lines

Fitting conics to paracatadioptric projections of lines Computer Vision and Image Understanding 11 (6) 11 16 www.elsevier.om/loate/viu Fitting onis to paraatadioptri projetions of lines João P. Barreto *, Helder Araujo Institute for Systems and Robotis, Department

More information

SINCE the successful launch of high-resolution sensors and

SINCE the successful launch of high-resolution sensors and 1458 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 45, NO. 5, MAY 2007 Classifiation of High Spatial Resolution Imagery Using Improved Gaussian Markov Random-Field-Based Texture Features Yindi

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

Face and Facial Feature Tracking for Natural Human-Computer Interface

Face and Facial Feature Tracking for Natural Human-Computer Interface Fae and Faial Feature Traking for Natural Human-Computer Interfae Vladimir Vezhnevets Graphis & Media Laboratory, Dept. of Applied Mathematis and Computer Siene of Mosow State University Mosow, Russia

More information

Gait Based Human Recognition with Various Classifiers Using Exhaustive Angle Calculations in Model Free Approach

Gait Based Human Recognition with Various Classifiers Using Exhaustive Angle Calculations in Model Free Approach Ciruits and Systems, 2016, 7, 1465-1475 Published Online June 2016 in SiRes. http://www.sirp.org/journal/s http://dx.doi.org/10.4236/s.2016.78128 Gait Based Human Reognition with Various Classifiers Using

More information