On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution


Masashi Sugiyama, Makoto Yamada, Manabu Kimura, Hirotaka Hachiya
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan.

Abstract

Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.

1. Introduction

The goal of clustering is to classify data samples into disjoint groups in an unsupervised manner. K-means is a classic but still popular clustering algorithm. However, since k-means only produces linearly separated clusters, its usefulness is rather limited in practice.

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

To cope with this problem, various non-linear clustering methods have been developed. Kernel k-means (Girolami, 2002) performs k-means in a feature space induced by a reproducing kernel function. Spectral clustering (Shi & Malik, 2000) first unfolds non-linear data manifolds by a spectral embedding method, and then performs k-means in the embedded space. Blurring mean-shift (Fukunaga & Hostetler, 1975) uses a non-parametric kernel density estimator for modeling the data-generating probability density and finds clusters based on the modes of the estimated density. Discriminative clustering (Xu et al., 2005; Bach & Harchaoui, 2008) learns a discriminative classifier for separating clusters, where class labels are also treated as parameters to be optimized. Dependence-maximization clustering (Song et al., 2007; Faivishevsky & Goldberger, 2010) determines cluster assignments so that their dependence on input data is maximized.

These non-linear clustering techniques would be capable of handling highly complex real-world data. However, they suffer from a lack of objective model selection strategies¹. More specifically, the above non-linear clustering methods contain tuning parameters such as the width of Gaussian functions and the number of nearest neighbors in kernel functions or similarity measures, and these tuning parameter values need to be heuristically determined in an unsupervised manner. The problem of learning similarities/kernels was addressed in earlier works, but they considered supervised setups, i.e., labeled samples are assumed to be given. Zelnik-Manor & Perona (2005) provided a useful unsupervised heuristic to determine the similarity in a data-dependent way. However, it still requires the number of nearest neighbors to be determined manually (although the magic number 7 was shown to work well in their experiments).

¹ Model selection in this paper refers to the choice of tuning parameters in kernel functions or similarity measures, not the choice of the number of clusters.

Another line of clustering framework called information-maximization clustering (Agakov & Barber, 2006; Gomes et al., 2010) exhibited the state-of-the-art performance. In this information-maximization approach, probabilistic classifiers such as a kernelized Gaussian classifier (Agakov & Barber, 2006) and a kernel logistic regression classifier (Gomes et al., 2010) are learned so that mutual information (MI) between feature vectors and cluster assignments is maximized in an unsupervised manner. A notable advantage of this approach is that classifier training is formulated as a continuous optimization problem, which is substantially simpler than discrete optimization of cluster assignments. Indeed, classifier training can be carried out in computationally efficient manners by a gradient method (Agakov & Barber, 2006) or a quasi-Newton method (Gomes et al., 2010). Furthermore, Agakov & Barber (2006) provided a model selection strategy based on the common information-maximization principle. Thus, kernel parameters can be systematically optimized in an unsupervised way.

However, in the above MI-based clustering approach, the optimization problems are non-convex, and finding a good local optimal solution is not straightforward in practice. The goal of this paper is to overcome this problem by providing a novel information-maximization clustering method. More specifically, we propose to employ a variant of MI called squared-loss MI (SMI), and develop a new clustering algorithm whose solution can be computed analytically in a computationally efficient way via eigenvalue decomposition. Furthermore, for kernel parameter optimization, we propose to use a non-parametric SMI estimator called least-squares MI (LSMI) (Suzuki et al., 2009), which was proved to achieve the optimal convergence rate with analytic-form solutions.
Through experiments on various real-world datasets such as images, natural languages, accelerometric sensors, and speech, we demonstrate the usefulness of the proposed clustering method.

2. Information-Maximization Clustering with Squared-Loss Mutual Information

In this section, we describe our novel clustering algorithm.

2.1. Formulation of Information-Maximization Clustering

Suppose we are given d-dimensional i.i.d. feature vectors of size n, {x_i | x_i ∈ R^d}_{i=1}^n, which are assumed to be drawn independently from a distribution with density p*(x). The goal of clustering is to give cluster assignments, {y_i | y_i ∈ {1, ..., c}}_{i=1}^n, to the feature vectors {x_i}_{i=1}^n, where c denotes the number of classes. Throughout this paper, we assume that c is known.

In order to solve the clustering problem, we take the information-maximization approach (Agakov & Barber, 2006; Gomes et al., 2010). That is, we regard clustering as an unsupervised classification problem, and learn the class-posterior probability p*(y|x) so that information between feature vector x and class label y is maximized.

The dependence-maximization approach (Song et al., 2007; Faivishevsky & Goldberger, 2010) is related to, but substantially different from, the above information-maximization approach. In the dependence-maximization approach, cluster assignments {y_i}_{i=1}^n are directly determined so that their dependence on the feature vectors {x_i}_{i=1}^n is maximized. Thus, the dependence-maximization approach intrinsically involves combinatorial optimization with respect to {y_i}_{i=1}^n. On the other hand, the information-maximization approach involves continuous optimization with respect to the parameter α included in a class-posterior model p(y|x; α). This continuous optimization of α is substantially easier to solve than discrete optimization of {y_i}_{i=1}^n.
Another advantage of the information-maximization approach is that it naturally allows out-of-sample clustering based on the discriminative model p(y|x; α), i.e., a cluster assignment for a new feature vector can be obtained based on the learned discriminative model.

2.2. Squared-Loss Mutual Information

As an information measure, we adopt squared-loss mutual information (SMI). SMI between feature vector x and class label y is defined by

$$\mathrm{SMI} := \frac{1}{2} \sum_{y=1}^{c} \int p^*(x) p^*(y) \left( \frac{p^*(x, y)}{p^*(x) p^*(y)} - 1 \right)^2 \mathrm{d}x, \quad (1)$$

where p*(x, y) denotes the joint density of x and y, and p*(y) is the marginal probability of y. SMI is the Pearson divergence (Pearson, 1900) from p*(x, y) to p*(x)p*(y), while the ordinary MI (Cover & Thomas, 2006) is the Kullback-Leibler divergence (Kullback & Leibler, 1951) from p*(x, y) to p*(x)p*(y):

$$\mathrm{MI} := \sum_{y=1}^{c} \int p^*(x, y) \log \frac{p^*(x, y)}{p^*(x) p^*(y)} \, \mathrm{d}x. \quad (2)$$

The Pearson divergence and the Kullback-Leibler divergence both belong to the class of Ali-Silvey-Csiszár divergences (also known as f-divergences, see Ali & Silvey, 1966; Csiszár, 1967), and thus they share similar properties. For example, SMI is non-negative and takes zero if and only if x and y are statistically independent, as is the case for the ordinary MI.

In the existing information-maximization clustering methods (Agakov & Barber, 2006; Gomes et al., 2010), MI is used as the information measure. On the other hand, in this paper, we adopt SMI because it allows us to develop a clustering algorithm whose solution can be computed analytically in a computationally efficient way via eigenvalue decomposition, as described below.

2.3. Clustering by SMI Maximization

Here, we give a computationally efficient clustering algorithm based on SMI (1). We can express SMI as

$$\mathrm{SMI} = \frac{1}{2} \sum_{y=1}^{c} \int \frac{p^*(x, y)^2}{p^*(x) p^*(y)} \, \mathrm{d}x - \frac{1}{2} \quad (3)$$

$$= \frac{1}{2} \sum_{y=1}^{c} \int p^*(y|x) p^*(x) \frac{p^*(y|x)}{p^*(y)} \, \mathrm{d}x - \frac{1}{2}. \quad (4)$$

Suppose that the class-prior probability p*(y) is set to be uniform: p*(y) = 1/c. Then Eq.(4) is expressed as

$$\frac{c}{2} \sum_{y=1}^{c} \int p^*(y|x) p^*(x) p^*(y|x) \, \mathrm{d}x - \frac{1}{2}. \quad (5)$$

Let us approximate the class-posterior probability p*(y|x) by the following kernel model:

$$p(y|x; \alpha) := \sum_{i=1}^{n} \alpha_{y,i} K(x, x_i), \quad (6)$$

where K(x, x') denotes a kernel function with a kernel parameter t.
In the experiments, we will use a sparse variant of the local-scaling kernel (Zelnik-Manor & Perona, 2005):

$$K(x_i, x_j) := \begin{cases} \exp\left( -\dfrac{\|x_i - x_j\|^2}{2 \sigma_i \sigma_j} \right) & \text{if } x_i \in \mathcal{N}_t(x_j) \text{ or } x_j \in \mathcal{N}_t(x_i), \\ 0 & \text{otherwise}, \end{cases} \quad (7)$$

where N_t(x) denotes the set of t nearest neighbors for x (t is the kernel parameter), σ_i is a local scaling factor defined as σ_i = ‖x_i − x_i^(t)‖, and x_i^(t) is the t-th nearest neighbor of x_i.

Further approximating the expectation with respect to p*(x) included in Eq.(5) by the empirical average of samples {x_i}_{i=1}^n, we arrive at the following SMI approximator:

$$\widehat{\mathrm{SMI}} := \frac{c}{2n} \sum_{y=1}^{c} \alpha_y^\top K^2 \alpha_y - \frac{1}{2}, \quad (8)$$

where ⊤ denotes the transpose, α_y := (α_{y,1}, ..., α_{y,n})^⊤, and K_{i,j} := K(x_i, x_j).

For each cluster y, we maximize α_y^⊤ K² α_y under ‖α_y‖ = 1.² Since this is the Rayleigh quotient, the maximizer is given by the normalized principal eigenvector of K (Horn & Johnson, 1985). To avoid all the solutions {α_y}_{y=1}^c being reduced to the same principal eigenvector, we impose mutual orthogonality: α_y^⊤ α_{y'} = 0 for y ≠ y'. Then the solutions are given by the normalized eigenvectors φ_1, ..., φ_c associated with the eigenvalues λ_1 ≥ ... ≥ λ_n ≥ 0 of K. Since the sign of φ_y is arbitrary, we set the sign as φ̃_y = φ_y × sign(φ_y^⊤ 1_n), where sign(·) denotes the sign of a scalar and 1_n denotes the n-dimensional vector with all ones.

On the other hand, since p*(y) = ∫ p*(y|x) p*(x) dx ≈ (1/n) Σ_{i=1}^n p(y|x_i; α) = (1/n) α_y^⊤ K 1_n, and the class-prior probability p*(y) was set to be uniform, we have the following normalization condition: (1/n) α_y^⊤ K 1_n = 1/c. Furthermore, probability estimates should be non-negative, which can be achieved by rounding up negative outputs to zero. Taking these issues into account, cluster assignments {y_i}_{i=1}^n for {x_i}_{i=1}^n are determined as

$$y_i = \mathop{\mathrm{argmax}}_{y} \; \frac{[\max(0_n, \tilde{\varphi}_y)]_i}{\max(0_n, \tilde{\varphi}_y)^\top 1_n},$$

where the max operation for vectors is applied in the element-wise manner and [·]_i denotes the i-th element of a vector. Note that we used K φ_y = λ_y φ_y in the above derivation. We call the above method SMI-based clustering (SMIC).

² Note that this unit-norm constraint is not essential since the obtained solution is renormalized later.

2.4. Kernel Parameter Choice by SMI Maximization

Since the above clustering approach was developed in the framework of SMI maximization, it would be natural to determine the kernel parameters so that SMI is maximized. A direct approach is to use the above SMI estimator ŜMI also for kernel parameter choice. However, this direct approach is not favorable because ŜMI is an unsupervised SMI estimator (i.e., SMI is estimated only from unlabeled samples {x_i}_{i=1}^n). In the model selection stage, however, we have already obtained labeled samples {(x_i, y_i)}_{i=1}^n, and thus supervised estimation of SMI is possible. For supervised SMI estimation, a non-parametric SMI estimator called least-squares mutual information (LSMI) (Suzuki et al., 2009) was shown to achieve the optimal convergence rate. For this reason, we propose to use LSMI for model selection, instead of ŜMI (8).

LSMI is an estimator of SMI based on paired samples {(x_i, y_i)}_{i=1}^n. The key idea of LSMI is to learn the following density-ratio function,

$$r^*(x, y) := \frac{p^*(x, y)}{p^*(x) p^*(y)}, \quad (9)$$

without going through density estimation of p*(x, y), p*(x), and p*(y). More specifically, let us employ the following density-ratio model:

$$r(x, y; \theta) := \sum_{\ell : y_\ell = y} \theta_\ell L(x, x_\ell), \quad (10)$$

where L(x, x') is a kernel function with kernel parameter γ. In the experiments, we will use the Gaussian kernel:

$$L(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2 \gamma^2} \right). \quad (11)$$

The parameter θ in the above density-ratio model is learned so that the following squared error is minimized:

$$\frac{1}{2} \sum_{y=1}^{c} \int \left( r(x, y; \theta) - r^*(x, y) \right)^2 p^*(x) p^*(y) \, \mathrm{d}x. \quad (12)$$

Among the n cluster assignments {y_i}_{i=1}^n, let n_y be the number of samples in cluster y.
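As a concrete illustration of the SMIC steps just described (sparse local-scaling kernel, principal eigenvectors of K, sign fixing, and element-wise assignment), here is a minimal NumPy sketch. This is our own illustration rather than the authors' MATLAB implementation; the function and variable names are ours, the dense-matrix construction is a simplification, and the kernel parameter t is taken as given (its selection is discussed next).

```python
import numpy as np

def smic(X, c, t):
    """Sketch of SMI-based clustering (SMIC): build the sparse local-scaling
    kernel, take the c principal eigenvectors of K, assign element-wise."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(sq, axis=1)                   # order[i, 0] is i itself
    sigma = np.sqrt(sq[np.arange(n), order[:, t]])   # distance to t-th neighbor
    K = np.exp(-sq / (2.0 * sigma[:, None] * sigma[None, :]))
    # keep K[i, j] only if j is among the t nearest neighbors of i, or vice versa
    nn = np.zeros((n, n), dtype=bool)
    nn[np.repeat(np.arange(n), t), order[:, 1:t + 1].ravel()] = True
    K *= (nn | nn.T)
    vals, vecs = np.linalg.eigh(K)                   # ascending eigenvalues
    phi = vecs[:, ::-1][:, :c]                       # c principal eigenvectors
    phi *= np.where(phi.sum(axis=0) >= 0, 1.0, -1.0) # sign(phi_y^T 1_n)
    pos = np.maximum(phi, 0.0)                       # round negatives up to zero
    scores = pos / np.maximum(pos.sum(axis=0), 1e-12)
    return scores.argmax(axis=1)                     # y_i = argmax_y [...]_i
```

Out-of-sample assignment via the learned model p(y|x; α) and the LSMI-based choice of t are omitted for brevity.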
Let θ_y be the parameter vector corresponding to the kernel bases {L(x, x_ℓ)}_{ℓ: y_ℓ = y}, i.e., θ_y is the n_y-dimensional sub-vector of θ = (θ_1, ..., θ_n)^⊤ consisting of the indices {ℓ | y_ℓ = y}. Then an empirical and regularized version of the optimization problem (12) is given for each y as follows:

$$\min_{\theta_y} \left[ \frac{1}{2} \theta_y^\top \widehat{H}^{(y)} \theta_y - \theta_y^\top \widehat{h}^{(y)} + \frac{\delta}{2} \theta_y^\top \theta_y \right], \quad (13)$$

where δ (≥ 0) is the regularization parameter. Ĥ^(y) is the n_y × n_y matrix and ĥ^(y) is the n_y-dimensional vector defined as

$$\widehat{H}^{(y)}_{\ell,\ell'} := \frac{n_y}{n^2} \sum_{i=1}^{n} L(x_i, x_\ell^{(y)}) L(x_i, x_{\ell'}^{(y)}),$$

$$\widehat{h}^{(y)}_\ell := \frac{1}{n} \sum_{i : y_i = y} L(x_i, x_\ell^{(y)}),$$

where x_ℓ^(y) is the ℓ-th sample in class y (which corresponds to θ_ℓ^(y)). A notable advantage of LSMI is that the solution θ̂^(y) can be computed analytically as

$$\widehat{\theta}^{(y)} = (\widehat{H}^{(y)} + \delta I)^{-1} \widehat{h}^{(y)}.$$

Then a density-ratio estimator is obtained analytically as follows:

$$\widehat{r}(x, y) = \sum_{\ell=1}^{n_y} \widehat{\theta}_\ell^{(y)} L(x, x_\ell^{(y)}).$$

The accuracy of the above least-squares density-ratio estimator depends on the choice of the kernel parameter γ and the regularization parameter δ. They can be systematically optimized based on cross-validation as follows (Suzuki et al., 2009). The samples Z = {(x_i, y_i)}_{i=1}^n are divided into M disjoint subsets {Z_m}_{m=1}^M of approximately the same size. Then a density-ratio estimator r̂_m(x, y) is obtained using Z∖Z_m (i.e., all samples except Z_m), and its out-of-sample error (which corresponds to Eq.(12) without an irrelevant constant) for the hold-out samples Z_m is computed as

$$\mathrm{CV}_m := \frac{1}{2|Z_m|^2} \sum_{x, y \in Z_m} \widehat{r}_m(x, y)^2 - \frac{1}{|Z_m|} \sum_{(x, y) \in Z_m} \widehat{r}_m(x, y).$$
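The class-wise analytic solution and the plug-in SMI estimate of Eq.(14) below can be sketched in NumPy as follows. This is our own illustration; the default values of γ and δ stand in for the cross-validated choices, and the hold-out loop itself is omitted.

```python
import numpy as np

def lsmi(X, y, gamma=1.0, delta=0.1):
    """Sketch of LSMI: fit the density-ratio model class-wise with the
    analytic solution (H + delta I)^{-1} h, then plug the fitted ratio
    into the SMI expression."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    L = np.exp(-sq / (2.0 * gamma ** 2))      # Gaussian kernel of Eq.(11)
    r_hat = np.empty(n)
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]           # kernel centers of this class
        ny = len(idx)
        Phi = L[:, idx]                       # L(x_i, x_l^{(y)}) for all i
        H = (ny / n ** 2) * Phi.T @ Phi       # hat H^{(y)}
        h = Phi[idx].sum(axis=0) / n          # hat h^{(y)}
        theta = np.linalg.solve(H + delta * np.eye(ny), h)
        r_hat[idx] = Phi[idx] @ theta         # hat r(x_i, y_i) for i in class
    return r_hat.sum() / (2.0 * n) - 0.5      # plug-in SMI estimate
```

On dependent pairs this should return a clearly larger value than on independent (shuffled) pairs, which is exactly the signal used for model selection.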

This procedure is repeated for m = 1, ..., M, and the average of the above hold-out error over all m is computed. Finally, the kernel parameter γ and the regularization parameter δ that minimize the average hold-out error are chosen as the most suitable ones.

Based on the expression of SMI given by Eq.(3), an SMI estimator called LSMI is given as follows:

$$\mathrm{LSMI} := \frac{1}{2n} \sum_{i=1}^{n} \widehat{r}(x_i, y_i) - \frac{1}{2}, \quad (14)$$

where r̂(x, y) is the density-ratio estimator obtained above. Since r̂(x, y) can be computed analytically, LSMI can also be computed analytically.

We use LSMI for model selection of SMIC. More specifically, we compute LSMI as a function of the kernel parameter t of K(x, x') included in the cluster-posterior model (6), and choose the one that maximizes LSMI. A MATLAB implementation of the proposed clustering method is available from sugi/software/smic.

3. Existing Methods

In this section, we qualitatively compare the proposed approach with existing methods.

3.1. Spectral Clustering

The basic idea of spectral clustering (Shi & Malik, 2000) is to first unfold non-linear data manifolds by a spectral embedding method, and then perform k-means in the embedded space. More specifically, given sample-sample similarities W_{i,j} ≥ 0, the minimizer of the following criterion with respect to {ξ_i}_{i=1}^n is obtained under some normalization constraint:

$$\frac{1}{2} \sum_{i,j=1}^{n} W_{i,j} \left\| \frac{\xi_i}{\sqrt{D_{i,i}}} - \frac{\xi_j}{\sqrt{D_{j,j}}} \right\|^2,$$

where D is the diagonal matrix with i-th diagonal element given by D_{i,i} := Σ_{j=1}^n W_{i,j}. Consequently, the embedded samples are given by the principal eigenvectors of D^{-1/2} W D^{-1/2}, followed by normalization. Note that spectral clustering was shown to be equivalent to a weighted variant of kernel k-means with some specific kernel (Dhillon et al., 2004).

The performance of spectral clustering depends heavily on the choice of the sample-sample similarity W_{i,j}.
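For concreteness, the embedding step can be sketched as follows: compute the c principal eigenvectors of D^{-1/2} W D^{-1/2} and row-normalize them. This is a generic illustration with names of our own choosing; the subsequent k-means step on the returned rows is omitted.

```python
import numpy as np

def spectral_embedding(W, c):
    """Embed samples as the c principal eigenvectors of D^{-1/2} W D^{-1/2},
    row-normalized; k-means would then be run on the returned rows."""
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = inv_sqrt[:, None] * W * inv_sqrt[None, :]    # D^{-1/2} W D^{-1/2}
    vals, vecs = np.linalg.eigh(M)                   # ascending eigenvalues
    U = vecs[:, ::-1][:, :c]                         # c principal eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)              # row normalization
```

For a similarity matrix with two disconnected blocks, the embedded rows are constant within each block and orthogonal across blocks, so k-means trivially recovers the two clusters.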
Zelnik-Manor & Perona (2005) proposed a useful unsupervised heuristic to determine the similarity in a data-dependent manner, called local scaling:

$$W_{i,j} = \exp\left( -\frac{\|x_i - x_j\|^2}{\sigma_i \sigma_j} \right),$$

where σ_i is a local scaling factor defined as σ_i = ‖x_i − x_i^(t)‖, and x_i^(t) is the t-th nearest neighbor of x_i. t is the tuning parameter in the local scaling similarity, and t = 7 was shown to be useful (Zelnik-Manor & Perona, 2005; Sugiyama, 2007). However, this magic number 7 does not always seem to work well in general.

If D^{-1/2} W D^{-1/2} is regarded as a kernel matrix, spectral clustering will be similar to the proposed SMIC method described in Section 2.3. However, SMIC does not require the post k-means processing since the principal components have a clear interpretation as parameter estimates of the class-posterior model (6). Furthermore, our proposed approach provides a systematic model selection strategy, which is a notable advantage over spectral clustering.

3.2. Blurring Mean-Shift Clustering

Blurring mean-shift (Fukunaga & Hostetler, 1975) is a non-parametric clustering method based on the modes of the data-generating probability density. In the blurring mean-shift algorithm, a kernel density estimator (Silverman, 1986) is used for modeling the data-generating probability density:

$$\widehat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} K\left( \|x - x_i\|^2 / \sigma^2 \right),$$

where K(ξ) is a kernel function such as the Gaussian kernel K(ξ) = e^{−ξ/2}. Taking the derivative of p̂(x) with respect to x and equating the derivative at x = x_i to zero, we obtain the following update formula for sample x_i (i = 1, ..., n):

$$x_i \longleftarrow \frac{\sum_{j=1}^{n} W_{i,j}\, x_j}{\sum_{j'=1}^{n} W_{i,j'}},$$

where W_{i,j} := K'(‖x_i − x_j‖²/σ²) and K'(ξ) is the derivative of K(ξ). Each mode of the density is regarded as a representative of a cluster, and each data point is assigned to the cluster to which it converges. Carreira-Perpiñán (2007) showed that the blurring mean-shift algorithm can be interpreted as an EM algorithm (Dempster et al., 1977), where W_{i,j}/(Σ_{j'=1}^n W_{i,j'}) is regarded as the posterior probability of the i-th sample belonging to the j-th cluster.
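A minimal sketch of this iteration with the Gaussian choice K(ξ) = e^{−ξ/2}, for which the weights reduce to Gaussian weights up to a constant that cancels in the ratio. The names and the fixed iteration count are our assumptions; as discussed below, a proper stopping criterion would be needed in practice.

```python
import numpy as np

def blurring_mean_shift(X, sigma, n_iter=30):
    """Blurring mean-shift sketch: repeatedly replace every sample by the
    weighted average of all current samples, with Gaussian weights
    W_ij proportional to exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    Z = X.astype(float).copy()
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-sq / (2.0 * sigma ** 2))
        Z = (W @ Z) / W.sum(axis=1, keepdims=True)  # x_i <- sum_j W_ij x_j / sum_j W_ij
    return Z
```

With a small bandwidth, well-separated groups of points quickly collapse onto their respective modes, while the fixed n_iter stands in for the stopping rule that would prevent the eventual merge into a single point.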
Furthermore, the above update rule can be expressed in matrix form as X ← XP, where X = (x_1, ..., x_n) is the sample matrix and P := W D^{-1} is a stochastic matrix of the random walk in a graph with adjacency W (Chung, 1997). D is the diagonal matrix defined as D_{i,i} := Σ_{j=1}^n W_{i,j} and D_{i,j} = 0 for i ≠ j. If P were independent of X, the above iterative algorithm would correspond to the power method (Golub & Van Loan, 1996) for finding the leading left eigenvector of P. The algorithm is thus highly related to spectral clustering, which computes the principal eigenvectors of D^{-1/2} W D^{-1/2} (see Section 3.1). Although P depends on X in reality, Carreira-Perpiñán (2006) insisted that this analysis is still valid since P and X quickly reach a quasi-stable state.

An attractive property of blurring mean-shift is that the number of clusters is automatically determined as the number of modes in the probability density estimate. However, this choice depends on the kernel parameter σ, and there is no systematic way to determine σ, which is restrictive compared with the proposed method. Another critical drawback of the blurring mean-shift algorithm is that it eventually converges to a single point (Cheng, 1995), and therefore a sensible stopping criterion is necessary in practice. Although Carreira-Perpiñán (2006) gave a useful heuristic for stopping the iteration, it is not clear whether this heuristic always works well in practice.

4. Experiments

In this section, we experimentally evaluate the performance of the proposed and existing clustering methods.

4.1. Illustration

First, we illustrate the behavior of the proposed method using the artificial datasets described in the top row of Figure 1. The dimensionality is d = 2 and the sample size is n = 200. As a kernel function, we used the sparse local-scaling kernel (7) for SMIC, where the kernel parameter t was chosen from {1, ..., 10} based on LSMI with the Gaussian kernel (11). The top graphs in Figure 1 depict the cluster assignments obtained by SMIC, and the bottom graphs in Figure 1 depict the model selection curves obtained by LSMI. The results show that SMIC combined with LSMI works well for these toy datasets.

[Figure 1: Illustrative examples. Cluster assignments obtained by SMIC (top) and model selection curves, plotting the SMI estimate against the kernel parameter t, obtained by LSMI (bottom).]

4.2. Performance Comparison

Next, we systematically compare the performance of the proposed and existing clustering methods using various real-world datasets such as images, natural languages, accelerometric sensors, and speech. We compared the performance of the following methods, which all contain no open tuning parameters, so that the experimental results are fair and objective: k-means (KM), spectral clustering with the self-tuning local-scaling similarity (SC) (Zelnik-Manor & Perona, 2005), mean nearest-neighbor clustering (MNN) (Faivishevsky & Goldberger, 2010), MI-based clustering for kernel logistic models (MIC) (Gomes et al., 2010) with model selection by maximum-likelihood MI (Suzuki et al., 2008), and the proposed SMIC.

The clustering performance was evaluated by the adjusted Rand index (ARI) (Hubert & Arabie, 1985) between the inferred cluster assignments and the ground-truth categories. Larger ARI values mean better performance, and ARI takes its maximum value 1 when two sets of cluster assignments are identical. In addition, we also evaluated the computational efficiency of each method by the CPU computation time.

We used various real-world datasets including images, natural languages, accelerometric sensors, and speech: the USPS hand-written digit dataset ("digit"), the Olivetti Face dataset ("face"), the 20-Newsgroups dataset ("document"), the SENSEVAL-2 dataset ("word"), the ALKAN dataset ("accelerometry"), and an in-house speech dataset ("speech"). Detailed explanation of the datasets is omitted due to lack of space. For each dataset, the experiment was repeated 100 times with random choice of samples from a pool. Samples were centralized and their variance was normalized in the dimension-wise manner before feeding them to the clustering algorithms.
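Since ARI is central to the evaluation, here is a small self-contained implementation of the Hubert & Arabie (1985) adjusted index computed from the contingency table of the two labelings. This is our own sketch, not the evaluation code used in the experiments.

```python
import numpy as np

def adjusted_rand_index(a, b):
    """ARI = (Index - Expected) / (Max - Expected), from the contingency
    table n_ij of labelings a and b."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    C = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(C, (ai, bi), 1)                 # contingency counts n_ij
    comb2 = lambda m: m * (m - 1) / 2.0       # "m choose 2", element-wise
    index = comb2(C).sum()
    row, col = comb2(C.sum(axis=1)).sum(), comb2(C.sum(axis=0)).sum()
    expected = row * col / comb2(len(ai))
    max_index = 0.5 * (row + col)
    return (index - expected) / (max_index - expected)
```

Identical partitions give 1 regardless of how the labels are named, and unrelated partitions give values around 0, which is why ARI is preferred over raw pair-counting agreement here.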
The experimental results are described in Table 1. For the digit dataset, MIC and SMIC outperform KM, SC, and MNN in terms of ARI. The entire computation time of SMIC including model selection is shorter than that of KM, SC, and MIC, and is comparable to that of MNN, which does not include a model selection procedure. For the face dataset, SC, MIC, and SMIC are comparable to each other and are better than KM and MNN in terms of ARI. For the document and word datasets, SMIC tends to outperform the other methods. For the accelerometry dataset, MNN and SMIC work better than the other methods. Finally, for the speech dataset, MIC and SMIC work comparably well, and are significantly better than KM, SC, and MNN.

Table 1. Experimental results on real-world datasets (with equal cluster sizes). The average clustering accuracy (with its standard deviation in brackets) in terms of ARI and the average CPU computation time in seconds over 100 runs are described. The best method in terms of the average ARI and methods judged to be comparable to the best one by the t-test at the significance level 1% are described in boldface. Computation time of MIC and SMIC corresponds to the time for computing a clustering solution after model selection has been carried out; for reference, computation time for the entire procedure including model selection is described in square brackets. Methods are, from left to right, KM, SC, MNN, MIC, and SMIC.

Digit (d = 256, n = 5000, c = 10):
  ARI   0.4(0.01)   0.4(0.0)   0.44(0.03)   0.63(0.08)   0.63(0.05)
  Time  [3631.7]    14.4[359.5]
Face (d = 4096, n = 100, c = 10):
  ARI   0.60(0.11)  0.6(0.11)  0.47(0.10)   0.64(0.1)    0.65(0.11)
  Time  [30.8]      0.0[19.3]
Document (d = 50, n = 700, c = 7):
  ARI   0.00(0.00)  0.09(0.0)  0.09(0.0)    0.01(0.0)    0.19(0.03)
  Time  [530.5]     0.3[115.3]
Word (d = 50, n = 300, c = 3):
  ARI   0.04(0.05)  0.0(0.01)  0.0(0.0)     0.04(0.04)   0.08(0.05)
  Time  [369.6]     0.[03.9]
Accelerometry (d = 5, n = 300, c = 3):
  ARI   0.49(0.04)  0.58(0.14) 0.71(0.05)   0.57(0.3)    0.68(0.1)
  Time  [410.6]     0.[9.6]
Speech (d = 50, n = 400, c = 2):
  ARI   0.00(0.00)  0.00(0.00) 0.04(0.15)   0.18(0.16)   0.1(0.5)
  Time  [413.4]     0.3[179.7]

Overall, MIC was shown to work reasonably well, implying that model selection by maximum-likelihood MI is practically useful. SMIC was shown to work even better than MIC, with much less computation time.
The accuracy improvement of SMIC over MIC was gained by computing the SMIC solution in a closed form without any heuristic initialization. The computational efficiency of SMIC was brought by the analytic computation of the optimal solution and the class-wise optimization of LSMI (see Section 2.4).

The performance of MNN and SC was rather unstable because of the heuristic averaging of the number of nearest neighbors and the heuristic choice of local scaling. In terms of computation time, they are relatively efficient for small- to medium-sized datasets, but they are expensive for the largest dataset, digit. KM was not reliable for the document and speech datasets because of the restriction that the cluster boundaries are linear. For the digit, face, and document datasets, KM was computationally very expensive since a large number of iterations were needed until convergence to a local optimum solution.

Table 2. Experimental results on real-world datasets under the imbalanced setup. ARI values are described in the table. Class imbalance was realized by setting the sample size of the first class m times larger than that of the other classes. The results for m = 1 are the same as the ones reported in Table 1. Methods are, from left to right, KM, SC, MNN, MIC, and SMIC.

Digit (d = 256, n = 5000, c = 10):
  m = 1  0.4(0.01)   0.4(0.0)    0.44(0.03)  0.63(0.08)   0.63(0.05)
  m = 2  0.5(0.01)   0.1(0.0)    0.43(0.04)  0.60(0.05)   0.63(0.05)
Document (d = 50, n = 700, c = 7):
  m = 1  0.00(0.00)  0.09(0.0)   0.09(0.0)   0.01(0.0)    0.19(0.03)
  m = 2  0.01(0.01)  0.10(0.03)  0.10(0.0)   0.01(0.0)    0.19(0.04)
  m = 3  (0.01)      0.10(0.03)  0.09(0.0)   -0.01(0.03)  0.16(0.05)
  m = 4  0.0(0.01)   0.09(0.03)  0.08(0.0)   -0.00(0.04)  0.14(0.05)
Word (d = 50, n = 300, c = 3):
  m = 1  0.04(0.05)  0.0(0.01)   0.0(0.0)    0.04(0.04)   0.08(0.05)
  m = 2  0.00(0.07)  -0.01(0.01) 0.01(0.0)   -0.0(0.05)   0.03(0.05)
Accelerometry (d = 5, n = 300, c = 3):
  m = 1  0.49(0.04)  0.58(0.14)  0.71(0.05)  0.57(0.3)    0.68(0.1)
  m = 2  0.48(0.05)  0.54(0.14)  0.58(0.11)  0.49(0.19)   0.69(0.16)
  m = 3  (0.05)      0.47(0.10)  0.4(0.1)    0.4(0.14)    0.66(0.0)
  m = 4  (0.06)      0.38(0.11)  0.31(0.09)  0.40(0.18)   0.56(0.)
Finally, we performed similar experiments under the imbalanced setup, where the sample size of the first class was set to be m times larger than that of the other classes. The results are summarized in Table 2, showing that the performance of all methods tends to be degraded as the degree of imbalance increases. Thus, clustering becomes more challenging when the cluster sizes are imbalanced. Among the compared methods, the proposed SMIC still worked better than the other methods.

Overall, the proposed SMIC combined with LSMI was shown to be a useful alternative to existing clustering approaches.

5. Conclusions

In this paper, we proposed a novel information-maximization clustering method, which learns class-posterior probabilities in an unsupervised manner so that the squared-loss mutual information (SMI) between feature vectors and cluster assignments is maximized. The proposed algorithm, called SMI-based clustering (SMIC), allows us to obtain clustering solutions analytically by solving a kernel eigenvalue problem. Thus, unlike the previous information-maximization clustering methods (Agakov & Barber, 2006; Gomes et al., 2010), SMIC does not suffer from the problem of local optima. Furthermore, we proposed to use an optimal non-parametric SMI estimator called least-squares mutual information (LSMI) for data-driven parameter optimization. Through experiments, SMIC combined with LSMI was demonstrated to compare favorably with existing clustering methods.

Acknowledgments

We would like to thank Ryan Gomes for providing us with his program code of information-maximization clustering. MS was supported by SCAT, AOARD, and the FIRST program. MY and MK were supported by the JST PRESTO program, and HH was supported by the FIRST program.

References

Agakov, F. and Barber, D. Kernelized infomax clustering. NIPS 18. MIT Press, 2006.

Ali, S. M. and Silvey, S. D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society, Series B, 28(1):131-142, 1966.

Bach, F. and Harchaoui, Z. DIFFRAC: A discriminative and flexible framework for clustering. NIPS 20, 2008.

Carreira-Perpiñán, M. Á. Fast nonparametric clustering with Gaussian blurring mean-shift. ICML, 2006.

Carreira-Perpiñán, M. Á. Gaussian mean shift is an EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 2007.

Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 1995.

Chung, F. R. K. Spectral Graph Theory. American Mathematical Society, Providence, 1997.

Cover, T. M. and Thomas, J. A. Elements of Information Theory. John Wiley & Sons, Inc., 2nd edition, 2006.

Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica, 2:299-318, 1967.

Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.

Dhillon, I. S., Guan, Y., and Kulis, B. Kernel k-means, spectral clustering and normalized cuts. ACM SIGKDD, 2004.

Faivishevsky, L. and Goldberger, J. A nonparametric information theoretic clustering algorithm. ICML, 2010.

Fukunaga, K. and Hostetler, L. D. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1):32-40, 1975.

Girolami, M. Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks, 13(3), 2002.

Golub, G. H. and Van Loan, C. F. Matrix Computations. Johns Hopkins University Press, 1996.

Gomes, R., Krause, A., and Perona, P. Discriminative clustering by regularized information maximization. NIPS 23, 2010.

Horn, R. A. and Johnson, C. R. Matrix Analysis. Cambridge University Press, 1985.

Hubert, L. and Arabie, P. Comparing partitions. Journal of Classification, 2(1):193-218, 1985.

Kullback, S. and Leibler, R. A. On information and sufficiency. Annals of Mathematical Statistics, 22:79-86, 1951.

Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50, 1900.

Shi, J. and Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 2000.

Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.

Song, L., Smola, A., Gretton, A., and Borgwardt, K. A dependence maximization view of clustering. ICML, 2007.

Sugiyama, M. Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8, 2007.

Suzuki, T., Sugiyama, M., Sese, J., and Kanamori, T. Approximating mutual information by maximum likelihood density ratio estimation. JMLR Workshop and Conference Proceedings, 4:5-20, 2008.

Suzuki, T., Sugiyama, M., Kanamori, T., and Sese, J. Mutual information estimation reveals global associations between stimuli and biological processes. BMC Bioinformatics, 10(1):S52, 2009.

Xu, L., Neufeld, J., Larson, B., and Schuurmans, D. Maximum margin clustering. NIPS 17, 2005.

Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. NIPS 17, 2005.


More information

A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions

A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions 2006 Internationa Joint Conference on Neura Networks Sheraton Vancouver Wa Centre Hote, Vancouver, BC, Canada Juy 16-21, 2006 A New Supervised Custering Agorithm Based on Min-Max Moduar Network with Gaussian-Zero-Crossing

More information

Distance Weighted Discrimination and Second Order Cone Programming

Distance Weighted Discrimination and Second Order Cone Programming Distance Weighted Discrimination and Second Order Cone Programming Hanwen Huang, Xiaosun Lu, Yufeng Liu, J. S. Marron, Perry Haaand Apri 3, 2012 1 Introduction This vignette demonstrates the utiity and

More information

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining Data Mining Cassification: Basic Concepts, Decision Trees, and Mode Evauation Lecture Notes for Chapter 4 Part III Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,

More information

A HYBRID FEATURE SELECTION METHOD BASED ON FISHER SCORE AND GENETIC ALGORITHM

A HYBRID FEATURE SELECTION METHOD BASED ON FISHER SCORE AND GENETIC ALGORITHM Journa of Mathematica Sciences: Advances and Appications Voume 37, 2016, Pages 51-78 Avaiabe at http://scientificadvances.co.in DOI: http://dx.doi.org/10.18642/jmsaa_7100121627 A HYBRID FEATURE SELECTION

More information

Mobile App Recommendation: Maximize the Total App Downloads

Mobile App Recommendation: Maximize the Total App Downloads Mobie App Recommendation: Maximize the Tota App Downoads Zhuohua Chen Schoo of Economics and Management Tsinghua University chenzhh3.12@sem.tsinghua.edu.cn Yinghui (Catherine) Yang Graduate Schoo of Management

More information

Sparse Representation based Face Recognition with Limited Labeled Samples

Sparse Representation based Face Recognition with Limited Labeled Samples Sparse Representation based Face Recognition with Limited Labeed Sampes Vijay Kumar, Anoop Namboodiri, C.V. Jawahar Center for Visua Information Technoogy, IIIT Hyderabad, India Abstract Sparse representations

More information

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory 0 th Word Congress on Structura and Mutidiscipinary Optimization May 9 -, 03, Orando, Forida, USA A Design Method for Optima Truss Structures with Certain Redundancy Based on Combinatoria Rigidity Theory

More information

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm A Comparison of a Second-Order versus a Fourth- Order Lapacian Operator in the Mutigrid Agorithm Kaushik Datta (kdatta@cs.berkeey.edu Math Project May 9, 003 Abstract In this paper, the mutigrid agorithm

More information

Image Segmentation Using Semi-Supervised k-means

Image Segmentation Using Semi-Supervised k-means I J C T A, 9(34) 2016, pp. 595-601 Internationa Science Press Image Segmentation Using Semi-Supervised k-means Reza Monsefi * and Saeed Zahedi * ABSTRACT Extracting the region of interest is a very chaenging

More information

Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval

Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval Comparative Anaysis for SVM-Based Interactive Document Retrieva Paper: Comparative Anaysis of Reevance for SVM-Based Interactive Document Retrieva Hiroshi Murata, Takashi Onoda, and Seiji Yamada Centra

More information

A Novel Linear-Polynomial Kernel to Construct Support Vector Machines for Speech Recognition

A Novel Linear-Polynomial Kernel to Construct Support Vector Machines for Speech Recognition Journa of Computer Science 7 (7): 99-996, 20 ISSN 549-3636 20 Science Pubications A Nove Linear-Poynomia Kerne to Construct Support Vector Machines for Speech Recognition Bawant A. Sonkambe and 2 D.D.

More information

A Petrel Plugin for Surface Modeling

A Petrel Plugin for Surface Modeling A Petre Pugin for Surface Modeing R. M. Hassanpour, S. H. Derakhshan and C. V. Deutsch Structure and thickness uncertainty are important components of any uncertainty study. The exact ocations of the geoogica

More information

Layer-Specific Adaptive Learning Rates for Deep Networks

Layer-Specific Adaptive Learning Rates for Deep Networks Layer-Specific Adaptive Learning Rates for Deep Networks arxiv:1510.04609v1 [cs.cv] 15 Oct 2015 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Godstein, and Gavin Tayor Department of Computer Science Department

More information

University of Illinois at Urbana-Champaign, Urbana, IL 61801, /11/$ IEEE 162

University of Illinois at Urbana-Champaign, Urbana, IL 61801, /11/$ IEEE 162 oward Efficient Spatia Variation Decomposition via Sparse Regression Wangyang Zhang, Karthik Baakrishnan, Xin Li, Duane Boning and Rob Rutenbar 3 Carnegie Meon University, Pittsburgh, PA 53, wangyan@ece.cmu.edu,

More information

Solutions to the Final Exam

Solutions to the Final Exam CS/Math 24: Intro to Discrete Math 5//2 Instructor: Dieter van Mekebeek Soutions to the Fina Exam Probem Let D be the set of a peope. From the definition of R we see that (x, y) R if and ony if x is a

More information

Neural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell

Neural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell Neura Networks Aarti Singh Machine Learning 10-601 Nov 3, 2011 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied to a inear func1on of

More information

AN EVOLUTIONARY APPROACH TO OPTIMIZATION OF A LAYOUT CHART

AN EVOLUTIONARY APPROACH TO OPTIMIZATION OF A LAYOUT CHART 13 AN EVOLUTIONARY APPROACH TO OPTIMIZATION OF A LAYOUT CHART Eva Vona University of Ostrava, 30th dubna st. 22, Ostrava, Czech Repubic e-mai: Eva.Vona@osu.cz Abstract: This artice presents the use of

More information

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line American J. of Engineering and Appied Sciences 3 (): 5-24, 200 ISSN 94-7020 200 Science Pubications Appication of Inteigence Based Genetic Agorithm for Job Sequencing Probem on Parae Mixed-Mode Assemby

More information

Neural Network Enhancement of the Los Alamos Force Deployment Estimator

Neural Network Enhancement of the Los Alamos Force Deployment Estimator Missouri University of Science and Technoogy Schoars' Mine Eectrica and Computer Engineering Facuty Research & Creative Works Eectrica and Computer Engineering 1-1-1994 Neura Network Enhancement of the

More information

A Robust Sign Language Recognition System with Sparsely Labeled Instances Using Wi-Fi Signals

A Robust Sign Language Recognition System with Sparsely Labeled Instances Using Wi-Fi Signals A Robust Sign Language Recognition System with Sparsey Labeed Instances Using Wi-Fi Signas Jiacheng Shang, Jie Wu Center for Networked Computing Dept. of Computer and Info. Sciences Tempe University Motivation

More information

A Memory Grouping Method for Sharing Memory BIST Logic

A Memory Grouping Method for Sharing Memory BIST Logic A Memory Grouping Method for Sharing Memory BIST Logic Masahide Miyazai, Tomoazu Yoneda, and Hideo Fuiwara Graduate Schoo of Information Science, Nara Institute of Science and Technoogy (NAIST), 8916-5

More information

Design of IP Networks with End-to. to- End Performance Guarantees

Design of IP Networks with End-to. to- End Performance Guarantees Design of IP Networks with End-to to- End Performance Guarantees Irena Atov and Richard J. Harris* ( Swinburne University of Technoogy & *Massey University) Presentation Outine Introduction Mutiservice

More information

Utility-based Camera Assignment in a Video Network: A Game Theoretic Framework

Utility-based Camera Assignment in a Video Network: A Game Theoretic Framework This artice has been accepted for pubication in a future issue of this journa, but has not been fuy edited. Content may change prior to fina pubication. Y.LI AND B.BHANU CAMERA ASSIGNMENT: A GAME-THEORETIC

More information

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models On Upper Bounds for Assortment Optimization under the Mixture of Mutinomia Logit Modes Sumit Kunnumka September 30, 2014 Abstract The assortment optimization probem under the mixture of mutinomia ogit

More information

Endoscopic Motion Compensation of High Speed Videoendoscopy

Endoscopic Motion Compensation of High Speed Videoendoscopy Endoscopic Motion Compensation of High Speed Videoendoscopy Bharath avuri Department of Computer Science and Engineering, University of South Caroina, Coumbia, SC - 901. ravuri@cse.sc.edu Abstract. High

More information

JOINT IMAGE REGISTRATION AND EXAMPLE-BASED SUPER-RESOLUTION ALGORITHM

JOINT IMAGE REGISTRATION AND EXAMPLE-BASED SUPER-RESOLUTION ALGORITHM JOINT IMAGE REGISTRATION AND AMPLE-BASED SUPER-RESOLUTION ALGORITHM Hyo-Song Kim, Jeyong Shin, and Rae-Hong Park Department of Eectronic Engineering, Schoo of Engineering, Sogang University 35 Baekbeom-ro,

More information

Polygonal Approximation of Point Sets

Polygonal Approximation of Point Sets Poygona Approximation of Point Sets Longin Jan Latecki 1, Rof Lakaemper 1, and Marc Sobe 2 1 CIS Dept., Tempe University, Phiadephia, PA 19122, USA, atecki@tempe.edu, akamper@tempe.edu 2 Statistics Dept.,

More information

Chapter Multidimensional Direct Search Method

Chapter Multidimensional Direct Search Method Chapter 09.03 Mutidimensiona Direct Search Method After reading this chapter, you shoud be abe to:. Understand the fundamentas of the mutidimensiona direct search methods. Understand how the coordinate

More information

Alpha labelings of straight simple polyominal caterpillars

Alpha labelings of straight simple polyominal caterpillars Apha abeings of straight simpe poyomina caterpiars Daibor Froncek, O Nei Kingston, Kye Vezina Department of Mathematics and Statistics University of Minnesota Duuth University Drive Duuth, MN 82-3, U.S.A.

More information

Fuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages

Fuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages Fuzzy Equivaence Reation Based Custering and Its Use to Restructuring Websites Hyperinks and Web Pages Dimitris K. Kardaras,*, Xenia J. Mamakou, and Bi Karakostas 2 Business Informatics Laboratory, Dept.

More information

Hiding secrete data in compressed images using histogram analysis

Hiding secrete data in compressed images using histogram analysis University of Woongong Research Onine University of Woongong in Dubai - Papers University of Woongong in Dubai 2 iding secrete data in compressed images using histogram anaysis Farhad Keissarian University

More information

Load Balancing by MPLS in Differentiated Services Networks

Load Balancing by MPLS in Differentiated Services Networks Load Baancing by MPLS in Differentiated Services Networks Riikka Susitaiva, Jorma Virtamo, and Samui Aato Networking Laboratory, Hesinki University of Technoogy P.O.Box 3000, FIN-02015 HUT, Finand {riikka.susitaiva,

More information

Resource Optimization to Provision a Virtual Private Network Using the Hose Model

Resource Optimization to Provision a Virtual Private Network Using the Hose Model Resource Optimization to Provision a Virtua Private Network Using the Hose Mode Monia Ghobadi, Sudhakar Ganti, Ghoamai C. Shoja University of Victoria, Victoria C, Canada V8W 3P6 e-mai: {monia, sganti,

More information

Optimization and Application of Support Vector Machine Based on SVM Algorithm Parameters

Optimization and Application of Support Vector Machine Based on SVM Algorithm Parameters Optimization and Appication of Support Vector Machine Based on SVM Agorithm Parameters YAN Hui-feng 1, WANG Wei-feng 1, LIU Jie 2 1 ChongQing University of Posts and Teecom 400065, China 2 Schoo Of Civi

More information

CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING

CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING Binbin Dai and Wei Yu Ya-Feng Liu Department of Eectrica and Computer Engineering University of Toronto, Toronto ON, Canada M5S 3G4 Emais:

More information

Transformation Invariance in Pattern Recognition: Tangent Distance and Propagation

Transformation Invariance in Pattern Recognition: Tangent Distance and Propagation Transformation Invariance in Pattern Recognition: Tangent Distance and Propagation Patrice Y. Simard, 1 Yann A. Le Cun, 2 John S. Denker, 2 Bernard Victorri 3 1 Microsoft Research, 1 Microsoft Way, Redmond,

More information

Quality of Service Evaluations of Multicast Streaming Protocols *

Quality of Service Evaluations of Multicast Streaming Protocols * Quaity of Service Evauations of Muticast Streaming Protocos Haonan Tan Derek L. Eager Mary. Vernon Hongfei Guo omputer Sciences Department University of Wisconsin-Madison, USA {haonan, vernon, guo}@cs.wisc.edu

More information

Binarized support vector machines

Binarized support vector machines Universidad Caros III de Madrid Repositorio instituciona e-archivo Departamento de Estadística http://e-archivo.uc3m.es DES - Working Papers. Statistics and Econometrics. WS 2007-11 Binarized support vector

More information

A Novel Method for Early Software Quality Prediction Based on Support Vector Machine

A Novel Method for Early Software Quality Prediction Based on Support Vector Machine A Nove Method for Eary Software Quaity Prediction Based on Support Vector Machine Fei Xing 1,PingGuo 1;2, and Michae R. Lyu 2 1 Department of Computer Science Beijing Norma University, Beijing, 1875, China

More information

Further Optimization of the Decoding Method for Shortened Binary Cyclic Fire Code

Further Optimization of the Decoding Method for Shortened Binary Cyclic Fire Code Further Optimization of the Decoding Method for Shortened Binary Cycic Fire Code Ch. Nanda Kishore Heosoft (India) Private Limited 8-2-703, Road No-12 Banjara His, Hyderabad, INDIA Phone: +91-040-3378222

More information

Neural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell

Neural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell Neura Networks Aarti Singh & Barnabas Poczos Machine Learning 10-701/15-781 Apr 24, 2014 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied

More information

Improvement of Nearest-Neighbor Classifiers via Support Vector Machines

Improvement of Nearest-Neighbor Classifiers via Support Vector Machines From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). A rights reserved. Improvement of Nearest-Neighbor Cassifiers via Support Vector Machines Marc Sebban and Richard Nock TRIVIA-Department

More information

Joint disparity and motion eld estimation in. stereoscopic image sequences. Ioannis Patras, Nikos Alvertos and Georgios Tziritas y.

Joint disparity and motion eld estimation in. stereoscopic image sequences. Ioannis Patras, Nikos Alvertos and Georgios Tziritas y. FORTH-ICS / TR-157 December 1995 Joint disparity and motion ed estimation in stereoscopic image sequences Ioannis Patras, Nikos Avertos and Georgios Tziritas y Abstract This work aims at determining four

More information

MACHINE learning techniques can, automatically,

MACHINE learning techniques can, automatically, Proceedings of Internationa Joint Conference on Neura Networks, Daas, Texas, USA, August 4-9, 203 High Leve Data Cassification Based on Network Entropy Fiipe Aves Neto and Liang Zhao Abstract Traditiona

More information

Automatic Hidden Web Database Classification

Automatic Hidden Web Database Classification Automatic idden Web atabase Cassification Zhiguo Gong, Jingbai Zhang, and Qian Liu Facuty of Science and Technoogy niversity of Macau Macao, PRC {fstzgg,ma46597,ma46620}@umac.mo Abstract. In this paper,

More information

MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION. Heng Zhang, Vishal M. Patel and Rama Chellappa

MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION. Heng Zhang, Vishal M. Patel and Rama Chellappa MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION Heng Zhang, Visha M. Pate and Rama Cheappa Center for Automation Research University of Maryand, Coage

More information

A Local Optimal Method on DSA Guiding Template Assignment with Redundant/Dummy Via Insertion

A Local Optimal Method on DSA Guiding Template Assignment with Redundant/Dummy Via Insertion A Loca Optima Method on DSA Guiding Tempate Assignment with Redundant/Dummy Via Insertion Xingquan Li 1, Bei Yu 2, Jiani Chen 1, Wenxing Zhu 1, 24th Asia and South Pacific Design T h e p i c Automation

More information

Outline. Introduce yourself!! What is Machine Learning? What is CAP-5610 about? Class information and logistics

Outline. Introduce yourself!! What is Machine Learning? What is CAP-5610 about? Class information and logistics Outine Introduce yoursef!! What is Machine Learning? What is CAP-5610 about? Cass information and ogistics Lecture Notes for E Apaydın 2010 Introduction to Machine Learning 2e The MIT Press (V1.0) About

More information

Forgot to compute the new centroids (-1); error in centroid computations (-1); incorrect clustering results (-2 points); more than 2 errors: 0 points.

Forgot to compute the new centroids (-1); error in centroid computations (-1); incorrect clustering results (-2 points); more than 2 errors: 0 points. Probem 1 a. K means is ony capabe of discovering shapes that are convex poygons [1] Cannot discover X shape because X is not convex. [1] DBSCAN can discover X shape. [1] b. K-means is prototype based and

More information

Semi-Supervised Learning with Sparse Distributed Representations

Semi-Supervised Learning with Sparse Distributed Representations Semi-Supervised Learning with Sparse Distributed Representations David Zieger dzieger@stanford.edu CS 229 Fina Project 1 Introduction For many machine earning appications, abeed data may be very difficut

More information

Solving Large Double Digestion Problems for DNA Restriction Mapping by Using Branch-and-Bound Integer Linear Programming

Solving Large Double Digestion Problems for DNA Restriction Mapping by Using Branch-and-Bound Integer Linear Programming The First Internationa Symposium on Optimization and Systems Bioogy (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 267 279 Soving Large Doube Digestion Probems for DNA Restriction

More information

Constellation Models for Recognition of Generic Objects

Constellation Models for Recognition of Generic Objects 1 Consteation Modes for Recognition of Generic Objects Wei Zhang Schoo of Eectrica Engineering and Computer Science Oregon State University zhangwe@eecs.oregonstate.edu Abstract Recognition of generic

More information

Relative Positioning from Model Indexing

Relative Positioning from Model Indexing Reative Positioning from Mode Indexing Stefan Carsson Computationa Vision and Active Perception Laboratory (CVAP)* Roya Institute of Technoogy (KTH), Stockhom, Sweden Abstract We show how to determine

More information

ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES. Eyal En Gad, Akshay Gadde, A. Salman Avestimehr and Antonio Ortega

ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES. Eyal En Gad, Akshay Gadde, A. Salman Avestimehr and Antonio Ortega ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES Eya En Gad, Akshay Gadde, A. Saman Avestimehr and Antonio Ortega Department of Eectrica Engineering University of Southern

More information

Response Surface Model Updating for Nonlinear Structures

Response Surface Model Updating for Nonlinear Structures Response Surface Mode Updating for Noninear Structures Gonaz Shahidi a, Shamim Pakzad b a PhD Student, Department of Civi and Environmenta Engineering, Lehigh University, ATLSS Engineering Research Center,

More information

MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY

MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY R.D. FALGOUT, T.A. MANTEUFFEL, B. O NEILL, AND J.B. SCHRODER Abstract. The need for paraeism in the time dimension is being driven

More information

Research of Classification based on Deep Neural Network

Research of  Classification based on Deep Neural Network 2018 Internationa Conference on Sensor Network and Computer Engineering (ICSNCE 2018) Research of Emai Cassification based on Deep Neura Network Wang Yawen Schoo of Computer Science and Engineering Xi

More information

1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER Backward Fuzzy Rule Interpolation

1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER Backward Fuzzy Rule Interpolation 1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER 2014 Bacward Fuzzy Rue Interpoation Shangzhu Jin, Ren Diao, Chai Que, Senior Member, IEEE, and Qiang Shen Abstract Fuzzy rue interpoation

More information

A probabilistic fuzzy method for emitter identification based on genetic algorithm

A probabilistic fuzzy method for emitter identification based on genetic algorithm A probabitic fuzzy method for emitter identification based on genetic agorithm Xia Chen, Weidong Hu, Hongwen Yang, Min Tang ATR Key Lab, Coege of Eectronic Science and Engineering Nationa University of

More information

Fastest-Path Computation

Fastest-Path Computation Fastest-Path Computation DONGHUI ZHANG Coege of Computer & Information Science Northeastern University Synonyms fastest route; driving direction Definition In the United states, ony 9.% of the househods

More information

Quaternion Support Vector Classifier

Quaternion Support Vector Classifier Quaternion Support Vector Cassifier G. López-Gonzáez, Nancy Arana-Danie, and Eduardo Bayro-Corrochano CINVESTAV - Unidad Guadaajara, Av. de Bosque 1145, Coonia e Bajo, Zapopan, Jaisco, México {geopez,edb}@gd.cinvestav.mx

More information

A Discriminative Global Training Algorithm for Statistical MT

A Discriminative Global Training Algorithm for Statistical MT Discriminative Goba Training gorithm for Statistica MT Christoph Timann IBM T.J. Watson Research Center Yorktown Heights, N.Y. 10598 cti@us.ibm.com Tong Zhang Yahoo! Research New York Cit, N.Y. 10011 tzhang@ahoo-inc.com

More information

Handling Outliers in Non-Blind Image Deconvolution

Handling Outliers in Non-Blind Image Deconvolution Handing Outiers in Non-Bind Image Deconvoution Sunghyun Cho 1 Jue Wang 2 Seungyong Lee 1,2 sodomau@postech.ac.kr juewang@adobe.com eesy@postech.ac.kr 1 POSTECH 2 Adobe Systems Abstract Non-bind deconvoution

More information

Optimized Base-Station Cache Allocation for Cloud Radio Access Network with Multicast Backhaul

Optimized Base-Station Cache Allocation for Cloud Radio Access Network with Multicast Backhaul Optimized Base-Station Cache Aocation for Coud Radio Access Network with Muticast Backhau Binbin Dai, Student Member, IEEE, Ya-Feng Liu, Member, IEEE, and Wei Yu, Feow, IEEE arxiv:804.0730v [cs.it] 28

More information

Crossing Minimization Problems of Drawing Bipartite Graphs in Two Clusters

Crossing Minimization Problems of Drawing Bipartite Graphs in Two Clusters Crossing Minimiation Probems o Drawing Bipartite Graphs in Two Custers Lanbo Zheng, Le Song, and Peter Eades Nationa ICT Austraia, and Schoo o Inormation Technoogies, University o Sydney,Austraia Emai:

More information

Fault detection and classification by unsupervised feature extraction and dimensionality reduction

Fault detection and classification by unsupervised feature extraction and dimensionality reduction Compex Inte. Syst. (2015) 1:25 33 DOI 10.1007/s40747-015-0004-2 ORIGINAL ARTICLE Faut detection and cassification by unsupervised feature extraction and dimensionaity reduction Praveen Chopra 1,2 Sandeep

More information

WATERMARKING GIS DATA FOR DIGITAL MAP COPYRIGHT PROTECTION

WATERMARKING GIS DATA FOR DIGITAL MAP COPYRIGHT PROTECTION WATERMARKING GIS DATA FOR DIGITAL MAP COPYRIGHT PROTECTION Shen Tao Chinese Academy of Surveying and Mapping, Beijing 100039, China shentao@casm.ac.cn Xu Dehe Institute of resources and environment, North

More information

Formulation of Loss minimization Problem Using Genetic Algorithm and Line-Flow-based Equations

Formulation of Loss minimization Problem Using Genetic Algorithm and Line-Flow-based Equations Formuation of Loss minimization Probem Using Genetic Agorithm and Line-Fow-based Equations Sharanya Jaganathan, Student Member, IEEE, Arun Sekar, Senior Member, IEEE, and Wenzhong Gao, Senior member, IEEE

More information

Reference trajectory tracking for a multi-dof robot arm

Reference trajectory tracking for a multi-dof robot arm Archives of Contro Sciences Voume 5LXI, 5 No. 4, pages 53 57 Reference trajectory tracking for a muti-dof robot arm RÓBERT KRASŇANSKÝ, PETER VALACH, DÁVID SOÓS, JAVAD ZARBAKHSH This paper presents the

More information

GPU Implementation of Parallel SVM as Applied to Intrusion Detection System

GPU Implementation of Parallel SVM as Applied to Intrusion Detection System GPU Impementation of Parae SVM as Appied to Intrusion Detection System Sudarshan Hiray Research Schoar, Department of Computer Engineering, Vishwakarma Institute of Technoogy, Pune, India sdhiray7@gmai.com

More information

Development of a hybrid K-means-expectation maximization clustering algorithm

Development of a hybrid K-means-expectation maximization clustering algorithm Journa of Computations & Modeing, vo., no.4, 0, -3 ISSN: 79-765 (print, 79-8850 (onine Scienpress Ltd, 0 Deveopment of a hybrid K-means-expectation maximization custering agorithm Adigun Abimboa Adebisi,

More information

FACE RECOGNITION WITH HARMONIC DE-LIGHTING. s: {lyqing, sgshan, wgao}jdl.ac.cn

FACE RECOGNITION WITH HARMONIC DE-LIGHTING.  s: {lyqing, sgshan, wgao}jdl.ac.cn FACE RECOGNITION WITH HARMONIC DE-LIGHTING Laiyun Qing 1,, Shiguang Shan, Wen Gao 1, 1 Graduate Schoo, CAS, Beijing, China, 100080 ICT-ISVISION Joint R&D Laboratory for Face Recognition, CAS, Beijing,

More information

A NEW APPROACH FOR BLOCK BASED STEGANALYSIS USING A MULTI-CLASSIFIER

A NEW APPROACH FOR BLOCK BASED STEGANALYSIS USING A MULTI-CLASSIFIER Internationa Journa on Technica and Physica Probems of Engineering (IJTPE) Pubished by Internationa Organization of IOTPE ISSN 077-358 IJTPE Journa www.iotpe.com ijtpe@iotpe.com September 014 Issue 0 Voume

More information

A Column Generation Approach for Support Vector Machines

A Column Generation Approach for Support Vector Machines A Coumn Generation Approach for Support Vector Machines Emiio Carrizosa Universidad de Sevia (Spain). ecarrizosa@us.es Beén Martín-Barragán Universidad de Sevia (Spain). bemart@us.es Doores Romero Moraes

More information

As Michi Henning and Steve Vinoski showed 1, calling a remote

As Michi Henning and Steve Vinoski showed 1, calling a remote Reducing CORBA Ca Latency by Caching and Prefetching Bernd Brügge and Christoph Vismeier Technische Universität München Method ca atency is a major probem in approaches based on object-oriented middeware

More information

Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density

Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density Hiroaki Sasaki 1, Aapo Hyvärinen 2,3, and Masashi Sugiyama 1 1 Graduate School of Information Science and Engineering,

More information

Fast b-matching via Sufficient Selection Belief Propagation

Fast b-matching via Sufficient Selection Belief Propagation Fast b-matching via Sufficient Seection Beief Propagation Bert Huang Computer Science Department Coumbia University New York, NY 127 bert@cs.coumbia.edu Tony Jebara Computer Science Department Coumbia

More information

Intro to Programming & C Why Program? 1.2 Computer Systems: Hardware and Software. Why Learn to Program?

Intro to Programming & C Why Program? 1.2 Computer Systems: Hardware and Software. Why Learn to Program? Intro to Programming & C++ Unit 1 Sections 1.1-3 and 2.1-10, 2.12-13, 2.15-17 CS 1428 Spring 2018 Ji Seaman 1.1 Why Program? Computer programmabe machine designed to foow instructions Program a set of

More information

Learning to Learn Second-Order Back-Propagation for CNNs Using LSTMs

Learning to Learn Second-Order Back-Propagation for CNNs Using LSTMs Learning to Learn Second-Order Bac-Propagation for CNNs Using LSTMs Anirban Roy SRI Internationa Meno Par, USA anirban.roy@sri.com Sinisa Todorovic Oregon State University Corvais, USA sinisa@eecs.oregonstate.edu

More information

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm Outine Parae Numerica Agorithms Chapter 8 Prof. Michae T. Heath Department of Computer Science University of Iinois at Urbana-Champaign CS 554 / CSE 512 1 2 3 4 Trianguar Matrices Michae T. Heath Parae

Special Edition Using Microsoft Excel Selecting and Naming Cells and Ranges

Special Edition Using Microsoft Excel 2000, Lesson 3: Selecting and Naming Cells and Ranges. [Figures are not included in this sample chapter]

Real-Time Feature Descriptor Matching via a Multi-Resolution Exhaustive Search Method

Real-Time Feature Descriptor Matching via a Multi-Resolution Exhaustive Search Method 297 Rea-Time Feature escriptor Matching via a Muti-Resoution Ehaustive Search Method Chi-Yi Tsai, An-Hung Tsao, and Chuan-Wei Wang epartment of Eectrica Engineering, Tamang University, New Taipei City,

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion.

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion. Lecture outine 433-324 Graphics and Interaction Scan Converting Poygons and Lines Department of Computer Science and Software Engineering The Introduction Scan conversion Scan-ine agorithm Edge coherence

A study of comparative evaluation of methods for image processing using color features

A study of comparative evaluation of methods for image processing using color features A study of comparative evauation of methods for image processing using coor features FLORENTINA MAGDA ENESCU,CAZACU DUMITRU Department Eectronics, Computers and Eectrica Engineering University Pitești

A Two-Step Approach to Hallucinating Faces: Global Parametric Model and Local Nonparametric Model

A Two-Step Approach to Hallucinating Faces: Global Parametric Model and Local Nonparametric Model A Two-Step Approach to aucinating Faces: Goba Parametric Mode and Loca Nonparametric Mode Ce Liu eung-yeung Shum Chang-Shui Zhang State Key Lab of nteigent Technoogy and Systems, Dept. of Automation, Tsinghua

Complex Human Activity Searching in a Video Employing Negative Space Analysis

Complex Human Activity Searching in a Video Employing Negative Space Analysis Compex Human Activity Searching in a Video Empoying Negative Space Anaysis Shah Atiqur Rahman, Siu-Yeung Cho, M.K.H. Leung 3, Schoo of Computer Engineering, Nanyang Technoogica University, Singapore 639798

Multiple Plane Phase Retrieval Based On Inverse Regularized Imaging and Discrete Diffraction Transform

Multiple Plane Phase Retrieval Based On Inverse Regularized Imaging and Discrete Diffraction Transform Mutipe Pane Phase Retrieva Based On Inverse Reguaried Imaging and Discrete Diffraction Transform Artem Migukin, Vadimir Katkovnik, and Jaakko Astoa Department of Signa Processing, Tampere University of

Multi-Robot Pose Graph Localization and Data Association from Unknown Initial Relative Poses

Multi-Robot Pose Graph Localization and Data Association from Unknown Initial Relative Poses 1 Muti-Robot Pose Graph Locaization and Data Association from Unknown Initia Reative Poses Vadim Indeman, Erik Neson, Nathan Michae and Frank Deaert Institute of Robotics and Inteigent Machines (IRIM)

Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles

Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles http://asp.eurasipjournas.com/content/22//86 RESEARCH Open Access Muti-task hidden Markov modeing of spectrogram feature from radar high-resoution range profies Mian Pan, Lan Du *, Penghui Wang, Hongwei

LEARNING causal structures is one of the main problems

LEARNING causal structures is one of the main problems cupc: CUDA-based Parae PC Agorithm for Causa Structure Learning on PU Behrooz Zare, Foad Jafarinejad, Matin Hashemi, and Saber Saehkaeybar arxiv:8.89v [cs.dc] Dec 8 Abstract The main goa in many fieds

Quality Assessment using Tone Mapping Algorithm

Quality Assessment using Tone Mapping Algorithm Quaity Assessment using Tone Mapping Agorithm Nandiki.pushpa atha, Kuriti.Rajendra Prasad Research Schoar, Assistant Professor, Vignan s institute of engineering for women, Visakhapatnam, Andhra Pradesh,

fastruct: model-based clustering made faster

fastruct: model-based clustering made faster fastruct: mode-based custering made faster Chibiao Chen Forence Forbes Oivier François INRIA Rhone-Apes, 655 Avenue de Europe, Montbonnot, 38334 St Ismier France TIMC-TIMB, Dept Math Bioogy, Facuty of
