Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney
Department of Computer Sciences, University of Texas at Austin, Austin, TX, USA

Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the authors.

Abstract

Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms.

1. Introduction

In many learning tasks, unlabeled data is plentiful but labeled data is limited and expensive to generate. Consequently, semi-supervised learning, which employs both labeled and unlabeled data, has become a topic of significant interest. More specifically, semi-supervised clustering, the use of class labels or pairwise constraints on some examples to aid unsupervised clustering, has been the focus of several recent projects (Wagstaff et al., 2001; Basu et al., 2002; Klein et al., 2002; Xing et al., 2003; Bar-Hillel et al., 2003; Segal et al., 2003).

Existing methods for semi-supervised clustering fall into two general approaches we call constraint-based and metric-based. In constraint-based approaches, the clustering algorithm itself is modified so that user-provided labels or pairwise constraints are used to guide the algorithm towards a more appropriate data partitioning. This is done by modifying the clustering objective function so that it includes satisfaction of constraints (Demiriz et al., 1999), enforcing constraints during the clustering process (Wagstaff et al., 2001), or initializing and constraining clustering based on labeled examples (Basu et al., 2002). In metric-based approaches, an existing clustering algorithm that uses a distance metric is employed; however, the metric is first trained to satisfy the labels or constraints in the supervised data. Several distance measures have been used for metric-based semi-supervised clustering, including Euclidean distance trained by a shortest-path algorithm (Klein et al., 2002), string-edit distance learned using Expectation Maximization (EM) (Bilenko & Mooney, 2003), KL divergence adapted using gradient descent (Cohn et al., 2003), and Mahalanobis distances trained using convex optimization (Xing et al., 2003; Bar-Hillel et al., 2003).

Previous metric-based semi-supervised clustering algorithms exclude unlabeled data from the metric training step, as well as separate metric learning from the clustering process. Also, existing metric-based methods use a single distance metric for all clusters, forcing them to have similar shapes. We propose a new semi-supervised clustering algorithm derived from K-Means, MPCK-MEANS, that incorporates both metric learning and the use of pairwise constraints in a principled manner. MPCK-MEANS performs distance-metric training with each clustering iteration, utilizing both unlabeled data and pairwise constraints.
The algorithm is able to learn individual metrics for each cluster, which permits clusters of different shapes. MPCK-MEANS also allows violation of constraints if it leads to a more cohesive clustering, whereas earlier constraint-based methods forced satisfaction of all constraints, leaving them vulnerable to noisy supervision. By ablating the metric-based and constraint-based components of our unified method, we present experimental results comparing and combining the two approaches on multiple datasets. The two methods for semi-supervision individually improve clustering accuracy, and our unified approach integrates their strengths. Finally, we demonstrate that the semi-supervised metric learning in our approach outperforms previously proposed methods that learn metrics prior to clustering, and that learning multiple cluster-specific metrics can lead to better results.

2. Problem Formulation

2.1. Clustering with K-Means

K-Means is a clustering algorithm based on iterative relocation that partitions a dataset into K clusters, locally minimizing the total squared Euclidean distance between the data points and the cluster centroids. Let X = {x_i}_{i=1}^N, x_i in R^m, be a set of data points, let x_{id} be the d-th component of x_i, let {mu_h}_{h=1}^K represent the K cluster centroids, and let l_i be the cluster assignment of a point x_i, where l_i in {1, ..., K}. The Euclidean K-Means algorithm creates a K-partitioning {X_h}_{h=1}^K of X so that the objective function \sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2 is locally minimized.

It can be shown that K-Means is essentially an EM algorithm on a mixture of K Gaussians under assumptions of identity covariance of the Gaussians, uniform mixture component priors, and expectation under a particular type of conditional distribution (Basu et al., 2002). In the Euclidean K-Means formulation, the squared L2-norm \|x_i - \mu_{l_i}\|^2 = (x_i - \mu_{l_i})^T (x_i - \mu_{l_i}) between a point x_i and its corresponding cluster centroid mu_{l_i} is used as the distance measure, which is a direct consequence of the identity covariance assumption of the underlying Gaussians.

2.2. Semi-supervised Clustering with Constraints

In semi-supervised clustering, a small amount of labeled data is available to aid the clustering process. Our framework uses both must-link and cannot-link constraints between pairs of instances (Wagstaff et al., 2001), with an associated cost for violating each constraint. In many unsupervised-learning applications, e.g., clustering for speaker identification in a conversation (Bar-Hillel et al., 2003), or clustering GPS data for lane-finding (Wagstaff et al., 2001), considering supervision in the form of constraints is more realistic than providing class labels. While class labels may be unknown, a user can still specify whether pairs of points belong to the same or different clusters. Constraint-based supervision is also more general than class labels: a set of classified points implies an equivalent set of pairwise constraints, but not vice versa.

Since K-Means cannot directly handle pairwise constraints, we formulate the goal of pairwise constrained K-Means clustering as minimizing a combined objective function, defined as the sum of the total squared distances between the points and their cluster centroids, and the cost incurred by violating any pairwise constraints. Let M be a set of must-link pairs where (x_i, x_j) in M implies x_i and x_j should be in the same cluster, and C be a set of cannot-link pairs where (x_i, x_j) in C implies x_i and x_j should be in different clusters. Let W = {w_ij} and W̄ = {w̄_ij} be penalty costs for violating the constraints in M and C respectively. The goal of pairwise constrained K-Means is then to minimize the following objective function, where point x_i is assigned to the partition X_{l_i} with centroid mu_{l_i}:

    J_pckmeans = \sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2 + \sum_{(x_i,x_j) \in M} w_{ij} \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \bar{w}_{ij} \mathbb{1}[l_i = l_j]    (1)

where \mathbb{1} is the indicator function, \mathbb{1}[true] = 1 and \mathbb{1}[false] = 0. This mathematical formulation is motivated by the metric labeling problem with the generalized Potts model (Kleinberg & Tardos, 1999).

2.3. Semi-supervised Clustering with Metric Learning

While pairwise constraints can guide a clustering algorithm towards a better grouping, they can also be used to adapt the underlying distance metric. Pairwise constraints effectively represent the user's view of similarity in the domain.
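As a concrete illustration of Eqn. (1), the following minimal sketch (my own NumPy illustration, not the authors' implementation; function and variable names are assumptions) computes J_pckmeans for a fixed assignment:

```python
# Sketch of the pairwise constrained K-Means objective in Eqn. (1):
# squared distances to assigned centroids plus penalties for violated
# must-link and cannot-link constraints.
import numpy as np

def pckmeans_objective(X, labels, centroids, must_link, cannot_link, w, w_bar):
    """X: (N, m) data; labels: (N,) cluster assignments; centroids: (K, m);
    must_link / cannot_link: lists of index pairs (i, j);
    w, w_bar: dicts mapping (i, j) -> violation cost."""
    # Distance term: sum of squared Euclidean distances to assigned centroids.
    J = np.sum((X - centroids[labels]) ** 2)
    # Must-link penalties are paid when the two points land in different clusters.
    J += sum(w[(i, j)] for (i, j) in must_link if labels[i] != labels[j])
    # Cannot-link penalties are paid when the two points land in the same cluster.
    J += sum(w_bar[(i, j)] for (i, j) in cannot_link if labels[i] == labels[j])
    return J
```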
Since the original data representation may not specify a space where clusters are sufficiently separated, modifying the distance metric warps the space to minimize distances between same-cluster objects, while maximizing distances between different-cluster objects. As a result, clusters discovered using learned metrics adhere more closely to the notion of similarity embodied in the supervision.

We parameterize Euclidean distance using a symmetric positive-definite matrix A as follows: \|x_i - x_j\|_A = \sqrt{(x_i - x_j)^T A (x_i - x_j)}; the same parameterization was previously used by Xing et al. (2003) and Bar-Hillel et al. (2003). If A is restricted to a diagonal matrix, it scales each dimension by a different weight and corresponds to feature weighting; otherwise new features are created that are linear combinations of the original ones. In previous work on adaptive metrics for clustering (Cohn et al., 2003; Xing et al., 2003; Bar-Hillel et al., 2003), metric weights are trained to simultaneously minimize the distance between must-linked instances and maximize the distance between cannot-linked instances. A fundamental limitation of these approaches is that they assume a single metric for all clusters, preventing the clusters from having different shapes.

We allow a separate weight matrix for each cluster, denoted A_h for cluster h. This is equivalent to a generalized version of the K-Means model described in Section 2.1, where cluster h is generated by a Gaussian with covariance matrix A_h^{-1} (Bilmes, 1997). It can be shown that maximizing the complete-data log-likelihood under this generalized model is equivalent to minimizing the objective function:

    J_mkmeans = \sum_{x_i \in X} ( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) )    (2)

where the second term arises due to the normalizing constant of the l_i-th Gaussian with covariance matrix A_{l_i}^{-1}.
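The per-cluster parameterization above is a Mahalanobis-style distance with one weight matrix per cluster. A minimal sketch of that distance and of a single point's contribution to Eqn. (2) (illustrative names, not the authors' code):

```python
# Per-cluster Mahalanobis distance and the per-point cost of Eqn. (2):
# ||x - mu||^2_A - log det(A), with one matrix A_h per cluster.
import numpy as np

def mahalanobis_sq(x, y, A):
    """Squared distance ||x - y||^2_A under a symmetric positive-definite A."""
    d = x - y
    return float(d @ A @ d)

def mkmeans_point_cost(x, mu, A):
    """One point's contribution to Eqn. (2)."""
    sign, logdet = np.linalg.slogdet(A)  # numerically stable log-determinant
    return mahalanobis_sq(x, mu, A) - logdet
```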

2.4. Integrating Constraints and Metric Learning

Combining Eqns. (1) and (2) leads to the following objective function, which minimizes cluster dispersion under the learned metrics while reducing constraint violations:

    J_combined = \sum_{x_i \in X} ( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) ) + \sum_{(x_i,x_j) \in M} w_{ij} \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \bar{w}_{ij} \mathbb{1}[l_i = l_j]    (3)

If we assume uniform constraint costs w_ij and w̄_ij, all constraint violations are treated equally. However, the penalty for violating a must-link constraint between distant points should be higher than that between nearby points. Intuitively, this captures the fact that if two must-linked points are far apart according to the current metric, the metric is grossly inadequate and needs severe modification. Since two clusters are involved in a must-link violation, the corresponding penalty should affect the metrics for both clusters. This can be accomplished by multiplying the penalty in the second summation of Eqn. (3) by the following function:

    f_M(x_i, x_j) = \frac{1}{2} \|x_i - x_j\|^2_{A_{l_i}} + \frac{1}{2} \|x_i - x_j\|^2_{A_{l_j}}    (4)

Analogously, the penalty for violating a cannot-link constraint between two points that are nearby according to the current metric should be higher than for two distant points. To reflect this intuition, the following penalty term can be used with violated cannot-link constraints whose points are assigned to the same cluster (l_i = l_j):

    f_C(x_i, x_j) = \|x'_{l_i} - x''_{l_i}\|^2_{A_{l_i}} - \|x_i - x_j\|^2_{A_{l_i}}    (5)

where (x'_{l_i}, x''_{l_i}) is the maximally separated pair of points in the dataset according to the l_i-th metric. This form of f_C ensures that the penalty for violating a cannot-link constraint remains non-negative, since the second term is never greater than the first. The combined objective function then becomes:

    J_mpckm = \sum_{x_i \in X} ( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) ) + \sum_{(x_i,x_j) \in M} w_{ij} f_M(x_i, x_j) \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \bar{w}_{ij} f_C(x_i, x_j) \mathbb{1}[l_i = l_j]    (6)

Costs w_ij and w̄_ij provide a way of specifying the relative importance of the labeled versus unlabeled data while allowing individual constraint weights. The following section describes how J_mpckm can be greedily optimized by our proposed metric pairwise constrained K-Means (MPCK-MEANS) algorithm.

3. MPCK-MEANS Algorithm

Given a set of data points X, a set of must-link constraints M, a set of cannot-link constraints C, corresponding cost sets W and W̄, and the desired number of clusters K, MPCK-MEANS finds a disjoint K-partitioning {X_h}_{h=1}^K of X (with each cluster having a centroid mu_h and a local weight matrix A_h) such that J_mpckm is (locally) minimized. The algorithm integrates the use of constraints and metric learning: constraints are utilized during cluster initialization and when assigning points to clusters, and the distance metric is adapted by re-estimating the weight matrices A_h during each iteration based on the current cluster assignments and constraint violations. Pseudocode for the algorithm is presented in Fig. 1.

Figure 1. MPCK-MEANS algorithm.
Input: Set of data points X = {x_i}_{i=1}^N, set of must-link constraints M = {(x_i, x_j)}, set of cannot-link constraints C = {(x_i, x_j)}, number of clusters K, sets of constraint costs W and W̄.
Output: Disjoint K-partitioning {X_h}_{h=1}^K of X such that the objective function J_mpckm is (locally) minimized.
Method:
1. Initialize clusters:
   1a. create the lambda neighborhoods {N_p}_{p=1}^lambda from M and C
   1b. if lambda >= K, initialize {mu_h^(0)}_{h=1}^K using weighted farthest-first traversal starting from the largest N_p;
       else if lambda < K, initialize {mu_h^(0)}_{h=1}^lambda with the centroids of {N_p}_{p=1}^lambda and initialize the remaining clusters at random
2. Repeat until convergence:
   2a. assign_cluster: assign each data point x_i to cluster h* (i.e., set X_{h*}^(t+1)), for
       h* = argmin_h ( \|x_i - \mu_h^{(t)}\|^2_{A_h} - \log(\det(A_h)) + \sum_{(x_i,x_j) \in M} w_{ij} f_M(x_i, x_j) \mathbb{1}[h \neq l_j] + \sum_{(x_i,x_j) \in C} \bar{w}_{ij} f_C(x_i, x_j) \mathbb{1}[h = l_j] )
   2b. estimate_means: {mu_h^(t+1)}_{h=1}^K <- { (1/|X_h^(t+1)|) \sum_{x \in X_h^{(t+1)}} x }_{h=1}^K
   2c. update_metrics:
       A_h = |X_h| ( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T + \sum_{(x_i,x_j) \in M_h} \frac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \bar{w}_{ij} ( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T ) \mathbb{1}[l_i = l_j] )^{-1}
   2d. t <- t + 1

3.1. Initialization

Good initial centroids are critical to the success of greedy clustering algorithms such as K-Means. To infer the initial clusters from the constraints, we take the transitive closure of the must-link constraints and augment the set M with these entailed constraints (assuming consistency of the constraints). Let lambda be the number of connected components in the augmented set M. These connected components are used to create lambda neighborhood sets {N_p}_{p=1}^lambda, where each neighborhood consists of points connected by must-links. For every pair of neighborhoods N_p and N_p' that have at least one cannot-link between them, we add cannot-link constraints between every pair of points in N_p and N_p' and augment the cannot-link set C with these entailed constraints. We will overload notation from this point and refer to the augmented must-link and cannot-link sets as M and C, respectively.

After this preprocessing step, we get lambda neighborhood sets {N_p}_{p=1}^lambda. These neighborhoods provide initial clusters for the MPCK-MEANS algorithm. If lambda <= K, we initialize lambda cluster centers with the centroids of all the lambda neighborhood sets; if lambda < K, we initialize the remaining K - lambda clusters with points obtained by random perturbations of the global centroid of X. If lambda > K, we select K neighborhood sets using a weighted variant of the farthest-first algorithm, which is a good heuristic for initialization in centroid-based clustering algorithms like K-Means.

In weighted farthest-first traversal, the goal is to find K points that are maximally separated from each other in terms of a weighted distance. In our case, the points are the centroids of the lambda neighborhoods, and the weight of each centroid is the size of its corresponding neighborhood. Thus, we bias farthest-first to select centroids that are relatively far apart but also represent large neighborhoods, in order to obtain good initial clusters. In weighted farthest-first traversal, we maintain a set of traversed points at every step and pick as the next point the one with the farthest weighted distance from the traversed set (using the standard notion of distance from a set: d(x, S) = min_{y \in S} d(x, y)). Finally, we initialize the K cluster centers with the centroids of the K neighborhoods chosen by weighted farthest-first traversal.

3.2. E-step

MPCK-MEANS alternates between cluster assignment in the E-step, and centroid estimation and metric learning in the M-step (see Step 2 in Fig. 1). In the E-step, every point x is assigned to the cluster that minimizes the sum of the distance of x to the cluster centroid according to the local metric and the cost of any constraint violations incurred by this cluster assignment. Points are randomly re-ordered for each assignment sequence, and once a point x is assigned to a cluster, the subsequent points in the random ordering use the current cluster assignment of x to calculate possible constraint violations. Note that this assignment step is order-dependent, since the subsets of M and C relevant to each cluster may change with the assignment of a point. We experimented with random ordering as well as a greedy strategy that first assigned instances that are closest to the cluster centroid and involved in a minimal number of constraints. These experiments showed that the order of assignment does not result in statistically significant differences in clustering quality; therefore, we used random ordering in our evaluation. In the E-step, each point moves to a new cluster only if the component of J_mpckm contributed by that point decreases, so when all points are given their new assignments, J_mpckm will decrease or remain the same.
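A minimal sketch of the weighted farthest-first traversal used for initialization (Section 3.1) is given below. The text does not spell out the exact form of the weighted distance, so scaling the point-to-set distance by the candidate's neighborhood size is an assumption of this sketch, as are the function and variable names.

```python
# Sketch of weighted farthest-first traversal over neighborhood centroids:
# start from the largest neighborhood, then repeatedly add the centroid whose
# weighted distance to the already-traversed set is largest.
import numpy as np

def weighted_farthest_first(centroids, weights, K):
    """centroids: (lambda, m) neighborhood centroids; weights: (lambda,) sizes.
    Assumes K <= number of neighborhoods. Returns indices of K chosen centroids."""
    chosen = [int(np.argmax(weights))]  # start from the largest neighborhood
    while len(chosen) < K:
        best, best_score = None, -np.inf
        for p in range(len(centroids)):
            if p in chosen:
                continue
            # Distance from a point to a set: min over traversed points,
            # here scaled by the candidate's neighborhood size (an assumption).
            d = min(np.linalg.norm(centroids[p] - centroids[q]) for q in chosen)
            score = weights[p] * d
            if score > best_score:
                best, best_score = p, score
        chosen.append(best)
    return chosen
```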
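The assignment cost of step 2a, together with the penalty functions f_M and f_C from Eqns. (4) and (5), can be sketched as follows. This is a simplified illustration under the assumption that every point already carries a current assignment; names are illustrative, not the authors' code.

```python
# Sketch of the MPCK-Means assignment step (2a): place point i in the cluster h
# that minimizes its metric distance term plus penalties for the constraints it
# would violate given the other points' current assignments.
import numpy as np

def maha_sq(x, y, A):
    d = x - y
    return float(d @ A @ d)

def f_M(xi, xj, A_hi, A_hj):
    # Eqn. (4): average squared distance under the two clusters' metrics.
    return 0.5 * maha_sq(xi, xj, A_hi) + 0.5 * maha_sq(xi, xj, A_hj)

def f_C(xi, xj, A_h, max_pair):
    # Eqn. (5): distance between the cluster's maximally separated pair minus
    # the distance between the two cannot-linked points (always non-negative).
    xp, xpp = max_pair
    return maha_sq(xp, xpp, A_h) - maha_sq(xi, xj, A_h)

def assign_point(i, X, labels, centroids, metrics, max_pairs,
                 must_link, cannot_link, w, w_bar):
    """Return the cluster h* minimizing the step-2a cost for point i.
    labels holds the current assignment of every point; metrics[h] is A_h;
    max_pairs[h] is cluster h's maximally separated pair of points."""
    K = len(centroids)
    costs = np.zeros(K)
    for h in range(K):
        A = metrics[h]
        sign, logdet = np.linalg.slogdet(A)
        cost = maha_sq(X[i], centroids[h], A) - logdet
        for (a, b) in must_link:            # violated if the partner differs
            if i in (a, b):
                j = b if a == i else a
                if labels[j] != h:
                    cost += w[(a, b)] * f_M(X[i], X[j], A, metrics[labels[j]])
        for (a, b) in cannot_link:          # violated if the partner matches
            if i in (a, b):
                j = b if a == i else a
                if labels[j] == h:
                    cost += w_bar[(a, b)] * f_C(X[i], X[j], A, max_pairs[h])
        costs[h] = cost
    return int(np.argmin(costs))
```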
3.3. M-step

In the M-step, every cluster centroid mu_h is first re-estimated using the points in the corresponding X_h. As a result, the contribution of each cluster to J_mpckm is minimized. The pairwise constraints do not take part in this centroid re-estimation step because the constraint violations depend only on cluster assignments, which do not change in this step. Thus, only the first term (the distance component) of J_mpckm is minimized, and the centroid re-estimation step effectively remains the same as in K-Means.

The second part of the M-step performs metric learning, where the matrices {A_h}_{h=1}^K are re-estimated to decrease the objective function J_mpckm. Each updated matrix of local weights A_h is obtained by taking the partial derivative of J_mpckm with respect to A_h and setting it to zero, resulting in:

    A_h = |X_h| ( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T + \sum_{(x_i,x_j) \in M_h} \frac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \bar{w}_{ij} ( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T ) \mathbb{1}[l_i = l_j] )^{-1}    (7)

where M_h and C_h are the subsets of must-link and cannot-link constraints, respectively, that contain points currently assigned to the h-th cluster. Since each A_h is obtained by inverting the summation of covariance matrices in Eqn. (7), that summation, A_h^{-1}, must not be singular. If any of the obtained A_h^{-1} are singular, they can be conditioned by adding the identity matrix multiplied by a small fraction of the trace of A_h^{-1}: A_h^{-1} = A_h^{-1} + epsilon * tr(A_h^{-1}) I (Saul & Roweis, 2003). If the A_h resulting from the inversion is negative definite, it is mended by projecting onto the set C = {A : A >= 0} of positive semi-definite matrices, as described by Xing et al. (2003), to ensure that it parameterizes a distance metric.

For high-dimensional or large datasets, estimating the full matrix A_h can be computationally expensive. In such cases diagonal weight matrices can be used, which is equivalent to feature weighting, while using the full matrix corresponds to feature generation. In the case of a diagonal A_h, the d-th diagonal element, a^(h)_dd, corresponds to the weight of the d-th feature for the h-th cluster metric:

    a^{(h)}_{dd} = |X_h| ( \sum_{x_i \in X_h} (x_{id} - \mu_{hd})^2 + \sum_{(x_i,x_j) \in M_h} \frac{1}{2} w_{ij} (x_{id} - x_{jd})^2 \mathbb{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \bar{w}_{ij} ( (x'_{hd} - x''_{hd})^2 - (x_{id} - x_{jd})^2 ) \mathbb{1}[l_i = l_j] )^{-1}    (8)
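For the diagonal case, Eqn. (8) reduces to a per-feature update, sketched below. This is an illustrative simplification: the singularity guard here merely clips small denominators rather than implementing the trace-based conditioning exactly, and all names are assumptions.

```python
# Sketch of the diagonal metric update of Eqn. (8): each feature weight for
# cluster h is the cluster size divided by a dispersion term adjusted for
# constraint violations.
import numpy as np

def update_diagonal_metric(Xh, mu_h, ml_pairs, cl_pairs, max_pair, eps=1e-8):
    """Xh: (n_h, m) points in cluster h; mu_h: (m,) centroid;
    ml_pairs / cl_pairs: violated must-link / cannot-link pairs relevant to
    cluster h as tuples (xi, xj, cost); max_pair: the cluster's maximally
    separated pair (x', x''). Returns the diagonal of A_h as an (m,) array."""
    n_h, m = Xh.shape
    denom = np.sum((Xh - mu_h) ** 2, axis=0)           # per-feature dispersion
    for (xi, xj, cost) in ml_pairs:                    # violated must-links
        denom += 0.5 * cost * (xi - xj) ** 2
    xp, xpp = max_pair
    for (xi, xj, cost) in cl_pairs:                    # violated cannot-links
        denom += cost * ((xp - xpp) ** 2 - (xi - xj) ** 2)
    denom = np.maximum(denom, eps)                     # crude guard against zeros
    return n_h / denom
```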

Intuitively, the first term in the sum, \sum_{x_i \in X_h} (x_{id} - \mu_{hd})^2, scales the weight of each feature proportionately to the feature's contribution to the overall cluster dispersion, analogously to the scaling performed when computing unsupervised Mahalanobis distance. The last two terms, which depend on constraint violations, stretch each dimension in an attempt to mend the current violations. Thus, the metric weights are adjusted at each iteration so that the contribution of different attributes to distance is variance-normalized, while constraint violations are minimized.

Instead of multiple metrics {A_h}_{h=1}^K, the algorithm can use a single metric A for all clusters. The metric would be used and updated similarly to the description above, except that the summations in Eqns. (7) and (8) would be over X, M, and C instead of X_h, M_h, and C_h respectively.

The objective function decreases after every cluster assignment, centroid re-estimation, and metric learning step until convergence, implying that the MPCK-MEANS algorithm converges to a local minimum of J_mpckm as long as the matrices {A_h}_{h=1}^K are obtained directly from Eqn. (7). If any A_h^{-1} is conditioned as described above to make it positive definite, or if the maximally separated points {(x'_h, x''_h)}_{h=1}^K change between iterations, convergence is no longer guaranteed theoretically; however, empirically this has not been a problem in our experience.

4. Experiments

4.1. Methodology and Datasets

Experiments were conducted on three datasets from the UCI repository: Iris, Wine, and Ionosphere (Blake & Merz, 1998); the Protein dataset used by Xing et al. (2003) and Bar-Hillel et al. (2003); and randomly sampled subsets from the Digits and Letters handwritten character recognition datasets, also from the UCI repository. For Digits and Letters, we chose two sets of three classes: {I, J, L} from Letters and {3, 8, 9} from Digits, randomly sampling 10% of the data points from the original datasets. These classes were chosen since they represent difficult visual discrimination problems.

Table 1. Datasets used in experimental evaluation: the number of instances N, the number of dimensions D, and the number of classes K for the Iris, Wine, Ionosphere, Protein, Letters, and Digits datasets.

We used the pairwise F-measure to evaluate the clustering results based on the underlying classes (a short code sketch is given below). The pairwise F-measure relies on the traditional information retrieval measures, adapted for evaluating clustering by considering same-cluster pairs:

    Precision = #PairsCorrectlyPredictedInSameCluster / #TotalPairsPredictedInSameCluster
    Recall = #PairsCorrectlyPredictedInSameCluster / #TotalPairsInSameCluster
    F-measure = 2 * Precision * Recall / (Precision + Recall)

We generated learning curves with 5-fold cross-validation for each dataset to determine the effect of utilizing the pairwise constraints. Each point on the learning curve represents a particular number of randomly selected pairwise constraints given as input to the algorithm. Unit constraint costs W and W̄ were used for all constraints, original and inferred, since the datasets did not provide individual weights for the constraints. The clustering algorithm was run on the whole dataset, but the pairwise F-measure was calculated only on the test set. Results were averaged over 50 runs of 5 folds.

4.2. Results and Discussion

First, we compared constraint-based and metric-based semi-supervised clustering with the integrated framework, as well as with purely unsupervised and supervised approaches. Figs. 2-7 show learning curves for the six datasets.
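The pairwise F-measure defined in Section 4.1 can be computed as in the following minimal sketch (my own illustration, not the authors' evaluation code):

```python
# Pairwise precision, recall, and F-measure: compare pairs of points predicted
# to share a cluster with pairs that share a true class label.
from itertools import combinations

def pairwise_f_measure(pred_labels, true_labels):
    same_pred = same_true = same_both = 0
    for i, j in combinations(range(len(pred_labels)), 2):
        p = pred_labels[i] == pred_labels[j]   # predicted in the same cluster
        t = true_labels[i] == true_labels[j]   # actually in the same class
        same_pred += p
        same_true += t
        same_both += p and t
    precision = same_both / same_pred if same_pred else 0.0
    recall = same_both / same_true if same_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```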
For each dataset, we compared five clustering schemes: MPCK-MEANS clustering, which involves both seeding and metric learning in the unified framework described in Section 2.4, with a single metric parameterized by a diagonal matrix used for all clusters; MK-MEANS, which is K-Means clustering with the metric learning component described in Section 3.3 but without utilizing constraints for initialization, again with a single diagonal metric for all clusters; PCK-MEANS clustering, which utilizes constraints for seeding the initial clusters and directs the cluster assignments to respect the constraints without doing any metric learning, as outlined in Section 2.2; K-MEANS unsupervised clustering; and SUPERVISED-MEANS, which performs assignment of points to the nearest cluster centroids inferred from constraints, as described in Section 3.1. This last algorithm provides a baseline for the performance of purely supervised learning based on constraints.

On the presented datasets, the unified approach (MPCK-MEANS) outperforms individual seeding (PCK-MEANS) and metric learning (MK-MEANS). The superiority of semi-supervised over unsupervised clustering illustrates that providing pairwise constraints is beneficial to clustering quality. Improvements of semi-supervised clustering over SUPERVISED-MEANS indicate that iterative refinement of centroids using both constraints and unlabeled data outperforms purely supervised assignment based on neighborhoods inferred from constraints (for Ionosphere, MPCK-MEANS requires either the full weight matrix or individual cluster metrics to outperform SUPERVISED-MEANS; results for these experiments are shown in Fig. 11).

Figure 2. Iris: ablations. Figure 3. Wine: ablations. Figure 4. Protein: ablations. Figure 5. Ionosphere: ablations. Figure 6. Digits-389: ablations. Figure 7. Letters-IJL: ablations.

For the Wine, Protein, and Letters-IJL datasets, the difference at zero pairwise constraints between methods that utilize metric learning (MPCK-MEANS and MK-MEANS) and those that do not (PCK-MEANS and regular K-MEANS) indicates that even in the absence of constraints, weighting features by their variance (essentially using unsupervised Mahalanobis distance) improves clustering accuracy. For the Wine dataset, additional constraints provide an insubstantial improvement in cluster quality, which shows that meaningful feature weights are obtained from scaling by variance using just the unlabeled data.

Some of the metric learning curves display a characteristic dip, where clustering accuracy decreases when initial constraints are provided, but after a certain point starts to increase and eventually rises above the initial point on the learning curve. We conjecture that this phenomenon is due to the fact that metric parameters learned using few constraints are unreliable, and a significant number of constraints is required by the metric learning mechanism to estimate parameters accurately. On the other hand, seeding the clusters with a small number of pairwise constraints has an immediate positive effect on the final cluster quality, while providing more pairwise constraints has diminishing returns, i.e., PCK-MEANS learning curves rise slowly. When both seeding and metric learning are utilized, the unified approach benefits from the individual strengths of the two methods, as can be seen from the MPCK-MEANS results.

In another set of experiments, we evaluated the utility of using individual metrics for each cluster and the usefulness of learning a full weight matrix A (feature generation) as opposed to a diagonal matrix (feature weighting). We have also compared our methods with RCA, a semi-supervised clustering algorithm that performs metric learning separately from the clustering process (Bar-Hillel et al., 2003) and that has been shown to outperform a similar approach by Xing et al. (2003). Figs. 8-13 show learning curves for the six datasets on the following clustering schemes: MPCK-MEANS-S-D, which is the same as MPCK-MEANS in Figs. 2-7 and involves both seeding and metric learning, with a single metric (S) parameterized by a diagonal matrix (D) used for all clusters; MPCK-MEANS-M-D, which involves both seeding and metric learning, with multiple metrics (M) parameterized by diagonal matrices (D); MPCK-MEANS-S-F, which involves both seeding and metric learning, with a single metric (S) parameterized by a full matrix (F) used for all clusters; MPCK-MEANS-M-F, which involves both seeding and metric learning, with multiple metrics (M) parameterized by full matrices (F); and RCA clustering, which uses the distance metric learning described in (Bar-Hillel et al., 2003) and initialization inferred from constraints as described in Section 3.1.

Figure 8. Iris: metric learning. Figure 9. Wine: metric learning. Figure 10. Protein: metric learning. Figure 11. Ionosphere: metric learning. Figure 12. Digits-389: metric learning. Figure 13. Letters-IJL: metric learning.

As can be seen from the results, both full matrix parameterization and individual metrics for each cluster can lead to significant improvements in clustering quality. However, the relative usefulness of these two techniques varies between the datasets; e.g., multiple metrics are particularly beneficial for the Protein and Digits datasets, while switching from a diagonal to a full weight matrix leads to large improvements on Wine, Ionosphere, and Letters. These results can be explained by the fact that the relative success of the two techniques depends on the properties of a particular dataset: using a full weight matrix helps when the attributes are highly correlated, while multiple metrics lead to improvements when clusters in the dataset have different shapes or lie in different subspaces of the original space. A combination of the two techniques is most helpful when both of these requirements are satisfied, as for Iris and Digits, which was observed by visualizing these datasets. For the other datasets, either multiple metrics or a full weight matrix leads to maximum performance in isolation.

Comparing the performance of different variants of MPCK-MEANS with RCA, we can see that early on the learning curves, where few pairwise constraints are available, RCA leads to better metrics than MPCK-MEANS. However, as more training data is provided, the ability of MPCK-MEANS to learn from both supervised and unsupervised data, as well as to use individual metrics, allows MPCK-MEANS to produce better clusterings. Overall, our results indicate that the integrated approach to utilizing pairwise constraints in clustering with individual metrics outperforms seeding and metric learning individually and leads to improvements in cluster quality. Extending the basic approach with a full parameterization matrix and individual metrics for each cluster can lead to significant improvements over the basic method.

5. Related Work

In previous work on constrained pairwise clustering, Wagstaff et al. (2001) proposed the COP-KMeans algorithm, which has a heuristically motivated objective function. Our formulation, on the other hand, has an underlying generative model based on Hidden Markov Random Fields (see (Basu et al., 2004) for a detailed analysis). Bansal et al. (2002) also proposed a framework for pairwise constrained clustering, but their model performs clustering using only the constraints, whereas our formulation uses both constraints and an underlying distance metric between the points for clustering. Schultz and Joachims (2004) recently introduced a method for learning distance metric parameters based on relative comparisons. In unsupervised clustering, Domeniconi (2002) proposed a variant of K-Means that incorporated learning individual Euclidean metric weights for each cluster; our approach is more general since it allows metric learning to utilize pairwise constraints along with unlabeled data.

In recent work on semi-supervised clustering with pairwise constraints, Cohn et al. (2003) used gradient descent for weighted Jensen-Shannon divergence in the context of EM clustering. Xing et al. (2003) utilized a combination of gradient descent and iterative projections to learn a Mahalanobis metric for clustering. Also, Bar-Hillel et al. (2003) proposed a Relevant Component Analysis (RCA) algorithm that uses only must-link constraints to learn a Mahalanobis metric using convex optimization. All these metric learning techniques for clustering first train a single metric using only supervised data, and then perform clustering on the unsupervised data. In contrast, our method integrates distance metric learning with the clustering process and utilizes both supervised and unsupervised data to learn multiple metrics, which experimentally leads to improved results. Finally, a unified objective function for semi-supervised clustering with constraints was recently proposed by Segal et al. (2003); however, it did not incorporate distance metric learning.

6. Conclusions and Future Work

This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering that unifies the previous constraint-based and metric-based methods. It is based on a variation of the standard K-Means clustering algorithm and uses pairwise constraints along with unlabeled data for constraining the clustering and learning distance metrics. In contrast to previously proposed semi-supervised clustering algorithms, MPCK-MEANS also allows clusters to lie in different subspaces and have different shapes. By ablating the individual components of our integrated approach, we have experimentally compared metric learning and constraints in isolation with the combined algorithm. Our results have shown that by unifying the advantages of both techniques, the integrated approach outperforms the two techniques individually. We have also shown that using individual metrics for different clusters, as well as performing feature generation via a full weight matrix in contrast to feature weighting with a diagonal weight matrix, can lead to improvements over our basic algorithm.

Extending our approach to high-dimensional datasets, where Euclidean distance performs poorly, is the primary avenue for future research. Other interesting topics for future work include selection of the most informative pairwise constraints that would facilitate accurate metric learning and good initial centroids, as well as methodology for handling noisy constraints and cluster initialization sensitive to constraint costs.

7. Acknowledgments

We would like to thank the anonymous reviewers and Joel Tropp for insightful comments. This research was supported in part by NSF grants IIS and IIS, and by a Faculty Fellowship from IBM Corp.

References

Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS-02).

Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proceedings of the 20th International Conference on Machine Learning (ICML-2003).

Basu, S., Banerjee, A., & Mooney, R. J. (2002). Semi-supervised clustering by seeding. Proceedings of the 19th International Conference on Machine Learning (ICML-2002).

Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In submission.
Bilenko, M., & Mooney, R. J. (2003). Adaptive duplicate detection using learnable string similarity measures. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003).

Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (Tech. Report ICSI-TR). ICSI.

Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases.

Cohn, D., Caruana, R., & McCallum, A. (2003). Semi-supervised clustering with user feedback (Tech. Report). Cornell University.

Demiriz, A., Bennett, K. P., & Embrechts, M. J. (1999). Semi-supervised clustering using genetic algorithms. Artificial Neural Networks in Engineering (ANNIE-99).

Domeniconi, C. (2002). Locally adaptive techniques for pattern classification. Doctoral dissertation, University of California, Riverside.

Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002).

Kleinberg, J., & Tardos, E. (1999). Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS-99).

Saul, L., & Roweis, S. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4.

Segal, E., Wang, H., & Koller, D. (2003). Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, i264-i272.

Schultz, M., & Joachims, T. (2004). Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems 16.

Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained clustering with background knowledge. Proceedings of the 18th International Conference on Machine Learning (ICML-2001).

Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15.

Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganograpic Scemes Jessica Fridric Dept. of Electrical Engineering, SUNY Bingamton, Bingamton, NY 3902-6000, USA fridric@bingamton.edu

More information

Intra- and Inter-Session Network Coding in Wireless Networks

Intra- and Inter-Session Network Coding in Wireless Networks Intra- and Inter-Session Network Coding in Wireless Networks Hulya Seferoglu, Member, IEEE, Atina Markopoulou, Member, IEEE, K K Ramakrisnan, Fellow, IEEE arxiv:857v [csni] 3 Feb Abstract In tis paper,

More information

Global Metric Learning by Gradient Descent

Global Metric Learning by Gradient Descent Global Metric Learning by Gradient Descent Jens Hocke and Thomas Martinetz University of Lübeck - Institute for Neuro- and Bioinformatics Ratzeburger Allee 160, 23538 Lübeck, Germany hocke@inb.uni-luebeck.de

More information

DRN: Bringing Greedy Layer-Wise Training into Time Dimension

DRN: Bringing Greedy Layer-Wise Training into Time Dimension DRN: Bringing Greedy Layer-Wise Training into Time Dimension Xiaoyi Li, Xiaowei Jia, Hui Li, Houping Xiao, Jing Gao and Aidong Zang Dept. of Computer Science and Engineering State University of New York

More information

Constrained K-means Clustering with Background Knowledge. Clustering! Background Knowledge. Using Background Knowledge. The K-means Algorithm

Constrained K-means Clustering with Background Knowledge. Clustering! Background Knowledge. Using Background Knowledge. The K-means Algorithm Constrained K-means Clustering with Background Knowledge paper by Kiri Wagstaff, Claire Cardie, Seth Rogers and Stefan Schroedl presented by Siddharth Patwardhan An Overview of the Talk Introduction to

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Measuring Constraint-Set Utility for Partitional Clustering Algorithms

Measuring Constraint-Set Utility for Partitional Clustering Algorithms Measuring Constraint-Set Utility for Partitional Clustering Algorithms Ian Davidson 1, Kiri L. Wagstaff 2, and Sugato Basu 3 1 State University of New York, Albany, NY 12222, davidson@cs.albany.edu 2 Jet

More information

On the Use of Radio Resource Tests in Wireless ad hoc Networks

On the Use of Radio Resource Tests in Wireless ad hoc Networks Tecnical Report RT/29/2009 On te Use of Radio Resource Tests in Wireless ad oc Networks Diogo Mónica diogo.monica@gsd.inesc-id.pt João Leitão jleitao@gsd.inesc-id.pt Luis Rodrigues ler@ist.utl.pt Carlos

More information

Section 2.3: Calculating Limits using the Limit Laws

Section 2.3: Calculating Limits using the Limit Laws Section 2.3: Calculating Limits using te Limit Laws In previous sections, we used graps and numerics to approimate te value of a it if it eists. Te problem wit tis owever is tat it does not always give

More information

Clustering Lecture 9: Other Topics. Jing Gao SUNY Buffalo

Clustering Lecture 9: Other Topics. Jing Gao SUNY Buffalo Clustering Lecture 9: Other Topics Jing Gao SUNY Buffalo 1 Basics Outline Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Miture model Spectral methods Advanced topics

More information

UNSUPERVISED HIERARCHICAL IMAGE SEGMENTATION BASED ON THE TS-MRF MODEL AND FAST MEAN-SHIFT CLUSTERING

UNSUPERVISED HIERARCHICAL IMAGE SEGMENTATION BASED ON THE TS-MRF MODEL AND FAST MEAN-SHIFT CLUSTERING UNSUPERVISED HIERARCHICAL IMAGE SEGMENTATION BASED ON THE TS-MRF MODEL AND FAST MEAN-SHIFT CLUSTERING Raffaele Gaetano, Giuseppe Scarpa, Giovanni Poggi, and Josiane Zerubia Dip. Ing. Elettronica e Telecomunicazioni,

More information

Fault Localization Using Tarantula

Fault Localization Using Tarantula Class 20 Fault localization (cont d) Test-data generation Exam review: Nov 3, after class to :30 Responsible for all material up troug Nov 3 (troug test-data generation) Send questions beforeand so all

More information

All truths are easy to understand once they are discovered; the point is to discover them. Galileo

All truths are easy to understand once they are discovered; the point is to discover them. Galileo Section 7. olume All truts are easy to understand once tey are discovered; te point is to discover tem. Galileo Te main topic of tis section is volume. You will specifically look at ow to find te volume

More information

Minimizing Memory Access By Improving Register Usage Through High-level Transformations

Minimizing Memory Access By Improving Register Usage Through High-level Transformations Minimizing Memory Access By Improving Register Usage Troug Hig-level Transformations San Li Scool of Computer Engineering anyang Tecnological University anyang Avenue, SIGAPORE 639798 Email: p144102711@ntu.edu.sg

More information

Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data

Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Ryan Atallah, John Ryan, David Aeschlimann December 14, 2013 Abstract In this project, we study the problem of classifying

More information

Value, Cost, and Sharing: Open Issues in Constrained Clustering

Value, Cost, and Sharing: Open Issues in Constrained Clustering Value, Cost, and Sharing: Open Issues in Constrained Clustering Kiri L. Wagstaff Jet Propulsion Laboratory, California Institute of Technology, Mail Stop 126-347, 4800 Oak Grove Drive, Pasadena CA 91109,

More information

Search-aware Conditions for Probably Approximately Correct Heuristic Search

Search-aware Conditions for Probably Approximately Correct Heuristic Search Searc-aware Conditions for Probably Approximately Correct Heuristic Searc Roni Stern Ariel Felner Information Systems Engineering Ben Gurion University Beer-Seva, Israel 85104 roni.stern@gmail.com, felner@bgu.ac.il

More information

ANTENNA SPHERICAL COORDINATE SYSTEMS AND THEIR APPLICATION IN COMBINING RESULTS FROM DIFFERENT ANTENNA ORIENTATIONS

ANTENNA SPHERICAL COORDINATE SYSTEMS AND THEIR APPLICATION IN COMBINING RESULTS FROM DIFFERENT ANTENNA ORIENTATIONS NTNN SPHRICL COORDINT SSTMS ND THIR PPLICTION IN COMBINING RSULTS FROM DIFFRNT NTNN ORINTTIONS llen C. Newell, Greg Hindman Nearfield Systems Incorporated 133. 223 rd St. Bldg. 524 Carson, C 9745 US BSTRCT

More information

Utilizing Call Admission Control to Derive Optimal Pricing of Multiple Service Classes in Wireless Cellular Networks

Utilizing Call Admission Control to Derive Optimal Pricing of Multiple Service Classes in Wireless Cellular Networks Utilizing Call Admission Control to Derive Optimal Pricing of Multiple Service Classes in Wireless Cellular Networks Okan Yilmaz and Ing-Ray Cen Computer Science Department Virginia Tec {oyilmaz, ircen}@vt.edu

More information

Intractability and Clustering with Constraints

Intractability and Clustering with Constraints Ian Davidson davidson@cs.albany.edu S.S. Ravi ravi@cs.albany.edu Department of Computer Science, State University of New York, 1400 Washington Ave, Albany, NY 12222 Abstract Clustering with constraints

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

2.8 The derivative as a function

2.8 The derivative as a function CHAPTER 2. LIMITS 56 2.8 Te derivative as a function Definition. Te derivative of f(x) istefunction f (x) defined as follows f f(x + ) f(x) (x). 0 Note: tis differs from te definition in section 2.7 in

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Multi-Objective Particle Swarm Optimizers: A Survey of the State-of-the-Art

Multi-Objective Particle Swarm Optimizers: A Survey of the State-of-the-Art Multi-Objective Particle Swarm Optimizers: A Survey of te State-of-te-Art Margarita Reyes-Sierra and Carlos A. Coello Coello CINVESTAV-IPN (Evolutionary Computation Group) Electrical Engineering Department,

More information

Kernel-Based Metric Adaptation with Pairwise Constraints

Kernel-Based Metric Adaptation with Pairwise Constraints Kernel-Based Metric Adaptation with Pairwise Constraints Hong Chang and Dit-Yan Yeung Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong {hongch,dyyeung}@cs.ust.hk

More information

CHAPTER 7: TRANSCENDENTAL FUNCTIONS

CHAPTER 7: TRANSCENDENTAL FUNCTIONS 7.0 Introduction and One to one Functions Contemporary Calculus 1 CHAPTER 7: TRANSCENDENTAL FUNCTIONS Introduction In te previous capters we saw ow to calculate and use te derivatives and integrals of

More information

When a BST becomes badly unbalanced, the search behavior can degenerate to that of a sorted linked list, O(N).

When a BST becomes badly unbalanced, the search behavior can degenerate to that of a sorted linked list, O(N). Balanced Binary Trees Binary searc trees provide O(log N) searc times provided tat te nodes are distributed in a reasonably balanced manner. Unfortunately, tat is not always te case and performing a sequence

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

Distributed and Optimal Rate Allocation in Application-Layer Multicast

Distributed and Optimal Rate Allocation in Application-Layer Multicast Distributed and Optimal Rate Allocation in Application-Layer Multicast Jinyao Yan, Martin May, Bernard Plattner, Wolfgang Mülbauer Computer Engineering and Networks Laboratory, ETH Zuric, CH-8092, Switzerland

More information

Alternating Direction Implicit Methods for FDTD Using the Dey-Mittra Embedded Boundary Method

Alternating Direction Implicit Methods for FDTD Using the Dey-Mittra Embedded Boundary Method Te Open Plasma Pysics Journal, 2010, 3, 29-35 29 Open Access Alternating Direction Implicit Metods for FDTD Using te Dey-Mittra Embedded Boundary Metod T.M. Austin *, J.R. Cary, D.N. Smite C. Nieter Tec-X

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information