Integrating Constraints and Metric Learning in Semi-Supervised Clustering
Mikhail Bilenko, Sugato Basu, Raymond J. Mooney
Department of Computer Sciences, University of Texas at Austin, Austin, TX, USA

Abstract

Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches and presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms.

1. Introduction

In many learning tasks, unlabeled data is plentiful but labeled data is limited and expensive to generate. Consequently, semi-supervised learning, which employs both labeled and unlabeled data, has become a topic of significant interest. More specifically, semi-supervised clustering, the use of class labels or pairwise constraints on some examples to aid unsupervised clustering, has been the focus of several recent projects (Wagstaff et al., 2001; Basu et al., 2002; Klein et al., 2002; Xing et al., 2003; Bar-Hillel et al., 2003; Segal et al., 2003).

Existing methods for semi-supervised clustering fall into two general approaches that we call constraint-based and metric-based. In constraint-based approaches, the clustering algorithm itself is modified so that user-provided labels or pairwise constraints are used to guide the algorithm towards a more appropriate data partitioning.
This is done by modifying the clustering objective function so that it includes satisfaction of constraints (Demiriz et al., 1999), enforcing constraints during the clustering process (Wagstaff et al., 2001), or initializing and constraining clustering based on labeled examples (Basu et al., 2002). In metric-based approaches, an existing clustering algorithm that uses a distance metric is employed; however, the metric is first trained to satisfy the labels or constraints in the supervised data. Several distance measures have been used for metric-based semi-supervised clustering, including Euclidean distance trained by a shortest-path algorithm (Klein et al., 2002), string-edit distance learned using Expectation Maximization (EM) (Bilenko & Mooney, 2003), KL divergence adapted using gradient descent (Cohn et al., 2003), and Mahalanobis distances trained using convex optimization (Xing et al., 2003; Bar-Hillel et al., 2003).

Previous metric-based semi-supervised clustering algorithms exclude unlabeled data from the metric training step and separate metric learning from the clustering process. Also, existing metric-based methods use a single distance metric for all clusters, forcing them to have similar shapes. We propose a new semi-supervised clustering algorithm derived from K-Means, MPCK-MEANS, that incorporates both metric learning and the use of pairwise constraints in a principled manner. MPCK-MEANS performs distance-metric training with each clustering iteration, utilizing both unlabeled data and pairwise constraints. The algorithm is able to learn individual metrics for each cluster, which permits clusters of different shapes. MPCK-MEANS also allows violation of constraints if it leads to a more cohesive clustering, whereas earlier constraint-based methods forced satisfaction of all constraints, leaving them vulnerable to noisy supervision.

Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada. Copyright 2004 by the authors.
By ablating the metric-based and constraint-based components of our unified method, we present experimental results comparing and combining the two approaches on multiple datasets. The two methods for semi-supervision individually improve clustering accuracy, and our unified approach integrates their strengths. Finally, we demonstrate that the semi-supervised metric learning in our approach outperforms previously proposed methods that learn metrics prior to clustering, and that learning multiple cluster-specific metrics can lead to better results.
2. Problem Formulation

2.1. Clustering with K-Means

K-Means is a clustering algorithm based on iterative relocation that partitions a dataset into K clusters, locally minimizing the total squared Euclidean distance between the data points and the cluster centroids. Let $X = \{x_i\}_{i=1}^{N}$, $x_i \in \mathbb{R}^m$, be a set of data points, $x_{id}$ be the d-th component of $x_i$, $\{\mu_h\}_{h=1}^{K}$ represent the K cluster centroids, and $l_i$ be the cluster assignment of a point $x_i$, where $l_i \in \{1, \dots, K\}$. The Euclidean K-Means algorithm creates a K-partitioning $\{X_h\}_{h=1}^{K}$ of $X$ so that the objective function $\sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2$ is locally minimized.

It can be shown that the K-Means algorithm is essentially an EM algorithm on a mixture of K Gaussians under assumptions of identity covariance of the Gaussians, uniform mixture component priors, and expectation under a particular type of conditional distribution (Basu et al., 2002). In the Euclidean K-Means formulation, the squared $L_2$-norm $\|x_i - \mu_{l_i}\|^2 = (x_i - \mu_{l_i})^T (x_i - \mu_{l_i})$ between a point $x_i$ and its corresponding cluster centroid $\mu_{l_i}$ is used as the distance measure, which is a direct consequence of the identity covariance assumption of the underlying Gaussians.

2.2. Semi-supervised Clustering with Constraints

In semi-supervised clustering, a small amount of labeled data is available to aid the clustering process. Our framework uses both must-link and cannot-link constraints between pairs of instances (Wagstaff et al., 2001), with an associated cost for violating each constraint. In many unsupervised-learning applications, e.g., clustering for speaker identification in a conversation (Bar-Hillel et al., 2003), or clustering GPS data for lane-finding (Wagstaff et al., 2001), considering supervision in the form of constraints is more realistic than providing class labels. While class labels may be unknown, a user can still specify whether pairs of points belong to the same or different clusters.
Constraint-based supervision is also more general than class labels: a set of classified points implies an equivalent set of pairwise constraints, but not vice versa.

Since K-Means cannot directly handle pairwise constraints, we formulate the goal of pairwise constrained clustering as minimizing a combined objective function, defined as the sum of the total squared distances between the points and their cluster centroids, and the cost incurred by violating any pairwise constraints. Let $M$ be a set of must-link pairs where $(x_i, x_j) \in M$ implies $x_i$ and $x_j$ should be in the same cluster, and $C$ be a set of cannot-link pairs where $(x_i, x_j) \in C$ implies $x_i$ and $x_j$ should be in different clusters. Let $W = \{w_{ij}\}$ and $\overline{W} = \{\overline{w}_{ij}\}$ be penalty costs for violating the constraints in $M$ and $C$ respectively. Therefore, the goal of pairwise constrained K-Means is to minimize the following objective function, where point $x_i$ is assigned to the partition $X_{l_i}$ with centroid $\mu_{l_i}$:

$$J_{\mathrm{pckmeans}} = \sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2 + \sum_{(x_i,x_j) \in M} w_{ij}\,\mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \overline{w}_{ij}\,\mathbf{1}[l_i = l_j] \qquad (1)$$

where $\mathbf{1}$ is the indicator function, $\mathbf{1}[\mathrm{true}] = 1$ and $\mathbf{1}[\mathrm{false}] = 0$. This mathematical formulation is motivated by the metric labeling problem with the generalized Potts model (Kleinberg & Tardos, 1999).

2.3. Semi-supervised Clustering with Metric Learning

While pairwise constraints can guide a clustering algorithm towards a better grouping, they can also be used to adapt the underlying distance metric. Pairwise constraints effectively represent the user's view of similarity in the domain. Since the original data representation may not specify a space where clusters are sufficiently separated, modifying the distance metric warps the space to minimize distances between same-cluster objects, while maximizing distances between different-cluster objects. As a result, clusters discovered using learned metrics adhere more closely to the notion of similarity embodied in the supervision.
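As a concrete illustration of Eq. (1), the following Python sketch evaluates the pairwise constrained objective for a fixed assignment. It is only a sketch under stated assumptions: constraints are given as pairs of point indices, the violation costs are uniform (`w` and `w_bar`), and all function and variable names are hypothetical rather than taken from the paper.

```python
def sq_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pckmeans_objective(points, centroids, labels, must_link, cannot_link,
                       w=1.0, w_bar=1.0):
    """Eq. (1): cluster dispersion plus penalties for violated constraints.

    must_link and cannot_link are lists of index pairs (i, j). Uniform costs
    w and w_bar are an assumption; Eq. (1) allows a separate cost per pair.
    """
    dispersion = sum(sq_euclidean(x, centroids[l]) for x, l in zip(points, labels))
    # A must-link is violated when its two points sit in different clusters.
    ml_penalty = sum(w for i, j in must_link if labels[i] != labels[j])
    # A cannot-link is violated when its two points share a cluster.
    cl_penalty = sum(w_bar for i, j in cannot_link if labels[i] == labels[j])
    return dispersion + ml_penalty + cl_penalty


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
centroids = [(0.0, 0.5), (4.0, 0.5)]
labels = [0, 0, 1, 1]
# Must-link (0, 1) is satisfied, must-link (1, 2) is violated,
# cannot-link (0, 3) is satisfied.
print(pckmeans_objective(points, centroids, labels, [(0, 1), (1, 2)], [(0, 3)]))
```

In this toy example the dispersion term contributes 1.0 and the single violated must-link adds its unit cost.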
We parameterize Euclidean distance using a symmetric positive-definite matrix $A$ as follows: $\|x_i - x_j\|_A = \sqrt{(x_i - x_j)^T A (x_i - x_j)}$; the same parameterization was previously used by Xing et al. (2003) and Bar-Hillel et al. (2003). If $A$ is restricted to a diagonal matrix, it scales each dimension by a different weight and corresponds to feature weighting; otherwise new features are created that are linear combinations of the original ones.

In previous work on adaptive metrics for clustering (Cohn et al., 2003; Xing et al., 2003; Bar-Hillel et al., 2003), metric weights are trained to simultaneously minimize the distance between must-linked instances and maximize the distance between cannot-linked instances. A fundamental limitation of these approaches is that they assume a single metric for all clusters, preventing them from having different shapes.

We allow a separate weight matrix for each cluster, denoted $A_h$ for cluster $h$. This is equivalent to a generalized version of the K-Means model described in Section 2.1, where cluster $h$ is generated by a Gaussian with covariance matrix $A_h^{-1}$ (Bilmes, 1997). It can be shown that maximizing the complete data log-likelihood under this generalized K-Means model is equivalent to minimizing the objective function:

$$J_{\mathrm{mkmeans}} = \sum_{x_i \in X} \left( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) \qquad (2)$$

where the second term arises due to the normalizing constant of the $l_i$-th Gaussian with covariance matrix $A_{l_i}^{-1}$.
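The parameterized distance and the objective of Eq. (2) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: `sq_dist_A` handles a full matrix given as a list of rows, while `mkmeans_objective_diag` assumes diagonal per-cluster matrices so that the determinant reduces to a product of the diagonal weights. All names are hypothetical.

```python
import math


def sq_dist_A(x, y, A):
    """Squared distance (x - y)^T A (x - y) for a full matrix A (list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[r] * A[r][c] * d[c] for r in range(n) for c in range(n))


def mkmeans_objective_diag(points, centroids, labels, weights):
    """Eq. (2) restricted to diagonal A_h (an assumption made for brevity).

    weights[h][d] is the d-th diagonal entry of A_h and must be positive.
    """
    total = 0.0
    for x, h in zip(points, labels):
        a = weights[h]
        dist = sum(a_d * (x_d - m_d) ** 2 for a_d, x_d, m_d in zip(a, x, centroids[h]))
        log_det = sum(math.log(a_d) for a_d in a)  # log det of a diagonal matrix
        total += dist - log_det
    return total


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]
print(sq_dist_A((0.0, 0.0), (1.0, 2.0), identity))   # plain squared Euclidean distance
print(sq_dist_A((0.0, 0.0), (1.0, 2.0), stretched))  # first feature weighted four times more
```

With the identity matrix the distance reduces to the squared Euclidean distance of Section 2.1, which is a quick sanity check on the parameterization.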
2.4. Integrating Constraints and Metric Learning

Combining Eqns. (1) and (2) leads to the following objective function that minimizes cluster dispersion under the learned metrics while reducing constraint violations:

$$J_{\mathrm{combined}} = \sum_{x_i \in X} \left( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) + \sum_{(x_i,x_j) \in M} w_{ij}\,\mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \overline{w}_{ij}\,\mathbf{1}[l_i = l_j] \qquad (3)$$

If we assume uniform constraint costs $w_{ij}$ and $\overline{w}_{ij}$, all constraint violations are treated equally. However, the penalty for violating a must-link constraint between distant points should be higher than that between nearby points. Intuitively, this captures the fact that if two must-linked points are far apart according to the current metric, the metric is grossly inadequate and needs severe modification. Since two clusters are involved in a must-link violation, the corresponding penalty should affect the metrics for both clusters. This can be accomplished by multiplying the penalty in the second summation of Eqn. (3) by the following function:

$$f_M(x_i, x_j) = \tfrac{1}{2}\|x_i - x_j\|^2_{A_{l_i}} + \tfrac{1}{2}\|x_i - x_j\|^2_{A_{l_j}} \qquad (4)$$

Analogously, the penalty for violating a cannot-link constraint between two points that are nearby according to the current metric should be higher than for two distant points. To reflect this intuition, the following penalty term can be used with violated cannot-link constraints, whose points are assigned to the same cluster ($l_i = l_j$):

$$f_C(x_i, x_j) = \|x'_{l_i} - x''_{l_i}\|^2_{A_{l_i}} - \|x_i - x_j\|^2_{A_{l_i}} \qquad (5)$$

where $(x'_{l_i}, x''_{l_i})$ is the maximally separated pair of points in the dataset according to the $l_i$-th metric. This form of $f_C$ ensures that the penalty for violating a cannot-link constraint remains non-negative, since the second term is never greater than the first. The combined objective function then becomes:

$$J_{\mathrm{mpckm}} = \sum_{x_i \in X} \left( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right) + \sum_{(x_i,x_j) \in M} w_{ij}\, f_M(x_i, x_j)\,\mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C} \overline{w}_{ij}\, f_C(x_i, x_j)\,\mathbf{1}[l_i = l_j] \qquad (6)$$

Costs $w_{ij}$ and $\overline{w}_{ij}$ provide a way of specifying the relative importance of the labeled versus unlabeled data while allowing individual constraint weights.
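The two distance-sensitive penalties of Eqns. (4) and (5) can be illustrated with a small Python sketch. It assumes diagonal metrics, each stored as a list of positive per-feature weights, and finds the maximally separated pair by brute force over all points; both choices are simplifications for illustration, and the function names are hypothetical.

```python
def sq_dist_diag(x, y, a):
    """Squared distance under a diagonal metric with per-feature weights a."""
    return sum(a_d * (x_d - y_d) ** 2 for a_d, x_d, y_d in zip(a, x, y))


def f_must_link(x_i, x_j, a_li, a_lj):
    """Eq. (4): average of the squared distances under both clusters' metrics."""
    return 0.5 * sq_dist_diag(x_i, x_j, a_li) + 0.5 * sq_dist_diag(x_i, x_j, a_lj)


def max_separated_sq_dist(points, a):
    """Squared distance of the maximally separated pair (brute force, O(N^2))."""
    return max(sq_dist_diag(p, q, a) for p in points for q in points)


def f_cannot_link(x_i, x_j, a_li, max_sq_dist):
    """Eq. (5): large for nearby pairs and never negative."""
    return max_sq_dist - sq_dist_diag(x_i, x_j, a_li)


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
unit = [1.0, 1.0]
far = max_separated_sq_dist(points, unit)
print(f_must_link(points[0], points[1], unit, [3.0, 1.0]))  # averages two metrics
print(f_cannot_link(points[0], points[1], unit, far))       # nearby pair, large penalty
print(f_cannot_link(points[0], points[2], unit, far))       # farthest pair, zero penalty
```

The last two lines show the intended asymmetry: a violated cannot-link between close points is penalized heavily, while the penalty vanishes for the most distant pair.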
The following section describes how $J_{\mathrm{mpckm}}$ can be greedily optimized by our proposed metric pairwise constrained K-Means (MPCK-MEANS) algorithm.

3. MPCK-MEANS Algorithm

Given a set of data points $X$, a set of must-link constraints $M$, a set of cannot-link constraints $C$, corresponding cost sets $W$ and $\overline{W}$, and the desired number of clusters K, MPCK-MEANS finds a disjoint K-partitioning $\{X_h\}_{h=1}^{K}$ of $X$ (with each cluster having a centroid $\mu_h$ and a local weight matrix $A_h$) such that $J_{\mathrm{mpckm}}$ is (locally) minimized.

The algorithm integrates the use of constraints and metric learning. Constraints are utilized during cluster initialization and when assigning points to clusters, and the distance metric is adapted by re-estimating the weight matrices $A_h$ during each iteration based on the current cluster assignments and constraint violations. Pseudocode for the algorithm is presented in Fig. 1.

Algorithm: MPCK-MEANS
Input: Set of data points $X = \{x_i\}_{i=1}^{N}$, set of must-link constraints $M = \{(x_i, x_j)\}$, set of cannot-link constraints $C = \{(x_i, x_j)\}$, number of clusters K, sets of constraint costs $W$ and $\overline{W}$.
Output: Disjoint K-partitioning $\{X_h\}_{h=1}^{K}$ of $X$ such that objective function $J_{\mathrm{mpckm}}$ is (locally) minimized.
Method:
1. Initialize clusters:
   1a. create the $\lambda$ neighborhoods $\{N_p\}_{p=1}^{\lambda}$ from $M$ and $C$
   1b. if $\lambda \geq K$: initialize $\{\mu_h^{(0)}\}_{h=1}^{K}$ using weighted farthest-first traversal starting from the largest $N_p$
       else if $\lambda < K$: initialize $\{\mu_h^{(0)}\}_{h=1}^{\lambda}$ with centroids of $\{N_p\}_{p=1}^{\lambda}$; initialize remaining clusters at random
2. Repeat until convergence:
   2a. assign_cluster: Assign each data point $x_i$ to cluster $h^*$ (i.e. set $X_{h^*}^{(t+1)}$), for
       $h^* = \arg\min_h \Big( \|x_i - \mu_h^{(t)}\|^2_{A_h} - \log(\det(A_h)) + \sum_{(x_i,x_j) \in M} w_{ij}\, f_M(x_i, x_j)\,\mathbf{1}[h \neq l_j] + \sum_{(x_i,x_j) \in C} \overline{w}_{ij}\, f_C(x_i, x_j)\,\mathbf{1}[h = l_j] \Big)$
   2b. estimate_means: $\{\mu_h^{(t+1)}\}_{h=1}^{K} \leftarrow \Big\{ \frac{1}{|X_h^{(t+1)}|} \sum_{x \in X_h^{(t+1)}} x \Big\}_{h=1}^{K}$
   2c. update_metrics:
       $A_h = |X_h| \Big( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T + \sum_{(x_i,x_j) \in M_h} \tfrac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T \mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \overline{w}_{ij} \big( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T \big) \mathbf{1}[l_i = l_j] \Big)^{-1}$
   2d. $t \leftarrow t + 1$

Figure 1. MPCK-MEANS algorithm

3.1. Initialization

Good initial centroids are critical to the success of greedy clustering algorithms such as K-Means. To infer the initial clusters from the constraints, we take the transitive closure of the must-link constraints and augment the set $M$ with these entailed constraints (assuming consistency of the constraints). Let $\lambda$ be the number of connected components in the augmented set $M$. These connected components are used to create $\lambda$ neighborhood sets $\{N_p\}_{p=1}^{\lambda}$, where each neighborhood consists of points connected by must-links. For every pair of neighborhoods $N_p$ and $N_{p'}$ that have at least one cannot-link between them, we add cannot-link constraints between every pair of points in $N_p$ and $N_{p'}$ and augment the cannot-link set $C$ with these entailed constraints. We will overload notation from this point and refer
to the augmented must-link and cannot-link sets as $M$ and $C$ respectively.

After this preprocessing step, we get $\lambda$ neighborhood sets $\{N_p\}_{p=1}^{\lambda}$. These neighborhoods provide initial clusters for the MPCK-MEANS algorithm. If $\lambda \leq K$, we initialize $\lambda$ cluster centers with the centroids of all the $\lambda$ neighborhood sets. If $\lambda < K$, we initialize the remaining $K - \lambda$ clusters with points obtained by random perturbations of the global centroid of $X$. If $\lambda > K$, we select K neighborhood sets using a weighted variant of the farthest-first algorithm, which is a good heuristic for initialization in centroid-based clustering algorithms like K-Means.

In weighted farthest-first traversal, the goal is to find K points which are maximally separated from each other in terms of a weighted distance. In our case, the points are the centroids of the $\lambda$ neighborhoods, and the weight of each centroid is the size of its corresponding neighborhood. Thus, we bias farthest-first to select centroids which are relatively far apart but also represent large neighborhoods, in order to obtain good initial clusters. In weighted farthest-first traversal, we maintain a set of traversed points at every step, and pick as the next point the one having the farthest weighted distance from the traversed set (using the standard notion of distance from a set: $d(x, S) = \min_{y \in S} d(x, y)$), and so on. Finally, we initialize the K cluster centers with the centroids of the K neighborhoods chosen by weighted farthest-first traversal.

3.2. E-step

MPCK-MEANS alternates between cluster assignment in the E-step, and centroid estimation and metric learning in the M-step (see Step 2 in Fig. 1). In the E-step, every point $x$ is assigned to the cluster that minimizes the sum of the distance of $x$ to the cluster centroid according to the local metric and the cost of any constraint violations incurred by this cluster assignment.
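The assignment rule just described can be sketched for a single point as follows. This is a hedged illustration rather than the authors' code: it assumes diagonal metrics, uniform costs `w` and `w_bar`, constraints given as index pairs, a precomputed list `max_sq_dist` holding the squared distance of the maximally separated pair under each cluster's metric, and the convention that not-yet-assigned neighbors carry the label `None` and are ignored. All names are hypothetical.

```python
import math


def sq_dist_diag(x, y, a):
    """Squared distance under a diagonal metric with per-feature weights a."""
    return sum(a_d * (x_d - y_d) ** 2 for a_d, x_d, y_d in zip(a, x, y))


def assignment_cost(i, h, points, centroids, weights, labels,
                    must_link, cannot_link, max_sq_dist, w=1.0, w_bar=1.0):
    """Cost of putting point i into cluster h: metric distance, log-det term,
    and penalties for the constraints this assignment would violate."""
    x = points[i]
    cost = sq_dist_diag(x, centroids[h], weights[h])
    cost -= sum(math.log(a_d) for a_d in weights[h])
    for p, q in must_link:
        if i in (p, q):
            j = q if p == i else p
            if labels[j] is not None and labels[j] != h:
                # Violated must-link: penalty averages both clusters' metrics.
                f_m = 0.5 * (sq_dist_diag(x, points[j], weights[h])
                             + sq_dist_diag(x, points[j], weights[labels[j]]))
                cost += w * f_m
    for p, q in cannot_link:
        if i in (p, q):
            j = q if p == i else p
            if labels[j] == h:
                # Violated cannot-link: penalty is larger for nearby points.
                f_c = max_sq_dist[h] - sq_dist_diag(x, points[j], weights[h])
                cost += w_bar * f_c
    return cost


def assign_point(i, points, centroids, weights, labels,
                 must_link, cannot_link, max_sq_dist):
    """Return the index of the cluster with the lowest assignment cost."""
    costs = [assignment_cost(i, h, points, centroids, weights, labels,
                             must_link, cannot_link, max_sq_dist)
             for h in range(len(centroids))]
    return min(range(len(costs)), key=costs.__getitem__)


points = [(0.0, 0.0), (3.0, 0.0), (5.0, 0.0)]
centroids = [(0.0, 0.0), (5.0, 0.0)]
weights = [[1.0, 1.0], [1.0, 1.0]]
labels = [0, None, 1]
far = [25.0, 25.0]
print(assign_point(1, points, centroids, weights, labels, [], [], far))
print(assign_point(1, points, centroids, weights, labels, [(0, 1)], [], far))
```

In the toy data, point 1 is closer to the second centroid, but a must-link to point 0 makes the first cluster cheaper, so the constraint changes the assignment.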
Points are randomly re-ordered for each assignment sequence, and once a point $x$ is assigned to a cluster, the subsequent points in the random ordering use the current cluster assignment of $x$ to calculate possible constraint violations. Note that this assignment step is order-dependent, since the subsets of $M$ and $C$ relevant to each cluster may change with the assignment of a point. We experimented with random ordering as well as a greedy strategy that first assigned instances that are closest to the cluster centroid and involved in a minimal number of constraints. These experiments showed that the order of assignment does not result in statistically significant differences in clustering quality; therefore, we used random ordering in our evaluation.

In the E-step, each point moves to a new cluster only if the component of $J_{\mathrm{mpckm}}$ contributed by this point decreases. So when all points are given their new assignment, $J_{\mathrm{mpckm}}$ will decrease or remain the same.

3.3. M-step

In the M-step, every cluster centroid $\mu_h$ is first re-estimated using the points in the corresponding $X_h$. As a result, the contribution of each cluster to $J_{\mathrm{mpckm}}$ is minimized. The pairwise constraints do not take part in this centroid re-estimation step because the constraint violations only depend on cluster assignments, which do not change in this step. Thus, only the first term (the distance component) of $J_{\mathrm{mpckm}}$ is minimized. The centroid re-estimation step effectively remains the same as in K-Means.

The second part of the M-step performs metric learning, where the matrices $\{A_h\}_{h=1}^{K}$ are re-estimated to decrease the objective function $J_{\mathrm{mpckm}}$. Each updated matrix of local weights $A_h$ is obtained by taking the partial derivative $\partial J_{\mathrm{mpckm}} / \partial A_h$ and setting it to zero, resulting in:

$$A_h = |X_h| \Big( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T + \sum_{(x_i,x_j) \in M_h} \tfrac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T \mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \overline{w}_{ij} \big( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T \big) \mathbf{1}[l_i = l_j] \Big)^{-1} \qquad (7)$$

where $M_h$ and $C_h$ are subsets of must-link and cannot-link constraints respectively that contain points currently assigned to the $h$-th cluster.
Since each $A_h$ is obtained by inverting the summation of covariance matrices in Eqn. (7), that summation, $A_h^{-1}$, must not be singular. If any of the obtained $A_h^{-1}$ are singular, they can be conditioned by adding the identity matrix multiplied by a small fraction of the trace of $A_h^{-1}$: $A_h^{-1} = A_h^{-1} + \epsilon\,\mathrm{tr}(A_h^{-1})\,I$ (Saul & Roweis, 2003). If the $A_h$ resulting from the inversion is negative definite, it is mended by projecting onto the set $C = \{A : A \succeq 0\}$ of positive semi-definite matrices as described by Xing et al. (2003) to ensure that it parameterizes a distance metric.

For high-dimensional or large datasets, estimating the full matrix $A_h$ can be computationally expensive. In such cases diagonal weight matrices can be used, which is equivalent to feature weighting, while using the full matrix corresponds to feature generation. In the case of diagonal $A_h$, the d-th diagonal element, $a_{dd}^{(h)}$, corresponds to the weight of the d-th feature for the $h$-th cluster metric:

$$a_{dd}^{(h)} = |X_h| \Big( \sum_{x_i \in X_h} (x_{id} - \mu_{hd})^2 + \sum_{(x_i,x_j) \in M_h} \tfrac{1}{2} w_{ij} (x_{id} - x_{jd})^2\, \mathbf{1}[l_i \neq l_j] + \sum_{(x_i,x_j) \in C_h} \overline{w}_{ij} \big( (x'_{hd} - x''_{hd})^2 - (x_{id} - x_{jd})^2 \big) \mathbf{1}[l_i = l_j] \Big)^{-1} \qquad (8)$$
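The diagonal update of Eq. (8) can be sketched in plain Python. The sketch makes several simplifying assumptions: uniform costs `w` and `w_bar`, constraints given as index pairs, the maximally separated pair for the cluster passed in as `far_pair`, and no guard against a zero denominator (a feature with no spread and no constraint contribution would need the kind of conditioning described above). The function and parameter names are hypothetical.

```python
def update_diag_metric(h, points, centroids, labels, must_link, cannot_link,
                       far_pair, w=1.0, w_bar=1.0):
    """Eq. (8): per-feature weights of the diagonal metric for cluster h.

    far_pair is the pair (x', x'') assumed to be maximally separated under
    cluster h's current metric. No zero-denominator guard is included.
    """
    dims = len(centroids[h])
    members = [x for x, l in zip(points, labels) if l == h]
    # Within-cluster scatter of each feature around the centroid.
    denom = [sum((x[d] - centroids[h][d]) ** 2 for x in members)
             for d in range(dims)]
    for i, j in must_link:
        # Violated must-links that touch cluster h.
        if labels[i] != labels[j] and h in (labels[i], labels[j]):
            for d in range(dims):
                denom[d] += 0.5 * w * (points[i][d] - points[j][d]) ** 2
    for i, j in cannot_link:
        # Violated cannot-links inside cluster h.
        if labels[i] == labels[j] == h:
            for d in range(dims):
                gap = (far_pair[0][d] - far_pair[1][d]) ** 2
                denom[d] += w_bar * (gap - (points[i][d] - points[j][d]) ** 2)
    return [len(members) / s for s in denom]


points = [(0.0, 0.0), (4.0, 2.0), (6.0, 2.0)]
labels = [0, 0, 1]
centroids = [(2.0, 1.0), (6.0, 2.0)]
far_pair = (points[0], points[2])
print(update_diag_metric(0, points, centroids, labels, [], [], far_pair))
print(update_diag_metric(0, points, centroids, labels, [(1, 2)], [], far_pair))
```

Without constraints the weights are the inverse per-feature variances of the cluster; the violated must-link between points 1 and 2 lowers the weight of the first feature, along which the two points differ, so that they move closer under the new metric.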
Intuitively, the first term in the sum, $\sum_{x_i \in X_h} (x_{id} - \mu_{hd})^2$, scales the weight of each feature proportionately to the feature's contribution to the overall cluster dispersion, analogously to the scaling performed when computing unsupervised Mahalanobis distance. The last two terms, which depend on constraint violations, stretch each dimension in an attempt to mend the current violations. Thus, the metric weights are adjusted at each iteration in such a way that the contribution of different attributes to distance is variance-normalized, while constraint violations are minimized.

Instead of multiple metrics $\{A_h\}_{h=1}^{K}$, the algorithm can use a single metric $A$ for all clusters. The metric would be used and updated similarly to the description above, except that summations in Eqns. (7) and (8) would be over $X$, $M$, and $C$ instead of $X_h$, $M_h$, and $C_h$ respectively.

The objective function decreases after every cluster assignment, centroid re-estimation and metric learning step until convergence, implying that the MPCK-MEANS algorithm will converge to a local minimum of $J_{\mathrm{mpckm}}$ as long as the matrices $\{A_h\}_{h=1}^{K}$ are obtained directly from Eqn. (7). If any $A_h^{-1}$ is conditioned as described above to make it positive definite, or if the maximally separated points $\{(x'_h, x''_h)\}_{h=1}^{K}$ change between iterations, convergence is no longer guaranteed theoretically; however, empirically this has not been a problem in our experience.

4. Experiments

4.1. Methodology and Datasets

Experiments were conducted on three datasets from the UCI repository: Iris, Wine, and Ionosphere (Blake & Merz, 1998); the Protein dataset used by Xing et al. (2003) and Bar-Hillel et al. (2003); and randomly sampled subsets from the Digits and Letters handwritten character recognition datasets, also from the UCI repository. For Digits and Letters, we chose two sets of three classes: {I, J, L} from Letters and {3, 8, 9} from Digits, sampling 10% of the data points from the original datasets randomly. These classes were chosen since they represent difficult visual discrimination problems.
Table 1 summarizes the properties of the datasets: the number of instances N, the number of dimensions D, and the number of classes K.

Table 1. Datasets used in experimental evaluation (columns: Iris, Wine, Ionosphere, Protein, Letters, Digits; rows: N, D, K)

We have used pairwise F-measure to evaluate the clustering results based on the underlying classes. Pairwise F-measure relies on the traditional information retrieval measures, adapted for evaluating clustering by considering same-cluster pairs:

$$\mathrm{Precision} = \frac{\#\mathrm{PairsCorrectlyPredictedInSameCluster}}{\#\mathrm{TotalPairsPredictedInSameCluster}}$$

$$\mathrm{Recall} = \frac{\#\mathrm{PairsCorrectlyPredictedInSameCluster}}{\#\mathrm{TotalPairsInSameCluster}}$$

$$\mathrm{F\text{-}Measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

We generated learning curves with 5-fold cross-validation for each dataset to determine the effect of utilizing the pairwise constraints. Each point on the learning curve represents a particular number of randomly selected pairwise constraints given as input to the algorithm. Unit constraint costs $W$ and $\overline{W}$ were used for all constraints, original and inferred, since the datasets did not provide individual weights for the constraints. The clustering algorithm was run on the whole dataset, but the pairwise F-measure was calculated only on the test set. Results were averaged over 50 runs of 5 folds.

4.2. Results and Discussion

First, we compared constraint-based and metric-based semi-supervised clustering with the integrated framework as well as purely unsupervised and supervised approaches. Figs. 2-7 show learning curves for the six datasets.
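For reference, the pairwise precision, recall and F-measure defined in Section 4.1 can be computed with the short Python sketch below. It enumerates all pairs of points, so it is only practical for small test sets; returning 0.0 when a denominator is zero is a convention assumed here, not something the paper specifies, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(true_labels, predicted_labels):
    """Return (precision, recall, F-measure) over same-cluster pairs."""
    same_pred = same_true = both = 0
    for i, j in combinations(range(len(true_labels)), 2):
        in_pred = predicted_labels[i] == predicted_labels[j]
        in_true = true_labels[i] == true_labels[j]
        same_pred += in_pred            # pairs predicted in the same cluster
        same_true += in_true            # pairs truly in the same class
        both += in_pred and in_true     # pairs correctly predicted together
    # Zero denominators map to 0.0 (an assumed convention).
    precision = both / same_pred if same_pred else 0.0
    recall = both / same_true if same_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


true_labels = [0, 0, 1, 1]
predicted = [0, 0, 0, 1]
print(pairwise_f_measure(true_labels, predicted))
```

In the example, three pairs are predicted to share a cluster, two pairs truly share a class, and one pair is in both sets, giving a precision of 1/3 and a recall of 1/2.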
For each dataset, we compared five clustering schemes:

- MPCK-MEANS clustering, which involves both seeding and metric learning in the unified framework described in Section 2.4; a single metric parameterized by a diagonal matrix is used for all clusters;
- MK-MEANS, which is K-Means clustering with the metric learning component described in Section 3.3, without utilizing constraints for initialization; a single metric parameterized by a diagonal matrix is used for all clusters;
- PCK-MEANS clustering, which utilizes constraints for seeding the initial clusters and directs the cluster assignments to respect the constraints without doing any metric learning, as outlined in Section 2.2;
- K-MEANS unsupervised clustering;
- SUPERVISED-MEANS, which performs assignment of points to the nearest cluster centroids inferred from constraints, as described in Section 3.1. This algorithm provides a baseline for the performance of pure supervised learning based on constraints.

On the presented datasets, the unified approach (MPCK-MEANS) outperforms individual seeding (PCK-MEANS) and metric learning (MK-MEANS). Superiority of semi-supervised over unsupervised clustering illustrates that providing pairwise constraints is beneficial to clustering quality. Improvements of semi-supervised clustering over SUPERVISED-MEANS indicate that iterative refinement of
centroids using both constraints and unlabeled data outperforms purely supervised assignment based on neighborhoods inferred from constraints (for Ionosphere, MPCK-MEANS requires either the full weight matrix or individual cluster metrics to outperform SUPERVISED-MEANS; results for these experiments are shown in Fig. 11).

Figure 2. Iris: ablations
Figure 3. Wine: ablations
Figure 4. Protein: ablations
Figure 5. Ionosphere: ablations
Figure 6. Digits-389: ablations
Figure 7. Letters-IJL: ablations

For the Wine, Protein, and Letters-IJL datasets, the difference at zero pairwise constraints between methods that utilize metric learning (MPCK-MEANS and MK-MEANS) and those that do not (PCK-MEANS and regular K-MEANS) indicates that even in the absence of constraints, weighting features by their variance (essentially using unsupervised Mahalanobis distance) improves clustering accuracy. For the Wine dataset, additional constraints provide only an insubstantial improvement in cluster quality, which shows that meaningful feature weights are obtained from scaling by variance using just the unlabeled data.

Some of the metric learning curves display a characteristic dip, where clustering accuracy decreases when initial constraints are provided, but after a certain point starts to increase and eventually rises above the initial point on the learning curve. We conjecture that this phenomenon is due to the fact that metric parameters learned using few constraints are unreliable, and a significant number of constraints is required by the metric learning mechanism to estimate parameters accurately. On the other hand, seeding the clusters with a small number of pairwise constraints has an immediate positive effect on the final cluster quality, while providing more pairwise constraints has diminishing returns, i.e., PCK-MEANS learning curves rise slowly.
When both seeding and metric learning are utilized, the unified approach benefits from the individual strengths of the two methods, as can be seen from the MPCK-MEANS results.

In another set of experiments, we evaluated the utility of using individual metrics for each cluster and the usefulness of learning a full weight matrix $A$ (feature generation) as opposed to a diagonal matrix (feature weighting). We have also compared our methods with RCA, a semi-supervised clustering algorithm that performs metric learning separately from the clustering process (Bar-Hillel et al., 2003), and that has been shown to outperform a similar approach by Xing et al. (2003). Figs. 8-13 show learning curves for the six datasets on the following clustering schemes:

- MPCK-MEANS-S-D, which is the same as MPCK-MEANS in Figs. 2-7 and involves both seeding and metric learning; a single metric (S) parameterized by a diagonal matrix (D) is used for all clusters;
- MPCK-MEANS-M-D, which involves both seeding and metric learning; multiple metrics (M) parameterized by diagonal matrices (D) are used;
- MPCK-MEANS-S-F, which involves both seeding and metric learning; a single metric (S) parameterized by a full matrix (F) is used for all clusters;
- MPCK-MEANS-M-F, which involves both seeding and metric learning; multiple metrics (M) parameterized by full matrices (F) are used;
- RCA clustering, which uses the distance metric learning described in (Bar-Hillel et al., 2003) and initialization inferred from constraints as described in Section 3.1.

Figure 8. Iris: metric learning
Figure 9. Wine: metric learning
Figure 10. Protein: metric learning
Figure 11. Ionosphere: metric learning
Figure 12. Digits-389: metric learning
Figure 13. Letters-IJL: metric learning

As can be seen from the results, both full matrix parameterization and individual metrics for each cluster can lead to significant improvements in clustering quality. However, the relative usefulness of these two techniques varies between the datasets, e.g., multiple metrics are particularly beneficial for the Protein and Digits datasets, while switching from a diagonal to a full weight matrix leads to large improvements on Wine, Ionosphere, and Letters. These results can be explained by the fact that the relative success of the two techniques depends on the properties of a particular dataset: using a full weight matrix helps when the attributes are highly correlated, while multiple metrics lead to improvements when clusters in the dataset are of different shapes or lie in different subspaces of the original space. A combination of the two techniques is most helpful when both of these requirements are satisfied, as for Iris and Digits, which was observed by visualizing these datasets. For other datasets, either multiple metrics or a full weight matrix leads to maximum performance in isolation.

Comparing the performance of different variants of MPCK-MEANS with RCA, we can see that early on the learning curves, where few pairwise constraints are available, RCA leads to better metrics than MPCK-MEANS. However, as more training data is provided, the ability of MPCK-MEANS to learn from both supervised and unsupervised data as well as use individual metrics allows MPCK-MEANS to produce better clustering.
Overall, our results indicate that the integrated approach to utilizing pairwise constraints in clustering with individual metrics outperforms seeding and metric learning individually and leads to improvements in cluster quality. Extending the basic approach with a full parameterization matrix and individual metrics for each cluster can lead to significant improvements over the basic method.

5. Related Work

In previous work on constrained pairwise clustering, Wagstaff et al. (2001) proposed the COP-KMeans algorithm, which has a heuristically motivated objective function. Our formulation, on the other hand, has an underlying generative model based on Hidden Markov Random Fields (see (Basu et al., 2004) for a detailed analysis). Bansal et al. (2002) also proposed a framework for pairwise constrained clustering, but their model performs clustering using only the constraints, whereas our formulation uses both constraints and an underlying distance metric between the points for clustering. Schultz and Joachims (2004) recently introduced a method for learning distance metric parameters based on relative comparisons. In unsupervised clustering, Domeniconi (2002) proposed a variant of K-Means that incorporated learning individual Euclidean metric weights for each cluster; our approach is more general since it allows metric learning to utilize pairwise constraints along with unlabeled data.
In recent work on semi-supervised clustering with pairwise constraints, Cohn et al. (2003) used gradient descent for weighted Jensen-Shannon divergence in the context of EM clustering. Xing et al. (2003) utilized a combination of gradient descent and iterative projections to learn a Mahalanobis metric for K-Means clustering. Also, Bar-Hillel et al. (2003) proposed a Relevant Component Analysis (RCA) algorithm that uses only must-link constraints to learn a Mahalanobis metric using convex optimization. All these metric learning techniques for clustering train a single metric first using only supervised data, and then perform clustering on the unsupervised data. In contrast, our method integrates distance metric learning with the clustering process and utilizes both supervised and unsupervised data to learn multiple metrics, which experimentally leads to improved results. Finally, a unified objective function for semi-supervised clustering with constraints was recently proposed by Segal et al. (2003); however, it did not incorporate distance metric learning.

6. Conclusions and Future Work

This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering that unifies the previous constraint-based and metric-based methods. It is based on a variation of the standard K-Means clustering algorithm and uses pairwise constraints along with unlabeled data for constraining the clustering and learning distance metrics. In contrast to previously proposed semi-supervised clustering algorithms, MPCK-MEANS also allows clusters to lie in different subspaces and have different shapes.

By ablating the individual components of our integrated approach, we have experimentally compared metric learning and constraints in isolation with the combined algorithm. Our results have shown that by unifying the advantages of both techniques, the integrated approach outperforms the two techniques individually.
We have shown that using individual metrics for different clusters, as well as performing feature generation via a full weight matrix in contrast to feature weighting with a diagonal weight matrix, can lead to improvements over our basic algorithm. Extending our approach to high-dimensional datasets, where Euclidean distance performs poorly, is the primary avenue for future research. Other interesting topics for future work include selection of the most informative pairwise constraints that would facilitate accurate metric learning and obtaining good initial centroids, as well as methodology for handling noisy constraints and cluster initialization sensitive to constraint costs.

7. Acknowledgments

We would like to thank anonymous reviewers and Joel Tropp for insightful comments. This research was supported in part by NSF grants IIS and IIS , and by a Faculty Fellowship from IBM Corp.

References

Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS-02) (pp ).
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proceedings of 20th International Conference on Machine Learning (ICML-2003) (pp ).
Basu, S., Banerjee, A., & Mooney, R. J. (2002). Semi-supervised clustering by seeding. Proceedings of 19th International Conference on Machine Learning (ICML-2002) (pp ).
Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In submission, available at http:// ml/publication.
Bilenko, M., & Mooney, R. J. (2003). Adaptive duplicate detection using learnable string similarity measures. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003) (pp ).
Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (Tech. Report ICSI-TR ). ICSI.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. http:// mlearn/mlrepository.html.
Cohn, D., Caruana, R., & McCallum, A. (2003). Semi-supervised clustering with user feedback (Tech. Report TR ). Cornell University.
Demiriz, A., Bennett, K. P., & Embrechts, M. J. (1999). Semi-supervised clustering using genetic algorithms. Artificial Neural Networks in Engineering (ANNIE-99) (pp ).
Domeniconi, C. (2002). Locally adaptive techniques for pattern classification. Doctoral dissertation, University of California, Riverside.
Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002) (pp ).
Kleinberg, J., & Tardos, E. (1999). Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS-99) (pp ).
Saul, L., & Roweis, S. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4.
Segal, E., Wang, H., & Koller, D. (2003). Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, i264-i272.
Schultz, M., & Joachims, T. (2004). Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems 16.
Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained clustering with background knowledge. Proceedings of 18th International Conference on Machine Learning (ICML-2001) (pp ).
Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15 (pp ).