(Graph-based) Semi-Supervised Learning. Partha Pratim Talukdar Indian Institute of Science
1 (Graph-based) Semi-Supervised Learning Partha Pratim Talukdar Indian Institute of Science April 7, 2015
2-3 Supervised Learning: Labeled Data → Learning Algorithm → Model. Examples: Decision Trees, Support Vector Machine (SVM), Maximum Entropy (MaxEnt)
4-5 Semi-Supervised Learning (SSL): Labeled Data + A Lot of Unlabeled Data → Learning Algorithm → Model. Examples: Self-Training, Co-Training
6-12 Why SSL? How can unlabeled data be helpful? Without unlabeled data, the decision boundary is estimated from the labeled instances alone; with unlabeled data, the unlabeled instances reveal more of the underlying structure, yielding a more accurate decision boundary. Example from [Belkin et al., JMLR 2006]
13-22 Inductive vs Transductive

                                   Inductive                     Transductive
                                   (generalizes to unseen data)  (does not generalize to unseen data)
Supervised (Labeled)               SVM, Maximum Entropy          X
Semi-supervised (Labeled +         Manifold Regularization       Label Propagation (LP), MAD, MP, ...
Unlabeled)                                                       (focus of this talk)

Most graph SSL algorithms are non-parametric (i.e., the number of parameters grows with data size). See Chapter 25 of the SSL Book.
23-24 Two Popular SSL Algorithms: Self-Training and Co-Training
25-30 Why Graph-based SSL?
- Some datasets are naturally represented by a graph: the web, citation networks, social networks, ...
- Uniform representation for heterogeneous data
- Easily parallelizable, scalable to large data
- Effective in practice (figure: text classification accuracy comparing Graph SSL, Non-Graph SSL, and Supervised baselines)
31-35 Graph-based SSL: instances are nodes, weighted edges encode similarity, and a few nodes are seeded with labels (e.g., business, politics) that are then spread over the graph.
36-39 Graph-based SSL. Smoothness Assumption: if two instances are similar according to the graph, then their output labels should be similar. Two stages: (1) graph construction (if a graph is not already present); (2) label inference.
40-41 Outline: Motivation, Graph Construction, Inference Methods, Scalability, Applications, Conclusion & Future Work
42 Graph Construction: Neighborhood Methods (k-NN Graph Construction (k-NNG), ε-Neighborhood Method), Metric Learning, other approaches
43-47 Neighborhood Methods
- k-Nearest Neighbor Graph (k-NNG): add edges between each instance and its k nearest neighbors (illustrated with k = 3)
- ε-neighborhood: add edges to all instances inside a ball of radius ε
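As a concrete sketch (not from the slides; Euclidean distance and toy points are illustrative assumptions), the two constructions can be written as:

```python
import numpy as np

def knn_graph(X, k):
    """Symmetrized k-NN graph: connect each point to its k nearest neighbors."""
    n = len(X)
    # pairwise squared Euclidean distances
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]   # skip self (distance 0)
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)              # symmetrize (raw k-NN is asymmetric)

def eps_graph(X, eps):
    """ε-neighborhood graph: connect points within distance ε."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return ((D <= eps) & (D > 0)).astype(float)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(knn_graph(X, 1))   # the far point still attaches to its nearest neighbor
print(eps_graph(X, 0.5)) # the far point is isolated: a fragmented graph
```

Note how the far point stays connected under k-NNG but becomes an isolated component under ε-neighborhood, previewing the issues discussed next.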
48-53 Issues with k-NNG
- Not scalable: brute-force construction is quadratic in the number of instances
- Results in an asymmetric relation: b may be the closest neighbor of a, but not the other way around
- Results in irregular graphs: some nodes may end up with much higher degree than others (e.g., a node of degree 4 in a k-NNG with k = 1)
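The asymmetry is easy to see in a tiny sketch (hypothetical 1-D points, k = 1):

```python
# b is a's nearest neighbor, but b's nearest neighbor is c, not a,
# so the raw k-NN relation a -> b has no reciprocal edge b -> a.
points = {"a": 0.0, "b": 2.0, "c": 3.0}

def nearest(name):
    others = {n: abs(x - points[name]) for n, x in points.items() if n != name}
    return min(others, key=others.get)

print(nearest("a"))  # b
print(nearest("b"))  # c
```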
54-58 Issues with ε-neighborhood
- Not scalable
- Sensitive to the value of ε: not invariant to scaling of the data
- Fragmented graph: can leave disconnected components (figure from [Jebara et al., ICML 2009]: disconnected data under ε-neighborhood)
59-62 Graph Construction using Metric Learning
Edge weights are set from a learned distance: w_ij ∝ exp(−D_A(x_i, x_j)), where D_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j) is a Mahalanobis distance whose matrix A is estimated using metric learning algorithms.
- Supervised metric learning: ITML [Kulis et al., ICML 2007], LMNN [Weinberger and Saul, JMLR 2009]
- Semi-supervised metric learning: IDML [Dhillon et al., UPenn TR 2010]
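A minimal sketch of the weighting scheme (the matrix A here is hand-picked for illustration; in practice it would come from a metric learner such as ITML or LMNN):

```python
import numpy as np

def mahalanobis_weights(X, A):
    """Edge weights w_ij = exp(-D_A(x_i, x_j)), D_A a Mahalanobis distance.

    A is assumed positive semi-definite (as a learned metric would be).
    """
    diffs = X[:, None, :] - X[None, :, :]            # shape (n, n, d)
    D = np.einsum("ijd,de,ije->ij", diffs, A, diffs) # (x_i - x_j)^T A (x_i - x_j)
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)                         # no self-loops
    return W

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = np.diag([0.1, 10.0])   # hypothetical metric: the y-axis matters far more
W = mahalanobis_weights(X, A)
print(W.round(4))          # pairs differing along y get much smaller weights
```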
63-67 Benefits of Metric Learning for Graph Construction
(bar chart: test error on Amazon, Newsgroups, Reuters, Enron A, and Text datasets for graphs built from the original features, RP, PCA, ITML (supervised metric learning), and IDML (semi-supervised metric learning [Dhillon et al., UPenn TR 2010]); 100 seed and 1400 test instances, all inference using LP)
Careful graph construction is critical!
68 Other Graph Construction Approaches
- Local Reconstruction: Linear Neighborhood [Wang and Zhang, ICML 2005]
- Regular Graph: b-matching [Jebara et al., ICML 2008]
- Fitting Graph to Vector Data [Daitch et al., ICML 2009]
- Graph Kernels [Zhu et al., NIPS 2005]
69 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work
70-73 Graph Laplacian
Laplacian (un-normalized) of a graph: L = D − W, where D_ii = Σ_j W_ij and D_ij = 0 for j ≠ i.
Example graph on nodes a, b, c, d with edge weights a–b = 1, a–c = 2, b–c = 3, c–d = 1:
        a   b   c   d
   a    3  -1  -2   0
   b   -1   4  -3   0
   c   -2  -3   6  -1
   d    0   0  -1   1
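A quick numerical check of this definition (assuming the example graph's edges are a–b = 1, a–c = 2, b–c = 3, c–d = 1, as best read from the slide):

```python
import numpy as np

# Weighted adjacency for the 4-node example
W = np.array([
    [0, 1, 2, 0],
    [1, 0, 3, 0],
    [2, 3, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij
L = D - W                    # un-normalized graph Laplacian

print(L)
print(L.sum(axis=1))         # each row sums to 0
print(np.linalg.eigvalsh(L)) # all eigenvalues >= 0: L is positive semi-definite
```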
74-81 Graph Laplacian (contd.)
- L is positive semi-definite (assuming non-negative edge weights)
- Smoothness of a prediction f over the graph, where f is the vector of scores for a single label on the nodes, in terms of the Laplacian:
  f^T L f = Σ_{(i,j) ∈ E} W_ij (f_i − f_j)^2   (a measure of non-smoothness)
- Example, on the 4-node graph with edge weights a–b = 1, a–c = 2, b–c = 3, c–d = 1:
  f^T = [1 1 1 3] gives f^T L f = 4: smooth
  f^T = [1 10 5 25] gives f^T L f = 588: not smooth
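The two smoothness values can be verified numerically (assuming the same reading of the example graph: edges a–b = 1, a–c = 2, b–c = 3, c–d = 1):

```python
import numpy as np

# Same 4-node example graph
W = np.array([
    [0, 1, 2, 0],
    [1, 0, 3, 0],
    [2, 3, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
L = np.diag(W.sum(axis=1)) - W

def non_smoothness(f):
    """f^T L f = sum over edges of W_ij (f_i - f_j)^2."""
    return f @ L @ f

print(non_smoothness(np.array([1.0, 1.0, 1.0, 3.0])))   # 4.0   (smooth)
print(non_smoothness(np.array([1.0, 10.0, 5.0, 25.0]))) # 588.0 (not smooth)
```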
82 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Manifold Regularization, Spectral Graph Transduction, Measure Propagation), Scalability, Applications, Conclusion & Future Work
83 Notations
- Y_vl: score of seed label l on node v (Seed Scores)
- Ŷ_vl: score of estimated label l on node v (Estimated Scores)
- R_vl: regularization target for label l on node v (Label Regularization)
- S: seed node indicator (diagonal matrix)
- W_uv: weight of edge (u, v) in the graph
84-88 LP-ZGL [Zhu et al., ICML 2003]
  argmin_Ŷ Σ_{l=1}^m Σ_{u,v} W_uv (Ŷ_ul − Ŷ_vl)^2 = Σ_{l=1}^m Ŷ_l^T L Ŷ_l   (smooth; L is the graph Laplacian)
  such that Ŷ_ul = Y_ul for all u with S_uu = 1   (match seeds: hard constraint)
- Smoothness: two nodes connected by an edge with high weight should be assigned similar labels
- The solution satisfies the harmonic property
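A minimal sketch of the closed-form harmonic solution, partitioning nodes into labeled (L) and unlabeled (U) sets, which is a standard way to solve this objective (the chain graph and seed values below are illustrative, not from the slides):

```python
import numpy as np

# Chain graph a - b - c - d with unit edge weights; a and d are seeded.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
labeled, unlabeled = [0, 3], [1, 2]
Y_L = np.array([1.0, 0.0])            # seed scores for one label on a and d

L = np.diag(W.sum(axis=1)) - W
# Harmonic solution: f_U = L_UU^{-1} W_UL Y_L
L_UU = L[np.ix_(unlabeled, unlabeled)]
W_UL = W[np.ix_(unlabeled, labeled)]
f_U = np.linalg.solve(L_UU, W_UL @ Y_L)
print(f_U)   # interior nodes interpolate between the seeds: [2/3, 1/3]
```

Each unlabeled node ends up with the average of its neighbors' scores, which is exactly the harmonic property.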
89 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work
90-97 Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009]
  argmin_Ŷ Σ_{l=1}^{m+1} [ ‖S Ŷ_l − S Y_l‖² + μ1 Σ_{u,v} M_uv (Ŷ_ul − Ŷ_vl)² + μ2 ‖Ŷ_l − R_l‖² ]
  The three terms: match seeds (soft), smoothness, match priors (regularizer).
- m labels, plus 1 dummy label for "none-of-the-above"
- M = W + W^T is the symmetrized weight matrix
- Ŷ_vl: weight of label l on node v; Y_vl: seed weight for label l on node v
- S: diagonal matrix, nonzero for seed nodes; R_vl: regularization target for label l on node v
- MAD's objective is convex
- MAD has extra regularization compared to LP-ZGL [Zhu et al., ICML 2003]; similar to QC [Bengio et al., 2006]
98-100 Solving MAD Objective
- Can be solved using matrix inversion (as in LP), but matrix inversion is expensive
- Instead solved exactly as a system of linear equations (Ax = b) using Jacobi iterations, which yields iterative updates with guaranteed convergence; see [Bengio et al., 2006] and [Talukdar and Crammer, ECML 2009] for details
101-103 Solving MAD using Iterative Updates
Inputs: Y, R: |V| × (|L| + 1); W: |V| × |V|; S: |V| × |V| diagonal
  Ŷ ← Y
  M = W + W^T
  Z_v ← S_vv + μ1 Σ_{u≠v} M_vu + μ2, for all v ∈ V
  repeat
    for all v ∈ V:
      Ŷ_v ← ( (S Y)_v + μ1 M_v Ŷ + μ2 R_v ) / Z_v
  until convergence
Each update mixes the node's seed score, the current label estimates of its neighbors, and its prior. The importance of a node can be discounted, and the updates are easily parallelizable: scalable (more later).
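A compact sketch of these Jacobi-style updates (the toy chain graph and the hyperparameter values μ1, μ2 are illustrative assumptions, and the update is simplified relative to the full MAD algorithm):

```python
import numpy as np

def mad(W, Y, R, S, mu1=1.0, mu2=0.01, iters=100):
    """Iterative updates for the MAD objective (simplified sketch)."""
    M = W + W.T
    Yhat = Y.copy()
    Z = np.diag(S) + mu1 * M.sum(axis=1) + mu2       # per-node normalizer Z_v
    for _ in range(iters):
        # Yhat_v <- (S Y)_v + mu1 * M_v Yhat + mu2 * R_v, divided by Z_v
        Yhat = (S @ Y + mu1 * M @ Yhat + mu2 * R) / Z[:, None]
    return Yhat

# Chain a - b - c - d; node a seeded with label 0, node d with label 1.
W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
Y = np.array([[1,0],[0,0],[0,0],[0,1]], dtype=float)
R = np.zeros((4, 2))                  # zero prior targets
S = np.diag([1.0, 0.0, 0.0, 1.0])    # seed indicator
Yhat = mad(W, Y, R, S)
print(Yhat.round(3))  # label-0 mass decays from a towards d, and vice versa
```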
104-106 When is MAD most effective?
(figure: relative increase in MRR by MAD over LP-ZGL, plotted against average degree of the graph)
MAD is particularly effective in denser graphs, where there is greater need for regularization.
107-113 Extension to Dependent Labels
- Labels are not always mutually exclusive (e.g., label similarity in sentiment classification)
- Modified Adsorption with Dependent Labels (MADDL) [Talukdar and Crammer, ECML 2009]:
  - can take label similarities into account
  - convex objective
  - efficient iterative/parallelizable updates, as in MAD
114 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work
115-120 Measure Propagation (MP) [Subramanya and Bilmes, EMNLP 2008, NIPS 2009, JMLR 2011]
  C_KL = Σ_{i=1}^l D_KL(r_i ‖ p_i) + μ Σ_{i,j} w_ij D_KL(p_i ‖ p_j) − ν Σ_{i=1}^n H(p_i)
  argmin over {p_i}, subject to Σ_y p_i(y) = 1 and p_i(y) ≥ 0 for all y, i
- First term: divergence on seed nodes; r_i and p_i are the seed and estimated label distributions (normalized) on node i
- Second term: smoothness (divergence across edges), with KL divergence D_KL(p_i ‖ p_j) = Σ_y p_i(y) log (p_i(y) / p_j(y))
- Third term: entropic regularizer, with entropy H(p_i) = − Σ_y p_i(y) log p_i(y)
- Normalization constraint: the scores on each node form a probability distribution
- C_KL is convex (with non-negative edge weights and hyper-parameters)
- MP is related to Information Regularization [Corduneanu and Jaakkola, 2003]
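To make the objective concrete, a small evaluator of the three terms (the toy distributions, graph, and hyperparameter values μ, ν are illustrative assumptions):

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def entropy(p):
    return float(-np.sum(p * np.log(p)))

def c_kl(P, R, W, n_seeds, mu=1.0, nu=0.1):
    """MP objective: seed divergence + mu * edge smoothness - nu * entropy."""
    seed_term = sum(kl(R[i], P[i]) for i in range(n_seeds))
    smooth_term = sum(W[i, j] * kl(P[i], P[j])
                      for i in range(len(P)) for j in range(len(P)) if W[i, j] > 0)
    ent_term = sum(entropy(p) for p in P)
    return seed_term + mu * smooth_term - nu * ent_term

# Two-label toy problem on a 3-node chain; node 0 is the only seed.
P = np.array([[0.9, 0.1], [0.7, 0.3], [0.6, 0.4]])
R = np.array([[0.99, 0.01]])
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(c_kl(P, R, W, n_seeds=1))
```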
121-126 Solving MP Objective
For ease of optimization, reformulate the MP objective:
  C_MP = Σ_{i=1}^l D_KL(r_i ‖ q_i) + μ Σ_{i,j} w′_ij D_KL(p_i ‖ q_j) − ν Σ_{i=1}^n H(p_i)
  argmin over {p_i, q_i}
- q_i is a new probability measure, one for each vertex, playing a role similar to p_i
- w′_ij = w_ij + α δ(i, j): the added self-weight encourages agreement between p_i and q_i
- C_MP is also convex (with non-negative edge weights and hyper-parameters)
- C_MP can be solved using Alternating Minimization (AM)
127-136 Alternating Minimization
Starting from Q0, alternately minimize C_MP over one block of measures while holding the other fixed: Q0 → P1 → Q1 → P2 → Q2 → P3 → ..., with each step decreasing the objective. C_MP satisfies the necessary conditions for AM to converge [Subramanya and Bilmes, JMLR 2011].
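Alternating minimization itself is easy to illustrate on any objective with two blocks of variables; the sketch below uses a toy quadratic (not the MP objective) purely to show the alternating scheme and the monotone descent that the convergence result guarantees:

```python
# f(p, q) = (p - q)^2 + (p - 1)^2 + (q + 1)^2 : a toy two-block objective.
def f(p, q):
    return (p - q) ** 2 + (p - 1) ** 2 + (q + 1) ** 2

p, q = 0.0, 5.0
values = [f(p, q)]
for _ in range(10):
    p = (q + 1) / 2.0          # exact argmin over p with q fixed (df/dp = 0)
    q = (p - 1) / 2.0          # exact argmin over q with p fixed (df/dq = 0)
    values.append(f(p, q))

# Each exact block minimization can only decrease the objective.
assert all(a >= b for a, b in zip(values, values[1:]))
print(values[0], "->", values[-1])   # converges towards the minimum, 4/3
```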
137 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work
138 Background: Factor Graphs [Kschischang et al., 2001]
A factor graph is a bipartite graph over variable nodes (e.g., the label distribution on a node) and factor nodes, where each factor scores the fitness of an assignment to the variables connected to it; the distribution over all variables factorizes as a product over the factors, each applied to the variables connected to that factor.
139-145 Factor Graph Interpretation of Graph SSL [Zhu et al., ICML 2003] [Das and Smith, NAACL 2012]
3-term Graph SSL objective (common to many algorithms):
  min (Seed Matching Loss, if any) + (Edge Smoothness Loss) + (Regularization Loss)
Each term becomes a factor:
- Smoothness factor on each edge: ψ(q_1, q_2) ∝ exp(−w_{1,2} ‖q_1 − q_2‖²₂)
- Seed matching factor (unary): encourages agreement with the seed labels r_i
- Regularization factor (unary): −log ψ_t(q_t) on every node
(figure: a factor graph over variable nodes q_i with unary seed factors on seeded nodes, unary regularization factors on all nodes, and pairwise smoothness factors on edges)
146-149 Label Propagation with Sparsity
Sparsity in the label distributions can be enforced through a sparsity-inducing unary factor −log ψ_t(q_t), e.g., a Lasso penalty (Tibshirani, 1996) or an Elitist Lasso penalty (Kowalski and Torrésani, 2009). For more details, see [Das and Smith, NAACL 2012].
150 Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work
151-155 Manifold Regularization [Belkin et al., JMLR 2006]
  f* = argmin_f (1/l) Σ_{i=1}^l V(y_i, f(x_i)) + λ1 f^T L f + λ2 ‖f‖²_K
- First term: training-data loss, with loss function V (e.g., soft margin)
- Second term: smoothness regularizer, where L is the Laplacian of a graph over both labeled and unlabeled data
- Third term: standard regularizer (e.g., L2)
Trains an inductive classifier which can generalize to unseen instances.
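A sketch that just evaluates the three terms for a linear model f(x) = w·x on toy data, to show how the pieces fit together (squared loss stands in for V; the data, graph, and λ values are illustrative assumptions):

```python
import numpy as np

def manifold_reg_objective(w, X_lab, y_lab, X_all, W, lam1=0.1, lam2=0.01):
    """(1/l) sum_i V(y_i, f(x_i)) + lam1 * f^T L f + lam2 * ||w||^2."""
    loss = np.mean((y_lab - X_lab @ w) ** 2)   # squared loss as V
    f_all = X_all @ w                          # predictions on labeled + unlabeled
    L = np.diag(W.sum(axis=1)) - W             # Laplacian over all instances
    smooth = f_all @ L @ f_all
    return loss + lam1 * smooth + lam2 * w @ w

X_lab = np.array([[0.0, 1.0], [1.0, 0.0]])
y_lab = np.array([1.0, -1.0])
X_all = np.vstack([X_lab, [[0.1, 0.9], [0.9, 0.1]]])  # two unlabeled points
# Each unlabeled point is linked to the labeled point it is nearest to.
W = np.array([[0,0,1,0],[0,0,0,1],[1,0,0,0],[0,1,0,0]], dtype=float)
w = np.array([-1.0, 1.0])
print(manifold_reg_objective(w, X_lab, y_lab, X_all, W))
```

Note that the smoothness term is computed over all four instances, labeled and unlabeled alike, which is where the unlabeled data enters the objective.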
156 Other Graph-based SSL Methods
- TACO [Orbach and Crammer, ECML 2012]
- SSL on Directed Graphs [Zhou et al., NIPS 2005], [Zhou et al., ICML 2005]
- Spectral Graph Transduction [Joachims, ICML 2003]
- Graph SSL for Ordering [Talukdar et al., CIKM 2012]
- Learning with dissimilarity edges [Goldberg et al., AISTATS 2007]
157 Outline: Motivation, Graph Construction, Inference Methods, Scalability (Scalability Issues, Node Reordering, MapReduce Parallelization), Applications, Conclusion & Future Work
158-160 More (Unlabeled) Data is Better Data [Subramanya & Bilmes, JMLR 2011]
- Graph with 120m vertices
- Challenges with large unlabeled data: constructing the graph from large data; scalable inference over large graphs
161 Outline: Motivation, Graph Construction, Inference Methods, Scalability (Scalability Issues, Node Reordering [Subramanya & Bilmes, JMLR 2011; Bilmes & Subramanya, 2011], MapReduce Parallelization), Applications, Conclusion & Future Work
162 Label Update using Message Passing [figure: SMP with k processors; each processor is assigned a graph node (neighbors not shown) and computes a new label estimate for its node by combining the current label estimates of its neighbors with the node's seed prior] 45
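The per-node update each processor applies can be sketched as a simple weighted average of neighbor estimates plus a seed-prior term. This is an illustrative simplification, not the exact update equations; the names `graph`, `labels`, `seeds`, `mu_seed`, and `mu_nbr` are assumptions.

```python
def update_node(v, labels, graph, seeds, mu_seed=1.0, mu_nbr=1.0):
    """One label update for node v: a convex combination of neighbors'
    current label scores and v's own seed prior, if it has one.
    (A simplified weighted-average update, not the exact MAD equations.)"""
    scores, total = {}, 0.0
    for u, w in graph[v]:                        # (neighbor, edge weight)
        for lbl, s in labels.get(u, {}).items():
            scores[lbl] = scores.get(lbl, 0.0) + mu_nbr * w * s
        total += mu_nbr * w
    for lbl, s in seeds.get(v, {}).items():      # injected seed labels
        scores[lbl] = scores.get(lbl, 0.0) + mu_seed * s
        total += mu_seed
    return {lbl: s / total for lbl, s in scores.items()}

# Node b has neighbors a (labeled X) and c (labeled Y), and no seed.
graph = {'b': [('a', 1.0), ('c', 1.0)]}
labels = {'a': {'X': 1.0}, 'c': {'Y': 1.0}}
print(update_node('b', labels, graph, seeds={}))   # {'X': 0.5, 'Y': 0.5}
```

Because each node's update reads only its own neighborhood, distinct processors can run `update_node` on different nodes in parallel.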
164 Node Reordering Algorithm: Intuition. Which node should be processed along with k? The one with the highest intersection of its neighborhood with k's (cardinality of intersection). In the example graph, |N(k) ∩ N(a)| = 1, |N(k) ∩ N(b)| = 2, and |N(k) ∩ N(c)| = 0, so b is the best node to process next. 46
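The intuition above can be sketched as a greedy procedure (a simplification of the heuristic, not the exact published algorithm; `neighbors` maps each node to its neighbor set and is an assumed representation):

```python
def reorder_nodes(neighbors, start):
    """Greedy reordering: after processing node k, next process the
    unprocessed node whose neighborhood overlaps most with N(k), so
    consecutive nodes reuse cached neighbor data."""
    order = [start]
    remaining = set(neighbors) - {start}
    while remaining:
        k = order[-1]
        best = max(remaining, key=lambda j: len(neighbors[k] & neighbors[j]))
        order.append(best)
        remaining.remove(best)
    return order

# The slide's example: |N(k) ∩ N(a)| = 1, |N(k) ∩ N(b)| = 2, |N(k) ∩ N(c)| = 0.
neighbors = {
    'k': {1, 2, 3},
    'a': {1, 9},
    'b': {2, 3, 8},
    'c': {7},
}
print(reorder_nodes(neighbors, 'k'))   # 'b' is processed right after 'k'
```

Placing nodes with overlapping neighborhoods next to each other improves cache locality when processors update consecutive nodes.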
172 Speed-up on SMP after Node Ordering [Subramanya & Bilmes, JMLR 2011] 47
173 Outline Motivation Graph Construction Inference Methods Scalability Applications - Scalability Issues - Node reordering - MapReduce Parallelization Conclusion & Future Work 48
174 MapReduce Implementation of MAD. Map: each node sends its current label assignments to its neighbors. Reduce: each node updates its own label assignment using the messages received from its neighbors and its own information (e.g., seed labels, regularization penalties, etc.). Repeat until convergence. Code in the Junto Label Propagation Toolkit (includes a Hadoop-based implementation). Graph-based algorithms are amenable to distributed processing. 49
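The map/reduce loop described above can be sketched in plain Python, with the shuffle step modeled as grouping messages by destination node. The weighted-average update here is a simplified stand-in for the full MAD equations, and the function and variable names are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(graph, labels):
    """Map: each node emits its current label scores to every neighbor."""
    for v, nbrs in graph.items():
        for u, w in nbrs:                         # (neighbor, edge weight)
            yield u, (w, labels.get(v, {}))

def reduce_phase(messages, seeds):
    """Reduce: each node re-estimates its labels from incoming messages
    plus its own seed labels (a stand-in for the full MAD update)."""
    new_labels = {}
    for v, msgs in messages.items():
        scores, total = defaultdict(float), 0.0
        for w, nbr_labels in msgs:
            for lbl, s in nbr_labels.items():
                scores[lbl] += w * s
            total += w
        for lbl, s in seeds.get(v, {}).items():   # seed prior, weight 1
            scores[lbl] += s
            total += 1.0
        new_labels[v] = {lbl: s / total for lbl, s in scores.items()}
    return new_labels

def iterate(graph, labels, seeds, rounds=10):
    for _ in range(rounds):
        grouped = defaultdict(list)               # the "shuffle" step
        for key, val in map_phase(graph, labels):
            grouped[key].append(val)
        labels = reduce_phase(grouped, seeds)
    return labels

# Toy chain a - b - c with seeds on the endpoints.
graph = {'a': [('b', 1.0)], 'b': [('a', 1.0), ('c', 1.0)], 'c': [('b', 1.0)]}
seeds = {'a': {'X': 1.0}, 'c': {'Y': 1.0}}
result = iterate(graph, {'a': {'X': 1.0}, 'b': {}, 'c': {'Y': 1.0}}, seeds)
print(result['b'])   # by symmetry, equal scores for X and Y
```

Each map call touches one node and each reduce call touches one destination's message list, which is exactly the independence a MapReduce framework needs to parallelize the iteration.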
181 When to use Graph-based SSL and which method? When the input data itself is a graph (relational data), or when the data is expected to lie on a manifold. MAD, Quadratic Criteria (QC): when labels are not mutually exclusive. MADDL: when label similarities are known. Measure Propagation (MP): for a probabilistic interpretation. Manifold Regularization: for generalization to unseen data (induction). 50
186 Graph-based SSL: Summary. Provides a flexible representation for both IID and relational data. Graph construction can be key. Scalable: node reordering and MapReduce. Can handle labeled as well as unlabeled data. Can handle multi-class, multi-label settings. Effective in practice. 51
193 Open Challenges. Graph-based SSL for structured prediction. Algorithms: combining inductive and graph-based methods. Applications: constituency and dependency parsing, coreference. Scalable graph construction, especially with multi-modal data. Extensions with other loss functions, sparsity, etc. 52
198 References (I) [1] A. Alexandrescu and K. Kirchhoff. Data-driven graph construction for semi-supervised graph-based learning in NLP. In NAACL HLT, [2] Y. Altun, D. McAllester, and M. Belkin. Maximum margin semi-supervised learning for structured variables. NIPS, [3] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for YouTube: taking random walks through the view graph. In WWW, [4] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter. Distributional word clusters vs. words for text categorization. J. Mach. Learn. Res., 3, [5] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, [6] Y. Bengio, O. Delalleau, and N. Le Roux. Label propagation and quadratic criterion. Semi-supervised learning, [7] T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein. Painless unsupervised learning with features. In HLT-NAACL, [8] J. Bilmes and A. Subramanya. Scaling up Machine Learning: Parallel and Distributed Approaches, chapter Parallel Graph-Based Semi-Supervised Learning. [9] S. Blair-Goldensohn, T. Neylon, K. Hannan, G. A. Reis, R. McDonald, and J. Reynar. Building a sentiment summarizer for local service reviews. In NLP in the Information Explosion Era, [10] M. Cafarella, A. Halevy, D. Wang, E. Wu, and Y. Zhang. WebTables: exploring the power of tables on the web. VLDB, [11] O. Chapelle, B. Schölkopf, A. Zien, et al. Semi-supervised learning. MIT Press, Cambridge, MA, [12] Y. Choi and C. Cardie. Adapting a polarity lexicon using integer linear programming for domain specific sentiment classification. In EMNLP, [13] S. Daitch, J. Kelner, and D. Spielman. Fitting a graph to vector data. In ICML, [14] D. Das and S. Petrov. Unsupervised part-of-speech tagging with bilingual graph-based projections. In ACL, [15] D. Das, N. Schneider, D. Chen, and N. A. Smith. Probabilistic frame-semantic parsing. In NAACL-HLT, [16] D. Das and N. Smith. Graph-based lexicon expansion with sparsity-inducing penalties. NAACL-HLT, [17] D. Das and N. A. Smith. Semi-supervised frame-semantic parsing for unknown predicates. In ACL, [18] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In ICML, [19] O. Delalleau, Y. Bengio, and N. L. Roux. Efficient non-parametric function induction in semi-supervised learning. In AISTATS, [20] P. Dhillon, P. Talukdar, and K. Crammer. Inference-driven metric learning for graph construction. Technical report, MS-CIS-10-18, University of Pennsylvania,
199 References (II) [21] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In CIKM, [22] J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3, [23] J. Garcke and M. Griebel. Data mining with sparse grids using simplicial basis functions. In KDD, [24] A. Goldberg and X. Zhu. Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, [25] A. Goldberg, X. Zhu, and S. Wright. Dissimilarity in graph-based semi-supervised classification. AISTATS, [26] M. Hu and B. Liu. Mining and summarizing customer reviews. In KDD, [27] T. Jebara, J. Wang, and S. Chang. Graph construction and b-matching for semi-supervised learning. In ICML, [28] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, [29] T. Joachims. Transductive learning via spectral graph partitioning. In ICML, [30] M. Karlen, J. Weston, A. Erkan, and R. Collobert. Large scale manifold transduction. In ICML, [31] S.-M. Kim and E. Hovy. Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics, [32] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), [33] K. Lerman, S. Blair-Goldensohn, and R. McDonald. Sentiment summarization: evaluating and learning user preferences. In EACL, [34] D. Lewis et al. Reuters. [35] J. Malkin, A. Subramanya, and J. Bilmes. On the semi-supervised learning of multi-layered perceptrons. In InterSpeech, [36] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In EMNLP, [37] D. Rao and D. Ravichandran. Semi-supervised polarity lexicon induction. In EACL, [38] A. Subramanya and J. Bilmes. Soft-supervised learning for text classification. In EMNLP, [39] A. Subramanya and J. Bilmes. Entropic graph regularization in non-parametric semi-supervised classification. NIPS, [40] A. Subramanya and J. Bilmes. Semi-supervised learning with measure propagation. JMLR,
200 References (III) [41] A. Subramanya, S. Petrov, and F. Pereira. Efficient graph-based semi-supervised learning of structured tagging models. In EMNLP, [42] P. Talukdar. Topics in graph construction for semi-supervised learning. Technical report, MS-CIS-09-13, University of Pennsylvania, [43] P. Talukdar and K. Crammer. New regularized algorithms for transductive learning. ECML, [44] P. Talukdar and F. Pereira. Experiments in graph-based semi-supervised learning methods for class-instance acquisition. In ACL, [45] P. Talukdar, J. Reisinger, M. Paşca, D. Ravichandran, R. Bhagat, and F. Pereira. Weakly-supervised acquisition of labeled class instances using graph random walks. In EMNLP, [46] B. Van Durme and M. Pasca. Finding cars, goddesses and enzymes: Parametrizable acquisition of labeled instances for open-domain information extraction. In AAAI, [47] L. Velikovich, S. Blair-Goldensohn, K. Hannan, and R. McDonald. The viability of web-derived polarity lexicons. In HLT-NAACL, [48] F. Wang and C. Zhang. Label propagation through linear neighborhoods. In ICML, [49] J. Wang, T. Jebara, and S. Chang. Graph transduction via alternating minimization. In ICML, [50] R. Wang and W. Cohen. Language-independent set expansion of named entities using the web. In ICDM, [51] K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10, [52] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT-EMNLP, [53] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. NIPS, [54] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. In ICML, [55] D. Zhou, B. Schölkopf, and T. Hofmann. Semi-supervised learning on directed graphs. In NIPS, [56] X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, CMU-CALD, Carnegie Mellon University, [57] X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, Carnegie Mellon University, [58] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, [59] X. Zhu and J. Lafferty. Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. In ICML,
201 Thanks! Web: 57
K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that
More informationMetric Learning for Large-Scale Image Classification:
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationGuiding Semi-Supervision with Constraint-Driven Learning
Guiding Semi-Supervision with Constraint-Driven Learning Ming-Wei Chang 1 Lev Ratinov 2 Dan Roth 3 1 Department of Computer Science University of Illinois at Urbana-Champaign Paper presentation by: Drew
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationGraph Transduction via Alternating Minimization
Jun Wang Department of Electrical Engineering, Columbia University Tony Jebara Department of Computer Science, Columbia University Shih-Fu Chang Department of Electrical Engineering, Columbia University
More informationGraph Transduction via Alternating Minimization
Jun Wang Department of Electrical Engineering, Columbia University Tony Jebara Department of Computer Science, Columbia University Shih-Fu Chang Department of Electrical Engineering, Columbia University
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationCollaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods
70 Computer Science 8 Collaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods Rasna R. Walia Collaborative filtering is a method of making predictions
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More informationLearning a Distance Metric for Structured Network Prediction. Stuart Andrews and Tony Jebara Columbia University
Learning a Distance Metric for Structured Network Prediction Stuart Andrews and Tony Jebara Columbia University Outline Introduction Context, motivation & problem definition Contributions Structured network
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationRobust Semi-Supervised Learning through Label Aggregation
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Robust Semi-Supervised Learning through Label Aggregation Yan Yan, Zhongwen Xu, Ivor W. Tsang, Guodong Long, Yi Yang Centre
More informationData Clustering. Danushka Bollegala
Data Clustering Danushka Bollegala Outline Why cluster data? Clustering as unsupervised learning Clustering algorithms k-means, k-medoids agglomerative clustering Brown s clustering Spectral clustering
More informationOn Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution
ICML2011 Jun. 28-Jul. 2, 2011 On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution Masashi Sugiyama, Makoto Yamada, Manabu Kimura, and Hirotaka Hachiya Department of
More informationStructured Learning. Jun Zhu
Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationGlobal Metric Learning by Gradient Descent
Global Metric Learning by Gradient Descent Jens Hocke and Thomas Martinetz University of Lübeck - Institute for Neuro- and Bioinformatics Ratzeburger Allee 160, 23538 Lübeck, Germany hocke@inb.uni-luebeck.de
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationDeeper Insights into Graph Convolutional Networks for Semi-Supervised Learning
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Qimai Li, 1 Zhichao Han, 1,2 Xiao-Ming Wu 1 1 The Hong
More informationSpectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014
Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and
More informationSemi-supervised Learning by Sparse Representation
Semi-supervised Learning by Sparse Representation Shuicheng Yan Huan Wang Abstract In this paper, we present a novel semi-supervised learning framework based on l 1 graph. The l 1 graph is motivated by
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More information无监督学习中的选代表和被代表问题 - AP & LLE 张响亮. Xiangliang Zhang. King Abdullah University of Science and Technology. CNCC, Oct 25, 2018 Hangzhou, China
无监督学习中的选代表和被代表问题 - AP & LLE 张响亮 Xiangliang Zhang King Abdullah University of Science and Technology CNCC, Oct 25, 2018 Hangzhou, China Outline Affinity Propagation (AP) [Frey and Dueck, Science, 2007]
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More informationGraph Laplacian Kernels for Object Classification from a Single Example
Graph Laplacian Kernels for Object Classification from a Single Example Hong Chang & Dit-Yan Yeung Department of Computer Science, Hong Kong University of Science and Technology {hongch,dyyeung}@cs.ust.hk
More informationDeepWalk: Online Learning of Social Representations
DeepWalk: Online Learning of Social Representations ACM SIG-KDD August 26, 2014, Rami Al-Rfou, Steven Skiena Stony Brook University Outline Introduction: Graphs as Features Language Modeling DeepWalk Evaluation:
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More information