(Graph-based) Semi-Supervised Learning
Partha Pratim Talukdar, Indian Institute of Science
April 7, 2015


Supervised Learning
Labeled Data → Learning Algorithm → Model
Examples: Decision Trees, Support Vector Machine (SVM), Maximum Entropy (MaxEnt)

Semi-Supervised Learning (SSL)
Labeled Data + A Lot of Unlabeled Data → Learning Algorithm → Model
Examples: Self-Training, Co-Training

Why SSL? How can unlabeled data be helpful?
[Figure: labeled instances and the decision boundary learned without unlabeled data, versus the boundary learned with unlabeled instances added; the decision boundary is more accurate in the presence of unlabeled instances. Example from [Belkin et al., JMLR 2006]]

Inductive vs Transductive

                                   Inductive                   Transductive
                                   (generalizes to             (does not generalize
                                   unseen data)                to unseen data)
  Supervised (Labeled)             SVM, Maximum Entropy        X
  Semi-supervised                  Manifold Regularization     Label Propagation (LP),
  (Labeled + Unlabeled)                                        MAD, MP, ... (focus of this talk)

Most Graph SSL algorithms are non-parametric (i.e., the number of parameters grows with the data size). See Chapter 25 of the SSL Book [Chapelle et al.].

Two Popular SSL Algorithms: Self-Training and Co-Training

Why Graph-based SSL?
- Some datasets are naturally represented by a graph: web, citation network, social network, ...
- Uniform representation for heterogeneous data
- Easily parallelizable, scalable to large data
- Effective in practice
[Figure: text classification accuracy, comparing Graph SSL against non-graph SSL and supervised baselines]

Graph-based SSL
[Figure: instances as nodes connected by similarity edges; seed labels "business" and "politics" propagate from labeled nodes to unlabeled nodes]

Graph-based SSL
Smoothness Assumption: if two instances are similar according to the graph, then their output labels should be similar.
Two stages:
- Graph construction (if not already present)
- Label inference

Outline: Motivation, Graph Construction, Inference Methods, Scalability, Applications, Conclusion & Future Work

Graph Construction
- Neighborhood methods: k-NN graph construction (k-NNG), ε-neighborhood method
- Metric learning
- Other approaches

Neighborhood Methods
- k-Nearest Neighbor Graph (k-NNG): add edges between each instance and its k nearest neighbors (e.g., k = 3)
- ε-neighborhood: add edges between all instances inside a ball of radius ε
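A minimal sketch of k-NNG construction, assuming Euclidean distances and Gaussian edge weights (function and parameter names are illustrative):

```python
import numpy as np

def knn_graph(X, k=3, sigma=1.0):
    """Symmetrized k-NN graph with Gaussian edge weights.

    X: (n, d) instance matrix. Returns an (n, n) weight matrix W.
    Brute-force O(n^2) distances, which is exactly the scalability
    issue noted on the next slide.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]   # skip i itself (distance 0)
        W[i, nbrs] = np.exp(-sq[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)               # symmetrize the directed k-NNG
```

The final symmetrization step addresses the asymmetry issue discussed next.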

Issues with k-NNG
- Not scalable (quadratic in the number of instances)
- Results in an asymmetric graph: b may be the closest neighbor of a, but not the other way around
- Results in irregular graphs: some nodes may end up with much higher degree than other nodes (e.g., a node of degree 4 in a k-NNG with k = 1)

Issues with ε-neighborhood
- Not scalable
- Sensitive to the value of ε: not invariant to scaling
- Fragmented graph: disconnected components
[Figure: disconnected data and the resulting fragmented ε-neighborhood graph, from [Jebara et al., ICML 2009]]

Graph Construction using Metric Learning
Edge weights: w_ij ∝ exp(−D_A(x_i, x_j)), where D_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j) is a Mahalanobis distance whose matrix A is estimated using metric learning algorithms.
- Supervised metric learning: ITML [Kulis et al., ICML 2007], LMNN [Weinberger and Saul, JMLR 2009]
- Semi-supervised metric learning: IDML [Dhillon et al., UPenn TR 2010]
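A sketch of the weighting scheme above, assuming the Mahalanobis matrix A has already been learned (e.g., by ITML or IDML); the pruning threshold is an illustrative addition to sparsify the graph:

```python
import numpy as np

def mahalanobis_weights(X, A, threshold=1e-4):
    """Edge weights w_ij = exp(-D_A(x_i, x_j)), with
    D_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j).

    X: (n, d) instances; A: (d, d) learned PSD matrix (an input here).
    """
    diff = X[:, None, :] - X[None, :, :]            # (n, n, d) differences
    D = np.einsum('ijd,de,ije->ij', diff, A, diff)  # Mahalanobis distances
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)                        # no self-loops
    W[W < threshold] = 0.0                          # prune tiny weights
    return W
```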

Benefits of Metric Learning for Graph Construction
[Figure: classification error on the Amazon, Newsgroups, Reuters, and Enron A text datasets for graphs constructed using Original, RP, PCA, ITML (supervised metric learning), and IDML (semi-supervised metric learning [Dhillon et al., UPenn TR 2010]) representations; 100 seed and 1400 test instances, all inference using LP]
Careful graph construction is critical!

Other Graph Construction Approaches
- Local reconstruction: Linear Neighborhood [Wang and Zhang, ICML 2005]
- Regular graphs: b-matching [Jebara et al., ICML 2009]
- Fitting a graph to vector data [Daitch et al., ICML 2009]
- Graph kernels [Zhu et al., NIPS 2005]

Outline: Motivation, Graph Construction, Inference Methods (Label Propagation, Modified Adsorption, Measure Propagation, Sparse Label Propagation, Manifold Regularization), Scalability, Applications, Conclusion & Future Work

Graph Laplacian
Laplacian (un-normalized) of a graph: L = D − W, where D_ii = Σ_j W_ij and D_ij = 0 for i ≠ j.
[Figure: example graph on nodes a, b, c, d with edge weights, and its 4 x 4 Laplacian matrix]

Graph Laplacian (contd.)
L is positive semi-definite (assuming non-negative weights).
Smoothness of a prediction f over the graph, in terms of the Laplacian (f is the vector of scores for a single label on the nodes):

  f^T L f = Σ_{i,j} W_ij (f_i − f_j)^2

This is a measure of non-smoothness. On the example graph, a labeling such as f^T = [1 1 1 3] is smooth (f^T L f = 4), while a labeling whose scores differ widely across strongly connected nodes is not (f^T L f = 588).
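A small numeric sketch of the Laplacian and the smoothness functional; the toy weights below are illustrative, not the slides' exact example:

```python
import numpy as np

# Toy 4-node graph {a, b, c, d}: a-b weight 2, b-c weight 3, c-d weight 1.
W = np.array([[0., 2., 0., 0.],
              [2., 0., 3., 0.],
              [0., 3., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij
L = D - W                    # un-normalized graph Laplacian

def non_smoothness(f):
    """f^T L f, the measure of non-smoothness from the slides."""
    return f @ L @ f

print(non_smoothness(np.array([1., 1., 1., 3.])))    # 4.0: smooth
print(non_smoothness(np.array([1., 10., 5., 25.])))  # 637.0: not smooth

# Sanity check: f^T L f equals the pairwise form,
# the sum over edges of W_ij * (f_i - f_j)^2.
```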

Notations
- Y_vl: score of seed label l on node v
- Ŷ_vl: score of estimated label l on node v
- R_vl: regularization target for label l on node v
- S: seed node indicator (diagonal matrix)
- W_uv: weight of edge (u, v) in the graph
[Figure: a node v with its seed scores, label regularization targets, and estimated scores]

LP-ZGL [Zhu et al., ICML 2003]

  argmin_Ŷ Σ_{l=1..m} Σ_{u,v} W_uv (Ŷ_ul − Ŷ_vl)^2 = Σ_{l=1..m} Ŷ_l^T L Ŷ_l   (smoothness, via the graph Laplacian L)
  such that Y_ul = Ŷ_ul for all u with S_uu = 1   (match seeds, a hard constraint)

Smoothness: two nodes connected by an edge with high weight should be assigned similar labels. The solution satisfies the harmonic property.
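A minimal sketch of LP-ZGL via its standard iterative form: each sweep replaces every node's scores with the weighted average of its neighbors' scores, then clamps the seeds; the fixed point satisfies the harmonic property. Names are illustrative:

```python
import numpy as np

def lp_zgl(W, Y, seed_mask, iters=100):
    """Label propagation [Zhu et al., ICML 2003], iterative form.

    W: (n, n) symmetric weight matrix; Y: (n, m) seed label scores
    (rows of non-seed nodes are ignored); seed_mask: (n,) boolean.
    """
    deg = W.sum(axis=1, keepdims=True)               # node degrees D_uu
    Yhat = Y.astype(float).copy()
    for _ in range(iters):
        Yhat = (W @ Yhat) / np.maximum(deg, 1e-12)   # weighted neighbor average
        Yhat[seed_mask] = Y[seed_mask]               # clamp seeds (hard constraint)
    return Yhat
```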

Modified Adsorption (MAD) [Talukdar and Crammer, ECML 2009]

  argmin_Ŷ Σ_{l=1..m+1} [ ‖S Ŷ_l − S Y_l‖^2                 (match seeds, soft)
                         + μ1 Σ_{u,v} M_uv (Ŷ_ul − Ŷ_vl)^2   (smoothness)
                         + μ2 ‖Ŷ_l − R_l‖^2 ]                (match priors / regularizer)

- m labels, plus 1 dummy label ("none of the above")
- M = W + W^T is the symmetrized weight matrix
- Ŷ_vl: weight of label l on node v; Y_vl: seed weight for label l on node v
- S: diagonal matrix, nonzero for seed nodes
- R_vl: regularization target for label l on node v

MAD's objective is convex. MAD has extra regularization compared to LP-ZGL [Zhu et al., ICML 2003]; it is similar to the Quadratic Criterion (QC) [Bengio et al., 2006].

Solving MAD Objective
- Can be solved using matrix inversion (as in LP), but matrix inversion is expensive
- Instead solved exactly as a system of linear equations (Ax = b) using Jacobi iterations, which yields iterative updates with guaranteed convergence; see [Bengio et al., 2006] and [Talukdar and Crammer, ECML 2009] for details

Solving MAD using Iterative Updates

Inputs: Y, R: |V| x (|L|+1); W: |V| x |V|; S: |V| x |V| diagonal
  Ŷ ← Y
  M ← W + W^T
  Z_v ← S_vv + μ1 Σ_{u≠v} M_vu + μ2, for all v ∈ V
  repeat
    for all v ∈ V do
      Ŷ_v ← ( (S Y)_v + μ1 M_v Ŷ + μ2 R_v ) / Z_v
    end for
  until convergence

Each pass computes a new label estimate on node v from its seed scores, its neighbors' current label estimates, and its prior. The importance of a node can be discounted. Easily parallelizable: scalable (more later).
[Figure: node v with its seed and prior scores and the current label estimates of its neighbors]
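A vectorized sketch of the update above. It is simplified: the full MAD also discounts nodes via per-node random-walk (injection/continuation/abandonment) probabilities, which this sketch omits:

```python
import numpy as np

def mad_iterate(W, Y, R, s, mu1=1.0, mu2=1e-2, iters=100):
    """Jacobi-style MAD updates [Talukdar and Crammer, ECML 2009].

    W: (n, n) weights with zero diagonal; Y, R: (n, m+1) seed scores and
    regularization targets; s: (n,) diagonal of S (nonzero for seeds).
    """
    M = W + W.T                              # symmetrized weight matrix
    Z = s + mu1 * M.sum(axis=1) + mu2        # per-node normalizer Z_v
    Yhat = Y.astype(float).copy()
    for _ in range(iters):
        # Yhat_v <- (s_v * Y_v + mu1 * sum_u M_vu Yhat_u + mu2 * R_v) / Z_v
        Yhat = (s[:, None] * Y + mu1 * (M @ Yhat) + mu2 * R) / Z[:, None]
    return Yhat
```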

When is MAD most effective?
[Figure: relative increase in MRR of MAD over LP-ZGL, plotted against average node degree]
MAD is particularly effective on denser graphs, where there is greater need for regularization.

Extension to Dependent Labels
Labels are not always mutually exclusive (e.g., label similarity in sentiment classification).
Modified Adsorption with Dependent Labels (MADDL) [Talukdar and Crammer, ECML 2009]:
- Can take label similarities into account
- Convex objective
- Efficient iterative/parallelizable updates, as in MAD

Measure Propagation (MP) [Subramanya and Bilmes, EMNLP 2008, NIPS 2009, JMLR 2011]

  C_KL: argmin_{p_i} Σ_{i=1..l} D_KL(r_i ‖ p_i) + μ Σ_{i,j} w_ij D_KL(p_i ‖ p_j) − ν Σ_{i=1..n} H(p_i)
  s.t. Σ_y p_i(y) = 1 and p_i(y) ≥ 0 for all y, i

- r_i and p_i are the seed and estimated label distributions (normalized) on node i
- First term: divergence on seed nodes; second term: smoothness (divergence across edges); third term: entropic regularizer
- KL divergence: D_KL(p_i ‖ p_j) = Σ_y p_i(y) log (p_i(y) / p_j(y)); entropy: H(p_i) = −Σ_y p_i(y) log p_i(y)
- C_KL is convex (with non-negative edge weights and hyper-parameters)
- MP is related to Information Regularization [Corduneanu and Jaakkola, 2003]
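A small sketch that evaluates C_KL for candidate label distributions; it checks the objective rather than optimizing it, and all names and the ν default are illustrative:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def entropy(p, eps=1e-12):
    p = np.clip(p, eps, None)
    return float(-np.sum(p * np.log(p)))

def c_kl(P, R, W, seed_idx, mu=1.0, nu=0.01):
    """Value of the C_KL objective for label distributions P (n x m);
    seed_idx lists labeled nodes, whose reference distributions are R."""
    seed_term = sum(kl(R[i], P[i]) for i in seed_idx)
    smooth_term = sum(W[i, j] * kl(P[i], P[j])
                      for i in range(len(P)) for j in range(len(P))
                      if W[i, j] > 0)
    return seed_term + mu * smooth_term - nu * sum(entropy(p) for p in P)
```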

Solving MP Objective
For ease of optimization, reformulate the MP objective as:

  C_MP: argmin_{p_i, q_i} Σ_{i=1..l} D_KL(r_i ‖ q_i) + μ Σ_{i,j} w'_ij D_KL(p_i ‖ q_j) − ν Σ_{i=1..n} H(p_i)

- q_i: a new probability measure for each vertex, playing a role similar to p_i
- w'_ij = w_ij + α δ(i, j); the diagonal term encourages agreement between p_i and q_i
- C_MP is also convex (with non-negative edge weights and hyper-parameters)
- C_MP can be solved using Alternating Minimization (AM)

Alternating Minimization
[Figure: AM alternates between the two sets of measures, generating Q0 → P1 → Q1 → P2 → Q2 → P3 → ..., converging to the optimum]
C_MP satisfies the necessary conditions for AM to converge [Subramanya and Bilmes, JMLR 2011].
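A sketch of the AM scheme for C_MP. The two updates below are the closed forms obtained by minimizing C_MP over the p_i with the q_i fixed, and vice versa; this is a derivation-based sketch under those assumptions, not the reference implementation of [Subramanya and Bilmes, JMLR 2011]:

```python
import numpy as np

def mp_am(Wp, R, labeled, mu=1.0, nu=0.01, iters=50):
    """Alternating Minimization for the reformulated objective C_MP.

    Wp: (n, n) modified weights w'_ij (w plus the diagonal agreement term);
    R: (n, m) seed distributions (rows of unlabeled nodes ignored);
    labeled: (n,) boolean mask.
    """
    n, m = R.shape
    Q = np.full((n, m), 1.0 / m)          # start from uniform measures
    wrow = Wp.sum(axis=1)                 # sum_j w'_ij for each i
    for _ in range(iters):
        # p-step: p_i(y) ∝ exp( mu * sum_j w'_ij log q_j(y) / (nu + mu * sum_j w'_ij) )
        logq = np.log(np.clip(Q, 1e-12, None))
        P = np.exp(mu * (Wp @ logq) / (nu + mu * wrow)[:, None])
        P /= P.sum(axis=1, keepdims=True)
        # q-step: q_j(y) ∝ 1[j labeled] * r_j(y) + mu * sum_i w'_ij p_i(y)
        Q = labeled[:, None] * R + mu * (Wp.T @ P)
        Q /= Q.sum(axis=1, keepdims=True)
    return P, Q
```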

Background: Factor Graphs [Kschischang et al., 2001]
A factor graph is a bipartite graph with:
- variable nodes V (e.g., the label distribution on a graph node)
- factor nodes F: fitness functions over assignments of the variables connected to them
The distribution over all variables' values factorizes over the factors: p(v) ∝ Π_{f ∈ F} φ_f(v_f), where v_f denotes the variables connected to factor f.

Factor Graph Interpretation of Graph SSL [Zhu et al., ICML 2003] [Das and Smith, NAACL 2012]
A 3-term Graph SSL objective (common to many algorithms):

  min  Seed Matching Loss (if any) + Edge Smoothness Loss + Regularization Loss

Each term corresponds to a factor:
- Smoothness factor (pairwise): φ(q_1, q_2) ∝ exp(−w_{1,2} ‖q_1 − q_2‖^2)
- Seed matching factor (unary)
- Regularization factor (unary)

Factor Graph Interpretation [Zhu et al., ICML 2003] [Das and Smith, NAACL 2012]
[Figure: a factor graph over variable nodes q_i with (1) unary factors encouraging agreement with the seed labels r_i, (2) pairwise smoothness factors between neighboring nodes, and (3) unary regularization factors −log φ_t(q_t)]

Label Propagation with Sparsity
Sparsity is enforced through a sparsity-inducing unary factor −log φ_t(q_t), e.g., the Lasso (Tibshirani, 1996) or the Elitist Lasso (Kowalski and Torrésani, 2009) penalty.
For more details, see [Das and Smith, NAACL 2012].

Manifold Regularization [Belkin et al., JMLR 2006]

  f* = argmin_f (1/l) Σ_{i=1..l} V(y_i, f(x_i)) + f^T L f + ‖f‖^2_K

- First term: training data loss, with loss function V (e.g., soft margin)
- f^T L f: smoothness regularizer, where L is the Laplacian of a graph over the labeled and unlabeled data
- ‖f‖^2_K: standard regularizer (e.g., L2)

Trains an inductive classifier which can generalize to unseen instances.
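A simplified numeric sketch of the idea with a squared loss, solved directly over function values on the given points. Unlike the paper's RKHS formulation this version is transductive, not inductive; hyper-parameter names are illustrative:

```python
import numpy as np

def laplacian_regularized_ls(L, y, labeled, gamma_I=1.0, gamma_A=0.01):
    """Minimize (1/l) * sum_{i labeled} (f_i - y_i)^2
                + gamma_I * f^T L f + gamma_A * ||f||^2 over f.

    L: (n, n) graph Laplacian over labeled + unlabeled points;
    y: (n,) targets (entries for unlabeled points are ignored);
    labeled: (n,) boolean mask.
    """
    n = L.shape[0]
    l = labeled.sum()
    J = np.diag(labeled.astype(float))   # picks out the labeled points
    # Stationarity condition: (J/l + gamma_I * L + gamma_A * I) f = J y / l
    A = J / l + gamma_I * L + gamma_A * np.eye(n)
    return np.linalg.solve(A, (J @ y) / l)
```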

Other Graph-based SSL Methods
- TACO [Orbach and Crammer, ECML 2012]
- SSL on directed graphs [Zhou et al., NIPS 2005], [Zhou et al., ICML 2005]
- Spectral Graph Transduction [Joachims, ICML 2003]
- Graph SSL for ordering [Talukdar et al., CIKM 2012]
- Learning with dissimilarity edges [Goldberg et al., AISTATS 2007]

Outline: Motivation, Graph Construction, Inference Methods, Scalability (Scalability Issues, Node Reordering, MapReduce Parallelization), Applications, Conclusion & Future Work

More (Unlabeled) Data is Better Data [Subramanya & Bilmes, JMLR 2011]
[Figure: results on a graph with 120m vertices]
Challenges with large unlabeled data:
- constructing a graph from large data
- scalable inference over large graphs

Label Update using Message Passing
[Figure: an SMP with k processors, each processing a block of graph nodes (neighbors not shown). At step k, a processor reads the current label estimate on node v along with its seed and prior scores; at step k+1 it writes the new label estimate on v]

Node Reordering Algorithm: Intuition [Subramanya & Bilmes, JMLR 2011; Bilmes & Subramanya, 2011]
Which node should be processed along with k? The one whose neighborhood has the highest intersection with k's neighborhood (cardinality of intersection). In the slides' example, k's neighbors are a, b, and c, with

  |N(k) ∩ N(a)| = 1,  |N(k) ∩ N(b)| = 2,  |N(k) ∩ N(c)| = 0,

so b is the best node to process along with k (see the sketch below).
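A toy sketch of this greedy ordering heuristic, illustrating the intuition only (not the full algorithm of Bilmes & Subramanya); the neighborhoods mirror the example above:

```python
def reorder(adj, start):
    """Greedily order nodes so that each node shares as many neighbors as
    possible with the previously placed node (better cache locality on
    an SMP). adj: dict mapping node -> set of neighbors."""
    remaining = set(adj) - {start}
    order = [start]
    while remaining:
        last = order[-1]
        best = max(remaining, key=lambda u: len(adj[last] & adj[u]))
        remaining.remove(best)
        order.append(best)
    return order

# b shares two neighbors with k, so it is processed right after k.
adj = {'k': {'a', 'b', 'c'},
       'a': {'e', 'j', 'c'},
       'b': {'a', 'd', 'c'},
       'c': {'l', 'n', 'm'}}
print(reorder(adj, 'k'))  # ['k', 'b', 'a', 'c']
```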

Speed-up on SMP after Node Reordering [Subramanya & Bilmes, JMLR 2011]

MapReduce Implementation of MAD
- Map: each node sends its current label assignments to its neighbors
- Reduce: each node updates its own label assignment using the messages received from its neighbors and its own information (e.g., seed labels, regularization penalties, etc.)
- Repeat until convergence
[Figure: node v receives label estimates from neighbors a, b, c, plus its own seed and prior scores, and produces a new label estimate]
Graph-based algorithms are amenable to distributed processing. Code is available in the Junto Label Propagation Toolkit (includes a Hadoop-based implementation); a toy sketch of one iteration follows.
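A single-process sketch of one MAD iteration in map/reduce style, with plain-Python stand-ins for the mapper and reducer; names and message formats are illustrative, not Junto's actual API:

```python
from collections import defaultdict

def map_phase(node_id, state, neighbors):
    """Map: emit current label scores to each neighbor, plus own state."""
    for nbr, w in neighbors.items():
        yield nbr, ('msg', w, state['yhat'])
    yield node_id, ('self', state)

def reduce_phase(node_id, messages, mu1=1.0, mu2=1e-2):
    """Reduce: MAD-style update from neighbor messages and own seed/prior."""
    state, acc, wsum = None, defaultdict(float), 0.0
    for m in messages:
        if m[0] == 'self':
            state = m[1]
        else:
            _, w, yhat = m
            wsum += w
            for label, score in yhat.items():
                acc[label] += w * score        # weighted neighbor scores
    z = state['s'] + mu1 * wsum + mu2          # per-node normalizer
    labels = set(acc) | set(state['seed']) | set(state['prior'])
    return node_id, {l: (state['s'] * state['seed'].get(l, 0.0)
                         + mu1 * acc[l]
                         + mu2 * state['prior'].get(l, 0.0)) / z
                     for l in labels}
```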

When to use Graph-based SSL, and which method?
- When the input data itself is a graph (relational data), or when the data is expected to lie on a manifold
- MAD, Quadratic Criterion (QC): when labels are not mutually exclusive
- MADDL: when label similarities are known
- Measure Propagation (MP): for a probabilistic interpretation
- Manifold Regularization: for generalization to unseen data (induction)

Graph-based SSL: Summary
- Provides a flexible representation for both IID and relational data
- Graph construction can be key
- Scalable: node reordering and MapReduce
- Can handle labeled as well as unlabeled data
- Can handle multi-class, multi-label settings
- Effective in practice

Open Challenges
- Graph-based SSL for structured prediction
  - Algorithms: combining inductive and graph-based methods
  - Applications: constituency and dependency parsing, coreference
- Scalable graph construction, especially with multi-modal data
- Extensions with other loss functions, sparsity, etc.

References

[1] A. Alexandrescu and K. Kirchhoff. Data-driven graph construction for semi-supervised graph-based learning in NLP. In NAACL HLT.
[2] Y. Altun, D. McAllester, and M. Belkin. Maximum margin semi-supervised learning for structured variables. NIPS.
[3] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for YouTube: taking random walks through the view graph. In WWW.
[4] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research, 3.
[5] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7.
[6] Y. Bengio, O. Delalleau, and N. Le Roux. Label propagation and quadratic criterion. In Semi-Supervised Learning.
[7] T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein. Painless unsupervised learning with features. In HLT-NAACL.
[8] J. Bilmes and A. Subramanya. Parallel graph-based semi-supervised learning. In Scaling up Machine Learning: Parallel and Distributed Approaches.
[9] S. Blair-Goldensohn, T. Neylon, K. Hannan, G. A. Reis, R. McDonald, and J. Reynar. Building a sentiment summarizer for local service reviews. In NLP in the Information Explosion Era.
[10] M. Cafarella, A. Halevy, D. Wang, E. Wu, and Y. Zhang. WebTables: exploring the power of tables on the web. VLDB.
[11] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA.
[12] Y. Choi and C. Cardie. Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In EMNLP.
[13] S. Daitch, J. Kelner, and D. Spielman. Fitting a graph to vector data. In ICML.
[14] D. Das and S. Petrov. Unsupervised part-of-speech tagging with bilingual graph-based projections. In ACL.
[15] D. Das, N. Schneider, D. Chen, and N. A. Smith. Probabilistic frame-semantic parsing. In NAACL-HLT.
[16] D. Das and N. Smith. Graph-based lexicon expansion with sparsity-inducing penalties. In NAACL-HLT.
[17] D. Das and N. A. Smith. Semi-supervised frame-semantic parsing for unknown predicates. In ACL.
[18] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In ICML.
[19] O. Delalleau, Y. Bengio, and N. Le Roux. Efficient non-parametric function induction in semi-supervised learning. In AISTATS.
[20] P. Dhillon, P. Talukdar, and K. Crammer. Inference-driven metric learning for graph construction. Technical report, MS-CIS-10-18, University of Pennsylvania.
[21] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In CIKM.
[22] J. Friedman, J. Bentley, and R. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3.
[23] J. Garcke and M. Griebel. Data mining with sparse grids using simplicial basis functions. In KDD.
[24] A. Goldberg and X. Zhu. Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the First Workshop on Graph-Based Methods for Natural Language Processing.
[25] A. Goldberg, X. Zhu, and S. Wright. Dissimilarity in graph-based semi-supervised classification. AISTATS.
[26] M. Hu and B. Liu. Mining and summarizing customer reviews. In KDD.
[27] T. Jebara, J. Wang, and S. Chang. Graph construction and b-matching for semi-supervised learning. In ICML.
[28] T. Joachims. Transductive inference for text classification using support vector machines. In ICML.
[29] T. Joachims. Transductive learning via spectral graph partitioning. In ICML.
[30] M. Karlen, J. Weston, A. Erkan, and R. Collobert. Large scale manifold transduction. In ICML.
[31] S.-M. Kim and E. Hovy. Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics.
[32] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2).
[33] K. Lerman, S. Blair-Goldensohn, and R. McDonald. Sentiment summarization: evaluating and learning user preferences. In EACL.
[34] D. Lewis et al. Reuters.
[35] J. Malkin, A. Subramanya, and J. Bilmes. On the semi-supervised learning of multi-layered perceptrons. In InterSpeech.
[36] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In EMNLP.
[37] D. Rao and D. Ravichandran. Semi-supervised polarity lexicon induction. In EACL.
[38] A. Subramanya and J. Bilmes. Soft-supervised learning for text classification. In EMNLP.
[39] A. Subramanya and J. Bilmes. Entropic graph regularization in non-parametric semi-supervised classification. NIPS.
[40] A. Subramanya and J. Bilmes. Semi-supervised learning with measure propagation. JMLR.
[41] A. Subramanya, S. Petrov, and F. Pereira. Efficient graph-based semi-supervised learning of structured tagging models. In EMNLP.
[42] P. Talukdar. Topics in graph construction for semi-supervised learning. Technical report, MS-CIS-09-13, University of Pennsylvania.
[43] P. Talukdar and K. Crammer. New regularized algorithms for transductive learning. ECML.
[44] P. Talukdar and F. Pereira. Experiments in graph-based semi-supervised learning methods for class-instance acquisition. In ACL.
[45] P. Talukdar, J. Reisinger, M. Paşca, D. Ravichandran, R. Bhagat, and F. Pereira. Weakly-supervised acquisition of labeled class instances using graph random walks. In EMNLP.
[46] B. Van Durme and M. Pasca. Finding cars, goddesses and enzymes: Parametrizable acquisition of labeled instances for open-domain information extraction. In AAAI.
[47] L. Velikovich, S. Blair-Goldensohn, K. Hannan, and R. McDonald. The viability of web-derived polarity lexicons. In HLT-NAACL.
[48] F. Wang and C. Zhang. Label propagation through linear neighborhoods. In ICML.
[49] J. Wang, T. Jebara, and S. Chang. Graph transduction via alternating minimization. In ICML.
[50] R. Wang and W. Cohen. Language-independent set expansion of named entities using the web. In ICDM.
[51] K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10.
[52] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT-EMNLP.
[53] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. NIPS.
[54] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. In ICML.
[55] D. Zhou, B. Schölkopf, and T. Hofmann. Semi-supervised learning on directed graphs. In NIPS.
[56] X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, CMU-CALD, Carnegie Mellon University.
[57] X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, Carnegie Mellon University.
[58] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML.
[59] X. Zhu and J. Lafferty. Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. In ICML.

Thanks!


More information

Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning

Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning Xiaojin Zhu John Lafferty School of Computer Science, Carnegie ellon University,

More information

Efficient Non-Parametric Function Induction in Semi-Supervised Learning

Efficient Non-Parametric Function Induction in Semi-Supervised Learning Efficient Non-Parametric Function Induction in Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Dept. IRO, Université de Montréal P.O. Box 618, Succ. Centre-Ville, Montreal,

More information

GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION

GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION Nasehe Jamshidpour a, Saeid Homayouni b, Abdolreza Safari a a Dept. of Geomatics Engineering, College of Engineering,

More information

Typed Graph Models for Semi-Supervised Learning of Name Ethnicity

Typed Graph Models for Semi-Supervised Learning of Name Ethnicity Typed Graph Models for Semi-Supervised Learning of Name Ethnicity Delip Rao Dept. of Computer Science Johns Hopkins University delip@cs.jhu.edu David Yarowsky Dept. of Computer Science Johns Hopkins University

More information

Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization

Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization Sunil Thulasidasan 1,2, Jeffrey Bilmes 2 1 Los Alamos National Laboratory 2 Department

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Machine Learning Potsdam, 26 April 2012 Saeedeh Momtazi Information Systems Group Introduction 2 Machine Learning Field of study that gives computers the ability to learn without

More information

Improving Image Segmentation Quality Via Graph Theory

Improving Image Segmentation Quality Via Graph Theory International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,

More information

Part 5: Structured Support Vector Machines

Part 5: Structured Support Vector Machines Part 5: Structured Support Vector Machines Sebastian Nowozin and Christoph H. Lampert Providence, 21st June 2012 1 / 34 Problem (Loss-Minimizing Parameter Learning) Let d(x, y) be the (unknown) true data

More information

A METRIC LEARNING APPROACH FOR GRAPH-BASED

A METRIC LEARNING APPROACH FOR GRAPH-BASED A METRIC LEARNING APPROACH FOR GRAPH-BASED LABEL PROPAGATION Pauline Wauquier Clic And Walk 25 rue Corneille 59100 Roubaix France pauline.wauquier@inria.fr Mikaela Keller Univ. Lille, CNRS, Centrale Lille,

More information

Semi-supervised vs. Cross-domain Graphs for Sentiment Analysis

Semi-supervised vs. Cross-domain Graphs for Sentiment Analysis Semi-supervised vs. Cross-domain Graphs for Sentiment Analysis Natalia Ponomareva University of Wolverhampton, UK nata.ponomareva@wlv.ac.uk Mike Thelwall University of Wolverhampton, UK m.thelwall@wlv.ac.uk

More information

Constrained Metric Learning via Distance Gap Maximization

Constrained Metric Learning via Distance Gap Maximization Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Constrained Metric Learning via Distance Gap Maximization Wei Liu, Xinmei Tian, Dacheng Tao School of Computer Engineering

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

Large-Scale Face Manifold Learning

Large-Scale Face Manifold Learning Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random

More information

Transductive Classification Methods for Mixed Graphs

Transductive Classification Methods for Mixed Graphs Transductive Classification Methods for Mixed Graphs S Sundararajan Yahoo! Labs Bangalore, India ssrajan@yahoo-inc.com S Sathiya Keerthi Yahoo! Labs Santa Clara, CA selvarak@yahoo-inc.com ABSTRACT In this

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Computationally Efficient M-Estimation of Log-Linear Structure Models

Computationally Efficient M-Estimation of Log-Linear Structure Models Computationally Efficient M-Estimation of Log-Linear Structure Models Noah Smith, Doug Vail, and John Lafferty School of Computer Science Carnegie Mellon University {nasmith,dvail2,lafferty}@cs.cmu.edu

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Semi-supervised Learning

Semi-supervised Learning Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

SVMs and Data Dependent Distance Metric

SVMs and Data Dependent Distance Metric SVMs and Data Dependent Distance Metric N. Zaidi, D. Squire Clayton School of Information Technology, Monash University, Clayton, VIC 38, Australia Email: {nayyar.zaidi,david.squire}@monash.edu Abstract

More information

Applied Statistics for Neuroscientists Part IIa: Machine Learning

Applied Statistics for Neuroscientists Part IIa: Machine Learning Applied Statistics for Neuroscientists Part IIa: Machine Learning Dr. Seyed-Ahmad Ahmadi 04.04.2017 16.11.2017 Outline Machine Learning Difference between statistics and machine learning Modeling the problem

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

Statistical parsing. Fei Xia Feb 27, 2009 CSE 590A

Statistical parsing. Fei Xia Feb 27, 2009 CSE 590A Statistical parsing Fei Xia Feb 27, 2009 CSE 590A Statistical parsing History-based models (1995-2000) Recent development (2000-present): Supervised learning: reranking and label splitting Semi-supervised

More information

Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data

Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data ABSTRACT Frank Lin Carnegie Mellon University 5 Forbes Ave. Pittsburgh, PA 15213 frank@cs.cmu.edu Graph-based semi-supervised

More information

Robot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning

Robot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning Robot Learning 1 General Pipeline 1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation construction 3. Robot learning: e.g., classification (recognition) or clustering (knowledge

More information

Learning Two-View Stereo Matching

Learning Two-View Stereo Matching Learning Two-View Stereo Matching Jianxiong Xiao Jingni Chen Dit-Yan Yeung Long Quan Department of Computer Science and Engineering The Hong Kong University of Science and Technology The 10th European

More information

Bipartite Edge Prediction via Transductive Learning over Product Graphs

Bipartite Edge Prediction via Transductive Learning over Product Graphs Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction

More information

Day 3 Lecture 1. Unsupervised Learning

Day 3 Lecture 1. Unsupervised Learning Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations

More information

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that

More information

Metric Learning for Large-Scale Image Classification:

Metric Learning for Large-Scale Image Classification: Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Guiding Semi-Supervision with Constraint-Driven Learning

Guiding Semi-Supervision with Constraint-Driven Learning Guiding Semi-Supervision with Constraint-Driven Learning Ming-Wei Chang 1 Lev Ratinov 2 Dan Roth 3 1 Department of Computer Science University of Illinois at Urbana-Champaign Paper presentation by: Drew

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Graph Transduction via Alternating Minimization

Graph Transduction via Alternating Minimization Jun Wang Department of Electrical Engineering, Columbia University Tony Jebara Department of Computer Science, Columbia University Shih-Fu Chang Department of Electrical Engineering, Columbia University

More information

Graph Transduction via Alternating Minimization

Graph Transduction via Alternating Minimization Jun Wang Department of Electrical Engineering, Columbia University Tony Jebara Department of Computer Science, Columbia University Shih-Fu Chang Department of Electrical Engineering, Columbia University

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Collaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods

Collaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods 70 Computer Science 8 Collaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods Rasna R. Walia Collaborative filtering is a method of making predictions

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important

More information

Learning a Distance Metric for Structured Network Prediction. Stuart Andrews and Tony Jebara Columbia University

Learning a Distance Metric for Structured Network Prediction. Stuart Andrews and Tony Jebara Columbia University Learning a Distance Metric for Structured Network Prediction Stuart Andrews and Tony Jebara Columbia University Outline Introduction Context, motivation & problem definition Contributions Structured network

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Robust Semi-Supervised Learning through Label Aggregation

Robust Semi-Supervised Learning through Label Aggregation Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Robust Semi-Supervised Learning through Label Aggregation Yan Yan, Zhongwen Xu, Ivor W. Tsang, Guodong Long, Yi Yang Centre

More information

Data Clustering. Danushka Bollegala

Data Clustering. Danushka Bollegala Data Clustering Danushka Bollegala Outline Why cluster data? Clustering as unsupervised learning Clustering algorithms k-means, k-medoids agglomerative clustering Brown s clustering Spectral clustering

More information

On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution

On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution ICML2011 Jun. 28-Jul. 2, 2011 On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution Masashi Sugiyama, Makoto Yamada, Manabu Kimura, and Hirotaka Hachiya Department of

More information

Structured Learning. Jun Zhu

Structured Learning. Jun Zhu Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Global Metric Learning by Gradient Descent

Global Metric Learning by Gradient Descent Global Metric Learning by Gradient Descent Jens Hocke and Thomas Martinetz University of Lübeck - Institute for Neuro- and Bioinformatics Ratzeburger Allee 160, 23538 Lübeck, Germany hocke@inb.uni-luebeck.de

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning

Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning Qimai Li, 1 Zhichao Han, 1,2 Xiao-Ming Wu 1 1 The Hong

More information

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and

More information

Semi-supervised Learning by Sparse Representation

Semi-supervised Learning by Sparse Representation Semi-supervised Learning by Sparse Representation Shuicheng Yan Huan Wang Abstract In this paper, we present a novel semi-supervised learning framework based on l 1 graph. The l 1 graph is motivated by

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information

无监督学习中的选代表和被代表问题 - AP & LLE 张响亮. Xiangliang Zhang. King Abdullah University of Science and Technology. CNCC, Oct 25, 2018 Hangzhou, China

无监督学习中的选代表和被代表问题 - AP & LLE 张响亮. Xiangliang Zhang. King Abdullah University of Science and Technology. CNCC, Oct 25, 2018 Hangzhou, China 无监督学习中的选代表和被代表问题 - AP & LLE 张响亮 Xiangliang Zhang King Abdullah University of Science and Technology CNCC, Oct 25, 2018 Hangzhou, China Outline Affinity Propagation (AP) [Frey and Dueck, Science, 2007]

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

Graph Laplacian Kernels for Object Classification from a Single Example

Graph Laplacian Kernels for Object Classification from a Single Example Graph Laplacian Kernels for Object Classification from a Single Example Hong Chang & Dit-Yan Yeung Department of Computer Science, Hong Kong University of Science and Technology {hongch,dyyeung}@cs.ust.hk

More information

DeepWalk: Online Learning of Social Representations

DeepWalk: Online Learning of Social Representations DeepWalk: Online Learning of Social Representations ACM SIG-KDD August 26, 2014, Rami Al-Rfou, Steven Skiena Stony Brook University Outline Introduction: Graphs as Features Language Modeling DeepWalk Evaluation:

More information

Cluster Analysis (b) Lijun Zhang

Cluster Analysis (b) Lijun Zhang Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary

More information

Machine Learning: Think Big and Parallel

Machine Learning: Think Big and Parallel Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

More information