Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning

1 Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning
Joseph J. Pfeiffer III (1), Jennifer Neville (1), Paul Bennett (2)
(1) Purdue University, (2) Microsoft Research
ICDM 2014, Shenzhen

2 Relational Machine Learning
- Items: stock brokers (or social network users, papers, movies, products, etc.)
- Labels: fraudulent (or buy product, sales rank, etc.)
- Relationships: phone calls (or friendships, emails, citations, co-purchases, etc.)
- Attributes: other observable item traits
Within-network relational machine learning

3 Within-Network Relational Machine Learning
- Learn model parameters $\Theta_Y$ using the observed data
- Infer (predict) the remaining labels using the learned parameters $\Theta_Y$ and the available data
Will show: learning/inference approximations prevent semi-supervised algorithms from converging
Will develop/demonstrate: methods that model parameter uncertainty overcome approximation errors

4 Outline
- Introduction
- Background: Semi-supervised Relational Machine Learning
- Analysis of Relational EM convergence in real world networks
- Relational Stochastic EM and Relational Data Augmentation
- Experiments
- Conclusions

6 Background - Semi-Supervised Relational Learning
Relational machine learning models $P(Y \mid X, E, \Theta_Y)$ over labels $Y$, attributes $X$, and edges $E$.
- Many conditional forms to choose from (e.g., Generative, Logistic)
- Both learning and inference require approximations
[Diagram: Markov blanket $MB(v_i)$ of item $v_i$, with labels $y_i$, $y_j$ and attributes $x_i$]

Approximate learning - Pseudolikelihood (GL):
$\hat{\Theta}_Y = \arg\max_{\Theta_Y} \sum_{v_i \in V_L} \log P_Y(y_i \mid Y_{MB_L(v_i)}, x_i, \Theta_Y)$

Approximate inference - Collective Inference (for all unlabeled instances):
$P_Y(y_i \mid Y_{MB(v_i)}, x_i, \Theta_Y)$

Semi-supervised Relational EM (Xiang & Neville, 2009)
- E-Step: Collective Inference, $P_Y(y_i \mid Y_{MB(v_i)}, x_i, \Theta_Y)$ for all unlabeled instances
- M-Step: Composite Likelihood (GO):
$\hat{\Theta}_Y = \arg\max_{\Theta_Y} \sum_{Y_U \in \mathcal{Y}_U} P_Y(Y_U) \sum_{v_i \in V_L} \log P_Y(y_i \mid \tilde{Y}_{MB(v_i)}, x_i, \Theta_Y)$

How do these approximations affect semi-supervised relational learning?
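The pseudolikelihood objective on this slide can be illustrated with a small sketch. It assumes a logistic conditional with a single feature (the fraction of a node's labeled neighbors that are positive) and replaces a real optimizer with a coarse grid search; the function names, the toy graph, and the grid are all hypothetical and not taken from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_pseudolikelihood(w, edges, labels):
    """Sum over labeled nodes of log P(y_i | labeled neighbors; w).

    Assumed model: P(y_i = 1) = sigmoid(w0 + w1 * (frac_pos - 0.5)),
    where frac_pos is the positive fraction among labeled neighbors.
    """
    w0, w1 = w
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    total = 0.0
    for node, y in labels.items():
        # Only labeled neighbors are used (the labeled Markov blanket).
        nbr_labels = [labels[n] for n in neighbors.get(node, []) if n in labels]
        frac = sum(nbr_labels) / len(nbr_labels) if nbr_labels else 0.5
        p1 = sigmoid(w0 + w1 * (frac - 0.5))
        total += math.log(p1 if y == 1 else 1.0 - p1)
    return total

# Toy homophilous path graph; node 5 is unlabeled.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
LABELS = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0}

# Grid search stands in for a proper optimizer.
GRID = [(w0, w1) for w0 in (-1.0, 0.0, 1.0) for w1 in (-4.0, -2.0, 0.0, 2.0, 4.0)]
best_w = max(GRID, key=lambda w: log_pseudolikelihood(w, EDGES, LABELS))
```

On this toy graph the objective favors a positive neighbor weight, since labeled neighbors tend to share labels.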

9 Impact of Approximations in Semi-Supervised Relational EM
Relational EM:
- E-Step: Collective Inference approximation, $P_Y(y_i \mid Y_{MB(v_i)}, x_i, \Theta_Y)$ (for all unlabeled instances)
  - Over-Propagation Error
  - Both neighboring yellow
  - Yellow neighbors are not predictive
- M-Step: Composite Likelihood approximation, $\hat{\Theta}_Y = \arg\max_{\Theta_Y} \sum_{Y_U \in \mathcal{Y}_U} P_Y(Y_U) \sum_{v_i \in V_L} \log P_Y(y_i \mid \tilde{Y}_{MB(v_i)}, x_i, \Theta_Y)$
  - Over-Correction
[Figure: example network with purple and yellow labels; estimates shown are P(Purple | Purple Neighbor): 0.96 and P(Purple | Neighbors): 0.61]
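Over-propagation can be seen in a toy setting. The sketch below is not the experiment from the talk: it assumes a path graph with one labeled positive node, a logistic conditional on the mean neighbor probability, and plain fixed-point updates. The function name and the weights are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def collective_inference(n_nodes, labels, w, sweeps=20):
    """Fixed-point collective inference on the path graph 0-1-...-(n_nodes-1).

    Labeled nodes stay clamped; each unlabeled node is repeatedly set to
    sigmoid(w * (mean neighbor probability - 0.5)).
    """
    probs = [float(labels.get(i, 0.5)) for i in range(n_nodes)]
    for _ in range(sweeps):
        for i in range(n_nodes):
            if i in labels:
                continue
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n_nodes]
            mean = sum(probs[j] for j in nbrs) / len(nbrs)
            probs[i] = sigmoid(w * (mean - 0.5))
    return probs

# One labeled positive node at the end of a five-node path.
strong = collective_inference(5, {0: 1}, w=10.0)  # strong neighbor weight
weak = collective_inference(5, {0: 1}, w=1.0)     # weak neighbor weight
```

With the strong weight, the single label propagates to every unlabeled node with high confidence; with the weak weight, distant nodes stay close to 0.5.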

10 Impact of Approximations in Semi-Supervised Relational Learning
Does over-propagation during prediction happen in real world, sparsely labeled networks? Yes
[Figure: total negatives vs. total positives predicted per trial, with the class prior marked; panels: Relational Naive Bayes, Relational Logistic Regression]

11 Impact of Approximations in Semi-Supervised Relational Learning
Does over-correction happen during parameter estimation for semi-supervised relational learning in real world, sparsely labeled networks? Yes
[Figure: parameter estimates; panels: Relational Naive Bayes (conditional probability parameters), Relational Logistic Regression (weights w[0], w[1])]

14 Relational Stochastic EM and Relational Data Augmentation
[Diagram: methods arranged by whether parameters and predictions are treated as fixed points or stochastically: Relational EM, Relational Stochastic EM, Relational Data Augmentation]

15 Relational Stochastic EM
Stochastic approximation to the relational EM algorithm
- Sample from the joint conditional distribution of labels
- Maximize the composite likelihood
- Contrasts with Relational EM, which utilizes expectations of unlabeled items
- Average over the parameters to reduce the variance (Celeux et al., 2001)
- Inference is still performed using a single, fixed point set of parameters

Alternate between:
- Gibbs sample of labels: $\tilde{Y}_U^t \sim P_Y^t(Y_U \mid Y_L, X, E, \Theta_Y^{t-1})$
- Maximizing parameters: $\Theta_Y^t = \arg\max_{\Theta_Y} \sum_{v_i \in V_L} \log P_Y(y_i \mid \tilde{Y}^t_{MB(v_i)}, x_i, \Theta_Y)$
Aggregate parameters: $\hat{\Theta}_Y = \frac{1}{T} \sum_t \Theta_Y^t$
Final inference: $P_Y(Y_U \mid Y_L, X, E, \hat{\Theta}_Y)$
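The alternation on this slide (Gibbs-sample the labels, maximize the parameters, average the parameter draws, then infer with the averaged value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a one-weight logistic conditional on the fraction of positive neighbors, the maximization is a grid search, and every name and the toy graph are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def build_neighbors(edges):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    return nbrs

def p_positive(node, state, nbrs, w):
    """P(y_node = 1 | current neighbor labels; w) for the assumed model."""
    vals = [state[n] for n in nbrs[node]]
    return sigmoid(w * (sum(vals) / len(vals) - 0.5))

def gibbs_sweep(state, unlabeled, nbrs, w, rng):
    """Resample every unlabeled node once from its conditional."""
    for node in unlabeled:
        state[node] = 1 if rng.random() < p_positive(node, state, nbrs, w) else 0

def maximize_w(state, labeled, nbrs, grid):
    """Grid-search argmax of the composite log-likelihood over labeled nodes."""
    def objective(w):
        total = 0.0
        for node in labeled:
            p1 = p_positive(node, state, nbrs, w)
            total += math.log(p1 if state[node] == 1 else 1.0 - p1)
        return total
    # Ties (a flat objective) are broken toward the weight closest to zero.
    return max(grid, key=lambda w: (objective(w), -abs(w)))

def relational_sem(edges, labels, nodes, iters=50, seed=0):
    rng = random.Random(seed)
    nbrs = build_neighbors(edges)
    labeled = sorted(labels)
    unlabeled = [n for n in nodes if n not in labels]
    state = dict(labels)
    for n in unlabeled:
        state[n] = rng.randint(0, 1)
    grid = [x / 2.0 for x in range(-8, 9)]
    w, draws = 0.0, []
    for _ in range(iters):
        gibbs_sweep(state, unlabeled, nbrs, w, rng)  # sample labels
        w = maximize_w(state, labeled, nbrs, grid)   # maximize parameters
        draws.append(w)
    w_hat = sum(draws) / len(draws)                  # aggregate parameters
    # Final inference uses the single averaged weight w_hat.
    counts = {n: 0 for n in unlabeled}
    for _ in range(iters):
        gibbs_sweep(state, unlabeled, nbrs, w_hat, rng)
        for n in unlabeled:
            counts[n] += state[n]
    return w_hat, {n: counts[n] / iters for n in unlabeled}

# Two triangles joined by one edge; nodes 2 and 5 are unlabeled.
EDGES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 5)]
LABELS = {0: 1, 1: 1, 3: 0, 4: 0}
w_hat, marginals = relational_sem(EDGES, LABELS, nodes=range(6))
```

Averaging the per-iteration weights is what separates this from keeping only the last maximizer; the final inference still conditions on one fixed value.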

16 Relational Data Augmentation
Data Augmentation is a Bayesian viewpoint of EM (Tanner & Wong, 1987)
- Parameters are random variables
- Computes the posterior predictive distribution
Developed a version for relational semi-supervised learning
- Final inference is over a distribution of parameter values
- Requires prior distributions over the parameters and corresponding sampling methods
  - RNB: Beta (conjugate prior)
  - RLR: Normal (Metropolis-Hastings sampler)

Alternate between:
- Gibbs sample of labels: $\tilde{Y}_U^t \sim P_Y^t(Y_U \mid Y_L, X, E, \Theta_Y^{t-1})$
- Sample parameters: $\Theta_Y^t \sim P^t(\Theta_Y \mid \tilde{Y}_U^t, Y_L, X, E)$
Final parameters: $\hat{\Theta}_Y = \frac{1}{T} \sum_t \Theta_Y^t$
Final inference: $\hat{Y}_U = \frac{1}{T} \sum_t \tilde{Y}_U^t$
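A rough sketch of the data augmentation loop for a naive-Bayes-style relational model with a Beta prior follows. Several pieces are assumptions rather than details from the talk: a uniform class prior, a Beta(1, 1) prior on each parameter, neighbor labels treated as independent Bernoulli observations given the node's class, and counts taken around labeled nodes only. All names and the toy graph are hypothetical.

```python
import random

def build_neighbors(edges):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    return nbrs

def p_positive(node, state, nbrs, theta):
    """P(y_node = 1 | neighbor labels), with theta[c] = P(neighbor = 1 | class c)."""
    like = {}
    for c in (0, 1):
        prob = 1.0
        for n in nbrs[node]:
            prob *= theta[c] if state[n] == 1 else 1.0 - theta[c]
        like[c] = prob  # uniform class prior assumed
    return like[1] / (like[0] + like[1])

def sample_theta(state, labeled, nbrs, rng):
    """Draw each theta[c] from its Beta posterior given the completed labels."""
    theta = {}
    for c in (0, 1):
        pos = neg = 0
        for node in labeled:
            if state[node] != c:
                continue
            for n in nbrs[node]:
                if state[n] == 1:
                    pos += 1
                else:
                    neg += 1
        draw = rng.betavariate(1 + pos, 1 + neg)      # Beta(1, 1) prior assumed
        theta[c] = min(max(draw, 1e-6), 1.0 - 1e-6)   # guard against exact 0 or 1
    return theta

def relational_da(edges, labels, nodes, iters=200, seed=0):
    rng = random.Random(seed)
    nbrs = build_neighbors(edges)
    labeled = sorted(labels)
    unlabeled = [n for n in nodes if n not in labels]
    state = dict(labels)
    for n in unlabeled:
        state[n] = rng.randint(0, 1)
    theta = {0: 0.5, 1: 0.5}
    totals = {n: 0 for n in unlabeled}
    for _ in range(iters):
        # Gibbs sample of the unlabeled labels given the current parameters.
        for n in unlabeled:
            state[n] = 1 if rng.random() < p_positive(n, state, nbrs, theta) else 0
        # Sample parameters given the completed labels.
        theta = sample_theta(state, labeled, nbrs, rng)
        for n in unlabeled:
            totals[n] += state[n]
    # Final inference: average of the sampled labels over all iterations.
    return {n: totals[n] / iters for n in unlabeled}

EDGES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 5)]
LABELS = {0: 1, 1: 1, 3: 0, 4: 0}
predictions = relational_da(EDGES, LABELS, nodes=range(6))
```

Because the parameters are redrawn at every iteration, the averaged labels reflect uncertainty in the parameters as well as in the labels.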

19 Experiments - Setup
Compare on four real world networks
Competing Models
- Independent
- Relational (No CI and CI)
- Relational EM
- Relational SEM
- Relational DA
Conditional Models
- Relational Naive Bayes
- Relational Logistic Regression
Error measures
- Zero-One Loss (0-1 Loss)
- Mean absolute error (MAE)

Dataset   Vertices  Edges   Attributes
Facebook  5,906     73,374  2
IMDB      11,       ,
DVD       16,118    75,
Music     56,       ,
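The two error measures can be computed as in the sketch below. It assumes binary labels and predicted probabilities of the positive class, with 0-1 loss obtained by thresholding at 0.5 and MAE taken between the true label and the predicted probability; the talk does not spell out these details, and the function names and sample values are illustrative.

```python
def zero_one_loss(y_true, probs, threshold=0.5):
    """Fraction of items whose thresholded prediction differs from the label."""
    wrong = sum(1 for y, p in zip(y_true, probs) if (p >= threshold) != (y == 1))
    return wrong / len(y_true)

def mean_absolute_error(y_true, probs):
    """Average absolute gap between the label and the predicted probability."""
    return sum(abs(y - p) for y, p in zip(y_true, probs)) / len(y_true)

y_true = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
loss = zero_one_loss(y_true, probs)       # two of four items misclassified
mae = mean_absolute_error(y_true, probs)  # (0.1 + 0.2 + 0.6 + 0.6) / 4
```

MAE penalizes confident mistakes more than hesitant ones, whereas 0-1 loss only sees which side of the threshold a prediction falls on.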

20 Experiments - DVD
[Figure: 0-1 loss vs. percentage of graph labeled; panels: Naive Bayes, Logistic Regression; curves include Relational EM and Relational DA]
Relational EM's instability in sparsely labeled domains causes poor performance

21 Experiments - Facebook
[Figure: MAE vs. percentage of graph labeled; panels: Naive Bayes, Logistic Regression; curves: Relational EM, Relational SEM, Relational DA]
Relational Data Augmentation can outperform Relational Stochastic EM
- (In paper): Cast collective inference as a nonlinear dynamical system to analyze the convergence of Relational SEM
- (Finding): Similar to Relational EM, Relational SEM can sometimes learn parameters that result in an unstable system

24 Conclusions
Findings
- Fixed point approximation error led Relational EM to not converge
  - Over-propagation and over-correction
- (In paper) Fixed point EM methods can result in unstable systems during inference
Developed
- Relational Stochastic EM, which has lower variance in parameter estimates
- Relational Data Augmentation, which computes a posterior predictive distribution for the unlabeled instances
  - Models the uncertainty over the parameter estimates, meaning it can outperform the Relational Stochastic EM approach
- Both work well in conjunction with the Composite Likelihood approximation
- Both significantly outperformed a variety of competitors under a number of testing scenarios

25 Thank you! Jennifer Neville Paul Bennett
