Estimating Labels from Label Proportions


1 Estimating Labels from Label Proportions Novi Quadrianto The Australian National University, Australia NICTA, Statistical Machine Learning Program, Australia Joint work with Alex Smola, Tiberio Caetano, and Quoc Le Novi Quadrianto: Estimating Labels from Label Proportions, Page 1

2 Supervised Learning

3 Unsupervised Learning

4 Semi-supervised Learning

5 Learning from Proportions

6 An example application: promotional coupon
Apple Inc. decides to distribute a promotional coupon. To whom should this coupon be mailed?
- Every college student in the world?
- Selected college students?

7 An example application: selection criteria
- Some people will always buy a Mac, even without the coupon.
- Some people will never buy a Mac, coupon or not.
- Others will buy a Mac if and only if they receive the coupon.

8 An example application
Four types of customers: A (always buyers), N (never buyers), C (compliers, who buy iff given the coupon), D (defiers, who buy iff not given the coupon).
Four data aggregates:
                              Buy     Doesn't buy
Exp. 1: given coupon          A, C    N
Exp. 2: not given coupon      A       N, C
Assumption: there are no defiers.
Fact: we do not have a pure sample of C, yet we want p(C | customer profile).

9 An example application
- We know the proportions p(A) and p(N) from the random assignment experiment.
- Therefore we know p(C) = 1 - p(A) - p(N).
- Therefore we know the label proportions in all four aggregates.
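As a small worked illustration of the coupon example, the sketch below derives p(C) and the label proportions inside each of the four observable aggregates from p(A) and p(N), under the no-defiers assumption. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def aggregate_proportions(p_a, p_n):
    """Label proportions in the four (coupon, purchase) aggregates, assuming no defiers."""
    p_c = 1.0 - p_a - p_n  # compliers are whatever is left
    return {
        # Given the coupon and bought: always-buyers and compliers
        ("coupon", "buy"): {"A": p_a / (p_a + p_c), "C": p_c / (p_a + p_c)},
        # Given the coupon and did not buy: never-buyers only
        ("coupon", "no buy"): {"N": 1.0},
        # No coupon and bought: always-buyers only
        ("no coupon", "buy"): {"A": 1.0},
        # No coupon and did not buy: never-buyers and compliers
        ("no coupon", "no buy"): {"N": p_n / (p_n + p_c), "C": p_c / (p_n + p_c)},
    }

# Hypothetical population shares of always-buyers and never-buyers.
props = aggregate_proportions(p_a=0.2, p_n=0.5)
```

None of the four aggregates is a pure sample of compliers, but each has known label proportions, which is exactly the input the problem formulation asks for.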

10 Problem formulation
What we have:
- $n$ sets of observations $X_i = \{x^i_1, \ldots, x^i_{m_i}\}$ of respective sample sizes $m_i$, as calibration sets
- a set $X = \{x_1, \ldots, x_m\}$ as a test set
- fractions $\pi_{iy}$ of patterns with label $y \in \mathcal{Y}$ ($|\mathcal{Y}| \le n$) contained in each set $X_i$
- the marginal probability $p(y)$ of each label in the test set $X$
What we want:
- conditional class probability estimates $p(y \mid x)$

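11 Gaussian process solution
Conditional exponential likelihood model:
$$p(y \mid x, \theta) = \exp\big(\langle \phi(x, y), \theta \rangle - g(\theta \mid x)\big) \quad\text{with}\quad g(\theta \mid x) = \log \sum_{y \in \mathcal{Y}} \exp \langle \phi(x, y), \theta \rangle$$
Some details:
- $\phi(x, y)$ is the sufficient statistic
- $g(\theta \mid x)$ is the log-partition function
Gaussian prior: $-\log p(\theta) \propto \lambda \|\theta\|^2$
Negative log-posterior (up to an additive constant):
$$-\log\big[p(Y \mid X, \theta)\, p(\theta)\big] = \sum_{i=1}^{m} \big[g(\theta \mid x_i) - \langle \phi(x_i, y_i), \theta \rangle\big] + \lambda \|\theta\|^2$$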
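To make the conditional exponential model concrete, here is a minimal NumPy sketch. The slides do not fix a feature map, so the block form $\phi(x, y) = e_y \otimes x$ used here is an assumption, as are all function names and the example values.

```python
import numpy as np

def joint_features(x, y, n_classes):
    """Assumed feature map phi(x, y) = e_y (kron) x: x is placed in the block of class y."""
    e_y = np.zeros(n_classes)
    e_y[y] = 1.0
    return np.kron(e_y, x)

def log_partition(theta, x, n_classes):
    """g(theta | x) = log sum_y exp <phi(x, y), theta>, computed in a numerically stable way."""
    scores = np.array([joint_features(x, y, n_classes) @ theta for y in range(n_classes)])
    top = scores.max()
    return top + np.log(np.exp(scores - top).sum())

def cond_prob(theta, x, y, n_classes):
    """p(y | x, theta) = exp(<phi(x, y), theta> - g(theta | x))."""
    return np.exp(joint_features(x, y, n_classes) @ theta - log_partition(theta, x, n_classes))

rng = np.random.default_rng(0)
theta = rng.normal(size=6)          # 3 classes x 2 input dimensions (illustrative)
x = np.array([0.5, -1.0])
probs = [cond_prob(theta, x, y, 3) for y in range(3)]
```

Subtracting the log-partition function is what normalizes the model: the entries of `probs` are positive and sum to one for any `theta`.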

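12 Optimization
$$\theta^* = \operatorname{argmin}_{\theta} \Big[\sum_{i=1}^{m} g(\theta \mid x_i) - m \langle \mu_{XY}, \theta \rangle + \lambda \|\theta\|^2\Big] \quad\text{with}\quad \mu_{XY} := \frac{1}{m} \sum_{i=1}^{m} \phi(x_i, y_i)$$
This is a convex optimization problem. So is our job done? Not yet: $\mu_{XY}$ depends on the labels $y_i$, which are not observed, so it has to be estimated.
Convergence of empirical means (Bartlett & Mendelson 2002): the sample mean $\mu_{XY}$ converges to the population mean
$$\mu_{xy} := \sum_{y \in \mathcal{Y}} p(y)\, \mathbf{E}_{x \sim p(x \mid y)}[\phi(x, y)]$$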
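The objective touches the labels only through $\mu_{XY}$, which the following sketch tries to make visible. It assumes the block feature map $\phi(x, y) = e_y \otimes x$ (so $\theta$ can be stored as a classes-by-features matrix `W`), uses synthetic labelled data purely to obtain a $\mu_{XY}$ to plug in, and minimizes with plain fixed-step gradient descent. The optimizer, step size, and data are illustrative choices, not taken from the slides.

```python
import numpy as np

def objective_and_grad(W, X, mu_xy, lam):
    """Value and gradient of sum_i g(theta|x_i) - m<mu_XY, theta> + lam*||theta||^2.

    W holds theta as a (classes, features) matrix, matching phi(x, y) = e_y (kron) x.
    """
    m = X.shape[0]
    scores = X @ W.T                                   # <phi(x_i, y), theta> for all i, y
    top = scores.max(axis=1, keepdims=True)
    log_z = top[:, 0] + np.log(np.exp(scores - top).sum(axis=1))   # g(theta | x_i)
    P = np.exp(scores - log_z[:, None])                # p(y | x_i, theta)
    value = log_z.sum() - m * np.sum(mu_xy * W) + lam * np.sum(W * W)
    grad = P.T @ X - m * mu_xy + 2.0 * lam * W
    return value, grad

rng = np.random.default_rng(0)
m, d = 200, 2
y = rng.integers(0, 2, size=m)
X = rng.normal(size=(m, d)) + np.where(y[:, None] == 1, 1.0, -1.0)

# mu_XY = (1/m) sum_i phi(x_i, y_i): row k sums x_i over the points labelled k, divided by m.
mu_xy = np.stack([(X * (y == k)[:, None]).sum(axis=0) / m for k in range(2)])

W = np.zeros((2, d))
initial, _ = objective_and_grad(W, X, mu_xy, lam=1.0)
for _ in range(3000):                                  # plain gradient descent, fixed step
    _, grad = objective_and_grad(W, X, mu_xy, lam=1.0)
    W -= 2e-3 * grad
final, _ = objective_and_grad(W, X, mu_xy, lam=1.0)
```

Once `mu_xy` is computed, `y` is never used again, so any good estimate of $\mu_{XY}$ can stand in for the labels.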

13 Intuition
Binary classification:
- Dataset 1 contains only class +1.
- Dataset 2 contains classes +1 and -1 with proportions $p(+1) := \rho$ and $p(-1) = 1 - \rho$.
$$\mu_{+} := \mathbf{E}_{x \sim p(x \mid y = +1)}[\phi(x, y)], \qquad \mu_{1} := \mathbf{E}_{x \sim p(x \mid \text{set } 1)}[\phi(x, y)]$$

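14 Re-calibrated sufficient statistics
Binary classification: the set means are mixtures of the class means,
$$\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \pi \begin{bmatrix} \mu_+ \\ \mu_- \end{bmatrix}, \qquad \pi = \begin{bmatrix} 1 & 0 \\ \rho & 1 - \rho \end{bmatrix}$$
so the class means follow by inverting $\pi$:
$$\begin{bmatrix} \mu_+ \\ \mu_- \end{bmatrix} = \pi^{-1} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \pi^{-1} = \frac{1}{1 - \rho} \begin{bmatrix} 1 - \rho & 0 \\ -\rho & 1 \end{bmatrix}$$
With $\phi(x, y) = y\psi(x)$, means taken over $\psi(x)$, and the test set equal to set 2:
$$\hat{\mu}_{XY} = \rho \mu_1 - (1 - \rho) \frac{1}{1 - \rho} \begin{bmatrix} -\rho & 1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = 2\rho \mu_1 - \mu_2$$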
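The binary re-calibration can be checked numerically. The sketch below assumes $\phi(x, y) = y\psi(x)$, takes all means over $\psi(x)$, and uses set 2 as the test set so that $p(+1) = \rho$. The function name and class means are hypothetical values used only to verify the algebra.

```python
import numpy as np

def mu_xy_binary(mu_1, mu_2, rho):
    """Estimate of mu_XY for phi(x, y) = y * psi(x), from the two set means of psi."""
    mu_plus = mu_1                                   # set 1 is pure class +1
    mu_minus = (mu_2 - rho * mu_1) / (1.0 - rho)     # un-mix set 2
    return rho * mu_plus - (1.0 - rho) * mu_minus    # simplifies to 2*rho*mu_1 - mu_2

# Hypothetical class means of psi(x), used only to check the algebra.
true_plus, true_minus, rho = np.array([1.0, 2.0]), np.array([-1.0, 0.5]), 0.3
mu_1 = true_plus
mu_2 = rho * true_plus + (1.0 - rho) * true_minus
estimate = mu_xy_binary(mu_1, mu_2, rho)
```

With exact set means the estimate reproduces $\rho\mu_+ - (1-\rho)\mu_-$. With sample means the error is inherited from the two set averages.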

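15 Generalization
Three-class classification:
$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} = \begin{bmatrix} \alpha & \beta & 1 - (\alpha + \beta) \\ \eta & \xi & 1 - (\eta + \xi) \\ \sigma & \lambda & 1 - (\lambda + \sigma) \end{bmatrix} \begin{bmatrix} \mu_a \\ \mu_b \\ \mu_c \end{bmatrix}$$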
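In the three-class case the class means are recovered by solving the same kind of linear system. The sketch below uses a hypothetical mixing matrix and hypothetical class means, and assumes the matrix is square and invertible.

```python
import numpy as np

# Hypothetical mixing proportions: row i gives the class fractions (a, b, c) in set i.
pi = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
# Hypothetical class means (one row per class), used to synthesize the set means.
mu_class_true = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
mu_set = pi @ mu_class_true            # what would be observed from the three sets

# Recover the class means by solving pi @ mu_class = mu_set.
mu_class = np.linalg.solve(pi, mu_set)
```

Solving the system directly avoids forming the inverse explicitly. If the rows of the mixing matrix are nearly identical, the system becomes ill-conditioned and the recovered means become sensitive to noise in the set means.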

16 The algorithm
Set means $\mu^{\text{set}}_{X}$ → estimated class means $\hat{\mu}^{\text{class}}_{x}$ → estimated sufficient statistic $\hat{\mu}_{XY}$
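The three stages of the algorithm can be sketched end to end as follows. This is a minimal illustration rather than the authors' implementation: it assumes the block feature map $\phi(x, y) = e_y \otimes x$, one calibration set per class (a square, invertible mixing matrix), and synthetic Gaussian data. The function name `mean_map_statistic` is hypothetical.

```python
import numpy as np

def mean_map_statistic(sets, pi, p_test):
    """Estimate mu_XY from unlabeled sets, their label proportions pi, and test marginals.

    Assumes phi(x, y) = e_y (kron) x and a square, invertible pi (one set per class).
    """
    set_means = np.stack([X_i.mean(axis=0) for X_i in sets])   # step 1: mean of each set
    class_means = np.linalg.solve(pi, set_means)                # step 2: undo the mixing
    return p_test[:, None] * class_means                        # step 3: weight by p(y)

# Synthetic check: two Gaussian classes mixed into two sets with known proportions.
rng = np.random.default_rng(1)
centers = np.array([[2.0, 0.0], [-2.0, 0.0]])
pi = np.array([[0.9, 0.1], [0.3, 0.7]])
sets = []
for row in pi:
    labels = rng.choice(2, size=5000, p=row)     # hidden labels, never passed on
    sets.append(centers[labels] + rng.normal(size=(5000, 2)))
mu_xy_hat = mean_map_statistic(sets, pi, p_test=np.array([0.5, 0.5]))
```

The result has one row per class, and row $y$ approximates $p(y)$ times the mean of class $y$. It can then be plugged into the convex objective in place of the label-dependent $\mu_{XY}$.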

17 Performance guaranteed!
Setting: binary classification, $\phi(x, y) = y\psi(x)$ and $X_2 = X$.
Theorem 1. With probability $1 - \delta$, the deviation $\|\hat{\mu}_{XY} - \mu_{XY}\|$ is bounded by a quantity that involves the factor $2\rho$ and $\log(2/\delta)$ and that depends on the sample sizes $m_1$ and $m_+$.
Some details:
- $m_1$ is the number of observations in $X_1$
- $m_+$ is the number of observations with $y = +1$ in $X_2$

18 Performance guaranteed!
Bound on the minimizer of the log-posterior (Altun & Smola 2006):
$$\|\theta^* - \hat{\theta}^*\| \le \lambda^{-1} \|\mu - \hat{\mu}\|$$
Bound on the log-posterior (Altun & Smola 2006):
$$L(\hat{\theta}^*, \hat{\mu}) - L(\theta^*, \mu) \le \|\hat{\theta}^* - \theta^*\| \, \|\hat{\mu} - \mu\| \le \lambda^{-1} \|\mu - \hat{\mu}\|^2$$
Some details:
- $\theta^*$ is the minimizer of $L(\theta, \mu)$
- $\hat{\theta}^*$ is the minimizer of $L(\theta, \hat{\mu})$

19 Alternative solutions
Reduction to binary:
- train a binary classifier between sets $X_1$ and $X_2$
- threshold the labels according to the known proportions
Density estimation:
- estimate the density $p(x \mid i)$ of each dataset $X_i$
- re-calibrate to get $p(x \mid y) = \sum_i [\pi^{-1}]_{yi}\, p(x \mid i)$
- compute the posterior probabilities
MCMC (Kück & de Freitas 2005):
- explicitly generate mixing proportions per group with a hierarchical probabilistic model
- use sampling to draw from the model's posterior distribution

20 Experiments
Table 1. Classification error on the UCI/LibSVM database. Errors are reported in % ± standard error (SE). The best result, and those results not significantly worse than it, are highlighted in red. We used a one-sided paired Welch t-test at the 95% confidence level as reference.
MM: Mean Map (ours); KDE: Kernel Density Estimation; DS: Discriminative Sorting; MCMC: Sampling Method; BA: Baseline.
Data MM KDE DS MCMC BA
iono 18.4± ± ± ±
iris 10.0± ± ± ±
optd 1.8± ± ± ±
page 3.8± ± ± ±
pima 27.5± ± ± ±
tic 31.0± ± ± ±
yeast 9.3± ± ± ±
wine 7.4± ± ± ±
wdbc 7.8± ± ± ±
sonar 24.2± ± ± ±
heart 30.0± ± ± ±
brea 5.3± ± ± ±
aust 17.0± ± ± ±
svm3 20.4± ± ± ±
adult 18.9± ± ± ±
cleve 19.1± ± ± ±
derm 4.9± ± ± ±
musk 25.1± ± ± ±
ger 32.4± ± ± ±
cove 37.1± ± ± ±
spli 25.2± ± ± ±
giss 10.3± ± ±
made 44.1± ± ±
cmc 37.5± ± ± ±
bupa 48.5± ± ± ±
prota 44.6± ±0.1 N/A 65.3±
protb 45.7± ±0.0 N/A 67.7±
dnaa 16.6± ±0.8 N/A 37.7±
dnab 29.1± ±0.7 N/A 40.5±
sensa 19.8± ±0.0 N/A 43.2
sensb 21.0± ±0.0 N/A 43.2

21 Zooming in (binary results)
[Figure: three panels plotting the error rate of KDE, DS, and MCMC, respectively, against the error rate of Mean Map.]

22 Extensions
Design parameters:
- Entropy and regularization: choosing various Csiszár and Bregman distances produces a range of diverse estimators.
- Function space: measuring the deviation in moment matching in terms of the $\ell_\infty$ norm recovers $\ell_1$ sparse coding (dual connection).

23 Summary
Take-home messages:
- A new problem formulation that had not been solved before and is relevant in many settings.
- Our estimator is easy to implement.
- Our estimator enjoys the same rates of convergence as can be expected from an estimator built on a fully labeled sample.
- Our solution extends easily to other learning frameworks.
- Our estimator works well in practice!

Estimating Labels from Label Proportions

Journal of Machine Learning Research 1 (2008) xxx-xxx Submitted xx/08; Published xx/08 Estimating Labels from Label Proportions Novi Quadrianto novi.quad@gmail.com Statistical Machine Learning Group, NICTA