Semidefinite Programs for Exact Recovery of a Hidden Community (and Many Communities)

Size: px

Start display at page:

Download "Semidefinite Programs for Exact Recovery of a Hidden Community (and Many Communities)"

Beverley Johnston
5 years ago
Views:

1 Semidefinite Programs for Exact Recovery of a Hidden Community (and Many Communities) Bruce Hajek 1 Yihong Wu 1 Jiaming Xu 2 1 University of Illinois at Urbana-Champaign 2 Simons Insitute, UC Berkeley June 23, 2016

2 Hidden community model [Deshpande-Montanari 13] P Q Data: n n symmetric matrix A with empty diagonal Community C [n] of size K uniform at random, such that { P both i and j C A ij Q otherwise Q Q (K, P, Q) varies with n Goal: exact recovery of C from A P{Ĉ = C } n 1 Fruitful venue for stuying computational aspects of statistical problems Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 2

3 Examples Planted dense subgraph P = Bern(p), Q = Bern(q), p > q A = adjancency matrix of G(n, q) planted with G(K, p) [Alon et al 98, McSherry 01, Arias-Castro-Verzelen 14, Chen-Xu 14, Montanari 15,...] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 3

4 Examples Planted dense subgraph P = Bern(p), Q = Bern(q), p > q A = adjancency matrix of G(n, q) planted with G(K, p) [Alon et al 98, McSherry 01, Arias-Castro-Verzelen 14, Chen-Xu 14, Montanari 15,...] Submatrix localization P = N (0, µ), Q = N (0, 1), µ > 0 A = µ 0 + noise [Shabalin et al 09, Butucea-Ingster 11, Kolar et al 11, Ma-W 13, Cai et al 15,...] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 3

5 Running example: Plated Dense Subgraph

6 Planted dense subgraph graph view Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 5

7 Planted dense subgraph graph view 1 A community of K vertices are chosen randomly Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 5

8 Planted dense subgraph graph view 1 A community of K vertices are chosen randomly 2 For every pair of nodes in the community, add an edge w.p. p Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 5

9 Planted dense subgraph graph view 1 A community of K vertices are chosen randomly 2 For every pair of nodes in the community, add an edge w.p. p 3 For other pairs of nodes, add an edge w.p. q Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 5

10 Planted dense subgraph graph view 1 A community of K vertices are chosen randomly 2 For every pair of nodes in the community, add an edge w.p. p 3 For other pairs of nodes, add an edge w.p. q Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 5

11 Planted dense subgraph adjacency matrix view n = 200, K = 50, p = 0.3, q = 0.1 Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 6

12 Planted dense subgraph adjacency matrix view n = 200, K = 50, p = 0.3, q = 0.1 Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 6

13 Planted dense subgraph adjacency matrix view n = 200, K = 50, p = 0.3, q = 0.1 Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 6

14 Computational gap in planted Clique p = 1 q = Ω(1) K = Ω(log n): exact recovery is possible via maximum likelihood K = Ω( n): exact recovery is attainable in poly-time [Alon et al. 98] K = o( n): exact recovery is believed to be hard [Deshpande-Montanari 15] [Meka-Potechin-Wigderson 15],... Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 7

15 Computational gap in planted Clique p = 1 q = Ω(1) K = Ω(log n): exact recovery is possible via maximum likelihood K = Ω( n): exact recovery is attainable in poly-time [Alon et al. 98] K = o( n): exact recovery is believed to be hard [Deshpande-Montanari 15] [Meka-Potechin-Wigderson 15],... What about dense subgraphs clique? Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 7

16 Linear community size K = ρ n p = a log n n and q = b log n n Theorem (Hajek-W-Xu Trans. IT 16) If ρ > ρ, exact recovery is possible in polynomial-time. If ρ < ρ, exact recovery is impossible. Remarks ρ = 1/(a τ log ea τ ) with τ = Convex (SDP) relaxation works a b log a log b Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 8

17 Sublinear community size [Hajek-W-Xu, COLT 15] β 1 K = Θ(n β ) 1/2 O easy PC hard? impossible 2/3 p = cq = Θ(n α ) 1 α K = Ω(n): SDP works K = n 1 ɛ : no known poly-time algorithm Where is the SDP barrier? Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 9

18 Sublinear community size [Hajek-W-Xu, COLT 15] β 1 K = Θ(n β ) 1/2 O easy PC hard? impossible 2/3 p = cq = Θ(n α ) 1 α K = Ω(n): SDP works K = n 1 ɛ : no known poly-time algorithm Where is the SDP barrier? K = Θ( n log n ) Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 9

19 SDP Relaxation vs. Information-Theoretic Limits Main results: For both planted dense subgraph (Bernoulli) and submatrix localization (Gaussian) K = ω( n log n ): SDP attains the info-theoretic limit with sharp constants K = Θ( n log n ): SDP is order-wise optimal, but strictly suboptimal by a constant factor K = o( n log n ) and K : SDP is order-wise suboptimal Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 10

20 SDP Relaxation vs. Information-Theoretic Limits Log-likelihood ratio matrix L Let ξ = indicator of C. L ij = log dp dq (A ij), i j, L ii = 0 Maximum likelihood estimator = find densest K-subgraph ˆξ MLE = arg max L ij ξ i ξ j ξ i,j s.t. ξ {0, 1} n ξ, 1 = K. Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 11

21 Lift: Z = ξξ Ẑ MLE = arg max L, Z Z s.t. rank(z) = 1 Z ii 1 Z ij 0, I, Z = K J, Z = K 2 i [n] i, j [n] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 12

22 Semidefinite programming Natural SDP relaxation: Ẑ SDP = arg max L, Z Z s.t. Z 0 Z ii 1 Z 0 I, Z = K J, Z = K 2 i [n] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 13

23 Semidefinite programming Natural SDP relaxation: Ẑ SDP = arg max L, Z Z s.t. Z 0 Z ii 1 Z 0 I, Z = K J, Z = K 2 i [n] Goal: P Ẑ SDP = ẐMLE = Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 13

24 Analysis of SDP Define e(i, C ) = j C L ij, i [n],, β = D(Q P). Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 14

25 Analysis of SDP Theorem Sufficient condition: Ẑ SDP = Z, if { } min e(i, C ) max max e(j, C ), Kβ > L E [L] β i C j / C Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 15

26 Analysis of SDP Theorem Sufficient condition: Ẑ SDP = Z, if { } min e(i, C ) max max e(j, C ), Kβ > L E [L] β i C j / C Necessary condition: If Z ẐSDP, then { min e(i, C ) max e(j, C ) sup V (a) a } i C j / C 1 a K K max e(j, C ), j / C where V (a) = max{ LC C, Z : Z 0, Z 0, Tr(Z) = 1, J, Z = a} is the value of an (simpler) auxilliary SDP Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 15

27 Remarks To apply this result, min, max, L E [L], etc concentrate Sufficient condition proof: construction of dual witnesses (standard) Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 16

28 Proof of necessary condition Primal proof: random perturbation of the ground truth to establish integrality gap Z = 1 Z = 0 Z = 0 Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 17

29 Proof of necessary condition Primal proof: random perturbation of the ground truth to establish integrality gap Z = Z = 0 Z = 0 + Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 17

30 Proof of necessary condition Primal proof: random perturbation of the ground truth to establish integrality gap Z = Z = 0 Z = 0 + Dual proof: non-existence of dual witness Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 17

31 Multiple communities

32 k communities: MLE SDP relaxation SBM with k communities and parameter (p, q) max k A, θ l θl l=1 s.t. θ l {0, 1} n θ l, 1 = n/k θ l, θ l = 0, l l Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 19

33 k communities: MLE SDP relaxation SBM with k communities and parameter (p, q) max k A, θ l θl max A, Z l=1 s.t. θ l {0, 1} n lift: Z= k l=1 θ lθ l ========== s.t. rank(z) = k θ l, 1 = n/k Z ii = 1 i [n] θ l, θ l = 0, l l Z ij 0, Z ij = n/k j Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 19

34 k communities: MLE SDP relaxation SBM with k communities and parameter (p, q) max k A, θ l θl max A, Z l=1 s.t. θ l {0, 1} n lift: Z= k l=1 θ lθ l ========== s.t. Z 0 θ l, 1 = n/k Z ii = 1 i [n] θ l, θ l = 0, l l Z ij 0, Z ij = n/k j Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 19

35 k communities: MLE SDP relaxation SBM with k communities and parameter (p, q) max k A, θ l θl max A, Z l=1 s.t. θ l {0, 1} n lift: Z= k l=1 θ lθ l ========== s.t. Z 0 θ l, 1 = n/k Z ii = 1 i [n] θ l, θ l = 0, l l Z ij 0, Z ij = n/k Goal: P Ẑ SDP = j Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 19

36 k equal-sized communities: optimal recovery via SDP Theorem (Hajek-W-Xu 15) For a fixed k communities with p = a log n/n and q = b log n/n. If a b > k, exact recovery is attained via SDP in poly-time. If a b < k, exact recovery is impossible. Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 20

37 k equal-sized communities: optimal recovery via SDP Theorem (Hajek-W-Xu 15) For a fixed k communities with p = a log n/n and q = b log n/n. If a b > k, exact recovery is attained via SDP in poly-time. If a b < k, exact recovery is impossible. Remarks Extended to k = o(log n) in [Agarwal-Bandeira-Koiliaris-Kolla 15] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 20

38 k equal-sized communities: optimal recovery via SDP Theorem (Hajek-W-Xu 15) For a fixed k communities with p = a log n/n and q = b log n/n. If a b > k, exact recovery is attained via SDP in poly-time. If a b < k, exact recovery is impossible. Remarks Extended to k = o(log n) in [Agarwal-Bandeira-Koiliaris-Kolla 15] Extended to the case with multiple unequal-sized clusters [Perry-Wein 15] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 20

39 When does SDP cease to be optimal? Theorem (Hajek-W.-Xu 16) k log n: SDP achieves the optimal exact recovery threshold. k c log n: SDP is suboptimal by a constant factor. k log n: SDP is order-suboptimal. Remarks A hard but informationally possible regime is conjectured to exist for exact recovery when k log n [Chen-Xu 14] Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 21

40 Some remaining problems Can the computational gap for exact recovery be bridged by any polynomial time algorithm? (SoS hardness result or reduction to PC would offer further evidence for no answer.) Approximate recovery? (Current proof only rules out exact recovery.) Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 22

41 Some remaining problems Can the computational gap for exact recovery be bridged by any polynomial time algorithm? (SoS hardness result or reduction to PC would offer further evidence for no answer.) Approximate recovery? (Current proof only rules out exact recovery.) Thank you! Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 22

42 Necessary condition for optimality of SDP EXTRA SLIDES NOT INCLUDED IN ORIGINAL Let M = L (C ) c (C ) c denote the submatrix of L outside the community. For a R, consider the (random) value of the following SDP: V (a) max M, Z (1) Z s.t. Z 0 Z 0 Tr(Z) = 1 J, Z = a. Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 23

43 Necessary condition for optimality of SDP Theorem (Necessary condition for SDP) If Z ẐSDP, then { min e(i, C ) max e(j, C ) sup V (a) a } i C j / C 1 a K K max e(j, C ). (2) j / C Weaker necessary condition (set a = K):. min e(i, C ) V (K) i C Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 24

44 SDP vs. MLE, message passing, and linear MP 0.06 ρ 0.04 SDP necessary MLE MP linear MP µ 0 Phase diagram for the Gaussian model with K = ρn/ log n and µ = µ 0 log n/ n. Bruce Hajek, Yihong Wu and Jiaming Xu SDP for Community Recovery 25

Phase Transitions in Semidefinite Relaxations

Phase Transitions in Semidefinite Relaxations Andrea Montanari [with Adel Javanmard, Federico Ricci-Tersenghi, Subhabrata Sen] Stanford University April 5, 2016 Andrea Montanari (Stanford) SDP Phase Transitions