QUINT: On Query-Specific Optimal Networks
Presenter: Liangyue Li
Joint work with Yuan Yao (NJU), Jie Tang (Tsinghua), Wei Fan (Baidu), Hanghang Tong (ASU)
Node Proximity: What?
Node proximity: the closeness (a.k.a. relevance, or similarity) between two nodes.
Example question: what is the closest node to node 4?
[Figure: example graph with proximity scores from node 4 to every other node]
Node Proximity: Why?
Applications: biology [Ni+], social networks [Lerman+], e-commerce [Chen+], disaster management [Zheng+]
Node Proximity: How?
Random Walk with Restart (RWR)
Idea: summarize multiple weighted relationships between nodes.
Variants: electric networks SAEC [Faloutsos+]; Katz [Katz], [Huang+]; matrix-forest-based algorithms [Chebotarev+]
[Figure: Prox(A, B) = Score(red path) + Score(green path) + Score(blue path) + Score(purple path) + ...]
Node Proximity: RWR
[Figure: a random walker with restart on the example graph]
Node Proximity -- RWR
Details: a random walker starts from s and, at each step,
(a) moves to one neighbor with probability c * a_ij, or
(b) goes back to s with probability 1 - c
Formulation: r_s = c A r_s + (1 - c) e_s
where r_s is the ranking vector, A is the adjacency matrix, 1 - c is the restart probability, and e_s is the starting vector.
Assumption: the input graph A is fixed -- how to best leverage it?
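The fixed point r_s = c A r_s + (1 - c) e_s above can be computed by simple power iteration. A minimal sketch, assuming a column-normalized adjacency matrix (function and parameter names are illustrative, not from the slides):

```python
import numpy as np

def rwr(A, s, c=0.85, tol=1e-9, max_iter=1000):
    """Random Walk with Restart: iterate r = c*A@r + (1-c)*e_s to a fixed point.

    A : column-normalized adjacency matrix (each column sums to 1)
    s : index of the query node
    c : probability of following an edge (1 - c is the restart probability)
    """
    n = A.shape[0]
    e_s = np.zeros(n)
    e_s[s] = 1.0
    r = e_s.copy()
    for _ in range(max_iter):
        r_next = c * (A @ r) + (1 - c) * e_s
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r_next

# Tiny path graph 0 - 1 - 2, column-normalized
A = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r = rwr(A, s=0, c=0.5)   # proximity of every node to query node 0
```

Because c * A has spectral radius below 1 for a column-stochastic A, the iteration converges geometrically to r_s = (1 - c)(I - cA)^{-1} e_s.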
Node Proximity: Learning RWR
Goal: use side information to learn a better graph.
Side info: user feedback, node attributes.
Key idea: infer optimal edge weights by mapping edge attributes to weights so as to match user preferences:
min_w ||w||^2 + sum_{x in P, y in N} h(Q(y, s) - Q(x, s)), where Q = (I - cA)^{-1}
Limitation: fixed topology.
J. Tang, T. Lou and J. Kleinberg. Transfer Link Prediction across Heterogeneous Social Networks. TOIS, 2015.
L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. WSDM, 2011.
A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. KDD, 2006.
Algorithmic Questions
Q1: optimal weights or optimal topology?
Q2: one-fits-all or one-fits-one?
Q3: offline learning or online learning?
Q1: Optimal Weights or Topology?
Observation: real networks are noisy and incomplete.
Challenge: learn both the optimal weights and the optimal topology.
[Figure: example graph with a missing edge and a noisy edge]
Q2: One-fits-all, or One-fits-one?
Observation: the optimal network for different queries might be different.
[Figure: the same graph with two different query nodes, each with its own positive nodes P and negative nodes N]
Challenge: how to tailor the learning for each query?
Q3: Offline or Online Learning?
Observations:
- Learning RWR: a costly iterative sub-routine is needed to compute a single gradient vector.
- Learning topology: the parameter space expands to O(n^2).
- One-fits-one: one optimal network must be learned for each query.
Challenge: how to perform query-specific online learning?
Query-specific Optimal Network Learning
Given: an input network A, a query node s, positive nodes P, and negative nodes N
Learn: an optimal network A_s specific to the query s
[Figure: example graph annotated with the query node s, positive nodes P, and negative nodes N]
Roadmap
Motivations
Proposed Solutions: QUINT
Empirical Evaluations
Conclusions
QUINT - Formulations
Optimization formulation (hard version), with Q = (I - cA)^{-1}:
arg min_{A_s} ||A_s - A||_F^2
s.t. Q(x, s) > Q(y, s), forall x in P, forall y in N
Remarks:
- ||A_s - A||_F^2 matches the input network; the parameter space is larger: O(n^2).
- The constraints match the preferences (hard): no exception is allowed.
QUINT - Formulations
Optimization formulation (soft version), with Q = (I - cA)^{-1}:
arg min_{A_s} L(A_s) = ||A_s - A||_F^2 + lambda * sum_{x in P, y in N} g(Q(y, s) - Q(x, s))
Remarks:
- Loss function g: the Wilcoxon-Mann-Whitney (WMW) loss, a penalty on violations of the preferences:
  Q(y, s) < Q(x, s)  =>  g(.) = 0
  Q(y, s) > Q(x, s)  =>  g(.) > 0
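A common smooth instantiation of the WMW loss is a sigmoid in d = Q(y, s) - Q(x, s); the width b below is a hypothetical smoothing parameter, not a value from the slides:

```python
import numpy as np

def wmw_loss(d, b=1.0):
    """Wilcoxon-Mann-Whitney loss on d = Q(y, s) - Q(x, s).

    Close to 0 when the preference holds (d < 0, i.e. Q(x, s) > Q(y, s)),
    close to 1 when it is violated (d > 0); differentiable everywhere,
    which is what a gradient-descent solver needs.
    """
    return 1.0 / (1.0 + np.exp(-d / b))
```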
QUINT -- Optimization
Gradient-descent-based solution, with Q = (I - cA_s)^{-1}:
Gradient:
dL(A_s)/dA_s = 2(A_s - A) + lambda * sum_{x in P, y in N} dg(Q(y,s) - Q(x,s))/dA_s
             = 2(A_s - A) + lambda * sum_{x,y} g'(d_yx) * (dQ(y,s)/dA_s - dQ(x,s)/dA_s),  where d_yx = Q(y,s) - Q(x,s)
Derivative of an inverse (Q is differentiable):
dQ/dA_s(i,j) = -Q * d(I - cA_s)/dA_s(i,j) * Q = c * Q * J_ij * Q
so dQ(x,s)/dA_s(i,j) = c * Q(x,i) * Q(j,s),  where J_ij is the single-entry matrix with 1 at (i,j).
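As a sanity check on the closed form dQ(x,s)/dA_s(i,j) = c * Q(x,i) * Q(j,s), the analytic gradient can be compared against a finite difference (the sizes and indices below are arbitrary):

```python
import numpy as np

c, n = 0.3, 5
rng = np.random.default_rng(0)
A = rng.random((n, n))
A /= A.sum(axis=0)                       # column-normalize so (I - cA) is invertible
Q = np.linalg.inv(np.eye(n) - c * A)     # Q = (I - cA)^{-1}

x, s, i, j = 1, 0, 2, 3
analytic = c * Q[x, i] * Q[j, s]         # closed-form gradient entry

eps = 1e-6                               # finite-difference probe of entry A(i, j)
A_pert = A.copy()
A_pert[i, j] += eps
Q_pert = np.linalg.inv(np.eye(n) - c * A_pert)
numeric = (Q_pert[x, s] - Q[x, s]) / eps
```

The two values agree up to the finite-difference error, confirming dQ = c Q J_ij Q entrywise.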
QUINT -- Optimization
Intuition: dQ(x,s)/dA_s(i,j) = c * Q(x,i) * Q(j,s) -- the gradient for edge (i,j) is large when i is a "neighbor" of a positive node x (large Q(x,i)) and j is a "neighbor" of the query node s (large Q(j,s)).
Complexity: O(T1 * |P| * |N| * (T2 * m + n^2)), where T1, T2 are iteration counts and m, n are the numbers of edges and nodes -- quadratic in n.
Observation: T1, T2, |P|, |N| are usually small, so the n^2 term dominates.
Q: how to scale up?
QUINT -- Scale-up
Key idea: the optimal network is a rank-one perturbation of the original network, A_s = A + f g^T.
Details:
arg min_{f,g} L(f, g) = ||f g^T||_F^2 + lambda * (||f||^2 + ||g||^2) + sum_{x in P, y in N} g(Q(y, s) - Q(x, s))
Optimization: alternating gradient descent.
Complexity: O(T1 * |P| * |N| * (T2 * m + n))
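One way to see why a rank-one perturbation A + f g^T is cheap to handle (an illustration of mine, not code from the slides): the Sherman-Morrison identity updates Q = (I - cA)^{-1} after the perturbation without a fresh matrix inversion:

```python
import numpy as np

c, n = 0.3, 6
rng = np.random.default_rng(1)
A = rng.random((n, n))
A /= A.sum(axis=0)
f = 0.1 * rng.random(n)       # rank-one factors, kept small so that
g = 0.1 * rng.random(n)       # I - c*(A + f g^T) stays invertible

Q = np.linalg.inv(np.eye(n) - c * A)

# Sherman-Morrison: (I - cA - c f g^T)^{-1} = Q + c (Qf)(g^T Q) / (1 - c g^T Q f)
Qf = Q @ f
gQ = g @ Q
Q_s = Q + c * np.outer(Qf, gQ) / (1.0 - c * (gQ @ f))

# Direct recomputation, for comparison only
Q_s_direct = np.linalg.inv(np.eye(n) - c * (A + np.outer(f, g)))
```

The update costs a few matrix-vector products instead of a full inverse, which is what makes per-query learning tractable.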
QUINT -- Variant #1
Key idea: apply a Taylor approximation for Q = (I - cA)^{-1} ~ I + sum_{k=1}^{K} c^k A^k
Benefit: faster access to Q(i, j).
Complexity (using the 1st-order Taylor expansion): O(T1 * |P| * |N| * n)
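The Taylor (Neumann-series) approximation truncates Q = (I - cA)^{-1} = I + cA + c^2 A^2 + ... after K terms; since the spectral radius of cA is below 1, the error shrinks geometrically in K. A small sketch (K and the test matrix are arbitrary):

```python
import numpy as np

def q_taylor(A, c, K):
    """Truncated series Q ~ I + sum_{k=1}^{K} c^k A^k, built term by term."""
    n = A.shape[0]
    Q = np.eye(n)
    term = np.eye(n)
    for _ in range(K):
        term = c * (term @ A)    # term is now c^k A^k
        Q += term
    return Q

c, n = 0.2, 5
rng = np.random.default_rng(2)
A = rng.random((n, n))
A /= A.sum(axis=0)               # column-normalize: spectral radius 1

Q_exact = np.linalg.inv(np.eye(n) - c * A)
err1 = np.linalg.norm(Q_exact - q_taylor(A, c, 1))   # 1st-order variant
err5 = np.linalg.norm(Q_exact - q_taylor(A, c, 5))
```

With the 1st-order expansion, Q(i, j) ~ (I + cA)(i, j) can be read off directly from the graph, which is what gives the O(n) factor above.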
QUINT -- Variant #2
Key idea: only update the neighborhood of the query node and of the pos/neg nodes (localized rank-one perturbation).
Complexity: O(T1 * |P| * |N| * max(|N(s)|, |N(P, N)|)), where N(s) is the set of neighbors of s and N(P, N) the set of neighbors of the pos/neg nodes; max(|N(s)|, |N(P, N)|) << n.
Benefit: usually sub-linear in n.
Roadmap
Motivations
Proposed Solutions: QUINT
Empirical Evaluations
Conclusions
Datasets: 10+ diverse networks
Effectiveness: MAP (higher is better)
MAP: Mean Average Precision
[Bar chart: Adamic/Adar, Common Nbr, SRW, RWR, wizan_dual, ProSIN, QUINT-Basic, QUINT-Basic1st, and QUINT-rankOne on Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Effectiveness: HLU (higher is better)
HLU: Half-life Utility
[Bar chart over Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Effectiveness: AUC (higher is better)
[Bar chart over Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Effectiveness: Precision@20 (higher is better)
[Bar chart over Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Effectiveness: Recall@5 (higher is better)
[Bar chart over Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Effectiveness: MPR (lower is better)
MPR: Mean Percentile Ranking
[Bar chart over Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Email, Gene, and Last.fm]
Efficiency -- Twitter
[Plots: running time (log scale) of QUINT-Basic1st and QUINT-rankOne vs. number of nodes (up to 4 x 10^7) and vs. number of edges (up to 1.5 x 10^9); QUINT-rankOne stays around 1 second]
QUINT-rankOne scales sub-linearly.
Roadmap
Motivations
Proposed Solutions: QUINT
Empirical Evaluations
Conclusions
Conclusion: QUINT
Goals: learn the optimal network (for node proximity)
            Q1                 Q2             Q3
  Existing: optimal weights    one-fits-all   offline
  QUINT:    optimal topology   one-fits-one   online
Algorithms: a VERY efficient way to compute dQ(x,s)/dA_s(i,j): rank-1 approximation + Taylor approximation + local search
Results: consistently better on 10+ networks & 6 metrics; sub-linear scalability, near real-time response on billion-scale networks