QUINT: On Query-Specific Optimal Networks


1 QUINT: On Query-Specific Optimal Networks. Presenter: Liangyue Li. Joint work with Yuan Yao (NJU), Jie Tang (Tsinghua), Wei Fan (Baidu), and Hanghang Tong (ASU).

2 Node Proximity: What? Node proximity: the closeness (a.k.a., relevance, or similarity) between two nodes What is the closest node to 4?

3 Node Proximity: Why? Applications: Biology [Ni+], Social Network [Lerman+], E-commerce [Chen+], Disaster Management [Zheng+].

4 Node Proximity: How? Random Walk with Restart (RWR). Idea: summarize multiple weighted relationships between nodes. Variants: Electric networks: SAEC [Faloutsos+]; Katz [Katz], [Huang+]; Matrix-Forest-based Alg [Chebotarev+]. [Figure: example graph containing nodes A and B, with the paths between them colored.] Prox(A, B) = Score(Red Path) + Score(Green Path) + Score(Blue Path) + Score(Purple Path) + ...

5 Node Proximity: RWR

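6 Node Proximity -- RWR. Detail: a random walker starts from $s$ and, at each step, either (a) transmits to one neighbor with probability $\propto cA_{ij}$, or (b) goes back to $s$ with probability $(1 - c)$. Formulation: $r_s = cAr_s + (1 - c)e_s$, where $r_s$ is the ranking vector, $A$ the adjacency matrix, $(1 - c)$ the restart probability, and $e_s$ the starting vector. Assumption: the input graph $A$ is fixed; the question is how to best leverage it.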
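The RWR formulation on this slide can be checked numerically. What follows is a minimal sketch, assuming a column-normalized toy adjacency matrix and $c = 0.9$ (both illustrative choices, not values from the slides). It iterates $r_s = cAr_s + (1 - c)e_s$ to a fixed point and compares the result with the closed form $(1 - c)(I - cA)^{-1}e_s$. The function name `rwr` and the toy graph are hypothetical.

```python
import numpy as np

def rwr(A, s, c=0.9, tol=1e-10, max_iter=1000):
    """Power iteration for r = c*A@r + (1-c)*e_s.

    Assumes A is column-stochastic, so the iteration is a contraction.
    """
    n = A.shape[0]
    e = np.zeros(n)
    e[s] = 1.0                      # starting vector e_s
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (A @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy undirected path graph 0-1-2-3 (an assumption for illustration).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A = W / W.sum(axis=0, keepdims=True)    # column-normalize

r_iter = rwr(A, s=0)
# Closed form: r_s = (1 - c) * (I - c*A)^{-1} e_s
r_closed = (1 - 0.9) * np.linalg.solve(np.eye(4) - 0.9 * A, np.eye(4)[:, 0])
```

The two vectors should agree up to the iteration tolerance, which is why later slides can work with $Q = (I - cA)^{-1}$ directly.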

7 Node Proximity: Learning RWR. Goal: use side information to learn a better graph. Side info: user feedback, node attributes. Key idea: infer optimal edge weights, $\min_w \|w\|^2 + \lambda \sum_{x \in P, y \in N} h(Q(y, s) - Q(x, s))$ with $Q = (I - cA)^{-1}$; the parameters $w$ map edge attributes to weights, and the loss term matches user preferences. Limitation: fixed topology. References: J. Tang, T. Lou and J. Kleinberg. Transfer Link Prediction across Heterogeneous Social Networks. TOIS. L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. WSDM. A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. KDD, 2006.

8 Algorithmic Questions. Q1: optimal weights or optimal topology? Q2: one-fits-all or one-fits-one? Q3: offline learning or online learning?

9 Q1: Optimal Weights or Topology? Observation: real networks are noisy and incomplete. Challenge: learn both optimal weights and topology. [Figure: example network with a missing edge and a noisy edge.]

10 Q2: One-fits-all, or one-fits-one? Observation: the optimal network for different queries might be different. [Figure: two query nodes, each with its own positive nodes P and negative nodes N.] Challenge: how to tailor learning for each query.

11 Q3: Offline or Online Learning? Observation: (1) learning RWR requires a costly iterative sub-routine to compute a single gradient vector; (2) learning topology expands the parameter space to $O(n^2)$; (3) one-fits-one requires one optimal network for each query. Challenge: how to perform query-specific online learning?

12 Query-specific Optimal Network Learning. Given: an input network $A$, a query node $s$, positive nodes $P$ and negative nodes $N$. Learn: an optimal network $A_s$ specific to the query $s$. [Figure: query node $s$ with its positive and negative nodes.]

13 Roadmap Motivations Proposed Solutions: QUINT Empirical Evaluations Conclusions

14 QUINT - Formulations. Optimization formulation (hard version): $\arg\min_{A_s} \|A_s - A\|_F^2$ s.t. $Q(x, s) > Q(y, s), \forall x \in P, \forall y \in N$, with $Q = (I - cA)^{-1}$ (evaluated on the learned network $A_s$ in the constraint). The objective matches the query-specific optimal network to the input network; the constraint matches the preference (hard) between positive and negative nodes. Remarks: larger parameter space, $O(n^2)$; no exception is allowed in the constraint.

15 QUINT - Formulations. Optimization formulation (soft version): $\arg\min_{A_s} L(A_s) = \lambda\|A_s - A\|_F^2 + \sum_{x \in P, y \in N} g(Q(y, s) - Q(x, s))$, with $Q = (I - cA)^{-1}$. Remarks: the loss function $g(\cdot)$ is the Wilcoxon-Mann-Whitney (WMW) loss, a penalty to the violation of preferences. Characteristic: $Q(y, s) < Q(x, s) \Rightarrow g(\cdot) = 0$; $Q(y, s) > Q(x, s) \Rightarrow g(\cdot) > 0$.
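To make the soft objective concrete, here is a small sketch that evaluates it on a toy graph. Several pieces are assumptions not stated on the slide: the WMW loss is approximated by a sigmoid with width `b`, the trade-off weight `lam` is a hypothetical parameter, and $Q$ is computed by a dense inverse, which is only reasonable for tiny examples.

```python
import numpy as np

def wmw_loss(d, b=0.1):
    """Sigmoid surrogate (an assumption): close to 0 when d << 0
    (preference satisfied), close to 1 when d >> 0 (violated)."""
    return 1.0 / (1.0 + np.exp(-d / b))

def soft_objective(A_s, A, s, P, N, c=0.5, lam=1.0, b=0.1):
    """lam * ||A_s - A||_F^2 + sum over (x in P, y in N) of g(Q(y,s) - Q(x,s))."""
    n = A.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * A_s)      # dense inverse: toy sizes only
    fit = lam * np.linalg.norm(A_s - A, "fro") ** 2
    penalty = sum(wmw_loss(Q[y, s] - Q[x, s], b) for x in P for y in N)
    return fit + penalty

# Toy path graph 0-1-2-3, scaled so that I - c*A is invertible.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A = W / 2.0

# Query 0 prefers the far node 3 over its direct neighbor 1, so the
# unmodified input network violates the preference.
loss_at_input = soft_objective(A, A, s=0, P=[3], N=[1])
```

With `A_s = A` the first term vanishes, so `loss_at_input` is the preference penalty alone; it exceeds 0.5 here because node 1 is closer to the query than node 3.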

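16 QUINT -- Optimization. Gradient descent based solution: $\frac{\partial L(A_s)}{\partial A_s} = 2\lambda(A_s - A) + \sum_{x,y} \frac{\partial g(d_{yx})}{\partial A_s} = 2\lambda(A_s - A) + \sum_{x,y} \frac{\partial g(d_{yx})}{\partial d_{yx}} \left( \frac{\partial Q(y, s)}{\partial A_s} - \frac{\partial Q(x, s)}{\partial A_s} \right)$, where $d_{yx} = Q(y, s) - Q(x, s)$ and $Q = (I - cA)^{-1}$. Derivative of an inverse: $\frac{\partial Q}{\partial A_s(i, j)} = cQ \frac{\partial A_s}{\partial A_s(i, j)} Q = cQJ^{ij}Q$, where $J^{ij}$ is the single-entry matrix with a 1 at position $(i, j)$, so $\frac{\partial Q(x, s)}{\partial A_s(i, j)} = cQ(x, i)Q(j, s)$.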
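The derivative-of-an-inverse identity on this slide can be sanity-checked with a finite difference. The sketch below assumes a small random non-negative matrix whose row sums are below 1 (so the inverse exists) and $c = 0.5$. The helper names `q_matrix` and `dq_entry` are hypothetical.

```python
import numpy as np

def q_matrix(A, c):
    """Q = (I - c*A)^{-1}, dense inverse for a toy-sized check."""
    return np.linalg.inv(np.eye(A.shape[0]) - c * A)

def dq_entry(Q, c, x, s):
    """Matrix whose (i, j) entry is c * Q(x, i) * Q(j, s),
    i.e. the derivative of Q(x, s) with respect to A(i, j)."""
    return c * np.outer(Q[x, :], Q[:, s])

rng = np.random.default_rng(0)
n, c = 5, 0.5
A = rng.random((n, n)) / n          # row sums < 1, so I - c*A is invertible
Q = q_matrix(A, c)

x, s, i, j = 3, 0, 1, 2
analytic = dq_entry(Q, c, x, s)[i, j]

# Forward finite difference on the single entry A(i, j).
eps = 1e-6
A_pert = A.copy()
A_pert[i, j] += eps
numeric = (q_matrix(A_pert, c)[x, s] - Q[x, s]) / eps
```

The full gradient of one entry $Q(x, s)$ is an outer product of a row and a column of $Q$. This is the structure the scale-up slides exploit.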

17 QUINT -- Optimization. $\frac{\partial Q(x, s)}{\partial A_s(i, j)} = cQ(x, i)Q(j, s)$, with $Q = (I - cA)^{-1}$. [Figure: query node $s$ with neighbor $j$, contributing $Q(j, s)$; positive node $x$ with neighbor $i$, contributing $Q(x, i)$.] Complexity: $O(T_1 |P| |N| (T_2 m + n^2))$. Observation: usually $T_1, T_2, |P|, |N| \ll m, n$, so the complexity is quadratic. Q: how to scale up?

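18 QUINT Scale-up. Key idea: the optimal network is a rank-one perturbation to the original network, i.e., $A_s = A + fg'$. Details: $\arg\min_{f,g} L(f, g) = \lambda\|fg'\|_F^2 + \beta(\|f\|^2 + \|g\|^2) + \sum_{x \in P, y \in N} g(Q(y, s) - Q(x, s))$, with $Q = (I - cA)^{-1}$; here $g(\cdot)$ denotes the loss function and $g$ the perturbation vector. Optimization: alternating gradient descent. Complexity: $O(T_1 |P| |N| (T_2 m + n))$.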
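One way to see why the rank-one form avoids an $n \times n$ gradient: since the derivative of $Q(x, s)$ with respect to $A_s$ is an outer product, contracting it with $f$ or $g$ leaves only vectors. The sketch below follows that reasoning; it is not taken from the slides. It assumes a sigmoid surrogate for the WMW loss, hypothetical weights `lam` and `beta`, and a dense inverse for $Q$ that a scalable implementation would replace.

```python
import numpy as np

def wmw(d, b=0.1):
    """Sigmoid surrogate for the WMW loss (an assumption)."""
    return 1.0 / (1.0 + np.exp(-d / b))

def rank_one_objective(A, f, g, s, P, N, c=0.5, lam=1.0, beta=0.1, b=0.1):
    n = A.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * (A + np.outer(f, g)))
    penalty = sum(wmw(Q[y, s] - Q[x, s], b) for x in P for y in N)
    # ||f g'||_F^2 equals (f.f) * (g.g)
    return lam * (f @ f) * (g @ g) + beta * (f @ f + g @ g) + penalty

def rank_one_grads(A, f, g, s, P, N, c=0.5, lam=1.0, beta=0.1, b=0.1):
    n = A.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * (A + np.outer(f, g)))  # toy sizes only
    q_col = Q[:, s]
    u = np.zeros(n)
    for x in P:
        for y in N:
            sig = wmw(Q[y, s] - Q[x, s], b)
            u += (sig * (1.0 - sig) / b) * (Q[y, :] - Q[x, :])
    # Penalty gradient w.r.t. A_s is c * outer(u, q_col); it is never formed.
    grad_f = 2 * lam * (g @ g) * f + 2 * beta * f + c * u * (q_col @ g)
    grad_g = 2 * lam * (f @ f) * g + 2 * beta * g + c * q_col * (u @ f)
    return grad_f, grad_g

W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A = W / 2.0
rng = np.random.default_rng(1)
f = 0.1 * rng.standard_normal(4)
g = 0.1 * rng.standard_normal(4)
s, P, N = 0, [3], [1]

grad_f, grad_g = rank_one_grads(A, f, g, s, P, N)

# One alternating step: update f with g fixed, then g with the new f.
lr = 0.01
f_new = f - lr * grad_f
_, grad_g_new = rank_one_grads(A, f_new, g, s, P, N)
g_new = g - lr * grad_g_new
```

The step size `lr` and the single alternating step are illustrative. The slides do not state the schedule or stopping rule.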

19 QUINT Variant #1. Key idea: apply Taylor approximation for $Q$. Details: $Q = (I - cA)^{-1} \approx I + \sum_{i=1}^{k} c^i A^i$. Complexity (using 1st-order Taylor): $O(T_1 |P| |N| n)$. Benefit: faster access to $Q(i, j)$.
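The truncated series can be compared with the exact inverse on a small example. This sketch assumes a toy matrix scaled so that the spectral radius of $cA$ is below 1, which the series needs in order to converge. The names `q_exact` and `q_taylor` are hypothetical.

```python
import numpy as np

def q_exact(A, c):
    return np.linalg.inv(np.eye(A.shape[0]) - c * A)

def q_taylor(A, c, k=1):
    """I + sum_{i=1..k} c^i A^i, built up one power at a time."""
    n = A.shape[0]
    Q = np.eye(n)
    term = np.eye(n)
    for _ in range(k):
        term = c * (term @ A)       # term becomes c^i * A^i
        Q = Q + term
    return Q

W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A, c = W / 2.0, 0.5

# Maximum entrywise error for increasing truncation orders.
errors = {k: np.abs(q_exact(A, c) - q_taylor(A, c, k)).max() for k in (1, 5, 20)}
```

With $k = 1$ the approximation is $I + cA$, so any entry $Q(i, j)$ can be read off the input graph without solving a linear system. The price is the truncation error recorded in `errors`.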

20 QUINT Variant #2. Key idea: only update the neighborhood of the query node and the pos/neg nodes (Localized Rank-One Perturbation). Complexity: $O(T_1 |P| |N| \max(|N(s)|, |N(P, N)|))$, where $N(s)$ denotes the neighbors of $s$ and $N(P, N)$ the neighbors of the pos/neg nodes. Since $\max(|N(s)|, |N(P, N)|) \ll n$, the cost is usually sub-linear in $n$.

21 Roadmap Motivations Proposed Solutions: QUINT Empirical Evaluations Conclusions

22 Datasets 10+ diverse networks

23 Effectiveness: MAP (Higher is better). MAP: Mean Average Precision. [Bar chart. Methods: Adamic/Adar, Common Nbr, SRW, RWR, wizan_dual, ProSIN, QUINT-Basic, QUINT-Basic1st, QUINT-rankOne. Datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]
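For reference, MAP under its standard definition can be computed as below. This is a sketch of the usual metric only; the slides do not describe the evaluation protocol (candidate sets, tie handling, train/test split), and the scores and positive sets are made up for illustration.

```python
import numpy as np

def average_precision(scores, positives):
    """Mean of precision@rank over the ranks where a positive node appears."""
    order = np.argsort(-np.asarray(scores))     # highest score first
    hits, total = 0, 0.0
    for rank, node in enumerate(order, start=1):
        if int(node) in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0

def mean_average_precision(queries):
    """queries: list of (scores, positives) pairs, one per query node."""
    return float(np.mean([average_precision(sc, pos) for sc, pos in queries]))

# Ranking is node 0, 2, 3, 1; positives sit at ranks 1 and 3.
ap = average_precision([0.9, 0.1, 0.8, 0.3], {0, 3})   # (1/1 + 2/3) / 2
map_score = mean_average_precision([
    ([0.9, 0.1, 0.8, 0.3], {0, 3}),
    ([0.2, 0.7, 0.1, 0.4], {1}),                       # positive ranked first
])
```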

24 Effectiveness: HLU (Higher is better). HLU: Half-life Utility. [Bar chart over datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]

25 Effectiveness: AUC (Higher is better). [Bar chart over datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]

26 Effectiveness: Precision@20 (Higher is better). [Bar chart over datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]

27 Effectiveness: Recall@5 (Higher is better). [Bar chart over datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]

28 Effectiveness: MPR (Lower is better). MPR: Mean Percentile Ranking. [Bar chart over datasets: Astro-Ph, GR-QC, Hep-TH, Hep-PH, Protein, Airport, Oregon, NBA, Gene, Last.fm.]

29 Efficiency -- Twitter. [Plots: running time (seconds) vs. # nodes and vs. # edges, for QUINT-Basic1st and QUINT-rankOne.] QUINT-rankOne scales sub-linearly.

30 Roadmap Motivations Proposed Solutions: QUINT Empirical Evaluations Conclusions


31 Conclusion: QUINT. Goals: learn the optimal network (for node proximity). Comparison: Q1: existing methods learn optimal weights, QUINT learns optimal topology; Q2: existing methods are one-fits-all, QUINT is one-fits-one; Q3: existing methods are offline, QUINT is online. Algorithms: VERY efficient way to compute $\frac{\partial Q(x, s)}{\partial A_s(i, j)}$: rank-1 approx + Taylor approx + local search. Results: consistently better on 10+ networks & 6 metrics; sublinear scalability, near real-time response at billion scale.


QUINT: On Query-Specific Optimal Networks Liangyue Li Arizona State University liangyue@asu.edu Yuan Yao Nanjing University targenardy@gmail.com Jie Tang Tsinghua University jietang@tsinghua.edu.cn Wei