De-anonymizing Social Networks and Inferring Private Attributes Using Knowledge Graphs. Jianwei Qian (Illinois Tech), Chunhong Zhang (BUPT), Xiang-Yang Li (USTC / Illinois Tech), Linlin Chen (Illinois Tech)
Outline: Background, Prior Work, Our Work, Conclusion
Background: Massive amounts of social network data are released to third parties for research and business. Although user IDs are removed, attackers with prior knowledge can still de-anonymize the users: a privacy leak.
Attacking Process: (figure) the attacker starts from a prior knowledge graph (k.g.) and matches it against the published anonymized network.
(Figure, continued:) Privacy leaked!
Attack Stage 1: De-Anonymization. Which node is Alice? Which is Bob? A direct privacy leak.
Attack Stage 2: Privacy Inference. Exploit correlations between attributes/users (higher education => higher salary; colleagues => same company; common hobbies => friends) to infer new information that is not published: an indirect privacy leak.
What Do We Want to Do? To understand how privacy is leaked to the attacker.
Outline: Background, Prior Work, Our Work, Conclusion
Prior Work: de-anonymizing one user. A never-ending fight: each attack assumes specific prior knowledge, and each is answered by a matching anonymity notion. Degree attack [SIGMOD 08] vs. k-degree anonymity; 1-neighborhood attack [INFOCOM 13] vs. 1-neighborhood anonymity; 1*-neighborhood attack [ICDE 08] vs. 1*-neighborhood anonymity; friendship attack [KDD 11] vs. k²-degree anonymity; community re-identification [SDM 11] vs. k-structural diversity.
Prior Work: de-anonymizing all the users. Graph-mapping-based de-anonymization [WWW 07, S&P 09, CCS 12, COSN 13, CCS 14, NDSS 15]: the attacker holds an auxiliary social network (e.g., Flickr) that overlaps with the published one (e.g., Twitter) and maps nodes between the two.
Limitations of prior work: (1) it assumes the attacker has specific prior knowledge, whereas we assume diverse and probabilistic knowledge; (2) it focuses on de-anonymization only, and how the attacker infers privacy afterwards is barely discussed, whereas we treat inference as the second attacking step.
Outline: Background, Prior Work, Our Work, Conclusion
Goals: to construct a comprehensive and realistic model of the attacker's knowledge, and to use this model to depict how privacy is leaked.
Challenges: it is hard to build such an expressive model because the attacker's prior knowledge is so varied, and hard to simulate the attacking process because the attacker's techniques are, too.
Solution: use a knowledge graph to model the attacker's knowledge.
Knowledge Graph: each piece of knowledge becomes a directed edge, and each edge carries a confidence score.
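A minimal sketch of this representation in Python (the class name, relation names, and confidence values are illustrative, not from the talk):

```python
from collections import defaultdict

class KnowledgeGraph:
    """Directed graph of (head, relation, tail) facts, each with a confidence score."""
    def __init__(self):
        # head -> list of (relation, tail, confidence), and the reverse index
        self.out_edges = defaultdict(list)
        self.in_edges = defaultdict(list)

    def add_fact(self, head, relation, tail, confidence):
        self.out_edges[head].append((relation, tail, confidence))
        self.in_edges[tail].append((relation, head, confidence))

kg = KnowledgeGraph()
kg.add_fact("Alice", "works_at", "Acme", 0.9)   # near-certain knowledge
kg.add_fact("Alice", "friend_of", "Bob", 0.3)   # uncertain, probabilistic knowledge
```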
What's Privacy? Every edge is a piece of privacy. An edge $e$ is leaked when the attack raises the attacker's confidence in it beyond a threshold: $c_{\text{post}}(e) - c_{\text{prior}}(e) > \delta(e)$, with $\delta$ set to, say, 30%.
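A sketch of that leak test (dictionary-based, with hypothetical edge tuples; the 30% threshold is the slide's example):

```python
DELTA = 0.30  # leak threshold from the slide: "say 30%"

def leaked_edges(prior_conf, posterior_conf, delta=DELTA):
    """Edges whose confidence the attack raised by more than delta.

    prior_conf / posterior_conf: dicts mapping an edge (head, relation, tail)
    to the attacker's confidence before / after the attack.
    """
    return [e for e, post in posterior_conf.items()
            if post - prior_conf.get(e, 0.0) > delta]

prior = {("Alice", "salary", "high"): 0.2}
posterior = {("Alice", "salary", "high"): 0.7}
print(leaked_edges(prior, posterior))  # [('Alice', 'salary', 'high')]
```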
De-Anonymization: a mapping $\sigma$ from the prior knowledge graph $G_1$ to the anonymized graph $G_2$. The attacker seeks $\arg\max_\sigma \mathrm{Sim}_\sigma(G_1, G_2)$, where $\mathrm{Sim}_\sigma(G_1, G_2) = \sum_{(u,v) \in \sigma} S(u, v)$ and $S$ is a node similarity function.
Node Similarity: attribute similarity $S_a(u,v)$ compares attribute sets with the Jaccard index; relation similarity combines inbound, outbound, and l-hop neighborhood similarities, $S_r(u,v) = w_{in} S_{in}(u,v) + w_{out} S_{out}(u,v) + w_l S_l(u,v)$; the overall similarity is $S(u,v) = w_a S_a(u,v) + (1 - w_a) S_r(u,v)$.
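A toy sketch of this similarity in Python. It assumes node identities in the two graphs are directly comparable so that neighbor sets can be Jaccard-compared, which is a simplification (in the real attack the neighborhoods live in different graphs and must be scored through the candidate mapping); the weights are illustrative, and 2-hop neighborhoods stand in for the slide's l-hop term:

```python
def jaccard(a, b):
    """Jaccard index of two sets; 0 if both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def node_similarity(u, v, attrs, in_nbrs, out_nbrs,
                    w_in=0.4, w_out=0.4, w_hop=0.2, w_attr=0.5):
    """S(u,v) = w_a * S_a(u,v) + (1 - w_a) * S_r(u,v), as on the slide."""
    # Attribute similarity: Jaccard index on the two attribute sets.
    s_attr = jaccard(attrs[u], attrs[v])
    # 2-hop out-neighborhoods approximate the l-hop neighborhood term.
    hop_u = set().union(*(out_nbrs.get(x, set()) for x in out_nbrs[u]))
    hop_v = set().union(*(out_nbrs.get(x, set()) for x in out_nbrs[v]))
    # Relation similarity: weighted inbound / outbound / l-hop terms.
    s_rel = (w_in * jaccard(in_nbrs[u], in_nbrs[v])
             + w_out * jaccard(out_nbrs[u], out_nbrs[v])
             + w_hop * jaccard(hop_u, hop_v))
    return w_attr * s_attr + (1 - w_attr) * s_rel
```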
Problem Transformation: the mapping problem becomes maximum weighted bipartite matching. The naïve method connects every node of $G_1$ to every node of $G_2$, giving $|V_1| \times |V_2|$ candidate edges (millions): huge complexity!
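The slides do not name a specific solver; a minimal sketch of the matching step using SciPy's Hungarian-algorithm routine (an assumption, chosen because it is the textbook solver for this problem) on a small dense similarity matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_nodes(similarity_matrix):
    """Max-weight bipartite matching; rows index V_1, columns index V_2.
    linear_sum_assignment minimizes cost, so similarities are negated."""
    sim = np.asarray(similarity_matrix)
    rows, cols = linear_sum_assignment(-sim)
    return list(zip(rows.tolist(), cols.tolist()))

sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.2, 0.2, 0.7]])
print(match_nodes(sim))  # [(0, 0), (1, 1), (2, 2)]
```

At millions of nodes this dense formulation is exactly the "huge complexity" the slide warns about, which is what motivates the top-k strategy on the next slide.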
Top-k Strategy: instead of all $|V_1| \times |V_2|$ pairs (millions), keep only the top-k candidate matches in $G_2$ for each node of $G_1$ (suppose k = 2, so Alice gets two candidates), leaving just $k \cdot |V_1|$ edges.
How to Choose Top-k Candidates? Intuition: if two nodes match, their neighbors are also very likely to match (if Alice maps to node 1, her friend Bob likely maps to one of node 1's neighbors). So perform a BFS on $G_1$, propagating candidate sets from matched nodes to their neighbors, as sketched below.
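A minimal sketch of this BFS propagation, assuming a small set of seed mappings and a pluggable similarity function (the helper names and the seed-based start are illustrative; the slides give only the intuition):

```python
from collections import deque
import heapq

def topk_candidates(g1_adj, g2_adj, seeds, similarity, k=10):
    """BFS over G_1: an unvisited node's candidates are drawn from the G_2
    neighbors of its already-matched neighbors' candidates, then trimmed
    to the top-k by similarity. `seeds` maps a few known G_1 nodes to
    their G_2 counterparts. Nodes unreachable from the seeds keep no
    candidates in this toy version."""
    candidates = {u: {v} for u, v in seeds.items()}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for nbr in g1_adj[u]:
            if nbr in candidates:
                continue
            # Candidate pool: G_2 neighbors of u's candidate matches.
            pool = {w for v in candidates[u] for w in g2_adj[v]}
            best = heapq.nlargest(k, pool, key=lambda w: similarity(nbr, w))
            candidates[nbr] = set(best)
            queue.append(nbr)
    return candidates
```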
Complexity Analysis (building the bipartite graph, finding the matching, and space, for naïve vs. top-k): under the naïve method all three scale with the $|V_1| \cdot |V_2|$ candidate edges; under the top-k strategy the bipartite graph has only $k \cdot |V_1|$ edges, shrinking time and space by roughly a factor of $|V_2| / k$. Complexity greatly reduced!
Tradeoff: k balances accuracy and complexity, and k = 10 is already enough to achieve high accuracy. (Plots: accuracy vs. k and run time vs. k, for sample ratios sr = 0.4, 0.6, 0.8.)
Privacy Inference: predict new edges in the knowledge graph. (Figure: an NBA example with relations playfor, teammate, opponent, teaminleague, playinleague among Nick Young, Kobe Bryant, the LA Lakers, and the NY Knicks, e.g. inferring teammate(Nick Young, Kobe Bryant) because both play for the Lakers.)
Path Ranking Algorithm: proposed by Ni Lao et al. in 2011 for a different topic (knowledge-base inference). Correlations become rules, and rules become paths in the knowledge graph; path features feed a logistic regression that scores candidate edges (e.g., does Alice have AIDS?).
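A toy sketch of PRA-style scoring, assuming the standard formulation (random-walk path probabilities as features for a logistic regression); the feature values and helper names here are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def path_feature(adj_by_rel, source, target, path):
    """P(reaching `target` from `source` by a uniform random walk that
    follows the relation sequence `path`); one such value per path is a
    feature. adj_by_rel: relation -> {node: [neighbors]}."""
    probs = {source: 1.0}
    for rel in path:
        nxt = {}
        for node, p in probs.items():
            nbrs = adj_by_rel[rel].get(node, [])
            for n in nbrs:
                nxt[n] = nxt.get(n, 0.0) + p / len(nbrs)
        probs = nxt
    return probs.get(target, 0.0)

# Path features of known true/false edges train the classifier ...
X_train = np.array([[0.9, 0.5], [0.0, 0.1], [0.7, 0.6], [0.1, 0.0]])
y_train = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X_train, y_train)
# ... and the model then scores unseen candidate edges.
print(model.predict_proba([[0.8, 0.4]])[:, 1])
```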
Experiments: datasets are Google+ and Pokec. Steps: generate the anonymized graph $G_2$, generate the prior knowledge graph $G_1$, then run the algorithms.
De-Anonymization Results: metrics are accuracy and run time. The attack de-anonymizes about 60% of users. (Plots: accuracy for k = 5, 10, 15; run-time breakdown into Load, Build, Match, Total; accuracy and run-time comparison against the RS, EG, and RW baselines.)
Privacy Inference Results: metrics are Hit@10 and MRR (mean reciprocal rank). The attack infers much more privacy than random guess. (Plots: MRR and Hit@10 vs. sample ratio sr, for our method and the random-guess baseline RG.)
Outline: Background, Prior Work, Our Work, Conclusion
Conclusion: we have applied knowledge graphs to model the attacker's prior knowledge, studied the two-stage attack process (de-anonymization and privacy inference), designed methods to perform the attack, and run simulations and evaluations on two real-world social networks.
Future Work: efficient construction of the bipartite graph for large-scale social networks; the impact of adversarial knowledge on de-anonymizability; fine-grained privacy inference on the knowledge graph.
Thank you! Jianwei Qian, jqian15@hawk.iit.edu, https://sites.google.com/site/jianweiqianshomepage