De-anonymizing Social Networks and Inferring Private Attributes Using Knowledge Graphs

Transcription:

De-anonymizing Social Networks and Inferring Private Attributes Using Knowledge Graphs. Jianwei Qian (Illinois Tech), Chunhong Zhang (BUPT), Xiang-Yang Li (USTC / Illinois Tech), Linlin Chen (Illinois Tech)

Outline: Background, Prior Work, Our Work, Conclusion

Background. Tons of social network data are released to third parties for research and business. Though user IDs are removed, attackers with prior knowledge can still de-anonymize users: a privacy leak.

Attacking Process. [Figure: the attacker's prior knowledge graph (prior k.g.).]

[Figure, continued: the prior knowledge graph (prior k.g.). Privacy leaked!]

Attack Stage 1: De-Anonymization. Which is Alice? Which is Bob? A direct privacy leak.

Attack Stage 2: Privacy Inference. Exploit correlations between attributes/users: higher education => higher salary; colleagues => same company; common hobbies => friends. Infer new information that is not published: an indirect privacy leak.

What Do We Want to Do? To understand how privacy is leaked to the attacker.

Outline: Background, Prior Work, Our Work, Conclusion

Prior Work: de-anonymize one user. A never-ending fight between attacks and anonymity notions, each assuming specific prior knowledge: degree attack [SIGMOD 08] vs. k-degree anonymity; 1-neighborhood attack [INFOCOM 13] vs. 1-neighborhood anonymity; 1*-neighborhood attack [ICDE 08] vs. 1*-neighborhood anonymity; friendship attack [KDD 11] vs. k^2-degree anonymity; community re-identification [SDM 11] vs. k-structural diversity.

Prior Work: de-anonymize all the users. Graph-mapping-based de-anonymization [WWW 07, S&P 09, CCS 12, COSN 13, CCS 14, NDSS 15]: the attacker holds an auxiliary social network that overlaps with the published one and maps the two (e.g., Twitter to Flickr).

Limitations: (1) prior work assumes the attacker has specific prior knowledge, whereas we assume diverse and probabilistic knowledge; (2) it focuses on de-anonymization only, and how the attacker infers privacy afterwards is barely discussed, whereas we treat that as the second attacking step.

Outline: Background, Prior Work, Our Work, Conclusion

Goals: to construct a comprehensive and realistic model of the attacker's knowledge, and to use this model to depict how privacy is leaked.

Challenges: it is hard to build such an expressive model, given that the attacker has various kinds of prior knowledge, and hard to simulate the attacking process, since the attacker has various techniques.

Solution: use a knowledge graph to model the attacker's knowledge.

Knowledge Graph: each piece of knowledge becomes a directed edge, and each edge has a confidence score.
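As a minimal sketch, the attacker's knowledge could be stored as confidence-weighted directed edges; the class and method names below are illustrative, not from the talk:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Each piece of knowledge is a directed edge (subject, relation, object)
    annotated with a confidence score."""

    def __init__(self):
        # subject -> list of (relation, object, confidence)
        self.out_edges = defaultdict(list)

    def add_fact(self, subj, relation, obj, confidence):
        self.out_edges[subj].append((relation, obj, confidence))

    def facts_about(self, subj):
        return self.out_edges[subj]

# The attacker is 80% sure Alice works for Acme and 30% sure she knows Bob.
kg = KnowledgeGraph()
kg.add_fact("Alice", "works_for", "Acme", 0.8)
kg.add_fact("Alice", "friend_of", "Bob", 0.3)
print(kg.facts_about("Alice"))
```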

What's Privacy? Every edge is privacy. Privacy is leaked when the attacker's posterior confidence in an edge e exceeds the prior confidence by more than a threshold, i.e., c_2(e) - c_1(e) > t(e), say 30% (c_1: prior confidence, c_2: posterior confidence).
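A tiny sketch of that leak test, assuming the gain-over-prior reading of the slide's condition and the 30% example threshold (names are illustrative):

```python
def privacy_leaked(prior_conf, posterior_conf, threshold=0.30):
    """An edge's privacy counts as leaked when the attacker's posterior
    confidence exceeds the prior confidence by more than the threshold."""
    return (posterior_conf - prior_conf) > threshold

# Before the attack: 10% confidence that Alice has disease X.
# After de-anonymization and inference: 55%. Gain 0.45 > 0.30 -> leaked.
print(privacy_leaked(0.10, 0.55))  # True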

De-Anonymization as Mapping. Given the prior knowledge graph G_1 and the anonymized graph G_2, find the mapping σ = argmax_σ Sim_σ(G_1, G_2), where Sim_σ(G_1, G_2) = Σ_{(u,v)∈σ} S(u, v) and S is the node similarity function.

Node Similarity. Attribute similarity: use the Jaccard index to compare attribute sets. Relation similarity: combine inbound, outbound, and l-hop neighborhood similarities, S_R(u,v) = w_in S_in(u,v) + w_out S_out(u,v) + w_l S_l(u,v). Overall: S(u,v) = w_A S_A(u,v) + (1 - w_A) S_R(u,v).
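A small sketch of these two formulas; the weight values, attribute sets, and the precomputed neighborhood scores are placeholders, not the authors' settings:

```python
def jaccard(a, b):
    """Jaccard index of two sets (used here for attribute similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def relation_similarity(s_in, s_out, s_lhop, w_in=0.4, w_out=0.4, w_l=0.2):
    """S_R(u,v) = w_in*S_in(u,v) + w_out*S_out(u,v) + w_l*S_l(u,v)."""
    return w_in * s_in + w_out * s_out + w_l * s_lhop

def node_similarity(attrs_u, attrs_v, s_relation, w_attr=0.5):
    """S(u,v) = w_A*S_A(u,v) + (1 - w_A)*S_R(u,v)."""
    return w_attr * jaccard(attrs_u, attrs_v) + (1 - w_attr) * s_relation

# Attribute sets of a node u in G_1 and a candidate v in G_2, plus
# precomputed inbound / outbound / l-hop neighborhood similarities.
s = node_similarity({"nyc", "runner"}, {"nyc", "chess"},
                    s_relation=relation_similarity(0.6, 0.5, 0.3))
print(s)
```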

Problem Transformation: mapping => maximum weighted bipartite matching. Naïve method: build the complete bipartite graph between G_1 and G_2, with n_1 × n_2 edges (millions). Huge complexity!
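For intuition, the naïve formulation is a maximum-weight bipartite matching over the full n_1 × n_2 similarity matrix. A minimal sketch with SciPy's assignment solver on random stand-in similarities; it is only feasible at toy scale, which is exactly why the top-k strategy follows:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n1, n2 = 5, 8                       # toy sizes; millions in practice
sim = rng.random((n1, n2))          # S(u, v) for every candidate pair

# Maximum-weight matching: each node of G_1 gets a distinct node of G_2.
rows, cols = linear_sum_assignment(sim, maximize=True)
mapping = dict(zip(rows, cols))
total = sim[rows, cols].sum()       # Sim_sigma(G_1, G_2) = sum of S(u, v)
print(mapping, total)
```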

Top-k Strategy: keep only the k best candidates in G_2 for each node of G_1 (e.g., for k = 2, Alice keeps candidates 1 and 3), so the bipartite graph has only k × n_1 edges instead of n_1 × n_2 (millions).

How to Choose the Top-k Candidates? Intuition: if two nodes match, their neighbors are also very likely to match (e.g., Alice ↔ 1, Bob ↔ 2). Perform a BFS on G_1.
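A rough sketch of that BFS intuition; the seed matches, the .neighbors() graph interface, and the greedy tentative matching are assumptions for illustration, not the authors' exact procedure:

```python
from collections import deque

def topk_candidates(g1, g2, seeds, similarity, k=10):
    """BFS over G_1: if u (in G_1) is matched to v (in G_2), v's neighbors
    are promising candidates for u's neighbors, so only they are scored.
    `seeds` maps already-matched nodes of G_1 to nodes of G_2; g1/g2
    expose .neighbors(); `similarity` is the node similarity S(u, v)."""
    matched = dict(seeds)            # tentative mapping, grown during BFS
    candidates = {}                  # node of G_1 -> its top-k candidates
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        v = matched[u]
        for u_nbr in g1.neighbors(u):
            if u_nbr in matched:
                continue
            scored = sorted(g2.neighbors(v),
                            key=lambda c: similarity(u_nbr, c),
                            reverse=True)[:k]
            if not scored:
                continue
            candidates[u_nbr] = scored
            matched[u_nbr] = scored[0]   # tentatively pick the best match
            queue.append(u_nbr)
    return candidates
```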

Complexity Analysis (naïve method vs. top-k strategy). Building the bipartite graph (time): n_1 n_2 (n_1 + n_2) vs. k^2 n_1. Finding the matching (time): n_1^2 n_2 (n_1 + n_2)^2 vs. k^2 n_1^2. Space: n_1 n_2 vs. k n_1. Complexity greatly reduced!

Tradeoff: k balances accuracy and complexity; k = 10 is enough to achieve high accuracy. [Figures: accuracy and run time vs. k, for sample ratios sr = 0.4, 0.6, 0.8.]

Privacy Inference: predict new edges in the knowledge graph. [Figure: example knowledge graph with entities NY Knicks, LA Lakers, Nick Young, and Kobe Bryant, and relations team-in-league, opponent, play-for, play-in-league, teammate.]

Path Ranking Algorithm: proposed by Ni Lao et al. in 2011 for a different topic. Correlations => rules => relation paths, scored with logistic regression. [Figure: example inference path from Alice to AIDS.]
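A toy sketch of the idea: path features feed a logistic regression that scores candidate edges. The feature values and labels below are made up, and scikit-learn stands in for the original implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per candidate fact: random-walk probabilities of reaching the
# target entity along particular relation paths (toy numbers).
path_features = np.array([
    [0.8, 0.1, 0.0],
    [0.7, 0.3, 0.1],
    [0.0, 0.2, 0.9],
    [0.1, 0.1, 0.8],
])
holds = np.array([1, 1, 0, 0])       # does the candidate edge actually hold?

model = LogisticRegression().fit(path_features, holds)
# Confidence that a new candidate edge holds, given its path features.
print(model.predict_proba([[0.6, 0.2, 0.1]])[0, 1])
```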

Experiments. Datasets: Google+, Pokec. Steps: generate G_2 (the anonymized graph), generate G_1 (the prior knowledge graph), run the algorithms.

De-Anonymization Results. Metrics: accuracy and run time. [Figures: accuracy vs. sample ratio for k = 5, 10, 15; run-time breakdown (load, build, match, total); curves for RS, EG, and RW.] De-anonymizes about 60% of users.

Privacy Inference Results. Metrics: Hit@10 and MRR (mean reciprocal rank). [Figures: MRR and Hit@10 vs. sample ratio, compared against random guessing (RG).] The attack infers much more privacy than random guessing.
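For reference, both metrics come from the rank of each true edge among the predictions; a small sketch with made-up ranks:

```python
def mrr(ranks):
    """Mean reciprocal rank of the true edges (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at_k(ranks, k=10):
    """Fraction of true edges ranked within the top k predictions."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 4, 12, 50, 3]            # rank of each held-out true edge
print(mrr(ranks), hit_at_k(ranks, k=10))
```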

Outline: Background, Prior Work, Our Work, Conclusion

Conclusion. We have applied knowledge graphs to model the attacker's prior knowledge, studied the attack process (de-anonymization and privacy inference), designed methods to perform the attack, and run simulations and evaluations on two real-world social networks.

Future work: effective construction of the bipartite graph for large-scale social networks; impact of adversarial knowledge on de-anonymizability; fine-grained privacy inference on the knowledge graph.

Thank you! Jianwei Qian, jqian15@hawk.iit.edu, https://sites.google.com/site/jianweiqianshomepage