Distributed Data Anonymization with Hiding Sensitive Node Labels

C. EMELDA
Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University, Trichy

R. JAYA
Assistant Professor, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University, Trichy

Abstract
People now routinely share information through social platforms such as Facebook and Twitter. Social networks on the Internet can be regarded as a microcosm of the real world and are worth analyzing. Since the data in social networks can be private and sensitive, privacy preservation in social networks has become a focus of study. Previous works develop anonymization methods for a single social network represented by a single graph, which is not enough for analyzing the evolution of a social network. In this paper I implement the data anonymization protection model in a distributed environment, where different publishers publish their data independently and their data overlap. The work also extends to the identification of individuals through node and edge label information in graphs. The problem is defined as PSNM k-Anonymity (Protecting Sensitive Nodes Model). I conduct extensive experiments to evaluate the effectiveness of the proposed technique.

Keywords
PSNM k-Anonymity, privacy preservation, node and edge label editing.

I. INTRODUCTION
People share their information and take part in various activities on the Internet through social platforms such as Google+, Facebook and Twitter. The data in social networks are worth analyzing in social science research because they reflect real social activities. Since these data include personal information and interactions, which can be private and sensitive, they need to be anonymized to protect users' privacy before they are released for analysis.
A social network can be represented by a graph consisting of nodes and edges between the nodes. The nodes represent users, while the edges represent the interactions between them. Recently, much work has been done on anonymizing tabular micro-data, and a variety of privacy models and anonymization algorithms have been developed (e.g., k-anonymity, l-diversity, t-closeness). In tabular micro-data, some of the non-sensitive attributes, called quasi-identifiers, can be used to re-identify individuals and their sensitive attributes. When social network data are published, the graph structure is published along with the corresponding social relationships. As a result, the structure may be exploited as a new means to compromise privacy.

This paper considers the privacy preservation problem in a distributed setting. A set of social network graphs is anonymized into a sequence of sanitized graphs to be released. The paper studies the problem of protecting a node's identity against an adversary who is armed with both node degree and edge label information.

II. RELATED WORKS
A major issue in publishing social network data is that it may expose individuals' private or sensitive information (e.g., salary, disease, connection to a specific group of people). A simple method used in the past to address this issue is to replace the true identities of the individuals in a network with random pseudo-identifiers before releasing the data. However, Backstrom et al. [1] and Narayanan et al. [2] have shown that this simple approach cannot prevent the identification of individuals when adversaries have background knowledge about the network.
To protect people's privacy from different types of attacks, various data anonymization methodologies (e.g., k-anonymity [3], l-diversity [4], t-closeness [5]) have been proposed. As noted in the introduction, quasi-identifiers can be used to re-identify individuals in tabular microdata, and publishing the graph structure of a social network together with its social relationships gives adversaries a further means to compromise privacy. A structure attack is an attack that uses structural information, such as the degree or the subgraph of a node, to identify the node. To prevent structure attacks, a published graph should satisfy k-anonymity. The goal is to publish a social graph in which every attack scenario always leaves at least k candidates, so that privacy is protected.

2014, IJIRAE - All Rights Reserved, Page 392

Liu and Terzi [6] did pioneering work in this direction by defining a k-degree anonymity model to prevent degree attacks (attacks that use the degree of a node). A graph is k-degree anonymous if and only if, for every node in the graph, there exist at least k - 1 other nodes with the same degree.

Current approaches for protecting graph privacy can be classified into two categories: clustering and edge editing. Edge-editing-based models add or delete edges so that the graph satisfies certain properties required by the privacy model. Most edge-editing-based graph protection models implement k-anonymity of nodes under different assumptions about the attacker's background knowledge. Besides the k-degree-anonymous model of Liu and Terzi [6], Zhou and Pei [7] considered a k-neighborhood anonymous model: for every node, there exist at least k - 1 other nodes sharing isomorphic neighborhoods. Zou et al. [8] proposed a k-automorphism protection model: a graph is k-automorphic if and only if, for every node, there exist at least k - 1 other nodes that have no structural difference from it. Cheng et al. [9] designed a k-isomorphism model to protect both nodes and links: a graph is k-isomorphic if it consists of k disjoint isomorphic subgraphs. The sensitive attributes of nodes are protected by the anatomy model [10] in a k-isomorphic graph. Ying and Wu [11] proposed a protection model that randomly changes the edges in the graph.

Among the clustering approaches, Hay et al. [12] proposed a heuristic clustering algorithm to prevent privacy leakage through vertex refinement, subgraph, and hub-fingerprint attacks. Zheleva and Getoor [13] developed a clustering method to prevent the leakage of sensitive links. Cormode et al. [14] introduced (k,l)-clusterings for bipartite graphs and interaction graphs.
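The k-degree anonymity definition of Liu and Terzi [6] quoted above can be checked mechanically. The following is a minimal Python sketch, not code from the paper: the function names and the edge-list representation (nodes numbered from 0) are assumptions made for illustration.

```python
from collections import Counter


def degrees_from_edges(num_nodes, edges):
    """Compute the degree of each node from an undirected edge list.

    Nodes are assumed to be numbered 0 .. num_nodes - 1 (illustrative choice).
    """
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def is_k_degree_anonymous(degrees, k):
    """Return True if every degree value is shared by at least k nodes.

    This is equivalent to: each node has at least k - 1 other nodes
    with the same degree.
    """
    counts = Counter(degrees)
    return all(count >= k for count in counts.values())


# A path on four nodes has two nodes of degree 1 and two of degree 2.
path = [(0, 1), (1, 2), (2, 3)]
print(is_k_degree_anonymous(degrees_from_edges(4, path), 2))
```

A degree attack succeeds only against nodes whose degree is rare, so the check reduces to counting how often each degree value occurs.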
Later work preserves important graph properties, such as distances between nodes, by adding certain noise nodes to a graph [16]. This idea is based on the following key observation: most social networks follow a power-law degree distribution [15], i.e., the graph contains a large number of low-degree vertices, which can be used to hide the added noise nodes from re-identification. By carefully inserting noise nodes, some graph properties can be preserved better than with a pure edge-editing method. This paper builds on that idea: after node degree anonymization, the values (node labels) are anonymized using a fuzzy concept.

III. METHODOLOGY
This paper proposes a new anonymity model for social networks in a distributed setting. Different data publishers publish their data independently, and their data overlap. In a distributed environment, although the data published by each publisher satisfy certain privacy requirements, the overlapping data taken together may not. The new method is designed to help these publishers publish unified data together so that privacy is guaranteed.

A social network contains both node labels and edges. Previous work focuses only on the edges and does not consider the node labels. This work covers both. It consists of two phases: node degree anonymization and edge label anonymization. After node degree anonymization, the values (node labels) are anonymized using a fuzzy concept.

The first phase in achieving anonymity of a graph is node degree anonymization. The purpose of this step is to add as few edges as possible to the graph G to create a graph G' such that each node in G' has the same degree as at least K - 1 other nodes. This paper proposes a degree anonymization algorithm based on node grouping. In this algorithm, the sorted nodes are first partitioned into groups such that each group consists of K nodes. Then, all the groups are traversed and anonymized.
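The grouping step just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's Java implementation: nodes are sorted by degree in descending order (so the first node of a group carries the target degree), a trailing group with fewer than K nodes is merged into the previous group, and the names `group_by_degree` and `edges_needed` are hypothetical.

```python
def group_by_degree(degrees, k):
    """Partition nodes into groups of k after sorting by degree.

    degrees: dict mapping node -> degree.
    Assumption: descending order, so group[0] has the largest degree.
    Assumption: a short final group is merged into the one before it.
    """
    order = sorted(degrees, key=lambda node: degrees[node], reverse=True)
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


def edges_needed(degrees, groups):
    """For each node, count the edges it lacks to reach its group's target.

    The target is the degree of the first node in the group.
    """
    needed = {}
    for group in groups:
        target = degrees[group[0]]
        for node in group:
            needed[node] = target - degrees[node]
    return needed


degrees = {"a": 4, "b": 3, "c": 3, "d": 2, "e": 1}
groups = group_by_degree(degrees, 2)
print(groups)
print(edges_needed(degrees, groups))
```

The deficits computed here are only a starting point: every edge added for one node also raises the degree of the node at its other end, so the targets of other groups can shift while the graph is being edited.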
In this step, when the nodes in one group are anonymized, the degree of each node is increased to match the degree d0 of the group's first node. Once all nodes in a group have the same degree, the group is denoted as anonymized, and all its nodes are marked as "anonymized". The main operation in anonymizing a group's nodes is to add edges for every node v so that its degree becomes d0. To add an edge for v, the algorithm chooses another unanonymized node. If there are too few unanonymized nodes to create edges for v, anonymized nodes are randomly selected instead, and the groups of these anonymized nodes are marked as unanonymized again.

A. ALGORITHM
Algorithm 1: Node Degree Anonymization
Sd = the sequence of sorted node degrees for the original graph G, Sd = (Sd(v1), ..., Sd(vn))
Input: Nodes
Output: Degree-anonymized nodes

1. Partition all nodes into groups and mark each group as unanonymized.
2. For each unanonymized group g:
   d0 = degree of the first node in this group
   For each node v in group g such that Sd(v) < d0:
   i. If there are d0 - Sd(v) unanonymized nodes in V, randomly choose d0 - Sd(v) of them and create edges between v and these nodes.
   ii. Otherwise (i.e., there are too few unanonymized nodes to create edges for v):
      - Randomly select anonymized nodes v' and create edges between v and these nodes.
      - Mark the group that each v' belongs to as unanonymized.
   Mark every node in g as anonymized.

The second phase is edge label anonymization. In this phase the edge labels are anonymized using a fuzzy concept. The following algorithm is used for data anonymization.

Algorithm 2: Data Anonymization
Input: Data set D
Output: Anonymized data set AD
1. Read the input data set D.
2. Convert categorical attributes into numerical attributes.
3. For each attribute, apply the fuzzy membership function

\[
F(x) =
\begin{cases}
2\left[\dfrac{x-a}{b-a}\right]^{2}, & a \le x \le \dfrac{a+b}{2} \\
1 - 2\left[\dfrac{x-b}{b-a}\right]^{2}, & \dfrac{a+b}{2} \le x \le b \\
1, & x \ge b \\
0, & \text{otherwise}
\end{cases}
\]

   where a is the minimum value of the column, b is the maximum value of the column, and x is a value in that column.
4. Transform the fuzzy values into Low, Medium, High and Very High.
5. Return the anonymized data set.

B. EXPERIMENTAL RESULT
This section presents experimental results on the performance of the proposed techniques. The approaches are implemented in Java. All experiments were run on a Windows 7 machine with an Intel Pentium(R) P6200 CPU (2.13 GHz) and 2 GB of RAM. The Adult data set is used for the experiments. The data set contains the following attributes: Node ID, Age, Sex, Work Class, Education, Occupation, Marital, Race, Salary and Number of connected edges.

Figure 1. Number of edges added
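Steps 3 and 4 of Algorithm 2 can be illustrated with a short Python sketch. The membership function follows the piecewise definition in step 3; the cut points 0.25, 0.5 and 0.75 used to map membership values to Low, Medium, High and Very High are an assumption made here for illustration, since the paper does not state them, and the function names are hypothetical.

```python
LABELS = ("Low", "Medium", "High", "Very High")


def s_membership(x, a, b):
    """S-shaped fuzzy membership of x on the range [a, b] (step 3)."""
    if b < a:
        raise ValueError("b must not be smaller than a")
    if x >= b:
        return 1.0
    if x < a:
        return 0.0  # the "otherwise" branch
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def to_label(mu, cuts=(0.25, 0.5, 0.75)):
    """Map a membership value to a linguistic label (step 4).

    Assumption: the cut points are illustrative, not from the paper.
    """
    for cut, label in zip(cuts, LABELS):
        if mu < cut:
            return label
    return LABELS[-1]


def anonymize_column(values):
    """Replace each numeric value in a column by its fuzzy label."""
    a, b = min(values), max(values)
    return [to_label(s_membership(x, a, b)) for x in values]


print(anonymize_column([0, 2.5, 5, 7.5, 10]))
# ['Low', 'Low', 'High', 'Very High', 'Very High']
```

Because a and b are taken from the column itself, the minimum always maps to membership 0 and the maximum to membership 1, so the labels describe a value's position within its own column rather than its absolute size.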

Figure 2. Execution time

The experiments that test the degree anonymization technique are shown in Figures 1 and 2. Figure 1 shows the number of new edges added to the original data set. Figure 2 shows the execution time of degree anonymization.

IV. CONCLUSIONS
In this paper, we study the problem of achieving K-anonymity of social networks containing rich information, in order to prevent attacks that use both node degree and edge label information. The major challenge in solving this problem is the correlation of edge labels. To address this challenge, we present a two-phase solution framework: the first phase is degree anonymization and the second phase is label anonymization. We theoretically analyze the optimality and running time complexity of the techniques in the different stages of this framework.

ACKNOWLEDGMENT
I am very glad to express my deep sense of gratitude and profound thanks to my research advisor Mrs. R. JAYA, M.Sc., M.Phil., Department of Computer Science, Nehru Memorial College (Autonomous), Puthanampatti, Tiruchirapalli Dt, for the guidance and encouragement she provided at every stage of this research work. I feel proud to share this success with the staff members and friends whose timely help and co-operation contributed to the successful completion of the research work.

REFERENCES
[1] L. Backstrom, C. Dwork, and J. M. Kleinberg, "Wherefore Art Thou R3579X?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography," in WWW, pp. 181-190, 2007.
[2] A. Narayanan and V. Shmatikov, "De-anonymizing Social Networks," in IEEE Symposium on Security and Privacy, pp. 173-187, 2009.
[3] L. Sweeney, "K-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertain. Fuzziness Knowledge-Based Systems, vol. 10, pp. 557-570, 2002.
[4] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, "L-Diversity: Privacy Beyond K-Anonymity," ACM Trans. Knowledge Discovery Data, vol. 1, article 3, Mar. 2007.
[5] N. Li and T. Li, "T-Closeness: Privacy Beyond K-Anonymity and L-Diversity," Proc. IEEE 23rd Int'l Conf. Data Eng. (ICDE 07), pp. 106-115, 2007.
[6] K. Liu and E. Terzi, "Towards Identity Anonymization on Graphs," SIGMOD 08: Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 93-106, 2008.
[7] B. Zhou and J. Pei, "Preserving Privacy in Social Networks Against Neighborhood Attacks," Proc. IEEE 24th Int'l Conf. Data Eng. (ICDE 08), pp. 506-515, 2008.
[8] L. Zou, L. Chen, and M.T. Özsu, "K-Automorphism: A General Framework for Privacy Preserving Network Publication," Proc. VLDB Endowment, vol. 2, pp. 946-957, 2009.
[9] J. Cheng, A.W.-c. Fu, and J. Liu, "K-Isomorphism: Privacy Preserving Network Publication against Structural Attacks," Proc. Int'l Conf. Management of Data, pp. 459-470, 2010.
[10] X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation," Proc. 32nd Int'l Conf. Very Large Databases (VLDB 06), pp. 139-150, 2006.
[11] X. Ying and X. Wu, "Randomizing Social Networks: A Spectrum Preserving Approach," Proc. Eighth SIAM Conf. Data Mining (SDM 08), 2008.
[12] M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis, "Resisting Structural Re-Identification in Anonymized Social Networks," Proc. VLDB Endowment, vol. 1, pp. 102-114, 2008.

[13] E. Zheleva and L. Getoor, "Preserving the Privacy of Sensitive Relationships in Graph Data," Proc. First SIGKDD Int'l Workshop Privacy, Security, and Trust in KDD (PinKDD 07), pp. 153-171, 2007.
[14] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang, "Anonymizing Bipartite Graph Data Using Safe Groupings," Proc. VLDB Endowment, vol. 1, pp. 833-844, 2008.
[15] A.-L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, pp. 509-512, 1999.
[16] M. Yuan, L. Chen, P. S. Yu, and T. Yu, "Protecting Sensitive Labels in Social Network Data Anonymization," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 3, Mar. 2013.