A Proposed Technique for Privacy Preservation by Anonymization Method Accomplishing Concept of K-Means Clustering and DES

Similar documents
Comparative Analysis of Anonymization Techniques

Survey Result on Privacy Preserving Techniques in Data Publishing

Accumulative Privacy Preserving Data Mining Using Gaussian Noise Data Perturbation at Multi Level Trust

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

Survey of Anonymity Techniques for Privacy Preserving

Enhancing Fine Grained Technique for Maintaining Data Privacy

An Efficient Clustering Method for k-anonymization

A Review of Privacy Preserving Data Publishing Technique

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

A Review of K-mean Algorithm

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Record Linkage using Probabilistic Methods and Data Mining Techniques

Slicing Technique For Privacy Preserving Data Publishing

S. Indirakumari, A. Thilagavathy

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method

Emerging Measures in Preserving Privacy for Publishing The Data

GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

NON-CENTRALIZED DISTINCT L-DIVERSITY

International Journal for Research in Applied Science & Engineering Technology (IJRASET) Performance Comparison of Cryptanalysis Techniques over DES

PRIVACY PRESERVING CONTENT BASED SEARCH OVER OUTSOURCED IMAGE DATA

Dynamic Clustering of Data with Modified K-Means Algorithm

A Review on Privacy Preserving Data Mining Approaches

Secured Medical Data Publication & Measure the Privacy Closeness Using Earth Mover Distance (EMD)

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

PRIVACY PRESERVING RANKED MULTI KEYWORD SEARCH FOR MULTIPLE DATA OWNERS. SRM University, Kattankulathur, Chennai, IN.

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Keyword: Frequent Itemsets, Highly Sensitive Rule, Sensitivity, Association rule, Sanitization, Performance Parameters.

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data

Preserving Privacy during Big Data Publishing using K-Anonymity Model A Survey

SMMCOA: Maintaining Multiple Correlations between Overlapped Attributes Using Slicing Technique

Iteration Reduction K Means Clustering Algorithm

EFFICIENT RETRIEVAL OF DATA FROM CLOUD USING DATA PARTITIONING METHOD FOR BANKING APPLICATIONS [RBAC]

ABSTRACT I. INTRODUCTION

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching

CS573 Data Privacy and Security. Li Xiong

Chongqing, China. *Corresponding author. Keywords: Wireless body area network, Privacy protection, Data aggregation.

Study on data encryption technology in network information security. Jianliang Meng, Tao Wu a

Incremental K-means Clustering Algorithms: A Review

Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM

Randomized Response Technique in Data Mining

Security Control Methods for Statistical Database

A Study on Data Perturbation Techniques in Privacy Preserving Data Mining

SBKMMA: Sorting Based K Means and Median Based Clustering Algorithm Using Multi Machine Technique for Big Data

Sanitization Techniques against Personal Information Inference Attack on Social Network

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

A REVIEW ON K-mean ALGORITHM AND IT S DIFFERENT DISTANCE MATRICS

A Novel Identification Approach to Encryption Mode of Block Cipher Cheng Tan1, a, Yifu Li 2,b and Shan Yao*2,c

(δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data

Review on Techniques of Collaborative Tagging

Multilevel Data Aggregated Using Privacy Preserving Data mining

Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

Data attribute security and privacy in Collaborative distributed database Publishing

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

ADDITIVE GAUSSIAN NOISE BASED DATA PERTURBATION IN MULTI-LEVEL TRUST PRIVACY PRESERVING DATA MINING

Attribute Based Encryption with Privacy Protection in Clouds

AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS

PRIVACY PROTECTION OF FREQUENTLY USED DATA SENSITIVE IN CLOUD SEVER

Survey Paper on Efficient and Secure Dynamic Auditing Protocol for Data Storage in Cloud

A Power Attack Method Based on Clustering Ruo-nan ZHANG, Qi-ming ZHANG and Ji-hua CHEN

A Survey on A Privacy Preserving Technique using K-means Clustering Algorithm

ISSN Vol.04,Issue.05, May-2016, Pages:

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

ADVANCES in NATURAL and APPLIED SCIENCES

Data Security and Privacy. Topic 18: k-anonymity, l-diversity, and t-closeness

Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining

Distributed Data Anonymization with Hiding Sensitive Node Labels

International Journal of Advanced Research in Computer Science and Software Engineering

K ANONYMITY. Xiaoyong Zhou

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining

Fuzzy methods for database protection

Unsupervised learning on Color Images

A Methodology for Assigning Access Control to Public Clouds

Privacy Preserving Mining Techniques and Precaution for Data Perturbation

Privacy Preserving Data Mining. Danushka Bollegala COMP 527

NIGERIA SECURITY AND CIVIL DEFENCE CORPS INSTITUTE OF SECURITY OF NIGERIA

Data Distortion for Privacy Protection in a Terrorist Analysis System

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

Integration of information security and network data mining technology in the era of big data

International Journal of Computer Engineering and Applications, Volume XI, Issue XII, Dec. 17, ISSN

Web Data mining-a Research area in Web usage mining

Volume 6, Issue 1, January 2018 International Journal of Advance Research in Computer Science and Management Studies

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Privacy Preserved Data Publishing Techniques for Tabular Data

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points

Privacy-Preserving Data Mining in the Fully Distributed Model

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Privacy Challenges in Big Data and Industry 4.0

PRIVACY PRESERVING DATA MINING BASED ON VECTOR QUANTIZATION

Transcription:

A Proposed Technique for Privacy Preservation by Anonymization Method Accomplishing Concept of K-Means Clustering and DES Priyanka Pachauri 1, Unmukh Datta 2 1 Dept. of CSE/IT, MPCT College Gwalior, India 2 Prof. Dept. of CSE/IT, MPCT College, Gwalior, India Abstract: Privacy-Preserving Data Mining also an essential branch of the data mining and an exciting topic in privacy preservation has gain particular attention in current years. Data mining has been substantially studied and useful into numerous fields which include the Internet of Things and the business growth. However, data mining tactics additionally take vicinity critical demanding situations due to enlarged sensitive statistics disclosure and the violation of privacy. This discussion describes the privacy concern that occurs due to data mining, particularly for the national security applications. We discuss PPDM by Anonymization Method in which we use K-means clustering in order to divide the given data and DES algorithm for encryption of data in order to prevent sensitive data from attacker. Keywords: privacy preservation, PPDM, K-meanS clustering, anonymization,des etc. I. INTRODUCTION Privacy is receiving extra attention partially as of counter-terrorism and the national security. At present we have heard so much approximately countrywide protection in media. This is mainly because people are now realizing that to handle terrorism, the government may need to collect information about individuals. There been much interest in the recent on applying the sa mining used to detect patterns which are unusual, terrorist activities and the fraudulent behavior. While all applications of data mining can give profit to the humans and save lives, there also negative side to this type of method, it could be a danger to the individuals privacy. This is due to data mining tools are present on the Web or, and even naive individuals can use these tools to mine information from stored data in various databases and files, and therefore violate the privateness of individuals. As we have stressed in papers to take out efficient data mining and mine necessary information for counter terrorism and national security, we gather all kinds of information about individuals.[1] However, this information could be a threat to individuals privacy and civil liberties. This is causing major concern with different civil liberties unions. The aim is to carry out data mining and yet to the maintain privacy. This topic is known as privacy-preserving data mining. In this paper we use anonymization technique along with K-means clustering and DES algorithm. A. Anonymization To guard identity of the individual when release of sensitive information is done, holders of data often being encrypt or eliminate explicit identifiers, like names and the unique security no: However, data which is unencrypted provides no assurance for anonymity. To preserve privacy, model of k-anonymity has been proposed by the Sweeney [2] which achieve k anonymity by means of generalization and the Suppression, K-anonymity, it is difficult for an imposter to decide the identity of the individuals in collection of data set containing personal information. For ex, the age of person might be generalized to a variety like youth, middle age and the adult with no specifying suitably, so as to decrease the threat of the identification. Suppression involves decrease the exactness of the applications and it don t liberate some information.by using this method it reduces the risk of detecting exact information. B. K- Means Clustering Algorithm K-mean is the greatest popular apportioning procedure of clustering. It turns out to be leading proposed by methods for Macqueen in 1967. K-mean is an unsupervised, non-deterministic, numerical, iterative technique for grouping. In k-mean each cluster is represented by using the suggest value of gadgets within the cluster.[3] Here we partition a hard and fast of n item into k cluster in order that intercluster similarity is low and intracluster similarity is excessive. Similarity is measured in time period of imply value of objects in a cluster. The set of rules consists of two separate levels. 1594

1 st Phase pick k centroid randomly, where the value k is settled in advance.2 nd Phase each item in data set is identified with the closest centroid. Euclidean distance is used to measure the distance among every data object and cluster centroid. Procedure 1) Arbitrarily pick out k data item from D dataset as preliminary cluster centriod. 2) Repeat 3) Assign every data item to the cluster to which object is most comparative construct absolutely with respect to the suggest estimation of the item in cluster; 4) Calculate the new suggest estimation of the data items for each cluster and refresh the mean esteem. 5) Until no trade. C. DES(Data Encryption Standard) Data Encryption Standard also called as (DES) algorithm has been all the rage secret key encryption algo and it is taken in use in lots of commercial and the financial applications. Though it was introduced in the year 1976, it has established resistant to all the forms of cryptanalysis. In addition, DES is a block cipher algorithm which means that it takes a fixed length of the message and encrypts it (encrypts the block), and returns a string in the same size. DES is the first encryption algorithm recommended by NIST (National Institute of Standards and Technology). II. LITERATURE REVIEW The main contributions of this paper are three folds: (i) the definition of the data collection and publication process, (ii) the privacy framework model and (iii) personalized anonymization approach. The experimental analysis is presented at the end; it shows this approach performs better over the distinct l-diversity measure, probabilistic l-diversity measure and k-anonymity with t-closeness measure [4]. The [5] proposes a singular, extra flexible generalization scheme. The experimental of their study indicate that their approaches produce k anonymization with less generalization compared to previous approaches. They conclude that a bottom-up approach for k anonymization is preferable for small number of quasiidentifying attributes. The k anonymity based method is illustrated in [6] is used to search for optimal feature set partitioning and [7] for cluster analysis. And [8] Proposes a data reconstruction approach to obtain k-anonymity safety in predictive data mining. In this method the probably identifying attributes are first mapped the usage of aggregation for numeric data and swapping for nominal data. A genetic set of rules technique is then implemented to the masked data to find a correct subset of it. k-anonymity method is treated as the classical anonymization method and most of the studies are based on k-anonymity. The others are based on its improved methods like l-diversity, t-closeness, km -anonymization, (α,k) anonymity, p-sensitive k-anonymity, (k,e) anonymity, which are described in [9]. They provide a detailed survey of anonymization methods and also point out pitfalls in k anonymity. Previous works by Samarati and Sweeney [10,11] shows that the removal of the personally identifying information from data is insufficient for the data security, rather it is better to use k anonymity method for publishing data. The quasi identifier (QI), which is the combination of person specific identifiers are considered here for the process of anonymization. One of the common methods to achieve k anonymity is to generalize identifiers (for example date of birth can be generalized to month of birth). [12]There are various PPDM techniques such as anonymization, perturbation, randomization, condensation, cryptography. In this paper we have reviewed anonymization technique of PPDM such as k anonymity using generalization and suppression, p sensitive k anonymity, (α, k) anonymity, l diversity, m-invariance. Sometimes the data should be publically published in its original form. Even though it is not encrypted and perturbed, some sort of precaution should be implemented before releasing the data in terms of anonymization [13]. This is a kind of generalization of some attributes which protects against identity disclosure. Anonymization can be obtained through techniques inclusive of generalization, suppression, data removal, permutation, swapping and so forth [13] III. PROPOSED METHOD The proposed algorithm is a try to present a new method for complex encrypting and decrypting data based totally on parallel programming in the sort of way that the new method can make use of multiple- core processor to acquire higher speed with better degree of protection. 1595

ALGORITHM International Journal for Research in Applied Science & Engineering Technology (IJRASET) Partition of given dataset through the usage of K- means clustering Summary partition Dim choose dimension() P frequency set(partition, dim) Sv find median(p) Lhs {t belongs to partition : t.dim <= sv} Rhs {t belongs to partition: t.dim > sv} Apply DES for encrypting the records. Finally return union of lhs and rhs partition start Input dataset Anonymization algorithm K-Means clustering for partitioning dataset DES encryption on data Final result end Procedure Step1 - consider the dataset for input. Fig. 1 Flow diagram Propose method Step2 - applies anonymization technique to that particular dataset. Step3 - K-means clustering technique is used to partiton the data sets into clusters. Step4 DES encryption technique is used to suppress the data values. Step5 final result obtained by union of lhs and rhs values formed by anonymization technique. IV. RESULT ANALYSIS Table 1 describes the accuracy between Base method and Propose method. No. of records in base in proposed 100 200 300 400 500 84.00 83.50 84.00 84.00 83.80 96.00 98.00 99.00 98.75 98.20 1596

Fig.2 Above graph represents the analysis of accuracy in base and propose method and concluded that propose method is best for preserving privacy. Table II describes the error rate between Base method and Propose method. No. of records in base in proposed 100 200 300 400 500 84.00 83.50 84.00 84.00 83.80 96.00 98.00 99.00 98.75 98.20 Fig.3 Above graph represents the analysis of error rate in base and propose method and concluded that propose method is best for preserving privacy as the error rate in propose method is less than the error rate in base method. V. CONCLUSION Each privacy preserving technique has its own importance. Substantial efforts have been accomplished to address these needs. Data encryption and anonymization are widely adopted ways regarding privacy breach. However, encryption is not suitable for data that are processed and shared. Anonymizing huge data and dealing with anonymized data sets are nonetheless challenges for classic anonymization processes. Privacy- preserving data mining is emerged for to 2 critical desires: data analysis with a purpose to deliver better services and making sure the protection privileges of the data owners. 1597

The of our proposed work shows that by doing k-means clustering and encypting the data using DES method we can achieve more preservation of privacy. REFERENCES [1] Bhavani Thuraisingham, Privacy-Preserving Data Mining: Developments and Directions, IDEA GROUP PUBLISHING, Journal of Database Management, 16(1), 75-87, Jan-March 2005 77. [2] Pingshui WANG, Survey on Privacy Preserving Data Mining, International Journal of Digital Content Technology and its Applications, Volume 4, Number 9, December 2010. [3] Jyoti Yadav, Monika Sharma, A Review of K-mean Algorithm, International Journal of Engineering Trends and Technology (IJETT) Volume 4 Issue 7- July 2013, ISSN: 2231-5381. [4] M. Prakash G. Singarave An approach for prevention of privacy breach and information leakage in sensitive data mining 2015 IEEE. [5] Tiancheng Li, Ninghui Li, Towards Optimal k-anonymization, Data & Knowledge Engineering, 2008 Elsevier. [6] Nissim Matatov, Lior Rokach, Oded Maimon, Privacy-preserving data mining: A feature set partitioning approach, Information Sciences 180 (2010) 2696 2720. [7] Benjamin C. M. Fung, Ke Wang, Lingyu Wang, Patrick C.K. Hung, Privacy-preserving data publishing for cluster analysis, Data & Knowledge Engineering 68 (2009) 552 575. [8] Dan Zhu, Xiao-Bai Li, Shuning Wu, Identity disclosure protection: A data reconstruction approach for privacypreserving data mining, Decision Support Systems 48 (2009) 133 140. [9] Yan Zhao, Ming Du, Jiajin Le, Yongcheng Luo, A Survey on Privacy Preserving Approaches in Data Publishing, First International Workshop on Database Technology and Applications, 2009. [10] Samarati P, Protecting respondent s privacy in Microdata release, IEEE Transactions on Knowledge and Data Engineering, 13:1010 1027. [11] Sweeney L, k-anonymity: A model for protecting Privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557 570. [12] Kiran Israni, Shalu Chopra, Survey on Anonymization Technique for Privacy Preserving Data Mining (PPDM), International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, Issue 11, November 2016, ISSN(Online): 2320-9801. [13] Asmaa H.Rashid and Prof.dr. Abd-Fatth Hegazy, Protect Privacy of Medical Informatics using K-Anonymization Model, IEEE Explore. 1598