Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database

Similar documents
MINING ASSOCIATION RULE FOR HORIZONTALLY PARTITIONED DATABASES USING CK SECURE SUM TECHNIQUE

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Accumulative Privacy Preserving Data Mining Using Gaussian Noise Data Perturbation at Multi Level Trust

PRIVACY PRESERVING IN DISTRIBUTED DATABASE USING DATA ENCRYPTION STANDARD (DES)

International Journal of Modern Engineering and Research Technology

DISCLOSURE PROTECTION OF SENSITIVE ATTRIBUTES IN COLLABORATIVE DATA MINING V. Uma Rani *1, Dr. M. Sreenivasa Rao *2, V. Theresa Vinayasheela *3

Privacy preserving association rules mining on distributed homogenous databases. Mahmoud Hussein*, Ashraf El-Sisi and Nabil Ismail

Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining

ADDITIVE GAUSSIAN NOISE BASED DATA PERTURBATION IN MULTI-LEVEL TRUST PRIVACY PRESERVING DATA MINING

Secured Medical Data Publication & Measure the Privacy Closeness Using Earth Mover Distance (EMD)

Privacy in Distributed Database

Hiding Sensitive Predictive Frequent Itemsets

Survey Result on Privacy Preserving Techniques in Data Publishing

IJSER. Privacy and Data Mining

Privacy and Security Ensured Rule Mining under Partitioned Databases

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING

International Journal of Scientific & Engineering Research, Volume 8, Issue 4, April-2017 ISSN V.Sathya, and Dr.V.

Privacy Preserving Two-Layer Decision Tree Classifier for Multiparty Databases

Privacy-Preserving Data Mining in the Fully Distributed Model

Privacy Preserving Decision Tree Classification on Horizontal Partition Data

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

PPKM: Preserving Privacy in Knowledge Management

Secure Multiparty Computation

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Privacy Preserving Naïve Bayes Classifier for Horizontally Distribution Scenario Using Un-trusted Third Party

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

Secure Frequent Itemset Hiding Techniques in Data Mining

IMPLEMENTATION OF UNIFY ALGORITHM FOR MINING OF ASSOCIATION RULES IN PARTITIONED DATABASES

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Evaluating the Classification Accuracy of Data Mining Algorithms for Anonymized Data

Distributed Data Mining with Differential Privacy

Privacy Preserving Association Rule Mining in Vertically Partitioned Databases

Distributed Data Anonymization with Hiding Sensitive Node Labels

Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM

Emerging Measures in Preserving Privacy for Publishing The Data

Introduction to Data Mining

Accountability in Privacy-Preserving Data Mining

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data

TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET

A Review on Privacy Preserving Data Mining Approaches

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

Approaches to distributed privacy protecting data mining

PRESERVING PRIVACY IN ASSOCIATION RULE MINING

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method

A Novel Sanitization Approach for Privacy Preserving Utility Itemset Mining

SMMCOA: Maintaining Multiple Correlations between Overlapped Attributes Using Slicing Technique

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017

Multilevel Data Aggregated Using Privacy Preserving Data mining

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database

Reconstruction-based Classification Rule Hiding through Controlled Data Modification

Attribute Based Encryption with Privacy Protection in Clouds

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

Keyword: Frequent Itemsets, Highly Sensitive Rule, Sensitivity, Association rule, Sanitization, Performance Parameters.

A Pandect on Association Rule Hiding Techniques

Comparative Analysis of Anonymization Techniques

An Architecture for Privacy-preserving Mining of Client Information

ADVANCES in NATURAL and APPLIED SCIENCES

Analyzing Outlier Detection Techniques with Hybrid Method

Data attribute security and privacy in Collaborative distributed database Publishing

EFFICIENT RETRIEVAL OF DATA FROM CLOUD USING DATA PARTITIONING METHOD FOR BANKING APPLICATIONS [RBAC]

Using secure multi-party computation when pocessing distributed health data

Building a Concept Hierarchy from a Distance Matrix

A Review of Privacy Preserving Data Publishing Technique

A Survey on A Privacy Preserving Technique using K-means Clustering Algorithm

Privacy Preserving Mining Techniques and Precaution for Data Perturbation

PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION

Survey Paper on Efficient and Secure Dynamic Auditing Protocol for Data Storage in Cloud

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

Iteration Reduction K Means Clustering Algorithm

MySQL Data Mining: Extending MySQL to support data mining primitives (demo)

Preserving Data Mining through Data Perturbation

An Overview of Secure Multiparty Computation

CLASSIFICATIONANDEVALUATION THE PRIVACY PRESERVING DISTRIBUTED DATA MININGTECHNIQUES

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

ADVANCES in NATURAL and APPLIED SCIENCES

CS573 Data Privacy and Security. Cryptographic Primitives and Secure Multiparty Computation. Li Xiong

Dynamic Clustering of Data with Modified K-Means Algorithm

. (1) N. supp T (A) = If supp T (A) S min, then A is a frequent itemset in T, where S min is a user-defined parameter called minimum support [3].

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

PRIVACY PROTECTION OF FREQUENTLY USED DATA SENSITIVE IN CLOUD SEVER

A Novel method for Frequent Pattern Mining

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

Privacy Preserving Data Mining. Danushka Bollegala COMP 527

Enhancing Fine Grained Technique for Maintaining Data Privacy

SPEEDING UP PRIVACY PRESERVING DATA MINING TECHNIQUES TRAN HUY DUC 2016 SPEEDING UP PRIVACY PRESERVING DATA MINING TECHNIQUES

ISSN: (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies

Privacy-preserving query processing over encrypted data in cloud

A Machine Learning Approach to Privacy-Preserving Data Mining Using Homomorphic Encryption

Mining of Web Server Logs using Extended Apriori Algorithm

MTAT Research Seminar in Cryptography Building a secure aggregation database

K ANONYMITY. Xiaoyong Zhou

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES

Efficient Algorithm for Frequent Itemset Generation in Big Data

BUILDING PRIVACY-PRESERVING C4.5 DECISION TREE CLASSIFIER ON MULTI- PARTIES

On Privacy-Preservation of Text and Sparse Binary Data with Sketches

Transcription:

204 IJEDR Volume 2, Issue ISSN: 232-9939 Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database Rachit Adhvaryu, 2 Nikunj Domadiya PG Student, 2 Professor Computer Science & Engineering, B. H. Gardi College of Engineering & Technology, Rajkot, Gujarat, India. 2 Computer Science & Engineering, B. H. Gardi College of Engineering & Technology, Rajkot, Gujarat, India. rachit.adhvaryu@yahoo.com, 2 nhdomadiya@gardividyapith.ac.in Abstract - The advancement in data mining techniques plays an important role in many applications. In context of privacy and security issues, the problems caused by association rule mining technique are investigated by many research scholars. It is proved that the misuse of this technique may reveal the database owner s sensitive and private information to others. Many researchers have put their effort to preserve privacy in Association Rule Mining. In this paper, we have presented the survey about the techniques and algorithms used for preserving privacy in association rule mining with horizontally partitioned database. Keywords - Data Mining, Horizontally Partitioned Database, Privacy Preserving Association Rule Mining I. INTRODUCTION Data mining or knowledge discovery techniques such as association rule mining, classification, clustering, sequence mining, etc. have been most widely used in today s information world []. Successful application of these techniques has been demonstrated in many areas like marketing, medical analysis, business, Bioinformatics, product control and some other areas that benefit commercial, social and humanitarian activities. Privacy Preserving Data Mining is an important feature which every mining system must support. This feature actually secures the private and sensitive information which the database owners do not want to reveal. The sensitive data can be anything like Identification Number, Name, Address, Disease etc. [2] The works required in the privacy preserving data mining area are as follows: a) Privacy Preserving Data Publishing: These techniques try to study different techniques associated with privacy. These techniques consist of: i. The Randomization Method: In this technique, any random value is added to the original value of the data to mask the values of the data. The noise is added in large amount so that the original data value is not recovered [3]. ii. The K-Anonymity Model and L-Diversity: In K-Anonymity, the techniques like generalization and suppression were introduced to normalize data representation. In order to reduce the identification risk, every tuple in the database must be indistinguishable. The L-Diversity method was introduced to overcome some weaknesses of K-Anonymity. The new concept of intra group diversity of sensitive and iii. private values within anonymization scheme was discovered [4]. Distributed Privacy Preserving: Sometimes, some users do not wish to disclose their information to other users. But the individual users are interested in achieving the aggregate results from the data set which are divided among the users. [5] b) Modifying the record values to preserve privacy: In this technique, Association Rule Hiding methods were used to preserve privacy. Using these methods, the association rules are encrypted in order to secure the data. c) Query Auditing: In this technique, either the result of the query is modified or the result of the query is restricted. Many Perturbation methods are used to achieve this. [6] These techniques have been implemented either on Centralized Database or Distributed Database. The detailed description has been described in chapter II. II. CENTRALIZED AND DISTRIBUTED DATABASE A. Centralized Database In the centralized database, all the datasets are collected at one central site which can be called as Data Warehouse and then all the mining operations are performed. Fig [a] shows the centralized database. IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 200

The various techniques used in centralized database are Data Perturbation, Data Blocking and Reconstruction Based Techniques [7]. B. Distributed Database Sometimes, some users do not wish to disclose their information to other users. But the individual users are interested in achieving the aggregate results from the data set which are divided among the users. Distributed Database is used now a days. Due to large and fast growing database, the data are not managed centrally. But they are stored at different places i.e. different data stored in different places. The Distributed Database can be further classified into: [8] a. Vertically Partitioned Database: In this every site has different schemas. The attribute values may or may not be same. Fig [b] indicates Vertical Partitioned Database (b) For example, Hospital may have records of the patient like name, contact details, disease, attending doctor, bill amount, etc. Also same name and contact details with any other information like mediclaim amount, insurance id etc. may be found with the insurance company. Thus, at the end one final result can be obtained by combining both the records [9]. b. Horizontally Partitioned Database: In this every site has the same schema. But the attribute values are different [0]. Fig [c] indicates Horizontal Partitioned Database. (c) It means the records may be different, but the way the records are stored is same. For example, Credit card systems of two different banks have the same schema of storing the information of the users holding the credit card. But, the information of the users like user name, contact details, credit limit, etc. may be different with both the banks. [0] In the later part of paper, we describe broad description about the techniques and algorithms used for Privacy Preserving on Horizontally Partitioned Database. III. PRIVACY PRESERVING APPROACHES FOR HORIZONTALLY PARTITIONED DATABASE. SECURE MULTIPARTY COMPUTATION (SMC) WITH TRUSTED THIRD PARTY It was Client Server Architecture bases technique where one party is a client and other parties are clients. All the client parties believe the server party to be trusted and honest such that the server party won t reveal their sensitive and private data to another party []. Fig [d] shows SMC with trusted third party. IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 20

In this, each party finds the frequent itemset and its local support count and sends it the third party. On receiving these data, the third party evaluates this data to find global frequent itemset and global support count. The result found is returned back to all the client sites for further manipulations []. The limitation of this technique was what if the third party fails or collusion between third party and any client []. There were more chances of data loss in this technique. Algorithm describing this technique is as below A. PPDM-ARBSM ALGORITHM In Privacy Preserving Distributed Mining algorithm of Association Rules (PPDM-ARBSM), the advantages of RSA Public Key encryption was used [2]. The algorithm used mainly 2 types of servers. CMS (Cryptosystem Management Server) and DMS (Data Mining Server). The task of the CMS was to provide public key and private key for encryption and the task of DMS server was decryption and final result generation. Fig [e] shows PPDM-ARBSM. The working of the algorithm was as follows:. Firstly, Identity of CMS is authenticated. 2. Secondly, CMS generates Public Key (pub) for RSA and sends to each site (S, S2, S3). Also CMS generates and sends a Private Key to DMS [2]. 3. At each site in a communication network, for a transaction D, frequent itemset (Fi) is generated. Thereafter, Fi is encrypted using public key and sent to DMS. 4. Once DMS receives all the data, it decrypts it using the private key, evaluates the data and generates the final result. This result is sent back to all sites [2]. The drawback of this algorithm was the use of communication network. If this network fails, whole data may be lost and no global result would be generated. 2. SECURE MULTIPARTY COMPUTATION (SMC) WITH SEMI-HONEST MODEL This technique is quite different than SMC with Trusted Third Party Model. A partially-honest party is one who follows the standard rules. But feels free to migrate in between the steps to gain more information and satisfy an independent agenda of interests [3]. In other words, a partial-honest party follows the rules step by step and computes exactly required values based on the input from the other parties and it can analysis other party s data. But it is sure that it will not insert any false value which results in failure and try to use all the information secured to know sensitive information of other parties. The collusion among the parties does not occur in this model and thus private data is not revealed [3]. Fig [f] shows SMC with the semi honest model. IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 202

(f) SMC with semi honest model Each site assumes the other party to be honest. Each site follows the protocol Each site computes only the required data. Sites do not collude with each other. But try to find some information about other site [3]. The drawback of the model was that each site is assumed to be honest. But there is no surety about sites not colluding with another site. Few algorithms describing this technique are explained below. A. FAST PRIVATE ASSOCIATION RULES MINING FOR SECURELY SHARING In this model, it was assumed that each site follows semi-honest protocol. The sites can share the union of data without the use of any trusted third party. The information which is hidden is the records and the position of the records in the data set. The whole model is governed by one site which is either called Model Driver or Model Initiator [4]. The Fig [g] shows the working of this model and is described as: (g) Scenario of Algorithm. Predefined one party acts as Model Initiator. For e.g. Alice. 2. Alice generates RSA Public Key and Private Key. It sends the public key to each site. 3. Each site encrypts its data using Alice s public key [4]. 4. Alice sends it to next site Bob. As Bob does not know decryption key, it won t be able to know the Alice s data. So it merges its own data, shuffles it and sends it next site Carol. 5. Carol again merges its own data as it cannot decrypt the data, shuffles it and sends it to the next site. 6. This process continues and each site merges its own data and sends it to the next site till the data reaches to the last site. 7. The last site merges its own data and sends it Initiator [4]. 8. Initiator decrypts the complete data using the private key and removes the duplicate data. The initiator is unable to know the other site s data as data were shuffled by each site. 9. Finally Initiator (Alice) publishes the union of all the data sites [4]. B. MHS ALGORITHM FOR HORIZONTALLY PARTITION DATABASE M. Hussein et al. s Scheme (MHS) was introduced to improve privacy and try to reduce communication cost on increasing number of sites. The main idea was to use effective cryptosystem and rearrange the communication path. For this, two sites were discovered. This algorithm works with minimum 3 sites. One site acts as Data Mining Initiator and other site as a Data Mining Combiner. Rests of other sites were called client sites [2]. This scenario was able to decrease communication time. To improve the algorithm, Apriori-Tid data mining algorithm was used instead of standard Apriori algorithms. Fig [h] shows MHS algorithm. IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 203

The working of the algorithm is as follows: (h) MHS Algorithm. The initiator generates RSA public key and a private key. It sends the public key to combiner and all other client sites. 2. Each site, except initiator computes frequent itemset and local support for each frequent itemset using Local Data Mining (LDM) [2]. 3. All Client sites encrypt their computed data using public key and send it to the combiner. 4. The combiner merges the received data with its own encrypted data, encrypts it again and sends it to initiator to find global association rules. 5. Initiator decrypts the received data using the private key. Then it merges its own LDM data and computes to find global results. 6. Finally, it finds global association rules and sends it to all other sites [2]. C. EMHS ALGORITHM FOR HORIZONTALLY PARTITION DATABASE Enhanced M. Hussein et al. s Scheme (EMHS) was introduced to improve privacy and reduce communication cost on increasing number of sites. This algorithm also works with minimum 3 sites. One site acts as Data Mining Initiator and other site as a Data Mining Combiner. Rests of other sites were called client sites [5]. But this algorithm works on the concept of MFI (Maximal Frequent Itemset) instead of Frequent Itemset. Also the algorithm uses two different cryptosystem as mentioned in II and III. I. MFI (Maximal Frequent Itemset): A Frequent Itemset which is not a subset of any other frequent itemset is called MFI. By using MFI, communication cost is reduced [5]. II. RSA (Rivest, Shamir, Adleman) Algorithm: one of the widely used public key cryptosystem. It is based on keeping factoring product of two large prime numbers secret. Breaking RSA encryption is tough [5]. III. Homomorphic Paillier Cryptosystem: Paillier cryptosystem is an additive homomorphic cryptosystem, meaning that one can compute cipher texts into a new cipher text that is encryption of sum of the messages of the original cipher texts. For E.g. Let m, m2 be the two messages. Then Encryption= E (m+m2) =E (m) *E (m2) and Decryption= D (E (m) *E (m2)) =m+m2 i.e. the sum of m and m2. Also, if the size of the public key is to (bit) then the size of cipher text c is 2*t (byte) [5]. Fig [I] shows the EMHS algorithm. The working of the algorithm was divided in two phases as follows: Phase-I: (i) EHMS Algorithm IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 204

a) The initiator generates RSA & Paillier public key and private key. It sends public keys to combiner and all other client sites [5]. b) Each site, except initiator computes its MFI, encrypts it using RSA public key and sends it to the combiner. c) The combiner merges the received data with its own data and sends it to the initiator. d) Initiator decrypts the received data using the private key. Then it adds its own data with the decrypted data and computes to find global MFI. Then the result is sent to all other sites [5]. Phase-II: a) Each site finds frequent itemset and its local support count on the basis of MFI [5]. b) Each site, except initiator encrypts the data using Paillier s public key and sends it to the combiner. c) The combiner merges its own data with the received data and sends it to the initiator. d) Initiator decrypts the received data using the private key. Then it adds its own data with the decrypted data and computes to find global Association Rules. Then the result is sent to all other sites [5]. IX. CONCLUSION In this paper, we discussed about data mining. We also discussed about various types of database. We presented a broad survey on privacy preserving in association rule mining on horizontally partitioned database. We discussed a variety of techniques and algorithms used for maintaining privacy or securing private and sensitive information. Still, there can be improvements in the defined algorithms. Better and improved algorithms must be defined which provides more security. REFERENCES [] M. Atallah, A. Elmagarmid, M. Ibrahim, E. Bertino, and V. Verykios, Disclosure limitation of sensitive rules, in Proceedings of the 999 Workshop on Knowledge and Data Engineering Exchange, ser. KDEX 99. Washington, DC, USA: IEEE Computer Society, pp. 45-52 999 [2] Mahmoud Hussein, Ashraf El-Sisi, Nabil Ismail Fast Cryptographic Privacy Preserving Association Rules Mining on Distributed Homogenous Data Base, Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, Volume 578/2008, pp. 607-66 2008. [3] Agrawal D. Aggarwal C. C. On the Design and Quantification of Privacy-Preserving Data Mining Algorithms.ACM PODS Conference, 2002. [4] Machanavajjhala A., Gehrke J., Kifer D., and Venkita subramaniam M.: l-diversity: Privacy Beyond k-anonymity. ICDE, 2006. [5] Pinkas B.: Cryptographic Techniques for Privacy-Preserving Data Mining.ACM SIGKDD Explorations, 4(2), 2002. [6] Blum A., Dwork C., McSherry F., Nissim K.: Practical Privacy: The SuLQ Framework.ACM PODS Conference, 2005. [7] LiWu Chang and Ira S. Moskowitz, Parsimonious downgrading and decision trees applied to the inference problem, In Proceedings of the 998 New Security Paradigms Workshop, 82-89. 998 [8] D.W.Cheung,etal.,Ecient Mining of Association Rules in Distributed Databases, "IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, 996,pp.9-922; [9] YangZ., ZhongS.,Wright R.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. SDM Conference, 2006. [0] Yi, X., Zhang, Y. Privacy-preserving distributed association rule mining via semi trusted mixer. Data Knowledge. Eng. 63(2), 550-567. 2007 [] N V Muthu Lakshmi and Dr. K Sandhya Rani, PRIVACY PRESERVING ASSOCIATION RULE MINING WITHOUT TRUSTED PARTY FOR HORIZONTALLY PARTITIONED DATABASES, International Journal of Data Mining AND Knowledge Management Process (IJDKP) Vol.2, No.2 March 202 [2] GUI Qiong, CHENG Xiao-hui, A Privacy-Preserving Distributed Method for Mining Association Rules", 2009 International Conference on Artificial Intelligence and Computational Intelligence, pp 294-297 2009. [3] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 200, pp. 227-245. [4] Estivill-Castro, V., Hajyasien, A Fast Private Association Rule Mining by a Protocol Se-curely Sharing Distributed Data, 2007 IEEE Intelligence and Security Informatics (ISI 2007), New Brunswick, New Jersey, USA, May 23-24, pp. 324-330 2007 [5] Xuan C. N., Hoai B. L., Tung A. C., An enhanced scheme for privacy preserving association rules mining on horizontally distributed databases, 202 IEEE RIVF International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), pp - 4 202. IJEDR40035 International Journal of Engineering Development and Research ( www.ijedr.org) 205