Personalized Privacy Preserving Publication of Transactional Datasets Using Concept Learning

Size: px
Start display at page:

Download "Personalized Privacy Preserving Publication of Transactional Datasets Using Concept Learning"

Transcription

1 Personalized Privacy Preserving Publication of Transactional Datasets Using Concept Learning S. Ram Prasad Reddy, Kvsvn Raju, and V. Valli Kumari associated with a particular transaction, if the adversary has some partial knowledge about a subset of items purchased by that person. Besides this, the major concern is that the adversary may also identify the customer s illness from the identified transaction. For example, assume that Thomas went to the drug store on a particular day and purchased medicines {riboflavin, beplex forte, tiniba and roxid}. Assume that some of the medicines (e.g., riboflavin, beplex forte) purchased by Thomas were observed by another customer, James, who has come to buy medicines at that point of time. Unfortunately, James happens to be neighbor of Thomas. Thomas would not like James to find out other medicines that he bought because if the remaining medicines are known there is a likelihood of James identifying the illness he is suffering from. However, if the drug store decides to publish its transactions and there is only one transaction containing {riboflavin, beplex forte}, James can infer that this transaction corresponds to Thomas and he can find out his complete list of medicines and thereby identify the disease as glossitis. This scenario not only reveals other items in the transaction but also gives a scope to the adversary to identify the illness. This motivating example stresses the need to transform the original transactional database D to a database D' before publication. In practice, we expect the adversary to have only partial knowledge about the transactions otherwise there would be little sensitive information to hide. In summary, this paper proposes a solution using concept learning for publishing transactional data sets. It also ensures that publishing transactional data does not provide any scope for the adversary to gain the identity of the individual. Experiments are conducted on adult dataset to verify our technique. The results show that our solution is very promising in real world applications. The rest of this paper is organized as follows. Section surveys the previous works. Section explains the system architecture. Section 4 comprises of proposed model. In Section 5, we present an analysis of our approach. Empirical study is given in section 6. Section 7 concludes our work and points out some possible future directions. Abstract In this paper we study the problem of protecting privacy in the publication of transactional data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the anonymity guarantee using concept learning. Our anonymization model relies on generalization using concept hierarchy and concept learning. The proposed algorithms are experimentally evaluated using real world datasets. Index Terms Privacy preserving, hashing, anonymity, concept learning, transactional datasets. I. INTRODUCTION Recently, mining of databases has attracted a growing amount of attention in database communities due to its wide applicability in retail industries and for improving marketing strategy. As pointed out in [1], the progress in barcode technology has made it possible for retail organizations to collect and store massive amounts of sales data. A record in such a data typically consists of the transaction date, the items bought in the transaction, and possibly also customer-id if such a transaction is made via the use of a credit card or any kind of customer card. It is noted that analysis of past transaction data can provide valuable information on customer buying behavior, and thus improve the quality of business decisions. It is essential to collect a sufficient amount of sales data before we can draw any meaningful conclusion from them. As a result, the amount of these sales data tends to be huge. We consider the problem of publishing transactional data, while preserving the privacy of individuals associated to them. Consider a transactional database D, which stores information about items purchased at a drug store by various customers (patients). We observe that the direct publication of D may result in unveiling the identity of the person II. RELATED WORK Anonymity for relational data has received considerable attention due to the need of several organizations to publish microdata without revealing the identity of individual records. Even if the identifying attributes like name is removed, an attacker may be able to associate records with specific persons using combinations of other attributes (e.g., Postal Code; Gender; birth-date), called quasi-identifiers (QID). Two techniques to preserve privacy are generalization and Manuscript received August 9, revised September 7, 01 S. R. P. Reddy is with Computer Science Engineering Department, Vignan s Institute of Engineering for Women, Visakhpatnam-50049, Andhra Pradesh, India ( reddysadi@ gmail.com). K. Raju was with Andhra University, Visakhapatnam-5000, Andhra Pradesh, India. He is now with the R&D, ANITS as Director, Visakhapatnam, India ( kvsvn.raju@gmail.com). V. V. Kumari is with Computer Science and Systems Engineering Department, Andhra University Visakhapatnam-5000, Andhra Pradesh, India ( vallikumari@gmail.com) /IJMLC.01.V.5 7

2 getting the transaction released, the transaction is totally suppressed. If the user gives willingness to publish the transaction i.e. UC=1, the transaction is anonymized and then published. The transactions are distributed into various equivalence classes using hashing. After this, each equivalence class is checked for the minimality threshold (k). If k is satisfied for all the equivalence classes then each bucket of transactions is anonymized using concept learning. If k is not satisfied for any of the equivalence class then the total transaction dataset is rehashed and the procedure continues. suppression [5]. Generalization replaces their actual QID values with more general ones. Suppression excludes some QID attributes or entire records (known as outliers) from the microdata. k-anonymity is a technique that prevents joining attacks by generalizing/suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k [4], [7]. The privacy preserving transformation of the microdata is referred to as recoding. Two models exist: in global recoding [], a particular detailed value must be mapped to the same generalized value in all records. Local recoding, on the other hand, allows the same detailed value to be mapped to different generalized values in each anonymized group. For any anonymity mechanism, it is desirable to define some notion of minimality. As to our model, the notion of minimal full-domain generalization was defined [], [4] using the distance vector of the domain generalization. Informally, this definition says that a full-domain generalized private table (PT) is minimal if PT is k-anonymous, and the height of the resulting generalization is less than or equal to that of any other k-anonymous full-domain generalization. Yabo Xu et al [6] model the power of attackers by the maximum size of public itemsets that may be acquired as prior knowledge, and proposed a novel privacy notion called coherence suitable for transactional databases. Manolis Terrovitis et al [8] proposed the k-anonymization problem of set-valued data. They defined the novel concept of km-anonymity for such data and analyzed the space of possible solutions. Our problem is different, since any combination of m items (which correspond to attributes) can be used by the attacker as a quasi-identifier. In this paper, we focus on specific local-recoding model of k-anonymity. Our objective is to find the minimal k-anonymous generalization (table) under the definition of minimality defined by Samarati [4]. By introducing concept learning, we provide a new approach to generate minimal k-anonymous tables for transactional datasets. Definition 1 (k-anonymity): Given a set of QID attributes Q1,.,Qd, released data D' is said to be k-anonymous with respect to Q1,.,Qd if each unique tuple in the projection of D' on Q1,.,Qd occurs at least k times. Definition (Quasi-Identifier): A set of non-sensitive attributes {Q1,...,Qw} of a table is called a quasi-identifier if these attributes can be linked with external data to uniquely identify at least one individual in the general population. Definition (Sensitive Attribute): The attributes that should not be disclosed directly to the public or may be disclosed after disassociating its value with an individual s other information. Transactional DB Users Consent Transactions to be published Transactions not to be published Hashing Suppress Anonymize Concept Learning & Concept Hierarchy Privacy Compliance Checker Privacy Provider Publish Transactions Fig. 1. System architecture IV. PROPOSED MODEL Let D be a transactional database containing D transactions. Let I = {I1,...,I m } be a set of items. Each transaction T D is a set of items such that T I. I is the domain of possible items that can appear in a transaction. Each transaction is associated with an identifier TID. A set of items is referred to as itemset. An itemset that contains m items is an m-itemset. We assume that the database provides answers to subset queries, i.e. queries of the form {T (Qs T) Λ (T D)}, where Qs is a set of items from I provided by the user. The number of query items provided by the user in Qs defines the size of the query. We define a database as k-anonymous as follows: Problem Definition: Given a database D, no adversary that has a subset of background knowledge of items of a transaction T D can use these items to identify less than k tuples from D. In other words, any subset query of size m or less, issued by the adversary should return either nothing or more than k answers. Note that queries with zero answers are also secure, since they correspond to background information that cannot be associated to any transaction. In general the structure of the transactional database may be in two different ways: (i) Horizontal data format and (ii) Vertical data format. In this paper, transactions of database are stored in the horizontal format. In horizontal data format, the data is represented as tid-itemset format, where tid is the transaction identifier and itemset is the set of items the transaction contains. III. SYSTEM ARCHITECTURE The transactional database D consists of set of transactions and would be updated frequently. We consider the users consent to publish the transactions. To satisfy this constraint an extra attribute is appended to the existing database specifying users consent (UC). The UC is a binary value. Binary value 0 indicates that the user is not interested in revealing the information. If the user is not interested in 74

3 TABLE I: THE HORIZONTAL DATA FORMAT OF THE TRANSACTIONAL DATABASE D {I1, J1, J} {I, J1} {I, J1, J} {I1, I, J} B. Anonymization Using Concept Learning Concept learning is a machine learning concept to generalize the items optimally. Each bucket of transactions is considered independently and generalized. Initially transaction1 is considered as it is and this would be the current resultant transaction. Next transaction is considered and is matched with the resultant transaction. If it exactly matches the resultant transaction remains as it is. If it does not match then the set of items that are present in the current transaction and not present in the previous transaction are added to the previous transaction. This is nothing but simply performing a union (U) operation on the two transactions. We obtain the resultant transaction. The procedure is repeated until a single transaction is obtained such that each transaction in the bucket is contained within the resultant transaction. The resultant transaction is now generalized using the concept hierarchy. The anonymized version for data in Table I is given in Table IV. The items in the transactions are hashed based on the value produced by the hash function as h(t ) = m i =1 σ ( Ii ) mod n. So, the transaction is stored in nd location (bucket). Similarly, the process is repeated for other transactions in the transactional database. Table II indicates the support counts of each item. The process of hashing is illustrated in Table III. From the Table III, we understand that transactions and are stored in bucket and transactions and are stored in bucket 0. Definition 4 (Support Count): The number of transactions in which a particular itemset exists, gives the support count, denoted by σ. Algorithm 1 Anonymizer Input: Bucket of transactions Bj Concept hierarchy Output: Anonymized transactions 1: Initialize the result to the most specific transaction R = {Ti} : for each transaction Ti+1 in Bj : for each item Ik in Ti+1 4: if item Ik in Ti+1 is contained in R 5: do nothing 6: else if item Ik is a sibling to one of the items in R generalize both items using the concept hierarchy 7: else 8: add item Ik to R 9: end if 10: end for 11: end for 1: Publish R for each transaction in Bj TABLE II: SUPPORT COUNTS Item Support count(σ) I1 I J1 J TABLE III: APPLYING HASH FUNCTION Sum of the support counts(ssc) {I1, J1, J} ++=8 SSC mod {I, J1} +=6 0 {I, J1, J} ++=9 0 {I1, I, J} ++=8 TABLE IV: ANONYMIZED TRANSACTIONS A. Concept Hierarchy Much research effort has been found on using concept hierarchy towards databases management, text categorizations, natural language processing and so on. Algorithms on discover associations between different items from levels of taxonomy (which is represented in hierarchies) was introduced in 1995 as the mining approach for generalized association rules [11]. As an example, a keyword suggestion approach based on concept hierarchy has been proposed [1] to facilitate user s web search. A data mining system has been proposed to induce the classification rules using concept hierarchy [1]. Concept Hierarchy can reflect the concepts and relationships of a given knowledge domain. Such hierarchies are useful towards generalization and specialization. Fig. shows sample hierarchy for the above set of items in Table I. Algorithm Personalized Privacy Preserving Input: Transaction database D Number of buckets (Equivalence Classes)N Hash Function H Minimal Anonymity k Output: Buckets B0, B1, B,.., BN of anonymized transactions 1: for each distinct items Ix in T : compute σ(ix) : for each transaction Ti in D 4: j H(Ti) { j is the index of the bucket} 5: Bj Ti 6: end for 7: for each bucket Bj 8: if Bj < k 9: N = N : go to step 11: end if 1: end for 1: for each bucket Bj in the hash table 14: call Anonymizer(Bj) 15: end for All I I1 J I J1 {I, J} {I, J} {I, J} {I, J} J Fig.. Concept hierarchy 75

4 C. Algorithm for Personalized Privacy Publishing The algorithm starts by considering the transaction set, number of buckets, hash function and minimal anonymity as input parameters and distributes the data uniformly into all buckets satisfying the minimal anonymity k. The support count of each distinct item in the transaction set is computed. This support count is used as the weight of the item. The hash function is applied to each transaction independently by computing the residue of the sum of the support counts of all the items in the transaction (SSC) divided by the number of buckets (N). The residue generated is the bucket address. Each bucket of data is called a hash equivalence class. Depending on the residue produced, the transaction is mapped to the respective bucket. The process is repeated for all the transactions. Now, each bucket is checked for minimal anonymity (k). If all the buckets satisfy minimal anonymity then the transactions in each equivalence class is anonymized using the concept hierarchy and is published. of occurrences of item x in the database D and D is the size of the transactional database to be published. The value lies in between 0 and 1 where 0 implies no privacy gain and 1 implies complete privacy gain. B. Information Loss All privacy-preserving transformations cause information loss, which must be minimized in order to maintain the ability to extract meaningful information from the published data. A variety of information loss metrics have been proposed. The Classification Metric (CM) [10] is suitable when the purpose of the anonymized data is to train a classifier, whereas the Discernibility Metric (DM) [9] measures the cardinality of the anonymized groups. More accurate is the Generalized Loss Metric [10] and the similar Normalized Certainty Penalty (NCP) [5]. In the case of categorical attributes NCP is defined with respect to the hierarchy. On the similar guidelines, the information loss is defined in (5). Let x be an item in T. Then: 0, N ( x ) = 1 V. ANALYSIS N ( x ) A. Privacy The privacy metric depends on the level of anonymization. The strength of privacy is given by the distribution of transactions into equivalence classes and the anonymization of transactions in each equivalence class. The privacy gain of an item x in transaction T is defined in (1). 0, N ( x ) = 1 CH, Otherwise PG( x ) = N ( x ) PG( x ) T PG( x D ) = PG( D ) = x T x D PG( x ) σ ( x) T D PG(T ) D (5) where Nx is the node of the item generalization hierarchy where I is generalized. Nx and CH are the number of leaves under Nx and in the entire hierarchy, respectively. Intuitively, the IL tries to capture the degree of generalization of each item, by considering the ratio of the total items in the domain that are indistinguishable from it. For example, in the hierarchy of Fig., if I1 is generalized to I in a transaction T, the information loss IL(I1) is /4. The information loss for the whole database weighs the information loss of each generalized item in each transaction. Since local recoding is used, the generalization of an item will not be common for the total database. It varies from equivalence class to equivalence class. The information loss of a particular generalization ranges from 0 to 1 and can be easily measured. 0 implies no information loss 1 implies complete information loss. The information loss of transaction T (IL(T)), of an item in the entire database (IL(x D)) and of the entire database (IL(D)) are given in (6), (7) and (8) respectively. (1) where Nx is the node of the item generalization hierarchy where x is generalized. Nx is the number of leaves under Nx and CH is the number of leaves in the concept hierarchy. If the item being published is at the leaf level it implies that privacy is not being provided and the privacy gain is zero. On the other hand, if the published item is a non-leaf node then it denotes that the item is being secured and some amount of privacy is being provided as specified in (1). The privacy gain for the whole database is measured by computing the privacy gain of generalized item in each transaction. The usage of local recoding prevents the generalization of an item to be non-uniform and varies for each equivalence class. The privacy gain of each transaction T (PG(T)), of an item x in the entire database (PG(x D)) and of the entire database D (PG(D)) is given in (), () and (4) respectively. PG(T ) = CH, Otherwise IL( x ) = IL(T ) = IL( D ) = () IL( x ) T IL( x D ) = () x T x D IL( x ) σ ( x) T D IL(T ) D (6) (7) (8) where T is the size of the transaction, σ(x) is the total number of occurrences of item x in the database D and D is the size of the transactional database to be published. (4) VI. EMPIRICAL RESULTS where T is the size of the transaction, σ(x) is the total number The method is implemented in Java. For our experiments, 76

5 we worked on adult dataset for medical transactions. Graph in Fig. is developed by considering the number of buckets on x-axis and number of transactions on y-axis. The graph in Fig. depicts the distribution of transactions into various buckets. Fig. 4 reveals the information loss of each tuple after anonymizing using the concept hierarchy. X-axis represents the individual transaction and y-axis represents the relevant information loss. Fig. 5 illustrates the information loss vs. privacy gain for transactional databases of different sizes. Database sizes are given on the x-axis and information loss is given on the y-axis. As the privacy keeps on increasing the information loss also keeps on creeping up. Fig.. Distribution of transactions into various buckets Fig. 4. Information loss for each transaction ( D = 500) Fig. 5. Information loss vs. privacy gain for databases of different sizes VII. CONCLUSIONS AND FUTURE WORK This model brings out a practical problem of maintaining anonymity against transactional data and proposes a simple, yet effective solution. We have provided a simple solution using concept learning for maintaining anonymity for transactional datasets. In this paper, we studied the anonymization problem of transactional data. We considered the idea of support count as a weight assigned to each item of the transaction and the minimal anonymity property. Hashing is applied to the items of the transactions by considering the sum of the support counts of each item in the transaction. Based on our analysis, we developed an efficient algorithm which is practical for large databases. We emphasize that our technique is also directly applicable for databases, where transactions may contain both non-sensitive attribute and sensitive attributes. In this case, k-anonymity principle can be used to help avoiding associating the sensitive value to less than k tuples. In future, we aim at extending our model to l-diversity considering sensitive values associated to non-sensitive values. REFERENCES [1] R. Agarwak and R. Srikanth, Fast Algorithms for Mining association Rules in Large Databases, in Proceedings of the 0 th International Conference on Very Large Databases, pp , [] L. Willenborg and T. DeWaal, Elements of Statistical Disclosure Control, Springer-Verlag Lecture Notes in Statistics, 001. [] P. Samarati and L. Sweeney, Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression, Technical Report SRI-CSL-98-04, [4] P. Samarati, Protecting respondents' identities in microdata release, IEEE Transactions on Knowledge and Data Engineering, pp , 001. [5] J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, and A. Fu, Utility-Based Anonymization Using Local Recoding, in Proceedings of SIGKDD, pp , 006. [6] Yabo Xu, Ke Wang, Ada Wai-Chee Fu, and Philip. S. Yu, Anonymizing Transaction Databases for Publication, in Proceedings of ACM SIGKDD, pp , 008. [7] L. Sweeney, k-anonymity: A Model for Protecting Privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, pp , 00. [8] Manolis Terrovitis, Nikos Mamoulis, Panos Kalnis, Privacy-Preserving Anonymization of Set-valued Data, International Conference on Very Large Data Bases, pp , 008. [9] R. J. Bayardo and R. Agrawal, Data Privacy through Optimal k-anonymization, in Proceedings of ICDE, pp. 17-8, 005. [10] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, Incognito: Efficient Full-domain k-anonymity, In Proceedings of ACM SIGMOD, pp , 005. [11] R. Srikant and R. Agrawal, Mining generalized association rules, in Proceedings of the 1th International Conference on Very Large Data Bases, pp , [1] Y. Chen, G.-R. Xue, and Y. Yu, Advertising keyword suggestion based on concept hierarchy, in Proceedings of the international conference on Web search and web data mining, ACM, pp , 008. [1] M. E. M. D. Beneditto and L. N. de Barros, Using concept hierarchies in knowledge discovery, Lecture Notes in Computer Science, Springer, pp , 004. S. Ram Prasad Reddy holds an MTech in Computer Science and Technology from Andhra University, Visakhapatnam. He is currently working as Associate Professor in the Department of Computer Science and Engineering, Vignan s Institute of Engineering for Women, Visakhapatnam. Mr. Reddy research interests include Data Mining and Data Security. Dr. Kvsvn Raju holds a PhD degree in Computer Science and Technology from IIT, Kharagpur. He is presently working as Professor in the Department of Computer Science and Engineering and is the Director of R&D, at ANITS, Visakhapatnam. Dr. Raju research interests include Software Engineering Data Engineering and Data Security. He is a member of IEEE, ISTE, senior member of CSI and Fellow of IETE. Dr. V. Valli Kumari holds a PhD degree in Computer Science and Systems Engineering from Andhra University Engineering College, Visakhapatnam. She is presently working as Professor in the Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam. Dr. Valli Kumari research interests include Security and privacy issues in Data Engineering, Network Security and E-Commerce. She is a member of IEEE and ACM and is a fellow of IETE. 77

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Tiancheng Li Ninghui Li CERIAS and Department of Computer Science, Purdue University 250 N. University Street, West

More information

Survey of Anonymity Techniques for Privacy Preserving

Survey of Anonymity Techniques for Privacy Preserving 2009 International Symposium on Computing, Communication, and Control (ISCCC 2009) Proc.of CSIT vol.1 (2011) (2011) IACSIT Press, Singapore Survey of Anonymity Techniques for Privacy Preserving Luo Yongcheng

More information

Survey Result on Privacy Preserving Techniques in Data Publishing

Survey Result on Privacy Preserving Techniques in Data Publishing Survey Result on Privacy Preserving Techniques in Data Publishing S.Deebika PG Student, Computer Science and Engineering, Vivekananda College of Engineering for Women, Namakkal India A.Sathyapriya Assistant

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

A Review of Privacy Preserving Data Publishing Technique

A Review of Privacy Preserving Data Publishing Technique A Review of Privacy Preserving Data Publishing Technique Abstract:- Amar Paul Singh School of CSE Bahra University Shimla Hills, India Ms. Dhanshri Parihar Asst. Prof (School of CSE) Bahra University Shimla

More information

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION K. Venkata Ramana and V.Valli Kumari Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, India {kvramana.auce, vallikumari}@gmail.com

More information

An efficient hash-based algorithm for minimal k-anonymity

An efficient hash-based algorithm for minimal k-anonymity An efficient hash-based algorithm for minimal k-anonymity Xiaoxun Sun Min Li Hua Wang Ashley Plank Department of Mathematics & Computing University of Southern Queensland Toowoomba, Queensland 4350, Australia

More information

(δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data

(δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data (δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data Mohammad-Reza Zare-Mirakabad Department of Computer Engineering Scool of Electrical and Computer Yazd University, Iran mzare@yazduni.ac.ir

More information

Emerging Measures in Preserving Privacy for Publishing The Data

Emerging Measures in Preserving Privacy for Publishing The Data Emerging Measures in Preserving Privacy for Publishing The Data K.SIVARAMAN 1 Assistant Professor, Dept. of Computer Science, BIST, Bharath University, Chennai -600073 1 ABSTRACT: The information in the

More information

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction International Journal of Engineering Science Invention Volume 2 Issue 1 January. 2013 An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction Janakiramaiah Bonam 1, Dr.RamaMohan

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

Comparative Analysis of Anonymization Techniques

Comparative Analysis of Anonymization Techniques International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 8 (2014), pp. 773-778 International Research Publication House http://www.irphouse.com Comparative Analysis

More information

Secured Medical Data Publication & Measure the Privacy Closeness Using Earth Mover Distance (EMD)

Secured Medical Data Publication & Measure the Privacy Closeness Using Earth Mover Distance (EMD) Vol.2, Issue.1, Jan-Feb 2012 pp-208-212 ISSN: 2249-6645 Secured Medical Data Publication & Measure the Privacy Closeness Using Earth Mover Distance (EMD) Krishna.V #, Santhana Lakshmi. S * # PG Student,

More information

Data Anonymization - Generalization Algorithms

Data Anonymization - Generalization Algorithms Data Anonymization - Generalization Algorithms Li Xiong CS573 Data Privacy and Anonymity Generalization and Suppression Z2 = {410**} Z1 = {4107*. 4109*} Generalization Replace the value with a less specific

More information

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Dr.K.P.Kaliyamurthie HOD, Department of CSE, Bharath University, Tamilnadu, India ABSTRACT: Automated

More information

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 31 st July 216. Vol.89. No.2 25-216 JATIT & LLS. All rights reserved. SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 1 AMANI MAHAGOUB OMER, 2 MOHD MURTADHA BIN MOHAMAD 1 Faculty of Computing,

More information

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database T.Malathi 1, S. Nandagopal 2 PG Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,

More information

Privacy-preserving Anonymization of Set-valued Data

Privacy-preserving Anonymization of Set-valued Data Privacy-preserving Anonymization of Set-valued Data Manolis Terrovitis Dept. of Computer Science University of Hong Kong rrovitis@cs.hku.hk Panos Kalnis Dept. of Computer Science National University of

More information

Efficient k-anonymization Using Clustering Techniques

Efficient k-anonymization Using Clustering Techniques Efficient k-anonymization Using Clustering Techniques Ji-Won Byun 1,AshishKamra 2, Elisa Bertino 1, and Ninghui Li 1 1 CERIAS and Computer Science, Purdue University {byunj, bertino, ninghui}@cs.purdue.edu

More information

Maintaining K-Anonymity against Incremental Updates

Maintaining K-Anonymity against Incremental Updates Maintaining K-Anonymity against Incremental Updates Jian Pei Jian Xu Zhibin Wang Wei Wang Ke Wang Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca Fudan University, China, {xujian, 55, weiwang}@fudan.edu.cn

More information

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University Outline Privacy preserving data publishing: What and Why Examples of privacy attacks

More information

Maintaining K-Anonymity against Incremental Updates

Maintaining K-Anonymity against Incremental Updates Maintaining K-Anonymity against Incremental Updates Jian Pei 1 Jian Xu 2 Zhibin Wang 2 Wei Wang 2 Ke Wang 1 1 Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca 2 Fudan University, China, {xujian,

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Privacy Preservation Data Mining Using GSlicing Approach Mr. Ghanshyam P. Dhomse

More information

(α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing

(α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing (α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing Raymond Chi-Wing Wong, Jiuyong Li +, Ada Wai-Chee Fu and Ke Wang Department of Computer Science and Engineering +

More information

Utility-Based Anonymization Using Local Recoding

Utility-Based Anonymization Using Local Recoding Utility-Based Anonymization Using Local Recoding Jian Xu 1 Wei Wang 1 Jian Pei Xiaoyuan Wang 1 Baile Shi 1 Ada Wai-Chee Fu 3 1 Fudan University, China, {xujian, weiwang1, xy wang, bshi}@fudan.edu.cn Simon

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many

More information

Incognito: Efficient Full Domain K Anonymity

Incognito: Efficient Full Domain K Anonymity Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 2 (2017) pp. 247-253 Research India Publications http://www.ripublication.com Comparison and Analysis of Anonymization

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

Anonymity in Unstructured Data

Anonymity in Unstructured Data Anonymity in Unstructured Data Manolis Terrovitis, Nikos Mamoulis, and Panos Kalnis Department of Computer Science University of Hong Kong Pokfulam Road, Hong Kong rrovitis@cs.hku.hk Department of Computer

More information

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique P.Nithya 1, V.Karpagam 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College,

More information

Privacy Preserving in Knowledge Discovery and Data Publishing

Privacy Preserving in Knowledge Discovery and Data Publishing B.Lakshmana Rao, G.V Konda Reddy and G.Yedukondalu 33 Privacy Preserving in Knowledge Discovery and Data Publishing B.Lakshmana Rao 1, G.V Konda Reddy 2, G.Yedukondalu 3 Abstract Knowledge Discovery is

More information

Injector: Mining Background Knowledge for Data Anonymization

Injector: Mining Background Knowledge for Data Anonymization : Mining Background Knowledge for Data Anonymization Tiancheng Li, Ninghui Li Department of Computer Science, Purdue University 35 N. University Street, West Lafayette, IN 4797, USA {li83,ninghui}@cs.purdue.edu

More information

AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION

AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION Amin Milani Fard, Ke Wang School of Computing Science, Simon Fraser University, BC, Canada V5A 1S6 {milanifard,wangk}@cs.sfu.ca Keywords:

More information

Local and global recoding methods for anonymizing set-valued data

Local and global recoding methods for anonymizing set-valued data The VLDB Journal (211) 2:83 16 DOI 1.17/s778-1-192-8 REGULAR PAPER Local and global recoding methods for anonymizing set-valued data Manolis Terrovitis Nikos Mamoulis Panos Kalnis Received: 26 May 29 /

More information

Purna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011,

Purna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011, Weighted Association Rule Mining Without Pre-assigned Weights PURNA PRASAD MUTYALA, KUMAR VASANTHA Department of CSE, Avanthi Institute of Engg & Tech, Tamaram, Visakhapatnam, A.P., India. Abstract Association

More information

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Sheetal K. Labade Computer Engineering Dept., JSCOE, Hadapsar Pune, India Srinivasa Narasimha

More information

CS573 Data Privacy and Security. Li Xiong

CS573 Data Privacy and Security. Li Xiong CS573 Data Privacy and Security Anonymizationmethods Li Xiong Today Clustering based anonymization(cont) Permutation based anonymization Other privacy principles Microaggregation/Clustering Two steps:

More information

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series Introduction to Privacy-Preserving Data Publishing Concepts and Techniques Benjamin C M Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S Yu CRC

More information

CERIAS Tech Report Injector: Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research

CERIAS Tech Report Injector: Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research CERIAS Tech Report 28-29 : Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research Information Assurance and Security Purdue University, West Lafayette,

More information

AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA

AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA Morla Dinesh 1, Shaik. Jumlesha 2 1 M.Tech (S.E), Audisankara College Of Engineering &Technology 2

More information

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1105

More information

PPKM: Preserving Privacy in Knowledge Management

PPKM: Preserving Privacy in Knowledge Management PPKM: Preserving Privacy in Knowledge Management N. Maheswari (Corresponding Author) P.G. Department of Computer Science Kongu Arts and Science College, Erode-638-107, Tamil Nadu, India E-mail: mahii_14@yahoo.com

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

Blocking Anonymity Threats Raised by Frequent Itemset Mining

Blocking Anonymity Threats Raised by Frequent Itemset Mining Blocking Anonymity Threats Raised by Frequent Itemset Mining M. Atzori 1,2 F. Bonchi 2 F. Giannotti 2 D. Pedreschi 1 1 Department of Computer Science University of Pisa 2 Information Science and Technology

More information

An Algorithm for Frequent Pattern Mining Based On Apriori

An Algorithm for Frequent Pattern Mining Based On Apriori An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior

More information

Survey of k-anonymity

Survey of k-anonymity NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA Survey of k-anonymity by Ankit Saroha A thesis submitted in partial fulfillment for the degree of Bachelor of Technology under the guidance of Dr. K. S. Babu Department

More information

Accumulative Privacy Preserving Data Mining Using Gaussian Noise Data Perturbation at Multi Level Trust

Accumulative Privacy Preserving Data Mining Using Gaussian Noise Data Perturbation at Multi Level Trust Accumulative Privacy Preserving Data Mining Using Gaussian Noise Data Perturbation at Multi Level Trust G.Mareeswari 1, V.Anusuya 2 ME, Department of CSE, PSR Engineering College, Sivakasi, Tamilnadu,

More information

Privacy Preserved Data Publishing Techniques for Tabular Data

Privacy Preserved Data Publishing Techniques for Tabular Data Privacy Preserved Data Publishing Techniques for Tabular Data Keerthy C. College of Engineering Trivandrum Sabitha S. College of Engineering Trivandrum ABSTRACT Almost all countries have imposed strict

More information

A Novel Model for Encryption of Telugu Text Using Visual Cryptography Scheme

A Novel Model for Encryption of Telugu Text Using Visual Cryptography Scheme A Novel Model for Encryption of Telugu Text Using Visual Cryptography Scheme G. Lakshmeeswari *, D. Rajya Lakshmi, Y. Srinivas, and G. Hima Bindu GIT, GITAM University, Visakhapatnam, Andhra Pradesh {lak_pr,rdavuluri}@yahoo.com,

More information

Slicing Technique For Privacy Preserving Data Publishing

Slicing Technique For Privacy Preserving Data Publishing Slicing Technique For Privacy Preserving Data Publishing D. Mohanapriya #1, Dr. T.Meyyappan M.Sc., MBA. M.Phil., Ph.d., 2 # Department of Computer Science and Engineering, Alagappa University, Karaikudi,

More information

Information Security in Big Data: Privacy & Data Mining

Information Security in Big Data: Privacy & Data Mining Engineering (IJERCSE) Vol. 1, Issue 2, December 2014 Information Security in Big Data: Privacy & Data Mining [1] Kiran S.Gaikwad, [2] Assistant Professor. Seema Singh Solanki [1][2] Everest College of

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Distributed Data Anonymization with Hiding Sensitive Node Labels

Distributed Data Anonymization with Hiding Sensitive Node Labels Distributed Data Anonymization with Hiding Sensitive Node Labels C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University,Trichy

More information

A Novel Technique for Privacy Preserving Data Publishing

A Novel Technique for Privacy Preserving Data Publishing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING Neha V. Sonparote, Professor Vijay B. More. Neha V. Sonparote, Dept. of computer Engineering, MET s Institute of Engineering Nashik, Maharashtra,

More information

Comparing the Performance of Frequent Itemsets Mining Algorithms

Comparing the Performance of Frequent Itemsets Mining Algorithms Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India

More information

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING 1 B.KARTHIKEYAN, 2 G.MANIKANDAN, 3 V.VAITHIYANATHAN 1 Assistant Professor, School of Computing, SASTRA University, TamilNadu, India. 2 Assistant

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

Ambiguity: Hide the Presence of Individuals and Their Privacy with Low Information Loss

Ambiguity: Hide the Presence of Individuals and Their Privacy with Low Information Loss : Hide the Presence of Individuals and Their Privacy with Low Information Loss Hui (Wendy) Wang Department of Computer Science Stevens Institute of Technology Hoboken, NJ, USA hwang@cs.stevens.edu Abstract

More information

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com

More information

SMMCOA: Maintaining Multiple Correlations between Overlapped Attributes Using Slicing Technique

SMMCOA: Maintaining Multiple Correlations between Overlapped Attributes Using Slicing Technique SMMCOA: Maintaining Multiple Correlations between Overlapped Attributes Using Slicing Technique Sumit Jain 1, Abhishek Raghuvanshi 1, Department of information Technology, MIT, Ujjain Abstract--Knowledge

More information

Enhancing Clustering Results In Hierarchical Approach By Mvs Measures

Enhancing Clustering Results In Hierarchical Approach By Mvs Measures International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.25-30 Enhancing Clustering Results In Hierarchical Approach

More information

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others Vol.15 No.6 J. Comput. Sci. & Technol. Nov. 2000 A Fast Algorithm for Mining Association Rules HUANG Liusheng (ΛΠ ), CHEN Huaping ( ±), WANG Xun (Φ Ψ) and CHEN Guoliang ( Ξ) National High Performance Computing

More information

A Review on Privacy Preserving Data Mining Approaches

A Review on Privacy Preserving Data Mining Approaches A Review on Privacy Preserving Data Mining Approaches Anu Thomas Asst.Prof. Computer Science & Engineering Department DJMIT,Mogar,Anand Gujarat Technological University Anu.thomas@djmit.ac.in Jimesh Rana

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

Privacy-Preserving Data Mining in the Fully Distributed Model

Privacy-Preserving Data Mining in the Fully Distributed Model Privacy-Preserving Data Mining in the Fully Distributed Model Rebecca Wright Stevens Institute of Technology www.cs.stevens.edu/~rwright MADNES 05 22 September, 2005 (Includes joint work with Zhiqiang

More information

Parallel Popular Crime Pattern Mining in Multidimensional Databases

Parallel Popular Crime Pattern Mining in Multidimensional Databases Parallel Popular Crime Pattern Mining in Multidimensional Databases BVS. Varma #1, V. Valli Kumari *2 # Department of CSE, Sri Venkateswara Institute of Science & Information Technology Tadepalligudem,

More information

Mamatha Nadikota et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (4), 2011,

Mamatha Nadikota et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (4), 2011, Hashing and Pipelining Techniques for Association Rule Mining Mamatha Nadikota, Satya P Kumar Somayajula,Dr. C. P. V. N. J. Mohan Rao CSE Department,Avanthi College of Engg &Tech,Tamaram,Visakhapatnam,A,P..,India

More information

Data Mining for XML Query-Answering Support

Data Mining for XML Query-Answering Support IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 5, Issue 6 (Sep-Oct. 2012), PP 25-29 Data Mining for XML Query-Answering Support KC. Ravi Kumar 1, E. Krishnaveni

More information

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007 K-Anonymity and Other Cluster- Based Methods Ge Ruan Oct 11,2007 Data Publishing and Data Privacy Society is experiencing exponential growth in the number and variety of data collections containing person-specific

More information

Privacy Preserving High-Dimensional Data Mashup Megala.K 1, Amudha.S 2 2. THE CHALLENGES 1. INTRODUCTION

Privacy Preserving High-Dimensional Data Mashup Megala.K 1, Amudha.S 2 2. THE CHALLENGES 1. INTRODUCTION Privacy Preserving High-Dimensional Data Mashup Megala.K 1, Amudha.S 2 1 PG Student, Department of Computer Science and Engineering, Sriram Engineering College, Perumpattu-602 024, Tamil Nadu, India 2

More information

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home

More information

Privacy Breaches in Privacy-Preserving Data Mining

Privacy Breaches in Privacy-Preserving Data Mining 1 Privacy Breaches in Privacy-Preserving Data Mining Johannes Gehrke Department of Computer Science Cornell University Joint work with Sasha Evfimievski (Cornell), Ramakrishnan Srikant (IBM), and Rakesh

More information

CHUIs-Concise and Lossless representation of High Utility Itemsets

CHUIs-Concise and Lossless representation of High Utility Itemsets CHUIs-Concise and Lossless representation of High Utility Itemsets Vandana K V 1, Dr Y.C Kiran 2 P.G. Student, Department of Computer Science & Engineering, BNMIT, Bengaluru, India 1 Associate Professor,

More information

Performance Analysis of Frequent Closed Itemset Mining: PEPP Scalability over CHARM, CLOSET+ and BIDE

Performance Analysis of Frequent Closed Itemset Mining: PEPP Scalability over CHARM, CLOSET+ and BIDE Volume 3, No. 1, Jan-Feb 2012 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info ISSN No. 0976-5697 Performance Analysis of Frequent Closed

More information

Preserving Privacy during Big Data Publishing using K-Anonymity Model A Survey

Preserving Privacy during Big Data Publishing using K-Anonymity Model A Survey ISSN No. 0976-5697 Volume 8, No. 5, May-June 2017 International Journal of Advanced Research in Computer Science SURVEY REPORT Available Online at www.ijarcs.info Preserving Privacy during Big Data Publishing

More information

Building a Concept Hierarchy from a Distance Matrix

Building a Concept Hierarchy from a Distance Matrix Building a Concept Hierarchy from a Distance Matrix Huang-Cheng Kuo 1 and Jen-Peng Huang 2 1 Department of Computer Science and Information Engineering National Chiayi University, Taiwan 600 hckuo@mail.ncyu.edu.tw

More information

ADVANCES in NATURAL and APPLIED SCIENCES

ADVANCES in NATURAL and APPLIED SCIENCES ADVANCES in NATURAL and APPLIED SCIENCES ISSN: 1995-0772 Published BYAENSI Publication EISSN: 1998-1090 http://www.aensiweb.com/anas 2017 May 11(7): pages 585-591 Open Access Journal A Privacy Preserving

More information

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania

More information

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008 121 An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

More information

The Fuzzy Search for Association Rules with Interestingness Measure

The Fuzzy Search for Association Rules with Interestingness Measure The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of

More information

On Privacy-Preservation of Text and Sparse Binary Data with Sketches

On Privacy-Preservation of Text and Sparse Binary Data with Sketches On Privacy-Preservation of Text and Sparse Binary Data with Sketches Charu C. Aggarwal Philip S. Yu Abstract In recent years, privacy preserving data mining has become very important because of the proliferation

More information

L-Diversity Algorithm for Incremental Data Release

L-Diversity Algorithm for Incremental Data Release Appl. ath. Inf. Sci. 7, No. 5, 2055-2060 (203) 2055 Applied athematics & Information Sciences An International Journal http://dx.doi.org/0.2785/amis/070546 L-Diversity Algorithm for Incremental Data Release

More information

A Novel method for Frequent Pattern Mining

A Novel method for Frequent Pattern Mining A Novel method for Frequent Pattern Mining K.Rajeswari #1, Dr.V.Vaithiyanathan *2 # Associate Professor, PCCOE & Ph.D Research Scholar SASTRA University, Tanjore, India 1 raji.pccoe@gmail.com * Associate

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 1, Jan-Feb 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 1, Jan-Feb 2015 RESEARCH ARTICLE OPEN ACCESS Reviewing execution performance of Association Rule Algorithm towards Data Mining Dr. Sanjay Kumar 1, Abhishek Shrivastava 2 Associate Professor 1, Research Scholar 2 Jaipur

More information

A mining method for tracking changes in temporal association rules from an encoded database

A mining method for tracking changes in temporal association rules from an encoded database A mining method for tracking changes in temporal association rules from an encoded database Chelliah Balasubramanian *, Karuppaswamy Duraiswamy ** K.S.Rangasamy College of Technology, Tiruchengode, Tamil

More information

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms Int. J. Advanced Networking and Applications 458 Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms Puttegowda D Department of Computer Science, Ghousia

More information

ETP-Mine: An Efficient Method for Mining Transitional Patterns

ETP-Mine: An Efficient Method for Mining Transitional Patterns ETP-Mine: An Efficient Method for Mining Transitional Patterns B. Kiran Kumar 1 and A. Bhaskar 2 1 Department of M.C.A., Kakatiya Institute of Technology & Science, A.P. INDIA. kirankumar.bejjanki@gmail.com

More information

Optimization using Ant Colony Algorithm

Optimization using Ant Colony Algorithm Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department

More information

Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database

Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database 204 IJEDR Volume 2, Issue ISSN: 232-9939 Research Trends in Privacy Preserving in Association Rule Mining (PPARM) On Horizontally Partitioned Database Rachit Adhvaryu, 2 Nikunj Domadiya PG Student, 2 Professor

More information

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

K ANONYMITY. Xiaoyong Zhou

K ANONYMITY. Xiaoyong Zhou K ANONYMITY LATANYA SWEENEY Xiaoyong Zhou DATA releasing: Privacy vs. Utility Society is experiencing exponential growth in the number and variety of data collections containing person specific specific

More information

Achieving k-anonmity* Privacy Protection Using Generalization and Suppression

Achieving k-anonmity* Privacy Protection Using Generalization and Suppression UT DALLAS Erik Jonsson School of Engineering & Computer Science Achieving k-anonmity* Privacy Protection Using Generalization and Suppression Murat Kantarcioglu Based on Sweeney 2002 paper Releasing Private

More information

NON-CENTRALIZED DISTINCT L-DIVERSITY

NON-CENTRALIZED DISTINCT L-DIVERSITY NON-CENTRALIZED DISTINCT L-DIVERSITY Chi Hong Cheong 1, Dan Wu 2, and Man Hon Wong 3 1,3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong {chcheong, mhwong}@cse.cuhk.edu.hk

More information