(t,k)-hypergraph anonymization: an approach for secure data publishing


SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2015; 8:
Published online 25 September 2014 in Wiley Online Library (wileyonlinelibrary.com)

RESEARCH ARTICLE

Atefeh Asayesh*, Mohammad Ali Hadavi and Rasool Jalili
Data and Network Security Laboratory (DNSL), Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

ABSTRACT

Privacy preservation is an important issue in data publishing. Existing approaches to privacy-preserving data publishing rely on tabular anonymization techniques such as k-anonymity, which do not provide appropriate results for aggregate queries. Solutions based on graph anonymization have also been proposed for relational data, but they hide only bipartite relations. In this paper, we propose an approach for anonymizing multirelation constraints (ternary or more) with (t,k)-hypergraph anonymization in data publishing. To this end, we model constraints as undirected hypergraphs and formally cluster attribute relations, represented as hyperedges, with the t-means-clustering algorithm. In addition, anonymization is carried out with a k-anonymity method in every cluster, where the parameter k can vary from cluster to cluster to attain more flexibility and less information loss with respect to utility. Our experiments demonstrate that this approach offers a good trade-off between privacy and utility. Copyright 2014 John Wiley & Sons, Ltd.

KEYWORDS
data publishing; privacy; anonymization; hypergraph; clustering

*Correspondence
Atefeh Asayesh, Data and Network Security Laboratory (DNSL), Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
asayesh@ce.sharif.edu

1. INTRODUCTION

Numerous organizations publish microdata for a variety of purposes, such as demographic and public health research.
Although attributes that clearly identify individuals, such as Name and ID Number, are usually removed, such databases can sometimes be joined with other public databases on attributes such as postcode, gender, and date of birth to reidentify individuals who were supposed to remain anonymous [1]. Linking attacks are made easier by the availability of other databases over the Internet. According to one study, approximately 87% of the population of the United States can be uniquely identified on the basis of their postcode, gender, and date of birth [2,3]. Such attributes are called quasi-identifiers.

A large amount of data from various sources is required to produce useful statistical results. However, data analysis tools and methods may be maliciously used to disclose private and sensitive information. Therefore, privacy preservation becomes an essential issue in data publishing.

Encryption changes data such that ad hoc queries cannot be answered correctly on the published database [4–6], so encryption-based methods such as the one in [7] lose generality for the purpose of data publishing. Database fragmentation, another approach, executes queries over more than one database fragment and obtains the final answer by combining the received results [8,9]. However, statistical analysis on relations among more than two attributes then becomes impossible and may lead to wrong ad hoc results. For example, given a constraint over the three attributes postcode, gender, and date of birth, preserving privacy requires placing each attribute in a separate fragment, which misdirects any statistical analysis that relies on these attributes together.

Perturbation [10,11] and k-anonymity [2,7,12] are two major techniques for this goal. The k-anonymity model has been extensively studied because of its relative conceptual simplicity and effectiveness [10,13–15].
In this approach, data are anonymized before being released in order to prevent potential reidentification attacks. While privacy in data publishing aims at concealing the association between attribute values [16,17], most existing anonymization approaches consider only bipartite attribute associations [18,19]. Nevertheless, in some real databases, privacy constraints may involve more than two attributes. In [13], Cormode et al. use (k,l)-grouping to anonymize associations between two entities. Their approach works only for binary constraints because it uses a bipartite graph, in which there are two groups of nodes.

Extending their method to multirelation constraints fails to preserve utility and privacy simultaneously, and it can lead to wrong query answers. Example 1 illustrates the problem.

Example 1. Assume a relation with four attributes, R = {product, date of birth (DoB), address, illness}, as shown in Table I, in which illness may have sensitive values. The constraint is defined as (1).

$$\{\text{product}, \text{DoB}, \text{address}\} \rightarrow \text{illness} \tag{1}$$

In this paper, constraint (1) means that the ternary association between product, DoB, and address can lead to the deduction of illness. To extend the method of Cormode et al. [13] to ternary associations, we must consider two bipartite graphs, (product, DoB) and (DoB, address), and anonymize them separately. Furthermore, for aggregate queries, the maximum of the lower bounds and the minimum of the upper bounds of the response sets should be computed. Two graphs are needed because a bipartite graph has only two parts, whereas our example has three attributes (product, DoB, and address), each of which must appear as a part. This division leads to a lack of data in some situations.

Consider the following scenario using Table I. For secure data publishing based on bipartite graphs, we fragment the relation into two relations with attributes (product, DoB) and (DoB, address). Next, we model each fragment as a bipartite graph and anonymize the fragments with the technique of Cormode et al. [13]. Note that we cannot use a single bipartite graph ({product, DoB, address}, illness): we want to answer aggregate queries that relate two attributes among {product, DoB, address}, and if {product, DoB, address} is modeled as one node, queries such as "the number of people who buy a2 and whose address is c2" cannot be supported.
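The miscount caused by fragmenting the relation can be reproduced with a few lines of Python. This is only an illustrative sketch: the rows are the QID values of Table I, the variable names are hypothetical, and taking the smaller of the two fragment counts as the published answer is an assumption based on the "minimum of the upper bounds" rule stated above.

```python
# Rows of Table I as (product, DoB, address); the illness column is omitted.
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a2", "b1", "c2"),
    ("a2", "b2", "c1"),
    ("a3", "b3", "c2"),
    ("a2", "b4", "c1"),
]

# Fragment the relation into two binary relations, as a bipartite model requires.
product_dob = [(p, d) for p, d, _ in rows]
dob_address = [(d, a) for _, d, a in rows]

# Each fragment can only answer its own half of the query.
count_product = sum(1 for p, _ in product_dob if p == "a2")
count_address = sum(1 for _, a in dob_address if a == "c2")

# Assumed combination rule: the smaller count is the tightest upper bound.
fragment_answer = min(count_product, count_address)

# Ground truth on the unfragmented relation.
true_answer = sum(1 for p, _, a in rows if p == "a2" and a == "c2")

print(count_product, count_address, fragment_answer, true_answer)
```

The two fragments no longer record which product goes with which address, so the best available bound overestimates the true count.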
We process this query on the two tables. First, the number of people who buy product a2 is 3; second, the number of people with address c2 is also 3. So, the final answer is 3, but the correct answer is 1. This means that dividing the relation according to the approach of Cormode et al. provides privacy, but the query results are not always acceptable for the desired goals. Our motivation is to solve this problem. Moreover, we model the database as a graph instead of a table to gain more utility with acceptable privacy.

Table I. A sample relation with four attributes.

Illness   Address   Date of birth   Product
d1        c1        b1              a1
d1        c2        b1              a1
d3        c2        b1              a2
d2        c1        b2              a2
d3        c2        b3              a3
d4        c1        b4              a2

In this paper, we propose a utility-preserving approach for data anonymization with ternary (or more) constraints, based on graph anonymization techniques instead of tabular techniques. While satisfying the privacy concerns, this approach prevents linking and reidentification attacks and yields a small amount of information loss. The database is modeled as a hypergraph and its tuples as hyperedges. Anonymization is carried out in two steps: the first step is hypergraph clustering (t clusters), and the second step is cluster anonymization (k-anonymization) using a local-recoding method. Our method is more flexible than other reported work on data publishing based on k-anonymization, such as [20–22].

The remainder of this paper is organized as follows. Section 2 presents the preliminary concepts, including definitions and privacy models. Section 3 describes our proposed approach. In Section 4, we analyze the information loss and distortion ratio of the approach. Section 5 focuses on processing aggregate queries on the anonymized database. The experimental evaluation is given in Section 6.
Finally, the paper concludes with the advantages of the approach in the last section.

2. PRELIMINARIES

In this paper, without loss of generality, ternary constraints are considered in developing the proposed method. The method can be generalized directly to privacy constraints on an arbitrary number of attributes. The database is assumed to be static, and the goal is to answer aggregate queries with aggregation functions such as COUNT and SUM. Some basic definitions follow.

Definition 1. Hypergraph model: A hypergraph is the generalization of a graph in which edges, called hyperedges, can connect more than two vertices. Formally, it consists of a set of vertices V and a set of hyperedges E among those vertices. A hyperedge $e_j \in E$ is a subset of the vertices in V. Graphs are special instances of hypergraphs in which each hyperedge has exactly two vertices [23].

Definition 2. Quasi-identifier attribute set (QID): A quasi-identifier attribute set is a set of attributes in a relation that potentially reveals private information, possibly by joining with other tables [2]. Equivalently, a quasi-identifier is a set of attributes that can be linked with external information in order to reidentify the individuals to whom the information refers. In Example 1, the attributes {product, DoB, address} are quasi-identifiers. In our scenario, QID is not empty.

Definition 3. Sensitive tuples: Tuples that may contain private values in an attribute outside the QID set are called sensitive tuples.

Definition 4. Equivalence class: An equivalence class of a relation with respect to an attribute set is the set of all tuples in the table that contain identical values for the attribute set.

Definition 5. k-anonymity property: A relation is k-anonymous if the size of every equivalence class with respect to the quasi-identifier attribute set is k or more. That is, k-anonymity requires that every combination of values occurring in the quasi-identifier attribute set has a frequency of at least k [2].

Definition 6. k-anonymization: A view of a table is said to be k-anonymous if the view satisfies the k-anonymity property with respect to the quasi-identifiers [2].

Definition 7. The t-means-clustering algorithm: The t-means-clustering algorithm was developed by J. MacQueen [21] and also by J. A. Hartigan and M. A. Wong [24]. The algorithm classifies objects into t groups based on their attributes/features, where t is a positive integer. The grouping is carried out by minimizing the sum of squared distances between the data and the corresponding cluster centroid. In the first step of t-means clustering, the number of clusters, t, is determined. Then any t random objects can be chosen as initial centroids; alternatively, the first t objects in the sequence can be used as initial centroids [24,25]. The t-means algorithm is described in Figure 1.

3. OUR APPROACH

In general, it is desirable that the released data preserve data utility as much as possible. To achieve this goal, we design a utility-preserving anonymization method that yields a small amount of information loss while satisfying the k-anonymity property. In our approach, the database is modeled as a hypergraph, and the hypergraph is represented as an incidence matrix (V).
In this matrix, the rows correspond to the values of the QID attributes, and the columns correspond to the associations of attributes, known as hyperedges. In this paper, we consider a QID with only three attributes, but the model can be extended to more. Formally, we have

$$V_{m \times n} = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{pmatrix}$$

We define $V_{m \times n}$ as follows:

m: the sum of the counts of different values for each attribute in QID
n: the number of tuples

$$\forall i \in \{1, \ldots, n\}: \text{ if } \{a_i, b_i, c_i\} \subseteq \{1, \ldots, m\} \text{ then } e_i \in E,\; V_{a_i, e_i} = 1,\; V_{b_i, e_i} = 1,\; V_{c_i, e_i} = 1 \tag{2}$$

Note that the number of rows depends on the number of different values in each attribute domain; this number can be very large, even larger than the number of tuples, so a dense matrix with m rows would occupy considerable space. However, an incidence matrix has only p nonzero entries per column, where p is the number of QID attributes, so most of its elements are 0 and the matrix is sparse. Many algorithms exist for storing sparse matrices in near-optimal space [26,27]. We did not use optimized sparse storage in our implementation; in future work, this can be addressed with the algorithms in [26,27].

Example 2. Table II shows a relation with four attributes, namely, postcode, gender, DoB, and illness, and Table III shows the corresponding hypergraph representation of Table II. In Table III, we add one extra row to mark sensitive tuples. Every column of Table III represents one tuple, in which the elements corresponding to the values of that tuple are set to 1.

Our contribution is twofold. First, we introduce an algorithm for clustering the hypergraph such that the resulting intracluster similarity is high. Second, we propose an algorithm to anonymize each cluster with minimum information loss. These two steps lead to more flexibility and less information loss with respect to utility.
In the next two subsections, the method is described in terms of its clustering and anonymization steps.

Figure 1. The flowchart of the t-means clustering algorithm.

Table II. A sample relation.

Postcode   Gender   Date of birth   Illness
4350       Male     11/07/1987      Flu
4350       Male     05/02/1990      HIV
4357       Female   09/10/1985      HIV
4357       Male     11/07/1987      Fever
4357       Male     04/12/1995      HIV
4340       Female   09/10/1985      Fever
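The incidence-matrix construction described by (2) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `incidence_matrix` is hypothetical, the matrix is stored densely for readability, and the three input rows are the QID values of three tuples from Table II.

```python
def incidence_matrix(tuples):
    """Return (row_labels, matrix): one row per distinct QID value, one column per tuple."""
    labels = []
    # Collect distinct values attribute by attribute, in order of first appearance.
    for attr in range(len(tuples[0])):
        for t in tuples:
            key = (attr, t[attr])
            if key not in labels:
                labels.append(key)
    index = {key: i for i, key in enumerate(labels)}
    matrix = [[0] * len(tuples) for _ in labels]
    # Column `col` is the hyperedge of tuple `col`: set a 1 for each of its values.
    for col, t in enumerate(tuples):
        for attr, value in enumerate(t):
            matrix[index[(attr, value)]][col] = 1
    return labels, matrix


# QID values (postcode, gender, DoB) of three tuples from Table II.
qid_rows = [
    ("4350", "Male", "05/02/1990"),
    ("4357", "Female", "09/10/1985"),
    ("4357", "Male", "11/07/1987"),
]
labels, V = incidence_matrix(qid_rows)
for label, row in zip(labels, V):
    print(label, row)
```

Row labels are keyed by (attribute position, value) so that equal strings in different attributes do not collide. Every column contains exactly p = 3 ones, which is why the matrix becomes sparse as m grows.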

Table III. Hypergraph representation of QID of Table II. (Columns: e6, e5, e4, e3, e2, e1. Rows: the postcode values, Female, Male, the dates of birth 11/07/1987, 09/10/1985, 05/02/1990 and 04/12/1995, and Sensitive.)

3.1. Clustering step

Some previous anonymization techniques use a greedy approach [2,21]. Instead, we focus on a clustering-based approach. We partition the hyperedges into groups with similar vertices and then anonymize the vertices within each group. This yields a hypergraph anonymization that benefits from the clustering information. In this section, the bounded t-means clustering algorithm is used. The classic technique is known as the k-means algorithm; to avoid confusion with the k in k-anonymity, we use t-means throughout the paper.

Many clustering algorithms need a cluster center, usually a point in the domain of the dataset. One idea is to let the center of a cluster be one of the nodes in that cluster. Following this idea, we would take the sensitive tuples as samples, pick t of them randomly as cluster centers, and group the samples around these centers. Sensitive tuples are then distributed among different clusters, which provides diversity in grouping and leads to more privacy. However, this may yield poor results if the centers do not represent the clusters accurately. For example, when two center points are close to each other, it is better to put them in one cluster than in different groups. To tackle this problem, we work with virtual centers, which need not be real nodes of the graph.

Therefore, the first step is to find the virtual centers by grouping the relation into clusters. A sampling procedure is used to determine the virtual centers (step 1.1). Using these virtual centers, we group the entire database into clusters (step 1.2) by minimizing the sum of squared distances between the data and the corresponding virtual centers.
The last step is to anonymize each database cluster. Our clustering method is shown in Algorithm 1.

Step 1.1: finding virtual centers. In this step, we want to find the best center point for each cluster. To this end, two main steps are performed: sampling, and clustering on the samples.

Sampling. To deal with large datasets, instead of clustering the whole database at once, which is computationally demanding, a sample of the preprocessed database is picked deliberately. The samples in our approach are the sensitive tuples. In this step, the tuples of the database are divided into two parts: sensitive and non-sensitive tuples. Sensitive tuples are tuples that have a sensitive value in an attribute outside the QID set. Because the goal of secure publishing is to answer aggregate queries, the QID attributes themselves are not sensitive. Our approach assumes that data are published only when they are not sensitive on their own. For example, if DNA or income is a sensitive attribute, none of its values may be published. In other words, data publishing relies on anonymization techniques for data values that are not intrinsically sensitive but whose association with other values reveals private information about an entity.

Cluster on samples. We apply clustering to the samples to find t center points. The t-means algorithm takes an input parameter t and partitions a set of n objects into t clusters so that the resulting intracluster similarity is maximized. The algorithm is applied to the samples to cluster the sensitive tuples into t groups based on their similarities. In (3), $sim(e_i, e_j)$ is the distance between $e_i$ and $e_j$, where $e_i$ and $e_j$ are tuples, $d_i$ and $d_j$ are their vectors, and p is the size of QID (i.e., p = 3).

$$sim(e_i, e_j) = \sum_{k=1}^{p} (d_{ik} - d_{jk})^2 \tag{3}$$

After clustering the sample tuples into t groups, the center points of the clusters are computed. We adopt the bounded t-means algorithm in [14] to specify the virtual centers.
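The distance in (3) and one assignment-and-update round of t-means can be sketched as follows. This is a hedged illustration rather than the bounded t-means algorithm of [14]: the function names are hypothetical, the vectors are small made-up 0/1 vectors, and the center is taken as the plain coordinate-wise mean of the tuples in a cluster.

```python
def sq_distance(d_i, d_j):
    """Sum of squared coordinate differences between two vectors, as in (3)."""
    return sum((a - b) ** 2 for a, b in zip(d_i, d_j))


def mean_center(vectors):
    """Coordinate-wise mean of the vectors in one cluster (a 'virtual' center)."""
    b = len(vectors)
    return [sum(col) / b for col in zip(*vectors)]


def nearest_center(vector, centers):
    """Index of the center with the smallest squared distance to `vector`."""
    return min(range(len(centers)), key=lambda c: sq_distance(vector, centers[c]))


# Two hypothetical clusters of sensitive tuples, encoded as 0/1 vectors.
clusters = [
    [[1, 0, 1, 0], [1, 0, 0, 1]],
    [[0, 1, 0, 1]],
]
centers = [mean_center(c) for c in clusters]

print(centers)
print(nearest_center([1, 0, 1, 0], centers))
print(nearest_center([0, 1, 1, 0], centers))
```

Because the centers are means, their coordinates may be fractional, so a center need not coincide with any real tuple.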
In particular, consider a cluster of sensitive tuples $\{st_{v_1}, \ldots, st_{v_b}\}$. Each tuple is assigned a vector with m coordinates (one per QID value) over {0, 1}. Then the virtual center is a vector $\{\hat{c}_1, \ldots, \hat{c}_m\}$ such that

$$\hat{c}_j = \frac{\sum_{i=1}^{b} st_{v_i j}}{b}, \quad 1 \le j \le m \tag{4}$$

In (4), $\hat{c}_j$ is the jth coordinate of the virtual center, and b is the number of tuples in the particular cluster.

Step 1.2: clustering the entire database. Let $\hat{C}_1, \ldots, \hat{C}_t$ be the t cluster centers resulting from step 1.1. We now partition all tuples of the database into t clusters, where cluster $C_i$ contains all tuples whose closest center point is $\hat{C}_i$. By grouping similar tuples into the same cluster with the help of the sampling procedure, we obtain less information loss with high privacy, compared with clustering all tuples at once without sampling. The evidence for this improvement is the size of the clusters with and without sampling. Clusters of balanced sizes facilitate data handling in the later anonymization step, and our experimental study shows that the sizes of clusters built from virtual centers are less skewed. For example, with t = 3, the size of clusters built from real centers (without sampling) varies from 300 to …, while with virtual centers it varies from around … to …. More details of our experimental evaluation are given in Section 7.

Example 3. Let t = 2 for clustering Table II. Table IV shows the two clusters that result from applying our clustering algorithm to Table II.

Algorithm 1: Clustering
Inputs:
  hg: main database modeled as a hypergraph
  V: hypergraph modeled as an incidence matrix
  t: required number of clusters
  h: number of all sensitive tuples
  e: tuples modeled as hyperedges
  d: hyperedges represented as vectors over {0,1}
  m: the number of all QID values
Outputs:
  b: the maximum size of each cluster
  Cluster[t]: t clusters with at most b tuples in each cluster

1- Function Make_Incidence_Matrix(hg) {
     Make matrix V from hg based on (2)
     Add an extra row to mark sensitive tuples // 1 for sensitive and 0 for non-sensitive tuples
     return matrix V
   }
2- Function Pick_sensitive_tuples(V) {
     Choose all sensitive tuples from V as samples
     return {e_1, ..., e_h}
   }
3- Choose t samples randomly: {e_1, ..., e_t}
4- Function Pre-Cluster_Samples({e_1, ..., e_t}) { // t-means clustering algorithm
     // e_i and e_j are tuples; d_i and d_j are their vectors
     Find the similarity between every sample and the t clusters:
       sim(e_i, e_j) = sum_{k=1..p} (d_ik - d_jk)^2 // p is the size of QID
     Find the minimum of sim to select the best cluster for sample e_i
     // sensitive tuples in each cluster after clustering: {st_1, ..., st_b}
     return {st_1, ..., st_b}
   }
5- Function Calculate_mean({st_v1, ..., st_vb}) { // compute the virtual center
     c^_j = (sum_{i=1..b} st_vij) / b // b is the number of tuples in a particular cluster
     return the virtual centers as the vector C^_1, ..., C^_t
   }
6- Cluster V based on C^_1, ..., C^_t // final clustering
7- return Cluster[t]

Table IV. Hypergraph after clustering.

Cluster     Illness   Date of birth   Gender   Postcode
Cluster 1   Fever     09/10/1985      Female   4340
            HIV       04/12/1995      Female   4357
Cluster 2   Flu       11/07/1987      Male     4350
            HIV       05/02/1990      Male     4350
            HIV       05/02/1985      Male     4357
            Fever     11/07/1987      Male     4357

3.2. Anonymization step

In the second phase, we anonymize each cluster with a k-anonymity method. For this purpose, we use a local-recoding method [15]. Some methods anonymize a given database by mapping the values in the domain of the quasi-identifier attributes to modified values; this is known as global recoding. Alternatively, some methods modify individual instances of data items, which is known as local recoding. Two main local-recoding methods have been proposed in the literature. The first, cell suppression, produces the anonymized database by suppressing individual cells of the original database [14,26]. The second, cell generalization, maps individual cells to generalized values using a hierarchy-based generalization model [18]. Global recoding causes too much distortion to a database; therefore, we prefer a local-recoding method. However, optimal local recoding is an NP-hard problem, and good heuristic algorithms are required to achieve k-anonymization.

Here, we propose a local-recoding algorithm with cell generalization that uses a top-down approach. In this algorithm, a tree is constructed for each attribute in the QID set. At first, each equivalence class is generalized to the root of the tree. Then the QID attributes of the tuples in an equivalence class are specialized separately, in iterations. During specialization, k-anonymity must remain satisfied. The process continues until the tuples cannot be specialized any further. Note that when we group the tuples into classes, if the size of a class is less than k, we add fake tuples to that class. Fake tuples are non-sensitive tuples that share the QID values of the equivalence class.

Example 4. Three hierarchical trees related to Example 3 are shown in Figure 2.
We show the anonymization procedure on cluster 1 for the QID attributes (postcode, DoB, and gender) step by step in Figures 3–5. Based on the generalization tree and the local-recoding method, the anonymization of Table I with k = 2 results in clusters 1 and 2, indicated in Table V.
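The top-down specialization described above can be sketched for a single attribute. The sketch below is an assumption-laden illustration, not the authors' algorithm: it assumes the postcode generalization tree is a digit-prefix hierarchy (****, 4***, 43**, ...), it handles one attribute of one cluster, and it does not add fake tuples. The function name `specialize` is hypothetical.

```python
from collections import defaultdict


def specialize(postcodes, k, depth=0):
    """Map each postcode to its most specific prefix label that keeps groups of size >= k."""
    width = len(postcodes[0])
    if depth == width:
        # Fully specialized: the original values can be published as they are.
        return {pc: pc for pc in postcodes}
    # Try to reveal one more leading digit.
    groups = defaultdict(list)
    for pc in postcodes:
        groups[pc[: depth + 1]].append(pc)
    if any(len(g) < k for g in groups.values()):
        # Revealing the next digit would break k-anonymity: stop at the current level.
        label = postcodes[0][:depth] + "*" * (width - depth)
        return {pc: label for pc in postcodes}
    result = {}
    for g in groups.values():
        result.update(specialize(g, k, depth + 1))
    return result


# Postcodes of cluster 1 in Table IV, with k = 2.
print(specialize(["4340", "4357"], 2))
```

For the two postcodes of cluster 1, specialization stops after two digits, which gives the generalized value 43**.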

Algorithm 2: Hypergraph Anonymization
Inputs:
  hg: main database modeled as a hypergraph
  V: hypergraph modeled as an incidence matrix
  t: required number of clusters
  k: the size of an equivalence class
  numt: number of tuples in a cluster
Outputs:
  anon_DB: anonymized database

1- Function Cluster(hg, V, t) { // Algorithm 1
     return Cluster[t]
   }
2- Function AnonymizeCluster(Cluster[t], k) {
     // Group similar tuples in the same class
     num = 0
     WHILE there is a class with a size of less than k or num <= numt DO
       Find a class with minimum size (MIN_Class)
       Choose the best class (IND_Class) to combine with MIN_Class
         // by using the XOR method to find a class with the value 0 in the result of XOR
       Calculate the new size of IND_Class
       Remove MIN_Class
       num = num + 1
     END WHILE
     Add fake tuples if the size of an equivalence class is less than k in each cluster
     Generalize each equivalence class with the local-recoding generalization method
   }

In this paper, we propose a framework for anonymizing multirelation constraints (ternary or more) with hypergraph anonymization in data publishing. Our framework has two parts: (1) clustering and (2) anonymization. Our contribution lies mainly in the first part, and any anonymization algorithm can be used in the second part. We used an anonymization algorithm that is based on local recoding and is similar to many anonymization approaches, such as those in [2,7,22], differing mainly in the specialization step and the addition of fake tuples. The difference between our algorithm and other tabular anonymization algorithms is that we deliberately use clustering as a preprocessing step. This makes the greedy construction of classes faster in the anonymization step. The level of anonymization may not differ much between the algorithms; overall, however, the result of the two steps is better than that of similar methods such as [2,7,22].
The advantage of our method is that it preserves utility when finding equivalence classes, because similar tuples are grouped into the same clusters in the first phase. In other words, grouping similar tuples into the same equivalence class is much more probable in the proposed approach than in anonymization without clustering. Moreover, the time needed to construct an anonymized table is much less than with the general k-anonymity method. Furthermore, we can choose a different value of k for each cluster to reach more utility in the anonymization, which leads to less information loss. Algorithm 2 gives an overview of (t,k)-hypergraph anonymization.

Figure 2. Postcode, DoB, and gender generalization trees.

Figure 3. Postcode generalization steps for cluster 1 (Table I).

Figure 4. Gender generalization steps for cluster 1 (Table I).
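The class-merging loop of Algorithm 2 can be sketched as follows. This is only one possible reading of the pseudocode: the "XOR method" is interpreted here as choosing the class whose representative 0/1 vector differs from the smallest class in the fewest positions (the Hamming distance), the `num <= numt` guard is omitted, and fake tuples are not added. All names and the sample vectors are hypothetical.

```python
def hamming(u, v):
    """Number of positions in which two 0/1 vectors differ (bitwise XOR, then count)."""
    return sum(a ^ b for a, b in zip(u, v))


def merge_small_classes(classes, k):
    """classes: list of (representative 0/1 vector, list of tuple ids)."""
    classes = [(list(vec), list(ids)) for vec, ids in classes]
    while len(classes) > 1:
        smallest = min(range(len(classes)), key=lambda i: len(classes[i][1]))
        if len(classes[smallest][1]) >= k:
            break  # every class already has at least k tuples
        vec, ids = classes.pop(smallest)
        # Merge into the class whose representative is closest under XOR.
        target = min(range(len(classes)), key=lambda i: hamming(vec, classes[i][0]))
        classes[target][1].extend(ids)
    return classes


classes = [
    ([1, 0, 1, 0, 0], [0, 1]),
    ([1, 0, 0, 1, 0], [2]),
    ([0, 1, 0, 0, 1], [3, 4]),
]
merged = merge_small_classes(classes, 2)
print([(vec, ids) for vec, ids in merged])
```

If a single class smaller than k remains, the loop stops; according to the text, fake tuples would then be added to that class.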

Figure 5. DoB generalization steps for cluster 1 (Table I).

Table V. Anonymized hypergraph.

Cluster     Illness   Gender   Postcode
Cluster 1   Fever     Female   43**
            HIV       Female   43**
Cluster 2   Flu       Male     4350
            HIV       Male     4350
            HIV       Male     4357
            Fever     Male     4357

4. FLEXIBILITY IN DETERMINING K FOR EACH CLUSTER

As explained in the anonymization step, using different values of k in each cluster leads to more flexibility and utility. The question is which value of k in each cluster results in less information loss. Suppose we have another privacy-related parameter β, which requires that the number of sensitive tuples in each equivalence class be at most β% of all tuples in that class. For example, let the number of sensitive tuples and the anonymization parameter in cluster 1 be $h_1$ and $k_1$, respectively, and let NoE denote the number of equivalence classes. The maximum number of sensitive tuples in each equivalence class is then computed from $h_1 = NoE_1 \cdot \beta \cdot k_1$. By defining NoE and β initially, k can be found in every cluster recursively. This process results in less information loss and more utility because of the flexibility in determining k for each cluster. Note that $h_1$ is presented under the condition that the number of tuples in each equivalence class is lower than k. If a class has more than k tuples, h can be found for that class separately.

5. INFORMATION LOSS

In the proposed algorithm, there are two kinds of information loss: information loss due to generalization (GIL) and information loss due to adding fake tuples (FIL). Therefore, the total information loss consists of both GIL and FIL, which are computed individually.

5.1. GIL

The distortion value is equal to the distortion of the generalized dataset divided by the distortion of the fully generalized dataset, where the fully generalized dataset is the one in which all attribute values are generalized to the root of the tree. So, GIL is computed by (5).
$$GIL = \frac{\sum_{i=1}^{t} \sum_{j=1}^{b_i} \sum_{k=1}^{p} \frac{AGN_{i,j,k}}{h_{agn_k}}}{pn} \tag{5}$$

$AGN_{i,j,k}$ is the generalization height of the kth QID attribute in tuple j of cluster i, and $h_{agn_k}$ is the height of the generalization tree for the kth QID attribute. In the previous example, GIL is computed as

$$GIL_{(example)} = \frac{\sum_{i=1}^{t} \sum_{j=1}^{b_i} \left( \frac{PGN_{i,j}}{h_{pgn}} + \frac{DoGN_{i,j}}{h_{dogn}} + \frac{GeGN_{i,j}}{h_{gegn}} \right)}{pn} \tag{6}$$

$PGN_{i,j}$ is the generalization height of postcode, $DoGN_{i,j}$ is the generalization height of DoB, and $GeGN_{i,j}$ is the generalization height of gender in tuple j of cluster i. n is the number of tuples, k is the anonymization factor, and $b_i$ is the number of tuples in cluster i. $h_{pgn}$, $h_{dogn}$, and $h_{gegn}$ are the heights of the generalization trees for the QID attributes, and p is the size of the QID set. In our example, $h_{pgn} = 4$, $h_{dogn} = 7$, $h_{gegn} = 1$, and p = 3.

5.2. FIL

In the anonymization phase, fake tuples are added if the size of an equivalence class is less than k. This distortion value is equal to the number of all added fake tuples divided by the total number of tuples in the database. So, FIL is computed by (7),

$$FIL = \frac{\sum_{i=1}^{t} \sum_{j=1}^{u_i} FTN_{i,j}}{n} \tag{7}$$

where $FTN_{i,j}$ is the number of fake tuples in equivalence class j of cluster i, and $u_i$ is the number of equivalence classes in cluster i. The total information loss (TIL) is computed by (8), accordingly.

$$TIL = GIL + FIL \tag{8}$$
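Equations (5), (7), and (8) translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the tree heights (4, 7, 1) are the ones quoted in the text, and the per-tuple generalization heights and fake-tuple counts are made-up inputs, not values taken from the paper's tables.

```python
def gil(gen_heights, tree_heights):
    """Equation (5): gen_heights holds, per tuple, the generalization height of each QID attribute."""
    n = len(gen_heights)
    p = len(tree_heights)
    total = sum(h / tree_heights[k] for row in gen_heights for k, h in enumerate(row))
    return total / (p * n)


def fil(fake_counts, n):
    """Equation (7): fake tuples added per equivalence class, divided by the number of tuples."""
    return sum(fake_counts) / n


def til(gen_heights, tree_heights, fake_counts):
    """Equation (8): total information loss."""
    return gil(gen_heights, tree_heights) + fil(fake_counts, len(gen_heights))


# Tree heights for (postcode, DoB, gender) as given in the text.
tree_heights = (4, 7, 1)
# Hypothetical heights: two tuples have their postcode generalized by two levels.
gen_heights = [[2, 0, 0], [2, 0, 0]] + [[0, 0, 0]] * 4

print(gil(gen_heights, tree_heights))
print(til(gen_heights, tree_heights, fake_counts=[0, 0]))
```

With these inputs the triple sum is 2/4 + 2/4 = 1, and dividing by pn = 18 gives a GIL of 1/18.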

If information loss is to be used as a rating parameter, we should compute the maximum value of TIL and derive a distortion ratio from it. The maximum value of GIL corresponds to the total generalization of all tuples (9). The maximum value of FIL is obtained when each tuple is placed in a separate equivalence class, so that k − 1 fake tuples must be added for each tuple (10). Equation (11) gives the maximum value of information loss.

$$\text{Max } GIL = \frac{pn}{n} = p \tag{9}$$

$$\text{Max } FIL = \frac{n(k-1)}{n + n(k-1)} = \frac{k-1}{k} \tag{10}$$

$$\text{Max } TIL = \text{Max } GIL + \text{Max } FIL = p + \frac{k-1}{k} \tag{11}$$

Finally, we have

$$0 \le TIL \le p + \frac{k-1}{k} \tag{12}$$

The distortion ratio is obtained by (13).

$$DR = \frac{TIL}{\text{Max } TIL} = \frac{\dfrac{\sum_{i=1}^{t} \sum_{j=1}^{b_i} \sum_{k=1}^{p} \frac{AGN_{i,j,k}}{h_{agn_k}}}{pn} + \dfrac{\sum_{i=1}^{t} \sum_{j=1}^{u_i} FTN_{i,j}}{n}}{p + \dfrac{k-1}{k}} \tag{13}$$

In our example, the distortion ratio is computed as (14).

$$DR_{(example)} = \frac{TIL}{\text{Max } TIL} = \frac{\dfrac{\sum_{i=1}^{t} \sum_{j=1}^{b_i} \left( \frac{PGN_{i,j}}{h_{pgn}} + \frac{DoGN_{i,j}}{h_{dogn}} + \frac{GeGN_{i,j}}{h_{gegn}} \right)}{pn} + \dfrac{\sum_{i=1}^{t} \sum_{j=1}^{u_i} FTN_{i,j}}{n}}{p + \dfrac{k-1}{k}} \tag{14}$$

6. QUERY ANSWERING OF (T,K)-HYPERGRAPH ANONYMIZATION

Some previous anonymization approaches, such as [15,27], work by generalizing tabular data, and their frameworks are likewise based on the generalization of quasi-identifiers. However, none of these works discusses the impact of generalization on the accuracy of answering aggregate queries. In 2007, Zhang et al. proposed an approach for aggregate query answering on anonymized tables [28]. However, like the majority of works on k-anonymity [19], that work focuses only on tabular methods, whereas our approach is based on graph algorithms, with lower information loss and better efficiency in answering queries. The reason is that graph-based algorithms model data and relations as nodes and edges, which makes finding edges fast, and that clustering the edges in the first step of our algorithm reduces information loss. In our view, it is desirable to answer aggregate queries correctly with minimum information loss, and graph-based algorithms are good solutions for this purpose.

In our approach, a query processor is needed to provide acceptable answers to users' queries. Because the main goal in data publishing is the aggregate analysis of data, the server responds only to aggregate requests. Every COUNT and SUM query related to the QID attributes is answered by inspecting the vertices of the hypergraph that satisfy the desired conditions. For example, to answer the query "the number of people whose postcode is 4350 and who have the dangerous HIV disease", we should consider all edges from the set {4350, 435*, 43**, 4***, ****} that are connected to the HIV vertex. Let $E_{qry}$ denote the number of edges that satisfy the query condition. This count is the upper bound of the answer (15), and the lower bound is computed by (16), taking the TIL parameter into account. The reason is that, after the anonymization step, some edges that match the query condition may not be correct edges, and fake tuples may also have been added to the database. So, it is necessary to prune $E_{qry}$ using TIL.

$$ans_{highbound} = E_{qry} \tag{15}$$
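The COUNT bounds described above can be sketched in a few lines. This is a hedged illustration: the function name `count_bounds` is hypothetical, the lower bound scales the edge count by one minus the ratio of TIL to its maximum value p + (k − 1)/k from (11), and the TIL value used in the call is a made-up number chosen only to make the arithmetic easy to follow.

```python
def count_bounds(e_qry, til, p, k):
    """Return (lower, upper) bounds for a COUNT query that matches e_qry edges."""
    max_til = p + (k - 1) / k           # maximum total information loss, as in (11)
    low = e_qry * (1 - til / max_til)   # prune the matching edges in proportion to TIL
    high = e_qry                        # every matching edge counted, as in (15)
    return low, high


# Hypothetical values: 6 matching edges, TIL = 1.75, p = 3 QID attributes, k = 2.
low, high = count_bounds(6, 1.75, 3, 2)
print(low, high)
```

Here the maximum TIL is 3.5, so half of the matching edges are pruned and the answer lies between 3 and 6.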

$$ans_{lowbound} = E_{qry} - E_{qry}\,\frac{TIL}{p + \frac{k-1}{k}} = E_{qry}\left(1 - \frac{TIL}{p + \frac{k-1}{k}}\right) \tag{16}$$

The final answer is bounded as in (17).

$$ans_{lowbound} \le \text{final answer} \le ans_{highbound} \tag{17}$$

Our approach supports average (AVG) queries too, as demonstrated by Example 5.

Example 5. What is the average age of people who have the dangerous disease HIV?

For this query, we find the edges that connect to the HIV vertex and the DoB vertices of those edges. Then, we compute the ages that correspond to the DoB values. For example, if DoB is 11/07/1987, the age is 2013 − 1987 = 26, and if DoB is the interval 85-90, the age is computed from the midpoint of the interval, which gives 25. After computing the ages of all vertices in the answer set, we compute the sum of these ages. However, some of these ages might come from fake tuples, so we compute a minimum and a maximum value for the average. The upper bound is the average of all ages in the answer set. For the lower bound, we first compute the minimum number of edges in the answer set by Equation (16); we call this number min-num. Then, we select the min-num smallest ages in the answer set and take their average as the lower bound.

For example, assume that our answer set of ages is [22, 23, 25, 26], that ans_lowbound = 3 by Equation (16), and that ans_highbound = E_qry = 6. The upper bound of AVG is the sum of the ages of all six edges divided by 6, which is 24. The lower bound of AVG is computed from the three minimum values of the answer set; their sum divided by 3 is 23. In the end, we have 23 ≤ AVG ≤ 24.

For MAX and MIN queries, we find the edges that satisfy the query condition. To do so, we return to the anonymized hypergraph and find the vertices that match the condition of MAX or MIN. Then, we find the edges connected to those vertices. Finally, we use (17) to bound the answer.

7. EMPIRICAL STUDY

We implemented our algorithm in MATLAB on a system with a 2.4-GHz processor and 4 GB of RAM.
In our experiments, we generated a dataset of records, a health database, with four attributes: postcode (four digits), DoB, gender, and illness. The algorithm is evaluated in terms of execution time and distortion ratio. We conducted the experiments for values of k from 2 to 7 with a constant t = 3, both with and without clustering. As shown in Figures 6 and 7, with the clustering step, higher values of k result in more information loss and a higher distortion ratio. This variation is not significant, however. Figure 8 shows that the execution time of our algorithm with clustering is much lower than without clustering.

[Figure 6. Information loss with different values of k. X-axis: k, size of equivalence classes.]

[Figure 7. Distortion ratio with different values of k. X-axis: k, size of equivalence classes; series: with clustering, without clustering.]

[Figure 8. Execution time with different values of k. X-axis: k, size of equivalence classes; y-axis: execution time (s); series: with clustering, without clustering.]

Our approach is implemented with and without sampling for t = 3 and t = 5. As shown in Figures 9 and 10, more clusters yield less information loss and consequently better utility. It has been demonstrated in [14], from experimental results, that the best value for t is √n. In Figures 9 and 10, we show the performance of our approach with and without sampling, labeled type 2 and type 1, respectively.

[Figure 9. Information loss with different values of k.]

[Figure 10. Distortion ratio with different values of k. X-axis: k, size of equivalence classes; series: 5 clusters-type1, 5 clusters-type2, 3 clusters-type1, 3 clusters-type2.]

As demonstrated in the figures, different values of (t, k), bounded by (1, 1) and (n, n), give different results for privacy and utility. For example, the following four scenarios describe the privacy-utility trade-off.

Scenario 1: Setting t = 1 and k = 1 gives a (1, 1)-hypergraph anonymization. E′ = E, where E′ is the set of hyperedges of the anonymized hypergraph, and HG′ (the anonymized hypergraph) exactly represents the original hypergraph HG. Thus, every query on HG′ is answered with the same precision as on HG. In this scenario, we have perfect utility but no privacy.

Scenario 2: Setting t = 1 and k = n gives a (1, n)-hypergraph anonymization. That is, a table with one cluster and an original hypergraph will be published. Recall that HG′ is similar to the original k-anonymized table. In this scenario, we have limited utility in answering queries, while privacy is not preserved as well as when t > 1.

Scenario 3: Setting t = n and k = 1 gives an (n, 1)-hypergraph anonymization. E′ = E, and HG′ exactly represents the original hypergraph HG. In this scenario, we have perfect utility but no privacy.

Scenario 4: Setting t = n and k = n gives an (n, n)-hypergraph anonymization. That is, we publish a hypergraph with (n − 1)n hyperedges. In this scenario, we have perfect privacy but no utility.
Between these scenarios, there are many possibilities for trading off utility against privacy by choosing appropriate values of t and k. Generally, given a (t, k)-hypergraph anonymization, aggregate queries are answered approximately correctly with small k and large t. With increasing k and decreasing t, the accuracy of query answers is reduced, according to (7) and (15). Clearly, a (t, k)-hypergraph anonymization offers more utility and less privacy than a (t′, k′)-hypergraph anonymization when k < k′ and t < t′.

8. CONCLUSION

In this paper, we considered the problem of anonymizing data in the form of a hypergraph for multirelations with more than two attributes. In particular, we investigated an effective clustering algorithm for hypergraph anonymization that finds similar vertices in a database modeled as a hypergraph. The novelty of the proposed method, compared with existing clustering-based anonymization methods, is that it computes and uses pairwise similarity values for clustering hyperedges. We also provided flexibility in determining k separately for each cluster, to balance information loss and privacy. Choosing sensitive tuples for sampling provides t-diversity among clusters. Consequently, distributing sensitive tuples into t clusters results in high privacy in our approach. The experiments confirm that the proposed approach offers a strong trade-off between privacy and utility.

While we obtain t-diversity among clusters, it is also important to obtain l-diversity in each cluster, where l is the number of sensitive tuples in each equivalence class. Our approach does not support this property. In future work, we plan to extend the algorithm with l-diversity and t-closeness to reach more privacy with acceptable utility, in comparison with approaches such as [27]. Our approach also yields a significant speed-up in the run-time of the clustering step, especially on large databases.
With the aim of improving performance, we plan to use more suitable clustering methods, such as those in [29,30].

REFERENCES

1. Riboni D, Pareschi L, Bettini C. JS-reduce: defending your data from sequential background knowledge attacks. IEEE Transactions on Dependable and Secure Computing 2012; 8(3).
2. Sweeney L. K-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems 2002; 10.
3. Alto P, Golle Ph. Revisiting the uniqueness of simple demographics in the US, in Proceedings of the 5th ACM Workshop on Privacy in Electronic Society, New York, 2006.
4. Kiernan J, Srikant R, Xu Y, Agrawal R. Order preserving encryption for numeric data, in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, June.
5. Waters B, Boneh D. Conjunctive, subset, and range queries on encrypted data, in Proceedings of the 4th Conference on Theory of Cryptography, 2007.
6. Damiani E, De Capitani di Vimercati S, Jajodia S, Paraboschi S, Samarati P, Ceselli A. Modeling and assessing inference exposure in encrypted databases. ACM Transactions on Information and System Security (TISSEC) 2005; 8(1).
7. Samarati P. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 2001; 13(6).
8. Foresti S, Jajodia S, Paraboschi S, Samarati P, Vimercati S. Fragments and loose associations: respecting privacy in data publishing. Very Large Data Bases Endowment (VLDB) 2010; 3.
9. De Capitani di Vimercati S, Foresti S, Jajodia S, Paraboschi S, Samarati P, Ciriani V. Fragmentation design for efficient query execution over sensitive distributed databases, in Proceedings of the 29th IEEE International Conference on Distributed Computing Systems, Washington, DC, USA, 2009.
10. Agrawal D, Aggarwal C. On the design and quantification of privacy preserving data mining algorithms, in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2001.
11. Datta S, Wang Q, Sivakumar K, Kargupta H. On the privacy preserving properties of random data perturbation techniques, in Third IEEE International Conference on Data Mining, 2003.
12. Rokach L, Elovici Y, Shapira B, Kisilevich S. Efficient multidimensional suppression for k-anonymity. IEEE Transactions on Knowledge and Data Engineering 2010; 22.
13. Cormode G, Srivastava D, Yu T, Zhang Q. Anonymizing bipartite graph data using safe groupings, in Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), vol. 1, Auckland, New Zealand, August 2008.
14. Thompson B, Yao D. Union-split clustering algorithm and social network anonymization, in Proceedings of the 4th International Symposium on Information, Computer, and Communications Security, New York, USA, 2009.
15. LeFevre K, DeWitt DJ, Ramakrishnan R. Incognito: efficient full-domain k-anonymity, in SIGMOD Conference, 2005.
16. Kifer D, LeFevre K, Machanavajjhala A, Chung Chen B. Privacy-preserving data publishing. Foundations and Trends in Databases (ACM) 2009; 2(1-2).
17. Fung BCM, Mohammed N, Desai BC, Wang K, Chen R. Privacy-preserving trajectory data publishing by local suppression. Information Sciences 2013; 231.
18. Srikant R, Agrawal R. Mining generalized association rules, in Proceedings of the 21st International Conference on Very Large Data Bases, San Francisco, CA, USA, August 1995.
19. Eltabakh MY, Padma J, Silva YN, Pei He WG, Aref EB. Query processing with K-anonymity. International Journal of Data Engineering (IJDE) 2012; 3(2).
20. Fung BCM, Wang K, Yu PS. Top-down specialization for information and privacy preservation, in The 21st International Conference on Data Engineering (ICDE), 2005.
21. Macqueen JB. Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Math, Statistics and Probability, vol. 1, 1967.
22. Aggarwal G, Feder T, Kenthapadi K, et al. Anonymizing tables, in 10th Int'l Conference on Database Theory, January.
23. Papa DA, Markov IL. Hypergraph partitioning and clustering, in Approximation Algorithms and Metaheuristics.
24. Hartigan JA, Wong MA. A K-means clustering algorithm. Journal of the Royal Statistical Society: Series C: Applied Statistics 1979; 28(1).
25. Chang CC, Thompson B, Wang H, Yao D. Towards publishing recommendation data with predictive anonymization, in Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, 2010.
26. Agrawal R, Bayardo R. Data privacy through optimal k-anonymization, in Proceedings of the 21st International Conference on Data Engineering (ICDE), 2005.
27. Li T, Venkatasubramanian S, Li N. t-closeness: privacy beyond k-anonymity and l-diversity, in Proceedings of IEEE International Conference on Data Engineering.
28. Koudas N, Srivastava D, Yu T, Zhang Q. Aggregate query answering on anonymized tables, in IEEE 23rd International Conference on Data Engineering (ICDE), Istanbul, 2007.
29. Lai J-H, Huang D, Zheng W-S, Wang C-D. A support vector-based algorithm for clustering data streams. IEEE Transactions on Knowledge and Data Engineering 2013; 25(6).
30. Pradeepini G, Jyothi S. An improved k-means clustering algorithm with refined initial centroids. Publications Of Problems & Application in Engineering Research Paper 2013; 4(1).


More information

Data Mining Algorithms In R/Clustering/K-Means

Data Mining Algorithms In R/Clustering/K-Means 1 / 7 Data Mining Algorithms In R/Clustering/K-Means Contents 1 Introduction 2 Technique to be discussed 2.1 Algorithm 2.2 Implementation 2.3 View 2.4 Case Study 2.4.1 Scenario 2.4.2 Input data 2.4.3 Execution

More information

The Applicability of the Perturbation Model-based Privacy Preserving Data Mining for Real-world Data

The Applicability of the Perturbation Model-based Privacy Preserving Data Mining for Real-world Data The Applicability of the Perturbation Model-based Privacy Preserving Data Mining for Real-world Data Li Liu, Murat Kantarcioglu and Bhavani Thuraisingham Computer Science Department University of Texas

More information

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy?

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy? K-Anonymity How do you publicly release a database without compromising individual privacy? The Wrong Approach: REU Summer 2007 Advisors: Ryan Williams and Manuel Blum Just leave out any unique identifiers

More information

k-anonymization May Be NP-Hard, but Can it Be Practical?

k-anonymization May Be NP-Hard, but Can it Be Practical? k-anonymization May Be NP-Hard, but Can it Be Practical? David Wilson RTI International dwilson@rti.org 1 Introduction This paper discusses the application of k-anonymity to a real-world set of microdata

More information

STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES

STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES Prof. Ambarish S. Durani 1 and Mrs. Rashmi B. Sune 2 1 Assistant Professor, Datta Meghe Institute of Engineering,

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data

Comparison and Analysis of Anonymization Techniques for Preserving Privacy in Big Data Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 2 (2017) pp. 247-253 Research India Publications http://www.ripublication.com Comparison and Analysis of Anonymization

More information

Injector: Mining Background Knowledge for Data Anonymization

Injector: Mining Background Knowledge for Data Anonymization : Mining Background Knowledge for Data Anonymization Tiancheng Li, Ninghui Li Department of Computer Science, Purdue University 35 N. University Street, West Lafayette, IN 4797, USA {li83,ninghui}@cs.purdue.edu

More information

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION K. Venkata Ramana and V.Valli Kumari Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, India {kvramana.auce, vallikumari}@gmail.com

More information

Handling Missing Values via Decomposition of the Conditioned Set

Handling Missing Values via Decomposition of the Conditioned Set Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,

More information

ADDITIVE GAUSSIAN NOISE BASED DATA PERTURBATION IN MULTI-LEVEL TRUST PRIVACY PRESERVING DATA MINING

ADDITIVE GAUSSIAN NOISE BASED DATA PERTURBATION IN MULTI-LEVEL TRUST PRIVACY PRESERVING DATA MINING ADDITIVE GAUSSIAN NOISE BASED DATA PERTURBATION IN MULTI-LEVEL TRUST PRIVACY PRESERVING DATA MINING R.Kalaivani #1,S.Chidambaram #2 # Department of Information Techology, National Engineering College,

More information

Review on Techniques of Collaborative Tagging

Review on Techniques of Collaborative Tagging Review on Techniques of Collaborative Tagging Ms. Benazeer S. Inamdar 1, Mrs. Gyankamal J. Chhajed 2 1 Student, M. E. Computer Engineering, VPCOE Baramati, Savitribai Phule Pune University, India benazeer.inamdar@gmail.com

More information

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering Nghiem Van Tinh 1, Vu Viet Vu 1, Tran Thi Ngoc Linh 1 1 Thai Nguyen University of

More information

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING Neha V. Sonparote, Professor Vijay B. More. Neha V. Sonparote, Dept. of computer Engineering, MET s Institute of Engineering Nashik, Maharashtra,

More information

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec. Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming

More information

Preserving Data Mining through Data Perturbation

Preserving Data Mining through Data Perturbation Preserving Data Mining through Data Perturbation Mr. Swapnil Kadam, Prof. Navnath Pokale Abstract Data perturbation, a widely employed and accepted Privacy Preserving Data Mining (PPDM) approach, tacitly

More information

Information Security in Big Data: Privacy & Data Mining

Information Security in Big Data: Privacy & Data Mining Engineering (IJERCSE) Vol. 1, Issue 2, December 2014 Information Security in Big Data: Privacy & Data Mining [1] Kiran S.Gaikwad, [2] Assistant Professor. Seema Singh Solanki [1][2] Everest College of

More information

Co-clustering for differentially private synthetic data generation

Co-clustering for differentially private synthetic data generation Co-clustering for differentially private synthetic data generation Tarek Benkhelif, Françoise Fessant, Fabrice Clérot and Guillaume Raschia January 23, 2018 Orange Labs & LS2N Journée thématique EGC &

More information

arxiv: v1 [cs.lg] 3 Oct 2018

arxiv: v1 [cs.lg] 3 Oct 2018 Real-time Clustering Algorithm Based on Predefined Level-of-Similarity Real-time Clustering Algorithm Based on Predefined Level-of-Similarity arxiv:1810.01878v1 [cs.lg] 3 Oct 2018 Rabindra Lamsal Shubham

More information

Fast Efficient Clustering Algorithm for Balanced Data

Fast Efficient Clustering Algorithm for Balanced Data Vol. 5, No. 6, 214 Fast Efficient Clustering Algorithm for Balanced Data Adel A. Sewisy Faculty of Computer and Information, Assiut University M. H. Marghny Faculty of Computer and Information, Assiut

More information

Alpha Anonymization in Social Networks using the Lossy-Join Approach

Alpha Anonymization in Social Networks using the Lossy-Join Approach TRANSACTIONS ON DATA PRIVACY 11 (2018) 1 22 Alpha Anonymization in Social Networks using the Lossy-Join Kiran Baktha*, B K Tripathy** * Department of Electronics and Communication Engineering, VIT University,

More information

Bitmap index-based decision trees

Bitmap index-based decision trees Bitmap index-based decision trees Cécile Favre and Fadila Bentayeb ERIC - Université Lumière Lyon 2, Bâtiment L, 5 avenue Pierre Mendès-France 69676 BRON Cedex FRANCE {cfavre, bentayeb}@eric.univ-lyon2.fr

More information

Optimization Techniques for Range Queries in the Multivalued-Partial Order Preserving Encryption Scheme

Optimization Techniques for Range Queries in the Multivalued-Partial Order Preserving Encryption Scheme DEIM Forum C5-6 Optimization Techniques for Range Queries in the Multivalued-Partial Abstract Order Preserving Encryption Scheme Hasan KADHEM, Toshiyuki AMAGASA,, and Hiroyuki KITAGAWA, Graduate School

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series Introduction to Privacy-Preserving Data Publishing Concepts and Techniques Benjamin C M Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S Yu CRC

More information

IDENTITY DISCLOSURE PROTECTION IN DYNAMIC NETWORKS USING K W STRUCTURAL DIVERSITY ANONYMITY

IDENTITY DISCLOSURE PROTECTION IN DYNAMIC NETWORKS USING K W STRUCTURAL DIVERSITY ANONYMITY IDENTITY DISCLOSURE PROTECTION IN DYNAMIC NETWORKS USING K W STRUCTURAL DIVERSITY ANONYMITY Gowthamy.R 1* and Uma.P 2 *1 M.E.Scholar, Department of Computer Science & Engineering Nandha Engineering College,

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

A Review on Privacy Preserving Data Mining Approaches

A Review on Privacy Preserving Data Mining Approaches A Review on Privacy Preserving Data Mining Approaches Anu Thomas Asst.Prof. Computer Science & Engineering Department DJMIT,Mogar,Anand Gujarat Technological University Anu.thomas@djmit.ac.in Jimesh Rana

More information

Service-Oriented Architecture for Privacy-Preserving Data Mashup

Service-Oriented Architecture for Privacy-Preserving Data Mashup Service-Oriented Architecture for Privacy-Preserving Data Mashup Thomas Trojer a Benjamin C. M. Fung b Patrick C. K. Hung c a Quality Engineering, Institute of Computer Science, University of Innsbruck,

More information

Comparative Study of Subspace Clustering Algorithms

Comparative Study of Subspace Clustering Algorithms Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that

More information

Complexity Results on Graphs with Few Cliques

Complexity Results on Graphs with Few Cliques Discrete Mathematics and Theoretical Computer Science DMTCS vol. 9, 2007, 127 136 Complexity Results on Graphs with Few Cliques Bill Rosgen 1 and Lorna Stewart 2 1 Institute for Quantum Computing and School

More information

Privacy Preserving Data Mining. Danushka Bollegala COMP 527

Privacy Preserving Data Mining. Danushka Bollegala COMP 527 Privacy Preserving ata Mining anushka Bollegala COMP 527 Privacy Issues ata mining attempts to ind mine) interesting patterns rom large datasets However, some o those patterns might reveal inormation that

More information

AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA

AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA AN EFFECTIVE FRAMEWORK FOR EXTENDING PRIVACY- PRESERVING ACCESS CONTROL MECHANISM FOR RELATIONAL DATA Morla Dinesh 1, Shaik. Jumlesha 2 1 M.Tech (S.E), Audisankara College Of Engineering &Technology 2

More information

Achieving Anonymity via Clustering

Achieving Anonymity via Clustering Achieving Anonymity via Clustering Gagan Aggarwal 1 Tomás Feder 2 Krishnaram Kenthapadi 2 Samir Khuller 3 Rina Panigrahy 2,4 Dilys Thomas 2 An Zhu 1 ABSTRACT Publishing data for analysis from a table containing

More information

Record Linkage using Probabilistic Methods and Data Mining Techniques

Record Linkage using Probabilistic Methods and Data Mining Techniques Doi:10.5901/mjss.2017.v8n3p203 Abstract Record Linkage using Probabilistic Methods and Data Mining Techniques Ogerta Elezaj Faculty of Economy, University of Tirana Gloria Tuxhari Faculty of Economy, University

More information

The Effect of Word Sampling on Document Clustering

The Effect of Word Sampling on Document Clustering The Effect of Word Sampling on Document Clustering OMAR H. KARAM AHMED M. HAMAD SHERIN M. MOUSSA Department of Information Systems Faculty of Computer and Information Sciences University of Ain Shams,

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Hiding Sensitive Predictive Frequent Itemsets

Hiding Sensitive Predictive Frequent Itemsets Hiding Sensitive Predictive Frequent Itemsets Barış Yıldız and Belgin Ergenç Abstract In this work, we propose an itemset hiding algorithm with four versions that use different heuristics in selecting

More information

K-Means Clustering With Initial Centroids Based On Difference Operator

K-Means Clustering With Initial Centroids Based On Difference Operator K-Means Clustering With Initial Centroids Based On Difference Operator Satish Chaurasiya 1, Dr.Ratish Agrawal 2 M.Tech Student, School of Information and Technology, R.G.P.V, Bhopal, India Assistant Professor,

More information