K-Anonymity and Other Cluster-Based Methods. Ge Ruan, Oct. 11, 2007


1 K-Anonymity and Other Cluster-Based Methods. Ge Ruan, Oct. 11, 2007

2 Data Publishing and Data Privacy. Society is experiencing exponential growth in the number and variety of data collections containing person-specific information. This collected information is valuable in both research and business, and data sharing is common. Publishing the data, however, may put the respondents' privacy at risk. Objective: maximize data utility while limiting disclosure risk to an acceptable level.

3 Related Works. Statistical databases: the most common approach is to add noise while still maintaining some statistical invariant. Disadvantage: this destroys the integrity of the data.

4 Related Works (Cont'd). Multi-level databases: data is stored at different security classifications, and users have different security clearances (Denning and Lunt). Eliminating precise inference: sensitive information is suppressed, i.e., simply not released (Su and Ozsoyoglu). Disadvantages: it is impossible to consider every possible attack; many data holders share the same data, but their concerns differ; suppression can drastically reduce the quality of the data.

5 Related Works (Cont'd). Computer security: access control and authentication ensure that the right people have the right authority over the right object at the right time and place. That is not what we want here. A general doctrine of data privacy is to release as much information as possible, as long as the identities of the subjects (people) are protected.

6 K-Anonymity. Sweeney proposed a formal protection model named k-anonymity. What is k-anonymity? A release provides k-anonymity if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release. Example: you try to identify a man in a release, but the only information you have is his birth date and gender. If at least k people in the release match that birth date and gender, the release is k-anonymous with respect to those attributes.
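The definition above can be checked mechanically by grouping rows on the quasi-identifier and counting. The following is a minimal sketch; the function name `is_k_anonymous` and the sample rows are hypothetical and are not taken from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical release: birth year and sex form the quasi-identifier.
release = [
    {"birth": "1976", "sex": "M", "disease": "Flu"},
    {"birth": "1976", "sex": "M", "disease": "Hepatitis"},
    {"birth": "1986", "sex": "F", "disease": "Flu"},
    {"birth": "1986", "sex": "F", "disease": "Bronchitis"},
]

print(is_k_anonymous(release, ["birth", "sex"], 2))
print(is_k_anonymous(release, ["birth", "sex"], 3))
```

Each (birth, sex) group here has exactly two rows, so the release passes for k = 2 and fails for k = 3.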

7 Classification of Attributes. Key attribute: name, address, cell phone number; uniquely identifies an individual directly; always removed before release. Quasi-identifier: 5-digit ZIP code, birth date, gender; a set of attributes that can potentially be linked with external information to re-identify entities. According to the Census summary data in 1991, 87% of the US population can be uniquely identified by these attributes. Quasi-identifiers are suppressed or generalized.

8 Classification of Attributes (Cont'd).

Hospital Patient Data
DOB | Sex | Zipcode | Disease
1/21/76 | Male | | Heart Disease
4/13/86 | Female | | Hepatitis
2/28/76 | Male | | Bronchitis
1/21/76 | Male | | Broken Arm
4/13/86 | Female | | Flu
2/28/76 | Female | | Hang Nail

Voter Registration Data
Name | DOB | Sex | Zipcode
Andre | 1/21/76 | Male |
Beth | 1/10/81 | Female |
Carol | 10/1/44 | Female |
Dan | 2/21/84 | Male |
Ellen | 4/19/72 | Female |

Linking the two tables on DOB, Sex, and Zipcode shows that Andre has heart disease!
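The linking step can be sketched as a join on the quasi-identifier. The dates, sexes, names, and diseases below come from the slide; the ZIP codes are placeholders invented for illustration, and `link_records` is a hypothetical helper name.

```python
def link_records(hospital, voters, quasi_identifiers):
    """Map voter names to a disease when exactly one hospital row shares their QI values."""
    index = {}
    for rec in hospital:
        index.setdefault(tuple(rec[a] for a in quasi_identifiers), []).append(rec)
    matches = {}
    for voter in voters:
        candidates = index.get(tuple(voter[a] for a in quasi_identifiers), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches[voter["name"]] = candidates[0]["disease"]
    return matches

# ZIP codes are made-up placeholders, not values from the slides.
hospital = [
    {"dob": "1/21/76", "sex": "Male", "zip": "11111", "disease": "Heart Disease"},
    {"dob": "4/13/86", "sex": "Female", "zip": "11111", "disease": "Hepatitis"},
    {"dob": "2/28/76", "sex": "Male", "zip": "22222", "disease": "Bronchitis"},
    {"dob": "1/21/76", "sex": "Male", "zip": "22222", "disease": "Broken Arm"},
    {"dob": "4/13/86", "sex": "Female", "zip": "33333", "disease": "Flu"},
    {"dob": "2/28/76", "sex": "Female", "zip": "33333", "disease": "Hang Nail"},
]
voters = [
    {"name": "Andre", "dob": "1/21/76", "sex": "Male", "zip": "11111"},
    {"name": "Beth", "dob": "1/10/81", "sex": "Female", "zip": "44444"},
    {"name": "Carol", "dob": "10/1/44", "sex": "Female", "zip": "55555"},
    {"name": "Dan", "dob": "2/21/84", "sex": "Male", "zip": "66666"},
    {"name": "Ellen", "dob": "4/19/72", "sex": "Female", "zip": "77777"},
]

print(link_records(hospital, voters, ["dob", "sex", "zip"]))
```

Note that two hospital rows share Andre's birth date and sex, so with these placeholder values it is the ZIP code that makes the match unique.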

9 Classification of Attributes (Cont'd). Sensitive attribute: medical record, wage, etc. Always released directly, because these attributes are what the researchers need. Which attributes are treated as sensitive depends on the requirements.

10 K-Anonymity Protection Model. PT: private table. RT, GT1, GT2: released tables. QI: quasi-identifier $(A_i, \dots, A_j)$. $(A_1, A_2, \dots, A_n)$: attributes. Lemma:

11

12 Attacks Against K-Anonymity. Unsorted matching attack: this attack is based on the order in which tuples appear in the released table. Solution: randomly sort the tuples before releasing.

13 Attacks Against K-Anonymity (Cont'd). Complementary release attack: different releases can be linked together to compromise k-anonymity. Solution: consider all previously released tables before releasing a new one, and try to avoid linking. Other data holders may release data that can be used in this kind of attack, so in general it is hard to prevent completely.

14 Attacks Against K-Anonymity (Cont'd). Complementary release attack (Cont'd).

15 Attacks Against K-Anonymity (Cont'd). Complementary release attack (Cont'd).

16 Attacks Against K-Anonymity (Cont'd). Temporal attack: adding or removing tuples may compromise k-anonymity protection.

17 Attacks Against K-Anonymity (Cont'd). k-anonymity does not provide privacy if sensitive values in an equivalence class lack diversity (homogeneity attack) or if the attacker has background knowledge (background knowledge attack). In the example, the attacker knows the Zipcode and Age of Bob (homogeneity attack) and of Carl (background knowledge attack).

A 3-anonymous patient table
Zipcode | Age | Disease
476** | 2* | Heart Disease
476** | 2* | Heart Disease
476** | 2* | Heart Disease
4790* | ≥40 | Flu
4790* | ≥40 | Heart Disease
4790* | ≥40 | Cancer
476** | 3* | Heart Disease
476** | 3* | Cancer
476** | 3* | Cancer

A. Machanavajjhala et al., l-diversity: Privacy Beyond k-anonymity, ICDE 2006.
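The homogeneity problem can be detected by listing the equivalence classes whose sensitive values are all identical. A minimal sketch follows, using the rows of the 3-anonymous table; the helper name `homogeneous_classes` and the spelling `>=40` for the age group are assumptions made for illustration.

```python
from collections import defaultdict

def homogeneous_classes(rows):
    """Return the (zipcode, age) classes in which every row has the same disease."""
    classes = defaultdict(set)
    for zipcode, age, disease in rows:
        classes[(zipcode, age)].add(disease)
    return [qi for qi, diseases in classes.items() if len(diseases) == 1]

patients = [
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"),
    ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

print(homogeneous_classes(patients))
```

Anyone known to fall in a class returned by this check has their disease disclosed even though the table is 3-anonymous.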

18 l-diversity. Distinct l-diversity: each equivalence class has at least l well-represented sensitive values. Limitation: it does not prevent probabilistic inference attacks. Example: an equivalence class contains ten tuples; in the Disease attribute, one is Cancer, one is Heart Disease, and the remaining eight are Flu. This satisfies distinct 3-diversity, but the attacker can still conclude that the target person's disease is Flu with 80% confidence.
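The ten-tuple example can be reproduced in a few lines. This is an illustrative sketch with hypothetical function names: it counts distinct sensitive values and reports the attacker's best-guess confidence, which is the share of the most frequent value.

```python
from collections import Counter

def distinct_diversity(values):
    """Number of distinct sensitive values in one equivalence class."""
    return len(set(values))

def best_guess_confidence(values):
    """Fraction of the class taken by its most frequent sensitive value."""
    return Counter(values).most_common(1)[0][1] / len(values)

diseases = ["Flu"] * 8 + ["Cancer", "Heart Disease"]

print(distinct_diversity(diseases))      # 3
print(best_guess_confidence(diseases))   # 0.8
```

Eight of the ten tuples are Flu, so guessing Flu is right with probability 8/10 despite the class being distinct 3-diverse.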

19 l-diversity (Cont'd). Entropy l-diversity: each equivalence class must not only have enough different sensitive values, but those values must also be distributed evenly enough. Formally, the entropy of the distribution of sensitive values in each equivalence class must be at least log(l). Sometimes this may be too restrictive: when some values are very common, the entropy of the entire table may be very low. This leads to the less conservative notion of l-diversity.
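The entropy condition is easy to evaluate per equivalence class. The sketch below uses natural logarithms and a small tolerance for floating-point comparison; the function names and the tolerance value are assumptions, not part of the slides.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (natural log) of the sensitive values in one class."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def is_entropy_l_diverse(values, l, tol=1e-12):
    """Check entropy >= log(l), with a tolerance for rounding error."""
    return entropy(values) >= math.log(l) - tol

skewed = ["Flu"] * 8 + ["Cancer", "Heart Disease"]
even = ["Flu", "Cancer", "Heart Disease"]

print(round(entropy(skewed), 3), is_entropy_l_diverse(skewed, 2))
print(is_entropy_l_diverse(even, 3))
```

The skewed class has three distinct values, yet its entropy is below log(2), so it is not even entropy 2-diverse; the evenly distributed class meets the bound for l = 3.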

20 l-diversity (Cont'd). Recursive (c,l)-diversity: the most frequent value does not appear too frequently. With $r_i$ the number of occurrences of the i-th most frequent sensitive value in the equivalence class, the condition is $r_1 < c\,(r_l + r_{l+1} + \dots + r_m)$.
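A direct reading of the inequality, as a minimal sketch: sort the value counts in descending order and compare the largest against c times the tail starting at position l. The function name is hypothetical.

```python
from collections import Counter

def is_recursive_cl_diverse(values, c, l):
    """Check r_1 < c * (r_l + r_{l+1} + ... + r_m) for one equivalence class."""
    r = sorted(Counter(values).values(), reverse=True)
    # r[l - 1:] is empty when the class has fewer than l distinct values,
    # in which case the tail sum is 0 and the check fails.
    return r[0] < c * sum(r[l - 1:])

diseases = ["Flu"] * 8 + ["Cancer", "Heart Disease"]  # counts 8, 1, 1

print(is_recursive_cl_diverse(diseases, c=3, l=2))  # 8 < 3 * (1 + 1) is False
print(is_recursive_cl_diverse(diseases, c=5, l=2))  # 8 < 5 * (1 + 1) is True
```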

21 Limitations of l-diversity. l-diversity may be difficult and unnecessary to achieve. Consider a single sensitive attribute with two values, HIV positive (1%) and HIV negative (99%), which have very different degrees of sensitivity. Unnecessary: 2-diversity is not needed for an equivalence class that contains only negative records. Difficult: suppose there are 10000 records in total; to have distinct 2-diversity, there can be at most 10000*1% = 100 equivalence classes.

22 Limitations of l-diversity (Cont'd). l-diversity is insufficient to prevent attribute disclosure. Skewness attack: take two sensitive values, HIV positive (1%) and HIV negative (99%). An equivalence class that contains an equal number of positive and negative records poses a serious privacy risk, because anyone in it would be considered 50% likely to be positive, compared with 1% overall. l-diversity also does not differentiate between equivalence class 1 (49 positive + 1 negative) and equivalence class 2 (1 positive + 49 negative). l-diversity does not consider the overall distribution of sensitive values.

23 Limitations of l-diversity (Cont'd). l-diversity is insufficient to prevent attribute disclosure. Similarity attack: the attacker knows Bob's Zip and Age.

A 3-diverse patient table
Zipcode | Age | Salary | Disease
476** | 2* | 20K | Gastric Ulcer
476** | 2* | 30K | Gastritis
476** | 2* | 40K | Stomach Cancer
4790* | ≥40 | 50K | Gastritis
4790* | ≥40 | 100K | Flu
4790* | ≥40 | 70K | Bronchitis
476** | 3* | 60K | Bronchitis
476** | 3* | 80K | Pneumonia
476** | 3* | 90K | Stomach Cancer

Conclusions: 1. Bob's salary is in [20K, 40K], which is relatively low. 2. Bob has some stomach-related disease. l-diversity does not consider the semantic meanings of sensitive values.

24 t-closeness: A New Privacy Measure. Rationale: a completely generalized table.

Age | Zipcode | Gender | Disease
* | * | * | Flu
* | * | * | Heart Disease
* | * | * | Cancer
* | * | * | Gastritis

Belief B_0 rests on external knowledge; belief B_1 additionally uses the overall distribution Q of sensitive values.

25 t-closeness: A New Privacy Measure. Rationale: a released table with attributes Age, Zipcode, Gender, Disease, whose first equivalence class is (2*, 479**, Male, Flu), (2*, 479**, Male, Heart Disease), (2*, 479**, Male, Cancer), followed by further rows ending in Gastritis. Belief B_0 rests on external knowledge; B_1 adds the overall distribution Q of sensitive values; B_2 adds the distribution P_i of sensitive values in each equivalence class.

26 t-closeness: A New Privacy Measure. Rationale: belief B_0 rests on external knowledge, B_1 on the overall distribution Q of sensitive values, and B_2 on the distribution P_i of sensitive values in each equivalence class. Observations: Q should be public. The knowledge gain has two parts: about the whole population (from B_0 to B_1) and about specific individuals (from B_1 to B_2). We bound only the knowledge gain between B_1 and B_2. Principle: the distance between Q and P_i should be bounded by a threshold t.

27 Distance Measures. $P = (p_1, p_2, \dots, p_m)$, $Q = (q_1, q_2, \dots, q_m)$. Candidate measures: trace distance and KL-divergence. Neither of these measures reflects the semantic distance among values. Example: Q: {3K, 4K, 5K, 6K, 7K, 8K, 9K, 10K, 11K}, P_1: {3K, 4K, 5K}, P_2: {5K, 7K, 10K}. Intuitively, D[P_1, Q] > D[P_2, Q]. Define a ground distance for any pair of values; D[P, Q] then depends on the ground distances.
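To see why these two measures miss the intuition, the sketch below evaluates them on the example. It assumes that "trace distance" means the variational distance (half the sum of absolute differences) and that P_1, P_2, and Q are uniform over their listed salaries; both assumptions and all function names are illustrative.

```python
import math

SALARIES = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # in thousands

def uniform_on(support):
    """Uniform distribution over `support`, as a vector indexed like SALARIES."""
    return [1 / len(support) if s in support else 0.0 for s in SALARIES]

def variational_distance(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    # Terms with p_i = 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

Q = uniform_on(SALARIES)
P1 = uniform_on([3, 4, 5])
P2 = uniform_on([5, 7, 10])

print(variational_distance(P1, Q), variational_distance(P2, Q))
print(kl_divergence(P1, Q), kl_divergence(P2, Q))
```

Under these assumptions both measures assign P_1 and P_2 the same distance from Q, even though P_1 concentrates on the lowest salaries and P_2 is spread across the range.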

28 Earth Mover's Distance. Formulation: $P = (p_1, p_2, \dots, p_m)$, $Q = (q_1, q_2, \dots, q_m)$; $d_{ij}$ is the ground distance between element i of P and element j of Q. Find a flow $F = [f_{ij}]$, where $f_{ij}$ is the flow of mass from element i of P to element j of Q, that minimizes the overall work: subject to the constraints:
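For reference, the standard formulation of EMD between two probability distributions, which this slide appears to follow, can be written as below. The name WORK for the objective is an assumption taken from the usual presentation, not from the slide text.

```latex
\mathrm{WORK}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{m} d_{ij} f_{ij}

\text{subject to}
\begin{aligned}
  & f_{ij} \ge 0, && 1 \le i \le m,\ 1 \le j \le m \\
  & p_i - \sum_{j=1}^{m} f_{ij} + \sum_{j=1}^{m} f_{ji} = q_i, && 1 \le i \le m \\
  & \sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} = \sum_{i=1}^{m} p_i = \sum_{i=1}^{m} q_i = 1
\end{aligned}
```

The second constraint says that the mass at element i after all outgoing and incoming flows must equal q_i, so the flow transforms P into Q; D[P, Q] is the minimum of WORK over all such flows.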

29 Earth Mover's Distance. Example: P_1 = {3k, 4k, 5k} and Q = {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}. Move 1/9 probability for each of the following pairs: 3k->6k and 3k->7k, cost 1/9*(3+4)/8; 4k->8k and 4k->9k, cost 1/9*(4+5)/8; 5k->10k and 5k->11k, cost 1/9*(5+6)/8. Total cost: 1/9*27/8 = 0.375. With P_2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This makes more sense than the other two distance measures.

30 How to Calculate EMD. EMD for numerical attributes: ordered distance, $\mathrm{ordered\_dist}(v_i, v_j) = \frac{|i - j|}{m - 1}$. Ordered distance is a metric: it is non-negative, symmetric, and satisfies the triangle inequality. Let $r_i = p_i - q_i$; then D[P, Q] is calculated as:

$$D[P, Q] = \frac{1}{m-1}\left(|r_1| + |r_1 + r_2| + \dots + |r_1 + r_2 + \dots + r_{m-1}|\right) = \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i} r_j\right|$$
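The cumulative-sum formula can be checked against the salary example with exact rational arithmetic. This is a sketch under the assumption that each distribution is uniform over its listed salaries; `ordered_emd` and `uniform_on` are hypothetical names.

```python
from fractions import Fraction
from itertools import accumulate

SALARIES = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # in thousands

def uniform_on(support):
    """Uniform distribution over `support`, as a vector indexed like SALARIES."""
    return [Fraction(1, len(support)) if s in support else Fraction(0) for s in SALARIES]

def ordered_emd(p, q):
    """EMD under ordered distance: sum of |prefix sums of r_i| divided by (m - 1)."""
    m = len(p)
    r = [pi - qi for pi, qi in zip(p, q)]
    return sum(abs(s) for s in accumulate(r)) / (m - 1)

Q = uniform_on(SALARIES)
P1 = uniform_on([3, 4, 5])
P2 = uniform_on([6, 8, 11])

print(ordered_emd(P1, Q))  # 3/8
print(ordered_emd(P2, Q))  # 1/6
```

The results 3/8 = 0.375 and 1/6 ≈ 0.167 match the totals quoted in the worked example, and the last prefix sum is always zero because both distributions sum to 1.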

31 How to Calculate EMD. EMD for categorical attributes: equal distance, $\mathrm{equal\_dist}(v_i, v_j) = 1$. Equal distance is a metric. D[P, Q] is calculated as:

$$D[P, Q] = \frac{1}{2}\sum_{i=1}^{m}|p_i - q_i| = \sum_{p_i > q_i}(p_i - q_i) = -\sum_{p_i < q_i}(p_i - q_i)$$

32 How to Calculate EMD (Cont'd). EMD for categorical attributes: hierarchical distance, $\mathrm{hierarchical\_dist}(v_i, v_j) = \frac{\mathrm{level}(v_i, v_j)}{H}$, where H is the height of the domain hierarchy and level(v_i, v_j) is the height of the lowest common ancestor of v_i and v_j. Hierarchical distance is a metric. Example hierarchy:

Respiratory & digestive system diseases
  Respiratory system diseases
    Respiratory infection: Flu, Pneumonia, Bronchitis
    Vascular lung diseases: Pulmonary edema, Pulmonary embolism
  Digestive system diseases
    Stomach diseases: Gastric ulcer, Stomach cancer
    Colon diseases: Colitis, Colon cancer

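33 How to Calculate EMD (Cont'd). EMD for categorical attributes, hierarchical distance:

$$\mathrm{extra}(N) = \begin{cases} p_i - q_i & \text{if } N \text{ is a leaf} \\ \sum_{C \in \mathrm{Child}(N)} \mathrm{extra}(C) & \text{otherwise} \end{cases}$$

$$\mathrm{pos\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) > 0} |\mathrm{extra}(C)|$$

$$\mathrm{neg\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) < 0} |\mathrm{extra}(C)|$$

$$\mathrm{cost}(N) = \frac{\mathrm{height}(N)}{H}\,\min(\mathrm{pos\_extra}(N), \mathrm{neg\_extra}(N))$$

D[P, Q] is calculated as:

$$D[P, Q] = \sum_{N} \mathrm{cost}(N)$$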
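The recursion above can be sketched as a post-order traversal of the disease hierarchy. The tree below mirrors the hierarchy from the hierarchical-distance slide; the dictionary layout, function names, and the two test distributions are assumptions made for illustration.

```python
# Internal nodes map to their children; anything not listed as a key is a leaf.
TREE = {
    "Respiratory & digestive system diseases": [
        "Respiratory system diseases", "Digestive system diseases"],
    "Respiratory system diseases": ["Respiratory infection", "Vascular lung diseases"],
    "Digestive system diseases": ["Stomach diseases", "Colon diseases"],
    "Respiratory infection": ["Flu", "Pneumonia", "Bronchitis"],
    "Vascular lung diseases": ["Pulmonary edema", "Pulmonary embolism"],
    "Stomach diseases": ["Gastric ulcer", "Stomach cancer"],
    "Colon diseases": ["Colitis", "Colon cancer"],
}
ROOT = "Respiratory & digestive system diseases"

def height(node):
    """Height of a node: 0 for leaves, 1 + max child height otherwise."""
    return 0 if node not in TREE else 1 + max(height(c) for c in TREE[node])

def hierarchical_emd(p, q):
    """Sum cost(N) over internal nodes; p and q map leaf names to probabilities."""
    H = height(ROOT)
    total = 0.0

    def extra(node):
        nonlocal total
        if node not in TREE:  # leaf: surplus of P over Q
            return p.get(node, 0.0) - q.get(node, 0.0)
        child_extras = [extra(c) for c in TREE[node]]
        pos_extra = sum(e for e in child_extras if e > 0)
        neg_extra = -sum(e for e in child_extras if e < 0)
        # Mass that can be balanced among this node's children travels height/H.
        total += height(node) / H * min(pos_extra, neg_extra)
        return sum(child_extras)

    extra(ROOT)
    return total

print(hierarchical_emd({"Flu": 1.0}, {"Pneumonia": 1.0}))  # siblings
print(hierarchical_emd({"Flu": 1.0}, {"Colitis": 1.0}))    # opposite subtrees
```

For two point distributions the result reduces to the hierarchical distance between the two leaves: 1/3 for siblings under "Respiratory infection" and 1 for leaves whose only common ancestor is the root.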

34 Experiments. Goals: to show that l-diversity does not provide sufficient privacy protection (the similarity attack), and to show that the efficiency and data quality of t-closeness are comparable with those of other privacy measures. Setup: Adult dataset from the UC Irvine ML repository; tuples, 9 attributes (2 sensitive attributes). Algorithm: Incognito.

35 Experiments. Similarity attack (Occupation): 13 of 21 entropy 2-diversity tables are vulnerable; 17 of 26 recursive (4,4)-diversity tables are vulnerable. Comparison of privacy measurements: k-anonymity; entropy l-diversity; recursive (c,l)-diversity; k-anonymity with t-closeness.

36 Experiments. Efficiency: the efficiency of using t-closeness is comparable with that of other privacy measurements.

37 Experiments. Data utility: discernibility metric; minimum average group size. The data quality of using t-closeness is comparable with that of other privacy measurements.

38 Conclusion. Limitations of l-diversity: it is difficult and unnecessary to achieve, and it is insufficient to prevent attribute disclosure. t-closeness as a new privacy measure: the overall distribution of sensitive values should be public information; the knowledge gain is separated into two parts. EMD to measure distance: EMD captures semantic distance well, and there are simple formulas for three ground distances.

39 Questions? Thank you!


On Syntactic Anonymity and Differential Privacy 161 183 On Syntactic Anonymity and Differential Privacy Chris Clifton 1, Tamir Tassa 2 1 Department of Computer Science/CERIAS, Purdue University, West Lafayette, IN 47907-2107 USA. 2 The Department of

More information

Clustering-based Multidimensional Sequence Data Anonymization

Clustering-based Multidimensional Sequence Data Anonymization Clustering-based Multidimensional Sequence Data Anonymization Morvarid Sehatar University of Ottawa Ottawa, ON, Canada msehatar@uottawa.ca Stan Matwin 1 Dalhousie University Halifax, NS, Canada 2 Institute

More information

CSE 565 Computer Security Fall 2018

CSE 565 Computer Security Fall 2018 CSE 565 Computer Security Fall 2018 Lecture 12: Database Security Department of Computer Science and Engineering University at Buffalo 1 Review of Access Control Types We previously studied four types

More information

Attacks on Privacy and definetti s Theorem

Attacks on Privacy and definetti s Theorem Attacks on Privacy and definetti s Theorem Daniel Kifer Penn State University ABSTRACT In this paper we present a method for reasoning about privacy using the concepts of exchangeability and definetti

More information

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION

GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION K. Venkata Ramana and V.Valli Kumari Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, India {kvramana.auce, vallikumari}@gmail.com

More information

A stochastic agent-based model of pathogen propagation in dynamic multi-relational social networks

A stochastic agent-based model of pathogen propagation in dynamic multi-relational social networks A stochastic agent-based model of pathogen propagation in dynamic multi-relational social networks Bilal Khan, Kirk Dombrowski, and Mohamed Saad, Journal of Transactions of Society Modeling and Simulation

More information

HIDE: Privacy Preserving Medical Data Publishing. James Gardner Department of Mathematics and Computer Science Emory University

HIDE: Privacy Preserving Medical Data Publishing. James Gardner Department of Mathematics and Computer Science Emory University HIDE: Privacy Preserving Medical Data Publishing James Gardner Department of Mathematics and Computer Science Emory University jgardn3@emory.edu Motivation De-identification is critical in any health informatics

More information

A Survey of Privacy Preserving Data Publishing using Generalization and Suppression

A Survey of Privacy Preserving Data Publishing using Generalization and Suppression Appl. Math. Inf. Sci. 8, No. 3, 1103-1116 (2014) 1103 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.12785/amis/080321 A Survey of Privacy Preserving Data Publishing

More information

Pufferfish: A Semantic Approach to Customizable Privacy

Pufferfish: A Semantic Approach to Customizable Privacy Pufferfish: A Semantic Approach to Customizable Privacy Ashwin Machanavajjhala ashwin AT cs.duke.edu Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research) idash Privacy Workshop

More information

Solution of Exercise Sheet 11

Solution of Exercise Sheet 11 Foundations of Cybersecurity (Winter 16/17) Prof. Dr. Michael Backes CISPA / Saarland University saarland university computer science Solution of Exercise Sheet 11 1 Breaking Privacy By Linking Data The

More information

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Partition Based Perturbation for Privacy Preserving Distributed Data Mining BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation

More information

Extra readings beyond the lecture slides are important:

Extra readings beyond the lecture slides are important: 1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their

More information

Distributed Data Anonymization with Hiding Sensitive Node Labels

Distributed Data Anonymization with Hiding Sensitive Node Labels Distributed Data Anonymization with Hiding Sensitive Node Labels C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University,Trichy

More information

An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis

An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis Mohammad Hammoud CS3525 Dept. of Computer Science University of Pittsburgh Introduction This paper addresses the problem of defining

More information

CURRENT crowdsourcing platforms, such as Amazon

CURRENT crowdsourcing platforms, such as Amazon 1 K-Anonymity for Crowdsourcing Database Sai Wu, Xiaoli Wang, Sheng Wang, Zhenjie Zhang and Anthony K.H. Tung Abstract In crowdsourcing database, human operators are embedded into the database engine and

More information

(δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data

(δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data (δ,l)-diversity: Privacy Preservation for Publication Numerical Sensitive Data Mohammad-Reza Zare-Mirakabad Department of Computer Engineering Scool of Electrical and Computer Yazd University, Iran mzare@yazduni.ac.ir

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Privacy Preservation Data Mining Using GSlicing Approach Mr. Ghanshyam P. Dhomse

More information

Privacy Preserving Data Mining: An approach to safely share and use sensible medical data

Privacy Preserving Data Mining: An approach to safely share and use sensible medical data Privacy Preserving Data Mining: An approach to safely share and use sensible medical data Gerhard Kranner, Viscovery Biomax Symposium, June 24 th, 2016, Munich www.viscovery.net Privacy protection vs knowledge

More information

23.2 Normal Distributions

23.2 Normal Distributions 1_ Locker LESSON 23.2 Normal Distributions Common Core Math Standards The student is expected to: S-ID.4 Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data representation 5 Data reduction, notion of similarity

More information

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy?

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy? K-Anonymity How do you publicly release a database without compromising individual privacy? The Wrong Approach: REU Summer 2007 Advisors: Ryan Williams and Manuel Blum Just leave out any unique identifiers

More information

Anonymizing Collections of Tree-Structured Data

Anonymizing Collections of Tree-Structured Data IEEE Transactions on Data and Knowledge Engineering Vol No 27 Year 25 Anonymizing Collections of Tree-Structured Data Olga Gkountouna, Student Member, IEEE, and Manolis Terrovitis Abstract Collections

More information

Efficient Algorithms for Masking and Finding Quasi-Identifiers

Efficient Algorithms for Masking and Finding Quasi-Identifiers Efficient Algorithms for Masking and Finding Quasi-Identifiers Rajeev Motwani Ying Xu Abstract A quasi-identifier refers to a subset of attributes that can uniquely identify most tuples in a table. Incautious

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

An efficient hash-based algorithm for minimal k-anonymity

An efficient hash-based algorithm for minimal k-anonymity An efficient hash-based algorithm for minimal k-anonymity Xiaoxun Sun Min Li Hua Wang Ashley Plank Department of Mathematics & Computing University of Southern Queensland Toowoomba, Queensland 4350, Australia

More information

Anonymized Data: Generation, Models, Usage. Graham Cormode Divesh Srivastava

Anonymized Data: Generation, Models, Usage. Graham Cormode Divesh Srivastava Anonymized Data: Generation, Models, Usage Graham Cormode Divesh Srivastava {graham,divesh}@research.att.com Outline Part 1 Introduction to Anonymization and Uncertainty Tabular Data Anonymization Part

More information

Types à la Milner. Benjamin C. Pierce University of Pennsylvania. April 2012

Types à la Milner. Benjamin C. Pierce University of Pennsylvania. April 2012 Types are the leaven of computer programming: they make it digestible. - R. Milner Types à la Milner Benjamin C. Pierce University of Pennsylvania April 2012 Type inference Abstract types Types à la Milner

More information

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful

More information

Co-clustering for differentially private synthetic data generation

Co-clustering for differentially private synthetic data generation Co-clustering for differentially private synthetic data generation Tarek Benkhelif, Françoise Fessant, Fabrice Clérot and Guillaume Raschia January 23, 2018 Orange Labs & LS2N Journée thématique EGC &

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Detection of Conflicts and Inconsistencies in Taxonomy-based Authorization Policies

Detection of Conflicts and Inconsistencies in Taxonomy-based Authorization Policies Detection of Conflicts and Inconsistencies in Taxonomy-based Authorization Policies Apurva Mohan, Douglas M. Blough School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta,

More information

The Two Dimensions of Data Privacy Measures

The Two Dimensions of Data Privacy Measures The Two Dimensions of Data Privacy Measures Abstract Orit Levin Page 1 of 9 Javier Salido Corporat e, Extern a l an d Lega l A ffairs, Microsoft This paper describes a practical framework for the first

More information

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique P.Nithya 1, V.Karpagam 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College,

More information

CLIENT HEALTH QUESTIONNAIRE AND INITIAL SCREENING QUESTIONS HEALTH QUESTIONNAIRE INSTRUCTIONS

CLIENT HEALTH QUESTIONNAIRE AND INITIAL SCREENING QUESTIONS HEALTH QUESTIONNAIRE INSTRUCTIONS CLIENT HEALTH QUESTIONNAIRE AND INITIAL SCREENING QUESTIONS HEALTH QUESTIONNAIRE INSTRUCTIONS If Incidental Medical Services (IMS) are to be provided, the Incidental Medical Services Certification Form

More information

CS 161 Multilevel & Database Security. Military models of security

CS 161 Multilevel & Database Security. Military models of security CS 161 Multilevel & Database Security 3 October 26 CS 161 3 October 26 Military models of security Need to know Three models of security Classification unclassified, classified, secret, top secret Compartmentalization

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Introduction To Security and Privacy Einführung in die IT-Sicherheit I

Introduction To Security and Privacy Einführung in die IT-Sicherheit I Introduction To Security and Privacy Einführung in die IT-Sicherheit I Prof. Dr. rer. nat. Doğan Kesdoğan Institut für Wirtschaftsinformatik kesdogan@fb5.uni-siegen.de http://www.uni-siegen.de/fb5/itsec/

More information