Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

1 Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University

2 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy Conclusion and open problems

3 Privacy Preserving Data Publishing data data curator recipient contributors Each contributor: provide data about herself Curator: collects data and releases them in a certain form Recipient: uses the released data for analysis

4 Example: Census Data Release data data individuals census bureau general public Each contributor: provide data about herself Curator: collects data and releases them in a certain form Recipient: uses the released data for analysis

5 Example: Medical Data Release data data patients hospital medical researcher Each contributor: provide data about herself Curator: collects data and releases them in a certain form Recipient: uses the released data for analysis

6 Privacy Preserving Data Publishing data data curator recipient contributors Objectives: The privacy of the contributors is protected The recipient gets useful data

7 Why is this important? Many types of research rely on the availability of private data Demographic research Medical research Social network studies Web search studies

8 Why is it a difficult problem? Intuition: There are only 7 billion people on earth, and 7 billion < 2^33 Theoretically speaking, we need only 33 bits of information to pinpoint an individual
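The arithmetic behind the 33-bit claim, as a quick check:

```python
import math

# 2^32 ≈ 4.3 billion < 7 billion < 2^33 ≈ 8.6 billion,
# so 33 bits suffice to give every person on earth a unique ID.
bits = math.ceil(math.log2(7_000_000_000))
print(bits)  # 33
```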

9 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy Conclusions and open problems

10 Privacy Breach: The MGIC Case Time: mid-1990s Curator: Massachusetts Group Insurance Commission (MGIC) Data released: anonymized medical records Intention: facilitate medical research Name Birth Date Gender ZIP Disease Alice 1960/01/01 F flu Bob 1965/02/02 M dyspepsia Cathy 1970/03/03 F pneumonia David 1975/04/04 M gastritis Medical Records

11 Privacy Breach: The MGIC Case Time: mid-1990s Curator: Massachusetts Group Insurance Commission (MGIC) Data released: anonymized medical records Intention: facilitate medical research match Name Birth Date Gender ZIP Alice 1960/01/01 F Bob 1965/02/02 M Cathy 1970/03/03 F David 1975/04/04 M Voter Registration List Birth Date Gender ZIP Disease 1960/01/01 F flu 1965/02/02 M dyspepsia 1970/03/03 F pneumonia 1975/04/04 M gastritis Medical Records

12 Privacy Breach: The AOL Case Time: 2006 Curator: America Online (AOL) Data released: anonymized search log Intention: facilitate research on web search Log record: < User ID, Query, > Example: < , UQ, >

13 Privacy Breach: The AOL Case Log record: < User ID, Query, > Example: < , UQ, > Attacker: New York Times Method: Find all log entries for AOL user Many queries for businesses and services in Lilburn, GA (population 11K) A number of queries for different persons with the last name Arnold Lilburn has 14 people with the last name Arnold The New York Times contacted them and found that AOL User is Thelma Arnold

14 Privacy Breach: The DNA case Time: reported in 2005 Curator: A sperm bank Data released: A sperm donor's date of birth and birthplace Result: The donor offspring (a boy) was able to identify his biological father How? By exploiting the information on the Y-chromosome

15 Y-chromosome vs. Surname There is a strong correlation between Y-chromosomes and surnames

16 Y-chromosome vs. Surname The 15-year-old boy purchased the service from a company that collected DNA samples from 45,000 individuals and provided a service for Y-chromosome matching Result: There were two (relatively) close matches, with almost identical surnames The boy thus learned the possible surnames of his biological father

17 Privacy Breach: The DNA case The boy knew the possible last names of his biological father, as well as his date of birth and birthplace He paid another company to retrieve the names of persons born in that place on that date Only one person had a matching surname Summary: The sperm bank indirectly revealed the identity of the donor by disclosing his date of birth and birthplace

18 Lessons Learned Any information released by the data curator can potentially be exploited by the adversary In the MGIC case: genders, birth dates, ZIP codes In the AOL case: keywords in search queries In the DNA case: date of birth, birthplace Solution? Do not release the exact information from the original data

19 Privacy Preserving Data Publishing data modified data curator recipient contributors Publish a modified version of the data, such that the contributors' privacy is adequately protected the published data is useful for its intended purpose (at least to some degree)

20 Privacy Preserving Data Publishing data modified data curator recipient contributors Two issues privacy principle: what do we mean by adequately protected privacy? modification method: how should we modify the data to ensure privacy while maximizing utility?

21 Existing Solutions Earliest solutions date back to the 1960s Solutions before 2000: Mostly without a formal privacy model Evaluated privacy based on empirical studies only This talk will focus on solutions with formal privacy models (developed after 2000) k-anonymity l-diversity differential privacy

22 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity [Sweeney 2002] l-diversity differential privacy Conclusions and open problems

23 k-anonymity: Example Suppose that we want to publish the medical records below Name Age ZIP Disease Andy flu Bob dyspepsia Cathy pneumonia Diane gastritis

24 k-anonymity: Example Suppose that we want to publish the medical records below We know that eliminating names is not enough because an adversary may identify patients by Age and ZIP Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease flu dyspepsia pneumonia gastritis medical records

25 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients How? Make Age and ZIP less specific in the medical records Name Age ZIP Andy Bob Cathy Diane adversary's knowledge Age ZIP Disease flu dyspepsia pneumonia gastritis medical records

26 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients Age ZIP Disease Name Age ZIP Andy Bob Cathy Diane adversary s knowledge flu dyspepsia pneumonia gastritis medical records

27 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients generalization Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis medical records

28 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

29 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

30 k-anonymity: General Approach Identify the attributes that the adversary may know Referred to as Quasi-Identifiers (QI) Divide tuples in the table into groups of sizes at least k Generalize the QI values of each group to make them identical QI group 1 group 2 Age ZIP Disease flu dyspepsia pneumonia gastritis medical records

31 k-anonymity: General Approach Identify the attributes that the adversary may know Referred to as Quasi-Identifiers (QI) Divide tuples in the table into groups of sizes at least k Generalize the QI values of each group to make them identical QI Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table
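The general approach above can be sketched in a few lines; this is a minimal illustration, not an algorithm from the talk — `k_anonymize` is a hypothetical helper that sorts tuples by their QI and generalizes each group of k tuples to its [min, max] ranges:

```python
def k_anonymize(records, k):
    """records: list of (age, zip_code, disease) tuples.
    Returns rows whose (Age, ZIP) values are generalized to ranges,
    so that every QI combination matches at least k rows."""
    rows = sorted(records)                        # order by Age, then ZIP
    groups = [rows[i:i + k] for i in range(0, len(rows), k)]
    if len(groups) > 1 and len(groups[-1]) < k:   # merge an undersized tail
        groups[-2].extend(groups.pop())
    out = []
    for g in groups:
        age_range = (min(r[0] for r in g), max(r[0] for r in g))
        zip_range = (min(r[1] for r in g), max(r[1] for r in g))
        out.extend((age_range, zip_range, r[2]) for r in g)
    return out
```

With made-up ages 25, 28, 42, 47 (the slide's exact numbers were lost in transcription) and k = 2, the first two patients share one (Age, ZIP) range and the last two share another, as in the 2-anonymous table.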

32 k-anonymity: Algorithms Numerous algorithms for k-anonymity have been proposed Objective: achieve k-anonymity with the least amount of generalization This line of research became obsolete Reason: k-anonymity was found to be vulnerable [Machanavajjhala et al. 2006] QI Name Age ZIP Andy Bob Cathy Diane adversary's knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

33 k-anonymity: Vulnerability k-anonymity requires that each combination of quasi-identifiers (QI) is hidden in a group of size at least k But it says nothing about the remaining attributes Result: Disclosure of sensitive attributes is possible QI sensitive Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

34 k-anonymity: Vulnerability k-anonymity requires that each combination of quasi-identifiers (QI) is hidden in a group of size at least k But it says nothing about the remaining attributes Result: Disclosure of sensitive attributes is possible QI sensitive Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] dyspepsia [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

35 k-anonymity: Vulnerability Intuition: Hiding in a group of k is not sufficient The group should have a diverse set of sensitive values QI sensitive Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] dyspepsia [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-anonymous table

36 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity [Machanavajjhala et al. 2006] differential privacy Conclusions and open problems

37 l-diversity [Machanavajjhala et al. 2006] Approach: (similar to k-anonymity) Divide tuples into groups, and make the QI of each group identical Requirement: (different from k-anonymity) Each group has at least l well-represented sensitive values Several definitions of well-represented exist Simplest one: in each group, no sensitive value is associated with more than 1/l of the tuples Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table
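The simplest notion above is easy to check mechanically; `is_l_diverse` is a hypothetical helper (not from the talk) that takes the sensitive values of each QI group:

```python
from collections import Counter

def is_l_diverse(groups, l):
    """groups: one list of sensitive values per QI group.
    True iff no group has a single sensitive value associated
    with more than 1/l of its tuples."""
    for g in groups:
        most_common_count = Counter(g).most_common(1)[0][1]
        if most_common_count > len(g) / l:
            return False
    return True
```

The 2-diverse table passes (each group has two distinct diseases), while a group containing two dyspepsia tuples and nothing else fails for l = 2.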

38 l-diversity [Machanavajjhala et al. 2006] Rationale: The 1/l association in the generalized table leads to 1/l confidence for the adversary Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table

39 l-diversity: Follow-up Research Algorithms: achieve l-diversity with the least amount of generalization Patches : identify vulnerabilities of l-diversity, and propose an improved privacy notion Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table

40 l-diversity: Vulnerability Suppose that the adversary wants to find out the disease of Bob The adversary knows that Bob is unlikely to have breast cancer So he knows that Bob is likely to have dyspepsia Name Age ZIP Andy Bob Cathy Diane adversary s knowledge Age ZIP Disease [20,30] [10000,20000] breast cancer [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table

41 l-diversity: Vulnerability Intuition: It is not sufficient to impose constraints on the diversity of sensitive values in each group Need to take into account the adversary's background knowledge (e.g., males are unlikely to have breast cancer) Name Age ZIP Andy Bob Cathy Diane adversary's knowledge Age ZIP Disease [20,30] [10000,20000] breast cancer [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table

42 l-diversity: Vulnerability Intuition: It is not sufficient to impose constraints on the diversity of sensitive values in each group Need to take into account the adversary's background knowledge (e.g., males are unlikely to have breast cancer) Follow-up research on three issues: How to express the background knowledge of the adversary? How to derive background knowledge? How to generalize data to protect privacy against background knowledge? This led to paper after paper

43 Algorithm-Based Attacks Algorithms designed for l-diversity (and its improvements) are often vulnerable to algorithm-based attacks Intuition: Those algorithms always try to use the least amount of generalization to achieve l-diversity An adversary can utilize the characteristics of the algorithms to do some reverse engineering on the generalized tables

44 l-diversity: Summary l-diversity and its follow-up approaches address the weakness of k-anonymity tackle much more advanced adversary models Problems with this line of research It does not converge to a final adversary model Most proposed methods are vulnerable to algorithm-based attacks

45 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy [Dwork 2006] Conclusions and open problems

46 Differential Privacy [Dwork 2006] A privacy principle proposed by theoreticians More difficult to understand than k-anonymity and l-diversity Has become well-adopted because Its privacy model is generally considered strong enough Its definition naturally takes into account algorithm-based attacks

47 Differential Privacy: Intuition Suppose that we have a dataset D that contains the medical record of every individual in Australia Suppose that Alice is in the dataset Intuitively, is it OK to publish the following information? Whether Alice has diabetes The total number of diabetes patients in D Why is it OK to publish the latter but not the former? Intuition: The former completely depends on Alice The latter does not depend much on Alice

48 Differential Privacy: Intuition In general, we should only publish information that does not highly depend on any particular individual This motivates the definition of differential privacy

49 Differential Privacy: Definition data randomized algorithm A modified data recipient Neighboring datasets: Two datasets D and D', such that D' can be obtained by changing one single tuple in D A randomized algorithm A satisfies ε-differential privacy, iff for any two neighboring datasets D and D' and for any output O of A, Pr[A(D) = O] ≤ exp(ε) · Pr[A(D') = O] Rationale: The output of the algorithm does not highly depend on any particular tuple in the input

50 Differential Privacy: Definition Illustration of ε-differential privacy: exp(−ε) ≤ Pr[A(D) = O] / Pr[A(D') = O] ≤ exp(ε)

51 Comparison with k-anonymity and l-diversity Differential privacy does not directly model the adversary's knowledge But its privacy protection is generally considered strong enough

52 Comparison with k-anonymity and l-diversity Differential privacy is more general There is no restriction on the type of the output O It can be a table, a set of frequent itemsets, a regression model, etc.

53 The Differential Privacy Landscape This leads to a lot of research interests from various communities Database and data mining (SIGMOD, VLDB, ICDE, KDD, ) Security (CCS, ) Machine learning (NIPS, ICML, ) Systems (SIGCOMM, OSDI, ) Theory (STOC, FOCS, )

54 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy General approaches for achieving differential privacy Research issues Conclusions and open problems

55 Achieving Differential Privacy: Example Suppose that we have a set D of medical records We want to release the number of diabetes patients in D (say, 1000) How to do it in a differentially private manner?

56 Achieving Differential Privacy: Example Naïve solution: Release 1000 directly But it violates differential privacy: for a neighboring dataset D' with a different count, Pr[A(D') = 1000] = 0, so the bound Pr[A(D) = 1000] ≤ exp(ε) · Pr[A(D') = 1000] does not hold Better solution: Add noise into 1000 before releasing it Intuition: the noise could achieve the requirements of differential privacy Question: what kind of noise should we add?

57 Laplace Distribution Lap(λ): pdf f(x) = (1/(2λ)) · exp(−|x|/λ) Increase/decrease x by 1 ⇒ f(x) changes by a factor of exp(1/λ) Variance: 2λ²; λ is referred to as the scale

58 Achieving Differential Privacy: Example Add Laplace noise before releasing the number of diabetes patients in D Changing one tuple in D ⇒ shifting the mean of the Laplace distribution by 1 ⇒ the two distributions have bounded differences (the density ratio is bounded by exp(1/λ)) ⇒ differential privacy is satisfied
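The bounded-ratio claim can be checked numerically; this sketch assumes Laplace noise of scale `lam` and a true count of 1000 (versus 999 for a neighboring dataset):

```python
import math

def laplace_pdf(x, mean, lam):
    """Density of a Laplace distribution with the given mean and scale lam."""
    return math.exp(-abs(x - mean) / lam) / (2 * lam)

def worst_case_ratio(lam, outputs=range(900, 1100)):
    """Largest density ratio between noisy counts for 1000 vs. 999 patients."""
    return max(laplace_pdf(o, 1000, lam) / laplace_pdf(o, 999, lam)
               for o in outputs)
```

For any scale lam, the ratio never exceeds exp(1/lam), which is exactly the (1/lam)-differential-privacy guarantee.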

59 The Laplace Mechanism In general, if we want to release a set of values (e.g., counts) from a dataset, We add i.i.d. Laplace noise to each value to achieve differential privacy This general approach is called the Laplace mechanism Figuring out the correct amount of noise to use could be a research issue

60 Histogram Suppose that we want to release a histogram on Age from a dataset D Using the Laplace mechanism: We add i.i.d. Laplace noise to the histogram counts How much noise? Previous example: Lap(λ) noise leads to (1/λ)-differential privacy This case: Lap(2λ) noise leads to (1/λ)-differential privacy Rationale: Changing one tuple may change two counts simultaneously in the histogram We need twice the noise to conceal such changes
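A sketch of this histogram release (the function names and bin format are assumptions, not the talk's notation); each bin gets Laplace noise of twice the scale because one tuple moving between bins changes two counts:

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample from a zero-mean Laplace distribution via inverse-CDF."""
    u = rng.random() - 0.5              # uniform in [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)            # guard against log(0)
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_histogram(ages, bins, lam, rng=random):
    """bins: half-open (lo, hi) intervals covering the Age domain.
    Adding Lap(2*lam) noise per bin gives (1/lam)-differential privacy."""
    counts = [sum(lo <= a < hi for a in ages) for lo, hi in bins]
    return [c + laplace_noise(2 * lam, rng) for c in counts]
```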

61 Histogram In general, if the values to be published are obtained from a complex task (e.g., regression) It could be much more challenging to derive the correct amount of noise to use

62 Optimization of Accuracy A more common research issue: choosing a good strategy to publish data Example: Histogram vs. Histogram + binary tree The latter is good for range queries: a range query is answered by taking the sum of O(log n) noisy counts, whereas the former requires summing O(n) counts Note: the latter requires log n times the noise of the former, but it pays off
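The dyadic-decomposition idea behind the binary-tree strategy can be sketched as follows (an assumed formulation, with the domain size n a power of two): noisy counts are precomputed for every dyadic interval, and a range query sums the O(log n) intervals that exactly cover it.

```python
def dyadic_intervals(lo, hi, n):
    """Dyadic intervals exactly covering [lo, hi) in a domain of size n
    (n a power of two); a range query sums one noisy count per interval."""
    out = []
    def cover(node_lo, node_hi):
        if hi <= node_lo or node_hi <= lo:
            return                           # node disjoint from the query
        if lo <= node_lo and node_hi <= hi:
            out.append((node_lo, node_hi))   # node fully inside: one count
            return
        mid = (node_lo + node_hi) // 2       # otherwise recurse on children
        cover(node_lo, mid)
        cover(mid, node_hi)
    cover(0, n)
    return out
```

For example, dyadic_intervals(1, 7, 8) returns [(1, 2), (2, 4), (4, 6), (6, 7)]: four noisy counts instead of six individual bins.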

63 Choosing a Good Strategy In general, choosing a good strategy requires exploiting the characteristics of the input data the output results the way that users may use the output results Most differential privacy papers focus on this issue

64 Other Research Issues General approaches beyond the Laplace mechanism E.g., the exponential mechanism [McSherry et al. 2007], which is suitable for problems with non-numeric outputs

65 Conclusion An overview of existing solutions for privacy preserving data publishing k-anonymity l-diversity differential privacy General methods for differential privacy, and related research issues

66 Open Problems Differentially private algorithms for complex data/tasks, e.g., Graph queries Principle component analysis (PCA) Trajectories Main challenge: Difficult to identify a good strategy

67 Open Problems Differential privacy might be too strong It requires that changing one tuple should not bring much change to the published result Alternative interpretation: Even if an adversary knows n − 1 tuples in the input data, he won't be able to infer information about the remaining tuple Knowing n − 1 individuals is often impossible How should we relax differential privacy?

68 Open Problems How to choose an appropriate ε for ε-differential privacy? Need a way to quantify the cost of privacy and the gain of utility in releasing data

69 Open Problems What do we do to protect genome data? Challenges The data is highly complex The queries are highly complex Definition of privacy is unclear


More information

CERIAS Tech Report Injector: Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research

CERIAS Tech Report Injector: Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research CERIAS Tech Report 28-29 : Mining Background Knowledge for Data Anonymization by Tiancheng Li; Ninghui Li Center for Education and Research Information Assurance and Security Purdue University, West Lafayette,

More information

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database T.Malathi 1, S. Nandagopal 2 PG Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,

More information

Differentially Private Algorithm and Auction Configuration

Differentially Private Algorithm and Auction Configuration Differentially Private Algorithm and Auction Configuration Ellen Vitercik CMU, Theory Lunch October 11, 2017 Joint work with Nina Balcan and Travis Dick $1 Prices learned from purchase histories can reveal

More information

Privacy Preserving in Knowledge Discovery and Data Publishing

Privacy Preserving in Knowledge Discovery and Data Publishing B.Lakshmana Rao, G.V Konda Reddy and G.Yedukondalu 33 Privacy Preserving in Knowledge Discovery and Data Publishing B.Lakshmana Rao 1, G.V Konda Reddy 2, G.Yedukondalu 3 Abstract Knowledge Discovery is

More information

An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis

An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis An Ad Omnia Approach to Defining and Achiev ing Private Data Analysis Mohammad Hammoud CS3525 Dept. of Computer Science University of Pittsburgh Introduction This paper addresses the problem of defining

More information

(α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing

(α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing (α, k)-anonymity: An Enhanced k-anonymity Model for Privacy-Preserving Data Publishing Raymond Chi-Wing Wong, Jiuyong Li +, Ada Wai-Chee Fu and Ke Wang Department of Computer Science and Engineering +

More information

A Review of Privacy Preserving Data Publishing Technique

A Review of Privacy Preserving Data Publishing Technique A Review of Privacy Preserving Data Publishing Technique Abstract:- Amar Paul Singh School of CSE Bahra University Shimla Hills, India Ms. Dhanshri Parihar Asst. Prof (School of CSE) Bahra University Shimla

More information

CERIAS Tech Report

CERIAS Tech Report CERIAS Tech Report 27-7 PRIVACY-PRESERVING INCREMENTAL DATA DISSEMINATION by Ji-Won Byun, Tiancheng Li, Elisa Bertino, Ninghui Li, and Yonglak Sohn Center for Education and Research in Information Assurance

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Privacy preserving data mining Li Xiong Slides credits: Chris Clifton Agrawal and Srikant 4/3/2011 1 Privacy Preserving Data Mining Privacy concerns about personal data AOL

More information

Privacy Preserving Machine Learning: A Theoretically Sound App

Privacy Preserving Machine Learning: A Theoretically Sound App Privacy Preserving Machine Learning: A Theoretically Sound Approach Outline 1 2 3 4 5 6 Privacy Leakage Events AOL search data leak: New York Times journalist was able to identify users from the anonymous

More information

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong CS573 Data Privacy and Security Differential Privacy tabular data and range queries Li Xiong Outline Tabular data and histogram/range queries Algorithms for low dimensional data Algorithms for high dimensional

More information

with BLENDER: Enabling Local Search a Hybrid Differential Privacy Model

with BLENDER: Enabling Local Search a Hybrid Differential Privacy Model BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model Brendan Avent 1, Aleksandra Korolova 1, David Zeber 2, Torgeir Hovden 2, Benjamin Livshits 3 University of Southern California 1

More information

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1105

More information

Composition Attacks and Auxiliary Information in Data Privacy

Composition Attacks and Auxiliary Information in Data Privacy Composition Attacks and Auxiliary Information in Data Privacy Srivatsava Ranjit Ganta Pennsylvania State University University Park, PA 1682 ranjit@cse.psu.edu Shiva Prasad Kasiviswanathan Pennsylvania

More information

L-Diversity Algorithm for Incremental Data Release

L-Diversity Algorithm for Incremental Data Release Appl. ath. Inf. Sci. 7, No. 5, 2055-2060 (203) 2055 Applied athematics & Information Sciences An International Journal http://dx.doi.org/0.2785/amis/070546 L-Diversity Algorithm for Incremental Data Release

More information

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 31 st July 216. Vol.89. No.2 25-216 JATIT & LLS. All rights reserved. SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 1 AMANI MAHAGOUB OMER, 2 MOHD MURTADHA BIN MOHAMAD 1 Faculty of Computing,

More information

Maintaining K-Anonymity against Incremental Updates

Maintaining K-Anonymity against Incremental Updates Maintaining K-Anonymity against Incremental Updates Jian Pei Jian Xu Zhibin Wang Wei Wang Ke Wang Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca Fudan University, China, {xujian, 55, weiwang}@fudan.edu.cn

More information

Preserving Privacy during Big Data Publishing using K-Anonymity Model A Survey

Preserving Privacy during Big Data Publishing using K-Anonymity Model A Survey ISSN No. 0976-5697 Volume 8, No. 5, May-June 2017 International Journal of Advanced Research in Computer Science SURVEY REPORT Available Online at www.ijarcs.info Preserving Privacy during Big Data Publishing

More information

BOOLEAN MATRIX FACTORIZATIONS. with applications in data mining Pauli Miettinen

BOOLEAN MATRIX FACTORIZATIONS. with applications in data mining Pauli Miettinen BOOLEAN MATRIX FACTORIZATIONS with applications in data mining Pauli Miettinen MATRIX FACTORIZATIONS BOOLEAN MATRIX FACTORIZATIONS o THE BOOLEAN MATRIX PRODUCT As normal matrix product, but with addition

More information

Slicing Technique For Privacy Preserving Data Publishing

Slicing Technique For Privacy Preserving Data Publishing Slicing Technique For Privacy Preserving Data Publishing D. Mohanapriya #1, Dr. T.Meyyappan M.Sc., MBA. M.Phil., Ph.d., 2 # Department of Computer Science and Engineering, Alagappa University, Karaikudi,

More information

An Iterative Approach to Examining the Effectiveness of Data Sanitization

An Iterative Approach to Examining the Effectiveness of Data Sanitization An Iterative Approach to Examining the Effectiveness of Data Sanitization By ANHAD PREET SINGH B.Tech. (Punjabi University) 2007 M.S. (University of California, Davis) 2012 DISSERTATION Submitted in partial

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

Algorithmic Approaches to Preventing Overfitting in Adaptive Data Analysis. Part 1 Aaron Roth

Algorithmic Approaches to Preventing Overfitting in Adaptive Data Analysis. Part 1 Aaron Roth Algorithmic Approaches to Preventing Overfitting in Adaptive Data Analysis Part 1 Aaron Roth The 2015 ImageNet competition An image classification competition during a heated war for deep learning talent

More information

Differential Privacy. Cynthia Dwork. Mamadou H. Diallo

Differential Privacy. Cynthia Dwork. Mamadou H. Diallo Differential Privacy Cynthia Dwork Mamadou H. Diallo 1 Focus Overview Privacy preservation in statistical databases Goal: to enable the user to learn properties of the population as a whole, while protecting

More information

Michelle Hayes Mary Joel Holin. Michael Roanhouse Julie Hovden. Special Thanks To. Disclaimer

Michelle Hayes Mary Joel Holin. Michael Roanhouse Julie Hovden. Special Thanks To. Disclaimer Further Understanding the Intersection of Technology and Privacy to Ensure and Protect Client Data Special Thanks To Michelle Hayes Mary Joel Holin We can provably know where domestic violence shelter

More information

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching

Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Tiancheng Li Ninghui Li CERIAS and Department of Computer Science, Purdue University 250 N. University Street, West

More information

Crowd-Blending Privacy

Crowd-Blending Privacy Crowd-Blending Privacy Johannes Gehrke, Michael Hay, Edward Lui, and Rafael Pass Department of Computer Science, Cornell University {johannes,mhay,luied,rafael}@cs.cornell.edu Abstract. We introduce a

More information

Differentially Private H-Tree

Differentially Private H-Tree GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern

More information

Clustering-based Multidimensional Sequence Data Anonymization

Clustering-based Multidimensional Sequence Data Anonymization Clustering-based Multidimensional Sequence Data Anonymization Morvarid Sehatar University of Ottawa Ottawa, ON, Canada msehatar@uottawa.ca Stan Matwin 1 Dalhousie University Halifax, NS, Canada 2 Institute

More information

Prajapati Het I., Patel Shivani M., Prof. Ketan J. Sarvakar IT Department, U. V. Patel college of Engineering Ganapat University, Gujarat

Prajapati Het I., Patel Shivani M., Prof. Ketan J. Sarvakar IT Department, U. V. Patel college of Engineering Ganapat University, Gujarat Security and Privacy with Perturbation Based Encryption Technique in Big Data Prajapati Het I., Patel Shivani M., Prof. Ketan J. Sarvakar IT Department, U. V. Patel college of Engineering Ganapat University,

More information

Anonymizing Sequential Releases

Anonymizing Sequential Releases Anonymizing Sequential Releases Ke Wang School of Computing Science Simon Fraser University Canada V5A 1S6 wangk@cs.sfu.ca Benjamin C. M. Fung School of Computing Science Simon Fraser University Canada

More information

Approaches to distributed privacy protecting data mining

Approaches to distributed privacy protecting data mining Approaches to distributed privacy protecting data mining Bartosz Przydatek CMU Approaches to distributed privacy protecting data mining p.1/11 Introduction Data Mining and Privacy Protection conflicting

More information

Privacy-Preserving Data Publishing: A Survey of Recent Developments

Privacy-Preserving Data Publishing: A Survey of Recent Developments Privacy-Preserving Data Publishing: A Survey of Recent Developments BENJAMIN C. M. FUNG Concordia University, Montreal KE WANG Simon Fraser University, Burnaby RUI CHEN Concordia University, Montreal 14

More information

Service-Oriented Architecture for Privacy-Preserving Data Mashup

Service-Oriented Architecture for Privacy-Preserving Data Mashup Service-Oriented Architecture for Privacy-Preserving Data Mashup Thomas Trojer a Benjamin C. M. Fung b Patrick C. K. Hung c a Quality Engineering, Institute of Computer Science, University of Innsbruck,

More information

PRACTICAL K-ANONYMITY ON LARGE DATASETS. Benjamin Podgursky. Thesis. Submitted to the Faculty of the. Graduate School of Vanderbilt University

PRACTICAL K-ANONYMITY ON LARGE DATASETS. Benjamin Podgursky. Thesis. Submitted to the Faculty of the. Graduate School of Vanderbilt University PRACTICAL K-ANONYMITY ON LARGE DATASETS By Benjamin Podgursky Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of

More information

Privacy Preserved Data Publishing Techniques for Tabular Data

Privacy Preserved Data Publishing Techniques for Tabular Data Privacy Preserved Data Publishing Techniques for Tabular Data Keerthy C. College of Engineering Trivandrum Sabitha S. College of Engineering Trivandrum ABSTRACT Almost all countries have imposed strict

More information

A Case Study: Privacy Preserving Release of Spa9o- temporal Density in Paris

A Case Study: Privacy Preserving Release of Spa9o- temporal Density in Paris A Case Study: Privacy Preserving Release of Spa9o- temporal Density in Paris Gergely Acs (INRIA) gergely.acs@inria.fr!! Claude Castelluccia (INRIA) claude.castelluccia@inria.fr! Outline 2! Dataset descrip9on!

More information

Maintaining K-Anonymity against Incremental Updates

Maintaining K-Anonymity against Incremental Updates Maintaining K-Anonymity against Incremental Updates Jian Pei 1 Jian Xu 2 Zhibin Wang 2 Wei Wang 2 Ke Wang 1 1 Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca 2 Fudan University, China, {xujian,

More information

Private Database Synthesis for Outsourced System Evaluation

Private Database Synthesis for Outsourced System Evaluation Private Database Synthesis for Outsourced System Evaluation Vani Gupta 1, Gerome Miklau 1, and Neoklis Polyzotis 2 1 Dept. of Computer Science, University of Massachusetts, Amherst, MA, USA 2 Dept. of

More information

Privacy-Enhancing Technologies & Applications to ehealth. Dr. Anja Lehmann IBM Research Zurich

Privacy-Enhancing Technologies & Applications to ehealth. Dr. Anja Lehmann IBM Research Zurich Privacy-Enhancing Technologies & Applications to ehealth Dr. Anja Lehmann IBM Research Zurich IBM Research Zurich IBM Research founded in 1945 employees: 3,000 12 research labs on six continents IBM Research

More information

Distributed Data Anonymization with Hiding Sensitive Node Labels

Distributed Data Anonymization with Hiding Sensitive Node Labels Distributed Data Anonymization with Hiding Sensitive Node Labels C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University,Trichy

More information

Privacy Challenges in Big Data and Industry 4.0

Privacy Challenges in Big Data and Industry 4.0 Privacy Challenges in Big Data and Industry 4.0 Jiannong Cao Internet & Mobile Computing Lab Department of Computing Hong Kong Polytechnic University Email: csjcao@comp.polyu.edu.hk http://www.comp.polyu.edu.hk/~csjcao/

More information

Comparative Analysis of Anonymization Techniques

Comparative Analysis of Anonymization Techniques International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 8 (2014), pp. 773-778 International Research Publication House http://www.irphouse.com Comparative Analysis

More information

VPriv: Protecting Privacy in Location- Based Vehicular Services

VPriv: Protecting Privacy in Location- Based Vehicular Services VPriv: Protecting Privacy in Location- Based Vehicular Services Raluca Ada Popa and Hari Balakrishnan Computer Science and Artificial Intelligence Laboratory, M.I.T. Andrew Blumberg Department of Mathematics

More information

Ambiguity: Hide the Presence of Individuals and Their Privacy with Low Information Loss

Ambiguity: Hide the Presence of Individuals and Their Privacy with Low Information Loss : Hide the Presence of Individuals and Their Privacy with Low Information Loss Hui (Wendy) Wang Department of Computer Science Stevens Institute of Technology Hoboken, NJ, USA hwang@cs.stevens.edu Abstract

More information

Privacy, Security & Ethical Issues

Privacy, Security & Ethical Issues Privacy, Security & Ethical Issues How do we mine data when we can t even look at it? 2 Individual Privacy Nobody should know more about any entity after the data mining than they did before Approaches:

More information

Distributed Private Data Collection at Scale

Distributed Private Data Collection at Scale Distributed Private Data Collection at Scale Graham Cormode g.cormode@warwick.ac.uk Tejas Kulkarni (Warwick) Divesh Srivastava (AT&T) 1 Big data, big problem? The big data meme has taken root Organizations

More information

Preserving Privacy in High-Dimensional Data Publishing

Preserving Privacy in High-Dimensional Data Publishing Preserving Privacy in High-Dimensional Data Publishing Khalil Al-Hussaeni A Thesis in The Department of Electrical and Computer Engineering Presented in Partial Fulfillment of the Requirements for the

More information

FMC: An Approach for Privacy Preserving OLAP

FMC: An Approach for Privacy Preserving OLAP FMC: An Approach for Privacy Preserving OLAP Ming Hua, Shouzhi Zhang, Wei Wang, Haofeng Zhou, Baile Shi Fudan University, China {minghua, shouzhi_zhang, weiwang, haofzhou, bshi}@fudan.edu.cn Abstract.

More information

Alpha Anonymization in Social Networks using the Lossy-Join Approach

Alpha Anonymization in Social Networks using the Lossy-Join Approach TRANSACTIONS ON DATA PRIVACY 11 (2018) 1 22 Alpha Anonymization in Social Networks using the Lossy-Join Kiran Baktha*, B K Tripathy** * Department of Electronics and Communication Engineering, VIT University,

More information

Distributed Data Mining with Differential Privacy

Distributed Data Mining with Differential Privacy Distributed Data Mining with Differential Privacy Ning Zhang, Ming Li, Wenjing Lou Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, MA Email: {ning, mingli}@wpi.edu,

More information

Adding Differential Privacy in an Open Board Discussion Board System

Adding Differential Privacy in an Open Board Discussion Board System San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 5-26-2017 Adding Differential Privacy in an Open Board Discussion Board System Pragya Rana San

More information

Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM

Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM 1 Dr.G.Kirubhakar and 2 Dr.C.Venkatesh Address for Correspondence 1 Department of Computer Science and Engineering, Surya Engineering

More information

Differentially-Private Network Trace Analysis. Frank McSherry and Ratul Mahajan Microsoft Research

Differentially-Private Network Trace Analysis. Frank McSherry and Ratul Mahajan Microsoft Research Differentially-Private Network Trace Analysis Frank McSherry and Ratul Mahajan Microsoft Research Overview. 1 Overview Question: Is it possible to conduct network trace analyses in a way that provides

More information

Secure Multi-party Computation Protocols For Collaborative Data Publishing With m-privacy

Secure Multi-party Computation Protocols For Collaborative Data Publishing With m-privacy Secure Multi-party Computation Protocols For Collaborative Data Publishing With m-privacy K. Prathyusha 1 M.Tech Student, CSE, KMMITS, JNTU-A, TIRUPATHI,AP Sakshi Siva Ramakrishna 2 Assistant proffesor,

More information

Attacks on Privacy and definetti s Theorem

Attacks on Privacy and definetti s Theorem Attacks on Privacy and definetti s Theorem Daniel Kifer Penn State University ABSTRACT In this paper we present a method for reasoning about privacy using the concepts of exchangeability and definetti

More information

Incognito: Efficient Full Domain K Anonymity

Incognito: Efficient Full Domain K Anonymity Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)

More information

Privately Solving Linear Programs

Privately Solving Linear Programs Privately Solving Linear Programs Justin Hsu 1 Aaron Roth 1 Tim Roughgarden 2 Jonathan Ullman 3 1 University of Pennsylvania 2 Stanford University 3 Harvard University July 8th, 2014 A motivating example

More information

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Partition Based Perturbation for Privacy Preserving Distributed Data Mining BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation

More information

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING 1 B.KARTHIKEYAN, 2 G.MANIKANDAN, 3 V.VAITHIYANATHAN 1 Assistant Professor, School of Computing, SASTRA University, TamilNadu, India. 2 Assistant

More information