Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University
1 Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University
2 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy Conclusion and open problems
3 Privacy Preserving Data Publishing (Diagram: contributors --data--> curator --data--> recipient.) Each contributor provides data about herself. The curator collects the data and releases it in a certain form. The recipient uses the released data for analysis.
4 Example: Census Data Release (Diagram: individuals --data--> census bureau --data--> general public.) Each contributor provides data about herself. The curator collects the data and releases it in a certain form. The recipient uses the released data for analysis.
5 Example: Medical Data Release (Diagram: patients --data--> hospital --data--> medical researcher.) Each contributor provides data about herself. The curator collects the data and releases it in a certain form. The recipient uses the released data for analysis.
6 Privacy Preserving Data Publishing Objectives: The privacy of the contributors is protected, and the recipient gets useful data.
7 Why is this important? Many types of research rely on the availability of private data Demographic research Medical research Social network studies Web search studies
8 Why is it a difficult problem? Intuition: There are only 7 billion people on earth, and 7 billion < 2^33. Theoretically speaking, we need only 33 bits of information to pinpoint an individual.
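The 33-bit figure is just an information-theoretic back-of-the-envelope count, which a couple of lines of Python can confirm:

```python
import math

# 2^32 < 7 billion < 2^33, so 33 bits suffice to give every
# person on earth a unique identifier.
population = 7_000_000_000
bits_needed = math.ceil(math.log2(population))
print(bits_needed)  # 33
```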
9 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy Conclusions and open problems
10 Privacy Breach: The MGIC Case Time: mid-1990s. Curator: Massachusetts Group Insurance Commission (MGIC). Data released: anonymized medical records. Intention: facilitate medical research. Name Birth Date Gender ZIP Disease Alice 1960/01/01 F flu Bob 1965/02/02 M dyspepsia Cathy 1970/03/03 F pneumonia David 1975/04/04 M gastritis (medical records)
11 Privacy Breach: The MGIC Case Time: mid-1990s. Curator: Massachusetts Group Insurance Commission (MGIC). Data released: anonymized medical records. Intention: facilitate medical research. The adversary matches the voter registration list (Name Birth Date Gender ZIP: Alice 1960/01/01 F, Bob 1965/02/02 M, Cathy 1970/03/03 F, David 1975/04/04 M) against the medical records (Birth Date Gender ZIP Disease: 1960/01/01 F flu, 1965/02/02 M dyspepsia, 1970/03/03 F pneumonia, 1975/04/04 M gastritis).
12 Privacy Breach: The AOL Case Time: 2006. Curator: America Online. Data released: anonymized search log. Intention: facilitate research on web search. Log record: < User ID, Query, > Example: < , UQ, >
13 Privacy Breach: The AOL Case Log record: < User ID, Query, > Example: < , UQ, > Attacker: New York Times. Method: Find all log entries for the AOL user: many queries for businesses and services in Lilburn, GA (population 11K), and a number of queries for different persons with the last name Arnold. Lilburn has 14 people with the last name Arnold. The New York Times contacted them and found that the AOL user is Thelma Arnold.
14 Privacy Breach: The DNA case Time: reported in 2005. Curator: A sperm bank. Data released: A sperm donor's date of birth and birthplace. Result: The donor's offspring (a boy) was able to identify his biological father. How? By exploiting the information on the Y-chromosome.
15 Y-chromosome vs. Surname (Diagram: family trees in which the Y chromosome is passed from father to son along with the surname.) There is a strong correlation between Y-chromosomes and surnames.
16 Y-chromosome vs. Surname The 15-year-old boy purchased the service from a company that had collected DNA samples from 45,000 individuals and provided a Y-chromosome matching service. Result: There were two (relatively) close matches, with almost identical surnames. The boy thus learned the possible surnames of his biological father.
17 Privacy Breach: The DNA case The boy knew the possible last names of his biological father, as well as his date of birth and birthplace. He paid another company to retrieve the names of persons born in that place on that date. Only one person had a matching surname. Summary: The sperm bank indirectly revealed the identity of the donor by disclosing his date of birth and birthplace.
18 Lessons Learned Any information released by the data curator can potentially be exploited by the adversary In the MGIC case: genders, birth dates, ZIP codes In the AOL case: keywords in search queries In the DNA case: date of birth, birthplace Solution? Do not release the exact information from the original data
19 Privacy Preserving Data Publishing (Diagram: contributors --data--> curator --modified data--> recipient.) Publish a modified version of the data, such that the contributors' privacy is adequately protected, and the published data is useful for its intended purpose (at least to some degree).
20 Privacy Preserving Data Publishing Two issues. Privacy principle: what do we mean by adequately protected privacy? Modification method: how should we modify the data to ensure privacy while maximizing utility?
21 Existing Solutions The earliest solutions date back to the 1960s. Solutions before 2000: mostly without a formal privacy model, evaluating privacy based on empirical studies only. This talk focuses on solutions with formal privacy models (developed after 2000): k-anonymity, l-diversity, differential privacy.
22 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity [Sweeney 2002] l-diversity differential privacy Conclusions and open problems
23 k-anonymity: Example Suppose that we want to publish the medical records below Name Age ZIP Disease Andy flu Bob dyspepsia Cathy pneumonia Diane gastritis
24 k-anonymity: Example Suppose that we want to publish the medical records below. We know that eliminating names is not enough, because an adversary may identify patients by Age and ZIP. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease flu dyspepsia pneumonia gastritis (medical records)
25 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients. How? Make Age and ZIP less specific in the medical records. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease flu dyspepsia pneumonia gastritis (medical records)
26 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease flu dyspepsia pneumonia gastritis (medical records)
27 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients. Generalization: Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (medical records)
28 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table)
29 k-anonymity: Example k-anonymity [Sweeney 2002] requires that each (Age, ZIP) combination can be matched to at least k patients. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table)
30 k-anonymity: General Approach Identify the attributes that the adversary may know, referred to as quasi-identifiers (QI). Divide the tuples in the table into groups of size at least k. Generalize the QI values of each group to make them identical. Group 1, group 2: Age ZIP Disease flu dyspepsia pneumonia gastritis (medical records; Age and ZIP are the QI)
31 k-anonymity: General Approach Identify the attributes that the adversary may know, referred to as quasi-identifiers (QI). Divide the tuples in the table into groups of size at least k. Generalize the QI values of each group to make them identical. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table)
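The grouping-and-generalization recipe above can be sketched in a few lines of Python. The ages and ZIP codes below are illustrative stand-ins (the slide's concrete values are not reproduced in this transcription), and the naive fixed-size grouping, which assumes the table size is a multiple of k, is only one of many possible partitioning strategies:

```python
def k_anonymize(records, k):
    """Toy k-anonymizer (assumes len(records) is a multiple of k):
    sort by the quasi-identifiers, cut into groups of size k, and
    replace each group's QI values with the group's min-max range."""
    rows = sorted(records, key=lambda r: (r["Age"], r["ZIP"]))
    anonymized = []
    for i in range(0, len(rows), k):
        group = rows[i:i + k]
        age_lo, age_hi = group[0]["Age"], group[-1]["Age"]
        zips = [r["ZIP"] for r in group]
        for r in group:
            anonymized.append({
                "Age": (age_lo, age_hi),
                "ZIP": (min(zips), max(zips)),
                "Disease": r["Disease"],
            })
    return anonymized

records = [
    {"Age": 23, "ZIP": 12000, "Disease": "flu"},
    {"Age": 27, "ZIP": 14000, "Disease": "dyspepsia"},
    {"Age": 42, "ZIP": 33000, "Disease": "pneumonia"},
    {"Age": 48, "ZIP": 35000, "Disease": "gastritis"},
]
for row in k_anonymize(records, k=2):
    print(row)
```

Each printed row now carries a range for Age and ZIP shared by at least k=2 records, mirroring the 2-anonymous table on the slide.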
32 k-anonymity: Algorithms Numerous algorithms for k-anonymity have been proposed. Objective: achieve k-anonymity with the least amount of generalization. This line of research became obsolete. Reason: k-anonymity was found to be vulnerable [Machanavajjhala et al. 2006]. QI Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table)
33 k-anonymity: Vulnerability k-anonymity requires that each combination of quasi-identifiers (QI) is hidden in a group of size at least k, but it says nothing about the remaining attributes. Result: Disclosure of sensitive attributes is possible. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table; Disease is the sensitive attribute)
34 k-anonymity: Vulnerability k-anonymity requires that each combination of quasi-identifiers (QI) is hidden in a group of size at least k, but it says nothing about the remaining attributes. Result: Disclosure of sensitive attributes is possible. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] dyspepsia [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table; Disease is the sensitive attribute)
35 k-anonymity: Vulnerability Intuition: Hiding in a group of k is not sufficient; the group should have a diverse set of sensitive values. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] dyspepsia [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-anonymous table; Disease is the sensitive attribute)
36 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity [Machanavajjhala et al. 2006] differential privacy Conclusions and open problems
37 l-diversity [Machanavajjhala et al. 2006] Approach: (similar to k-anonymity) Divide tuples into groups, and make the QI of each group identical Requirement: (different from k-anonymity) Each group has at least l well-represented sensitive values Several definitions of well-represented exist Simplest one: in each group, no sensitive value is associated with more than 1/l of the tuples Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis 2-diverse table
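The simplest criterion above is mechanical to check. A small sketch, with group contents mirroring the example tables (each inner list holds one QI group's sensitive values):

```python
from collections import Counter

def is_l_diverse(groups, l):
    """Simplest l-diversity check: within every QI group, no single
    sensitive value is carried by more than 1/l of the tuples."""
    for group in groups:
        counts = Counter(group)
        if max(counts.values()) > len(group) / l:
            return False
    return True

diverse = [["flu", "dyspepsia"], ["pneumonia", "gastritis"]]
skewed = [["dyspepsia", "dyspepsia"], ["pneumonia", "gastritis"]]
print(is_l_diverse(diverse, l=2))  # True
print(is_l_diverse(skewed, l=2))   # False: one group is all dyspepsia
```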
38 l-diversity [Machanavajjhala et al. 2006] Rationale: The 1/l association in the generalized table leads to 1/l confidence for the adversary. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-diverse table)
39 l-diversity: Follow-up Research Algorithms: achieve l-diversity with the least amount of generalization. Patches: identify vulnerabilities of l-diversity, and propose an improved privacy notion. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] flu [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-diverse table)
40 l-diversity: Vulnerability Suppose that the adversary wants to find out the disease of Bob. The adversary knows that Bob is unlikely to have breast cancer, so he knows that Bob is likely to have dyspepsia. Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] breast cancer [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-diverse table)
41 l-diversity: Vulnerability Intuition: It is not sufficient to impose constraints on the diversity of sensitive values in each group; we also need to take into account the adversary's background knowledge (e.g., males are unlikely to have breast cancer). Name Age ZIP Andy Bob Cathy Diane (adversary's knowledge) Age ZIP Disease [20,30] [10000,20000] breast cancer [20,30] [10000,20000] dyspepsia [40,50] [30000,40000] pneumonia [40,50] [30000,40000] gastritis (2-diverse table)
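The effect of background knowledge can be quantified with a one-line Bayes update. The 0.001 likelihood below is an illustrative number, not a medical statistic:

```python
# Bob sits in a 2-diverse group whose sensitive values are
# {breast cancer, dyspepsia}, so without background knowledge the
# adversary's confidence in either disease is 1/2. Adding the prior
# belief that a male has breast cancer with probability ~0.001
# (illustrative) shifts almost all posterior mass onto dyspepsia.
prior = {"breast cancer": 0.5, "dyspepsia": 0.5}
likelihood = {"breast cancer": 0.001, "dyspepsia": 1.0}  # P(plausible | Bob is male)

unnorm = {d: prior[d] * likelihood[d] for d in prior}
total = sum(unnorm.values())
posterior = {d: p / total for d, p in unnorm.items()}
print(posterior)  # dyspepsia gets ~0.999 of the mass
```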
42 l-diversity: Vulnerability Intuition: It is not sufficient to impose constraints on the diversity of sensitive values in each group; we also need to take into account the adversary's background knowledge (e.g., males are unlikely to have breast cancer). Follow-up research on three issues: How to express the background knowledge of the adversary? How to derive background knowledge? How to generalize data to protect privacy against background knowledge? This led to paper after paper.
43 Algorithm-Based Attacks Algorithms designed for l-diversity (and its improvements) are often vulnerable to algorithm-based attacks. Intuition: Those algorithms always try to use the least amount of generalization to achieve l-diversity, so an adversary can exploit the characteristics of the algorithms to reverse-engineer the generalized tables.
44 l-diversity: Summary l-diversity and its follow-up approaches address the weakness of k-anonymity and tackle much more advanced adversary models. Problems with this line of research: it does not converge to a final adversary model, and most proposed methods are vulnerable to algorithm-based attacks.
45 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy [Dwork 2006] Conclusions and open problems
46 Differential Privacy [Dwork 2006] A privacy principle proposed by theoreticians. More difficult to understand than k-anonymity and l-diversity. It became well-adopted because its privacy model is generally considered strong enough, and its definition naturally takes into account algorithm-based attacks.
47 Differential Privacy: Intuition Suppose that we have a dataset D that contains the medical record of every individual in Australia, and suppose that Alice is in the dataset. Intuitively, is it OK to publish the following information? (a) Whether Alice has diabetes. (b) The total number of diabetes patients in D. Why is it OK to publish the latter but not the former? Intuition: The former completely depends on Alice; the latter does not depend much on Alice.
48 Differential Privacy: Intuition In general, we should only publish information that does not highly depend on any particular individual This motivates the definition of differential privacy
49 Differential Privacy: Definition (Diagram: data --> randomized algorithm A --> modified data --> recipient.) Neighboring datasets: two datasets D and D', such that D' can be obtained by changing one single tuple in D. A randomized algorithm A satisfies ε-differential privacy, iff for any two neighboring datasets D and D' and for any output O of A, Pr[A(D) = O] <= exp(ε) · Pr[A(D') = O]. Rationale: The output of the algorithm does not highly depend on any particular tuple in the input.
50 Differential Privacy: Definition Illustration of ε-differential privacy: exp(-ε) <= Pr[A(D) = O] / Pr[A(D') = O] <= exp(ε).
51 Comparison with k-anonymity and l-diversity Differential privacy does not directly model the adversary's knowledge, but its privacy protection is generally considered strong enough.
52 Comparison with k-anonymity and l-diversity Differential privacy is more general: there is no restriction on the type of the output O. It can be a table, a set of frequent itemsets, a regression model, etc.
53 The Differential Privacy Landscape This leads to a lot of research interests from various communities Database and data mining (SIGMOD, VLDB, ICDE, KDD, ) Security (CCS, ) Machine learning (NIPS, ICML, ) Systems (SIGCOMM, OSDI, ) Theory (STOC, FOCS, )
54 Outline Privacy preserving data publishing: What and Why Examples of privacy attacks Existing solutions k-anonymity l-diversity differential privacy General approaches for achieving differential privacy Research issues Conclusions and open problems
55 Achieving Differential Privacy: Example Suppose that we have a set D of medical records We want to release the number of diabetes patients in D (say, 1000) How to do it in a differentially private manner?
56 Achieving Differential Privacy: Example Naïve solution: Release 1000 directly. But this violates differential privacy: a deterministic release gives Pr[A(D) = 1000] = 1 but Pr[A(D') = 1000] = 0 for a neighboring dataset D' with a different count, so the bound Pr[A(D) = O] <= exp(ε) · Pr[A(D') = O] does not hold. Better solution: Add noise to 1000 before releasing it. Intuition: the noise could achieve the requirements of differential privacy. Question: what kind of noise should we add?
57 Laplace Distribution Density: f(x) = exp(-|x - μ| / b) / (2b). When x increases or decreases by 1, f(x) changes by a factor of exp(1/b). Variance: 2b². b is referred to as the scale.
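The "changes by a factor of exp(1/b)" property is exactly what makes Laplace noise fit the multiplicative guarantee of differential privacy. A quick numerical check of the density formula:

```python
import math

def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of the Laplace distribution with mean mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

# Shifting x by 1 (on one side of the mean) changes the density by
# exactly a factor of exp(1/b); in general the factor is at most exp(1/b).
b = 2.0
ratio = laplace_pdf(3.0, b=b) / laplace_pdf(4.0, b=b)
print(ratio, math.exp(1 / b))  # the two values agree
```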
58 Achieving Differential Privacy: Example Add Laplace noise to the number of diabetes patients in D before releasing it. Changing one tuple in D shifts the mean of the Laplace distribution by 1. The two output distributions therefore have a bounded ratio, Pr[A(D) = O] / Pr[A(D') = O] <= exp(ε), so differential privacy is satisfied. (Figure: two overlapping Laplace densities centered at neighboring values of the number of diabetes patients.)
59 The Laplace Mechanism In general, if we want to release a set of values (e.g., counts) from a dataset, we add i.i.d. Laplace noise to each value to achieve differential privacy. This general approach is called the Laplace mechanism. Figuring out the correct amount of noise to use can itself be a research issue.
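A stdlib-only sketch of the Laplace mechanism for a single count (sensitivity 1, so scale 1/ε), using the fact that a Laplace variate is the difference of two independent exponential variates:

```python
import random

def laplace_noise(scale):
    """Laplace(0, scale) sample: difference of two Exponential(1) draws, scaled."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def noisy_count(true_count, epsilon):
    """Laplace mechanism for one count: sensitivity 1, so add Lap(1/epsilon)."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)
print(noisy_count(1000, epsilon=0.1))  # 1000 plus noise with std ~14
```

Smaller ε means stronger privacy and larger noise: the scale 1/ε, and hence the standard deviation sqrt(2)/ε, grows as ε shrinks.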
60 Histogram Suppose that we want to release a histogram on Age from a dataset D. Using the Laplace mechanism, we add i.i.d. Laplace noise to the histogram counts. How much noise? Previous example: Lap(λ) noise leads to (1/λ)-differential privacy. This case: Lap(2λ) noise leads to (1/λ)-differential privacy. Rationale: Changing one tuple may change two counts simultaneously in the histogram, so we need twice the noise to conceal such changes.
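A sketch of the histogram case, again with stdlib Laplace sampling. The scale is doubled relative to the single-count case because one tuple can touch two buckets; this assumes the "change one tuple" notion of neighboring datasets used earlier, and the bucket counts are illustrative:

```python
import random

def laplace_noise(scale):
    """Laplace(0, scale) sample: difference of two Exponential(1) draws, scaled."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def noisy_histogram(counts, epsilon):
    """Changing one tuple can move a person from one bucket to another,
    altering two counts by 1 each; sensitivity 2, so add Lap(2/epsilon)."""
    scale = 2.0 / epsilon
    return [c + laplace_noise(scale) for c in counts]

random.seed(1)
print(noisy_histogram([120, 340, 95, 60], epsilon=0.5))
```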
61 Histogram In general, if the values to be published are obtained from a complex task (e.g., regression), it can be much more challenging to derive the correct amount of noise to use.
62 Optimization of Accuracy A more common research issue: choosing a good strategy to publish data. Example: Histogram vs. histogram + binary tree. The latter is good for range queries: a range query is answered by taking the sum of O(log n) noisy counts, whereas the former may require summing O(n) noisy counts. Note: the latter requires log n times the noise required by the former, but it pays off.
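The histogram-plus-binary-tree strategy can be sketched as a segment tree of noisy subtotals. The per-node scale below follows the standard analysis (each tuple touches one node per level, so the sensitivity of the whole tree is the number of levels), and the power-of-two leaf count is an assumption made for brevity:

```python
import math
import random

def laplace_noise(scale):
    """Laplace(0, scale) sample: difference of two Exponential(1) draws, scaled."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def build_tree(counts, epsilon):
    """Noisy binary tree over counts (length assumed to be a power of two).
    A tuple affects one node per level, so the sensitivity equals the number
    of levels and every node gets Lap(levels / epsilon) noise."""
    levels = int(math.log2(len(counts))) + 1
    scale = levels / epsilon
    tree = [list(counts)]                    # level 0: the leaves
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return [[c + laplace_noise(scale) for c in level] for level in tree]

def range_query(tree, lo, hi):
    """Sum over leaves [lo, hi) using O(log n) maximal tree nodes."""
    total, level = 0.0, 0
    while lo < hi:
        if lo % 2 == 1:
            total += tree[level][lo]
            lo += 1
        if hi % 2 == 1:
            hi -= 1
            total += tree[level][hi]
        lo //= 2
        hi //= 2
        level += 1
    return total

random.seed(7)
counts = [5, 8, 2, 9, 4, 7, 1, 6]           # per-bucket counts
tree = build_tree(counts, epsilon=100.0)     # large epsilon: noise is tiny
print(range_query(tree, 1, 6))               # true answer is 8+2+9+4+7 = 30
```

The query over 5 buckets touches only 3 tree nodes here, which is where the accuracy gain over summing 5 independently noised leaf counts comes from.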
63 Choosing a Good Strategy In general, choosing a good strategy requires exploiting the characteristics of the input data the output results the way that users may use the output results Most differential privacy papers focus on this issue
64 Other Research Issues General approaches beyond the Laplace mechanism E.g., the exponential mechanism [McSherry et al. 2007], which is suitable for problems with non-numeric outputs
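A minimal sketch of the exponential mechanism, assuming its standard form, which samples output r with probability proportional to exp(ε·u(r)/(2Δu)) for a utility function u with sensitivity Δu; the disease-frequency task is an illustrative toy:

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Pick a (possibly non-numeric) output: sample candidate r with
    probability proportional to exp(epsilon * u(r) / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility(r) / (2 * sensitivity)) for r in candidates]
    x = random.random() * sum(weights)
    for r, w in zip(candidates, weights):
        x -= w
        if x <= 0:
            return r
    return candidates[-1]

# Toy task: privately report the most common disease in a table.
data = ["flu"] * 50 + ["dyspepsia"] * 30 + ["gastritis"] * 20
random.seed(3)
choice = exponential_mechanism(
    candidates=["flu", "dyspepsia", "gastritis"],
    utility=data.count,            # utility = frequency; sensitivity 1
    epsilon=1.0,
)
print(choice)  # almost surely "flu", the highest-utility candidate
```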
65 Conclusion An overview of existing solutions for privacy preserving data publishing k-anonymity l-diversity differential privacy General methods for differential privacy, and related research issues
66 Open Problems Differentially private algorithms for complex data/tasks, e.g., graph queries, principal component analysis (PCA), trajectories. Main challenge: Difficult to identify a good strategy.
67 Open Problems Differential privacy might be too strong. It requires that changing one tuple should not bring much change to the published result. Alternative interpretation: Even if an adversary knows n - 1 tuples in the input data, he won't be able to infer information about the remaining tuple. But an adversary knowing n - 1 individuals is often impossible in practice. How should we relax differential privacy?
68 Open Problems How to choose an appropriate ε for ε-differential privacy? We need a way to quantify the cost of privacy and the gain of utility in releasing data.
69 Open Problems What do we do to protect genome data? Challenges The data is highly complex The queries are highly complex Definition of privacy is unclear
More informationPrivacy Preserving Machine Learning: A Theoretically Sound App
Privacy Preserving Machine Learning: A Theoretically Sound Approach Outline 1 2 3 4 5 6 Privacy Leakage Events AOL search data leak: New York Times journalist was able to identify users from the anonymous
More informationCS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong
CS573 Data Privacy and Security Differential Privacy tabular data and range queries Li Xiong Outline Tabular data and histogram/range queries Algorithms for low dimensional data Algorithms for high dimensional
More informationwith BLENDER: Enabling Local Search a Hybrid Differential Privacy Model
BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model Brendan Avent 1, Aleksandra Korolova 1, David Zeber 2, Torgeir Hovden 2, Benjamin Livshits 3 University of Southern California 1
More informationImplementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1105
More informationComposition Attacks and Auxiliary Information in Data Privacy
Composition Attacks and Auxiliary Information in Data Privacy Srivatsava Ranjit Ganta Pennsylvania State University University Park, PA 1682 ranjit@cse.psu.edu Shiva Prasad Kasiviswanathan Pennsylvania
More informationL-Diversity Algorithm for Incremental Data Release
Appl. ath. Inf. Sci. 7, No. 5, 2055-2060 (203) 2055 Applied athematics & Information Sciences An International Journal http://dx.doi.org/0.2785/amis/070546 L-Diversity Algorithm for Incremental Data Release
More informationSIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER
31 st July 216. Vol.89. No.2 25-216 JATIT & LLS. All rights reserved. SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 1 AMANI MAHAGOUB OMER, 2 MOHD MURTADHA BIN MOHAMAD 1 Faculty of Computing,
More informationMaintaining K-Anonymity against Incremental Updates
Maintaining K-Anonymity against Incremental Updates Jian Pei Jian Xu Zhibin Wang Wei Wang Ke Wang Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca Fudan University, China, {xujian, 55, weiwang}@fudan.edu.cn
More informationPreserving Privacy during Big Data Publishing using K-Anonymity Model A Survey
ISSN No. 0976-5697 Volume 8, No. 5, May-June 2017 International Journal of Advanced Research in Computer Science SURVEY REPORT Available Online at www.ijarcs.info Preserving Privacy during Big Data Publishing
More informationBOOLEAN MATRIX FACTORIZATIONS. with applications in data mining Pauli Miettinen
BOOLEAN MATRIX FACTORIZATIONS with applications in data mining Pauli Miettinen MATRIX FACTORIZATIONS BOOLEAN MATRIX FACTORIZATIONS o THE BOOLEAN MATRIX PRODUCT As normal matrix product, but with addition
More informationSlicing Technique For Privacy Preserving Data Publishing
Slicing Technique For Privacy Preserving Data Publishing D. Mohanapriya #1, Dr. T.Meyyappan M.Sc., MBA. M.Phil., Ph.d., 2 # Department of Computer Science and Engineering, Alagappa University, Karaikudi,
More informationAn Iterative Approach to Examining the Effectiveness of Data Sanitization
An Iterative Approach to Examining the Effectiveness of Data Sanitization By ANHAD PREET SINGH B.Tech. (Punjabi University) 2007 M.S. (University of California, Davis) 2012 DISSERTATION Submitted in partial
More informationAn Efficient Clustering Method for k-anonymization
An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management
More informationAlgorithmic Approaches to Preventing Overfitting in Adaptive Data Analysis. Part 1 Aaron Roth
Algorithmic Approaches to Preventing Overfitting in Adaptive Data Analysis Part 1 Aaron Roth The 2015 ImageNet competition An image classification competition during a heated war for deep learning talent
More informationDifferential Privacy. Cynthia Dwork. Mamadou H. Diallo
Differential Privacy Cynthia Dwork Mamadou H. Diallo 1 Focus Overview Privacy preservation in statistical databases Goal: to enable the user to learn properties of the population as a whole, while protecting
More informationMichelle Hayes Mary Joel Holin. Michael Roanhouse Julie Hovden. Special Thanks To. Disclaimer
Further Understanding the Intersection of Technology and Privacy to Ensure and Protect Client Data Special Thanks To Michelle Hayes Mary Joel Holin We can provably know where domestic violence shelter
More informationOptimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching
Optimal k-anonymity with Flexible Generalization Schemes through Bottom-up Searching Tiancheng Li Ninghui Li CERIAS and Department of Computer Science, Purdue University 250 N. University Street, West
More informationCrowd-Blending Privacy
Crowd-Blending Privacy Johannes Gehrke, Michael Hay, Edward Lui, and Rafael Pass Department of Computer Science, Cornell University {johannes,mhay,luied,rafael}@cs.cornell.edu Abstract. We introduce a
More informationDifferentially Private H-Tree
GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern
More informationClustering-based Multidimensional Sequence Data Anonymization
Clustering-based Multidimensional Sequence Data Anonymization Morvarid Sehatar University of Ottawa Ottawa, ON, Canada msehatar@uottawa.ca Stan Matwin 1 Dalhousie University Halifax, NS, Canada 2 Institute
More informationPrajapati Het I., Patel Shivani M., Prof. Ketan J. Sarvakar IT Department, U. V. Patel college of Engineering Ganapat University, Gujarat
Security and Privacy with Perturbation Based Encryption Technique in Big Data Prajapati Het I., Patel Shivani M., Prof. Ketan J. Sarvakar IT Department, U. V. Patel college of Engineering Ganapat University,
More informationAnonymizing Sequential Releases
Anonymizing Sequential Releases Ke Wang School of Computing Science Simon Fraser University Canada V5A 1S6 wangk@cs.sfu.ca Benjamin C. M. Fung School of Computing Science Simon Fraser University Canada
More informationApproaches to distributed privacy protecting data mining
Approaches to distributed privacy protecting data mining Bartosz Przydatek CMU Approaches to distributed privacy protecting data mining p.1/11 Introduction Data Mining and Privacy Protection conflicting
More informationPrivacy-Preserving Data Publishing: A Survey of Recent Developments
Privacy-Preserving Data Publishing: A Survey of Recent Developments BENJAMIN C. M. FUNG Concordia University, Montreal KE WANG Simon Fraser University, Burnaby RUI CHEN Concordia University, Montreal 14
More informationService-Oriented Architecture for Privacy-Preserving Data Mashup
Service-Oriented Architecture for Privacy-Preserving Data Mashup Thomas Trojer a Benjamin C. M. Fung b Patrick C. K. Hung c a Quality Engineering, Institute of Computer Science, University of Innsbruck,
More informationPRACTICAL K-ANONYMITY ON LARGE DATASETS. Benjamin Podgursky. Thesis. Submitted to the Faculty of the. Graduate School of Vanderbilt University
PRACTICAL K-ANONYMITY ON LARGE DATASETS By Benjamin Podgursky Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of
More informationPrivacy Preserved Data Publishing Techniques for Tabular Data
Privacy Preserved Data Publishing Techniques for Tabular Data Keerthy C. College of Engineering Trivandrum Sabitha S. College of Engineering Trivandrum ABSTRACT Almost all countries have imposed strict
More informationA Case Study: Privacy Preserving Release of Spa9o- temporal Density in Paris
A Case Study: Privacy Preserving Release of Spa9o- temporal Density in Paris Gergely Acs (INRIA) gergely.acs@inria.fr!! Claude Castelluccia (INRIA) claude.castelluccia@inria.fr! Outline 2! Dataset descrip9on!
More informationMaintaining K-Anonymity against Incremental Updates
Maintaining K-Anonymity against Incremental Updates Jian Pei 1 Jian Xu 2 Zhibin Wang 2 Wei Wang 2 Ke Wang 1 1 Simon Fraser University, Canada, {jpei, wang}@cs.sfu.ca 2 Fudan University, China, {xujian,
More informationPrivate Database Synthesis for Outsourced System Evaluation
Private Database Synthesis for Outsourced System Evaluation Vani Gupta 1, Gerome Miklau 1, and Neoklis Polyzotis 2 1 Dept. of Computer Science, University of Massachusetts, Amherst, MA, USA 2 Dept. of
More informationPrivacy-Enhancing Technologies & Applications to ehealth. Dr. Anja Lehmann IBM Research Zurich
Privacy-Enhancing Technologies & Applications to ehealth Dr. Anja Lehmann IBM Research Zurich IBM Research Zurich IBM Research founded in 1945 employees: 3,000 12 research labs on six continents IBM Research
More informationDistributed Data Anonymization with Hiding Sensitive Node Labels
Distributed Data Anonymization with Hiding Sensitive Node Labels C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan University,Trichy
More informationPrivacy Challenges in Big Data and Industry 4.0
Privacy Challenges in Big Data and Industry 4.0 Jiannong Cao Internet & Mobile Computing Lab Department of Computing Hong Kong Polytechnic University Email: csjcao@comp.polyu.edu.hk http://www.comp.polyu.edu.hk/~csjcao/
More informationComparative Analysis of Anonymization Techniques
International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 8 (2014), pp. 773-778 International Research Publication House http://www.irphouse.com Comparative Analysis
More informationVPriv: Protecting Privacy in Location- Based Vehicular Services
VPriv: Protecting Privacy in Location- Based Vehicular Services Raluca Ada Popa and Hari Balakrishnan Computer Science and Artificial Intelligence Laboratory, M.I.T. Andrew Blumberg Department of Mathematics
More informationAmbiguity: Hide the Presence of Individuals and Their Privacy with Low Information Loss
: Hide the Presence of Individuals and Their Privacy with Low Information Loss Hui (Wendy) Wang Department of Computer Science Stevens Institute of Technology Hoboken, NJ, USA hwang@cs.stevens.edu Abstract
More informationPrivacy, Security & Ethical Issues
Privacy, Security & Ethical Issues How do we mine data when we can t even look at it? 2 Individual Privacy Nobody should know more about any entity after the data mining than they did before Approaches:
More informationDistributed Private Data Collection at Scale
Distributed Private Data Collection at Scale Graham Cormode g.cormode@warwick.ac.uk Tejas Kulkarni (Warwick) Divesh Srivastava (AT&T) 1 Big data, big problem? The big data meme has taken root Organizations
More informationPreserving Privacy in High-Dimensional Data Publishing
Preserving Privacy in High-Dimensional Data Publishing Khalil Al-Hussaeni A Thesis in The Department of Electrical and Computer Engineering Presented in Partial Fulfillment of the Requirements for the
More informationFMC: An Approach for Privacy Preserving OLAP
FMC: An Approach for Privacy Preserving OLAP Ming Hua, Shouzhi Zhang, Wei Wang, Haofeng Zhou, Baile Shi Fudan University, China {minghua, shouzhi_zhang, weiwang, haofzhou, bshi}@fudan.edu.cn Abstract.
More informationAlpha Anonymization in Social Networks using the Lossy-Join Approach
TRANSACTIONS ON DATA PRIVACY 11 (2018) 1 22 Alpha Anonymization in Social Networks using the Lossy-Join Kiran Baktha*, B K Tripathy** * Department of Electronics and Communication Engineering, VIT University,
More informationDistributed Data Mining with Differential Privacy
Distributed Data Mining with Differential Privacy Ning Zhang, Ming Li, Wenjing Lou Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, MA Email: {ning, mingli}@wpi.edu,
More informationAdding Differential Privacy in an Open Board Discussion Board System
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 5-26-2017 Adding Differential Privacy in an Open Board Discussion Board System Pragya Rana San
More informationResearch Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM
Research Paper SECURED UTILITY ENHANCEMENT IN MINING USING GENETIC ALGORITHM 1 Dr.G.Kirubhakar and 2 Dr.C.Venkatesh Address for Correspondence 1 Department of Computer Science and Engineering, Surya Engineering
More informationDifferentially-Private Network Trace Analysis. Frank McSherry and Ratul Mahajan Microsoft Research
Differentially-Private Network Trace Analysis Frank McSherry and Ratul Mahajan Microsoft Research Overview. 1 Overview Question: Is it possible to conduct network trace analyses in a way that provides
More informationSecure Multi-party Computation Protocols For Collaborative Data Publishing With m-privacy
Secure Multi-party Computation Protocols For Collaborative Data Publishing With m-privacy K. Prathyusha 1 M.Tech Student, CSE, KMMITS, JNTU-A, TIRUPATHI,AP Sakshi Siva Ramakrishna 2 Assistant proffesor,
More informationAttacks on Privacy and definetti s Theorem
Attacks on Privacy and definetti s Theorem Daniel Kifer Penn State University ABSTRACT In this paper we present a method for reasoning about privacy using the concepts of exchangeability and definetti
More informationIncognito: Efficient Full Domain K Anonymity
Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)
More informationPrivately Solving Linear Programs
Privately Solving Linear Programs Justin Hsu 1 Aaron Roth 1 Tim Roughgarden 2 Jonathan Ullman 3 1 University of Pennsylvania 2 Stanford University 3 Harvard University July 8th, 2014 A motivating example
More informationPartition Based Perturbation for Privacy Preserving Distributed Data Mining
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation
More informationA FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING
A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING 1 B.KARTHIKEYAN, 2 G.MANIKANDAN, 3 V.VAITHIYANATHAN 1 Assistant Professor, School of Computing, SASTRA University, TamilNadu, India. 2 Assistant
More information