Differential Privacy. Seminar: Robust Data Mining Techniques. Thomas Edlich. July 16, 2017


Differential Privacy. Seminar: Robust Data Mining Techniques. Thomas Edlich, Technische Universität München, Department of Informatics, kdd.in.tum.de. July 16, 2017

Outline:
1. Introduction
2. Definition and Features of Differential Privacy
3. Techniques
4. Practical Issues and Limitations
5. Differential Privacy in Machine Learning & Data Mining

Introduction

Privacy Protection: Anonymization. Removal of identifying attributes such as names or social security numbers. Often considered enough to protect privacy. However, consider the Netflix Prize dataset [9]:

    Netflix Ratings           IMDb Ratings
    Movie  A  B  C            Movie  A   B  C
    1      5  3  1            1      9   5  2
    2      1  5  3            2      1  10  6
    3      3  1  5            3      4   2  9

Linkage attack: matching the anonymized Netflix rating vectors against public IMDb profiles re-identifies users A, B, and C. Likewise, medical records have been re-identified using publicly available voting records [12].

Privacy Protection. What is privacy protection? "Nothing about an individual should be learnable from the database that cannot be learned without access to the database." [2] This has been proven to be impossible whenever the privacy mechanism is useful; the reason is auxiliary information. [3]

Definition and Features of Differential Privacy

Differential Privacy. A mathematical definition of privacy which bounds the privacy risk for a participant in a database. It makes it possible to learn properties of a population while protecting the privacy of individuals.

Formalizing Differential Privacy [11]

ε-Differential Privacy: An algorithm $A_{\mathrm{priv}}$ with $A_{\mathrm{priv}}(D) \in T$ provides ε-differential privacy if

$$\Pr[A_{\mathrm{priv}}(D) \in S] \le e^{\epsilon} \, \Pr[A_{\mathrm{priv}}(D') \in S] \quad (1)$$

for all $S \subseteq T$ and all datasets $D, D'$ differing in only a single entry.

(ε, δ)-Differential Privacy: An algorithm $A_{\mathrm{priv}}$ with $A_{\mathrm{priv}}(D) \in T$ provides (ε, δ)-differential privacy if

$$\Pr[A_{\mathrm{priv}}(D) \in S] \le e^{\epsilon} \, \Pr[A_{\mathrm{priv}}(D') \in S] + \delta \quad (2)$$

for all $S \subseteq T$ and all datasets $D, D'$ differing in only a single entry.

Resilience to arbitrary auxiliary information [4]. Differential privacy provides plausible deniability to each participant, since the same outcome could have been produced from a dataset without them. The definition is independent of available side information; moreover, differential privacy holds regardless of what auxiliary information is available now or becomes available in the future.

Postprocessing [4]. Let $A$ be an (ε, δ)-differentially private mechanism and $f$ an arbitrary mapping. Then the composition $f \circ A$ is (ε, δ)-differentially private.

Composition [11]. Let $A^1_{\mathrm{priv}}$ and $A^2_{\mathrm{priv}}$ be algorithms with privacy guarantees $\epsilon_1$ and $\epsilon_2$. Then applying both algorithms to the same data incurs a privacy risk of at most $\epsilon_1 + \epsilon_2$.
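
For intuition, here is a minimal sketch for independently randomized mechanisms and a product event $S_1 \times S_2$ (the general statement needs a slightly more careful argument): the worst-case likelihood ratios multiply, so the privacy exponents add.

$$\frac{\Pr[(A^1_{\mathrm{priv}}(D),\, A^2_{\mathrm{priv}}(D)) \in S_1 \times S_2]}{\Pr[(A^1_{\mathrm{priv}}(D'),\, A^2_{\mathrm{priv}}(D')) \in S_1 \times S_2]} \le e^{\epsilon_1} \cdot e^{\epsilon_2} = e^{\epsilon_1 + \epsilon_2}$$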

Techniques

Techniques. Approaches: Input Perturbation, Output Perturbation, Algorithm Perturbation.

Input Perturbation [11]. Add noise directly to the database D. The perturbed dataset can then be published and guarantees differential privacy for any subsequent algorithm. Example: randomized response.

Input Perturbation [4][13]: Randomized Response. Question: "Have you ever committed a crime?" Randomization process:
1. Flip a coin.
2. If tails: answer truthfully.
3. If heads: flip again; on tails say "no", on heads say "yes".
This gives plausible deniability to the individual, yet the true distribution can still be estimated. With $p$ the fraction of people who have committed a crime and $y$ the fraction of people who said "yes":

$$E(y) = 0.5\,p + 0.25\,p + 0.25\,(1 - p) = 0.5\,p + 0.25 \;\Rightarrow\; \hat{p} = 2\,y - 0.5$$
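
A minimal Python simulation of this estimator (the sample size, true proportion, and random seed are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(truth: bool) -> bool:
    # First flip: tails (prob 1/2) -> answer truthfully.
    if rng.random() < 0.5:
        return truth
    # Heads: second flip decides the answer, "yes"/"no" with prob 1/2 each.
    return rng.random() < 0.5

n, p = 100_000, 0.30                      # assumed population size and true p
answers = [randomized_response(rng.random() < p) for _ in range(n)]

y = np.mean(answers)                      # observed fraction of "yes" answers
p_hat = 2 * y - 0.5                       # invert E(y) = 0.5*p + 0.25
print(f"true p = {p:.2f}, estimate = {p_hat:.3f}")
```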

Input Perturbation: Pros & Cons. Pro: results can be reproduced; privacy does not depend on a specific algorithm. Contra: determining the amount of noise needed, and therefore ε, is not trivial; privacy guarantees may be worse than for algorithm-specific techniques.

Output Perturbation [4][7][11]. Add noise to the results of $A_{\mathrm{nonpriv}}$. Only publish the perturbed results; destroy the original data.

Output Perturbation. $\ell_1$-Sensitivity: the maximum difference of the function over all pairs of databases $D$ and $D'$ differing in a single record,

$$S(A) = \max_{D, D'} \lVert A(D) - A(D') \rVert_1 \quad (3)$$

Laplace Mechanism: given an algorithm $A_{\mathrm{nonpriv}}: \mathcal{D} \to \mathbb{R}^k$, the Laplace mechanism adds Laplacian noise to the result of $A_{\mathrm{nonpriv}}$ [7]:

$$A_{\mathrm{priv}}(x, \epsilon) = A_{\mathrm{nonpriv}}(x) + (Z_1, \ldots, Z_k) \quad (4)$$

where the $Z_i$ are i.i.d. random variables with $Z_i \sim \mathrm{Lap}\!\left(\frac{S(A_{\mathrm{nonpriv}})}{\epsilon}\right)$.
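
A minimal sketch of the Laplace mechanism for a counting query, whose sensitivity is 1 because adding or removing one record changes the count by at most 1 (the toy data, ε, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity: float, eps: float):
    # Add i.i.d. Lap(S / eps) noise to every coordinate of the true answer.
    return value + rng.laplace(scale=sensitivity / eps, size=np.shape(value))

ages = rng.integers(18, 90, size=1000)    # toy database
true_count = int(np.sum(ages >= 65))      # counting query, sensitivity S = 1
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, eps=0.5)
print(true_count, float(noisy_count))
```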

Output Perturbation: Pros & Cons. Pro: better privacy guarantees than input perturbation; it is easier to add the noise and to control the privacy level. Contra: results cannot be reproduced.

Exponential Mechanism [7][11]. Sometimes adding noise to the input or output is not possible. Example [4]: items for sale with buyers willing to pay A: $1.00, B: $1.00, C: $3.01. The best price is $3.01 (revenue $3.01); the second-best price is $1.00 (revenue $3.00). But the revenue at a price of $3.02 is $0, and at $1.01 it is only $1.01: even a tiny perturbation of the output can destroy its utility.

Exponential Mechanism [7]. Construct a utility measure $q$ over the dataset $D$ and all possible outputs $k$:

$$q(D, k) = u, \quad u \in \mathbb{R} \quad (5)$$

The sensitivity of $q$ is

$$S(q) = \max_{k, D, D'} \lvert q(D, k) - q(D', k) \rvert \quad (6)$$

The exponential mechanism picks a random value of $k$ with distribution

$$p(k) \propto \exp\!\left(\frac{\epsilon \, q(D, k)}{2\,S(q)}\right) \quad (7)$$

It is therefore biased towards values of $k$ with higher utility.
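
A minimal sketch of the sampling step, reusing the pricing example from the previous slide; the candidate price grid, ε, and the crude sensitivity bound (changing one bid shifts the revenue at any fixed price by at most the largest candidate price) are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_mechanism(candidates, utility, sensitivity, eps):
    scores = np.array([utility(k) for k in candidates])
    # Shift by the max for numerical stability; proportionality is unchanged.
    weights = np.exp(eps * (scores - scores.max()) / (2 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

bids = [1.00, 1.00, 3.01]                               # buyers' valuations
revenue = lambda price: price * sum(b >= price for b in bids)
prices = [round(0.01 * c, 2) for c in range(100, 402)]  # $1.00 .. $4.01
print(exponential_mechanism(prices, revenue, sensitivity=max(prices), eps=5.0))
```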

Exponential Mechanism: Pros & Cons. Pro: biased towards the more useful values. Contra: computationally expensive; requires modification of existing algorithms.

Practical Issues and Limitations

Practical Issues and Limitations. Some solutions for achieving differential privacy still rely on technical assumptions about the data (e.g. discrete vs. continuous data). How should ε and δ be chosen? Rule of thumb: $\delta \ll \frac{1}{|D|}$. The lower, the better; but what is low enough? There is a trade-off between privacy and utility. [Figure: density of $\mathrm{Lap}(\frac{1}{\epsilon})$ noise for different values of ε.]
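
The referenced figure can be reproduced in a few lines; a sketch, where the plotted ε values are illustrative and the sensitivity is fixed at S = 1:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 1000)
for eps in (0.1, 0.5, 1.0, 2.0):                    # illustrative epsilons
    b = 1.0 / eps                                   # scale of Lap(1/eps)
    plt.plot(x, np.exp(-np.abs(x) / b) / (2 * b), label=f"eps = {eps}")
plt.legend()
plt.title("Lap(1/eps) density: smaller eps means wider noise, more privacy")
plt.show()
```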

Differential Privacy in Machine Learning & Data Mining

Differential Privacy in Machine Learning & Data Mining. How to achieve differential privacy in machine learning and data mining: input and output perturbation let us use standard machine learning techniques while achieving differential privacy at the same time; algorithm perturbation requires the machine learning technique itself to be modified.

Differentially Private Graph Clustering [8]. Motivation: suppose there exists a graph consisting of the users of a social network and their relationships, and consider detecting a connection between two users a privacy violation. Publishing the original graph as a whole is a clear privacy breach; even if just the community structure of the graph is revealed, an attacker might be able to infer the existence (or non-existence) of edges between nodes.

Differentially Private Graph Clustering [8]. Graph Perturbation (PIG) [8]: Privacy-Integrated Graph clustering guarantees edge-differential privacy. It perturbs the input graph, so the perturbed graph guarantees privacy independently of the clustering algorithm applied afterwards.

Differentially Private Graph Clustering [8]. Algorithm 1: the graph perturbation algorithm of PIG.

function PerturbGraph(adjacency matrix A, privacy parameter s):
    for all a_ij in A with i < j:
        with probability 1 - s:        # preservation: a_ij is kept
            continue
        otherwise:                     # the value of a_ij is randomized
            with probability 1/2: a_ij = a_ji = 0
            else:                 a_ij = a_ji = 1
    return A
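
A direct Python transcription of Algorithm 1; a sketch, where the dense adjacency-matrix representation, function name, and example graph are my own choices:

```python
import numpy as np

def perturb_graph(A: np.ndarray, s: float, seed: int = 0) -> np.ndarray:
    """PIG edge perturbation: keep each potential edge with probability 1 - s,
    otherwise replace it by a fair coin flip (0 or 1), symmetrically."""
    rng = np.random.default_rng(seed)
    A = A.copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= 1 - s:                  # randomization chosen
                A[i, j] = A[j, i] = rng.integers(0, 2)  # 0 or 1, prob 1/2 each
    return A

# Usage: a small undirected graph; for the chosen s, eps = ln(2/s - 1).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(perturb_graph(A, s=0.4))                          # eps = ln(4) ≈ 1.39
```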

Differentially Private Graph Clustering [8]. Evaluation: it can be proven that a PIG-perturbed graph guarantees edge-differential privacy for $\epsilon \ge \ln(\frac{2}{s} - 1)$. [Figure: clustering quality on perturbed graphs using the SCAN algorithm. [8]]
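
A short reconstruction of where this bound comes from (my own derivation from Algorithm 1, not spelled out on the slide): each output bit equals the true adjacency bit with probability $(1 - s) + s/2$ and is flipped with probability $s/2$, so for two graphs differing in a single edge the likelihood ratio of any output is at most

$$\frac{(1 - s) + \frac{s}{2}}{\frac{s}{2}} = \frac{2}{s} - 1, \qquad \text{hence } \epsilon = \ln\!\left(\frac{2}{s} - 1\right).$$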

Other DM/ML techniques using Differential Privacy: Community Detection (input perturbation, algorithm perturbation) [10]; Deep Learning (algorithm perturbation) [1]; Decision Trees (output perturbation) [6].

Questions?

References I
[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16), pages 308–318, New York, NY, USA, 2016. ACM.
[2] T. Dalenius. Towards a methodology for statistical disclosure control. Statistisk Tidskrift, 15:429–444, 1977.
[3] C. Dwork. Differential privacy. In 33rd International Colloquium on Automata, Languages and Programming, Part II (ICALP 2006), volume 4052 of Lecture Notes in Computer Science, pages 1–12, Venice, Italy, July 2006. Springer Verlag.

References II
[4] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[5] A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private approximation algorithms. Aug. 2009.
[6] G. Jagannathan, K. Pillaipakkamnatt, and R. N. Wright. A practical differentially private random decision tree classifier. In 2009 IEEE International Conference on Data Mining Workshops, pages 114–121, Dec. 2009.
[7] Z. Ji, Z. C. Lipton, and C. Elkan. Differential privacy and machine learning: A survey and review. ArXiv e-prints, Dec. 2014.

References III
[8] Y. Mülle, C. Clifton, and K. Böhm. Privacy-integrated graph clustering through differential privacy. In EDBT/ICDT Workshops, 2015.
[9] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP '08), pages 111–125, Washington, DC, USA, 2008. IEEE Computer Society.
[10] H. H. Nguyen, A. Imine, and M. Rusinowitch. Detecting communities under differential privacy. CoRR, abs/1607.02060, 2016.
[11] A. D. Sarwate and K. Chaudhuri. Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Processing Magazine, 30(5):86–94, Sept. 2013.

References IV
[12] L. Sweeney. k-Anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557–570, Oct. 2002.
[13] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.