CS573 Data Privacy and Security. Differential Privacy. Li Xiong

1 CS573 Data Privacy and Security Differential Privacy Li Xiong

2 Outline Differential Privacy Definition Basic techniques Composition theorems

4 Statistical Data Privacy Non-interactive vs interactive setting. Privacy goal: the individual is protected. Utility goal: statistical information remains useful for analysis. (Diagram: the data curator applies a privacy mechanism to the original data and releases statistics or synthetic data to the data analyst in response to queries.)

5 Recap Anonymization or de-identification (input perturbation): linkage attacks, homogeneity attacks. Query auditing/restriction: query denial is itself disclosive, and auditing is computationally infeasible. Summary statistics: differencing attacks.

6 Differential Privacy Promise: an individual will not be affected, adversely or otherwise, by allowing his/her data to be used in any study or analysis, no matter what other studies, datasets, or information sources are available. Paradox: learning nothing about an individual while learning useful statistical information about a population.

7 Differential Privacy Statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data.

9 Differential privacy: an example (Figure: the original records, the original histogram, and the perturbed histogram released with differential privacy.)

15 Differential Privacy: Some Qualitative Properties Protection against the presence/participation of a single record; quantification of privacy loss; composition; post-processing.

16 Differential Privacy: Additional Remarks Correlations between records. Granularity of a single record (what differs between neighboring databases): group privacy; graph databases (e.g., social networks): node vs edge; movie rating databases: user vs event (movie).

17 Outline Differential Privacy Definition Basic techniques Laplace mechanism Exponential mechanism Randomized Response Composition theorems

19 Can deterministic algorithms satisfy differential privacy?

20 Non-trivial deterministic algorithms do not satisfy differential privacy. Consider the space of all inputs and the space of all outputs (with at least 2 distinct outputs).

21 Non-trivial deterministic algorithms do not satisfy differential privacy: each input is mapped to a distinct output.

22 There exist two inputs that differ in one entry and are mapped to different outputs: for some output, Pr > 0 on one input and Pr = 0 on the other, so the ratio of output probabilities is unbounded and differential privacy cannot hold.

23 Output Randomization (Diagram: the researcher issues a query to the database, and noise is added to the true answer.) Add noise to answers such that: each answer does not leak too much information about the database, and noisy answers are close to the original answers.

24 Laplace Mechanism [DMNS 06] (Diagram: the researcher issues query q; the database computes the true answer q(D) and returns q(D) + η, where the noise η is drawn from the Laplace distribution Lap(S/ε).)

25 Laplace Distribution PDF: f(x) = (1/(2b)) exp(−|x − u|/b); denoted as Lap(b) when u = 0. Mean u, variance 2b².

26 How much noise for privacy? [Dwork et al., TCC 2006] Sensitivity: consider a query q: I → R. S(q) is the smallest number such that for any neighboring tables D, D′: |q(D) − q(D′)| ≤ S(q). Theorem: if the sensitivity of the query is S(q), then the algorithm A(D) = q(D) + Lap(S(q)/ε) guarantees ε-differential privacy.
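A minimal sketch of the Laplace mechanism in Python, assuming a scalar query answer and using numpy's Laplace sampler; the function and parameter names are illustrative, not from the slides.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Release true_answer + Lap(S(q)/epsilon) noise.
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # b = S(q) / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)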

27 Example: COUNT query Number of people having the disease. Sensitivity = 1. D: a table with a Disease (Y/N) column containing Y, Y, N, Y, N, N (true count = 3). Solution: release 3 + η, where η is drawn from Lap(1/ε); mean = 0, variance = 2/ε².
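A hypothetical invocation of the sketch above for this COUNT query (true count 3, sensitivity 1); the exact output varies from run to run.

noisy_count = laplace_mechanism(true_answer=3, sensitivity=1, epsilon=0.1)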

28 Example: SUM query Suppose all values x are in [a, b]. Sensitivity = b (assuming 0 ≤ a ≤ b, so a single record changes the sum by at most b).
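A sketch of a differentially private SUM under the same assumption (values clipped to [a, b] with 0 ≤ a ≤ b); the function name is illustrative.

import numpy as np

def noisy_sum(values, a, b, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    clipped = np.clip(values, a, b)  # enforce the [a, b] assumption
    # A single record contributes at most b, so scale the noise to b/epsilon.
    return clipped.sum() + rng.laplace(scale=b / epsilon)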

29 Privacy of Laplace Mechanism Consider neighboring databases D and D′ and some output O; compare the probability of producing O on D with the probability of producing O on D′.
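The ratio bound, presumably shown as equations on the original slide, can be sketched as follows using the Laplace density and the definition of sensitivity:

\frac{\Pr[A(D) = O]}{\Pr[A(D') = O]}
  = \frac{\exp(-\varepsilon \, |q(D) - O| / S(q))}{\exp(-\varepsilon \, |q(D') - O| / S(q))}
  = \exp\!\left(\frac{\varepsilon \, (|q(D') - O| - |q(D) - O|)}{S(q)}\right)
  \le \exp\!\left(\frac{\varepsilon \, |q(D) - q(D')|}{S(q)}\right)
  \le e^{\varepsilon}.

The first inequality is the triangle inequality; the second uses |q(D) − q(D′)| ≤ S(q).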

30 Utility of Laplace Mechanism The Laplace mechanism works for any function that returns a real number. Error: E[(true answer − noisy answer)²] = Var(Lap(S(q)/ε)) = 2·S(q)²/ε². Error bound: with probability at least 1 − δ, the error is at most ln(1/δ)·S(q)/ε (Roth book, Theorem 3.8).

31 Outline Differential Privacy Definition Basic techniques Laplace mechanism Exponential mechanism Randomized Response Composition theorems

32 Exponential Mechanism For functions that do not return a real number, e.g., what is the most common nationality in this room: Chinese/Indian/American. Also when perturbation leads to invalid outputs, e.g., to ensure integrality or non-negativity of the output.

33 Exponential Mechanism [MT 07] Consider some function f from inputs to outputs (can be deterministic or probabilistic). How do we construct a differentially private version of f?

34 Exponential Mechanism Theorem For a database D, output space R, and a utility score function u : D × R → R, the algorithm A with Pr[A(D) = r] ∝ exp(ε·u(D, r) / (2Δu)) satisfies ε-differential privacy, where Δu is the sensitivity of the utility score function: Δu = max over r and neighboring D, D′ of |u(D, r) − u(D′, r)|.
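A minimal sketch of the exponential mechanism in Python over a finite output space; the function and parameter names are illustrative assumptions, not from the slides.

import numpy as np

def exponential_mechanism(db, outputs, utility, delta_u, epsilon, rng=None):
    # Sample r with probability proportional to exp(eps * u(db, r) / (2 * delta_u)).
    rng = rng or np.random.default_rng()
    scores = np.array([utility(db, r) for r in outputs], dtype=float)
    # Subtract the max score for numerical stability; this does not
    # change the normalized probabilities.
    weights = np.exp(epsilon * (scores - scores.max()) / (2 * delta_u))
    probs = weights / weights.sum()
    return outputs[rng.choice(len(outputs), p=probs)]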

35 Example: Exponential Mechanism Scoring/utility function u: Inputs × Outputs → R. D: nationalities of a set of people. f(D): most frequent nationality in D. u(D, O) = #(D, O), the number of people with nationality O.
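A hypothetical use of the sketch above for this example: the utility of output O is the number of people in D with nationality O, so Δu = 1 (adding or removing one person changes any count by at most 1). The data below is made up for illustration.

D = ['Chinese', 'Indian', 'Chinese', 'American', 'Chinese']
nationalities = ['Chinese', 'Indian', 'American']
count_utility = lambda db, r: db.count(r)  # u(D, O) = #(D, O)
answer = exponential_mechanism(D, nationalities, count_utility, delta_u=1, epsilon=0.5)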

36 Privacy of Exponential Mechanism The exponential mechanism outputs an element r with probability Pr[A(D) = r] ∝ exp(ε·u(D, r) / (2Δu)), where Δu = max over r and neighboring D, D′ of |u(D, r) − u(D′, r)|. Ignoring the normalization factor, Pr[A(D) = r] / Pr[A(D′) = r] ≤ exp(ε/2); the normalization factor contributes at most another factor of exp(ε/2), giving ε-differential privacy overall (exact proof with the normalization factor: Roth book, page 39).

38 Utility of Exponential Mechanism Can give strong utility guarantees, as it discounts outcomes exponentially based on the utility score. It is highly unlikely that the returned element r has a utility score below max_r u(D, r) by more than an additive factor of (2Δu/ε)·(ln|R| + t), except with probability e^(−t) (Theorem 3.11, Roth book).

39 Outline Differential Privacy Definition Basic techniques Laplace mechanism Exponential mechanism Randomized Response Composition theorems

40 Randomized Response (a.k.a. local randomization) [W 65] Each respondent perturbs their own value D into a reported value O: with probability p, report the true value; with probability 1 − p, report the flipped value. (Table: a column of true Disease (Y/N) values and the corresponding column of randomized reports.)
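A minimal sketch of randomized response in Python over 0/1 values (e.g., Disease = N/Y); names are illustrative, not from the slides.

import numpy as np

def randomized_response(true_bits, p, rng=None):
    # Each respondent reports the true bit with probability p and the
    # flipped bit with probability 1 - p.
    rng = rng or np.random.default_rng()
    true_bits = np.asarray(true_bits)
    keep = rng.random(len(true_bits)) < p
    return np.where(keep, true_bits, 1 - true_bits)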

41 Differential Privacy Analysis Consider two databases D, D′ (of size M) that differ in the j-th value: D[j] ≠ D′[j], but D[i] = D′[i] for all i ≠ j. Consider some output O: the reports for records i ≠ j are identically distributed, and the probability of record j's report changes by at most a factor of p/(1 − p), so (for p ≥ 1/2) randomized response satisfies ε-differential privacy with ε = ln(p/(1 − p)).

42 Utility Analysis Suppose n1 out of n people replied yes and the rest said no. What is the best estimate for π = fraction of people with disease = Y? π̂ = (n1/n − (1 − p)) / (2p − 1). E(π̂) = π, and Var(π̂) = π(1 − π)/n + p(1 − p)/(n(2p − 1)²), i.e., the sampling variance plus the variance due to coin flips.
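A sketch of the debiasing step in Python, inverting E[n1/n] = p·π + (1 − p)·(1 − π); the function name is illustrative.

def estimate_fraction(reports, p):
    # reports: the randomized 0/1 responses; p: probability of telling the truth.
    n1_over_n = sum(reports) / len(reports)
    return (n1_over_n - (1 - p)) / (2 * p - 1)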

43 Laplace Mechanism vs Randomized Response: Privacy Both provide the same ε-differential privacy guarantee. The Laplace mechanism assumes the data collector is trusted; randomized response does not require the data collector to be trusted. It is also called a local algorithm, since each record is perturbed individually.

44 Laplace Mechanism vs Randomized Response: Utility Suppose a database with N records where μN records have disease = Y. Query: # rows with Disease = Y. Std dev of the Laplace mechanism answer: O(1/ε). Std dev of the randomized response answer: O(√N).

45 Outline Differential Privacy Basic Algorithms Laplace Exponential Mechanism Randomized Response Composition Theorems

46 Why Composition? Reasoning about the privacy of a complex algorithm is hard. Composition helps software design: if the building blocks are proven to be private, it is easy to reason about the privacy of a complex algorithm built entirely from these building blocks.

47 A bound on the number of queries In order to ensure utility, a statistical database must leak some information about each individual. We can only hope to bound the amount of disclosure. Hence, there is a limit on the number of queries that can be answered.

48 Composition theorems Sequential composition: Σ_i ε_i differential privacy. Parallel composition: max(ε_i) differential privacy.

49 Sequential Composition If M_1, M_2, ..., M_k are algorithms that access a private database D such that each M_i satisfies ε_i-differential privacy, then the combination of their outputs satisfies ε-differential privacy with ε = ε_1 + ... + ε_k.

50 Parallel Composition If M_1, M_2, ..., M_k are algorithms that access disjoint databases D_1, D_2, ..., D_k such that each M_i satisfies ε_i-differential privacy, then the combination of their outputs satisfies ε-differential privacy with ε = max{ε_1, ..., ε_k}.
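A toy illustration (not from the slides) of how these two rules are used to account for privacy budget; the helper names are made up.

def sequential_budget(epsilons):
    # Mechanisms that touch the same data: the budgets add up.
    return sum(epsilons)

def parallel_budget(epsilons):
    # Mechanisms that touch disjoint partitions of the data: the
    # largest single budget applies.
    return max(epsilons)

total = sequential_budget([0.1, 0.2, 0.2])  # 0.5
worst = parallel_budget([0.1, 0.2, 0.2])    # 0.2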

51 Post-processing If M_1 is an ε-differentially private algorithm that accesses a private database D, then outputting M_2(M_1(D)), for any M_2 that does not access D directly, also satisfies ε-differential privacy.

52 Summary Differential privacy ensures an attacker can't infer the presence or absence of a single record in the input based on any output. Building blocks: Laplace mechanism, exponential mechanism, (local) randomized response. Composition rules help build complex algorithms from these building blocks.

53 Case Study: K-means Clustering

54 K-means Partition a set of points x_1, x_2, ..., x_n into k clusters S_1, S_2, ..., S_k such that Σ_i Σ_{x ∈ S_i} ||x − μ_i||² is minimized, where μ_i is the mean of cluster S_i.

55 K-means Algorithm: initialize a set of k centers; repeat (assign each point to its nearest center; recompute the set of centers) until convergence; output the final set of k centers.

56 Differentially Private K-means [BDMN 05] Suppose we fix the number of iterations to T. In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters 2. Noisily compute the size of each cluster 3. Compute noisy sums of points in each cluster

57 Differentially Private K-means Suppose we fix the number of iterations to T. Each iteration uses ε/T privacy budget; the total privacy loss is ε. In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters 2. Noisily compute the size of each cluster 3. Compute noisy sums of points in each cluster

58 Differentially Private K-means Suppose we fix the number of iterations to T. Exercise: which of these steps expends privacy budget? In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters 2. Noisily compute the size of each cluster 3. Compute noisy sums of points in each cluster

59 Differentially Private K-means Suppose we fix the number of iterations to T. Exercise: which of these steps expends privacy budget? In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters (NO) 2. Noisily compute the size of each cluster (YES) 3. Compute noisy sums of points in each cluster (YES)

60 Differentially Private K-means Suppose we fix the number of iterations to T. What is the sensitivity? In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters 2. Noisily compute the size of each cluster (sensitivity 1) 3. Compute noisy sums of points in each cluster (sensitivity |dom|, the size of the domain)

61 Differentially Private K-means Suppose we fix the number of iterations to T. Each iteration uses ε/T privacy budget; the total privacy loss is ε. In each iteration (given a set of centers): 1. Assign the points to the nearest center to form clusters 2. Noisily compute the size of each cluster with Laplace(2T/ε) noise 3. Compute noisy sums of points in each cluster with Laplace(2T·|dom|/ε) noise
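A sketch of this procedure in Python, splitting each iteration's ε/T budget between the noisy sizes and the noisy sums; the function name, random initialization, and the guard against tiny noisy counts are illustrative assumptions, not from [BDMN 05].

import numpy as np

def dp_kmeans(points, k, T, epsilon, dom, rng=None):
    rng = rng or np.random.default_rng()
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(T):
        # Step 1: assign each point to its nearest center (no budget spent).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            cluster = points[labels == j]
            # Step 2: noisy size, sensitivity 1, Laplace scale 2T/epsilon.
            noisy_n = len(cluster) + rng.laplace(scale=2 * T / epsilon)
            # Step 3: noisy sum, sensitivity |dom|, Laplace scale 2T*|dom|/epsilon.
            noisy_sum = cluster.sum(axis=0) + rng.laplace(
                scale=2 * T * dom / epsilon, size=points.shape[1])
            centers[j] = noisy_sum / max(noisy_n, 1.0)
    return centers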

62 Results (T = 10 iterations, random initialization) (Figure: clusters found by the original k-means algorithm vs the Laplace k-means algorithm.) Even though we noisily compute centers, Laplace k-means can distinguish clusters that are far apart. Since we add noise to the sums with sensitivity proportional to |dom|, Laplace k-means cannot distinguish small clusters that are close together.

63 Privacy as Constrained Optimization Three axes: privacy, error, and the queries that can be answered. E.g., given a fixed set of queries and privacy budget ε, what is the minimum error that can be achieved? E.g., given a task and privacy budget ε, how do we design a set of queries (functions) and allocate the budget so that the error is minimized?

64 References [W65] Warner, Randomized Response, JASA 1965. [DN03] Dinur, Nissim, Revealing Information While Preserving Privacy, PODS 2003. [BDMN05] Blum, Dwork, McSherry, Nissim, Practical Privacy: the SuLQ Framework, PODS 2005. [D06] Dwork, Differential Privacy, ICALP 2006. [DMNS06] Dwork, McSherry, Nissim, Smith, Calibrating Noise to Sensitivity in Private Data Analysis, TCC 2006. [MT07] McSherry, Talwar, Mechanism Design via Differential Privacy, FOCS 2007.
