Synthetic Data. Michael Lin


2 Overview
- The data privacy problem
- Imputation
- Synthetic data
- Analysis

3 Data Privacy
- As a data provider, how can we release data containing private information without disclosing that private information?
  - For some values of "private" and "disclosure"
- There are many, many approaches. One could teach an entire course about it!
  - Removal of data, k-anonymity, synthetic data, ...

4 Synthetic Data Overview
- The basic idea is simple:
  - Analyze the data to determine its statistical properties
  - Create a data set based on this knowledge
  - Release the new data set
- Does this satisfy data privacy requirements?
- Is this useful?

5 Synthetic Data Overview
- Is it even possible to create a data set that preserves the statistical properties of the original?
- How do we do it in general?

6 Imputation
- Imputation: a statistical method for filling in missing data values
- Multiple imputation: impute the data m times and release all m data sets

Before imputation      After imputation
    S  R  A                S  R  A
A   M  -  20           A   M  W  20
B   F  -  21           B   F  B  21
C   F  B  26           C   F  B  26
D   -  W  -            D   F  W  22
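The idea of filling in missing values m times can be sketched in a few lines. This is a minimal illustration, not the method from the slides: it uses a naive random draw from the observed values of each column (a hot-deck style stand-in for a proper imputation model), and the field names `sex`, `race`, and `age` are an assumed reading of the S, R, A columns in the table above.

```python
import random

# Hypothetical toy records mirroring the table above; None marks a missing value.
records = [
    {"id": "A", "sex": "M", "race": None, "age": 20},
    {"id": "B", "sex": "F", "race": None, "age": 21},
    {"id": "C", "sex": "F", "race": "B", "age": 26},
    {"id": "D", "sex": None, "race": "W", "age": None},
]


def impute_once(rows, rng):
    """Return a completed copy of rows, filling each missing value with a
    random draw from the observed values of the same column."""
    columns = [key for key in rows[0] if key != "id"]
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in columns}
    completed = []
    for row in rows:
        filled = dict(row)  # copy so the original records stay untouched
        for c in columns:
            if filled[c] is None:
                filled[c] = rng.choice(observed[c])
        completed.append(filled)
    return completed


def multiple_imputation(rows, m, seed=0):
    """Repeat the single imputation m times, giving m completed data sets."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(m)]


completed_sets = multiple_imputation(records, m=5)
```

Each of the m completed data sets keeps the observed values as they were and differs only in the filled-in cells, which is what lets the analyst measure the extra uncertainty that imputation introduces.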

7 Multiple Imputation With Large Sample Sizes
- The original formulation of multiple imputation (Rubin 1987)
- $Y_{obs}$ is the observed data
- $Y_{mis}$ is the data missing due to non-response
- The data are described by: $D = (X, Y_{obs}, I, R)$
- $Y_{mis}$ is imputed based on the posterior predictive distribution of $(Y_{mis} \mid D)$

8 Multiple Imputation With Large Sample Sizes
- I: a vector that indicates whether a given individual is selected to be surveyed
- R: a vector that indicates whether a given individual responded to the survey
- Variables: Y (Education)
- Design variables/predictors: X1 (Sex), X2 (Race), X3 (Age)
- We assume X is missing no data

9 Multiple Imputation With Large Sample Sizes
- The data provider repeats the process from the previous slides m times and releases m complete data sets
- Each complete data set can be analyzed with regular statistics and software
- After all m have been analyzed for some quantity Q (e.g., the population mean), three equations give the estimated value of Q and the variance of the estimate

10 Multiple Imputation With Large Sample Sizes
- Sample mean: $\bar{Q}_m = \sum_{l=1}^{m} Q^{(l)} / m$
- Variance across samples: $B_m = \sum_{l=1}^{m} (Q^{(l)} - \bar{Q}_m)^2 / (m - 1)$
- Sample variance: $\bar{U}_m = \sum_{l=1}^{m} U^{(l)} / m$
- $\bar{Q}_m$ estimates $Q$, and $T_m = (1 + 1/m) B_m + \bar{U}_m$ estimates the variance of $Q$ given this data set (inferences use a t-distribution)
- As m increases, these estimates improve
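The three combining equations and $T_m$ translate directly into code. The sketch below assumes the analyst already has, for each of the m data sets, a point estimate `q[l]` and its estimated variance `u[l]`; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean, variance


def combine_multiple_imputation(q, u):
    """Combine per-data-set estimates q[l] and their variances u[l].

    Returns (q_bar, b_m, u_bar, t_m) following the equations above.
    """
    m = len(q)
    if m < 2 or len(u) != m:
        raise ValueError("need at least two data sets and one variance per estimate")
    q_bar = mean(q)              # sample mean of the m point estimates
    b_m = variance(q)            # variance across data sets, (m - 1) denominator
    u_bar = mean(u)              # average of the within-data-set variances
    t_m = (1 + 1 / m) * b_m + u_bar
    return q_bar, b_m, u_bar, t_m


# Hypothetical estimates of a population mean from m = 3 imputed data sets.
q_bar, b_m, u_bar, t_m = combine_multiple_imputation(
    q=[10.0, 12.0, 11.0],
    u=[1.0, 1.5, 0.5],
)
```

With these inputs the across-data-set variance is 1.0 and the average within-data-set variance is 1.0, so $T_m = (4/3)(1.0) + 1.0 \approx 2.33$.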

11 Multiple Imputation and Data Privacy
- What does imputation have to do with data privacy?
- Traditional imputation is a method for using available data to fill in missing data
- What if all the responses are missing?

12 Creating Fully Synthetic Data
- Previously, we imputed values only for the sample.
- Now impute values for the population not in the sample.
- This produces the l-th complete data set $(X, Y_{com}^{(l)})$

[Figure: the population (data unknown) and the sample data set are imputed to give the population complete data set, which contains the sample data set]

13 Creating Fully Synthetic Data
- Randomly sample from $(X, Y_{com}^{(l)})$ $n_{syn}$ times to produce the synthetic data set $d^{(l)} = (X, Y_{syn}^{(l)})$
- There is still a small possibility of sampling real data (this possibility can be eliminated)
- Repeat m times and release these data sets

[Figure: the synthetic data set $d^{(l)}$ is sampled from the population complete data set, which contains the sample data set]
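The sampling step can be sketched as follows. The slides do not say how the chance of releasing real data is eliminated; excluding the originally surveyed units from the draw is one assumed way to do it. The record layout, the `exclude_ids` parameter, and the toy population are hypothetical.

```python
import random


def draw_synthetic_sample(completed_population, n_syn, exclude_ids=frozenset(), seed=None):
    """Draw n_syn records at random, without replacement, from a completed
    population. Units whose id is in exclude_ids (for example the units that
    were actually surveyed) are never drawn."""
    rng = random.Random(seed)
    candidates = [r for r in completed_population if r["id"] not in exclude_ids]
    if n_syn > len(candidates):
        raise ValueError("n_syn exceeds the number of eligible units")
    return rng.sample(candidates, n_syn)


# Hypothetical completed population of 10 units; units 0, 1, 2 were surveyed,
# so their values are real rather than imputed.
completed_population = [{"id": i, "age": 20 + i} for i in range(10)]
surveyed_ids = {0, 1, 2}

d_syn = draw_synthetic_sample(
    completed_population, n_syn=4, exclude_ids=surveyed_ids, seed=1
)
```

Running this once per completed population, for l = 1, ..., m, gives the m synthetic data sets that are released.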

14 Analyzing Fully Synthetic Data
- Calculate $\bar{Q}_m$, $B_m$, and $\bar{U}_m$ as for normal multiple imputation
- However, calculate the variance for $Q$ as $T_f = (1 + 1/m) B_m - \bar{U}_m$, compared to $T_m = (1 + 1/m) B_m + \bar{U}_m$ for normal imputation
- Intuitively, the first term estimates the variance of $Q$, and the second term estimates the variance due to the random sampling of $(X, Y_{com}^{(l)})$
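A small numerical sketch of $T_f$ makes the difference from $T_m$ concrete. The function name and the input values are invented for illustration. Because $\bar{U}_m$ is subtracted, the estimate can come out negative when m is small or the point estimates barely differ, a practical caveat the slide does not mention.

```python
from statistics import mean, variance


def fully_synthetic_variance(q, u):
    """T_f = (1 + 1/m) * B_m - U_bar_m for fully synthetic data."""
    m = len(q)
    if m < 2 or len(u) != m:
        raise ValueError("need at least two data sets and one variance per estimate")
    b_m = variance(q)   # variance across the m synthetic data sets
    u_bar = mean(u)     # average within-data-set variance
    return (1 + 1 / m) * b_m - u_bar


# Hypothetical estimates from m = 4 synthetic data sets.
t_f = fully_synthetic_variance(q=[10.0, 12.0, 11.0, 13.0], u=[0.5, 0.6, 0.4, 0.5])

# With nearly identical point estimates the subtraction dominates
# and the result is negative.
t_f_negative = fully_synthetic_variance(q=[10.0, 10.1], u=[1.0, 1.0])
```

In the first call $B_m = 5/3$ and $\bar{U}_m = 0.5$, so $T_f = 1.25 \cdot 5/3 - 0.5 \approx 1.58$, whereas the ordinary multiple imputation formula would give about 2.58 for the same inputs.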

15 Partially Synthetic Data
- The same process as normal multiple imputation, except we replace data instead of filling it in

Before imputation      After imputation
    S  R  A                S  R  A
A   M  B  20           A   M  W  20
B   F  W  21           B   F  B  21
C   F  B  26           C   F  B  24
D   F  W  23           D   F  W  22

16 Partially Synthetic Data
- Replacing instead of filling in changes the analysis
- Use the same 3 equations, but now we measure variance with: $T_p = B_m / m + \bar{U}_m$
- Note that it's trivial to identify which variables are synthetic in partially synthetic data
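The partially synthetic variance differs from the other two formulas only in how $B_m$ is weighted. The sketch below computes it from per-data-set estimates; the function name and the numbers are illustrative assumptions.

```python
from statistics import mean, variance


def partially_synthetic_variance(q, u):
    """T_p = B_m / m + U_bar_m for partially synthetic data."""
    m = len(q)
    if m < 2 or len(u) != m:
        raise ValueError("need at least two data sets and one variance per estimate")
    b_m = variance(q)   # variance across the m partially synthetic data sets
    u_bar = mean(u)     # average within-data-set variance
    return b_m / m + u_bar


# Hypothetical estimates from m = 3 partially synthetic data sets.
t_p = partially_synthetic_variance(q=[10.0, 12.0, 11.0], u=[1.0, 1.5, 0.5])
```

Here $B_m = 1.0$ and $\bar{U}_m = 1.0$, so $T_p = 1/3 + 1 \approx 1.33$. The across-data-set term shrinks as m grows, so releasing more data sets tightens the estimate.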

17 Analysis
- As always, we want to measure two things:
  - How useful is this data?
  - How well is confidentiality preserved?
- What trade-offs do we make here?

18 Confidentiality
- Identifying a person based on fully synthetic data is claimed to be practically impossible
- It is easier (but still difficult) to identify the real variables that the synthetic data is based on
- Both these claims rest on the security of using modeled data rather than actual data
- What if the model is too good?

19 Confidentiality Risks
- Variables imputed from distributions with small variances could be identified from synthetic data
- If the statistical models used for imputation are too accurate, real data can be leaked
- Bootstrapping can leak real data
  - Bootstrapping: a statistical resampling method that re-uses real data
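The bootstrapping risk is easy to see in code. A bootstrap resample is built by drawing, with replacement, from the real values, so every value it contains is a real one. The ages below are hypothetical.

```python
import random

# Hypothetical confidential values (e.g. ages of four respondents).
real_ages = [20, 21, 26, 22]


def bootstrap_resample(values, rng):
    """Draw len(values) items from values with replacement."""
    return [rng.choice(values) for _ in values]


rng = random.Random(0)
resample = bootstrap_resample(real_ages, rng)

# Every released value is copied from some real respondent.
all_real = all(v in real_ages for v in resample)
```

Whatever the seed, `all_real` is true: the resample reshuffles and repeats real records rather than replacing them with modeled values, which is why it can leak real data.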

20 Confidentiality Risks
- These risks can be controlled:
  - Use less precise distributions when imputing
    - This hurts the utility of the synthetic data
  - Don't bootstrap

21 Utility
- The utility of synthetic data is based almost entirely on how good the distribution models of the original data are
- If the models are perfect, synthetic data will preserve all correlations and statistical measurements present in the original
- Since perfect models are impossible, very good ones will have to do

22 Utility
- What are the downsides of synthetic data?
- If an analyst wants to analyze a tenuous or obscure relationship in the original data, the synthetic modeling may not capture it
- Fundamentally: it's impossible to analyze anything that isn't modeled

23 Paper Example
- Generally, the synthetic data is very good for most variables and awful for others
- The bad variables tend to measure relationships not captured in the models
- The paper does not discuss real or potential reidentification disclosure
- The predictive disclosure example is rather soft

24 Comments
- Where's the proof that synthetic data makes the risk of reidentification practically non-existent?
- Risk of reidentification is highly dependent on the models used, so this probably can't be proved in general, but at least some mathematical logic is needed
- No mathematical justification or proof is given


Introduction to Assurance

Introduction to Assurance Introduction to Assurance Overview Why assurance? Trust and assurance Life cycle and assurance April 1, 2015 Slide #1 Overview Trust Problems from lack of assurance Types of assurance Life cycle and assurance

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15 25.1 Introduction Today we re going to spend some time discussing game

More information

Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits

Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits Carl Sabottke Octavian Suciu Tudor Dumitraș University of Maryland 2 Problem Increasing number

More information

Secure Development Processes

Secure Development Processes Secure Development Processes SecAppDev2009 What s the problem? Writing secure software is tough Newcomers often are overwhelmed Fear of making mistakes can hinder Tend to delve into security superficially

More information

A Mathematical Proof. Zero Knowledge Protocols. Interactive Proof System. Other Kinds of Proofs. When referring to a proof in logic we usually mean:

A Mathematical Proof. Zero Knowledge Protocols. Interactive Proof System. Other Kinds of Proofs. When referring to a proof in logic we usually mean: A Mathematical Proof When referring to a proof in logic we usually mean: 1. A sequence of statements. 2. Based on axioms. Zero Knowledge Protocols 3. Each statement is derived via the derivation rules.

More information

Zero Knowledge Protocols. c Eli Biham - May 3, Zero Knowledge Protocols (16)

Zero Knowledge Protocols. c Eli Biham - May 3, Zero Knowledge Protocols (16) Zero Knowledge Protocols c Eli Biham - May 3, 2005 442 Zero Knowledge Protocols (16) A Mathematical Proof When referring to a proof in logic we usually mean: 1. A sequence of statements. 2. Based on axioms.

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Review Paper onbuilding Prediction based model for cloud-based data mining

Review Paper onbuilding Prediction based model for cloud-based data mining Review Paper onbuilding Prediction based model for cloud-based data mining Er. Spinder kaur 1,Dr. Sandeep Kautish 2 1 M.Tech Scholar, 2 Assistant Professor ABSTRACT University of Computer Application Guru

More information

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER

SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 31 st July 216. Vol.89. No.2 25-216 JATIT & LLS. All rights reserved. SIMPLE AND EFFECTIVE METHOD FOR SELECTING QUASI-IDENTIFIER 1 AMANI MAHAGOUB OMER, 2 MOHD MURTADHA BIN MOHAMAD 1 Faculty of Computing,

More information

Cleanup and Statistical Analysis of Sets of National Files

Cleanup and Statistical Analysis of Sets of National Files Cleanup and Statistical Analysis of Sets of National Files William.e.winkler@census.gov FCSM Conference, November 6, 2013 Outline 1. Background on record linkage 2. Background on edit/imputation 3. Current

More information

Randomized Response Technique in Data Mining

Randomized Response Technique in Data Mining Randomized Response Technique in Data Mining Monika Soni Arya College of Engineering and IT, Jaipur(Raj.) 12.monika@gmail.com Vishal Shrivastva Arya College of Engineering and IT, Jaipur(Raj.) vishal500371@yahoo.co.in

More information

Formal Methods for Assuring Security of Computer Networks

Formal Methods for Assuring Security of Computer Networks for Assuring of Computer Networks May 8, 2012 Outline Testing 1 Testing 2 Tools for formal methods Model based software development 3 Principals of security Key security properties Assessing security protocols

More information

Robbing the Bank with a Theorem Prover

Robbing the Bank with a Theorem Prover Robbing the Bank with a Theorem Prover (Transcript of Discussion) Jolyon Clulow Cambridge University So it s a fairly provocative title, how did we get to that? Well automated tools have been successfully

More information

Houghton Mifflin MATHEMATICS Level 5 correlated to NCTM Standard

Houghton Mifflin MATHEMATICS Level 5 correlated to NCTM Standard s 2000 Number and Operations Standard Understand numbers, ways of representing numbers, relationships among numbers, and number systems understand the place-value structure of the TE: 4 5, 8 11, 14 17,

More information

Privacy Preserving Machine Learning: A Theoretically Sound App

Privacy Preserving Machine Learning: A Theoretically Sound App Privacy Preserving Machine Learning: A Theoretically Sound Approach Outline 1 2 3 4 5 6 Privacy Leakage Events AOL search data leak: New York Times journalist was able to identify users from the anonymous

More information

Introduction to Prof. Clarkson Fall Today s music: Prelude from Final Fantasy VII by Nobuo Uematsu (remastered by Sean Schafianski)

Introduction to Prof. Clarkson Fall Today s music: Prelude from Final Fantasy VII by Nobuo Uematsu (remastered by Sean Schafianski) Introduction to 3110 Prof. Clarkson Fall 2017 Today s music: Prelude from Final Fantasy VII by Nobuo Uematsu (remastered by Sean Schafianski) Welcome! Programming isn t hard Programming well is very hard

More information

WEB SITE PRIVACY POLICY

WEB SITE PRIVACY POLICY WEB SITE PRIVACY POLICY 1. Introduction This Privacy Policy applies only to the publicly available portions of the Web site www.stmonicasseniorliving.com (the Site ). By using the Site you represent that

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Overview of Information Security

Overview of Information Security Overview of Information Security Lecture By Dr Richard Boateng, UGBS, Ghana Email: richard@pearlrichards.org Original Slides by Elisa Bertino CERIAS and CS &ECE Departments, Pag. 1 and UGBS Outline Information

More information

CS 153 Design of Operating Systems Winter 2016

CS 153 Design of Operating Systems Winter 2016 CS 153 Design of Operating Systems Winter 2016 Lecture 18: Page Replacement Terminology in Paging A virtual page corresponds to physical page/frame Segment should not be used anywhere Page out = Page eviction

More information

FMC: An Approach for Privacy Preserving OLAP

FMC: An Approach for Privacy Preserving OLAP FMC: An Approach for Privacy Preserving OLAP Ming Hua, Shouzhi Zhang, Wei Wang, Haofeng Zhou, Baile Shi Fudan University, China {minghua, shouzhi_zhang, weiwang, haofzhou, bshi}@fudan.edu.cn Abstract.

More information