Michelle Hayes Mary Joel Holin. Michael Roanhouse Julie Hovden. Special Thanks To. Disclaimer

Transcription:

Further Understanding the Intersection of Technology and Privacy to Ensure and Protect Client Data

We can provably know where domestic violence shelter clients have been without knowing who they are.

Latanya Sweeney, PhD
latanya@privacy.cs.cmu.edu
privacy.cs.cmu.edu

Special Thanks To
Michelle Hayes, Mary Joel Holin, Michael Roanhouse, Julie Hovden

Disclaimer
The views and opinions in this presentation represent my own and are not necessarily those of HUD, Abt, or any affiliates (or my cat's or dog's). Known side effects include shock and applause.

Privacy Technology
1. Example: tracking people
2. Example: anonymizing data
3. Example: distributed surveillance
4. Example: trails of dots
5. Example: learning who you know
6. Example: identity theft
7. Example: fingerprint capture
8. Example: bio-terrorism surveillance
9. Example: privacy-preserving surveillance
10. Example: DNA privacy
11. Example: identity theft protections
12. Example: k-anonymity
13. Example: webcam surveillance
14. Example: text de-identification
15. Example: face de-identification
16. Example: fraudulent Spam

Technology And/Or Privacy
Traditional Belief System: Technology Or Privacy
This Work: Technology And Privacy
Figure axes: Privacy, Usefulness

Question in this Work
How can Shelters construct UIDs without risk of re-identification while still achieving an accurate unduplicated accounting?
This talk will examine old approaches and introduce a new solution with provable properties.

Copyright (c) 1998-2006 Dr. Sweeney.

This Talk
1. The Setting
2. Technology Survey
3. A Provable Privacy Solution

The Big Goal
Perform local unduplicated accountings of homeless visit patterns without identifying clients.

Homeless Management Information Systems (HMIS)
Goal: a local unduplicated accounting of Client Visit Patterns
Diagram labels: Client1, Client2, Client3, Client4, Client5; Shelters 1, 2, 3; Client Personal Information; Universal Data Elements; HUD Aggregate Information; HUD.

Universal Data Elements
Unique Identifier ("UID")
Name
Social Security Number
Date of Birth
Ethnicity and Race
Gender
Veteran Status
Disabling Condition
Residence Prior to Program Entry
Code of Last Permanent Address
Program Entry Date
Program Exit Date
Unique Person Identification Number
Program Identification Number
Household Identification Number

HUD Reporting (Sample)
Question #   AHAR Questions: Emergency Shelter - Individuals
1            How many people used emergency shelters at time?
2            What is the distribution of family sizes using emergency shelters?
3            What are the demographics of individuals using emergency shelters?
3            distribution by gender?
3            distribution by race and ethnicity?
3            distribution by age group?
3            distribution by household size?
3            distribution by veteran status? By disabling condition?
4            What was the living arrangement the night before entering the emergency shelter?
4            within/outside geographical jurisdiction?
5            What is distribution of the number of nights in an emergency shelter?
5            distribution by gender?
5            distribution by age group?

Intimate Stalker Threat
Knows detailed information about a targeted client
Is highly motivated
Can compromise a shelter or to find the location of the targeted client ("re-identification")

Threat
Has lots of other information that may contain the client.
Motivated to learn information about clients generally
Link data on clients specifically ("re-identification")

Re-Identification occurs when explicit client identifiers (e.g., name or address) can be reasonably associated with the client's de-identified information.

to Re-identify Clients
Diagram: Alice gives Personal Information to a Shelter, whose record reads "9/19/60 F 372". External Information reads "Alice, 9/19/60, 1 Main St". Together they yield "9/19/60 F 372 Alice".

to re-identify
HMIS Data: Ethnicity, Visit date, PIN, Shelter ID
Voter List: Name, Address, ZIP, Birth date, Sex, Date registered, Party affiliation, Date last voted
L. Sweeney. Weaving technology and policy together to maintain confidentiality. Journal of Law, Medicine and Ethics. 1997, 25:98-110.

Re-identification Results (with Gender)
               Date of Birth   Mon/Yr Birth   Year of Birth
County         18.1%           0.04%          0.00004%
Town/Place     58.4%           3.6%           0.04%
ZIP 5-digit    87.1%           3.7%           0.04%
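The linking step in the Alice diagram can be illustrated with a small sketch. It assumes two made-up tables (a de-identified HMIS extract and a voter list) that share date of birth, sex, and ZIP; every record, field name, and the `link` helper are hypothetical and only the Alice values come from the slide.

```python
# Hypothetical de-identified HMIS extract (no names), keyed by a local UID.
hmis_records = [
    {"uid": "A17", "birth_date": "9/19/60", "sex": "F", "zip": "372", "shelter_id": "S1"},
    {"uid": "B42", "birth_date": "3/02/71", "sex": "M", "zip": "372", "shelter_id": "S1"},
]

# Hypothetical external list that carries explicit identifiers.
voter_list = [
    {"name": "Alice", "address": "1 Main St", "birth_date": "9/19/60", "sex": "F", "zip": "372"},
    {"name": "Bob", "address": "2 Oak Ave", "birth_date": "5/05/55", "sex": "M", "zip": "372"},
]


def link(deidentified, external, keys=("birth_date", "sex", "zip")):
    """Return {uid: name} for records whose key fields match exactly one external row."""
    index = {}
    for row in external:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = {}
    for rec in deidentified:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches[rec["uid"]] = candidates[0]["name"]
    return matches


reidentified = link(hmis_records, voter_list)
print(reidentified)
```

The second HMIS record stays unlinked because no voter row shares all three fields, which is the kind of outcome the re-identification percentages in the table quantify at population scale.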

Thwarting
Using re-identification analysis, we can quantify linking risks associated with data elements and make changes accordingly.
We can thwart linking.
The remainder of this talk assumes linking precautions have been taken.

Question in this Work
How can Shelters construct UIDs without risk of re-identification while still achieving an accurate unduplicated accounting?
This talk will examine old approaches and introduce a new solution with provable properties.

This Talk
1. The Setting
2. Technology Survey
3. A Provable Privacy Solution

Minimal Risk v. Provable Privacy
Minimal risk technologies use a combination of technology, practices and policy to show that there is a minimal re-identification risk.
Provable privacy technology provides guarantees against re-identification.

Minimal Risk Technologies
Homeless Shelters. Carnegie Mellon Tech Report CMU-ISRI-05-3 Pittsburgh: November 2005.

Concatenate parts of source information into a UID.
Example: Using {date of birth, gender, ZIP}
021960F372 (Date of birth, Sex, ZIP)
Providing explicitly sensitive source information. Need to use non-sensitive source information.

UID based on part of the source information.
Example: Using {date of birth, gender, ZIP}
8126r29ws 986s594652
Must be strong. Can be examined publicly. Fast to compute but infeasible to reverse.

Problems with Consistent (1)
If the same hash value is broadly used with Clients, then it may lead to re-identifications through linking.
If the intimate stalker compromises a Shelter or the, the hashed UID could be learned and used to locate the targeted Client.

Problems with Consistent (2)
If the source information is SSN or demographics, then could re-identify all UIDs by exhaustively computing all UIDs.
Dataset UID: 149875, 072, 976526
Social Security Number / UID:
UID 8563 for try 000-00-0000
UID 962656 for try 000-00-0001
UID 072 for try 000-00-0002
UID 976526 for try 104-51-2572
UID 149875 for try 104-51-2573
Tries: 000-00-0000, 000-00-0001, 000-00-0002, 104-51-2572, 104-51-2573, 999-99-9999

Problems with Consistent (3)
bits   seconds
28     1
29     3
30     7
31     15
32     31
33     62
34     124
35     249
36     499
37     998
38     1996
39     3993
40     7986
41     15963
42     31926
43     63888
44     127725
45     255463
46     510774
47     1021463
Chart: Time to Exhaust Count (seconds) versus Number of Bits.
Size of source information matters. Exhaust all SSNs in 4 seconds!
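The exhaustive attack on consistently hashed SSNs can be sketched in a few lines. This is an illustration under stated assumptions, not the method from the slides: SHA-256 truncated to 12 hex characters stands in for the unspecified "strong" hash, the helper names are hypothetical, and the search covers only a small slice of the SSN space so it finishes quickly.

```python
import hashlib


def hashed_uid(ssn: str) -> str:
    """Consistent (unkeyed) hash of an SSN; SHA-256 is an assumed stand-in."""
    return hashlib.sha256(ssn.encode("utf-8")).hexdigest()[:12]


def format_ssn(n: int) -> str:
    """Render an integer in [0, 999999999] as NNN-NN-NNNN."""
    digits = f"{n:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def dictionary_attack(target_uids, start, stop):
    """Hash every candidate SSN in [start, stop) and report the ones that hit."""
    targets = set(target_uids)
    found = {}
    for n in range(start, stop):
        ssn = format_ssn(n)
        uid = hashed_uid(ssn)
        if uid in targets:
            found[uid] = ssn
    return found


# A "dataset" holding only hashed UIDs, built from the SSNs shown on the slide.
dataset = [hashed_uid("104-51-2572"), hashed_uid("104-51-2573")]

# Searching a 10,000-value slice; the full space is only 10**9 candidates.
recovered = dictionary_attack(dataset, 104_510_000, 104_520_000)
print(sorted(recovered.values()))
```

Because the hash has no secret input, anyone who can guess the format of the source information can rebuild the whole UID-to-SSN table, which is why the size of the source information matters.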

Like hashing, but has a key to reverse result.
Example: Using {date of birth, gender, ZIP}
8126r29ws
8126r29ws + key = 9/12/1960, F, 372
The person with the key can reveal the sensitive source information.

Scan Cards / RFID Tags
Issue a card containing a UID to each client, who presents it for service.
Can be lost or given away!
Example: #57817
Should not contain personal information or Shelter information.

Use something always present with client and that typically does not change.
Example: fingerprint
968c5z9 UID
Fingerprints can often be linked to law-enforcement databases and re-identify clients.

Ask each client their permission to share data in exchange for services.
Disclose uses of data and circumstances of sharing.
They may say no.
Identifiable information can be shared.
Forwarding identifiable information is not good.

This Talk
1. The HMIS Setting
2. Technology Survey
3. A Provable Privacy Solution

Question in this Work
How can Shelters construct UIDs without risk of re-identification while still achieving an accurate unduplicated accounting?
This talk will examine old approaches and introduce a new solution with provable properties.

The Big Idea in 3 Steps
1. Shelters assign UIDs. Client has same UID at same shelter, and different UID at other shelters.
2. Shelters securely ship data to Fedex UIDs and Universal Data Elements
3. and Shelters de-duplicate UIDs (described over next slides)

UID Assignment
Each Shelter has a private value.
Each Client has a private value.
Strong hashing is used to combine the Shelter and Client value to produce a UID for the client.

De-Duplication
Each Shelter re-hashes the UIDs from all other Shelters.
All re-hashed values that are the same represent the same client.

The Commutative Property of Strong
There exist strong hash functions such that, when all Shelters re-hash all UIDs, the re-hashed values will only be the same for Clients whose source information was the same.
J. Benaloh and M. de Mare. One-way accumulators: a decentralized alternative to digital signatures. In Proceedings of Advances in Cryptology - EUROCRYPT '93, Lecture Notes in Computer Science, v 765, pages 274-285, Lofthus, Norway, 1994.

Simplified Multiplication Example, 1
Each Shelter has its own private value.

Simplified Multiplication Example, 2
3 Mult( , ) =
Mult(7, ) =
Multiply Client and Shelter private value to get UIDs.

Simplified Multiplication Example, 3
3 Mult( , ) =
Mult(7, ) =
The stores UIDs of Clients from Shelter 1.
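One way to picture a commutative keyed function is modular exponentiation, where re-hashing in either order gives the same result because (x^a)^b = (x^b)^a mod p. The sketch below is an assumption for illustration and is not taken from the slides: the modulus, the SHA-256 mapping from source information to a client value, and all function names are hypothetical, and the parameters have not been vetted for security.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used here only as an illustrative modulus


def client_value(source_info: str) -> int:
    """Map a client's source information to an integer in [2, P - 1]."""
    digest = hashlib.sha256(source_info.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def new_shelter_key() -> int:
    """Pick a private exponent coprime to P - 1 so that re-hashing is one-to-one."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def rehash(value: int, shelter_key: int) -> int:
    """Apply one Shelter's private value to a client value or to an existing UID."""
    return pow(value, shelter_key, P)


key1, key2 = new_shelter_key(), new_shelter_key()
alice = client_value("9/19/60|F|372")
bob = client_value("3/02/71|M|372")

uid_alice_at_1 = rehash(alice, key1)  # UID assigned by Shelter 1
uid_alice_at_2 = rehash(alice, key2)  # UID assigned by Shelter 2
uid_bob_at_2 = rehash(bob, key2)

# De-duplication: each Shelter re-hashes the other Shelter's UIDs.
alice_via_1_then_2 = rehash(uid_alice_at_1, key2)
alice_via_2_then_1 = rehash(uid_alice_at_2, key1)
bob_via_2_then_1 = rehash(uid_bob_at_2, key1)

print(alice_via_1_then_2 == alice_via_2_then_1)  # same client, same final value
print(bob_via_2_then_1 == alice_via_1_then_2)    # different client, different value
```

Choosing exponents coprime to P - 1 makes each re-hash a permutation of the values, which is what lets equal final values imply equal source information in this toy setting.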

Simplified Multiplication Example, 4
3 Mult( , ) =
11 Mult( , ) =
Multiply Client and Shelter private value to get UIDs.

Simplified Multiplication Example, 5
now knows there are 4 visits, but how many Clients?

Simplified Multiplication Example, 6
sends UIDs from Shelter 2 to Shelter 1 for re-hashing.
Mult( , ) = 897
Mult( , ) = 3289

Simplified Multiplication Example, 7
stores the re-hashed values.
897 3289

Simplified Multiplication Example, 8
sends UIDs from Shelter 1 to Shelter 2 for re-hashing.
897 3289
Mult( , ) = 897
Mult( , ) = 2093

Simplified Multiplication Example, 9
stores the re-hashed values.
897 2093 897 3289

Simplified Multiplication Example, 10
Re-hashed values that are the same represent the same Client. Which are the same?
897 2093 897 3289

Simplified Multiplication Example, 11
The re-hashed value 897 appears twice.
learns that Client at Shelter 1 is the same Client as at Shelter 2.
897 2093 897 3289

Learns
Client1, Client2, Client3
Completely Re-hashed UIDs: 897, 2093, 3289

Simplified Multiplication Example
3 Mult( , ) = 3
(3 * * ) *
(3 * ) * = 897
(3 * ) * = 897
(33 * ) * 897
3 Mult( , ) = 897

The Big Idea in 3 Steps
1. Shelters assign UIDs. Client has same UID at same shelter, and different UID at other shelters.
2. Shelters securely ship data to Fedex UIDs and Universal Data Elements
3. and Shelters de-duplicate UIDs
Re-hash UIDs to reveal which UIDs belong to the same client.

Note
The UIDs are not to be used for any other purpose than this reporting and de-duplication.
Shelters use different private values at each reporting period. This results in different hashes for the same Clients over different reporting periods.
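The multiplication walk-through can be replayed in code. The client values 3, 7, and 11 and the re-hashed results 897, 2093, and 3289 appear on the slides; the Shelter private values did not survive in this transcript, so the values 13 and 23 below are assumptions chosen because they reproduce those results, and which Shelter holds which value is a guess.

```python
from collections import Counter

# Assumed private values (not legible in the source); 3 * 13 * 23 == 897.
SHELTER_KEYS = {"Shelter 1": 13, "Shelter 2": 23}

# Client private values per Shelter, as far as the slides show them.
CLIENTS_AT = {"Shelter 1": [3, 7], "Shelter 2": [3, 11]}


def mult(a: int, b: int) -> int:
    """The slides' simplified stand-in for a commutative strong hash."""
    return a * b


# Step 1: each Shelter assigns UIDs with its own private value.
uids = {
    shelter: [mult(client, SHELTER_KEYS[shelter]) for client in clients]
    for shelter, clients in CLIENTS_AT.items()
}

# Step 3: every UID is re-hashed by the other Shelter.
rehashed = []
for origin, values in uids.items():
    for other, key in SHELTER_KEYS.items():
        if other != origin:
            rehashed.extend(mult(uid, key) for uid in values)

counts = Counter(rehashed)
print(uids)               # per-Shelter UIDs differ for the shared client
print(sorted(rehashed))   # four visits
print(len(counts))        # number of distinct Clients
```

Plain multiplication is easy to reverse by division, so it only conveys the commutative idea; the scheme itself relies on a strong hash with the same property.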

A Provable Claim
Theorem. If the re-hashed values are the same, the Clients representing the original UIDs provided the same source information.

A Provable Claim
A dictionary attack by the will not yield reliable re-identifications.
Dataset UID: 149875, 072, 976526
Social Security Number / UID:
UID 8563 for try 000-00-0000
UID 962656 for try 000-00-0001
UID 072 for try 000-00-0002
UID 976526 for try 104-51-2572
UID 149875 for try 104-51-2573
Tries: 000-00-0000, 000-00-0001, 000-00-0002, 104-51-2572, 104-51-2573, 999-99-9999

A Provable Claim
Compromising a Shelter will not help the intimate stalker learn where a targeted Client is (or has been) at another Shelter.

A Provable Claim
Compromising the will not help the intimate stalker learn where a targeted Client is (or has been).
Client1, Client2, Client3
Completely Re-hashed UIDs: 897, 2093, 3289

A Provable Claim
Even if the pads the UIDs with known values, the does not learn the source information of Clients.
Diagram values: b3s7, ghre, H2732, 0yfh02; Planning Office.

Over the Limit
If the intimate stalker compromises both the and a Shelter the targeted Client visited, the intimate stalker can learn the locations of all Shelters the Client visited.
Diagram values: ax4, 1804, H2732, nw450.

Technologies for HMIS
Table labels: This Distributed Query solution; provable.
* If compromise enough parties, can learn information.
** Shown is worst case, can be improved by source information selection.

Question in this Work
How can Shelters construct UIDs without risk of re-identification while still achieving an accurate unduplicated accounting?
First, use strong hashing, inconsistently across Shelters to assign UIDs.
Second, provide accounting information to the through a secure means.
Then, have each Shelter re-hash the UIDs of all other Shelters, in turn, to de-duplicate UIDs.

This Talk
1. The Setting
2. Technology Survey
3. A Provable Privacy Solution

privacy.cs.cmu.edu
latanya@privacy.cs.cmu.edu