HIDE: Privacy Preserving Medical Data Publishing. James Gardner Department of Mathematics and Computer Science Emory University

1 HIDE: Privacy Preserving Medical Data Publishing James Gardner Department of Mathematics and Computer Science Emory University

2 Motivation
- De-identification is critical in any health informatics system
  - Research
  - Sharing
- Need an easy-to-use interface and framework for data custodians and publishers
- Understanding data is necessary to de-identify data

3 HIPAA
1. Names;
2. All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000;
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
4. Phone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social Security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data)

4 PHI Summary Protected Health Information (PHI) is defined by HIPAA as individually identifiable health information Direct identifiers include name, SSN, etc. Indirect identifiers include gender, age, address information, etc.

5 Research Challenges Detect PHI in heterogeneous medical data Apply structured anonymization principles on heterogeneous medical data (micro-privacy) Release differentially private aggregated statistics (macro-privacy)

6 HIDE
Health Information DE-identification
Uses techniques from: information extraction, data linking, structured anonymization, differential privacy, data mining

7 HIDE

8 Outline
- Background and related work
  - Existing de-identification approaches
  - Named entity recognition
  - Privacy preserving data publishing
- Proposed Work
  - HIDE framework
  - Identifying and sensitive information extraction
  - Micro-data publishing
  - Macro-data publishing
  - Software

9 Alternative Systems
- Scrub System: rules and dictionaries are used to detect PHI
- Semantic Lexicon System: rules and dictionaries are used to detect PHI
- DE-ID: rules and dictionaries, developed at Pittsburgh and approved by IRB
- Concept-Match Scrubber: removes every word not in an approved list of non-identifying terms
- Carafe: uses a CRF to detect PHI

10 Limitations of Most Systems
- Lack portability
- Don't give formal privacy guarantees
- Don't utilize the latest work from structured data anonymization
- Focus only on removing PHI

11 Named Entity Recognition
Locate and classify atomic elements in text into predefined categories such as person, organization, location, expressions of time, quantities, etc.
NER systems can be classified as either rule-based or machine learning-based.

12 NER Examples
Part-of-speech (POS) Tagging: I/PRP think/VBP it/PRP 's/BES a/DT pretty/RB good/JJ idea/NN ./.
Personal Health Identifier Detection: <age>77</age> year old <gender>female</gender> with history of <disease>b-cell lymphoma</disease> (Marginal zone, <mrn>sh </mrn>)

13 NER Metrics Precision TP / (TP + FP) Recall TP / (TP + FN)
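
The two metrics above, together with the F-score reported on later slides, can be computed from raw counts. This is a minimal sketch; the function name and the counts in the usage lines are hypothetical and are not results from the talk.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 PHI tokens found correctly, 10 false alarms, 30 missed.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

For an honest broker a missed identifier (FN) is usually the costlier error, which is why recall is emphasized in the sampling slides.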

14 Rule-based Rely on hand-coded rules and dictionaries Dictionaries can be used for terms in a closed class with an exhaustive list, e.g. geographic locations Regular expressions are used to detect terms that follow certain syntactic patterns, e.g. phone numbers
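
A rule-based detector of the kind described above can be sketched in a few lines. The dictionary contents, the phone-number pattern, and the function name are illustrative assumptions, not the rules of any of the systems named in this talk.

```python
import re

# Hypothetical closed-class dictionary; a real one would list every state.
STATE_NAMES = {"georgia", "texas", "ohio"}

# Illustrative pattern for US-style numbers such as 404-555-0123 or (404) 555-0123.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


def find_phi(text):
    """Return sorted (start, end, label) spans found by the regex and the dictionary."""
    spans = [(m.start(), m.end(), "phone") for m in PHONE_RE.finditer(text)]
    for m in re.finditer(r"[A-Za-z]+", text):
        if m.group().lower() in STATE_NAMES:
            spans.append((m.start(), m.end(), "location"))
    return sorted(spans)


text = "Patient moved from Ohio; call 404-555-0123."
hits = [(text[s:e], label) for s, e, label in find_phi(text)]
print(hits)
```

Every new term or format requires editing the dictionary or the pattern by hand, which is the maintenance burden the comparison slide attributes to rule-based systems.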

15 Machine learning-based Model the NER as a sequence labeling task where each token is assigned a label Train classifiers to label each token Classifiers use a list of features (or attributes) for training and classification of the sequence Frequently applied classifiers are HMM, MEMM, SVM, and CRF

16 Conditional Random Field A Conditional Random Field (CRF) provides a probabilistic framework for labeling and segmenting sequential data A CRF defines a conditional probability of a label sequence given an observation sequence
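
For reference, the conditional probability mentioned above is usually written as follows for a linear-chain CRF. The slide does not give the formula, so the notation here (feature functions f_k, weights lambda_k, sequence length T) is the standard textbook form and an assumption about the model variant used in HIDE.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Here x is the token sequence, y is the label sequence, and Z(x) normalizes over all candidate label sequences.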

17 Comparison
- Rule-based
  - Accurate
  - Require experts to modify
  - Not portable
- Machine learning-based
  - Accurate
  - Modification of models is done through training rather than coding
  - Portable

18 Privacy Preserving Data Publishing
- Weak privacy (Micro)
  - release a modified version of each record according to a given anonymization principle
  - assumes level of background knowledge
- Differential privacy (Macro)
  - release perturbed statistics that satisfy the differential privacy principle
  - no assumptions of background knowledge

19 Micro-data publishing
- Prevent linking of records in separate databases: k-anonymization
- Prevent discovery of sensitive values: l-diversity
- Prevent discovery of presence or absence in a database: delta-presence

20 Micro-data publishing
Table 1: Illustration of Anonymization

Original Data
Name   Age  Gender  Zipcode  Diagnosis
Henry  25   Male             Influenza
Irene  28   Female           Lymphoma
Dan    28   Male             Bronchitis
Erica  26   Female           Influenza

Anonymized Data
Name   Age      Gender  Zipcode  Disease
       [25-28]  Male    [ ]      Influenza
       [25-28]  Female           Lymphoma
       [25-28]  Male    [ ]      Bronchitis
       [25-28]  Female           Influenza

21 k-anonymization
- Quasi-identifier (QID) set
- Sensitive attributes
- A table is k-anonymous if every record shares its quasi-identifier values with at least k-1 other records
- The probability of linking a victim to a specific record through the QID is at most 1/k
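
The definition above translates directly into a check over groups of identical quasi-identifier values. This is a minimal sketch; the function name is hypothetical and the records mirror the anonymized table from slide 20 with the zip code column omitted.

```python
from collections import Counter


def is_k_anonymous(records, qid, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return all(count >= k for count in groups.values())


records = [
    {"age": "[25-28]", "gender": "Male", "diagnosis": "Influenza"},
    {"age": "[25-28]", "gender": "Female", "diagnosis": "Lymphoma"},
    {"age": "[25-28]", "gender": "Male", "diagnosis": "Bronchitis"},
    {"age": "[25-28]", "gender": "Female", "diagnosis": "Influenza"},
]

print(is_k_anonymous(records, ["age", "gender"], k=2))  # each group has two records
print(is_k_anonymous(records, ["age", "gender"], k=3))  # no group has three
```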

22 l-diversity Extension of k-anonymization Also ensures that each group has at least l distinct sensitive values Prevents disclosure of sensitive values
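
The extra condition can be checked per quasi-identifier group. This sketch implements only the simplest reading given on the slide (at least l distinct sensitive values per group); the function name and the example records are assumptions for illustration.

```python
from collections import defaultdict


def is_l_diverse(records, qid, sensitive, l):
    """True if every quasi-identifier group contains at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[a] for a in qid)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


diverse = [
    {"age": "[25-28]", "gender": "Male", "diagnosis": "Influenza"},
    {"age": "[25-28]", "gender": "Male", "diagnosis": "Bronchitis"},
]
# 2-anonymous but not 2-diverse: both records reveal the same diagnosis.
uniform = [
    {"age": "[25-28]", "gender": "Female", "diagnosis": "Lymphoma"},
    {"age": "[25-28]", "gender": "Female", "diagnosis": "Lymphoma"},
]

print(is_l_diverse(diverse, ["age", "gender"], "diagnosis", l=2))
print(is_l_diverse(uniform, ["age", "gender"], "diagnosis", l=2))
```

The second group shows why k-anonymity alone is not enough: linking a person to the group already discloses the diagnosis.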

23 Macro-data publishing Differential Privacy is a strong privacy notion Requires that a randomized computation yields nearly identical output when performed on nearly identical input Interactive model limited to a specific number of queries Non-interactive model need query strategies to build noisy data cubes that maximize utility for a random query workload
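
The informal requirement above is commonly formalized as follows. The slide does not state the definition; this is the standard one, written with alpha as the privacy parameter to match the notation used on the partitioning slides.

```latex
\Pr[\mathcal{A}(D_1) \in S] \;\le\; e^{\alpha} \cdot \Pr[\mathcal{A}(D_2) \in S]
\quad \text{for all } S \subseteq \mathrm{Range}(\mathcal{A})
\text{ and all databases } D_1, D_2 \text{ differing in one record}
```

Here A is the randomized computation. A smaller alpha forces the two output distributions closer together, which means stronger privacy and more noise.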

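24 Differentially Private Interface
[Diagram: a query strategy, driven by the workload, issues pre-designed queries against the original data through the differentially private interface; the differentially private answers form a differentially private histogram, which answers the user's queries]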

25 HIDE Framework
- Identifying and Sensitive Information Extraction: uses state-of-the-art CRF model to extract PHI and sensitive information
- Data linking: provides structured patient-centric view of the data
- De-identification and Anonymization
  - Micro-data publication: uses data suppression and generalization to provide a k-anonymized view of the data
  - Macro-data publication: releases perturbed aggregated statistics from the patient-centric view

26 HIDE

27 Identifying and sensitive information extraction
- Use CRF classifier to extract information
- Studied impact of features including: regular expressions, affixes, dictionaries, context
- Sampling techniques to adjust classifier for higher precision or recall

28 Example
Token     Label
77        B-age
year      O
old       O
female    B-gender
with      O
history   O
of        O
B         B-disease
-         I-disease
cell      I-disease
lymphoma  I-disease
(         O
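
The B-/I-/O labels in this example can be collapsed back into typed entity spans. This is a small sketch of that decoding step; the function name is hypothetical and the talk does not describe how HIDE performs it.

```python
def bio_to_spans(tokens, labels):
    """Group B-/I- labeled tokens into (entity_type, text) pairs."""
    spans, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append(current)
            current = (lab[2:], [tok])  # start a new entity
        elif lab.startswith("I-") and current and current[0] == lab[2:]:
            current[1].append(tok)      # continue the open entity
        else:
            if current:
                spans.append(current)
            current = None              # an O label closes any open entity
    if current:
        spans.append(current)
    return [(etype, " ".join(toks)) for etype, toks in spans]


tokens = "77 year old female with history of B - cell lymphoma (".split()
labels = ["B-age", "O", "O", "B-gender", "O", "O", "O",
          "B-disease", "I-disease", "I-disease", "I-disease", "O"]
entities = bio_to_spans(tokens, labels)
print(entities)
```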

29 Regular Expressions
Table 1: List of regular expression features used in HIDE

Regular Expression                        Name
^[A-Za-z]$                                ALPHA
^[A-Z].*$                                 INITCAPS
^[A-Z][a-z].*$                            UPPER-LOWER
^[A-Z]+$                                  ALLCAPS
^[A-Z][a-z]+[A-Z][A-Za-z]*$               MIXEDCAPS
^[A-Za-z]$                                SINGLECHAR
^[0-9]$                                   SINGLEDIGIT
^[0-9][0-9]$                              DOUBLEDIGIT
^[0-9][0-9][0-9]$                         TRIPLEDIGIT
^[0-9][0-9][0-9][0-9]$                    QUADDIGIT
^[0-9,]+$                                 NUMBER
[0-9]                                     HASDIGIT
^.*[0-9].*[A-Za-z].*$                     ALPHANUMERIC
^.*[A-Za-z].*[0-9].*$                     ALPHANUMERIC
^[0-9]+[A-Za-z]$                          NUMBERS LETTERS
^[A-Za-z]+[0-9]+$                         LETTERS NUMBERS
-                                         HASDASH
                                          HASQUOTE
/                                         HASSLASH
~!@#$%\^&*()\-=_+\[\]{} ; :\",./<>?]+$    ISPUNCT
(- \+)?[0-9,]+(\.[0-9]*)?%?$              REALNUMBER
^-.*                                      STARTMINUS
^\+.*$                                    STARTPLUS
^.*%$                                     ENDPERCENT
^[IVXDLCM]+$                              ROMAN
^\s+$                                     ISSPACE
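
A subset of these patterns can be turned into binary token features as sketched below. Only patterns that survived transcription intact are used; the dictionary layout and the function name are assumptions, not HIDE's actual code.

```python
import re

# A subset of the patterns listed in the table, keyed by feature name.
FEATURES = {
    "INITCAPS": re.compile(r"^[A-Z].*$"),
    "ALLCAPS": re.compile(r"^[A-Z]+$"),
    "SINGLEDIGIT": re.compile(r"^[0-9]$"),
    "DOUBLEDIGIT": re.compile(r"^[0-9][0-9]$"),
    "NUMBER": re.compile(r"^[0-9,]+$"),
    "HASDIGIT": re.compile(r"[0-9]"),
    "ROMAN": re.compile(r"^[IVXDLCM]+$"),
}


def regex_features(token):
    """Return the names of all patterns that match the token."""
    return {name for name, pattern in FEATURES.items() if pattern.search(token)}


for tok in ["77", "B", "lymphoma"]:
    print(tok, sorted(regex_features(tok)))
```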

30 Affixes Prefixes Suffixes All affixes up to size 3
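
Affix features of this kind can be generated as follows. This is a minimal sketch; the feature-string format (`prefix1=...`) and the function name are hypothetical.

```python
def affix_features(token, max_len=3):
    """Return prefix and suffix features for every length from 1 up to max_len."""
    feats = []
    for n in range(1, min(max_len, len(token)) + 1):
        feats.append(f"prefix{n}={token[:n]}")
        feats.append(f"suffix{n}={token[-n:]}")
    return feats


print(affix_features("lymphoma"))
```

A suffix such as "oma" is a useful hint for disease names even when the full token never appeared in the training data.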

31 Dictionaries Company Names Last Names State Names Hospital Names Male First Names Female First Names State Abbreviations

32 Context Previous 4 words Next 4 words Occurrence counts
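
The context features can be sketched as a fixed window around each token. The padding symbol, the feature names, and the reading of "occurrence counts" as the number of times the token appears in the document are all assumptions; the slide does not define them.

```python
from collections import Counter


def context_features(tokens, i, window=4):
    """Features for tokens[i]: up to `window` neighbors on each side plus an occurrence count."""
    feats = {}
    for d in range(1, window + 1):
        feats[f"prev{d}"] = tokens[i - d] if i - d >= 0 else "<PAD>"
        feats[f"next{d}"] = tokens[i + d] if i + d < len(tokens) else "<PAD>"
    # Assumed meaning of "occurrence counts": frequency of this token in the document.
    feats["count"] = Counter(tokens)[tokens[i]]
    return feats


tokens = "77 year old female with history of B - cell lymphoma".split()
first = context_features(tokens, 0)
print(first)
```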

33 Feature Vectors
Token     CAPS?  SPECIAL?  PREVIOUS  NEXT      LABEL
77        N      Y         ?         year      B-age
year      N      N         77        old       O
old       N      N         year      female    O
female    N      N         old       with      B-gender
with      N      N         female    history   O
history   N      N         with      of        O
of        N      N         history   B         O
B         Y      N         of        -         B-disease
-         N      Y         B         cell      I-disease
cell      N      N         -         lymphoma  I-disease
lymphoma  N      N         cell      (         I-disease

34 Features Set Results 220 re-identified pathology reports for i2b2 task 10-fold cross-validation

35 Features Set Results
[Chart: precision, recall, and F-score for each feature-set combination: d, r, rd, a, ad, ra, rad, c, cd, ac, acd, rc, rac, racd, rcd]

36 Sampling Honest brokers are concerned more about recall than precision Cost proportionate rejection sampling is often used for boosting Training examples are selected based on the associated cost of missing that label

37 Random O-Sampling Keep all non- O labels Select O labels with given probability Biases the classifier to select a label other than O
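
The sampling rule above can be sketched as a filter over labeled training tokens. The function name and the use of a seeded `random.Random` are assumptions. The sketch samples individual tokens, whereas a real CRF trainer would also have to decide how to treat the gaps left in each sequence.

```python
import random


def random_o_sample(tokens, labels, keep_prob, rng):
    """Keep every non-O token; keep each O token with probability keep_prob."""
    return [
        (tok, lab)
        for tok, lab in zip(tokens, labels)
        if lab != "O" or rng.random() < keep_prob
    ]


tokens = "77 year old female with history of B - cell lymphoma".split()
labels = ["B-age", "O", "O", "B-gender", "O", "O", "O",
          "B-disease", "I-disease", "I-disease", "I-disease"]

rng = random.Random(0)
sample = random_o_sample(tokens, labels, keep_prob=0.5, rng=rng)
print(sample)
```

Lowering `keep_prob` removes more O examples, so the trained classifier leans toward predicting PHI labels, trading precision for recall.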

38 Random O-Sampling
[Charts: precision, recall, and F-score versus sample probability, for the i2b2 and PhysioNet data sets]

39 Window Sampling Select all non- O labels Select all terms within given window of any non- O label
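
Window sampling can be sketched as selecting token positions. The function name is hypothetical, and the labels are those of the example on slide 28 without the final parenthesis.

```python
def window_sample(labels, window):
    """Return the indices of all non-O tokens and of tokens within `window` positions of one."""
    keep = set()
    for i, label in enumerate(labels):
        if label != "O":
            lo = max(0, i - window)
            hi = min(len(labels), i + window + 1)
            keep.update(range(lo, hi))
    return sorted(keep)


labels = ["B-age", "O", "O", "B-gender", "O", "O", "O",
          "B-disease", "I-disease", "I-disease", "I-disease"]

print(window_sample(labels, window=1))  # index 5 ("history") is dropped
print(window_sample(labels, window=0))  # only the PHI tokens themselves
```

Unlike random O-sampling, the O tokens that survive are the ones adjacent to PHI, so the local context the classifier relies on is preserved.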

40 Window Sampling
[Charts: precision, recall, and F-score versus history size, for the i2b2 and PhysioNet data sets]

41 Information Extraction Conclusion HIDE has a fast and accurate CRF for detecting PHI Feature engineering has been explored in great detail Window Sampling can be used to adjust recall with minimal impact on precision Impact of training data size

42 Micro-data publishing Release patient-centric view or original data with suppressed or generalized values Apply k-anonymization and l-diversity principles to unstructured data Evaluate query accuracy on real medical data

43 Micro-data publishing Full de-identification Remove all identifiers Partial de-identification Remove direct identifiers Statistical de-identification Statistical anonymization

44 Statistical Anonymization
- Partition the original data points into groups whose members all share the same QID values
- Use the multi-dimensional Mondrian algorithm to release a k-anonymized and l-diverse version of the structured patient-centric view

45 Partitioning
[Figure 4. Spatial representation of Patients and partitionings (quasi-identifiers Zipcode an
Panels: (a) Patients, (b) Single-Dimensional, (c) Strict Multidimensional]

46 Mondrian algorithm Greedy top-down partitioning approach Choose dimension with maximum range Split at median if each newly created partition still satisfies k-anonymization and l-diversity
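
The greedy procedure above can be sketched recursively. This is a simplified illustration and not HIDE's implementation: it handles numeric quasi-identifiers only, enforces k-anonymity but omits the l-diversity check, compares raw (unnormalized) ranges, and falls back to the next-widest dimension when the widest one admits no valid cut. All names and data values are hypothetical.

```python
def mondrian(records, qid, k):
    """Greedy top-down partitioning; every returned partition has at least k records."""
    def spread(attr):
        values = [r[attr] for r in records]
        return max(values) - min(values)

    # Try the dimension with the largest range first.
    for attr in sorted(qid, key=spread, reverse=True):
        ordered = sorted(records, key=lambda r: r[attr])
        median = ordered[len(ordered) // 2][attr]
        left = [r for r in ordered if r[attr] < median]
        right = [r for r in ordered if r[attr] >= median]
        # Split only if both halves still satisfy k-anonymity.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qid, k) + mondrian(right, qid, k)
    return [records]


def generalize(partition, qid):
    """Replace each quasi-identifier by the (min, max) range of its partition."""
    return {a: (min(r[a] for r in partition), max(r[a] for r in partition)) for a in qid}


ages = [25, 26, 28, 28, 40, 41, 45, 50]
zips = [30301, 30305, 30302, 30309, 30310, 30303, 30307, 30308]
records = [{"age": a, "zip": z} for a, z in zip(ages, zips)]

for part in mondrian(records, ["age", "zip"], k=2):
    print(len(part), generalize(part, ["age", "zip"]))
```

Running the sketch with a larger k yields fewer and coarser partitions, which matches the observation that finer partitions are possible with smaller k.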

47 Example
[Figure: greedy strict multidimensional partitioning, shown for k = 50, k = 25, and k = 10]
More precise partitions are possible with smaller k

48 Query Accuracy 100 pathology reports age > n age < n 10,000 random queries

49 Query Accuracy
[Chart: query precision (%) versus k for statistical de-identification, partial de-identification, and full de-identification]

50 Macro-data publishing Differentially private data publishing (DPDP) module in HIDE Create differentially private data cube where each dimension represents a statistic over the patient-centric view Partitioning algorithm based on information gain to maximize level of utility of differentially private data cube Consistency algorithm to enhance utility

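51 Differentially Private Interface
[Diagram: a query strategy, driven by the workload, issues pre-designed queries against the original data through the differentially private interface; the differentially private answers form a differentially private histogram, which answers the user's queries]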

52 DPDP
DPDP considers:
- Access to original data
- Partitioning of the original database that best satisfies the workload of queries
- Level of differential privacy of data cube
- Level of utility (or noise) in the released data cube

53 Access to original database
- Every access to the original database uses some of the privacy budget
- Access the original database in a differentially private manner
- Minimize the number of times the original data is queried, to minimize the amount of noise that must be added to the results
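
Budget accounting of this kind can be sketched with a small tracker and a noisy count query. The slides do not name the noise mechanism; the Laplace mechanism with sensitivity 1 is assumed here, and the class name, function names, and data values are hypothetical.

```python
import random


class PrivacyBudget:
    """Tracks how much of the total privacy parameter alpha is left to spend."""

    def __init__(self, total):
        self.remaining = total

    def spend(self, alpha):
        if alpha <= 0 or alpha > self.remaining + 1e-12:
            raise ValueError("privacy budget exhausted")
        self.remaining -= alpha


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, alpha, budget, rng):
    """Answer a count query (sensitivity 1) under an assumed Laplace mechanism."""
    budget.spend(alpha)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / alpha, rng)


rng = random.Random(0)
budget = PrivacyBudget(total=1.0)
ages = [25, 26, 28, 28, 40, 41, 45, 50]

answer = noisy_count(ages, lambda a: 20 < a < 30, alpha=0.5, budget=budget, rng=rng)
print(round(answer, 2), "remaining budget:", budget.remaining)
```

Each query spends part of a fixed total, so issuing fewer queries lets each one use a larger alpha and therefore less noise.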

54 Query Strategy Develop a query strategy that will allow the most utility given random queries from the user This query strategy is accomplished by partitioning the data according to information gain

55 Partitioning of the original database
Release two data cubes:
- One using a cell-based algorithm that partitions the database into its individual cells and releases a perturbed count for each cell
- One using a top-down multi-dimensional partitioning strategy, where each split value selection maximizes the information gain and ensures the uniformity of the data points in the partition
A consistency algorithm is then applied to the two data cubes to increase the accuracy of the released data cubes

56 Cell partitioning
[Figure: Age x Income grid (income columns 40K and 50K) of perturbed cell counts]
Q1: count() where Age = 20, Income = 40K
Q2: count() where Age = 20, Income = 50K
Q: Select count where age > 20 and age < 30
alpha is the differential privacy parameter

57 Multi-dimensional partitioning
[Figure: Age x Income grid (income columns 40K and 50K) under multi-dimensional partitioning]
Select count where age > 20 and age < 30
Noise is divided

58 Goals of partitioning strategy Large partitions to minimize aggregated perturbation error Uniform partitions to minimize approximation error Minimize the number of times we access the original data
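
The first goal can be made concrete with a short calculation. It assumes each released count carries independent Laplace noise with scale 1/alpha (sensitivity 1), which the slides do not state explicitly; m is the number of cells a range query covers.

```latex
\operatorname{Var}\big[\mathrm{Lap}(1/\alpha)\big] = \frac{2}{\alpha^{2}},
\qquad
\operatorname{Var}\Big[\sum_{i=1}^{m} \mathrm{Lap}_i(1/\alpha)\Big] = \frac{2m}{\alpha^{2}}
```

With alpha = 0.1, a query summing m = 10 noisy cells has noise variance 2000, while a single partition covering the same region has variance 200. The price of the larger partition is approximation error whenever a query covers only part of it and the data inside is not uniform, which is why the second goal asks for uniform partitions.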

59 Proposed Approach
[Diagram: the privacy budget is split between cell partitioning queries (alpha/2) and multi-dimensional partitioning queries (alpha/2) over the Age x Income grid]

60 Utility of release
- The level of utility is measured by comparing the answers to a query workload on the released differentially private data cubes with the answers on a non-perturbed data cube generated from the original data
- We empirically evaluate the level of error as a function of the given privacy budget

61 Software Web application Python, Django, and CouchDB Interface Iterative labeling of documents and training underlying classifier Analyze accuracy of classifier on validation sets Classifier is super-fast CRF provided by CRFSuite

62 Publications
- Y. Xiao, J. Gardner, L. Xiong. DPCube: Releasing Differentially Private Data Cubes for Health Information (demo paper). In 28th IEEE International Conference on Data Engineering (ICDE), 2012
- James Gardner, Li Xiong, Fusheng Wang, Andrew Post and Joel Saltz. An evaluation of feature sets and sampling techniques for statistical de-identification of medical records. In 1st ACM International Health Informatics Symposium, 2010 (to appear).
- Li Xiong, James Gardner, Pawel Jurczyk and James J. Lu. Privacy Preserving Information Discovery on EHRs. In Information Discovery on Electronic Health Records, Ed. Vagelis Hristidis. Chapman and Hall/CRC, pp ,
- James Gardner and Li Xiong. An integrated framework for de-identifying unstructured medical data. Data and Knowledge Engineering, 68(12), pp , 2009, doi: /j.datak
- James Gardner, Kanwei Li, Li Xiong and James J. Lu. HIDE: Heterogeneous Information DEidentification (demo track). 12th International Conference on Extending Database Technology (EDBT), March,
- James Gardner and Li Xiong. HIDE: A Health Information DE-identification System. In 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS), June, 2008

POLICY. Create a governance process to manage requests to extract de- identified data from the Information Exchange (IE).

POLICY. Create a governance process to manage requests to extract de- identified data from the Information Exchange (IE). Academic Health Center Office of Biomedical Health Informatics POLICY Extraction of De- Identifiable Data from the Information Exchange Approved Proposal Purpose Create a governance process to manage requests

More information

Statistical and Synthetic Data Sharing with Differential Privacy

Statistical and Synthetic Data Sharing with Differential Privacy pscanner and idash Data Sharing Symposium UCSD, Sept 30 Oct 2, 2015 Statistical and Synthetic Data Sharing with Differential Privacy Li Xiong Department of Mathematics and Computer Science Department of

More information

Introduction/Instructions

Introduction/Instructions Introduction/Instructions Registries (data banks) and repositories (tissue banks, usually with databases associated) all involve the collection and storage of information and/or biological specimens that

More information

Universal Patient Key

Universal Patient Key Universal Patient Key Overview The Healthcare Data Privacy (i.e., HIPAA Compliance) and Data Management Challenge The healthcare industry continues to struggle with two important goals that many view as

More information

HIPAA and HIPAA Compliance with PHI/PII in Research

HIPAA and HIPAA Compliance with PHI/PII in Research HIPAA and HIPAA Compliance with PHI/PII in Research HIPAA Compliance Federal Regulations-Enforced by Office of Civil Rights State Regulations-Texas Administrative Codes Institutional Policies-UTHSA HOPs/IRB

More information

Privacy Preserving Data Mining: An approach to safely share and use sensible medical data

Privacy Preserving Data Mining: An approach to safely share and use sensible medical data Privacy Preserving Data Mining: An approach to safely share and use sensible medical data Gerhard Kranner, Viscovery Biomax Symposium, June 24 th, 2016, Munich www.viscovery.net Privacy protection vs knowledge

More information

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series Introduction to Privacy-Preserving Data Publishing Concepts and Techniques Benjamin C M Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S Yu CRC

More information

EXAMPLE 2-JOINT PRIVACY AND SECURITY CHECKLIST

EXAMPLE 2-JOINT PRIVACY AND SECURITY CHECKLIST Purpose: The purpose of this Checklist is to evaluate your proposal to use or disclose Protected Health Information ( PHI ) for the purpose indicated below and allow the University Privacy Office and Office

More information

Incognito: Efficient Full Domain K Anonymity

Incognito: Efficient Full Domain K Anonymity Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)

More information

Data Security and Privacy. Topic 18: k-anonymity, l-diversity, and t-closeness

Data Security and Privacy. Topic 18: k-anonymity, l-diversity, and t-closeness Data Security and Privacy Topic 18: k-anonymity, l-diversity, and t-closeness 1 Optional Readings for This Lecture t-closeness: Privacy Beyond k-anonymity and l-diversity. Ninghui Li, Tiancheng Li, and

More information

EXAMPLE 3-JOINT PRIVACY AND SECURITY CHECKLIST

EXAMPLE 3-JOINT PRIVACY AND SECURITY CHECKLIST Purpose: The purpose of this Checklist is to evaluate your proposal to use or disclose Protected Health Information ( PHI ) for the purpose indicated below and allow the University Privacy Office and Office

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Privacy preserving data mining Li Xiong Slides credits: Chris Clifton Agrawal and Srikant 4/3/2011 1 Privacy Preserving Data Mining Privacy concerns about personal data AOL

More information

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University Outline Privacy preserving data publishing: What and Why Examples of privacy attacks

More information

Security Control Methods for Statistical Database

Security Control Methods for Statistical Database Security Control Methods for Statistical Database Li Xiong CS573 Data Privacy and Security Statistical Database A statistical database is a database which provides statistics on subsets of records OLAP

More information

HIPAA and Research Contracts JILL RAINES, ASSISTANT GENERAL COUNSEL AND UNIVERSITY PRIVACY OFFICIAL

HIPAA and Research Contracts JILL RAINES, ASSISTANT GENERAL COUNSEL AND UNIVERSITY PRIVACY OFFICIAL HIPAA and Research Contracts JILL RAINES, ASSISTANT GENERAL COUNSEL AND UNIVERSITY PRIVACY OFFICIAL Just a Few Reminders HIPAA applies to Covered Entities HIPAA is a federal law that governs the privacy

More information

University of Mississippi Medical Center Data Use Agreement Protected Health Information

University of Mississippi Medical Center Data Use Agreement Protected Health Information Data Use Agreement Protected Health Information This Data Use Agreement ( DUA ) is effective on the day of, 20, ( Effective Date ) by and between (UMMC) ( Data Custodian ), and ( Recipient ), located at

More information

Overview of Datavant's De-Identification and Linking Technology for Structured Data

Overview of Datavant's De-Identification and Linking Technology for Structured Data Overview of Datavant's De-Identification and Linking Technology for Structured Data Introduction Datavant is firmly committed to advancing healthcare through data analytics while protecting patients privacy.

More information

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Dr.K.P.Kaliyamurthie HOD, Department of CSE, Bharath University, Tamilnadu, India ABSTRACT: Automated

More information

K ANONYMITY. Xiaoyong Zhou

K ANONYMITY. Xiaoyong Zhou K ANONYMITY LATANYA SWEENEY Xiaoyong Zhou DATA releasing: Privacy vs. Utility Society is experiencing exponential growth in the number and variety of data collections containing person specific specific

More information

Computer Security Incident Response Plan. Date of Approval: 23-FEB-2014

Computer Security Incident Response Plan. Date of Approval: 23-FEB-2014 Computer Security Incident Response Plan Name of Approver: Mary Ann Blair Date of Approval: 23-FEB-2014 Date of Review: 31-MAY-2016 Effective Date: 23-FEB-2014 Name of Reviewer: John Lerchey Table of Contents

More information

Best Practices. Contents. Meridian Technologies 5210 Belfort Rd, Suite 400 Jacksonville, FL Meridiantechnologies.net

Best Practices. Contents. Meridian Technologies 5210 Belfort Rd, Suite 400 Jacksonville, FL Meridiantechnologies.net Meridian Technologies 5210 Belfort Rd, Suite 400 Jacksonville, FL 32257 Meridiantechnologies.net Contents Overview... 2 A Word on Data Profiling... 2 Extract... 2 De- Identification... 3 PHI... 3 Subsets...

More information

Emerging Measures in Preserving Privacy for Publishing The Data

Emerging Measures in Preserving Privacy for Publishing The Data Emerging Measures in Preserving Privacy for Publishing The Data K.SIVARAMAN 1 Assistant Professor, Dept. of Computer Science, BIST, Bharath University, Chennai -600073 1 ABSTRACT: The information in the

More information

Co-clustering for differentially private synthetic data generation

Co-clustering for differentially private synthetic data generation Co-clustering for differentially private synthetic data generation Tarek Benkhelif, Françoise Fessant, Fabrice Clérot and Guillaume Raschia January 23, 2018 Orange Labs & LS2N Journée thématique EGC &

More information

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007 K-Anonymity and Other Cluster- Based Methods Ge Ruan Oct 11,2007 Data Publishing and Data Privacy Society is experiencing exponential growth in the number and variety of data collections containing person-specific

More information

CS573 Data Privacy and Security. Li Xiong

CS573 Data Privacy and Security. Li Xiong CS573 Data Privacy and Security Anonymizationmethods Li Xiong Today Clustering based anonymization(cont) Permutation based anonymization Other privacy principles Microaggregation/Clustering Two steps:

More information

A FRAMEWORK FOR PRIVACY-PRESERVING MEDICAL DOCUMENT SHARING

A FRAMEWORK FOR PRIVACY-PRESERVING MEDICAL DOCUMENT SHARING A FRAMEWORK FOR PRIVACY-PRESERVING MEDICAL DOCUMENT SHARING Completed Research Paper Xiao-Bai Li, Jialun Qin Department of Operations and Information Systems Manning School of Business University of Massachusetts

More information

HIPAA Federal Security Rule H I P A A

HIPAA Federal Security Rule H I P A A H I P A A HIPAA Federal Security Rule nsurance ortability ccountability ct of 1996 HIPAA Introduction - What is HIPAA? HIPAA = The Health Insurance Portability and Accountability Act A Federal Law Created

More information

Data Anonymization - Generalization Algorithms

Data Anonymization - Generalization Algorithms Data Anonymization - Generalization Algorithms Li Xiong CS573 Data Privacy and Anonymity Generalization and Suppression Z2 = {410**} Z1 = {4107*. 4109*} Generalization Replace the value with a less specific

More information

Security Overview. Joseph Balberde North Country Community Mental Health Information Technology Director

Security Overview. Joseph Balberde North Country Community Mental Health Information Technology Director Security Overview Joseph Balberde North Country Community Mental Health Information Technology Director 2-5-2019 Protected Health Information Individually Identifiable Health Information (IIHI): is information

More information

Achieving k-anonmity* Privacy Protection Using Generalization and Suppression

Achieving k-anonmity* Privacy Protection Using Generalization and Suppression UT DALLAS Erik Jonsson School of Engineering & Computer Science Achieving k-anonmity* Privacy Protection Using Generalization and Suppression Murat Kantarcioglu Based on Sweeney 2002 paper Releasing Private

More information

ZIPpy Safe Harbor De-Identification Macros

ZIPpy Safe Harbor De-Identification Macros ZIPpy Safe Harbor De-Identification Macros SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates

More information

HIPAA 101: What All Doctors NEED To Know

HIPAA 101: What All Doctors NEED To Know HIPAA 101: What All Doctors NEED To Know 1 HIPAA Basics HIPAA: Health Insurance and Portability Accountability Act of 1996 Purpose: to protect confidential information through improved security and privacy

More information

Privacy Preserving Health Data Mining

Privacy Preserving Health Data Mining IJCST Vo l. 6, Is s u e 4, Oc t - De c 2015 ISSN : 0976-8491 (Online) ISSN : 2229-4333 (Print) Privacy Preserving Health Data Mining 1 Somy.M.S, 2 Gayatri.K.S, 3 Ashwini.B 1,2,3 Dept. of CSE, Mar Baselios

More information

After. you scanned. Apply. 4. When the

After. you scanned. Apply. 4. When the Redaction Permanently removing sensitive information in a PDF Redaction Instructions for Adobe Acrobat Version 9 & 10 Optical Characterr Recognitionn (OCR) for Scanned Documents: After you scanned your

More information

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database

Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database T.Malathi 1, S. Nandagopal 2 PG Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,

More information

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong CS573 Data Privacy and Security Differential Privacy tabular data and range queries Li Xiong Outline Tabular data and histogram/range queries Algorithms for low dimensional data Algorithms for high dimensional

More information

A Disclosure Avoidance Research Agenda

A Disclosure Avoidance Research Agenda A Disclosure Avoidance Research Agenda presented at FCSM research conf. ; Nov. 5, 2013 session E-3; 10:15am: Data Disclosure Issues Paul B. Massell U.S. Census Bureau Center for Disclosure Avoidance Research

More information

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP 324 Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP Shivaji Yadav(131322) Assistant Professor, CSE Dept. CSE, IIMT College of Engineering, Greater Noida,

More information

Towards Application-Oriented Data Anonymization
