HIDE: Privacy Preserving Medical Data Publishing. James Gardner Department of Mathematics and Computer Science Emory University
2 Motivation: De-identification is critical in any health informatics system (research, sharing). An easy-to-use interface and framework is needed for data custodians and publishers. Understanding data is necessary to de-identify data.
3 HIPAA
1. Names;
2. All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000;
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
4. Phone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social Security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data)
4 PHI Summary: Protected Health Information (PHI) is defined by HIPAA as individually identifiable health information. Direct identifiers include name, SSN, etc. Indirect identifiers include gender, age, address information, etc.
5 Research Challenges: Detect PHI in heterogeneous medical data. Apply structured anonymization principles on heterogeneous medical data (micro-privacy). Release differentially private aggregated statistics (macro-privacy).
6 HIDE: Health Information DE-identification. Uses techniques from information extraction, data linking, structured anonymization, differential privacy, and data mining.
7 HIDE
8 Outline: Background and related work (existing de-identification approaches, named entity recognition, privacy preserving data publishing). Proposed work (HIDE framework, identifying and sensitive information extraction, micro-data publishing, macro-data publishing, software).
9 Alternative Systems: Scrub System - rules and dictionaries are used to detect PHI. Semantic Lexicon System - rules and dictionaries are used to detect PHI. DE-ID - rules and dictionaries, developed at Pittsburgh and approved by IRB. Concept-Match Scrubber - removes every word not in an approved list of non-identifying terms. Carafe - uses a CRF to detect PHI.
10 Limitations of Most Systems: Lack portability. Don't give formal privacy guarantees. Don't utilize the latest work from structured data anonymization. Focus only on removing PHI.
11 Named Entity Recognition: Locate and classify atomic elements in text into predefined categories such as person, organization, location, expressions of time, quantities, etc. NER systems can be classified as either rule-based or machine learning-based.
12 NER Examples. Part-of-speech (POS) tagging: I/PRP think/VBP it/PRP 's/BES a/DT pretty/RB good/JJ idea/NN ./. Personal health identifier detection: <age>77</age> year old <gender>female</gender> with history of <disease>B-cell lymphoma</disease> (Marginal zone, <mrn>sh </mrn>)
13 NER Metrics: Precision = TP / (TP + FP). Recall = TP / (TP + FN). Here TP, FP, and FN are the counts of true positives, false positives, and false negatives.
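The two formulas above can be sketched at the token level as follows. This is a minimal illustration, not HIDE's evaluation code: the function name, the choice to score every non-O token, and the use of the harmonic-mean F1 for the "F-Score" reported on later slides are all assumptions.

```python
def precision_recall_f1(gold, predicted, outside="O"):
    """Token-level precision, recall, and F1 over all non-outside labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p != outside and p == g:
            tp += 1            # predicted an entity label and it was right
        else:
            if p != outside:
                fp += 1        # predicted an entity label that was wrong
            if g != outside:
                fn += 1        # missed (or mislabeled) a true entity token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 as the harmonic mean of precision and recall (assumed definition)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For de-identification, a missed PHI token (a false negative) is usually the costlier error, which is why the later sampling slides focus on raising recall.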
14 Rule-based: Rely on hand-coded rules and dictionaries. Dictionaries can be used for terms in a closed class with an exhaustive list, e.g. geographic locations. Regular expressions are used to detect terms that follow certain syntactic patterns, e.g. phone numbers.
15 Machine learning-based: Model NER as a sequence labeling task where each token is assigned a label. Train classifiers to label each token. Classifiers use a list of features (or attributes) for training and classification of the sequence. Frequently applied classifiers are HMM, MEMM, SVM, and CRF.
16 Conditional Random Field: A Conditional Random Field (CRF) provides a probabilistic framework for labeling and segmenting sequential data. A CRF defines a conditional probability of a label sequence given an observation sequence.
17 Comparison. Rule-based: accurate; require experts to modify; not portable. Machine learning-based: accurate; modification of models is done through training rather than coding; portable.
18 Privacy Preserving Data Publishing. Weak privacy (micro): release a modified version of each record according to a given anonymization principle; assumes a level of background knowledge. Differential privacy (macro): release perturbed statistics that satisfy the differential privacy principle; no assumptions of background knowledge.
19 Micro-data publishing: Prevent linking of records in separate databases (k-anonymization). Prevent discovery of sensitive values (l-diversity). Prevent discovery of presence or absence in a database (delta-presence).
20 Micro-data publishing
Table 1: Illustration of Anonymization
Original Data
Name | Age | Gender | Zipcode | Diagnosis
Henry | 25 | Male | | Influenza
Irene | 28 | Female | | Lymphoma
Dan | 28 | Male | | Bronchitis
Erica | 26 | Female | | Influenza
Anonymized Data
Name | Age | Gender | Zipcode | Disease
 | [25-28] | Male | [ ] | Influenza
 | [25-28] | Female | | Lymphoma
 | [25-28] | Male | [ ] | Bronchitis
 | [25-28] | Female | | Influenza
21 k-anonymization: Attributes are divided into a quasi-identifier (QID) set and sensitive attributes. A table is k-anonymous if every record shares its quasi-identifier values with at least k-1 other records. The probability of linking a victim to a specific record through the QID is at most 1/k.
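The definition above translates directly into a check over groups of records that share the same quasi-identifier values. The sketch below is illustrative only; the function name and the dictionary-per-record representation are assumptions, not part of HIDE.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QID values occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())
```

Applied to the anonymized table from the earlier illustration with Age and Gender as the QID, each of the two groups holds two records, so the table is 2-anonymous but not 3-anonymous.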
22 l-diversity: Extension of k-anonymization. Also ensures that each group has at least l distinct sensitive values. Prevents disclosure of sensitive values.
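A minimal sketch of the distinct-values form of l-diversity described above. The function name and record layout are assumptions made for illustration; other variants of l-diversity (for example entropy-based) are not covered.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every QID group contains at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())
```

A group can be k-anonymous and still leak the diagnosis if all of its members share one sensitive value; this check rejects such groups.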
23 Macro-data publishing: Differential privacy is a strong privacy notion. It requires that a randomized computation yields nearly identical output when performed on nearly identical input. Interactive model: limited to a specific number of queries. Non-interactive model: needs query strategies to build noisy data cubes that maximize utility for a random query workload.
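The "nearly identical output on nearly identical input" requirement is usually formalized as below. This is the standard definition from the differential privacy literature rather than something stated on the slide; alpha is used for the privacy parameter to match the later slides, and the mechanism name A and the databases D, D' are notation introduced here.

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\alpha} \cdot \Pr[\mathcal{A}(D') \in S]
\quad \text{for all } S \subseteq \mathrm{Range}(\mathcal{A})
\text{ and all databases } D, D' \text{ differing in one record.}
```

A smaller alpha forces the two output distributions closer together, which means stronger privacy and more noise.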
24 Differentially Private Interface (diagram with components: User, Queries, Answers, Query Strategy, Workload, Pre-designed Queries, Original Data, Differentially Private Interface, Diff. Private Answers, Diff. Private Histogram)
25 HIDE Framework. Identifying and sensitive information extraction: uses a state-of-the-art CRF model to extract PHI and sensitive information. Data linking: provides a structured patient-centric view of the data. De-identification and anonymization: micro-data publication uses data suppression and generalization to provide a k-anonymized view of the data; macro-data publication releases perturbed aggregated statistics from the patient-centric view.
26 HIDE
27 Identifying and sensitive information extraction: Use a CRF classifier to extract information. Studied impact of features including regular expressions, affixes, dictionaries, and context. Sampling techniques adjust the classifier for higher precision or recall.
28 Example
Token | Label
77 | B-age
year | O
old | O
female | B-gender
with | O
history | O
of | O
B | B-disease
- | I-disease
cell | I-disease
lymphoma | I-disease
( | O
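The labels in this example follow the common B-/I-/O scheme: the first token of an entity gets a B- prefix, later tokens of the same entity get I-, and everything else is O. A small sketch of how annotated text could be turned into such token/label pairs follows. The tokenizer regex and the (text, entity) segment format are assumptions chosen to reproduce the rows above, not HIDE's actual preprocessing.

```python
import re

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def bio_labels(segments):
    """segments: list of (text, entity_or_None). Returns (token, label) pairs."""
    rows = []
    for text, entity in segments:
        for i, token in enumerate(TOKEN_RE.findall(text)):
            if entity is None:
                label = "O"
            else:
                label = ("B-" if i == 0 else "I-") + entity
            rows.append((token, label))
    return rows

segments = [
    ("77", "age"),
    ("year old", None),
    ("female", "gender"),
    ("with history of", None),
    ("B-cell lymphoma", "disease"),
    ("(", None),
]
```

Calling `bio_labels(segments)` yields the same twelve token/label pairs as the example slide.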
29 Regular Expressions
Table 1: List of regular expression features used in HIDE
Name    Regular Expression
ALPHA    ^[A-Za-z]$
INITCAPS    ^[A-Z].*$
UPPER-LOWER    ^[A-Z][a-z].*$
ALLCAPS    ^[A-Z]+$
MIXEDCAPS    ^[A-Z][a-z]+[A-Z][A-Za-z]*$
SINGLECHAR    ^[A-Za-z]$
SINGLEDIGIT    ^[0-9]$
DOUBLEDIGIT    ^[0-9][0-9]$
TRIPLEDIGIT    ^[0-9][0-9][0-9]$
QUADDIGIT    ^[0-9][0-9][0-9][0-9]$
NUMBER    ^[0-9,]+$
HASDIGIT    [0-9]
ALPHANUMERIC    ^.*[0-9].*[A-Za-z].*$
ALPHANUMERIC    ^.*[A-Za-z].*[0-9].*$
NUMBERS LETTERS    ^[0-9]+[A-Za-z]$
LETTERS NUMBERS    ^[A-Za-z]+[0-9]+$
HASDASH    -
HASQUOTE
HASSLASH    /
ISPUNCT    ~!@#$%\^&*()\-=_+\[\]{} ; :\",./<>?]+$
REALNUMBER    (-|\+)?[0-9,]+(\.[0-9]*)?%?$
STARTMINUS    ^-.*
STARTPLUS    ^\+.*$
ENDPERCENT    ^.*%$
ROMAN    ^[IVXDLCM]+$
ISSPACE    ^\s+$
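As an illustration of how such patterns become classifier features, the sketch below tests a token against a handful of the patterns from the table and returns the names of those that match. Only patterns that survive intact in the table are used; the function name and the choice of returning a sorted list of names are assumptions.

```python
import re

# A subset of the regular expression features listed in the table.
REGEX_FEATURES = {
    "INITCAPS": r"^[A-Z].*$",
    "ALLCAPS": r"^[A-Z]+$",
    "SINGLEDIGIT": r"^[0-9]$",
    "DOUBLEDIGIT": r"^[0-9][0-9]$",
    "NUMBER": r"^[0-9,]+$",
    "HASDIGIT": r"[0-9]",
    "HASDASH": r"-",
    "ROMAN": r"^[IVXDLCM]+$",
}
COMPILED = {name: re.compile(pattern) for name, pattern in REGEX_FEATURES.items()}

def regex_features(token):
    """Return the sorted names of all patterns that match the token."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(token))
```

For example, the token `77` triggers DOUBLEDIGIT, HASDIGIT, and NUMBER, while `B` triggers ALLCAPS and INITCAPS.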
30 Affixes: Prefixes and suffixes. All affixes up to size 3.
31 Dictionaries: Company names, last names, state names, hospital names, male first names, female first names, state abbreviations.
32 Context: Previous 4 words, next 4 words, occurrence counts.
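The affix, dictionary, and context features from the last three slides could be combined per token roughly as follows. This is a sketch under stated assumptions: the feature names, the lower-cased dictionary lookup, and the tiny example dictionary are invented for illustration, and the occurrence-count feature is omitted.

```python
def token_features(tokens, i, dictionaries, max_affix=3, window=4):
    """Build a feature dict for tokens[i] from affixes, dictionaries, and context."""
    token = tokens[i]
    feats = {}
    # Affixes: all prefixes and suffixes up to max_affix characters.
    for n in range(1, min(max_affix, len(token)) + 1):
        feats[f"prefix{n}"] = token[:n]
        feats[f"suffix{n}"] = token[-n:]
    # Dictionaries: one boolean feature per dictionary containing the token.
    for name, entries in dictionaries.items():
        if token.lower() in entries:
            feats[f"dict:{name}"] = True
    # Context: up to `window` previous and next tokens.
    for offset in range(1, window + 1):
        if i - offset >= 0:
            feats[f"prev{offset}"] = tokens[i - offset]
        if i + offset < len(tokens):
            feats[f"next{offset}"] = tokens[i + offset]
    return feats
```

With `tokens = ["Patient", "Irene", "was", "seen"]` and a hypothetical `female_first_names` dictionary containing `irene`, the features for index 1 include `prefix3 = "Ire"`, `suffix3 = "ene"`, the dictionary hit, `prev1 = "Patient"`, and `next2 = "seen"`.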
33 Feature Vectors
Token | CAPS? | SPECIAL? | PREVIOUS | NEXT | LABEL
77 | N | Y | ? | year | B-age
year | N | N | 77 | old | O
old | N | N | year | female | O
female | N | N | old | with | B-gender
with | N | N | female | history | O
history | N | N | with | of | O
of | N | N | history | B | O
B | Y | N | of | - | B-disease
- | N | Y | B | cell | I-disease
cell | N | N | - | lymphoma | I-disease
lymphoma | N | N | cell | ( | I-disease
34 Features Set Results: 220 re-identified pathology reports for i2b2 task. 10-fold cross-validation.
35 Features Set Results (chart: Precision, Recall, and F-Score for feature-set combinations d, r, rd, a, ad, ra, rad, c, cd, ac, acd, rc, rac, racd, rcd)
36 Sampling: Honest brokers are more concerned about recall than precision. Cost-proportionate rejection sampling is often used for boosting. Training examples are selected based on the associated cost of missing that label.
37 Random O-Sampling: Keep all non-O labels. Select O labels with a given probability. Biases the classifier to select a label other than O.
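Random O-sampling as described above can be sketched in a few lines. The function name, the (token, label) pair format, and the seeded generator are assumptions for illustration; in practice the sampling would likely be applied to full feature vectors rather than bare tokens.

```python
import random

def random_o_sample(examples, keep_probability, seed=0):
    """Keep every non-O example; keep each O example with the given probability."""
    rng = random.Random(seed)
    return [
        (token, label)
        for token, label in examples
        if label != "O" or rng.random() < keep_probability
    ]
```

Lowering `keep_probability` removes more O examples from training, which pushes the classifier toward predicting PHI labels and so trades precision for recall.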
38 Random O-Sampling (results table for i2b2 and PhysioNet: precision, recall, and F-score by sample probability)
39 Window Sampling: Select all non-O labels. Select all terms within a given window of any non-O label.
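Window sampling keeps the local context around each PHI token and drops distant O tokens. A minimal sketch, assuming a symmetric window measured in tokens and a (token, label) pair format; both are illustrative choices rather than details taken from the slides.

```python
def window_sample(examples, window):
    """Keep non-O examples plus every example within `window` positions of one."""
    keep = set()
    for i, (_, label) in enumerate(examples):
        if label != "O":
            lo = max(0, i - window)
            hi = min(len(examples), i + window + 1)
            keep.update(range(lo, hi))
    # Preserve the original token order in the sampled sequence.
    return [examples[i] for i in sorted(keep)]
```

Unlike random O-sampling, the O tokens that survive are exactly those that sit next to PHI, so the classifier still sees the boundary cases it most needs to get right.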
40 Window Sampling (results table for i2b2 and PhysioNet: precision, recall, and F-score by history size)
41 Information Extraction Conclusion: HIDE has a fast and accurate CRF for detecting PHI. Feature engineering has been explored in great detail. Window Sampling can be used to adjust recall with minimal impact on precision. Impact of training data size.
42 Micro-data publishing: Release patient-centric view or original data with suppressed or generalized values. Apply k-anonymization and l-diversity principles to unstructured data. Evaluate query accuracy on real medical data.
43 Micro-data publishing. Full de-identification: remove all identifiers. Partial de-identification: remove direct identifiers. Statistical de-identification: statistical anonymization.
44 Statistical Anonymization: Partition the original data points into groups that will all share the same values with respect to the QID. Use the multi-dimensional Mondrian algorithm for releasing a k-anonymized and l-diverse version of the structured patient-centric view.
45 Partitioning (figure panels: (a) Patients, (b) Single-Dimensional, (c) Strict Multidimensional) Figure 4. Spatial representation of Patients and partitionings (quasi-identifiers Zipcode an
46 Mondrian algorithm: Greedy top-down partitioning approach. Choose the dimension with maximum range. Split at the median if each newly created partition still satisfies k-anonymization and l-diversity.
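The greedy procedure above can be sketched as a short recursive function. This is a simplified illustration, not HIDE's implementation: it assumes numeric quasi-identifiers, compares raw (unnormalized) ranges, falls back to the next-widest dimension when a split is not allowed, and checks only the k-anonymity condition (the l-diversity check is left out).

```python
def mondrian(records, qid, k):
    """Return a list of partitions (lists of records), each with >= k records."""
    def span(dim):
        values = [r[dim] for r in records]
        return max(values) - min(values)

    # Try dimensions from widest to narrowest range.
    for dim in sorted(qid, key=span, reverse=True):
        values = sorted(r[dim] for r in records)
        median = values[len(values) // 2]
        left = [r for r in records if r[dim] < median]
        right = [r for r in records if r[dim] >= median]
        # Only split if both halves still contain at least k records.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qid, k) + mondrian(right, qid, k)
    return [records]  # no allowable split: this partition is final
```

Each returned partition would then be generalized to a common range per QID attribute, for example Age `[25-28]`, before release.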
47 Example (figure: greedy strict multidimensional partitioning for k = 50, k = 25, and k = 10) More precise partitions are possible with smaller k
48 Query Accuracy: 100 pathology reports. Queries of the form age > n, age < n. 10,000 random queries.
49 Query Accuracy (chart: Query Precision (%) versus k for Statistical De-identification, Partial De-identification, and Full De-identification)
50 Macro-data publishing: Differentially private data publishing (DPDP) module in HIDE. Create a differentially private data cube where each dimension represents a statistic over the patient-centric view. Partitioning algorithm based on information gain to maximize the level of utility of the differentially private data cube. Consistency algorithm to enhance utility.
51 Differentially Private Interface (diagram with components: User, Queries, Answers, Query Strategy, Workload, Pre-designed Queries, Original Data, Differentially Private Interface, Diff. Private Answers, Diff. Private Histogram)
52 DPDP. DPDP considers: access to original data; partitioning of the original database that best satisfies the workload of queries; level of differential privacy of the data cube; level of utility (or noise) in the released data cube.
53 Access to original database: Every time the original database is accessed we use some of the privacy budget. Access the original database in a differentially private manner. Minimize the number of times the original data is queried to minimize the amount of noise we must add to the results.
54 Query Strategy: Develop a query strategy that will allow the most utility given random queries from the user. This query strategy is accomplished by partitioning the data according to information gain.
55 Partitioning of the original database: Release two data cubes. One uses a cell-based algorithm that partitions the database into its individual cells and releases a perturbed count for each cell. One uses a top-down multi-dimensional partitioning strategy, where each split value selection maximizes the information gain and ensures the uniformity of the data points in the partition. A consistency algorithm will be applied to the two data cubes to increase the accuracy of the released data cubes.
56 Cell partitioning (figure: Age x Income table with Income columns 40K and 50K) Cell queries, each answered with parameter alpha: Q1: count() where Age = 20, Income = 40K; Q2: count() where Age = 20, Income = 50K. User query Q: Select count where age > 20 and age < 30. alpha is the differential privacy parameter.
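A cell-based release can be illustrated with the Laplace mechanism, a standard way to answer count queries under differential privacy. The slides do not say which noise distribution DPDP uses, so the Laplace choice is an assumption, as are the function names, the explicit `domains` argument, and the sensitivity of 1 (neighboring databases differ by adding or removing one record).

```python
import itertools
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cell_counts(records, dimensions, domains, alpha, seed=None):
    """Release one perturbed count per cell of the cube, including empty cells."""
    rng = random.Random(seed)
    true_counts = Counter(tuple(r[d] for d in dimensions) for r in records)
    scale = 1.0 / alpha  # assumed sensitivity of 1 for disjoint cell counts
    cells = itertools.product(*(domains[d] for d in dimensions))
    return {cell: true_counts[cell] + laplace_noise(scale, rng) for cell in cells}
```

A range query such as "age > 20 and age < 30" is answered by summing the noisy cells it covers, so its error grows with the number of cells involved. That accumulation is the motivation for the larger multi-dimensional partitions discussed next.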
57 Multi-dimensional partitioning (figure: Age x Income table with Income columns 40K and 50K, before and after multi-dimensional partitioning) Query: Select count where age > 20 and age < 30. Noise is divided
58 Goals of partitioning strategy: Large partitions to minimize aggregated perturbation error. Uniform partitions to minimize approximation error. Minimize the number of times we access the original data.
59 Proposed Approach (figure: Age x Income tables with Income columns 40K and 50K) Cell partitioning queries (alpha/2). Multi-dim partitioning queries (alpha/2).
60 Utility of release: The level of utility is measured by comparing the answers to a query workload on the released differentially private data cubes with the answers on a non-perturbed data cube generated from the original data. We empirically evaluate the level of error as a function of the given privacy budget.
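The comparison described above can be sketched for a one-dimensional cube and a workload of range-count queries. The slide does not name an error metric; average relative error with a small floor in the denominator is an assumed choice here, and the function names and cube representation (a dict from cell value to count) are hypothetical.

```python
def range_count(cube, low, high):
    """Sum the counts of cells whose key lies strictly between low and high."""
    return sum(count for key, count in cube.items() if low < key < high)

def average_relative_error(true_cube, released_cube, workload, floor=1.0):
    """Mean relative error of released answers over a workload of (low, high) ranges."""
    errors = []
    for low, high in workload:
        truth = range_count(true_cube, low, high)
        answer = range_count(released_cube, low, high)
        # The floor avoids division by zero on empty true answers.
        errors.append(abs(answer - truth) / max(truth, floor))
    return sum(errors) / len(errors)
```

Repeating this measurement for several privacy budgets gives an empirical error curve of the kind the slide describes.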
61 Software: Web application built with Python, Django, and CouchDB. Interface: iterative labeling of documents and training of the underlying classifier; analyze accuracy of the classifier on validation sets. Classifier is the super-fast CRF provided by CRFsuite.
62 Publications
Y. Xiao, J. Gardner, L. Xiong. DPCube: Releasing Differentially Private Data Cubes for Health Information (demo paper). In 28th IEEE International Conference on Data Engineering (ICDE), 2012
James Gardner, Li Xiong, Fusheng Wang, Andrew Post and Joel Saltz. An evaluation of feature sets and sampling techniques for statistical de-identification of medical records. In 1st ACM International Health Informatics Symposium, 2010 (to appear).
Li Xiong, James Gardner, Pawel Jurczyk and James J. Lu. Privacy Preserving Information Discovery on EHRs. In Information Discovery on Electronic Health Records, Ed. Vagelis Hristidis. Chapman and Hall/CRC, pp ,
James Gardner and Li Xiong. An integrated framework for de-identifying unstructured medical data. Data and Knowledge Engineering, 68(12), pp , 2009, doi: /j.datak
James Gardner, Kanwei Li, Li Xiong and James J. Lu. HIDE: Heterogeneous Information DEidentification (demo track). 12th International Conference on Extending Database Technology (EDBT), March,
James Gardner and Li Xiong. HIDE: A Health Information DE-identification System. In 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS), June, 2008
POLICY. Create a governance process to manage requests to extract de- identified data from the Information Exchange (IE).
Academic Health Center Office of Biomedical Health Informatics POLICY Extraction of De- Identifiable Data from the Information Exchange Approved Proposal Purpose Create a governance process to manage requests
More informationStatistical and Synthetic Data Sharing with Differential Privacy
pscanner and idash Data Sharing Symposium UCSD, Sept 30 Oct 2, 2015 Statistical and Synthetic Data Sharing with Differential Privacy Li Xiong Department of Mathematics and Computer Science Department of
More informationIntroduction/Instructions
Introduction/Instructions Registries (data banks) and repositories (tissue banks, usually with databases associated) all involve the collection and storage of information and/or biological specimens that
More informationUniversal Patient Key
Universal Patient Key Overview The Healthcare Data Privacy (i.e., HIPAA Compliance) and Data Management Challenge The healthcare industry continues to struggle with two important goals that many view as
More informationHIPAA and HIPAA Compliance with PHI/PII in Research
HIPAA and HIPAA Compliance with PHI/PII in Research HIPAA Compliance Federal Regulations-Enforced by Office of Civil Rights State Regulations-Texas Administrative Codes Institutional Policies-UTHSA HOPs/IRB
More informationPrivacy Preserving Data Mining: An approach to safely share and use sensible medical data
Privacy Preserving Data Mining: An approach to safely share and use sensible medical data Gerhard Kranner, Viscovery Biomax Symposium, June 24 th, 2016, Munich www.viscovery.net Privacy protection vs knowledge
More informationPrivacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.
Chapman & Hall/CRC Data Mining and Knowledge Discovery Series Introduction to Privacy-Preserving Data Publishing Concepts and Techniques Benjamin C M Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S Yu CRC
More informationEXAMPLE 2-JOINT PRIVACY AND SECURITY CHECKLIST
Purpose: The purpose of this Checklist is to evaluate your proposal to use or disclose Protected Health Information ( PHI ) for the purpose indicated below and allow the University Privacy Office and Office
More informationIncognito: Efficient Full Domain K Anonymity
Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)
More informationData Security and Privacy. Topic 18: k-anonymity, l-diversity, and t-closeness
Data Security and Privacy Topic 18: k-anonymity, l-diversity, and t-closeness 1 Optional Readings for This Lecture t-closeness: Privacy Beyond k-anonymity and l-diversity. Ninghui Li, Tiancheng Li, and
More informationEXAMPLE 3-JOINT PRIVACY AND SECURITY CHECKLIST
Purpose: The purpose of this Checklist is to evaluate your proposal to use or disclose Protected Health Information ( PHI ) for the purpose indicated below and allow the University Privacy Office and Office
More informationIntroduction to Data Mining
Introduction to Data Mining Privacy preserving data mining Li Xiong Slides credits: Chris Clifton Agrawal and Srikant 4/3/2011 1 Privacy Preserving Data Mining Privacy concerns about personal data AOL
More informationPrivacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University
Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University Outline Privacy preserving data publishing: What and Why Examples of privacy attacks
More informationSecurity Control Methods for Statistical Database
Security Control Methods for Statistical Database Li Xiong CS573 Data Privacy and Security Statistical Database A statistical database is a database which provides statistics on subsets of records OLAP
More informationHIPAA and Research Contracts JILL RAINES, ASSISTANT GENERAL COUNSEL AND UNIVERSITY PRIVACY OFFICIAL
HIPAA and Research Contracts JILL RAINES, ASSISTANT GENERAL COUNSEL AND UNIVERSITY PRIVACY OFFICIAL Just a Few Reminders HIPAA applies to Covered Entities HIPAA is a federal law that governs the privacy
More informationUniversity of Mississippi Medical Center Data Use Agreement Protected Health Information
Data Use Agreement Protected Health Information This Data Use Agreement ( DUA ) is effective on the day of, 20, ( Effective Date ) by and between (UMMC) ( Data Custodian ), and ( Recipient ), located at
More informationOverview of Datavant's De-Identification and Linking Technology for Structured Data
Overview of Datavant's De-Identification and Linking Technology for Structured Data Introduction Datavant is firmly committed to advancing healthcare through data analytics while protecting patients privacy.
More informationAutomated Information Retrieval System Using Correlation Based Multi- Document Summarization Method
Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Dr.K.P.Kaliyamurthie HOD, Department of CSE, Bharath University, Tamilnadu, India ABSTRACT: Automated
More informationK ANONYMITY. Xiaoyong Zhou
K ANONYMITY LATANYA SWEENEY Xiaoyong Zhou DATA releasing: Privacy vs. Utility Society is experiencing exponential growth in the number and variety of data collections containing person specific specific
More informationComputer Security Incident Response Plan. Date of Approval: 23-FEB-2014
Computer Security Incident Response Plan Name of Approver: Mary Ann Blair Date of Approval: 23-FEB-2014 Date of Review: 31-MAY-2016 Effective Date: 23-FEB-2014 Name of Reviewer: John Lerchey Table of Contents
More informationBest Practices. Contents. Meridian Technologies 5210 Belfort Rd, Suite 400 Jacksonville, FL Meridiantechnologies.net
Meridian Technologies 5210 Belfort Rd, Suite 400 Jacksonville, FL 32257 Meridiantechnologies.net Contents Overview... 2 A Word on Data Profiling... 2 Extract... 2 De- Identification... 3 PHI... 3 Subsets...
More informationEmerging Measures in Preserving Privacy for Publishing The Data
Emerging Measures in Preserving Privacy for Publishing The Data K.SIVARAMAN 1 Assistant Professor, Dept. of Computer Science, BIST, Bharath University, Chennai -600073 1 ABSTRACT: The information in the
More informationCo-clustering for differentially private synthetic data generation
Co-clustering for differentially private synthetic data generation Tarek Benkhelif, Françoise Fessant, Fabrice Clérot and Guillaume Raschia January 23, 2018 Orange Labs & LS2N Journée thématique EGC &
More informationK-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007
K-Anonymity and Other Cluster- Based Methods Ge Ruan Oct 11,2007 Data Publishing and Data Privacy Society is experiencing exponential growth in the number and variety of data collections containing person-specific
More informationCS573 Data Privacy and Security. Li Xiong
CS573 Data Privacy and Security Anonymizationmethods Li Xiong Today Clustering based anonymization(cont) Permutation based anonymization Other privacy principles Microaggregation/Clustering Two steps:
More informationA FRAMEWORK FOR PRIVACY-PRESERVING MEDICAL DOCUMENT SHARING
A FRAMEWORK FOR PRIVACY-PRESERVING MEDICAL DOCUMENT SHARING Completed Research Paper Xiao-Bai Li, Jialun Qin Department of Operations and Information Systems Manning School of Business University of Massachusetts
More informationHIPAA Federal Security Rule H I P A A
H I P A A HIPAA Federal Security Rule nsurance ortability ccountability ct of 1996 HIPAA Introduction - What is HIPAA? HIPAA = The Health Insurance Portability and Accountability Act A Federal Law Created
More informationData Anonymization - Generalization Algorithms
Data Anonymization - Generalization Algorithms Li Xiong CS573 Data Privacy and Anonymity Generalization and Suppression Z2 = {410**} Z1 = {4107*. 4109*} Generalization Replace the value with a less specific
More informationSecurity Overview. Joseph Balberde North Country Community Mental Health Information Technology Director
Security Overview Joseph Balberde North Country Community Mental Health Information Technology Director 2-5-2019 Protected Health Information Individually Identifiable Health Information (IIHI): is information
More informationAchieving k-anonmity* Privacy Protection Using Generalization and Suppression
UT DALLAS Erik Jonsson School of Engineering & Computer Science Achieving k-anonmity* Privacy Protection Using Generalization and Suppression Murat Kantarcioglu Based on Sweeney 2002 paper Releasing Private
More informationZIPpy Safe Harbor De-Identification Macros
ZIPpy Safe Harbor De-Identification Macros SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates
More informationHIPAA 101: What All Doctors NEED To Know
HIPAA 101: What All Doctors NEED To Know 1 HIPAA Basics HIPAA: Health Insurance and Portability Accountability Act of 1996 Purpose: to protect confidential information through improved security and privacy
More informationPrivacy Preserving Health Data Mining
IJCST Vo l. 6, Is s u e 4, Oc t - De c 2015 ISSN : 0976-8491 (Online) ISSN : 2229-4333 (Print) Privacy Preserving Health Data Mining 1 Somy.M.S, 2 Gayatri.K.S, 3 Ashwini.B 1,2,3 Dept. of CSE, Mar Baselios
More informationAfter. you scanned. Apply. 4. When the
Redaction Permanently removing sensitive information in a PDF Redaction Instructions for Adobe Acrobat Version 9 & 10 Optical Characterr Recognitionn (OCR) for Scanned Documents: After you scanned your
More informationEnhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database
Enhanced Slicing Technique for Improving Accuracy in Crowdsourcing Database T.Malathi 1, S. Nandagopal 2 PG Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,
More informationCS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong
CS573 Data Privacy and Security Differential Privacy tabular data and range queries Li Xiong Outline Tabular data and histogram/range queries Algorithms for low dimensional data Algorithms for high dimensional
More informationA Disclosure Avoidance Research Agenda
A Disclosure Avoidance Research Agenda presented at FCSM research conf. ; Nov. 5, 2013 session E-3; 10:15am: Data Disclosure Issues Paul B. Massell U.S. Census Bureau Center for Disclosure Avoidance Research
More informationImplementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP
324 Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP Shivaji Yadav(131322) Assistant Professor, CSE Dept. CSE, IIMT College of Engineering, Greater Noida,
More informationTowards Application-Oriented Data Anonymization
Towards Application-Oriented Data Anonymization Li Xiong Kumudhavalli Rangachari Abstract Data anonymization is of increasing importance for allowing sharing of individual data for a variety of data analysis
More informationAn Efficient Clustering Method for k-anonymization
An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management
More informationMobile security: Tips and tricks for securing your iphone, Android and other mobile devices
Mobile security: Tips and tricks for securing your iphone, Android and other mobile devices Presented by Michael Harris [MS, CISSP, WAPT] Systems Security Analyst University of Missouri Overview What data
More informationPrivacy Preserving Data Mining. Danushka Bollegala COMP 527
Privacy Preserving ata Mining anushka Bollegala COMP 527 Privacy Issues ata mining attempts to ind mine) interesting patterns rom large datasets However, some o those patterns might reveal inormation that
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationPrivacy, Security & Ethical Issues
Privacy, Security & Ethical Issues How do we mine data when we can t even look at it? 2 Individual Privacy Nobody should know more about any entity after the data mining than they did before Approaches:
More informationDistributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud
Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many
More informationData linkages in PEDSnet
2016/2017 CRISP Seminar Series - Part IV Data linkages in PEDSnet Toan C. Ong, PhD Assistant Professor Department of Pediatrics University of Colorado, Anschutz Medical Campus Content Record linkage background
More informationDiscriminative classifiers for image recognition
Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study
More informationDR. GARY W BROOKS JR. National Provider Identifiers Registry
1962728824 DR. GARY W BROOKS JR. National Provider Identifiers Registry The Administrative Simplification provisions of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) mandated
More informationPrivacy Challenges in Big Data and Industry 4.0