Statistical and Synthetic Data Sharing with Differential Privacy

1 pSCANNER and iDASH Data Sharing Symposium
UCSD, Sept 30 - Oct 2, 2015
Statistical and Synthetic Data Sharing with Differential Privacy
Li Xiong
Department of Mathematics and Computer Science
Department of Biomedical Informatics
Emory University

2 Motivation
- Electronic health records (EHR)
- Secondary use for medical research
- Privacy and confidentiality constraints

3 Current Practices and Limitations
Raw data
- Patients' fear over loss of privacy and confidentiality; (un)willingness to consent
- Overhead of IRB reviews and policy setup
De-identified data
- Insufficiency against re-identification and disclosure risks
- Lack of transparency and accountability in the use of de-identified data
- Lack of proof that the de-identified data is useful for PCOR studies

Name   Age  Gender  HTN
Frank  42   M       Y
Bob    31   F       Y
Dave   43   M       N

4 Statistical Data Sharing with Differential Privacy
- SHARE: Statistical and Synthetic Health Information Release with Differential Privacy (Linked R01)
- Building Data Registries with Privacy and Confidentiality for PCOR (PCORI)
Project Team

5 SHARE: Statistical and Synthetic Health Information Release with Differential Privacy (Linked R01)
Data-driven:
- Multidimensionality
- Correlation
Application-driven:
- Cross-sectional studies
- Longitudinal studies
[Diagram labels: Original Health Records; SHARE: Statistical Health information RElease (Multi-Dimensional Data Release, Sequence Data Release); Statistics / Synthetic Records]

6 Statistical Data Sharing with Differential Privacy

7 Statistical Data Sharing with Differential Privacy
[Figure: original records; original histogram]

8 Statistical Data Sharing with Differential Privacy
[Figure: original records; original histogram; perturbed histogram with differential privacy]
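The perturbed histogram on this slide can be illustrated with the standard Laplace mechanism. The sketch below is not taken from the talk: the function names, the choice of epsilon, and the use of the HTN column from the toy table on slide 3 are all assumptions made for illustration.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws follows Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_histogram(values, bins, epsilon, rng=None):
    """Return noisy counts for every bin in `bins` (hypothetical helper)."""
    rng = rng or random.Random()
    counts = Counter(values)
    # Adding or removing one record changes a single bin count by 1,
    # so the histogram has sensitivity 1 and the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}

# HTN column of the toy table (Frank: Y, Bob: Y, Dave: N); epsilon is an assumed value.
records = ["Y", "Y", "N"]
noisy = dp_histogram(records, bins=["Y", "N"], epsilon=1.0, rng=random.Random(0))
print(noisy)
```

The bins are fixed in advance rather than read from the data, so empty bins also receive noise. A smaller epsilon gives stronger privacy and a noisier histogram.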

9 SHARE: Statistical and Synthetic Health Information Release with Differential Privacy (Linked R01)
Data-driven:
- Multidimensionality
- Correlation
Application-driven:
- Cross-sectional studies
- Longitudinal studies
[Diagram labels: Original Health Records; SHARE: Statistical Health information RElease (Multi-Dimensional Data Release, Sequence Data Release); Statistics / Synthetic Records]

10 Multi-Dimensional Data Release
- Nonparametric methods: release empirical distributions, i.e. histograms, with differential privacy
- Parametric methods: learn the parameters of a distribution with differential privacy
Reference: Y. Xiao, L. Xiong, L. Fan, S. Goryczka, H. Li. DPCube: Differentially Private Histogram Release through Multidimensional Partitioning. Transactions on Data Privacy (TDP), 7(3), 2014

11 Multi-Dimensional Data Release
Semi-parametric methods (DPCopula)
- DP marginal histograms (non-parametric)
- Model joint dependence using copula functions, e.g. Gaussian copula (parametric)
Reference: Haoran Li, Li Xiong, Xiaoqian Jiang. Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions. In EDBT 2014
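The semi-parametric idea can be sketched for two attributes: draw correlated Gaussians, map them to uniforms, then push each uniform through the inverse CDF of a noisy marginal histogram. This is an illustrative sketch only, not the DPCopula algorithm from the cited paper. It assumes the marginal histograms have already been perturbed and that the correlation `rho` was estimated privately elsewhere; all names and numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist

def inverse_cdf(noisy_counts, u):
    """Map u in (0, 1) to a bin index using a noisy marginal histogram."""
    # Noisy counts can be negative; clamp them before normalising.
    weights = [max(c, 0.0) for c in noisy_counts]
    total = sum(weights)
    if total <= 0:
        raise ValueError("marginal has no positive mass")
    cumulative, last_positive = 0.0, None
    for idx, w in enumerate(weights):
        if w > 0:
            cumulative += w / total
            last_positive = idx
            if u <= cumulative:
                return idx
    return last_positive  # guards against floating-point round-off

def sample_gaussian_copula(marginal_a, marginal_b, rho, n, rng):
    """Draw n synthetic (bin_a, bin_b) pairs; rho is assumed already private."""
    std = NormalDist()
    samples = []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = g1
        z2 = rho * g1 + math.sqrt(1 - rho * rho) * g2  # correlation rho with z1
        u1, u2 = std.cdf(z1), std.cdf(z2)              # uniform marginals
        samples.append((inverse_cdf(marginal_a, u1), inverse_cdf(marginal_b, u2)))
    return samples

# Hypothetical noisy marginals over three bins each.
synthetic = sample_gaussian_copula([10.4, -0.7, 5.2], [3.1, 8.8, 4.0],
                                   rho=0.6, n=5, rng=random.Random(0))
print(synthetic)
```

Sampling from noisy marginals and a private correlation is post-processing, so it consumes no further privacy budget.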

12 Multi-Dimensional Data Release: Selected Results
[Figure panels: US Census data (4 dimensions); Brazil Census data (8 dimensions)]
- DPCopula achieves lower error compared to other histogram methods
Reference: Haoran Li, Li Xiong, Xiaoqian Jiang. Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions. In EDBT 2014

13 Multi-Dimensional Data Release: Addressing Data Updates
- Releasing a new histogram for each update leads to high accumulated perturbation error
- Distance-based adaptive sampling for improved utility
- Can use any histogram method
Reference: Haoran Li, Li Xiong, Xiaoqian Jiang, Jinfei Liu. Differentially Private Histogram Publication for Dynamic Datasets: An Adaptive Sampling Approach. CIKM 2015
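One way to picture distance-based sampling: at each update, privately test how far the current histogram has drifted from the last release, and spend release budget only when the drift is large. The sketch below is a simplified illustration under basic sequential composition, not the algorithm of the cited paper. The threshold, the budget split, and all names are assumptions.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws follows Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def publish_stream(snapshots, eps_distance, eps_release, threshold, rng):
    """snapshots: one histogram (list of counts) per update.

    Returns the released histograms and the total budget spent.
    """
    released, last, spent = [], None, 0.0
    for hist in snapshots:
        if last is not None:
            # L1 distance to the last public release; adding or removing one
            # record moves it by at most 1, so Laplace(1 / eps_distance) suffices.
            dist = sum(abs(c - p) for c, p in zip(hist, last))
            noisy_dist = dist + laplace_noise(1.0 / eps_distance, rng)
            spent += eps_distance
            if noisy_dist <= threshold:
                released.append(last)  # reuse the old release, no release cost
                continue
        # Any histogram method could be plugged in here; plain Laplace is used.
        last = [c + laplace_noise(1.0 / eps_release, rng) for c in hist]
        spent += eps_release
        released.append(last)
    return released, spent

# Hypothetical stream: the data barely changes until the last update.
stream = [[100, 50], [101, 50], [100, 51], [160, 20]]
out, budget = publish_stream(stream, eps_distance=0.1, eps_release=0.5,
                             threshold=20.0, rng=random.Random(0))
print(len(out), round(budget, 2))
```

Skipped updates cost only the distance test, which is the source of the utility gain over releasing a fresh histogram every time.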

14 Sequential Data Release
- Sequential data: longitudinal observations; encounters
- Prefix tree approach: sequential patterns represented as a prefix tree
- Challenges: unsynchronized sequences; long sequences
Original Records (Longitudinal): Name and blood pressure history (systolic) at t1, t2, t3. Values as extracted: Frank 110 (N), Bob 100 (N), Mary 140 (H); 120 (N), 120 (N), 130 (N), 140 (H), 140 (H), 140 (H)
DP Prefix Tree:
All: 100
  L: 0
  N: 70
    NL: 0
    NN: 40
      NNL: 0
      NNN: 0
      NNH: 40
    NH: 30
      NHL: 0
      NHN: 0
      NHH: 30
  H: 30
References:
James Gardner, Li Xiong, Yonghui Xiao, Jingjing Gao, Andrew Post, Xiaoqian Jiang, Lucila Ohno-Machado. SHARE: System Design and Case Studies for Statistical Health Information Release. JAMIA 2013
Luca Bonomi, Li Xiong. A Two-Phase Algorithm for Mining Sequential Patterns with Differential Privacy. CIKM 2013
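A noisy prefix tree like the one on this slide can be sketched as follows. This is a minimal illustration, not the SHARE implementation. It assumes a fixed alphabet (L, N, H), a fixed maximum depth, and an even split of epsilon across levels. The toy sequences are made up so that their exact counts equal the numbers shown in the tree above.

```python
import random
from collections import Counter
from itertools import product

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws follows Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_prefix_tree(sequences, alphabet, max_depth, epsilon, rng):
    """Return {prefix tuple: noisy count} for every prefix up to max_depth."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, min(len(seq), max_depth) + 1):
            counts[tuple(seq[:k])] += 1
    # A record touches at most one node per level, so each level is a
    # sensitivity-1 histogram; epsilon is split evenly over max_depth levels.
    scale = max_depth / epsilon
    tree = {}
    for k in range(1, max_depth + 1):
        # Enumerate every possible prefix, including empty ones, so the set
        # of released nodes does not itself depend on the data.
        for prefix in product(alphabet, repeat=k):
            tree[prefix] = counts[prefix] + laplace_noise(scale, rng)
    return tree

# Hypothetical records: 40 x N-N-H, 30 x N-H-H, 30 x H.
toy = [("N", "N", "H")] * 40 + [("N", "H", "H")] * 30 + [("H",)] * 30
tree = noisy_prefix_tree(toy, alphabet="LNH", max_depth=3, epsilon=1.0,
                         rng=random.Random(0))
print(round(tree[("N",)], 1), round(tree[("N", "N", "H")], 1))
```

Full enumeration grows exponentially with depth, which is one way to see why long sequences are listed as a challenge.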

15 Sequential Data Release
Threshold-based pattern mining approach
- Generate frequent candidate k-sequences from frequent (k-1)-sequences
- Prune infrequent ones based on database samples
[Figure labels, as extracted:
Database D (ID, Record): a c d b c d a b c e d d b a d c d
C1 (cand 1-seqs): {a} {b} {c} {d} {e}; Sup. of {e}: 1
F1 (freq 1-seqs): {a} {c} {d}; Noisy Sup.
C2 (cand 2-seqs): {a a} {a c} {a d} {c a} {c c} {c d} {d a} {d c} {d d}
F2 (freq 2-seqs): {a c} {a d} {c d} {d c}; Noisy Sup. of {d c}: 3.1
C3 (cand 3-seqs): {a c d} Sup. 3; {a d c} Sup. 1
F3 (freq 3-seqs): {a c d}; Noisy Sup. 3
Noise is added to supports between each candidate set and the following frequent set]
Reference: S Xu, S Su, X Cheng, Z Li, L Xiong. Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning. ICDE 2015
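The candidate generation step can be sketched with a standard Apriori-style join and prune. This covers only the non-private step that turns frequent (k-1)-sequences into candidate k-sequences. The noisy support counting and sample-based pruning from the cited paper are not shown, and the function name is hypothetical.

```python
def generate_candidates(frequent):
    """Build candidate k-sequences from a collection of frequent (k-1)-sequences.

    Sequences are tuples of items. Two sequences join when the first without
    its head equals the second without its tail. A candidate survives only if
    every (k-1)-subsequence obtained by deleting one item is frequent.
    """
    frequent = set(frequent)
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:
            if s1[1:] == s2[:-1]:
                cand = s1 + s2[-1:]
                subsequences = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
                if all(sub in frequent for sub in subsequences):
                    candidates.add(cand)
    return candidates

# Frequent sets as they appear on the slide.
f1 = {("a",), ("c",), ("d",)}
f2 = {("a", "c"), ("a", "d"), ("c", "d"), ("d", "c")}
print(sorted(generate_candidates(f1)))
print(sorted(generate_candidates(f2)))
```

On the slide's sets this produces the nine 2-sequence candidates from F1 and the two 3-sequence candidates {a c d} and {a d c} from F2. The joins {c d c} and {d c d} are pruned because {c c} and {d d} are not frequent.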

16 Sequential Data Release: Selected Results
- Frequent pattern mining achieves better utility than the prefix-tree approach
- Threshold needs to be predefined
[Figure panels: F-score on MSNBC, BIBLE, House_Power; RE on MSNBC, BIBLE, House_Power]

17 Case Studies
- UCSD Clinical Data Warehouse for Research (CDWR)
- Emory Clinical Data Warehouse (CDW)
Patient encounter data extracted from
- Demographics
- Hospital encounters
- ICD-9 discharge diagnosis codes
- ICD-9 and CPT procedure codes
- Medication orders
- Laboratory test results
- Vital signs

18 Case Studies
- Study 1. Cohort discovery study
- Study day readmission study
- Compare results from the original (de-identified) data and SHARE-processed data
[Diagram: Emory/UCSD patient data -> Study; Emory/UCSD patient data -> SHARE-processed data -> Study]

19 Building Data Registries with Privacy and Confidentiality for PCOR (PCORI)
- Patient empowerment
  - Analyze and track patient risks
  - Empower patients to make informed privacy decisions
- Methods to utilize consented data and respect fine-grained privacy preferences

20 Building Registries with Private and Consented Data
- Using private data to complement consented/public data
- Local and cross-center (Emory and UCSD) data registries
References:
Z Ji, X Jiang, S Wang, L Xiong, and L Ohno-Machado. Differentially Private Distributed Logistic Regression using Public and Private Biomedical Datasets. In BMC Medical Genomics 2014, 7(Suppl 1):S14
H Li, L Xiong, L Ohno-Machado, X Jiang. Privacy Preserving RBF Kernel Support Vector Machine. BioMed Research International, Volume 2014, 2014

21 Data registries with fine-grained privacy control
- Patient privacy risk evaluation and tracking when participating in multiple data registries
- Building data registries with personalized privacy control
[Diagram labels: Data Registry 1, Data Registry 2, ..., Data Registry n]

22 Patient Engagement and Stakeholder Panels
- Advisory members: patients, patient advocates, privacy officers

23 Acknowledgement
Research support
- Center for Comprehensive Informatics
- Woodrow Wilson Foundation
- Cisco research award
Students
- James Gardner
- Yonghui Xiao
Collaborators
- Andrew Post, CCI
- Fusheng Wang, CCI
- Tyrone Grandison, IBM
- Chun Yuan, Tsinghua

24 Emory AIMS (Assured Information Management and Sharing)
- Current students
- Recent graduates and visitors

25 THANK YOU


POLICY. Create a governance process to manage requests to extract de- identified data from the Information Exchange (IE). Academic Health Center Office of Biomedical Health Informatics POLICY Extraction of De- Identifiable Data from the Information Exchange Approved Proposal Purpose Create a governance process to manage requests

More information

Quality Data Model (QDM)

Quality Data Model (QDM) Quality Data Model (QDM) Overview Document National Quality Forum 4/20/2011 QUALITY DATA MODEL (QDM): OVERVIEW TABLE OF CONTENTS Quality Data Model (QDM): Overview... 2 National Quality Forum: Overview

More information

All of Us Research Program

All of Us Research Program An Introduction to the All of Us Research Program October, 2017 @AllofUsResearch #JoinAllofUs Presented on behalf of NIH All of Us by Petra Kaufmann, Director of Office of Rare Diseases Research, NCATS/NIH

More information

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain

Implementation of Privacy Mechanism using Curve Fitting Method for Data Publishing in Health Care Domain Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1105

More information

Organizational Track 1 p.m.-2 p.m. Maintenance and Updating Margo Imel, RHIT, MBA Terminology ManagerMapping, SNOMED, International Pat Wilson, RT

Organizational Track 1 p.m.-2 p.m. Maintenance and Updating Margo Imel, RHIT, MBA Terminology ManagerMapping, SNOMED, International Pat Wilson, RT Organizational Track 1 p.m.-2 p.m. Maintenance and Updating Margo Imel, RHIT, MBA Terminology ManagerMapping, SNOMED, International Pat Wilson, RT (R), CPC, Team Lead Health Data Dictionary 3M SNOMED CT

More information

Jarek Szlichta

Jarek Szlichta Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

More information

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400 Statistics (STAT) 1 Statistics (STAT) STAT 1200: Introductory Statistical Reasoning Statistical concepts for critically evaluation quantitative information. Descriptive statistics, probability, estimation,

More information

SHOW ME THE MONEY SOCIETAL CHALLENGE 1 [ ] 2 nd Oct 2017

SHOW ME THE MONEY SOCIETAL CHALLENGE 1 [ ] 2 nd Oct 2017 SHOW ME THE MONEY SOCIETAL CHALLENGE 1 [2018-2020] 2 nd Oct 2017 H2020 SC1-eHealth Calls (2018). Agenda Appendix A. Lessons Learned From ESRs. Appendix B. A Quick Guide How To Make A Proposal. Appendix

More information

Pufferfish: A Semantic Approach to Customizable Privacy

Pufferfish: A Semantic Approach to Customizable Privacy Pufferfish: A Semantic Approach to Customizable Privacy Ashwin Machanavajjhala ashwin AT cs.duke.edu Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research) idash Privacy Workshop

More information

CERT Symposium: Cyber Security Incident Management for Health Information Exchanges

CERT Symposium: Cyber Security Incident Management for Health Information Exchanges Pennsylvania ehealth Partnership Authority Pennsylvania s Journey for Health Information Exchange CERT Symposium: Cyber Security Incident Management for Health Information Exchanges June 26, 2013 Pittsburgh,

More information

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP 324 Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP Shivaji Yadav(131322) Assistant Professor, CSE Dept. CSE, IIMT College of Engineering, Greater Noida,

More information

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007

K-Anonymity and Other Cluster- Based Methods. Ge Ruan Oct. 11,2007 K-Anonymity and Other Cluster- Based Methods Ge Ruan Oct 11,2007 Data Publishing and Data Privacy Society is experiencing exponential growth in the number and variety of data collections containing person-specific

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Sequence Data Sequence Database: Timeline 10 15 20 25 30 35 Object Timestamp Events A 10 2, 3, 5 A 20 6, 1 A 23 1 B 11 4, 5, 6 B

More information

University of Wisconsin-Madison Policy and Procedure

University of Wisconsin-Madison Policy and Procedure Page 1 of 10 I. Policy The Health Information Technology for Economic and Clinical Health Act regulations ( HITECH ) amended the Health Information Portability and Accountability Act ( HIPAA ) to establish

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

Privacy, Security & Ethical Issues

Privacy, Security & Ethical Issues Privacy, Security & Ethical Issues How do we mine data when we can t even look at it? 2 Individual Privacy Nobody should know more about any entity after the data mining than they did before Approaches:

More information

EHR Connectivity Integration Specification

EHR Connectivity Integration Specification EHR Connectivity Integration Specification HeC Contact information Name Phone Email Title/Role Jeremy Smith (315) 671 2241 x320 jsmith@healtheconnections.org Manager, HIE Integration OVERVIEW This document

More information

A Survey on Frequent Itemset Mining using Differential Private with Transaction Splitting

A Survey on Frequent Itemset Mining using Differential Private with Transaction Splitting A Survey on Frequent Itemset Mining using Differential Private with Transaction Splitting Bhagyashree R. Vhatkar 1,Prof. (Dr. ). S. A. Itkar 2 1 Computer Department, P.E.S. Modern College of Engineering

More information

Privacy Preserving Service Discovery for Interoperability in Power to the Edge Approach Research and Development Initiative, Chuo University

Privacy Preserving Service Discovery for Interoperability in Power to the Edge Approach Research and Development Initiative, Chuo University Privacy Preserving Service Discovery for Interoperability in Power to the Edge Approach Research and Development Initiative, Chuo University Hiroshi Yamaguchi, Masahito Gotaishi, Shigeo Tsujii, Norihisa

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Privacy preserving data mining Li Xiong Slides credits: Chris Clifton Agrawal and Srikant 4/3/2011 1 Privacy Preserving Data Mining Privacy concerns about personal data AOL

More information

Protecting Privacy while Sharing Medical Data between Regional Healthcare Entities

Protecting Privacy while Sharing Medical Data between Regional Healthcare Entities IBM Almaden Research Center Protecting Privacy while Sharing Medical Data between Regional Healthcare Entities Tyrone Grandison, Srivatsava Ranjit Ganta, Uri Braun, James Kaufman Session S113: Sharing

More information

Access EMR Data for Research

Access EMR Data for Research Access EMR Data for Research Srikar Chamala, PhD Clinical Assistant Professor Director of Biomedical Informatics (Dept. of Pathology) Univ. of Florida Dept. of Pathology, Immunology, and Laboratory Medicine

More information

Qualifying Alternative Payment Model Participants (QPs) Methodology Fact Sheet

Qualifying Alternative Payment Model Participants (QPs) Methodology Fact Sheet Qualifying Alternative Payment Model Participants (QPs) Methodology Fact Sheet Overview This methodology fact sheet describes the process and methodology that the Centers for Medicare & Medicaid Services

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Scientific Research Data Management Policy

Scientific Research Data Management Policy Scientific Research Data Management Policy DOCUMENT SUMMARY Document No. SRDMP-0001 Ref. Document Title Author(s) Policy Sponsor Scientific Research Data Management Policy Karen Ambrose Alison Davis DOCUMENT

More information

Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints

Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints Giorgos Poulis a, Grigorios Loukides b, Spiros Skiadopoulos a, Aris Gkoulalas-Divanis c a Department of

More information

Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining

Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining CS573 Data Privacy and Security Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining Li Xiong Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT

More information

Data warehouse and Data Mining

Data warehouse and Data Mining Data warehouse and Data Mining Lecture No. 14 Data Mining and its techniques Naeem A. Mahoto Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Jeffrey Friedberg. Chief Trust Architect Microsoft Corporation. July 12, 2010 Microsoft Corporation

Jeffrey Friedberg. Chief Trust Architect Microsoft Corporation. July 12, 2010 Microsoft Corporation Jeffrey Friedberg Chief Trust Architect Microsoft Corporation July 2, 200 Microsoft Corporation Secure against attacks Protects confidentiality, integrity and availability of data and systems Manageable

More information

Design A Database Schema For A Hospital

Design A Database Schema For A Hospital Design A Database Schema For A Hospital Information System Databases for Clinical Information Systems are difficult to design and implement, but how to query the data and use it for healthcare (documentation,

More information