Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu. Chapman & Hall/CRC.

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques

Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu

CRC Press, Taylor & Francis Group
Boca Raton, London, New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
A Chapman & Hall Book

Contents

List of Figures
List of Tables
List of Algorithms
Preface
Acknowledgments
About the Authors

I The Fundamentals

1 Introduction
  1.1 Data Collection and Data Publishing
  1.2 What Is Privacy-Preserving Data Publishing?
  1.3 Related Research Areas
2 Attack Models and Privacy Models
  2.1 Record Linkage Model
    2.1.1 k-anonymity
    2.1.2 (X, Y)-Anonymity
    2.1.3 Dilemma on Choosing QID
  2.2 Attribute Linkage Model
    2.2.1 ℓ-Diversity
    2.2.2 Confidence Bounding
    2.2.3 (X, Y)-linkability
    2.2.4 (X, Y)-Privacy
    2.2.5 (α, k)-anonymity
    2.2.6 LKC-Privacy
    2.2.7 (k, e)-anonymity
    2.2.8 t-closeness
    2.2.9 Personalized Privacy
    2.2.10 FF-anonymity
  2.3 Table Linkage Model
  2.4 Probabilistic Model
    2.4.1 (c, t)-isolation
    2.4.2 ε-differential Privacy
    2.4.3 (d, γ)-Privacy
    2.4.4 Distributional Privacy
  2.5 Modeling Adversary's Background Knowledge
    2.5.1 Skyline Privacy
    2.5.2 Privacy-MaxEnt
    2.5.3 Skyline (B, t)-Privacy
3 Anonymization Operations
  3.1 Generalization and Suppression
  3.2 Anatomization and Permutation
  3.3 Random Perturbation
    3.3.1 Additive Noise
    3.3.2 Data Swapping
    3.3.3 Synthetic Data Generation
4 Information Metrics
  4.1 General Purpose Metrics
    4.1.1 Minimal Distortion
    4.1.2 ILoss
    4.1.3 Discernibility Metric
    4.1.4 Distinctive Attribute
  4.2 Special Purpose Metrics
  4.3 Trade-Off Metrics
5 Anonymization Algorithms
  5.1 Algorithms for the Record Linkage Model
    5.1.1 Optimal Anonymization
    5.1.2 Locally Minimal Anonymization
    5.1.3 Perturbation Algorithms
  5.2 Algorithms for the Attribute Linkage Model
    5.2.1 ℓ-Diversity Incognito and ℓ+-Optimize
    5.2.2 InfoGain Mondrian
    5.2.3 Top-Down Disclosure
    5.2.4 Anatomize
    5.2.5 (k, e)-anonymity Permutation
    5.2.6 Personalized Privacy
  5.3 Algorithms for the Table Linkage Model
    5.3.1 δ-Presence Algorithms SPALM and MPALM
  5.4 Algorithms for the Probabilistic Attack Model
    5.4.1 ε-differential Additive Noise
    5.4.2 αβ Algorithm
  5.5 Attacks on Anonymous Data
    5.5.1 Minimality Attack
    5.5.2 deFinetti Attack
    5.5.3 Corruption Attack

II Anonymization for Data Mining

6 Anonymization for Classification Analysis
  6.1 Introduction
  6.2 Anonymization Problems for Red Cross BTS
    6.2.1 Privacy Model
    6.2.2 Information Metrics
    6.2.3 Problem Statement
  6.3 High-Dimensional Top-Down Specialization (HDTDS)
    6.3.1 Find the Best Specialization
    6.3.2 Perform the Best Specialization
    6.3.3 Update Score and Validity
    6.3.4 Discussion
  6.4 Workload-Aware Mondrian
    6.4.1 Single Categorical Target Attribute
    6.4.2 Single Numerical Target Attribute
    6.4.3 Multiple Target Attributes
    6.4.4 Discussion
  6.5 Bottom-Up Generalization
    6.5.1 The Anonymization Algorithm
    6.5.2 Data Structure
    6.5.3 Discussion
  6.6 Genetic Algorithm
    6.6.1 The Anonymization Algorithm
    6.6.2 Discussion
  6.7 Evaluation Methodology
    6.7.1 Data Utility
    6.7.2 Efficiency and Scalability
  6.8 Summary and Lesson Learned
7 Anonymization for Cluster Analysis
  7.1 Introduction
  7.2 Anonymization Framework for Cluster Analysis
    7.2.1 Anonymization Problem for Cluster Analysis
    7.2.2 Overview of Solution Framework
    7.2.3 Anonymization for Classification Analysis
    7.2.4 Evaluation
    7.2.5 Discussion
  7.3 Dimensionality Reduction-Based Transformation
    7.3.1 Dimensionality Reduction
    7.3.2 The DRBT Method
  7.4 Related Topics
  7.5 Summary

III Extended Data Publishing Scenarios

8 Multiple Views Publishing
  8.1 Introduction
  8.2 Checking Violations of k-Anonymity on Multiple Views
    8.2.1 Violations by Multiple Selection-Project Views
    8.2.2 Violations by Functional Dependencies
    8.2.3 Discussion
  8.3 Checking Violations with Marginals
  8.4 MultiRelational k-anonymity
  8.5 Multi-Level Perturbation
  8.6 Summary
9 Anonymizing Sequential Releases with New Attributes
  9.1 Introduction
    9.1.1 Motivations
    9.1.2 Anonymization Problem for Sequential Releases
  9.2 Monotonicity of Privacy
  9.3 Anonymization Algorithm for Sequential Releases
    9.3.1 Overview of the Anonymization Algorithm
    9.3.2 Information Metrics
    9.3.3 (X, Y)-Linkability
    9.3.4 (X, Y)-Anonymity
  9.4 Extensions
  9.5 Summary
10 Anonymizing Incrementally Updated Data Records
  10.1 Introduction
  10.2 Continuous Data Publishing
    10.2.1 Data Model
    10.2.2 Correspondence Attacks
    10.2.3 Anonymization Problem for Continuous Publishing
    10.2.4 Detection of Correspondence Attacks
    10.2.5 Anonymization Algorithm for Correspondence Attacks
    10.2.6 Beyond Two Releases
    10.2.7 Beyond Anonymity
  10.3 Dynamic Data Republishing
    10.3.1 Privacy Threats
    10.3.2 m-invariance
  10.4 HD-Composition
  10.5 Summary
11 Collaborative Anonymization for Vertically Partitioned Data
  11.1 Introduction
  11.2 Privacy-Preserving Data Mashup
    11.2.1 Anonymization Problem for Data Mashup
    11.2.2 Information Metrics
    11.2.3 Architecture and Protocol
    11.2.4 Anonymization Algorithm for Semi-Honest Model
    11.2.5 Anonymization Algorithm for Malicious Model
    11.2.6 Discussion
  11.3 Cryptographic Approach
    11.3.1 Secure Multiparty Computation
    11.3.2 Minimal Information Sharing
  11.4 Summary and Lesson Learned
12 Collaborative Anonymization for Horizontally Partitioned Data
  12.1 Introduction
  12.2 Privacy Model
  12.3 Overview of the Solution
  12.4 Discussion

IV Anonymizing Complex Data

13 Anonymizing Transaction Data
  13.1 Introduction
    13.1.1 Motivations
    13.1.2 The Transaction Publishing Problem
    13.1.3 Previous Works on Privacy-Preserving Data Mining
    13.1.4 Challenges and Requirements
  13.2 Cohesion Approach
    13.2.1 Coherence
    13.2.2 Item Suppression
    13.2.3 A Heuristic Suppression Algorithm
    13.2.4 Itemset-Based Utility
    13.2.5 Discussion
  13.3 Band Matrix Method
    13.3.1 Band Matrix Representation
    13.3.2 Constructing Anonymized Groups
    13.3.3 Reconstruction Error
    13.3.4 Discussion
  13.4 k^m-anonymization
    13.4.1 k^m-anonymity
    13.4.2 Apriori Anonymization
    13.4.3 Discussion
  13.5 Transactional k-anonymity
    13.5.1 k-anonymity for Set Valued Data
    13.5.2 Top-Down Partitioning Anonymization
    13.5.3 Discussion
  13.6 Anonymizing Query Logs
    13.6.1 Token-Based Hashing
    13.6.2 Secret Sharing
    13.6.3 Split Personality
    13.6.4 Other Related Works
  13.7 Summary
14 Anonymizing Trajectory Data
  14.1 Introduction
    14.1.1 Motivations
    14.1.2 Attack Models on Trajectory Data
  14.2 LKC-Privacy
    14.2.1 Trajectory Anonymity for Maximal Frequent Sequences
    14.2.2 Anonymization Algorithm for LKC-Privacy
    14.2.3 Discussion
  14.3 (k, δ)-Anonymity
    14.3.1 Trajectory Anonymity for Minimal Distortion
    14.3.2 The Never Walk Alone Anonymization Algorithm
    14.3.3 Discussion
  14.4 MOB k-anonymity
    14.4.1 Trajectory Anonymity for Minimal Information Loss
    14.4.2 Anonymization Algorithm for MOB k-anonymity
  14.5 Other Spatio-Temporal Anonymization Methods
  14.6 Summary
15 Anonymizing Social Networks
  15.1 Introduction
    15.1.1 Data Models
    15.1.2 Attack Models
    15.1.3 Utility of the Published Data
  15.2 General Privacy-Preserving Strategies
    15.2.1 Graph Modification
    15.2.2 Equivalence Classes of Nodes
  15.3 Anonymization Methods for Social Networks
    15.3.1 Edge Insertion and Label Generalization
    15.3.2 Clustering Nodes for k-anonymity
    15.3.3 Supergraph Generation
    15.3.4 Randomized Social Networks
    15.3.5 Releasing Subgraphs to Users: Link Recovery
    15.3.6 Not Releasing the Network
  15.4 Data Sets
  15.5 Summary
16 Sanitizing Textual Data
  16.1 Introduction
  16.2 ERASE
    16.2.1 Sanitization Problem for Documents
    16.2.2 Privacy Model: K-Safety
    16.2.3 Problem Statement
    16.2.4 Sanitization Algorithms for K-Safety
  16.3 Health Information DE-identification (HIDE)
    16.3.1 De-Identification Models
    16.3.2 The HIDE Framework
  16.4 Summary
17 Other Privacy-Preserving Techniques and Future Trends
  17.1 Interactive Query Model
  17.2 Privacy Threats Caused by Data Mining Results
  17.3 Privacy-Preserving Distributed Data Mining
  17.4 Future Directions

References
Index