Mondrian Multidimensional K-Anonymity

Mondrian Multidimensional K-Anonymity. Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. George W. Boulos, gwf5@pi3.edu, October 21 2009

Table Linking

Overview
Motivation & contributions
Terminology
Quality metrics
Multidimensional k-anonymization
Greedy partitioning algorithm
Performance experiments

Motivation
Protect the data owners' privacy using k-anonymous tables.
Achieve higher quality of anonymized data.
Provide an algorithm for anonymizing tables.
The primary goal of k-anonymization is to protect the privacy of the individuals to whom the data pertains. However, subject to this constraint, it is important that the released data remain as useful as possible.

Contributions
Introducing multidimensional k-anonymization.
Introducing a greedy algorithm for k-anonymization: more efficient than proposed optimal k-anonymization algorithms for single-dimensional models; complexity O(n log n), compared to exponential.
The greedy multidimensional algorithm often produces higher-quality results than optimal single-dimensional algorithms.
A more targeted notion of quality measurement.

Terminology
Quasi-identifier: Minimal set of attributes X1, ..., Xd in table T that can be joined with external information to re-identify individual records.
Equivalence class: The set of all tuples in T containing identical values (x1, ..., xd) for X1, ..., Xd.
K-anonymity property: Table T is k-anonymous with respect to attributes X1, ..., Xd if every unique tuple (x1, ..., xd) in the (multiset) projection of T on X1, ..., Xd occurs at least k times.
K-anonymization: A view V of relation T is said to be a k-anonymization if the view modifies or generalizes the data of T according to some model such that V is k-anonymous with respect to the quasi-identifier.
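The k-anonymity property can be checked mechanically by counting equivalence classes. The following is a minimal sketch, assuming the table is a list of dictionaries; the function name `is_k_anonymous` and the sample patient rows are made up for illustration and are not from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Each distinct tuple of quasi-identifier values is one equivalence class.
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(c >= k for c in counts.values())

# Hypothetical, already generalized table (illustrative values only).
patients = [
    {"age": "[25-26]", "sex": "male", "zip": "5371*", "disease": "flu"},
    {"age": "[25-26]", "sex": "male", "zip": "5371*", "disease": "hepatitis"},
    {"age": "[27-28]", "sex": "female", "zip": "5371*", "disease": "flu"},
    {"age": "[27-28]", "sex": "female", "zip": "5371*", "disease": "bronchitis"},
]

print(is_k_anonymous(patients, ["age", "sex", "zip"], 2))
```

The sensitive attribute (`disease` here) is deliberately left out of the quasi-identifier, so it plays no role in the check.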

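General Quality Metrics
Discernability metric: C_DM = Σ |E|², summed over all equivalence classes E.
Normalized average equivalence class size: C_AVG = (total records / total equivalence classes) / k.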
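Both metrics depend only on the sizes of the equivalence classes. As a rough sketch, taking the discernability metric as the sum of squared class sizes and the normalized average as (total records / number of classes) / k, as defined in the Mondrian paper, they could be computed as follows. The function names are hypothetical.

```python
def discernability(class_sizes):
    """C_DM: each record is penalized by the size of its equivalence class."""
    return sum(size * size for size in class_sizes)

def normalized_avg_class_size(class_sizes, k):
    """C_AVG: average class size divided by k (1.0 is the best possible value)."""
    total_records = sum(class_sizes)
    return (total_records / len(class_sizes)) / k

sizes = [2, 3, 5]  # illustrative equivalence class sizes
print(discernability(sizes))                # 2*2 + 3*3 + 5*5 = 38
print(normalized_avg_class_size(sizes, 2))  # (10 / 3) / 2
```

Lower values are better for both metrics: large equivalence classes mean more records are indistinguishable from each other than k-anonymity requires.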

K-anonymization global recoding: achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values.

Single vs. Multidimensional K-Anonymization
Single-dimensional: A single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover D_Xi. φi maps each x ∈ D_Xi to some summary statistic.
Multidimensional: A global recoding achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values, using a single function φ: D_X1 × ... × D_Xd → D'.

Single vs. Multidimensional K-Anonymization (Cont.)

Single-Dimensional Partitioning
A single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover D_Xi. φi maps each x ∈ D_Xi to some summary statistic for the interval in which it is contained.

Strict Multidimensional Partitioning
A strict multidimensional partitioning defines a set of non-overlapping multidimensional regions that cover D_X1 × ... × D_Xd. φ maps each tuple (x1, ..., xd) ∈ D_X1 × ... × D_Xd to a summary statistic for the region in which it is contained.
Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, ..., Xd can be expressed as a strict multidimensional partitioning.

Strict Multidimensional Partitioning (Cont.)
NP-hard

Single-Dimensional vs. Multidimensional Partitioning
Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, ..., Xd can be expressed as a strict multidimensional partitioning. However, when d ≥ 2 and, for all i, |D_Xi| ≥ 2, there exists a strict multidimensional partitioning that cannot be expressed as a single-dimensional partitioning.

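Decisional K-Anonymous Multidimensional Partitioning
Given a set P of unique (point, count) pairs, with points in d-dimensional space, is there a strict multidimensional partitioning for P such that, for every resulting multidimensional region Ri, Σ_{p ∈ Ri} count(p) ≥ k OR Σ_{p ∈ Ri} count(p) = 0, and C_AVG ≤ c for a positive constant c?
This decision problem is NP-complete.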

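Allowable Cut
Multidimensional: A cut perpendicular to axis Xi at xi is allowable if and only if Count(P.Xi > xi) ≥ k and Count(P.Xi ≤ xi) ≥ k.
Single-dimensional: A single-dimensional cut perpendicular to Xi at xi is allowable, given a set S of existing cuts, if for every region induced by S that overlaps the line Xi = xi, both sides of the cut within that region contain at least k points.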
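The multidimensional test amounts to counting the points on each side of the cut. Here is a minimal sketch, assuming points are numeric tuples and that points whose value equals the cut position fall on the lower side; the function name `is_allowable_cut` is hypothetical.

```python
def is_allowable_cut(points, dim, cut, k):
    """Check whether cutting `points` perpendicular to axis `dim` at `cut`
    leaves at least k points on each side."""
    lower = sum(1 for p in points if p[dim] <= cut)  # assumption: ties go to the lower side
    upper = len(points) - lower
    return lower >= k and upper >= k

ages = [(25,), (25,), (26,), (27,), (28,)]  # illustrative one-dimensional points
print(is_allowable_cut(ages, dim=0, cut=25, k=2))  # 2 below, 3 above
print(is_allowable_cut(ages, dim=0, cut=27, k=2))  # 4 below, 1 above
```

The single-dimensional case is stricter because a cut spans the whole domain, so the same count must hold in every existing region that the cut crosses.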

Minimal Partitioning
Minimal strict multidimensional partitioning: Let R1, ..., Rn denote a set of regions induced by a strict multidimensional partitioning, and let each region Ri contain multiset Pi of points. This multidimensional partitioning is minimal if, for all i, |Pi| ≥ k and there exists no allowable multidimensional cut for Pi.
Minimal single-dimensional partitioning: A set S of allowable single-dimensional cuts is a minimal single-dimensional partitioning for multiset P of points if there does not exist an allowable single-dimensional cut for P given S.

Bounds on Partition Size in Multidimensional K-Anonymization

Bounds on Partition Size in Single-Dimensional K-Anonymization
≤ 2k − 1

Relaxed Multidimensional Partitioning
A relaxed multidimensional partitioning for relation T defines a set of (potentially overlapping) distinct multidimensional regions that cover D_X1 × ... × D_Xd. Local recoding function φ maps each tuple (x1, ..., xd) ∈ T to a summary statistic for one of the regions in which it is contained.
Proposition 2: Every strict multidimensional partitioning can be expressed as a relaxed multidimensional partitioning. However, if there are at least two tuples in table T having the same vector of quasi-identifier values, there exists a relaxed multidimensional partitioning that cannot be expressed as a strict multidimensional partitioning.

Greedy Partitioning Algorithm
Choose the dimension with the widest range of values.
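The greedy strategy can be sketched as a recursive median split, much like building a kd-tree. This is an illustrative sketch under simplifying assumptions, not the authors' implementation: points are numeric tuples, the range is not normalized, only median cuts are tried, and dimensions are tried from widest to narrowest until one yields at least k points on each side. The names `mondrian` and `summarize` are hypothetical.

```python
def mondrian(points, k):
    """Recursively split `points` (a list of numeric tuples) into groups of size >= k."""
    num_dims = len(points[0])

    def value_range(dim):
        values = [p[dim] for p in points]
        return max(values) - min(values)

    # Greedy heuristic: try the dimension with the widest range first.
    for dim in sorted(range(num_dims), key=value_range, reverse=True):
        values = sorted(p[dim] for p in points)
        median = values[(len(values) - 1) // 2]
        lower = [p for p in points if p[dim] <= median]
        upper = [p for p in points if p[dim] > median]
        if len(lower) >= k and len(upper) >= k:  # the median cut is allowable
            return mondrian(lower, k) + mondrian(upper, k)
    return [points]  # no allowable median cut: this group becomes one region

def summarize(group):
    """Generalize a group to one (min, max) range per dimension."""
    return [(min(p[d] for p in group), max(p[d] for p in group))
            for d in range(len(group[0]))]

# Illustrative (age, zipcode) points.
data = [(25, 53710), (25, 53712), (26, 53711),
        (27, 53710), (27, 53712), (28, 53711)]
for group in mondrian(data, k=2):
    print(summarize(group), len(group))
```

Each recursive call works on a strictly smaller group, so the recursion terminates, and every returned group has at least k points as long as the input does.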

Bounds on Quality

Scalability Problem
The table may be too large to fit in the available memory.
Calculate the frequency set of the attributes and load only the frequency set into memory.
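A frequency set stores one (point, count) pair per distinct quasi-identifier combination, so it can be much smaller than the table when values repeat. The sketch below assumes rows arrive as dictionaries from any iterable (for example a database cursor) and shows how a split value can be found from the counts alone; both function names are hypothetical.

```python
from collections import Counter

def frequency_set(rows, quasi_identifier):
    """Stream over `rows` once and keep only (point, count) pairs in memory."""
    counts = Counter()
    for row in rows:
        counts[tuple(row[a] for a in quasi_identifier)] += 1
    return counts

def median_from_frequency_set(freq, dim):
    """Find the (lower) median of dimension `dim` using counts only."""
    per_value = Counter()
    for point, count in freq.items():
        per_value[point[dim]] += count
    total = sum(per_value.values())
    running = 0
    for value in sorted(per_value):
        running += per_value[value]
        if running * 2 >= total:
            return value
    return None  # empty frequency set

# Illustrative rows; a generator stands in for a table read from disk.
ages = [25, 25, 26, 27, 27, 28]
rows = ({"age": a, "sex": "male"} for a in ages)
freq = frequency_set(rows, ["age", "sex"])
print(len(freq), median_from_frequency_set(freq, dim=0))
```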

Workload-Driven Quality
Range statistics: SELECT AVG(Age) FROM Patients WHERE sex = 'male'
Mean statistics: SELECT COUNT(*) FROM Patients WHERE sex = 'male' AND age <= 26
It is impossible to answer the second query precisely using the single-dimensional recoding.

Experimental Evaluation
Used a synthetic data generator to produce two discrete joint distributions: discrete uniform and discrete normal.
Also tested on the Adults database.

Experimental Evaluation for Synthetic Data

Experimental Evaluation for Adults Database

Optimal Single-Dimensional vs. Greedy Strict Multidimensional Partitioning

Strengths vs. Weaknesses
Defines the process of k-anonymity within a broader and more accurate framework.
The multidimensional approach keeps the number of points in each partition close to the minimum, so the output data is better.
Any weaknesses?

Q & A Thank you