Mondrian Multidimensional K-Anonymity. Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. George W. Boulos, gwf5@pi3.edu. October 21, 2009
Table Linking
Overview: Motivation and contributions; Terminology; Quality metrics; Multidimensional k-anonymization; Greedy partitioning algorithm; Performance experiments
Motivation: Protect the data owners' privacy using k-anonymous tables. Achieve higher quality of anonymized data. Provide an algorithm for anonymizing tables. The primary goal of k-anonymization is to protect the privacy of the individuals to whom the data pertains. However, subject to this constraint, it is important that the released data remain as useful as possible.
Contributions: Introduces multidimensional k-anonymization. Introduces a greedy algorithm for k-anonymization that is more efficient than proposed optimal k-anonymization algorithms for single-dimensional models: its complexity is O(n log n), compared to exponential. The greedy multidimensional algorithm often produces higher-quality results than optimal single-dimensional algorithms. Proposes a more targeted notion of quality measurement.
Terminology. Quasi-identifier: a minimal set of attributes X1, ..., Xd in table T that can be joined with external information to re-identify individual records. Equivalence class: the set of all tuples in T containing identical values (x1, ..., xd) for X1, ..., Xd. K-anonymity property: table T is k-anonymous with respect to attributes X1, ..., Xd if every unique tuple (x1, ..., xd) in the (multiset) projection of T on X1, ..., Xd occurs at least k times. K-anonymization: a view V of relation T is said to be a k-anonymization if the view modifies or generalizes the data of T according to some model such that V is k-anonymous with respect to the quasi-identifier.
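The k-anonymity property can be checked directly by counting equivalence classes. The following is a minimal sketch, assuming a table is held as a list of dictionaries; the function name `is_k_anonymous`, the column names, and the sample rows are hypothetical and only serve as illustration.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every combination of quasi-identifier values
    that appears in rows appears at least k times."""
    # Multiset projection of the table onto the quasi-identifier attributes.
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(count >= k for count in counts.values())

# Hypothetical, already generalized patient table.
patients = [
    {"age": "[25-26]", "sex": "Male", "disease": "Flu"},
    {"age": "[25-26]", "sex": "Male", "disease": "Hepatitis"},
    {"age": "[27-28]", "sex": "Female", "disease": "Flu"},
    {"age": "[27-28]", "sex": "Female", "disease": "Bronchitis"},
    {"age": "[27-28]", "sex": "Female", "disease": "Flu"},
]

print(is_k_anonymous(patients, ["age", "sex"], 2))  # smallest class has 2 rows
print(is_k_anonymous(patients, ["age", "sex"], 3))  # one class has only 2 rows
```

Each distinct key of the counter corresponds to one equivalence class, so the same counts can be reused for the quality metrics.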
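General Quality Metrics. Discernability metric: C_DM = sum over all equivalence classes E of |E|^2. Normalized average equivalence class size metric: C_AVG = (total records / total equivalence classes) / k.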
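Both metrics depend only on the sizes of the equivalence classes. The sketch below assumes the usual definitions (the discernability metric as the sum of squared class sizes, and the normalized average as the mean class size divided by k); the function names and the sample sizes are hypothetical.

```python
def discernability(class_sizes):
    """Sum of squared equivalence class sizes; lower is better."""
    return sum(size * size for size in class_sizes)

def normalized_average_size(class_sizes, k):
    """Average equivalence class size divided by k; the best possible value is 1."""
    total_records = sum(class_sizes)
    return (total_records / len(class_sizes)) / k

# Hypothetical anonymization of 10 records into three classes, with k = 2.
sizes = [2, 3, 5]
print(discernability(sizes))              # 4 + 9 + 25 = 38
print(normalized_average_size(sizes, 2))  # (10 / 3) / 2
```

Under these definitions, both metrics penalize large equivalence classes, which is why partitions that stay close to size k score better.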
K-Anonymization: Global recoding achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values.
Single- vs. Multidimensional K-Anonymization. Single-dimensional: a single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover D_Xi; φi maps each x ∈ D_Xi to some summary statistic. Multidimensional: a global recoding achieves anonymity by mapping the domains of the quasi-identifier attributes to generalized or altered values, φ: D_X1 × ... × D_Xn → D′.
Single- vs. Multidimensional K-Anonymization (Cont.)
Single-Dimensional Partitioning: A single-dimensional partitioning defines, for each Xi, a set of non-overlapping single-dimensional intervals that cover D_Xi. φi maps each x ∈ D_Xi to some summary statistic for the interval in which it is contained.
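A single-dimensional recoding function for one attribute can be sketched as a lookup into a sorted list of intervals. This is only an illustration: the helper `make_recoder`, the choice of the interval label as the summary statistic, and the age intervals are assumptions, not taken from the slides.

```python
from bisect import bisect_right

def make_recoder(intervals):
    """Build a recoding function from sorted, non-overlapping, inclusive
    (low, high) intervals that together cover the attribute's domain."""
    lows = [low for low, _ in intervals]

    def recode(x):
        # Find the last interval whose lower bound is <= x.
        i = bisect_right(lows, x) - 1
        if i < 0 or x > intervals[i][1]:
            raise ValueError(f"{x} is outside the covered domain")
        low, high = intervals[i]
        return f"[{low}-{high}]"  # the interval itself is the summary statistic

    return recode

# Hypothetical partitioning of the Age domain 20..35.
recode_age = make_recoder([(20, 24), (25, 28), (29, 35)])
print(recode_age(26))  # [25-28]
print(recode_age(20))  # [20-24]
```

Because each attribute gets its own function, a value such as age 26 is always recoded the same way regardless of the other attributes in the tuple.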
Strict Multidimensional Partitioning: A strict multidimensional partitioning defines a set of non-overlapping multidimensional regions that cover D_X1 × ... × D_Xd. φ maps each tuple (x1, ..., xd) ∈ D_X1 × ... × D_Xd to a summary statistic for the region in which it is contained. Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, ..., Xd can be expressed as a strict multidimensional partitioning.
Strict Multidimensional Partitioning (Cont.) NP-hard
Single-Dimensional vs. Multidimensional Partitioning. Proposition 1: Every single-dimensional partitioning for quasi-identifier attributes X1, ..., Xd can be expressed as a strict multidimensional partitioning. However, when d ≥ 2 and, for all i, |D_Xi| ≥ 2, there exists a strict multidimensional partitioning that cannot be expressed as a single-dimensional partitioning.
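Decisional K-Anonymous Multidimensional Partitioning: Given a set P of unique (point, count) pairs, with points in d-dimensional space, is there a strict multidimensional partitioning such that, for every resulting multidimensional region Ri, the sum of count(p) over all p ∈ Ri is ≥ k, or the sum is 0? NP-complete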
Allowable Cut. Multidimensional: a cut perpendicular to axis Xi at xi is allowable if and only if Count(P.Xi > xi) ≥ k and Count(P.Xi ≤ xi) ≥ k. Single-dimensional: a single-dimensional cut perpendicular to Xi at xi is allowable, given S, if
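The multidimensional allowable-cut test can be sketched as two counts. This sketch assumes that points lying exactly at the cut value go to the lower side, so that every point falls on exactly one side; the function name and the sample points are hypothetical.

```python
def is_allowable_cut(points, dim, value, k):
    """Check whether cutting the multiset of points perpendicular to
    dimension dim at value leaves at least k points on each side."""
    # Assumption: points equal to the cut value are assigned to the lower side.
    lower = sum(1 for p in points if p[dim] <= value)
    upper = sum(1 for p in points if p[dim] > value)
    return lower >= k and upper >= k

# Hypothetical (age, zipcode) points.
points = [(25, 53711), (25, 53712), (26, 53711), (27, 53710), (28, 53711)]
print(is_allowable_cut(points, 0, 25, 2))  # 2 points below or at 25, 3 above
print(is_allowable_cut(points, 0, 27, 2))  # only 1 point above 27
```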
Minimal Partitioning. Minimal strict multidimensional partitioning: Let R1, ..., Rn denote a set of regions induced by a strict multidimensional partitioning, and let each region Ri contain multiset Pi of points. This multidimensional partitioning is minimal if, for all i, |Pi| ≥ k and there exists no allowable multidimensional cut for Pi. Minimal single-dimensional partitioning: A set S of allowable single-dimensional cuts is a minimal single-dimensional partitioning for multiset P of points if there does not exist an allowable single-dimensional cut for P given S.
Bounds on Partition Size in Multidimensional K-Anonymization
Bounds on Partition Size in Single-Dimensional K-Anonymization: ≤ 2k − 1
Relaxed Multidimensional Partitioning: A relaxed multidimensional partitioning for relation T defines a set of (potentially overlapping) distinct multidimensional regions that cover D_X1 × ... × D_Xd. Local recoding function φ maps each tuple (x1, ..., xd) ∈ T to a summary statistic for one of the regions in which it is contained. Proposition 2: Every strict multidimensional partitioning can be expressed as a relaxed multidimensional partitioning. However, if there are at least two tuples in table T having the same vector of quasi-identifier values, there exists a relaxed multidimensional partitioning that cannot be expressed as a strict multidimensional partitioning.
Greedy Partitioning Algorithm: Choose the dimension with the widest range of values.
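A greedy partitioner in this spirit can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' implementation: it splits at the lower median, sends ties to the left side, compares raw (unnormalized) ranges, and falls back to the next-widest dimension when the median cut is not allowable. The names `choose_dimensions`, `mondrian`, and `summarize`, and the sample points, are hypothetical.

```python
def choose_dimensions(points):
    """Order dimension indices from widest to narrowest value range."""
    def value_range(d):
        column = [p[d] for p in points]
        return max(column) - min(column)
    return sorted(range(len(points[0])), key=value_range, reverse=True)

def mondrian(points, k):
    """Recursively split points (tuples of numbers) at the median of the
    widest dimension while both halves keep at least k points."""
    for dim in choose_dimensions(points):
        values = sorted(p[dim] for p in points)
        split = values[(len(values) - 1) // 2]  # lower median
        left = [p for p in points if p[dim] <= split]
        right = [p for p in points if p[dim] > split]
        if len(left) >= k and len(right) >= k:  # the cut is allowable
            return mondrian(left, k) + mondrian(right, k)
    return [points]  # no allowable cut: this partition is final

def summarize(partition):
    """Summary statistic for a region: the (min, max) range per dimension."""
    return tuple((min(column), max(column)) for column in zip(*partition))

# Hypothetical (age, zipcode) points, anonymized with k = 2.
points = [(25, 53711), (25, 53712), (26, 53711),
          (27, 53710), (27, 53712), (28, 53711)]
for part in mondrian(points, 2):
    print(summarize(part), len(part))
```

Each recursive call sorts one dimension of its partition, which is consistent with the O(n log n) complexity claimed for the greedy approach.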
Bounds on Quality
Scalability. Problem: the table may be too large to fit in the available memory. Calculate the frequency set of the attributes and load only the frequency set into memory.
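The frequency set can be built in one streaming pass, so that only the distinct quasi-identifier combinations and their counts are held in memory. This is a minimal sketch; the function name `frequency_set` and the generator standing in for an on-disk table are assumptions made for illustration.

```python
from collections import Counter

def frequency_set(row_iter, quasi_identifier):
    """Count occurrences of each quasi-identifier combination while
    reading rows one at a time (rows are never all held in memory)."""
    counts = Counter()
    for row in row_iter:
        counts[tuple(row[a] for a in quasi_identifier)] += 1
    return counts

def stream_rows():
    """Hypothetical stand-in for reading a large table from disk."""
    yield {"age": 25, "sex": "Male", "disease": "Flu"}
    yield {"age": 25, "sex": "Male", "disease": "Hepatitis"}
    yield {"age": 27, "sex": "Female", "disease": "Flu"}

freq = frequency_set(stream_rows(), ["age", "sex"])
print(freq[(25, "Male")], freq[(27, "Female")])  # 2 1
```

The resulting (point, count) pairs are enough to evaluate candidate cuts, since an allowable-cut test only needs the counts on each side.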
Workload-Driven Quality. Mean statistics: SELECT AVG(Age) FROM Patients WHERE Sex = 'Male'. Range statistics: SELECT COUNT(*) FROM Patients WHERE Sex = 'Male' AND Age <= 26. It is impossible to answer the second query precisely using the single-dimensional recoding.
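Why a range count becomes imprecise can be illustrated by computing the bounds that generalized ages allow. The sketch below is hypothetical throughout: the function `count_bounds` and both recodings of three male patients are invented to show the effect, not taken from the slides.

```python
def count_bounds(age_intervals, threshold):
    """Bounds on COUNT(*) WHERE age <= threshold when each record only
    exposes an inclusive (low, high) age interval."""
    # Records whose whole interval is at or below the threshold surely match.
    lower = sum(1 for low, high in age_intervals if high <= threshold)
    # Records whose interval starts at or below the threshold might match.
    upper = sum(1 for low, high in age_intervals if low <= threshold)
    return lower, upper

# Hypothetical recodings of three male patients aged 25, 26, and 28.
coarse = [(25, 28), (25, 28), (25, 28)]  # one wide age interval for everyone
finer = [(25, 26), (25, 26), (27, 28)]   # narrower regions around the data

print(count_bounds(coarse, 26))  # (0, 3): the answer is not determined
print(count_bounds(finer, 26))   # (2, 2): the answer is exact
```

When the lower and upper bounds coincide, the anonymized table answers the query exactly; a recoding whose interval straddles the predicate boundary cannot.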
Experimental Evaluation: Used a synthetic data generator to produce two discrete joint distributions: discrete uniform and discrete normal. Also tested on the Adults database.
Experimental Evaluation for Synthetic Data
Experimental Evaluation for Adults Database
Optimal Single-Dimensional vs. Greedy Strict Multidimensional Partitioning
Strengths vs. Weaknesses: Frames k-anonymization within a broader and more accurate model. The multidimensional approach keeps the number of points in each partition small, so the output data is of higher quality. Any weaknesses?
Q & A Thank you