Differentially Private H-Tree

Size: px

Start display at page:

Download "Differentially Private H-Tree"

Jocelin Hodges
5 years ago
Views:

1 GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern California November 3, 25

2 Motivation Mobile devices collect/share location data Enable applications, e.g., spatial crowdsourcing, traffic monitoring, location-aware recommendation Adversary can infer users sensitive details Many location-based apps require only spatial aggregation of users e.g., spatial crowdsourcing Differential privacy serves that purpose ~ ~3 ~3 ~2 ~4 ~7 ~6 ~3 ~4 ~5 ~2 ~3 ~ task ~ ~6 ~2 ~ ~ ~3 ~ ~ Noisy worker count per grid cell 2

3 Differential Privacy (DP) Ensures adversary do not know whether an individual is present or not in dataset, regardless of background knowledge Allows only aggregate queries, e.g., count, sum ε-indistinguishability L -sensitivity ε σ ( QS) D Pr[ QS = U] ln ε D2 Pr[ QS = U] ε : privacy budget = max D, D 2 q i= QS( D [Dwork 6] ) QS( D D and D2 are sibling datasets that differ in only one record Achieve -DP by adding random Laplace noise with mean and standard deviation λ = σ (QS) / ε [Dwork 6] 2 ) 3

Problem Definition Publish private spatial decomposition (PSD) of 2-d dataset Accurately answer count queries Range query fully covers 2 cells and

4 Problem Definition Publish private spatial decomposition (PSD) of 2-d dataset Accurately answer count queries Range query fully covers 2 cells and partially covers 2 cells à Estimated result set size: 2*2/2+5*2=3 5 5 Relative error RE PSD (q) = Q PSD (q) A(q) A(q) actual count published noisy counts

5 Related Work ü Kd-tree on top of fixed equal-size grid ü Wavelet transformation [Xiao et al. 2] [Xiao et al. 2] ü Kd-tree, Quad-tree [Cormode et.al ICDE 22] Perturbation error is excessively high on hierarchical partitions and high dimensional data L ü Uniform grid, adaptive grid [Qardaji et al. ICDE 23] ü Extend to higher dimension [Qardaji et al. VLDB 23] Grid-based partitions are not ideal for skewed datasets L ü H-Tree: two-level data-dependent tree [This study] Kd-tree Adaptive grid 5

Differentially Private H-Tree Equi-depth multidimensional histograms C C2 C3 C C2 C3 C4 C2 C22 C23 C24 C3 C32 C33 C34 C4 C4 C42 C43 C44 H-Tree of size m=4 C C C2 C3 C4 C2 C22 C23 C24

6 Differentially Private H-Tree Equi-depth multidimensional histograms C C2 C3 C C2 C3 C4 C2 C22 C23 C24 C3 C32 C33 C34 C4 C4 C42 C43 C44 H-Tree of size m=4 C C C2 C3 C4 C2 C22 C23 C24 [Muralikrishna et. al SIGMOD 988] C2 C3 C4 H-Tree structure Canonical range query processing minimizes total error ) Granularity 2) Count/Median budget 3) Post-processing C3 C32 C33 C34 C3 C32 C33 C34 6

7 Granularity Compute H-tree s size mxm that minimizes query estimation error Perturbation error vs. Non-uniformity error C C2 C3 # leaf nodes m 2 w W 2 ε c Granularity trade-off + Laplace error Query size increase à Perturbation error increases à Non-uniformity error decreases m = Wε c / c 4 ww c m # data points in the border C C2 C3 C4 C2 C22 C23 C24 c is a small constant W is the domain size ε c is the count budget C3 C32 C33 C34 H-tree partition C4 C4 C42 C43 C44 7

8 Budget Allocation Strategy Two kinds of budgets. Median budget for 2 levels. Count budget for 2 levels Total budget ε m ε c = ε c +ε 2 c ε = ε m +ε c C C2 C3 C4 C C2 C3 C4 C3 C32 C33 C34 C2 C22 C23 C24 C3 C32 C33 C34 H-Tree structure 8

level-2 nodes The proof uses Cauchy Schwarz inequality! ε c c n ( +ε ) (ε c ) + n $!

9 Count Budget Allocation Split count budget across levels of the tree index 2 Minimize Err(q) = n (ε c ) + n (ε c 2 ), subject to ε c = ε c c 2 +ε 2 n: number of level- nodes n2: number of level-2 nodes The proof uses Cauchy Schwarz inequality! ε c c n ( +ε ) (ε c ) + n $! 2 # & " 2 (ε c 2 ) 2 % # " n ε + n 2 c c ε 2 Err(q) is minimized when ε c ε c =,ε c = ε c 3 m m + 3 m $ & % 2 n 2 m n 9

10 Median Budget Allocation Private H-Tree requires selecting private medians Splits apply to the same data à sequential composition Recursively splits each dimensional range à parallel composition Each split ε m 2 log 2 m Use exponential mechanism [McSherry SIGMOD 29] Proposed Slicing Algorithm recursively splits a range at points that are closest to the corresponding medians C C2 C3 C4

11 Input: h-tree of size mxm. Median budget 2. Count budget 3. For each level- node:. Median budget 2. Count budget DP H-tree Algorithm m ε c ε ε 2 m ε 2 c The entire H-tree satisfies ε = ε m +ε c ε -DP by composition property ε m = ε m +ε 2 m ε c = ε c +ε 2 c C C C2 C3 C4 C2 C22 C23 C24 C2 C3 C4 H-Tree structure C3 C32 C33 C34 C3 C32 C33 C34 [Cormode et.al ICDE 22] Trade-off between median budget and count budget ε m =.3ε

12 Datasets Experimental Setup Tiger_NMWA Brightkite Gowalla-Sparse Queries ε = {.,.4,.7,} Privacy budget Query size = {, 2, 4, 8} square km ε m =.4ε ; c = 3 Average relative error over random queries 2

13 Tiger dataset (similar result on Brightkite) Relative error quad-geo kd-hybrid grid-uniform grid-adaptive ht-standard Query size (square km) Relative error quad-geo.2 kd-hybrid grid-uniform. grid-adaptive ht-standard..4.7 Budget Grid-adaptive performs well and even better than datadependent methods 3

14 Gowalla-Sparse Tiger-Syn Relative error.8.6 quad-geo.4 kd-hybrid grid-uniform.2 grid-adaptive ht-standard..4.7 Budget Relative error quad-geo.4 kd-hybrid.3 grid-uniform.2 grid-adaptive. ht-standard..4.7 Budget Grid-adaptive performs arbitrarily worse in the presence of sparseness and outliers 4

15 Conclusion ü Observed drawbacks of high-level trees and grid-based structures ü Proposed several analysis on DP H-tree, i.e., budget allocation, median splitting, post-processing ü DP H-Tree consistently performs well on various datasets i.e., h-tree outperforms kd-tree and quadtree in all cases and adaptive grid for sparse datasets 5

16 Q/A Liyue Fan University of Southern California 6

17 Partitions Adaptive grid Quadtree H-tree Kd-tree 7

Differentially Private H-Tree

Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media Systems Center University of Southern California Los Angeles, CA, U.S.A {hto,liyuefan,shahabi}@usc.edu ABSTRACT In this