Differentially Private H-Tree

Size: px
Start display at page:

Download "Differentially Private H-Tree"

Transcription

1 GeoPrivacy: 2 nd Workshop on Privacy in Geographic Information Collection and Analysis Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media System Center University of Southern California November 3, 25

2 Motivation Mobile devices collect/share location data Enable applications, e.g., spatial crowdsourcing, traffic monitoring, location-aware recommendation Adversary can infer users sensitive details Many location-based apps require only spatial aggregation of users e.g., spatial crowdsourcing Differential privacy serves that purpose ~ ~3 ~3 ~2 ~4 ~7 ~6 ~3 ~4 ~5 ~2 ~3 ~ task ~ ~6 ~2 ~ ~ ~3 ~ ~ Noisy worker count per grid cell 2

3 Differential Privacy (DP) Ensures adversary do not know whether an individual is present or not in dataset, regardless of background knowledge Allows only aggregate queries, e.g., count, sum ε-indistinguishability L -sensitivity ε σ ( QS) D Pr[ QS = U] ln ε D2 Pr[ QS = U] ε : privacy budget = max D, D 2 q i= QS( D [Dwork 6] ) QS( D D and D2 are sibling datasets that differ in only one record Achieve -DP by adding random Laplace noise with mean and standard deviation λ = σ (QS) / ε [Dwork 6] 2 ) 3

4 Problem Definition Publish private spatial decomposition (PSD) of 2-d dataset Accurately answer count queries Range query fully covers 2 cells and partially covers 2 cells à Estimated result set size: 2*2/2+5*2=3 5 5 Relative error RE PSD (q) = Q PSD (q) A(q) A(q) actual count published noisy counts

5 Related Work ü Kd-tree on top of fixed equal-size grid ü Wavelet transformation [Xiao et al. 2] [Xiao et al. 2] ü Kd-tree, Quad-tree [Cormode et.al ICDE 22] Perturbation error is excessively high on hierarchical partitions and high dimensional data L ü Uniform grid, adaptive grid [Qardaji et al. ICDE 23] ü Extend to higher dimension [Qardaji et al. VLDB 23] Grid-based partitions are not ideal for skewed datasets L ü H-Tree: two-level data-dependent tree [This study] Kd-tree Adaptive grid 5

6 Differentially Private H-Tree Equi-depth multidimensional histograms C C2 C3 C C2 C3 C4 C2 C22 C23 C24 C3 C32 C33 C34 C4 C4 C42 C43 C44 H-Tree of size m=4 C C C2 C3 C4 C2 C22 C23 C24 [Muralikrishna et. al SIGMOD 988] C2 C3 C4 H-Tree structure Canonical range query processing minimizes total error ) Granularity 2) Count/Median budget 3) Post-processing C3 C32 C33 C34 C3 C32 C33 C34 6

7 Granularity Compute H-tree s size mxm that minimizes query estimation error Perturbation error vs. Non-uniformity error C C2 C3 # leaf nodes m 2 w W 2 ε c Granularity trade-off + Laplace error Query size increase à Perturbation error increases à Non-uniformity error decreases m = Wε c / c 4 ww c m # data points in the border C C2 C3 C4 C2 C22 C23 C24 c is a small constant W is the domain size ε c is the count budget C3 C32 C33 C34 H-tree partition C4 C4 C42 C43 C44 7

8 Budget Allocation Strategy Two kinds of budgets. Median budget for 2 levels. Count budget for 2 levels Total budget ε m ε c = ε c +ε 2 c ε = ε m +ε c C C2 C3 C4 C C2 C3 C4 C3 C32 C33 C34 C2 C22 C23 C24 C3 C32 C33 C34 H-Tree structure 8

9 Count Budget Allocation Split count budget across levels of the tree index 2 Minimize Err(q) = n (ε c ) + n (ε c 2 ), subject to ε c = ε c c 2 +ε 2 n: number of level- nodes n2: number of level-2 nodes The proof uses Cauchy Schwarz inequality! ε c c n ( +ε ) (ε c ) + n $! 2 # & " 2 (ε c 2 ) 2 % # " n ε + n 2 c c ε 2 Err(q) is minimized when ε c ε c =,ε c = ε c 3 m m + 3 m $ & % 2 n 2 m n 9

10 Median Budget Allocation Private H-Tree requires selecting private medians Splits apply to the same data à sequential composition Recursively splits each dimensional range à parallel composition Each split ε m 2 log 2 m Use exponential mechanism [McSherry SIGMOD 29] Proposed Slicing Algorithm recursively splits a range at points that are closest to the corresponding medians C C2 C3 C4

11 Input: h-tree of size mxm. Median budget 2. Count budget 3. For each level- node:. Median budget 2. Count budget DP H-tree Algorithm m ε c ε ε 2 m ε 2 c The entire H-tree satisfies ε = ε m +ε c ε -DP by composition property ε m = ε m +ε 2 m ε c = ε c +ε 2 c C C C2 C3 C4 C2 C22 C23 C24 C2 C3 C4 H-Tree structure C3 C32 C33 C34 C3 C32 C33 C34 [Cormode et.al ICDE 22] Trade-off between median budget and count budget ε m =.3ε

12 Datasets Experimental Setup Tiger_NMWA Brightkite Gowalla-Sparse Queries ε = {.,.4,.7,} Privacy budget Query size = {, 2, 4, 8} square km ε m =.4ε ; c = 3 Average relative error over random queries 2

13 Tiger dataset (similar result on Brightkite) Relative error quad-geo kd-hybrid grid-uniform grid-adaptive ht-standard Query size (square km) Relative error quad-geo.2 kd-hybrid grid-uniform. grid-adaptive ht-standard..4.7 Budget Grid-adaptive performs well and even better than datadependent methods 3

14 Gowalla-Sparse Tiger-Syn Relative error.8.6 quad-geo.4 kd-hybrid grid-uniform.2 grid-adaptive ht-standard..4.7 Budget Relative error quad-geo.4 kd-hybrid.3 grid-uniform.2 grid-adaptive. ht-standard..4.7 Budget Grid-adaptive performs arbitrarily worse in the presence of sparseness and outliers 4

15 Conclusion ü Observed drawbacks of high-level trees and grid-based structures ü Proposed several analysis on DP H-tree, i.e., budget allocation, median splitting, post-processing ü DP H-Tree consistently performs well on various datasets i.e., h-tree outperforms kd-tree and quadtree in all cases and adaptive grid for sparse datasets 5

16 Q/A Liyue Fan University of Southern California 6

17 Partitions Adaptive grid Quadtree H-tree Kd-tree 7

Differentially Private H-Tree

Differentially Private H-Tree Differentially Private H-Tree Hien To, Liyue Fan, Cyrus Shahabi Integrated Media Systems Center University of Southern California Los Angeles, CA, U.S.A {hto,liyuefan,shahabi}@usc.edu ABSTRACT In this

More information

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong

CS573 Data Privacy and Security. Differential Privacy tabular data and range queries. Li Xiong CS573 Data Privacy and Security Differential Privacy tabular data and range queries Li Xiong Outline Tabular data and histogram/range queries Algorithms for low dimensional data Algorithms for high dimensional

More information

Differentially Private Multi- Dimensional Time Series Release for Traffic Monitoring

Differentially Private Multi- Dimensional Time Series Release for Traffic Monitoring DBSec 13 Differentially Private Multi- Dimensional Time Series Release for Traffic Monitoring Liyue Fan, Li Xiong, Vaidy Sunderam Department of Math & Computer Science Emory University 9/4/2013 DBSec'13:

More information

Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring

Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring Liyue Fan, Li Xiong, and Vaidy Sunderam Emory University Atlanta GA 30322, USA {lfan3,lxiong,vss}@mathcs.emory.edu Abstract.

More information

Differentially Private Spatial Decompositions

Differentially Private Spatial Decompositions Differentially Private Spatial Decompositions Graham Cormode Cecilia Procopiuc Divesh Srivastava AT&T Labs Research {graham, magda, divesh}@research.att.com Entong Shen Ting Yu North Carolina State University

More information

Matrix Mechanism and Data Dependent algorithms

Matrix Mechanism and Data Dependent algorithms Matrix Mechanism and Data Dependent algorithms CompSci 590.03 Instructor: Ashwin Machanavajjhala Lecture 9 : 590.03 Fall 16 1 Recap: Constrained Inference Lecture 9 : 590.03 Fall 16 2 Constrained Inference

More information

with BLENDER: Enabling Local Search a Hybrid Differential Privacy Model

with BLENDER: Enabling Local Search a Hybrid Differential Privacy Model BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model Brendan Avent 1, Aleksandra Korolova 1, David Zeber 2, Torgeir Hovden 2, Benjamin Livshits 3 University of Southern California 1

More information

CS573 Data Privacy and Security. Differential Privacy. Li Xiong

CS573 Data Privacy and Security. Differential Privacy. Li Xiong CS573 Data Privacy and Security Differential Privacy Li Xiong Outline Differential Privacy Definition Basic techniques Composition theorems Statistical Data Privacy Non-interactive vs interactive Privacy

More information

Co-clustering for differentially private synthetic data generation

Co-clustering for differentially private synthetic data generation Co-clustering for differentially private synthetic data generation Tarek Benkhelif, Françoise Fessant, Fabrice Clérot and Guillaume Raschia January 23, 2018 Orange Labs & LS2N Journée thématique EGC &

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Parallel Composition Revisited

Parallel Composition Revisited Parallel Composition Revisited Chris Clifton 23 October 2017 This is joint work with Keith Merrill and Shawn Merrill This work supported by the U.S. Census Bureau under Cooperative Agreement CB16ADR0160002

More information

Algorithms for GIS:! Quadtrees

Algorithms for GIS:! Quadtrees Algorithms for GIS: Quadtrees Quadtree A data structure that corresponds to a hierarchical subdivision of the plane Start with a square (containing inside input data) Divide into 4 equal squares (quadrants)

More information

Data Anonymization. Graham Cormode.

Data Anonymization. Graham Cormode. Data Anonymization Graham Cormode graham@research.att.com 1 Why Anonymize? For Data Sharing Give real(istic) data to others to study without compromising privacy of individuals in the data Allows third-parties

More information

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique P.Nithya 1, V.Karpagam 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College,

More information

Crowdsourcing the Acquisi3on and Analysis of Mobile Videos for Disaster Response

Crowdsourcing the Acquisi3on and Analysis of Mobile Videos for Disaster Response IEEE Big Data 2015, October 31, 2015 Crowdsourcing the Acquisi3on and Analysis of Mobile Videos for Disaster Response Presented by Hien To Dr. Seon Ho Kim Integrated Media Systems Center University of

More information

Privacy Preserving Machine Learning: A Theoretically Sound App

Privacy Preserving Machine Learning: A Theoretically Sound App Privacy Preserving Machine Learning: A Theoretically Sound Approach Outline 1 2 3 4 5 6 Privacy Leakage Events AOL search data leak: New York Times journalist was able to identify users from the anonymous

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

NOWADAYS, mobile devices such as smartphones and tablets

NOWADAYS, mobile devices such as smartphones and tablets 1 Protecting Location Privacy for Task Allocation in Ad Hoc Mobile Cloud Computing Yanmin Gong, Student Member, IEEE, Chi Zhang, Member, IEEE, Yuguang Fang, Fellow, IEEE, and Jinyuan Sun, Member, IEEE

More information

arxiv: v1 [cs.ds] 12 Sep 2016

arxiv: v1 [cs.ds] 12 Sep 2016 Jaewoo Lee Penn State University, University Par, PA 16801 Daniel Kifer Penn State University, University Par, PA 16801 JLEE@CSE.PSU.EDU DKIFER@CSE.PSU.EDU arxiv:1609.03251v1 [cs.ds] 12 Sep 2016 Abstract

More information

Demonstration of Damson: Differential Privacy for Analysis of Large Data

Demonstration of Damson: Differential Privacy for Analysis of Large Data Demonstration of Damson: Differential Privacy for Analysis of Large Data Marianne Winslett 1,2, Yin Yang 1,2, Zhenjie Zhang 1 1 Advanced Digital Sciences Center, Singapore {yin.yang, zhenjie}@adsc.com.sg

More information

arxiv: v2 [cs.db] 5 Feb 2018

arxiv: v2 [cs.db] 5 Feb 2018 Knowledge and Information Systems (KAIS) manuscript No. (will be inserted by the editor) arxiv:169.7983v2 [cs.db] 5 Feb 218 Differentially-Private Counting of Users Spatial Regions Maryam Fanaeepour Benjamin

More information

Solutions. Location-Based Services (LBS) Problem Statement. PIR Overview. Spatial K-Anonymity

Solutions. Location-Based Services (LBS) Problem Statement. PIR Overview. Spatial K-Anonymity 2 Location-Based Services (LBS) Private Queries in Location-Based Services: Anonymizers are Not Necessary Gabriel Ghinita Panos Kalnis Ali Khoshgozaran 2 Cyrus Shahabi 2 Kian Lee Tan LBS users Mobile devices

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Multidimensional Indexes [14]

Multidimensional Indexes [14] CMSC 661, Principles of Database Systems Multidimensional Indexes [14] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Motivation Examined indexes when search keys are in 1-D space Many interesting

More information

Frequent grams based Embedding for Privacy Preserving Record Linkage

Frequent grams based Embedding for Privacy Preserving Record Linkage Frequent grams based Embedding for Privacy Preserving Record Linkage ABSTRACT Luca Bonomi Emory University Atlanta, USA lbonomi@mathcs.emory.edu Rui Chen Concordia University Montreal, Canada ru_che@encs.concordia.ca

More information

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University Outline Privacy preserving data publishing: What and Why Examples of privacy attacks

More information

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database Stephan Müller, Hasso Plattner Enterprise Platform and Integration Concepts Hasso Plattner Institute, Potsdam (Germany)

More information

Answering Approximate Range Aggregate Queries on OLAP Data Cubes with Probabilistic Guarantees

Answering Approximate Range Aggregate Queries on OLAP Data Cubes with Probabilistic Guarantees Answering Approximate Range Aggregate Queries on OLAP Data Cubes with Probabilistic Guarantees Alfredo Cuzzocrea 1, Wei Wang 2, Ugo Matrangolo 3 1 DEIS Dept. University of Calabria 87036 Rende, Cosenza,

More information

0x1A Great Papers in Computer Security

0x1A Great Papers in Computer Security CS 380S 0x1A Great Papers in Computer Security Vitaly Shmatikov http://www.cs.utexas.edu/~shmat/courses/cs380s/ C. Dwork Differential Privacy (ICALP 2006 and many other papers) Basic Setting DB= x 1 x

More information

Differentially Private Histogram Publication

Differentially Private Histogram Publication Noname manuscript No. will be inserted by the editor) Differentially Private Histogram Publication Jia Xu Zhenjie Zhang Xiaokui Xiao Yin Yang Ge Yu Marianne Winslett Received: date / Accepted: date Abstract

More information

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing Nesime Tatbul Uğur Çetintemel Stan Zdonik Talk Outline Problem Introduction Approach Overview Advance Planning with an

More information

Clustering in Ratemaking: Applications in Territories Clustering

Clustering in Ratemaking: Applications in Territories Clustering Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, PhD FIA ASTIN 13th-16th July 2008 INTRODUCTION Structure of talk Quickly introduce clustering and its application in insurance ratemaking

More information

Acceleration Data Structures

Acceleration Data Structures CT4510: Computer Graphics Acceleration Data Structures BOCHANG MOON Ray Tracing Procedure for Ray Tracing: For each pixel Generate a primary ray (with depth 0) While (depth < d) { Find the closest intersection

More information

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Clustering. K-means clustering

Clustering. K-means clustering Clustering K-means clustering Clustering Motivation: Identify clusters of data points in a multidimensional space, i.e. partition the data set {x 1,...,x N } into K clusters. Intuition: A cluster is a

More information

Privacy-Preserving Machine Learning

Privacy-Preserving Machine Learning Privacy-Preserving Machine Learning CS 760: Machine Learning Spring 2018 Mark Craven and David Page www.biostat.wisc.edu/~craven/cs760 1 Goals for the Lecture You should understand the following concepts:

More information

K-means Clustering of Proportional Data Using L1 Distance

K-means Clustering of Proportional Data Using L1 Distance K-means Clustering of Proportional Data Using L1 Distance Hisashi Kashima IBM Research, Tokyo Research Laboratory Jianying Hu Bonnie Ray Moninder Singh IBM Research, T.J. Watson Research Center Copyright

More information

Essentials for Modern Data Analysis Systems

Essentials for Modern Data Analysis Systems Essentials for Modern Data Analysis Systems Mehrdad Jahangiri, Cyrus Shahabi University of Southern California Los Angeles, CA 90089-0781 {jahangir, shahabi}@usc.edu Abstract Earth scientists need to perform

More information

pcube: Update-Efficient Online Aggregation with Progressive Feedback and Error Bounds

pcube: Update-Efficient Online Aggregation with Progressive Feedback and Error Bounds pcube: Update-Efficient Online Aggregation with Progressive Feedback and Error Bounds Mirek Riedewald, Divyakant Agrawal, and Amr El Abbadi Department of Computer Science University of California, Santa

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

PrivApprox. Privacy- Preserving Stream Analytics.

PrivApprox. Privacy- Preserving Stream Analytics. PrivApprox Privacy- Preserving Stream Analytics https://privapprox.github.io Do Le Quoc, Martin Beck, Pramod Bhatotia, Ruichuan Chen, Christof Fetzer, Thorsten Strufe July 2017 Motivation Clients Analysts

More information

Indexing Land Surface for Efficient knn Query

Indexing Land Surface for Efficient knn Query Indexing Land Surface for Efficient knn Query Cyrus Shahabi, Lu-An Tang and Songhua Xing InfoLab University of Southern California Los Angeles, CA 90089-0781 http://infolab.usc.edu Outline q Mo+va+on q

More information

An Adaptive Algorithm for Range Queries in Differential Privacy

An Adaptive Algorithm for Range Queries in Differential Privacy Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 6-2016 An Adaptive Algorithm for Range Queries in Differential Privacy Asma Alnemari Follow this and additional

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Ray Tracing Acceleration Data Structures

Ray Tracing Acceleration Data Structures Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has

More information

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser Datenbanksysteme II: Multidimensional Index Structures 2 Ulf Leser Content of this Lecture Introduction Partitioned Hashing Grid Files kdb Trees kd Tree kdb Tree R Trees Example: Nearest neighbor image

More information

Orthogonal range searching. Orthogonal range search

Orthogonal range searching. Orthogonal range search CG Lecture Orthogonal range searching Orthogonal range search. Problem definition and motiation. Space decomposition: techniques and trade-offs 3. Space decomposition schemes: Grids: uniform, non-hierarchical

More information

Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY

Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY Clustering Algorithm Clustering is an unsupervised machine learning algorithm that divides a data into meaningful sub-groups,

More information

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4 Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is

More information

1. Meshes. D7013E Lecture 14

1. Meshes. D7013E Lecture 14 D7013E Lecture 14 Quadtrees Mesh Generation 1. Meshes Input: Components in the form of disjoint polygonal objects Integer coordinates, 0, 45, 90, or 135 angles Output: A triangular mesh Conforming: A triangle

More information

Spatial Data Structures

Spatial Data Structures CSCI 420 Computer Graphics Lecture 17 Spatial Data Structures Jernej Barbic University of Southern California Hierarchical Bounding Volumes Regular Grids Octrees BSP Trees [Angel Ch. 8] 1 Ray Tracing Acceleration

More information

Spatial Data Structures

Spatial Data Structures CSCI 480 Computer Graphics Lecture 7 Spatial Data Structures Hierarchical Bounding Volumes Regular Grids BSP Trees [Ch. 0.] March 8, 0 Jernej Barbic University of Southern California http://www-bcf.usc.edu/~jbarbic/cs480-s/

More information

Pufferfish: A Semantic Approach to Customizable Privacy

Pufferfish: A Semantic Approach to Customizable Privacy Pufferfish: A Semantic Approach to Customizable Privacy Ashwin Machanavajjhala ashwin AT cs.duke.edu Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research) idash Privacy Workshop

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Similarity Searching:

Similarity Searching: Similarity Searching: Indexing, Nearest Neighbor Finding, Dimensionality Reduction, and Embedding Methods, for Applications in Multimedia Databases Hanan Samet hjs@cs.umd.edu Department of Computer Science

More information

Summarizing and mining inverse distributions on data streams via dynamic inverse sampling

Summarizing and mining inverse distributions on data streams via dynamic inverse sampling Summarizing and mining inverse distributions on data streams via dynamic inverse sampling Presented by Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Irina Rozenbaum rozenbau@paul.rutgers.edu

More information

Where Next? Data Mining Techniques and Challenges for Trajectory Prediction. Slides credit: Layla Pournajaf

Where Next? Data Mining Techniques and Challenges for Trajectory Prediction. Slides credit: Layla Pournajaf Where Next? Data Mining Techniques and Challenges for Trajectory Prediction Slides credit: Layla Pournajaf o Navigational services. o Traffic management. o Location-based advertising. Source: A. Monreale,

More information

Coding for the Network: Scalable and Multiple description coding Marco Cagnazzo

Coding for the Network: Scalable and Multiple description coding Marco Cagnazzo Coding for the Network: Scalable and Multiple description coding Marco Cagnazzo Overview Examples and motivations Scalable coding for network transmission Techniques for multiple description coding 2 27/05/2013

More information

Privacy-preserving machine learning. Bo Liu, the HKUST March, 1st, 2015.

Privacy-preserving machine learning. Bo Liu, the HKUST March, 1st, 2015. Privacy-preserving machine learning Bo Liu, the HKUST March, 1st, 2015. 1 Some slides extracted from Wang Yuxiang, Differential Privacy: a short tutorial. Cynthia Dwork, The Promise of Differential Privacy.

More information

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017

Nearest neighbors. Focus on tree-based methods. Clément Jamin, GUDHI project, Inria March 2017 Nearest neighbors Focus on tree-based methods Clément Jamin, GUDHI project, Inria March 2017 Introduction Exact and approximate nearest neighbor search Essential tool for many applications Huge bibliography

More information

Distributed Data Mining with Differential Privacy

Distributed Data Mining with Differential Privacy Distributed Data Mining with Differential Privacy Ning Zhang, Ming Li, Wenjing Lou Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, MA Email: {ning, mingli}@wpi.edu,

More information

Differential Privacy. Seminar: Robust Data Mining Techniques. Thomas Edlich. July 16, 2017

Differential Privacy. Seminar: Robust Data Mining Techniques. Thomas Edlich. July 16, 2017 Differential Privacy Seminar: Robust Techniques Thomas Edlich Technische Universität München Department of Informatics kdd.in.tum.de July 16, 2017 Outline 1. Introduction 2. Definition and Features of

More information

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets Philippe Cudré-Mauroux Eugene Wu Samuel Madden Computer Science and Artificial Intelligence Laboratory Massachusetts Institute

More information

Hybrid Indexes to Expedite Spatial-Visual Search

Hybrid Indexes to Expedite Spatial-Visual Search Hybrid Indexes to Expedite Spatial-Visual Search Abdullah Alfarrarjeh, Cyrus Shahabi Integrated edia Systems Center, University of Southern California, Los Angeles, CA 989 {alfarrar,shahabi}@usc.edu ASTRACT

More information

arxiv: v1 [cs.db] 1 Oct 2014

arxiv: v1 [cs.db] 1 Oct 2014 A Data- and Workload-Aware Algorithm for Range Queries Under Differential Privacy Chao Li, Michael Hay, Gerome Miklau, Yue Wang University of Massachusetts Amherst School of Computer Science {chaoli,miklau,yuewang}@cs.umass.edu

More information

Segmentation of Images

Segmentation of Images Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a

More information

UNIT 2 Data Preprocessing

UNIT 2 Data Preprocessing UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and

More information

Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases

Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases Mohammad Kolahdouzan and Cyrus Shahabi Department of Computer Science University of Southern California Los Angeles, CA, 90089, USA

More information

Approximation Algorithms for Clustering Uncertain Data

Approximation Algorithms for Clustering Uncertain Data Approximation Algorithms for Clustering Uncertain Data Graham Cormode AT&T Labs - Research graham@research.att.com Andrew McGregor UCSD / MSR / UMass Amherst andrewm@ucsd.edu Introduction Many applications

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Mining Frequent Patterns with Differential Privacy

Mining Frequent Patterns with Differential Privacy Mining Frequent Patterns with Differential Privacy Luca Bonomi (Supervised by Prof. Li Xiong) Department of Mathematics & Computer Science Emory University Atlanta, USA lbonomi@mathcs.emory.edu ABSTRACT

More information

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b simas@cs.aau.dk Range Searching in 2D Main goals of the lecture: to understand and to be able to analyze the kd-trees

More information

CS535 Fall Department of Computer Science Purdue University

CS535 Fall Department of Computer Science Purdue University Spatial Data Structures and Hierarchies CS535 Fall 2010 Daniel G Aliaga Daniel G. Aliaga Department of Computer Science Purdue University Spatial Data Structures Store geometric information Organize geometric

More information

DW Performance Optimization (II)

DW Performance Optimization (II) DW Performance Optimization (II) Overview Data Cube in ROLAP and MOLAP ROLAP Technique(s) Efficient Data Cube Computation MOLAP Technique(s) Prefix Sum Array Multiway Augmented Tree Aalborg University

More information

Essentials for Modern Data Analysis Systems Second NASA Data Mining Workshop

Essentials for Modern Data Analysis Systems Second NASA Data Mining Workshop Essentials for Modern Data Analysis Systems Second NASA Data Mining Workshop Mehrdad Jahangiri, Cyrus shahabi University of Southern California Los Angeles, CA 90089 {jahangir,shahabi}@usc.edu 1 Motivation

More information

m-tsne: A Framework for Visualizing High-Dimensional Multivariate Time Series

m-tsne: A Framework for Visualizing High-Dimensional Multivariate Time Series VAHC - Workshop on Visual Analytics in Healthcare AMIA 2016 Annual Symposium m-tsne: A Framework for Visualizing High-Dimensional Multivariate Time Series Minh Nguyen, Sanjay Purushotham, Hien To, and

More information

11/30/2010 IEEE CloudCom 2010

11/30/2010 IEEE CloudCom 2010 Afsin Akdogan, Ugur Demiryurek, Farnoush Banaei-Kashani and Cyrus Shahabi University of Southern California 11/30/2010 IEEE CloudCom 2010 Outline Motivation Related Work Preliminaries Voronoi Diagram (Index)

More information

USC Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams

USC Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams Cyrus Shahabi and Donghui Yan Integrated Media Systems Center and Computer Science Department, University of Southern California

More information

MAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation. Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda

MAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation. Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda MAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda 1 Background Traffic monitoring is important to detect

More information

Spatial Data Management

Spatial Data Management Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value

More information

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Implementation: Real machine learning schemes Decision trees Classification

More information

Organizing Spatial Data

Organizing Spatial Data Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the

More information

Spatial Data Management

Spatial Data Management Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite

More information

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Partition Based Perturbation for Privacy Preserving Distributed Data Mining BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation

More information

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li Parallel Physically Based Path-tracing and Shading Part 3 of 2 CIS565 Fall 202 University of Pennsylvania by Yining Karl Li Jim Scott 2009 Spatial cceleration Structures: KD-Trees *Some portions of these

More information

One-Pass Streaming Algorithms

One-Pass Streaming Algorithms One-Pass Streaming Algorithms Theory and Practice Complaints and Grievances about theory in practice Disclaimer Experiences with Gigascope. A practitioner s perspective. Will be using my own implementations,

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Introduction to Mobile Robotics Iterative Closest Point Algorithm. Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras

Introduction to Mobile Robotics Iterative Closest Point Algorithm. Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras Introduction to Mobile Robotics Iterative Closest Point Algorithm Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras 1 Motivation 2 The Problem Given: two corresponding point sets: Wanted: translation

More information

Rate-Distortion Optimized Tree Structured Compression Algorithms for Piecewise Polynomial Images

Rate-Distortion Optimized Tree Structured Compression Algorithms for Piecewise Polynomial Images 1 Rate-Distortion Optimized Tree Structured Compression Algorithms for Piecewise Polynomial Images Rahul Shukla, Pier Luigi Dragotti 1, Minh N. Do and Martin Vetterli Audio-Visual Communications Laboratory,

More information

Outline. The History of Histograms. Yannis Ioannidis University of Athens, Hellas

Outline. The History of Histograms. Yannis Ioannidis University of Athens, Hellas The History of Histograms Yannis Ioannidis University of Athens, Hellas Outline Prehistory Definitions and Framework The Early Past 10 Years Ago The Recent Past Industry Competitors The Future Prehistory

More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING

FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING FREQUENT ITEMSET MINING USING PFP-GROWTH VIA SMART SPLITTING Neha V. Sonparote, Professor Vijay B. More. Neha V. Sonparote, Dept. of computer Engineering, MET s Institute of Engineering Nashik, Maharashtra,

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

Data Mining and Machine Learning: Techniques and Algorithms

Data Mining and Machine Learning: Techniques and Algorithms Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,

More information

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 14 B + trees, multi-key indices, partitioned hashing and grid files B and B + -trees are used one implementation

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information