Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme

Similar documents
Querying Shortest Distance on Large Graphs

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling

SILC: Efficient Query Processing on Spatial Networks

A Novel Method to Estimate the Route and Travel Time with the Help of Location Based Services

An Efficient Technique for Distance Computation in Road Networks

Three-Hop Distance Estimation in Social Graphs

Memory-Efficient Fast Shortest Path Estimation in Large Social Networks

Adaptive Landmark Selection Strategies for Fast Shortest Path Computation in Large Real-World Graphs

Fast Fully Dynamic Landmark-based Estimation of Shortest Path Distances in Very Large Graphs

[Kleinberg04] J. Kleinberg, A. Slivkins, T. Wexler. Triangulation and Embedding using Small Sets of Beacons. Proc. 45th IEEE Symposium on Foundations

Best Keyword Cover Search

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

Efficient NKS Queries Search in Multidimensional Dataset through Projection and Multi-Scale Hashing Scheme

Invited Talk: GraphSM/DBKDA About Reachability in Graphs

Distance Join Queries on Spatial Networks

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Shortest paths on large graphs: Systems, Algorithms, Applications

9/23/2009 CONFERENCES CONTINUOUS NEAREST NEIGHBOR SEARCH INTRODUCTION OVERVIEW PRELIMINARY -- POINT NN QUERIES

Reach for A : an Efficient Point-to-Point Shortest Path Algorithm

ISSN: (Online) Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies

Finding All Nearest Neighbors with a Single Graph Traversal

Exploiting a Page-Level Upper Bound for Multi-Type Nearest Neighbor Queries

The exact distance to destination in undirected world

Applications of Geometric Spanner

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction

Dynamic Skyline Queries in Large Graphs

Goal Directed Relative Skyline Queries in Time Dependent Road Networks

Chapter 1, Introduction

Searching of Nearest Neighbor Based on Keywords using Spatial Inverted Index

NOVEL CACHE SEARCH TO SEARCH THE KEYWORD COVERS FROM SPATIAL DATABASE

ANNATTO: Adaptive Nearest Neighbor Queries in Travel Time Networks

Probabilistic Graph Summarization

ProUD: Probabilistic Ranking in Uncertain Databases

MobiPLACE*: A Distributed Framework for Spatio-Temporal Data Streams Processing Utilizing Mobile Clients Processing Power.

Constrained Shortest Path Computation

Landmark-based routing

A NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY

Finding Shortest Path on Land Surface

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Inverted Index for Fast Nearest Neighbour

Fractional Cascading in Wireless. Jie Gao Computer Science Department Stony Brook University

Two Ellipse-based Pruning Methods for Group Nearest Neighbor Queries

Inferring the Underlying Structure of Information Cascades

Towards a Taxonomy of Location Based Services

Closest Keywords Search on Spatial Databases

Diversity in Skylines

Using Parallel Spatial Mashup Model to Process k-nn Queries

Foundations of Nearest Neighbor Queries in Euclidean Space

Exploring Graph Partitioning for Shortest Path Queries on Road Networks

Generalizing the Optimality of Multi-Step k-nearest Neighbor Query Processing

Probabilistic Spatial Queries on Existentially Uncertain Data

Charm: A charming network coordinate system

A Study to Handle Dynamic Graph Partitioning

Processing In-Route Nearest Neighbor Queries: A Comparison of Alternative Approaches

Ranking Web Pages by Associating Keywords with Locations

Uncertain Data Management Non-relational models: Graphs

Approximate Continuous K Nearest Neighbor Queries for Continuous Moving Objects with Pre-Defined Paths

Clustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V.

Indexing Land Surface for Efficient knn Query

Given a graph, find an embedding s.t. greedy routing works

Distributed Data Anonymization with Hiding Sensitive Node Labels

Kyriakos Mouratidis. Curriculum Vitae September 2017

A Hierarchical Approach to Internet Distance Prediction

Incremental calculation of isochrones regarding duration

Aggregate-Max Nearest Neighbor Searching in the Plane

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

Efficient Subgraph Matching by Postponing Cartesian Products

Fast Similarity Search for High-Dimensional Dataset

Solutions. Location-Based Services (LBS) Problem Statement. PIR Overview. Spatial K-Anonymity

Dynamic Nearest Neighbor Queries in Euclidean Space

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

A Scalable Content- Addressable Network

AN IMPROVED DENSITY BASED k-means ALGORITHM

SeqIndex: Indexing Sequences by Sequential Pattern Analysis

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Fast Nearest Neighbor Search on Road Networks

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

Top-k Keyword Search Over Graphs Based On Backward Search

ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET

A Survey on Efficient Location Tracker Using Keyword Search

Label Constrained Shortest Path Estimation on Large Graphs

Trajectory Compression under Network constraints

TELCOM2125: Network Science and Analysis

Computing A Near-Maximum Independent Set in Linear Time by Reducing-Peeling

Computing Continuous Skyline Queries without Discriminating between Static and Dynamic Attributes

Where Next? Data Mining Techniques and Challenges for Trajectory Prediction. Slides credit: Layla Pournajaf

In-Route Nearest Neighbor Queries

Scalable Routing for Networks of Dynamic Substrates. Boris Drazic

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets

The Multi-Rule Partial Sequenced Route Query

Scalable Shortest Paths Browsing on Land Surface

Clustering Moving Objects in Spatial Networks

Voronoi-based Trajectory Search Algorithm for Multi-locations in Road Networks

An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS

Secure and Advanced Best Keyword Cover Search over Spatial Database

Distance-based Outlier Detection: Consolidation and Renewed Bearing

Semantics Representation of Probabilistic Data by Using Topk-Queries for Uncertain Data

On the Utility of Inference Mechanisms

Transcription:

Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme Miao Qiao, Hong Cheng, Lijun Chang and Jeffrey Xu Yu The Chinese University of Hong Kong {mqiao, hcheng, ljchang, yu}@secuhkeduhk March 29, 2012

Landmark Embedding Scheme for Shortest Distance Query Landmark Embedding Graph: G(V, E, W ) Landmarks: S = {l 1,, l k } V Embedding: l S, v V, compute δ(l, v) Querying: q = (a, b), estimate Widely used in Road networks [11, 4]; δ(a, b) = min l i S {δ(l i, a) + δ(l i, b)} Social networks and web graphs [8, 7, 10, 2]; Internet graphs[1, 5]; Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 2 / 23

Landmark Embedding Scheme for Shortest Distance Query Landmark Embedding Graph: G(V, E, W ) Landmarks: S = {l 1,, l k } V Embedding: l S, v V, compute δ(l, v) Querying: q = (a, b), estimate Widely used in Road networks [11, 4]; δ(a, b) = min l i S {δ(l i, a) + δ(l i, b)} Social networks and web graphs [8, 7, 10, 2]; Internet graphs[1, 5]; Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 2 / 23

Major Differences among Existing Landmark Embedding Approaches Landmark selection strategy Optimal landmark set selection: NP-Hard! Betweenness centrality based criterion(potamias et al [7]) Minimum K-center formulation(francis et al [1]) Random selection([3, 12, 8, 10, 2]) Graph measure based heuristics Approximate centrality based selection (Potamias et al [7]) Drawback: the performance largely depends on graph properties Landmark organization Hierarchical(Kriegel et al [4]) Sketch-based Landmark(Sarma et al [10], Gubichev et al [2]) Error bound or not (2k 1)-approximation with O(kn 1+1/k ) memory(thorup et al [12]) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 3 / 23

Major Differences among Existing Landmark Embedding Approaches Landmark selection strategy Optimal landmark set selection: NP-Hard! Betweenness centrality based criterion(potamias et al [7]) Minimum K-center formulation(francis et al [1]) Random selection([3, 12, 8, 10, 2]) Graph measure based heuristics Approximate centrality based selection (Potamias et al [7]) Drawback: the performance largely depends on graph properties Landmark organization Hierarchical(Kriegel et al [4]) Sketch-based Landmark(Sarma et al [10], Gubichev et al [2]) Error bound or not (2k 1)-approximation with O(kn 1+1/k ) memory(thorup et al [12]) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 3 / 23

Major Differences among Existing Landmark Embedding Approaches Landmark selection strategy Optimal landmark set selection: NP-Hard! Betweenness centrality based criterion(potamias et al [7]) Minimum K-center formulation(francis et al [1]) Random selection([3, 12, 8, 10, 2]) Graph measure based heuristics Approximate centrality based selection (Potamias et al [7]) Drawback: the performance largely depends on graph properties Landmark organization Hierarchical(Kriegel et al [4]) Sketch-based Landmark(Sarma et al [10], Gubichev et al [2]) Error bound or not (2k 1)-approximation with O(kn 1+1/k ) memory(thorup et al [12]) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 3 / 23

Major Differences among Existing Landmark Embedding Approaches Landmark selection strategy Optimal landmark set selection: NP-Hard! Betweenness centrality based criterion(potamias et al [7]) Minimum K-center formulation(francis et al [1]) Random selection([3, 12, 8, 10, 2]) Graph measure based heuristics Approximate centrality based selection (Potamias et al [7]) Drawback: the performance largely depends on graph properties Landmark organization Hierarchical(Kriegel et al [4]) Sketch-based Landmark(Sarma et al [10], Gubichev et al [2]) Error bound or not (2k 1)-approximation with O(kn 1+1/k ) memory(thorup et al [12]) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 3 / 23

Other Shortest Distance Computing Approaches Spatial Networks Euclidean distance as a lower bound to prune the search space (Papadias et al [6]) Path-distance oracle of size O(n max(s d, 1 ϵ d )), and answer a query with ϵ-approximation in O(log n) time (Sankaranarayanan et al [9]) Drawback: coordinate dependent Exact distance indexing on non-spatial Networks Tree-width decomposition based [13] Drawback: not scalable Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 4 / 23

Global Landmark Embedding Drawbacks Large error for nearby query nodes δ ( l, a) δ ( l, b) δ ( a, b) ~ δ ( a, b) = δ ( l, a) + δ ( l, b) >> δ ( a, b) Improving accuracy by increasing S draws large overhead Global Landmark Embedding Complexity: Query time: O( S ) Index time: O( S V log( V )) Index size: O( S V ) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 5 / 23

Global Landmark Embedding Drawbacks Large error for nearby query nodes δ ( l, a) δ ( l, b) δ ( a, b) ~ δ ( a, b) = δ ( l, a) + δ ( l, b) >> δ ( a, b) Improving accuracy by increasing S draws large overhead Global Landmark Embedding Complexity: Query time: O( S ) Index time: O( S V log( V )) Index size: O( S V ) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 5 / 23

Roadmap Introduction Query-Dependent Local Landmark Scheme Optimization Techniques Experiments Conclusion Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 6 / 23

Query-Dependent Local Landmarks Example Landmark set S = {l 1, l 2, l 3 } For a query (a, b), P(a, b) = (a, e, f, g, b) l 1 l 2 P(l 1, a) = (l 1, c, e, a), P(l 1, b) = (l 1, c, d, g, b) Based on landmark l 1, δ(a, b) = δ(l 1, a) + δ(l 1, b) a c d e f g h l 3 b But based on node c, δ L (a, b) = δ(c, a) + δ(c, b) is more accurate because δ(a, b) = δ L (a, b) + 2δ(l 1, c) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 7 / 23

Query-Dependent Local Landmarks Example Landmark set S = {l 1, l 2, l 3 } For a query (a, b), P(a, b) = (a, e, f, g, b) l 1 l 2 P(l 1, a) = (l 1, c, e, a), P(l 1, b) = (l 1, c, d, g, b) Based on landmark l 1, δ(a, b) = δ(l 1, a) + δ(l 1, b) a c d e f g h l 3 b But based on node c, δ L (a, b) = δ(c, a) + δ(c, b) is more accurate because δ(a, b) = δ L (a, b) + 2δ(l 1, c) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 7 / 23

Query-Dependent Local Landmarks Example Landmark set S = {l 1, l 2, l 3 } For a query (a, b), P(a, b) = (a, e, f, g, b) l 1 l 2 P(l 1, a) = (l 1, c, e, a), P(l 1, b) = (l 1, c, d, g, b) Based on landmark l 1, δ(a, b) = δ(l 1, a) + δ(l 1, b) a c d e f g h l 3 b But based on node c, δ L (a, b) = δ(c, a) + δ(c, b) is more accurate because δ(a, b) = δ L (a, b) + 2δ(l 1, c) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 7 / 23

Query-Dependent Local Landmarks Definition (Query-Dependent Local Landmark Function) Given a global landmark set S a query (a, b), L ab (S) : V k V A more accurate distance estimation, r a,b L ab (S) δ L (a, b) = δ(r a,b, a) + δ(r a,b, b) Requirements for a local landmark function: Efficiently retrieve L ab (S); Efficiently compute δ(l ab (S), a), δ(l ab (S), b); Compact index Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 8 / 23

SPT Based Local Landmark Function Definition (SPT Based Local Landmark Function) L ab (S) = arg min {δ(r, a) + δ(r, b)} r {LCA Tl (a,b) l S} T l : The shortest path tree rooted at l S LCA Tl (a, b) : The Least common ancestor of a and b in T l l 1 l 2 l 1 c d e c d a e f g b a f g h h l 2 b l 3 l 3 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 9 / 23

SPT Based Local Landmark Function Definition (SPT Based Local Landmark Function) L ab (S) = arg min {δ(r, a) + δ(r, b)} r {LCA Tl (a,b) l S} T l : The shortest path tree rooted at l S LCA Tl (a, b) : The Least common ancestor of a and b in T l l 1 c=lca T (a, b) e d δ L (a, b) = δ(c, a) + δ(c, b) a f g = δ(l 1, a) + δ(l 1, b) 2δ(l 1, c) h l 2 b l 3 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 9 / 23

SPT Based Local Landmark Function Accuracy a, b V, Complexity LCA Query Time: O(1) Online Query Time: O( S ) δ(a, b) δ L (a, b) δ(a, b) Offline Embedding Space: O( S V ) Offline Embedding TimeO( S V log( V )) SAME complexities as the global landmark embedding Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 10 / 23

SPT Based Local Landmark Function Accuracy a, b V, Complexity LCA Query Time: O(1) Online Query Time: O( S ) δ(a, b) δ L (a, b) δ(a, b) Offline Embedding Space: O( S V ) Offline Embedding TimeO( S V log( V )) SAME complexities as the global landmark embedding Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 10 / 23

Roadmap Introduction Query-Dependent Local Landmark Scheme Optimization Techniques Index Reduction with Graph Compression Improving the Accuracy by Local Search Experiments Conclusion Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 11 / 23

Index Reduction with Graph Compression Lossless graph compression by removing two special types of nodes: Tree node Map to the entry node Remove the tree node Chain node Map to two end nodes Remove the chain node c b a e g d h f i j k n o l m Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 12 / 23

Index Reduction with Graph Compression Lossless graph compression by removing two special types of nodes: Tree node Map to the entry node Remove the tree node Chain node Map to two end nodes Remove the chain node c b a e g d h f i j k n o l m Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 12 / 23

Index Reduction with Graph Compression For a V, case a: Tree node: map(a) contains a entry node Chain node: map(a) contains two end nodes Otherwise: map(a) contains only itself Query time: O( S ) δ L (a, b) = min {δ(a, r a) + δ L (r a, r b ) + δ(b, r b )} r a map(a),r b map(b) Index size Reduce size from O( S V ) to O( V + ( S 1) V ) V : the number of nodes in the compressed graph Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 13 / 23

Index Reduction with Graph Compression For a V, case a: Tree node: map(a) contains a entry node Chain node: map(a) contains two end nodes Otherwise: map(a) contains only itself Query time: O( S ) δ L (a, b) = min {δ(a, r a) + δ L (r a, r b ) + δ(b, r b )} r a map(a),r b map(b) Index size Reduce size from O( S V ) to O( V + ( S 1) V ) V : the number of nodes in the compressed graph Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 13 / 23

Index Reduction with Graph Compression For a V, case a: Tree node: map(a) contains a entry node Chain node: map(a) contains two end nodes Otherwise: map(a) contains only itself Query time: O( S ) δ L (a, b) = min {δ(a, r a) + δ L (r a, r b ) + δ(b, r b )} r a map(a),r b map(b) Index size Reduce size from O( S V ) to O( V + ( S 1) V ) V : the number of nodes in the compressed graph Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 13 / 23

Improving the Accuracy by Local Search Cost-effective Local Search Procedure Connect two query nodes to local landmarks through the shortest paths Expand each node to include its c-hop neighbors a lca 1 lca 3 lca 2 The expanded nodes may form shortcuts which provide tighter distance estimation lca 4 b Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 14 / 23

Improving the Accuracy by Local Search Cost-effective Local Search Procedure Connect two query nodes to local landmarks through the shortest paths a lca 1 Expand each node to include its c-hop neighbors lca 3 lca 2 The expanded nodes may form shortcuts which provide tighter distance estimation lca 4 b Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 14 / 23

Improving the Accuracy by Local Search Cost-effective Local Search Procedure Connect two query nodes to local landmarks through the shortest paths a lca 1 Expand each node to include its c-hop neighbors lca 3 lca 2 The expanded nodes may form shortcuts which provide tighter distance estimation lca 4 b Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 14 / 23

Improving the Accuracy by Local Search Cost-effective Local Search Procedure Connect two query nodes to local landmarks through the shortest paths a lca 1 Expand each node to include its c-hop neighbors lca 3 lca 2 The expanded nodes may form shortcuts which provide tighter distance estimation lca 4 b Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 14 / 23

Roadmap Introduction Query-Dependent Local Landmark Scheme Optimization Techniques Experiments Conclusion Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 15 / 23

Comparison Methods and Metrics The embedding methods for comparison: Global Landmark Scheme (GLS) Local Landmark Scheme (LLS) Local Search (LS) 2RNE (Kriegel et al [4]) TreeSketch (Gubichev et al [2]) Evaluation Metrics: relative error err = δ(s, t) δ(s, t) δ(s, t) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 16 / 23

Comparison Methods and Metrics The embedding methods for comparison: Global Landmark Scheme (GLS) Local Landmark Scheme (LLS) Local Search (LS) 2RNE (Kriegel et al [4]) TreeSketch (Gubichev et al [2]) Evaluation Metrics: relative error err = δ(s, t) δ(s, t) δ(s, t) Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 16 / 23

Dataset Description Table: Network Statistics Dataset V E V E Slashdot 77,360 905,468 36,012 752,478 Google 875,713 5,105,039 449,341 4,621,002 Youtube 1,157,827 4,945,382 313,479 4,082,046 Flickr 1,846,198 22,613,981 493,525 18,470,294 NYRN 264,346 733,846 164,843 532,264 USARN 23,947,347 58,333,344 7,911,536 24,882,476 V and E denote the number of nodes and edges in the compressed graph Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 17 / 23

Dataset Description Table: Network Statistics Dataset V E V E Slashdot 77,360 905,468 36,012 752,478 Google 875,713 5,105,039 449,341 4,621,002 Youtube 1,157,827 4,945,382 313,479 4,082,046 Flickr 1,846,198 22,613,981 493,525 18,470,294 NYRN 264,346 733,846 164,843 532,264 USARN 23,947,347 58,333,344 7,911,536 24,882,476 V and E denote the number of nodes and edges in the compressed graph Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 17 / 23

Average Relative Error SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 06309 05072 06346 05131 01825 01121 LLS 01423 00321 00637 00814 00246 00786 LS 00000 00046 00009 00001 00071 00090 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 18 / 23

Average Relative Error SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 06309 05072 06346 05131 01825 01121 LLS 01423 00321 00637 00814 00246 00786 LS 00000 00046 00009 00001 00071 00090 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 18 / 23

Average Relative Error SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 06309 05072 06346 05131 01825 01121 LLS 01423 00321 00637 00814 00246 00786 LS 00000 00046 00009 00001 00071 00090 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 18 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Query Time and Index Size Online Query Time in Milliseconds SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 0002 0005 0008 0009 0006 0020 LLS 0006 0021 0015 0014 0036 0067 LS 0158 2729 2818 4735 0681 58289 Index Size in MB SlashD Google Youtube Flickr NYRN USARN k = 20 GLS 62 579 907 1247 212 19158 LLS 104 1227 1032 1561 853 44246 LS 164 1597 1359 3039 896 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 19 / 23

Comparison with Other Methods Dataset Algorithm AvgErr Query Time(ms) Index Size(MB) 2RNE 08345 0001 62 SlashD TreeSketch 00011 0176 374 LS 00000 0158 164 2RNE 05750 0001 579 Google TreeSketch 00048 3549 3837 LS 00046 2729 1597 2RNE 07138 0001 907 Youtube TreeSketch 00005 5317 5877 LS 00009 2818 1359 2RNE 06233 0001 1247 Flickr TreeSketch 00001 7333 9596 LS 00001 4735 3039 2RNE 04748 0001 212 NYRN TreeSketch 00156 1074 1205 LS 00071 0681 896 2RNE 04240 0002 19158 USARN TreeSketch 00379 104769 145555 LS 00090 58289 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 20 / 23

Comparison with Other Methods Dataset Algorithm AvgErr Query Time(ms) Index Size(MB) 2RNE 08345 0001 62 SlashD TreeSketch 00011 0176 374 LS 00000 0158 164 2RNE 05750 0001 579 Google TreeSketch 00048 3549 3837 LS 00046 2729 1597 2RNE 07138 0001 907 Youtube TreeSketch 00005 5317 5877 LS 00009 2818 1359 2RNE 06233 0001 1247 Flickr TreeSketch 00001 7333 9596 LS 00001 4735 3039 2RNE 04748 0001 212 NYRN TreeSketch 00156 1074 1205 LS 00071 0681 896 2RNE 04240 0002 19158 USARN TreeSketch 00379 104769 145555 LS 00090 58289 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 20 / 23

Comparison with Other Methods Dataset Algorithm AvgErr Query Time(ms) Index Size(MB) 2RNE 08345 0001 62 SlashD TreeSketch 00011 0176 374 LS 00000 0158 164 2RNE 05750 0001 579 Google TreeSketch 00048 3549 3837 LS 00046 2729 1597 2RNE 07138 0001 907 Youtube TreeSketch 00005 5317 5877 LS 00009 2818 1359 2RNE 06233 0001 1247 Flickr TreeSketch 00001 7333 9596 LS 00001 4735 3039 2RNE 04748 0001 212 NYRN TreeSketch 00156 1074 1205 LS 00071 0681 896 2RNE 04240 0002 19158 USARN TreeSketch 00379 104769 145555 LS 00090 58289 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 20 / 23

Comparison with Other Methods Dataset Algorithm AvgErr Query Time(ms) Index Size(MB) 2RNE 08345 0001 62 SlashD TreeSketch 00011 0176 374 LS 00000 0158 164 2RNE 05750 0001 579 Google TreeSketch 00048 3549 3837 LS 00046 2729 1597 2RNE 07138 0001 907 Youtube TreeSketch 00005 5317 5877 LS 00009 2818 1359 2RNE 06233 0001 1247 Flickr TreeSketch 00001 7333 9596 LS 00001 4735 3039 2RNE 04748 0001 212 NYRN TreeSketch 00156 1074 1205 LS 00071 0681 896 2RNE 04240 0002 19158 USARN TreeSketch 00379 104769 145555 LS 00090 58289 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 20 / 23

Comparison with Other Methods Dataset Algorithm AvgErr Query Time(ms) Index Size(MB) 2RNE 08345 0001 62 SlashD TreeSketch 00011 0176 374 LS 00000 0158 164 2RNE 05750 0001 579 Google TreeSketch 00048 3549 3837 LS 00046 2729 1597 2RNE 07138 0001 907 Youtube TreeSketch 00005 5317 5877 LS 00009 2818 1359 2RNE 06233 0001 1247 Flickr TreeSketch 00001 7333 9596 LS 00001 4735 3039 2RNE 04748 0001 212 NYRN TreeSketch 00156 1074 1205 LS 00071 0681 896 2RNE 04240 0002 19158 USARN TreeSketch 00379 104769 145555 LS 00090 58289 46236 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 20 / 23

Roadmap Introduction Problem Statement Query-Dependent Local Landmark Scheme Optimization Techniques Experiments Conclusion Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 21 / 23

Conclusion Query-Dependent Local Landmark Scheme Cost-effective in improving accuracy Linear index size, and thus scalable Allows optimization techniques for further reducing index size and relative error Outperforms state-of-art landmark embedding approaches in experiments Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 22 / 23

Q&A Thanks! Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 23 / 23

P Francis, S Jamin, C Jin, Y Jin, D Raz, Y Shavitt, and L Zhang IDMaps: A global internet host distance estimation service IEEE/ACM Trans Networking, 9(5):525 540, 2001 A Gubichev, S Bedathur, S Seufert, and G Weikum Fast and accurate estimation of shortest paths in large graphs In Proc Int Conf Information and Knowledge Management (CIKM 10), 2010 J Kleinberg, A Slivkins, and T Wexler Triangulation and embedding using small sets of beacons In Proc Symp Foundations of Computer Science (FOCS 04), pages 444 453, 2004 H-P Kriegel, P Kröger, M Renz, and T Schmidt Hierarchical graph embedding for efficient query processing in very large traffic networks In Proc Int Conf Scientific and Statistical Database Management (SSDBM 08), pages 150 167, 2008 T S E Ng and H Zhang Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 23 / 23

Predicting internet network distance with coordinates-based approaches In Int Conf on Computer Communications (INFOCOM 01), pages 170 179, 2001 D Papadias, J Zhang, N Mamoulis, and Y Tao Query processing in spatial network database In Proc Int Conf Very Large Data Bases (VLDB 03), pages 802 813, 2003 M Potamias, F Bonchi, C Castillo, and A Gionis Fast shortest path distance estimation in large networks In Proc Int Conf Information and Knowledge Management (CIKM 09), pages 867 876, 2009 M J Rattigan, M Maier, and D Jensen Using structure indices for efficient approximation of network properties In Proc ACM SIGKDD Int Conf Knowledge Discovery and Data Mining (KDD 06), pages 357 366, 2006 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 23 / 23

J Sankaranarayanan, H Samet, and H Alborzi Path oracles for spatial networks PVLDB, pages 1210 1221, 2009 A D Sarma, S Gollapudi, M Najork, and R Panigrahy A sketch-based distance oracle for web-scale graphs In Proc Int Conf Web Search and Data Mining (WSDM 10), pages 401 410, 2010 C Shahabi, M Kolahdouzan, and M Sharifzadeh A road network embedding technique for k-nearest neighbor search in moving object databases In Proc 10th ACM Int Symp Advances in Geographic Information Systems (GIS 02), pages 94 100, 2002 M Thorup and U Zwick Approximate distance oracles Journal of the ACM, 52(1):1 24, 2005 F Wei TEDI: Efficient shortest path query answering on graphs Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 23 / 23

In Proc ACM-SIGMOD Int Conf Management of Data (SIGMOD 10), pages 99 110, 2010 Miao Qiao, et al (CUHK) Approximate Shortest Distance Computing March 29, 2012 23 / 23