SCALING A DISTRIBUTED SPATIAL CACHE OVERLAY. Alexander Gessler Simon Hanna Ashley Marie Smith

Similar documents
On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

CHAPTER 7 CONCLUSION AND FUTURE SCOPE

Mobility Models. Larissa Marinho Eglem de Oliveira. May 26th CMPE 257 Wireless Networks. (UCSC) May / 50

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

Systematic Cooperation in P2P Grids

Aerospike Scales with Google Cloud Platform

Oracle NoSQL Database

Efficient Orienteering-Route Search over Uncertain Spatial Datasets

Application Placement and Demand Distribution in a Global Elastic Cloud: A Unified Approach

Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University)

Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems. Dong Dai, Yong Chen, Dries Kimpe, and Robert Ross

PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI

Auto Management for Apache Kafka and Distributed Stateful System in General

From the Outside Looking In: Probing Web APIs to Build Detailed Workload Profile

Simulations of the quadrilateral-based localization

Finding a needle in Haystack: Facebook's photo storage

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing

Applications. Oversampled 3D scan data. ~150k triangles ~80k triangles

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Approximately Uniform Random Sampling in Sensor Networks

IBM Spectrum NAS. Easy-to-manage software-defined file storage for the enterprise. Overview. Highlights

FROM PEER TO PEER...

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

Overview. CPS Architecture Overview. Operations, Administration and Management (OAM) CPS Architecture Overview, page 1 Geographic Redundancy, page 5

SMD149 - Operating Systems - Multiprocessing

1. Memory technology & Hierarchy

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

DEPLOYMENT OF PERFORMANCE IN LARGE SCALE WIRELESS MESH NETWORK 1

CSE 5306 Distributed Systems. Consistency and Replication

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017

Sensor Tasking and Control

BUYING SERVER HARDWARE FOR A SCALABLE VIRTUAL INFRASTRUCTURE

Building an Internet-Scale Publish/Subscribe System

Locality in Structured Peer-to-Peer Networks

Designing Windows Server 2008 Network and Applications Infrastructure

Challenges in Geographic Routing: Sparse Networks, Obstacles, and Traffic Provisioning

Pyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher

IBM IBM soliddb and IBM soliddb Universal Cache. Download Full Version :

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

A Fast and High Throughput SQL Query System for Big Data

Geometric data structures:

Achieving Horizontal Scalability. Alain Houf Sales Engineer

CS 655 Advanced Topics in Distributed Systems

Current Topics in OS Research. So, what s hot?

Coherence & WebLogic Server integration with Coherence (Active Cache)

Architekturen für die Cloud

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Fixing the Embarrassing Slowness of OpenDHT on PlanetLab

Approximation Algorithms for NP-Hard Clustering Problems

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Outline. Motivation. Traditional Database Systems. A Distributed Indexing Scheme for Multi-dimensional Range Queries in Sensor Networks

FIVE REASONS YOU SHOULD RUN CONTAINERS ON BARE METAL, NOT VMS

Landmark-based routing

Data-Centric Query in Sensor Networks

References. System Design. System Design: Performance. What is system design? Keep it Simple. Hints for System Design

Logan City GIS Master Plan. Term Project. Colton Smith. David Tarboton CEE 6440

CSE 124: Networked Services Lecture-16

Veracity: Practical Secure Network Coordinates via Vote-Based Agreements

CSE 124: CONTENT-DISTRIBUTION NETWORKS. George Porter December 4, 2017

Multidimensional Data and Modelling - DBMS

A Hierarchically Aggregated In-Network Global Name Resolution Service for the Mobile Internet

Assessing performance in HP LeftHand SANs

Venice: Reliable Virtual Data Center Embedding in Clouds

Understanding Data Locality in VMware vsan First Published On: Last Updated On:

Adaptive-Mesh-Refinement Pattern

Predictive Elastic Database Systems. Rebecca Taft HPTS 2017

FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM

Diffusing Your Mobile Apps: Extending In-Network Function Virtualisation to Mobile Function Offloading

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads

An Introduction to Spatial Databases

Connect and Transform Your Digital Business with IBM

Enhancing Throughput of

MySQL High Availability Solutions. Alex Poritskiy Percona

Correlation based File Prefetching Approach for Hadoop

EBOOK DATABASE CONSIDERATIONS FOR DOCKER

Global Analytics in the Face of Bandwidth and Regulatory Constraints

Advanced Architectures for Oracle Database on Amazon EC2

GLIDER: Gradient Landmark-Based Distributed Routing for Sensor Networks. Stanford University. HP Labs

Model-Driven Geo-Elasticity In Database Clouds

Massive Scalability With InterSystems IRIS Data Platform

New Trends in Database Systems

Using MySQL for Distributed Database Architectures

Research Faculty Summit Systems Fueling future disruptions

Big data, little time. Scale-out data serving. Scale-out data serving. Highly skewed key popularity

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

Coordinate-based Routing:

Increase MPLS and SD-WAN Speeds and Reduce Operating Costs

SHARDS & Talus: Online MRC estimation and optimization for very large caches

Hadoop Virtualization Extensions on VMware vsphere 5 T E C H N I C A L W H I T E P A P E R

CA ERwin Data Modeler s Role in the Relational Cloud. Nuccio Piscopo.

Qlik Sense Enterprise architecture and scalability

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Oracle Database 11g: Real Application Testing & Manageability Overview

Nomad: Mitigating Arbitrary Cloud Side Channels via Provider-Assisted Migration

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste

Geographical routing 1

[This is not an article, chapter, of conference paper!]

CONTENT-DISTRIBUTION NETWORKS

5 Fundamental Strategies for Building a Data-centered Data Center

Transcription:

SCALING A DISTRIBUTED SPATIAL CACHE OVERLAY Alexander Gessler Simon Hanna Ashley Marie Smith

MOTIVATION Location-based services utilize time and geographic behavior of user geotagging photos recommendations based on user location http://www.imore.com/instagram-30-introduces-photo-map-and-improvements Heavy usage of LBS high demand for data 74% of smartphone owners use their phone to get directions or information based on current location Load varies temporally and spatially Anticipated workload fewer queries at night Spatial data skew data denser in cities Motivation: How to adapt to loads in distributed systems characterized by spatial data? 2

LOAD-BALANCING Goal: Prevent decrease in system performance caused by unexpected, heavy workloads that are unevenly distributed. Static i. Use anticipated workloads statistics Dynamic i. Reduce data skew via migration and replication ii. Elastic load-balancing 3

SCALING Scaling-Out: add resources, Scaling-In: remove resources Unlike LB: maximum system throughput changes Expensive: Must not use too often 4

OUR PROJEKT-INF CONTEXT A grid-based cache for spatial data Partition spatial data into a 2D grid Build distributed cache overlay on top Topology: Delaunay triangulation Greedy forwarding / routing http://commons.wikimedia.org/wiki/file:delaunay-triangulation.svg Dedicate cache nodes to cover grid partitions Cache focus 5

PROBLEM STATEMENT We investigate whether Scaling efficiently alleviates dynamic loads in a distributed spatial cache overlay. Does the system scale under increasing query workload or does it need additional load-balancing mechanisms? We quantify efficient by measuring response time in a real computer cluster. Previous study done under a simulation without looking at response time. 6

OUR PROTOTYPE Admin Node Suitable for deployment on a cluster No real caching or query processing We do have a master node (admin) Handle initial overlay construction Very lightweight most tasks distributed Aggregate a global load metric CacheNode #2 Deploys Initial Placement Forward CacheNode #1 CacheNode #3 Query GUI / Manager Users 7

HOW WE HANDLE QUERIES From each AWQL query a single spatial coordinate can be extracted 1. Caller sends query to an arbitrary cache node. 2. Node greedily forwards query to closest node 3. Closest node simulates processing by sleeping 50ms 4. A Response is send to entry node and therefrom forwarded to caller. No actual caching at this time (i.e. everything is a hit). 8

HOW WE SCALE High-load node periodically walks incident triangles and collects loads If aggregate load > threshold: 1. Ensure mutual exclusion in area 2. Add node in triangle center 3. Have nearby nodes reset load estimates {1.2} {1.2, 0.8, 1.0} {1.2, 0.8} Limitation: New cache node started locally No Scaling-in implemented 9

BENCHMARK Experiment groups A cache overlay with n nodes. Over 10 seconds, each node receives k queries / s. We do this for two kinds of queries: Uniform target coordinate is uniformly distributed across the grid Non-uniform a tight Gaussian around an arbitrary point (σ = 18% grid size). Think hotspot around a big city. We measure median request latency Scaling-Out up to 200% of the initial size 10

RESULTS 140 Average load: Scaling-Out does not make things worse, but it also does not help. Median request latency in ms, k=15 queries/s/node 120 100 80 60 40 20 0 n = 8 n = 12 n = 16 Uniform Hotspot Hotspot+Scaling-Out 11

RESULTS 800 High load: Scaling-Out preserves quasi-linear Scalability in the nonuniform case. Median request latency in ms, k=20 queries/s/node 700 600 500 400 300 200 100 0 n = 8 n = 12 n = 16 Uniform Hotspot Hotspot+Scaling-Out 12

CONCLUSION Does Scaling alleviate dynamic loads in a distributed spatial cache overlay? Yes, it does We are able to achieve median response times of about 100ms and quasi-linear relative scalability. But: many limitations 13

FUTURE WORK - SCALING Combine Elastic Load-Balancing + Scaling Maintain global load metric. Use Elastic LB to handle relative load differences and Scaling to respond to global load changes. Scaling-In Yet to be done 14

FUTURE WORK - ROUTING 2D Skipgraph Routing Maintain skiplist-like probabilistic connections to 2 nd, 3 rd neighbors and so forth to achieve Ο(log n) routing. Local Routing Shortlist Query messages track visited nodes + foci. As they route queries, nodes remember this free sample of the global overlay topology. When a query enters the overlay, the entry node uses its shortlist to make the first routing decision. 15

TRIVIA We contributed a paper to GI Informatiktage-2014 Big (Data) is beautiful Alexander Gessler, Simon Hanna, Ashley Marie Smith, Scaling a distributed spatial cache overlay, in GI Seminars Band 13 Informatiktage 2014, ISBN 978-3-885-79-447-9 16

17