GlobeTP: Template-Based Database Replication for Scalable Web Applications

GlobeTP: Template-Based Database Replication for Scalable Web Applications
Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre. In Proceedings of WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
Dina Adel Said, dsaid@vt.edu

Problem Definition
How can we provide a scalable infrastructure for hosting dynamically generated web content?
Past solutions:
1. Cache generated pages.
2. Distribute the computation across multiple application servers.
3. Cache the results of DB queries.
Problem: the bottleneck remains the throughput of the origin DB.

Problem Definition (cont.)
Solution: use DB replication.
Problem: throughput does not scale linearly, because every update, delete, and insert (UDI) query must be performed on each DB replica.
Past solutions:
1. Increase the throughput of each individual server.
2. Partial replication.

Partial Replication
Past solutions:
Relying on the application programmer: Gao et al. [2003].
GlobeDB: Sivasubramanian et al. [2005].
Record-level replication granularity.
Provides excellent query latency.
A central server collects all updates, then sends them in batches to the other servers.
Does not improve throughput, because the central server becomes a bottleneck.

GlobeTP: Template-Based Solution
The queries issued by a web application typically belong to a small number of query templates.
Query template: a parameterized SQL query whose parameters are passed at run time.
Knowing these templates, table placements are selected to ensure maximum throughput and reasonable latency.
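To make the notion of a query template concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `item` table, the template names, and the `is_udi` helper are hypothetical and chosen only for illustration; they do not come from the paper or the slides.

```python
import sqlite3

# Hypothetical query templates: SQL text with placeholders filled in at run time.
TEMPLATES = {
    "get_item": "SELECT title, price FROM item WHERE id = ?",
    "update_price": "UPDATE item SET price = ? WHERE id = ?",
}

def is_udi(template_sql):
    """Classify a template as UDI or read from its leading SQL keyword."""
    return template_sql.lstrip().split()[0].upper() in {"UPDATE", "DELETE", "INSERT"}

def run(conn, qid, params):
    """Execute the template identified by qid with run-time parameters."""
    return conn.execute(TEMPLATES[qid], params).fetchall()

# Illustrative in-memory database with one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.execute("INSERT INTO item VALUES (1, 'book', 10.0)")

run(conn, "update_price", (12.5, 1))   # UDI template
rows = run(conn, "get_item", (1,))     # read template
```

Because the SQL text is fixed and only the parameters vary, the set of tables each template touches can be known ahead of time, which is what the placement steps rely on.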

Models
Application model: the application programmer is required to specify the application's query templates explicitly.
System model:

Main Problems to Consider
1. Cluster identification: ensure that, for every query template, the table placement leaves at least one server able to execute it.
2. Consider all the defined templates, read or UDI, and determine the placement that provides the maximum throughput.
3. Define a load-balancing algorithm that distributes read queries efficiently across the servers.

Data Placement: Cluster Identification
Goal: determine the sets of tables that must be replicated together so that every template can execute correctly, while minimizing the number of servers that must execute each UDI query.
Characterize each query template by:
1. Whether it is a read or a UDI query.
2. The set of tables that it accesses.
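The characterization above can be sketched in a few lines of Python. This is one simple reading of the step, not necessarily the procedure used in the paper: it assumes that the table set of each read template forms a cluster that must be co-located on at least one server, and that a UDI template affects every cluster sharing a table with it. The `Template` class and the example templates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    qid: str
    kind: str           # "read" or "udi"
    tables: frozenset   # tables the template accesses

def identify_clusters(templates):
    """Return the distinct table sets that read templates need co-located."""
    return {t.tables for t in templates if t.kind == "read"}

def clusters_hit_by_udi(templates, clusters):
    """Map each UDI template to the clusters that share a table with it."""
    return {
        t.qid: {c for c in clusters if c & t.tables}
        for t in templates
        if t.kind == "udi"
    }

# Hypothetical templates for illustration.
templates = [
    Template("q1", "read", frozenset({"item", "author"})),
    Template("q2", "read", frozenset({"customer"})),
    Template("q3", "udi", frozenset({"item"})),
]
clusters = identify_clusters(templates)
affected = clusters_hit_by_udi(templates, clusters)
```

In this example the update `q3` only concerns servers that hold the `item`/`author` cluster, so a server holding only `customer` never has to execute it.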

Data Placement: Load Analysis
Determine the load received by each cluster.
The load on table clusters depends on:
1. Whether the query is a read or a UDI query.
2. The frequency with which the template occurs.
3. The computational cost of executing the query, estimated either with the DB system's tools or by running the query on a live system.
Determine the load on DB servers (read or UDI queries).
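A minimal sketch of this bookkeeping, assuming that a template's load contribution is simply its frequency multiplied by its estimated execution cost. That product, the tuple layout, and the numbers below are illustrative assumptions rather than figures from the paper.

```python
def cluster_loads(templates):
    """Sum frequency * cost for each (table set, query kind) pair.

    templates: iterable of (tables, kind, frequency, cost_ms) tuples.
    """
    loads = {}
    for tables, kind, frequency, cost_ms in templates:
        key = (frozenset(tables), kind)
        loads[key] = loads.get(key, 0.0) + frequency * cost_ms
    return loads

# Hypothetical measurements: queries per second and cost per query in ms.
observed = [
    ({"item", "author"}, "read", 50, 2.0),
    ({"item", "author"}, "read", 10, 5.0),
    ({"item"}, "udi", 5, 4.0),
]
loads = cluster_loads(observed)
```

Keeping read and UDI load separate matters because read load can be split across replicas, whereas UDI load is incurred by every replica that holds the affected tables.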

Data Placement: Cluster Placement
Determine the placement of clusters across the set of DB servers such that the load on each replica is minimized.
Exhaustive search has complexity O(2^(N·T) / N!), where T is the number of tables and N is the number of nodes.
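The following toy search illustrates why exhaustive placement becomes expensive. It is a sketch under a simplified cost model that is assumed here, not taken from the paper: a cluster's read load is split evenly over the servers hosting it, a UDI load is charged in full to every server holding one of the affected tables, and the objective is the load of the most loaded server. All names and numbers are hypothetical.

```python
from itertools import combinations, product

def nonempty_subsets(servers):
    """All non-empty subsets of the server list, smallest first."""
    return [frozenset(c) for r in range(1, len(servers) + 1)
            for c in combinations(servers, r)]

def best_placement(read_load, udi_load, servers):
    """Try every assignment of clusters to server subsets; minimize the max load.

    read_load: {cluster (frozenset of tables): read load}
    udi_load:  {frozenset of tables written: UDI load}
    """
    clusters = list(read_load)
    best, best_cost = None, float("inf")
    for choice in product(nonempty_subsets(servers), repeat=len(clusters)):
        load = dict.fromkeys(servers, 0.0)
        tables_on = {s: set() for s in servers}
        for cluster, hosts in zip(clusters, choice):
            for s in hosts:
                load[s] += read_load[cluster] / len(hosts)  # reads are shared
                tables_on[s] |= cluster
        for tables, udi in udi_load.items():
            for s in servers:
                if tables & tables_on[s]:
                    load[s] += udi                          # writes are not
        cost = max(load.values())
        if cost < best_cost:
            best, best_cost = dict(zip(clusters, choice)), cost
    return best, best_cost

A, B = frozenset({"item"}), frozenset({"customer"})
placement, cost = best_placement(
    read_load={A: 100, B: 100},
    udi_load={A: 60, B: 60},
    servers=["s1", "s2"],
)
```

With these numbers, full replication would put a load of 220 on each server, while keeping each cluster on its own server gives 160, which is the placement the search returns. The number of candidate placements grows exponentially with the number of clusters and servers, so this brute-force form is only practical for very small inputs.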

Query Routing
Round Robin (RR): efficient if all incoming queries have the same cost.
RR-QID (RR by query ID): each query template is identified by its QID. Each QID has a queue associated with the set of DB servers that can serve that QID, and round robin is applied within each queue.
Cost-based routing: when a query arrives, the query router estimates the current load on each DB server and schedules the query on the least-loaded DB server that can serve it.
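The two template-aware policies can be sketched as follows. The class names, the per-template cost table, and the way load is tracked (a running sum of the estimated costs of outstanding queries) are assumptions made for illustration; the slides do not specify how the router estimates load.

```python
from itertools import cycle

class RRQIDRouter:
    """Round robin kept separately for each query ID (QID)."""

    def __init__(self, servers_for_qid):
        # One independent rotation over the servers able to serve each QID.
        self._queues = {qid: cycle(servers)
                        for qid, servers in servers_for_qid.items()}

    def route(self, qid):
        return next(self._queues[qid])

class CostBasedRouter:
    """Send each query to the least-loaded server that can serve it."""

    def __init__(self, servers_for_qid, cost_for_qid):
        self._servers = servers_for_qid
        self._cost = cost_for_qid
        self.load = {s: 0.0 for servers in servers_for_qid.values()
                     for s in servers}

    def route(self, qid):
        target = min(self._servers[qid], key=lambda s: self.load[s])
        self.load[target] += self._cost[qid]   # query is now outstanding
        return target

    def done(self, qid, server):
        self.load[server] -= self._cost[qid]   # query has completed

# Hypothetical placement: q1 can run on s1 or s2, q2 only on s2.
capable = {"q1": ["s1", "s2"], "q2": ["s2"]}

rr = RRQIDRouter(capable)
rr_choices = [rr.route("q1") for _ in range(3)]

cb = CostBasedRouter(capable, {"q1": 1.0, "q2": 5.0})
cb_choices = [cb.route("q2"), cb.route("q1"), cb.route("q1")]
```

In the cost-based run, the expensive `q2` lands on `s2`, so both subsequent `q1` queries go to `s1`; plain per-QID round robin would have alternated between the two servers regardless of that load.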

Experiments
Compare GlobeTP with full DB replication using:
1. TPC-W: a standard e-commerce benchmark.
2. RUBBoS: a bulletin-board benchmark modeled after slashdot.org.

Experiments (cont.)
[Figure: query latency distributions using 4 servers.]

Experiments (cont.)
[Figure: maximum achievable throughputs with 90% of queries processed within 100 ms.]

Advantages
Easily coupled with a distributed DB query cache.
Does not require any modification of the application itself.

Disadvantages
Does not support transactions, although they could be implemented in the query router.
Partial replication is limited to table-level granularity.
Fault-tolerance issues.
Does not take into account the long-term load variations that must be expected when operating a popular dynamic web site.

References
Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. Application specific data replication for edge services. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, 449-460, Budapest, Hungary. 2003. ISBN 1-58113-680-3.
Swaminathan Sivasubramanian, Gustavo Alonso, Guillaume Pierre, and Maarten van Steen. GlobeDB: autonomic data replication for web applications. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, 33-42, Chiba, Japan. 2005. ISBN 1-59593-046-9.

Thank you
dsaid@vt.edu