Distributed Systems Principles and Paradigms. Chapter 12: Distributed Web-Based Systems

Distributed Systems Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
steen@cs.vu.nl

Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012

12.1 Architecture

Distributed Web-based systems

Essence: The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents.
- Documents are often represented in text (plain text, HTML, XML).
- Alternative types: images, audio, video, and applications (PDF, PS).
- Documents may contain scripts, which are executed by client-side software.

[Figure: The browser on the client machine (1) sends an HTTP request for a document to the Web server on the server machine; (2) the server fetches the document from a local file through the OS; (3) the server returns the response.]
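The exchange in steps 1 to 3 can be reproduced with Python's standard library. This is a minimal sketch of the idea, not how production Web servers are built: it assumes a local test setup on 127.0.0.1, and the document root and file name are made up for illustration. SimpleHTTPRequestHandler plays the role of the Web server that maps the requested path to a local file.

```python
import http.server
import pathlib
import tempfile
import threading
import urllib.request
from functools import partial


def start_web_server(docroot: str) -> http.server.ThreadingHTTPServer:
    """Server machine: serve files from docroot on an ephemeral localhost port."""
    # The handler maps the request path to a file under `directory`
    # (step 2: the server fetches the document from a local file).
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=docroot)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_document(url: str) -> bytes:
    """Client machine: send an HTTP GET (step 1) and read the response body (step 3)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        # Hypothetical document placed in the server's document root.
        pathlib.Path(docroot, "index.html").write_text("<html><body>Hello</body></html>")
        server = start_web_server(docroot)
        host, port = server.server_address
        print(get_document(f"http://{host}:{port}/index.html"))
        server.shutdown()
        server.server_close()
```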

Multi-tiered architectures

Observation: Very early on, Web sites were already organized into three tiers.

[Figure: The Web server's HTTP request handler receives a Get request and starts a CGI process to fetch the document; the CGI program interacts with the database server, an HTML document is created, and the result is returned to the client.]
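A minimal sketch of the three tiers, assuming a hypothetical products table held in an in-memory SQLite database. For brevity the "CGI program" is an ordinary function: a real CGI setup would start a separate process for each request, which this sketch does not do. The path /cgi-bin/products and the max_price parameter are illustrative names only.

```python
from __future__ import annotations

import html
import sqlite3


def make_database() -> sqlite3.Connection:
    """Tier 3: a hypothetical product database (in memory for this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT, price REAL)")
    db.executemany("INSERT INTO products VALUES (?, ?)", [("lamp", 19.5), ("chair", 45.0)])
    return db


def cgi_program(db: sqlite3.Connection, max_price: float) -> str:
    """Tier 2: interact with the database, then create an HTML document."""
    rows = db.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY name", (max_price,)
    ).fetchall()
    items = "".join(f"<li>{html.escape(name)}: {price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


def http_request_handler(db: sqlite3.Connection, path: str, query: dict) -> tuple[int, str]:
    """Tier 1: the Web server's request handler passes dynamic requests to the program."""
    if path == "/cgi-bin/products":
        # A real server would start a separate CGI process here; this sketch calls a function.
        return 200, cgi_program(db, float(query.get("max_price", "inf")))
    return 404, "<html><body>Not found</body></html>"
```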

Web services

Observation: At a certain point, people started recognizing that the Web was more than just user-to-site interaction: sites could offer services to other sites, and standardization is then badly needed.

[Figure: A server application publishes its service, described in WSDL, to a directory service (UDDI); a client application looks up the service there. Client and server each use a stub generated from the WSDL description, and the stubs communicate via SOAP through each machine's communication subsystem.]
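To make the stubs' role concrete, the sketch below shows what a client stub and a server stub might do with a SOAP 1.1 message: wrap an operation call in an envelope and unwrap it again. It is hand-written rather than generated from WSDL, and the directory is a plain dictionary standing in for UDDI. The operation GetPrice, the namespace urn:example:stockquote, and the endpoint URL are hypothetical; only the SOAP 1.1 envelope namespace is real.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:stockquote"  # hypothetical service namespace
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)

# Hypothetical stand-in for a UDDI directory: service name -> endpoint URL.
directory: dict[str, str] = {}


def publish(service: str, endpoint: str) -> None:
    """Server side: make the service findable."""
    directory[service] = endpoint


def lookup(service: str) -> str:
    """Client side: find where the service lives."""
    return directory[service]


def marshal_call(operation: str, **params: str) -> bytes:
    """Client stub: wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def unmarshal_call(message: bytes) -> tuple[str, dict[str, str]]:
    """Server stub: recover the operation name and parameters from the envelope."""
    call = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")[0]
    params = {_local_name(child.tag): child.text or "" for child in call}
    return _local_name(call.tag), params
```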

12.2 Processes

Apache Web server

Observation: More than 52% of all 185 million Web sites run Apache. The server is internally organized more or less according to the steps needed to process an HTTP request.

[Figure: The Apache core passes each request through a sequence of hooks. Modules link their functions to hooks, and the core calls the functions registered for each hook in turn before returning the response.]
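The hook mechanism can be illustrated with a toy core in which modules attach functions to named hooks and the core calls them phase by phase. This is not Apache's module API; the phase names and the convention that returning a status code ends processing are assumptions made for the sketch.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Optional

# Hypothetical phase names, loosely following the steps of handling an HTTP request.
PHASES = ("translate", "access", "content", "log")

Handler = Callable[[dict], Optional[int]]


class ServerCore:
    """Toy core: modules attach functions to hooks; the core calls them phase by phase."""

    def __init__(self) -> None:
        self.hooks: dict[str, list[Handler]] = defaultdict(list)

    def register(self, phase: str, fn: Handler) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown hook: {phase}")
        self.hooks[phase].append(fn)  # link between a module function and a hook

    def process(self, request: dict) -> dict:
        request.setdefault("status", 200)
        for phase in PHASES:
            for fn in self.hooks[phase]:  # functions called per hook, in registration order
                status = fn(request)
                if status is not None:  # in this sketch, a status code ends processing
                    request["status"] = status
                    return request
        return request
```

A usage example: one module translates the URI to a file name, another denies access to /private, and a third produces the content.

```python
core = ServerCore()
core.register("translate", lambda r: r.update(file="/var/www" + r["uri"]))
core.register("access", lambda r: 403 if r["uri"].startswith("/private") else None)
core.register("content", lambda r: r.update(body="contents of " + r["file"]))
```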

Server clusters

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.

[Figure: Clients reach a front end, which is connected over a LAN to several Web servers; the front end handles all incoming requests and outgoing responses.]

Server clusters (continued)

Problem: The front end may easily become overloaded, so special measures need to be taken.
- Transport-layer switching: The front end simply passes the TCP request to one of the servers, taking some performance metric into account.
- Content-aware distribution: The front end reads the content of the HTTP request and then selects the best server.
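The two strategies differ in what the front end looks at. In this hedged sketch, the transport-layer switch uses the number of open connections as its performance metric, and the content-aware distributor hashes the request path so that requests for the same document always reach the same server. Both policies are assumptions, one choice among several; server names are placeholders.

```python
from __future__ import annotations

import hashlib


class TransportLayerSwitch:
    """Picks a server without looking at the HTTP request, using a load metric."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {server: 0 for server in servers}  # open connections per server

    def assign(self) -> str:
        server = min(self.active, key=self.active.get)  # least-connections (assumed metric)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


class ContentAwareDistributor:
    """Reads the HTTP request line and maps each requested path to a fixed server."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers

    def assign(self, request: bytes) -> str:
        request_line = request.split(b"\r\n", 1)[0]  # e.g. b"GET /a.html HTTP/1.1"
        _, path, _ = request_line.split(b" ")
        digest = hashlib.sha256(path).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```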

Server clusters (continued)

Question: Why can content-aware distribution be so much better?

[Figure: Content-aware distribution with TCP handoff. (1) The switch passes a client's setup request to a distributor; (2) a dispatcher selects a server; (3) the distributor hands off the TCP connection to that server; (4) the distributor informs the switch; (5) the switch forwards the client's other messages directly to the selected server; (6) the server sends its responses to the client.]
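The control flow of the handoff scheme can be modeled without real TCP. In this simplified sketch, connections are identified by hypothetical string ids, distributors are chosen round-robin, the dispatcher is any function that maps the first HTTP request to a server name, and connection setup and the first request are collapsed into a single call. The actual transfer of TCP connection state in step 3 is not modeled. The switch never parses HTTP: it only records where each connection was handed off.

```python
from __future__ import annotations

import itertools
from typing import Callable


class Distributor:
    """Accepts a new connection, consults the dispatcher, and hands the connection off."""

    def __init__(self, dispatcher: Callable[[str], str]) -> None:
        self.dispatcher = dispatcher

    def accept(self, switch: Switch, conn_id: str, first_request: str) -> str:
        server = self.dispatcher(first_request)  # 2. Dispatcher selects a server
        # 3. Hand off the TCP connection to `server` (state transfer not modeled).
        switch.inform(conn_id, server)  # 4. Inform the switch
        return server


class Switch:
    """Front end that forwards packets by connection id and never parses HTTP."""

    def __init__(self, distributors: list[Distributor]) -> None:
        self._next = itertools.cycle(distributors)
        self.routes: dict[str, str] = {}  # connection id -> server after handoff

    def on_setup(self, conn_id: str, first_request: str) -> str:
        # 1. Pass the setup request to a distributor (round-robin in this sketch).
        return next(self._next).accept(self, conn_id, first_request)

    def inform(self, conn_id: str, server: str) -> None:
        self.routes[conn_id] = server

    def forward(self, conn_id: str) -> str:
        # 5. Other messages on this connection go straight to the chosen server.
        return self.routes[conn_id]
```

For example, a dispatcher could send image requests to one server and HTML pages to another; the switch then forwards all later messages of a connection to whichever server the dispatcher chose.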

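Web proxy caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency protocols:
- Always verify validity by contacting the server.
- Age-based consistency: $T_{\mathrm{expire}} = \alpha \cdot (T_{\mathrm{cached}} - T_{\mathrm{last\ modified}}) + T_{\mathrm{cached}}$, where $T_{\mathrm{cached}}$ is the time the proxy cached the document, $T_{\mathrm{last\ modified}}$ is the time the document was last modified, and $\alpha$ is a tunable factor.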
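The age-based rule can be computed directly. This sketch assumes timestamps are datetime values and leaves the choice of α to the caller, since the slide does not fix a value. The intuition is that a document that had not changed for a long time before it was cached is assumed to stay valid for proportionally longer.

```python
from datetime import datetime


def expiration_time(t_cached: datetime, t_last_modified: datetime, alpha: float) -> datetime:
    """Age-based consistency: T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    if t_last_modified > t_cached:
        raise ValueError("a document cannot be modified after it was cached")
    return t_cached + alpha * (t_cached - t_last_modified)


def is_fresh(now: datetime, t_expire: datetime) -> bool:
    """The proxy may serve its cached copy without contacting the server until T_expire."""
    return now < t_expire
```

For example, with α = 0.5, a document cached on 11 January that was last modified on 1 January (10 days earlier) expires 5 days after caching, on 16 January.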

Web proxy caching (continued)

Basic idea (continued): Cooperative caching, in which a proxy first checks its neighbors on a cache miss.

[Figure: A client sends an HTTP Get request to its Web proxy, which (1) looks in its local cache, (2) on a miss asks neighboring proxy caches, and (3) otherwise forwards the request to the Web server.]
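A minimal sketch of the three-step lookup, with proxies as in-process objects and the Web server replaced by a caller-supplied function (an assumption that makes the sketch testable). Neighbors are asked only about their local caches and do not query their own neighbors in turn, which keeps a request from circulating between proxies.

```python
from __future__ import annotations

from typing import Callable, Optional


class CooperativeProxy:
    def __init__(self, name: str, fetch_from_server: Callable[[str], str]) -> None:
        self.name = name
        self.cache: dict[str, str] = {}
        self.neighbors: list[CooperativeProxy] = []
        self.fetch_from_server = fetch_from_server

    def lookup_local(self, url: str) -> Optional[str]:
        return self.cache.get(url)

    def get(self, url: str) -> tuple[str, str]:
        """Return (document, source), where source says where the document came from."""
        doc = self.lookup_local(url)  # 1. Look in local cache
        if doc is not None:
            return doc, "local"
        for neighbor in self.neighbors:  # 2. Ask neighboring proxy caches
            doc = neighbor.lookup_local(url)
            if doc is not None:
                self.cache[url] = doc
                return doc, neighbor.name
        doc = self.fetch_from_server(url)  # 3. Forward request to the Web server
        self.cache[url] = doc
        return doc, "server"
```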

Replication in Web hosting systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, following the lines of self-managing systems.

[Figure: A feedback control loop around the Web hosting system. Starting from an initial configuration and subject to uncontrollable parameters (disturbance/noise), the system's observed output is measured; metric estimation and analysis compare the measured output with the reference input, and adjustment triggers apply corrections through replica placement, consistency enforcement, and request routing.]
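One way to read the control loop is as a repeated measure, compare, and correct step. The sketch below models only one of the three corrections, replica placement, reduced to choosing how many replicas to run. The latency metric, the reference value, the 20% tolerance band, and the replica cap are all hypothetical parameters, not values from the slides.

```python
from statistics import mean


def control_step(latencies_ms: list, replicas: int, reference_ms: float,
                 tolerance: float = 0.2, max_replicas: int = 10) -> int:
    """One iteration of a hypothetical feedback loop over the number of replicas."""
    measured = mean(latencies_ms)                     # metric estimation
    error = (measured - reference_ms) / reference_ms  # analysis against the reference input
    if error > tolerance and replicas < max_replicas:
        return replicas + 1  # adjustment trigger: too slow, place an extra replica
    if error < -tolerance and replicas > 1:
        return replicas - 1  # comfortably fast: release a replica
    return replicas          # within the tolerance band: no correction
```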

Handling flash crowds

Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.

[Figure: Four access patterns, (a) to (d), covering 2 days, 2 days, 6 days, and 2.5 days respectively.]

Server replication

Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers with guarantees of high availability and performance (example: Akamai).

[Figure: (1) The client gets the base document from the origin server, which (2) returns a document with references to embedded documents. (3, 4) The client's DNS lookups for those references go through the regular DNS system to the CDN's DNS server, which returns the IP address of the client-best CDN server. (5) The client gets the embedded documents from that CDN server, which (6) gets them from the origin server if they are not already cached, and (7) returns the embedded documents.]
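Two pieces of the CDN scheme can be sketched in isolation: rewriting references to embedded documents so that their host names resolve through the CDN, and the CDN's DNS server choosing a client-best server. The network-to-server table is a hypothetical static configuration using documentation address ranges; a real CDN derives it from measurements, and its DNS server usually sees the address of the client's local DNS resolver rather than the client itself. Host names are placeholders.

```python
from __future__ import annotations

import ipaddress

# Hypothetical CDN configuration: the client-best edge server for each client network.
EDGE_FOR_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "203.0.113.10",
    ipaddress.ip_network("198.51.100.0/24"): "203.0.113.20",
}
DEFAULT_EDGE = "203.0.113.30"


def cdn_dns_resolve(client_ip: str) -> str:
    """Steps 3 and 4: the CDN's DNS server returns the address of the client-best server."""
    addr = ipaddress.ip_address(client_ip)
    for network, edge in EDGE_FOR_NETWORK.items():
        if addr in network:
            return edge
    return DEFAULT_EDGE


def rewrite_embedded_refs(page: str, origin_host: str, cdn_host: str) -> str:
    """Step 2 (assumed mechanism): point embedded documents at a CDN host name."""
    return page.replace(f"http://{origin_host}/", f"http://{cdn_host}/")
```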

Replication of Web applications

Observation: Replication becomes more difficult when dealing with databases and such. There is no single best solution.

Assumption: Updates are carried out at the origin server and propagated to edge servers.

Replication of Web applications: normal

[Figure: Edge-side and origin-side components: application logic on both sides, a content-blind cache, a database copy (full/partial data replication), a content-aware cache with schema (full schema replication/query templates), and the authoritative database with its schema; queries and responses flow between edge and origin.]

Replication of Web applications (continued)

Alternative solutions:
- Full replication: suits a high read/write ratio, often in combination with complex queries.
- Partial replication: suits a high read/write ratio, but in combination with simple queries.
- Content-aware caching: check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
- Content-blind caching: simply cache the results of previous queries. Works great with simple queries that address unique results (e.g., no range queries).

Question: What can be said about replication vs. performance?
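The difference between the two caching alternatives shows up in how a cache decides whether it can answer a query locally. This sketch assumes a hypothetical products(name, price) table in SQLite acting as the origin database. The content-blind cache matches queries only by their exact text and parameters, while the content-aware cache understands a single query template (a price range) and can answer any subrange of what it already holds. Invalidation is simplified to dropping everything when the origin reports an update.

```python
from __future__ import annotations

import sqlite3
from typing import Optional


class ContentBlindCache:
    """Edge-side cache keyed on the exact query text and parameters."""

    def __init__(self, origin: sqlite3.Connection) -> None:
        self.origin = origin
        self.results: dict[tuple, list[tuple]] = {}
        self.misses = 0

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        key = (sql, params)
        if key not in self.results:  # miss: forward the query to the origin database
            self.misses += 1
            self.results[key] = self.origin.execute(sql, params).fetchall()
        return self.results[key]

    def invalidate_all(self) -> None:
        # The cache does not interpret queries, so this sketch drops every entry on an update.
        self.results.clear()


class ContentAwareRangeCache:
    """Edge-side cache that understands one query template: a price range on products."""

    def __init__(self, origin: sqlite3.Connection) -> None:
        self.origin = origin
        self.low: Optional[float] = None
        self.high: Optional[float] = None
        self.rows: list[tuple] = []
        self.misses = 0

    def price_range(self, low: float, high: float) -> list[tuple]:
        covered = self.low is not None and self.low <= low and high <= self.high
        if not covered:  # local data cannot answer the query: fetch from the origin
            self.misses += 1
            self.rows = self.origin.execute(
                "SELECT name, price FROM products WHERE price BETWEEN ? AND ? ORDER BY price",
                (low, high),
            ).fetchall()
            self.low, self.high = low, high
        # Answer locally by filtering the cached rows.
        return [row for row in self.rows if low <= row[1] <= high]

    def on_invalidation(self) -> None:
        # Called when the origin notifies this subscribed cache that products changed.
        self.low = self.high = None
        self.rows = []
```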

Replication of Web applications: full/partial replication

[Figure: The edge-side/origin-side diagram of application logic, caches, database copy, and authoritative database, for the full/partial replication alternative.]

Replication of Web applications: content-aware caching

[Figure: The edge-side/origin-side diagram of application logic, caches, database copy, and authoritative database, for the content-aware caching alternative.]

Replication of Web applications: content-blind caching

[Figure: The edge-side/origin-side diagram of application logic, caches, database copy, and authoritative database, for the content-blind caching alternative.]