Outline Computer Networking. HTTP Basics (Review) How to Mark End of Message? (Review)

Transcription:

15-441 Computer Networking
Lecture 25: The Web
Lecture 19: 2006-11-02

Outline
- HTTP review and details (more in notes)
- Persistent HTTP review
- HTTP caching
- Content distribution networks

HTTP Basics (Review)
- HTTP is layered over a bidirectional byte stream
  - Almost always TCP
- Interaction
  - Client sends request to server, followed by response from server to client
  - Requests/responses are encoded in text
- Stateless
  - Server maintains no information about past client requests

How to Mark End of Message? (Review)
- Size of message: Content-Length
  - Must know size of transfer in advance
- Delimiter: MIME-style Content-Type
  - Server must escape delimiter in content
- Close connection
  - Only server can do this

HTTP Request (review)
- Request line
  - Method
    - GET: return URI
    - HEAD: return headers only of GET response
    - POST: send data to the server (forms, etc.)
  - URL (relative), e.g., /index.html
  - HTTP version

HTTP Request (cont.) (review)
- Request headers
  - Authorization: authentication info
  - Acceptable document types/encodings (Accept, Accept-Encoding)
  - From: user email
  - If-Modified-Since
  - Referer: what caused this page to be requested
  - User-Agent: client software
- Blank line
- Body
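The request layout and the Content-Length framing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP parser: the function names `parse_request` and `body_complete` are hypothetical, and the sketch assumes the whole header block has already been received.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into request line, headers and body.

    Assumes the blank line that ends the headers is already in `raw`.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, (relative) URL, HTTP version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


def body_complete(headers: dict, body: bytes):
    """Apply the "size of message" rule.

    Returns True/False when Content-Length is present, or None when the
    receiver has to fall back on another rule (e.g. connection close).
    """
    length = headers.get("content-length")
    if length is None:
        return None
    return len(body) >= int(length)
```

A receiver would call `body_complete` each time more bytes arrive and stop reading once it returns True.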

HTTP Request Example (review)

    GET / HTTP/1.1
    Accept: */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
    Host: www.intel-iris.net
    Connection: Keep-Alive

HTTP Response (review)
- Status line
  - HTTP version
  - 3-digit response code
    - 1XX: informational
    - 2XX: success
      - 200 OK
    - 3XX: redirection
      - 301 Moved Permanently
      - 302 Moved Temporarily
      - 304 Not Modified
    - 4XX: client error
      - 404 Not Found
    - 5XX: server error
      - 505 HTTP Version Not Supported
  - Reason phrase

HTTP Response (cont.) (review)
- Headers
  - Location: for redirection
  - Server: server software
  - WWW-Authenticate: request for authentication
  - Allow: list of methods supported (GET, HEAD, etc.)
  - Content-Encoding: e.g., x-gzip
  - Content-Length
  - Content-Type
  - Expires
  - Last-Modified
- Blank line
- Body

HTTP Response Example (review)

    HTTP/1.1 200 OK
    Date: Tue, 27 Mar 2001 03:49:38 GMT
    Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
    Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
    ETag: "7a11f-10ed-3a75ae4a"
    Accept-Ranges: bytes
    Content-Length: 4333
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Type: text/html
    ..

Outline
- HTTP intro and details
- Persistent HTTP
- HTTP caching
- Content distribution networks
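As a small illustration of the status-line structure described above (HTTP version, 3-digit code, reason phrase), the following sketch splits a status line and maps the first digit of the code to its class. The function name `parse_status_line` is an assumption made for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Return (version, code, reason phrase, class) for a status line."""
    version, code, *reason = line.strip().split(" ", 2)
    code = int(code)
    # The class is determined by the hundreds digit of the code
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    return version, code, (reason[0] if reason else ""), status_class
```

For the response example above, `parse_status_line("HTTP/1.1 200 OK")` yields the version `HTTP/1.1`, the code 200, the reason phrase `OK`, and the class `success`.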

Typical Workload (Web Pages)
- Multiple (typically small) objects per page
- File sizes
  - Heavy-tailed
    - Pareto distribution for tail
    - Lognormal for body of distribution
    - For reference/interest only
- Embedded references
  - Number of embedded objects follows a Pareto distribution: $p(x) = a k^{a} x^{-(a+1)}$

HTTP 0.9/1.0 (mostly review)
- One request/response per TCP connection
  - Simple to implement
- Disadvantages
  - Multiple connection setups: three-way handshake each time
    - Several extra round trips added to transfer
  - Multiple slow starts

Single Transfer Example
[Figure: client/server timeline showing the SYN and FIN segments exchanged for each connection]
- 0 RTT: Client opens TCP connection
- 1 RTT: Client sends HTTP request for HTML; server reads from disk
- 2 RTT: Client parses HTML; client opens TCP connection
- 3 RTT: Client sends HTTP request for image; server reads from disk
- 4 RTT: Image begins to arrive

More Problems
- Short transfers are hard on TCP
  - Stuck in slow start
  - Loss recovery is poor when windows are small
- Lots of extra connections
  - Increases server state/processing
- Server also forced to keep TIME_WAIT connection state
  - Things to think about
    - Why must server keep these?
    - Tends to be an order of magnitude greater than # of active connections, why?

Persistent Connection Solution (review)
- Multiplex multiple transfers onto one TCP connection
- How to identify requests/responses
  - Delimiter: server must examine response for delimiter string
  - Content-length and delimiter: must know size of transfer in advance
  - Block-based transmission: send in multiple length-delimited blocks
  - Store-and-forward: wait for entire response and then use content-length
  - Solution: use existing methods and close connection otherwise

Persistent Connection Example (review)
[Figure: client/server timeline over an already open connection]
- 0 RTT: Client sends HTTP request for HTML; server reads from disk
- 1 RTT: Client parses HTML; client sends HTTP request for image; server reads from disk
- 2 RTT: Image begins to arrive
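The two timelines above can be reproduced with a simple round-trip count. The sketch below is a rough model under stated assumptions: it ignores transmission time, slow start and server disk reads, counts the time until the last object begins to arrive, and fetches objects one after another. The function names are hypothetical.

```python
def rtts_nonpersistent(num_embedded: int) -> int:
    """One TCP handshake plus one request/response per object.

    The HTML page and each embedded object each cost 2 RTTs.
    """
    return 2 * (1 + num_embedded)


def rtts_persistent(num_embedded: int, connection_open: bool = True) -> int:
    """All objects share one connection, requested sequentially.

    `connection_open=True` matches the persistent example, which starts
    with the client sending its request at 0 RTT.
    """
    setup = 0 if connection_open else 1
    return setup + 1 + num_embedded
```

With one embedded image, `rtts_nonpersistent(1)` gives the 4 RTTs of the single transfer example and `rtts_persistent(1)` gives the 2 RTTs of the persistent example.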

Persistent HTTP (review)
- Nonpersistent HTTP issues:
  - Requires 2 RTTs per object
  - OS must work and allocate host resources for each TCP connection
  - But browsers often open parallel TCP connections to fetch referenced objects
- Persistent HTTP
  - Server leaves connection open after sending response
  - Subsequent HTTP messages between same client/server are sent over connection
- Persistent without pipelining:
  - Client issues new request only when previous response has been received
  - One RTT for each referenced object
- Persistent with pipelining:
  - Default in HTTP/1.1
  - Client sends requests as soon as it encounters a referenced object
  - As little as one RTT for all the referenced objects

Outline
- HTTP intro and details
- Persistent HTTP
-- new stuff --
- HTTP caching
- Content distribution networks

HTTP Caching
- Clients often cache documents
  - Challenge: update of documents
  - If-Modified-Since requests to check
    - HTTP 0.9/1.0 used just date
    - HTTP 1.1 has an opaque "entity tag" (could be a file signature, etc.) as well
- When/how often should the original be checked for changes?
  - Check every time?
  - Check each session? Day? Etc?
  - Use Expires header
    - If no Expires, often use Last-Modified as estimate

Example Cache Check Request

    GET / HTTP/1.1
    Accept: */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
    If-None-Match: "7a11f-10ed-3a75ae4a"
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
    Host: www.intel-iris.net
    Connection: Keep-Alive

Example Cache Check Response

    HTTP/1.1 304 Not Modified
    Date: Tue, 27 Mar 2001 03:50:51 GMT
    Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
    Connection: Keep-Alive
    Keep-Alive: timeout=15, max=100
    ETag: "7a11f-10ed-3a75ae4a"

Ways to cache
- Client-directed caching: Web Proxies
- Server-directed caching: Content Delivery Networks (CDNs)
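The cache check exchange above can be sketched from the server's side. This is a simplified illustration, assuming exact string comparison of entity tags and well-formed HTTP dates; the function name `conditional_status` is hypothetical. It follows the HTTP/1.1 rule that If-None-Match, when present, takes precedence over If-Modified-Since.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Entity-tag check (HTTP/1.1): a match means "not modified"
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Date check (the only validator available in HTTP 0.9/1.0)
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200
```

Called with the headers of the example request and the ETag and Last-Modified values of the earlier 200 response, the function returns 304, matching the example response.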

Web Proxy Caches
- User configures browser: Web accesses via cache
- Browser sends all HTTP requests to cache
  - Object in cache: cache returns object
  - Else cache requests object from origin server, then returns object to client
[Figure: clients exchange HTTP requests/responses with the proxy server, which exchanges HTTP requests/responses with origin servers]

Caching Example (1)
- Assumptions
  - Average object size = 100,000 bits
  - Avg. request rate from institution's browsers to origin servers = 15/sec
  - Delay from institutional router to any origin server and back to router = 2 sec
- Consequences
  - Utilization on LAN = 15%
  - Utilization on access link = 100%
  - Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
[Figure: institutional network (10 Mbps LAN), 1.5 Mbps access link, public Internet, origin servers]

Caching Example (2)
- Possible solution
  - Increase bandwidth of access link to, say, 10 Mbps
  - Often a costly upgrade
- Consequences
  - Utilization on LAN = 15%
  - Utilization on access link = 15%
  - Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
[Figure: institutional network (10 Mbps LAN), 10 Mbps access link, public Internet, origin servers]

Caching Example (3)
- Install cache
  - Suppose hit rate is 0.4
- Consequence
  - 40% of requests will be satisfied almost immediately (say 10 msec)
  - 60% of requests satisfied by origin server
  - Utilization of access link reduced to 60%, resulting in negligible delays
  - Weighted average of delays = 0.6 * 2 sec + 0.4 * 10 msecs < 1.3 secs
[Figure: institutional network (10 Mbps LAN) with institutional cache, 1.5 Mbps access link, public Internet, origin servers]

Problems
- Over 50% of all HTTP objects are uncacheable. Why?
- Not easily solvable
  - Dynamic data: stock prices, scores, web cams
  - CGI scripts: results based on passed parameters
- Obvious fixes
  - SSL: encrypted data is not cacheable
    - Most web clients don't handle mixed pages well, so many generic objects are transferred with SSL
  - Cookies: results may be based on passed data
  - Hit metering: owner wants to measure # of hits for revenue, etc.
- What will be the end result?

Content Distribution Networks (CDNs)
- The content providers are the CDN customers.
- Content replication
  - CDN company installs hundreds of CDN servers throughout Internet
    - Close to users
  - CDN replicates its customers' content in CDN servers. When provider updates content, CDN updates servers
[Figure: origin server in North America, CDN distribution node, CDN servers in S. America, Europe and Asia]
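The numbers in the three caching examples above follow from two short formulas: link utilization is the offered bit rate divided by link capacity, and the average delay with a cache is a hit-rate-weighted mean. The sketch below reproduces them; the function names are hypothetical and queueing delay is ignored, as in the slide's own estimate.

```python
def utilization(req_per_sec: float, object_bits: float, link_bps: float) -> float:
    """Fraction of link capacity consumed by the request stream."""
    return req_per_sec * object_bits / link_bps


def mean_delay(hit_rate: float, origin_delay_s: float, cache_delay_s: float) -> float:
    """Average delay when `hit_rate` of requests are served by the cache."""
    return (1 - hit_rate) * origin_delay_s + hit_rate * cache_delay_s


# Values taken from the caching examples
RATE, SIZE = 15, 100_000
lan_util = utilization(RATE, SIZE, 10e6)                  # 10 Mbps LAN
access_util = utilization(RATE, SIZE, 1.5e6)              # 1.5 Mbps access link
access_util_cached = utilization(0.6 * RATE, SIZE, 1.5e6) # 60% of requests miss
avg_delay = mean_delay(0.4, 2.0, 0.010)                   # seconds
```

The results are 0.15 for the LAN, 1.0 for the access link without a cache, 0.6 with the cache, and an average delay of about 1.2 seconds.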

Outline
- HTTP intro and details
- Persistent HTTP
- HTTP caching
- Content distribution networks

Content Distribution Networks & Server Selection
- Replicate content on many servers
- Challenges
  - How to replicate content
  - Where to replicate content
  - How to find replicated content
  - How to choose among known replicas
  - How to direct clients towards replica

Server Selection
- Which server?
  - Lowest load: to balance load on servers
  - Best performance: to improve client performance
    - Based on Geography? RTT? Throughput? Load?
  - Any alive node: to provide fault tolerance
- How to direct clients to a particular server?
  - As part of routing: anycast, cluster load balancing
    - Not covered
  - As part of application: HTTP redirect
  - As part of naming: DNS

Application Based
- HTTP supports simple way to indicate that Web page has moved (30X responses)
- Server receives GET request from client
  - Decides which server is best suited for particular client and object
  - Returns HTTP redirect to that server
- Can make informed application-specific decision
- May introduce additional overhead: multiple connection setup, name lookups, etc.
- OK solution in general, but HTTP redirect has some flaws, especially with current browsers
  - Incurs many delays, which operators may really care about

Naming Based
- Client does DNS name lookup for service
- Name server chooses appropriate server address
  - A-record returned is "best" one for the client
- What information can name server base decision on?
  - Server load/location: must be collected
  - Information in the name lookup request
    - Name service client: typically the local name server for client

How Akamai Works
- Clients fetch html document from primary server
  - E.g. fetch index.html from cnn.com
- URLs for replicated content are replaced in html
  - E.g. <img src="http://cnn.com/af/x.gif"> replaced with <img src="http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif">
- Client is forced to resolve aXYZ.g.akamaitech.net hostname
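The URL replacement in the Akamai example above can be imitated with a few lines of Python. This is only a sketch of the string transformation shown on the slide: the slides do not explain how the `a73` label or the `7/23` path prefix are chosen, so both are treated here as opaque parameters, and the function name `rewrite_for_cdn` is hypothetical.

```python
from urllib.parse import urlsplit


def rewrite_for_cdn(url: str,
                    cdn_host: str = "a73.g.akamaitech.net",
                    prefix: str = "7/23") -> str:
    """Embed the original host and path inside a CDN-hosted URL.

    `cdn_host` and `prefix` are copied from the slide's example; their
    real derivation is not described in the source.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{cdn_host}/{prefix}/{parts.netloc}{parts.path}"
```

Because the rewritten URL still carries the original host and file name, a CDN server that misses in its cache knows where to fetch the object from.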

How Akamai Works
- How is content replicated?
- Akamai only replicates static content (*)
- Modified name contains original file name
- Akamai server is asked for content
  - First checks local cache
  - If not in cache, requests file from primary server and caches file
(*) At least, the version we're talking about today. Akamai actually lets sites write code that can run on Akamai's servers, but that's a pretty different beast.

How Akamai Works
- Root server gives NS record for akamai.net
- Akamai.net name server returns NS record for g.akamaitech.net
  - Name server chosen to be in region of client's name server
  - TTL is large
- G.akamaitech.net name server chooses server in region
  - Should try to choose server that has file in cache. How to choose?
  - Uses aXYZ name and hash
  - TTL is small. Why?

Simple Hashing
- Given document XYZ, we need to choose a server to use
- Suppose we use modulo
- Number servers from 1 to n
  - Place document XYZ on server (XYZ mod n)
  - What happens when a server fails? n -> n-1
    - Same if different people have different measures of n
  - Why might this be bad?

Consistent Hash
- "View" = subset of all hash buckets that are visible
- Desired features
  - Balanced: in any one view, load is equal across buckets
  - Smoothness: little impact on hash bucket contents when buckets are added/removed
  - Spread: small set of hash buckets that may hold an object regardless of views
  - Load: across all views, # of objects assigned to hash bucket is small

Consistent Hash Example
- Construction
  - Assign each of C hash buckets to random points on mod $2^n$ circle, where hash key size = n
  - Map object to random position on circle
  - Hash of object = closest clockwise bucket
- Smoothness: addition of bucket does not cause movement between existing buckets
- Spread & Load: small set of buckets that lie near object
- Balance: no bucket is responsible for large number of objects
[Figure: hash circle with positions 0, 4, 8, 12 and 14 marked and one point labelled "Bucket"]

How Akamai Works
[Figure: message flow, steps 1 to 12, among the end-user, cnn.com (content provider), the DNS root server, the Akamai server, the Akamai high-level DNS server, the Akamai low-level DNS server and the nearby matching Akamai server; requests shown are "Get index.html", "Get foo.jpg" and "Get /cnn.com/foo.jpg"]
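To make the contrast between simple (modulo) hashing and the consistent hash construction above concrete, here is a small Python sketch. It is an illustration under assumptions, not Akamai's implementation: SHA-1 stands in for the "random point" assignment, each bucket gets a single point on a mod 2^32 circle, servers are numbered from 0 rather than 1, and all names are hypothetical.

```python
import bisect
import hashlib


def hash_point(key: str, bits: int = 32) -> int:
    """Map a string to a point on the mod 2^bits circle."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def modulo_server(doc: str, n: int) -> int:
    """Simple hashing: place the document on server (hash mod n)."""
    return hash_point(doc) % n


class ConsistentHash:
    def __init__(self, buckets):
        # Each bucket sits at one point on the circle
        self.ring = sorted((hash_point(b), b) for b in buckets)

    def lookup(self, obj: str) -> str:
        """Return the closest bucket clockwise from the object's point."""
        points = [p for p, _ in self.ring]
        i = bisect.bisect_left(points, hash_point(obj)) % len(self.ring)
        return self.ring[i][1]

    def remove(self, bucket: str) -> None:
        self.ring = [(p, b) for p, b in self.ring if b != bucket]


def moved_fraction(before: dict, after: dict) -> float:
    """Fraction of objects whose assignment changed."""
    return sum(1 for k in before if before[k] != after[k]) / len(before)


docs = [f"doc{i}" for i in range(1000)]
servers = [f"server{i}" for i in range(5)]

ring = ConsistentHash(servers)
ring_before = {d: ring.lookup(d) for d in docs}
ring.remove("server3")  # one server fails
ring_after = {d: ring.lookup(d) for d in docs}

mod_before = {d: modulo_server(d, 5) for d in docs}
mod_after = {d: modulo_server(d, 4) for d in docs}  # n -> n-1
```

Comparing `moved_fraction(ring_before, ring_after)` with `moved_fraction(mod_before, mod_after)` shows the smoothness property: on the ring only the objects that lived on the failed bucket move, while under modulo hashing most objects change servers.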

Akamai Subsequent Requests
[Figure: message flow for later requests, steps 1, 2 and 7 to 10, among the end-user, cnn.com (content provider), the DNS root server, the Akamai server, the Akamai high-level DNS server, the Akamai low-level DNS server and the nearby matching Akamai server; requests shown are "Get index.html" and "Get /cnn.com/foo.jpg"]

Impact on DNS Usage
- DNS is used for server selection more and more
  - What are reasonable DNS TTLs for this type of use?
  - Typically want to adapt to load changes
- Low TTL for A-records: what about NS records?
- How does this affect caching?
- What do the first and subsequent lookups do?

HTTP (Summary)
- Simple text-based file exchange protocol
  - Support for status/error responses, authentication, client-side state maintenance, cache maintenance
- Workloads
  - Typical documents: structure, popularity
  - Server workload
- Interactions with TCP
  - Connection setup, reliability, state maintenance
  - Persistent connections
- How to improve performance
  - Persistent connections
  - Caching
  - Replication

EXTRA SLIDES
The rest of the slides are FYI

Typical Workload (Server)
- Popularity
  - Zipf distribution ($P = k r^{-1}$), surprisingly common
  - Obvious optimization: caching
- Request sizes
  - In one measurement paper: median 1946 bytes, mean 13767 bytes
  - Why such a difference? Heavy-tailed distribution
    - Pareto: $p(x) = a k^{a} x^{-(a+1)}$
- Temporal locality
  - Modeled as distance into push-down stack
  - Lognormal distribution of stack distances
- Request interarrival
  - Bursty request patterns

Caching Proxies: Sources for Misses
- Capacity
  - How large a cache is necessary or equivalent to infinite
  - On disk vs. in memory: typically on disk
- Compulsory
  - First time access to document
  - Non-cacheable documents
    - CGI scripts
    - Personalized documents (cookies, etc.)
    - Encrypted data (SSL)
- Consistency
  - Document has been updated/expired before reuse
- Conflict
  - No such misses
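The remark above that caching is the obvious optimization for a Zipf popularity distribution can be checked numerically. The sketch below normalizes $P = k r^{-1}$ over a catalogue of n documents and sums the request share of the most popular ones. The catalogue size and cache size are arbitrary assumptions, the function names are hypothetical, and temporal locality and document sizes are ignored.

```python
def zipf_probabilities(n: int) -> list:
    """Request probability of the document at each popularity rank 1..n."""
    k = 1.0 / sum(1.0 / r for r in range(1, n + 1))  # makes probabilities sum to 1
    return [k / r for r in range(1, n + 1)]


def top_share(n: int, cache_size: int) -> float:
    """Share of requests that target the `cache_size` most popular documents."""
    return sum(zipf_probabilities(n)[:cache_size])
```

Under this model, `top_share(1000, 100)` is roughly 0.69: holding the most popular 10% of a 1000-document catalogue would cover about two thirds of the requests.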