Application Protocols and HTTP
14-740: Fundamentals of Computer Networks
Bill Nace
Material from Computer Networking: A Top-Down Approach, 6th edition. J.F. Kurose and K.W. Ross
Administrivia
  Lab #0 due in a week
  Next time: Paper Review of Mockapetris88
  Quiz #1 approaches (25 Sep, 2 weeks away)
    First half of class, 45 minutes
    Multiple choice / Short answer questions
    Covers everything up until the 25 Sep lecture: Layered architecture, Design Principles, ISPs & Peering, Web & HTTP, DNS, P2P, Queuing Theory
  Note: TAs hold office hours; go talk to them
Last Lecture ISPs, Backbones, Peering Motivations to peer Tier-1 Tier-2 Content / Enterprise Companies Interconnections Private vs Public Peering 3
traceroute Application Layer Web and HTTP Message format Persistent connections Caching 4
In the app layer
  Application protocols: HTTP, SMTP, DNS (queries), VoIP
  Abstract transport: TCP, UDP
  Use transport services: TCP or UDP or...
  Think about transport as a channel for data from client to server and back
  TCP requires setup and teardown
  UDP has no such requirement
Setup Overhead
  1. Client initiates transport connection to server (API: creates socket)
  2. Server accepts connection
  3. Application-layer protocol messages exchanged between browser and web server (client requests file; server sends file)
  4. Transport connection closed (file received; client closes connection)
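The setup, exchange and teardown sequence can be tried on a single machine. The following is a minimal sketch, assuming a loopback TCP connection and a toy one-line "protocol"; the helper names (`serve_once`, `fetch`, `demo`) and the reply text are illustrative and do not come from the lecture.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, answer one request, then close."""
    conn, _addr = listener.accept()           # server accepts connection
    with conn:
        request = conn.recv(1024)             # application-layer message arrives
        conn.sendall(b"file contents for " + request)
    # leaving the with-block closes the connection (teardown)

def fetch(port: int, name: bytes) -> bytes:
    """Client side: connect, request a file, read until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as sock:  # setup
        sock.sendall(name)                    # request file
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:                      # empty read: server closed
                break
            chunks.append(data)
    return b"".join(chunks)

def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        return fetch(port, b"index.html")
    finally:
        server.join()
        listener.close()
```

Calling `demo()` runs one full connect, request, response, close cycle. A UDP version would skip the connect and accept steps entirely.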
Operations Mission Addressing Network data type 7
traceroute Application Layer Web and HTTP Message format Persistent connections Caching 8
HTTP Overview
  HTTP: hypertext transfer protocol
  Web's application layer protocol
  Client / server model
    client: browser that requests, receives, renders web objects
    server: stores objects, sends in response to requests
  Example: a PC running IE and a Mac running Safari each send HTTP Requests to, and receive HTTP Responses from, a Linux machine running Apache (web server)
  Many implementations in various operating systems communicate using HTTP
History Lesson HTTP 0.9, circa 1990 Original release, first described in W3 mailing list HTTP as implemented in WWW, by Tim Berners-Lee http://lists.w3.org/archives/public/www-talk/1992janfeb/0000.html HTTP 1.0, started 1993 RFCs for HTML and URI published the same year Informational RFC-1945, 1996 Not a standards document, merely common usages A number of problems Caching control TCP overhead for short responses 10
History (2) HTTP/1.1 Backward compatibility big issue RFC 2068, proposed standard, 1997 RFC 2616, draft standard, 1999 Some web server products claimed compliance with HTTP 1.1 even before it became standard! RFC 2616 had to be backward compatible with 2068 Pressures from vendors, technologists, etc This lecture focuses on HTTP/1.1
History (3) HTTP/2 approved May 2015 (RFC 7540) Somewhat slow adoption rate Changes how data is transferred Avoid head-of-line blocking Compress headers Allow server push Violates layered architecture principles 12
traceroute Application Layer Web and HTTP Message format Persistent connections Caching 13
HTTP Message
  2 types of messages
    Requests from client to server, and
    Responses from server to client
  RFCs use Backus-Naur Form (BNF) to formally specify formats (RFC 5234)
    HTTP-message = Request | Response
Request / Response
  Both request and response consist of:
    Start line, followed by...
    Zero or more headers, followed by...
    An empty line, followed by...
    Message body (optional)

    generic-message = start-line
                      *( message-header CRLF )     (zero or more times)
                      CRLF
                      [ message-body ]             (optional)
    start-line      = Request-line | status-line
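The generic-message layout maps directly onto a small parser. This is a minimal sketch, assuming the whole message is already in memory as bytes and that headers are plain ASCII; `parse_message` is a hypothetical helper name, and real parsers handle more cases (repeated headers, folded lines, body length).

```python
def parse_message(raw: bytes):
    """Split an HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:                    # zero or more header lines
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

raw = (b"GET /images/logos.html HTTP/1.1\r\n"
       b"Host: www.cmu.edu\r\n"
       b"Connection: close\r\n"
       b"\r\n")
start_line, headers, body = parse_message(raw)
```

Header names are lower-cased here because HTTP header names are case-insensitive.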
Request Format
    Request = Request-Line
              *(( general-header | request-header | entity-header ) CRLF)
              CRLF
              [ message-body ]
  Zero or more of general, request or entity headers, each followed by CRLF, then an empty line (CRLF), then an optional message body
What are those headers? Headers provide metadata about the request or response Dates/times Application or Server information Caching control 46 defined headers Host: is required on requests 17
Request Format (2)
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    Method       = OPTIONS | GET | HEAD | POST | PUT | DELETE
  You get the idea...
Example: Request
  Note: ASCII (human-readable format)

    GET /images/logos.html HTTP/1.1       (Request-line: GET, POST,... commands)
    Host: www.cmu.edu                     (message-header, x4)
    User-agent: mozilla/5.0...
    Connection: close
    Accept-language: en-us
    (extra carriage return, line feed)

  CRLF: carriage return, line feed
  The 2nd CRLF ends the header section; since this request has no message-body, it also ends the message
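To make the CRLF placement concrete, the example request can be assembled byte for byte. This is a minimal sketch; `build_request` is a hypothetical helper, not part of any library, and it only covers requests without a body.

```python
def build_request(method: str, target: str, headers, version: str = "HTTP/1.1") -> bytes:
    """Assemble a body-less HTTP request: request line, headers, empty line."""
    lines = [f"{method} {target} {version}"]              # Request-Line
    lines += [f"{name}: {value}" for name, value in headers]
    # Each line ends in CRLF; one extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request(
    "GET",
    "/images/logos.html",
    [("Host", "www.cmu.edu"), ("Connection", "close")],
)
```

The resulting bytes end in two consecutive CRLF pairs: one terminating the last header and one forming the empty line.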
Request Methods GET: Retrieve an object Conditional GET if header includes If-Modified-Since, If-Match, etc Partial GET if header includes a Range field Essential for restartable transfers such as scrubbing and buffering a media stream
Request Methods HEAD: Retrieve metadata about an object (validity, modification time, etc) Same as GET but MUST NOT return a message body 21
Request Methods OPTIONS: Request info about the capabilities of server (or a resource) without requesting the resource POST: Upload data to server E.g. posting a message to mailing list, submitting a form, etc 22
Example: Response
  Recall: start-line = Request-line | status-line
  Defined as: status-line = HTTP-version SP Status-Code SP Reason-Phrase CRLF

    HTTP/1.1 200 OK                          (status line, with status code)
    Connection: close                        (header lines)
    Date: Wed, 01 Sep 2018 12:16:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 2016...
    Content-Length: 6821
    Content-Type: text/html
    (empty line)
    data data data data data...              (data, e.g. requested HTML file)
Status Code In first line of response from server to client 3-digit integer result code 1xx: Informational Request received, continuing process 2xx: Success Action successful 3xx: Redirection Further action needed to complete request 4xx: Client Error Request has bad syntax or cannot be fulfilled 5xx: Server Error Server failed to fulfill a valid request 24
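The first digit of the status code is enough to tell a client what kind of outcome it got. A minimal sketch, with hypothetical helper names, that splits a status line and maps the code to its class:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str):
    """Split 'HTTP-version SP Status-Code SP Reason-Phrase CRLF' into parts."""
    # maxsplit=2 keeps spaces inside the reason phrase ("Not Found") intact.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

def status_class(code: int) -> str:
    """Return the class name for a 3-digit status code."""
    return STATUS_CLASSES[code // 100]
```

For instance, `status_class(404)` gives `"Client Error"` and `status_class(301)` gives `"Redirection"`.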
Sample Status Codes 200 OK request succeeded, requested object included in this message 301 Moved Permanently requested object moved, new location specified later in this message (Location:) 404 Not Found requested document not found on this server 505 HTTP Version Not Supported 25
Try HTTP for yourself
  1. Telnet to your favorite web server:
       telnet www.google.com 80
     Opens TCP connection to port 80 (default HTTP server port) at www.google.com. Anything typed in gets sent to port 80 at www.google.com
  2. Type in a GET HTTP request:
       GET /index.html HTTP/1.1 <CR>
       Host: www.google.com <CR>
       <CR>
     By typing this in, you send this minimal (but complete) GET request to the HTTP server
  3. Examine the response message
     Hmm.. different for Kobe students
Google.com/index.html HTML file (index.html) describes layout, links, scripts, etc Includes a reference to the logo image file (logo.gif)
Question Does the following request retrieve the logo file as well? GET /index.html HTTP/1.1 <CR> Host: www.google.com <CR> <CR> 28
HTTP Request Each HTTP Request retrieves a single object per message An object (e.g. HTML file) can contain links to other objects (e.g. images, HTML files) Client must send separate request to retrieve each additional object 29
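One way to see why a page costs several requests is to count the objects its HTML refers to. This is a rough sketch using Python's standard `html.parser`; the `ObjectFinder` class and the sample page are made up for illustration, and it only looks at `img` and `script` tags, whereas a real browser follows many more kinds of references.

```python
from html.parser import HTMLParser

class ObjectFinder(HTMLParser):
    """Collect URLs of embedded objects that need their own HTTP request."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumption: only <img src=...> and <script src=...> are counted.
        if tag in ("img", "script") and "src" in attributes:
            self.urls.append(attributes["src"])

page = '<html><body><img src="logo.gif"><script src="app.js"></script></body></html>'
finder = ObjectFinder()
finder.feed(page)
total_requests = 1 + len(finder.urls)   # one for the HTML file, one per object
```

Here the base HTML file plus two embedded objects means three separate requests.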
traceroute Application Layer Web and HTTP Message format Persistent connections Caching 30
Connection Management HTTP uses TCP as its transport protocol TCP not optimized for short-lived connections typical of HTTP message exchange Often simple pages, which result in short messages 31
Nonpersistent HTTP
  Suppose user wants cmu.edu/index.html
  1. Client initiates TCP connection to cmu.edu on port 80
  2. Web server at cmu.edu waiting for connection on port 80 accepts connection, responds to sender
  3. Client sends HTTP request message (containing URL /index.html) into connection socket
  4. Server receives request message, fetches object, formats response message, sends message into connection
Nonpersistent HTTP (2)
  5. Server closes connection
  6. HTTP client receives response message. Parses HTML file, discovering 10 referenced image files
  7. Repeat steps 1-5 for each of 10 image objects
Response time modeling
  Round Trip Time (RTT): time to send a small packet from client to server and back
  Calculation for HTTP response time:
    One RTT to initiate TCP connection (client initiates transport connection; server accepts connection)
    One RTT for HTTP request and first byte of response (client requests file; server starts sending file)
    File transmission time (server sends file; client receives file and closes connection)
  response time = 2 RTT + transmit time
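The formula extends to a whole page fetched over nonpersistent connections, one object at a time. A small worked sketch, assuming every object has the same transmit time and that fetches are strictly serial; the numbers (100 ms RTT, 10 ms transmit time, 10 images) are illustrative only.

```python
def nonpersistent_time(rtt: float, transmit: float, n_objects: int) -> float:
    """Total time for a base page plus n_objects, one new connection each."""
    per_object = 2 * rtt + transmit          # 1 RTT setup + 1 RTT request + file
    return (1 + n_objects) * per_object      # base HTML file + referenced objects

single = nonpersistent_time(rtt=0.1, transmit=0.01, n_objects=0)   # about 0.21 s
page = nonpersistent_time(rtt=0.1, transmit=0.01, n_objects=10)    # about 2.31 s
```

With these assumed numbers, almost all of the 2.31 seconds is round-trip delay rather than data transfer.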
Problems A separate transport connection is established to fetch each object Requires at least 2 RTTs per object High overhead in terms of packets in the network Long user-perceived latency 35
Problems (2) Transport protocol (TCP) is optimized for large data transfers. Pays extra startup time to avoid congestion (slow-start, windowing, etc) HTTP request for small objects never gets past initial phase Connection closed before window size can be increased significantly Available bandwidth never fully used Details in Transport lecture 36
Parallel connections?
  Browser opens several connections in parallel, and downloads embedded images separately but simultaneously
  Early Netscape browser, circa 1994
  Pros
    User feels webpage is loading faster
  Cons
    Do not solve the TCP overhead and slow-start problems
    Impose considerable load on the network, adding to congestion
    Server juggles more TCP connections
    Actually reduces effective throughput
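For response-time exercises, parallel connections are often modeled as fetching the referenced objects in rounds of k. This is an idealized sketch under loud assumptions: equal-size objects, at most `k` connections at once, and no bandwidth sharing between the parallel transfers (which is exactly what the cons above say does not hold in practice). The function name and numbers are illustrative.

```python
import math

def parallel_time(rtt: float, transmit: float, n_objects: int, k: int) -> float:
    """Base page first, then referenced objects in rounds of k connections."""
    base = 2 * rtt + transmit                    # fetch the HTML file itself
    rounds = math.ceil(n_objects / k)            # batches of parallel fetches
    # Assumption: each round costs one setup RTT, one request RTT, and one
    # transmit time, as if the k transfers did not compete for bandwidth.
    return base + rounds * (2 * rtt + transmit)

estimate = parallel_time(rtt=0.1, transmit=0.01, n_objects=10, k=5)  # about 0.63 s
```

The estimate still pays 2 RTTs per round, so the connection setup overhead is hidden rather than removed.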
User behavior: Aborted requests A page is not what we wanted (or we are just bored), so we click Back button Similar to TV channel surfing If browser is using parallel connections,... Already started to download all embedded objects! Connections must be aborted But, the cost of establishing them has already been paid, and thus is wasted 38
Persistent HTTP 1. Reuse existing transport connection Server leaves connection open after sending response Subsequent HTTP messages between same client/server sent over open connection 2. Pipelining at application protocol level 39
To Pipeline or not...
  Persistent without pipelining:
    client issues new request only upon receipt of previous response
    one RTT for each referenced object... plus one setup/close overhead
  Timeline (client / server):
    client initiates transport connection; server accepts connection
    client requests object; server sends object
    client receives object and requests next object; server sends object
To Pipeline or not...
  Persistent with pipelining:
    client issues new request as soon as it encounters a referenced object
    default in HTTP/1.1
    server sends objects in order
    as little as one RTT for all referenced objects
  Timeline (client / server):
    client initiates transport connection; server accepts connection
    client sends three object requests back to back
    server sends the three objects in order; client receives them in order
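The two persistent variants can be compared with the same style of arithmetic used for the nonpersistent case. A minimal sketch, assuming equal-size objects, a single connection, and ignoring teardown and slow-start effects; function names and numbers are illustrative.

```python
def persistent_time(rtt: float, transmit: float, n_objects: int) -> float:
    """One connection, no pipelining: each object waits for the previous one."""
    base = 2 * rtt + transmit                    # setup RTT + request RTT + HTML
    return base + n_objects * (rtt + transmit)   # one RTT per referenced object

def pipelined_time(rtt: float, transmit: float, n_objects: int) -> float:
    """One connection, pipelined: all object requests go out back to back."""
    base = 2 * rtt + transmit
    return base + rtt + n_objects * transmit     # one RTT covers all requests

no_pipe = persistent_time(rtt=0.1, transmit=0.01, n_objects=10)   # about 1.31 s
pipe = pipelined_time(rtt=0.1, transmit=0.01, n_objects=10)       # about 0.41 s
```

With the same assumed numbers, the nonpersistent case comes to about 2.31 s, so most of the saving comes from not repeating connection setup and request round trips.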
Persistent HTTP: Advantages Reduce transport-layer connection costs Fewer setups and teardowns CPU time saved in routers and hosts Hosts save memory for transport state (buf, counts,...) Reduce latency by avoiding multiple TCP slow-starts Do opening handshake once to establish connection Do slow-start once to get to ideal sending rate Avoid bandwidth wastage and reduce overall congestion Fewer number of packets sent 42
traceroute Application Layer Web and HTTP Message format Persistent connections Caching 43
Web Proxy Caching
  Goal: satisfy client request without involving origin server; reduce latency and bandwidth requirements
  client sends requests to cache
  cache responds if it has a copy, otherwise uses HTTP to request a copy from the origin server
  Picture: clients exchange HTTP Requests and Responses with the Proxy Server, which in turn exchanges HTTP Requests and Responses with the Origin Servers
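The hit-or-forward behavior can be sketched in a few lines. This is a toy model, assuming an in-memory dictionary keyed by URL and an injected `fetch_from_origin` function standing in for the real HTTP request; it ignores expiration and validation entirely. All names are hypothetical.

```python
class ProxyCache:
    """Answer from the local copy when possible, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin      # stand-in for an HTTP request
        self._store = {}                     # url -> cached response body
        self.origin_requests = 0             # how often the origin was involved

    def get(self, url: str) -> bytes:
        if url in self._store:               # cache hit: origin not involved
            return self._store[url]
        self.origin_requests += 1            # cache miss: go to the origin
        body = self._fetch(url)
        self._store[url] = body
        return body

proxy = ProxyCache(lambda url: b"body of " + url.encode("ascii"))
first = proxy.get("/index.html")             # miss, fetched from origin
second = proxy.get("/index.html")            # hit, served from cache
```

Two client requests result in only one request to the origin, which is where the latency and bandwidth savings come from.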
Consistency HTTP ensures correctness of caching Eliminate need to send requests to origin server Specifies cacheability of responses, e.g. Can I cache this object? Specifies expiration mechanisms, e.g. When does this object become stale? Eliminate need to send full responses from origin server Specifies validation mechanisms, e.g. Is this object fresh or stale? 45
Protocol is not Policy Web cache policy is separate from protocol Sample policy questions: If cache is full, which object to evict? Should we replace a stale object that is very popular with a fresh object that might not be requested often? 46
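As one possible answer to the eviction question, a cache could throw out the least recently used object. The sketch below is only an example of a policy, not something HTTP mandates or the lecture prescribes; it assumes capacity is counted in objects rather than bytes, and the class name is made up.

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used object when the cache is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()          # oldest entry first

    def get(self, url: str):
        if url not in self._store:
            return None
        self._store.move_to_end(url)         # mark as most recently used
        return self._store[url]

    def put(self, url: str, body: bytes) -> None:
        self._store[url] = body
        self._store.move_to_end(url)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop the least recently used

cache = LRUCache(capacity=2)
cache.put("/a", b"A")
cache.put("/b", b"B")
cache.get("/a")                              # /a is now more recent than /b
cache.put("/c", b"C")                        # cache full: /b is evicted
```

Other policies (least frequently used, size-aware, popularity-aware) would plug into the same protocol without any change to HTTP itself.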
Expiration Model Server-specified expiration Uses Expires header or max-age directive in Cache-Control header Recommended Heuristic expiration Server does not specify explicit expiration times Up to the web cache implementation Freshness calculation Is a cache entry fresh? Age and Expiration calculations
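A simplified freshness check for the server-specified case might look like the following. It is a sketch under stated assumptions: only the max-age directive is considered, the age of an entry is approximated as the time since the cache stored it (the full HTTP age calculation also accounts for the Age and Date headers), and an entry without max-age is treated as stale rather than applying a heuristic. Helper names are hypothetical.

```python
def max_age(cache_control: str):
    """Return the max-age value in seconds, or None if it is not present."""
    for directive in cache_control.split(","):
        name, _eq, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(stored_at: float, now: float, cache_control: str) -> bool:
    """Fresh while the (approximate) age is below the freshness lifetime."""
    lifetime = max_age(cache_control)
    if lifetime is None:
        return False             # assumption: no explicit lifetime means validate
    age = now - stored_at        # simplification of the real age calculation
    return age < lifetime
```

For example, an entry stored at time 0 with `Cache-Control: public, max-age=60` is fresh at time 50 and stale at time 60.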
Validation Model
  If response is not fresh, need to validate with server
  Goal: don't send object if cache still has up-to-date cached version
  cache: specify date of cached copy in HTTP request
    If-modified-since: <date>
  server: response contains no object if cached copy is up-to-date:
    HTTP/1.0 304 Not Modified
  Exchange when object not modified:
    cache sends HTTP request msg with If-modified-since: <date>
    server sends HTTP response: HTTP/1.0 304 Not Modified
  Exchange when object modified:
    cache sends HTTP request msg with If-modified-since: <date>
    server sends HTTP response: HTTP/1.0 200 OK <data>
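The server's side of the conditional GET reduces to one date comparison. A minimal sketch, assuming the server knows each object's last-modified time as a timezone-aware `datetime`; `handle_get` is a hypothetical function, and the dates are example values. Date parsing and formatting use the standard `email.utils` helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def handle_get(last_modified: datetime, body: bytes, if_modified_since=None):
    """Return (status_code, body) for a GET, honoring If-Modified-Since."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            return 304, b""                  # Not Modified: no object sent
    return 200, body                         # OK: full object sent

modified = datetime(2016, 6, 22, 10, 0, 0, tzinfo=timezone.utc)
header_value = format_datetime(modified, usegmt=True)   # HTTP-style GMT date

unchanged = handle_get(modified, b"<html>...</html>", header_value)
changed = handle_get(datetime(2018, 9, 1, tzinfo=timezone.utc),
                     b"<html>new</html>", header_value)
```

The 304 reply carries no body, so validating an unchanged object costs a round trip but almost no bandwidth.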
Question Is a web proxy (cache) better for performance than the browser cache? 49
Lesson Objectives Now, you should be able to: describe the mission, scope, addressing mechanism and data types of the Application Layer explain the HTTP protocol, including message format, interaction model and connection management calculate response time for an HTTP request over nonpersistent, parallel or persistent connections, including the pipelined variant describe how web proxies work to cache HTTP responses, including how they ensure consistency 50