CPSC 360 Network Programming
Applications & Application-Layer Protocols: The Web & HTTP

Michele Weigle
Department of Computer Science
Clemson University
mweigle@cs.clemson.edu
http://www.cs.clemson.edu/~mweigle/courses/cpsc360

Application-Layer Protocols: Outline
The architecture of distributed systems
  » Client/server computing
  » P2P computing
  » Hybrid (client/server and P2P) systems
The programming model used in constructing distributed systems
  » Socket programming
Example client/server systems and their application-layer protocols
  » The World-Wide Web (HTTP)
  » Reliable file transfer (FTP)
  » E-mail (SMTP & POP)
  » Internet Domain Name System (DNS)
[Figure: hosts with protocol stacks (application, transport, link, physical) attached to a local ISP, a company, and a regional ISP]
Applications and Application-Layer Protocols: Overview
Application: communicating, distributed processes
  » Running in hosts in user space
  » Exchange messages to implement the application
Application-layer protocols
  » One piece of an application
  » Define messages exchanged and actions taken
  » Use services provided by lower-layer protocols
[Figure: hosts with protocol stacks (application, transport, link, physical) attached to a local ISP, a company, and a regional ISP]

Application-Layer Protocols: The Web
User agent (client) for the Web is called a browser:
  » MS Internet Explorer
  » Mozilla Firefox
  » Apple Safari
Server for the Web is called a Web server:
  » Apache (open source)
  » MS Internet Information Server (IIS)
Application-Layer Protocols: Web terminology
Web page:
  » Addressed by a URL
  » Consists of objects
Most Web pages consist of:
  » Base HTML page
  » Embedded objects

    <html lang="en">
    <head>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
    <title>cnn.com</title>
    <meta http-equiv="refresh" content="1800; URL=http://www.cnn.com/?">
    <link rel="stylesheet" href="http://i.cnn.net/cnn/virtual/2001/style/main.css" type="text/css">
    <script language="javascript1.1" src="http://i.cnn.net/cnn/virtual/2000/code/main.js" type="text/javascript"> </script>
    <script language="javascript1.1" type="text/javascript"> </script>
    <script language="javascript1.1" src="http://ar.atwola.com/file/adswrapper.js"></script>
    <style type="text/css"></style>
    <script language="javascript">document.adoffset=0</script>
    </head>
    <body class="cnnmainbody" bgcolor="#ffffff">
    <a name="top_of_page"></a>
    :
    :

Web Terminology: URLs (Uniform Resource Locators)
www.someschool.edu:8080/somedept/pic.gif
  » www.someschool.edu is the server domain name
  » 8080 is the optional server port (default = port 80)
  » /somedept/pic.gif is the object path name
URL components
  » Server address
  » (Optional port number)
  » Path name
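The three URL components listed above can be pulled apart with Python's standard `urllib.parse`. This is only a sketch: the helper name `split_url` is made up for illustration, and it assumes the URL is an HTTP URL that may be written without its `http://` prefix, as on the slide.

```python
from urllib.parse import urlsplit


def split_url(url, default_port=80):
    """Return (server address, port, path name) for an HTTP URL."""
    if "://" not in url:
        # The slide writes the URL without a scheme; urlsplit needs one
        # to recognise the host:port part.
        url = "http://" + url
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else default_port
    path = parts.path or "/"
    return parts.hostname, port, path


print(split_url("www.someschool.edu:8080/somedept/pic.gif"))
print(split_url("www.someschool.edu/somedept/home.index"))
```

When no port is given, the helper falls back to port 80, matching the default noted on the slide.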
Web Terminology: The Hypertext Transfer Protocol (HTTP)
The Web's application-layer protocol
Client/server model
  » Client: browser that requests, receives, and displays Web objects
  » Server: Web server that sends objects in response to requests
HTTP/1.0: RFC 1945
HTTP/1.1: RFC 2616
[Figure: PC running Firefox and Mac running Safari as clients of a server running Apache]

The Hypertext Transfer Protocol: HTTP Overview
HTTP uses TCP sockets
  » Browser initiates TCP connection to server (on port 80)
HTTP messages (application-layer protocol messages) are exchanged between browser and Web server
HTTP/1.0: RFC 1945
  » One request/response interaction per connection
HTTP/1.1: RFC 2616
  » Persistent connections
  » Pipelined connections
HTTP is stateless
  » Server maintains no information about past browser requests
Aside: protocols that maintain state are complex!
  » Past history (state) must be maintained
  » If server or client crashes, their views of state may be inconsistent and must be reconciled
The Hypertext Transfer Protocol: HTTP example
User enters URL www.someschool.edu/somedept/home.index
  » Referenced object contains HTML text and references 10 JPEG images
Browser sends an HTTP GET request to the server www.someschool.edu
Server will retrieve and send the HTML file
Browser will read the file and sequentially make 10 separate requests for the embedded JPEG images
[Figure: browser and Web server exchanging requests and responses 1 through 11]

HTTP 1.0 Example
URL: www.someschool.edu/somedept/home.index
0) Server process at host www.someschool.edu is waiting for TCP connections on port 80
1) Browser (client) initiates TCP connection to server at www.someschool.edu (TCP 3-way handshake). Port 80 is the well-known port for Web servers
2) Server accepts connection
3) Client writes an HTTP GET request message (containing path) to TCP connection socket
4) Server reads request message, forms response message containing requested object, writes message to socket
5) Server closes TCP connection
HTTP 1.0 Example (continued)
URL: www.someschool.edu/somedept/home.index
6) Browser reads response message containing the HTML file. Ten references to JPEG objects are found during the HTML parse
7) Browser initiates TCP connection to server at www.someschool.edu (TCP 3-way handshake)
8) Server accepts connection
The above steps are repeated for each of the 10 JPEG objects

The Hypertext Transfer Protocol: HTTP message format
Two types of HTTP messages: request and response
  » ASCII (human-readable format)
Request message:
  » Request line
  » Optional header lines
  » Entity body, present only for some methods (e.g., POST)

    method <SP> path <SP> version <CR><LF>
    header field name: value <CR><LF>
    header field name: value <CR><LF>
    <CR><LF>
    entity body
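The request layout above (request line, header lines, blank line, optional entity body) can be assembled byte for byte. The sketch below is illustrative only: `build_request` is a made-up helper, not part of any library, and the header it sends is just an example.

```python
def build_request(method, path, version="HTTP/1.0", headers=None, body=b""):
    """Assemble an HTTP request message as bytes."""
    lines = [f"{method} {path} {version}"]          # request line
    for name, value in (headers or {}).items():     # optional header lines
        lines.append(f"{name}: {value}")
    # Every line ends in <CR><LF>; an empty line ends the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body              # entity body may be empty


request = build_request(
    "GET",
    "/somedept/home.index",
    headers={"Host": "www.someschool.edu"},
)
print(request)
```

A GET carries no entity body, so the message ends right after the blank line.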
HTTP Message Format: Netscape Navigator & MS Explorer request examples
How does each browser request http://www.cs.clemson.edu:8080/~mweigle/ ?

Netscape Navigator:
    GET /~mweigle/ HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.74 [en] (WinNT; U)
    Host: www.cs.clemson.edu:8080
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    Cookie: SITESERVER=ID=8a064b7855a043146e45991174a3d970

MS Internet Explorer:
    GET /~mweigle/ HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)
    Host: www.cs.clemson.edu:8080
    Connection: Keep-Alive
HTTP Message Format: General response message format
Response messages
  » ASCII (human-readable format)
Message structure:
  » Status line
  » Optional header lines
  » Entity body: requested object, error message, etc.

    version <SP> code <SP> phrase <CR><LF>
    header field name: value <CR><LF>
    header field name: value <CR><LF>
    <CR><LF>
    entity body

HTTP Message Format: Telnet example
Connect to HTTP server port:
    > telnet www.cs.clemson.edu 80
Telnet output:
    Trying 130.127.48.92...
    Connected to www.cs.clemson.edu.
    Escape character is '^]'.
Type GET command plus blank line:
    GET /~mweigle/foo.txt HTTP/1.0

Status line:
    HTTP/1.1 200 OK
Headers plus blank line:
    Date: Mon, 06 Sep 2004 19:22:18 GMT
    Server: Apache/1.3.31 (Unix)
    Last-Modified: Mon, 30 Aug 2004 20:35:29 GMT
    ETag: "2fa6-76c-41338f91"
    Accept-Ranges: bytes
    Content-Length: 95
    Connection: close
    Content-Type: text/plain

Object content:
    ** This test file is stored in the UNIX
    ** file system at
    ** /home/mweigle/public_html/foo.txt
Telnet output:
    Connection closed by foreign host.
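A response like the one in the Telnet session can be split into status line, headers, and entity body by following the layout above. This is a rough sketch, not a full HTTP parser: `parse_response` is a hypothetical helper, the sample bytes are a shortened made-up response, and it assumes the whole message is already in memory.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, phrase, headers, body)."""
    # The first blank line (<CR><LF><CR><LF>) separates headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version <SP> code <SP> phrase (the phrase may contain spaces).
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello"
)
version, code, phrase, headers, body = parse_response(sample)
print(version, code, phrase, headers["content-type"], body)
```

Header names are lower-cased on the way in because HTTP header field names are case-insensitive.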
HTTP Message Format: Telnet example (2)
Connect to HTTP server port:
    > telnet www.msn.com 80
Telnet output:
    Trying 207.46.179.134...
    Connected to www.msn.com.
    Escape character is '^]'.
Type GET command plus blank line:
    GET /~index.html HTTP/1.0

Status line:
    HTTP/1.1 404 Object Not Found
Headers plus blank line:
    Server: Microsoft-IIS/5.0
    Date: Mon, 11 Feb 2002 18:33:15 GMT
    Content-Length: 1638
    Content-Type: text/html

Object content:
    <HTML> <HEAD>.......
    Error type 404 - Object Not Found
    </body> </html>
Telnet output:
    Connection closed by foreign host.

HTTP Message Format: Status codes
Sample response codes:
200 OK
  » Request succeeded, requested object later in this message
301 Moved Permanently
  » Requested object moved, new location specified later in this message (Location:)
400 Bad Request
  » Request message not understood by server
404 Not Found
  » Requested document not found on this server
505 HTTP Version Not Supported
HTTP Message Format: Typical Request and Response Headers
Request headers:
    Connection: Keep-Alive
    User-Agent: Mozilla/4.74 [en] (WinNT; U)
    Host: www.cs.clemson.edu:8080
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    Cookie: SITESERVER=ID=8a064b785a043146e4599174a3d970
Response headers:
    Date: Fri, 02 Feb 2001 19:10:11 GMT
    Server: Apache/1.3.9 (Unix) (Red Hat/Linux)
    Last-Modified: Tue, 30 Jan 2001 21:48:14 GMT
    ETag: "1807135e-67-3a77369e"
    Accept-Ranges: bytes
    Content-Length: 103
    Connection: close
    Content-Type: text/plain

HTTP Protocol Design: Non-persistent connections
The default browser/server behavior in HTTP/1.0 is for the connection to be closed after the completion of the request
  » Server parses request, responds, and closes TCP connection
  » The Connection: Keep-Alive header allows for persistent connections
With non-persistent connections at least 2 RTTs are required to fetch every object
  » 1 RTT for TCP handshake
  » 1 RTT for request/response
[Figure: browser and Web server timeline showing TCP connection establishment followed by the request and response]
Non-Persistent Connections: Performance
[Figure: path from host A to host B showing the four delay components: transmission, propagation, nodal processing, queueing]
With non-persistent connections at least 2 RTTs are required to fetch every object
  » 1 RTT for TCP handshake
  » 1 RTT for request/response
Example: a 1 Kbyte base page with five 1.5 Kbyte embedded images coming from the West coast on an OC-48 link
  » 1 RTT for TCP handshake = 50 ms
  » 1 RTT for request/response = 50 ms
Page download time with non-persistent connections?
Page download time with a persistent connection?
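One way to work the example above is to count round trips only. The sketch assumes that transmission time is negligible (the objects are tiny compared with OC-48 capacity), that objects are fetched one at a time, and that the persistent connection is used without pipelining; the slide leaves the answers open, so treat the numbers as one reading rather than the official answer. The function names are made up.

```python
RTT_MS = 50            # round-trip time from the example
NUM_OBJECTS = 1 + 5    # base page plus five embedded images


def non_persistent_ms(num_objects, rtt_ms):
    """Each object costs a TCP handshake (1 RTT) plus request/response (1 RTT)."""
    return num_objects * 2 * rtt_ms


def persistent_ms(num_objects, rtt_ms):
    """One handshake for the connection, then 1 RTT per object, no pipelining."""
    return rtt_ms + num_objects * rtt_ms


print(non_persistent_ms(NUM_OBJECTS, RTT_MS))  # 600
print(persistent_ms(NUM_OBJECTS, RTT_MS))      # 350
```

Under these assumptions the persistent connection saves one handshake RTT for every object after the first.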
Non-Persistent Connections: Parallel connections
To improve performance a browser can issue multiple requests in parallel to a server (or servers)
  » Server parses request, responds, and closes TCP connection
[Figure: browser opening TCP connections to two Web servers at the same time, each with its own TCP connection establishment]
Page download time with parallel connections?
  » 2 parallel connections =
  » 4 parallel connections =

HTTP Protocol Design: Persistent v. non-persistent connections
Non-persistent
  » HTTP/1.0
  » Server parses request, responds, and closes TCP connection
  » At least 2 RTTs to fetch every object
Persistent
  » Default for HTTP/1.1 (negotiable in 1.0)
  » Client sends requests for multiple objects on one TCP connection
  » Server parses request, responds, parses next request, responds...
  » Fewer RTTs
Parallel v. persistent connections?
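The parallel-connection question can be estimated the same way, reusing the figures from the earlier example (five embedded images, 50 ms RTT). The sketch assumes non-persistent connections, negligible transmission time, and that the base page must arrive before any image is requested; `parallel_ms` is a hypothetical helper and the results are one possible answer, not the slide's.

```python
import math


def parallel_ms(num_embedded, connections, rtt_ms=50):
    """Download time with non-persistent connections opened in parallel batches."""
    per_fetch = 2 * rtt_ms                # handshake + request/response
    base_page = per_fetch                 # the base page is fetched first, alone
    rounds = math.ceil(num_embedded / connections)
    return base_page + rounds * per_fetch


print(parallel_ms(5, 2))  # 400
print(parallel_ms(5, 4))  # 300
```

For comparison, a single persistent connection without pipelining needs 350 ms for the same page under these assumptions, so parallelism and persistence end up in the same range here.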
Persistent Connections: Persistent connections with pipelining
Persistent without pipelining:
  » Client issues new request only when previous response has been received
  » One RTT for each referenced object
Persistent with pipelining:
  » Default in HTTP/1.1
  » Client sends requests as soon as it encounters a referenced object
  » As little as one RTT for all the referenced objects

Persistent Connections: Without Pipelining
[Figure: client/server timeline. The client requests the base page and receives it, then requests and receives the 1st embedded object, then the 2nd embedded object, one at a time]
Persistent Connections: With Pipelining
[Figure: client/server timeline. The client requests the base page, then sends the requests for the 1st and 2nd embedded objects back to back; the responses follow in order]

HTTP User-Server Interaction: Authentication
Problem: how to limit access to server documents?
  » Servers provide a means to require users to authenticate themselves
HTTP includes a header line (Authorization:) for the user to specify name and password (on a GET request)
  » If no authorization is presented, server refuses access and sends a WWW-Authenticate: header line in the response
Stateless: client must send authorization for each request
  » A stateless design
  » (But browser may cache credentials)
[Figure: client/server timeline. The client sends its usual request; the server answers 401 with WWW-Authenticate:; the client repeats the request with Authorization:; the server sends its usual response; later requests carry Authorization: again]
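As an illustration of the Authorization: header line, the sketch below builds it for the HTTP Basic scheme, where the name and password are joined with a colon and base64-encoded. The slide does not say which scheme is meant, so Basic is an assumption here, and `basic_authorization` is a made-up helper.

```python
import base64


def basic_authorization(user, password):
    """Return the Authorization header line for the HTTP Basic scheme."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


# Because HTTP is stateless, the client repeats this line on every request.
print(basic_authorization("Aladdin", "open sesame"))
```

Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can see the request.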
HTTP User-Server Interaction: Cookies
Server sends cookie to browser in response message
    Set-cookie: <value>
Browser presents cookie in later requests to same server
    cookie: <value>
Server matches cookie with server-stored information
  » Provides authentication
  » Client-side state maintenance (remembering user preferences, previous choices, ...)
[Figure: client/server timeline. The server's response adds Set-cookie: S1; the client's later requests carry cookie: S1 and the server takes a cookie-specific action; a later response may set a new cookie with Set-cookie: S2]

HTTP User-Server Interaction: Browser caches
Browsers cache content from servers to avoid future server interactions to retrieve the same content
[Figure: browser with disk cache. A hit is served from the cache; a miss goes across the Internet to the origin server]
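The cookie exchange described above (server sets a cookie, browser returns it on later requests) can be sketched with Python's standard `http.cookies` module. The cookie name `session` and the `Path=/` attribute are made up for illustration; only the value S1 comes from the slide's figure.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_value):
    """Turn the value of a Set-Cookie header into the Cookie line a browser sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)   # parses the name=value pair and its attributes
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "Cookie: " + pairs


# Server response carried:  Set-Cookie: session=S1; Path=/
print(cookie_header("session=S1; Path=/"))
```

Only the name=value pair goes back to the server; attributes such as Path stay on the browser side.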
HTTP User-Server Interaction: The conditional GET
If the object in the browser cache is fresh, the server won't re-send it
  » Browsers save current date along with object in cache
Client specifies the date of cached copy in HTTP request
    If-modified-since: <date>
Server's response contains the object only if it has been changed since the cached date
Otherwise server returns:
    HTTP/1.0 304 Not Modified
[Figure: client/server timeline. Object not modified: request with If-modified-since: <date>, response HTTP/1.0 304 Not Modified. Object modified: request with If-modified-since: <date>, response HTTP/1.0 200 OK <data>]

HTTP User-Server Interaction: Cache Performance for HTTP Requests
[Figure: browser with disk cache. A cache hit is served locally; a cache miss crosses the network to the origin server]
What is the average time to retrieve a web object?
  » T_mean = hit ratio x T_cache + (1 - hit ratio) x T_server
    where hit ratio is the fraction of objects found in the cache
  » Mean access time from a disk cache = 10 ms
  » Mean access time from the origin server = 1,000 ms
For a 60% hit ratio, the mean client access time is:
  » (0.6 x 10 ms) + (0.4 x 1,000 ms) = 406 ms
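The mean access time formula above is easy to check numerically. A minimal sketch, using the slide's figures (60% hit ratio, 10 ms cache, 1,000 ms origin server); the function name is an invention for the example.

```python
def mean_access_time_ms(hit_ratio, t_cache_ms, t_server_ms):
    """T_mean = hit ratio x T_cache + (1 - hit ratio) x T_server."""
    return hit_ratio * t_cache_ms + (1 - hit_ratio) * t_server_ms


t_mean = mean_access_time_ms(0.6, 10, 1000)
print(f"{t_mean:.0f} ms")  # 406 ms
```

Because the miss term dominates, small changes in hit ratio move the mean a lot: each extra percentage point of hits saves about 10 ms here.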
Cache Performance for HTTP Requests: What determines the hit ratio?
Cache size
Locality of references
  » How often the same web object is requested
How long objects remain fresh (unchanged)
Object references that can't be cached at all
  » Dynamically generated content
  » Protected content
  » Content purchased for each use
  » Content that must always be up-to-date
  » Advertisements (pay-per-click issues)

Caching on the Web: Web caches (Proxy servers)
Web caches are used to satisfy client requests without contacting the origin server
Users configure browsers to send all requests through a shared proxy server
  » Proxy server is a large cache of web objects
Browsers send all HTTP requests to proxy
  » If object is in cache, proxy returns object in HTTP response
  » Else proxy requests object from origin server, then returns it in an HTTP response to the browser
[Figure: clients sending requests to a proxy server. A hit is answered by the proxy; a miss is forwarded to the origin server]
Open research question: how does the proxy hit ratio change with the population of users sharing it?
Why do Proxy Caching? The performance implications of caching
Consider a cache that is close to the client
  » E.g., on the same LAN
Nearby caches mean:
  » Smaller response times
  » Decreased traffic on egress link to institutional ISP (often the primary bottleneck)
To improve Web response times, should one buy a 10 Mbps access link or a proxy server?
[Figure: campus with a 10 Mbps LAN and a proxy server, connected by a 1.5 Mbps access link to the public Internet and the origin servers]

Why do Proxy Caching? The performance implications of caching (continued)
Web performance without caching:
  » Mean object size = 50 Kbits
  » Mean request rate = 29/sec
  » Mean origin server access time = 1 sec
  » Average response time = ??
Traffic intensity on the access link:
    (29 reqs/sec x 50 Kbits/req) / 1.5 Mbps = 0.97
[Figure: campus with a 10 Mbps LAN, connected by a 1.5 Mbps access link to the public Internet; origin servers are 1000 ms away]
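The traffic intensity figure above follows directly from the request rate, object size, and link speed. A small sketch with the slide's numbers, taking 1 Kbit = 1,000 bits and 1 Mbps = 1,000,000 bits/sec (the function name is hypothetical):

```python
def traffic_intensity(req_per_sec, bits_per_req, link_bps):
    """Offered load divided by link capacity (dimensionless)."""
    return req_per_sec * bits_per_req / link_bps


access_link = traffic_intensity(29, 50_000, 1_500_000)   # 1.5 Mbps access link
campus_lan = traffic_intensity(29, 50_000, 10_000_000)   # 10 Mbps LAN

print(f"access link: {access_link:.2f}")  # access link: 0.97
print(f"campus LAN:  {campus_lan:.3f}")   # campus LAN:  0.145
```

An intensity close to 1 means the access link is nearly saturated, so queueing delay there becomes large, while the LAN is lightly loaded.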
Why do Proxy Caching? The performance implications of caching (continued)
Upgrade the access link to 10 Mb/s
  » Response time = ??
  » Queuing is negligible, hence response time = 1 sec
Add a proxy cache with 40% hit ratio and 10 ms access time
  » Response time = ??
  » Traffic intensity on access link = 0.6 x 0.97 = 0.58
  » Response time = 0.4 x 10 ms + 0.6 x 1,000 ms = 604 ms
A proxy cache lowers response time, lowers access link utilization, and saves money!
[Figure: campus with a 10 Mbps LAN, connected by a 1.5 Mbps access link to the public Internet; origin servers are 1000 ms away]

Why do Proxy Caching? The case for proxy caching
Lower latency for users' web requests
Reduced traffic at all network levels
Reduced load on servers
Some level of fault tolerance (network, servers)
Reduced costs to ISPs, content providers, etc., as web usage continues to grow exponentially
More rapid distribution of content
[Figure: campus with a 10 Mbps LAN and a proxy server, connected by a 1.5 Mbps access link to the public Internet and the origin servers]
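The two options from the performance slides can be compared side by side. The sketch uses the slides' figures (29 requests/sec, 50 Kbit objects, 1,000 ms origin access, 40% proxy hit ratio, 10 ms proxy access) and, like the slides, ignores queueing delay once the link is no longer saturated. All names are invented for the example.

```python
def traffic_intensity(req_per_sec, bits_per_req, link_bps):
    """Offered load divided by link capacity."""
    return req_per_sec * bits_per_req / link_bps


def proxy_response_ms(hit_ratio, t_proxy_ms, t_origin_ms):
    """Mean response time when a fraction hit_ratio is served by the proxy."""
    return hit_ratio * t_proxy_ms + (1 - hit_ratio) * t_origin_ms


# Option 1: upgrade the access link from 1.5 Mbps to 10 Mbps.
upgraded_intensity = traffic_intensity(29, 50_000, 10_000_000)
upgraded_response_ms = 1000          # every request still goes to the origin

# Option 2: keep the 1.5 Mbps link and add a proxy cache.
base_intensity = traffic_intensity(29, 50_000, 1_500_000)
proxy_intensity = (1 - 0.4) * base_intensity   # only misses cross the link
proxy_ms = proxy_response_ms(0.4, 10, 1000)

print(f"upgrade: intensity {upgraded_intensity:.3f}, response {upgraded_response_ms} ms")
print(f"proxy:   intensity {proxy_intensity:.2f}, response {proxy_ms:.0f} ms")
```

With these numbers the proxy gives the lower mean response time (604 ms against 1,000 ms) even though the upgraded link has the lower utilization.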