Outline
- Introduce socket programming
- Domain Name Service (DNS)
- Standard application-level protocols
  - email (SMTP)
  - HTTP

HyperText Transfer Protocol: Definitions
- A web page consists of a base HTML file, which can include several referenced objects
- An object can be, for example, an HTML file, a JPEG image, a Java applet, or an audio file
- Each object is addressable by a URL
- Example URL: www.someschool.edu/somedept/pic.gif
  - host name: www.someschool.edu
  - path name: /somedept/pic.gif

HTTP overview
- HTTP is the application-layer protocol for the World Wide Web
- client/server model
  - client: browser that requests, receives, and displays Web objects
  - server: Web server that sends objects in response to requests
[Figure: a Windows PC running Explorer and a Mac running Safari, each exchanging HTTP messages with a server running the Apache Web server]

HTTP overview (continued)
Uses TCP:
- client initiates TCP connection (creates socket) to server, port 80
- server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) are exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection closed
Initially, HTTP was entirely stateless:
- server maintains no information about past client requests
Aside: protocols that maintain state are complex!
- past history (state) must be maintained
- if the server or client crashes, their views of state may be inconsistent and must be reconciled

HTTP connections
Nonpersistent HTTP
- At most one object is sent over a TCP connection.
Persistent HTTP
- Multiple objects can be sent over a single TCP connection between client and server.

Nonpersistent HTTP
Suppose the user enters the URL www.someschool.edu/somedept/home.index. In time order:
1a. HTTP client initiates a TCP connection to the HTTP server (process) at www.someschool.edu on port 80
1b. HTTP server at host www.someschool.edu, waiting for TCP connections at port 80, accepts the connection, with TCP notifying the client that the connection is established
2. HTTP client sends an HTTP request message (containing the URL) using the TCP connection socket. The message indicates that the client wants the object somedept/home.index (which contains text and references to 10 JPEG images)
3. HTTP server receives the request message, forms a response message containing the requested object, and sends the message using its TCP socket

Nonpersistent HTTP (cont.)
4. HTTP server closes the TCP connection.
5. HTTP client receives the response message containing the HTML file and prepares to display its contents. Parsing the HTML file, it finds 10 referenced JPEG objects
6. Next, the HTTP client REPEATS Steps 1 through 5 for EACH of the 10 JPEG objects!

Nonpersistent HTTP: Response time
Definition of RTT: time for a small packet to travel from client to server and back.
Response time:
- one RTT to initiate the TCP connection
- one RTT for the HTTP request and the first few bytes of the HTTP response to return
- file transmission time
total = 2 RTT + transmission time
[Figure: client/server timeline showing one RTT to initiate the TCP connection, one RTT to request the file, and the time to transmit the file until it is received]
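
The response-time formula can be turned into a quick estimate for the page in the example (one base file plus 10 images). This is a minimal sketch, assuming the objects are fetched one after another with no parallel connections; the RTT and transmission time values are made-up numbers for illustration, not figures from the slides.

```python
def nonpersistent_fetch_time(rtt, transmit_times):
    """Total time when every object needs its own TCP connection.

    Each object costs 2 RTT (connection setup + request/response)
    plus the time to transmit that object.
    """
    return sum(2 * rtt + t for t in transmit_times)


# Assumed values for illustration only.
RTT = 0.100                  # seconds
TRANSMIT = 0.010             # seconds per object
objects = [TRANSMIT] * 11    # base HTML file + 10 JPEG images

total = nonpersistent_fetch_time(RTT, objects)
print(f"total fetch time: {total:.2f} s")
```

With these assumed numbers, the 22 round trips dominate the total, which is the motivation for keeping the connection open.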

Persistent HTTP
Nonpersistent HTTP issues:
- requires 2 RTTs per object
- OS overhead for each TCP connection
- Note: browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP:
- server leaves the connection open after sending a response
- subsequent HTTP messages between the same client and server are sent over the open connection
- client sends requests as soon as it encounters a referenced object
- requests for different pages are also sent on the same connection
- greatly reduces TCP overhead
- connection is closed when unused for a timeout interval

HTTP request message
- two types of HTTP messages: request, response
- HTTP request message: ASCII (human-readable format)

    GET /somedir/page.html HTTP/1.1      request line (GET, POST, HEAD commands)
    Host: www.someschool.edu             header lines
    User-agent: Mozilla/4.0              (browser type)
    Connection: close                    (non-persistent)
    Accept-language: fr                  (prefers French)
    (extra carriage return, line feed)

- Each line ends with a carriage return and line feed. An extra carriage return, line feed (a blank line) marks the end of the header lines, which for this GET is the end of the message.
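
The request format above can be assembled by hand. This is a minimal sketch; `build_request` is a hypothetical helper name, and the header values are the ones from the slide.

```python
CRLF = "\r\n"


def build_request(method, path, headers):
    """Build an HTTP/1.1 request: request line, header lines, blank line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF ends the header section.
    return CRLF.join(lines) + CRLF + CRLF


request = build_request(
    "GET",
    "/somedir/page.html",
    {
        "Host": "www.someschool.edu",
        "User-agent": "Mozilla/4.0",
        "Connection": "close",
        "Accept-language": "fr",
    },
)
print(request)
```

The resulting string would be encoded to bytes (for example with `request.encode("ascii")`) before being written to the TCP socket.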

HTTP response message

    HTTP/1.1 200 OK                         status line (protocol, status code, status phrase)
    Connection: close                       header lines
    Date: Thu, 06 Aug 1998 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 1998...
    Content-Length: 6821
    Content-Type: text/html
    (blank line)
    requested data here...                  data, e.g., requested HTML file

HTTP response status codes
The status code appears in the first line of the server->client response message. A few sample codes:
- 200 OK: request succeeded, requested object later in this message
- 301 Moved Permanently: requested object moved, new location specified later in this message (Location: field)
- 302 Moved Temporarily: requested object moved, new location specified later in this message (Location: field), but keep using the old URI
- 400 Bad Request: request message not understood by the server
- 404 Not Found: requested document not found on this server
- 500 Internal Server Error: the server encountered an unexpected condition which prevented it from fulfilling the request
- 505 HTTP Version Not Supported
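
To make the response layout concrete, here is a minimal parsing sketch that splits a response into status line, header lines, and data. `parse_response` is a hypothetical helper, and the sample message is shortened from the slide with a made-up 5-byte body; real clients handle more cases (repeated headers, chunked bodies) than this does.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, code, phrase, headers, body)."""
    # The first blank line separates the header section from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol, status code, status phrase.
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Date: Thu, 06 Aug 1998 12:00:15 GMT\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello"
)
version, code, phrase, headers, body = parse_response(sample)
print(code, phrase, headers["content-type"], body)
```

Header names are lower-cased on the way in because HTTP header names are case-insensitive.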

Example: Be the browser

    <131 black:~ >telnet www.cse.msu.edu 80
    Trying 35.9.20.103...
    Connected to at2.cse.msu.edu.
    Escape character is '^]'.
    GET /~dennisp/ HTTP/1.1
    Host: www.cse.msu.edu

    HTTP/1.1 200 OK
    Date: Mon, 22 Jan 2018 16:37:14 GMT
    Server: Apache/2.4.10 (Debian)
    Last-Modified: Tue, 07 Feb 2017 18:26:15 GMT
    ETag: "bb2-547f4e2219fc0"
    Accept-Ranges: bytes
    Content-Length: 2994
    Vary: Accept-Encoding
    Connection: close
    Content-Type: text/html

    (followed by lots of html)

User-server state: cookies
Many Web sites use cookies.
Four components:
1) Set-cookie header line of the HTTP response message
2) cookie header line in the HTTP request message
3) cookie file kept on the user's host, managed by the user's browser
4) back-end database at the Web site
Example:
- Susan always accesses the Internet from the browser on her PC
- she visits a specific e-commerce site for the first time
- when the initial HTTP request arrives at the site, the site creates:
  - a unique ID
  - an entry in the backend database for the ID

Cookies: keeping state (cont.)
[Figure: message timeline between client and Amazon server]
- client's cookie file initially contains: ebay 8734
- client sends usual http request msg; Amazon server creates ID 1678 for the user and creates an entry in the backend database
- server sends usual http response with Set-cookie: 1678; client's cookie file now contains: ebay 8734, amazon 1678
- client sends usual http request msg with cookie: 1678; server takes a cookie-specific action (accessing the backend database) and sends usual http response msg
- one week later: client sends usual http request msg with cookie: 1678; server again takes a cookie-specific action (accessing the backend database) and sends usual http response msg

Cookies (continued)
What cookies can bring:
- authorization
- shopping carts
- recommendations
- user session state
Aside: how to keep "state":
- protocol endpoints: maintain state at sender/receiver over multiple transactions
- cookies: http messages carry the combination to unlock that state
Aside: cookies and privacy:
- cookies permit sites to learn a LOT about you
- you might supply all kinds of personal information to sites, which can be stored, shared, or sold
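
The four cookie components can be imitated with plain dictionaries. This is a toy sketch, not a real server: the `Site` class, its `handle` method, and the `visits` counter are hypothetical, and the cookie is a bare ID as on the slide (real cookies are name=value pairs with attributes).

```python
import itertools


class Site:
    """Toy e-commerce site that keeps per-user state keyed by a cookie ID."""

    def __init__(self, first_id=1678):
        self._ids = itertools.count(first_id)
        self.database = {}  # backend database: ID -> state for that user

    def handle(self, request_headers):
        """Return (response_headers, state) for one HTTP request."""
        cookie = request_headers.get("Cookie")
        response_headers = {}
        if cookie is None or cookie not in self.database:
            # First visit: create a unique ID and a database entry for it.
            cookie = str(next(self._ids))
            self.database[cookie] = {"visits": 0}
            response_headers["Set-Cookie"] = cookie
        # Cookie-specific action: update this user's state.
        self.database[cookie]["visits"] += 1
        return response_headers, self.database[cookie]


amazon = Site()
cookie_file = {"ebay": "8734"}  # kept on the user's host by the browser

# First visit: no cookie for this site yet, so the response sets one.
first_response, _ = amazon.handle({})
cookie_file["amazon"] = first_response["Set-Cookie"]

# One week later: the browser sends the stored cookie back.
later_response, state = amazon.handle({"Cookie": cookie_file["amazon"]})
print(cookie_file, state)
```

HTTP itself stays stateless here: all the state lives in the cookie file and the backend database, and the messages only carry the ID that links them.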

Web caches (a.k.a. proxy servers)
Goal: satisfy the client request without involving the origin server
- user can configure the browser to discover proxies or to use specific ones
- browser sends all HTTP requests to the proxy
  - if the object is in the cache: proxy returns the object
  - else: proxy requests the object from the origin server, then returns the object to the client
[Figure: clients send HTTP requests to the proxy server, which forwards cache misses to the origin servers]

More about Web caching
- Proxy acts as both a client and a server
- Typically the proxy is installed by an ISP (university, company, residential ISP)
Why Web caching?
- reduce response time for client requests
- reduce traffic on an institution's access link
- an Internet dense with caches enables "poor" (or not so poor) content providers to deliver content more effectively
- hence, a market! (e.g., Akamai)

Caching example
Assumptions:
- average object size = 1,000,000 bits
- average request rate from the institution's browsers to origin servers = 15/sec
- delay from the institutional router to any origin server and back to the router = 2 sec
Consequences:
- utilization on LAN = 15%
- utilization on access link = 100%
- total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
[Figure: institutional network with a 100 Mbps LAN, connected by a 15 Mbps access link to the public Internet and the origin servers]

Caching example (cont.)
Possible solution:
- increase bandwidth of the access link to, say, 100 Mbps
Consequences:
- utilization on LAN = 15%
- utilization on access link = 15%
- total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs ≈ 2.01 sec
- but this is a costly upgrade
[Figure: institutional network with a 100 Mbps LAN, connected by a 100 Mbps access link to the public Internet and the origin servers]
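
The utilization figures in this example follow from offered load divided by link capacity. A minimal sketch using the slide's numbers (the helper name `utilization` is an assumption):

```python
def utilization(request_rate, object_bits, link_bps):
    """Fraction of a link's capacity used: offered load / capacity."""
    return request_rate * object_bits / link_bps


OBJECT_BITS = 1_000_000   # average object size, bits
REQUEST_RATE = 15         # requests per second

lan = utilization(REQUEST_RATE, OBJECT_BITS, 100_000_000)       # 100 Mbps LAN
access = utilization(REQUEST_RATE, OBJECT_BITS, 15_000_000)     # 15 Mbps access link
upgraded = utilization(REQUEST_RATE, OBJECT_BITS, 100_000_000)  # 100 Mbps access link

print(f"LAN: {lan:.0%}, access link: {access:.0%}, upgraded access link: {upgraded:.0%}")
```

As utilization approaches 100%, queueing delay on the link grows without bound, which is why the access delay is listed as minutes before the upgrade and milliseconds after it.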

Caching example (cont.)
Possible solution: install a cache
- suppose the hit rate is 0.4
Consequences:
- 40% of requests will be satisfied almost immediately
- 60% of requests will be satisfied by the origin server
- utilization of the access link is reduced to 60%, resulting in negligible delays (say 10 msec)
- total avg delay = Internet delay + access delay + LAN delay = 0.6 × (2.01 sec) + 0.4 × (milliseconds) < 1.4 sec
[Figure: institutional network with a 100 Mbps LAN and an institutional cache, connected by a 15 Mbps access link to the public Internet and the origin servers]

Caching: What if?
How does using HTTPS impact caching?
- HTTPS provides end-to-end encryption, including the headers
- The Web proxy is a man-in-the-middle and cannot decrypt the packets
- So, no caching at the proxy
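
The average delay with a cache is a weighted sum over hits and misses. A minimal sketch of that arithmetic; the 10 msec delay for a cache hit is an assumption standing in for the slide's "milliseconds".

```python
def average_delay(hit_rate, miss_delay, hit_delay):
    """Expected delay: misses go to the origin server, hits are served locally."""
    return (1 - hit_rate) * miss_delay + hit_rate * hit_delay


HIT_RATE = 0.4
MISS_DELAY = 2.01   # seconds: Internet delay plus small access and LAN delays
HIT_DELAY = 0.010   # seconds: assumed LAN-only delay for a cache hit

# Only misses cross the access link, so its load drops to 60% of 15 Mbps.
access_utilization = (1 - HIT_RATE) * 15 * 1_000_000 / 15_000_000

avg = average_delay(HIT_RATE, MISS_DELAY, HIT_DELAY)
print(f"access link utilization: {access_utilization:.0%}, average delay: {avg:.2f} s")
```

Under these assumptions the cache gives a lower average delay than the 100 Mbps access link upgrade (about 2.01 sec), without the cost of a faster link.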

Conditional GET
Goal: don't send the object if the cache has an up-to-date cached version
- cache: specify the date of the cached copy in the HTTP request
    If-modified-since: <date>
- server: response contains no object if the cached copy is up-to-date:
    HTTP/1.0 304 Not Modified
[Figure: two cache/server exchanges]
- object not modified: HTTP request msg with If-modified-since: <date>; HTTP response is HTTP/1.0 304 Not Modified
- object modified: HTTP request msg with If-modified-since: <date>; HTTP response is HTTP/1.0 200 OK followed by <data>

How does a server do that?
How to move a web object without changing the web page?
For Apache, add a line like the following to .htaccess:

    Redirect 301 /rediredpage.html http://www.xyz.com/newpage.html

For more information see: https://www.digitalocean.com/community/tutorials/how-to-create-temporary-and-permanent-redirects-with-apache-and-nginx
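
The server side of the conditional GET exchange comes down to one date comparison. This is a minimal sketch, assuming the server knows each object's last-modified time; `conditional_get` is a hypothetical helper and the dates are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get(last_modified, if_modified_since, body):
    """Return (status_line, data) for a GET that may carry If-modified-since."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            # Cached copy is up-to-date: send no object.
            return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", body


# Assumed modification time of the object on the server.
modified = datetime(1998, 8, 6, 12, 0, 15, tzinfo=timezone.utc)
page = b"<html>...</html>"

fresh = conditional_get(modified, "Thu, 06 Aug 1998 12:00:15 GMT", page)
stale = conditional_get(modified, "Wed, 05 Aug 1998 12:00:15 GMT", page)
plain = conditional_get(modified, None, page)
print(fresh[0], "|", stale[0], "|", plain[0])
```

The 304 response carries no object, so the cache pays one round trip but no file transmission time.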

Summary
- We have now learned a little about how applications access the Internet protocols
  - Specifically, sockets
- We discussed DNS and its operation
- A little about email: simple, and around for a long time
- A little more about HTTP
  - A very simple (human-readable) protocol that changed the world as we know it
  - But it needed an underlying reliable protocol
- Next, that amazing protocol: TCP! (and UDP)