CS 3640: Introduction to Networks and Their Applications
Fall 2018, Lecture 19: Application Layer III
(Credit: Prof. Phillipa Gill @ University of Massachusetts)
Instructor: Rishab Nithyanand
Teaching Assistant: Md. Kowsar Hossain
You should:
- Be ready for Assignment 4: Scraping the Web.
  - Releasing tonight. I'll post on Piazza when it's available.
  - More freedom to design your code!
- Collect your graded mid-term exams after class or during office hours.
- Know and understand:
  - The three Internet design principles and components of the Internet.
  - Circuit- vs. packet-switched networks.
  - Components of end-to-end delay.
  - The link layer: error detection, MAC, local addressing/routing.
  - The network layer: addressing, fragmentation, IPv4 vs. IPv6, ASes, interdomain and intradomain routing.
  - The transport layer: core functionality, TCP vs. UDP, flow control vs. congestion control, TCP fast retransmit and recovery.
  - Network Address Translation: Why do we need it? How does it work? How do two NATed devices communicate?
This week in class
1. Mid-term exam
2. Domain Name System (DNS)
3. The Web and HTTP
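Domain naming hierarchy
[Figure: the DNS naming tree. The root (ICANN) is at the top. Below it are top-level domains such as net, edu (Verisign), com, gov, mil, org, uk, fr, etc. Under edu are uiowa and mit, and under uiowa is cs. Hosts such as www, login and mail sit at the leaves.]
The tree is divided into zones.
- Each zone has an administrator, who is responsible for that part of the hierarchy.
- Example: CS controls *.cs.uiowa.edu, UIowa controls *.uiowa.edu, Verisign controls *.edu, and ICANN controls the root (*).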
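Each zone administers every name beneath its node in the tree. Finding the zone responsible for a name is therefore a longest-suffix match on labels. Below is a minimal sketch in Python. The ZONES table is a hypothetical stand-in built from the administrators listed in the example, with the root written as the empty string.

```python
# Hypothetical zone table (zone suffix -> administrator), built from the example above.
# The root zone is represented by the empty string.
ZONES = {
    "": "ICANN",
    "edu": "Verisign",
    "uiowa.edu": "UIowa",
    "cs.uiowa.edu": "CS",
}

def responsible_zone(name: str) -> tuple:
    """Return (zone, administrator) for the most specific zone containing name."""
    labels = name.rstrip(".").lower().split(".")
    # Try suffixes from the longest (the full name) down to the shortest (the root "").
    for i in range(len(labels) + 1):
        suffix = ".".join(labels[i:])
        if suffix in ZONES:
            return suffix, ZONES[suffix]
    raise KeyError(name)  # unreachable while the root "" is in ZONES
```

For example, `responsible_zone("www.cs.uiowa.edu")` picks the CS zone and `responsible_zone("www.mit.edu")` falls back to the edu zone. A name under a zone missing from this toy table, such as www.google.com, falls all the way back to the root.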
DNS servers
Functions of each DNS server:
- Authority over a portion of the hierarchy
  - No need to store all DNS names
  - Store all the records for hosts/domains in its zone
  - May be replicated for robustness
- Know the addresses of the root servers
  - Resolve queries for unknown names
Root servers know about all TLDs.
- The buck stops at the root servers.
DNS servers: Local nameservers and authoritative nameservers
- The local nameserver handles queries on behalf of clients.
- Authoritative nameservers know the zone mappings for a subset of the hierarchy.
[Figure: a client asks its local nameserver, toutatis.cs.uiowa.edu, "Where is www.google.com?". The local nameserver then queries the root nameserver, the com nameserver (authority for *.com) and ns1.google.com (authority for *.google.com).]
Basic domain name resolution
Every host knows a local DNS server.
- It sends all queries to the local DNS server.
If the local DNS server can answer the query, then you're done.
- Authoritative response: the local server is also the authoritative server for the queried name.
- Non-authoritative response: the local server has cached the record for the queried name.
Otherwise, go down the hierarchy and search for the authoritative name server.
- Every local DNS server knows the root servers.
- Use the cache to skip steps if possible, e.g., skip the root and go directly to the .edu server if the root zone file is cached.
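The local server's decision order can be sketched as follows:
1. Answer from its own zone (authoritative).
2. Otherwise answer from its cache (non-authoritative).
3. Otherwise start the downward search at the closest ancestor zone whose nameserver is cached, falling back to the root.

This is a minimal Python sketch. All zone data, cached entries and server names are hypothetical, and the addresses come from documentation ranges.

```python
# Hypothetical state of one local DNS server (documentation IPs, .example server names).
OWN_ZONE = {"www.cs.uiowa.edu": "192.0.2.10"}        # names this server is authoritative for
CACHED_ANSWERS = {"www.mit.edu": "192.0.2.20"}       # previously resolved names
CACHED_NS = {"edu": "a.edu-servers.example"}         # cached nameservers for whole zones
ROOT_NS = "a.root-servers.example"                   # every local server knows the root

def local_lookup(name):
    """Return ('authoritative' | 'non-authoritative', ip) or ('referral', next server)."""
    if name in OWN_ZONE:
        return "authoritative", OWN_ZONE[name]
    if name in CACHED_ANSWERS:
        return "non-authoritative", CACHED_ANSWERS[name]
    labels = name.split(".")
    # Skip steps: find the closest ancestor zone whose nameserver is cached.
    for i in range(1, len(labels)):
        zone = ".".join(labels[i:])
        if zone in CACHED_NS:
            return "referral", CACHED_NS[zone]
    return "referral", ROOT_NS  # nothing useful cached: start at the root
```

With the cached edu entry, a lookup for login.uiowa.edu skips the root and starts at the edu server.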
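DNS packets
DNS primarily runs over UDP on port 53.
- No TCP means no connections.
- Transaction IDs (TxIDs) are therefore needed to correlate requests and responses.
- The TxID also serves as a (weak) form of authentication for responses.
Header layout (each row is 32 bits wide; every named field is 16 bits):
bit 0                          16                           32
+------------------------------+------------------------------+
| TxID                         | Flags                        |
| Question Count               | Answer Count                 |
| Authority Count              | Additional Record Count      |
+------------------------------+------------------------------+
| Question and answer data (Resource Records, variable length) |
+-------------------------------------------------------------+
- TxID: ID number used to match requests and responses.
- Flags: Query or response? Authoritative or non-authoritative response? Success or failure?
- Counts: How many records of each type are in the response payload?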
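The fixed 12-byte header above consists of six 16-bit big-endian fields. That makes it easy to build and inspect with Python's standard struct module. The sketch below builds a one-question query for an A record and unpacks the header again. Two choices here are illustrative assumptions, not taken from the slides: the TxID value, and setting only the RD ("recursion desired") flag bit.

```python
import struct

# Six unsigned 16-bit fields in network byte order:
# TxID, Flags, Question, Answer, Authority, Additional counts.
HEADER = struct.Struct("!6H")

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels followed by a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(txid: int, name: str, qtype: int = 1) -> bytes:
    """Build a query with one question (qtype 1 = A, class 1 = IN)."""
    header = HEADER.pack(txid, 0x0100, 1, 0, 0, 0)  # flags: only RD set; one question
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    return header + question

def parse_header(packet: bytes) -> dict:
    """Decode the header fields discussed on the slide."""
    txid, flags, qd, an, ns, ar = HEADER.unpack_from(packet)
    return {
        "txid": txid,
        "is_response": bool(flags & 0x8000),    # QR bit: query (0) or response (1)
        "authoritative": bool(flags & 0x0400),  # AA bit
        "rcode": flags & 0x000F,                # 0 means success (NOERROR)
        "counts": (qd, an, ns, ar),
    }
```

To try this against a real resolver, send the bytes in a UDP datagram to port 53, then check that the reply carries the same TxID. That step is left out here because it needs network access.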
Discuss: Why should this process be iterative?
- Fate sharing
Iterative DNS query example: a client asks its local server, asgard.ccs.neu.edu, "Where is www.google.com?"
1. asgard -> Root: TxID 12345; Q: 1, A: 0, Auth: 0, Addl: 0; Q: Where is www.google.com?
   Root -> asgard: TxID 12345; Q: 1, A: 0, Auth: 1, Addl: 1; Auth: NS a.gtld-server.com; Addl: A a.gtld-server.com 12.56.10.1
2. asgard -> a.gtld-server.com: TxID 12346; same question
   a.gtld-server.com -> asgard: TxID 12346; Q: 1, A: 0, Auth: 1, Addl: 1; Auth: NS ns1.google.com; Addl: A ns1.google.com 8.8.0.1
3. asgard -> ns1.google.com: TxID 12347; same question
   ns1.google.com -> asgard: TxID 12347; Q: 1, A: 1, Auth: 0, Addl: 0; A www.google.com 182.0.7.34
4. asgard returns the answer to the client.
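The iterative walk in the example can be replayed in a few lines. This minimal Python sketch models each server's reply as a hypothetical dictionary entry copied from the example. A reply is either an answer record, or an NS referral plus a glue A record. The resolver follows referrals from the root and numbers its queries with consecutive TxIDs. Real resolvers also handle caching, timeouts and referrals without glue, which this sketch omits.

```python
# Replies copied from the example: ("A", ip) is an answer;
# ("NS", server_name, server_ip) is a referral carrying a glue A record.
SERVERS = {
    "root":       {"www.google.com": ("NS", "a.gtld-server.com", "12.56.10.1")},
    "12.56.10.1": {"www.google.com": ("NS", "ns1.google.com", "8.8.0.1")},
    "8.8.0.1":    {"www.google.com": ("A", "182.0.7.34")},
}

def resolve_iteratively(name, start="root", first_txid=12345, max_steps=10):
    """Follow referrals until an A record arrives; log (txid, server) per query."""
    server, txid, log = start, first_txid, []
    for _ in range(max_steps):
        log.append((txid, server))
        reply = SERVERS[server][name]
        if reply[0] == "A":
            return reply[1], log
        _, _ns_name, ns_ip = reply       # referral: ask the named server next
        server, txid = ns_ip, txid + 1   # new query, new transaction ID
    raise RuntimeError("too many referrals")
```

Each hop is a separate query with its own TxID, matching 12345, 12346 and 12347 in the example.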
Example: dig output. Callouts on the slide mark the header info from the response (HEADER and flags lines), the original question (QUESTION SECTION), the answer(s) (ANSWER SECTION) and the authority information (AUTHORITY SECTION).
[cbw@ativ9 ~] dig google.com
; <<>> DiG 9.9.5-3ubuntu0.1-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39348
;; flags: qr rd ra; QUERY: 1, ANSWER: 16, AUTHORITY: 4, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 161 IN A 4.53.56.93
google.com. 161 IN A 4.53.56.94
google.com. 161 IN A 4.53.56.104
google.com. 161 IN A 4.53.56.109
google.com. 161 IN A 4.53.56.99
google.com. 161 IN A 4.53.56.113
;; AUTHORITY SECTION:
google.com. 156797 IN NS ns2.google.com.
google.com. 156797 IN NS ns1.google.com.
;; ADDITIONAL SECTION:
ns2.google.com. 330052 IN A 216.239.34.10
ns1.google.com. 330052 IN A 216.239.32.10
(Excerpt: the header reports 16 answer, 4 authority and 5 additional records, but the slide shows only some of them.)
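Each record line in dig's output lists five fields separated by whitespace: name, TTL, class, type and value. This minimal Python sketch parses lines like those shown above into tuples. It skips dig's ";" comment lines. It also assumes values contain no spaces, which holds for A and NS records but not for every record type.

```python
# A few record lines copied from the dig output above.
DIG_LINES = """\
;; ANSWER SECTION:
google.com.  161     IN  A   4.53.56.93
google.com.  161     IN  A   4.53.56.94
;; AUTHORITY SECTION:
google.com.  156797  IN  NS  ns2.google.com.
"""

def parse_records(text):
    """Return (name, ttl, class, type, value) tuples, skipping blanks and comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # dig marks comments and headings with ';'
            continue
        name, ttl, rclass, rtype, value = line.split(None, 4)
        records.append((name, int(ttl), rclass, rtype, value))
    return records
```

The TTL column (161 seconds for the A records above) is how long a resolver may cache each record.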
DNS Queries and Resource Records
DNS queries have two fields: name and type.
A resource record is the response to a query.
- Four fields: (name, value, type, TTL)
- There may be multiple records returned for one query.
What do the name and value mean?
- It depends on the type of query and response.
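A query is a (name, type) pair, and the response may be several (name, value, type, TTL) records. A record store therefore maps naturally onto a dictionary from (name, type) to a list of records. This is a minimal Python sketch. The ResourceRecord class and helper names are hypothetical, and the sample values come from the dig output in these slides.

```python
from collections import defaultdict
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    name: str
    value: str
    rtype: str   # the record's "type" field (A, NS, ...), renamed to avoid shadowing type()
    ttl: int     # seconds the record may be cached

# Hypothetical store: (name, type) -> list of records, since one query may match several.
STORE = defaultdict(list)

def add(rr: ResourceRecord) -> None:
    STORE[(rr.name, rr.rtype)].append(rr)

def query(name: str, rtype: str) -> list:
    """Return every record matching the query's two fields (name, type)."""
    return list(STORE.get((name, rtype), []))

add(ResourceRecord("google.com", "4.53.56.93", "A", 161))
add(ResourceRecord("google.com", "4.53.56.94", "A", 161))
add(ResourceRecord("google.com", "ns1.google.com", "NS", 156797))
```

Here `query("google.com", "A")` returns both A records, and a query for a type with no records returns an empty list.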
DNS Types
Type = A / AAAA
- Name = domain name
- Value = IP address
- A is IPv4, AAAA is IPv6
- Query: Name: www.ccs.neu.edu, Type: A
- Resp.: Name: www.ccs.neu.edu, Value: 129.10.116.81
Type = NS
- Name = partial domain
- Value = name of DNS server for this domain
- "Go send your query to this other server"
- Query: Name: ccs.neu.edu, Type: NS
- Resp.: Name: ccs.neu.edu, Value: 129.10.116.51
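An A record carries an IPv4 address and an AAAA record carries an IPv6 address. The right address-record type for a value can therefore be checked with Python's standard ipaddress module. This is a minimal sketch. The IPv6 address in the usage line comes from the 2001:db8::/32 documentation range, not from a real host.

```python
import ipaddress

def address_record_type(value: str) -> str:
    """Return 'A' for an IPv4 address or 'AAAA' for IPv6; raise ValueError otherwise."""
    ip = ipaddress.ip_address(value)  # raises ValueError for hostnames and garbage
    return "A" if ip.version == 4 else "AAAA"

print(address_record_type("129.10.116.81"))  # A
print(address_record_type("2001:db8::1"))    # AAAA
```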
DNS Types
Type = CNAME
- Name = hostname
- Value = canonical hostname
- Useful for aliasing; CDNs use this
- Query: Name: foo.mysite.com, Type: CNAME
- Resp.: Name: foo.mysite.com, Value: bar.mysite.com
Type = MX
- Name = domain in email address
- Value = canonical name of mail server
- Query: Name: ccs.neu.edu, Type: MX
- Resp.: Name: ccs.neu.edu, Value: amber.ccs.neu.edu
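When a resolver receives a CNAME, it restarts the lookup with the canonical name until it reaches an address record. This minimal Python sketch works over a hypothetical record table that reuses the slide's names. The A record for bar.mysite.com is an invented documentation address. The sketch also looks up the mail server for an MX query. Real MX records additionally carry a preference number, which is omitted here.

```python
# Hypothetical records: (name, type) -> value. The A address is a documentation IP.
RECORDS = {
    ("foo.mysite.com", "CNAME"): "bar.mysite.com",
    ("bar.mysite.com", "A"): "192.0.2.7",
    ("ccs.neu.edu", "MX"): "amber.ccs.neu.edu",
}

def resolve_a(name, max_hops=8):
    """Follow CNAME aliases to the canonical name, then return (canonical, ip)."""
    for _ in range(max_hops):
        if (name, "A") in RECORDS:
            return name, RECORDS[(name, "A")]
        if (name, "CNAME") not in RECORDS:
            raise LookupError(f"no A or CNAME record for {name}")
        name = RECORDS[(name, "CNAME")]  # restart the lookup with the canonical hostname
    raise LookupError("CNAME chain too long")

def mail_server(domain):
    """Return the canonical name of the mail server for an email domain."""
    return RECORDS[(domain, "MX")]
```

The hop limit guards against alias loops, such as two names that are CNAMEs of each other.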
DNS as an indirection service
Discuss: DNS gives us very powerful capabilities. What are they?
It is not only about making it easier for humans to reference machines!
- Changing the IPs of machines becomes trivial, e.g., if you want to move your web server to a new host, just change the DNS record!
- Censorship is easier to implement.
Aliasing and load balancing
One machine can have many aliases, e.g., www.reddit.com, www.foursquare.com, www.huffingtonpost.com, christo.blogspot.com, sandi.blogspot.com, *.blogspot.com.
One domain can map to multiple machines, e.g., www.google.com.
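Round-robin DNS is one simple way for a domain that maps to multiple machines to spread its load. The authoritative server rotates the order of its A records in each response, and many clients try the first address they receive. This minimal Python sketch shows only that rotation. The class name is hypothetical, and the addresses are taken from the dig output earlier in this lecture.

```python
from collections import deque

class RoundRobinZone:
    """Hypothetical authoritative server that rotates its A records on every query."""

    def __init__(self, addresses):
        self._addrs = deque(addresses)

    def answer(self):
        response = list(self._addrs)  # full record set, in the current order
        self._addrs.rotate(-1)        # the next response starts with the next address
        return response

zone = RoundRobinZone(["4.53.56.93", "4.53.56.94", "4.53.56.104"])
for _ in range(4):
    print(zone.answer()[0])  # 4.53.56.93, .94, .104, then .93 again
```

Production load balancers often combine this with health checks or the geographic selection discussed for CDNs. The rotation alone does not notice a dead server.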
DNS and Content Delivery Networks (CDNs)
DNS responses may vary based on geography, ISP, etc.
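A CDN's authoritative server can choose its answer based on where the query came from. Often that is the address of the client's local resolver rather than the end user. This is a minimal Python sketch using the standard ipaddress module. The prefix-to-replica table is entirely hypothetical and uses documentation address ranges. Real CDNs use richer signals, such as measured latency and server load.

```python
import ipaddress

# Hypothetical table: client prefix -> address of a nearby replica.
REPLICAS = [
    (ipaddress.ip_network("198.51.100.0/24"), "203.0.113.10"),  # e.g. resolvers in region 1
    (ipaddress.ip_network("192.0.2.0/24"), "203.0.113.20"),     # e.g. resolvers in region 2
]
DEFAULT_REPLICA = "203.0.113.99"  # used when no prefix matches

def cdn_answer(resolver_ip: str) -> str:
    """Return the replica for the most specific prefix containing resolver_ip."""
    addr = ipaddress.ip_address(resolver_ip)
    matches = [(net, ip) for net, ip in REPLICAS if addr in net]
    if not matches:
        return DEFAULT_REPLICA
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

The same name therefore resolves to different addresses for resolvers in different places.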
DNS delays
How many of you have purchased a domain name?
- Did you notice that it took up to ~72 hours for your name to become accessible?
- This delay is caused by DNS propagation.
Discuss: Why would this process fail for a new website?
[Figure: asgard.ccs.neu.edu resolving www.my-new-site.com via the Root, com and ns.godaddy.com nameservers.]
DNS caching (efficiency) vs. freshness (correctness)
DNS propagation delay is caused by caching.
[Figure: a client asks asgard.ccs.neu.edu "Where is www.my-new-site.com?". Asgard holds cached root, .com, .net, etc. zone files and replies "That name does not exist." without contacting the Root, com or ns.godaddy.com servers.]
Zone files may be cached for 1-72 hours.
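A cache that stores both positive answers and "that name does not exist" results makes the trade-off concrete. Each entry expires after its TTL. Until then, a newly registered name keeps looking nonexistent to clients of that resolver. This is a minimal Python sketch. The class is hypothetical, TTLs are in seconds, and the clock is injectable so the expiry can be demonstrated without waiting.

```python
import time

class TTLCache:
    """Hypothetical resolver cache; a stored value of None means 'name does not exist'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value or None, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        """Return ('hit', value) while the entry is fresh, else ('miss', None)."""
        entry = self._entries.get(name)
        if entry is None or self._clock() >= entry[1]:
            self._entries.pop(name, None)  # drop expired entries
            return "miss", None
        return "hit", entry[0]

# Usage with a fake clock: a cached "does not exist" hides a new site until it expires.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.my-new-site.com", None, ttl=3600)
print(cache.get("www.my-new-site.com"))  # ('hit', None): stale negative answer
now[0] = 3600.0
print(cache.get("www.my-new-site.com"))  # ('miss', None): must ask upstream again
```

Longer TTLs mean fewer upstream queries (efficiency) but a longer window of stale answers (correctness).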
The Web and HTTP
Some background on Web pages:
- A web page consists of a base HTML file, which references several objects.
- An object can be an HTML file, JPEG image, Java applet, audio file, etc.
- Each object is addressable by a URL.
- Example:
The Web and HTTP
[Figure: a web page made up of a base .html object and a referenced .jpeg object.]
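Each object on such a page is fetched by its URL. The URL names the server that holds the object (host name) and the object's location on that server (path name). This minimal Python sketch splits a URL with the standard urllib.parse module. The URL in the usage line is hypothetical, and the default ports (80 for http, 443 for https) are filled in when the URL omits one.

```python
from urllib.parse import urlsplit

def locate_object(url: str) -> tuple:
    """Return (host, port, path) that an HTTP client would use to fetch the object."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port, parts.path or "/"

print(locate_object("http://www.example.com/someDept/pic.gif"))
# ('www.example.com', 80, '/someDept/pic.gif')
```

The host name is what the browser resolves through DNS, and the path goes into the HTTP request line.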
The HTTP protocol
HTTP: hypertext transfer protocol
- The Web's application-layer protocol
- Client/server model:
  - Client: the browser, which requests, receives (using the HTTP protocol) and displays Web objects
  - Server: the Web server, which sends objects (using the HTTP protocol) in response to requests
[Figure: a PC running the Firefox browser and an iPhone running the Safari browser, both communicating with a Web server.]
The HTTP protocol: Connection types
HTTP/1.0:
- Establish a TCP connection
- Request an object
- Close the TCP connection
- Repeat for each object
Discuss: How can we be more efficient?
[Figure: client-server timeline. For each of page.html, hpface.jpg and castle.gif, the client opens a new connection (TCP SYN), sends a GET, and the connection is then closed (TCP FIN).]
The HTTP protocol: Connection types
How can we be more efficient?
Option 1: Add parallelism
The HTTP protocol: Connection types
How can we be more efficient?
Option 2: Use persistent HTTP connections
- Use the same TCP connection to get all objects served by the same server (end-host).
- Good: TCP overhead is amortized.
- Bad: Need to maintain connection state for longer.
Discuss: Can we combine parallelism with persistence?
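Python's standard http.client can demonstrate persistence. One HTTPConnection object reuses its TCP connection for several requests as long as the server keeps that connection open. The self-contained sketch below starts a throwaway local HTTP/1.1 server (hypothetical, serving placeholder contents) and fetches three objects over a single connection. The object names mirror the earlier figure. The sketch records the client-side port after each request to show that the same socket was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = f"contents of {self.path}".encode()  # placeholder object contents
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # tells the client where the body ends
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def fetch_persistent(paths):
    """Fetch every path over one TCP connection; return (bodies, client ports seen)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    bodies, client_ports = [], set()
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            bodies.append(resp.read().decode())  # read fully before the next request
            client_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies, client_ports

bodies, ports = fetch_persistent(["/page.html", "/hpface.jpg", "/castle.gif"])
print(len(ports))  # 1: all three objects came over the same TCP connection
```

If the handler used HTTP/1.0 instead, the server would close the connection after each response. http.client would then have to open a new connection for every object, as in the HTTP/1.0 timeline.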
The HTTP protocol: Connection types
How can we be more efficient?
Option 3: HTTP pipelining (HTTP/1.1)
- Allow asynchronous resource fetching.
- Don't wait for the previous response before asking for the next resource.
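The three options can be compared by counting round-trip times (RTTs) to load a base page plus n referenced objects. The model counts one RTT for each TCP handshake and one RTT for each request/response exchange. It ignores transmission and processing time, and k is the number of parallel connections in Option 1. This back-of-the-envelope Python sketch rests on exactly those simplifying assumptions, not on measurements.

```python
import math

def page_load_rtts(n_objects: int, k_parallel: int = 1) -> dict:
    """RTTs to fetch a base HTML file plus n_objects referenced objects.

    Simplified model: 1 RTT per TCP handshake, 1 RTT per request/response,
    transmission and server processing time ignored.
    """
    n = n_objects
    return {
        # HTTP/1.0: a new connection (handshake + request) for every object
        "non-persistent": 2 * (n + 1),
        # Option 1: base page first, then objects in batches over k parallel connections
        "parallel": 2 + 2 * math.ceil(n / k_parallel),
        # Option 2: one handshake, then one request/response at a time
        "persistent": 1 + (n + 1),
        # Option 3: one handshake, the base page, then all objects requested back-to-back
        "pipelined": 2 + (1 if n > 0 else 0),
    }

print(page_load_rtts(2, k_parallel=2))
# {'non-persistent': 6, 'parallel': 4, 'persistent': 4, 'pipelined': 3}
```

With the two images from the earlier figure, persistence and parallelism each save two RTTs over HTTP/1.0, and pipelining saves one more. Combining parallelism with persistence, as the earlier Discuss question asks, would amortize handshakes across each of the k connections.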
The HTTP protocol: Message types
HTTP requests
- HTTP GET: used for requesting Web content/resources
- HTTP POST: used for sending data from the client to the server
Example request message (\r is the carriage return character, \n is the line-feed character):
GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
- The first line is the request line (GET, POST).
- The following lines are header lines.
- A carriage return and line feed at the start of a line indicates the end of the header lines.
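The request format has three parts: a request line, header lines each ending in \r\n, and an empty line that ends the headers. It can be produced and parsed with plain string handling. This is a minimal Python sketch. The function names are hypothetical, and the header values mirror the example above. The parser lowercases header names because HTTP header names are case-insensitive. It keeps only the last value of a repeated header and ignores any body.

```python
CRLF = "\r\n"

def build_request(method, path, headers):
    """Serialize an HTTP/1.1 request without a body."""
    lines = [f"{method} {path} HTTP/1.1"]                             # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")          # empty line ends headers

def parse_request(data):
    """Split raw request bytes into (method, path, version, headers)."""
    head, _, _body = data.decode("ascii").partition(CRLF + CRLF)
    request_line, *header_lines = head.split(CRLF)
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw = build_request("GET", "/index.html",
                    {"Host": "www-net.cs.umass.edu", "Connection": "keep-alive"})
print(raw)
# b'GET /index.html HTTP/1.1\r\nHost: www-net.cs.umass.edu\r\nConnection: keep-alive\r\n\r\n'
```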
The HTTP protocol: Message types HTTP responses