DNS and HTTP: A High-Level Overview of How the Internet Works
Adam Portier, Fall 2017
How do I Google?
Smaller problems you need to solve:
1. Where is Google?
2. How do I access the Google webpage?
3. How do I ask Google a question?
Real-World Problem: Where is Google?
Problem: I want to visit Villanova University, but I don't know the address.
Solution: use a phone book (or Yelp, etc.), which translates the name of a place (Villanova University) into a location identifier (a street address).
Internet equivalent
Problem: I want to visit Google, but I don't know the address.
Solution: DNS, which translates the name of a service (www.google.com) into a location identifier (an IP address).
DNS: Domain Name System
Proposed in 1983 by Paul Mockapetris (RFC 882 and 883), later standardized as RFC 1034 and 1035 (1987)
Breaks up the Internet's name space into domains and subdomains
Provides a mapping of FQDNs (fully qualified domain names) to IP addresses, as well as other records of interest
Hierarchical: each child domain sits beneath one parent domain; a parent domain has one or more child domains
Distributed: each owner of a subdomain maintains its own DNS records
DNS History
Before the Internet, there was ARPANET
Only a few hundred networked computers, almost all owned by educational or government institutions
The mapping of host names to IP addresses was distributed as a single text file (HOSTS.TXT)
Problem of scale: too many hosts to manage centrally in one hosts file
Domain Hierarchy
Names read from most to least specific, left to right
Every domain name ends in the root, written as a trailing dot (.)
www.google.com is actually www.google.com. (note the final dot)
Each label is a subdomain of the domain to its right, and the leftmost label is the record name:
com is a subdomain of . (root)
google is a subdomain of com
www is a record in the domain google.com
Splitting a domain into subdomains, each managed by its own DNS servers, is called delegation
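A small Python sketch of that right-to-left reading, assuming a plain ASCII name: the hypothetical helper zone_chain (our own name, not a library function) lists every zone a name belongs to, starting from the root.

```python
def zone_chain(fqdn: str) -> list[str]:
    """List the zones a name belongs to, from the root down to the full name."""
    # Drop the optional trailing root dot and split into labels: www, google, com.
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    chain = ["."]  # every name ends in the root
    # Walk the labels right to left, adding one more label at each level.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(zone_chain("www.google.com"))
# ['.', 'com.', 'google.com.', 'www.google.com.']
```

Each entry in the result corresponds to one level of delegation that a resolver may have to walk.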
Domain Hierarchy Example
Root DNS servers
  com DNS servers: yahoo.com DNS servers, amazon.com DNS servers
  org DNS servers: pbs.org DNS servers
  edu DNS servers: villanova.edu DNS servers, umass.edu DNS servers
villanova.edu is a subdomain of edu; the DNS servers for edu reference Villanova's DNS servers
edu is a subdomain of root; the DNS servers for root reference edu's DNS servers
TLD and Authoritative Servers
Authoritative DNS servers: operated by the organization that owns a subdomain's name space
Can be maintained by the organization directly or by a service provider
Host DNS records for that subdomain only
Top-level domain (TLD) servers: responsible for the immediate subdomains of the root (com, org, net, edu, etc., and all top-level country codes)
DNS Record Types
All DNS records have a Name, Value, Type, and TTL
NS (Name Server): maps a subdomain to the host name of a DNS server that is authoritative for it (that server's address comes from its A record); this is how delegation is accomplished
SOA (Start of Authority): every zone has exactly one; sets default settings for the domain, identifies contact information, etc.
A: maps an FQDN to an IPv4 address; the most common record type
DNS Record Types
CNAME: an alias from one FQDN to another (the canonical name); the target does not have to be in the same domain
MX: identifies the mail server for a domain; contains the FQDN of the mail server and a priority (preference) value; sending mail servers try the server with the lowest value first
TXT: holds free-form ASCII text; each text string in the record can be up to 255 characters
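To tie the record fields together, here is a toy Python model of a zone. The ResourceRecord class, the helper functions, and all of the example.com data (reserved documentation names and addresses) are hypothetical; a real resolver receives these records over the network rather than from a list. It shows two behaviors from the slides: following a CNAME alias to its A record, and ordering MX servers so the lowest preference value is tried first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceRecord:
    name: str    # owner name (FQDN)
    rtype: str   # record type: "A", "CNAME", "MX", ...
    value: str   # IP address, target name, or "preference host" for MX
    ttl: int     # seconds a resolver may cache this record

# Hypothetical zone data for the reserved example.com domain.
ZONE = [
    ResourceRecord("example.com.", "MX", "20 backup-mail.example.com.", 3600),
    ResourceRecord("example.com.", "MX", "10 mail.example.com.", 3600),
    ResourceRecord("www.example.com.", "CNAME", "web.example.com.", 300),
    ResourceRecord("web.example.com.", "A", "192.0.2.10", 300),
]

def lookup(name, rtype, zone=ZONE):
    """Return every record in the zone with this name and type."""
    return [rr for rr in zone if rr.name == name and rr.rtype == rtype]

def resolve_a(name, zone=ZONE, max_hops=8):
    """Follow CNAME aliases until A records are found (or nothing matches)."""
    for _ in range(max_hops):
        a_records = lookup(name, "A", zone)
        if a_records:
            return [rr.value for rr in a_records]
        aliases = lookup(name, "CNAME", zone)
        if not aliases:
            return []
        name = aliases[0].value  # a name has at most one CNAME
    raise RuntimeError("CNAME chain too long")

def mail_servers(domain, zone=ZONE):
    """Return MX hosts in the order a sender tries them: lowest preference first."""
    pairs = []
    for rr in lookup(domain, "MX", zone):
        preference, host = rr.value.split()
        pairs.append((int(preference), host))
    return [host for _, host in sorted(pairs)]

print(resolve_a("www.example.com."))  # ['192.0.2.10']
print(mail_servers("example.com."))   # ['mail.example.com.', 'backup-mail.example.com.']
```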
DNS Protocol and Messages
Query and reply messages use the same format
identification: 16-bit number for the query; the reply to that query uses the same number
flags:
  query or reply
  recursion desired
  recursion available
  reply is authoritative
DNS Protocol and Messages
questions: name and type of the DNS query
answers: resource records sent in response to the query
authority: the DNS servers that are authoritative for the queried FQDN
additional info: extra records that help the client, such as resolved CNAMEs and glue records (the IP addresses of the NS servers named in the authority section)
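The message layout can be built by hand. This Python sketch packs a query in the RFC 1035 wire format: a 12-byte header (identification, flags, and the four section counts: questions, answers, authority, additional) followed by one question. The helper names are our own and the flag masks follow RFC 1035; send_query only illustrates sending over UDP port 53, and the server address you pass it is up to you.

```python
import random
import socket
import struct

def encode_name(fqdn):
    """Encode a name as length-prefixed labels ending with a zero byte (the root)."""
    out = b""
    for label in fqdn.rstrip(".").split("."):
        data = label.encode("ascii")
        out += bytes([len(data)]) + data
    return out + b"\x00"

def build_query(fqdn, qtype=1, ident=None, recursion_desired=True):
    """Header plus one question; qtype 1 is an A record, class 1 is IN (Internet)."""
    if ident is None:
        ident = random.randint(0, 0xFFFF)  # identification: the reply echoes this number
    flags = 0x0100 if recursion_desired else 0x0000  # QR=0 (query); RD bit optionally set
    header = struct.pack("!HHHHHH", ident, flags, 1, 0, 0, 0)  # 1 question, 0 other records
    question = encode_name(fqdn) + struct.pack("!HH", qtype, 1)
    return header + question

def parse_header(message):
    """Decode the 12-byte header of a query or reply."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "is_reply": bool(flags & 0x8000),             # QR: query or reply
        "authoritative": bool(flags & 0x0400),        # AA: reply is authoritative
        "recursion_desired": bool(flags & 0x0100),    # RD
        "recursion_available": bool(flags & 0x0080),  # RA
        "counts": (qd, an, ns, ar),  # questions, answers, authority, additional
    }

def send_query(query, server, timeout=2.0):
    """Send a query to a DNS server over UDP port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(512)  # classic maximum size of a UDP DNS message
        return reply

q = build_query("www.google.com", ident=0x1234)
print(len(q))  # 32
print(parse_header(q))
```

If you call send_query with the address of a recursive resolver, parse_header on the reply should show the same id with is_reply set to True.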
DNS Server Types
Authoritative
  Contains a set of all the records for a single domain
  Referenced using NS records from other authoritative DNS servers (delegation)
  Will not answer questions about records outside its domain
  Must be centrally located and publicly visible
Recursive
  Contains no DNS records of its own
  Will answer questions about records anywhere in the DNS hierarchy using recursion
  Can be centrally located or run directly on the client
Recursion Example
Host at cis.poly.edu wants the IP address of gaia.cs.umass.edu
1. Host asks the recursive resolver for the A record for gaia.cs.umass.edu
2. The resolver (dns.poly.edu) does not have the answer, so it asks a root server for the location of edu
3. The root server returns the location of an edu server
4. The resolver asks the edu authoritative DNS server for the location of umass.edu
Recursion Example
5. The edu DNS server returns the location of the umass.edu DNS server
6. The resolver asks the umass.edu DNS server for the location of the cs.umass.edu DNS server (for the purposes of this example, it's the same server)
7. The resolver asks that server for the A record for gaia.cs.umass.edu, which is returned
8. The resolver returns the answer to the client (cis.poly.edu)
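The eight steps can be simulated with a toy, hard-coded model. Everything in SERVERS (server names, referral table, and the documentation address 192.0.2.7 standing in for gaia's real IP) is hypothetical, and real referrals arrive as NS records plus glue rather than as server names. As in step 6, umass.edu and cs.umass.edu are assumed to share one server.

```python
# Toy model of the example above; all names and addresses are hypothetical.
SERVERS = {
    "root":         {"referrals": {"edu.": "edu-server"}, "records": {}},
    "edu-server":   {"referrals": {"umass.edu.": "umass-server"}, "records": {}},
    # As in step 6, one server answers for both umass.edu and cs.umass.edu.
    "umass-server": {"referrals": {}, "records": {"gaia.cs.umass.edu.": "192.0.2.7"}},
}

def ask(server, name):
    """Send one query to one server: returns ('answer', ip) or ('referral', next_server)."""
    data = SERVERS[server]
    if name in data["records"]:
        return "answer", data["records"][name]
    # Choose the most specific delegated zone that contains the name.
    zones = [z for z in data["referrals"] if name == z or name.endswith("." + z)]
    if not zones:
        raise LookupError(f"{server} has no answer or referral for {name}")
    return "referral", data["referrals"][max(zones, key=len)]

def resolve(name, max_steps=10):
    """Resolver loop: start at the root and follow referrals until a server answers."""
    server, path = "root", []
    for _ in range(max_steps):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        server = value  # move one level down the hierarchy
    raise LookupError("too many referrals")

print(resolve("gaia.cs.umass.edu."))
# ('192.0.2.7', ['root', 'edu-server', 'umass-server'])
```

The returned path mirrors steps 2 through 7: root, then edu, then the umass.edu server that finally answers.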
Caching
Recursion is expensive: one answer can take several queries to different servers
Recursive DNS servers get asked the same question a lot
The TTL (time to live) of a DNS record specifies, in seconds, how long a recursive resolver can hold on to an answer
Queries for the same record within the TTL are answered from the cache
Queries for a record after its TTL has expired are handled with normal recursion
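A minimal sketch of TTL-based caching in Python, assuming the resolver stores one value per (name, type) pair. TTLCache and cached_lookup are illustrative names, and the recurse argument is a stand-in for the full recursion process; passing a fake clock makes the expiry easy to test.

```python
import time

class TTLCache:
    """Minimal resolver cache: each entry expires TTL seconds after it is stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[(name, rtype)]  # expired: the caller must recurse again
            return None
        return value

def cached_lookup(cache, name, rtype, recurse):
    """Answer from the cache when possible; otherwise recurse and cache the result."""
    hit = cache.get(name, rtype)
    if hit is not None:
        return hit, "cache"
    value, ttl = recurse(name, rtype)  # recurse returns (value, ttl in seconds)
    cache.put(name, rtype, value, ttl)
    return value, "recursion"
```

With a record TTL of 300, a repeat query 100 seconds later is answered from the cache, while a query at 301 seconds triggers recursion again.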
DNS Cache Poisoning
Where is Google?
Your web browser asks its recursive resolver for the A record for www.google.com
The recursive resolver locates the DNS servers for the root, com, and google.com
The DNS server for google.com returns the requested A record
The web browser accesses the webpage at that IP address over HTTP
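From a program's point of view, the first step usually looks like a single library call. In Python, socket.getaddrinfo hands the name to the operating system's resolver, which queries the configured recursive resolver (it may also consult local sources such as the hosts file, and can return IPv6 as well as IPv4 addresses). The wrapper where_is is our own name.

```python
import socket

def where_is(host, port=80):
    """Ask the operating system's resolver for the addresses of host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})

# where_is("www.google.com") needs network access; the answer varies by location and time.
print(where_is("127.0.0.1"))  # a literal IP address is returned without any DNS lookup
```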
DNS Activity
Use dig to see Google's DNS records (dig www.google.com)
Use dig +trace to see a full recursion for Google's DNS records (dig +trace www.google.com)
https://www.ultratools.com/tools/dnslookup
Perform the same steps for your favorite website
How do I Google?
1. Where is Google?
2. How do I access the Google webpage?
3. How do I ask Google a question?
Web Pages
A webpage is a collection of objects
Objects can be text or multimedia
HTML is the most common language used in webpages
A base HTML page references the other objects
A URL is a combination of a host name and a path
Each object has its own URL
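Python's standard library can split a URL into the host name and path described above. host_and_path is a hypothetical helper built on urllib.parse.urlsplit; it expects a full URL including the scheme (http://), since without one urlsplit does not recognize the host name.

```python
from urllib.parse import urlsplit

def host_and_path(url):
    """Split a URL into the host to connect to and the path to request."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root object
    if parts.query:
        path += "?" + parts.query  # any query string is sent as part of the path
    return parts.hostname, path

print(host_and_path("http://www.someschool.edu/somedepartment/home.index"))
# ('www.someschool.edu', '/somedepartment/home.index')
```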
How do I access Google?
HTTP: Hypertext Transfer Protocol
Application-layer protocol for the Web
Client/server model
Client: a browser that sends requests and assembles the returned objects into a page
Server: a web server that stores objects and responds to clients' requests
HTTP Overview
Uses TCP port 80 by default
HTTP is stateless: each request/response is handled independently of any previous communication
Connections can be persistent or non-persistent
Non-persistent (HTTP 1.0): a TCP connection is opened and closed for every object request
Persistent (HTTP 1.1): a TCP connection is opened on the first request and reused for all subsequent requests
Each connection follows the same steps:
1. Client creates a TCP socket connection with the server
2. Server accepts the TCP connection from the client
3. HTTP messages are exchanged between the client and the server
4. TCP connection is closed
Non-persistent HTTP Example
URL: www.someschool.edu/somedepartment/home.index
Client:
1. Client initiates a TCP connection to the HTTP server at www.someschool.edu on port 80
2. Client sends an HTTP request message to retrieve the object somedepartment/home.index
Server:
1. HTTP server at www.someschool.edu accepts the connection and notifies the client
2. HTTP server receives the request message, forms a response message containing the requested object, and sends it back through the socket
Non-persistent HTTP Example
URL: www.someschool.edu/somedepartment/home.index
Client:
3. HTTP client closes the connection with the server
4. HTTP client parses the retrieved object and finds references to 10 .jpeg objects it needs to retrieve
5. Repeat steps 1-3 for each .jpeg object
Server:
3. HTTP server closes the connection to the client
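Step 4 (finding the objects a base page references) can be sketched with Python's built-in html.parser. This version only looks at img tags and resolves relative paths against the base page's URL; real pages also reference stylesheets, scripts, and more. ObjectFinder and referenced_objects are our own names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ObjectFinder(HTMLParser):
    """Collect the URLs of image objects referenced by a base HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.objects = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative references are resolved against the base page's URL.
                self.objects.append(urljoin(self.base_url, src))

def referenced_objects(base_url, html):
    finder = ObjectFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.objects
```

For a page at http://www.someschool.edu/somedepartment/home.index containing 10 img tags, referenced_objects returns 10 absolute URLs, and non-persistent HTTP then needs a fresh TCP connection for each one.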
Persistent HTTP
Problems with non-persistent HTTP:
Requires 2 RTTs per object (one to set up the TCP connection, one for the request and response)
TCP connections are expensive (lots of overhead setting them up and tearing them down)
Clients may open several connections to the same server in parallel, multiplying that overhead
Persistent HTTP:
Server leaves the connection open after the initial object retrieval
Subsequent HTTP messages reuse the open connection until it has been idle for a set amount of time
Persistent HTTP Pipelining
With pipelining:
Defined in HTTP 1.1, though rarely enabled by browsers in practice
Client sends a request for an object as soon as it finds a reference to it, without waiting for earlier responses
As little as one RTT for all the referenced objects on a page
Without pipelining:
Client issues a new request only after the previous response has arrived
One RTT for each referenced object, but without the extra RTT for setting up a new connection each time
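The RTT comparison can be worked through with a simple model that ignores transmission time and server processing: each TCP setup costs one RTT, and each request/response costs one RTT unless requests are pipelined. page_rtts is an illustrative helper, not a measurement.

```python
def page_rtts(num_referenced, mode):
    """Approximate RTTs to fetch a base page plus num_referenced objects, one at a time."""
    total_objects = 1 + num_referenced  # the base HTML page plus what it references
    if mode == "non-persistent":
        return 2 * total_objects  # TCP setup + request/response for every object
    if mode == "persistent":
        return 1 + total_objects  # one TCP setup, then one RTT per object
    if mode == "pipelined":
        # One TCP setup, one RTT for the base page, then one RTT for all referenced objects.
        return 2 + (1 if num_referenced else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_rtts(10, mode))
# non-persistent 22
# persistent 12
# pipelined 3
```

For the 10 .jpeg example, the model gives 22 RTTs with serial non-persistent connections, 12 with a persistent connection, and 3 with pipelining.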
HTTP Methods
HTTP 1.0:
GET: basic object retrieval request
POST: basic form submission request
HEAD: do not retrieve the object; return only the header information
HTTP 1.1 adds:
PUT: uploads a file to the server
DELETE: deletes a file from the server
HTTP Messages (Request)
HTTP Messages (Response)
Response Status Codes
Indicate to the client whether the server was able to fulfill the request
200 OK: request succeeded; the requested object is in this message
301 / 302: object has moved (permanently / temporarily); it was not found at the requested location, and the new location is given in the message (Location header)
400 Bad Request: HTTP request message not understood by the server
404 Not Found: the requested object does not exist
500 Internal Server Error: something went wrong while processing the request
505 HTTP Version Not Supported
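Python's http.HTTPStatus enum knows the standard reason phrases, so a short sketch can parse a status line and classify the code by its first digit. parse_status_line and describe are our own helper names.

```python
from http import HTTPStatus

def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into (version, code, reason)."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional text
    return version, code, reason

def describe(code):
    """Return the standard reason phrase plus the general class of the status code."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    phrase = HTTPStatus(code).phrase  # raises ValueError for codes the enum does not know
    return f"{code} {phrase} ({classes.get(code // 100, 'other')})"

for line in ["HTTP/1.1 200 OK", "HTTP/1.1 301 Moved Permanently", "HTTP/1.1 404 Not Found"]:
    version, code, reason = parse_status_line(line)
    print(describe(code))
# 200 OK (success)
# 301 Moved Permanently (redirection)
# 404 Not Found (client error)
```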
How do I access Google?
Make an HTTP GET request to the server at www.google.com
Make additional HTTP GET requests for all images on the page
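How do I ask Google a question?
HTTP POST: an HTTP request to the server that carries POST data
POST data is the result of a web form submission
In the Google example, this is your search term (in practice, Google's search form sends it with GET instead, as a query string: /search?q=...)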
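This sketch shows the two ways form data can travel, using Python's urllib.parse.urlencode: appended to the URL as a query string for GET, or in the message body (with Content-Type and Content-Length headers) for POST. The hosts, paths, and field name below are illustrative assumptions; www.example.com/form in particular is a hypothetical form handler.

```python
from urllib.parse import urlencode

def get_request(host, path, form):
    """GET: form data is appended to the path as a URL-encoded query string."""
    return (f"GET {path}?{urlencode(form)} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n")

def post_request(host, path, form):
    """POST: the same encoded form data travels in the message body instead."""
    body = urlencode(form)
    return (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body.encode('ascii'))}\r\n"
            "\r\n"
            f"{body}")

form = {"q": "villanova computer science"}  # hypothetical search term
print(get_request("www.google.com", "/search", form))
print(post_request("www.example.com", "/form", form))
```

urlencode turns spaces into + and escapes other special characters, which is the formatting that makes typing POST data by hand in telnet tedious.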
Manual HTTP Request
telnet www.csc.villanova.edu 80
Opens a TCP socket to port 80 (HTTP) on www.csc.villanova.edu
Type in an HTTP request:
GET /~carterh/ HTTP/1.1
Host: www.csc.villanova.edu
Press Enter on a blank line to end the request
Look at the response
You can submit POST data using telnet, but formatting it by hand is hard
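The telnet session can be reproduced with a plain TCP socket in Python. build_get and fetch are hypothetical helpers. The Connection: close header is our addition (it is not in the telnet example): HTTP 1.1 connections otherwise stay open, so asking the server to close lets the read loop know when the response is complete.

```python
import socket

def build_get(host, path="/"):
    # Each line ends with CRLF; the final empty line plays the role of pressing Enter on a blank line.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")

def fetch(host, path="/", port=80, timeout=5.0):
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, fetch("www.csc.villanova.edu", "/~carterh/") needs network access, and the page may no longer exist; the first line of the returned bytes is the response's status line.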
HTTP Activity
Retrieve www.google.com using telnet
Retrieve www.google.com using curl
curl -i includes the protocol (response) headers in the output
View www.google.com in a browser debugger
Firefox: Tools > Web Developer > Inspector
Chrome: View > Developer > Developer Tools > Elements
Repeat with your favorite website