Internet Content Distribution

Internet Content Distribution Chapter 1: Introduction Jussi Kangasharju

Chapter Outline
- Introduction to content distribution
- Basic concepts
  - TCP
  - DNS
  - HTTP
- Outline of the rest of the course
Kangasharju: Internet Content Distribution 2

What is the Problem?

Problem Definition
How to distribute (popular) content efficiently on the Internet to a large number of geographically distributed people, while keeping your costs manageable? :-)
- Who is "you"?
So, we need to answer the following questions:
1. Who are the players?
2. What is the content?
3. How are costs determined?
4. How is performance measured?

The Players
- Players: User, Content Provider, ISP
- The ISP represents the network path between the user and the content provider
- Typically, such a path contains several ISPs

Roles of the Players
- User
  - Wants to access content provided by the content provider
  - Buys access to the network from an ISP
  - Possibly pays the content provider for content
- Content provider
  - Produces content
  - Runs a server (or outsources this)
  - Buys access from an ISP
- ISP
  - Provides the network connection for the others

What Is Content?
- Content is any digital content the users want
- Examples:
  - Web pages
  - Music files
  - Videos
  - Streaming media
  - and so on
- In this course we will focus mainly on web content
- This means:
  - Mostly small files (a few hundred KB at most, often < 10 KB)
  - Relatively fast delivery (user is waiting to see the page)

Follow the Money
- Where does the money come from?
[Figure: the three players (User, Content Provider, ISP).]

Business Relationships
- User
  - Pays money to the ISP for connectivity
  - These days often flat-rate, can be per-byte
  - Might rarely pay the content provider
- Content provider
  - Pays the ISP for connectivity
  - Pays for running a server
  - Needs to get money from somewhere
  - May or may not need to make a profit from content
- ISP
  - Gets money from others
  - Uses money to run the network
  - Wants to make a profit

Performance Targets
- User
  - Wants content fast
  - Does not want to pay (too much)
- Content provider
  - Wants to get as many users as possible
  - While keeping costs as low as possible
- ISP
  - Wants to make as much money as possible
- Which goal is the most important?
- In the real world, user needs are often not relevant
  - Users do not contribute enough money
  - Users might not have a choice, e.g., only 1 ISP possible
- ISP is the most important in many cases
- Content providers have some say in some cases
  - Compare proxy caching and CDNs

Our Focus
- So, what is our focus then?
- We look at the problem from all sides
  - User (client)
  - Content provider (server)
  - ISP (network)
- Different solutions for different parts
- Can use any of the solutions individually, or several in combination

Constraints
- Where are we allowed to touch things?
- Two main constraints:
  1. No modifications to user software allowed
  2. No modifications to the network allowed
- Above constraints can be relaxed a bit:
  - Can make easy software updates to clients
  - Each ISP can configure its own network
- What remains: Any solution must be backwards compatible
- Practical constraint: Users should not need to do anything
  - Too complicated for most users to configure even simple things

What Does That Leave Us?
- So, what can we do?
- Install several servers
  - But how to get users to use other servers?
- Install caches on client side
  - But how to get users to use them?
- Replicate content in the network
  - But how to get users to find the content?
- And so on
- Core problem: How to get users to use our solution?
- Before we get to that, let's recap the basics

Internet Basics
- Three main components that are relevant to us
  1. TCP: All transfers happen over TCP
  2. DNS: Mapping between server names and IP addresses
  3. HTTP: Since our focus is on web content, we have HTTP running on top of TCP
- Below, a brief recap of the most relevant features of the three
- And an in-depth look at HTTP performance

TCP
- TCP is a reliable byte-stream transport protocol
- Responsibilities:
  - Reliable end-to-end transport
  - Flow control
  - Congestion control
- Main problem for us is TCP congestion control
- Recall:
  - Three-way handshake
  - Slow-start
  - Congestion avoidance
  - Fast retransmit
  - Fast recovery

Three-Way Handshake
In order to open a TCP connection, we need to send three packets before any actual application data can be sent:
1. Client sends SYN packet to server
2. Server replies with SYN ACK
3. Client acknowledges this with an ACK
- In other words, one RTT delay
  - Actually, a bit more than one
- Nothing we can do about this
  - Some extensions have proposed faster transactions (T/TCP), but none have been widely implemented

TCP Basics: Slow Start and Congestion Avoidance
- Sender calculates a congestion window (CW) for a receiver
- Start: CW := 1 segment
- ACKs arrive (no congestion and no other errors): CW increases
  - Exponential (*2) up to the congestion threshold, then linear (+1)
  - Congestion threshold is determined dynamically
- ACK is missed:
  - Congestion threshold := ½ current CW
  - CW := 1 segment, i.e., slow start again
- TCP is part of a class of protocols called AIMD
  - Additive Increase, Multiplicative Decrease
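
The rules on this slide can be traced with a small simulation. This is only a sketch of the simplified model above (one window update per round), not real TCP; the initial threshold of 16 segments and the single loss in round 7 are assumptions chosen for illustration.

```python
def simulate_cwnd(rounds, initial_threshold, loss_rounds):
    """Return the congestion window (in segments) used in each round.

    initial_threshold and loss_rounds are illustrative inputs, not values
    taken from the slides.
    """
    cw = 1
    threshold = initial_threshold
    history = []
    for r in range(rounds):
        history.append(cw)
        if r in loss_rounds:
            # Missed ACK: threshold becomes half the current window,
            # and the window drops back to 1 segment (slow start again).
            threshold = max(cw // 2, 1)
            cw = 1
        elif cw < threshold:
            # Slow start: double the window, capped at the threshold.
            cw = min(cw * 2, threshold)
        else:
            # Congestion avoidance: grow linearly by one segment.
            cw += 1
    return history


trace = simulate_cwnd(rounds=12, initial_threshold=16, loss_rounds={7})
print(trace)
```

The printed trace grows exponentially up to the threshold, then linearly, and falls back to 1 segment after the simulated loss.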

TCP Congestion Control, SS/CA
[Figure: congestion window (in segments) plotted against the number of transmissions, showing the slow start and congestion avoidance phases, the threshold, and the new threshold after a loss.]

TCP Basics: Fast Retransmit/Recovery
- TCP sends an acknowledgement only after receiving a packet (modulo some enhancements)
- If a sender receives several acknowledgements for the same packet, this is due to a gap in the packets received at the receiver
  - Recall: TCP has cumulative ACKs
- However, the receiver got all packets up to the gap and is still receiving packets
- This means (probably) only 1 packet was lost
  - Just send that missing packet and hope it fixes the problem
- If the congestion is short-lived, continue with the current congestion window (do not use slow-start)

Fast Retransmit/Recovery
- If congestion is short-lived and only 1 packet is lost
  - Cutting cwnd to 1 and starting a new slow start is too drastic
- Currently most widely implemented
- Life returns to normal
[Figure: timeline of segments (S) and ACKs (A) against the congestion window. One segment is lost, three duplicate ACKs trigger a fast retransmit, and one ACK is marked as not actually sent (delayed ACK).]
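
The duplicate-ACK trigger described above can be sketched as a simple counter. This is a minimal illustration, assuming ACKs are given as a list of cumulative ACK numbers; the function name and the sample ACK sequences are hypothetical.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the ACK numbers at which a fast retransmit would be triggered.

    A retransmit fires when the same cumulative ACK has been seen
    dup_threshold additional times after its first arrival.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits.append(ack)
        else:
            # A new cumulative ACK resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return retransmits


# ACK 3 arrives once, then three more times: the gap after it is retransmitted.
events = fast_retransmit_events([1, 2, 3, 3, 3, 3, 7])
print(events)
```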

TCP and Us
- Recall two assumptions:
  1. Content is transferred over TCP
  2. Most content is small files
- But usually not small enough to fit in one packet!
- Result: TCP often never makes it out of slow-start
- This means we cannot really use all the bandwidth that is available to us in the network
- The above is the main effect of TCP on us! Keep this in mind!
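
A rough calculation shows why small files rarely leave slow start. The sketch below assumes a segment size of 1460 bytes, an initial window of 1 segment, no losses, and no threshold being reached; all of these are illustrative assumptions rather than values from the slides.

```python
import math


def slow_start_rounds(file_bytes, mss=1460, initial_cw=1):
    """Return (segments, rounds, last_window) for a loss-free slow-start transfer."""
    segments = math.ceil(file_bytes / mss)
    cw = initial_cw
    sent = 0
    rounds = 0
    while sent < segments:
        sent += cw      # send one full window in this round
        cw *= 2         # slow start doubles the window per round
        rounds += 1
    # cw was doubled once after the final round, so halve it again.
    return segments, rounds, cw // 2


small = slow_start_rounds(10 * 1024)    # a 10 KB object
large = slow_start_rounds(100 * 1024)   # a 100 KB object
print(small, large)
```

Under these assumptions a 10 KB object finishes in 4 rounds with a final window of only 8 segments, so the connection ends while the window is still ramping up.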

Solutions for TCP's Problem
1. Live with the problem
   - A.k.a. the ostrich approach
2. Try to figure out a way to make connections longer-lived
   - Tricky, but currently the most widespread solution
3. Develop a completely new transport protocol
   - Not feasible for the web, maybe for new types of content
   - Also, what is the effect of the new protocol on TCP?
- Note that solutions 2 and 3 do require modifications on the client side
- The reason why 2 works is that it was implemented in web browsers and people update their browsers
  - Note: Updates are sometimes forced on people, no need to act
- Still, it took a long time to be widely supported!

Domain Name System (DNS)
- DNS is a directory service for the Internet
- Most commonly used to map hostnames to IP addresses
  - But DNS can do a lot more
  - Although it is not really used for much else
- DNS is a critical service, without it the whole Internet breaks
- DNS is a hierarchical system
  - Root consists of 13 well-known name servers (called root servers)
  - All queries start from the root and work their way down until they find the information they seek

DNS: Overview
- DNS is organized in zones (roughly, domains)
- Actual data is in resource records (RR)
  - Several types of RRs: A, PTR, NS, MX, CNAME, ...
- Administrator of a zone is responsible for setting up a server for that zone (+ redundant servers at other domains)
- Owner of a zone is responsible for serving the zone's data
- RRs can be cached on the client side
  - Up to a period determined by the zone's administrator

DNS: Example
- Client wants to resolve www.foo.com
- Replies to queries have additional information (IP address + name)
- Queries can be iterative (here) or recursive
[Figure: iterative resolution of www.foo.com. a.root-servers.net answers "NS com. ns.something.com"; ns.something.com answers "NS foo.com. ns.foo.com"; ns.foo.com answers "A www.foo.com. 192.168.125.54" to the client.]
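
The iterative lookup in this example can be mimicked with a toy in-memory model. This is only a sketch: the records are read off the flattened figure above, the dictionary layout and function names are hypothetical, and no real DNS traffic is involved.

```python
# Toy data: each server either refers the client to another name server (NS)
# or returns the final address record (A).
SERVERS = {
    "a.root-servers.net": {"com.": ("NS", "ns.something.com")},
    "ns.something.com": {"foo.com.": ("NS", "ns.foo.com")},
    "ns.foo.com": {"www.foo.com.": ("A", "192.168.125.54")},
}


def query(server, name):
    """Return the record for the longest zone suffix of name that server knows."""
    best = None
    for zone, record in SERVERS[server].items():
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best[0]):
                best = (zone, record)
    return best[1] if best else None


def resolve_iteratively(name, root="a.root-servers.net", max_hops=10):
    """Follow NS referrals from the root until an A record is found."""
    server = root
    path = [server]
    for _ in range(max_hops):
        record = query(server, name)
        if record is None:
            return None, path
        rtype, value = record
        if rtype == "A":
            return value, path
        server = value          # referral: ask the next server ourselves
        path.append(server)
    return None, path


address, path = resolve_iteratively("www.foo.com.")
print(address, path)
```

The client itself contacts each server in turn, which is what makes the query iterative rather than recursive.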

DNS Queries
- As shown above, DNS queries are typically iterative
  - Exception: Client (= user's PC) sends a recursive query to its local name server, which then proceeds iteratively
- Answers to queries may also be different for different clients
- Why is that an important feature?
  - Short answer: We can do all kinds of nice tricks
  - Long answer: DNS redirection, see CDN chapter :-)

DNS and Content Distribution
- All clients must use DNS to resolve IP addresses
  - Mandatory step, implemented everywhere
- Because of its ubiquity, DNS is a viable way to improve content distribution
- DNS is used in content distribution mainly in two ways:
  1. Load balancing on the server side (see Chapter 2)
  2. DNS redirection on the client side (see Chapter 4)
- Besides those, DNS is simply a black box for us

HTTP
- HyperText Transfer Protocol (HTTP) is The Protocol for delivering Web content
- For a while in the late 90s, HTTP was the cause of the largest share of Internet traffic
  - Before it, the big fish was email, afterwards P2P
  - HTTP's share is getting bigger again (YouTube and others)
- HTTP is standardized by the World Wide Web Consortium (W3C) and by the IETF
- Two versions have been standardized:
  - HTTP/1.0: Earlier version, simple protocol
  - HTTP/1.1: Latest version, much more complicated
- Currently, HTTP/1.1 is already widely implemented
  - Spread through browser updates :-)

HTTP Details
- HTTP is a (simple) client-server protocol
- Clients request files, one file at a time
- Files are identified by Uniform Resource Locators (URL)
- In web context:
  - One HTML page
  - Several images
  - Need to request several files to show one page to the user
- Images are referred to on the HTML page
- References can be relative or absolute
  - Relative reference: URL path is the same as the HTML page's
  - Absolute reference: Absolute URL for the image
- Keep this distinction in mind!
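
The difference between relative and absolute references can be seen with Python's standard `urllib.parse.urljoin`. The page URL and image names below are hypothetical examples, not taken from the slides.

```python
from urllib.parse import urljoin

# Hypothetical HTML page that references two images.
page_url = "http://www.foo.com/news/index.html"

# A relative reference is resolved against the page's own URL,
# so the image is fetched from the same server and path.
relative_url = urljoin(page_url, "logo.png")

# An absolute reference names its own server and is left unchanged,
# so the image may come from a completely different host.
absolute_url = urljoin(page_url, "http://images.example.net/logo.png")

print(relative_url)
print(absolute_url)
```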

HTTP Requests
Example request:

    GET /index.html HTTP/1.1
    Host: www.google.com
    Connection: close
    User-Agent: Mozilla 1.6
    <CR><LF>

Request line, header lines, possible body

HTTP Request Format

    Method <sp> URL <sp> Version <cr><lf>     (request line)
    Header field: <sp> Value <cr><lf>         (header lines)
    ...
    Header field: <sp> Value <cr><lf>
    <cr><lf>
    Entity body
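
As a minimal sketch of this format, the following builds the example request from the previous slide as a string. The helper name `build_request` is hypothetical; the header values are the ones shown in the example.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble a request line and header lines, each terminated by CRLF,
    followed by the empty line that ends the header section."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/index.html",
    [
        ("Host", "www.google.com"),
        ("Connection", "close"),
        ("User-Agent", "Mozilla 1.6"),
    ],
)
print(request)
```

A GET request like this one has no entity body, so the message ends right after the empty line.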

HTTP Requests
- Method: GET, POST, HEAD, ... (see RFC 2616)
- Normal requests use GET
  - Retrieving web pages, images, etc.
  - Simple forms are also processed with GET
- Entity body in POST
  - POST is used for complicated forms (lot of info to handle)
  - Contents of the form go in the entity body
- Headers give more information about the request or modify it in some way
  - We will see more HTTP headers in the client-side techniques chapter

HTTP Responses
Example response:

    HTTP/1.1 200 OK
    Date: Thu, 06 Aug 1998 12:00:15 GMT
    Connection: close
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT
    Content-Length: 6821
    Content-Type: text/html
    <CR><LF>
    (data data data data data)

Status line, header lines, requested document

HTTP Response Format

    Version <sp> Status <sp> Phrase <cr><lf>  (status line)
    Header field: <sp> Value <cr><lf>         (header lines)
    ...
    Header field: <sp> Value <cr><lf>
    <cr><lf>
    Entity body
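
The response format can be illustrated by splitting a raw response into its parts. This is a simplified sketch (no chunked encoding, no repeated or folded headers); the function name and the shortened sample response are assumptions made for the example.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: Version SP Status SP Phrase
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Thu, 06 Aug 1998 12:00:15 GMT\r\n"
    b"Content-Length: 4\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"data"
)
version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase, headers, body)
```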

HTTP Responses
- Status code gives the result
  - Phrase is for humans, only the code is important!
- Typical status codes with standard phrases:
  - 200 OK: Everything went fine, information follows
  - 301 Moved Permanently: Document moved, new location in Location header in response
  - 400 Bad Request: Error in request
  - 404 Not Found: Document does not exist on server
  - 505 HTTP Version Not Supported: Requested protocol version not supported by server
- Headers give more information, like with requests

HTTP Connections
- Basic HTTP interaction: Client sends a separate HTTP request for each object
  - Read: Client opens a new TCP connection for each object
- These are so-called non-persistent connections
  - Implemented in early versions of HTTP
  - And still implemented for backwards compatibility
- Consider a web page with 10 images
  - 1 HTML page + 10 images = 11 files = 11 TCP connections
  - Each connection has to do a 3-way handshake
  - Each connection has to go through slow-start
- Very inefficient!

HTTP Persistent Connections
- Solution to the above problem: Persistent connections
- Client keeps the connection to the server open and sends several requests over the same connection
- Two advantages:
  - 3-way handshake only in the beginning
  - Slow-start only once
- When the download of a file finishes, the client can send a new request
  - This means 1 RTT delay between objects
  - Compare to 2 RTT for non-persistent connections
- Better, but still not stellar

HTTP Pipelining
- Pipelining remedies the 1 RTT delay in persistent connections
- Idea: Client sends all requests it has at once and the server processes them one after the other
- Benefits:
  - No need to wait between objects
  - Continuous download (good for TCP congestion control)
- Disadvantage:
  - Not widely implemented nor supported
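
The RTT figures from the last three slides can be put side by side for the earlier example page (1 HTML page + 10 images). This is a back-of-the-envelope model only: it counts round trips for sequential fetching, ignores transmission time and slow start, and assumes the pipelined client must fetch the HTML page first to learn which images to request.

```python
def rtts_non_persistent(n_objects):
    # Each object costs 1 RTT for the handshake plus 1 RTT for request/response.
    return 2 * n_objects


def rtts_persistent(n_objects):
    # One handshake, then 1 RTT per object on the same connection.
    return 1 + n_objects


def rtts_pipelined(n_objects):
    # One handshake, 1 RTT for the HTML page, and (assumed) 1 RTT for
    # all remaining objects requested back to back.
    return 1 + 1 + (1 if n_objects > 1 else 0)


n = 11  # 1 HTML page + 10 images
summary = {
    "non-persistent": rtts_non_persistent(n),
    "persistent": rtts_persistent(n),
    "pipelined": rtts_pipelined(n),
}
print(summary)
```

In this model the counts are 22, 12, and 3 RTTs respectively; browsers soften the non-persistent case in practice by opening several connections in parallel.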

HTTP in Practice
- Browsers typically open several connections in parallel
  - In particular for non-persistent connections
  - Typically, 2-6 connections are used simultaneously
- HTTP/1.0 defined keep-alives
- HTTP/1.1 defined persistent connections
  - Both are functionally equivalent
- What are the real advantages of persistent connections and pipelining?
- Results from Nielsen et al., "Network Performance Effects of HTTP/1.1, CSS1, and PNG", published in SIGCOMM '97

Test Setup
- Synthetic website, similar to popular websites
  - Similarity was true in 1997, but still close to reality today
- 1 HTML page, size 42 KB
- 42 inlined images, total size 125 KB
  - Inlined images range from 70 B to 40 KB
  - 19 images < 1 KB, 7 images between 1 KB and 2 KB, and 6 images between 2 KB and 3 KB
- Three kinds of network conditions:
  - High bandwidth, low latency (LAN)
  - High bandwidth, high latency (wide-area)
  - Low bandwidth, high latency (modem)
- Computers running mostly variants of Unix

Test Clients
- HTTP/1.0, 4 parallel non-persistent connections (HTTP/1.0)
- HTTP/1.1, persistent connections (HTTP/1.1)
- HTTP/1.1, persistent connections with pipelining (Pipeline)

                              HTTP/1.0   HTTP/1.1   Pipeline
    Max simultaneous sockets  6          1          1
    Total sockets             40         1          1
    Packets client->server    226        70         25
    Packets server->client    271        153        58
    Total packets             497        223        83
    Elapsed time              1.85s      4.13s      3.02s

Analysis of Results
- Both persistent-connection clients send fewer packets
- Persistent connections with pipelining are much slower than non-persistent connections
- How is this possible?!?
- How can persistent connections be slower on a lightly loaded LAN?!?
  - Short answer: There is no reason for that
  - Long answer: You need to tune things correctly
    - Flushing buffers, Nagle's algorithm, connection management, ...
- In other words, the initial comparison was not fair

Let's Try Again :-)
- Results for high bandwidth, high latency
- Average over 5 runs

                          Packets   Bytes     Seconds   TCP overhead
    HTTP/1.0              565.8     251913    4.17      8.2%
    HTTP/1.1              304       193595    6.64      5.9%
    Pipeline              214.2     193887    2.33      4.2%
    Pipeline w/ compr.    183.2     161698    2.09      4.3%

What Do We Learn?
- Best performance is with persistent connections and pipelining together
  - 50% gain in retrieval time
- Non-persistent but parallel connections always beat persistent connections without pipelining in terms of elapsed time
- Getting the implementation details right is hard
  - Lots of dependencies between operating system, network stack, and application
- Bottom line: Done correctly, persistent connections with pipelining is the best solution
- Currently, pipelining is not widely implemented, but persistent connections are
  - And parallel connections
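
The 50% figure can be checked against the elapsed times in the high-bandwidth, high-latency results table. The sketch below simply recomputes the relative change in retrieval time against the HTTP/1.0 client; the dictionary and variable names are illustrative.

```python
# Elapsed seconds from the "Let's Try Again" results (average over 5 runs).
seconds = {
    "HTTP/1.0": 4.17,
    "HTTP/1.1": 6.64,
    "Pipeline": 2.33,
    "Pipeline w/ compr.": 2.09,
}

baseline = seconds["HTTP/1.0"]

# Positive values mean faster than HTTP/1.0, negative values mean slower.
gain_percent = {
    client: (baseline - elapsed) / baseline * 100
    for client, elapsed in seconds.items()
}

for client, gain in gain_percent.items():
    print(f"{client}: {gain:+.1f}%")
```

Pipelining with compression comes out at roughly a 50% reduction, pipelining alone at about 44%, while persistent connections without pipelining are slower than the parallel HTTP/1.0 client.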

Chapter Summary
- Introduction to the content distribution problem
- Basic technologies:
  - TCP
  - DNS
  - HTTP
- We will revisit each of them later, when they are needed

Outline of Future Chapters
- Chapter 2: Server-side techniques
- Chapter 3: Client-side techniques
- Chapter 4: Content Distribution Networks (CDN)