Topics in P2P Networked Systems


600.413 Topics in P2P Networked Systems
Week 4: Measurements
Andreas Terzis
Slides from Stefan Saroiu

Content Delivery is Changing
- Thirst for data continues to increase (more data and users)
- New types of data have emerged: audio, video
- People use new means to exchange this data
- The result: the Internet is now seeing a mixture of old and new content delivery systems:
  - Conventional Web servers and Web clients
  - CDNs: Akamai, Digital Island, Speedera
  - P2Ps: [label missing], Gnutella, Napster, AudioGalaxy

Goals of this Work
- Explore and characterize content delivery in the new Internet: World Wide Web, Akamai CDN, & Gnutella P2P
- High-level questions:
  - What is the impact of these systems on the Internet?
  - What are the characteristics of the new delivery systems?

Methodology
- Collect anonymized traces at UW border routers
  - Extension of existing infrastructure [Wolman & Voelker]
- Identify and log HTTP traffic:
  - Akamai: ports 80, 8080, 443 and served by an Akamai server
  - Web: ports 80, 8080, 443 (excluding Akamai)
  - Gnutella: ports 6346, 6347
  - [label missing]: port 1214
- In this talk, results are based on a 9-day trace
  - May 28th to June 6th, 2002
  - 500 million transactions, 20TB of HTTP data
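The port rules on the Methodology slide can be sketched as a small classifier. This is only an illustration in Python, not the tooling used for the trace: the function name, the class labels, and the `akamai_servers` set (a stand-in for however "served by an Akamai server" was actually determined) are assumptions, and the system behind port 1214 is left unnamed because the slide text does not give it.

```python
from typing import Optional, Set

# Port lists as given on the Methodology slide.
WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
UNNAMED_P2P_PORT = 1214  # listed on the slide without a legible system label


def classify(server_ip: str, server_port: int, akamai_servers: Set[str]) -> Optional[str]:
    """Return a coarse traffic class for one HTTP transaction, or None.

    akamai_servers is a hypothetical set of known Akamai server addresses.
    """
    if server_port in WEB_PORTS:
        # Akamai traffic shares the Web ports and is split out by server identity.
        return "akamai" if server_ip in akamai_servers else "web"
    if server_port in GNUTELLA_PORTS:
        return "gnutella"
    if server_port == UNNAMED_P2P_PORT:
        return "p2p-port-1214"
    return None  # not one of the traffic classes logged in this study
```

Checking the Web ports first and then excluding known Akamai servers mirrors the "(excluding Akamai)" note on the slide.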

Question 1: What is the bandwidth impact of the new content delivery systems?

Where has all the bandwidth gone?
[Figure: bandwidth in Mbps (0 to 800) per day, Wed through the following Thu; legend includes non-HTTP TCP, P2P, Akamai]
- Web = 14% of TCP; P2P = 43% of TCP
- P2P now dominates Web in bandwidth consumed!

                    inbound       outbound      inbound    outbound
Bytes transferred   1.51TB        3.02TB        1.78TB     13.6TB
Unique objects      72,818,997    3,412,647     111,437    166,442
Clients             39,285        1,231,308     4,644      611,005
Servers             403,087       1,463         281,026    3,888

Bandwidth Consumed by UW Servers
[Figure: bandwidth in Mbps (0 to 250) per day, Wed through the following Fri; legend includes Gnutella]
- P2P has a diurnal cycle, like the Web
- P2P peaks later at night

Summary 1: (what is the bandwidth impact?)
- P2P traffic is the largest bandwidth consumer
  - Three times bigger than Web, at UW
- Uploads significantly exceed downloads, at least within UW-like environments

Question 2: What are the bytes carrying?

Specialized Content Delivery Systems
[Figure: "Byte Breakdown per Content Delivery System": % of bytes by content type (text, images, audio, video, other) for each system, including Akamai and Gnutella]
- Web = text + images
- [label missing] = video
- Akamai = images
- Gnutella = video + audio

HTTP Bytes: Now vs. Then
[Figure: content types ordered by size, as % of all bytes (0% to 20%), May 1999 vs. May 2002; types shown: video/x-msvideo, text/plain, MP3, app/octet-stream, GIF, JPG, HASHED, MPG, HTML, AVI]

Summary 2: (what are the bytes?)
- Systems have become specialized in the type of content they deliver
  - P2P = video + audio bytes
    - It's not the music: a single video is 200 MP3s
  - Web = text + image bytes

Question 3: How do workload characteristics differ in the new delivery systems, compared to the old ones?

Object size
[Figure: "Object Size CDF": % of objects vs. object size in KB (log scale, up to 1,000,000 KB); labeled curves include Akamai and Gnutella]
- P2P objects are 3 orders of magnitude bigger than Web objects

Most bandwidth consuming object

                                  1
(inbound)   object size (MB)      0.009
            # requests            1.4 mil.
(inbound)   object size (MB)      694.4
            # clients             20
            # servers             164
(outbound)  object size (MB)      696.9
            # clients             397
            # servers             1

Top 5 bandwidth consuming objects

                                  1          2         3        4          5
(inbound)   object size (MB)      0.009      0.002     333      0.005      2.23
            # requests            1.4 mil.   3 mil.    21       1.4 mil.   1,457
(inbound)   object size (MB)      694.4      702.2     690.3    775.6      698.1
            # clients             20         14        22       16         14
            # servers             164        91        83       105        74
(outbound)  object size (MB)      696.9      699.2     699.1    700.9      634.2
            # clients             397        1000      390      558        540
            # servers             1          4         10       2          1

- Few UW clients and servers exchange big, popular objects

Server Availability
[Figure: "Fraction of Requests by Server Status Codes": % of responses per system (including Akamai and Gnutella), split into Success (2xx), Error (5xx), and Other]
- Most P2P servers are unresponsive

Summary 3: (how do characteristics differ?)
- P2P objects are three orders of magnitude bigger than Web objects
- P2P servers are largely unresponsive (but it's not clear whether it matters to users)

Question 4: How is bandwidth use distributed among objects, clients, servers?

Object Popularity at UW
[Figure: "Top Bandwidth Consuming Objects": cumulative % of bytes vs. number of objects (0 to 1000); labeled curves include Gnutella and Akamai]
- 1,000 objects (out of 111K) are responsible for 50% of bytes transferred
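The object-popularity result above is a cumulative byte share over objects ranked by bytes transferred. A minimal Python sketch of that calculation follows, assuming a hypothetical trace given as (object_id, bytes) pairs; the slides do not describe the trace format, and the function name is illustrative.

```python
from collections import Counter


def top_n_byte_share(transfers, n):
    """Fraction of all bytes attributable to the n heaviest objects.

    transfers: iterable of (object_id, bytes_transferred) pairs, one per transfer.
    """
    per_object = Counter()
    for object_id, nbytes in transfers:
        per_object[object_id] += nbytes  # total bytes moved for each object

    total = sum(per_object.values())
    if total == 0:
        return 0.0

    # most_common(n) returns the n objects with the largest byte totals.
    top = sum(nbytes for _, nbytes in per_object.most_common(n))
    return top / total
```

Evaluating this for n from 1 to 1000 would trace out a curve of the kind plotted on the slide; the same function applies to clients or servers if the key is a client or server identifier instead of an object.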

UW Clients
[Figure: "Top Bandwidth Consuming UW Clients (as fraction of each system)": cumulative % of bytes vs. number of UW clients (0 to 1000); labeled curves include Gnutella and Akamai]
- A small number of clients account for a large portion of traffic
  - 200 Web+Akamai clients = 13% of traffic
  - 200 [label missing] clients = 50% of traffic (20% of all incoming HTTP)

UW Servers
[Figure: "Top Bandwidth Consuming UW Servers (as fraction of each system)": cumulative % of bytes vs. number of UW servers (0 to 1000); labeled curves include Gnutella]
- 20 servers supply 80% of Web traffic
- 334 servers (10%) supply 80% of [label missing] traffic
- Load is not uniformly spread across peers!

Summary 4: (how is bandwidth distributed?)
- A very small number of P2P objects, clients, and servers consume an enormous fraction of traffic

Question 5: Could P2P data be cached?

Caching Methodology
- Simulation of an ideal cache:
  - Infinite storage, bandwidth, CPU, etc.
  - All objects are cached forever
  - P2P objects are immutable, no dynamic data
- Evaluated: both inbound and outbound P2P data cacheability
- Objects are big: byte hit rate is more important than object hit rate
- Gives an upper bound on the potential for caching

P2P Data Cacheability
[Figure: "Ideal Byte Hit Rate": byte hit rate (0% to 100%) over time, 28-May through 12-Jul; curves: Outbound, Inbound]
- Potential for caching is high in both directions
- UW inbound cache needs a month to warm up
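The ideal-cache simulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the simulator behind the plotted results: the trace is a hypothetical sequence of (object_id, size_bytes) requests in time order, each request is treated as a full-object transfer, and the function name is made up for the example.

```python
def ideal_byte_hit_rate(requests):
    """Byte hit rate of a cache with infinite capacity that never evicts.

    requests: iterable of (object_id, size_bytes) in trace order.
    Objects are assumed immutable, so any repeat request is a hit.
    """
    seen = set()      # every object ever requested stays cached forever
    hit_bytes = 0
    total_bytes = 0
    for object_id, size in requests:
        total_bytes += size
        if object_id in seen:
            hit_bytes += size      # served from the cache
        else:
            seen.add(object_id)    # first request is a compulsory miss
    return hit_bytes / total_bytes if total_bytes else 0.0
```

Because nothing is ever evicted and no object ever changes, the only misses are first requests, which is why the result is an upper bound on what a real cache could achieve. Weighting by bytes rather than counting requests reflects the slide's point that byte hit rate matters more when objects are large.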

Summary 5: (is it cacheable?)
- Large potential for data cacheability in P2P
- P2P caches take a long time to warm up

Conclusions
- The Internet is being used in a completely different way:
  - P2P traffic is the largest bandwidth consumer
  - Peers consume significant bandwidth in both directions
  - P2P objects are 1,000 times larger than Web objects
  - A small number of huge objects are responsible for an enormous fraction of bytes transferred
    - 300 objects consumed 5.6TB of bandwidth!
  - Few P2P peers are causing much of the traffic
  - Large potential for caching in P2P

Final Thoughts
- The average peer consumes 90x more bandwidth than the average Web client
- Adding another 450 peers is equivalent to doubling the entire UW Web client population
- Is the current architecture of [label missing] scalable?
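The "450 peers" claim follows from the 90x ratio by simple multiplication, sketched below in Python. The constant and function names are illustrative. The comparison to a client count is an assumption: the trace summary table lists 39,285 inbound clients in its first column group, which would be consistent with the claim if that group is the Web population, but the table's group labels are not legible in this transcript.

```python
# Both figures come from the "Final Thoughts" slide.
PEER_TO_WEB_CLIENT_RATIO = 90   # average peer vs. average Web client bandwidth
ADDED_PEERS = 450


def web_client_equivalents(peers, ratio=PEER_TO_WEB_CLIENT_RATIO):
    """Number of average Web clients whose bandwidth matches `peers` average peers."""
    return peers * ratio


# 450 peers * 90 = 40,500 average-Web-client equivalents.
equivalents = web_client_equivalents(ADDED_PEERS)
```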