Back-Office Web Traffic on the Internet. IMC 2014 Vancouver, BC, CANADA November 5-7, 2014

Similar documents
Peering at Peerings: On the Role of IXP Route Servers

Annoyed Users: Ads and Ad-Block Usage in the Wild

Express or Local Lanes: On Assessing QoE over Private vs. Public Peering Links

The forces behind the changing Internet: IXPs and content delivery and SDN

A Multi-Perspective Analysis of Carrier-Grade NAT Deployment

Understanding the Share of IPv6 Traffic in a Dual-Stack ISP

Inferring BGP Blackholing in the Internet

Illegitimate Source IP Addresses At Internet Exchange Points

Empirical Analysis of the Effects and the Mitigation of IPv4 Address Exhaustion

Mobile Content Hosting Infrastructure in China: A View from a Cellular ISP. Zhenhua Li Chunjing Han Gaogang Xie

Anatomy of a Large European IXP

Content Deployment and Caching Techniques in Africa

A First Look at QUIC in the Wild

Large-Scale Scanning of TCP s Initial Window

Peering with Akamai. CaribNOG November 2015 Belize City, Belize

PoP Level Mapping And Peering Deals

Revisiting router architectures with Zipf

Advancing the Art of Internet Edge Outage Detection

Web Content Cartography. Georgios Smaragdakis Joint work with Bernhard Ager, Wolfgang Mühlbauer, and Steve Uhlig

BGP Community Harvesting: Locating Peering Infrastructures

Analyzing Cacheable Traffic in ISP Access Networks for Micro CDN applications via Content-Centric Networking

CLOAK OF VISIBILITY : DETECTING WHEN MACHINES BROWSE A DIFFERENT WEB

Detecting Peering Infrastructure Outages

BGP Case Studies. ISP Workshops

Distributed Systems. 21. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2018

Cloak of Visibility. -Detecting When Machines Browse A Different Web. Zhe Zhao

CS November 2018

A content delivery perspective on mobility in the Internet

Routing optimisation without BGP

working with Akamai to improve HTTP(S) traffic flows

Indirection. science can be solved by adding another level of indirection" -- Butler Lampson. "Every problem in computer

Network infrastructure, routing and traffic. q Internet inter-domain traffic q Traffic estimation for the outsider

Turbo King: Framework for Large- Scale Internet Delay Measurements

CS November 2017

Internet Inter-Domain Traffic. C. Labovitz, S. Iekel-Johnson, D. McPherson, J. Oberheide, F. Jahanian, Proc. of SIGCOMM 2010

A methodology for estimating interdomain Web traffic demand

Application Layer Traffic Optimization (ALTO)

Defining the Impact of the Edge Using The SamKnows Internet Measurement Platform. PREPARED FOR EdgeConneX

The Internet: A Complex System at its Limits

Deriving Traffic Demands for Operational IP Networks: Methodology and Experience

HAIR: Hierarchical Architecture for Internet Routing

Understanding Online Social Network Usage from a Network Perspective

The Rise and Rise of Content Distribution Networks. Geoff Huston APNIC WIE December 2017

An overview on Internet Measurement Methodologies, Techniques and Tools

OpenCache. A Platform for Efficient Video Delivery. Matthew Broadbent. 1 st Year PhD Student

ThousandEyes for. Application Delivery White Paper

Improving Content Delivery Using Provider-aided Distance Information

Networking 101 ISP/IXP Workshops

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Drafting Behind Akamai (Travelocity-Based Detouring)

COMP6218: Content Caches. Prof Leslie Carr

CORD use cases and benefits: Telefonica Edge Computing

Internet Interconnection An Internet Society Public Policy Briefing

MAPPING PEERING INTERCONNECTIONS TO A FACILITY

Peering as a Cloud enabler for Enterprises

Performance and Quality-of-Service Analysis of a Live P2P Video Multicast Session on the Internet

Send me up to 5 good questions in your opinion, I ll use top ones Via direct message at slack. Can be a group effort. Try to add some explanation.

Revealing the load-balancing behavior of YouTube traffic of interdomain links

The impact of fiber access to ISP backbones in.jp. Kenjiro Cho (IIJ / WIDE)

From Internet Data Centers to Data Centers in the Cloud

How Akamai delivers your packets - the insight. Christian Kaufmann SwiNOG #21 11th Nov 2010

SDN Use-Cases. internet exchange, home networks. TELE4642: Week8. Materials from Prof. Nick Feamster is gratefully acknowledged

MAPPING PEERING INTERCONNECTIONS TO A FACILITY

Akamai's V6 Rollout Plan and Experience from a CDN Point of View. Christian Kaufmann Director Network Architecture Akamai Technologies, Inc.

Multihoming Complex Cases & Caveats

Cache Management for TelcoCDNs. Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK)

Computer Networks CS 552

Computer Networks CS 552

Open Connect Overview

Peering THINK. A Guide

FG INET: Internet Network Architectures

Open Connect Everywhere: A Glimpse at the Internet Ecosystem through the Lens of the Netflix CDN

Challenges in deploying QoS in contemporary networks. Vikrant S. Kaulgud

List of measurements in rural area

Attack Fingerprint Sharing: The Need for Automation of Inter-Domain Information Sharing

SFMap: Inferring Services over Encrypted Web Flows using Dynamical Domain Name Graphs TMA 2015

PinPoint: A Ground-Truth Based Approach for IP Geolocation

The Guide to Best Practices in PREMIUM ONLINE VIDEO STREAMING

New levels of cooperation between eyeball ISPs and OTT/CDNs. RIPE 75 Dubai Oct 24, 2017 Falk von Bornstaedt, DTAG ICSS

CONTENT-AWARE DNS. IMPROVING CONTENT-AWARE DNS RESOLUTION WITH AKAMAI DNSi CACHESERVE EQUIVALENCE CLASS. AKAMAI DNSi CACHESERVE

Correlating Network Congestion with Video QoE Degradation - a Last-Mile Perspective

Development of IPX: Myth or Reality?

Guaranteeing Video Quality

Cato Cloud. Software-defined and cloud-based secure enterprise network. Solution Brief

The Internet today. Measuring the Internet: challenges and applications. Politecnico di Torino 7/12/2011. Speaker: Marco Mellia

Chapter 6: Distributed Systems: The Web. Fall 2012 Sini Ruohomaa Slides joint work with Jussi Kangasharju et al.

Securing BGP. Geoff Huston November 2007

A Tale of Three CDNs

It s Time to Aim Lower Zero Latency at the Edge

Implementation Guide - VPN Network with Static Routing

Peering at DE-CIX. Frank P. Orlowski Director BIZ Development, Marketing & Consulting. DE-CIX is proud to be the Silver Sponsor of MENOG7

IPv6: Are we really ready to turn off IPv4?

2nd SIG-NOC meeting and DDoS Mitigation Workshop Scrubbing Away DDOS Attacks. 9 th November 2015

TechNote AltitudeCDN OmniCache Integration with Microsoft Teams Live Events

Shortcuts through Colocation Facilities

A Server-to-Server View of the Internet

BGP Routing inside an AS

Measuring IPv6 Deployment

IP Traffic Exchange Market Developments and Policy Challenges

Introduction to BGP. ISP/IXP Workshops

Transcription:

Back-Office Web Traffic on the Internet Enric Pujol Philipp Richter Balakrishnan Chandrasekaran Georgios Smaragdakis Anja Feldmann Bruce Maggs Keung- Chi Ng TU- Berlin TU- Berlin Duke University MIT / TU- Berlin / Akamai TU- Berlin Duke University / Akamai Akamai IMC 2014 Vancouver, BC, CANADA November 5-7, 2014

The Web for an end user Search$engine End$user HTTP$GET CDN Front- office Web traffic: Web traffic between end users and servers The front- office AdPublisher 2

Behind the scenes... Search$engine End$user HTTP$GET CDN Back- office Web traffic: Machine- to- machine Web traffic? The front- office AdPublisher The back- office 3

Search engines: crawlers HTTP$GET Search$engine Crawlers Content HTTP$GET End$user CDN The front- office AdPublisher The back- office 4

Content delivery: proxies HTTP$GET Search$engine Crawlers Content HTTP$GET HTTP$GET$ HTTP$GET$ End$user CDN Overlay$of$proxies Origin The front- office AdPublisher The back- office 5

AdExchanges: real-time bidding HTTP$GET Search$engine Crawlers Content HTTP$GET HTTP$GET$ HTTP$GET$ End$user CDN Overlay$of$proxies Origin AdExchange HTTP$POST The front- office AdPublisher Auctioneer Advertisers/bidders The back- office 6

Agenda 1. Introduction 2. Methodology and datasets 3. Characteristics 1. Traffic 2. Patterns 3. Inter- domain perspective 4. CDN back- office traffic 5. The end- user perspective 6. Summary and implications 7

Vantage points (VP) Type VP Daily traffic Observations IXPs 11,900 TB SFlow (1/16K) M- IXP 1,580 TB Transit BBone- 1 40 TB Packet sampled (1/1K) BBone- 2 70 TB Content CDN 350 TB 5 locations Eyeballs RBN 35 TB Packet dumps Diverse vantage points: multiple perspectives 8

Candidate IPs for the back-office Send and receive requests Dual role IPs are prime candidates 9

Candidate IPs for the back-office Send and receive requests Send and receive requests Dual role IPs are prime candidates 10

Candidate IPs for the back-office Many requests to many servers Heavy hitter IPs are also prime candidates 11

Candidate IPs for the back-office Many requests to many servers Many requests to a few servers Heavy hitter IPs are also prime candidates 12

Sources of back-office Web traffic Dual role IPs Heavy hitters 13

Sources of back-office Web traffic Dual role IPs Crawling?? Heavy hitters Real- time bidding? 14

Dual-role IPs: active measurements Client only (%) Server only (%) Dual- role (%) Passive 96.90 2.74 0.36 Passive+Active 93.85 2.74 3.40 ZMap project: Internet- wide scan of Web Servers (scans.io) Observations: 1. Most IPs have only client behavior 2. Many servers also show client behavior Active measurements augment the number of servers 15

Candidates: manual classification Crawlers: Reverse DNS + Origin AS 3.9K IPs, 74% in 2 orgs Auctioneers: URL + Origin AS 316 IPs, 4 orgs Content Delivery Proxies: Origin AS + Reverse DNS (for caches) 36K IPs, 8 orgs Other: Rest of dual- role IPs 151K IPs, mostly in cloud prov. 16

Candidates: manual classification Crawlers: Reverse DNS + Origin AS 3.9K IPs, 74% in 2 orgs Auctioneers: URL + Origin AS 316 IPs, 4 orgs Content Delivery Proxies: Origin AS + Reverse DNS (for caches) 36K IPs, 8 orgs Other: Rest of dual- role IPs 151K IPs, mostly in cloud prov. 17

Candidates: manual classification Crawlers: Reverse DNS + Origin AS 3.9K IPs, 74% in 2 orgs Auctioneers: URL + Origin AS 316 IPs, 4 orgs Content Delivery Proxies: Origin AS + Reverse DNS (for caches) 36K IPs, 8 orgs Other: Rest of dual- role IPs 151K IPs, mostly in cloud prov. 18

Candidates: manual classification Crawlers: Reverse DNS + Origin AS 3.9K IPs, 74% in 2 orgs Auctioneers: URL + Origin AS 316 IPs, 4 orgs Content Delivery Proxies: Origin AS + Reverse DNS (for caches) 36K IPs, 8 orgs Other: Rest of dual- role IPs 151K IPs, mostly in cloud prov. 19

Agenda 1. Introduction 2. Methodology and datasets 3. Characteristics 1. Traffic 2. Patterns 3. Inter- domain perspective 4. CDN back- office traffic 5. The end- user perspective 6. Summary and implications 20

Traffic Transit links: different obs. IXPs: from 10% to 20% At least 10% in our VPs 21

Traffic: Contribution per class CDPs Auctioneers Crawlers Other Bytes 12.1 % 1.1 % 10.3 % 76.5 % Requests 11.8 % 22.5 % 15.1 % 50.6 % Observations: 1. CDPs big players significant share 2. Real- time bidding many but small transactions 3. Crawlers a few orgs significant share 4. Other cloud service providers All classes contribute. More to discover 22

Traffic patterns: bytes % back- office Web traffic increases during off hours in IXPs 23

Traffic patterns: requests Observations: 1. A multiplicative factor of human activity (e.g., RTB) 2. Non- human triggered activity (e.g., crawlers) 24

Inter-domain perspective Top 10 traffic carrying links: 4 x Cloud Content 3 x Search Hosters 2 x CDN Content 1 x Content Advertisement Back- office traffic appears in many peering links 25

Agenda 1. Introduction 2. Methodology and datasets 3. Characteristics 1. Traffic 2. Patterns 3. Inter- domain perspective 4. CDN back- office traffic 5. The end- user perspective 6. Summary and implications 26

A CDN perspective CDN$cluster Front&office Public Private Origin End*user CDN$front*end CDN$back*end CDN$back*end Origin Three sub- classes of back- office traffic 27

A CDN perspective CDN$cluster Front&office Public Private Origin End*user CDN$front*end CDN$back*end CDN$back*end Origin Public: front- end back- end over the Internet 28

A CDN perspective CDN$cluster Front&office Public Private Origin End*user CDN$front*end CDN$back*end CDN$back*end Origin Private: within same cluster 29

A CDN perspective CDN$cluster Front&office Public Private Origin End*user CDN$front*end CDN$back*end CDN$back*end Origin Origin: inter- organization over the Internet 30

Back-office per location Front- office Back- office: ORIGIN Back- office: PUBLIC Back- office: PRIVATE Edge cluster Regional hub CDNs heavily rely on back- office traffic 31

Agenda 1. Introduction 2. Methodology and datasets 3. Characteristics 1. Traffic 2. Patterns 3. Inter- domain perspective 4. CDN back- office traffic 5. The end- user perspective 6. Summary and implications 32

The end-user perspective Residential broadband network: backbone latency (no access) Continental Local Trans- continental A smaller front- office: but the back- office may be large 33

Summary 1. A back- office to support the Web 2. Significant traffic: bytes and requests 3. Different type of traffic patterns 4. Visible at multiple peering links An important yet understudied class of traffic 34

Implications Feasibility to deploy new protocols: It is easier to change the back office than the front office Performance evaluation: Interactions with the back office More users than anticipated Opportunities: ISPs: micro- data centers, virtualized services IXPs: co- location strategies NSPs: new services e.g., SLAs 35

Implications Feasibility to deploy new protocols: It is easier to change the back office than the front office Performance evaluation: Interactions with the back office More users than anticipated Opportunities: ISPs: micro- data centers, virtualized services IXPs: co- location strategies NSPs: new services e.g., SLAs 36

Implications Feasibility to deploy new protocols: It is easier to change the back office than the front office Performance evaluation: Interactions with the back office More users than anticipated Opportunities: ISPs: micro- data centers, virtualized services IXPs: co- location strategies NSPs: new services e.g., SLAs 37

Back-office traffic on the Internet HTTP$GET Search$engine Crawlers Content HTTP$GET HTTP$GET$ HTTP$GET$ End$user CDN Overlay$of$proxies Origin AdExchange HTTP$POST The front- office AdPublisher Auctioneer Advertisers/bidders The back- office (some examples thereof) 38