Assessing the Nature of Internet traffic: Methods and Pitfalls

Wolfgang John
Chalmers University of Technology, Sweden

together with
Min Zhang, Beijing Jiaotong University, China
Maurizio Dusi, Università degli Studi di Brescia, Italy
kc claffy and Nevil Brownlee, CAIDA, SDSC, UCSD, USA

Introduction
Traffic classification (TC)
Figure labels: Bittorrent, HTTP, SMTP

Introduction (cont.)
Why traffic classification?
- Network design and provisioning
- QoS assignment and traffic shaping
- Accounting
- Security monitoring: IDS/IPS
- Network forensics
- Trends and changes in network applications

Introduction (cont.)
Today's Internet
- evolving in scope and complexity
- applications adapt rapidly to detection attempts
- emerging obfuscation techniques
Many classification approaches in the literature
- using whatever traffic samples are available
- no systematic integration of results

Outline
Classification methods
- Research review and taxonomy
- Survey analysis: P2P
Pitfalls
- Systematic shortcomings
- Re-validate assumptions
  - UDP rising
  - Routing (a)symmetry on backbone links

Research Review and Taxonomy
Research review
- create a structured taxonomy of traffic classification papers and their datasets
- help to answer popular questions
- reveal open issues and challenges
http://www.caida.org/research/traffic-analysis/classification-overview

Research review and taxonomy: Overview
64 papers published between 1994 and 2008
Definition: traffic classification
Methods to classify traffic data sets based on features passively observed in the traffic, according to specific classification goals.

Research review and taxonomy: Datasets and Goals
Data sets: more than 80 data sets used for 64 papers!
- Time of collection, link type, capture environments, geographic location, (payload, anonymization), etc.
Classification goals:
- Coarse or fine-grained classification
- Applications or protocols

Research review and taxonomy: Features
Features
- Reacting to application development

Research review and taxonomy: Methods
Methods
- exact matching: port number, payload, etc.
- heuristic methods: e.g. on connection patterns
- machine learning methods: supervised and unsupervised
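
The exact-matching method listed above can be illustrated with a minimal port-based sketch. The port table and the function name `classify_by_port` are illustrative assumptions, not taken from the talk; the table holds only a few IANA well-known ports, and a real classifier would need a far larger one.

```python
from typing import Optional

# Hypothetical, deliberately tiny port table (IANA well-known ports only).
WELL_KNOWN_PORTS = {
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    443: "HTTPS",
}


def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Exact matching on port numbers.

    Returns the label of the first known port (destination checked first),
    or None when neither port is in the table.
    """
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get(port)
        if label is not None:
            return label
    return None
```

A flow between two high, random ports returns `None` here, which is the weakness the later slides on UDP and ephemeral ports point to.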

Survey analysis: P2P
How much P2P?
- 1.3% to 93% across the 18 (out of 64) papers

Survey analysis: P2P (contd.)
So how much of modern Internet traffic is P2P?
"there is a wide range of P2P traffic on Internet links; see your specific link of interest and classification technique you trust for more details."

Survey analysis: P2P (contd.)
SUNET: April till Nov. 2006

Outline
Methods
- Research review and taxonomy
- Survey analysis: P2P
Pitfalls
- Systematic shortcomings
- Re-validate assumptions
  - UDP rising
  - Routing (a)symmetry on backbone links

Systematic Shortcomings
Poor comparability of results!
- 80 data sets by 64 papers
- lack of shared, modern data sets as reference data
- no clear definitions (P2P or file-sharing)
- lack of standardized measures
- lack of defined classification goals

Assumption: TCP dominates traffic
Current TC approaches consider mainly TCP
Assumptions
- TCP is dominating traffic
- Bulk (data) transfer is done via TCP
Advantage
- TCP has a clear notion of sessions

Assumption: TCP dominates traffic (cont.)
There might be a shift (soon):
IPTV applications
- PPLive, PPStream: switched to UDP in Oct. 2008
- VA (Video Accelerator): UDP for data transfer
P2P applications
- uTP: Micro Transport Protocol, based on UDP
- Part of uTorrent 1.9 beta, expected during 2010
All on high, random ports (of course)

Assumption: TCP dominates traffic (cont.)

Assumption: TCP dominates traffic (cont.)
Figure: CDF of UDP flows per port number
Indeed, high ephemeral ports are common today!

Assumption: TCP dominates traffic (cont.)
Figure: Avg. packets/flow for top 10 UDP ports
No substantial data portions carried (on these links, yet)
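
A CDF of flows per port number, as in the first figure above, could be computed roughly as follows. This is a sketch under assumptions: the input is simply one port number per observed UDP flow, and the function names and the 1024 threshold for "high" ports are illustrative choices, not the authors' procedure.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flow_cdf_by_port(ports: Iterable[int]) -> List[Tuple[int, float]]:
    """Empirical CDF: for each observed port p, the fraction of flows
    whose port number is <= p. Input is one port number per flow."""
    counts = Counter(ports)
    total = sum(counts.values())
    cdf = []
    running = 0
    for port in sorted(counts):
        running += counts[port]
        cdf.append((port, running / total))
    return cdf


def share_at_or_above(ports: Iterable[int], threshold: int = 1024) -> float:
    """Fraction of flows on ports >= threshold (assumed cut-off for 'high')."""
    ports = list(ports)
    if not ports:
        return 0.0
    return sum(1 for p in ports if p >= threshold) / len(ports)
```

With the toy input `[53, 53, 6881, 40000]`, the CDF reaches 0.5 at port 53 and half of the flows sit on ports at or above 1024.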

Assumption: TCP dominates traffic (cont.)
Current situation (on the links measured)
- TCP dominating packets (bytes), UDP dominating flows
- UDP for P2P overlay signaling
This might change soon:
- UDP-based IPTV already common in China, uTP
- UDP for bulk and streaming data transfer
TC methods can no longer ignore UDP

Assumption: routing symmetry
Current approaches consider bidirectional traffic
Assumption
- Traffic is routed symmetrically
- Same path for forward and backward direction
Advantage
- Bi-directional information offers more features for classification
- For TCP, bi-directional information allows easier inference of sessions (connections)
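
The observation that TCP can dominate packets and bytes while UDP dominates flows follows from counting the same flow records in three different ways. The sketch below assumes a hypothetical record layout of `(protocol, packets, bytes)` per flow; the function name and the toy numbers in the usage note are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow record layout: (protocol, packets, bytes).
FlowRecord = Tuple[str, int, int]


def protocol_shares(flows: Iterable[FlowRecord]) -> Dict[str, Dict[str, float]]:
    """Per-protocol share of flows, packets and bytes (each sums to 1.0)."""
    totals = defaultdict(lambda: [0, 0, 0])  # [flows, packets, bytes]
    for proto, packets, nbytes in flows:
        entry = totals[proto]
        entry[0] += 1
        entry[1] += packets
        entry[2] += nbytes
    sums = [sum(entry[i] for entry in totals.values()) for i in range(3)]
    names = ("flows", "packets", "bytes")
    return {
        proto: {
            name: (entry[i] / sums[i] if sums[i] else 0.0)
            for i, name in enumerate(names)
        }
        for proto, entry in totals.items()
    }
```

For one large TCP flow `("TCP", 100, 100000)` and three small UDP flows of one or two packets each, UDP accounts for 75% of flows while TCP carries over 90% of the packets.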

Assumption: routing symmetry (cont.)
Degree of symmetry
- 4 link locations (Sweden and USA)
- 2 samples each

Assumption: routing symmetry (cont.)
- Beyond intranets and access links (edge networks), there is little symmetry
- Degree of symmetry decreases with level of coreness of the link
- TC methods for backbone links need to master unidirectional data flows
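
The slides do not state how the degree of symmetry was measured. One plausible definition, used here purely as an assumption, is the fraction of unidirectional flow keys observed on a link whose reverse direction is also observed on that link. The 5-tuple layout and function names are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical 5-tuple: (src_ip, src_port, dst_ip, dst_port, protocol).
FlowKey = Tuple[str, int, str, int, int]


def reverse_key(key: FlowKey) -> FlowKey:
    """Swap source and destination endpoints, keep the protocol."""
    src_ip, src_port, dst_ip, dst_port, proto = key
    return (dst_ip, dst_port, src_ip, src_port, proto)


def symmetry_degree(keys: Iterable[FlowKey]) -> float:
    """Fraction of distinct unidirectional flows whose reverse direction
    was also seen on the same link (assumed definition, see lead-in)."""
    seen = set(keys)
    if not seen:
        return 0.0
    matched = sum(1 for key in seen if reverse_key(key) in seen)
    return matched / len(seen)
```

Under this definition a link that sees both directions of every connection scores 1.0, while a backbone link carrying only one direction of each connection scores 0.0.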

Summary
Research review
- structured taxonomy of traffic classification papers
Current systematic shortcomings
- lack of shared, modern data sets as reference data
- lack of standardized measures
- lack of defined classification goals
Upcoming technical challenges
- TC methods can no longer ignore UDP
- TC methods should handle unidirectional flows

Traffic classification overview: http://www.caida.org/research/traffic-analysis/classification-overview/
Observations on UDP traffic on Internet backbone links: soon to be published on www.caida.org (News section)
Estimation of routing asymmetry on Internet links: http://www.caida.org/research/traffic-analysis/asymmetry/
or Email: johnwolf@chalmers.se