TCP Spatial-Temporal Measurement and Analysis
INFOCOM '05
E. Brosh, Distributed Network Analysis (DNA) Group, Columbia University
G. Lubetzky-Sharon, Y. Shavitt, Tel-Aviv University
Overview
- Inferring network performance
- Method of identifying root cause (bad) links
- Analytic and experimental results
- Inferring path characteristics from end-to-end measurements
Goal
- Infer the performance of the Internet by passively monitoring network traffic
- Quantitative versus qualitative approach:
  - How bad is a link? => infer the characteristics (e.g., lossiness) of network links
  - Which links are bad? => classify links according to their performance
Solutions - Previous Work
- Focus on the quantitative approach: inferring link characteristics (e.g., lossiness)
  - Active probing (MINC, Pathchar, Packet Stripes, ...): network overhead may bias the results
  - Passive observations (Padmanabhan 03): high-complexity analysis, e.g., Bayesian inference
- Detecting shared congestion (passive approach)
  - Entropy-based clustering technique (Katabi 01): measurement point needs to observe large portions of the data
  - Correlation-based technique (Rubenstein 00): supports only pair-wise congestion detection
Our Contributions
- Focus on the qualitative approach
- Consider a simplified problem: identifying low-performance links rather than inferring detailed link statistics
- Cost-effective algorithm
  - Based on a Root Cause criterion
  - Practical and relatively easy to implement
  - High detection ratio with low errors
Experimental Setup
- Passive measurements: capture TCP headers using tcpdump at an edge ISP router
- Network topology: collect network paths using traceroute (infrequent probing)
[Diagram: a diagnostic node attached to the ISP's edge router, monitoring traffic between the ISP and the Internet]
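A minimal sketch of how the per-packet fields used later in the analysis (IP-ID and TCP sequence number) could be pulled out of such a tcpdump capture. The file name and the use of the third-party dpkt library are illustrative assumptions, not the authors' tooling.

```python
# Extract (src, dst, ip_id, tcp_seq) from every TCP packet in a pcap trace.
import dpkt
import socket

def read_tcp_headers(pcap_path):
    """Yield (src, dst, ip_id, tcp_seq) for each TCP packet in the capture."""
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue  # keep only IPv4/TCP packets
            tcp = ip.data
            yield (socket.inet_ntoa(ip.src), socket.inet_ntoa(ip.dst), ip.id, tcp.seq)

for src, dst, ip_id, seq in read_tcp_headers("edge_router.pcap"):  # hypothetical file name
    pass  # feed (ip_id, seq) into the out-of-sequence classifier sketched later
```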
Our Approach - Link Classification
- Goal: locate low-performance links
- Our approach: classify and report low-performance links using a Root Cause criterion
- Performance metrics: link loss, reordering, and duplication rates
- The rest of the talk focuses on link loss
Root Cause Criterion
- Def: link (u,v) is a Root Cause (RC) link if its loss probability is larger, by at least δ, than the loss probability of each link entering u or leaving v
- Example with δ=0 (zero threshold):
[Figure: a small link diagram with loss probabilities 0.1, 0.2, 0.3, 0.2, and 0.5, where link c carries loss 0.3 and link f carries loss 0.5]
  - 0.3 > 0.1 and 0.3 > 0.2 => c is an RC link
  - 0.5 > 0.3 => f is an RC link
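As a hedged illustration, the criterion can be written in a few lines of Python. The strict ">" reading of "larger by at least δ" with δ=0, and the link names, loss values, and adjacency (loosely based on the slide's toy example), are assumptions.

```python
# Root Cause check: a link is flagged when its loss probability exceeds, by at least
# delta, the loss probability of every adjacent link (links entering u or leaving v).
def is_root_cause(link, loss, adjacent_links, delta=0.0):
    """loss maps link name -> loss probability; adjacent_links are the links entering u or leaving v."""
    return bool(adjacent_links) and all(loss[link] > loss[n] + delta for n in adjacent_links)

loss = {"a": 0.1, "b": 0.2, "c": 0.3, "e": 0.2, "f": 0.5}   # illustrative values
print(is_root_cause("c", loss, adjacent_links=["a", "b"]))  # True: 0.3 > 0.1 and 0.3 > 0.2
print(is_root_cause("f", loss, adjacent_links=["c"]))       # True: 0.5 > 0.3
```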
Root Cause Detection Algorithm
1. Estimate link loss rates: for each link, average the loss rates of the paths going through it
2. Link classification: for each link, apply the Root Cause criterion using the estimated link loss rates
Assumption: network routes and link characteristics are stationary
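A sketch of the two-step algorithm under the stated stationarity assumption. The data model (paths as lists of (u, v) link tuples paired with a measured loss rate) and the equal-weight average are illustrative choices, not the paper's exact implementation.

```python
from collections import defaultdict

def estimate_link_loss(paths):
    """Step 1: for each link, average the loss rates of the paths traversing it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for links, path_loss_rate in paths:
        for link in links:
            sums[link] += path_loss_rate
            counts[link] += 1
    return {link: sums[link] / counts[link] for link in sums}

def detect_root_cause_links(paths, delta=0.0):
    """Step 2: apply the Root Cause criterion to the estimated link loss rates."""
    loss = estimate_link_loss(paths)
    rc = set()
    for (u, v) in loss:
        # Links entering u or leaving v, restricted to links observed in the traces.
        adjacent = [l for l in loss if l != (u, v) and (l[1] == u or l[0] == v)]
        if adjacent and all(loss[(u, v)] > loss[n] + delta for n in adjacent):
            rc.add((u, v))
    return loss, rc
```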
Evaluation - Analysis
- Generalization: assume a weighted average
  - $\{w_1, \ldots, w_n\}$ = weights of the paths going through link $l$
  - $\{r_1, \ldots, r_n\}$ = loss rates of the paths going through link $l$
- $\hat{p}_l = \sum_j w_j r_j$, the weighted average of losses on link $l$, is a biased estimator:
$$\hat{p}_l = \underbrace{p_l}_{\text{real loss prob.}} + \underbrace{(1 - p_l)}_{\text{real success prob.}} \cdot \underbrace{\sum_{j:\, l \in t_j} w_j e_j}_{\text{error factor}}$$
where $e_j$ = the loss probability of path $j$ excluding $l$'s losses.
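One way to see where the error-factor term comes from (assuming the weights sum to one and that losses on link $l$ and on the rest of a path are independent): the loss rate of a path $j$ through $l$ decomposes as
$$r_j = 1 - (1 - p_l)(1 - e_j) = p_l + (1 - p_l)\, e_j,$$
so
$$\hat{p}_l = \sum_{j:\, l \in t_j} w_j r_j = p_l + (1 - p_l) \sum_{j:\, l \in t_j} w_j e_j.$$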
Estimator Analysis
- Assumptions:
  - Link loss probabilities are i.i.d. random variables with mean $1-\mu$ and variance $\sigma^2$
  - All paths have length $h$
- Bias properties:
$$E[\hat{p}_l - p_l] = (1 - p_l)\left(1 - \mu^{h-1}\right)$$
$$V[\hat{p}_l - p_l] = (1 - p_l)^2\left((\sigma^2 + \mu^2)^{h-1} - \mu^{2(h-1)}\right)\sum_j w_j^2$$
- The bias tends to a constant value for small loss probabilities
- The variance of the bias can decrease as the path length increases
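A small Monte Carlo sanity check of the bias expression is sketched below; all numeric parameters, the uniform draw for the per-link loss probabilities, and the equal path weights are illustrative assumptions, not values from the paper.

```python
# Compare the empirical bias of p_hat_l against (1 - p_l) * (1 - mu^(h-1)).
import random

def simulate_bias(p_l, h, n_paths, loss_mean, trials=2000):
    """Average (p_hat_l - p_l) over random draws of the other links on each path."""
    total = 0.0
    for _ in range(trials):
        rates = []
        for _ in range(n_paths):
            survive = 1.0
            for _ in range(h - 1):
                survive *= 1.0 - random.uniform(0.0, 2.0 * loss_mean)  # mean loss = loss_mean
            e_j = 1.0 - survive                  # loss prob. of path j excluding link l
            rates.append(p_l + (1.0 - p_l) * e_j)  # true loss rate of path j
        total += sum(rates) / n_paths - p_l      # equal weights w_j = 1/n
    return total / trials

p_l, h, loss_mean = 0.02, 6, 0.01
mu = 1.0 - loss_mean                             # mean link success probability
predicted = (1.0 - p_l) * (1.0 - mu ** (h - 1))
print(simulate_bias(p_l, h, n_paths=20, loss_mean=loss_mean), predicted)
```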
Estimator Analysis (cont'd)
- Root Cause criterion: the difference between the loss probability estimators of two adjacent links has a lower bias than the bias of a single-link estimator.
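One illustrative way to see this, under the previous slide's i.i.d. and equal-path-length assumptions (not necessarily the paper's exact derivation): subtracting the bias expressions of two adjacent links $l$ and $l'$ cancels the dominant term,
$$E\!\left[(\hat{p}_l - \hat{p}_{l'}) - (p_l - p_{l'})\right] = (1 - p_l)\left(1 - \mu^{h-1}\right) - (1 - p_{l'})\left(1 - \mu^{h-1}\right) = (p_{l'} - p_l)\left(1 - \mu^{h-1}\right),$$
and $|p_{l'} - p_l|$ is much smaller than $1 - p_l$ when loss probabilities are small, so the difference compared by the Root Cause criterion is far less biased than either estimator alone.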
Evaluation - Simulations
Methodology:
- A directed acyclic graph topology: 100-3000 nodes, max degree = 5-10
- Link loss probabilities drawn from Zipf and uniform distributions over the range [0, 0.04]
- For each loss probability value, measure:
  - The portion of experiments with true classification
  - The portion of experiments with false classification (false positives and false negatives)
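A hedged sketch of how the link loss probabilities could be assigned: each link draws its loss probability from either a uniform distribution over [0, 0.04] or a Zipf-like distribution discretized over the same range. The bin count and Zipf exponent are illustrative assumptions, not values from the paper.

```python
import random

LOSS_MAX = 0.04

def uniform_loss():
    return random.uniform(0.0, LOSS_MAX)

def zipf_like_loss(bins=50, exponent=1.0):
    """Pick a bin with probability proportional to 1/rank^exponent (small losses most likely)."""
    ranks = list(range(1, bins + 1))
    weights = [1.0 / (r ** exponent) for r in ranks]
    rank = random.choices(ranks, weights=weights)[0]
    return (rank - 0.5) * LOSS_MAX / bins  # midpoint of the chosen bin

links = [("u%d" % i, "v%d" % i) for i in range(10)]      # placeholder link list
link_loss = {link: zipf_like_loss() for link in links}   # or uniform_loss()
```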
Simulation Results
- For high loss probabilities (δ=0, zero threshold):
  - Detection ratio above 95%
  - False ratio below 10%
- Positive correlation between RC-classified links and links with truly high loss probabilities
Evaluation - Internet Results
- Israeli ISP sample, Dec. 2002, 230 million TCP packets
- Validation by graph visualization of problematic links
- Worst lossy links, ordered from highest to lowest:
  1. Internal link in a US provider
  2. Israeli ISP -- UK software company
  3. Israeli ISP -- Israeli portal
  4. Israeli ISP -- Israeli portal
  5. Internal link in an Israeli portal
  6. Link between two Israeli ISPs
  7. Israeli ISP -- US telecom
  8. Israeli ISP -- US ISP
  9. Between two US ISPs
- The top worst links (significant losses or reordering) are inter-ISP links.
Inferring Path Characteristics from End-to-End Measurements
- Classify out-of-sequence TCP packets into:
  - Sender's retransmissions (≈ loss rate)
  - Network reordering
  - Network duplication
- Our technique relies on observing only one direction of a TCP connection
- Key idea: leverage the IP-ID field; in practice, the sender's IP-IDs form a monotonically increasing sequence
Classification Process (per packet)
- Is the TCP sequence number out-of-order?
  - No => in-sequence packet
  - Yes => was the packet previously observed?
    - Yes => is the IP-ID of the two packets different?
      - No => duplicate
      - Yes => retransmission
    - No => is the IP-ID in-order?
      - Yes => retransmission
      - No => reordering
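A minimal per-flow sketch of this flowchart in Python; it ignores sequence-number wraparound and other corner cases, so treat it as illustrative rather than the authors' implementation. It consumes the (ip_id, seq, payload_len) tuples of one direction of a TCP connection, e.g., from the pcap-parsing sketch earlier.

```python
def classify_stream(packets):
    """packets: iterable of (ip_id, seq, payload_len); yields (seq, label)."""
    expected_seq = None          # next in-order sequence number
    seen = {}                    # seq -> ip_id of packets already observed
    last_ip_id = None            # IP-ID of the most recently observed packet
    for ip_id, seq, length in packets:
        if expected_seq is None or seq >= expected_seq:
            label = "in-sequence"
            expected_seq = max(expected_seq or 0, seq + length)
        elif seq in seen:
            # Same sequence number seen before: identical IP-ID => the network
            # duplicated the packet; different IP-ID => the sender re-sent it.
            label = "duplicate" if seen[seq] == ip_id else "retransmission"
        else:
            # Not seen before: an in-order (increasing) IP-ID means the sender sent
            # it after the packets already observed => retransmission; otherwise it
            # was sent earlier and delayed in the network => reordering.
            label = "retransmission" if last_ip_id is None or ip_id > last_ip_id else "reordering"
        seen[seq] = ip_id
        last_ip_id = ip_id
        yield seq, label
```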
Summary
- Passive inference of low-performance links is feasible
  - Based on a Root Cause criterion
  - Cost-effective and comprehensive solution
  - High detection ratio (95%) with low errors (10%)
- Temporal results (details in the paper)
  - Passive study of consecutive packet losses: geometric-like distribution
  - Revisited classical TCP throughput analysis: showed that TCP throughput formulas apply under the Bernoulli loss model