Characterization of accuracy problems in NetFlow data and approaches to handle them

Characterization of accuracy problems in NetFlow data and approaches to handle them IRTF NMRG / 3rd NetFlow/IPFIX Workshop Jochen Kögel jochen.koegel@ikr.uni-stuttgart.de IETF 78 Maastricht 30 July 2010 University of Stuttgart Institute of Communication Networks and Computer Engineering (IKR) Prof. Dr.-Ing. Andreas Kirstädter

Outline Motivation: why looking at accuracy of data? Accuracy issues Handling problems with exporter profile Conclusion and Outlook 2

Motivation Scenario Global Enterprise Network, MPLS-VPN Flow data from several (1..5) routers on path Own routers (full control) CE-Routers of carrier ("read only") Netflow-based (v5) view on traffic at several points in the network Correlation of Flow data for extraction of network characteristics 3

Motivation Extraction of network charakteristics Extractable characteristics are e.g. one-way-delay, RTT, packet loss, flow contention Requires matching flow records For the same 6-Tuple (src/dst address, src/dst port, protocol, ToS) Exported from different observation points (exporter + input interface) 4

Motivation Extraction of network charakteristics Matching First try (very strict rules) Take flows exported in one record only Drop implausible records Match records, match forward and reverse, drop implausible data 12,500 bidirectional trajectories left from 22 million records (10 samples per path and hour) Two questions 1. Precission of characteristics obtained from flow-data wrt. timings, bytes, packets 2. What do we have to consider in consistency and plausibility checks in order to get a high amount of samples? 5

Overview of issues (not all shown in detail) Record loss (the simplest one) Duplicates Packet counters Byte counters Clock accuracy Granularity "Noise" Jumps Clock offset, clock skew Different reasons Inaccuracies at exporters Configuration issues Middle boxes 6

Comparing trace to NetFlow: scenario Path between two European cities 5 day packet trace, filtered on two endpoints: application probe and server Flow data from three exporters (two of them CE) 7

Comparing trace to NetFlow: byte count Byte count Issue Byte count different at one observation point, but packet count consistent Here: router rounds byte count up to 46 Bytes Side note: Similar effects from some middle boxes (WAN optimizers) 20 18 16 Histogram of byte difference: exporter 3 forward: trace bytes netflow bytes reverse: trace bytes netflow bytes frequency 14 12 10 8 6 4 2 0 50 40 30 20 10 0 byte difference 8

Comparing trace to NetFlow: clocks Clock Offset and Skew (CE-Routers) 6.224 x 105 6.225 Time difference betw. trace and exporter 1 forward, start time difference reverse, start time difference forward, end time difference reverse, end time difference time difference in milliseconds 6.226 6.227 6.228 6.229 6.23 day1 day2 day3 day4 day5 9

Comparing trace to NetFlow: clocks Distribution of time difference 0.2 0.18 0.16 Histogram of time difference: trace exporter 1 fw start time diff. rv start time diff. fw end time diff. rv end time diff. 0.06 0.05 Histogram of time difference: trace exporter 3 fw start time diff. rv start time diff. fw end time diff. rv end time diff. rel. frequency 0.14 0.12 0.1 0.08 0.06 0.04 0.02 rel. frequency 0.04 0.03 0.02 0.01 0 60 40 20 0 20 40 60 time difference in milliseconds "Signal to Noise Ratio" depends on exporter On good exporters (left) accuracy around +/- 10 ms. Right: much more noise. Note: difference between start and end time diff of reverse-flow granularity issue? 0 0 20 40 60 80 time difference in milliseconds 10

From NetFlow data only: granularity of clocks Determination of granularity calculate difference between record start times, duration, end times,... and/or calculate greatest common divisor Results start/end time granularity: 4 ms or 1 ms (see following slides) duration granularity: 4 ms on all exporters nsecs-granularity: 15,258 (1e9/2^16) 11

From Netflow data only: granularity of clocks This exporter: 1ms granularity 12000 Difference between adjacent records 10000 8000 occurence 6000 4000 2000 0 0 2 4 6 8 10 12 14 16 18 20 difference between start values in ms 12

From Netflow data only: granularity of clocks Another exporter: 4 ms granularity 3 x 104 Difference between adjacent records 2.5 2 occurence 1.5 1 0.5 0 0 2 4 6 8 10 12 14 16 18 20 difference between start values in ms 13

from Netflow data only: granularity of clocks Yet another one: 4 ms granularity (simple calculation would reveal 1 ms) 12 x 104 Difference between adjacent records 10 8 occurence 6 4 2 0 0 2 4 6 8 10 12 14 16 18 20 difference between start values in ms 14

From NetFlow data only: duplicates Definition of "Duplicates" More than one record for the same key within a time interval. What to do? depends on type of duplicate 15

Handling problems using an exporter profile Observation Routers behave differently wrt. accuracy of timestamps, bytecount, duplicates,... Knowing these effects can lead to more/better results Exporter profile that describes effects and accuracy for each exporter Exporter profile Exporter-specific par Timestamp granularity, Timestamp accuracy and behavior How to handle duplicates Bytecount problems Configuration/scenario-specific part Clock offsets/skew Middlebox locations How to obtain exporter profiles? Manufacturer (?) "Calibration" using packet trace Using NetFlow data only (e.g. more accurate data from off-peak times) 16

Conclusion and Outlook Conclusion Accuracy issues wrt. timestamps, byte count, duplicates identified Effects depend on router NetFlow data from "good" routers is suitable for estimating one-way-delays with at least +/- 20 ms accuracy Outlook Exporter Profile Profile format and relation to other (configuration) items Methods to create exporter profile Evaluate improvements wrt. acccuracy from different features in the Exporter Profile Load dependency? Discussion: More accuracy issues known? 17