Visualization of Internet Traffic Features

Jiraporn Pongsiri, Mital Parikh, Miroslova Raspopovic and Kavitha Chandra
Center for Advanced Computation and Telecommunications
University of Massachusetts Lowell, MA 01854
http://morse.uml.edu

ABSTRACT

A visualization study of network data traffic is presented that allows for interpretation of the hourly traffic trends and dependencies between traffic parameters. The traffic statistics are a function of many parameters such as the time of the day, the application port, protocol and packet size. The data used in this work is obtained from packet header traces measured at the University of Massachusetts Lowell router that connects the campus network to the Internet. The visualization of upstream and downstream traffic shows that TCP is the dominant traffic generator at all times of the day. Among the TCP applications, HTTP, FTP, and SMTP generate over seventy percent of the hourly byte and packet volume. This study also examines the relationship between applications and the packet sizes they generate. On an hourly time scale, over 90% of traffic in the outbound direction is in the 0-100 byte range and is generated by client traffic from web and email related applications. In the inbound direction, over 75% of the traffic is in the 576 and 1500 byte range and arises from server responses to the outbound requests. These dependent relationships between packet size and applications are found to be consistent features in the day to day patterns of bi-directional traffic between the campus network and the Internet.

1. INTRODUCTION

The Internet data network is becoming an integral part of business, educational and consumer related communications. The number of network hosts and the volume of network traffic are approximately doubling each year. To support this increased reliance on the data network, new models of packet delivery that afford controlled delays and losses are currently under investigation [1,2,3].
The design of a bandwidth management scheme will require an understanding of the traffic signatures that are likely to dominate the network. Visualization techniques offer a first step in qualitative understanding of the traffic trends and the relationships that can exist between various traffic parameters. These observations will form the basis for the design of detailed stochastic models for application towards a controlled services framework.

More traffic measurement infrastructures and modeling studies are available today than ever before, yet a full understanding of the traffic characteristics and their relation to network performance is lacking. Visualization studies are developed to extract meaningful insights from the masses of network data currently available. A basic problem in relation to the data network is the degree of traffic predictability. Visualization tools, used in conjunction with the ability of the human visual system to recognize and interpret complex patterns, can assist network engineers in identifying traffic invariants as well as anomalies that are relevant for network performance engineering.

A number of Internet traffic measurement studies have been conducted in recent years. These studies have addressed both long-term trends [4,5] and detailed short-term analysis of traffic structure [6,7]. Wide-area traffic studies [8] have been conducted revealing the traffic breakdown in terms of protocols and applications. These studies have shown the existence of long-range dependence features in Internet traffic, the effects of which generally tend to increase queueing delays and losses. Traffic decomposition techniques based on application ports were shown [9] to significantly reduce the degree of traffic correlation.

This work presents a visualization of important long-term features of traffic generated from a local area network as well as the resulting traffic flow from the Internet. Section 2 describes the traffic measurements.
In Section 3, some of the important traffic statistics are presented as a function of time of day. Section 4 summarizes the paper.

2. TRAFFIC MEASUREMENTS

Traffic between the Internet and the University of Massachusetts Lowell (UML) network edge router flows through three T1 lines. The UML network is connected to the edge router through 100 Mbps fiber optic links. One of the ports on the router is dedicated to monitoring the uplink and downlink traffic. The packet header traces are obtained using the
tcpdump utility. Traffic measurements are collected 24 hours a day, seven days a week. The main results presented in this work are for one day during the week 10/05/99-10/11/99. The measurements of packet headers include timestamps at microsecond resolution, source and destination IP addresses, source and destination port numbers, packet size and protocol index. A sample of the data trace is shown in Table 1.

hour  min  sec  microsec  source IP       source port  dest IP          dest port:  packet size  protocol
12    8    22   847256    204.244.96.160  80           129.63.96.143    2075:       759          tcp
12    8    22   851160    129.63.206.32   1111         209.185.131.251  80:         466          tcp
12    8    22   852316    129.63.210.101  3609         206.151.166.121  80:         40           tcp

Table 1. Raw Header Traces

The outbound, or uplink, traffic consists of packets originating from the UML network, identified by a source IP address with the network prefix 129.63. The inbound traffic is differentiated by source IP addresses that do not belong to the UML subnet. The traffic statistics in both directions are considered. During a typical day, the UML network generates approximately 44M packets in the outbound direction and receives 46M packets from the Internet. The corresponding byte volumes are 22 and 27 Gbytes in the outbound and inbound directions respectively. The patterns of the number of packets per hour and the number of bytes per hour in the inbound and outbound directions are shown in Figs. 1 and 2.

[Figure: inbound pkt count vs. time]
Fig. 1(a): Inbound traffic packets per hour. (10/05/99 : 10/11/99)

[Figure: outbound pkt count vs. time]
Fig. 1(b): Outbound traffic packets per hour.
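
As a minimal sketch of how traces in the format of Table 1 could be reduced to hourly packet and byte counts per direction, the Python fragment below parses one header line per packet, classifies its direction by the 129.63 source prefix, and accumulates totals per hour. It assumes whitespace-separated fields in the order of Table 1. The names Packet, parse_line, direction and hourly_totals are illustrative only and are not taken from the software used in this study.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    hour: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int
    protocol: str


UML_PREFIX = "129.63."  # campus network prefix stated in the text


def parse_line(line: str) -> Packet:
    # Assumed field order (Table 1):
    # hour min sec microsec src_ip src_port dst_ip dst_port: size protocol
    f = line.split()
    return Packet(
        hour=int(f[0]),
        src_ip=f[4],
        src_port=int(f[5]),
        dst_ip=f[6],
        dst_port=int(f[7].rstrip(":")),  # drop the trailing colon
        size=int(f[8]),
        protocol=f[9],
    )


def direction(pkt: Packet) -> str:
    # Outbound if the source address belongs to the campus network.
    return "outbound" if pkt.src_ip.startswith(UML_PREFIX) else "inbound"


def hourly_totals(lines):
    # Returns two dicts keyed by (direction, hour): packet counts and byte counts.
    packets = defaultdict(int)
    volume = defaultdict(int)
    for line in lines:
        pkt = parse_line(line)
        key = (direction(pkt), pkt.hour)
        packets[key] += 1
        volume[key] += pkt.size
    return dict(packets), dict(volume)


# The three sample rows of Table 1.
SAMPLE = [
    "12 8 22 847256 204.244.96.160 80 129.63.96.143 2075: 759 tcp",
    "12 8 22 851160 129.63.206.32 1111 209.185.131.251 80: 466 tcp",
    "12 8 22 852316 129.63.210.101 3609 206.151.166.121 80: 40 tcp",
]

packets, volume = hourly_totals(SAMPLE)
```

For the three sample rows, this yields one inbound and two outbound packets in hour 12.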
[Figure: inbound byte count vs. time]
Fig. 2(a): Inbound traffic bytes per hour.

[Figure: outbound byte count vs. time]
Fig. 2(b): Outbound traffic bytes per hour.

The cyclical patterns observed are typical of most local area networks; however, the peak hours of activity may vary for corporate and university networks. The cyclical structures are particularly evident in the packets generated per hour. Peak activity typically begins around 10-11 am and is sustained through the work day until about 8-9 pm. The graphs also show that the packet count trace exhibits less variability than the byte counts and may provide a more predictable component for traffic modeling. In the next section we characterize the traffic composition in terms of IP protocols, packet sizes and applications.

3. TRAFFIC STATISTICS

The traffic characteristics are influenced by both the protocol and application. Transmission control protocol (TCP) based applications typically generate over 90% of the traffic bytes during any given hour. The remainder is contributed by the user datagram protocol (UDP) and other protocols related to the specific local area network. We consider one day of traffic, 10/06/99, a Wednesday, to discuss the important statistics and note that these trends are observed during the other work days as well. The traffic patterns per hour due to TCP, UDP and other protocols are shown in Figs. 3(a) and 3(b) for outbound and inbound directions respectively. The peak usage occurs not only during the expected hours from 9 am onwards, but also during the late evening and early morning hours due to activities from dorms.
The dips in traffic observed during the 12-3 pm, 6 pm and 9 pm hours may be attributed to a reduction in the number of users on the network due to the lunch break, fewer afternoon classes, the end of classes, and dinner times. Since TCP is the dominant protocol, we consider next the breakdown of the applications running over TCP.
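
The per-hour protocol breakdown described above can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a sequence of (hour, protocol, size) tuples, any protocol other than TCP or UDP is grouped as "other", and the records in the example are made-up values, not measurements from this study. The function name protocol_share is hypothetical.

```python
from collections import defaultdict


def protocol_share(records):
    """Percentage of bytes per protocol for each hour.

    records: iterable of (hour, protocol, size_in_bytes) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for hour, proto, size in records:
        # Group everything that is not TCP or UDP under "other".
        label = proto if proto in ("tcp", "udp") else "other"
        totals[hour][label] += size

    shares = {}
    for hour, by_proto in totals.items():
        hour_total = sum(by_proto.values())
        if hour_total == 0:
            continue  # no bytes recorded in this hour
        shares[hour] = {p: 100.0 * b / hour_total for p, b in by_proto.items()}
    return shares


# Synthetic records for illustration only (not measured data).
records = [
    (11, "tcp", 900),
    (11, "udp", 80),
    (11, "icmp", 20),
    (12, "tcp", 500),
]
shares = protocol_share(records)
```

Plotting shares for each hour of the day, one curve per protocol, would give a view comparable in spirit to a per-protocol hourly traffic plot.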
Fig. 3(a): Outbound traffic due to TCP, UDP and other protocols
Fig. 3(b): Inbound traffic due to TCP, UDP and other protocols

The applications may be identified by considering the source port index in the traffic header. The source port index ranges from 1 to over 50000. The well-known ports correspond to applications such as http (80), ftp (20 and 21), smtp (25), nntp (119), telnet (23), etc. Indices in the range 1024-49151 are registered port numbers used by applications executed by users. For visualization purposes the source port indices are grouped and mapped to a set of eight integer values as
shown in Table 2. Under this mapping, for example, HTTP sources will appear in bin ten, FTP sources in bin one, SMTP sources in bin five, etc. Figs. 4(a) and 4(b) depict the outbound and inbound traffic bytes contributed by the applications during each hour of the day. In some figures the application port axis has been arranged to depict the aggregate traffic count of each group in descending order. The outbound traffic pattern shows that registered ports (1001-10000) in bins 25 and 30 contribute a significant traffic volume, whereas in the inbound direction, the http port in bin 10 dominates the traffic flow during the peak hours, contributing over 70% of traffic bytes. This indicates that a significant portion of outbound traffic arises from UML clients accessing servers, and that the inbound traffic consists mainly of the responses of these servers residing on the Internet. One can also observe in the hourly patterns a high level of correlation between the client generated traffic in Fig. 4(a) and the corresponding server response in Fig. 4(b).

Src Port Index   Bin
0-24             1
25-50            5
51-100           10
101-500          15
501-1000         20
1001-5000        25
5001-10000       30
10001+           35

Table 2. Mapping of source port index to bin value for visualization.

Fig. 4(a) Traffic contributed by different applications (outbound)
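
The grouping of Table 2 amounts to a lookup over range boundaries. A minimal sketch in Python, using a binary search over the upper edge of each range, is given below. The names PORT_UPPER, PORT_BINS and port_bin are illustrative and do not come from the original analysis tools.

```python
from bisect import bisect_left

# Upper edges of the first seven ranges in Table 2; ports above 10000
# fall into the last bin.
PORT_UPPER = [24, 50, 100, 500, 1000, 5000, 10000]
PORT_BINS = [1, 5, 10, 15, 20, 25, 30, 35]


def port_bin(port: int) -> int:
    # bisect_left returns the index of the first upper edge >= port,
    # which is the index of the range that contains the port.
    return PORT_BINS[bisect_left(PORT_UPPER, port)]


http_bin = port_bin(80)      # 51-100 range
ftp_bin = port_bin(20)       # 0-24 range
client_bin = port_bin(3609)  # 1001-5000 range
```

Here http_bin is 10, ftp_bin is 1 and client_bin is 25, in agreement with the rows of Table 2.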
Fig. 4(b) Traffic contributed by different applications (inbound)

Next we examine the packet sizes as a function of the application ports to determine any correlation that may exist between particular applications and packet sizes. To facilitate visualization, the packet sizes, which typically range from 40 to 1500 bytes, are mapped to a set of seven indices as tabulated in Table 3.

Packet size (bytes)   Bin
0-40                  1
41-100                5
101-300               10
301-600               15
601-900               20
901-1200              25
1201+                 30

Table 3. Mapping of packet size to bin index for visualization.

The packet size distribution is one of the features that exhibits the most consistent behavior, even among different networks. The packets are typically concentrated around the 40, 576, and 1500 byte regions. These values correspond, respectively, to a packet carrying only the IP and TCP headers with no payload, the datagram size that every IP host must be prepared to accept, and the maximum transfer unit of Ethernet, which bounds the packet size on most local area networks. Figs. 5(a) and 5(b) depict the distribution of packet sizes for the aggregate traffic during each hour of the day. In the outbound direction, which consists mainly of client traffic, packets in the 0-100 byte range contribute over 60% of the traffic. These packets are mainly TCP acknowledgement and HTTP document request packets. On the other hand, the server traffic that dominates the inbound direction includes, in addition to the ACK packets, a large percentage of data packets of size 576 and 1500 bytes.
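
The packet size mapping and the resulting distribution can be sketched in the same way. The fragment below is an illustration only: it assumes that a 40-byte packet belongs to the first bin, measures each bin's share by packet count (the text does not state whether packets or bytes are used), and uses made-up example sizes. The names size_bin and size_distribution are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the first six ranges in Table 3; sizes above 1200 bytes
# fall into the last bin. A 40-byte packet is assumed to map to bin 1.
SIZE_UPPER = [40, 100, 300, 600, 900, 1200]
SIZE_BINS = [1, 5, 10, 15, 20, 25, 30]


def size_bin(size: int) -> int:
    return SIZE_BINS[bisect_left(SIZE_UPPER, size)]


def size_distribution(sizes):
    """Percentage of packets in each non-empty packet size bin."""
    counts = Counter(size_bin(s) for s in sizes)
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in SIZE_BINS if counts[b]}


# Illustrative sizes only: two bare ACKs, one small request, two data packets.
dist = size_distribution([40, 40, 64, 576, 1500])
```

With these five example sizes, bin 1 holds 40% of the packets and bins 5, 15 and 30 hold 20% each.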
Fig. 5(a) Packet size distribution for aggregate outbound traffic.
Fig. 5(b) Packet size distribution for aggregate inbound traffic.

To determine the relation, if any, between packet sizes and application ports, Figs. 6(a) and 6(b) map the packet size index to the source port index. The results shown are obtained during one of the peak usage hours (11 am). The graphs show that for the outbound traffic, the client HTTP access traffic has packet sizes in the 40-64 byte range. This is represented by the cylinder at the intersection of bins one and twenty-five. The inbound traffic pattern distinguishes both
client and server traffic.

Fig. 6(a) Packet size vs application port during peak hour (outbound)
Fig. 6(b) Packet size vs application port during peak hour (inbound)
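
A joint view of packet size bin against source port bin is a two-dimensional count over the mappings of Tables 2 and 3. The sketch below shows one way such a table could be built. It assumes input pairs of (source port, packet size) and assigns a 40-byte packet to the first size bin. The helper names to_bin and size_by_port are illustrative, and the three example pairs are taken from the sample rows of Table 1.

```python
from bisect import bisect_left
from collections import Counter

# Range edges and bin labels from Tables 2 and 3.
PORT_UPPER = [24, 50, 100, 500, 1000, 5000, 10000]
PORT_LABELS = [1, 5, 10, 15, 20, 25, 30, 35]
SIZE_UPPER = [40, 100, 300, 600, 900, 1200]
SIZE_LABELS = [1, 5, 10, 15, 20, 25, 30]


def to_bin(value, uppers, labels):
    # Index of the first upper edge >= value selects the bin label.
    return labels[bisect_left(uppers, value)]


def size_by_port(pairs):
    """Count packets per (packet size bin, source port bin) cell.

    pairs: iterable of (source_port, packet_size) tuples.
    """
    table = Counter()
    for port, size in pairs:
        cell = (to_bin(size, SIZE_UPPER, SIZE_LABELS),
                to_bin(port, PORT_UPPER, PORT_LABELS))
        table[cell] += 1
    return table


# (source port, packet size) for the three sample rows of Table 1.
table = size_by_port([(80, 759), (1111, 466), (3609, 40)])
```

The 40-byte packet from source port 3609 lands in the cell (1, 25), the intersection of size bin one and port bin twenty-five.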
4. SUMMARY

A visualization of some of the long-term traffic features that characterize Internet related traffic has been presented. The hourly variation in the traffic between UML and the Internet shows several invariant features that may be useful for development of traffic and performance models. TCP is the dominant protocol, with most of its traffic generated by HTTP, FTP and SMTP applications. The distribution of packet size over time is also found to be a predictable component for both inbound and outbound directions. A strong correlation is also shown to exist between the dominant applications in each direction and the packet sizes they generate.

REFERENCES

1. Differentiated Services Working Group Charter, <http://www.ietf.org/html.charters/diffserv-charter.html>.
2. D. O. Awduche, "MPLS and Traffic Engineering in IP Networks," IEEE Comm. Mag., vol. 37, (12), p42-47, 1999.
3. Y. Bernet, "The Complementary Roles of RSVP and Differentiated Services in the Full-Service QOS Network," IEEE Comm. Mag., vol. 38, (2), p154-162, 2000.
4. K. Claffy, H.W. Braun, and G.C. Polyzos, "Tracking long-term growth of the NSFNET backbone," Proc. IEEE INFOCOM 93, San Francisco, CA, 1993.
5. K. Thompson, G.J. Miller, and R. Wilder, "Wide-area Internet traffic patterns and characteristics," IEEE Network, vol. 11, (6), p10-23, 1997.
6. W.E. Leland, W. Willinger, M.S. Taqqu and D.V. Wilson, "On the self-similar nature of Ethernet traffic," IEEE/ACM Trans. on Networking, vol. 2, (1), p1-15, 1994.
7. M.E. Crovella and A. Bestavros, "Self-Similarity in World Wide Web Traffic, Evidence and Possible Causes," IEEE/ACM Trans. on Networking, vol. 5, p835-846, December 1997.
8. V. Paxson and S. Floyd, "Wide-area traffic: The failure of Poisson modeling," IEEE/ACM Trans. on Networking, vol. 3, p226-244, 1995.
9. C. You and K. Chandra, "Time Series Models for Internet Data Traffic," Proc. 24th Conf. on Local Computer Networks, p164-171, 1999.