Year 1 Activities report for the NSF project EIN-0335190
Title: Collaborative research: End-to-End Provisioned Optical Network Testbed for Large-Scale escience Applications
Date: July 29, 2004 (this is a partial-year report since we started the project on Jan. 1, 2004)
Please reiterate the goals and objectives of your efforts, and summarize the research and education activities you have engaged in that aim to achieve these objectives. Include experiments you have conducted, the simulations you have run, the collecting you have done, the observations you have made, the materials you have developed, and major presentations you have made about your efforts. In a later section you will list more formally any publications and other specific products (database, collections, software, inventions, etc.) that have resulted.
The goal of this project is to develop the infrastructure and networking technologies to support a broad class of escience projects, and specifically the Terascale Supernova Initiative. Our objectives are to design and deploy a high-performance, experimental optical network infrastructure and to test application/middleware/transport protocol software, developed specifically for escience projects, on this network. Our two target applications are file transfers and remote visualization. Our work addresses limited-scope research questions in the various components of the experimental work (e.g., which form of flow control is ideal for fixed-rate dedicated end-to-end circuits: window-based, rate-based, or a null scheme) as well as a more far-reaching research question: how to overcome a well-known drawback of circuit-switched networks, wherein a file transfer committed to a fixed-rate circuit cannot take advantage of bandwidth that becomes available after the start of the transfer.
The objective of the educational activities is to prepare a new generation of engineers who are knowledgeable about new technological advances in the general area of optical networks, and specifically about how to leverage the high-speed capabilities of these networks to architect and design new applications.
Below is a summary list of all our activities in this project to date (Jan. 1, 2004 - July 29, 2004), which includes experiments we have conducted, simulations we have run, and some observations we have made (more details on our observations are listed in the findings attachment):
We designed the CHEETAH network and are in the process of identifying the various components of this network for purchase. These include Gigabit Ethernet NICs, Ethernet switches, DWDM optical transport equipment (such as Cisco's 15454 MSTP and 15808), and SONET switches. Different pieces of equipment are in various stages of procurement: some are already in our laboratory, others have been ordered, and yet others are under negotiation with vendors. We are focusing on connecting NCSU and ORNL this year since these are the two organizations in which our scientist co-PIs work. We have communicated with NLR and obtained price quotes for wide-area connectivity. We worked with NCSU and MCNC/NCNI network staff to plan the details of the connectivity between our scientist co-PI's laboratory and the NLR POP in Raleigh. We also worked with ORNL to plan the Atlanta and ORNL portions of the network.
We have set up a CHEETAH local-area network in our laboratory in which end hosts have two paths for communication: a CHEETAH circuit passing through a Cisco 15454 MSPP (which maps Ethernet signals to SONET and vice versa) and a connectionless IP path through two Cisco GSR routers (which we obtained through another project). We have been running our experiments on this network.
Newly proposed high-speed transport protocols implemented in application space using UDP sockets include SABUL, Tsunami, UDT, and RB-UDP. We downloaded all four of these implementations and experimented with them, preferring them to kernel-based TCP enhancements for ease of programming and deployment. Of these, we found the SABUL protocol and code the easiest to modify for adaptation to dedicated high-speed circuits. FRTP is the result of this work.
We ran FRTP between two Dell workstations. Each Dell workstation has a 2.4-GHz Intel Xeon CPU connected to a 533-MHz front-side bus (34 Gbps CPU bandwidth), an E7505 chipset with 512 MB of DDR 266-MHz memory (17 Gbps memory bandwidth), an 80-GB ATA/100 7200-RPM EIDE disk drive with 2 MB cache (400 Mbps average write rate, measured with Bonnie), and two 64-bit/100-MHz PCI-X GbE NICs (6.4 Gbps network bandwidth). The operating system on both workstations is RedHat Linux 9 with the version 2.4.20-30.9 kernel.
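The bandwidth figures quoted for the workstations can be cross-checked with simple arithmetic. The sketch below multiplies each transfer rate by a data-path width. The 64-bit widths for the front-side bus and the memory channel are assumptions that are not stated in this report (only the PCI-X width is), and the function and variable names are illustrative.

```python
# Peak bandwidth = transfer rate (in MHz) x data-path width (in bits).
# ASSUMPTION: the front-side bus and the memory channel are 64 bits wide;
# the report states a width only for the PCI-X slot.

def peak_gbps(rate_mhz: float, width_bits: int) -> float:
    """Return the peak bandwidth in Gbps of a bus running at rate_mhz."""
    return rate_mhz * 1e6 * width_bits / 1e9

fsb_gbps = peak_gbps(533, 64)     # 533-MHz front-side bus
memory_gbps = peak_gbps(266, 64)  # DDR 266-MHz memory, one channel assumed
pcix_gbps = peak_gbps(100, 64)    # 64-bit/100-MHz PCI-X slot

# Rates taken from the text, in Gbps.
disk_write_gbps = 0.4             # 400 Mbps average write rate (Bonnie)
gbe_line_rate_gbps = 1.0          # nominal Gigabit Ethernet line rate

# True when the disk, not the NIC, is the slower component of a transfer.
disk_is_bottleneck = disk_write_gbps < gbe_line_rate_gbps

print(f"FSB    {fsb_gbps:.1f} Gbps")
print(f"Memory {memory_gbps:.1f} Gbps")
print(f"PCI-X  {pcix_gbps:.1f} Gbps")
print(f"Disk slower than GbE line rate: {disk_is_bottleneck}")
```

Under these assumptions the computed peaks round to the 34, 17, and 6.4 Gbps quoted above, and the 400-Mbps disk write rate is well below the 1-Gbps line rate of a GbE NIC.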
We collected several measurements to understand the impact of various parameters on the performance of this transport protocol implementation. While our memory-to-memory transfer rates were around 910 Mbps, our disk-to-disk transfer rates were limited by the capabilities of our disks. This was fortunate because it enabled us to experiment with flow control and error control over dedicated circuits. We are currently working with ORNL to integrate our efforts on high-speed transport protocols. More details are in our published and submitted papers.
We experimented with an RSVP-TE implementation obtained from the Dragon EIN project (contact: Jerry Sobieski). Using Ethereal, we are able to decode the RSVP-TE messages to check compliance with standards. Recently, we completed interoperability testing of this code with the GMPLS implementation of Sycamore's SN16000 (a SONET switch). The testing was a success. We are now integrating the RSVP-TE client at CHEETAH end hosts with the SFTP source code to enable the SFTP application to request a circuit for a file transfer.
We obtained Dynamic TL1 libraries from Monfox to control the Cisco 15454 MSPP in our laboratory CHEETAH network. We implemented a C++ program to incorporate this Java library. We took measurements of call setup delay and found that it takes about 17-20 ms to issue a cross-connection setup TL1 command and receive a response. This is better than we expected. Call setup delay determines the crossover file size above which the use of a circuit is justified; a shorter delay makes circuits worthwhile for smaller files. We are integrating the Dragon RSVP-TE VLSR code with this C++ code to create a GMPLS-based control engine for the Cisco 15454 system.
There is a well-known drawback to using circuits for file transfers when compared to using packet-switched networks. Classical textbooks explain it as follows. Time-Division Multiplexing/Frequency-Division Multiplexing (TDM/FDM) schemes are typically used in a fixed-bandwidth allocation mode, which means a call is assigned a fixed amount of bandwidth for its whole duration. For file transfers, such schemes compare unfavorably with packet-switching schemes. This is because, unlike in packet-switched networks, once a file transfer is allocated a certain bandwidth in a fixed-bandwidth TDM/FDM scheme, it cannot take advantage of bandwidth that becomes available as a result of other transfers completing. To address this drawback, we propose a Varying-Bandwidth List Scheduling (VBLS) heuristic for SONET/SDH/WDM circuit-based networks, in which a file transfer is allocated varying bandwidth levels for different time ranges within the duration of the transfer.
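The idea of allocating different bandwidth levels over different time ranges can be illustrated with a minimal sketch. This is a simplified greedy illustration for a single link, not necessarily the heuristic used in our simulations: the function name `allocate_varying_bandwidth`, the representation of free capacity as a list of `(start, end, free_rate)` tuples, and the example numbers are all assumptions made for illustration.

```python
import math

def allocate_varying_bandwidth(profile, file_mbits, max_rate):
    """Greedily schedule one file transfer on a single link.

    profile    -- time-ordered, contiguous list of (start_s, end_s, free_mbps);
                  the last range should end at math.inf
    file_mbits -- file size declared by the sender, in Mbit
    max_rate   -- highest rate the end host can use, in Mbps

    Returns (allocation, new_profile): the (start, end, rate) ranges granted
    to the transfer and the free-capacity profile left for later requests.
    """
    allocation, new_profile = [], []
    remaining = float(file_mbits)
    for start, end, free in profile:
        rate = min(free, max_rate)
        if remaining <= 0 or rate <= 0:
            # Nothing left to send, or no capacity here: range is unchanged.
            new_profile.append((start, end, free))
            continue
        finish = start + remaining / rate
        if finish <= end:
            # The transfer completes inside this time range.
            allocation.append((start, finish, rate))
            new_profile.append((start, finish, free - rate))
            if finish < end:
                new_profile.append((finish, end, free))
            remaining = 0.0
        else:
            # Use the whole range and carry the rest to the next one.
            allocation.append((start, end, rate))
            new_profile.append((start, end, free - rate))
            remaining -= rate * (end - start)
    if remaining > 0:
        raise ValueError("profile has too little free capacity")
    return allocation, new_profile

# Hypothetical 1000-Mbps link: an earlier transfer holds 600 Mbps until t = 10 s.
profile = [(0.0, 10.0, 400.0), (10.0, math.inf, 1000.0)]
allocation, new_profile = allocate_varying_bandwidth(profile, 10000.0, 1000.0)
varying_finish_s = allocation[-1][1]
fixed_finish_s = 10000.0 / 400.0  # same file held at the initial 400 Mbps
print(allocation)
print(varying_finish_s, fixed_finish_s)
```

In this hypothetical example, a 10,000-Mbit file that arrives when only 400 Mbps is free for the first 10 s finishes at t = 16 s, whereas a fixed 400-Mbps allocation would need 25 s. The allocation can be computed up front only because the file size is known when the request is made.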
A varying-bandwidth allocation of this kind is possible if all file senders specify the sizes of their transfers. We ran simulations for a single-link network to compare the performance of VBLS with that of a fixed-bandwidth allocation scheme (which we called FBLS) and that of a packet-switched system. We assumed that file transfer requests arrive at the system according to a Poisson process, and that the size of each file follows a bounded Pareto distribution. Our key conclusions are presented in the findings report and in published papers.
Planning demonstrations across campuses within the constraints of existing networks taught us that we need Ethernet switches to aggregate Ethernet signals from the various hosts of the clusters used by scientists. This led us to extend our SONET-based CHEETAH concept to a more general connection-oriented internet concept, in which rate-guaranteed connections would be provisioned across heterogeneous networks such as MPLS networks, Ethernet VLANs, and SONET and WDM networks. We experimented with MPLS tunnels through our Cisco GSRs and found rate guarantees to be quite stable. We are conducting other experiments to quantify parameters such as call setup delay. These results are presented in a submitted paper.
The CHEETAH solution is proposed as an add-on to the basic Internet access available to an end host. This means that CHEETAH end hosts have two NICs, which in turn implies that applications running on these hosts have to decide whether to attempt a CHEETAH circuit (and fall back to the primary TCP/IP path if the circuit request is denied) or to opt directly for the primary TCP/IP path. We are implementing a routing decision module that the file-transfer application invokes before requesting a circuit setup. Our analytical work shows that this decision needs measurements of various parameters on the TCP/IP path, such as round-trip delay and bottleneck link rate. We experimented with several Internet performance measurement tools, such as pathrate and iperf, to determine the values of these parameters for use in the routing decision module. Our key conclusions are outlined in the findings report, with details in our published papers.
We planned a monitoring platform and worked with NCSU to set it up, in order to obtain measurements of network traffic when our scientist co-PIs execute remote visualization sessions. So far the equipment is in place and the software tools have been developed. We are just starting to collect measurements. We will analyze this data using MATLAB.
We set up the network configuration for the router disconnect operation from CUNY and are supporting CUNY in their router disconnect study.
Finally, on the applications side, we are upgrading the SFTP code to include the APIs for the signaling module, the routing decision module, and FRTP.
Education activities include the teaching of several graduate students on this project.
The project has provided an excellent opportunity for these students to work with network equipment such as Cisco's 15454 Multi-Service Provisioning Platform, Cisco's GSR 12008 routers, high-end PCs with GbE NICs, and a GbE Summit switch. One student gained first-hand experience of interoperability testing with an actual network switch vendor. Besides experimental research, students also had an opportunity for analytical work (the routing decision algorithm) and simulation work (VBLS). Overall, this project has provided valuable opportunities for teaching our students.
The materials we developed include papers, presentations, and software. The publications are listed in the Products part of this report. We published two journal papers, three conference papers, and one workshop paper, and have submitted one conference paper and two workshop papers (which are under review). We gave several presentations: at conferences; at the second PFLDnet workshop, Feb. 16-17, 2004, http://wwwdidc.lbl.gov/pfldnet2004/index.htm; and at the MCNC workshop on optical control planes for the grid community, April 22-23, 2004, http://www.mcnc.org/mcncopticalworkshop/opticalprocess404.cfm. All these materials are available through our project web site: http://cheetah.cs.virginia.edu. We will post various software components on this web site shortly.