Extending dynamic Layer-2 services to campuses

Scott Tepsuporn and Malathi Veeraraghavan
University of Virginia (UVA)
mvee@virginia.edu

Brian Cashman
Internet2
bsc@internet2.edu

April 1, 2015
FTW Intl. OpenFlow/SDN Testbeds, Miami, FL

Thanks to A. J. Ragusa and Luke Fowler (IU), Chin Guok (ESnet), T. Lehman and X. Yang (MAX): coauthors on a submitted paper.

Thanks also to Ezra Kissel (Indiana U), Dale Carder and Jerry Robaidek (U. Wisconsin), Ivan Seskar and Steve Decker (Rutgers U), R. D. Russell and P. MacArthur (U. New Hampshire), Conan Moore (U. Colorado), Ryan Harden (U. Chicago), Ron Withers (U. Virginia), John Lawson (MARIA), Eric Boyd (Internet2), GRNOC, and several regional REN providers for their support.

Thanks to NSF for grants CNS-1116081, OCI-1127340, ACI-1340910, CNS-1405171, and ACI-0958998, and to DOE for grant DE-SC0011358C.
Outline
- What was done?
- How was it done?
- Why do this?
- Long-term vision
- Contributions to community
- Control-plane models
- International component
What was done?
- Configured DYNES in 8 campuses: WAN multi-domain testbed
- What is DYNES?
  - Eric Boyd, Shawn McKee, Harvey Newman, Paul Sheldon: PIs
  - NSF MRI project: File Data Transfer (FDT) host + switch (OpenFlow) + SDN controller (IDC) + perfSONAR host
  - 40 universities and 11 regionals
- Dynamically created inter-domain L2 paths via the OESS GUI (running OSCARS on most DYNES IDCs)
- Configured FDT: vconfig, ifconfig, Linux tc
- Tested nuttcp and GridFTP: 0 loss?
Campuses involved
[Map of campuses. Legend: DYNES sites (in use); new sites]
Sites shown: CU (UCAR), UWisc, UNH, I2Lab, U. Chicago, Rutgers, IU, MAX, UVA, VTech, UTD, UH
Regionals: VA: MARIA; Rutgers: MAGPI; U. Chicago, UWisc: CIC; IU: Indiana GigaPop; UNH: NOX; CU: FRGP
How was it done? Method
- Brian Cashman: significant help!
- For each campus:
  - Requested logins on the FDT host with sudo access
  - Assisted the campus admin in installing, configuring, and running OSCARS and OESS
  - Assisted the campus admin in setting up static VLANs through campus networks and regionals
- Provisioned inter-domain circuits automatically
- Provisioned the FDTs at the ends of each circuit manually
- Ran nuttcp and GridFTP with HTCP/Reno and tc rate shaping, using cron jobs to collect loss/throughput
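The manual FDT provisioning step above (vconfig, ifconfig, tc) can be sketched as a small helper that only builds the command strings. This is a minimal sketch, not the scripts used in the project: the interface name, VLAN ID, IP address, and rates are hypothetical, and the HTB layout (one root qdisc with a single class) is an assumption.

```python
def fdt_vlan_commands(parent_if, vlan_id, ip_addr, netmask, rate_mbit, ceil_mbit):
    """Return shell commands that create a VLAN interface and shape it with tc HTB.

    All arguments are illustrative values; nothing is executed here.
    """
    # Assumes vconfig's dot-style interface naming, e.g. eth2.3001.
    vlan_if = f"{parent_if}.{vlan_id}"
    return [
        f"vconfig add {parent_if} {vlan_id}",
        f"ifconfig {vlan_if} {ip_addr} netmask {netmask} up",
        f"tc qdisc add dev {vlan_if} root handle 1: htb default 10",
        f"tc class add dev {vlan_if} parent 1: classid 1:10 htb "
        f"rate {rate_mbit}mbit ceil {ceil_mbit}mbit",
    ]


if __name__ == "__main__":
    # Hypothetical end host: VLAN 3001 on eth2, shaped to 45 Mbit/s.
    for cmd in fdt_vlan_commands("eth2", 3001, "10.10.1.1", "255.255.255.0", 45, 45):
        print(cmd)
```

Building the strings rather than running them keeps the sketch safe to try on any machine; the commands themselves need root on the FDT host.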
Multi-domain deployment
[Figure: multi-domain deployment. Labeled domains: Internet2 ION, Internet2 AL2S, regionals, universities, ESnet, DOE sites. Labeled control software: OSCARS, OESS. Labeled DYNES site components: IDC, FDT, ps (perfSONAR).]
Examples: End-to-end L2 paths between UVA and IU, and between UVA and UWisc.
Lessons learned
- OSCARS and OESS software works well, but...
  - When something goes wrong, the error messages are cryptic: error reporting needs community help to improve
  - Topology approach: scalability? Use DNS?
  - Tools are required for debugging multi-domain L2 paths
- Providers may police rate-guaranteed paths
  - Need to set the tc ceiling (ceil) option; throughput was higher with tc set to 45 Mbps than to 50 Mbps when the circuit through ION was 50 Mbps
  - It was good to have ION to gain this experience
- AuthN/AuthZ
  - Add DYNES to a Shibboleth single sign-on service, or a GlobusOnline-type service: which is more scalable?
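The policing observation can be checked with simple arithmetic on the UVA-to-UWisc nuttcp means reported in the backup slides (44.4 Mbps with tc at 45 Mbps, 39.7 Mbps with tc at 50 Mbps, on a 50 Mbps ION circuit). The sketch below is illustrative only: the function names are hypothetical, and the 10% headroom is simply the 45/50 ratio from that experiment, not a recommended value.

```python
def shaped_rate_mbps(circuit_mbps, headroom):
    """tc rate that stays below the policed circuit rate by a fractional headroom."""
    return circuit_mbps * (1.0 - headroom)


def utilization(mean_throughput_mbps, circuit_mbps):
    """Fraction of the circuit rate that the sender actually achieved."""
    return mean_throughput_mbps / circuit_mbps


CIRCUIT_MBPS = 50.0  # ION circuit rate from the slide

print("tc rate with 10% headroom:", shaped_rate_mbps(CIRCUIT_MBPS, 0.10), "Mbps")
print("utilization, tc at 45 Mbps:", round(utilization(44.4, CIRCUIT_MBPS), 3))
print("utilization, tc at 50 Mbps:", round(utilization(39.7, CIRCUIT_MBPS), 3))
```

Shaping slightly below the circuit rate gives up a little nominal capacity but, in these measurements, delivered more of it.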
Why pursue this course?
- Rate-guaranteed circuits offer a solution to the TCP throughput issue
  - 0.0046% loss on high-BDP paths causes throughput to crash (ESnet SC13 paper)
- Dynamic L1 circuits
  - Rates have reached levels where WDM optical circuits are economically viable
  - L1 now has colorless, directionless, contentionless ROADMs, allowing for rate-guaranteed 100 Gbps DTN-to-DTN circuits
- Dynamic circuits: solution to the rare big-dataset movement needs of the scientific community
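To see why such a small loss rate matters, one commonly used approximation is the Mathis et al. model for loss-limited TCP Reno throughput, rate ≈ (MSS/RTT) · (C/√p) with C = √(3/2). This model is brought in here for illustration and is not taken from the slides; the 1460-byte MSS is an assumption, and the 26 ms RTT and 4 Gbps rate are the UVA-to-IU values from the backup slides.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob):
    """Approximate loss-limited TCP throughput (Mathis et al. model), in bits/s."""
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


loss = 0.0046 / 100   # 0.0046% expressed as a probability
rtt = 0.026           # 26 ms, UVA-to-IU RTT
mss = 1460            # assumed MSS in bytes (standard Ethernet MTU)

print(f"Loss-limited estimate: {mathis_throughput_bps(mss, rtt, loss) / 1e6:.1f} Mbps")
print(f"BDP at 4 Gbps: {bdp_bytes(4e9, rtt) / 1e6:.1f} MB")
```

Under these assumptions the estimate comes out on the order of 80 Mbps, far below a multi-Gbps circuit rate, which is consistent with the slide's point that even rare loss is costly on high-BDP paths.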
Visions of ARPAnet-like growth!
[Picture taken at the LBJ Library, Austin, Texas, 60s Exhibit, Oct. 2014]
Contributions to community
- Extending dynamic L2 service to campuses by having engineers/students gain experience with OSCARS/OESS setup and usage
- End-host configuration: use of tc, Circuit TCP to avoid HTCP cwnd changes
- Develop:
  - applications for end-to-end L2 paths
  - FCAPS: Fault, Config. mgmt, Accounting, Performance monitoring, and Security (management plane)
  - help improve OSCARS and OESS: error reporting/autoconf
- CC-NIE awards (ScienceDMZ): many campus deployments; grow this service
Control-plane models
- Daisy-chain vs. tree model
  - Research literature and IETF PCE work
- To avoid lockup of resources, daisy-chaining requires:
  - Limited resource allocation on the forward signaling path
  - Multiple start-time options to increase the chance of success
  - Fast processing
- Tree model
  - AuthN needs?
  - Global PSTN: in the daisy-chain model, no customer-provider relationships are required with providers more than two hops away. Not so in the tree model
- Testbed view (GENI) vs. ARPAnet growth view
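The difference between the two signaling models can be illustrated with a toy reservation model. This is a sketch under strong simplifying assumptions, not how OSCARS or OESS implement signaling: the `Domain` class, the slot-based availability, and both function names are hypothetical. In the daisy-chain version each domain only talks to its neighbor, holds resources on the forward pass, and releases them on failure; in the tree version a coordinator must be able to query every domain on the path directly.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    free_slots: set                      # start times this domain could serve
    held: set = field(default_factory=set)

    def hold(self, start):
        """Tentatively reserve a start time; False if it is unavailable."""
        if start in self.free_slots and start not in self.held:
            self.held.add(start)
            return True
        return False

    def release(self, start):
        self.held.discard(start)


def daisy_chain_reserve(path, start_options):
    """Signal hop by hop; try several start times, rolling back failed attempts."""
    for start in start_options:
        held = []
        for domain in path:
            if domain.hold(start):
                held.append(domain)
            else:
                for d in held:           # release forward-path holds quickly
                    d.release(start)
                break
        else:
            return start                 # every domain on the path accepted
    return None


def tree_reserve(path, start_options):
    """Coordinator checks every domain directly, then commits everywhere."""
    for start in start_options:
        if all(start in d.free_slots and start not in d.held for d in path):
            for d in path:
                d.hold(start)
            return start
    return None


def make_path():
    return [Domain("campus", {1, 2, 3}),
            Domain("regional", {2, 3}),
            Domain("backbone", {3})]


print(daisy_chain_reserve(make_path(), [1, 2, 3]))
print(tree_reserve(make_path(), [1, 2, 3]))
```

Offering several start times lets the daisy-chain request succeed even though the first two options fail partway along the path, and the rollback loop is what keeps failed attempts from locking up resources.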
International component
- Added Keio University, Yokohama, Japan
- OSCARS successfully set up an L2 circuit
- ping didn't work: need to create a trouble ticket
Requesting your feedback
- One approach
  - Grow this deployment to CC-NIE/other DYNES sites
  - Create a virtual organization of individuals to develop tools for diagnostics, improve OSCARS and OESS, and develop applications
- Second approach
  - Add an Aggregate Manager and contribute this testbed to GENI for networking researchers
- Third approach
  - Develop an L1 (WDM optical) SDN testbed
Backup slides
UVA DYNES data-plane with an example VLAN.
nuttcp throughput for paths through AL2S (blue) and ION (red).

Path UVA-to-   Path rate   tc   Throughput (Mbps)
                                Min.    Mean    Max.    IQR
IU             4 Gbps      R    2933    3856    3927    39
MAX            4 Gbps      R    3695    4070    4105    27
MAX            3 Gbps      R    2938    3218    3262    27
MAX            3 Gbps      C    3132    3221    3248    17
MAX            3 Gbps      B    3124    3221    3250    19
IU             3 Gbps      B    609     2973    3132    32
I2Lab          50 Mbps     R    42.1    45.1    47.1    0.89
I2Lab (Reno)   50 Mbps     R    25.9    37.2    41.4    2.01
UWisc          45 Mbps     R    44.1    44.4    44.6    0.19
UWisc          45 Mbps     C    44.1    44.4    44.6    0.17
UWisc          50 Mbps     R    36.5    39.7    40.9    0.61
UWisc          50 Mbps     C    37.3    39.6    41.0    0.8
UWisc          50 Mbps     B    37.5    39.7    40.7    0.58

Path UVA-to-   RTT (ms)
IU             26
MAX            4.4
I2Lab          27.5
UWisc          24.1
Path UVA-to-   Path rate   tc   Mean packet   Mean # retx,   % runs w/ retx
                                retx rate     first 2 s      in later sec
IU             4 Gbps      R    0.00075       4.8            13
MAX            4 Gbps      R    0.00085       47.8           12
MAX            3 Gbps      R    0.0007        58.7           4
MAX            3 Gbps      C    4E-05         3.3            6
MAX            3 Gbps      B    4E-05         3.5            6
IU             3 Gbps      B    0.006         95             7
I2Lab          50 Mbps     R    0.07          20.5           97
I2Lab (Reno)   50 Mbps     R    0.02          19.7           30
UWisc          45 Mbps     R    0.002         0.83           60
UWisc          45 Mbps     C    0.0015        0.73           48
UWisc          50 Mbps     R    0.15          17.8           100
UWisc          50 Mbps     C    0.148         17.4           96
UWisc          50 Mbps     B    0.149         17.3           96

R: max rate guaranteed by tc; C: Ceiling limits max sending rate
GridFTP tests
- Disk-to-disk transfers
- Single: 20 GiB * 1 file; LOSF: 20 MiB * 1024 files
- tc = C, 3 Gbps
- -fast, -pp, and -cc 16 used

Path UVA-to-   Type     Throughput (Mbps)
                        Min.    Mean    Max.    IQR
MAX            Single   3179    3230    3246    15
MAX            LOSF     1485    2035    2246    549
IU             Single   1247    2181    2455    150
IU             LOSF     1519    2025    2178    39

GridFTP reported throughput for paths through AL2S. LOSF = Lots of Small Files
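As a quick check on the two workloads, the arithmetic below confirms that the single-file and LOSF data sets are the same total size and converts the reported UVA-to-MAX mean throughputs into approximate transfer times. It is a sketch only: the helper name is hypothetical, and it assumes Mbps means 10^6 bits per second and that the mean rate holds for the whole transfer.

```python
GIB = 2**30
MIB = 2**20


def transfer_time_s(total_bytes, throughput_mbps):
    """Seconds to move total_bytes at a constant rate given in Mbps (10^6 bit/s)."""
    return total_bytes * 8 / (throughput_mbps * 1e6)


single_bytes = 20 * GIB * 1      # one 20 GiB file
losf_bytes = 20 * MIB * 1024     # 1024 files of 20 MiB each

print("same total size:", single_bytes == losf_bytes)
print(f"UVA-to-MAX single, mean 3230 Mbps: {transfer_time_s(single_bytes, 3230):.1f} s")
print(f"UVA-to-MAX LOSF, mean 2035 Mbps: {transfer_time_s(losf_bytes, 2035):.1f} s")
```

Because both data sets total 20 GiB, the gap between the two times reflects per-file overhead rather than data volume.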