Developing Applications with Networking Capabilities via End-to-End Software Defined Networking (DANCES)
Kathy Benninger, Pittsburgh Supercomputing Center
OIN Workshop, Pittsburgh, PA, 18 March 2015
What is DANCES?
- DANCES is a two-year, NSF-funded CC-NIE collaborative research integration project
- DANCES is developing mechanisms for managing network bandwidth by adding end-to-end SDN capability and interoperability to selected supercomputing cyberinfrastructure applications
- Motivated by the need to support large bulk file transfer flows and to share the bandwidth of end-site 10G infrastructure efficiently
DANCES Participants and Partner Sites
- Pittsburgh Supercomputing Center (PSC)
- National Institute for Computational Sciences (NICS)
- Pennsylvania State University (Penn State)
- National Center for Supercomputing Applications (NCSA)
- Texas Advanced Computing Center (TACC)
- Georgia Institute of Technology (GaTech)
- Extreme Science and Engineering Discovery Environment (XSEDE)
- Internet2
DANCES Map
[map of DANCES participant and partner sites]
DANCES Application Integration Targets
- Add network bandwidth scheduling capability and QoS, using SDN/OpenFlow 1.3 metering, to supercomputing infrastructure applications:
- Resource management and scheduling: manage file transfer within the workflow
- Distributed wide-area file systems (SLASH2, XWFS): map the SLASH2 and XWFS file system interfaces to network bandwidth reservation
SDN/OF Infrastructure Components and Interfaces
- CONGA bandwidth manager software
  - Receives bandwidth requests from applications
  - Verifies user authorization
  - Initiates OF path setup with the OF controller
- Ryu OF controller software
  - Interfaces to Internet2's AL2S VLAN provisioning
  - Local provisioning of VLANs and reserved bandwidth
- OpenFlow 1.3 switches
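As a concrete illustration of the controller-facing side, the sketch below builds the JSON body that Ryu's bundled ofctl_rest application accepts for adding an OpenFlow 1.3 meter (POST /stats/meterentry/add). This is only a sketch of one possible controller interface: CONGA's actual integration with the controller may differ, and the dpid, meter_id, and rate values here are illustrative.

```python
import json

def meter_add_body(dpid: int, meter_id: int, rate_gbps: float) -> dict:
    """Build the JSON body for Ryu's ofctl_rest 'add meter' endpoint.
    Under the KBPS flag, band rates are expressed in kbit/s; a DROP
    band discards traffic that exceeds the configured rate."""
    return {
        "dpid": dpid,
        "flags": "KBPS",
        "meter_id": meter_id,
        "bands": [
            {"type": "DROP", "rate": int(rate_gbps * 1_000_000)},
        ],
    }

# Reserve 5 Gbit/s for a transfer on switch dpid 1 (values illustrative)
body = meter_add_body(dpid=1, meter_id=1, rate_gbps=5.0)
print(json.dumps(body))
```

A flow entry matching the transfer's traffic would then be pointed at this meter_id so the switch enforces the reserved rate.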
DANCES System Diagram
[system diagram; includes two Corsa 6410 OpenFlow switches]
Workflow Example
1. User submits a job with a file transfer and bandwidth request
2. User authorization for bandwidth scheduling is checked
3. Torque/Moab schedules the job when resources are available
4. A Torque prologue script requests bandwidth and path provisioning:
   a. End-site OpenFlow configuration
   b. AL2S/XSEDEnet path configured via FlowSpace Firewall and OESS for wide-area authorization and path provisioning
5. File transfer executes
6. A Torque epilogue script tears down the provisioned path when the transfer finishes
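The prologue step (4) can be sketched as a small helper that builds the bandwidth request a prologue script would submit to CONGA. The field names, hostnames, and request shape below are hypothetical; CONGA's real request format is not specified in this material.

```python
def bandwidth_request(job_id: str, src: str, dst: str,
                      gbps: float, duration_s: int) -> dict:
    """Payload a Torque prologue script might POST to the CONGA
    bandwidth manager. All field names here are hypothetical."""
    return {
        "job_id": job_id,               # Torque job identifier
        "src": src,                     # source data-transfer host
        "dst": dst,                     # destination data-transfer host
        "rate_mbps": int(gbps * 1000),  # requested reserved bandwidth
        "duration_s": duration_s,       # expected transfer duration
    }

# Illustrative request for a 5 Gbit/s, one-hour reservation
req = bandwidth_request("1234.pbs01", "dtn1.example.edu",
                        "dtn2.example.edu", gbps=5.0, duration_s=3600)
```

The matching epilogue script would send a corresponding release/teardown request keyed on the same job_id.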
SDN/OpenFlow Switch Selection Testing
- Three test servers running iperf3
- Testing between hosts on the local OF switch
- Remote loopback on Internet2 AL2S to provide a long-RTT test path (~105 ms in our case)
- Mix of local and remote traffic to verify fair sharing of bandwidth between local and long-RTT traffic
- Used Linux tc (traffic control) with the htb (hierarchical token bucket) queueing discipline to simulate QoS and flow metering
- Monitored behavior of metered and best-effort traffic
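Both htb and an OpenFlow DROP meter band are, at heart, token buckets. The minimal model below (illustrative only, not the tc or switch implementation) shows the behavior being tested: conformant packets spend tokens and pass, while traffic arriving faster than the configured rate is dropped.

```python
class TokenBucket:
    """Toy token-bucket rate limiter: tokens (bytes) accrue at a fixed
    rate up to a burst capacity; a packet is admitted only if enough
    tokens are available, otherwise it is dropped."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0

    def admit(self, packet_bytes: int, now: float) -> bool:
        # Refill tokens for the time elapsed, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True   # conformant: forward
        return False      # over rate: drop

# 1000 B/s rate, 1500 B burst: back-to-back full-size packets exceed
# the rate, but a packet arriving after the bucket refills is admitted.
tb = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
results = [tb.admit(1500, now=t) for t in (0.0, 0.5, 1.5)]
```

The same shape of experiment, run with real tc/htb classes and iperf3 flows, is what exposes whether metered traffic holds its reserved rate while best-effort traffic absorbs the remainder.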
SDN/OF Switch Infrastructure Challenges
- Many switches are designed for LAN/ToR applications, with buffers too small to sustain high-bandwidth, long-RTT flows
- Vendor OF 1.3 support is more than a year late
- Vendors implement subsets of OpenFlow features
  - Verify that the match fields you need are supported
  - The OF 1.3 spec includes metering, but that does not guarantee the product supports it
- Variations between vendors' northbound/southbound interfaces
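The buffer-size challenge is a bandwidth-delay product problem: to keep a single TCP flow at line rate across a long path, a switch must be able to buffer roughly bandwidth × RTT. The arithmetic below uses the 10G links and ~105 ms AL2S loopback RTT from the testing described earlier; the "few MB" figure for ToR switch buffers is a typical order of magnitude, not a measurement from this project.

```python
def bdp_bytes(link_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes in flight at line rate, a rough
    lower bound on the buffering needed for one long-RTT TCP flow."""
    return int(link_gbps * rtt_ms * 1e6 / 8)

# 10 Gbit/s over the ~105 ms test path: ~131 MB of in-flight data,
# far beyond the few MB of shared buffer on a typical ToR switch.
print(bdp_bytes(10, 105))
```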
Production Issues - Operational
- Authentication and authorization mechanism for users/projects to request bandwidth reservations
  - Site/XSEDE context
  - Internet2 AL2S context
- Real-time cross-site tracking and management of allocated bandwidth resources
- Monitoring and accounting of bandwidth usage
Broader Questions to Be Answered
- Is a single- or multiple-controller SDN/OF configuration most appropriate for the project ("one controller to rule them all")?
- Do OpenFlow 1.3 flow metering and QoS meet the performance needs?
- How do we optimize network bandwidth utilization through bandwidth scheduling?
- What verification must the project team perform to prepare for production deployment?