Grid Tutorial: Networking
Laukik Chitnis, Sanjay Ranka
Outline
- Motivation
- Key issues and challenges
- Emerging protocols
  - DWDM
  - MPLS, GMPLS
- Network infrastructure
  - Internet2 Abilene and HOPI
  - NLR and FLR
  - GLORIAD
- End-to-end utilization of line speeds
  - Data transfers
  - Provisioning, bandwidth scheduling and usage of network resources
  - Co-scheduling with compute and storage resources
Motivation
- Exchange of large amounts of data
  - Collaborative data analysis in high-energy physics (HEP)
  - For the LHC experiments, CERN would host only about 11% of the data
  - Processing power at CERN would comprise only about 17% of the total
  - Large quantities of data therefore flow across the Grid, and data needs to be dispersed geographically
- Multiple applications sharing the same infrastructure but requiring QoS guarantees
  - Experiments in data fusion (ITER), astronomy, medical sciences
- High-definition video conferencing
  - Requires provisioning and QoS guarantees, e.g. VRVS
Types of data transfers in e-science
- Scheduled transfer of data
  - Explicit request by an end user to transfer data
  - Automatic replication of files and datasets by storage managers
  - Data produced at a source (such as the LHC experiments) to be disseminated to various destinations
  - E.g. bulk data transfers in e-science: LHC, e-VLBI
- Unscheduled data movement
  - Interactive analysis of experiment results
  - Remote visualization of datasets
  - The amount of data transferred is usually not large; otherwise it makes sense to schedule the transfer
  - E.g. GAE (Grid-Enabled Analysis using distributed services), ROOT
Key issues for networking in Grid computing
1. Traffic engineering
   - Co-existence of circuit-oriented services (such as protocols specialized for long-haul, high-speed bulk data transfers) with the best-effort, TCP-based transport protocols used in IP networks
2. Decentralized ownership of networks
   - Networks are not owned by the specific groups that use them
   - They are managed by multiple organizations such as Internet2 and ESnet
   - Data flows have to co-exist with other data flows across networks in multiple domains
Key challenges for networking in Grid computing
1. Deploying networks with high wire speeds
   - Encompassing geographically distributed sites
   - Technologies that enable fast transfer of bulk data
2. Effective end-to-end utilization of these speeds
   - Enabling data transfers to achieve speeds as close to wire speed as possible
   - QoS guarantees for services
Key protocols in emerging networks
- Layer 1: SONET/SDH, DWDM, IP over DWDM
- Layer 2/3: MPLS and GMPLS
Early years
- SONET/SDH Layer 1 infrastructure was introduced in the early 1990s to support traditional time-division multiplexing (TDM)-based data and voice services
- It provides efficient multiplexing of lower-speed TDM circuits such as T1/E1 and T3/E3 onto higher-speed OC-3 and OC-12 trunks for transport across service providers' core networks
- It supports three critical functions:
  - Grooming: aggregating IP traffic to utilize the available bandwidth
  - Protection and restoration
  - Thorough operational support (such as alarming and performance monitoring)
Multiplexing wavelengths
- In the latter part of the 1990s, DWDM emerged as a way to significantly increase the efficiency of the installed fiber plant by allowing multiple wavelengths to be transmitted over a single physical fiber
- It adds another level of multiplexing and demultiplexing, at the optical level, to support greatly increased bandwidth at the core of the network
- The SONET/SDH layer is mapped into wavelengths at the DWDM transport layer, to be carried across core long-haul networks that in many cases span regions and countries
Wavelength Division Multiplexing
- WDM enables the utilization of a significant portion of the available fiber bandwidth
- It allows many independent signals to be transmitted simultaneously on one fiber, with each signal located at a different wavelength
- The more wavelengths (colors) are transmitted at once, the more complex the components (mux, switch, demux) need to be
- Coarse WDM systems usually support 16 channels per fiber
- DWDM (Dense WDM) can transmit 40 or 80 channels
Source: rad.com
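A DWDM channel plan can be made concrete with a short calculation. The sketch below assumes the ITU-T G.694.1 fixed grid (anchor frequency 193.1 THz; 40 channels at 100 GHz spacing is a common plan, 80 at 50 GHz another), numbers channels symmetrically around the anchor, and assumes every wavelength carries the same line rate. Each channel's wavelength follows from lambda = c / f; the function names are illustrative only.

```python
# Sketch: DWDM channel plan on the ITU-T G.694.1 fixed grid.
# Assumptions: channels numbered symmetrically around the 193.1 THz anchor,
# and every channel carries the same nominal line rate.

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
ANCHOR_THZ = 193.1          # G.694.1 anchor frequency


def dwdm_grid(num_channels: int, spacing_ghz: float) -> list[tuple[float, float]]:
    """Return (frequency in THz, wavelength in nm) for each channel."""
    half = num_channels // 2
    plan = []
    for n in range(-half, num_channels - half):
        f_thz = ANCHOR_THZ + n * spacing_ghz / 1000.0
        wavelength_nm = C_M_PER_S / (f_thz * 1e12) * 1e9   # lambda = c / f
        plan.append((round(f_thz, 4), round(wavelength_nm, 3)))
    return plan


def aggregate_gbps(num_channels: int, per_channel_gbps: float) -> float:
    """Total fiber capacity if every wavelength carries the same rate."""
    return num_channels * per_channel_gbps


if __name__ == "__main__":
    plan = dwdm_grid(40, 100.0)          # 40 channels at 100 GHz spacing
    print(plan[0], plan[20], plan[-1])   # lowest, anchor and highest channels
    print(aggregate_gbps(40, 10.0), "Gbps with 10 Gbps per wavelength")
```

Higher frequencies map to shorter wavelengths, so the channel list runs from the longest wavelength to the shortest; multiplying the channel count by the per-wavelength rate gives the raw fiber capacity (e.g. 40 x 10 Gbps = 400 Gbps).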
Routing in optical networks
Two types of traffic enter and exit a typical service provider point of presence (POP):
- Router-terminated traffic: IP traffic that needs a Layer 3 lookup at the POP, riding a wavelength that terminates on a router
- "Pass-through" (or transient) traffic: stays in the transport domain and bypasses the router to travel on to an adjacent POP in the service provider's core network
Typical operations at a PoP
- Incoming traffic is composed of colored wavelengths multiplexed through DWDM onto a physical fiber
- It is fed to DWDM demultiplexers that separate it into the individual wavelengths
- Each wavelength passes through a transponder that converts it to a short-reach wavelength (grey light)
  - Optical-to-electrical-to-optical (OEO) conversion is used because short-reach optics have historically been used for connectivity inside the POP environment
- The grey light is then typically fed into a short-reach interface on a SONET/SDH cross-connect
- The SONET/SDH cross-connect then feeds the 10 Gbps signal to the router
- The router:
  - performs performance monitoring at Layer 1 through Layer 3,
  - monitors for loss of signal (LOS) so it can perform MPLS Fast Reroute (FRR) restoration, and
  - performs a Layer 3 and above lookup to route the packet to its destination
IP-over-DWDM interconnect model in today's network
- Cross-connects vs. manual patching
  - Router interface capacities are fully utilized today; since aggregation is not required, cross-connects function only as patch panels
  - With manual patching, however, any change requires human intervention for reconfiguration
  - In both cases, the OEO conversions and the associated electrical processing result in additional cost, including space
Source: Cisco whitepaper
Converging technologies
Source: Cisco
IP over DWDM
IP over DWDM
- Cisco CRS-1 family of routers offering IPoDWDM
- Advantages
  - Integrated transponders
    - Fewer independent components that can fail, hence better reliability
  - Provides S-GMPLS
    - Single IP control plane management, providing service flexibility and lower operational expenditure
    - Segmented or integrated management model
    - Better provisioning
  - ROADMs (Reconfigurable Optical Add-Drop Multiplexers)
    - Eliminate O/E/O conversion
    - Better reliability, more flexibility
    - Lower operational expenditure
- Other current routers offering similar capabilities: Cisco 7600 series, Force10
Layer 2/3 convergence
- Layer 2: switching of frames
- Layer 3: routing of IP packets based on the destination IP address
- Motivation for convergence
  - Looking up the destination IP address at each router to determine the next hop in the core (even within an Autonomous System) is expensive
  - ATM-like flows are desirable for guaranteeing a certain Quality of Service
  - Connection-oriented service over connectionless networks
- Recent trends
  - MPLS and GMPLS
  - Work in progress on dynamic provisioning of inter-domain paths
MPLS
- Motivation
  - Speed up packet forwarding
  - Traffic engineering in IP networks
- How to do it?
  - Convert the connectionless IP network into a connection-oriented one
  - The path between source and destination is pre-calculated based on specifications provided by the user
- MultiProtocol Label Switching
  - Use labels rather than address lookups to determine the next hop of a packet/frame, to speed up forwarding
  - Use tables to store QoS parameters, for traffic engineering
MPLS advantages
- Ability to place IP traffic on predefined paths
- Guarantees can be given for bandwidth and other differentiated QoS
Terms used in the figure:
- LSP: Label Switched Path
- LSR: Label Switching Router
- LIB: Label Information Base
Picture source: netcraftsmen.net
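The label-switching idea can be sketched in a few lines: the ingress does a single IP classification into a forwarding equivalence class (FEC) and pushes a label, and every LSR after that only does an exact-match lookup on the incoming label to swap or pop it. The router names, labels, prefix and the table name `LFIB` are hypothetical; real LSRs populate these tables through LDP or RSVP-TE, and a real ingress uses longest-prefix match rather than the first match used here.

```python
# Minimal sketch of MPLS forwarding: classify once at the ingress, then
# swap labels hop by hop with exact-match lookups. Router names, labels
# and the prefix are hypothetical; real LSRs learn them via LDP/RSVP-TE.
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst: str
    labels: list[int] = field(default_factory=list)   # label stack, top = last


# Ingress LER: FEC (destination prefix) -> (first LSR, label to push)
INGRESS_FEC = {"10.1.0.0/16": ("LSR-B", 100)}

# Per-LSR label table: in_label -> (next hop, out_label, or None to pop)
LFIB = {
    "LSR-B": {100: ("LSR-C", 200)},
    "LSR-C": {200: ("LSR-D", 300)},
    "LSR-D": {300: ("EGRESS", None)},
}


def classify(dst: str) -> str:
    """The only IP lookup on the LSP: first matching prefix (real routers use longest match)."""
    addr = ipaddress.ip_address(dst)
    for prefix in INGRESS_FEC:
        if addr in ipaddress.ip_network(prefix):
            return prefix
    raise KeyError(f"no FEC for {dst}")


def forward(packet: Packet) -> list[str]:
    """Return the routers the packet visits along its LSP."""
    next_hop, label = INGRESS_FEC[classify(packet.dst)]
    packet.labels.append(label)                      # push at the ingress
    path = ["INGRESS", next_hop]
    while packet.labels:
        out_hop, out_label = LFIB[next_hop][packet.labels[-1]]
        if out_label is None:
            packet.labels.pop()                      # pop: hand back to IP routing
        else:
            packet.labels[-1] = out_label            # swap: no IP lookup needed
        next_hop = out_hop
        path.append(next_hop)
    return path


if __name__ == "__main__":
    print(forward(Packet("10.1.2.3")))
```

Because each core hop consults only a small label table, the path is fixed once the labels are installed, which is what lets traffic be placed on predefined paths.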
MPLS protocol suite
- Traditional IP routing protocols and their extensions for traffic engineering
  - Open Shortest Path First (OSPF)
  - Intermediate System to Intermediate System (IS-IS)
- Extensions to existing signaling protocols
  - Constraint-based Routing Label Distribution Protocol (CR-LDP)
  - Resource Reservation Protocol (RSVP), extended as RSVP-TE
Generalized MPLS (GMPLS)
- Extends the control plane to other network types
  - Provides signaling and routing for dissimilar network types such as packet (IP), time (TDM networks) and optical (WDM networks)
- Advantages
  - Automates end-to-end provisioning of connections across multiple networks
    - Manages the establishment and release of Label Switched Paths (LSPs) spanning networks
  - Simplifies network management
    - Allows administrators to automate the provisioning and management of networks, lowering the cost of operations
Source: www.iec.org
GMPLS protocol suite
Function          Protocols           Purpose
Signaling         CR-LDP, RSVP-TE     Establishment of LSPs (support for generalized labels and bidirectional LSPs)
Routing           OSPF-TE, IS-IS-TE   Autodiscovery of network topology; advertisement of parameters such as bandwidth availability
Link management   LMP                 Control channel management, link connectivity verification, fault isolation
MPLS and GMPLS in a nutshell
- Multiprotocol Label Switching
  - Route at the edge, switch in the core: ultra-fast forwarding
  - IP traffic engineering: constraint-based routing
  - Virtual Private Networks: controllable tunneling mechanism
  - Voice/video over IP: delay variation and QoS constraints
- Generalized Multiprotocol Label Switching enhances the MPLS architecture by completely separating the control and data planes of the various networking layers
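Constraint-based routing over the TE information advertised by OSPF-TE or IS-IS-TE is commonly done with CSPF (constrained shortest path first): links that cannot satisfy the request are pruned, and a shortest-path search runs on what remains. The sketch below uses a single bandwidth constraint and a hypothetical four-node topology, and treats links as bidirectional; real implementations also handle other constraints (e.g. affinities, explicit hops).

```python
# Sketch of constraint-based shortest path first (CSPF): prune links that
# cannot satisfy the bandwidth constraint, then run Dijkstra on TE metrics.
# The topology and numbers are hypothetical; links are treated as bidirectional.
import heapq

# (node_a, node_b, te_metric, unreserved_gbps)
LINKS = [
    ("A", "B", 1, 2.5),
    ("B", "D", 1, 2.5),
    ("A", "C", 2, 10.0),
    ("C", "D", 2, 10.0),
]


def cspf(links, src, dst, bw_gbps):
    """Return the lowest-metric path with at least bw_gbps free on every link, or None."""
    graph = {}
    for a, b, metric, unreserved in links:
        if unreserved >= bw_gbps:                    # constraint check
            graph.setdefault(a, []).append((b, metric))
            graph.setdefault(b, []).append((a, metric))

    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue                                  # stale heap entry
        for nbr, metric in graph.get(node, []):
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(cspf(LINKS, "A", "D", 1.0))   # cheapest path is fine for 1 Gbps
    print(cspf(LINKS, "A", "D", 5.0))   # B links pruned, detour via C
```

A 1 Gbps request takes the cheap A-B-D path, while a 5 Gbps request is steered onto A-C-D even though its metric is higher; that steering is the traffic-engineering effect the slide refers to.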
Network infrastructure
- Deploying networks with high wire speeds
  - The National LambdaRail (NLR) initiative
  - The Florida LambdaRail (FLR)
  - Initiatives by the Internet2 community: Abilene, HOPI
  - GLORIAD (Global Ring Network for Advanced Application Development)
NLR
- First transcontinental Ethernet network
- High-speed network over fiber-optic lines
Florida LambdaRail
- Over 1,540 miles of dark fiber
- Dense wavelength division multiplexing (DWDM)-based optical footprint using Cisco Systems 15454 optical electronic systems
  - Capacity of 32 wavelengths per fiber pair
  - Each wavelength supports transmission of up to 10 Gbps (up to 320 Gbps per fiber pair)
- Ethernet-based MPLS built on top of the optical infrastructure
Internet2 and Abilene
- Internet2: a consortium providing scalable, sustainable, high-performance networking in support of the research universities of the United States
- Abilene
  - OC-192c over unprotected DWDM waves with SONET framing
  - Backbone: shared packet infrastructure
- Abilene provides an IP connection over infrastructure rented from commercial backbone providers, an arrangement that ultimately limits research possibilities
- NLR offers a complete fiber infrastructure on which researchers can build their own Internet Protocol networks
Hybrid Optical and Packet Infrastructure (HOPI)
- Motivation: increased demand for deterministic paths
- Hybrid of shared IP packet switching and aggressive use of dynamically provisioned optical lambdas
  - The packet-based infrastructure is the usual IP infrastructure
  - The circuit infrastructure can be viewed as a set of paths between nodes, which may or may not connect to the packet infrastructure
HOPI: possible types of circuit infrastructure
Node type                     Path                        Attributes
Optical switches              Raw lambdas (light paths)   Color, spacing (no digital data)
SONET add-drop multiplexers   SONET channels              Number of channels, channel size
Ethernet switches             VLANs                       Dedicated bandwidths
IP routers                    MPLS tunnels                Dedicated bandwidths
Current HOPI network topology
- Full-scale optical switching on raw waves does not exist
- Utilizes Ethernet switches and VLANs (possibly with MPLS tunnels) to model optical capabilities
GLORIAD
- Fiber-optic ring of networks around the northern hemisphere
- High-speed computer network connecting scientific organizations in Russia, China, the United States, the Netherlands, Korea and Canada
- Bandwidth of up to 10 Gbit/s via OC-192 links, e.g. between KRLight in Korea and the Pacific Northwest GigaPoP in the United States
Source: GLORIAD
Global Lambda Integrated Facility (GLIF)
- International virtual organization (VO) promoting the paradigm of lambda networking
- Integrated facility supporting data-intensive scientific research
- Supports middleware development for lambda networking
End-to-end utilization of network resources
- DRAGON
- UltraLight
- VINCI
- LISA
- FDT
- MonALISA
- TeraPaths
Dynamic Resource Allocation in GMPLS Optical Networks (DRAGON)
- DRAGON is a research and experimental framework for high-performance networks
- Motivation: e-science applications need network services beyond typical best-effort infrastructures
- Deploys infrastructure that enables dynamic provisioning of network resources to establish deterministic paths
  - Multi-domain provisioning of traffic-engineered paths
  - Distributed control plane across heterogeneous network technologies
  - Includes mechanisms for Authentication, Authorization and Accounting (AAA)
  - Enables scheduling across domains
- Reference implementation in the Washington, DC area
DRAGON Virtual Label Switch Router (VLSR)
- Provides GMPLS protocols to switching elements without native GMPLS capability
- A small Unix-based PC running the GMPLS control plane
  - Acts as a proxy agent for the switching device
  - Converts GMPLS events into commands native to the local switching element
- Primarily used in DRAGON to control Ethernet switches via the GMPLS control plane
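The proxy role of the VLSR can be pictured as a translator that keeps per-LSP state and turns each control-plane event into device commands. In the sketch below the event fields, the class names and the command strings are all invented placeholders (not DRAGON's code and not any vendor's CLI); it only illustrates why the proxy must remember each setup so that a later teardown can be undone on a device that knows nothing about LSPs.

```python
# Hypothetical VLSR-style proxy: it accepts GMPLS-level LSP events and
# emits commands for a switch that has no GMPLS support. The event fields
# and the command strings are invented placeholders, not DRAGON's code or
# any vendor's CLI.
from dataclasses import dataclass


@dataclass(frozen=True)
class LspEvent:
    action: str                  # "setup" or "teardown"
    lsp_id: str
    vlan_id: int = 0
    ports: tuple[str, ...] = ()


class VlsrProxy:
    def __init__(self):
        self.active = {}         # lsp_id -> setup event, needed for teardown

    def handle(self, ev: LspEvent) -> list[str]:
        """Translate one LSP event into native (placeholder) switch commands."""
        if ev.action == "setup":
            self.active[ev.lsp_id] = ev
            return [f"create vlan {ev.vlan_id}"] + [
                f"add port {p} to vlan {ev.vlan_id} tagged" for p in ev.ports
            ]
        if ev.action == "teardown":
            setup = self.active.pop(ev.lsp_id)
            return [f"remove port {p} from vlan {setup.vlan_id}" for p in setup.ports] + [
                f"delete vlan {setup.vlan_id}"
            ]
        raise ValueError(f"unknown action {ev.action!r}")


if __name__ == "__main__":
    proxy = VlsrProxy()
    for cmd in proxy.handle(LspEvent("setup", "lsp-1", 3001, ("1/1", "1/2"))):
        print(cmd)
    for cmd in proxy.handle(LspEvent("teardown", "lsp-1")):
        print(cmd)
```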
DRAGON Network Aware Resource Broker (NARB)
- Exchanges inter-domain topology
  - Stores topology in an OSPF-TE database
  - Allows for abstraction of topology
- Performs intra-domain and inter-domain path computation
  - First calculates a path based on inter-domain abstractions
  - The initial path computation result can be expanded into a high-fidelity path through coordination of NARBs in different domains
- Maintains a Traffic Engineering Database and the associated AAA and scheduling information
  - 3D TEDB: GMPLS TE constraints, AAA constraints, scheduling constraints
  - Policy-based provisioning using the 3D Resource Computation Element
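The two-stage computation described above (an abstract inter-domain path first, then per-domain expansion) can be sketched as follows. This is a simplified illustration, not the NARB algorithm itself: hop count stands in for the TE, AAA and scheduling constraints of the 3D TEDB, and the domains, nodes and border links are hypothetical.

```python
# Simplified two-level path computation in the spirit of NARB: compute an
# abstract domain-level path first, then let each domain expand its own
# segment. Hop count stands in for the TE, AAA and scheduling constraints
# of the 3D TEDB; all domains, nodes and border links are hypothetical.
from collections import deque

DOMAIN_ADJ = {"AS1": ["AS2"], "AS2": ["AS1", "AS3"], "AS3": ["AS2"]}

# Inter-domain links: (domain_x, domain_y) -> (border node in x, border node in y)
BORDER = {("AS1", "AS2"): ("a3", "b1"), ("AS2", "AS3"): ("b3", "c1")}

# Detailed intra-domain topology, known only to each domain's own NARB
INTRA = {
    "AS1": {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]},
    "AS2": {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]},
    "AS3": {"c1": ["c2"], "c2": ["c1"]},
}


def bfs(adj, src, dst):
    """Fewest-hop path from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None


def border(x, y):
    """(exit node in x, entry node in y) for the link from domain x to domain y."""
    if (x, y) in BORDER:
        return BORDER[(x, y)]
    node_y, node_x = BORDER[(y, x)]
    return node_x, node_y


def compute_path(src_dom, src, dst_dom, dst):
    domains = bfs(DOMAIN_ADJ, src_dom, dst_dom)       # step 1: abstract path
    if domains is None:
        return None
    full, entry = [], src
    for i, dom in enumerate(domains):
        if i + 1 < len(domains):
            exit_node, next_entry = border(dom, domains[i + 1])
        else:
            exit_node, next_entry = dst, None
        segment = bfs(INTRA[dom], entry, exit_node)   # step 2: per-domain expansion
        if segment is None:
            return None
        full.extend(segment)
        entry = next_entry
    return full


if __name__ == "__main__":
    print(compute_path("AS1", "a1", "AS3", "c2"))
```

The design point is that no single broker needs every domain's detailed topology: the first stage uses only the abstracted domain graph, and each segment is filled in from data that stays inside its own domain.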
DRAGON NARBs: inter-domain path computation
(Figure: NARBs in AS 1, AS 2 and AS 3 on the control plane, with Transport Layer Capability Set Exchange; end systems connected across the data plane.)
More on DRAGON: http://dragon.maxgigapop.net/twiki/bin/view/dragon/overview
UltraLight
- Promotes the network as an actively managed component, along with storage and compute resources
- Single interface for end applications to access a collection of network services
- Optimizes transfer performance for individual applications
- Global optimization for all applications transferring data at a given time
- Dynamically evolving network core on other backbones such as NLR, HOPI, Abilene and ESnet
Virtual Intelligent Networks for Computing Infrastructures (VINCI)
- Multi-agent system for light path provisioning based on dynamic discovery of the topology in distributed networks
- Adds a new level of data transfer predictability
  - Uses provisioning and reservation of network resources
- Path discovery service
  - Provides the scheduler with the best-suited path(s) between source and destination, with:
    - bandwidth information
    - earliest reservation time
    - maximum duration of the bandwidth guarantee
  - Responsible for setting up the required path at the request of the scheduler
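One piece of the information a path discovery service returns, the earliest reservation time, can be computed from per-link reservation calendars. The sketch below is an assumption about how such a query could work, not VINCI's implementation: each link has a capacity and a list of half-open [start, end) reservations, and the path, times and rates are hypothetical. It relies on two observations: capacity is only freed when a reservation ends, so the answer is either the current time or one of those end times; and within a window, load only rises at reservation starts, so checking the window start plus every reservation start inside it finds the peak.

```python
# Sketch of an earliest-reservation-time query for a path. Each link is
# (capacity_gbps, [(start, end, gbps), ...]) with half-open [start, end)
# reservations; the path, times and rates are hypothetical.

def load_at(reservations, t):
    """Bandwidth already reserved on a link at instant t."""
    return sum(g for s, e, g in reservations if s <= t < e)


def fits(link, t, duration, bw):
    """True if bw fits on the link for the whole window [t, t + duration)."""
    capacity, reservations = link
    # Load only rises at reservation starts, so checking t and every start
    # inside the window is enough to find the peak.
    checkpoints = [t] + [s for s, _, _ in reservations if t < s < t + duration]
    return all(load_at(reservations, p) + bw <= capacity for p in checkpoints)


def earliest_start(path, now, duration, bw):
    """Earliest t >= now at which every link on the path can carry bw for duration."""
    # Capacity is only freed when a reservation ends, so the answer is
    # either `now` or one of those end times.
    candidates = {now} | {e for _, res in path for _, e, _ in res if e > now}
    for t in sorted(candidates):
        if all(fits(link, t, duration, bw) for link in path):
            return t
    return None   # only possible if bw exceeds some link's capacity


if __name__ == "__main__":
    link1 = (10.0, [(0, 100, 8.0)])
    link2 = (10.0, [(50, 150, 5.0)])
    print(earliest_start([link1, link2], now=0, duration=30, bw=5.0))
```

With the example calendars, a 5 Gbps request for 30 time units can start at 100, when the first link's reservation ends; a 6 Gbps request has to wait until 150.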
VINCI Architecture
VINCI components
- Transfer scheduler
  - Provides a uniform interface for requesting network resources
  - Provides transfer classes and priorities
  - Interacts with Authentication, Authorization and Accounting services
- End host agents: utilize LISA
- Network services
  - Path discovery
  - Distributed set of monitoring and alert tools: MonALISA end-to-end monitoring
- Transfer management: efficient data transfers across networks using FDT
The functionality of the VINCI system
(Figure: MonALISA ML proxy services and ML agents at Sites A, B and C; a Layer 3 agent for routers, a Layer 2 agent for Ethernet LAN-PHY or WAN-PHY, and a Layer 1 agent for DWDM fiber.)
Source: GridNets06 UltraLight talk
Fast Data Transfer (FDT)
- Application for efficient data transfers
  - Capable of reading and writing at disk speed over wide area networks with standard TCP
  - Can be used to stream a large set of files across the network
- Uses appropriately sized buffers for disk I/O and for the network
- Transfers data in parallel over multiple TCP streams, when necessary
- Uses independent threads to read and write on each physical device
- Streams a dataset (list of files) continuously, using a managed pool of buffers, through one or more TCP sockets
- Experiments at SC07 demonstrated sustained data transfer speeds of 18 Gbps
- Work in progress on integrating FDT with dCache
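What "appropriately sized buffers" means for the network side can be estimated with the bandwidth-delay product (BDP): to keep a path full, a TCP sender needs at least rate x RTT bytes in flight. If the per-socket window is capped, several parallel streams are needed to reach the same aggregate rate, which is one common reason tools of this kind open multiple TCP streams. The 100 ms RTT and the 16 MiB per-socket cap below are illustrative assumptions, not FDT settings.

```python
# Bandwidth-delay product sizing. The RTT and per-socket window cap used
# in the example are illustrative assumptions, not FDT defaults.
import math


def bdp_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_gbps * 1e9 * rtt_ms / (8 * 1e3)


def streams_needed(rate_gbps: float, rtt_ms: float, max_window_bytes: int) -> int:
    """Parallel TCP streams needed if each stream's window is capped."""
    return math.ceil(bdp_bytes(rate_gbps, rtt_ms) / max_window_bytes)


if __name__ == "__main__":
    # 10 Gbps across a path with a 100 ms round-trip time
    print(bdp_bytes(10, 100) / 1e6, "MB in flight")
    print(streams_needed(10, 100, 16 * 2**20), "streams with a 16 MiB window each")
```

At 10 Gbps and 100 ms RTT roughly 125 MB must be in flight, so with a 16 MiB per-socket window about 8 parallel streams are needed; larger windows reduce the stream count.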
Other initiatives
- OSCARS, Cheetah
- TeraPaths: route planning with MPLS (BNL TeraPaths project)
General key issue for doing e-science across multi-domain networks
- Control plane interaction between domains for scheduling, routing and circuit provisioning
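A minimal way to see why inter-domain control plane interaction matters: a circuit is only useful if every domain along the path commits resources, so setup has to be all-or-nothing. The sketch below reserves domain by domain in path order and rolls back earlier reservations if any domain refuses. `DomainController` and `provision` are hypothetical names standing in for a domain's control plane (e.g. its scheduler or NARB); real systems also have to handle timeouts and concurrent requests, which this ignores.

```python
# Sketch of all-or-nothing circuit setup across domains: reserve in each
# domain in path order and roll back earlier reservations if any domain
# refuses. DomainController is a hypothetical stand-in for a domain's
# control plane; it only tracks free bandwidth.

class DomainController:
    def __init__(self, name, free_gbps):
        self.name = name
        self.free_gbps = free_gbps
        self.held = {}                      # circuit_id -> reserved gbps

    def reserve(self, circuit_id, gbps):
        if gbps > self.free_gbps:
            return False
        self.free_gbps -= gbps
        self.held[circuit_id] = gbps
        return True

    def release(self, circuit_id):
        self.free_gbps += self.held.pop(circuit_id, 0)


def provision(domains, circuit_id, gbps):
    """Reserve gbps for circuit_id in every domain, or leave all untouched."""
    reserved = []
    for dom in domains:
        if not dom.reserve(circuit_id, gbps):
            for done in reversed(reserved):  # roll back in reverse order
                done.release(circuit_id)
            return False
        reserved.append(dom)
    return True


if __name__ == "__main__":
    path = [DomainController("AS1", 10), DomainController("AS2", 4), DomainController("AS3", 10)]
    print(provision(path, "circuit-1", 5))   # AS2 refuses, AS1 is rolled back
    print(provision(path, "circuit-2", 3))   # fits everywhere
    print([d.free_gbps for d in path])
```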
Multi-domain provisioning flow: overview
Source: Internet2.edu