ESnet5 Deployment Lessons Learned
Joe Metzger, Network Engineer
ESnet Network Engineering Group
TIP, January 16, 2013
Outline
- ESnet5 Overview
  - Transport Network
  - Router Network
  - Transition Constraints
- Deployment Experiences
  - Challenges & Risks
  - General Issues
  - What went well
  - What could have gone better
- What still needs to be done
ESnet5 Transport Network
- ESnet partnered with Internet2 to build a Ciena OME 6500 nationwide optical system
  - ESnet has 50% of the capacity of the shared optical system (excluding the northern path from Seattle to Chicago through ID, MT, ND, MN & WI)
  - Shared spectrum, chassis, configuration, management, etc.
- System
  - Over 14,000 miles of shared Internet2 fiber, most of it pre-existing
  - Optical system inventory report has 6495 components!
    - 341 nodes, 60+ add/drop/regen
    - 80% are common, the rest are dedicated to ESnet or Internet2 (over 500 are XFPs)
    - Sunnyvale has 4 32-slot shelves
    - Sacramento is a 7-direction node
- We extended the shared optical system to connect national laboratories
  - ~600 miles of ESnet fiber, including building 12 new laterals
  - Ring connecting Chicago-Hub & Starlight to ANL & FERMI
  - Ring connecting Sacramento & Sunnyvale to LBL, SNLL, LLNL, SLAC, JGI, NERSC
  - Spur to ORNL
- We also have our existing Infinera system on Long Island between hubs in NYC and BNL
- Services
  - Point-to-point static dedicated optical circuits
ESnet5 Optical System January 2012
[Map figure: ESnet5 optical network nodes across the U.S. Legend: Add Drop Node (Ciena), Express Node (Ciena), LIMAN Node (Infinera); 44, 61, and 88 Lambdas; G+ Lit. Geography is only representational.]
ESnet5 Optical System January 2012: Lit Waves
[Map figure: lit waves on the ESnet5 optical network. Legend: Add Drop Node (Ciena), Express Node (Ciena), LIMAN Node (Infinera); 44, 61, and 88 Lambdas; G+ Lit; waves by reach, waves with optical protection, and 4xG mux'd waves.]
ESnet5 Routed Network
- Routers
  - 16 new Alcatel Lucent (ALU) 7750-SR12
    - -slot router with up to 2xG per slot today
    - 56 G interfaces & 200+ G interfaces
  - 35 existing Juniper MXs
    - Used in G hubs, commercial exchange points, sites
  - 12 existing Juniper M7i & Mi
    - For terminating links slower than GE
  - 5 really old Cisco 7206s to be retired
    - Terminating links slower than GE
- Services
  - Standard routed IP (including full Internet services)
  - Point-to-point dynamic virtual circuits using OSCARS
  - Various overlay networks (private VPNs, LHCONE VRF)
ESnet5 January 2012
[Map figure: ESnet5 routed network. Legend: ESnet PoP/hub locations; ESnet managed routers; site managed routers; ESnet optical node locations (only some are shown); ESnet optical transport nodes (only some are shown); commercial peering points; R&E network peering locations; major Office of Science (SC) sites; major non-SC DOE sites; routed IP, 3rd-party, express/metro, lab-supplied, tail-circuit, and other links. Geography is only representational.]
ESnet5 Transition Constraints
- Deadlines
  - ESnet4 backbone waves on the Internet2 DWS (Infinera system) needed to be shut down no later than Nov 30th, 2012
  - ESnet4 backbone waves on the NLR system needed to be shut down by Dec 31st, 2012
  - ESnet4 metro waves in the San Francisco Bay Area needed to be shut down by Dec 31st, 2012
- Contributing challenges
  - Contracting & procurement always take longer than planned
  - Equipment delivery delays
Challenges & Risks: 10x10 MSA Optics
- There are around 100 pluggable 100G optics (CFPs) in ESnet5 now
- Router & transport vendors in mid-2011:
  - LR4s were the only available & supported optic
  - LR4s were as high as $375K list each, with discounted prices greater than $50K
- Santur put out a press release in July 2011 saying their 10x10 MSA CFPs were available for under $5K each in large quantities
- So, costs for CFPs were bounded between $0.5M and $5M
- We made a decision to go with Santur 10x10 MSA CFPs
  - Worked with ALU and convinced them to support and resell them
  - Purchased 18 from Santur directly for ANI Phase 1 while working with Ciena to get them into their testing & support process
- We had several million dollars of risk if this didn't work out
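The CFP cost bounds above follow from simple per-unit arithmetic. A minimal sketch in Python, using only the figures quoted on the slide (under $5K per Santur CFP, more than $50K per discounted LR4, and total bounds of $0.5M and $5M). The function and variable names are illustrative, and the optic count is inferred from the bounds rather than taken from an inventory:

```python
# Per-unit prices quoted on the slide (USD). Names are illustrative.
MSA_CFP_UNIT_PRICE = 5_000   # Santur MSA CFP, "under $5K each"
LR4_UNIT_PRICE = 50_000      # discounted LR4, "greater than $50K"

# Total cost bounds quoted on the slide (USD).
LOW_BOUND = 500_000
HIGH_BOUND = 5_000_000


def implied_optic_count(total_cost, unit_price):
    """Number of optics implied by a total cost at a given unit price."""
    return total_cost // unit_price


def cost_bounds(count):
    """Return (all-MSA cost, all-LR4 cost) for a given number of optics."""
    return count * MSA_CFP_UNIT_PRICE, count * LR4_UNIT_PRICE


# Both bounds imply the same number of optics.
count_from_low = implied_optic_count(LOW_BOUND, MSA_CFP_UNIT_PRICE)
count_from_high = implied_optic_count(HIGH_BOUND, LR4_UNIT_PRICE)

low, high = cost_bounds(count_from_low)
print(count_from_low, count_from_high)  # implied optic counts
print(low, high, high - low)            # bounds and the spread between them
```

The spread between the two bounds is the money at stake in the optic choice, which lines up with the "several millions of dollars of risk" noted on the slide.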
10x10 MSA Optics: The Results
- No interoperability problems or failures yet!
  - Currently resold and supported by ALU, Ciena, Brocade & others
  - Work in Juniper & Cisco gear, but not certified or resold by them
  - No unexpected interoperability problems encountered*
  - Typically >50% cheaper than LR4s
- Future outlook
  - We have deployed a small number of LR4s where required for interoperability with Ciscos in Cisco-supported configurations
  - Santur was bought by NeoPhotonics
  - NeoPhotonics only sells to OEMs
  - Limited (1?) manufacturers of the 10x10 MSA CFP leads to supply-chain risks
  - We are experiencing delivery delays
Challenges & Risks: ALU 7750s
- The ALU 7750 is a new entry to the R&E market sector
  - Supports all the right protocols & has an attractive feature set
  - Designed as a broadband service delivery platform
  - We have been able to make it do what we need so far
  - We have barely touched the unique features that it offers
- Did run into one serious issue
  - The box has many config knobs, some of which enable behavior specified in Internet drafts that have not been accepted by the global community
  - Don't twist any unless you fully understand the global implications of what they do
  - A future ALU OS release will have 1 less knob
  - Sorry to those folks who were impacted
Being on the Leading/Bleeding Edge
- This is a fun place to be, but it does add a bit of stress
- Having good relationships with your suppliers is critical!
  - We have had excellent support from Ciena and ALU in dealing with challenges, problems & delays
- Must be flexible because plans will change
  - Moved all of our 1st set of Ciena G transponders to shorter-reach spans
  - Replaced all of our 1st generation of G router interfaces
  - Will be replacing all of our 3rd-party G CFPs in our Cienas
What went well?
- Partner relationship with Internet2 & sharing the common infrastructure
- Consolidation of ESnet4 IP & SDN routers to make space for the ESnet5 routers
- Router & transponder installations
- The first ~13,000 miles of common fiber & optical system installs
- Transitioning the new routers & circuits into production
- Staff changes (retirement of some senior people) were not as disruptive as expected
- Acceptance testing
Acceptance Testing
- >1 PetaBytes, no loss (or more than PB if you count every packet in and every packet out on every interface)
What could have gone better: Shipping
- Never put more than $250K of equipment in a single shipping crate
  - Even if it is a very heavy-duty crate with a strong steel shelf in the middle!
  - The crate will be dropped. The shelf will bend, and the cards will be damaged!
Discovered some serious PDU problems
- Some of our AC dual-supply PDUs had a power switching component that could silently fail closed, allowing them to leak power between input feeds and creating a serious shock hazard!
- DC units had a problem with the DC power lug bolts & locking nuts, leading to power balancing issues and the potential to short
- We worked with the vendor and they addressed both of these problems
- We swapped out a bunch of PDUs
Lateral Build Challenges: JGI
- Locate services found and clearly marked 1 of the water lines
- We found the other one
Albuquerque
- Goal:
  - The ESnet4 hub was in 4 Gold
  - The ESnet5 hub is on the first floor of 505 Marquette
  - We needed to install 2 new tail circuits, or swing 3 existing circuits from our ESnet4 hub to our ESnet5 hub, before the backbone circuits to the ESnet4 hub terminated on November 30th
- Challenges:
  - Some providers only provide services at the building Minimum Point Of Entry, regardless of what the order might say
  - Others don't provide any services there; they pull their fiber into a proper suite/facility
  - Some vendors take a long time to turn up new services
  - The more parties involved, the more complex things get!
Albuquerque Solution
[Network diagram. Labels include: ESnet4 hub at 4 Gold (ALBU-CR1, ALBU-SDN1); 505 Marquette, Level3 1st floor (ALBQ-CR5, ALBQ-ASW1); Centurylink main hub; Centurylink, Level3, and ESnet FDPs; ABQG room; basement RAMP room; MPOE; LANL router; SNLA-RT3; DENV-CR2; DENV-CR5; ELPA-CR1; ELPA-ANI; other ESnet5 nodes. Callout: critical room where both providers are colocated!]
- Big thanks to Ed May and Gary Bauerschmidt at UNM/ABQG!
- Level3 & Centurylink also put in a lot of hard work to make this happen
- Together we made it work with more than 48 hours to spare before the deadline!
What's Left to Do?
- G production connections to the labs: ANL, BNL, FNAL, LBL, LLNL, ORNL & NERSC
- G production connections to peers: MANLAN, Starlight, PACWAVE, WIX, Internet2
- 40G into Equinix Ashburn & re-arranging our Washington DC ring to provide diverse backbone connections for JLAB & other sites in the area
- Lots of cleanup & consolidation at the hubs, moving connections from the MXs to the ALUs
- Normalize our G Testbed infrastructure
- Swap out our unsupported third-party 10x10 MSA CFPs in our Ciena interfaces with Ciena-supported ones covered by a 4-hour on-site maintenance contract
- Additional diversity for ANL & FNAL
Summary
- Next time
  - Plan more time & resources for communicating early and often
  - Plan more time & resources for handling logistics
    - It is harder and takes more time than expected
  - Include the logistics headaches in the cost-benefit analysis of dealing with multiple types of G optics (XFP & SFP+ in both 1310 & 1550)
- We have a great team, and everybody pulled together to work on the challenges as they came up!
Questions? Thanks! Joe Metzger metzger@es.net http://www.es.net/ http://fasterdata.es.net/