400G: Deployment at a National Lab
Chris Tracy (ESnet), *Jason R. Lee (NERSC)
June 30, 2016
Concept
Concept: Use case
- This work began as a white paper in December 2013, in which ESnet explored new technologies to support rates above 100G.
- One use case in particular was the problem of linking two disparate data centers.
- NERSC was planning a move from the Oakland Scientific Facility (OSF) in Oakland to Shyh Wang Hall (CRT) at LBL in Berkeley.
[Photos: Oakland Scientific Facility [1], Oakland, CA; Berkeley Lab's Shyh Wang Hall [2], Berkeley, CA]
Concept: Proposal
- By February 2014, interest in these new technologies had grown, leading to a draft proposal submitted to the DOE.
- In collaboration with NERSC and Ciena, ESnet proposed a field trial of 400G technology on BayExpress, ESnet's Bay Area production dark fiber ring.
- ESnet5 BayExpress: production system serving Bay Area laboratories between Sacramento and Sunnyvale.
400G
400G: Plan
- The BayExpress ring is 450 km in length.
- The National Energy Research Scientific Computing Center (NERSC) was moving to a new building only 11.5 km from the current site, taking the short way around the ring.
- NERSC needed to stay up and running, serving the large, diverse scientific community it supports: ~6000 scientists, ~900 projects, and 46 countries across the world.
- There is no time when the center is lightly loaded. As of June 27th we have a 10-day backlog of jobs to run.
400G: Plan
- We would create two alien waves, each carrying 200G.
- These waves would then be combined to form a superchannel with 400G of total bandwidth.
- Wavelength selective switches are in the path, but they are limited to 50 GHz granularity.
- In the production circuit the superchannel took up 100 GHz of spectral bandwidth: 2 x 50 GHz channels.
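The spectral arithmetic on this slide can be checked with a short sketch. The function name and its parameters are illustrative, not part of any vendor tool, and the calculation assumes each carrier occupies exactly one 50 GHz grid slot, as the slide describes.

```python
def superchannel_on_fixed_grid(carriers, rate_per_carrier_gbps, slot_ghz=50.0):
    """Return (total Gbit/s, occupied GHz, spectral efficiency in bit/s/Hz).

    Assumes one carrier per fixed-grid slot (hypothetical helper for illustration).
    """
    total_gbps = carriers * rate_per_carrier_gbps
    occupied_ghz = carriers * slot_ghz
    # Gbit/s divided by GHz gives bit/s/Hz directly.
    return total_gbps, occupied_ghz, total_gbps / occupied_ghz


# Two 200G alien waves on a 50 GHz grid, as planned for BayExpress.
total, occupied, efficiency = superchannel_on_fixed_grid(2, 200.0)
print(f"{total:.0f}G in {occupied:.0f} GHz -> {efficiency:.1f} bit/s/Hz")
# prints: 400G in 100 GHz -> 4.0 bit/s/Hz
```

The efficiency figure counts client payload only; it ignores the G.709/FEC overhead carried on the line side.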
400G: Execution
- A network-wide upgrade had to be performed for the new hardware and optical control plane.
- 4 x 100 GigE circuits were brought up at full production quality between OSF and CRT.
- On the line (DWDM) side, the circuits were provisioned on BayExpress as two adjacent 50 GHz channels.
- Each 50 GHz channel contains one DP-16QAM signal (2 x 100 GigE payload).
- DP-16QAM signal line rate: 275.75 Gbit/s (incl. G.709/FEC overhead).
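As a rough sanity check on the line-rate figure, the sketch below derives the overhead and symbol rate implied by the two numbers on this slide. It assumes a nominal 200 Gbit/s client payload per carrier (2 x 100 GigE) and 8 bits per symbol for DP-16QAM (4 bits per 16QAM symbol on each of two polarizations). The helper names are illustrative only.

```python
PAYLOAD_GBPS = 200.0      # 2 x 100 GigE per carrier (nominal, from the slide)
LINE_RATE_GBPS = 275.75   # DP-16QAM line rate incl. G.709/FEC overhead (from the slide)
BITS_PER_SYMBOL = 4 * 2   # 16QAM carries 4 bits/symbol, on 2 polarizations


def overhead_fraction(line_rate_gbps, payload_gbps):
    """Fraction of extra line-side capacity relative to the client payload."""
    return line_rate_gbps / payload_gbps - 1.0


def symbol_rate_gbaud(line_rate_gbps, bits_per_symbol):
    """Symbol rate needed to carry the line rate at the given bits/symbol."""
    return line_rate_gbps / bits_per_symbol


print(f"overhead: {overhead_fraction(LINE_RATE_GBPS, PAYLOAD_GBPS):.1%}")
print(f"symbol rate: {symbol_rate_gbaud(LINE_RATE_GBPS, BITS_PER_SYMBOL):.2f} Gbaud")
# prints: overhead: 37.9%
# prints: symbol rate: 34.47 Gbaud
```

The overhead here is simply everything above the nominal client payload; the slide does not break it down between framing and FEC.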
NERSC
NERSC: Physical topology
[Figure: physical topology between CRT and OSF]
NERSC: Synchronizing file systems
- Synchronized file systems between sites while keeping jobs running on the supercomputers.
- In total, ~10 PB of file system data.
- Used GPFS restripe, keeping both sites live.
- Achieved a sustained rate of ~250 Gbps over the link, limited by the number of sinks/sources we could allocate to the transfer.
- We did push 400 Gbps during acceptance of the link.
- The data path was: Disk -> 10G Ethernet -> 400G superchannel -> Ethernet/InfiniBand routers -> Disk.
- All the disk at CRT was IB connected.
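To put the sustained rate in perspective, the sketch below estimates the ideal wire time for ~10 PB and the minimum number of 10G sources needed to reach a target rate. It assumes decimal petabytes and perfectly saturated links, so the results are lower bounds rather than the time the migration actually took. Function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400


def transfer_days(data_petabytes, rate_gbps):
    """Ideal transfer time in days, assuming decimal PB and a constant rate."""
    bits = data_petabytes * 1e15 * 8
    seconds = bits / (rate_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


def min_streams(target_gbps, nic_gbps=10.0):
    """Minimum number of fully loaded 10G sources needed to reach the target."""
    return math.ceil(target_gbps / nic_gbps)


print(f"10 PB at 250 Gbps: {transfer_days(10, 250):.1f} days")
print(f"10 PB at 400 Gbps: {transfer_days(10, 400):.1f} days")
print(f"10G sources for 250 Gbps: {min_streams(250)}")
print(f"10G sources for 400 Gbps: {min_streams(400)}")
```

The stream count illustrates why the number of sinks and sources, rather than the link, set the sustained rate: filling 400 Gbps takes at least 40 fully loaded 10G endpoints on each side.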
NERSC: File system transfers
[Figure: file system transfers, OSF to CRT]
NERSC: WAN
Key component: 200G 16QAM transponder
400G: Production (Sept 15)
- 400G service termination point (4 x 100G Ethernet client)
- LBNL Ciena node, Berkeley, CA
Test Bed
Testbed: Sept 15
- Demonstrated a 400G super-channel in the lab at LBNL with 37.5 GHz spacing over 80 km of spooled fiber, for better utilization of the spectral bandwidth.
- Used Raman amplification with integrated OTDR.
- Validating next-gen ROADM technology: flexible (gridless), colorless mux/demux.
- Level 3's acquisition of TW Telecom last year caused some delay in bringing up the dark fiber for this project.
- Goal: characterize the next-gen ROADM architecture in the real world and gain operational experience.
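The gain from tighter carrier spacing can be estimated with a small comparison. This sketch assumes the lab super-channel also uses two 200G carriers (the slide gives only the 400G total and the 37.5 GHz spacing) and approximates occupied spectrum as carriers times spacing, ignoring guard bands. Names are illustrative.

```python
def occupied_ghz(carriers, spacing_ghz):
    """Approximate occupied spectrum as carriers x spacing (no guard bands)."""
    return carriers * spacing_ghz


CARRIERS = 2          # assumption: two 200G carriers, as in production
TOTAL_GBPS = 400.0

fixed = occupied_ghz(CARRIERS, 50.0)   # production: 50 GHz fixed grid
flex = occupied_ghz(CARRIERS, 37.5)    # testbed: 37.5 GHz flexible spacing
saving = 1.0 - flex / fixed

print(f"fixed grid: {fixed:.0f} GHz, {TOTAL_GBPS / fixed:.2f} bit/s/Hz")
print(f"flex grid:  {flex:.0f} GHz, {TOTAL_GBPS / flex:.2f} bit/s/Hz")
print(f"spectrum saved: {saving:.0%}")
# prints: fixed grid: 100 GHz, 4.00 bit/s/Hz
# prints: flex grid:  75 GHz, 5.33 bit/s/Hz
# prints: spectrum saved: 25%
```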
Testbed: 400G, May 2016
Testbed: Industry Partner
- Provided hands-on technical assistance
- Loaned four 40 km single-mode fiber (SMF) spools
- Donated equipment: two colorless mux/demux, two Raman amplifiers, two switchable line amplifiers
[Photo labels: four 40 km SMF-28 fiber spools; 1 colorless mux/demux; 1 Raman amp; 1 switchable line amp]
Final Thoughts
Summary: Project Timeline
2013 Dec   White paper on Moving ESnet Beyond 100G.
2014 Feb   Draft proposal.
2014 May   FWP submitted to PAMS. Ciena presents SC13 400G superchannel at TNC2014 [4].
2014 Sept  Receive FY14 guidance.
2015 Jan   CR ends. Receive FY15 guidance, project kick-off. Ciena and Brocade equipment procurement.
2015 Feb   ALU equipment procurement. Level 3 and ESnet complete Ciena code upgrade.
2015 Mar   Level 3 splicing procurement.
2015 May   400G testbed running in lab, super-channel PoC with Ciena spools (80 km). 2nd Ciena equipment procurement.
2015 Jul   400G link across BayExpress (11.5 km) put into production, ready for upcoming NERSC relocation to Berkeley.
2015 Nov   Press releases (right before SC15), Shyh Wang Hall building dedication.
2015 Dec   NERSC relocates to Berkeley facility. Level 3 splices and delivers dark fiber.
2016 May   Field trial: 400G super-channel across 93.3 km (dark fiber plus spools).
Summary: Filesystem Transfers
Summary: Final Thoughts
- Took almost two years to deploy.
- Worked almost flawlessly for the 11 km length.
- Doesn't work around the entire 450 km ring (required OSNR too high).
- Took less than a month to move all the data from OSF to CRT.
- No apparent downtime for the users.
- Took about an hour per file system to remount after a final sync.
- Still in production today as NERSC moves out of OSF.
Thank you!
Contact Info:
PI: Chris Tracy, ctracy@es.net
Co-PI: Jason Lee, jason@lbl.gov
National Energy Research Scientific Computing Center
NERSC: WAN Topology
WAN: Topology (cont)
NERSC: WAN Fiber
Fiber provided by:
- Loaned 11.5 km dark fiber BCXN6956 between Oakland and Berkeley
- Supported Ciena code upgrade to support new hardware from this project
ESnet contributed funds for fiber splicing work.
[Map: fiber path is approximate]