Programmable Information Highway (with no Traffic Jams) Inder Monga Energy Sciences Network Scientific Networking Division Lawrence Berkeley National Lab
Exponential Growth
ESnet Accepted Traffic: Jan 1990 - Aug 2012 (Log Scale)
[Figure: log-scale plot of monthly accepted traffic in petabytes; axis ticks omitted. Sources: Wikipedia, NERSC]
70% CAGR growth: from 10 PB/month toward 100 PB/month in 2015
Is there an impedance mismatch?
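For scale, a back-of-the-envelope sketch of what 70% compound annual growth implies (the starting volume is a placeholder for illustration, not a figure from the chart):

```python
import math

# Back-of-the-envelope: what 70% compound annual growth implies.
# The starting volume is a placeholder, not a figure from the slide.
start_pb_per_month = 1.0
cagr = 0.70

for year in range(0, 11, 2):
    volume = start_pb_per_month * (1 + cagr) ** year
    print(f"year {year:2d}: {volume:8.1f} PB/month")

# Doubling time at 70% CAGR: log(2) / log(1.7) ~ 1.3 years
doubling_months = 12 * math.log(2) / math.log(1 + cagr)
print(f"traffic doubles roughly every {doubling_months:.0f} months")
```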
HEP as a Prototype for Data-Intensive Science
Data courtesy of Harvey Newman (Caltech) and Richard Mount (SLAC), and the Belle II CHEP 2012 presentation
Moving data is going to be a way of life
Network: an integral part of the application workflow
[Map: ESnet sites across the U.S., including SEAT, PNNL, LBNL, JGI, SUNN, SNLL, LLNL, LOSA, SDSC, LASV, ALBU, LANL, SNLA, Salt Lake, ANL, AMES, STAR, EQCH, CLEV, PPPL, GFDL, PU Physics, JLAB, and BNL]
A Network-Centric View of LHC

CERN to Tier 1 distances (source: Bill Johnston):

    Tier 1 site        miles     km
    France               350     565
    Italy                570     920
    UK                   625    1000
    Netherlands          625    1000
    Germany              700    1185
    Spain                850    1400
    Nordic              1300    2100
    USA - New York      3900    6300
    USA - Chicago       4400    7100
    Canada - BC         5200    8400
    Taiwan              6100    9850

The LHC Open Network Environment (LHCONE) and the LHC Optical Private Network (LHCOPN) tie the tiers together:
detector (1 PB/s) -> Level 1 and 2 triggers [O(1-10) meters] -> Level 3 trigger [O(10-100) meters] -> CERN Computer Center, LHC Tier 0 [O(1) km]: deep archive, sending ~50 Gb/s (25 Gb/s ATLAS, 25 Gb/s CMS) over 500-10,000 km to the LHC Tier 1 Data Centers -> LHC Tier 2 Analysis Centers
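Those distances set a hard floor on round-trip time; a quick sketch of the minimum RTT they imply (assuming the standard ~200,000 km/s speed of light in fiber, which is not stated on the slide):

```python
# Minimum round-trip time implied by fiber distance alone
# (no queuing, no routing detours). c in fiber ~ 200,000 km/s.
C_FIBER_KM_PER_S = 200_000

def min_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds."""
    return 2 * distance_km / C_FIBER_KM_PER_S * 1000

# Distances from the CERN-to-Tier-1 table above
for name, km in [("France", 565), ("Germany", 1185),
                 ("USA - New York", 6300), ("Taiwan", 9850)]:
    print(f"{name:16s} {km:5d} km  ->  >= {min_rtt_ms(km):5.1f} ms RTT")
```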
A Network-Centric View of the SKA (source: Bill Johnston)
Receptors/sensors (~200 km average separation) -> ~15,000 Tb/s aggregate -> correlator / data processor (~1000 km away) -> 400 Tb/s aggregate out of the SKA
~25,000 km (Perth to London via USA) or ~13,000 km (South Africa to London) to a European distribution point
Hypothetical tiered distribution (based on the LHC experience): national tier 1 centers (supercomputers) at 0.1 Tb/s (100 Gb/s) aggregate, 1 fiber data path per tier 1 data center at 0.03 Tb/s (30 Gb/s) each, feeding national astronomy communities
New thinking changes the language of interaction

Infrastructure:
- Provides best-effort IP dialtone
- Average end-to-end performance; packet loss is fine
- How much bandwidth do you need? (1G/10G/100G)
- Ping works, you are all set, go away

Instrument:
- Adapts to the requirements of the experiment, the science, and the end-to-end flow
- Highly calibrated, zero packet loss end-to-end
- What's your sustained end-to-end performance in bits/sec? Can I get the same performance anytime?
- Tuned to meet the application's workflow needs, across network domains
Adjectives for the Network as an Instrument (NaaI): End-to-End, Programmable, Simple, Predictable
Science DMZ: remove roadblocks to end-to-end performance
The Science DMZ is a well-configured location for high-performance, WAN-facing science services:
- Located at or near the site perimeter on dedicated infrastructure
- Dedicated, high-performance data movers
- Highly capable network devices (wire-speed, deep queues)
- Virtual circuit connectivity
- Security policy and enforcement specific to science workflows
- perfSONAR measurement
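One concrete reading of "security policy specific to science workflows", sketched below: instead of a general-purpose firewall in the data path, the Science DMZ applies a narrow allowlist to its data transfer nodes (DTNs). The addresses and port range here are hypothetical placeholders:

```python
# Sketch: a narrow, science-specific allowlist for Science DMZ
# data transfer nodes, instead of a general-purpose firewall in the
# data path. All addresses and ports are hypothetical placeholders.
from ipaddress import ip_address, ip_network

# Collaborating sites' DTN subnets (placeholders)
ALLOWED_PEERS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]
DTN_DATA_PORTS = range(50000, 51000)   # e.g., a GridFTP-style data-channel range

def permit(src_ip: str, dst_port: int) -> bool:
    """Is this flow part of a sanctioned science workflow?"""
    src = ip_address(src_ip)
    return any(src in net for net in ALLOWED_PEERS) and dst_port in DTN_DATA_PORTS

print(permit("192.0.2.17", 50042))   # True: known peer, data port
print(permit("203.0.113.9", 50042))  # False: unknown source
```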
Programmable networks (1): Service Interfaces
Intelligent network services: reserved and scheduled bandwidth, exposed through a standard service interface: NSI (OGF)
Slide courtesy: ARCHSTONE project
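As a sketch of how an application might talk to such a service interface: NSI's connection service uses a reserve -> reserveCommit -> provision message sequence, which the hypothetical client below only models (this is not the actual OGF NSI bindings, and all parameters are illustrative):

```python
# Illustrative model of an NSI-style reservation lifecycle:
# reserve -> reserveCommit -> provision. Mimics the message
# sequence only; not the real OGF NSI protocol bindings.
from dataclasses import dataclass

@dataclass
class CircuitRequest:
    src_stp: str          # source service termination point
    dst_stp: str          # destination service termination point
    bandwidth_mbps: int
    start: str            # ISO 8601 start of the reserved window
    end: str              # ISO 8601 end of the reserved window

class NsiStyleClient:
    def __init__(self):
        self.state = "INITIAL"

    def reserve(self, req: CircuitRequest) -> str:
        # Network checks the path and holds resources for the window.
        self.state = "RESERVED"
        return "connection-id-1"   # placeholder identifier

    def reserve_commit(self, connection_id: str):
        self.state = "COMMITTED"

    def provision(self, connection_id: str):
        # Circuit activates at (or after) the scheduled start time.
        self.state = "PROVISIONED"

client = NsiStyleClient()
cid = client.reserve(CircuitRequest("urn:ogf:network:site-a",
                                    "urn:ogf:network:site-b",
                                    10_000, "2013-01-01T00:00Z", "2013-01-02T00:00Z"))
client.reserve_commit(cid)
client.provision(cid)
print(client.state)
```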
Programmable networks (2): Software-Defined Networking (SDN)
Flexible, programmable, separation of flows
Demonstrated at Joint Techs 2011
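A toy model of what programmable "separation of flows" means in practice: a controller installs match -> action rules and the data plane forwards by table lookup. This sketches the OpenFlow idea only; real OpenFlow matches many more header fields:

```python
# Toy OpenFlow-style flow table: the controller programs
# (match -> action) rules; the data plane just looks them up.
flow_table = []   # ordered list of (match_dict, action) rules

def install_flow(match: dict, action: str):
    """Controller-side: push a rule into the switch's table."""
    flow_table.append((match, action))

def forward(packet: dict) -> str:
    """Switch-side: first matching rule wins; default-drop otherwise."""
    for match, action in flow_table:
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "drop"

# Separate a science flow from bulk traffic onto a different path
install_flow({"dst_ip": "10.0.0.2", "dst_port": 50001}, "out:port2")  # science flow
install_flow({"dst_ip": "10.0.0.2"}, "out:port1")                     # everything else

print(forward({"dst_ip": "10.0.0.2", "dst_port": 50001}))  # out:port2
print(forward({"dst_ip": "10.0.0.2", "dst_port": 80}))     # out:port1
```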
Programmable networks (3): New protocols for high-speed data transfer
Eric Pouyoul, Inder Monga, Brian Tierney (ESnet), Martin Swany (Indiana) & Ezra Kissel (U. of Delaware)

Bridging end-site dynamic flows with WAN dynamic tunnels:
- Zero-configuration virtual circuit from end-host to end-host
- Automated discovery of circuit end-points
- OpenFlow @ SC11 (Seattle) + OSCARS/ESnet4 + OpenFlow @ BNL: a fully automated, end-to-end, dynamically stitched virtual connection

Cross-country RDMA-over-Ethernet high-performance data transfers (SC11 demonstration):
- 9.8 Gbps on a 10 Gbps, 78 ms RTT link between Brookhaven in NY and Seattle, WA
- <4% CPU load, versus 80% utilization for 1-stream TCP
- CPU core utilization: 70-90% for 1-stream TCP vs. 3-4% for RDMA
- No special host hardware other than a NIC with RoCE support
[Figure: throughput (Mb/s) vs. time (s), RDMA vs. TCP]
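Turning the reported figures into a rough CPU-cost-per-Gbps comparison (a sketch; using the midpoints of the reported ranges is my assumption):

```python
# Rough CPU-cost-per-Gbps comparison using the figures reported above.
# Taking midpoints of the reported ranges is an assumption.
gbps = 9.8                      # achieved throughput in the demo
tcp_cpu = (0.70 + 0.90) / 2     # 1-stream TCP: 70-90% of a core
rdma_cpu = (0.03 + 0.04) / 2    # RDMA: 3-4% of a core

print(f"TCP : {tcp_cpu / gbps * 100:.1f}% of a core per Gbps")
print(f"RDMA: {rdma_cpu / gbps * 100:.2f}% of a core per Gbps")
print(f"RDMA is ~{tcp_cpu / rdma_cpu:.0f}x more CPU-efficient here")
```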
Simple abstractions
The wide area network presented as a single virtual switch, with SDN controllers programming flows across it.
How can storage leverage this simple network abstraction?
SRS demo with Ciena, in Ciena's booth #2437
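One way a storage workflow might drive that abstraction, sketched below: the WAN appears as one virtual switch, so a replication job only has to ask a controller to connect two ports with a bandwidth guarantee. The controller API and all names here are hypothetical, not the demo's actual interface:

```python
# Sketch: a storage replication job driving the WAN-as-one-virtual-switch
# abstraction. The controller API and all names are hypothetical.

class VirtualSwitchController:
    """The entire WAN presented as a single switch with site-facing ports."""
    def connect(self, in_port: str, out_port: str, gbps: int) -> str:
        # A real SDN controller would expand this into per-hop flow
        # rules across the WAN; here we just acknowledge the request.
        print(f"flow: {in_port} -> {out_port} @ {gbps} Gbps")
        return f"flow:{in_port}->{out_port}"

def replicate(dataset: str, src_port: str, dst_port: str,
              ctrl: VirtualSwitchController):
    """Storage-side workflow: program the path, then move the data."""
    flow_id = ctrl.connect(src_port, dst_port, gbps=10)
    print(f"transferring {dataset} over {flow_id}")
    # ... a data mover (e.g., a DTN-to-DTN copy) would run here ...

replicate("snapshot-0042", "port:site-A", "port:site-B",
          VirtualSwitchController())
```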
Conclusion
- Moving data fast(er) is a 21st-century reality: distributed science collaborations, large instruments, cloud computing
- The network is not an infrastructure but an instrument: think different; do not set your expectations of the network as just a traffic highway
- Simple, programmable network abstractions with a service interface: how will storage workflows leverage that?
Inder Monga imonga@es.net www.es.net, fasterdata.es.net Thank You!