The Science DMZ: Evolution. Eli Dart, ESnet. CC-NIE PI Meeting, Washington, DC, May 1, 2014
Why Are We Doing This? It's good to build high-quality infrastructure. As network engineers, we like building networks, and anything worth doing is worth doing well. But really, what's a network worth? Networks have very little intrinsic value; the value of a network is in what you can do with it. In order to be valuable, a network must be useful. Who uses the Science DMZ? 5/5/14 2
One Experimental Data Flow Triples Network Utilization for Major HPC Center
Our Past: Network as Infrastructure [Figure: map of ESnet sites and hubs, e.g., PNNL, LBNL, JGI, LLNL, ANL, PPPL, JLAB, BNL, SDSC, LANL, SNLA. Geography is only representational.] U.S. Department of Energy, Office of Science
Three Historical Inflection Points for Global Research Networks: 1. Abundant capacity (88 λ x 100 Gbps). 2. Programmability. 3. Campus architectures newly optimized for data mobility (Science DMZ + NSF grants).
Our Future: Network as Instrument [Figure: the same map of ESnet sites and hubs. Geography is only representational.] "Networks are at the heart of the telescope." - Roshene McCool, Signal Transport Engineer for SKA, at NORDUNET 2012. U.S. Department of Energy, Office of Science
Network-Centric View of the Large Hadron Collider (at CERN)
Q1: Where does discovery occur? Q2: Where does the instrument end?
[Figure: LHC data path. Detector (1 PB/s) -> Level 1 and 2 triggers, O(1-10) meters -> Level 3 trigger, O(10-100) meters -> CERN Computer Center, O(1) km, ~50 Gb/s (25 Gb/s ATLAS, 25 Gb/s CMS) -> LHC Tier 0 (deep archive; sends data to Tier 1 centers) -> LHC Optical Private Network (LHCOPN), 500-10,000 km -> LHC Tier 1 data centers -> LHC Open Network Environment (LHCONE) -> LHC Tier 2 analysis centers.]
Distance from CERN to Tier 1 centers (source: Bill Johnston):
Tier 1 location     miles      km
France                350     565
Italy                 570     920
UK                    625   1,000
Netherlands           625   1,000
Germany               700   1,185
Spain                 850   1,400
Nordic              1,300   2,100
USA - New York      3,900   6,300
USA - Chicago       4,400   7,100
Canada - BC         5,200   8,400
Taiwan              6,100   9,850
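The distances in the table matter because they put a floor under round-trip time, and TCP must keep a full bandwidth-delay product of data in flight to run at line rate over a long path. The sketch below turns the table's kilometer figures into rough lower bounds. It assumes light travels through fiber at about 200,000 km/s (roughly 5 µs per km one way), treats each listed distance as the fiber path length (real routes are longer, so real RTTs are higher), and uses a hypothetical 10 Gb/s flow.

```python
# Light in optical fiber travels at roughly 2/3 of c, about 200,000 km/s,
# i.e. ~5 microseconds of one-way delay per km. This is an approximation.
FIBER_KM_PER_SECOND = 200_000.0

# CERN-to-Tier-1 distances (km) from the table above.
DISTANCES_KM = {
    "France": 565,
    "Italy": 920,
    "UK": 1000,
    "Netherlands": 1000,
    "Germany": 1185,
    "Spain": 1400,
    "Nordic": 2100,
    "USA - New York": 6300,
    "USA - Chicago": 7100,
    "Canada - BC": 8400,
    "Taiwan": 9850,
}


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time in ms, assuming the fiber path is exactly
    distance_km long and there is no queuing or processing delay."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


def bdp_megabytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in megabytes: the data TCP must keep in flight
    to fill a path of the given rate and round-trip time."""
    bits_in_flight = rate_gbps * 1e9 * (rtt_ms / 1000)
    return bits_in_flight / 8 / 1e6


if __name__ == "__main__":
    for site, km in DISTANCES_KM.items():
        rtt = min_rtt_ms(km)
        print(f"{site:15s} >= {rtt:6.1f} ms RTT, "
              f"{bdp_megabytes(10, rtt):6.1f} MB in flight at 10 Gb/s")
```

Even the nearest Tier 1 site needs several megabytes in flight at 10 Gb/s, well beyond many default TCP buffer settings. This is one reason host tuning appears among the links at the end.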
Evolution of the LHC Data Model. In chronological order: 1. Copy as much data as feasible to analysis centers worldwide, with hierarchical distribution. 2. Relax the hierarchy and rely on caching. 3. Use federated data stores to fetch portions of relevant data sets from remote storage (anywhere), just before they're needed. Each step reflects increasing faith in global science networks.
How Do We Build the Network Instrument? First, it all has to work: build it well, keep it clean, run perfSONAR, and take action based on the data. Next, people need to know it's there: engage with users, experiments, and programs; find out who is doing what, and what they would like to do. After that, scientists need to see value. Can they do their work better? Are there things they could not do without the network? They probably need help, maybe a lot of help. Helping is good: then we succeed together.
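"Run perfSONAR, take action based on the data" implies some rule for deciding when a measurement warrants action. The sketch below shows one minimal, hypothetical rule: flag a path when the latest throughput test falls below half the median of earlier tests. The function name, thresholds, and sample numbers are illustrative assumptions. The function does not talk to perfSONAR; in practice the history would come from perfSONAR's stored test results.

```python
from statistics import median


def throughput_regressed(history_gbps, latest_gbps, fraction=0.5, min_samples=5):
    """Return True if the latest throughput test is below `fraction` of the
    median of earlier results. `fraction` and `min_samples` are arbitrary
    illustrative defaults, not perfSONAR settings."""
    if len(history_gbps) < min_samples:
        return False  # not enough baseline data to judge
    return latest_gbps < fraction * median(history_gbps)


# Hypothetical results (Gb/s) from recurring tests on one path.
history = [9.4, 9.1, 9.3, 9.5, 9.2]
if throughput_regressed(history, 2.1):
    print("Throughput dropped sharply: open a ticket and start troubleshooting")
```

A median is used instead of a mean so that one earlier bad test does not drag the baseline down and mask a real regression.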
Example: Map Out a Security Policy
What does the DMZ resource need to do?
- Single workflow involving a single remote resource: data ingest/export involving a single remote system. Identify the remote system, understand the tools, write the filter.
- Local resource dedicated to a collaboration: Where are the other parts of the collaboration? Does the collaboration use specific tools (e.g., a workflow engine)?
- Global data service: The data service probably uses standard tools, and its ports are open to the entire Internet. How tightly does it need to be filtered? Do a realistic risk assessment.
Don't forget the auxiliary services! DNS, NTP, SSH, OAuth, patch servers, outbound mail for status, etc. These can typically be more tightly controlled, since they are typically local services.
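The questions above amount to choosing, for each kind of DMZ resource, who may reach its data ports. The following minimal Python sketch models that decision. The `UseCase` names, the `data_port_sources` function, and the example addresses (from the RFC 5737 documentation ranges) are hypothetical. A real policy would live in router ACLs or host firewall configuration.

```python
from enum import Enum


class UseCase(Enum):
    """Hypothetical labels for the three DMZ resource types described above."""
    SINGLE_WORKFLOW = "single workflow, single remote resource"
    COLLABORATION = "local resource dedicated to a collaboration"
    GLOBAL_SERVICE = "global data service"


def data_port_sources(use_case, remote_hosts=(), collaborator_nets=()):
    """Return the source addresses/networks allowed to reach the data ports."""
    if use_case is UseCase.SINGLE_WORKFLOW:
        return list(remote_hosts)        # identify the remote system(s)
    if use_case is UseCase.COLLABORATION:
        return list(collaborator_nets)   # where are the other collaborators?
    return ["0.0.0.0/0", "::/0"]         # open to the entire Internet


print(data_port_sources(UseCase.SINGLE_WORKFLOW, remote_hosts=["198.51.100.20"]))
```

The narrower the use case, the shorter the allow list. The global data service gets the widest data-port exposure, which is why it warrants the risk assessment.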
Example: Globus DTN, Single Workflow [Diagram: Lab1 and Lab2 each have a Science DMZ containing a DTN behind DTN security filters, connected through each lab's border router to ESnet routers (10GE and 100GE links). DTN-to-DTN data traffic uses TCP ports 50000-51000; control traffic between each DTN and the orchestration service in Amazon AWS uses TCP ports 443, 2811, and 7512. Legend: logical data path, logical control path, physical data path, physical control path.]
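The single-workflow filter in the diagram can be modeled as a default-deny list of permit rules. Data ports are open only to the one remote DTN, and control ports are open only to the orchestration service. The sketch below evaluates such a list. The addresses are placeholders from the RFC 5737 documentation ranges, not real Globus or lab addresses, and the `Rule` and `permitted` names are illustrative rather than any router vendor's ACL syntax.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Permit TCP from `source` (CIDR) to destination ports port_lo..port_hi."""
    source: str
    port_lo: int
    port_hi: int

    def matches(self, src_ip: str, dst_port: int) -> bool:
        in_net = ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source)
        return in_net and self.port_lo <= dst_port <= self.port_hi


# Hypothetical placeholder addresses (RFC 5737 documentation ranges).
LAB2_DTN = "198.51.100.20/32"
ORCHESTRATION = "203.0.113.0/24"  # stand-in for the orchestration service

LAB1_DTN_FILTER = [
    Rule(LAB2_DTN, 50000, 51000),                             # data: only Lab2 DTN
    *(Rule(ORCHESTRATION, p, p) for p in (443, 2811, 7512)),  # control ports
]


def permitted(rules, src_ip: str, dst_port: int) -> bool:
    """Default deny: traffic passes only if some permit rule matches."""
    return any(rule.matches(src_ip, dst_port) for rule in rules)


print(permitted(LAB1_DTN_FILTER, "198.51.100.20", 50500))  # Lab2 DTN, data port
print(permitted(LAB1_DTN_FILTER, "192.0.2.99", 50500))     # unknown host
```

The global data service case differs mainly in widening the data-port rule's source to any address, while the control-port rules stay narrow.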
Example: Globus DTN, Global Data Service [Diagram: a local DTN in the site/campus Science DMZ, behind DTN security filters, exchanges data on TCP ports 50000-51000 with remote DTNs anywhere in the world through the site/campus border router; control traffic to the orchestration service in Amazon AWS uses TCP ports 443, 2811, and 7512 (10GE and 100GE links). Legend: logical data path, physical data path, logical control path, physical control path.]
Example: Requirements Analysis. Often scientists know they need to do something, but don't know how to integrate the pieces. Working together, networking people can help guide them toward a solution. Don't ask what network pieces they need; ask what they are trying to do, then derive the requirements from the science. ESnet has a formal requirements analysis process that incorporates these ideas (see Lauren's talk tomorrow).
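Deriving requirements from the science often starts with simple arithmetic: how much data must move, and how quickly. The sketch below converts a dataset size and a time window into a sustained rate. The function name, the 80% end-to-end efficiency factor, and the example numbers are assumptions for illustration, not part of ESnet's requirements process.

```python
def required_gbps(dataset_tb: float, window_hours: float,
                  efficiency: float = 0.8) -> float:
    """Sustained network rate (Gb/s) needed to move `dataset_tb` terabytes
    (decimal, 1 TB = 1e12 bytes) within `window_hours`, padded by an assumed
    end-to-end efficiency (protocol overhead, contention, restarts)."""
    bits = dataset_tb * 1e12 * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e9 / efficiency


# Hypothetical science requirement: move each 36 TB run within 8 hours.
print(f"{required_gbps(36, 8):.1f} Gb/s sustained")
```

Here the hypothetical 36 TB in 8 hours works out to exactly 10 Gb/s of payload (`required_gbps(36, 8, efficiency=1.0)`), and 12.5 Gb/s once the assumed 0.8 efficiency is applied. The result is a network requirement derived from the science, not from a guess about link sizes.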
Example: Integration of Instruments
Many scientific instruments come with embedded or attached computing systems: sequencers, electron microscopes, mass spectrometers. Typically, there are no user-serviceable parts inside. So, how does this thing get integrated into a workflow?
- Have it mount the DTN over a back-to-back 10G link
- Double-copy the files
- Better yet, start working with the vendor on a better way
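One reading of "double-copy the files" is that the instrument writes to its own local disk and the files are then copied onto the DTN-mounted filesystem before transfer. The following minimal sketch shows that second copy with checksum verification and an atomic rename, so DTN tools never pick up a half-written file. The directory layout and function names are hypothetical. The sketch also assumes the instrument has finished writing every file in the source directory; detecting completion is instrument-specific.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_to_dtn(instrument_dir: Path, dtn_mount: Path) -> list[Path]:
    """Copy completed instrument files onto the DTN-mounted filesystem.
    Skips files already present with a matching checksum; returns new copies."""
    copied = []
    for src in sorted(instrument_dir.iterdir()):
        if not src.is_file():
            continue
        dst = dtn_mount / src.name
        if dst.exists() and sha256sum(dst) == sha256sum(src):
            continue  # already copied on an earlier pass
        tmp = dst.with_name(dst.name + ".partial")
        shutil.copy2(src, tmp)
        if sha256sum(tmp) != sha256sum(src):
            tmp.unlink()
            raise OSError(f"checksum mismatch copying {src}")
        tmp.replace(dst)  # atomic rename: readers see the whole file or nothing
        copied.append(dst)
    return copied
```

Because already-copied files are skipped, the function can be run repeatedly (for example, from a periodic job) without re-sending data.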
The Infrastructure View Is Not Enough
It's necessary, but not sufficient. The infrastructure-only view regards networks as "other": static and opaque. The substrate must be solid, flexible, and robust, but that's not the whole story. The differentiating value of a network, or a Science DMZ, comes not just from infrastructure, but also from:
- The people who run it
- The services it provides, customized for science
- The audacity to try to accomplish new things
The Instrument View Inspires Us
[Diagram labels: Innovative Capabilities Tailored for Science; Partnerships and Outreach; Operational Substrate that Scales Quickly, Cheaply, Flexibly]
What can a network instrument do?
- Enable new discovery processes and workflows
- Offer APIs for discovery, inspection, virtualization, and control, for applications, middleware, and other networks
- Decouple data acquisition, data storage, and computation (making geography irrelevant)
ESnet Vision: discovery unconstrained by geography. [Map: the ESnet network, with hubs including Seattle, Sunnyvale, Sacramento, Boise, Denver, Kansas City, Chicago, Nashville, Atlanta, Albuquerque, El Paso, Houston, Washington DC, New York, and Boston; DOE sites including PNNL, LBNL, SLAC, AMES, FNAL, ANL, BNL, PPPL, JLAB, and ORNL; and connections to US R&E networks (DREN, Internet2, NLR, NASA, NISN, USDOI), Canada (CANARIE), Europe (GÉANT, NORDUNET), France (OpenTransit), CERN (USLHCNet), LHCONE, Russia and China (GLORIAD), Australia (AARnet), Latin America (AMPATH, CLARA, CUDI), and Asia-Pacific networks (ASGC, ASCC, BNP, HEPNET, KAREN, KREONET2, NUS-GP, ODN, REANNZ, SINET, TRANSPAC, TWAREN).]
Links
- ESnet fasterdata knowledge base: http://fasterdata.es.net/
- Science DMZ paper: http://www.es.net/assets/pubs_presos/sc13scidmz-final.pdf
- Science DMZ email list: https://gab.es.net/mailman/listinfo/sciencedmz
- perfSONAR: http://fasterdata.es.net/performance-testing/perfsonar/ and http://www.perfsonar.net/
- Additional material: http://fasterdata.es.net/science-dmz/ and http://fasterdata.es.net/host-tuning/
Thanks! Questions? Eli Dart dart@es.net http://www.es.net/ http://fasterdata.es.net/