Network Architecture and Services to Support Large-Scale Science: An ESnet Perspective


Joint Techs, January 2008
William E. Johnston, ESnet Department Head and Senior Scientist
Energy Sciences Network, Lawrence Berkeley National Laboratory
wej@es.net, www.es.net
This talk is available at www.es.net/esnet4
Networking for the Future of Science

DOE's Office of Science: Enabling Large-Scale Science
The Office of Science (SC) is the single largest supporter of basic research in the physical sciences in the United States, providing more than 40 percent of total funding for the Nation's research programs in high-energy physics, nuclear physics, and fusion energy sciences (http://www.science.doe.gov). SC funds 25,000 PhDs and PostDocs.
A primary mission of SC's National Labs is to build and operate very large scientific instruments - particle accelerators, synchrotron light sources, very large supercomputers - that generate massive amounts of data and involve very large, distributed collaborations. Distributed data analysis and simulation is the emerging approach for these complex problems.
ESnet is an SC program whose primary mission is to enable the large-scale science of the Office of Science, which depends on:
- sharing of massive amounts of data
- supporting thousands of collaborators world-wide
- distributed data processing
- distributed data management
- distributed simulation, visualization, and computational steering
- collaboration with the US and international research and education community

A Systems of Systems Approach for Distributed Simulation
A complete approach to climate modeling involves many interacting models and data that are provided by different groups at different locations - closely coordinated and interdependent distributed systems that must have predictable intercommunication for effective functioning.
[Figure: the terrestrial climate system as interacting component models - climate, biogeophysics, hydrology, biogeochemistry, and ecosystem dynamics - operating on time scales from minutes-to-hours to years-to-centuries. Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.]

Large-Scale Science: High Energy Physics - the Large Hadron Collider (Accelerator) at CERN
LHC goal: detect the Higgs boson. The Higgs boson is a hypothetical massive scalar elementary particle predicted to exist by the Standard Model of particle physics. It is the only Standard Model particle not yet observed, but it plays a key role in explaining the origin of the mass of the other elementary particles, in particular the difference between the massless photon and the very heavy W and Z bosons. Elementary particle masses, and the differences between electromagnetism (mediated by the photon) and the weak force (mediated by the W and Z bosons), are critical to many aspects of the structure of microscopic (and hence macroscopic) matter; thus, if it exists, the Higgs boson has an enormous effect on the world around us.

The Largest Facility: the Large Hadron Collider at CERN
The LHC CMS detector: 15 m x 15 m x 22 m, 12,500 tons, $700M. CMS is one of several major detectors (experiments); the other large detector is ATLAS. Two counter-rotating 7 TeV proton beams, in a ring 27 km in circumference (8.6 km in diameter), collide in the middle of the detectors.

Data Management Model: a refined view of the LHC Data Grid Hierarchy, in which operations of the Tier2 centers and the U.S. Tier1 center are integrated through network connections with typical speeds in the 10 Gbps range [ICFA SCIC] - closely coordinated and interdependent distributed systems that must have predictable intercommunication for effective functioning.

Accumulated data (terabytes) received by CMS data centers (Tier1 sites) and many analysis centers (Tier2 sites) during the past 12 months - 15 petabytes of data [LHC/CMS]. This sets the scale of the LHC distributed data analysis problem.

Service Oriented Architecture Data Management Service
- Data Management Services: version management, workflow management, master copy management
- Reliable Replication Service
- Metadata Services
- Replica Location Service: overlapping hierarchical directories, soft state registration, compressed state updates
- Reliable File Transfer Service: GridFTP, caches, local archival storage
See: Giggle: A Framework for Constructing Scalable Replica Location Services. Chervenak, et al. http://www.globus.org/research/papers/giggle.pdf
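To make the layering concrete, here is a minimal sketch, assuming hypothetical endpoints and helper names (lookup_replicas and gridftp_copy are illustrative stand-ins, not an actual Globus or RLS API), of how a client of such a service stack might resolve a logical file name to physical replicas and then drive a reliable transfer:

```python
# Illustrative sketch only: DEST, lookup_replicas(), and gridftp_copy() are
# hypothetical stand-ins for the Replica Location Service and GridFTP
# components named on the slide, not a real Globus API.

import time

DEST = "gsiftp://storage.site-b.example/data/"   # hypothetical destination endpoint

def lookup_replicas(lfn):
    """Ask the Replica Location Service for the physical replicas of an LFN."""
    # A real client would query an RLS index here; this is a placeholder answer.
    return ["gsiftp://tier1.site-a.example/store/" + lfn]

def gridftp_copy(source_pfn, dest_url):
    """Placeholder for a third-party GridFTP transfer; returns True on success."""
    print(f"copying {source_pfn} -> {dest_url}")
    return True

def reliable_fetch(lfn, retries=3):
    """Reliable file transfer: try each replica, retrying with backoff."""
    for attempt in range(retries):
        for pfn in lookup_replicas(lfn):
            if gridftp_copy(pfn, DEST):
                return pfn
        time.sleep(30 * (attempt + 1))      # back off before the next round
    raise RuntimeError(f"no replica of {lfn} could be transferred")

if __name__ == "__main__":
    reliable_fetch("cms/run2008A/evt-000123.root")
```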

Workflow View of a Distributed Data Management Service
Elements of a Service Oriented Architecture application may interact in complex ways that make a reliable communication service important to the overall functioning of the system.
[Figure: workflow for satisfying a request for a data product γ - the Metadata Catalogue identifies γ; the Materialized Data Catalogue is consulted to see whether γ already exists; if not, the Virtual Data Catalogue describes how to generate γ (and its input β, located at βlfn), the Abstract Planner produces a plan for materializing the data, the Concrete Planner generates the workflow, and a Grid workflow engine runs the data-generation workflow on Grid compute and storage resources, returning the γ data and its LFN to the Data Grid replica services.]
LFN = logical file name; PFN = physical file name; PERS = prescription for generating unmaterialized data.
Adapted from the LHC/CMS Data Grid. CMS Data Grid elements: see "USCMS/GriPhyN/PPDG prototype virtual data grid system - Software development and integration planning for 1,2,3Q2002," V1.0, 1 March 2002, Koen Holtman. NSF GriPhyN, EU DataGrid, and DOE Data Grid Toolkit unified project elements: see "Giggle - A Framework for Constructing Scalable Replica Location Services," presented at SC02 (http://www.globus.org/research/papers/giggle.pdf).
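The control flow the diagram implies can be summarized in a few lines. This is a hypothetical, heavily simplified rendering - the catalogues are mocked as dictionaries, and none of the names correspond to actual CMS or GriPhyN code:

```python
# Hypothetical, simplified rendering of the materialization workflow on the slide.
# All catalogues are mocked with dictionaries; no real Grid services are contacted.

materialized = {"beta": "gsiftp://site-a.example/store/beta"}           # LFN -> PFN
virtual_data = {"gamma": {"inputs": ["beta"], "recipe": "gamma_pers"}}  # how to generate

def run_workflow(recipe, input_pfns):
    """Stand-in for the Grid workflow engine executing the data-generation workflow."""
    print(f"running {recipe} on {input_pfns}")
    return f"gsiftp://site-b.example/store/{recipe}.out"

def get_data(lfn):
    """Return a PFN for lfn, materializing it first if necessary."""
    if lfn in materialized:                       # Materialized Data Catalogue hit
        return materialized[lfn]

    recipe = virtual_data[lfn]                    # Virtual Data Catalogue: how to generate it
    inputs = [get_data(dep) for dep in recipe["inputs"]]  # resolve/materialize inputs (e.g. beta)

    # Abstract planner -> concrete planner -> workflow engine, collapsed to one call here
    pfn = run_workflow(recipe["recipe"], inputs)

    materialized[lfn] = pfn                       # register the new replica
    return pfn

if __name__ == "__main__":
    print("gamma is at:", get_data("gamma"))
```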

Service Oriented Architecture / Systems of Systems
Two types of systems seem likely:
1) systems whose components are themselves standalone elements that are frequently used that way, but that can also be integrated into the types of systems implied by the complex climate modeling example
2) systems whose elements are normally used as parts of an integrated distributed system, but where the elements are distributed because of compute, storage, or data resource availability - this is the case with high energy physics data analysis

The LHC Data Management System has Several Characteristics that Result in Requirements for the Network and its Services
- The systems are data-intensive and high-performance, typically moving terabytes a day for months at a time (see the worked example below)
- The systems are high duty-cycle, operating most of the day for months at a time in order to meet the requirements for data movement
- The systems are widely distributed - typically spread over continental or inter-continental distances
- Such systems depend on network performance and availability, but these characteristics cannot be taken for granted, even in well-run networks, when the multi-domain network path is considered
- The applications must be able to get guarantees from the network that there is adequate bandwidth to accomplish the task at hand
- The applications must be able to get information from the network that allows graceful failure and auto-recovery and adaptation to unexpected network conditions that are short of outright failure
(This slide drawn from [ICFA SCIC].)
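As a rough, illustrative calculation - the volumes below are assumed for the example, not taken from the talk - sustaining about 10 terabytes per day already requires roughly a gigabit per second of continuous throughput, which is why guaranteed bandwidth, rather than best-effort capacity, matters at this scale:

```python
# Back-of-the-envelope sketch: sustained bandwidth needed to move a fixed daily
# volume. The TB/day figures are assumed examples, not numbers from the talk.

TB = 1e12  # bytes

def sustained_gbps(tb_per_day, efficiency=0.8):
    """Average rate in Gbps needed to move tb_per_day, derated for protocol
    overhead and the fact that transfers rarely run at 100% of the path."""
    bits_per_day = tb_per_day * TB * 8
    return bits_per_day / 86400 / 1e9 / efficiency

if __name__ == "__main__":
    for volume in (1, 10, 100):   # TB per day
        print(f"{volume:>4} TB/day -> {sustained_gbps(volume):5.2f} Gbps sustained")
```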

Enabling Large-Scale Science
These requirements hold generally for systems with widely distributed components that must be reliable and consistent in performing the sustained, complex tasks of large-scale science. Networks must provide communication capability that is service-oriented: configurable, schedulable, predictable, reliable, and informative - and the network and its services must be scalable and geographically comprehensive.

Networks Must Provide Communication Capability that is Service-Oriented
- Configurable: must be able to provide multiple, specific paths (specified by the user as end points) with specific characteristics
- Schedulable: premium service such as guaranteed bandwidth will be a scarce resource that is not always freely available, therefore time slots obtained through a resource allocation process must be schedulable
- Predictable: a committed time slot should be provided by a network service that is not brittle - reroute in the face of network failures is important
- Reliable: reroutes should be largely transparent to the user
- Informative: when users do system planning they should be able to see average path characteristics, including capacity; when things do go wrong, the network should report back to the user in ways that are meaningful to the user, so that informed decisions can be made about alternative approaches
- Scalable: the underlying network should be able to manage its resources to provide the appearance of scalability to the user
- Geographically comprehensive: the R&E network community must act in a coordinated fashion to provide this environment end-to-end

The ESnet Approach
Provide configurability, schedulability, predictability, and reliability with a flexible virtual circuit service - OSCARS. The user* specifies end points, bandwidth, and schedule; OSCARS can do fast reroute of the underlying MPLS paths.
Provide useful, comprehensive, and meaningful information on the state of the paths, or potential paths, to the user. perfSONAR, and associated tools, provide real-time information in a form that is useful to the user (via appropriate network abstractions) and that is delivered through standard interfaces that can be incorporated into SOA-type applications. Techniques need to be developed to monitor virtual circuits based on the approaches of the various R&E nets - e.g. MPLS in ESnet, VLANs, TDM/grooming devices (e.g. Ciena Core Directors), etc. - and then integrate this into a perfSONAR framework.
* User = human or system component (process)

The ESnet Approach (continued)
Scalability will be provided by new network services that, for example, provide dynamic wave allocation at the optical layer of the network - currently an R&D project.
Geographic ubiquity of the services can only be accomplished through active collaborations in the global R&E network community, so that all sites of interest to the science community can provide compatible services for forming end-to-end virtual circuits. Active and productive collaborations exist among numerous R&E networks: ESnet, Internet2, CANARIE, DANTE/GÉANT, some European NRENs, some US regionals, etc.

1) Network Architecture Tailored to Circuit-Oriented Services
ESnet4 is a hybrid network: an IP core plus a Layer 2/3 Science Data Network (SDN); OSCARS circuits can span both IP and SDN.
[Figure: planned ESnet 2011 configuration - a national footprint of IP and SDN hubs (Seattle, Portland, Sunnyvale, LA, San Diego, Boise, Salt Lake City, Denver, Albuquerque, El Paso, KC, Tulsa, Houston, Baton Rouge, Chicago, Nashville, Atlanta, Jacksonville, Raleigh, Cleveland, Boston, NYC, Philadelphia, Washington DC) interconnected by multi-lambda core links (up to 5λ), with the ESnet IP core, Science Data Network core, SDN core/NLR links, lab-supplied links, LHC-related links, MAN links, international IP connections, and Layer 1 optical nodes at eventual ESnet points of presence shown.]

High Bandwidth all the Way to the End Sites
Major ESnet sites are now effectively directly on the ESnet core network - e.g. the bandwidth into and out of FNAL is equal to, or greater than, the ESnet core bandwidth.
[Figure: the same core map showing the metro area networks (MANs) that connect end sites - the San Francisco Bay Area MAN (JGI, SLAC, LBNL, NERSC, LLNL, SNLL), the West Chicago MAN (FNAL, ANL, Starlight, 600 W. Chicago, USLHCNet), the Long Island MAN (BNL, 32 AoA NYC, USLHCNet), the Atlanta MAN (ORNL, 180 Peachtree, SOX), and the Washington DC area (MATP, JLab, ELITE, ODU).]

2) Multi-Domain Virtual Circuits
The ESnet OSCARS project [OSCARS] has as its goals:
- Traffic isolation and traffic engineering: provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport, and enables the engineering of explicit paths to meet specific requirements - e.g. to bypass congested links or to use lower-bandwidth, lower-latency paths
- Guaranteed bandwidth (Quality of Service (QoS)): user-specified bandwidth; addresses deadline scheduling - where fixed amounts of data have to reach sites on a fixed schedule so that the processing does not fall so far behind that it could never catch up (very important for experiment data analysis); see the sketch below
- Reduced cost of handling high-bandwidth data flows: highly capable routers are not necessary when every packet goes to the same place - lower-cost (factor of 5x) switches can be used to route the packets
- Secure connections: the circuits are secure to the edges of the network (the site boundary) because they are managed by the control plane of the network, which is isolated from the general traffic
- End-to-end (cross-domain) connections between Labs and collaborating institutions
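To illustrate the deadline-scheduling point, here is a minimal sketch, assuming invented numbers and a made-up request structure (request_circuit is not the real OSCARS interface), of how an application might turn a data volume and a deadline into a guaranteed-bandwidth reservation request:

```python
# Illustrative only: request_circuit() and its fields are hypothetical stand-ins
# for an OSCARS-style reservation interface, not the real API.

from datetime import datetime, timedelta, timezone

def required_mbps(volume_gb, deadline, headroom=1.25):
    """Bandwidth needed to move volume_gb before the deadline, with headroom
    for protocol overhead and retransmission."""
    seconds = (deadline - datetime.now(timezone.utc)).total_seconds()
    if seconds <= 0:
        raise ValueError("deadline already passed")
    return volume_gb * 8e3 / seconds * headroom    # GB -> megabits

def request_circuit(src, dst, mbps, start, end):
    """Placeholder for submitting a guaranteed-bandwidth reservation."""
    return {"src": src, "dst": dst, "bandwidth_mbps": round(mbps),
            "start": start.isoformat(), "end": end.isoformat()}

if __name__ == "__main__":
    deadline = datetime.now(timezone.utc) + timedelta(hours=12)
    mbps = required_mbps(volume_gb=5000, deadline=deadline)   # 5 TB in 12 hours
    print(request_circuit("fnal-example", "cern-example", mbps,
                          datetime.now(timezone.utc), deadline))
```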

OSCARS
[Figure: OSCARS block diagram - a human user submits a request via the Web-Based User Interface (WBUI), or a user application submits a request directly via the Authentication, Authorization, and Auditing Subsystem (AAAS); the Reservation Manager, with the Bandwidth Scheduler Subsystem, handles the reservation, the Path Setup Subsystem issues instructions to routers and switches to set up and tear down MPLS LSPs, and user feedback is returned along the same path.]
To ensure compatibility, the design and implementation are done in collaboration with the other major science R&E networks and end sites:
- Internet2: Bandwidth Reservation for User Work (BRUW); development of a common code base
- GÉANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE), and the Advance Multi-domain Provisioning System (AMPS), which extends to the NRENs
- BNL: TeraPaths - a QoS-enabled collaborative data sharing infrastructure for petascale computing research
- GA: Network Quality of Service for Magnetic Fusion Research
- SLAC: Internet End-to-end Performance Monitoring (IEPM)
- USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science
- DRAGON/HOPI: optical testbed
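A schematic sketch of how a request might flow through the subsystems in the block diagram; the class and method names are invented for illustration and do not correspond to the OSCARS code base:

```python
# Hypothetical decomposition mirroring the slide: AAAS -> Reservation Manager
# (with Bandwidth Scheduler) -> Path Setup Subsystem. Not the real OSCARS code.

class AAAS:
    def authorize(self, user, request):
        return user in {"alice", "dtn-service"}          # placeholder policy check

class BandwidthScheduler:
    def fits(self, request, committed=0, link_capacity_mbps=10000):
        return committed + request["bandwidth_mbps"] <= link_capacity_mbps

class PathSetup:
    def provision(self, request):
        # In ESnet this step would signal an MPLS LSP along the selected path.
        print(f"setting up LSP {request['src']} -> {request['dst']} "
              f"at {request['bandwidth_mbps']} Mbps")

class ReservationManager:
    def __init__(self):
        self.aaas, self.sched, self.paths = AAAS(), BandwidthScheduler(), PathSetup()

    def handle(self, user, request):
        if not self.aaas.authorize(user, request):
            return "denied"
        if not self.sched.fits(request):
            return "rejected: insufficient bandwidth in the requested window"
        self.paths.provision(request)
        return "reserved"

if __name__ == "__main__":
    rm = ReservationManager()
    print(rm.handle("alice", {"src": "fnal-example", "dst": "cern-example",
                              "bandwidth_mbps": 1200}))
```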

3) perfSONAR Monitoring Applications Move Us Toward Service-Oriented Communications Services
E2Emon provides end-to-end path status in a service-oriented, easily interpreted way. It is a perfSONAR application used to monitor the LHC paths end-to-end across many domains. It uses perfSONAR protocols to retrieve current circuit status every minute or so from the MAs (measurement archives) and MPs (measurement points) in all of the different domains supporting the circuits. It is itself a service that produces Web-based, real-time displays of the overall state of the network, and it generates alarms when one of the MPs or MAs reports link problems.
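The polling pattern E2Emon implements can be sketched as follows, with made-up archive URLs and a stub standing in for the perfSONAR protocol exchange:

```python
# Illustrative poller in the spirit of E2Emon: every minute, ask each domain's
# measurement archive for the status of its segment of an end-to-end circuit
# and alarm if any segment is down. fetch_segment_status() is a placeholder,
# not a real perfSONAR client call.

import time

CIRCUIT = "CERN-LHCOPN-FNAL-001"
DOMAIN_ARCHIVES = {                      # hypothetical per-domain MA endpoints
    "CERN":  "https://ma.cern.example/perfsonar",
    "GEANT": "https://ma.geant.example/perfsonar",
    "ESnet": "https://ma.esnet.example/perfsonar",
    "FNAL":  "https://ma.fnal.example/perfsonar",
}

def fetch_segment_status(archive_url, circuit):
    """Placeholder for a perfSONAR MA query; returns 'up', 'down', or 'unknown'."""
    return "up"

def poll_once():
    segments = {d: fetch_segment_status(url, CIRCUIT)
                for d, url in DOMAIN_ARCHIVES.items()}
    for domain, status in segments.items():
        if status != "up":
            print(f"ALARM: {CIRCUIT} segment in {domain} is {status}")
    return "up" if all(s == "up" for s in segments.values()) else "degraded"

if __name__ == "__main__":
    while True:
        print(CIRCUIT, "end-to-end status:", poll_once())
        time.sleep(60)                   # E2Emon polls roughly once a minute
```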

E2Emon: Status of E2E link CERN-LHCOPN-FNAL-001
[Figure: E2Emon-generated view of the data for one OPN link. [E2EMON]]

E2Emon: Status of E2E link CERN-LHCOPN-FNAL-001
Paths are not always up, of course - especially international paths that may not have an easy alternative path. [http://lhcopnmon1.fnal.gov:9090/fermi-e2e/g2_e2e_view_e2elink_fermi-in2p3-igtmd-002.html]

Path Performance Monitoring
Path performance monitoring needs to provide users and applications with the end-to-end, multi-domain traffic and bandwidth availability. It should also provide real-time performance information such as path utilization and/or packet drop.
Multiple path performance monitoring tools are in development. One example, Traceroute Visualizer [TrViz], has been deployed at about 10 R&E networks in the US and Europe that have at least some of the required perfSONAR MA services to support the tool.

Traceroute Visualizer
Forward-direction bandwidth utilization on the application path from LBNL to INFN-Frascati (Italy). Traffic is shown as bars on those network device interfaces that have an associated MP service (the first 4 graphs are normalized to 2000 Mb/s, the last to 500 Mb/s); link capacity is also provided.
[Figure: annotated traceroute - 21 hops from ir1000gw (131.243.2.1) at LBNL through ESnet (lbl2-ge-lbnl.es.net, slacmr1-sdn-lblmr1.es.net, snv2mr1-slacmr1.es.net, snv2sdn1-snv2mr1.es.net, chislsdn1-oc192-snv2sdn1.es.net, chiccr1-chislsdn1.es.net, aofacr1-chicsdn1.es.net), GÉANT2 (rt1.nyc, rt1.ams, rt1.fra, rt1.gen, rt1.mil - no data), and GARR (rt1-mi1, rt-mi2, rt-rm2, rc-fra, ru-lnf) to www6.lnf.infn.it (193.206.84.223) at about 190 ms; utilization graphs are shown for the instrumented ESnet and GARR interfaces.]
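A rough sketch of the idea behind such a tool: run a traceroute, then try to annotate each hop with recent utilization from the owning domain's measurement archive. The MA lookup here is a stub with an invented name; a real implementation would issue perfSONAR queries per domain:

```python
# Illustrative sketch: run traceroute, then try to annotate each hop with
# recent interface utilization from that domain's measurement archive.
# lookup_utilization() is a hypothetical stand-in for a perfSONAR MA query.

import subprocess

def traceroute(dest):
    """Return the list of hop addresses reported by the system traceroute."""
    out = subprocess.run(["traceroute", "-n", dest],
                         capture_output=True, text=True, check=True).stdout
    return [line.split()[1] for line in out.splitlines()[1:] if len(line.split()) > 1]

def lookup_utilization(hop):
    """Placeholder: ask the appropriate domain MA for this interface's recent
    utilization in Mb/s; return None when no MP instruments the hop."""
    return None

def visualize(dest):
    for hop in traceroute(dest):
        util = lookup_utilization(hop)
        label = f"{util:7.1f} Mb/s" if util is not None else "   (NO DATA)"
        print(f"{hop:40s} {label}")

if __name__ == "__main__":
    visualize("www6.lnf.infn.it")
```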

perfSONAR Architecture
[Figure: the layered perfSONAR architecture with examples of the architectural relationships.
- Interface/user layer: a performance GUI client (e.g. part of an application system communication service manager) showing real-time end-to-end performance graphs (e.g. bandwidth or packet loss vs. time) and historical performance data for planning purposes; an event subscription service (e.g. end-to-end path segment outage)
- Service layer: path monitor, event subscription service, service locator service, topology aggregator, measurement archive
- Measurement layer: measurement export services over measurement points (m1 ... m6), spanning network domains 1, 2, and 3]
The measurement points (m1 ... m6) are the real-time feeds from the network or local monitoring devices. The Measurement Export service converts each local measurement to a standard format for that type of measurement.
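A small sketch of what the Measurement Export role amounts to: take whatever a local monitoring device produces and normalize it into one record format per measurement type. The field names are invented for illustration and are not the perfSONAR/NM-WG schema:

```python
# Hypothetical measurement-export shim: normalize device-specific readings into
# one record format per measurement type. Field names are illustrative only,
# not the actual perfSONAR/NM-WG schema.

import json
import time

def export_snmp_counter(iface, in_octets, out_octets, interval_s):
    """Convert raw SNMP octet counters into a normalized utilization record."""
    return {
        "type": "utilization",
        "interface": iface,
        "timestamp": int(time.time()),
        "in_mbps": in_octets * 8 / interval_s / 1e6,
        "out_mbps": out_octets * 8 / interval_s / 1e6,
    }

def export_owamp_sample(src, dst, loss_fraction, delay_ms):
    """Convert a one-way active measurement into a normalized loss/delay record."""
    return {
        "type": "one_way_delay",
        "src": src,
        "dst": dst,
        "timestamp": int(time.time()),
        "loss": loss_fraction,
        "delay_ms": delay_ms,
    }

if __name__ == "__main__":
    # Both records can now be published to a measurement archive in one format.
    print(json.dumps(export_snmp_counter("chiccr1:xe-0/1/0", 9.3e9, 1.1e9, 60)))
    print(json.dumps(export_owamp_sample("lbl-pt1.example", "bnl-pt1.example", 0.0, 41.7)))
```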

perfSONAR Only Works End-to-End When All Networks Participate
Our collaborations are inherently multi-domain, so for an end-to-end monitoring tool to work, everyone must participate in the monitoring infrastructure.
[Figure: a performance GUI and path monitor drawing on the measurement archives and measurement export services in each domain along a path - FNAL (AS3152) [US], ESnet (AS293) [US], GEANT (AS20965) [Europe], DFN (AS680) [Germany], and DESY (AS1754) [Germany].]

Conclusions
To meet the existing overall bandwidth requirements of large-scale science, networks must deploy adequate infrastructure; this requirement is mostly on track to be met.
To meet the emerging requirements of how large-scale science software systems are built, the network community must provide new services that allow the network to be a service element that can be integrated into a Service Oriented Architecture / System of Systems framework; progress is being made in this direction.

Federated Trust Services - Support for Large-Scale Collaboration
Remote, multi-institutional identity authentication is critical for distributed, collaborative science in order to permit sharing of widely distributed computing and data resources and other Grid services. Public Key Infrastructure (PKI) is used to formalize the existing web of trust within science collaborations and to extend that trust into cyberspace. The function, form, and policy of the ESnet trust services are driven entirely by the requirements of the science community and by direct input from the science community.
International-scope trust agreements that encompass many organizations are crucial for large-scale collaborations. ESnet has led in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored for collaborative science. This service, together with the associated ESnet PKI service, is the basis of the routine sharing of HEP Grid-based computing resources between the US and Europe.

ESnet Public Key Infrastructure
CAs are provided with different policies as required by the science community:
- The DOEGrids CA has a policy tailored to accommodate international science collaboration
- The NERSC CA policy integrates CA and certificate issuance with NIM (the NERSC user account management services)
- The FusionGrid CA supports the FusionGrid roaming authentication and authorization services, providing complete key lifecycle management
Statistics: user certificates issued 5,237; host and service certificates issued 11,704; total number of currently active certificates 6,982.
[Figure: the ESnet root CA with the DOEGrids CA, NERSC CA, and FusionGrid CA subordinate to it.]
See www.doegrids.org

References
[OSCARS] For more information contact Chin Guok (chin@es.net). Also see http://www.es.net/oscars
[LHC/CMS] http://cmsdoc.cern.ch/cms/aprom/phedex/prod/activity::rateplots?graph=quantity_cumulative&entity=src&src_filter=&dest_filter=&no_mss=true&period=l52w&upto=
[ICFA SCIC] "Networking for High Energy Physics." International Committee for Future Accelerators (ICFA), Standing Committee on Inter-Regional Connectivity (SCIC), Professor Harvey Newman, Caltech, Chairperson. http://monalisa.caltech.edu:8080/slides/icfascic2007/
[E2EMON] GÉANT2 E2E Monitoring System, developed and operated by JRA4/WI3, with implementation done at DFN. http://cnmdev.lrz-muenchen.de/e2e/html/g2_e2e_index.html and http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_support_for_E2E_Link_Monitoring
[TrViz] ESnet perfSONAR Traceroute Visualizer. https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi