Philippe Laurens, Michigan State University, for USATLAS
ATLAS Great Lakes Tier 2 (AGLT2), co-located at MSU and the University of Michigan
ESCC/Internet2 Joint Techs, 12 July 2011

Content
Introduction: LHC, ATLAS, USATLAS
Data Deluge
Hierarchical Tiers of Computing
Robust Network is Key
Network Infrastructure Monitoring
Distributed perfSONAR nodes at the USATLAS T1 & T2s
Centralized Dashboard
Examples of diagnostics
Prospects after our pilot deployment

Large Hadron Collider at CERN, the European Organization for Nuclear Research

Large Hadron Collider
Two circulating beams of protons, guided by superconducting magnets cooled to -271 °C.
27 km (~17 miles) in circumference, 50-100 m underground.
ATLAS is one of the two primary experiments at the LHC.

ATLAS
A particle physics experiment to explore the fundamental forces and the structure of matter in our universe.
A quark or gluon from each proton collides at the center of the ATLAS detector and produces other particles, which themselves decay or collide further with material in the detector, giving jets and showers of secondary particles.
A Russian-doll set of sub-detector components surrounds the collision point (altogether 7,000 tons, 25 m high).
Particle paths and energies are measured using millions of data-acquisition channels; this repeats 25 ns later, up to 30 million times per second.

The ATLAS Collaboration (A Toroidal LHC ApparatuS)
38 countries, ~170 institutes/universities, ~3,000 physicists (including ~1,000 students): one of the largest efforts in the physical sciences.
2011: started a 2-year run of data taking at 3.5 TeV per beam (7 TeV total).
Papers and new physics results have already been produced from the 2010 low-luminosity data, with more results coming at this summer's conferences.

Data Deluge
Proton beams cross at 40 MHz, yielding ~30 MHz of collisions.
Trigger system: multi-level online event selection, down to <1 kHz of recorded events.
Huge data volume: >300 MB/s (see the rough yearly estimate below).
Huge computer storage and analysis resources are needed for simulation (Monte Carlo) and event reconstruction: a worldwide Grid.
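As a rough scale check, the recording rate quoted above already implies a multi-petabyte yearly raw-data volume. A minimal back-of-the-envelope sketch follows; the ~1e7 seconds of live time per year is an assumed round number, not a figure from the slides.

```python
# Back-of-the-envelope estimate of the raw-data volume implied by a sustained
# >300 MB/s recording rate.  The ~1e7 s of live time per year is an assumption
# used for illustration only.
RECORD_RATE_MB_PER_S = 300
LIVE_SECONDS_PER_YEAR = 1e7   # assumed typical accelerator live time

volume_pb = RECORD_RATE_MB_PER_S * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB
print(f"roughly {volume_pb:.0f} PB of raw data per year")       # ~3 PB
```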

ATLAS Tiered Computing Model
To address these computing needs (storage and CPU), ATLAS has chosen a tiered computing model:
  Tier 0: at CERN
  Tier 1: ~10 national centers
  Tier 2: regional centers
  Tier 3: institutional/group centers
  (Tier 4: desktops)
Raw data is duplicated among the Tier 0 and the Tier 1s for backup.
Reconstructed data and simulation data are available to all Tier 1s and Tier 2s.
Tier 2s primarily handle simulation (MC production) and analysis tasks.
Implicit in this distributed model and central to its success are high-performance, ubiquitous and robust networks, and Grid middleware to securely find, prioritize and manage resources; user jobs need to find the data they need.


USATLAS sites
US Tier 1:
  BNL: Brookhaven National Lab (Long Island, NY)
Five US Tier 2 centers:
  AGLT2, ATLAS Great Lakes Tier 2: MSU (Michigan State University) and UM (University of Michigan)
  MWT2, Mid-West Tier 2: IU (Indiana University - Purdue University Indianapolis) and UC (University of Chicago)
  NET2, North-East Tier 2: BU (Boston University) and HU (Harvard)
  SWT2, South-West Tier 2: OU (University of Oklahoma) and UTA (University of Texas at Arlington)
  WT2, West Tier 2: SLAC (SLAC National Accelerator Laboratory)

T2 Site example: AGLT2
Split between MSU and UM.
10 Gb/s network between MSU and UM, and between each site and the rest of the world.
~20 file server nodes, >1.9 PB of data.
~375 compute nodes, >4,500 job slots.

Robust Network is Key
We wanted to instrument the network connections between the US Tier 1 and all US Tier 2 sites in one uniform way.
Primary motives:
  aid in diagnosing problems and identifying where they occur;
  keep an archive of standard, regular measurements over time.
USATLAS adopted perfSONAR-PS:
  implemented end points in each facility;
  defined a mesh of connection tests between all facilities.

perfSONAR-PS Deployment at USATLAS
Deploy the same inexpensive hardware at all sites: ~$600 per KOI 1U system, with a 1 Gb NIC.
Used the same Linux-based pS Performance Toolkit LiveCD; most sites now use the new net-install (RPM) distribution.
Dedicate one node to throughput and one node to latency at each site, since throughput tests are resource-intensive and tend to bias latency test results.
Define a common set of tests (a sketch of the resulting test count follows below):
  a mesh of throughput tests to/from all T1/T2 perfSONAR nodes;
  a mesh of latency tests to/from all T1/T2 perfSONAR nodes.
Now augmented with a summary via a dashboard (more later).
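To illustrate how the number of tests scales, here is a minimal sketch of building such a full mesh of directed test pairs. The endpoint labels are placeholders; with 9 monitored endpoints it reproduces the 72 tests per mesh quoted on the dashboard slide later in the talk.

```python
from itertools import permutations

def full_mesh(endpoints):
    """All ordered (source, destination) pairs: every link is tested in both directions."""
    return list(permutations(endpoints, 2))

# Placeholder endpoint labels; the real deployment pairs up the perfSONAR
# nodes at the Tier 1 and at each Tier 2 campus.
endpoints = [f"site{i}" for i in range(1, 10)]   # 9 monitored endpoints
pairs = full_mesh(endpoints)
print(len(pairs), "directed pairs per mesh")      # 72, for both the throughput
                                                  # mesh and the latency mesh
```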

perfSONAR-PS Project
A deployable measurement infrastructure.
The perfSONAR-PS collaboration comprises several members: ESnet, Fermilab, Georgia Tech, Indiana University, Internet2, SLAC and the University of Delaware.
perfSONAR-PS products are written in the Perl programming language and are available for installation from source or as RPM (Red Hat compatible) packages.
perfSONAR-PS is also a major component of the pS Performance Toolkit, a bootable Linux CD containing the measurement tools and GUI, ready to be configured for the desired tests.

perfSONAR-PS Tools
Web-based GUIs for admins to configure tests and for users to display measurements.
After the initial setup (local disk, IP, NTP), nodes may be declared part of one or more communities (e.g. LHC or USATLAS) to help identification in a directory lookup service.
Two main test types (a command-line sketch follows below):
  throughput tests (bwctl), non-concurrent;
  two-way ping latency tests (PingER) and one-way latency tests with packet-loss accounting (owamp), which can run concurrently.
Tests are scheduled, and a Measurement Archive manages the results.
Also available: traceroute and ping (i.e. the reverse route from the remote perfSONAR host), on-demand Network Diagnostic Tools (NDT, NPAD), and a pre-installed Cacti instance.
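The two test types map onto the bwctl and owamp command-line tools named above. A minimal sketch of invoking them directly is shown below; the hostnames are placeholders, and on a deployed toolkit node these tests are scheduled and archived automatically rather than run by hand.

```python
import subprocess

def run(cmd, timeout=300):
    """Run a measurement command and return its text output."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout).stdout

# Throughput test: bwctl negotiates a test with the daemon on the remote
# throughput node.  The hostname below is a placeholder, not a real node.
print(run(["bwctl", "-c", "ps-throughput.example.edu"]))

# One-way latency and packet-loss test against a remote latency node (owamp).
print(run(["owping", "ps-latency.example.edu"]))
```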

perfSONAR: Web GUI

perfSONAR: Throughput Tests web page

perfSONAR: Throughput graphs

perfSONAR: Latency Tests web page

perfSONAR: Latency Graph. The graph for the current time is shown here, but one can also retrieve older time slices from the archive, or zoom in on a particular time within such a graph.

perfSONAR: reverse traceroute

Centralized Monitoring of our Distributed Monitoring: the BNL Dashboard
9 separate T1/T2 sites monitored, i.e. 18 perfSONAR nodes: a total of 108 critical services, 72 throughput tests and 72 one-way latency tests.
A centralized dashboard is needed to keep track of the overall mesh.
Developed by BNL (Tom Wlodek) for USATLAS (and now other clouds): first within Nagios (but complex and hard to access), now rewritten as a standalone, portable project accessible by all.
Probes monitor the proper operation of critical services on each node; alert emails are sent to the site admins when services fail.
Probes also retrieve the latest test results for the pre-defined mesh of measurements (throughput and latency); each link A-B is measured by both A and B.
Thresholds on the results determine a label (OK, CRITICAL, etc.) and a color code (an illustrative sketch follows below).
History and time plots of service status and of the measurement mesh.
The result is a compact overview of all USATLAS inter-site network connections (and of the perfSONAR nodes' health).
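As an illustration of the threshold and color-code step, here is a minimal sketch in Python; the cut values and the function name are assumptions for the example, not the dashboard's actual settings.

```python
# Illustrative threshold logic for a throughput measurement; the warning and
# critical cut values below are assumed for the example only.
def throughput_status(mbps, warn=200.0, crit=100.0):
    """Map a measured throughput in Mb/s to a dashboard-style label."""
    if mbps >= warn:
        return "OK"         # typically rendered green
    if mbps >= crit:
        return "WARNING"    # rendered yellow/orange
    return "CRITICAL"       # rendered red; a failing service also triggers
                            # an alert e-mail to the site admins

for measured in (940.0, 150.0, 12.0):
    print(f"{measured:7.1f} Mb/s -> {throughput_status(measured)}")
```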

Dashboard: first implemented within Nagios

Dashboard: now rewritten as a standalone version

Dashboard: Primitive Services

Dashboard: Service History

Dashboard: Throughput Measurement plot

Dashboard: Latency Measurement plot

Dashboard: other ATLAS clouds

Diagnostics
Throughput: notice and localize problems to help debug the network; also helps differentiate server problems from path problems.
Latency: notice route changes and asymmetric routes; watch for excessive packet loss (a sketch of a baseline route check follows below).
Optionally, install additional perfSONAR nodes inside the local network and/or at its periphery, to characterize local performance and internal packet loss and to separate WAN performance from LAN performance.
Daily dashboard check of one's own site, and of peers.
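A minimal sketch of automating the "notice route changes" check is shown below. It shells out to plain traceroute and compares against a stored baseline; the parsing is deliberately simple, and the baseline handling is an assumption for illustration, not part of the perfSONAR toolkit.

```python
import subprocess

def current_path(dest):
    """Hop list (addresses) from a plain numeric traceroute."""
    out = subprocess.run(["traceroute", "-n", dest],
                         capture_output=True, text=True).stdout
    hops = []
    for line in out.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if fields:
            hops.append(fields[1])      # second field is the hop address (or '*')
    return hops

def route_changed(dest, baseline_hops):
    """True if today's path differs from the stored baseline, e.g. an
    unintended detour like the MSU-via-UM case described later in the talk."""
    return current_path(dest) != baseline_hops
```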

Example of diagnostics
Asymmetric throughput between the peer sites IU and AGLT2 was documented, then resolved.

Another example of diagnostics
In the most recent case, right after routing upgrade work, a small 0.7 ms latency increase was quickly noticed. Traceroute showed an unintended minor route change (packets to MSU were going through UM); the router configuration was quickly fixed.

Prospects after our pilot deployment
perfSONAR-PS has proven extremely useful for USATLAS to date, and will be recommended for US T3 sites.
perfSONAR is being deployed in other ATLAS clouds: Italy has started and Canada is also in process; the BNL dashboard is already monitoring the IT (Italian) cloud (at least for now), and the dashboard code will be packaged and distributed.
perfSONAR is being deployed at LHC T1 sites: LHCOPN already plans to deploy it, and LHCONE is considering perfSONAR-PS for its monitoring.
We will continue usage at the USATLAS T1 and T2s, expand to inter-cloud monitoring between T2s of different clouds, and add 10 Gb/s throughput tests.
perfSONAR is open source with a new release roughly twice a year; e.g. work is underway to use a single multi-core node for both throughput and latency.
The more test points along the paths the better: integrating information from the backbone and routing points allows a divide-and-conquer approach to problem isolation.

Thank you
perfSONAR: http://www.perfsonar.net/ and http://psps.perfsonar.net/toolkit (Jason Zurawski, zurawski@internet2.edu)
USATLAS perfSONAR Dashboard:
  Nagios: https://nagios.racf.bnl.gov/nagios/cgi-bin/prod/perfsonar.php (needs BNL login)
  Standalone dashboard: http://130.199.185.78:28080/exda/ (Tom Wlodek, tomw@bnl.gov)
AGLT2: https://hep.pa.msu.edu/twiki/bin/view/aglt2
Our compute summary page: http://www.pa.msu.edu/people/laurens/aglt2/aglt2compactivsum.html
My email: laurens@pa.msu.edu