ATLAS Tier-3 UniGe

ATLAS Tier-3 cluster @ UniGe
Luis March and Yann Meunier (Université de Genève)
CHIPP + CSCS GRID: Face To Face meeting
CERN, September 1st 2016

Description of ATLAS Tier-3 cluster at UniGe

The ATLAS Tier-3 cluster at UniGe-DPNC is physically located at Uni Dufour (~500 m away from the UniGe-DPNC building).

Grid services (NorduGrid): ARC-CE, BDII, proxy, DPM SE
Batch system (62+12 nodes): Worker Nodes with ~656+96 cores; memory/process = 2-10 GB (option defined by the user; a job-script sketch follows this slide)
Storage system (DPM): ATLAS pool, Reserved = 450.0 TB, File Servers = 15
New space for DAMPE: 17 TB inside the DPM Grid Storage

ATLAS Space Tokens:

  Space Token           Capacity (TB)   Used (TB)   Free (TB)
  ATLASGROUPDISK          25.0            9.85      15.15 (~60.6%)
  ATLASLOCALGROUPDISK    420.0          380.14      39.86 (~9.5%)
  ATLASSCRATCHDISK         5.0            0.22       4.78 (~95.7%)

Fine for now.
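Since the batch system is Torque/PBS (see the Outlook slide), the per-process memory option is set through the usual resource request in the job script. A minimal sketch, assuming a hypothetical queue name "atlas" and payload script (neither is taken from the site documentation):

    #!/bin/bash
    # Minimal Torque/PBS job sketch requesting memory within the 2-10 GB
    # per-process range quoted above. Queue "atlas" and run_analysis.sh
    # are hypothetical placeholders.
    #PBS -N my_analysis
    #PBS -q atlas
    #PBS -l nodes=1:ppn=1
    #PBS -l mem=4gb
    #PBS -l walltime=12:00:00

    cd "$PBS_O_WORKDIR"
    ./run_analysis.sh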

Description of extra Tier-3 cluster at UniGe

Some extra Tier-3 cluster resources at UniGe-DPNC, for different experiments (not only ATLAS), are also physically located at Uni Dufour.

User Interfaces (login machines for users):
  SLC6 (3 nodes) = 48 cores
  SLC5 (3 nodes) = 48 cores (they will be used for other services)

In addition to the DPM SE, we have NFS disk servers for local storage (a usage sketch follows this slide):
  /atlas/users        Intended for software development (3 TB)
  /atlas/software     Intended for common ATLAS software (local users) (2 TB)
  /cvmfs/*.cern.ch    Official software tools for (some) experiments (mounted)
  /atlas/data         Data storage for UniGe ATLAS users     108.0 TB
  /neutrino/data      Data storage for UniGe neutrino users   82.0 TB
  /ams/data           Data storage for UniGe AMS users       103.0 TB
  /icecube/data       Data storage for UniGe IceCube users     2.0 TB
  /dampe/data         Data storage for UniGe DAMPE users      58.6 TB

Total NFS disk space for local storage = ~5 TB + 353.6 TB = ~358.6 TB
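On the user interfaces, the /cvmfs/*.cern.ch mounts are typically used via the standard ATLAS CVMFS setup. A minimal sketch, assuming the usual ATLASLocalRootBase layout of /cvmfs/atlas.cern.ch (standard ATLAS convention, not UniGe-specific documentation):

    #!/bin/bash
    # Sketch: set up ATLAS software from the mounted CVMFS area
    # (standard ATLASLocalRootBase layout assumed).
    export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
    source "${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh"

    # The local NFS areas listed above are plain mount points:
    ls /atlas/users /atlas/software /atlas/data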

Operations

Grid services (NorduGrid):
  ARC-CE: nordugrid-arc-ce-5.0.5
  DPM SE: glite-yaim-dpm 4.2.20-1 (we should upgrade it with Puppet)

GGUS tickets:
  Ticket-ID 117900 (Data management): ATLAS storage (monthly) consistency checks. Status = closed.
    The dump areas are listed with (a small wrapper sketch follows this slide):
      srmls -l srm://grid05.unige.ch:8446/srm/managerv2?sfn=/dpm/unige.ch/home/atlas/atlaslocalgroupdisk/dumps/
      srmls -l srm://grid05.unige.ch:8446/srm/managerv2?sfn=/dpm/unige.ch/home/atlas/atlasscratchdisk/dumps/
      srmls -l srm://grid05.unige.ch:8446/srm/managerv2?sfn=/dpm/unige.ch/home/atlas/atlasgroupdisk/trig-daq/dumps/
  Ticket-ID 120979: ATLAS storage deletions. Status = closed.
    Deletion failures at UNIGE-DPNC_SCRATCHDISK were due to permissions; since the permissions were changed, no failures have been observed.

In general, running smoothly:
  ATLAS production jobs: UNIGE-DPNC came back to production on July 23rd.
  UniGe local users: increased activity (job submission) over the last months.
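The three srmls commands above differ only in the space-token path, so the monthly check can be wrapped in a short loop. A minimal sketch using only the endpoint and paths already shown:

    #!/bin/bash
    # List the monthly consistency-check dumps for each ATLAS space-token area.
    ENDPOINT="srm://grid05.unige.ch:8446/srm/managerv2?sfn=/dpm/unige.ch/home/atlas"

    for area in atlaslocalgroupdisk atlasscratchdisk atlasgroupdisk/trig-daq; do
        echo "== dumps for ${area} =="
        srmls -l "${ENDPOINT}/${area}/dumps/"
    done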

Accounting over time: UniGe cluster (since October 2015)

Stats from October 2015 (4th quarter 2015) to June 2016 (2nd quarter 2016):
  1st & 2nd quarter 2016: highest local user activity (main users: ATLAS and DAMPE)
  Lower ATLAS production activity since June 2015 (checked: related to memory)
  ATLAS production activity re-started around July 23rd

Accounting: Monitoring at UniGe (1)

Total number of running jobs: ATLAS + UniGe users. Stats: last year (starting in October 2015).
  GRID running jobs (ATLAS): lower ATLAS production activity since June 2015; back to ATLAS production activity since July 23rd 2016
  Running jobs (UniGe local users): larger activity since March

Accounting: Monitoring at UniGe (2)

Total number of running jobs: ATLAS + UniGe users. Stats: last month.
  GRID running jobs (ATLAS): higher ATLAS production activity since July 23rd
  Running jobs: UniGe (local) users

Monthly accounting: UniGe cluster

Stats until August 31st at ~14h. Two remarks:
  ATLAS group (local) and ATLAS production (grid) increased their usage
  DAMPE group: decreased activity in July and August

Monthly accounting: Local users

Monthly accounting: Local groups

Accounting: ATLAS production (1)

Accounting: ATLAS production (2)

Outlook

Operations:
  CPU/cores: up to ~136 more cores to be added to the cluster
  Operating system: currently SLC6, but moving to CentOS at some point
  Batch system: we would like to move to SLURM (currently Torque/PBS); a SLURM job sketch follows this slide
  Testbed: we could use some of these CPUs to be tested with SLURM
  GPU machines: funding request submitted to UniGe: approved. Finally, the GPUs would be added into the Baobab HPC cluster

ATLAS production:
  ATLAS production re-started at UniGe on July 23rd
  We should cross-check/review our accounting + multi-core

Storage:
  Disk servers: we are going to add 1 disk server (70 TB, maybe 2) to DPM. Created a DAMPE pool of ~20 TB on DPM. 11 file disk servers with SLC5 (upgrade to SLC6 only if necessary)
  DPM SE: we would like to move to Puppet (currently YAIM)
  Testbed: we would like to make a small testbed: 1 service machine (Puppet), 1 head node (DPM, newer version than the current one), 1 file disk server (data to be managed by DPM)

Network:
  Upgrade to 10 Gb/s: funding request submitted to UniGe: approved
  Data transfers from/to NFS disk servers: performance tests scheduled soon
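For the planned Torque/PBS to SLURM migration, the memory and walltime requests of the earlier PBS sketch translate roughly as follows. A minimal sketch only; the partition name "atlas" and the payload script are hypothetical:

    #!/bin/bash
    # Rough SLURM equivalent of the Torque/PBS sketch, for testing the
    # planned migration on the spare cores. Partition "atlas" and
    # run_analysis.sh are hypothetical placeholders.
    #SBATCH --job-name=my_analysis
    #SBATCH --partition=atlas
    #SBATCH --ntasks=1
    #SBATCH --mem=4G
    #SBATCH --time=12:00:00

    cd "$SLURM_SUBMIT_DIR"
    ./run_analysis.sh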

Back-up Slides

Disk server intervention

atlasfs15, atlasfs18 & atlasfs20: RAID controller card damaged.
The RAID cards were replaced and the servers were put back into service.