
The Global Grid and the Local Analysis
Yves Kemp, DESY IT
GridKA School, 11.9.2008

Overview
- "Global" and "globalization": some thoughts
- Anatomy of an analysis and the computing resources it needs
- The boundary between global and local
- Some problems analysts and resource providers encounter
Disclaimer: If you expect a how-to during this talk, you will be disappointed. The speaker does not know The Right Way himself.

Wikipedia: Globalization
"Globalization in its literal sense is the process of transformation of local or regional things or phenomena into global ones. It can also be used to describe a process by which the people of the world are unified into a single society and function together. This process is a combination of economic, technological, sociocultural and political forces." (Sheila L. Croucher, 2004)
- Globalization is NOT "everyone can do everything without limits"
- Globalization means having a common set of rules
  - In computer science, this is called defining standards
  - The different actors need clear interfaces they can rely on

And "local"? What does that mean?
- Every resource is local somewhere: it sits in a local building, with local admins, local users, ...
- For users: you know the facility, the admins, the peculiarities, ...
- The resource might even be near to you (geographically)
- You may even have different (local) access methods

Anatomy of the Grid (very simplified)
[Diagram: a user runs a local application through a User Interface/API; this talks to the central services of the global Grid, which dispatch work to the individual local sites.]

Analysis facility (very simplified)
[Diagram: the user works through a User Interface connected to CPU, with storage at the centre.]
- The central point of data analysis is the data
- Analysts need a system where storage is the central part
- Analysts want an integrated system, not a layered and distributed system like the Grid
- Analysis has its own requirements!

Different tasks: Different requirements (example: the HEP model)

Coordinated & global tasks:
- MC production
  - Event generation: no input; small output; large CPU
  - Detector simulation: small input; large output & CPU
- Event reconstruction / reprocessing
  - Reprocessing: full input; full output; large CPU
  - Selections: large input; large output; large CPU

Uncoordinated, unstructured & local tasks:
- Analysis
  - Usually: large input; small output; little CPU
  - Performed by many users, many times!
  - LHC start-up phase: short turn-around needed

Performance criteria for analysis
"... and I want it now!"
- Real-time response: get the answer after a definite time
- Fast: well, as fast as possible
- Rapid analysis (like Rapid Application Development)
- Interactive: you can react instantaneously and steer your analysis

From Global to Local

Global Grid computing:
- Data collected with an experiment
- MC simulations
- Reprocessing & converting/stripping data

Local computing:
- Creating group or personal data skims
- Applying own & modified algorithms
- (like the Waterfall model)

Where to put the boundary?
- "The Higgs mass is ... GeV/c^2"
- "Earth warming is ... °C / year"

Boundary: Different approaches
- In the computing model
  - e.g. the LHC Tier model: each layer has different roles, T3 being the most local one
- Adding a layer of abstraction
  - e.g. submission frameworks hide the underlying resources
- Integration into existing facilities, with logical separation
  - e.g. NAF @ DESY (additional fairshare and storage, ...)
- Additional facilities, only partly coupled to the Grid
  - e.g. NAF @ DESY (WGS, local batch queue, local file system, ...)
- Enhancing the Grid
  - e.g. an interactive Grid: giving the instantaneous response users expect from a local machine (remember the previous talk)
No single road to success!

Boundary: Computing model
- Different tasks require different setups
- Break with the uniformity and homogeneity of an "ideal" Computing Grid
- Have different hierarchical layers for the different tasks
- Each layer is specialized for one set of tasks; data flows among the layers
- Example: the ATLAS data and computing model (slides from Amir Farbin)

Boundary: Additional layer of abstraction (e.g. Ganga)
[Diagram: experiment applications on top (LHCb: Gauss/Boole/Brunel/DaVinci for simulation/digitisation/reconstruction/analysis; experiment-neutral: Executable; ATLAS: AthenaMC for production, Athena for simulation/digitisation/reconstruction/analysis) all plug into GANGA, which submits to the backends below: Local, PBS, LSF, LHCb WMS, OSG/PANDA, US-ATLAS WMS.]
- Ganga: a submission framework for different Grids, local clusters and local machines (see the sketch below)
- CRAB: a submission tool developed by CMS
- PROOF and gLitePROOF (ALICE, ...)
- Ganga & PROOF: see the tutorials!
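To make the abstraction concrete, here is a minimal Ganga sketch. It assumes the script is executed inside the Ganga shell (e.g. `ganga myjob.py`), which puts `Job`, `Executable`, `File` and the backend classes in scope; `myanalysis.sh` is a hypothetical user script.

```python
# Minimal Ganga sketch (assumption: run via "ganga myjob.py" so that
# Job, Executable, File, Local and LCG are already in scope).
j = Job()
j.name = "demo"
j.application = Executable()               # experiment-neutral application type
j.application.exe = File("myanalysis.sh")  # hypothetical analysis script
j.backend = Local()                        # run on the local machine
j.submit()

# The point of the abstraction layer: retargeting the Grid is a
# one-line change of backend; the job definition stays the same.
j2 = j.copy()
j2.backend = LCG()
j2.submit()
```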

Boundary: Integration into an existing facility
Example NAF @ DESY: use VOMS groups to differentiate local users
- CE: the scheduler provides prioritization and fairshare
- SE: additional space writable only by /<VO>/de
Problems:
- The information system works per VO (it ignores VOMS groups)
- Education of users: they must use their VOMS extension! (see the example below)
- Potential danger of splitting into too many groups
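As a hedged illustration of the "use your VOMS extension" point: the proxy must carry the group FQAN before the CE scheduler or the group-writable SE space can act on it. The VO name `atlas` below is only an example; the snippet simply wraps the standard voms-proxy command-line tools.

```python
import subprocess

# Request a proxy carrying the German group FQAN (the VO name "atlas"
# is an assumption; use your own VO). Without the group in the proxy,
# CE fairshare and the /<VO>/de SE space cannot recognize the user.
subprocess.run(["voms-proxy-init", "--voms", "atlas:/atlas/de"], check=True)

# Show the FQANs actually attached to the proxy:
subprocess.run(["voms-proxy-info", "--fqan"], check=True)
```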

Boundary: Build something new
Example NAF building blocks (use the Grid whenever possible!); an access-path sketch follows below:
[Diagram: interactive login via gsissh, file transfer via scp; from the interactive nodes, qsub feeds a local batch farm and a PROOF cluster; an interconnect ties these to the NAF AFS/Kerberos cell, a parallel cluster file system with dedicated NAF space, and the dCache SE (SRM) as the main data store and exchange; grid-submit leads to the Grid cluster of the DESY Grid, with dCache storage accessed via GridFTP/SRM.]
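A hedged sketch of the two data paths named in the diagram, wrapped in Python for illustration. Host names and paths are invented; `srmcp` stands in for whichever SRM client the site ships with dCache.

```python
import subprocess

# Interactive path: log in with a Grid certificate instead of a password.
# subprocess.run(["gsissh", "naf.desy.de"])   # hypothetical login host

# Grid path: fetch a file from the dCache SE via SRM. The endpoint and
# the file path are invented for illustration.
subprocess.run([
    "srmcp",
    "srm://dcache-se.desy.de:8443/pnfs/desy.de/data/myvo/user/ntuple.root",
    "file:////tmp/ntuple.root",
], check=True)
```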

?
Some concerns still remain in these areas:
- Storage technique and management
- Support
- Computing services
- Administrators
(The following slides show only some aspects.)

Storage technique
- Global file catalog and name space vs. the local ones
  - Are the two in sync? Which one is authoritative?
  - Lots of effort goes into manually synchronizing the two
  - The problem is known as "Dark Data" (see G. Cowan, Monday); a sketch follows this slide
- Access to storage
  - Local access protocols vs. global storage: global is e.g. GridFTP, local could be a mounted file system
  - A file system has users (UID/GID); VOMS has groups/roles
  - Some operations one expects from a simple laptop HD are just not possible (read-modify-write is somewhat difficult with tapes ...)
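A hedged sketch of what a "dark data" check amounts to: compare the catalogue's view of an SE with the SE's actual contents. The catalogue helper is a hypothetical stand-in for an LFC query; the SE side assumes the dCache /pnfs namespace is mounted locally, and all paths are invented.

```python
import os

def catalog_replicas(lfc_path):
    # Hypothetical stand-in: a real implementation would query the LFC
    # (e.g. via the lfc-ls client or its Python bindings) and return the
    # physical file names registered under lfc_path for this SE.
    return set()

def se_contents(pnfs_path):
    # dCache exposes its namespace under /pnfs; when mounted, a plain
    # directory walk shows what the SE really stores.
    files = set()
    for root, _dirs, names in os.walk(pnfs_path):
        files.update(os.path.join(root, n) for n in names)
    return files

registered = catalog_replicas("/grid/myvo/data")          # invented path
on_disk = se_contents("/pnfs/site.example.de/data/myvo")  # invented path

dark_data = on_disk - registered   # on the SE, unknown to the catalogue
lost = registered - on_disk        # in the catalogue, gone from the SE
print(f"{len(dark_data)} dark files, {len(lost)} lost replicas")
```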

Storage management
- Who can write where?
  - Experiments usually have Role=production for writing data
  - And there are different groups of users
- Quotas and ACLs: how to set them up consistently?
- A home directory in the Grid?
  - Lots of communication is needed between partners to set up space for the different groups
- Orphaned files somewhere in the Grid
  - No owner (in the sense of: nobody is using them)
  - On the SE, but not in the LFC (recall: Dark Data)
- People are needed to take care of storage

Do not forget support!
- The more users, the more support groups exist. How do you choose the right one?
- People want personal support; that does not scale at the Grid level
- Information flow to your local working group is (relatively) easy. When and how do you inform a globalized community?
- Debugging: large latencies in a distributed system
- Time will probably show how support organizes itself, and the organization will change with time

Computing services
- Again: how to handle (many small?) subgroups?
  - Is fairshare enough? Will it scale, and is it manageable?
- Subgroups might want to install their own software in the Grid
  - This is one reason for separate clusters!
- Different job profiles
  - Can the (meta-)schedulers handle them?
  - Again: one reason for separate clusters
- Accounting
  - Different supported subgroups might bring different financing bodies, with different accounting wishes, onto the same resources

From a site admin's perspective
- Many services are global for the admin, too
  - He/she has no control over them and cannot easily debug them
  - Examples: VOMS server, FTS, WMS, LFC
  - This requires inter-site communication and global support channels
- Keep services as generic as possible
  - The site resources are part of a bigger whole
  - WLCG/EGEE sets the standards (e.g. software installation, data access, ...)
  - The admin has little leeway and gives away some control
- Everything is larger and more anonymous
  - e.g. for support
  - Admins have a hard time explaining this to their users

Summary and Outlook
- Analysis has special requirements: most analysis is data-centric
- Finding the right boundary between the Grid and local facilities is important for the success of the final analysis
- There are many ways to success, but some open concerns remain and work is still to be done
- The future will tell which way(s) will be chosen