XRAY Grid TO BE OR NOT TO BE?


I was not always a Grid sceptic! I started off as a Grid enthusiast, e.g. by insisting that Grid be part of the ESRF Upgrade Program outlined in the Purple Book. In this talk I will try to explain why nowadays I oscillate between being a Grid sceptic and a Devil's advocate.

Which Grid are we talking about? The EGEE / gLite grid.

Synchrotron Requirements
- Lots of small jobs: 100,000's, each running for minutes
- Jobs often read and write images: 100's of images per job
- Experiments generate lots of data: up to terabytes per day
- Large number of small user groups: 100's per year
- Many diverse experiments: 1000's per year

XRAY Science is DATA INTENSIVE! The ESRF currently holds 400 TB, expects 800 TB by the end of 2009, and petabytes in the future.
[Chart: TB per experiment (E1 to E10) and TB per beamline (BL1 to BL10) since January 2009.]

Test program - spd
spd is a program to correct 2D images for:
- Spatial distortion: 2D spline curve
- Flood field: image division
- Background field: image subtraction
The first image takes about 17 seconds; additional images take a fraction of a second. spd is simple but typical of many programs used on 2D images: low on CPU, high on I/O. A typical data set is 180 images of 8 MB each, i.e. 1.44 GB, and a typical experiment measures HUNDREDS of data sets. A minimal sketch of the correction chain is shown below.
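The talk does not show spd's internals; the snippet below is only a minimal sketch of the three corrections, assuming NumPy/SciPy and hypothetical per-pixel displacement fields dx and dy for the spline-based distortion step.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def correct_image(raw, flood, background, dx, dy):
        """Simplified sketch of the spd correction chain (illustrative only)."""
        img = (raw - background) / flood              # background subtraction, flood-field division
        yy, xx = np.indices(img.shape)
        coords = np.array([yy + dy, xx + dx])         # where each corrected pixel comes from
        return map_coordinates(img, coords, order=3)  # cubic-spline spatial-distortion remap

With roughly 8 MB images, most of the wall-clock time goes into reading and writing the files rather than into these array operations, which is exactly the low-CPU, high-I/O pattern described above.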

grid_spd
grid_spd is a script written by Emmanuel Taurel to test running spd on the XRAY grid set up by Clemens Koerdt and Fernando Calvelo. Test scenarios (a submission sketch follows below):
- Data on the User Interface, jobs submitted to the WMS
- Data on the LFC, jobs submitted to the WMS
- Data on the LFC, jobs submitted directly to a CREAM-CE
- Data on the LFC, parametric jobs submitted
- Data on NFS, jobs submitted to a CREAM-CE
- Data on NFS, job run from the command line (local, non-grid)
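The grid_spd script itself is not shown in the talk; as a hedged illustration of the parametric-job scenario, the sketch below writes a minimal gLite JDL and submits it with glite-wms-job-submit. The wrapper script name, the parameter range and the output file are assumptions for illustration, not values from the talk.

    import subprocess, textwrap

    # Hypothetical wrapper that stages one data set and runs spd on it.
    jdl = textwrap.dedent("""\
        JobType        = "Parametric";
        Executable     = "spd_wrapper.sh";
        Arguments      = "_PARAM_";          // data-set index, substituted per sub-job
        Parameters     = 180;                // number of parametric sub-jobs
        ParameterStart = 0;
        ParameterStep  = 1;
        InputSandbox   = {"spd_wrapper.sh"};
        StdOutput      = "std.out";
        StdError       = "std.err";
        OutputSandbox  = {"std.out", "std.err"};
        """)

    with open("spd_parametric.jdl", "w") as f:
        f.write(jdl)

    # -a: delegate the proxy automatically, -o: store the returned job IDs in a file.
    subprocess.run(["glite-wms-job-submit", "-a", "-o", "job_ids.txt",
                    "spd_parametric.jdl"], check=True)

Direct submission to a CREAM-CE bypasses the WMS and uses glite-ce-job-submit with a similar (non-parametric) JDL.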

grid_spd results
[Chart: "Longest Job", "Wait for Job" and "Total" times for each scenario: Data on LFC, Data on UI, Parametric, CREAM-CE, Data on NFS, Local non-grid.]

XRAY Grid TO BE? We could assume:
- Grid is OK for synchrotron science
- Programs will run slower but still run
- External users will be able to use the grid resources
- Data can be exported and accessed from the UI
- Middleware (gLite) will get better (and faster) with time
- National Grid Initiatives will provide free resources

XRAY Grid OR NOT TO BE? We could assume:
- Grid is NOT OK for synchrotron science
- Data-intensive applications are not adapted to the Grid
- The LFC and WMS are too slow
- The SE does not provide fast, direct access to data
- Exporting data is not adapted to occasional intense usage
- Middleware (gLite) will not change drastically
- The future of gLite is uncertain; funding stops in 2010

XRAY issues with EGEE/gLite
- LFC: very slow for many files
- WMS: very slow, optimised for long jobs
- SE: dCache and DPM are slow and not directly mountable
- WN: users cannot run with their own UID/GID; not adapted for MPI-intensive applications
- GridFTP: needs tuning by gurus, not reliable, no or poor Windows support
- Grid certificates: not clear that they scale to 1000's of XRAY users

Why is the (EGEE) Grid not adapted?
- The Grid assumes each node is independent
- File systems are not mounted locally on the WNs
- Grid file I/O is much slower than I/O on locally mounted file systems
- The Grid does not provide any improvement over clusters
Why would you want to use the Grid in this case?

XRAY grid next steps
Gridify and benchmark more applications:
- Penelope: a radiation-dose Monte Carlo simulation
- Peaksearch: a 2D and 3D peak-searching code
- Laminography: a tomography technique
- FDMNES: an X-ray spectra calculation (MPI code)
- Fit2dCake: a radial integration program
- XDS: a protein structure refinement program
Try GridFTP for transferring real data and test iRODS (a transfer sketch follows below); install a faster scheduler, e.g. Condor, and connect it to the ESRF local batch system.
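As a hedged illustration of the GridFTP test, this sketch drives globus-url-copy from Python to push one data set to a storage element; the endpoints, paths and the choice of four parallel streams are placeholders, not values from the talk.

    import subprocess

    # Hypothetical local file and destination storage element.
    src = "file:///data/id11/set001.tar"
    dst = "gsiftp://se.example.org/dpm/example.org/home/xray/set001.tar"

    # -p 4: use four parallel TCP streams, -vb: print the transfer rate while copying.
    subprocess.run(["globus-url-copy", "-p", "4", "-vb", src, dst], check=True)

Whether this is reliable enough without "tuning by gurus" is exactly what such a benchmark would have to show.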

XRAY: A Hybrid Grid
- Set up local optimised clusters with high-performance local file systems and batch submission systems
- Provide Grid access via direct login and grid certificates
- Export data via GridFTP and iRODS
- Use gLite for long jobs or remote batch submission
- Preinstall common XRAY software

HPC-grid versus HEP Grid
High Performance Computing refers to optimised clusters for running CPU-intensive, data-intensive and MPI applications:
- Users have an account on the worker nodes
- File systems and the network are optimised
- Computation is done on multi-core CPUs and GPUs
- Hardware and software are installed and managed locally

XRAY Science and Grid?

Conclusions*
- The EGEE/gLite-based Grid has a steep learning curve and is not user friendly; however, it does seem to work for certain applications
- Our first impression of the EGEE/gLite-based Grid is that it is not suited to XRAY science
- A big thanks to DESY for the help so far, but we need more help if we want to run data-intensive applications on the grid
- A better return on investment for data-intensive applications seems to be investing in HPC, e.g. GPUs and multi-core CPUs
* We reserve the right to change these conclusions in the future 8-)