
Distributed Monte Carlo Production for DØ
Joel Snow, Langston University
DOE Review, March 2011

Outline
- Introduction
- FNAL SAM
- SAMGrid
- Interoperability with OSG and LCG
- Production System
- Production Results
- LUHEP Computing
- Summary

Introduction
- Covers my tenure as MC production coordinator
- Simulation data (MC) is crucial to physics analysis
- Tevatron luminosity, and hence raw data volume, is at record levels: a challenge for analysts and production
- Personnel and computing resources are migrating to the LHC experiments
- DZero strategy:
  - Increase automation
  - Leverage resources and support

Evolution
- Mature experiment, but nimble: a history of adopting innovative technologies
  - Distributed data handling: SAM
  - Early adopter of the grid for production: SAMGrid
  - Significant investment in these technologies
- Grid technology allows opportunistic usage
  - DZero can mix traditional dedicated and opportunistic resources
- Grid interoperability leverages resources and support, and reduces personnel needs per CPU hour

Sequential data Access via Metadata (SAM)
- A Fermilab system first used by DZero; the SAM distributed data handling system predates the grid
- A set of servers working together to store and retrieve files and metadata
- Permanent storage and local disk caches
- A database tracks file locations, file metadata, and job processing history
- Delivers files to jobs (using GridFTP over the WAN) and provides job submission capabilities
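To make the bookkeeping described above concrete, here is a minimal sketch of a catalogue that tracks file metadata and replica locations and hands a delivery URL to a job. Every class, field, and URL in it is hypothetical; this is not the actual SAM schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    """Hypothetical catalogue entry: one stored file and its metadata."""
    name: str
    metadata: dict                                  # e.g. run number, event count, generator version
    locations: list = field(default_factory=list)   # replica URLs: tape and disk caches


class FileCatalogue:
    """Toy stand-in for a SAM-like metadata and location database."""

    def __init__(self):
        self.files = {}

    def declare(self, record: FileRecord) -> None:
        self.files[record.name] = record

    def add_location(self, name: str, url: str) -> None:
        self.files[name].locations.append(url)

    def locate(self, name: str) -> str:
        """Return a delivery URL, preferring a disk-cache replica over tape."""
        locations = self.files[name].locations
        cached = [url for url in locations if "cache" in url]
        return (cached or locations)[0]


catalogue = FileCatalogue()
catalogue.declare(FileRecord("d0_mc_0001.raw", {"run": 1, "events": 250000}))
catalogue.add_location("d0_mc_0001.raw", "gsiftp://tape.example.org/d0/d0_mc_0001.raw")
catalogue.add_location("d0_mc_0001.raw", "gsiftp://cache.example.org/d0/d0_mc_0001.raw")
print(catalogue.locate("d0_mc_0001.raw"))   # file would be delivered to the job over GridFTP
```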

SAMGrid
- A Fermilab-developed grid, first used by DZero for global MC production in 2004
- SAMGrid = SAM + Job and Information Management (JIM) components
- Provides the user with transparent remote job submission, data processing, and status monitoring
- VDT based (Globus + Condor)
- Logically consists of:
  - Multiple execution sites
  - A resource selector
  - Multiple job submission (scheduler) sites
  - Multiple clients (user interfaces) to the submission sites
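A toy sketch of that logical flow is shown below: a client hands a job to a submission site, and a resource selector picks one of several execution sites. The site names, slot counts, and "most free slots" rule are assumptions for illustration, not SAMGrid's actual resource selection logic.

```python
EXECUTION_SITES = {"FZU": 120, "GridKa": 300, "LUHEP": 12}   # hypothetical free job slots

def select_site(sites: dict) -> str:
    """Toy resource selector: choose the execution site with the most free slots."""
    return max(sites, key=sites.get)

def submit(job_name: str) -> str:
    """Toy submission site: route the job and account for the occupied slot."""
    site = select_site(EXECUTION_SITES)
    EXECUTION_SITES[site] -= 1
    return f"{job_name} routed to execution site {site}"

print(submit("d0_mc_request_42"))
```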

SAMGrid Interoperability
- As the Open Science Grid (OSG) and the LHC Computing Grid (LCG) became operational, it was desirable to leverage these resources for DZero
- FNAL and DZero developed and deployed SAMGrid interoperability with both LCG and OSG resources
- The execution site acts as a forwarding node: it packages SAMGrid jobs for OSG/LCG job submission via Condor-G
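The forwarding step can be pictured as generating a Condor-G (grid universe) submit description for the target OSG/LCG gatekeeper. The sketch below is illustrative only: the gatekeeper address, wrapper script, and file names are placeholders, not DZero's actual forwarding code.

```python
def condor_g_submit_description(gatekeeper: str, wrapper: str, request_id: int) -> str:
    """Emit a Condor-G style submit description for one forwarded SAMGrid job.

    'gatekeeper' and 'wrapper' are placeholders; a real forwarding node would
    fill these in from its site configuration and the packaged SAMGrid job.
    """
    return f"""\
universe        = grid
grid_resource   = gt2 {gatekeeper}/jobmanager-condor
executable      = {wrapper}
arguments       = --request {request_id}
output          = mc_{request_id}.out
error           = mc_{request_id}.err
log             = mc_{request_id}.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
"""

print(condor_g_submit_description("osg-ce.example.edu", "run_d0_mc.sh", 4242))
```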

Consolidation, Automation, Exploitation
- SAMGrid sites require operational manpower and expert support
- People power and FNAL support are migrating to the LHC experiments
- Increase automation: AutoMC
- Reduce the number of SAMGrid sites; increase use of OSG and LCG, which
  - comes with its own support
  - provides opportunistic job slots

Production System
- MC production gets work from the SAM Request System
- Physics groups' MC requests are parametrized and prioritized as a Python object
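Since the slide notes that requests are parametrized and prioritized as a Python object, here is a purely illustrative sketch of what such an object might look like; the class name, fields, and example values are assumptions, not the actual SAM Request System definition.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class MCRequest:
    """Hypothetical MC request: physics-group parameters plus a priority.

    Ordering on (priority, request_id) lets a production manager work through
    a queue of approved requests highest-priority-first (lower number = higher).
    """
    priority: int
    request_id: int
    physics_group: str = field(compare=False)
    process: str = field(compare=False)
    n_events: int = field(compare=False)
    generator: str = field(compare=False, default="pythia")

requests = sorted([
    MCRequest(2, 101, "higgs", "ZH->llbb", 1_000_000),
    MCRequest(1, 102, "top", "ttbar", 2_000_000),
])
print([r.request_id for r in requests])   # -> [102, 101]: highest priority first
```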

Automatic Monte Carlo Request Processing
- Developed the AutoMC system, in use at FNAL
- Handles official DZero MC production at all but 2 sites, from approved request to final data storage
- Easy to use: minimizes manpower needs
- Site independent: deployable for any grid site (SAMGrid, OSG, LCG) and capable of managing many sites
- Handles recovery of common failures
- Integrates with the existing MC request priority protocol
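A heavily simplified sketch of such a request-processing loop is shown below; the function names, failure rate, and retry policy are all invented for illustration and are not the actual AutoMC implementation.

```python
import random

MAX_RETRIES = 3   # assumed retry budget for "common" failure modes

def run_job(request_id: int) -> bool:
    """Stand-in for submitting one grid job and waiting for it; ~30% random failures."""
    return random.random() > 0.3

def process_request(request_id: int, n_jobs: int) -> bool:
    """Toy AutoMC-style loop: submit every job, recover common failures, declare done."""
    for _ in range(n_jobs):
        for _attempt in range(1 + MAX_RETRIES):
            if run_job(request_id):
                break                # success: output would be stored via SAM here
        else:
            return False             # job never succeeded: leave the request open
    return True                      # all jobs succeeded: request complete

print("request 4242 complete:", process_request(4242, n_jobs=20))
```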

AutoMC Monitoring
- Running at FNAL and managing production at 39 sites
- Status page: http://www-d0.fnal.gov/computing/mcprod/dajd/dajd_status.html

Production System Resources
- MC production uses a variety of dedicated and opportunistic resources on 4 continents
- Non-grid site at CCIN2P3, Lyon (FR): very productive, flexible
- Native SAMGrid sites: FZU (CZ), GridKa (DE), LUHEP (US), USTC (CN)
- LCG resources: CEs, SEs, and SAMGrid-LCG infrastructure in FR, UK, NL
- OSG resources: CEs, SEs, and SAMGrid-OSG infrastructure in the US

MC Production Results
- Looking back at the last 30 days
- Averaging 5.8M events per day, totaling 172.8M events in 30 days

MC Production Results
- Looking back at the last year (2010/02/14-2011/02/14) and cumulative since September 2005
- Averaging 49M events per week, totaling 2.6B events in a year

MC Production Results
- Looking back at the last year by production segment
- 52-week averages per week (2010/02/14-2011/02/14): Non-grid 19.8M, OSG 11.4M, SAMGrid 12.6M, LCG 4.9M

MC Production Results
- Looking back at the last year by production segment, cumulative since September 2005
- 52-week totals (2010/02/14-2011/02/14): Non-grid 1041M (40.8%), OSG 596M (23.3%), SAMGrid 658M (25.8%), LCG 257M (10.1%)

MC Production Geographic Distribution
- Events last year (2010/02/14-2011/02/14): Europe 1925M (75.4%), N. America 574M (22.5%), Asia 29M (1.1%), S. America 24M (0.9%)

MC Production Results
- Looking back at the last 5.5 years (2005/09/05-2011/02/14), cumulative since September 2005
- Averaging 19.2M events per week, totaling about 5.4B events

MC Production Results
- Looking back at the last 5.5 years by production segment
- 5.5-year averages per week (2005/09/05-2011/02/14): Non-grid 8.0M, OSG 4.8M, SAMGrid 5.3M, LCG 1.1M

MC Production Results
- Looking back at the last 5.5 years by production segment, cumulative since September 2005
- 5.5-year totals (2005/09/05-2011/02/14): Non-grid 2.26B (41.5%), OSG 1.37B (25.2%), SAMGrid 1.51B (27.7%), LCG 306M (5.6%)

Production Results, Last 7 Years
DZero MC production in millions of events, per year ending 12/26:

Year   Total    Non-Grid  SAMGrid  OSG    LCG
2010   2388.5   1011.2    614.8    539.2  223.3
2009   1122.6    540.3    217.9    364.2    0.3
2008    794.8    315.6    213.6    259.7    5.8
2007    398.2    109.1    158.1     96.5   34.4
2006    348.0    144.4    195.5      0.5    7.6
2005     98.1     68.6     29.5      0.0    0.0
2004     42.4     41.8      0.6      0.0    0.0

Production Results, Last 7 Years
DZero MC production in terabytes of data, per year ending 12/26:

Year   Total   Non-Grid  SAMGrid  OSG    LCG
2010   221.0   83.3      61.8     53.7   22.3
2009    95.3   42.7      19.8     32.8    0.0
2008    67.8   26.9      18.4     22.0    0.5
2007    31.6    7.3      13.2      8.2    2.9
2006    23.0    9.4      13.1      0.0    0.5
2005     6.0    4.1       1.9      0.0    0.0
2004     1.9    1.9       0.0      0.0    0.0

OU DZero MC Production
- 2005/09/05-2011/02/14: OUHEP produced 306M events and 28.4 TB of data
- Last year (2010/02/14-2011/02/14): OUHEP produced 139M events and 14.0 TB of data
- (Plots: last year and cumulative since September 2005)

LU DZero MC Production
- 2005/09/05-2011/02/14: LUHEP produced 15.5M events and 1.36 TB of data
- Last year (2010/02/14-2011/02/14): LUHEP produced 4.6M events and 450 GB of data
- (Plots: last year and cumulative since September 2005)

LUHEP Computing
- 2 grid-enabled clusters, both producing DØ MC
- Old SAMGrid cluster: 12 job slots
- New OSG cluster: 12 job slots, with a small associated SE used as a DØ cache

Condor Queues at LUHEP
- (Plots: SAMGrid and OSG Condor queues over the last year)

Summary
- DZero's early deployment of grid technology and automation has dramatically increased MC production:
  - First deployment of the SAM distributed data handling system
  - Early SAMGrid deployment
  - Use of OSG and LCG resources through interoperability with SAMGrid
  - First opportunistic usage of OSG Storage Elements
  - Automated MC production system
- Anticipate adequate MC through the last analysis