MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT

The Monte Carlo Method: Versatility Unbounded in a Dynamic Computing World, Chattanooga, Tennessee, April 17-21, 2005, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005)

J. T. Moscicki
CERN, CH-1122 Geneva (Switzerland)
Jakub.Moscicki@cern.ch

S. Guatelli, M. G. Pia, M. Piergentili
INFN - Sezione di Genova, Via Dodecaneso 33, 16146 Genova (Italy)
Susanna.Guatelli@ge.infn.it, MariaGrazia.Pia@ge.infn.it, Michela.Piergentili@ge.infn.it

ABSTRACT

We show that accurate and fast dose computation for radiotherapy is now achievable by combining Monte Carlo methods with a distributed computing model. Monte Carlo methods have so far not been used in clinical practice because, although they are more accurate than the available commercial software, the calculation time needed to accumulate sufficient statistics is too long for realistic use in radiotherapic treatment. We present a complete, fully functional prototype dosimetric system for radiotherapy that integrates components based on HEP software systems: a Geant4-based simulation, an AIDA-based dosimetric analysis, a web-based user interface, and distributed processing either on a local computing farm or on geographically spread nodes. The performance of the dosimetric system has been studied in three execution modes: sequential on a single dedicated machine, parallel on a dedicated computing farm, and parallel on a grid test-bed. An intermediate software layer, the DIANE system, makes the three execution modes completely transparent to the user, allowing the same code to be used in any of the three configurations. Thanks to the integration in a grid environment, any hospital that cannot afford the high cost of commercial treatment planning software, including small hospitals and hospitals in less wealthy countries, may get the chance to use advanced software for oncological therapy based on Monte Carlo methods by accessing distributed computing resources shared with other hospitals and institutes belonging to the same virtual organization.

Key Words: radiotherapy, Geant4, simulation, distributed computing, parallel execution, GRID

1 TYPICAL REQUIREMENTS OF DISTRIBUTED GEANT4 SIMULATIONS

The requirements of an efficient distributed Geant4 [2] simulation vary significantly with the application area. Medical applications, such as brachytherapy [3], require interactive, almost real-time optimization of the treatment plan during the patient's stay in the clinic. The dominant requirement is that the response time of the Monte Carlo simulation should not exceed a few minutes; on the other hand, the number of simultaneous jobs is small and there is typically one type of application of interest. Monte Carlo simulation is much slower than the analytical approach, albeit more accurate [4]. To achieve a response time comparable to the analytical approach currently used in many clinics (based on commercial software), the Monte Carlo simulation must run in parallel. In clinical practice the computer clusters running such simulations would typically be small, ranging from several to one hundred machines.
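As a rough, back-of-the-envelope illustration (not taken from the paper), a naive master-worker scaling model makes the link between the required response time and the cluster size explicit. With T_seq the single-CPU simulation time, T_init the fixed per-job start-up overhead and T_target the acceptable clinical response time:

```latex
% Illustrative scaling model (an assumption, not a result reported in the paper)
T_{\mathrm{par}}(N) \;\approx\; T_{\mathrm{init}} + \frac{T_{\mathrm{seq}}}{N}
\quad\Longrightarrow\quad
N \;\gtrsim\; \frac{T_{\mathrm{seq}}}{T_{\mathrm{target}} - T_{\mathrm{init}}}
```

With a sequential run of a few CPU-hours, a target of a few minutes and a start-up cost well below the target, this bound lands in the range of tens to about a hundred workers, consistent with the cluster sizes quoted above.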
The requirements of some astrophysics applications are at the other end of the spectrum. Certain Monte Carlo simulations for LISA, a joint ESA-NASA space experiment for measuring gravitational waves, amount to the order of 10 processor-years in fully batch mode [5]. The critical issues in this case are reliable error recovery (to preserve the completed parts of the simulation), monitoring of the progress of the jobs, and traceability of failed worker tasks for debugging purposes. Typically, of the order of 1000 CPUs is required to achieve the expected performance. Computing power of this magnitude may in practice be available through a WAN network of computing elements, also known as a computing GRID.

Semi-interactive applications in a multi-user environment raise further issues. Today's cluster Workload Management Systems (such as LSF [6] and PBS [7]) offer multi-user and multi-application capability with automatic load balancing. Similar features would naturally be required for semi-interactive public clusters, which must be able to scale to hundreds of machines and support hundreds of users running their jobs concurrently, each job consisting of several tens or hundreds of tasks. Clusters dedicated to a specific experiment or group of users would certainly have different usage patterns. This requires a software system that is not only scalable but also flexible: it should be easy to fine-tune and modify the core system features according to the type of scientific application, access patterns, average workload and other factors.

2 DIANE DISTRIBUTED ANALYSIS ENVIRONMENT

Most often developers (as well as users) do not really care about the implementation details of the distributed middleware which runs their distributed application. An application-oriented layer above the generic middleware infrastructure, such as the Distributed Analysis Environment (DIANE) [1], hides the complexity of the underlying distributed technology and provides the additional functionality required for the parallel execution of jobs. DIANE is an R&D study conducted at CERN, focusing on semi-interactive parallel and remote data analysis and simulation. It provides the necessary software infrastructure for parallel scientific applications in the master-worker model, together with advanced error recovery policies, automatic book-keeping of distributed jobs, and on-line monitoring and control tools. DIANE makes transparent use of a number of different middleware implementations, such as load balancing services (LSF, PBS, GRID Resource Broker [8], Condor [9]) and security services (GSI [10], Kerberos, openssh). Applications which use DIANE may run in a variety of different configurations without changes to their source code: whether the application is distributed by a global GRID Resource Broker or by a local Workload Management System matters neither to the application user nor to the developer. It is also easy to modify and adapt a given DIANE configuration to the particular needs of a specific application. This capability is achieved with the layered design of the DIANE framework (Fig. 1).

DIANE has been developed in the context of High Energy Physics applications which, in most cases, follow the simple master-worker computational model. This model is, however, also common in other scientific areas, particularly those which involve event-level parallel Monte Carlo simulation. The difference comes from the non-functional requirements: user access patterns, execution time and data access patterns, CPU load versus I/O load, etc. The design of the DIANE framework allows easy adaptation to different practical scenarios by plugging the appropriate service components into the core system.
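To make the "pluggable middleware" idea concrete, the following minimal sketch shows how a master-worker job can stay unchanged while the mechanism that places the workers, via ssh, an LSF queue, or a grid resource broker, is swapped behind a common interface. This is not DIANE's actual API; all class and function names, the worker script and the JDL file are illustrative assumptions (in DIANE proper these roles are played by its pluggable service components, cf. Fig. 1).

```python
import subprocess
from abc import ABC, abstractmethod

class WorkerBackend(ABC):
    """How worker processes are placed on remote resources."""
    @abstractmethod
    def submit(self, worker_cmd: str) -> None: ...

class SSHBackend(WorkerBackend):
    """Explicit placement: start one worker per host via remote shell."""
    def __init__(self, hosts):
        self.hosts = hosts
    def submit(self, worker_cmd):
        for host in self.hosts:
            subprocess.Popen(["ssh", host, worker_cmd])

class LSFBackend(WorkerBackend):
    """Let the local Workload Management System choose the machines."""
    def __init__(self, queue, n_workers):
        self.queue, self.n_workers = queue, n_workers
    def submit(self, worker_cmd):
        for _ in range(self.n_workers):
            subprocess.run(["bsub", "-q", self.queue, worker_cmd], check=True)

class GridBackend(WorkerBackend):
    """Submission through a grid resource broker (EDG-style job description)."""
    def __init__(self, jdl_file, n_workers):
        self.jdl_file, self.n_workers = jdl_file, n_workers
    def submit(self, worker_cmd):
        for _ in range(self.n_workers):
            subprocess.run(["edg-job-submit", self.jdl_file], check=True)

def run_job(backend: WorkerBackend, worker_cmd: str) -> None:
    """The application-level code is identical whichever backend is plugged in."""
    backend.submit(worker_cmd)

# run_job(SSHBackend(["node01", "node02"]), "start_g4_worker.sh")
# run_job(LSFBackend("medium", 30), "start_g4_worker.sh")
# run_job(GridBackend("g4_worker.jdl", 30), "start_g4_worker.sh")
```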
3 RUNNING GEANT4 WITH DIANE

The following discussion concerns the distribution of a Geant4 simulation at the event level, i.e. treating every event as an independent unit of processing. A typical Geant4 simulation producing AIDA [11] histograms on output was ported to the DIANE environment. Usually a simulation is linked as an executable; for the distributed environment a shared library is built instead, with a well-known entry point that creates an object representing the simulation program. This object implements a very simple abstract interface which allows the simulation to be run and certain parameters (such as the random seed) to be set programmatically. The porting process is straightforward and involves very minimal code changes. The simulation ported in this way is executed in the worker process.

To run a distributed application in parallel one has to know how to partition the job and what to do with the results that the workers produce at the end of the job. This is specified in a separate library which acts as glue between the application itself and the DIANE framework; typically it is written once per type of application. In the case discussed here, the workers' output, i.e. histograms serialized to XML, was simply added up at the end of the job.

Tests were performed using three independent mechanisms:
- ssh-based, explicit placement of distributed workers via remote execution of shell scripts in the local cluster,
- submission of the distributed workers into the LSF queues in the local cluster,
- submission of the distributed workers via the European Data Grid resource broker in the WAN network.

The transition between these three scenarios required only minimal changes to the core of DIANE, which was the original design goal, and no changes to the test application itself. The explicit placement mechanism allows the execution times to be measured accurately.

The initial tests involved a small, interactive cluster of 30 machines (35 machines were used for the largest jobs), deploying a Geant4 simulation distributed at the event level and producing AIDA histograms on output. Average execution times are shown in Fig. 2. As shown in Fig. 3, DIANE achieved around 75% of the theoretical efficiency (measured as the ratio of the speed-up to the number of CPUs used) in this particular test configuration. The benchmarking environment in this test did not exclude other users from logging in interactively to the machines, which partially explains this result. However, great care was taken to remove random errors from the measurements by averaging the results and analyzing the hourly load on the cluster.

Fig. 4 shows the average execution times for a large Geant4 simulation on a dedicated cluster of 15/20 machines. As expected, the performance of the parallel execution is proportional to the number of available CPUs. The performance gain in this case could be even greater if dual-processor machines were exploited fully, i.e. if DIANE automatically allocated two worker processes per machine; this is one of the subjects of current work.
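A minimal sketch of the two pieces described at the beginning of this section, the simulation object exposed by the worker-side shared library and the application-specific glue that partitions the job and merges the workers' XML histograms, might look as follows. All class, method and attribute names are illustrative assumptions, not the actual DIANE or Geant4 interfaces, and the master-worker cycle is replaced by a sequential loop for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

class SimulationProgram:
    """Worker-side view: the shared library's entry point returns an object
    with this minimal interface (set parameters, run, collect output)."""
    def set_random_seed(self, seed: int) -> None:
        self.seed = seed
    def run(self, n_events: int) -> str:
        # In the real application this drives the Geant4 event loop and returns
        # AIDA histograms serialized to XML; here a stub histogram stands in.
        return f'<histogram1d name="dose"><bin x="0" entries="{n_events}"/></histogram1d>'

@dataclass
class Task:
    seed: int
    n_events: int

def partition(total_events: int, n_workers: int) -> list[Task]:
    """Glue, job-definition side: split the event sample into one task per
    worker, giving each worker an independent random seed (remainder omitted)."""
    chunk = total_events // n_workers
    return [Task(seed=1000 + i, n_events=chunk) for i in range(n_workers)]

def merge(xml_outputs: list[str]) -> int:
    """Glue, result-integration side: add up the histograms returned by the
    workers; only the total number of entries is accumulated for brevity."""
    return sum(int(b.get("entries"))
               for xml in xml_outputs
               for b in ET.fromstring(xml).iter("bin"))

# Sequential stand-in for the master-worker cycle:
tasks = partition(total_events=1_000_000, n_workers=4)
outputs = []
for task in tasks:
    sim = SimulationProgram()
    sim.set_random_seed(task.seed)
    outputs.append(sim.run(task.n_events))
print("total entries:", merge(outputs))
```

In the real deployment each task is executed by a separate worker process placed by one of the three mechanisms listed above, and only the serialized histograms travel back to the master for merging.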

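For reference, the "theoretical efficiency" quoted above, the ratio of the speed-up to the number of CPUs used, can be written explicitly. With T_1 the single-CPU execution time and T_N the wall-clock time on N CPUs:

```latex
% Speed-up and normalized worker efficiency as used for Fig. 3
S(N) = \frac{T_1}{T_N}, \qquad
E(N) = \frac{S(N)}{N} = \frac{T_1}{N\,T_N} \approx 0.75 \;\text{in this test configuration.}
```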
4 CONCLUSIONS

We have successfully demonstrated a realistic medical physics use case on the GRID: using a Geant4 simulation and the DIANE framework, we ran a distributed simulation of a large number of events for dosimetry in brachytherapy.

A scalable, application-oriented framework for distributed scientific applications must be flexible enough to meet diverse non-functional requirements. Fine-tuning for such requirements, including the evolution and interchangeability of low-level service components, is an important design goal of DIANE. An early prototype of DIANE has been successfully deployed, and the initial tests in a distributed environment prove that an important speed-up of the end-user simulation/analysis cycle is possible. The paper demonstrates the feasibility of a close integration of Geant4 and DIANE and presents the initial performance benchmark results.

5 REFERENCES

1. Distributed Analysis Environment, DIANE R&D Project, http://cern.ch/diane
2. Geant4, http://cern.ch/geant4
3. M. Tropeano et al., "Leipzig Applicators Monte Carlo Simulations: Results and Comparison with Experimental and Manufacturer's Data," summary in [4].
4. M. G. Pia et al., "Overview of bio-medical applications," http://www.ge.infn.it/geant4
5. A. Howard et al., "Experiences of Submitting UKDMC and LISA GEANT4 Jobs," http://www.gridpp.ac.uk/gridpp4
6. Load Sharing Facility, CERN pages, http://wwwinfo.cern.ch/pdp/lsf
7. Portable Batch System, http://www.openpbs.org
8. European Data Grid, http://www.edg.org
9. Condor, http://www.cs.wisc.edu/condor
10. GSI: Grid Security Infrastructure, http://www.globus.org/security
11. AIDA: Abstract Interfaces for Data Analysis, http://aida.freehep.org
Figure 1: Overview of DIANE's layered architecture (logical processing model, physical middleware model, component model, and physical deployment topology): applications are loaded dynamically and called back through the IWorkPlanner/IIntegrator interfaces, while middleware services such as load balancing (LSF, grid, Condor-G), authentication (AFS, SSL) and environment reconstruction are implemented as pluggable modules whose physical location can be fine-tuned by the system administrator.

Figure 2: Average execution times for a large Geant4 simulation using explicit worker placement with ssh on an interactive cluster of 30 machines (35 for the largest jobs); time [minutes] versus number of events [M], compared with the approximate sequential execution time and the constant initialization time.
Figure 3: Normalized, average worker efficiency for a large Geant4 simulation using explicit worker placement with ssh on the interactive cluster; normalized worker efficiency versus number of events [M].

Figure 4: DIANE parallel job comparison benchmark (Geant4 X-ray fluorescence emission): average execution times using explicit worker placement with ssh on a dedicated cluster of 15/20 machines, compared with a single-CPU run and an extrapolated single-CPU run; time (seconds) versus number of simulated events.