e-Science for High-Energy Physics in Korea


Journal of the Korean Physical Society, Vol. 53, No. 2, August 2008, pp. 1187-1191

Kihyeon Cho
e-Science Applications Research and Development Team, Korea Institute of Science and Technology Information, Daejeon 305-806
E-mail: cho@kisti.re.kr; Fax: +82-42-869-0789

(Received 21 August 2007)

e-Science for high-energy physics aims to let us study high-energy physics (HEP) "any time and anywhere," even when we are not on-site at accelerator laboratories. The components of the project are data production, data processing, and data publication, which can be accessed any time and anywhere. We report our experiences on the integration and the utilization of e-science for high-energy physics, especially for the ALICE (A Large Ion Collider Experiment) and the CDF (Collider Detector at Fermilab) experiments. We discuss both the general idea and the current implementations of data processing, production, and publication.

PACS numbers: 29.85.+c, 07.05.Bx
Keywords: Grid, e-science, Data process, High-energy physics

I. INTRODUCTION

Science includes many challenging problems that require large resources and, in particular, knowledge from many disciplines [1]. e-Science is a new research and development paradigm for science: computationally intensive science carried out in highly distributed network environments [2]. e-Science uses immense data sets that require grid computing [2]. The goal of e-science is to do any research "any time and anywhere."

The main goal of HEP (high-energy physics) is to understand the basic properties of elementary particles and their interactions. HEP experiments are usually conducted at major accelerator sites, where experimentalists perform detector design and construction, signal processing, data acquisition, and data analysis on a large scale [3]. The size of a collaboration is 100 to 2000 physicists. HEP requires a particularly well-developed e-science infrastructure because it needs adequate computing facilities for the analysis of results and the storage of data originating from the LHC (Large Hadron Collider) at CERN (European Organization for Nuclear Research) [2]. To perform computing at the required HEP scale, we need strong data grid support. Therefore, HEP is one of the best applications for e-science.

The goal of e-science for high-energy physics is to study HEP "any time and anywhere," even if we are not at accelerator laboratories (on-site). The components are 1) data production, 2) data processing, and 3) data publication, all accessible any time and anywhere, even off-site. First, data production means taking data and shifts from anywhere, even when we are not on-site, by using a remote operation center or a remote control room. Second, data processing means processing data by using a HEP data grid; the objective of a HEP data grid is to construct a system to manage and process HEP data and to support the user group (i.e., high-energy physicists) [4]. Third, data publication allows collaborations around the world to analyze and publish the results by using collaborative environments.

II. E-SCIENCE FOR HIGH ENERGY PHYSICS

1. Overview

The goal of the ALICE (A Large Ion Collider Experiment) experiment at CERN is to study the physics of strongly interacting matter at extreme energy densities, where the quark-gluon plasma is expected. If computing is to be performed at the required PByte scale, a grid is a strong requirement.
For this work, we have assembled LCG (LHC Computing Grid) farms at the KISTI (Korea Institute of Science and Technology Information). Based on the grid concept with VOs (virtual organizations) [5], we use this farm not only for the ALICE experiment but also for the CDF (Collider Detector at Fermilab) experiment.
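Serving two experiments from one farm relies on VO-based authorization: a user authenticates with a proxy carrying the attributes of their experiment's VO, and the site maps ALICE and CDF members onto the shared resources. As a rough illustration only (not taken from this paper), the sketch below shows how a submission script of that era might have checked for a valid VOMS proxy before sending work; the command-line tools are the standard gLite/VOMS ones, while the helper function itself is hypothetical.

```python
import subprocess

def ensure_voms_proxy(vo: str, min_hours: int = 2) -> None:
    """Create a VOMS proxy for the given VO unless a sufficiently long-lived one exists.

    Illustrative sketch: assumes the standard gLite/VOMS clients
    (voms-proxy-info, voms-proxy-init) are installed and configured for the VO.
    """
    # voms-proxy-info -timeleft prints the remaining proxy lifetime in seconds.
    check = subprocess.run(["voms-proxy-info", "-timeleft"],
                           capture_output=True, text=True)
    if check.returncode == 0 and int(check.stdout.strip() or 0) > min_hours * 3600:
        return  # existing proxy is still good
    # Otherwise create a new proxy carrying the VO attributes
    # (this prompts for the grid certificate passphrase).
    subprocess.run(["voms-proxy-init", "--voms", vo], check=True)

# One farm, two experiments: the same site accepts work from either VO.
ensure_voms_proxy("alice")   # ALICE jobs
ensure_voms_proxy("cdf")     # CDF jobs
```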

The goal of the CDF experiment at Fermilab is to discover the identities and the properties of the particles that make up the universe and to understand the interactions between those particles [6]. The CDF experiment began its Run II phase in 2001 [7]. The CDF computing model is based on the concept of the CDF Analysis Farm (CAF) [8]. To date, Run II has gathered more than 2 fb^-1 of data, equivalent to 3.0 × 10^9 events or a couple of PByte of data. In order to process the data, CDF has been investigating shared computing resources, such as the DCAF (Decentralized CDF Analysis Farm). After the DCAF in Korea [4], we now run CDF jobs at a grid farm.

Table 1 shows the components of e-science for the ALICE and the CDF experiments. We explain each component in the following sections.

Table 1. Components of e-science for high-energy physics for the ALICE and the CDF experiments.

Component        | ALICE Experiment                    | CDF Experiment
Data Production  | N/A                                 | Remote Operation Center
Data Processing  | ALICE Tier-2 center using LCG farm  | Pacific CAF (CDF Analysis Farm) using LCG farm
Data Publication | EVO (Enabling Virtual Organization) | EVO (Enabling Virtual Organization)

2. Data Production

Usually, we take data on-site, where the accelerators are located. However, in the spirit of e-science, we would like to take data from anywhere. One method is to use a remote operation center; an example is the remote operation center at Fermilab in the USA used to operate the LHC experiments at CERN. We have constructed a remote CDF operation center at the KISTI, which will enable CDF users in Korea to take CO (consumer operator) shifts at the KISTI in Korea rather than at Fermilab in the USA.

3. Data Processing

Figure 1 shows the architecture of data processing for e-science for high-energy physics. The main infrastructure of data processing for e-science consists of network and computing resources.

Fig. 1. Architecture of data processing for e-science for high-energy physics.

The first is the network. KREONET (Korea Research Environment Open NETwork) is a national research and development network operated by the KISTI and supported by the Korean government. KREONET is connected to GLORIAD (Global Ring Network for Advanced Applications Development), a 10-Gbps ring network connecting Korea to Hong Kong and the United States. The KISTI is directly peered with CERN via a 10-Gbps network.

The second is computing resources. For current and future HEP activities with large-scale data, the HEP data grid is indispensable. We need to maintain a mass storage system of hard disks and tapes in a stable state. If the HEP data are to be made transparent, CPU power should be extendable and accessible. Transparent HEP data means that the data can be analyzed even if the high-energy physicists using them do not know the actual source of the data [1].

The middleware resource is LCG (LHC Computing Grid), and the applications are the ALICE and the CDF VOs. We have installed and operated the EGEE middleware of LCG at the KISTI site, which has been approved as an EGEE-certified site. The current LCG farm at the KISTI consists of 30 nodes (30 CPUs) and 2 TByte of storage.
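To make the data-processing path concrete, the sketch below outlines how a user-level job could be described and handed to the LCG workload management of that period. It is a minimal illustration under stated assumptions, not the site's actual submission machinery: the JDL attributes follow the standard LCG-2/gLite conventions, but the executable, sandbox files, Requirements expression, and the choice of submission client (edg-job-submit on LCG-2, glite-wms-job-submit on later gLite releases) are assumptions.

```python
import subprocess
from pathlib import Path

# Minimal JDL (Job Description Language) file in the style used by the
# LCG-2/gLite Resource Broker; the executable and sandbox names are made up
# for illustration.
JDL = """\
Executable    = "run_analysis.sh";
Arguments     = "dataset.list";
StdOutput     = "job.out";
StdError      = "job.err";
InputSandbox  = {"run_analysis.sh", "dataset.list"};
OutputSandbox = {"job.out", "job.err"};
VirtualOrganisation = "cdf";
Requirements  = other.GlueCEPolicyMaxWallClockTime > 720;
"""

def submit(jdl_text: str, jdl_path: str = "analysis.jdl") -> None:
    """Write the JDL and submit it through the era's command-line client."""
    Path(jdl_path).write_text(jdl_text)
    # On LCG-2 the client was edg-job-submit; later gLite releases used
    # glite-wms-job-submit -a. Which one applies depends on the middleware
    # version actually deployed at the site.
    subprocess.run(["edg-job-submit", "--vo", "cdf", jdl_path], check=True)

if __name__ == "__main__":
    submit(JDL)
```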

4. Data Publication

For data publication, we host the EVO (Enabling Virtual Organization) server system at the KISTI so that high-energy physicists in Korea may use it directly without going through reflectors in the USA. The EVO server at the KISTI therefore enables collaborations around the world to do analysis and to publish the results together easily.

III. RESULTS

For data production, we have constructed a remote control room for CO operation at the KISTI. The CDF control room at Fermilab consists of many sections; one of them is the monitoring section that checks the quality of the data. We call it CO, and it does not directly affect the control of the data acquisition. Everything on the CDF CO desktop is available at remote sites. The 'Event Display' and the 'Consumer Slide' are provided as 'Screen Snap Shots' through a web site. The status and logs of the 'Consumer Monitors' and the 'Detector Check List' are also available through a web site. A 'Consumer Controller' can be sent to the remote site. Communication with the shift crews in the CDF control room is done by using the EVO system.

For data processing, we have constructed the LCG farm at the KISTI for the ALICE experiment. Based on the grid concept [5], we have also constructed a CDF VO besides the ALICE VO. Moreover, a significant fraction of these resources is expected to be available to CDF during the LHC era. CDF uses several computing processing systems, such as the CAF, the DCAF, and a grid system. The movement to a grid at the CDF experiment is part of a worldwide trend among HEP experiments. Table 2 shows the progress of the CDF grid.

Table 2. Progress of the CDF grid.

Name | Starting date | Grid middleware | Job scheduling | Content | Site
CAF | 2001 | - | Condor | Cluster farm inside Fermilab | USA (Fermilab)
DCAF | 2003 | - | Condor | Cluster farm outside Fermilab | Korea, Japan, Taiwan, Spain (Barcelona, Cantabria), USA (Rutgers, San Diego), Canada, France
Grid CAF | 2006 | LCG / OSG | Resource Broker + Condor | Grid farm | North America CAF, European CAF, Pacific CAF
CGCC (CDF Grid Computing Center) | 2008 (plan) | LCG | Resource Broker + Condor | Big grid farm | Korea (KISTI), France (IN2P3), Italy (CNAF)

For the LCG farm at the KISTI, we use the method of 'Condor over Globus', which is a grid-based production system. The advantage of this method is its flexibility [9]. Table 3 shows a comparison of the Grid CAFs.

Table 3. Comparison of Grid CAF (CDF Analysis Farm).

Grid CAF | Head node | Work node | Grid middleware | Method | VO (Virtual Organization) Server
North America CAF | Fermilab (USA) | UCSD (USA), etc. | OSG | Condor over Globus | CDF (Collider Detector at Fermilab) VO
European CAF | CNAF (Italy) | IN2P3 (France), etc. | LCG | WMS (Workload Management System) | CDF VO
Pacific CAF | AS (Taiwan) | KISTI (Korea), etc. | LCG, OSG | Condor over Globus | CMS (Compact Muon Solenoid) VO / ALICE (A Large Ion Collider Experiment) VO
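'Condor over Globus' refers to Condor-G style submission, in which Condor manages the local job queue while Globus GRAM hands each job to the remote site's gatekeeper. As a rough, hypothetical illustration of the idea (not the Pacific CAF's actual configuration), a grid-universe Condor submit description might look like the sketch below; the gatekeeper address and file names are placeholders.

```python
import subprocess
from pathlib import Path

# A grid-universe ("Condor over Globus") submit description in Condor-G style:
# Condor keeps the job in its local queue, and Globus GRAM (gt2) forwards it to
# the remote gatekeeper. The gatekeeper address and file names are placeholders.
SUBMIT = """\
universe      = grid
grid_resource = gt2 ce.example-site.org/jobmanager-condor
executable    = run_analysis.sh
arguments     = dataset.list
transfer_input_files    = dataset.list
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output        = job.out
error         = job.err
log           = job.log
queue
"""

def submit_condor_g(text: str, path: str = "analysis.sub") -> None:
    """Write the submit description and hand it to the local Condor scheduler."""
    Path(path).write_text(text)
    subprocess.run(["condor_submit", path], check=True)

if __name__ == "__main__":
    submit_condor_g(SUBMIT)
```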

We have made a federation of the LCG and OSG (Open Science Grid) farms at AS (Academia Sinica) in Taiwan and the LCG farm at the KISTI; we call this federation of grid farms the 'Pacific CAF'. The head node of the 'Pacific CAF' is located at AS. The 'Pacific CAF' has now become a federation of grid farms between one LCG farm at the KISTI (KR-KISTI-GCRT-01) in Korea and four LCG and OSG farms (IPAS OSG, Taiwan-LCG2, Taiwan-NCUCC-LCG2, and TW-NIU-EECS) in Taiwan. Figure 2 shows the web monitoring system of the 'Pacific CAF', including the status of the work nodes at the KISTI. The KORCAF (Korea CAF) farm at Kyungpook National University in Korea and the JPCAF (Japan CAF) farm at the University of Tsukuba in Japan will be linked to the 'Pacific CAF'.

Fig. 2. Monitoring system of the Pacific CAF (CDF Analysis Farm).

In conclusion, we have constructed an LCG farm for data processing for both the ALICE and the CDF experiments. The farm has been running since May 6, 2007. Figure 3 shows the results of the operation of the LCG farm at the KISTI in 2007 from the monitoring systems [10]. The average successful job rate at the LCG farm at the KISTI is 94 %, where the successful job rate is defined as the number of successful jobs divided by the total number of submitted jobs. The results show stable performance of the system.

Fig. 3. Results of the operation of the LCG farm at the KISTI.

For data publication, we have constructed the EVO servers at the KISTI. When users in Korea use the EVO servers at the KISTI, the routing time is reduced by 60 msec and the congestion of the network inside the USA is avoided, which provides a very stable research environment. Figure 4 shows the monitoring system of the EVO, which is connected with the servers at the KISTI.

Fig. 4. Monitoring system of the EVO (Enabling Virtual Organization).

IV. CONCLUSION

HEP is one of the best applications of the e-science projects in Korea. The goal of e-science for high-energy physics is to study the ALICE and the CDF experiments any time from within Korea. We have shown the implementation of data production, data processing, and data publication for that purpose. For data processing, we have succeeded in installing and operating the LCG farms for both experiments based on the grid concept.

ACKNOWLEDGMENTS

We would like to thank Hyunwoo Kim, Minho Jeung, Beobkyun Kim, Jae-Hyuck Kwak, and Soonwook Hwang (Korea Institute of Science and Technology Information) for data processing at the LCG (LHC Computing Grid) farm; Yuchul Yang (Kyungpook National University), Mark Neubauer, and Frank Würthwein (University of California at San Diego) for the DCAF (Decentralized CDF Analysis Farm); and Igor Sfiligoi (Fermilab) and Tsan Lung Hsieh (Academia Sinica) for the Pacific CAF (CDF Analysis Farm).

REFERENCES

[1] K. Cho, Comp. Phys. Comm. 177, 247 (2007).
[2] The definition of e-science is from the Wikipedia page: http://en.wikipedia.org/wiki/e-science.
[3] S. K. Park, J. Park, S. Lee, S. H. Ahn, S. J. Hong, T. J. Kim, K. S. Lee, M. Begel, T. Andeen, H. Schellman, J. Stein, S. Yacoob, E. Gallas, V. Sirotenko, J. Hauptman and G. Snow, J. Korean Phys. Soc. 49, 2247 (2006).
[4] K. Cho, Internat. J. of Comp. Sci. and Network Sec. 7, 49 (2007).
[5] I. Foster, C. Kesselman and S. Tuecke, Internat. J. of High-Performance Computing Appl. 15, 200 (2001).
[6] D. Gerdes, J. Korean Phys. Soc. 45, S268 (2004).
[7] G. P. Yeh, J. Korean Phys. Soc. 45, S301 (2004).
[8] M. Neubauer, Nucl. Instrum. Meth. A 502, 386 (2003).
[9] I. Sfiligoi, Comp. Phys. Comm. 177, 235 (2007).
[10] Monitoring systems: http://pcalimonitor.cern.ch/reports/; http://gridportal.hep.ph.ic.ac.uk/rtm/.