Office of Nuclear Physics Report Review of the Compact Muon Solenoid (CMS) Collaboration Heavy Ion Computing Proposal May 11, 2009
Evaluation Summary Report

The Department of Energy (DOE), Office of Nuclear Physics (NP) completed its review on May 11, 2009 of the proposal received from the U.S. Compact Muon Solenoid (CMS) Heavy Ion (HI) Collaboration (CMS HI) for funding of substantial computing resources to be located at the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, and at the Bates Computing Facility at the Massachusetts Institute of Technology (MIT). The Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) will accelerate heavy ion beams for a period of one month each year. The proposal under review described the computing capabilities that will be needed to store, process, and analyze the heavy ion collision data from the CMS experiment.

Computing for the LHC program is organized in a hierarchy of computing facilities, starting with the single Tier-0 facility at CERN, which receives data directly from the experiments, to numerous Tier-3 end user analysis stations. After initial processing at CERN, data is distributed to large computer centers (Tier-1) around the world, with sufficient storage capacity for a large fraction of the data, and support for the computing grid. Tier-1 centers make the data available to Tier-2 centers for performing specific analysis tasks.

The CMS HI collaboration presented a plan in which the CERN Tier-0 facility will allocate sufficient processing power to perform the real-time detector calibrations, transfer the raw (RAW) data to a dedicated Tier-1 facility at Vanderbilt University (VU), and archive a backup of the RAW data. The RAW data will be processed at VU and converted to reconstructed (RECO) data. The RAW and RECO components, stored together as FEVT data, will form the primary archive. Vanderbilt University will also serve as a Tier-2 facility, where users (Tier-3) will be able to access the processed data for analysis.
This computing model appears to depart from the CERN specifications and requirements which are described in the CMS Computing Technical Design Report (TDR) referenced in the CMS HI proposal.

In addition to the reconstruction of the RAW data, resources were requested for the production and analysis of Monte Carlo (MC) simulation data. Presently, these activities are performed at the High Energy Physics (HEP) CMS Tier-2 computing center at MIT, but on a smaller scale. The proposal discussed two options for satisfying the simulation requirements. The first and preferred option provided a dedicated small computing cluster sited at the MIT Bates Computing Facility. In the second option, the simulation work is consolidated with the Tier-1/2 computing facility at VU, but managed by the MIT collaborators. The cost differential between these two alternatives was shown to be minimal.

The panel recognized the significance and merit of the proposal, in that a dedicated computing resource, roughly of the scope proposed, is essential for the success of the U.S. CMS HI effort.
The strong institutional interest of Vanderbilt University is clearly expressed by the availability of infrastructure at the ACCRE facility and the contributed support of 8.25 FTEs over a 5-year period. The CMS HI Monte Carlo simulations facility in the first option would be housed in a new computing facility at the MIT Bates center. Both institutions are clearly willing to contribute significant resources towards the construction and operation of the respective facilities. The panel noted that most of the simulation studies requested in the 2006 DOE Science Review report were performed at the MIT center and that MIT has experience hosting a CMS Tier-2 center for the high energy physics (HEP) group since 2005.

However, the panel believes the CMS HI proposal requires further development and elaboration in several crucial areas. Significant comments were made concerning the integration of the VU computing component within the computing framework of the CMS collaboration and ACCRE, performance and technical specifications, management and operations, workforce requirements, and formal agreements. The main issues that need to be addressed include:

Justification of the formal performance requirements of the proposed CMS HI computing center(s) that integrates the CERN Tier-0 obligations, the service level requirements including end users, and the resources of non-DOE participants and other countries.

The analysis and storage model articulated in the CMS HI Collaboration proposal differs from the one described in the CMS Computing Technical Design Report (TDR¹). In the TDR, heavy ion data would be partially processed at the CERN Tier-0 center during the heavy ion run. The remainder would be processed at the CERN Tier-0 center during the 4-6 month LHC downtime and possibly at one or more dedicated Tier-2 centers.
It appears that the CERN facility is the primary custodian of the first copy of the RAW and RECO data, since no distinction is drawn by CERN between proton and heavy ion data. On computing capacity, the TDR specifies that the combined capacity of the Tier-0 and Tier-1 centers will be sized such that each year, 3 complete re-processing passes of the RAW data can be completed. At the review presentation, the VU computing center was sized to allow for 1.5 full reconstruction passes per year. A consistent computing strategy and a plan that specifies and integrates services relating to CERN needs to be developed.

The relationship between DOE NP-supported grid resources (VU and MIT) for heavy ion research and the grid resources available to the larger CMS HEP collaboration needs to be clarified. Also, the formal arrangements with respect to NP-pledged resources to the CERN Worldwide LHC Computing Grid (WLCG) need to be defined.

¹ Document referenced in the proposal as CERN-LHCC-2005-023
The US CMS HI computing resources should be driven by technical specifications that are independent of specific hardware choices. Well-established performance metrics should be used to quantify the needs in a way that can be mapped to any processor technology the market may be offering over the course of the next few years. Expected improvements of the CPU processor cost/performance ratio suggest that the requested budget for CPU hardware could be high.

The management, interaction and coordination model between the Tier-2 center(s) and Tier-3 clients is not well formulated. It will be important to document that user institutions will have sufficient resources to access the VU computing center and how the use of the facility by multiple U.S. and international clients will be managed.

ACCRE should articulate its plans and facility investments needed to support the development of a custom infrastructure suited to the needs of the US CMS HI collaboration and, conversely, to what extent the US CMS HI collaboration will need to adapt to the existing ACCRE infrastructure (e.g. the use of L-Store at Vanderbilt, when other CMS computing centers use dCache).

The CMS HI proposal provided insufficient information to allow the panel to make a clear recommendation regarding the one- and two-site computing center options. A case should be made beyond stating that a two-center solution might result in more access to opportunistic compute cycles, and that the simulation expertise existing in the MIT CMS HI group should not be lost. Based on (near) cost equality of the two solutions, there was no strong argument for the one-site solution. As the lead institution for the CMS HI simulation effort, it might be appropriate for MIT to divide responsibilities between two sites provided a well-defined, fully integrated computing plan is presented.

A detailed plan for external oversight of the responsiveness and quality of operation of the computing center(s) should be developed.
Draft Memoranda of Understanding (MoUs) should be prepared between the appropriate parties, separately for the VU and the Bates Computing Facilities, that clearly define management, operations, and service level responsibilities.

CMS HI should further develop its computing operations model, indicating how it will handle allocations, priority allocation of temporarily available CPU capacity to DOE NP-supported programs, grid support, etc. If assumptions are being made regarding CMS HEP resources (e.g. grid support), these should be clarified.

The size of the workforce associated with data transport, production, production re-passes, calibration and Monte Carlo simulation efforts, and the general challenges of running in a grid environment should be carefully examined and documented.
The Tier-1 quality-of-service (QoS) requirements were specifically developed for the HEP LHC experiments, but they might be relaxed for the CMS HI effort in view of its spooled approach to the grid. CMS HI and the WLCG are encouraged to examine the possibility of tailoring the Tier-1 requirements to reflect the needs of the U.S. heavy ion research community.

DOE Recommendations

The US CMS HI Collaboration is requested to resubmit its computing proposal, together with separate documents if necessary, responding to the concerns expressed in the reviewers' reports, for further evaluation. The resubmission is due to the DOE Office of Nuclear Physics by December 31, 2009.
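Two of the sizing concerns raised in this summary are arithmetic in nature: the TDR's 3 re-processing passes per year versus the 1.5 passes for which the VU center was sized, and the request that needs be stated in hardware-independent performance units. The following is a minimal illustrative sketch in Python, not part of the proposal or of the panel's analysis. The class and function names, the "benchmark unit" abstraction, and every numeric value are hypothetical placeholders chosen only to show the form of the calculation.

```python
import math
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass(frozen=True)
class ReconstructionLoad:
    """Hypothetical description of one year's RAW reconstruction workload."""
    events_per_year: float          # RAW events recorded per heavy ion run (placeholder)
    unit_seconds_per_event: float   # benchmark-unit-seconds to reconstruct one event (placeholder)


def required_capacity(load: ReconstructionLoad, passes_per_year: float,
                      duty_factor: float = 1.0) -> float:
    """Sustained capacity, in benchmark units, needed to complete the given
    number of full reconstruction passes within one year."""
    if not 0.0 < duty_factor <= 1.0:
        raise ValueError("duty_factor must be in (0, 1]")
    total_work = load.events_per_year * load.unit_seconds_per_event * passes_per_year
    return total_work / (SECONDS_PER_YEAR * duty_factor)


def passes_supported(load: ReconstructionLoad, capacity: float,
                     duty_factor: float = 1.0) -> float:
    """Number of full reconstruction passes per year a given capacity allows."""
    work_per_pass = load.events_per_year * load.unit_seconds_per_event
    return capacity * SECONDS_PER_YEAR * duty_factor / work_per_pass


def cores_needed(capacity: float, units_per_core: float) -> int:
    """Map a hardware-independent requirement onto a specific processor
    generation, characterized only by its benchmark rating per core."""
    return math.ceil(capacity / units_per_core)


if __name__ == "__main__":
    # All values below are placeholders, not figures from the proposal.
    load = ReconstructionLoad(events_per_year=1.0e8, unit_seconds_per_event=100.0)
    tdr_capacity = required_capacity(load, passes_per_year=3.0)
    vu_capacity = required_capacity(load, passes_per_year=1.5)
    print(f"VU / TDR capacity ratio: {vu_capacity / tdr_capacity:.2f}")
    # The same requirement mapped onto two hypothetical processor generations.
    for units_per_core in (8.0, 16.0):
        print(units_per_core, cores_needed(tdr_capacity, units_per_core))
```

Because the requirement is held in benchmark units, only `units_per_core` changes when a newer processor generation is considered. This is the property the panel asks for when it requests specifications that are independent of specific hardware choices.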
Appendix A: Charge Memorandum

Dear Professor/Dr.:

Thank you for agreeing to participate as a member of a committee to review the proposal received from the U.S. Compact Muon Solenoid Heavy Ion (US CMS HI) Collaboration, requesting support from the Office of Nuclear Physics for substantial computing resources. This review will take place at the Department of Energy (DOE) Headquarters in Germantown on May 11, 2009.

To maintain the best possible program in nuclear physics, it is essential for us to obtain the most highly qualified technical opinions on this proposal. Your contribution is important in this regard, and we welcome your critical evaluation of the US CMS HI computing proposal. In particular, we are interested in your considerations of:

a) The significance and merits of the proposed computing plan for the US CMS HI collaboration;
b) The completeness and feasibility of the US CMS HI computing plans considering such factors as networking and infrastructure support;
c) The cost effectiveness of the proposed plans and the appropriateness of the size of the requested budget;
d) A critical evaluation of the two alternative solutions; and
e) The resources and interest of the institution(s) at which the computing center(s) might be located, and the contributions of non-DOE supported US CMS HI institutions.

In your evaluations, you should address whether the funding requests are explicitly tied to accomplishment of annual and long-term facility performance goals. Does the proposed plan incorporate independent and quality evaluations of sufficient scope and quality conducted on a regular basis to ensure optimal utilization of the US CMS HI computing center(s)? Has the proposal conducted a credible analysis of alternatives that include trade-offs between cost, schedule, and performance goals? Please feel free to make comparisons with existing computing facilities or similar proposals with which you are familiar.
The results of this review should establish the scientific need for the computing capabilities, and in turn, clearly defined deliverables, tasks, and capability/performance facility parameters necessary to assure that the science can be accomplished during the first five years of Large Hadron Collider Heavy Ion operations.

This computing review will form the second session of a one-day review, comprising presentations followed by executive discussions and report writing, and a brief close-out. The review will be chaired by Dr. Gulshan Rai, Program Manager for Heavy Ion Nuclear Physics, assisted by Dr. Helmut Marsiske, Program Manager for Instrumentation. You will be asked to write individual letter reports on your evaluation. Your letter report will be held in strictest confidence, so please be candid in your written remarks. Your letter reports will be due to Dr. Rai one week after the conclusion of the review.

An agenda and background material will be sent to you in a later correspondence. If you have any questions about the review, please contact Dr. Rai at (301) 903-4702 (Email: Gulshan.Rai@science.doe.gov). For logistics questions, please contact Brenda May at (301) 903-0536 or Email: Brenda.May@science.doe.gov.

I greatly appreciate your efforts in preparing for this review. It is an important process that allows our office to understand the scientific need for the computing resources. I look forward to a very informative and stimulating review.

Sincerely,

Eugene A. Henry
Acting Associate Director of the Office of Science for Nuclear Physics

Enclosure

cc: Boleslaw Wyslouch, MIT
    Richard G. Milner, MIT
Appendix B: Agenda and List of Reviewers

May 11, 2009
Department of Energy, Germantown Headquarters, Maryland
Room E301 Conference Room

8:30 am    Executive Session
9:00 am    Heavy Ion Physics with CMS: Physics Plans and Preparations    G. Roland (<40 min)
10:00 am   Trigger and Data Acquisition    C. Roland (<30 min)
10:45 am   US Participation in CMS Heavy Ion program    B. Wyslouch (<20 min)
11:30 am   Break + Executive Session + Lunch
1:30 pm    Computing facility for CMS Heavy Ions in the US    C. Maguire (<40 min)
2:30 pm    Vanderbilt Computing Center ACCRE    A. Tackett (<20 min)
3:20 pm    MIT Computing Center at Bates    B. Wyslouch (<20 min)
3:20 pm    Break
3:30 pm    Q&A, homework questions
3:50 pm    Executive Session + Report Writing
6:30 pm    End of Review
CMS HEAVY ION REVIEW
May 11, 2009

Review Panel Members

Dr. Michael Ernst, Building 510M, Brookhaven National Laboratory, Upton, NY 11973-5000, (631) 344-4223, mernst@bnl.gov

Dr. Anthony Frawley, Department of Physics, Florida State University, MC: 4350, Tallahassee, FL 32306, (850) 644-4034, frawley@fsulcd.physics.fsu.edu

Dr. Timothy Hallman, Physics Department, Brookhaven National Laboratory, Building 510A, Upton, NY 11973-5000, (631) 344-7420, hallman@bnl.gov

Prof. Itzhak Tserruya, Department of Particle Physics, Weizmann Institute of Science, Rehovot 76100, Israel, 972-8-934-4052, itzhak.tserruya@weizmann.ac.il

Dr. William (Chip) Watson, Thomas Jefferson National Accelerator Facility, 12000 Jefferson Avenue, Mail Stop 16A, Newport News, VA 23606, (757) 269-7101, watson@jlab.org

DOE Participants

Dr. Eugene A. Henry, Office of Nuclear Physics, U.S. Department of Energy, SC-26/Germantown Building, 1000 Independence Avenue, Washington, DC 20585-1290, (301) 903-6093, Gene.Henry@science.doe.gov

Dr. Helmut Marsiske, Office of Nuclear Physics, U.S. Department of Energy, SC-26.2/Germantown Building, 1000 Independence Avenue, Washington, DC 20585-1290, (301) 903-0028, Helmut.Marsiske@science.doe.gov

Dr. Gulshan Rai, Office of Nuclear Physics, U.S. Department of Energy, SC-26.1/Germantown Building, 1000 Independence Avenue, Washington, DC 20585-1290, (301) 903-4702, Gulshan.Rai@science.doe.gov

Dr. Hubert van Hecke, Office of Nuclear Physics, U.S. Department of Energy, SC-26.1/Germantown Building, 1000 Independence Avenue, Washington, DC 20585-1290, (301) 903-8363, Hubert.VanHecke@science.doe.gov