Data Processing and Analysis Requirements for CMS-HI Computing


Charles F. Maguire, version of August 21

Executive Summary

The annual bandwidth, CPU power, data storage, and tape archiving requirements for CMS-HI in the U.S. for 2008 through 2011 are calculated according to certain assumptions described in the main text. The 2011 calendar year is taken to be the point when the computing center is operating at its design goals. The suggested milestones for a separate HI compute center are:

Bandwidth and archive specification for a HI compute center functioning as Tier 1 + Tier 2
    Input: 0.5 GBits/s by the end of 2008, growing to 2.5 GBits/s by 2011, 2 months/year
    Output: 0.4 GBits/s by the end of 2008, growing to 2.0 GBits/s by 2011, year round operation
    Raw data archiving: 60 TBytes by the end of 2008, growing to 300 TBytes by 2011, done in one month's time

Annual CPU: scaled to a 2011 single-pass reconstruction power of about 3.4 x 10^6 SpecInt2K, tabulated per year as the number of CPUs, the total power (SpecInt2K), the available time, and the MC computing, real computing available, and real computing needed (all in SpecInt2K-seconds).

Annual Disk Storage: tabulated per year as the MC, RAW, RECO, AOD, and PWG data plus user volumes, and the total (all in TBytes).

Annual Tape Archiving: the raw data are archived in one month, the rest over the course of each year; tabulated per year as the MC, RAW, RECO, AOD, and PWG volumes and the cumulative total (all in TBytes).

While it may be possible for the FermiLab Tier 1 center to carry out some of these functions for CMS-HI, it appears that a separate HI compute center is the most practical solution.

1 Introduction

This document summarizes the data processing and analysis requirements for a CMS-HI computing center in the U.S. These requirements are developed in large measure from the information given in the tables presented by Olga Barannikova at the DOE review committee meeting on October 24. Additional guidance came from the discussion at the CMS-HI-US computing committee VRVS meeting on August 15, and from the phone conference with the MIT CMS-HEP computing model contact held the following day.

This document does not go into detail about various infrastructure requirements: electric power, air conditioning capacity, and technical staffing. It is assumed that the four sites which have expressed interest in hosting the CMS-HI-US computing center (Iowa, MIT, UIC, and Vanderbilt) will specify those infrastructure requirements according to the demonstrated experience at already working, large computing centers. Similarly, an annual budget is not provided. The guidelines appear to be 1) $ K per year for the first four years to cover both capital costs and operations, and 2) $250K per year after the first four years to cover operations and maintenance.

2 Standard CMS Computing Model for p + p Data

The DAQ system for the CMS detector is specified to write 225 MBytes/s of RAW data onto buffer disks in the CERN Tier 0 center for both p + p and heavy ion running. Subdetector alignment and calibration data will be made available for prompt processing of the raw data at the Tier 0. After the RAW data have been processed into what are called RECO data, these files will be written to write-only tapes at the Tier 0. These tapes are intended only as emergency archives, and will not normally be read for additional processing.

Subsets of the RAW and RECO data will be transferred to the seven Tier 1 institutions. These subsets are arranged according to physics content. For example, sets with electron data will go to Tier 1 site X, sets with muon data will go to Tier 1 site Y, sets with jet data will go to Tier 1 site Z, and so on. I presume that these event sets are identified either on-line or as part of the reconstruction process. There are not enough resources at the Tier 1 centers to have duplicates of these data sets. The Tier 1 centers will have their own archives of the RAW data which they receive.

Additionally, the Tier 1 centers will derive AOD (Analysis Object Data) files from the RECO files. The AOD files are an order of magnitude smaller than the RECO files, and are intended as the primary input data for physics analysis at the Tier 2 centers. The AOD event format is for the most part fixed, but different event sets may have slightly varying AOD event types. Each Tier 1 site will get a copy of all the AOD files from the other Tier 1 sites, such that any Tier 2 site coupled to a given Tier 1 site can in principle have access to any AOD file. I assume this last specification is subject to disk space limitations at the Tier 2 sites. Lastly, the Tier 1 sites will reprocess the RAW data files, with better calibrations and possibly better algorithms, at least once per year to produce new sets of RECO and AOD files. The Tier 1 sites are responsible for all tape archiving of their respective data files.

3 Raw Data Volume in Heavy Ion Operations

The effective running time for p + p is said to be 10^7 seconds per year, and 10^6 seconds for heavy ion operations.
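As a quick orientation, the annual RAW data volumes implied by these live times and the 225 MBytes/s DAQ rate quoted in Section 2 can be worked out in a few lines of Python. This is only an illustrative sketch of the arithmetic carried through in the rest of this section, not part of any CMS software; the variable names are mine.

    # Annual RAW data volume at the nominal CMS DAQ rate (illustrative only).
    DAQ_RATE_MB_PER_S = 225.0                         # MBytes/s onto Tier 0 buffer disks

    live_time = {"p+p": 1.0e7, "heavy ion": 1.0e6}    # effective seconds per year

    for mode, seconds in live_time.items():
        volume_tb = DAQ_RATE_MB_PER_S * seconds / 1.0e6   # MBytes -> TBytes
        print(f"{mode}: {volume_tb:,.0f} TBytes of RAW data per year")
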
In the CMS computing TDR (2005), a design luminosity for heavy ion running is quoted (in cm^-2 s^-1), to be reached only after an initial period of operation. In this document, I take the start of heavy ion operations to be in 2009, and the ramp-up to full data taking is assumed to occur over three years as 20%, 40%, and 100%.

These are also the ramp-up numbers being assumed in the ALICE computing model for assembling their computing resources.

From the MIT phone conversation of August 16, the conclusion was that there would not be prompt reconstruction of CMS-HI RAW data at the CERN Tier 0 site. Apparently the ALICE experiment will have first priority for any CPU cycles at the Tier 0 which are not being used for reconstruction by the CMS, ATLAS, and LHCb experiments during the heavy ion running. Possibly there would be resources available at the Tier 0 for developing the initial calibration and alignment files needed for the first heavy ion reconstruction pass. Otherwise, that information would have to be developed at another computer center. At a minimum we would expect that the RAW data from HI running would be written to the emergency backup tapes at the CERN Tier 0.

From Olga's document the average RAW data event size will be taken as 4.5 MBytes. At the nominal DAQ rate of 225 MBytes/s for 10^6 seconds there would be 225 TBytes of RAW data per year, corresponding to 50 million events. Allowing for the calibration and alignment data, and possibly other file types, we can conservatively round up the volume of HI files coming from CERN to 300 TBytes/year. In the 20%, 40%, and 100% ramp-up scenario, this would mean volumes of 60, 120, and 300 TBytes/year for 2009, 2010, and 2011, respectively. [1]

4 RAW Data Archive and Reconstruction Siting Solutions

During the August 16 phone meeting there was an extended discussion of where the RAW data files should be reconstructed, and re-reconstructed. The original idea was that the RAW data files would first be sent to the FermiLab Tier 1 for archiving, and then sent to the heavy ion computing center for initial reconstruction. However, since a re-reconstruction will be necessary, this would likely mean two file transfers from FermiLab, as well as one reading of the RAW data tapes at FermiLab. The only way to avoid the second file transfer would be to have a dedicated 300 TBytes of disk space for RAW storage at the HI computing center, in addition to whatever other disk space was needed there. [2] The extra transport of files from FermiLab to the HI computing center was characterized as an undesirable new dependency, meaning a new component in the CMS computing model which would have to be supported with software and other resources.

An alternate solution would be to add CPU, disk, and tape resources to the FermiLab site to serve as a full-fledged Tier 1 center for the HI program. [3] Actually, since the FermiLab Tier 1 would then be doing two reconstructions of the HI RAW data, it would be going beyond what it does for the p + p data. This solution would still require a HI computing center which functioned as a regular Tier 2 site for the purposes of receiving AOD files from FermiLab and generating physics analyses. The conclusion at the August 16 phone meeting was that it would be prudent to consult with the FermiLab Tier 1 on whether this solution would be practical. We would have to know the costs of going this route for the Tier 1 and also of supporting a separate HI computing center as a Tier 2 site.

A second solution to the problem of needing two RAW data file transfers from the FermiLab Tier 1 site is to bypass FermiLab entirely.
[1] If the actual data volumes in the first two years are much closer to the third year volume, then we would not be able to process all of those data with the computing power available in those years under any linearly staged budget plan. Possibly the data could be archived to tape for future processing when more CPU became available.

[2] Reserving 300 TBytes of disk for files which will be read only twice per year seems extravagant.

[3] A variant of this solution would be to have either RCF, or the US-ALICE associated NERSC or LLNL, as the CMS-HI Tier 1 site. Effectively this would be a contracting out of the reconstruction computing to a non-CMS but still CERN-linked large computing center. Such non-CMS sites would be required to maintain the standard CMS software.

Instead, the HI computing center would function as both a Tier 1 and a Tier 2 center. As a Tier 1 center it would receive the RAW data and other support files from CERN. These files would be archived locally as they were received. The first and (time-separated) second reconstruction passes would both be done at this one site. To save time, some fraction of the first pass could be done while the RAW data were on buffers awaiting transfer to the tape archive. The remainder of the first pass would be done by reading from the tape archive. The second pass input would come entirely from the tape archive. There would be no need to reserve 300 TBytes of disk space dedicated to maintaining the RAW data files themselves as long as there was a local tape archiving facility.

A critical element in this second solution is the bandwidth capability into the HI computing center. By definition the FermiLab Tier 1 center solution should be expected to have the needed bandwidth; this assumes that the extra input load of the HI RAW data is not excessive compared to the normal input of the p + p RAW data. Any proposed HI computing center, in the second solution model, would have to meet the specification of accepting 300 TBytes and archiving it locally in one month's time. The minimum input bandwidth is easily calculated as

    I = 3 x 10^8 (MBytes) / [30 (days) x 24 (hours/day) x 3600 (seconds/hour)] ≈ 120 MBytes/s ≈ 1 GBit/s

While in the first two years of operation one might be able to make do with less than 1 GBit/s, there is no question that by the third year of operation the HI compute center must have an average input acceptance of 1 GBit/s. In order to accommodate fluctuations in the data rate or other exigencies, the actual capacity should be significantly greater, say a factor of 2.5 higher, meaning 2.5 GBits/s. By comparison, the CMS computing TDR quotes the incoming bandwidth requirement in p + p data for a Tier 1 center to be 7.2 GBits/s. A major difference between a p + p Tier 1 facility and the proposed HI compute facility is that the input bandwidth requirement need be sustained over only 1-2 months for the HI data, but over at least 7 months for the p + p data.

A separately functioning HI compute center must archive the input RAW data to tape. Ideally, the archiving to tape will proceed as rapidly as the data arrive. If the HI compute center were to dedicate 300 TBytes of disk space to the incoming data, then the archiving could proceed at a slower rate. In the more likely case where only a limited buffer area, say 50 TBytes, is dedicated to the RAW data, the archiving would have to be more rapid. In this model some of the RAW data might be processed immediately in a first pass, while the first pass for the remainder of the data would read from the tape archive. The second reconstruction of the data would also read its RAW input from the tape archive. [4]

The CMS computing TDR quotes the outgoing bandwidth requirement in p + p data for a Tier 1 center to be 3.5 GBits/s. If the HI compute center is to be a combined Tier 1 and Tier 2 facility, there is no obvious output bandwidth requirement for Tier 1 functions regarding the HI data. For a Tier 2 center, the CMS specification for p + p data is at least 1 GBit/s, with some Tier 2 centers hoped to be as large as 10 GBits/s. The output capacity is intended to serve the Tier 3 computing facilities located at the institutions participating in the CMS-HI program.
To be conservative, we can specify that the HI compute center must be rated at 2 GBits/s for output. This output capacity must be maintained for the entire year.

[4] There could be a question of how long data on the tape archive should be retained. Given the demands of processing the newest data, it becomes less likely that one will want to process data more than, say, 3 years old with the available CPUs. Perhaps those data could be transferred to a less expensive archive medium.
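The input-bandwidth arithmetic above can be summarized in a short Python sketch. It simply re-derives the numbers quoted in Sections 3 and 4 (a 300 TByte annual volume archived in one month, and a factor 2.5 headroom on the rounded 1 GBit/s figure); the variable names are mine and the snippet is not part of any CMS tool.

    # Minimum input bandwidth for archiving one year of HI RAW data in one month
    # (illustrative re-derivation of the estimate in Section 4).
    total_tb = 300.0                    # annual HI volume from CERN, rounded up
    month_s = 30 * 24 * 3600            # seconds in the one-month archiving window

    input_mb_per_s = total_tb * 1.0e6 / month_s       # ~116 MBytes/s
    input_gbit_per_s = input_mb_per_s * 8.0 / 1000.0  # ~0.93 GBits/s

    print(f"Archiving {total_tb:.0f} TB in one month needs about "
          f"{input_mb_per_s:.0f} MBytes/s, i.e. roughly {input_gbit_per_s:.1f} GBit/s")
    print("With a factor 2.5 headroom on the rounded 1 GBit/s figure: 2.5 GBits/s")
    for year, frac in [(2009, 0.2), (2010, 0.4), (2011, 1.0)]:
        print(f"  {year}: about {frac * total_tb:.0f} TBytes to archive")
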

Bandwidth and archive specification for a HI compute center functioning as Tier 1 + Tier 2
    Input: 0.5 GBits/s end 2008, growing to 2.5 GBits/s by 2011, 2 months/year
    Output: 0.4 GBits/s end 2008, growing to 2.0 GBits/s by 2011, year round operation
    Archiving: 60 TBytes end 2008, growing to 300 TBytes by 2011, done in one month's time

5 Reconstruction, Analysis and Simulation Requirements

5.1 Reconstruction Passes

For the reconstruction time of an average event, Olga essentially quoted a time of 621 seconds on a 900 SpecInt2K processor, leading to an integrated 556 KSpecInt2K-seconds per event. The 621 seconds was in turn derived from old ORCA simulations for which the quoted time was 20 minutes; perhaps the 621 seconds assumed an eventual factor of two improvement. In any case, the conclusion of both the August 15 VRVS meeting and the August 16 phone meeting was that we would not obtain any new or more believable information on the CPU time per event using the CMSSW framework for several more weeks. Moreover, even such a new number would be subject to code-profiling tools in order to squeeze out all the less efficient coding. So the decision was to retain the 556 KSpecInt2K-seconds number for the purpose of setting the reconstruction requirements in this document. In the CMS language, this discussion involves the process of producing RECO events from RAW events.

For nominal year operations at design HI luminosity we expect 50 million events. Hence the integrated CPU time to reconstruct these events is

    R = 5 x 10^7 (events) x 5.56 x 10^5 (SpecInt2K-seconds/event) = 2.78 x 10^13 SpecInt2K-seconds

We would want the first reconstruction pass, and the re-reconstruction pass as well, to each take no more than 4 months (T) apiece. That would leave 4 more months total for analysis of each year's data. We will assume an effective 0.80 utilization rate of the CPUs, to account for I/O overhead, nodes which are down, and other outage sources during the reconstruction period. With such a utilization factor, the required compute power P_RD for raw data reconstruction can be calculated as

    P_RD = R / (0.8 T) = 2.78 x 10^13 (SpecInt2K-seconds) / [0.8 x 120 (days) x 24 (hours/day) x 3600 (seconds/hour)]
    P_RD ≈ 3.4 x 10^6 SpecInt2K

In order to calculate the number of CPUs required to achieve this value of P_RD, we assume that the HI computing center will be assembled over a 4 year period, with equal numbers of nodes being purchased each year in order to take advantage of Moore's Law. The first purchase year would be mid-calendar 2008. The remaining three years would see the ramp-up of the RAW data volume as 20%, 40%, and 100% of the final nominal year volume. We can calculate the total number of CPUs required using the following conservative assumptions:
1) Equal numbers of compute nodes are purchased each year for four years, starting sometime in mid-2008. For definiteness, we assume that the nodes are single quad-CPUs, although one can get the same CPU number result in terms of dual quad-CPUs.
2) We assume that the first purchase year has CPU processors rated at S = 1900 SpecInt2K.
3) We assume that each year there is a 25% growth in power per CPU processor for the same cost per CPU, in some currently valid approximation of Moore's Law.

With these assumptions, the only unknown is the number of CPUs to be purchased in each of the four years. It is easy to work out [5] that the number of CPUs to be purchased each year under the above assumptions is 320. After four years, the total number of CPUs would be 1280, and these CPUs would total about 3.5 x 10^6 SpecInt2K in overall CPU power, enough to meet the goal of one reconstruction pass in four months. The analysis and simulation goals will be discussed in the following two sections. Whether these 1280 CPUs are in single quad-CPU units or dual quad-CPU units is a second order question to be answered in terms of cost/performance comparisons and networking tests.

If these assumptions are placed into a spreadsheet, it is straightforward to change some of the parameters to see the effect on the total number of CPUs. For example, if one wanted to be more optimistic about the Moore's Law growth, and assumed say 1.40 instead of 1.25, then the required number of CPUs at the end of four years would be 1040, corresponding to buying 65 quad-CPU nodes per year. Similarly, if one retained the 1.25 Moore's Law parameter but took an initial CPU processor power of 2300 instead of 1900, then the target CPU power would also be reached after four years with 65 quad-CPU nodes per year.

5.2 Real Data Analysis Passes

Each reconstruction pass will yield RECO and AOD type event files, with the AOD files a factor of 10 smaller than the RECO files. The CMS model is that the AOD files will serve as input for the vast majority of analysis projects. Possibly a subset of the RECO files will be used for additional verification of the analysis methods or for specialized analyses.

In her presentation, Olga had the real data analysis requirement as 0.25R in the first year of data taking, and 0.5R in the second and third years of data taking. Here, R represents one pass of reconstruction over the nominal year data, 2.78 x 10^13 SpecInt2K-seconds. For this document I will be a little more conservative and state that the real data analysis requirement R_A is 0.5R in all three data taking years. This means R_A = 1.39 x 10^13 SpecInt2K-seconds. Furthermore, in the simplest approach, this analysis will be carried out in two passes of two months each, one after each of the four-month reconstruction passes [6], with each of these two-month analysis passes requiring R_A/2 = R/4 of computing time. Thus the net value of the SpecInt2K-seconds has been reduced by a factor of four compared to a reconstruction pass, while the time to complete the analysis pass has been reduced by a factor of two. Trivially then, each analysis pass will require one-half the CPU power of each reconstruction pass. Hence the power for each analysis pass, symbolized as P_AD, is

    P_AD = 0.5 P_RD ≈ 1.7 x 10^6 SpecInt2K

Moreover, this means that during each two-month analysis pass the computer farm will have half of its power available for simulation purposes.

5.3 Simulation Analysis

For the simulation analysis, Olga assumed that we would be simulating and reconstructing a number of events equal to 5% of the real data in the nominal data production years. The CPU time to produce and analyze a simulation event is taken to be twenty times the time to reconstruct a real data event. So the simulation requirement per nominal year is 0.05 x 20 x R = R = 2.78 x 10^13 SpecInt2K-seconds.

[5] See the spreadsheet at maguirc/cms-hi/computingrequirementsexcel.xls, which also allows for different input parameter assumptions.
[6] One can modify this approach to have partial overlaps of the reconstruction and analysis processing, but that will not change the total required SpecInt2K-seconds.
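To keep the power requirements of Sections 5.1-5.3 in one place, the following Python sketch re-derives R, P_RD, P_AD, and the annual simulation requirement from the numbers quoted in the text. It is an illustration under the stated assumptions, not the spreadsheet referenced in footnote 5, and the variable names are mine.

    # Illustrative CPU power budget for a nominal data-taking year.
    SECONDS_PER_DAY = 24 * 3600

    events_per_year = 50.0e6      # nominal year of HI RAW data
    reco_per_event  = 556.0e3     # SpecInt2K-seconds to reconstruct one event
    utilization     = 0.80        # effective CPU utilization during a pass
    pass_days       = 120         # one reconstruction pass lasts about 4 months

    R    = events_per_year * reco_per_event                  # ~2.78e13 SpecInt2K-s
    P_RD = R / (utilization * pass_days * SECONDS_PER_DAY)   # ~3.4e6 SpecInt2K
    P_AD = 0.5 * P_RD                                        # per 2-month analysis pass

    sim_fraction   = 0.05         # MC sample equal to 5% of the real data
    sim_cost_ratio = 20.0         # one MC event costs ~20x one reconstruction
    R_sim = sim_fraction * sim_cost_ratio * R                # equals R per year

    print(f"R = {R:.3g} SpecInt2K-s, P_RD = {P_RD:.3g} SpecInt2K, "
          f"P_AD = {P_AD:.3g} SpecInt2K, simulation need = {R_sim:.3g} SpecInt2K-s")
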

In the above time division of functions, there are four months in which simulations could be sharing the farm with the analysis jobs. Since the analysis jobs will exhaust R/2, this means that only half of the specified simulations can be carried out in these four months. In order to complete the remaining R/2 of simulations, we have to add CPUs which can be dedicated to simulations for the entire year. Under the same numerical assumptions used for the assembly of the computer farm, a corresponding number of additional CPUs would be needed to carry out the remainder of the simulations.

5.4 Comparison of HI Computing Center CPU Installation with Projected Needs

The four year plan for purchasing the HI Computing Center CPUs, in comparison with the assumed simulation and real data computing needs in each of these years, is shown in the following table (columns: Year; CPUs; Total Power (SpInt2K); Available Time; MC Computing, Real Computing Available, and Real Computing Needed, in SpInt2K-seconds), with the year-by-year assumptions noted below.

a) In 2008 the CPUs are assumed to arrive in mid-year. The desired number of simulation events for a thorough study would be the nominal year amount, but there is not enough first-year CPU power to accomplish that goal.
b) In 2009 the real data are assumed to arrive at 20% of the nominal year amount, meaning 10 million events. There will be 500 thousand more MC events in this year.
c) In 2010 the real data are assumed to arrive at 40% of the nominal year amount, meaning 20 million events. There will be 1 million more MC events in this year.
d) In 2011 the real data are assumed to arrive at the nominal year amount, meaning 50 million events. There will be 2.5 million more MC events in this year.

In 2008, when there are no data, the computing will be all simulations, largely done for the purpose of perfecting the reconstruction and analysis algorithms to be applied to the next year's real data. Since it is doubtful that the HI computing center will be installed and operational before June 2008, I am allowing for only half a year's production. This means that there will be at most only about 18 million MC events processed instead of a desired 50 million events. Hence, the table shows a computing deficit for 2008.

For 2009, when 20% of the nominal year RAW data is assumed to arrive, there is an apparent surplus of computing. However, this surplus will be easily expended in making up for the missing simulation events from the first year, or in doing more than two passes on the real data. For 2010, there is again a similar surplus forecast, equal to 40% of the installed CPU power. If the real data volume is 60% of the nominal year value instead of 40%, then this surplus will largely disappear. Nonetheless, we may decide to defer purchases in 2010 into 2011 in order to take advantage of any further price reductions. Finally, in 2011, which is the year when the design goal of 50 million RAW events is expected, the available computing power and the projected needed computing power are matched to within 4% by design of this plan.
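The four-year purchase plan and the parameter variations discussed in Section 5.1 can be checked with a few more lines of Python. This is only a sketch in the spirit of the spreadsheet mentioned in footnote 5 (not that spreadsheet); the function name and its default arguments are mine.

    # Total installed power after four equal annual CPU purchases, with the
    # per-CPU rating growing each year (Moore's Law approximation).
    def total_power(cpus_per_year, first_year_rating=1900.0, growth=1.25, years=4):
        return sum(cpus_per_year * first_year_rating * growth**i for i in range(years))

    P_RD = 3.35e6   # SpecInt2K needed for one 4-month reconstruction pass (Sec. 5.1)

    baseline = total_power(320)          # 320 CPUs/year -> 1280 CPUs after 4 years
    print(f"Baseline plan: {baseline:.3g} SpecInt2K installed (need {P_RD:.3g})")

    # Variations quoted in the text: both reach the target with 260 CPUs/year
    # (65 quad-CPU nodes per year, 1040 CPUs in total).
    print(f"Growth 1.40:         {total_power(260, growth=1.40):.3g} SpecInt2K")
    print(f"Initial rating 2300: {total_power(260, first_year_rating=2300.0):.3g} SpecInt2K")
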

6 Disk Storage Requirement for the HI Computing Center

The HI Computing Center will need significant disk storage for three purposes:
1) Input buffers for the incoming RAW and other data files sent by the CERN Tier 0 while, and immediately after, the HI data are acquired.
2) Resident disk areas for the MC production and analysis. One does not want to be repeatedly replaying MC production off tape, at least in the initial few years.
3) Resident disk areas for the real data analysis output.

Items 2) and 3) will also be archived to tape when appropriate, but we want to be working off disk areas as much as possible in the first two or three years, instead of having to wait for tape accesses. In addition to these major disk uses, individual analyzers or Physics Working Groups (PWGs) will need space for developing their software. These disk areas will be much smaller than the disk areas which accommodate large volumes of MC or real data production. However, these individual and PWG areas will need to be backed up on a daily basis. That backup service is a separate cost which will eventually have to be counted, or else donated.

In the first year of operation (2008) the needs will be primarily MC driven, although we should be practicing the transfer of files from the Tier 0 in volume amounts approaching what we expect in 2009. So some of the item 1) disk space should be installed.

6.1 New Forecasts of Disk Storage

As with predicting CPU power requirements, the prediction of disk requirements involves making assumptions which cannot be made with good precision several years in advance. My table of anticipated disk needs for the first four years of operation (columns: Year; MC; RAW; RECO; AOD; PWG Data+User; Total, all in TBytes) is summarized below, followed by the assumptions being made in each year.

In 2008, I am assuming that the MC production will be about 1/3 of the nominal year production, in line with the number of CPUs available in that (half) year. This amounts to about 875,000 events. Each of those simulation events, as produced by GEANT4, is assumed to be five times the size of a raw event; this factor of five includes the extra simulation information which goes into producing a simulated RAW event. So MC production, without further reconstruction, is forecast to take about 26 TBytes.

The third column in the above table shows the volume of the RAW data buffer. In the first year, I am assigning 20 TBytes of disk space on which to practice the transfer of files from the CERN Tier 0. That 20 TBytes corresponds to 1/3 of the initial data year's (2009) anticipated volume.

In the fourth column for the first year I have the RECO output corresponding only to the 875K MC events as input, since there are no real data in the first year. I allow for three passes of the MC data to produce three versions of RECO output.

The fifth column shows the AOD output. According to Olga's presentation, one scales the AOD output by a factor 0.20 (= 0.3/1.5) compared to the RECO data. The sixth column shows the analysis output from the AOD input. On an annual basis I simply put in the PWG output as two times the AOD input. We may do more than two passes over each AOD set, but we can decide to retain on disk only the two most recent sets of results in a given year. However, I keep accumulating the analysis output from the prior years. The last column shows the total of columns 2-6 with an extra 5% to account for user software areas which are backed up.

In the second operations year, 2009, I assume that enough additional CPU is available that we can run the full amount (2.5 million) of nominal year MC events. In the third and fourth years of operation, I keep the same amount of MC disk space, assuming that we can reuse, or replace, the prior year simulations. Similarly, the RAW data input buffer volumes for all three data years in column three are kept fixed at 50 TBytes. As mentioned earlier in this document, we should not plan on reserving dedicated space for all the RAW data if that space will be read only twice per year. The 50 TBytes represents 1/6 of the nominal year RAW data input, which should be an adequate buffer for archiving to tape.

The RECO column for the first data year, 2009, assumes that we will keep all of that year's RAW data reconstruction (RECO) output on disk. In addition, we will be keeping all the MC reconstruction output on disk. The CMS TDR says that the RECO outputs will not normally be used for physics analysis, but we can plan on using them in the first year to assist in developing better analysis algorithms, which are supposed to be based only on the smaller AOD data format. In 2010 I am assuming that we keep only half the produced real RECO output on disk (one pass), along with the same amount of MC reconstruction output as in 2009. The RECO disk area remains the same in 2010 as in 2009 because the assumption is that there will be twice as much real data in 2010 as in 2009. Lastly, in 2011, when the real data volume should go up by a factor of 2.5 compared to 2010, I am again assuming that we keep only one RECO pass on disk. The amount of RECO disk space in 2011 jumps accordingly, compared to 2010.

6.2 Comparison with Previous Forecast for Disk Storage

The previous forecast for disk storage was contained in the lower figure on page 9 of Olga's presentation. For 2011, Olga's figure has about 1.5 PBytes as the predicted disk storage, as compared with the 385 TBytes shown in the above table. Our computer center charges $700 per TByte of purchased disk space, so the 1.5 PBytes would cost approximately one million dollars to buy.

So why is there such a huge difference between Olga's prediction and mine? The major difference is that Olga's amount is a cumulative one, meaning that disk space used in previous years is never overwritten and new space is constantly being added for each next year's work. In my model, last year's disk space tends to get overwritten with this year's needs, or re-used in the case of MC data, except for the final analysis output. If one were to sum all the annual numbers in my model it would come to about 750 TBytes, half of what Olga showed. This re-writing in my model may be too optimistic an assumption.
On the other hand, with a small group like CMS-HI-US, can we really expect to be looking at all three years of previous files in the fourth year? There may not be enough CPU power to do that, even if we had the human resources. It should also be realized that the previous year's output will have been archived to tape. A user could re-read that information from tape onto the local (non-networked) disks of the compute nodes, which do not need to be counted in the above table. So one could have a certain amount of prior-year data being accessed on a node-by-node basis, which may be sufficient for the intended purposes.

A second, smaller difference is that Olga might be assuming that the entire accumulated amount of RAW and RECO data is being kept on disk. In my model, I am assuming that even in a single year we never store all the RAW data on disk, and we only keep fractions of the RECO data on disk, relying primarily on the smaller AOD data set. A third difference with Olga's presentation is that she had multiple AOD outputs per RECO output. Since the AOD events are a defined subset of the RECO events, I did not see why multiple AOD outputs would be necessary. It is possible that these multiple AOD outputs were accounting for what I have in the multiple PWG outputs.

Ultimately the price of disk space will be a seriously painful constraint. In this respect, I point out the current numbers from the PHENIX experiment, which also tends to recycle disk space in subsequent years. The equivalent RAW data input in PHENIX for 2007 was over 600 TBytes, at least a factor of two more than we expect to get from CMS in nominal years. The available disk space in PHENIX is also about 600 TBytes. So in this respect, the final year total of 385 TBytes almost scales with what PHENIX presently has. On the other hand, ALICE is requesting 10.2 PBytes of transient disk storage, and that is also based on non-cumulative one-year needs. ALICE must account for the needs of its p + p program, storage at 4 Tier 1 sites (7.5 PBytes), and an unspecified number of Tier 2 sites (2.6 PBytes). Nonetheless, a factor of 26 difference between what ALICE claims to need and what CMS-HI claims to need as transient disk space is hard to reconcile.

7 Tape Archiving Requirement

The tape archiving requirement follows rather directly from the disk storage requirements discussion; the corresponding table has columns Year; MC; RAW; RECO; AOD; PWG; Cumulative (all in TBytes). The major difference from the disk table is that the RAW and RECO requirements must include the total amount of data, not the partial amounts which are being stored on disk. Also, a cumulative total is given in the final column instead of just the annual amounts, since the annual cost of storage will be based on the cumulative number. Naturally, after two years we will have enough experience to adjust all of these predicted numbers.

8 Summary

Specifications have been presented for the needed data bandwidths, CPU processing power, data storage, and tape archiving capabilities for CMS-HI computing. The most practical solution appears to be a separate, dedicated HI computing center functioning as a combined Tier 1 and Tier 2 facility. The CPU power numbers derived here are comparable to what has previously been shown, within the uncertainties of the various input parameters. The disk storage and tape archiving requirements are a factor of two smaller than previously shown, but the numbers here may be closer to what can be realistically budgeted in the near future.


More information

Spanish Tier-2. Francisco Matorras (IFCA) Nicanor Colino (CIEMAT) F. Matorras N.Colino, Spain CMS T2,.6 March 2008"

Spanish Tier-2. Francisco Matorras (IFCA) Nicanor Colino (CIEMAT) F. Matorras N.Colino, Spain CMS T2,.6 March 2008 Spanish Tier-2 Francisco Matorras (IFCA) Nicanor Colino (CIEMAT) Introduction Report here the status of the federated T2 for CMS basically corresponding to the budget 2006-2007 concentrate on last year

More information

CERN and Scientific Computing

CERN and Scientific Computing CERN and Scientific Computing Massimo Lamanna CERN Information Technology Department Experiment Support Group 1960: 26 GeV proton in the 32 cm CERN hydrogen bubble chamber 1960: IBM 709 at the Geneva airport

More information

Technology Watch. Data Communications 2 nd Quarter, 2012

Technology Watch. Data Communications 2 nd Quarter, 2012 Technology Watch Data Communications 2 nd Quarter, 2012 Page 1 Commercial Version Technology Watch DCCC August 2012 Table of Contents 1.0 Introduction... 1 2.0 Internet Traffic Growth Analysis... 1 3.0

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

Power of the Portfolio. Copyright 2012 EMC Corporation. All rights reserved.

Power of the Portfolio. Copyright 2012 EMC Corporation. All rights reserved. Power of the Portfolio 1 VMAX / VPLEX K-12 School System District seeking system to support rollout of new VDI implementation Customer found Vblock to be superior solutions versus competitor Customer expanded

More information

Tracking the future? Orbcomm s proposed IPO

Tracking the future? Orbcomm s proposed IPO Tracking the future? Orbcomm s proposed IPO This week has seen the announcement of a proposed IPO by Orbcomm, seeking to raise up to $150M, and potentially marking the first IPO by one of the mobile satellite

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Understanding Managed Services

Understanding Managed Services Understanding Managed Services The buzzword relating to IT Support is Managed Services, and every day more and more businesses are jumping on the bandwagon. But what does managed services actually mean

More information

Keeping the lid on storage

Keeping the lid on storage Keeping the lid on storage Drive significant cost savings through innovation and efficiency Publication date: December 2011 Optimising storage performance & costs through innovation As the compute power

More information

Deferred High Level Trigger in LHCb: A Boost to CPU Resource Utilization

Deferred High Level Trigger in LHCb: A Boost to CPU Resource Utilization Deferred High Level Trigger in LHCb: A Boost to Resource Utilization The use of periods without beam for online high level triggers Introduction, problem statement Realization of the chosen solution Conclusions

More information

YOUR CONDUIT TO THE CLOUD

YOUR CONDUIT TO THE CLOUD COLOCATION YOUR CONDUIT TO THE CLOUD MASSIVE NETWORKS Enterprise-Class Data Transport Solutions SUMMARY COLOCATION PROVIDERS ARE EVERYWHERE. With so many to choose from, how do you know which one is right

More information

Module 15 Communication at Data Link and Transport Layer

Module 15 Communication at Data Link and Transport Layer Computer Networks and ITCP/IP Protocols 1 Module 15 Communication at Data Link and Transport Layer Introduction Communication at data link layer is very important as it is between two adjacent machines

More information

Computing. DOE Program Review SLAC. Rainer Bartoldus. Breakout Session 3 June BaBar Deputy Computing Coordinator

Computing. DOE Program Review SLAC. Rainer Bartoldus. Breakout Session 3 June BaBar Deputy Computing Coordinator Computing DOE Program Review SLAC Breakout Session 3 June 2004 Rainer Bartoldus BaBar Deputy Computing Coordinator 1 Outline The New Computing Model (CM2) New Kanga/ROOT event store, new Analysis Model,

More information

Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction

Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction Precision Timing in High Pile-Up and Time-Based Vertex Reconstruction Cedric Flamant (CERN Summer Student) - Supervisor: Adi Bornheim Division of High Energy Physics, California Institute of Technology,

More information

Next Generation Backup: Better ways to deal with rapid data growth and aging tape infrastructures

Next Generation Backup: Better ways to deal with rapid data growth and aging tape infrastructures Next Generation Backup: Better ways to deal with rapid data growth and aging tape infrastructures Next 1 What we see happening today. The amount of data businesses must cope with on a daily basis is getting

More information

TSM Offsite installation

TSM Offsite installation CNS Internal Project Charter Allen Rout asr@ufl.edu 1. Background Computing and Network Services (CNS) provides a campus-wide "Network Storage and Archive Management" (NSAM) service implemented with the

More information

Automatic Format Generation Techniques For Network Data Acquisition Systems

Automatic Format Generation Techniques For Network Data Acquisition Systems Automatic Format Generation Techniques For Network Data Acquisition Systems Benjamin Kupferschmidt Technical Manager - TTCWare Teletronics Technology Corporation Eric Pesciotta Director of Systems Software

More information

File Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18

File Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18 File Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18 1 Introduction Historically, the parallel version of the HDF5 library has suffered from performance

More information

THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY

THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY THE COMPLETE GUIDE COUCHBASE BACKUP & RECOVERY INTRODUCTION Driven by the need to remain competitive and differentiate themselves, organizations are undergoing digital transformations and becoming increasingly

More information

LTO and Magnetic Tapes Lively Roadmap With a roadmap like this, how can tape be dead?

LTO and Magnetic Tapes Lively Roadmap With a roadmap like this, how can tape be dead? LTO and Magnetic Tapes Lively Roadmap With a roadmap like this, how can tape be dead? By Greg Schulz Founder and Senior Analyst, the StorageIO Group Author The Green and Virtual Data Center (CRC) April

More information

Caching & Tiering BPG

Caching & Tiering BPG Intro: SSD Caching and SSD Tiering functionality in the StorTrends 3500i offers the most intelligent performance possible from a hybrid storage array at the most cost-effective prices in the industry.

More information

HP Dynamic Deduplication achieving a 50:1 ratio

HP Dynamic Deduplication achieving a 50:1 ratio HP Dynamic Deduplication achieving a 50:1 ratio Table of contents Introduction... 2 Data deduplication the hottest topic in data protection... 2 The benefits of data deduplication... 2 How does data deduplication

More information