High-Energy Physics Data-Storage Challenges
Richard P. Mount, SLAC
SC2003
Experimental HENP
Understanding the quantum world requires:
- Repeated measurement: billions of collisions
- Large (500-2000 physicist) international collaborations
- 5-10 years of detector construction
- 10-20 years of data-taking and analysis
Many experiments: Alice, Atlas, BaBar, Belle, CDF, CLEO, CMS, D0, LHCb, PHENIX, STAR
BaBar at SLAC:
- Measuring matter-antimatter asymmetry (why do we exist?)
- 500 physicists
- Data taking since 1999
- More data (~1 petabyte) than any other experiment
Fermilab Run II (CDF, D0) and RHIC at BNL (STAR, PHENIX): petabytes soon
CERN LHC (Atlas, CMS, Alice): 10s to 100s of petabytes early in the next decade
BaBar Experiment at SLAC
High Data-Rate Devices: the BaBar SVT (Silicon Vertex Tracker)
BaBar Collaboration 500 Physicists, 76 Universities/labs, 9 Countries
BaBar
At SLAC for BaBar data analysis:
- 850 Terabytes of true database; over 1 petabyte of data in total
- 300 Terabytes of disk storage
- >1 Teraflop (>2 Teraop) of data-intensive compute power
- 300 Mbits/s of sustained WAN traffic
In Europe for BaBar data analysis (CCIN2P3, Padova, RAL):
- Mirror of much of the SLAC database (in tape robots)
- 100 Terabytes of disk
- 1 Teraop of data-intensive compute power
- 100s of Mbits/s of dedicated transatlantic bandwidth
Europe + North America:
- >1 Teraop of simulation (to be doubled in the next 12 months)
All growing with Moore's Law
Are the BaBarians Happy?
- Typical database queries take days, weeks or months
- A factor of 10, 100 or 1000 performance improvement would revolutionize the science
- Hardware alone cannot achieve these factors: if a 1000-box system delivers an answer in 6 hours, it does not follow that a 60,000-box system will deliver the result in 6 minutes (see the sketch below)
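A rough way to see why is an Amdahl's-law-style model, sketched below. The 0.1% non-scaling fraction is an illustrative assumption, not a measured BaBar number: once any part of a query (catalog lookups, tape mounts, hot spots) fails to scale with the number of boxes, 60 times more hardware buys far less than 60 times the speed.

```python
# Amdahl-style sketch of why more boxes alone cannot deliver factors of 100-1000.
# The 0.1% non-scaling fraction is an illustrative assumption, not a measurement.

def speedup(n_boxes, serial_fraction=0.001):
    """Speedup over one box when only (1 - serial_fraction) of the work scales."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_boxes)

hours_on_1000_boxes = 6.0
for n in (1000, 60000):
    hours = hours_on_1000_boxes * speedup(1000) / speedup(n)
    print(f"{n:6d} boxes: ~{hours:.1f} hours")
# 1000 boxes: ~6.0 hours; 60,000 boxes: ~3.1 hours -- nowhere near 6 minutes.
```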
SLAC Storage Architecture
Client tier: 1200 dual-CPU Linux boxes + 900 single-CPU Sun/Solaris boxes
IP Network (Cisco)
Disk-server tier: Objectivity/DB object database + HEP-specific ROOT software; 120 dual/quad-CPU Sun/Solaris servers; 300 TB of Sun FibreChannel RAID arrays
IP Network (Cisco)
Tape tier: HPSS + SLAC enhancements to Objectivity and ROOT server code; 25 dual-CPU Sun/Solaris servers; 40 STK 9940B and 6 STK 9840A drives; 6 STK Powderhorn silos; over 1 PB of data
Generic Storage Architecture
(Diagram: client tier - IP network - disk-server tier - IP network - tape tier)
Large Hadron Collider
CMS Experiment: Find the Higgs
Data Management Challenges (1)
Sparse access to objects in petabyte databases:
- Natural object size: 1-10 kbytes
- Disk (and tape) performance dominated by latency
Approaches:
- Hash data over physical disks
- Instantiate richer database subsets for each analysis application
- Queue and reorder all disk access requests (sketched below)
- Keep the hottest objects in (many terabytes of) memory
- etc.
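As a concrete illustration of two of these approaches (hashing objects over disks, and reordering queued requests), here is a minimal sketch; the object IDs, disk count and layout are hypothetical, and this is not the actual Objectivity or ROOT server logic.

```python
# Hash objects over physical disks, then sort each disk's queued reads by byte
# offset so the head sweeps once instead of seeking randomly per small object.
# Hypothetical layout and IDs; not the production Objectivity/ROOT server code.
from collections import defaultdict

N_DISKS = 32

def disk_for(object_id: int) -> int:
    """Hash an object ID onto one of the physical disks."""
    return hash(object_id) % N_DISKS

def schedule(requests):
    """requests: iterable of (object_id, byte_offset, length) tuples.
    Returns {disk_index: [(offset, length, object_id), ...] sorted by offset}."""
    queues = defaultdict(list)
    for object_id, offset, length in requests:
        queues[disk_for(object_id)].append((offset, length, object_id))
    return {disk: sorted(queue) for disk, queue in queues.items()}

# Three sparse 4 kB reads end up grouped per disk and ordered by offset:
print(schedule([(101, 7_000_000, 4096), (205, 1_000, 4096), (133, 5_000, 4096)]))
```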
Data Management Challenges (2)
Information management:
- BaBar has cataloged 60,000,000 collections (database views)
- Freedom to create any subset dataset or derived dataset is wonderful
- In a 1000-scientist collaboration the default result is chaos
Approaches:
- Limit freedom by allocating very little space for datasets that are not designed by a committee
- Catalog all the subset and derived datasets
- Catalog the way datasets were made (see the sketch below)
- Virtual data
- etc.
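A minimal sketch of what cataloging "the way datasets were made" could look like: one hypothetical provenance record per derived dataset. The field names and example values are illustrative, not BaBar's actual catalog schema.

```python
# Hypothetical provenance record: enough metadata to find a derived dataset,
# audit it, or re-derive it later ("virtual data"). Illustrative schema only.
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    name: str                 # e.g. "run3-charm-skim-v2"
    parents: list             # input collections this dataset was derived from
    selection: str            # the cut/query that defined the subset
    software_version: str     # code release used to produce it
    owner: str
    size_tb: float = 0.0

catalog = {}

def register(record):
    """Add a dataset's provenance record to the (in-memory) catalog."""
    catalog[record.name] = record

register(DatasetRecord(
    name="run3-charm-skim-v2",
    parents=["run3-all-events"],
    selection="nTracks >= 4 and 1.82 < D0_mass_GeV < 1.90",
    software_version="analysis-14.5.2",
    owner="a.physicist",
    size_tb=2.3,
))
```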
Storage Characteristics
- Capacity
- Latency
- Speed
- Cost
Latency and Speed - Random Access
(Chart: Random-Access Storage Performance - retrieval rate in MBytes/s, log scale from 1e-9 to 1000, vs. log10 of object size in bytes from 0 to 10, for PC2100 memory, WD 200 GB disk, and STK 9940B tape)
Latency and Speed - Random Access
(Chart: Historical Trends in Storage Performance - same axes, adding RAM, disk, and tape curves from 10 years ago to the PC2100, WD 200 GB, and STK 9940B curves)
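The shape of both charts follows from a simple latency model: effective rate ≈ object size / (access latency + object size / streaming rate). The sketch below uses rough, order-of-magnitude latency and streaming figures for the three devices, not measured values.

```python
# Why random-access retrieval rate collapses for small objects:
#   effective_rate = object_size / (latency + object_size / streaming_rate)
# Latency and streaming numbers are rough order-of-magnitude guesses, not measurements.

DEVICES = {
    "PC2100 memory":  {"latency_s": 1e-7, "stream_MB_per_s": 2100.0},
    "WD 200 GB disk": {"latency_s": 1e-2, "stream_MB_per_s": 50.0},
    "STK 9940B tape": {"latency_s": 60.0, "stream_MB_per_s": 30.0},
}

def effective_rate_MB_per_s(object_bytes, latency_s, stream_MB_per_s):
    transfer_s = object_bytes / (stream_MB_per_s * 1e6)
    return (object_bytes / 1e6) / (latency_s + transfer_s)

for name, dev in DEVICES.items():
    for size in (1e3, 1e6, 1e9):   # 1 kB, 1 MB, 1 GB objects
        rate = effective_rate_MB_per_s(size, **dev)
        print(f"{name:15s} {size:9.0e} B  ->  {rate:12.5f} MB/s")
```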
Storage Characteristics - Cost
Storage hosted on a network; cost per TB is net after RAID, hot spares, etc.; random-access cost is for typically accessed objects of the stated size.

| Technology                       | Cost per TB ($k) | Cost per GB/s ($k), streaming | Cost per GB/s ($k), random access | Typical object size |
| Good memory *                    | 750              | 1                             | 18                                | 4 bytes             |
| Cheap memory                     | 250              | 0.4                           | 6                                 | 4 bytes             |
| Enterprise SAN, maxed out        | 40               | 400                           | 8,000                             | 5 kbytes            |
| High-quality FibreChannel disk * | 10               | 100                           | 2,000                             | 5 kbytes            |
| Tolerable IDE disk               | 5                | 50                            | 1,000                             | 5 kbytes            |
| Robotic tape (STK 9840C)         | 1                | 2,000                         | 25,000                            | 500 Mbytes          |
| Robotic tape (STK 9940B) *       | 0.4              | 2,000                         | 50,000                            | 500 Mbytes          |
* Current SLAC choice
Storage-Cost Notes
- Memory costs per TB: cost of memory + host system
- Memory costs per GB/s: (cost of typical memory + host system) / (GB/s of memory in this system)
- Disk costs per TB: cost of disk + server system
- Disk costs per GB/s: (cost of typical disk + server system) / (GB/s of this system)
- Tape costs per TB: cost of media only
- Tape costs per GB/s: (cost of typical server + drives + robotics only) / (GB/s of this server + drives + robotics)
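The arithmetic behind the table is simply system cost divided by usable capacity and by delivered bandwidth. The sketch below uses made-up prices and rates for a hypothetical commodity disk server; none of the numbers are SLAC procurement figures.

```python
# Cost-per-TB and cost-per-GB/s arithmetic from the notes above.
# All prices and delivered rates are illustrative placeholders.

def cost_per_tb_k(system_cost_k, usable_tb):
    """$k per usable terabyte (after RAID, hot spares, etc.)."""
    return system_cost_k / usable_tb

def cost_per_gb_per_s_k(system_cost_k, delivered_gb_per_s):
    """$k per GB/s actually delivered to the access pattern of interest."""
    return system_cost_k / delivered_gb_per_s

# Hypothetical commodity disk server: $18k all-in, 4 TB usable after RAID,
# ~0.02 GB/s delivered when randomly reading ~5 kB objects.
print(cost_per_tb_k(18.0, 4.0))            # 4.5   ($k per TB)
print(cost_per_gb_per_s_k(18.0, 0.02))     # 900.0 ($k per random-access GB/s)
```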
Storage Issues - Tapes
- Still cheaper than disk for low I/O rates
- Disk becomes cheaper at, for example, 300 MB/s per petabyte for random-accessed 500 MB files (see the sketch below)
- Will SLAC ever buy new tape silos?
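The break-even point can be sketched from the cost table: tape is far cheaper per terabyte but far more expensive per delivered GB/s, so above some aggregate I/O rate per petabyte the disk-resident copy wins. The sketch below compares the STK 9940B and "tolerable IDE disk" rows; which rows and which GB/s column you compare moves the answer, so this result and the 300 MB/s example above agree only in order of magnitude.

```python
# Back-of-envelope tape-vs-disk break-even for 1 PB, using rows from the cost table.
# Treat the result as an order of magnitude; different row choices shift it.

PB_IN_TB = 1000.0

def break_even_gb_per_s(tape_k_per_tb, tape_k_per_gbps,
                        disk_k_per_tb, disk_k_per_gbps,
                        capacity_tb=PB_IN_TB):
    """I/O rate above which holding the data on disk is cheaper than on tape."""
    return (disk_k_per_tb - tape_k_per_tb) * capacity_tb / (tape_k_per_gbps - disk_k_per_gbps)

# STK 9940B: 0.4 $k/TB, 50,000 $k per GB/s for random-accessed 500 MB files.
# Tolerable IDE disk: 5 $k/TB, ~50 $k per GB/s when streaming 500 MB files.
rate = break_even_gb_per_s(0.4, 50_000.0, 5.0, 50.0)
print(f"disk wins above ~{rate * 1000:.0f} MB/s per petabyte")   # ~92 MB/s for these rows
```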
Storage Issues - Disks
- Random access performance is lousy, independent of cost, unless objects are megabytes or more
- Google people say: "If you were as smart as us you could have fun building reliable storage out of cheap junk"
- My Systems Group says: "Accounting for TCO, we are buying the right stuff"
Storage Issues - Software
Transparent, scalable access to data:
- Still waiting for a general-purpose, scalable cluster file system (Lustre?)
- More application-specific solutions (e.g. Objectivity, ROOT/xrootd) work well
Information management:
- BaBar physicists have created millions of data products
- Automated tracking of data provenance, and maximized reuse of data products, is becoming a requirement
Workshops on Data Management
Sponsored by DOE/MICS
- March 16-18, 2004, SLAC: focus on application science needs and technology
- April 20-23, 2004, East Coast or Midwest: focus on computing science and long-term planning for the Office of Science