Visit of a delegation of Italian companies
Massimo Lamanna, CERN IT Department, Data Storage Services (DSS) group
www.cern.ch/it
Innovation in Computing in High-Energy Physics
- Demanding science, demanding computing
- Power usage
- Innovation: the Web invention and Grid computing (LHC Computing Grid)
Example: CMS
CMS is a general-purpose detector with the same physics goals as ATLAS, but different technical solutions and design. It is built around a huge superconducting solenoid: a cylindrical coil of superconducting cable that generates a magnetic field of 4 T, about 100 000 times that of the Earth.
- About 4000 people work for CMS, from 182 institutes in 42 countries (May 2013)
- Detector: 21 m long, 15 m high and 15 m wide; 12 500 t
- 40 MHz collisions (bunch crossing)
- Typical data rate (RAW data, pp): 300 MB/s; in Run 2 rates will go up (CMS: ~1 GB/s)
- O(100) Hz of selected events
http://cmsinfo.cern.ch/outreach
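To get a feel for these numbers, here is a back-of-the-envelope calculation using the slide's round figures (a sketch only; real trigger menus vary by run):

```python
# Back-of-the-envelope trigger rejection, using the round numbers
# quoted on the slide (illustrative, not an official CMS figure).
bunch_crossing_rate_hz = 40e6   # 40 MHz bunch-crossing rate
selected_rate_hz = 100.0        # O(100) Hz of events kept

rejection = bunch_crossing_rate_hz / selected_rate_hz
print(f"Only ~1 in {rejection:.0e} crossings is kept for storage")
# -> Only ~1 in 4e+05 crossings is kept for storage
```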
Reconstruction, analysis and simulation software for the LHC experiments:
- O(10^7) lines of C++; large sections are experiment-specific
- Written by the community (100s of developers) - physicists needed here
- Used by 1000s of physicists (actively changing the code)
- Querying/skimming large data: a query runs as a custom, complex program (see the sketch below)
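A toy sketch of the skimming pattern; the file format and the read_events()/passes_selection() helpers are invented placeholders for the experiments' C++ frameworks:

```python
# Toy "skim": run a custom selection over event files and keep only
# the interesting events. read_events()/passes_selection() stand in
# for the experiment-specific framework code.
def read_events(path):
    with open(path) as f:
        yield from f          # one toy event record per line

def passes_selection(event):
    return "muon" in event    # placeholder physics cut

def skim(input_files, output_path):
    with open(output_path, "w") as out:
        for path in input_files:
            out.writelines(e for e in read_events(path)
                           if passes_selection(e))

# Toy input so the sketch is self-contained:
with open("run1.txt", "w") as f:
    f.write("event 1: muon pair\nevent 2: jets only\n")
skim(["run1.txt"], "skimmed.txt")
```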
CERN Computer Centre
- CERN computer centre: built in the 70s on the CERN site (Meyrin, Geneva); ~3000 m² (4 main machine rooms); 3.5 MW for equipment; est. PUE ~ 1.6
- New extension: located at Wigner (Budapest); ~1000 m²; 2.7 MW for equipment; connected to the Geneva CC with 2x100 Gb/s links (21 and 24 ms RTT)
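A quick illustration of what the quoted PUE implies (PUE = total facility power / IT equipment power; the figures are the slide's, the arithmetic is ours):

```python
# Power Usage Effectiveness: total facility power / IT equipment power.
# Illustrative numbers from the slide (3.5 MW IT load, PUE ~ 1.6).
it_power_mw = 3.5
pue = 1.6
total_facility_mw = it_power_mw * pue
overhead_mw = total_facility_mw - it_power_mw
print(f"Total facility power ~ {total_facility_mw:.1f} MW "
      f"({overhead_mw:.1f} MW cooling/distribution overhead)")
# -> Total facility power ~ 5.6 MW (2.1 MW cooling/distribution overhead)
```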
2014 Overall Numbers
- ~9 PB written in 2014 (2013: ~10 PB); 98.5 PB on 305 M files
- Reduced user activity (no LHC data taking); during an LHC run: 1 PB/week (or more!)
- Infrastructure: disk farms CASTOR + EOS (40 PB + 80 PB), ~1600 disk servers (roughly PCs with 24 disks each)
- Tapes: 7 libraries, 65 K slots, 50 K tapes, 141 drives; ~14 PB read in 2014 (2013: ~23 PB)
- CPU: 100,000 cores
Data distribution (e.g. CERN distributes RAW data to Tier1s)
- (Large) processing campaigns on pre-placed data (e.g. reprocessing)
- Download a small sample for local analysis? Possible, but the norm is to scale out with grid jobs (the user executable is dispatched to where the data are)
- Using file parallelism: a data set is a list of files, hence a list of independent jobs (see the sketch below)
- Federating storages: recall data 'on the fly'
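A minimal sketch of file-level parallelism; the paths and the job description are illustrative stand-ins, since real submission goes through the experiments' workload-management systems:

```python
# File-level parallelism: a dataset is a list of files, and each file
# becomes an independent grid job dispatched where the data are placed.
dataset = [f"/store/data/run2012/file_{i:04d}.root" for i in range(8)]

def make_job(input_file):
    # One self-contained job description per input file (hypothetical
    # fields; real systems add site, software version, priorities, ...).
    return {"executable": "analyze.sh", "input": input_file}

jobs = [make_job(f) for f in dataset]
print(f"{len(jobs)} independent jobs from {len(dataset)} files")
```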
Intercontinental links (data and jobs)
- AP area: Taipei (Tier1), Tokyo, Beijing, Seoul, Melbourne, Mumbai, ...
- North America: BNL and Fermilab (US Tier1s), Victoria (Canada Tier1) and many Tier2s such as Stanford, MIT, Wisconsin, Argonne, ...
- South America: several Tier2s
- Africa: a few sites, such as the South Africa Tier2s
CMS processing: wall-clock consumption, Tier0 and Tier1 processing
(Charts: top, last week; bottom, Oct 2012, during data taking.)
- Sizeable even with no data taking: continuous reprocessing
- Reconstruction activities: RAW -> reconstructed objects; organised processing whose output is for the physicists' analysis
- Physicists can still access RAW data if needed, but the final analysis is more efficient on the files containing the reconstructed objects
The grid never sleeps.
Storage Strategy @ CERN
Two interesting directions
- Innovation for heavy-duty tasks: EOS
- Interesting solutions for collaboration: CERNBox
- Close contacts with technology leaders: Ceph
EOS: large disk farms for physics and beyond
- Currently ~25 PB of used quota; ~60 PB of quota (as of Dec 2014)
- Developed in CERN/IT (DSS)
- Original goal: large-scale analysis of LHC data (PBs for 100s/1000s of independent scientists)
- Arbitrary level of data durability via cross-node file replication or RAIN, using commodity hardware
- Status: open to non-physics use cases; NB: large number of protocols available
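Why cross-node replication buys durability on commodity hardware, in a deliberately simplified toy model (the failure probability and the independence assumption are ours, not EOS parameters):

```python
# Toy durability model: a file is lost only if every node holding a
# replica fails before repair, assuming independent node failures.
p_node_failure = 0.02   # assumed chance a node fails within a repair window

for replicas in (1, 2, 3):
    p_loss = p_node_failure ** replicas
    print(f"{replicas} replica(s): P(file loss) ~ {p_loss:.1e}")
# 1 replica(s): P(file loss) ~ 2.0e-02
# 2 replica(s): P(file loss) ~ 4.0e-04
# 3 replica(s): P(file loss) ~ 8.0e-06
```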
EOS installed across the CERN computer centres
- EOS takes advantage of the two CERN computer centres: coping with ~20 ms latency, and distributing copies across the two sites for dependability and performance
- Status: as of today we are crossing the 30% mark; we expect to be at ~50% next year
- New acquisitions once per year (10s of PB, 100s of boxes), adding capacity and replacing obsolete boxes
Bottom line: not at all trivial (see the sketch below). It is a unique tool, developed for our needs, with much broader applications possible.
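One way to see why ~20 ms of inter-site latency is non-trivial is the bandwidth-delay product, which sets how much data must be in flight to fill a link (link speed and RTT from the earlier slide; the calculation is illustrative):

```python
# Bandwidth-delay product for one Geneva-Budapest link.
link_gbps = 100    # one of the 2x100 Gb/s links
rtt_s = 0.021      # ~21 ms round-trip time

bdp_bytes = link_gbps * 1e9 / 8 * rtt_s
print(f"Bandwidth-delay product ~ {bdp_bytes / 1e6:.0f} MB in flight")
# -> roughly 260 MB must be in flight to saturate a single link
```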
Towards large-scale data sync and share
- Starting point: the classical Dropbox use case - usability and ease of use over raw performance
- Based on ownCloud; currently deployed as the CERNBox beta; data in our data centre!!!
- But can we bring this system to the next level? Our core-business workflows and large-scale workloads require us to: expose PBs of existing data from day 1; integrate into physics data processing and central services (batch, interactive data analysis applications); sync higher data volumes at higher rates
- Can we still keep the simplicity of cloud storage access? Yes: using EOS as a backend, with seamless integration of the work environment (mobile devices) and the CERN IT central services (batch and grid)
Architecture (diagram)
- Sync clients (WebDAV) and web access (HTTPS) enter through HTTPS load balancers
- Data flow and metadata flow are separated; data are directly accessible by the user (http for public data, https for private data; OC protocol internally)
- KHz metadata ops; all sync state kept as metadata in the storage; FUSE access
- Files are written with the USER's credentials
- Storage (EOS): IO redirect to the disk servers (1000s) behind a single namespace
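A minimal sketch of how a sync client talks WebDAV to such a service; the endpoint URL and credentials are placeholders, not the real CERNBox configuration (the `remote.php/webdav/` path follows the ownCloud convention):

```python
# Hedged WebDAV client sketch using the requests library.
import requests

ENDPOINT = "https://cernbox.example.cern.ch/remote.php/webdav/"  # hypothetical
AUTH = ("user", "password")  # placeholder credentials

def list_folder(path):
    # PROPFIND with Depth: 1 returns metadata for a folder's entries;
    # the sync client compares this against its local sync state.
    resp = requests.request("PROPFIND", ENDPOINT + path,
                            headers={"Depth": "1"}, auth=AUTH)
    resp.raise_for_status()
    return resp.text  # XML multistatus body with per-file metadata

def upload(path, local_file):
    # A plain HTTP PUT uploads the file; the backend (EOS) stores it
    # with the user's credentials.
    with open(local_file, "rb") as f:
        requests.put(ENDPOINT + path, data=f, auth=AUTH).raise_for_status()
```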
- ~40 participants (also from outside HEP)
- Keynote: B. Pierce (University of Pennsylvania)
- Technology & users, site reports, vendor talks: IBM, PowerFolder, Seafile, Pydio, ownCloud
Immediate access to all our data! (EOS, Spring 2014)
Ceph @ CERN
- Probably the most promising cloud storage technology; goes hand in hand with our OpenStack infrastructure
- Testing began in early 2013; a 3 PB cluster was deployed in August 2013
- Our use cases: OpenStack Cinder volumes offering persistent, thinly provisioned disks to our VMs; an OpenStack Glance image repository for system images and VM snapshots; consolidating our NFS and OpenAFS storage services on Ceph block devices; in the future, large-scale object stores for physics data (see the sketch below)
- We built and maintain a close collaboration with Inktank (now Red Hat): operations experience from one of the largest clusters in the world, invited presentations at Ceph Days in London and Frankfurt, and development contributions in the Ceph source from three CERN IT staff
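For a concrete feel of object storage on Ceph, here is a minimal sketch using the official `rados` Python bindings; the pool name and config path are assumptions about the local setup:

```python
# Write and read one object via librados (Python bindings).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumed config path
cluster.connect()
try:
    ioctx = cluster.open_ioctx("test-pool")  # assumed pool name
    try:
        ioctx.write_full("hello-object", b"stored in Ceph via librados")
        print(ioctx.read("hello-object"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```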
OpenStack with Ceph @ CERN
- Linear growth, with nearly 300 TB consumed (incl. 3x replication)
- Close to 700 volumes consuming >250 TB of data; more than 1000 machine images
- Ceph is running on our standard physics data servers, which are not tuned for Ceph, augmented with SSDs to improve low-latency, high-IOPS performance
- We make frequent contributions of operational experience (and patches) back to the community
(Charts: increasing space usage; growing number of Cinder volumes.)
Future Plans for Ceph @ CERN
- New instance at the CERN data centre in Budapest in 2015: to offer a volume service to VMs in Budapest, and redundancy for DR and business continuity
- Ongoing development for physics data storage on Ceph, with two approaches: thin storage gateways adapting our existing storage systems to a Ceph backend, and co-hosting our physics service gateways alongside the Ceph OSDs (this minimizes duplicated network traffic)
- A 10 PB test is planned for 2015
- Now that Red Hat is the caretaker of Ceph, we're hopeful for a close integration with RHEL 7.x, which could enable a native NFS-like home directory service (e.g. CephFS) and enterprise databases
Conclusions and Q&A
- Challenge: make LHC analysis possible
- IT infrastructure providing services that are dependable, cost effective and high performance, for large-scale international collaborations
- Not only pure number crunching (or "byte storing"):
  - Concurrent access (high-performance applications)
  - Remote access (cloud computing)
  - Collaborative access (large user communities)
Xavier Cortada (with the participation of physicist Pete Markowitz), "In search of the Higgs boson: H -> ZZ", digital art, 2013.