Realtime Data Analytics at NERSC
Prabhat
XLDB, May 24, 2016
Lawrence Berkeley National Laboratory
National Energy Research Scientific Computing Center
NERSC is the Production HPC & Data Facility for DOE
Largest funder of physical science research in the U.S.
- Biological and Environmental Systems
- Applied Math, Exascale
- Materials, Chemistry, Geophysics
- Particle Physics, Astrophysics
- Nuclear Physics
- Fusion Energy, Plasma Physics
Focus on Science
- NERSC supports the broad mission needs of the six DOE Office of Science program offices
- 6,000 users and 750 projects
- Extensive science engagement and user training programs
- 2,078 refereed publications in 2015
NERSC - 2016
[System diagram; recoverable labels listed below]
- Edison: Cray XC-30; 5,576 nodes, 133K 2.4GHz Intel IvyBridge cores, 357TB RAM; 7.6 PB local scratch, 163 GB/s
- Cori: Cray XC-40
  - Ph1: 1630 nodes, 2.3GHz Intel Haswell cores, 203TB RAM
  - Ph2: >9300 nodes, >60 cores, 16GB HBM, 96GB DDR per node
  - 28 PB local scratch, >700 GB/s
  - 1.5 PB DataWarp, >1.5 TB/s
- Global Scratch: 3.6 PB, 5 x SFA12KE, 80 GB/s
- /project: 5 PB, DDN9900 & NexSAN, 50 GB/s
- /home: 250 TB, NetApp 5460, 5 GB/s
- HPSS: 50 PB stored, 240 PB capacity, 12 GB/s
- Interconnect labels: 16x FDR IB, 32x FDR IB, 14x QDR; Ethernet & IB Fabric
- Data-Intensive Systems: PDSF, JGI, KBASE, HEP
- Vis & Analytics, Data Transfer Nodes, Adv. Arch. Testbeds, Science Gateways
- Science Friendly Security, Production Monitoring, Power Efficiency
- WAN: 2 x 10 Gb, 1 x 100 Gb, Software Defined Networking
The Cori System
- Cori will transition HPC and data-centric workloads to energy-efficient architectures
- System named after Gerty Cori, biochemist and first American woman to receive a Nobel Prize in science
DOE facilities are facing a data deluge
Astronomy, Genomics, Climate, Physics, Light Sources
4 V's of Scientific Big Data
Science Domain | Variety | Volume | Velocity | Veracity
Astronomy | Multiple telescopes, multi-band/spectra | O(100) TB | 100 GB/night - 10 TB/night | Noisy, acquisition artefacts
Light Sources | Multiple imaging modalities | O(100) GB | 1 Gb/s - 1 Tb/s | Noisy, sample preparation/acquisition artefacts
Genomics | Sequencers, mass spec, proteomics | O(1-10) TB | TB/week | Missing data, errors
High Energy Physics | Multiple detectors | O(100) TB - O(10) PB | 1-10 PB/s reduced to GB/s | Noisy, artefacts, spatio-temporal
Climate Simulations | Multi-variate, spatio-temporal | O(10) TB | 100 GB/s | Clean, need to account for multiple sources of uncertainty
Why Real-time Analytics? Why Now?
- Large instruments are producing massive data streams
  - Fast, predictable turnaround is integral to the processing pipeline
  - Traditional HPC systems use batch queues with long or unpredictable wait times
- Computational Steering <-> Experimental Steering
  - Change experimental configuration during your precious beam-time!
- Follow-on analysis might be time critical
  - Supernovae candidates, asteroid detection
Real-time Use Cases
- Realtime interaction with experimental facilities
  - Light Sources: ALS, LCLS
- Realtime jobs driven by web portals
  - OpenMSI, MetAtlas
- Computational Steering
  - DIII-D reactor
- Experimental Steering
  - iPTF follow-on
Real-time Queue at NERSC
- NERSC has made a small pool of nodes available for immediate-turnaround / realtime computing
  - Up to 32 nodes in realtime queue (1024 cores)
  - Realtime nodes have higher priority than other queues
  - Pool can shrink or grow as needed based on demand
- Approved projects have a small number of nodes available on demand without queue wait times
- Configurations on a per-repo basis for
  - Maximum number of jobs
  - Maximum number of cores
  - Wallclock
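The per-repo limits above (maximum jobs, maximum cores, wallclock) can be pictured as a simple admission check. The sketch below is illustrative only: the names RepoLimits and can_admit, the example limit values, and the check logic are assumptions made for this example, not NERSC's actual scheduler configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoLimits:
    """Hypothetical per-repo realtime limits (names and values are assumptions)."""
    max_jobs: int            # maximum concurrently running jobs
    max_cores: int           # maximum cores in use across running jobs
    max_wallclock_min: int   # maximum wallclock per job, in minutes


def can_admit(limits, running_jobs, cores_in_use, req_cores, req_wallclock_min):
    """Return True if one more job fits within the repo's realtime limits."""
    if running_jobs + 1 > limits.max_jobs:
        return False
    if cores_in_use + req_cores > limits.max_cores:
        return False
    return req_wallclock_min <= limits.max_wallclock_min


# Example values are made up for illustration.
limits = RepoLimits(max_jobs=4, max_cores=64, max_wallclock_min=30)
fits = can_admit(limits, running_jobs=1, cores_in_use=32,
                 req_cores=32, req_wallclock_min=10)
too_big = can_admit(limits, running_jobs=1, cores_in_use=48,
                    req_cores=32, req_wallclock_min=10)
print(fits, too_big)  # True False
```

A job that fails the check would have to wait or fall back to the regular batch queue. How the production system handles that case is not stated in the slides.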
Usage (12/2015-04/2016)
[Figure: usage plot]
Distribution
[Figure: distribution plot]
Totals: 332,625 hours used; 23,244 jobs
Science Use Case: iPTF
- Nightly images transferred
- Subtractions performed
- Candidates inserted in database
- Typical turn-around time < 5 minutes
Discoveries
- Yi Cao et al. (2015), Nature, "A strong ultraviolet pulse from a newborn Type Ia supernova"
PI: Kasliwal, Nugent, Cao
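The three pipeline steps named on this slide (transfer, subtraction, database insert) can be sketched in toy form. Everything below is an assumption made for illustration: the tiny integer "images", the fixed threshold, the table layout, and the function names. A real survey pipeline involves image alignment, PSF matching, and candidate vetting, none of which is shown here.

```python
import sqlite3


def subtract(new_image, reference):
    """Pixel-wise difference of two equally sized 2D images (lists of rows)."""
    return [[n - r for n, r in zip(new_row, ref_row)]
            for new_row, ref_row in zip(new_image, reference)]


def find_candidates(diff, threshold):
    """Return (row, col, residual) for pixels whose residual exceeds the threshold."""
    return [(y, x, v)
            for y, row in enumerate(diff)
            for x, v in enumerate(row)
            if v > threshold]


# Toy data: a flat reference frame and a new frame with one bright pixel.
reference = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
new_image = [[10, 11, 10], [10, 10, 55], [9, 10, 10]]

candidates = find_candidates(subtract(new_image, reference), threshold=20)

# Hypothetical candidate table held in an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE candidates (row INTEGER, col INTEGER, residual INTEGER)")
db.executemany("INSERT INTO candidates VALUES (?, ?, ?)", candidates)
db.commit()
stored = db.execute("SELECT row, col, residual FROM candidates").fetchall()
print(stored)  # [(1, 2, 45)]
```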
Science Use Case: Advanced Light Source
- Image reconstruction algorithms run on Cori
- 3D volume rendered on SPOT web portal
- ALS beamline users receive instant feedback
Production running at ALS beamlines:
- 24x7 operation
- 176,293 datasets
- 155 beamline users
- 1,050 TB data stored
- 2,379,754 jobs at NERSC
Science Use Case: Metabolite Atlas
- Pre-computed fragmentation trees for 10,000+ compounds
- Real-time queue used for comparing raw spectra to trees to obtain possible matches
- Results obtained in minutes
- IPython interface to NERSC
Ben Bowen, LBL
Science Use Case: Cryo-Electron Microscopy
- Structure determination of TFIID
  - 10-100 GB image stacks
  - Image classification
- Real-time queue used for
  - Assessment of data quality during electron microscopy data collection
  - Rapid optimization of data processing strategies
[Figure: 3D structure of TFIID-containing complex, Nogales Lab]
Louder et al. (2016), Nature 531 (7596): 604-619
LCLS Workflow Today: 150 TB Analysis in 5 days
[Workflow diagram; recoverable labels listed below]
- Instrument: injector, diffraction detector (Cornell SLAC Pixel Array)
- DAQ: multilevel data acquisition and control system
- Data stream in XTC format across the Science DMZ
- Compute engine: Cray XC30
- Storage: Global Scratch, /project (NGF), HPSS
- Processing stages: psana, hitfinder, spotfinder, index, integrate, reconstruction
- Prompt analysis requires fast networks and real-time HPC queues
- Actionable knowledge for next beamtime
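As a rough sense of scale, the headline figure (150 TB analyzed in 5 days) can be turned into an average sustained data rate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and continuous processing over the full 5 days. Both are assumptions, and the function name is made up for this example.

```python
TB = 10**12  # decimal terabyte in bytes (assumed unit convention)


def sustained_rate_mb_per_s(total_bytes, days):
    """Average rate in MB/s needed to process total_bytes in the given number of days."""
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds / 10**6


rate = sustained_rate_mb_per_s(150 * TB, 5)
print(f"{rate:.0f} MB/s")  # 347 MB/s
```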
LCLS-II 2019: Nanocrystallography Pipeline
- Streaming data from the detector to HPC
- 100-1000x data rates
- Indexing, classification, reconstruction, via on-the-fly veto system
- Quasi real-time response (<10 min)
- Terabit/s throughput from front-end electronics
- Petaflop-scale analysis on demand
[Figure labels: 2 GB/s, HPC, indexed diffraction image, reconstructed structure]
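The rates on this slide can be put side by side with a little arithmetic. The sketch below reads "2 GB/s" as the rate streamed to HPC and "Terabit/s" as 1 Tb/s from the front-end electronics. Both readings are assumptions, as are the decimal unit convention and the function names.

```python
GB = 10**9  # decimal gigabyte in bytes (assumed unit convention)


def bytes_in_window(rate_bytes_per_s, window_s):
    """Data volume that arrives during a time window at a constant rate."""
    return rate_bytes_per_s * window_s


def terabits_to_gigabytes_per_s(tbps):
    """Convert a rate in terabits per second to gigabytes per second."""
    return tbps * 10**12 / 8 / GB


# Volume arriving at 2 GB/s during a 10-minute response window.
window_bytes = bytes_in_window(2 * GB, 10 * 60)
print(window_bytes / 10**12)           # 1.2 (TB)

# 1 Tb/s expressed in GB/s.
print(terabits_to_gigabytes_per_s(1))  # 125.0
```

Under these readings, about 1.2 TB arrives within the quoted response window. The front-end rate of 125 GB/s is roughly 60 times the 2 GB/s stream, which is consistent with the slide's mention of an on-the-fly veto system reducing the data before it reaches HPC.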
Key Takeaways
- Data streaming and real-time analytics are emerging requirements at NERSC
- Experimental facilities are the heaviest users
  - Light sources, telescopes
- SDN capabilities are needed to enable data flows directly between compute nodes and workflow DBs
- Users would like to use realtime nodes for more long-running interactive work and debugging
- Provisioning resources for the real-time queue is an ongoing exercise
Acknowledgments
- Shreyas Cholia, Doug Jacobsen (NERSC)
- NERSC Real-time queue users!
Thanks!