Realtime Data Analytics at NERSC


Prabhat, XLDB, May 24, 2016

Lawrence Berkeley National Laboratory

National Energy Research Scientific Computing Center

NERSC is the production HPC and data facility for the DOE Office of Science, the largest funder of physical science research in the U.S. Program areas served include:
- Biological and Environmental Systems
- Applied Math, Exascale
- Materials, Chemistry, Geophysics
- Particle Physics, Astrophysics
- Nuclear Physics
- Fusion Energy, Plasma Physics

Focus on Science
- NERSC supports the broad mission needs of the six DOE Office of Science program offices
- 6,000 users and 750 projects
- Extensive science engagement and user training programs
- 2,078 refereed publications in 2015

NERSC in 2016

Edison (Cray XC30):
- 5,576 nodes, 133K 2.4 GHz Intel Ivy Bridge cores, 357 TB RAM
- 7.6 PB local scratch at 163 GB/s

Cori (Cray XC40):
- Phase 1: 1,630 nodes with 2.3 GHz Intel Haswell cores, 203 TB RAM
- Phase 2: >9,300 nodes, >60 cores per node, 16 GB HBM and 96 GB DDR per node
- 28 PB local scratch at >700 GB/s
- 1.5 PB DataWarp at >1.5 TB/s

Shared storage (connected via 16x FDR IB from Edison and 32x FDR IB from Cori):
- Global scratch: 3.6 PB at 80 GB/s (5 x SFA12KE)
- /project: 5 PB at 50 GB/s (DDN9900 & NexSAN)
- /home: 250 TB (NetApp 5460), 5 GB/s
- HPSS archive: 50 PB stored, 240 PB capacity, 12 GB/s

Other resources: data-intensive systems (PDSF, JGI, KBASE, HEP; 14x QDR IB), vis & analytics, data transfer nodes, advanced architecture testbeds, and science gateways on an Ethernet & IB fabric, with science-friendly security, production monitoring, and power efficiency.

WAN: 2 x 10 Gb and 1 x 100 Gb links with software-defined networking.

The Cori System
- Cori will transition HPC and data-centric workloads to energy-efficient architectures
- The system is named after Gerty Cori, biochemist and the first American woman to receive a Nobel Prize in science

DOE facilities are facing a data deluge: astronomy, genomics, climate, physics, and light sources.

4 V's of Scientific Big Data (Variety / Volume / Velocity / Veracity by science domain)
- Astronomy: multiple telescopes, multi-band/spectra; O(100) TB; 100 GB/night - 10 TB/night; noisy, acquisition artefacts
- Light Sources: multiple imaging modalities; O(100) GB; 1 Gb/s - 1 Tb/s; noisy, sample preparation/acquisition artefacts
- Genomics: sequencers, mass spec, proteomics; O(1-10) TB; TB/week; missing data, errors
- High Energy Physics: multiple detectors; O(100) TB - O(10) PB; 1-10 PB/s reduced to GB/s; noisy, artefacts, spatio-temporal
- Climate Simulations: multi-variate, spatio-temporal; O(10) TB; 100 GB/s; clean, but multiple sources of uncertainty must be accounted for

Why Real-time Analytics? Why Now?
- Large instruments are producing massive data streams
- Fast, predictable turnaround is integral to the processing pipeline
- Traditional HPC systems use batch queues with long or unpredictable wait times
- Computational steering <-> experimental steering: change the experimental configuration during your precious beam time!
- Follow-on analysis may be time critical (supernova candidates, asteroid detection)

Real-time Use Cases
- Realtime interaction with experimental facilities: light sources (ALS, LCLS)
- Realtime jobs driven by web portals: OpenMSI, MetAtlas
- Computational steering: DIII-D reactor
- Experimental steering: iPTF follow-on

Real-time Queue at NERSC
- NERSC has made a small pool of nodes available for immediate-turnaround / realtime computing
- Up to 32 nodes (1,024 cores) in the realtime queue
- Realtime jobs have higher priority than jobs in other queues
- The pool can shrink or grow as needed, based on demand
- Approved projects have a small number of nodes available on demand, without queue wait times (see the submission sketch below)
- Limits are configured on a per-repo basis: maximum number of jobs, maximum number of cores, and wallclock
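For concreteness, here is a minimal sketch of how a job might be submitted to such a pool. It assumes the pool is exposed as a Slurm QOS named "realtime" (the slides do not specify the scheduler interface); the QOS name, node count, time limit, and analysis executable are all illustrative placeholders.

```python
#!/usr/bin/env python
"""Minimal sketch: submit a short analysis job to a real-time pool.

Assumptions (not taken from the slides): jobs are submitted with Slurm's
sbatch, and the pool is reached through a QOS called "realtime". Adjust
the QOS, node count, and wallclock to your project's configured limits.
"""
import subprocess
import textwrap

BATCH_SCRIPT = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --qos=realtime          # hypothetical QOS name for the real-time pool
    #SBATCH --nodes=2               # stay within the per-repo node limit
    #SBATCH --time=00:10:00         # short wallclock keeps turnaround predictable
    #SBATCH --job-name=rt-analysis
    srun ./analyze_latest_data      # placeholder analysis executable
""")

def submit(script_text: str) -> str:
    """Pipe the batch script to sbatch on stdin and return its reply."""
    result = subprocess.run(
        ["sbatch"], input=script_text, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()    # e.g. "Submitted batch job 123456"

if __name__ == "__main__":
    print(submit(BATCH_SCRIPT))
```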

Usage (12/2015 - 04/2016)

Distribution
- Totals: 332,625 hours used, 23,244 jobs

Science Use Case: iPTF
- Nightly images are transferred, subtractions performed, and candidates inserted into a database (see the sketch below)
- Typical turnaround time: < 5 minutes
- Discoveries: Yi Cao et al. (2015), Nature, "A strong ultraviolet pulse from a newborn Type Ia supernova"
- PIs: Kasliwal, Nugent, Cao
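The subtraction step can be pictured with a toy example (this is not the iPTF pipeline): subtract a reference image from the new exposure and flag pixels that deviate by several sigma from the noise of the difference image. The function name, threshold, and synthetic data are all illustrative.

```python
import numpy as np

def find_candidates(new_image: np.ndarray,
                    reference: np.ndarray,
                    nsigma: float = 5.0) -> np.ndarray:
    """Toy difference imaging: return (row, col) positions where the new
    exposure deviates from the reference by more than nsigma times the
    robust scatter of the difference image."""
    diff = new_image - reference
    # Robust noise estimate via the median absolute deviation.
    sigma = 1.4826 * np.median(np.abs(diff - np.median(diff)))
    return np.argwhere(np.abs(diff) > nsigma * sigma)

# Illustrative usage with synthetic data standing in for nightly images.
rng = np.random.default_rng(0)
ref = rng.normal(100.0, 5.0, size=(512, 512))
new = ref + rng.normal(0.0, 5.0, size=ref.shape)
new[200, 300] += 500.0                   # injected "transient"
print(find_candidates(new, ref))         # candidates to insert into a database
```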

Science Use Case: Advanced Light Source
- Image reconstruction algorithms run on Cori
- 3D volumes are rendered on the SPOT web portal
- ALS beamline users receive instant feedback
- In production at ALS beamlines: 24x7 operation, 176,293 datasets, 155 beamline users, 1,050 TB of data stored, 2,379,754 jobs run at NERSC

Science Use Case: Metabolite Atlas
- Pre-computed fragmentation trees for 10,000+ compounds
- The real-time queue is used to compare raw spectra against the trees to obtain possible matches (a minimal matching sketch follows below)
- Results are obtained in minutes
- IPython interface to NERSC
- Ben Bowen, LBL
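The spectrum-versus-library comparison can be sketched generically (this is not the Metabolite Atlas code): bin each spectrum onto a fixed m/z grid and rank library entries by cosine similarity. The bin width, m/z range, and dictionary layout are assumptions made for illustration.

```python
import numpy as np

def bin_spectrum(mz, intensity, bin_width=0.01, mz_max=2000.0):
    """Histogram (m/z, intensity) peaks onto a fixed grid so that spectra
    can be compared as unit vectors."""
    edges = np.arange(0.0, mz_max + bin_width, bin_width)
    vec, _ = np.histogram(mz, bins=edges, weights=intensity)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def best_matches(raw_mz, raw_intensity, library, top_k=5):
    """Score a raw spectrum against a library of pre-binned reference
    spectra (compound name -> unit vector) by cosine similarity."""
    query = bin_spectrum(raw_mz, raw_intensity)
    scores = {name: float(query @ vec) for name, vec in library.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Illustrative usage with two made-up reference compounds.
library = {
    "compound_A": bin_spectrum([101.1, 202.2, 303.3], [1.0, 0.5, 0.2]),
    "compound_B": bin_spectrum([150.0, 250.0],        [1.0, 1.0]),
}
print(best_matches([101.1, 202.2], [0.9, 0.4], library))
```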

Science Use Case: Cryo-Electron Microscopy
- Structure determination of TFIID
- 10-100 GB image stacks; image classification
- The real-time queue is used for assessment of data quality during electron microscopy data collection (see the sketch below) and for rapid optimization of data-processing strategies
- 3D structure of a TFIID-containing complex, Nogales Lab; Louder et al. (2016), Nature 531 (7596): 604-619
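As an illustration of the kind of on-the-fly quality check such a workflow might run (this is not the Nogales Lab pipeline), the sketch below flags frames in an incoming micrograph stack whose contrast is suspiciously low, e.g. empty or featureless exposures. The contrast metric and cutoff are invented for the example.

```python
import numpy as np

def flag_low_quality(stack: np.ndarray, min_contrast: float = 0.05) -> np.ndarray:
    """Toy quality check for a micrograph stack of shape (n_images, H, W):
    flag frames whose relative contrast (std / mean) falls below a cutoff."""
    means = stack.mean(axis=(1, 2))
    stds = stack.std(axis=(1, 2))
    contrast = np.divide(stds, means, out=np.zeros_like(stds), where=means > 0)
    return np.flatnonzero(contrast < min_contrast)

# Illustrative usage: 16 synthetic 256x256 micrographs, two nearly featureless.
rng = np.random.default_rng(1)
stack = rng.normal(1.0, 0.2, size=(16, 256, 256))
stack[3] = 1.0   # flat frame -> near-zero contrast
stack[9] = 1.0
print(flag_low_quality(stack))   # expected: [3 9]
```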

LCLS Workflow Today: 150 TB Analysis in 5 Days
- The DAQ (a multilevel data acquisition and control system) at SLAC reads out the Cornell-SLAC Pixel Array diffraction detector and injector
- Data are streamed in XTC format across the Science DMZ to NERSC global scratch and /project (NGF), with copies archived to HPSS
- Analysis runs on the Cray XC30 compute engine: psana hit finding, then spot finding, indexing, integration, and reconstruction
- Prompt analysis requires fast networks and real-time HPC queues, and yields actionable knowledge for the next beamtime

LCLS-II, 2019: Nanocrystallography Pipeline
- Streaming data from the detector to HPC (2 GB/s); 100-1000x current data rates
- Terabit/s throughput from the front-end electronics
- Indexing, classification, and reconstruction via an on-the-fly veto system (see the sketch below)
- Quasi real-time response (< 10 min)
- Petaflop-scale analysis on demand
- Figure: indexed diffraction image and reconstructed structure
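The "on-the-fly veto" idea can be sketched generically (this is not LCLS's veto system): score each incoming detector frame with a cheap hit finder and forward only frames containing enough bright pixels, discarding the rest to reduce the rate handed to indexing, classification, and reconstruction. The thresholds and synthetic frames are illustrative.

```python
import numpy as np
from typing import Iterable, Iterator

def is_hit(frame: np.ndarray, adu_threshold: float = 50.0,
           min_bright_pixels: int = 20) -> bool:
    """Cheap hit finder: a frame counts as a hit if enough pixels exceed
    an intensity threshold (i.e. it likely contains Bragg peaks)."""
    return int((frame > adu_threshold).sum()) >= min_bright_pixels

def veto_stream(frames: Iterable[np.ndarray]) -> Iterator[np.ndarray]:
    """Forward only hit frames; everything else is vetoed on the fly."""
    for frame in frames:
        if is_hit(frame):
            yield frame

# Illustrative usage with synthetic frames standing in for detector data.
rng = np.random.default_rng(2)

def synthetic_frames(n: int = 100) -> Iterator[np.ndarray]:
    for i in range(n):
        frame = rng.normal(10.0, 5.0, size=(128, 128))
        if i % 10 == 0:                      # roughly 1 in 10 frames is a "hit"
            frame[40:45, 60:65] += 200.0     # fake Bragg peaks
        yield frame

kept = sum(1 for _ in veto_stream(synthetic_frames()))
print(f"kept {kept} of 100 frames")          # expect about 10
```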

Key Takeaways
- Data streaming and real-time analytics are emerging requirements at NERSC
- Experimental facilities are the heaviest users: light sources, telescopes
- SDN capabilities are needed to enable data flows directly between compute nodes and workflow databases
- Users would like to use realtime nodes for more long-running interactive work and debugging
- Provisioning resources for the real-time queue is an ongoing exercise

Acknowledgments
- Shreyas Cholia, Doug Jacobsen (NERSC)
- NERSC real-time queue users!

Thanks!