Big Data in HPC. John Shalf, Lawrence Berkeley National Laboratory

Transcription:

Big Data in HPC. John Shalf, Lawrence Berkeley National Laboratory

Evolving Role of Supercomputing Centers. The traditional pillars of science are theory (mathematical models of nature) and experiment (empirical data about nature). Scientific computing connects theory to experiment: computational models are used to test theories involving complex phenomena that cannot be matched directly against experiments, and they enable comprehension of complex experimental data by distilling it down to understandable information. There is no scientific discovery without a theory about the underlying mechanism, and there is no scientific discovery without experimental validation of the model.

[Diagram: theory and simulation on one side, experiment and analysis on the other; mathematical models of nature confirm or violate empirical evidence about the underlying mechanism.]

Breakthrough science will occur at the interface of observation and simulation: environment and climate, cosmology, materials and chemistry, combustion, and subsurface science. New research challenges arise in making these work together.

Data Intensive Arch for 2017 (as imagined in 2012). A comparison of a compute-intensive and a data-intensive design point across the memory/storage hierarchy:

- Compute: 5TF/socket, 1-2 sockets (compute-intensive) vs. 5TF/socket, 8+ sockets (data-intensive)
- On-package DRAM, 100-200TB/s aggregate: 64-128GB HMC or stacked DRAM at 0.4-1TB/s (compute-intensive) vs. 1-4TB aggregate chained HMC (data-intensive)
- Capacity memory: 0.5-1TB CDIMM, optional (compute-intensive) vs. 5-10TB memory-class NVRAM (data-intensive)
- On-node storage, 50GB/s/node: n.a. (compute-intensive) vs. 10-100TB SSD cache or local file system (data-intensive)
- In-rack storage, 10TB/s/rack: organized as a burst buffer (compute-intensive)
- Interconnect, 10TB/s bisection: spatially oriented, e.g. a 3D-5D torus (compute-intensive) vs. all-to-all oriented, e.g. Dragonfly or 3T (data-intensive)
- Global shared disk, 0.5TB/s aggregate: ~1% of nodes as I/O nodes (compute-intensive) vs. a 1-10PB distributed object database, e.g. Whamcloud (data-intensive)
- Off-system network, 4GB/s per node: ~1% of nodes as IP gateways (compute-intensive) vs. ~10-20% of nodes, or 40GbE Ethernet direct from each node (data-intensive)

[Figure: an HDF5-style hierarchical file layout; a root group "/" contains Dataset0 (attributes time=0.2345, validity=None), Dataset1 (with a type and dataspace, attribute author=JoeBlow), and a subgroup subgrp holding Dataset0.1 and Dataset0.2.]
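The slide's file-layout figure sketches the classic HDF5 hierarchy. As a concrete illustration, here is a minimal sketch of how that layout could be created; it assumes the h5py Python bindings (the slide names no library), and the file name example.h5 is hypothetical.

import numpy as np
import h5py

with h5py.File("example.h5", "w") as f:               # f is the root group "/"
    d0 = f.create_dataset("Dataset0", data=np.zeros((4, 4)))
    d0.attrs["time"] = 0.2345                          # attributes from the figure
    d0.attrs["validity"] = "None"                      # stored as a string; HDF5
                                                       # attributes cannot hold None
    d1 = f.create_dataset("Dataset1", data=np.arange(10))  # carries a type and dataspace
    d1.attrs["author"] = "JoeBlow"
    sub = f.create_group("subgrp")                     # nested group under "/"
    sub.create_dataset("Dataset0.1", data=np.ones(3))
    sub.create_dataset("Dataset0.2", data=np.ones(3))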

Data Intensive Arch for 2017 (as imagined in 2012), repeated with the design goal behind each column:

- Compute-intensive goal: maximum computational density and local bandwidth for a given power/cost constraint; the design maximizes bandwidth density near the compute.
- Data-intensive goal: maximum data capacity and global bandwidth for a given power/cost constraint; bring more storage capacity near the compute (or, conversely, embed more compute into the storage).
- This paradigm shift requires support from the software and programming environment.
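To make the capacity-versus-compute tradeoff concrete, here is a back-of-envelope sketch using the per-node numbers from the comparison above. The bytes-per-(flop/s) ratio is an illustrative derived metric, not one the slides themselves report.

# Two 2017 design points, per-node numbers taken from the table above.
ARCHS = {
    "compute-intensive": {"tf_per_socket": 5, "sockets": 2,
                          "capacity_mem_tb": 1.0},    # 0.5-1TB CDIMM
    "data-intensive":    {"tf_per_socket": 5, "sockets": 8,
                          "capacity_mem_tb": 10.0},   # 5-10TB memory-class NVRAM
}

for name, a in ARCHS.items():
    node_tf = a["tf_per_socket"] * a["sockets"]       # TF per node
    ratio = (a["capacity_mem_tb"] * 1e12) / (node_tf * 1e12)
    print(f"{name}: {node_tf} TF/node, {ratio:.2f} bytes per flop/s")

The data-intensive point provisions roughly 2.5x more capacity memory per unit of compute, which is the "maximum data capacity" goal expressed numerically.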

Old/New Conception of Cloud Datacenters (a simplified conceptual model of interconnect convergence). Old conception: the datacenter is designed for externally facing TCP/IP traffic, with nearly 100% standard TCP/IP Ethernet inside and out.

[Diagram: the datacenter/cloud sits behind a DMZ router, with 90+% of traffic crossing the backbone.]

Old/New Conception of Cloud/Datacenters (Simplified Conceptual Model of Interconnect Convergence) New Old Conception Need Designed to Handle for Internal externally Data facing Mining/Processing TCP/IP Nearly 100% Design Std. for TCP/IP 80+% ethernet internal traffic inside and out Cloud/Datacenter The Datacenter/Cloud 80+% of traffic DMZ Router 90+% traffic Backbone High-Performance Fluid Center Low Overhead, High Bandwidth, Semi-custom internal interconnect Crunchy TCP/IP Exterior 7