Building supercomputers from embedded technologies

Similar documents
Building supercomputers from commodity embedded chips

The Mont-Blanc approach towards Exascale

The Mont-Blanc Project

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

European energy efficient supercomputer project

Barcelona Supercomputing Center

Pedraforca: a First ARM + GPU Cluster for HPC

HPC projects. Grischa Bolls

Exploiting CUDA Dynamic Parallelism for low power ARM based prototypes

Butterfly effect of porting scientific applications to ARM-based platforms

Building blocks for 64-bit Systems Development of System IP in ARM

HPC state of play. HPC Ecosystem. HPC system supply. HPC use 24% EU 4,3% EU. Application software & tools. Academia 23% Bio-sciences 22% CAE 21%

FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers

Gen-Z Memory-Driven Computing

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing

CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker

The DEEP (and DEEP-ER) projects

The Mont-Blanc project Updates from the Barcelona Supercomputing Center

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing

Intel Select Solutions for Professional Visualization with Advantech Servers & Appliances

Broadberry. Artificial Intelligence Server for Fraud. Date: Q Application: Artificial Intelligence

IBM Power Systems HPC Cluster

OPERA. Low Power Heterogeneous Architecture for the Next Generation of Smart Infrastructure and Platforms in Industrial and Societal Applications

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Fra superdatamaskiner til grafikkprosessorer og

The Stampede is Coming: A New Petascale Resource for the Open Science Community

Bringing Intelligence to Enterprise Storage Drives

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Cray XD1 Supercomputer Release 1.3 CRAY XD1 DATASHEET

Mercury Computer Systems & The Cell Broadband Engine

Bringing Intelligence to Enterprise Storage Drives

CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

Agenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 >

Advanced IP solutions enabling the autonomous driving revolution

Atos ARM solutions for HPC

Cisco HyperFlex HX220c Edge M5

Checklist for Selecting and Deploying Scalable Clusters with InfiniBand Fabrics

Each Milliwatt Matters

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

In-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017

HPC and the AppleTV-Cluster

ARM the Company ARM the Research Collaborator

Vectorisation and Portable Programming using OpenCL

HPC future trends from a science perspective

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Dynamical Exascale Entry Platform

CAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

HPC. Accelerating. HPC Advisory Council Lugano, CH March 15 th, Herbert Cornelius Intel

Fundamentals of Computers Design

Tutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers

An Introduction to OpenACC

ECE 574 Cluster Computing Lecture 18

Trends in Scientific Discovery Engines

Fundamentals of Quantitative Design and Analysis

Hypervisor support for emerging scale-out / scale-up architectures. Julian Chesterfield Chief Scientific Officer, OnApp Ltd

The Future of High Performance Interconnects

Maxwell: a 64-FPGA Supercomputer

The Future of Computing: AMD Vision

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

ANSYS Fluent 14 Performance Benchmark and Profiling. October 2012

HPC IN EUROPE. Organisation of public HPC resources

Fundamentals of Computer Design

How GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige

Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node

EuroEXA Driving the technology towards exascale

ECE 486/586. Computer Architecture. Lecture # 2

Next Generation Enterprise Solutions from ARM

Intel Many Integrated Core (MIC) Architecture

SNAP Performance Benchmark and Profiling. April 2014

Overview of Tianhe-2

DESCRIPTION GHz, 1.536TB shared memory RAM, and 20.48TB RAW internal storage teraflops About ScaleMP

INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE

RDMA in Embedded Fabrics

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

MILC Performance Benchmark and Profiling. April 2013

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

KALEAO Company & Solutions

Data Sheet Fujitsu Server PRIMERGY CX250 S2 Dual Socket Server Node

Cisco HyperFlex HX220c M4 Node

GPUS FOR NGVLA. M Clark, April 2015

POWER9 Announcement. Martin Bušek IBM Server Solution Sales Specialist

Apalis A New Architecture for Embedded Computing

Introduction to Xeon Phi. Bill Barth January 11, 2013

FUSION PROCESSORS AND HPC

Trends in HPC (hardware complexity and software challenges)

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

THE LEADER IN VISUAL COMPUTING

GOING ARM A CODE PERSPECTIVE

E4-ARKA: ARM64+GPU+IB is Now Here Piero Altoè. ARM64 and GPGPU

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

John Fragalla TACC 'RANGER' INFINIBAND ARCHITECTURE WITH SUN TECHNOLOGY. Presenter s Name Title and Division Sun Microsystems

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS

InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014

Transcription:

http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n 288777.

The problem and the opportunity Europe represents ~35% of the HPC market Yet, it does not have HPC technology of its own Nobody really knows how to build a sustainable EFLOPS supercomputer POWER SPACE COST Consensus about it being revolutionary, not evolutionary Everyone starts from square zero, hence equal opportunities Europe is very strong in embedded computing The most energy- and cost-efficient computing technology today 2

Mont-Blanc project goals To develop an European Exascale approach Leverage commodity and embedded power-efficient technology Supported by EU FP7 with 16M under two projects: Mont-Blanc: October 2011 September 2014 14.5 M budget (8.1 M EC contribution), 1095 Person-Month Mont-Blanc 2: October 2013 September 2016 11.3 M budget (8.0 M EC contribution), 892 Person-Month 3

Mont-Blanc 1: Project objectives Objective 1: To deploy a prototype HPC system based on currently available energy-efficient embedded technology Scalable to 50 PFLOPS on 7 MWatt Competitive with Green500 leaders in 2014 Deploy a full HPC system software stack Objective 2: To design a next-generation HPC system and new embedded technologies targeting HPC systems that would overcome most of the limitations encountered in the prototype system Scalable to 200 PFLOPS on 10 MWatt Competitive with Top500 leaders in 2017 Objective 3: To port and optimise a small number of representative Exascale applications capable of exploiting this new generation of HPC systems Up to 11 full-scale applications

Mont-Blanc 2: Project objectives Objective 1: Complement the effort on the Mont-Blanc system software stack Development tools: debugger, performance analysis Resiliency ARMv8 ISA Objective 2: Initial definition of the Mont-Blanc Exascale architecture Performance & power models for DSE Objective 3: Continued tracking and evaluation of ARMbased products Deployment and evaluation of small developer kit clusters Evaluation of their suitability for HPC Objective 4: Continued support for the Mont-Blanc consortium Mont-Blanc prototype(s) operation OmpSs developer support Increased dissemination effort

Commodity components drive HPC Top500 1993, 1 st edition: Cray vector, 41% MasPar SIMD, 11% Convex/HP vector, 5% Microprocessors replaced Vector/SIMD supercomputers They were not faster They were cheaper 6

Samsung Exynos 5 Dual Superphone SoC 32nm HKMG Dual-core ARM Cortex-A15 @ 1.7 GHz Quad-core ARM Mali T604 OpenCL 1.1 Dual-channel DDR3 USB 3.0 to 1 GbE bridge All in a low-power mobile socket 7

Mont-Blanc SoM CPU + GPU + DRAM + storage + network all in a compute card just 8.5x5.6 cm 4 GB of DDR3-1600 usd slot, up to 64 GB Exynos 5 Dual: 2x ARM Cortex-A15 ARM Mali-T604 USB 3.0 to 1 GbE bridge 8

Mont-Blanc server blade 15 node-cluster in a standard Bull B505 enclosure Cluster management 1 GbE crossbar switch 2 x 10 GbE links 15 compute cards 30 x ARM Cortex-A15 15 x ARM Mali-T604 GPU 120 GB DDR3 9

Mont-Blanc server chassis 9 blades in a standard 7U BullX chassis Shared cooling, PSU, chassis management Chassis management panel 9 x Compute blades: 135 x Compute cards + 36 x 10 GbE links 270 x ARM Cortex-A15 135 x ARM Mali-T604 GPU 540 GB DDR3-1600 10

The Mont-Blanc prototype 6 BullX chassis 54 Compute blades 810 Compute cards 1620 CPU 810 GPU 3.2 TB of DRAM 52 TB of Flash 26 TFLOPS 18 KWatt 11

There is no free lunch 2X more cores for the same performance ½ on-chip memory / core 8X more cluster nodes Hybrid CPU + GPU Lower I/O bandwidth 12

OmpSs runtime layer manages architecture complexity P0 P1 P2 No overlap Overlap Programmer exposed a simple architecture Task graph provides lookahead Exploit knowledge about the future Automatically handle all of the architecture challenges Strong scalability Multiple address spaces Low cache size Low interconnect bandwidth Enjoy the positive aspects Energy efficiency Low cost 13

Advances over State of the Art Developed an entire HPC software ecosystem on ARM Including OpenCL driver for GPU acclerator Performance monitoring + tracing + analysis Advanced parallel programming models MPI + OmpSs @ OpenCL + FORTRAN Fine-grain per-node power monitoring Open the door to future per-node power management Combined with SoC power management features Microserver architecture for HPC Built on commodity SoC + commodity network (Ethernet) 14

Limitations of current mobile processors for HPC 32-bit memory controller Even if ARM Cortex-A15 offers 40-bit address space No ECC protection in memory Limited scalability, errors will appear beyond a certain number of nodes No standard server I/O interfaces Do NOT provide native Ethernet or PCI Express Provide USB 3.0 and SATA (required for tablets) No network protocol off-load engine TCP/IP, OpenMX, USB protocol stacks run on the CPU Thermal package not designed for sustained full-power operation All these are implementation decisions, not unsolvable problems Only need a business case to jusitfy the cost of including the new features such as the HPC and server markets 15

Get ready for the change, before it happens Embedded processors have qualities that make them interesting for HPC FP64 capability Performance increasing rapidly + energy efficient Large market, many providers, competition, low cost Lots of other mebedded IP blokcs to build the perfect HPC SoC Embedded GPU accelerators, DSP, FPGA, Network interfaces + protocol offloads Current limitations are due to target market conditions Not real technical challenges A whole set of ARM server chips is coming Solving most of the limitations identified 16

Conclusions The convergence of Embedded and HPC technologies has happened already We have enabled the software already Leverage on Europe s leadership position in parallel programming models & tools Leverage on all of the embedded systems technology to build a new class of HPC system Automated SoC design Automatic core customization SoC power management Decouple IP provider from semiconductor provider montblanc-project.eu MontBlancEU @MontBlanc_EU 17