Von Antreibern und Beschleunigern des HPC

Similar documents
Porting Scientific Applications to OpenPOWER

Intel Many Integrated Core (MIC) Architecture

Exascale: challenges and opportunities in a power constrained world

NVIDIA Application Lab at Jülich

Systems Architectures towards Exascale

Revolutionizing Open. Cecilia Carniel IBM Power Systems Scale Out sales

Vectorisation and Portable Programming using OpenCL

IBM Power Systems Update. David Spurway IBM Power Systems Product Manager STG, UK and Ireland

Looking ahead with IBM i. 10+ year roadmap

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

GPU Architecture. Alan Gray EPCC The University of Edinburgh

Intel Enterprise Processors Technology

The Stampede is Coming: A New Petascale Resource for the Open Science Community

Building NVLink for Developers

IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems

OpenCAPI and its Roadmap

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

I/O Monitoring at JSC, SIONlib & Resiliency

Université IBM i 2017

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

IBM POWER SYSTEMS: YOUR UNFAIR ADVANTAGE

Interconnect Your Future

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley

Facilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM. Join the Conversation #OpenPOWERSummit

JÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

Erkenntnisse aus aktuellen Performance- Messungen mit LS-DYNA

Trends in HPC (hardware complexity and software challenges)

Interconnect Your Future

Infrastructure Matters: POWER8 vs. Xeon x86

IBM Power 9 надежная платформа для развертывания облаков. Ташкент. Юрий Кондратенко Cross-Brand Sales Specialist

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

Trends in the Infrastructure of Computing

Arm Processor Technology Update and Roadmap

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

March 10 11, 2015 San Jose

Outline Marquette University

OpenCAPI Technology. Myron Slota Speaker name, Title OpenCAPI Consortium Company/Organization Name. Join the Conversation #OpenPOWERSummit

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing

Paving the Road to Exascale

These slides contain projections or other forward-looking statements within the meaning of Section 27A of the Securities Act of 1933, as amended, and

OpenCL: History & Future. November 20, 2017

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS

The Mont-Blanc approach towards Exascale

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

SUSE Linux Entreprise Server for ARM

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

S THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,

Japan s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS

RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures

OpenPOWER Innovations for HPC. IBM Research. IWOPH workshop, ISC, Germany June 21, Christoph Hagleitner,

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

Capturing value from an open ecosystem

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

The Era of Heterogeneous Computing

John Hengeveld Director of Marketing, HPC Evangelist

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

IBM CORAL HPC System Solution

MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE

Post-K Development and Introducing DLU. Copyright 2017 FUJITSU LIMITED

General overview and first results.

Computer Architecture

High performance Computing and O&G Challenges

IBM Power Systems: Open Innovation to put data to work. Juan López-Vidriero Mata Director técnico de ventas de servidores

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017

Computer Architecture!

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Cray XC Scalability and the Aries Network Tony Ford

Graphics Processor Acceleration and YOU

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

Fundamentals of Quantitative Design and Analysis

Jülich Supercomputing Centre

Expressing Heterogeneous Parallelism in C++ with Intel Threading Building Blocks A full-day tutorial proposal for SC17

European energy efficient supercomputer project

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

POWER9 Announcement. Martin Bušek IBM Server Solution Sales Specialist

Parallel Programming on Ranger and Stampede

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017

EXASCALE COMPUTING ROADMAP IMPACT ON LEGACY CODES MARCH 17 TH, MIC Workshop PAGE 1. MIC workshop Guillaume Colin de Verdière

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU

HETEROGENEOUS COMPUTE INFRASTRUCTURE FOR SINGAPORE

New HPC architectures landscape and impact on code developments Massimiliano Guarrasi, CINECA Meeting ICT INAF Catania, 13/09/2018

Interconnect Your Future

IBM Power Advanced Compute (AC) AC922 Server

HPC Cineca Infrastructure: State of the art and towards the exascale

IBM Power AC922 Server

Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Zigbee 3.0 and Dotdot Connecting the IoT. Jean-Pierre Desbenoit Schneider Electric Bruno Vulcano Legrand

HPC future trends from a science perspective

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

World s most advanced data center accelerator for PCIe-based servers

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

HPE ProLiant ML350 Gen P 16GB-R E208i-a 8SFF 1x800W RPS Solution Server (P04674-S01)

Inside Intel Core Microarchitecture

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

Multimedia in Mobile Phones. Architectures and Trends Lund

Transcription:

Mitglied der Helmholtz-Gemeinschaft Von Antreibern und Beschleunigern des HPC D. Pleiter Jülich 16 December 2014

Ein Dementi vorweg [c't, Nr. 25/2014, 15.11.2014] Ja: Das FZJ ist seit März Mitglieder der OpenPOWER Foundation und plant das POWER Acceleration and Design Center Nein: Kein Lock-in: wir legen uns heute noch nicht auf einen Hersteller fest Aber: Wir denken natürlich viel über zukünftige Architekturen nach... 2

OpenPOWER Foundation History Announced in September 2013 Established in December 2013 as an open, not-for-profit technical membership group November 2014: >60 institutions joined Mission statement Create an open ecosystem, using the POWER Architecture to share expertise, investment, and serverclass intellectual property to serve the evolving needs of customers and industry Organisation Board of directors + Technical Steering Committee Work groups 3

OpenPOWER Foundation (cont.) 4

Long-term HPC Trends: Top500 List Performance metric Floating-point operations per time unit while solving a dense linear set of equations Criticism Workload not representative Problem size can be freely tuned Positive aspects Reasonable well defined basis for comparison Allows for long-term comparison First list published in June 1993 5

Top500 Performance Trends Rmax vs. time 6

Top500 Processor Architecture Trends System share: Xeon 5600 Itanium Pentium 4 Xeon E5 v2 Xeon E5 June 2004 June 2014 7

Why OpenPOWER? A Customer View Increasing share of Top500 are based on CPUs from single vendor Pure market observation, no statement about technology Lack of competition Usually higher prices Less incentive for innovations Need for promoting alternative technologies OpenPOWER ARM 8

Key Technology Constraints Dennard Scaling for MOSFET transistors Allowed for change of following parameters such that electric fields are roughly constant: Transistor density Switching speed Supply voltage Breakdown of Dennard Scaling Broken since around 2005 due to the end of voltage scaling Scaling of switching speed prohibitive due to power consumption More performance = more parallelism [L. Chang et al., 2010] 9

Technology Path: More Parallel Processors Processor parallelism Micro-architecture level: Data-parallel instructions (SIMD) Number of instruction pipelines Processor level: multi-core Example: JUROPA Cluster at JSC JUROPA-2 JUROPA-4 SIMD width 2x64 bit 4x64 bit No. of SIMD pipelines 1 2 Core/processor 4 12 Flop/cycle/processor 16 192 Core clock frequency [GHz] 2.93 2.5 10

Even more Parallel Accelerators... Competing technologies Graphics processing units (GPU) Xeon Phi Processor level parallelism NVIDIA K40 Intel Xeon Phi 7120D Flop/cycle/processor 1920 976 Core clock frequency [GHz] 0.75 1.24 11

Top500 Trends on Accelerated Architectures System share: June 2009 June 2014 12

Top500 Trends on Accelerated Architectures Performance share: June 2009 June 2014 13

Technology Path: Deeper Memory Hierarchy High memory capability and capacity requirements Increasing compute performance Increase of memory bandwidth B mem Applications ambition to solve large problems Significant memory capacity C mem Costs challenge Faster memory = more expensive (larger GByte/EUR) Solution: Memory hierarchy with more levels Fast memory, smaller capacity Large capacity, slower memory 14

Deeper Memory Hierarchy: Top500 Trends R mem = C mem /B mem vs. C mem R mem mainly determined by technology C mem is architecture parameter Top500 11/2013 Rank #1,, #10, #129 15

Accelerator Architectures Today CPU 16 GByte/s 16x PCIe GEN3 GPU O(50 GByte/s) 200-300 GByte/s O(100 GiByte) MEM MEM O(10 GiByte) Relatively small bandwidth between host and device Separate memory coherence domains 16

Future GPU Architectures CPU 80-200 GByte/s NVLink GPU O(100 GByte/s) >300 GByte/s O(100 GiByte) MEM MEM O(10 GiByte) Similar bandwidth host-device and host-memory Single memory coherence domains OpenPOWER is going down this road 17

OpenPOWER and the Exascale Challenges Drastically improve energy efficiency GPU have potential for being highly energy efficient Preserve usability at tremendously increased level of parallelism GPU architectures proven to be suitable for many scientific applications; growing experience and eco-system Keep overall system balanced Tighter integration of CPU-GPU with different memory layers Address reliability and resilience High-performance nodes smaller number of components 18

Exascale Applications Challenges for application developers Increase parallelism of application Manage data locality to leverage deeper memory hierarchy Possible need for re-design of applications POWER Acceleration and Design Center Collaboration between IBM-BOE, IBM-ZRL, FZJ, NVIDIA Approach: Work on scalability of selected applications Create competence and knowledge for Application developers Technology developers 19

Summary and Conclusions Need for more competition on HPC processors OpenPOWER provides solutions today Other alternatives are in the pipeline Key technology trends towards exascale Massive increase of parallelism Deepening of the memory hierarchy OpenPOWER drives architectures in right direction Tight integration of CPU and accelerator Improved usability of deeper memory hierarchy Open eco-system good for co-design approach Good reasons for R&D along OpenPOWER roadmap 20