Future Technologies (WP8) Prototype Evaluation & Research Activities. Iris Christadler, Dr. Herbert Huber, Leibniz Supercomputing Centre, Germany

1 Future Technologies (WP8) Prototype Evaluation & Research Activities Iris Christadler, Dr. Herbert Huber Leibniz Supercomputing Centre, Germany

2 Prototype Overview (1/2)
CEA: 1U Tesla Server T1070 (CUDA, CAPS, DDT) attached to Intel Harpertown nodes. Goal: take advantage of accelerators more easily; compare GPU/CAPS HMPP with other approaches to programming accelerators.
CINECA: I/O subsystem (SSD, Lustre, pNFS). Goal: assess the applicability of new file system and storage technologies.
CINES-LRZ: LRB/CS hybrid, SGI ICE/UV with Nehalem-EP & Nehalem-EX nodes plus ClearSpeed and Larrabee accelerators. Goal: evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.
CSCS: UPC/CAF prototype, PGAS language compilers (CAF + UPC for Cray XT systems). Goal: understand the usability and programmability of PGAS languages.
EPCC: Maxwell FPGA prototype (VHDL support & consultancy plus software licenses, e.g. Mitrion-C). Goal: assess the potential of high-level languages for using FPGAs in HPC; compare energy efficiency with other solutions.

3 Prototype Overview (2/2)
FZJ: eQPACE (PowerXCell 8i cluster with special network processor, Cell & FPGA interconnect). Goal: gain deep expertise in communication network issues; extend the application domain of the QPACE system.
LRZ: RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell). Goal: assess the potential of data-stream languages; compare RapidMind with other approaches for programming accelerators or multi-core systems.
NCF: ClearSpeed CATS 700 units. Goal: evaluate ClearSpeed accelerator hardware for large-scale applications.
SNIC-KTH: air-cooled blade system from Supermicro with AMD Istanbul processors & QDR IB (subject to EC approval). Goal: evaluate and optimize energy efficiency and packing density of commodity hardware.

4 RESEARCH ACTIVITIES

5 Parallel GPU Languages. Evaluation of GPGPU programming languages (CSC). Languages: CUDA+MPI, OpenCL. Benchmarks: GPU-HMMER, Euroben kernels. Hardware: Tesla, AMD FireStream, the CEA WP8 prototype.
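The Euroben kernels that appear throughout these slides are small synthetic compute kernels: mod2am is a dense matrix-matrix multiplication, mod2as a sparse matrix-vector multiplication and mod2f a fast Fourier transform. As a rough illustration of what each accelerator port has to reproduce, a minimal plain-C reference of the mod2am operation (C = A * B) could look like the sketch below; the function name and data layout are illustrative, not the original Euroben source.

#include <stddef.h>

/* Naive dense matrix-matrix multiply, C = A * B, row-major storage.
   Illustrative stand-in for the Euroben mod2am kernel; the real kernel
   times this operation, typically delegated to an optimized DGEMM. */
void mod2am_ref(size_t m, size_t n, size_t k,
                const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t l = 0; l < k; l++)
                sum += A[i * k + l] * B[l * n + j];
            C[i * n + j] = sum;
        }
}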

6 Advanced PGAS Programming. Evaluate the usability of the PGAS programming model (CSC). Languages: Coarray Fortran (CAF), Unified Parallel C (UPC). Benchmarks: Euroben mod2am/as/f. Environments: Cray XT5 (cce), SGI Altix (g95, bupc).

upc_barrier;
upc_forall (sc = 0; sc < totblks; sc++; sc) {
    // Square matrix multiply C = A * B with the aid of DGEMM
    double beta = 0;
    double *clocal = (double *)c[sc].x;           // Local C-block for this UPC thread
    int ib = sc / numblks, jb = sc % numblks, kb, i, j, k;
    for (kb = 0; kb < numblks; kb++) {
        int sa = ib * numblks + kb;               // The owner of the A-block is sa % THREADS
        int sb = kb * numblks + jb;               // The owner of the B-block is sb % THREADS
        double *al = (sa % THREADS == MYTHREAD) ? // Get the A-block
            (double *)a[sa].x : (upc_memget(alocal, a[sa].x, ns), alocal);
        double *bl = (sb % THREADS == MYTHREAD) ? // Get the B-block
            (double *)b[sb].x : (upc_memget(blocal, b[sb].x, ns), blocal);
        double *cl = clocal;                      // The local C-block owned by this UPC thread
        // Call the BLAS3 library DGEMM
        dgemm_("n", "n", &blksize, &blksize, &blksize, &alpha, al, &blksize,
               bl, &blksize, &beta, cl, &blksize, 1, 1);
        beta = 1;
    } /* for (kb = 0; kb < numblks; kb++) */
} /* upc_forall (sc = 0; sc < totblks; sc++; sc) */
upc_barrier;

mod2am kernel using DGEMM
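The dgemm_ call above uses the Fortran BLAS binding, which is why the transpose flags are passed as strings and two hidden string-length arguments (the trailing 1, 1) are appended. With the standard C interface (CBLAS) the same block update reads roughly as in the sketch below, where blksize, alpha, beta and the al/bl/cl pointers are the variables of the kernel above; the wrapper name is illustrative.

#include <cblas.h>

/* Equivalent of the Fortran-style dgemm_ call in the UPC kernel above, using
   the CBLAS interface: cl = alpha * al * bl + beta * cl for square,
   column-major blocks of order blksize. */
static void block_dgemm(int blksize, double alpha, const double *al,
                        const double *bl, double beta, double *cl)
{
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                blksize, blksize, blksize,
                alpha, al, blksize,
                bl, blksize,
                beta, cl, blksize);
}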

7 Research on Power Efficiency. Evaluate the power consumption of components (STFC, PSNC). Hardware: Intel Xeon, AMD Opteron, ClearSpeed, Tesla, FireStream, Cell, POWER6. Workloads: stand-by, neutral, real-life, artificial stress. Components assessed: CPUs, memories, accelerators, HDDs, cooling fans, backplane, power supply. Power measurements with clamp meters, PDUs with built-in ammeters, and values from system management software.
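When the power readings come as periodic samples (for example from a PDU or from system management software), energy to solution follows from integrating power over the run time. A minimal sketch of that step, assuming timestamped samples in watts and seconds (names are illustrative):

#include <stddef.h>

/* Integrate timestamped power samples (watts, seconds) with the trapezoidal
   rule to obtain the energy to solution in joules. Illustrative helper only. */
double energy_to_solution(const double *t_s, const double *power_w, size_t n)
{
    double joules = 0.0;
    for (size_t i = 1; i < n; i++)
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * (t_s[i] - t_s[i - 1]);
    return joules;
}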

8 Research on Performance Predictions. Predict application performance for future architectures. Optimize hardware specifications in terms of sustained application performance per Euro. Identify application porting issues on new architectures. Identify hardware and software scaling issues.
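A common first-order ingredient of such predictions is a roofline-style bound derived from the target machine's peak floating-point rate and memory bandwidth; the sketch below is a generic illustration of that idea only, not the specific WP8 prediction methodology.

/* First-order (roofline-style) runtime estimate: a kernel is limited either
   by compute or by memory traffic on the target machine. Generic
   illustration; not the WP8 prediction model. */
double predicted_seconds(double flop, double bytes,
                         double peak_gflops, double bandwidth_gbs)
{
    double compute_s = flop  / (peak_gflops   * 1e9);
    double memory_s  = bytes / (bandwidth_gbs * 1e9);
    return compute_s > memory_s ? compute_s : memory_s;
}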

9 Detailed results are reported in Deliverable D8.3.2, available at project.eu/documents/d8. A SELECTION OF D8.3.2 KEY RESULTS

10 QPACE ranked #1 in the Green500 list

11 Euroben results: accelerator languages. [Two charts: absolute performance (Mflop/s) and percentage of peak for the mod2am, mod2as and mod2f kernels with MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i) and Cn (1 CSX700); data labels of 81%, 79% and 78% of peak appear in the charts. Note: mod2f/MKL is single-threaded only.]

12 Euroben results: GPGPU languages.
Hardware                          SP peak performance    DP peak performance
Nehalem-EP (2.53 GHz, 1 core)        20.2 GFlop/s           10.1 GFlop/s
Nehalem-EP (2.53 GHz, 8 cores)      161.9 GFlop/s           81.0 GFlop/s
1 C1060 GPU                           933 GFlop/s             78 GFlop/s
1 PowerXCell8i (8 SPUs)             204.8 GFlop/s          102.4 GFlop/s
2 PowerXCell8i (16 SPUs)            409.6 GFlop/s          204.8 GFlop/s
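The CPU and Cell entries are theoretical peaks, i.e. cores times clock rate times the floating-point operations each core can retire per cycle. A small sketch of that arithmetic, using the Nehalem-EP figures as the example:

#include <stdio.h>

/* Theoretical peak in GFlop/s: cores * clock (GHz) * flops per core and cycle. */
static double peak_gflops(int cores, double ghz, int flops_per_cycle)
{
    return cores * ghz * flops_per_cycle;
}

int main(void)
{
    /* Nehalem-EP at 2.53 GHz retires 4 DP (8 SP) flops per core and cycle via SSE. */
    printf("1 core,  DP: %6.1f GFlop/s\n", peak_gflops(1, 2.53, 4));
    printf("8 cores, DP: %6.1f GFlop/s\n", peak_gflops(8, 2.53, 4));
    printf("8 cores, SP: %6.1f GFlop/s\n", peak_gflops(8, 2.53, 8));
    return 0;
}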

13 Euroben results: productivity. [Chart: development time in days versus achieved performance (Mflop/s) for the dense matrix-matrix multiplication kernel, distinguishing time to first version from total development time per language.] * The OpenCL and CUDA+MPI ports are based on an existing CUDA port. ** The RapidMind developer included time for benchmarking.

14 Rinf
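Rinf is one of the synthetic low-level benchmarks run on the prototypes (see also the LRZ-CINES slide): measurements of this kind characterize a machine by Hockney's parameters r_inf (asymptotic rate) and n_1/2 (vector length at half that rate), obtained by timing a simple vector operation over growing vector lengths. The sketch below illustrates such a measurement loop under the assumption of a DAXPY-style update; it is not the Euroben Rinf source.

#include <stdio.h>
#include <time.h>

/* Time a DAXPY-style vector update for growing vector lengths; the measured
   rates approach r_inf for large n, and n_1/2 is where half of r_inf is
   reached. Illustrative sketch only, not the Euroben Rinf benchmark. */
#define NMAX 1000000
static double x[NMAX], y[NMAX];

int main(void)
{
    const double a = 1.000001;
    for (int n = 1000; n <= NMAX; n *= 10) {
        const int reps = 100 * (NMAX / n);   /* keep total work roughly constant */
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                y[i] += a * x[i];
        double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (sec <= 0.0) sec = 1.0 / CLOCKS_PER_SEC;  /* guard against timer resolution */
        printf("n = %7d   rate = %8.1f Mflop/s\n",
               n, 2.0 * n * reps / sec / 1e6);
    }
    return 0;
}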

15 InfiniBand: Intelligent Routing. Traffic-Aware Routing Algorithm (TARA).

16 InfiniBand: Interconnect Pruning. [Table: execution time (s) of GADGET for different numbers of MPI tasks with MPT, OpenMPI, Intel MPI on the pruned interconnect and Intel MPI on the full interconnect.] Influence of different MPI versions and network pruning on the execution time of GADGET.

17 I/O results: Lustre metadata performance. The Lustre MDS is a bottleneck for small I/O operations. Use stripe count 1 for metadata-intensive I/O loads. The metadata performance of Lustre needs to be improved substantially for multi-petascale machines.
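A metadata-intensive load in this sense is dominated by file creates, stats and unlinks that hit the metadata server rather than the object storage targets; the sketch below generates such a load with plain POSIX calls (it assumes an existing directory named mdtest_dir, and the file count is arbitrary). On Lustre, the stripe-count-1 advice from the slide would be applied to that directory beforehand with lfs setstripe -c 1 mdtest_dir.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create, write and remove many tiny files: a metadata-bound workload that
   stresses the Lustre MDS far more than the data path. Illustrative only. */
int main(void)
{
    char path[256];
    const char payload[] = "x";
    for (int i = 0; i < 100000; i++) {
        snprintf(path, sizeof(path), "mdtest_dir/file.%d", i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, payload, sizeof(payload)) < 0) perror("write");
        close(fd);
        unlink(path);
    }
    return 0;
}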

18 A glimpse of what you will find in Deliverable D8.3.2: PROTOTYPES

19 eQPACE. Extend the communication capabilities of eQPACE to make it suitable for a wider range of applications; reach a top position in the Green500 list (FZJ). Hardware: PowerXCell8i processor nodes with a custom 3D-torus interconnect. Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers. Programming environments: Cell SDK & CellSs.
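The torus network benchmark exercises nearest-neighbour communication on the custom 3D torus. As a generic illustration of 3D-torus addressing (not the actual QPACE/eQPACE network code), the sketch below shows how ranks map to coordinates and to their six periodic neighbours.

/* Map a linear rank to 3D-torus coordinates and compute periodic nearest
   neighbours. Generic illustration of 3D-torus addressing; not the actual
   QPACE/eQPACE network code. */
typedef struct { int x, y, z; } coord3;

coord3 rank_to_coord(int rank, int nx, int ny)
{
    coord3 c = { rank % nx, (rank / nx) % ny, rank / (nx * ny) };
    return c;
}

int coord_to_rank(coord3 c, int nx, int ny, int nz)
{
    /* Periodic (torus) wrap-around in every dimension. */
    int x = (c.x + nx) % nx, y = (c.y + ny) % ny, z = (c.z + nz) % nz;
    return x + nx * (y + ny * z);
}

int neighbour(int rank, int dim, int dir, int nx, int ny, int nz)
{
    coord3 c = rank_to_coord(rank, nx, ny);
    if (dim == 0) c.x += dir; else if (dim == 1) c.y += dir; else c.z += dir;
    return coord_to_rank(c, nx, ny, nz);
}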

20 RapidMind. Evaluation of the RapidMind programming model (LRZ). Hardware: CPUs (Nehalem-EP, AMD Opteron), GPUs (NVIDIA Tesla and Quadro FX), Cell (QS22-blade cluster). Software: RapidMind makes it possible to write code that can run on x86 cores as well as on accelerators like GPUs and Cell. [Chart: RapidMind mod2am performance (GFlop/s) versus matrix size for the x86 backend in DP (8 Nehalem cores), the CUDA backend in DP (C1060) and the GLSL backend in SP (Quadro FX 5800).] Goals: evaluate ease of use & portability; assess RapidMind performance on different architectures; compare RapidMind with other accelerator languages.

21 LRZ-CINES. Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ). Hardware: SGI ICE (Nehalem-EP), SGI UV (Nehalem-EX), ClearSpeed CSX700. Benchmarks: Euroben kernels; synthetic benchmarks: HPL, Rinf, Intel MPI Benchmark, Apex-MAP; application benchmarks: GADGET, RAxML.

22 Hybrid technology demonstrator. Evaluating GPGPU with CAPS HMPP (CEA). Hardware: Tesla servers connected to Bull servers via PCI-E. Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes. [Charts: GFlop/s versus matrix size for mod2am with CAPS HMPP and with CUDA.]
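As an illustration of the directive-based approach, the sketch below marks a plain C matrix-multiply routine as an HMPP codelet targeting CUDA. The directive spelling is approximate (based on HMPP 2.x-era examples) and should be treated as illustrative; the underlying C compiles and runs unchanged if a compiler without HMPP support simply ignores the pragmas.

/* Dense matrix multiply marked as an HMPP codelet. The pragmas ask the HMPP
   preprocessor to generate and call a CUDA version; a plain C compiler
   ignores them. Directive spelling is illustrative (HMPP 2.x style). */
#pragma hmpp mm codelet, target=CUDA, args[c].io=inout
void mm(int n, const double a[n][n], const double b[n][n], double c[n][n])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

void compute(int n, const double a[n][n], const double b[n][n], double c[n][n])
{
    #pragma hmpp mm callsite
    mm(n, a, b, c);
}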

23 Maxwell FPGA. Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC). Hardware: the Maxwell FPGA prototype (32 FPGAs), with cards from both Alpha Data Ltd and Nallatech Ltd using Virtex-4 FPGAs supplied by Xilinx Corp. Benchmarks: 4 Euroben kernels. Languages: VHDL, HCE.

24 PGAS languages. Evaluate the ease of use of the PGAS programming model (CSCS). Hardware: Cray XT5. Compiler: Cray Compiler Environment (CCE). Evaluation of the compiler: functional correctness, conformance with the language standards, usability for existing CAF and UPC benchmarks/applications. Benchmarks from Rice University, George Washington University and Lawrence Berkeley National Laboratory.

25 ClearSpeed/Petapath. Evaluate the ClearSpeed-Petapath system (NCF). Hardware: 114 ClearSpeed CSX700 cards. Language: Cn. Benchmarks: 4 Euroben kernels and 4 applications (astronomy, geophysics, numerical mathematics, medical tomography).

26 XC4-IO. Compare the performance of storage infrastructure access using different hardware configurations and file system architectures (CINECA).

27 SNIC-KTH. Evaluate the energy efficiency of high-density commodity parts (SNIC-KTH). Hardware: AMD Istanbul. Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD. [Chart: preliminary Gromacs results.] Measure power consumption per component; adjust fan speed and fan power; assess the energy-management features of AMD Istanbul (control of voltage and frequency of components).

28 Contact information: Dr. Herbert Huber (WP8 Leader), Iris Christadler (WP8 Co-Leader), Leibniz Supercomputing Centre, Germany. THANK YOU FOR YOUR ATTENTION! COMMENTS? QUESTIONS?
