Future Technologies (WP8) Prototype Evaluation & Research Activities. Iris Christadler, Dr. Herbert Huber Leibniz Supercomputing Centre, Germany
1 Future Technologies (WP8): Prototype Evaluation & Research Activities. Iris Christadler, Dr. Herbert Huber, Leibniz Supercomputing Centre, Germany
2 Prototype Overview (1/2)
CEA (GPU/CAPS): 1U Tesla Server T1070 (CUDA, CAPS, DDT), Intel Harpertown nodes. Take advantage of accelerators more easily. Compare HMPP with other approaches to programming accelerators.
CINECA (I/O Subsystem): SSD, Lustre, pNFS. Assess the applicability of new file system and storage technologies.
CINES-LRZ (LRB/CS): Hybrid SGI ICE/UV/Nehalem-EP & Nehalem-EX/ClearSpeed/Larrabee. Evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.
CSCS (UPC/CAF): Prototype PGAS language compilers (CAF + UPC for Cray XT systems). Understand the usability and programmability of PGAS languages.
EPCC (FPGA): Maxwell FPGA prototype (VHDL support & consultancy + software licenses, e.g., Mitrion-C). Assess the potential of high-level languages for using FPGAs in HPC. Compare energy efficiency with other solutions.
PRACE Workshop, New Languages & Future Technology Prototypes
3 Prototype Overview (2/2)
FZJ (Cell & FPGA interconnect): eQPACE (PowerXCell 8i cluster with special network processor). Gain deep expertise in communication network issues. Extend the application domain of the QPACE system.
LRZ (RapidMind): RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell). Assess the potential of data stream languages. Compare RapidMind with other approaches for programming accelerators or multi-core systems.
NCF (ClearSpeed): ClearSpeed CATS 700 units. Evaluate ClearSpeed accelerator hardware for large-scale applications.
SNIC-KTH: Air-cooled blade system from Supermicro with AMD Istanbul processors & QDR IB (subject to EC approval). Evaluate and optimize energy efficiency and packing density of commodity hardware.
4 RESEARCH ACTIVITIES
5 Parallel GPU
Evaluation of GPGPU programming languages (CSC).
Languages: CUDA+MPI, OpenCL
Benchmarks: GPU-HMMER, Euroben kernels
Hardware: Tesla, AMD Firestream, CEA WP8 prototype
6 Advanced PGAS Programming
Evaluate usability of PGAS programming model (CSC).
Languages: Coarray Fortran (CAF), Unified Parallel C (UPC)
Benchmarks: Euroben mod2am/as/f
Environments: Cray XT5 (cce), SGI Altix (g95, bupc)
Mod2am kernel using DGEMM:

    upc_barrier;
    upc_forall (sc=0; sc<totblks; sc++; sc) {
      // Square matrix multiply C = A * B with aid of DGEMM
      double beta = 0;
      double *clocal = (double *)c[sc].x;  // Local C-block for this UPC-thread
      int ib = sc / numblks, jb = sc % numblks, kb, i, j, k;
      for (kb=0; kb<numblks; kb++) {
        int sa = ib * numblks + kb;  // The owner of A-block is sa % THREADS
        int sb = kb * numblks + jb;  // The owner of B-block is sb % THREADS
        double *al = (sa%THREADS == MYTHREAD) ?  // Get the A-block
          (double *)a[sa].x : (upc_memget(alocal, a[sa].x, ns), alocal);
        double *bl = (sb%THREADS == MYTHREAD) ?  // Get the B-block
          (double *)b[sb].x : (upc_memget(blocal, b[sb].x, ns), blocal);
        double *cl = clocal;  // The local C-block owned by this UPC-thread
        // Call BLAS3-library DGEMM
        dgemm_("n","n", &blksize, &blksize, &blksize, &alpha, al, &blksize,
               bl, &blksize, &beta, cl, &blksize, 1, 1);
        beta = 1;
      } /* for (kb=0; kb<numblks; kb++) */
    } /* upc_forall (sc=0; sc<totblks; sc++; sc) */
    upc_barrier;
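The block-ownership logic of the mod2am kernel can be illustrated outside UPC. The following is a minimal Python sketch, not the benchmark code: it assumes blocks are assigned round-robin to threads (as the `sa % THREADS` comments on the slide suggest), replaces DGEMM with a plain triple loop, and only counts the block fetches that `upc_memget` would perform. The names `block_owner`, `matmul_add` and `blocked_multiply` are hypothetical.

```python
def block_owner(block_index, num_threads):
    """Round-robin owner of a block, mirroring `sa % THREADS` on the slide."""
    return block_index % num_threads


def matmul_add(a, b, c):
    """c += a * b for square blocks stored as lists of rows.

    Stand-in for the DGEMM call with beta = 1; c must start at zero.
    """
    n = len(a)
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]


def blocked_multiply(a_blocks, b_blocks, numblks, blksize, num_threads):
    """Compute C = A * B block by block.

    Blocks are stored in row-major block order. Returns the C blocks and
    the number of A/B blocks that were not owned by the computing thread
    (the cases where the UPC code calls upc_memget).
    """
    totblks = numblks * numblks
    c_blocks = [[[0.0] * blksize for _ in range(blksize)] for _ in range(totblks)]
    remote_fetches = 0
    for sc in range(totblks):
        me = block_owner(sc, num_threads)  # thread that runs iteration sc
        ib, jb = divmod(sc, numblks)       # block row and block column of C
        for kb in range(numblks):
            sa = ib * numblks + kb         # index of the A-block
            sb = kb * numblks + jb         # index of the B-block
            if block_owner(sa, num_threads) != me:
                remote_fetches += 1
            if block_owner(sb, num_threads) != me:
                remote_fetches += 1
            matmul_add(a_blocks[sa], b_blocks[sb], c_blocks[sc])
    return c_blocks, remote_fetches
```

With a 2 x 2 grid of 1 x 1 blocks on four threads, every C-block needs two remote blocks, which shows why the communication cost of fetching blocks matters for this kernel.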
7 Research on power efficiency
Evaluate power consumption of components (STFC, PSNC).
Hardware: Intel Xeon, AMD Opteron, ClearSpeed, Tesla, Firestream, Cell, Power6.
Different workloads: stand-by, neutral, real life, artificial stress.
Assess CPUs, memories, accelerators, HDDs, cooling fans, backplane, power supply.
Power measurements with clamp meters, PDUs with built-in ammeters, and values from system management software.
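Readings from a clamp meter or PDU are typically a series of timestamped power values. As a minimal sketch of how such samples could be turned into energy and an efficiency figure, the Python below integrates power over time with the trapezoidal rule and derives Mflops per watt, the metric used by the Green500 list. The sample format and the function names are assumptions for illustration, not part of the WP8 methodology.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def mflops_per_watt(flop_count, samples):
    """Sustained Mflops divided by average power over the sampled interval."""
    elapsed = samples[-1][0] - samples[0][0]
    average_power = energy_joules(samples) / elapsed
    sustained_mflops = flop_count / elapsed / 1e6
    return sustained_mflops / average_power


# Hypothetical readings: 100 W at start, 200 W after 10 s and 20 s.
readings = [(0.0, 100.0), (10.0, 200.0), (20.0, 200.0)]
print(energy_joules(readings))           # energy in joules
print(mflops_per_watt(3.5e9, readings))  # Mflops per watt
```

Sampling more often reduces the integration error when the workload changes quickly, for example when switching from stand-by to an artificial stress test.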
8 Research on Performance Predictions
Prediction of application performance for future architectures.
Optimize hardware specifications in terms of sustained application performance per Euro.
Identify application porting issues on new architectures.
Identify hardware and software scaling issues.
9 A SELECTION OF D8.3.2 KEY RESULTS
Detailed results are reported in Deliverable D8.3.2, available at project.eu/documents/d8
10 QPACE ranked #1 in Green 500 List
11 Euroben results - accelerator languages
[Figure: Accelerator languages, absolute performance (Mflops) for mod2am, mod2as and mod2f. Series: MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i), Cn (1 CSX700) and peak performance. Bar labels: 81%, 79% and 78% of peak.]
[Figure: Accelerator languages, % of peak performance for mod2am, mod2as and mod2f. Series: MKL, CUDA, CellSs, Cn.]
Note: mod2f/MKL is single-threaded only.
12 Euroben results - GPGPU languages
Peak performance by hardware (SP = single precision, DP = double precision):
Nehalem-EP (2.53 GHz, 1 core): SP [value missing] GFlop/s, DP [value missing] GFlop/s
Nehalem-EP (2.53 GHz, 8 cores): SP [value missing] GFlop/s, DP [value missing] GFlop/s
1 C1060 GPU: SP 933 GFlop/s, DP 78 GFlop/s
1 PowerXCell8i (8 SPUs): SP [value missing] GFlop/s, DP [value missing] GFlop/s
2 PowerXCell8i (16 SPUs): SP [value missing] GFlop/s, DP [value missing] GFlop/s
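The "% of peak" figures in the Euroben results relate measured kernel performance to the peak values in this table. A minimal Python sketch of that arithmetic follows; only the C1060 peaks (933 GFlop/s SP, 78 GFlop/s DP) come from the slide, while the measured value in the usage line is a made-up example and the names are hypothetical.

```python
# Peak performance in GFlop/s; only the C1060 entries survive in the slide.
PEAK_GFLOPS = {
    "C1060": {"sp": 933.0, "dp": 78.0},
}


def percent_of_peak(measured_gflops, device, precision):
    """Return measured performance as a percentage of the device's peak."""
    peak = PEAK_GFLOPS[device][precision]
    return 100.0 * measured_gflops / peak


# Hypothetical measurement: a kernel sustaining 39 GFlop/s in double precision.
print(percent_of_peak(39.0, "C1060", "dp"))
```

Comparing percentages rather than absolute Mflops makes results from devices with very different peaks easier to set side by side.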
13 Euroben results - productivity
[Figure: Development time versus performance (dense matrix-matrix multiplication). Axes: performance in Mflops; development time in days (first version and total time).]
* OpenCL and CUDA+MPI ports are based on an existing CUDA port.
** RapidMind development time includes time for benchmarking.
14 Rinf
15 InfiniBand: Intelligent Routing
Traffic Aware Routing Algorithm (TARA)
16 InfiniBand: Interconnect Pruning
[Table: columns are MPI tasks, MPT time (s), OpenMPI time (s), Intel_PRUNED interconnect time (s), Intel_FULL_Interconnect time (s); the values did not survive in the transcription.]
Influence of different MPI versions and network pruning on execution time of GADGET.
17 I/O Results: Lustre Metadata Performance
The Lustre MDS is the bottleneck for small I/O operations.
Use stripe count 1 for metadata-intensive I/O loads.
The metadata performance of Lustre needs to be greatly improved for multi-petascale machines.
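The stripe-count recommendation can be applied per directory with the standard Lustre `lfs setstripe -c` command, so that new files created there use a single stripe. The Python sketch below only wraps that command; the directory path is hypothetical and the helper names are assumptions for illustration.

```python
import subprocess


def stripe_count_one_command(directory):
    """Build the `lfs setstripe` command that sets stripe count 1 on a directory."""
    return ["lfs", "setstripe", "-c", "1", directory]


def apply_stripe_count_one(directory):
    """Run the command; requires a Lustre client with the `lfs` tool installed."""
    subprocess.run(stripe_count_one_command(directory), check=True)


# Hypothetical directory holding many small files.
print(" ".join(stripe_count_one_command("/lustre/scratch/many_small_files")))
```

Files that already exist keep their layout; the setting affects files created in the directory afterwards.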
18 PROTOTYPES
A glimpse of what you will find in Deliverable D8.3.2
19 eQPACE
Extend communication capabilities of eQPACE to make it suitable for a wider range of applications. Reach a top position in the Green500 list (FZJ).
Hardware: PowerXCell8i processor nodes with custom 3D-torus interconnect.
Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers.
Programming environments: Cell SDK & CellSs
20 RapidMind
Evaluation of the RapidMind programming model (LRZ).
Hardware: CPUs (Nehalem EP, AMD Opteron), GPUs (Nvidia Tesla and Quadro FX), Cell (QS22-blade cluster).
Software: RapidMind allows writing code that can run on x86 cores as well as on accelerators such as GPUs and Cell.
Evaluate ease of use and portability.
Assess RapidMind performance on different architectures.
Compare RapidMind with other accelerator languages.
[Figure: RapidMind mod2am, Gflops versus matrix size (m). Series: x86 dp (8 Nehalem cores), CUDA dp (C1060), GLSL sp (FX 5800).]
21 LRZ-CINES
Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ).
Hardware: SGI ICE (Nehalem EP), SGI UV (Nehalem EX), ClearSpeed CSX700
Benchmarks: Euroben kernels
Synthetic benchmarks: HPL, Rinf, Intel MPI Benchmark, Apex-MAP
Application benchmarks: Gadget, RAxML
22 Hybrid technology demonstrator
Evaluating GPGPU with CAPS HMPP (CEA).
Hardware: Tesla servers connected to Bull servers via PCI-E.
Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes.
[Figures: CAPS HMPP mod2am and CUDA mod2am, Gflops versus matrix size (m).]
23 Maxwell FPGA
Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC).
Hardware: FPGA prototype Maxwell (32 FPGAs) from both Alpha Data Ltd and Nallatech Ltd, using Virtex-4 FPGAs supplied by Xilinx Corp.
Benchmarks: 4 Euroben kernels
Languages: VHDL, HCE
24 PGAS languages
Evaluate ease of use of PGAS programming model (CSCS).
Hardware: Cray XT5
Compiler: Cray Compiler Environment (CCE)
Evaluation of the compiler: functional correctness; conformance with language standards; usability for existing CAF and UPC benchmarks/applications.
Benchmarks from Rice University, George Washington University and the Lawrence Berkeley National Laboratory.
25 ClearSpeed/PetaPath
Evaluate ClearSpeed-PetaPath system (NCF).
Hardware: 114 ClearSpeed CSX700 cards
Language: Cn
Benchmarks: 4 Euroben kernels
4 applications: astronomy, geophysics, numerical mathematics, medical tomography
26 XC4-IO
Compare the performance of storage infrastructure access using different hardware configurations and file system architectures (CINECA).
27 SNIC-KTH
Evaluate energy efficiency of high-density commodity parts (SNIC-KTH).
Hardware: AMD Istanbul
Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD
Measure power consumption per component.
Adjust fan speed and fan power.
Assess energy management features of AMD Istanbul (control of voltage and frequency of components).
[Figure: Preliminary results (Gromacs).]
28 THANK YOU FOR YOUR ATTENTION! COMMENTS? QUESTIONS?
Contact information: Dr. Herbert Huber (WP8 Leader), Iris Christadler (WP8 Co-Leader), Leibniz Supercomputing Centre, Germany
HPC Benchmarking Presentations: Jack Dongarra, University of Tennessee & ORNL The HPL Benchmark: Past, Present & Future Mike Heroux, Sandia National Laboratories The HPCG Benchmark: Challenges It Presents
More informationOn the limits of (and opportunities for?) GPU acceleration
On the limits of (and opportunities for?) GPU acceleration Aparna Chandramowlishwaran, Jee Choi, Kenneth Czechowski, Murat (Efe) Guney, Logan Moon, Aashay Shringarpure, Richard (Rich) Vuduc HotPar 10,
More informationLLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS Programs nd Workshop on the LLVM Compiler Infrastructure in HPC @ SC15 Akihiro Hayashi (Rice University) Jisheng Zhao (Rice University) Michael Ferguson
More informationPhilippe Thierry Sr Staff Engineer Intel Corp.
HPC@Intel Philippe Thierry Sr Staff Engineer Intel Corp. IBM, April 8, 2009 1 Agenda CPU update: roadmap, micro-μ and performance Solid State Disk Impact What s next Q & A Tick Tock Model Perenity market
More informationFinite Element Integration and Assembly on Modern Multi and Many-core Processors
Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,
More informationParticle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA
Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran
More informationFPGA-based Supercomputing: New Opportunities and Challenges
FPGA-based Supercomputing: New Opportunities and Challenges Naoya Maruyama (RIKEN AICS)* 5 th ADAC Workshop Feb 15, 2018 * Current Main affiliation is Lawrence Livermore National Laboratory SIAM PP18:
More informationHow to Write Code that Will Survive the Many-Core Revolution
How to Write Code that Will Survive the Many-Core Revolution Write Once, Deploy Many(-Cores) Guillaume BARAT, EMEA Sales Manager CAPS worldwide ecosystem Customers Business Partners Involved in many European
More informationMulti-Threaded UPC Runtime for GPU to GPU communication over InfiniBand
Multi-Threaded UPC Runtime for GPU to GPU communication over InfiniBand Miao Luo, Hao Wang, & D. K. Panda Network- Based Compu2ng Laboratory Department of Computer Science and Engineering The Ohio State
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 16 th CALL (T ier-0)
PRACE 16th Call Technical Guidelines for Applicants V1: published on 26/09/17 TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 16 th CALL (T ier-0) The contributing sites and the corresponding computer systems
More informationOncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries
Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Jeffrey Young, Alex Merritt, Se Hoon Shon Advisor: Sudhakar Yalamanchili 4/16/13 Sponsors: Intel, NVIDIA, NSF 2 The Problem Big
More informationCUDA. Matthew Joyner, Jeremy Williams
CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel
More informationWhat does Heterogeneity bring?
What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or
More informationThe Mont-Blanc Project
http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre 1 Ter@tec Forum 26 th June 2013 This project and the research leading to these results has received funding
More informationEnabling Efficient Use of UPC and OpenSHMEM PGAS models on GPU Clusters
Enabling Efficient Use of UPC and OpenSHMEM PGAS models on GPU Clusters Presentation at GTC 2014 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationBig Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures
Procedia Computer Science Volume 51, 2015, Pages 2774 2778 ICCS 2015 International Conference On Computational Science Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid
More informationParallel Computer Architecture - Basics -
Parallel Computer Architecture - Basics - Christian Terboven 19.03.2012 / Aachen, Germany Stand: 15.03.2012 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda Processor
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationJ. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst
Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function
More informationExperiences with GPGPUs at HLRS
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: Experiences with GPGPUs at HLRS Stefan Wesner, Managing Director High
More informationLustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE
Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE Hitoshi Sato *1, Shuichi Ihara *2, Satoshi Matsuoka *1 *1 Tokyo Institute
More informationMaxwell: a 64-FPGA Supercomputer
Maxwell: a 64-FPGA Supercomputer Copyright 2007, the University of Edinburgh Dr Rob Baxter Software Development Group Manager, EPCC R.Baxter@epcc.ed.ac.uk +44 131 651 3579 Outline The FHPCA Why build Maxwell?
More information