Future Technologies (WP8) Prototype Evaluation & Research Activities. Iris Christadler, Dr. Herbert Huber, Leibniz Supercomputing Centre, Germany

1 Future Technologies (WP8) Prototype Evaluation & Research Activities Iris Christadler, Dr. Herbert Huber Leibniz Supercomputing Centre, Germany

2 Prototype Overview (1/2)
CEA: 1U Tesla Server T1070 (CUDA, CAPS, DDT) attached to Intel Harpertown nodes. Goal: take advantage of accelerators more easily; compare GPU/CAPS HMPP with other approaches to programming accelerators.
CINECA: I/O subsystem (SSD, Lustre, pNFS). Goal: assess the applicability of new file system and storage technologies.
CINES-LRZ: LRB/CS hybrid, SGI ICE/UV with Nehalem-EP & Nehalem-EX nodes plus ClearSpeed and Larrabee accelerators. Goal: evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.
CSCS: UPC/CAF prototype, PGAS language compilers (CAF + UPC for Cray XT systems). Goal: understand the usability and programmability of PGAS languages.
EPCC: Maxwell FPGA prototype (VHDL support & consultancy plus software licenses, e.g. Mitrion-C). Goal: assess the potential of high-level languages for using FPGAs in HPC; compare energy efficiency with other solutions.

3 Prototype Overview (2/2)
FZJ: eQPACE (PowerXCell 8i cluster with special network processor, Cell & FPGA interconnect). Goal: gain deep expertise in communication network issues; extend the application domain of the QPACE system.
LRZ: RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell). Goal: assess the potential of data-stream languages; compare RapidMind with other approaches for programming accelerators or multi-core systems.
NCF: ClearSpeed CATS 700 units. Goal: evaluate ClearSpeed accelerator hardware for large-scale applications.
SNIC-KTH: air-cooled blade system from Supermicro with AMD Istanbul processors & QDR IB (subject to EC approval). Goal: evaluate and optimize energy efficiency and packing density of commodity hardware.

4 RESEARCH ACTIVITIES

5 Parallel GPU Languages. Evaluation of GPGPU programming languages (CSC). Languages: CUDA+MPI, OpenCL. Benchmarks: GPU-HMMER, Euroben kernels. Hardware: Tesla, AMD FireStream, the CEA WP8 prototype.
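The Euroben kernels that appear throughout these slides are small synthetic compute kernels: mod2am is a dense matrix-matrix multiplication, mod2as a sparse matrix-vector multiplication and mod2f a fast Fourier transform. As a rough illustration of what each accelerator port has to reproduce, a minimal plain-C reference of the mod2am operation (C = A * B) could look like the sketch below; the function name and data layout are illustrative, not the original Euroben source.

#include <stddef.h>

/* Naive dense matrix-matrix multiply, C = A * B, row-major storage.
   Illustrative stand-in for the Euroben mod2am kernel; the real kernel
   times this operation, typically delegated to an optimized DGEMM. */
void mod2am_ref(size_t m, size_t n, size_t k,
                const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t l = 0; l < k; l++)
                sum += A[i * k + l] * B[l * n + j];
            C[i * n + j] = sum;
        }
}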

6 Advanced PGAS Programming. Evaluate the usability of the PGAS programming model (CSC). Languages: Coarray Fortran (CAF), Unified Parallel C (UPC). Benchmarks: Euroben mod2am/as/f. Environments: Cray XT5 (cce), SGI Altix (g95, bupc).

upc_barrier;
upc_forall (sc = 0; sc < totblks; sc++; sc) {
    // Square matrix multiply C = A * B with the aid of DGEMM
    double beta = 0;
    double *clocal = (double *)c[sc].x;           // Local C-block for this UPC thread
    int ib = sc / numblks, jb = sc % numblks, kb, i, j, k;
    for (kb = 0; kb < numblks; kb++) {
        int sa = ib * numblks + kb;               // The owner of the A-block is sa % THREADS
        int sb = kb * numblks + jb;               // The owner of the B-block is sb % THREADS
        double *al = (sa % THREADS == MYTHREAD) ? // Get the A-block
            (double *)a[sa].x : (upc_memget(alocal, a[sa].x, ns), alocal);
        double *bl = (sb % THREADS == MYTHREAD) ? // Get the B-block
            (double *)b[sb].x : (upc_memget(blocal, b[sb].x, ns), blocal);
        double *cl = clocal;                      // The local C-block owned by this UPC thread
        // Call the BLAS3 library DGEMM
        dgemm_("n", "n", &blksize, &blksize, &blksize, &alpha, al, &blksize,
               bl, &blksize, &beta, cl, &blksize, 1, 1);
        beta = 1;
    } /* for (kb = 0; kb < numblks; kb++) */
} /* upc_forall (sc = 0; sc < totblks; sc++; sc) */
upc_barrier;

mod2am kernel using DGEMM
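The dgemm_ call above uses the Fortran BLAS binding, which is why the transpose flags are passed as strings and two hidden string-length arguments (the trailing 1, 1) are appended. With the standard C interface (CBLAS) the same block update reads roughly as in the sketch below, where blksize, alpha, beta and the al/bl/cl pointers are the variables of the kernel above; the wrapper name is illustrative.

#include <cblas.h>

/* Equivalent of the Fortran-style dgemm_ call in the UPC kernel above, using
   the CBLAS interface: cl = alpha * al * bl + beta * cl for square,
   column-major blocks of order blksize. */
static void block_dgemm(int blksize, double alpha, const double *al,
                        const double *bl, double beta, double *cl)
{
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                blksize, blksize, blksize,
                alpha, al, blksize,
                bl, blksize,
                beta, cl, blksize);
}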

7 Research on Power Efficiency. Evaluate the power consumption of components (STFC, PSNC). Hardware: Intel Xeon, AMD Opteron, ClearSpeed, Tesla, FireStream, Cell, POWER6. Workloads: stand-by, neutral, real-life, artificial stress. Components assessed: CPUs, memories, accelerators, HDDs, cooling fans, backplane, power supply. Power measurements with clamp meters, PDUs with built-in ammeters, and values from system management software.
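When the power readings come as periodic samples (for example from a PDU or from system management software), energy to solution follows from integrating power over the run time. A minimal sketch of that step, assuming timestamped samples in watts and seconds (names are illustrative):

#include <stddef.h>

/* Integrate timestamped power samples (watts, seconds) with the trapezoidal
   rule to obtain the energy to solution in joules. Illustrative helper only. */
double energy_to_solution(const double *t_s, const double *power_w, size_t n)
{
    double joules = 0.0;
    for (size_t i = 1; i < n; i++)
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * (t_s[i] - t_s[i - 1]);
    return joules;
}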

8 Research on Performance Predictions. Predict application performance for future architectures. Optimize hardware specifications in terms of sustained application performance per Euro. Identify application porting issues on new architectures. Identify hardware and software scaling issues.
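A common first-order ingredient of such predictions is a roofline-style bound derived from the target machine's peak floating-point rate and memory bandwidth; the sketch below is a generic illustration of that idea only, not the specific WP8 prediction methodology.

/* First-order (roofline-style) runtime estimate: a kernel is limited either
   by compute or by memory traffic on the target machine. Generic
   illustration; not the WP8 prediction model. */
double predicted_seconds(double flop, double bytes,
                         double peak_gflops, double bandwidth_gbs)
{
    double compute_s = flop  / (peak_gflops   * 1e9);
    double memory_s  = bytes / (bandwidth_gbs * 1e9);
    return compute_s > memory_s ? compute_s : memory_s;
}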

9 Detailed results are reported in Deliverable D8.3.2, available at project.eu/documents/d8. A SELECTION OF D8.3.2 KEY RESULTS

10 QPACE ranked #1 in the Green500 list

11 Euroben results: accelerator languages. [Two charts: absolute performance (Mflop/s) and percentage of peak for the mod2am, mod2as and mod2f kernels with MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i) and Cn (1 CSX700); data labels of 81%, 79% and 78% of peak appear in the charts. Note: mod2f/MKL is single-threaded only.]

12 Euroben results: GPGPU languages.
Hardware                          SP peak performance    DP peak performance
Nehalem-EP (2.53 GHz, 1 core)        20.2 GFlop/s           10.1 GFlop/s
Nehalem-EP (2.53 GHz, 8 cores)      161.9 GFlop/s           81.0 GFlop/s
1 C1060 GPU                           933 GFlop/s             78 GFlop/s
1 PowerXCell8i (8 SPUs)             204.8 GFlop/s          102.4 GFlop/s
2 PowerXCell8i (16 SPUs)            409.6 GFlop/s          204.8 GFlop/s
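The CPU and Cell entries are theoretical peaks, i.e. cores times clock rate times the floating-point operations each core can retire per cycle. A small sketch of that arithmetic, using the Nehalem-EP figures as the example:

#include <stdio.h>

/* Theoretical peak in GFlop/s: cores * clock (GHz) * flops per core and cycle. */
static double peak_gflops(int cores, double ghz, int flops_per_cycle)
{
    return cores * ghz * flops_per_cycle;
}

int main(void)
{
    /* Nehalem-EP at 2.53 GHz retires 4 DP (8 SP) flops per core and cycle via SSE. */
    printf("1 core,  DP: %6.1f GFlop/s\n", peak_gflops(1, 2.53, 4));
    printf("8 cores, DP: %6.1f GFlop/s\n", peak_gflops(8, 2.53, 4));
    printf("8 cores, SP: %6.1f GFlop/s\n", peak_gflops(8, 2.53, 8));
    return 0;
}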

13 Euroben results: productivity. [Chart: development time in days versus achieved performance (Mflop/s) for the dense matrix-matrix multiplication kernel, distinguishing time to first version from total development time per language.] * The OpenCL and CUDA+MPI ports are based on an existing CUDA port. ** The RapidMind developer included time for benchmarking.

14 Rinf
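Rinf is one of the synthetic low-level benchmarks run on the prototypes (see also the LRZ-CINES slide): measurements of this kind characterize a machine by Hockney's parameters r_inf (asymptotic rate) and n_1/2 (vector length at half that rate), obtained by timing a simple vector operation over growing vector lengths. The sketch below illustrates such a measurement loop under the assumption of a DAXPY-style update; it is not the Euroben Rinf source.

#include <stdio.h>
#include <time.h>

/* Time a DAXPY-style vector update for growing vector lengths; the measured
   rates approach r_inf for large n, and n_1/2 is where half of r_inf is
   reached. Illustrative sketch only, not the Euroben Rinf benchmark. */
#define NMAX 1000000
static double x[NMAX], y[NMAX];

int main(void)
{
    const double a = 1.000001;
    for (int n = 1000; n <= NMAX; n *= 10) {
        const int reps = 100 * (NMAX / n);   /* keep total work roughly constant */
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                y[i] += a * x[i];
        double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (sec <= 0.0) sec = 1.0 / CLOCKS_PER_SEC;  /* guard against timer resolution */
        printf("n = %7d   rate = %8.1f Mflop/s\n",
               n, 2.0 * n * reps / sec / 1e6);
    }
    return 0;
}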

15 InfiniBand: Intelligent Routing. Traffic-Aware Routing Algorithm (TARA).

16 InfiniBand: Interconnect Pruning. [Table: execution time (s) of GADGET for different numbers of MPI tasks with MPT, OpenMPI, Intel MPI on the pruned interconnect and Intel MPI on the full interconnect.] Influence of different MPI versions and network pruning on the execution time of GADGET.

17 I/O results: Lustre metadata performance. The Lustre MDS is a bottleneck for small I/O operations. Use stripe count 1 for metadata-intensive I/O loads. The metadata performance of Lustre needs to be improved substantially for multi-petascale machines.
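A metadata-intensive load in this sense is dominated by file creates, stats and unlinks that hit the metadata server rather than the object storage targets; the sketch below generates such a load with plain POSIX calls (it assumes an existing directory named mdtest_dir, and the file count is arbitrary). On Lustre, the stripe-count-1 advice from the slide would be applied to that directory beforehand with lfs setstripe -c 1 mdtest_dir.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create, write and remove many tiny files: a metadata-bound workload that
   stresses the Lustre MDS far more than the data path. Illustrative only. */
int main(void)
{
    char path[256];
    const char payload[] = "x";
    for (int i = 0; i < 100000; i++) {
        snprintf(path, sizeof(path), "mdtest_dir/file.%d", i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, payload, sizeof(payload)) < 0) perror("write");
        close(fd);
        unlink(path);
    }
    return 0;
}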

18 A glimpse of what you will find in Deliverable D8.3.2: PROTOTYPES

19 eQPACE. Extend the communication capabilities of eQPACE to make it suitable for a wider range of applications; reach a top position in the Green500 list (FZJ). Hardware: PowerXCell8i processor nodes with a custom 3D-torus interconnect. Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers. Programming environments: Cell SDK & CellSs.
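The torus network benchmark exercises nearest-neighbour communication on the custom 3D torus. As a generic illustration of 3D-torus addressing (not the actual QPACE/eQPACE network code), the sketch below shows how ranks map to coordinates and to their six periodic neighbours.

/* Map a linear rank to 3D-torus coordinates and compute periodic nearest
   neighbours. Generic illustration of 3D-torus addressing; not the actual
   QPACE/eQPACE network code. */
typedef struct { int x, y, z; } coord3;

coord3 rank_to_coord(int rank, int nx, int ny)
{
    coord3 c = { rank % nx, (rank / nx) % ny, rank / (nx * ny) };
    return c;
}

int coord_to_rank(coord3 c, int nx, int ny, int nz)
{
    /* Periodic (torus) wrap-around in every dimension. */
    int x = (c.x + nx) % nx, y = (c.y + ny) % ny, z = (c.z + nz) % nz;
    return x + nx * (y + ny * z);
}

int neighbour(int rank, int dim, int dir, int nx, int ny, int nz)
{
    coord3 c = rank_to_coord(rank, nx, ny);
    if (dim == 0) c.x += dir; else if (dim == 1) c.y += dir; else c.z += dir;
    return coord_to_rank(c, nx, ny, nz);
}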

20 RapidMind. Evaluation of the RapidMind programming model (LRZ). Hardware: CPUs (Nehalem-EP, AMD Opteron), GPUs (NVIDIA Tesla and Quadro FX), Cell (QS22-blade cluster). Software: RapidMind makes it possible to write code that can run on x86 cores as well as on accelerators like GPUs and Cell. [Chart: RapidMind mod2am performance (GFlop/s) versus matrix size for the x86 backend in DP (8 Nehalem cores), the CUDA backend in DP (C1060) and the GLSL backend in SP (Quadro FX 5800).] Goals: evaluate ease of use & portability; assess RapidMind performance on different architectures; compare RapidMind with other accelerator languages.

21 LRZ-CINES. Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ). Hardware: SGI ICE (Nehalem-EP), SGI UV (Nehalem-EX), ClearSpeed CSX700. Benchmarks: Euroben kernels; synthetic benchmarks: HPL, Rinf, Intel MPI Benchmark, Apex-MAP; application benchmarks: GADGET, RAxML.

22 Hybrid technology demonstrator. Evaluating GPGPU with CAPS HMPP (CEA). Hardware: Tesla servers connected to Bull servers via PCI-E. Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes. [Charts: GFlop/s versus matrix size for mod2am with CAPS HMPP and with CUDA.]
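As an illustration of the directive-based approach, the sketch below marks a plain C matrix-multiply routine as an HMPP codelet targeting CUDA. The directive spelling is approximate (based on HMPP 2.x-era examples) and should be treated as illustrative; the underlying C compiles and runs unchanged if a compiler without HMPP support simply ignores the pragmas.

/* Dense matrix multiply marked as an HMPP codelet. The pragmas ask the HMPP
   preprocessor to generate and call a CUDA version; a plain C compiler
   ignores them. Directive spelling is illustrative (HMPP 2.x style). */
#pragma hmpp mm codelet, target=CUDA, args[c].io=inout
void mm(int n, const double a[n][n], const double b[n][n], double c[n][n])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

void compute(int n, const double a[n][n], const double b[n][n], double c[n][n])
{
    #pragma hmpp mm callsite
    mm(n, a, b, c);
}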

23 Maxwell FPGA. Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC). Hardware: the Maxwell FPGA prototype (32 FPGAs), with cards from both Alpha Data Ltd and Nallatech Ltd using Virtex-4 FPGAs supplied by Xilinx Corp. Benchmarks: 4 Euroben kernels. Languages: VHDL, HCE.

24 PGAS languages. Evaluate the ease of use of the PGAS programming model (CSCS). Hardware: Cray XT5. Compiler: Cray Compiler Environment (CCE). Evaluation of the compiler: functional correctness, conformance with the language standards, usability for existing CAF and UPC benchmarks/applications. Benchmarks from Rice University, George Washington University and Lawrence Berkeley National Laboratory.

25 ClearSpeed/Petapath. Evaluate the ClearSpeed-Petapath system (NCF). Hardware: 114 ClearSpeed CSX700 cards. Language: Cn. Benchmarks: 4 Euroben kernels and 4 applications (astronomy, geophysics, numerical mathematics, medical tomography).

26 XC4-IO. Compare the performance of storage infrastructure access using different hardware configurations and file system architectures (CINECA).

27 SNIC-KTH. Evaluate the energy efficiency of high-density commodity parts (SNIC-KTH). Hardware: AMD Istanbul. Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD. [Chart: preliminary Gromacs results.] Measure power consumption per component; adjust fan speed and fan power; assess the energy-management features of AMD Istanbul (control of voltage and frequency of components).

28 Contact information: Dr. Herbert Huber (WP8 Leader), Iris Christadler (WP8 Co-Leader), Leibniz Supercomputing Centre, Germany. THANK YOU FOR YOUR ATTENTION! COMMENTS? QUESTIONS?
