
1 Comparison of PRACE prototypes and benchmarks
Axel Berg (SARA, NL), ISC'10 Hamburg, June 1st 2010

2 What is a prototype?

3 The prototype according to Wikipedia
A prototype is an original type, form, or instance of something serving as a typical example, basis, or standard for other things of the same category. The word derives from the Greek πρωτότυπον (prototypon), "primitive form", neuter of πρωτότυπος (prototypos), "original, primitive", from πρῶτος (protos), "first", and τύπος (typos), "impression".

4 Prototypes in PRACE
- PRACE WP5 prototypes: prototypes of leadership-class systems at selected sites that are likely to become production-level systems in 2009/2010, and of potential Petaflop/s systems/architectures for the near future
- PRACE WP8 prototypes: prototypes of testable components and technology (HW & SW) for multi-Petaflop/s systems beyond 2010

6 Deployment and assessment of prototype systems
- selection and installation of prototype systems
- integration, testing and operation in close-to-production conditions
- evaluation of the technical capabilities
- user application benchmarking

7 Selected PRACE prototypes

Site                          | Architecture        | Vendor/Technology               | Point of contact
FZJ, Germany                  | MPP                 | IBM BlueGene/P                  | Michael Stephan
CSC-CSCS, Finland+Switzerland | MPP                 | Cray XT5, AMD Opteron           | Janne Ignatius (janne.ignatius@csc.fi)
CEA-FZJ, France+Germany       | SMP-TN              | Bull/SUN, Intel Nehalem         | Gilles Wiber (gilles.wiber@cea.fr), Norbert Eicker (n.eicker@fz-juelich.de)
NCF/SARA, Netherlands         | SMP-FN              | IBM Power6                      | Axel Berg (axel@sara.nl), Peter Michielse (p.michielse@nwo.nl)
BSC, Spain                    | Hybrid fine grain   | IBM Cell                        | Sergi Girona (sergi.girona@bsc.es)
HLRS, Germany                 | Hybrid coarse grain | NEC Vector SX/9 + Intel Nehalem | Stefan Wesner (wesner@hlrs.de)

8 Installed PRACE prototypes
- IBM BlueGene/P (FZJ)
- IBM Power6 (SARA)
- Cray XT5 (CSC)
- IBM Cell (BSC)
- NEC SX-9, vector part (HLRS)
- Intel Nehalem cluster part (HLRS)
- Bull Intel Nehalem (CEA)
- SUN/Bull Intel Nehalem (FZJ)

9 Basic features of the prototypes

System               | CPU                              | Memory/node | Interconnect   | Peak   | File system | Cooling
IBM BlueGene/P (FZJ) | IBM PPC450d, 0.85 GHz            | 2 GB        | Proprietary    | 223 TF | IBM GPFS    | Air
IBM Power6 (SARA)    | IBM Power6, 4.7 GHz              |             | QLogic IB 8-way| 63 TF  | IBM GPFS    | Water
Cray XT4/5 (CSC)     | AMD Barcelona/Shanghai, 2.3/2.7 GHz | 4-8 GB   | Cray SeaStar2+ | 86 TF  | Lustre      | Air
IBM Cell (BSC)       | IBM PowerXCell 8i, 3.2 GHz       | 8-32 GB     | Direct connection | 16 TF | IBM GPFS   | Air
NEC SX-9 (HLRS)      | NEC SX-9, 3.2 GHz                | 512 GB      | NEC IXS        | 19 TF  | GStorageFS  | Air
NEC Nehalem (HLRS)   | Intel Nehalem, 2.8 GHz           | 12 GB       | IB             | 60 TF  | GStorageFS  | Air
Bull Nehalem (CEA)   | Intel Nehalem, 2.8 GHz           | 24 GB       | IB QDR         | 11 TF  | Lustre      | Water/air
SUN Nehalem (FZJ)    | Intel Nehalem, 2.93 GHz          | 24 GB       | IB QDR         | 207 TF | Lustre      | Water/air
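The peak numbers in the table can be sanity-checked from clock rate alone: peak = cores x clock x flops per cycle. A minimal Python sketch; the core counts and the 4 flops/cycle figure are assumptions added for illustration, not data from the slides:

    # Peak = cores * clock (GHz) * flops/cycle. Core counts and flops/cycle
    # below are illustrative assumptions, not figures from the slides.
    systems = {
        "IBM BlueGene/P (FZJ)": (65536, 0.85, 4),  # assumed 16-rack system
        "IBM Power6 (SARA)": (3328, 4.7, 4),       # assumed 104 nodes x 32 cores
    }

    for name, (cores, clock_ghz, flops_per_cycle) in systems.items():
        peak_tf = cores * clock_ghz * flops_per_cycle / 1000.0  # GFlop/s -> TFlop/s
        print(f"{name}: ~{peak_tf:.0f} TFlop/s peak")

Under these assumptions the sketch reproduces the 223 TF and 63 TF entries above.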

10 Use and benefits of PRACE prototypes
- HPC infrastructure available to all partners: sharing of infrastructure, results, knowledge and experience
- First PRACE infrastructure available to the user community for porting and testing real applications, raising user awareness
- Testing of the PRACE peer review system
- Engagement of HPC vendors with PRACE through the prototypes; individual collaborations between centres and vendors were fruitful in assessing and driving future technology
- Gained insight into the actual status and potential of technology and vendors for delivering Petascale systems

11 Use and benefits of PRACE prototypes
- Results of the technical assessments were useful in the evaluation of technologies and architectures
- All prototypes have been used to port, optimize and scale selected applications: adaptation of applications to future technology, and acceleration of application deployment on PRACE Tier-0 systems
- Results of the application benchmarks show which applications run well on which prototype/architecture
- Testing and evaluation of middleware for integration of the systems into a European service: gaining experience and collaboration among centres on the operational and technical level

12 Sensitivity of results
- NDAs with vendors (site-specific and PRACE-wide)
- PRACE intends to work with vendors to continuously verify benchmark results, i.e. synthetic benchmarks
- Focus on methodologies and on general experience and results

13 Installation reports of prototype systems
- installation planning (start date, completion date; planned and actual) for each phase
- installation experiences and recommendations:
  - site preparation
  - system delivery and physical installation
  - system software installation
  - application environment installation
  - system installation to be ready for test users
  - software preparation/configuration of the system for production
  - installation of software for distributed system management
  - acceptance of the system
- role of the vendor in the installation process

14 Installation experiences and best practices
- Site preparation, system delivery and physical installation:
  - Infrastructure planning can be very complex and requires careful planning, e.g. floor planning with precise drawings above and under the floor for air flow and cabling; weight of racks, weight capacity of the raised floor, and the route to the raised floor
- System acceptance:
  - Power measurement
  - Functionality tests
  - Performance measurements (Linpack, I/O, application, throughput)
  - Reliability test
- Relation with vendors:
  - Collaborate as much as possible with the vendor: knowledge transfer and a working relationship on the technical level

16 Technical assessments of prototypes
- Mainly synthetic benchmarks, some qualitative measures
- JuBE benchmark framework: a script-based framework to easily create benchmark sets, run those sets on different computer systems and evaluate the results
- Scalability: wherever possible, benchmarks were run on a single core, half of the cores on a node, all cores on a node, all cores on an increasing number of nodes, and all nodes in the prototype (a sketch of this core-count series follows below)
- Comparing results is a challenge:
  - Prototypes are different in size
  - Prototypes have different production status
  - Prototypes have I/O systems designed for different levels of performance
  - Prototypes represent different generations of technology
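The scalability series described above is easy to generate mechanically. A small sketch, assuming a hypothetical node size and node count (the real runs were driven through JuBE; this only illustrates the core-count scheme):

    def benchmark_core_counts(cores_per_node, total_nodes):
        """Core counts per the WP5 scheme: one core, half a node, a full node,
        then a doubling series of nodes, up to the whole prototype."""
        counts = [1, cores_per_node // 2, cores_per_node]
        nodes = 2
        while nodes < total_nodes:
            counts.append(nodes * cores_per_node)
            nodes *= 2
        counts.append(total_nodes * cores_per_node)  # all nodes in the prototype
        return sorted(set(counts))

    # Hypothetical 8-core-per-node, 180-node prototype:
    print(benchmark_core_counts(8, 180))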

17 Technical assessments
- System performance:
  - Linpack sustained performance
  - EuroBen intrinsic operations
  - EuroBen representative algorithms
  - Sustained memory bandwidth at different cache levels (STREAM/STREAM2; see the sketch below)
  - Cache miss performance (RandomAccess)
- Message passing:
  - MPI performance (SKaMPI)
  - Overlap between computation and MPI communication (SMB)
- Internal I/O:
  - Metadata I/O (Bonnie++)
  - Data I/O (IOR)
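To illustrate what the sustained-memory-bandwidth tests measure: the central STREAM kernel is the "triad" a = b + s*c, timed over arrays far larger than any cache. A rough NumPy sketch of the idea; the actual assessments used the C STREAM/STREAM2 benchmarks, and NumPy temporaries make this an approximation:

    import time
    import numpy as np

    N = 20_000_000            # ~160 MB per array, far larger than any cache
    a = np.zeros(N)
    b = np.random.rand(N)
    c = np.random.rand(N)

    t0 = time.perf_counter()
    a[:] = b + 3.0 * c        # the STREAM "triad" kernel: a = b + s*c
    t1 = time.perf_counter()

    # Nominal traffic: read b, read c, write a; 8 bytes per double.
    # (NumPy's temporary array adds extra traffic not counted here.)
    bytes_moved = 3 * N * 8
    print(f"Triad bandwidth: {bytes_moved / (t1 - t0) / 1e9:.1f} GB/s")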

18 Technical assessments continued
- System balance (illustrated below):
  - Memory bandwidth per Flop/s
  - MPI bandwidth per Flop/s at different scales
  - Disk I/O per Flop/s at different scales
- OS performance:
  - OS noise (P-SNAP), OS jitter (Selfish), OS system resource usage
- System availability and reliability: design + experience
- Manageability of the system: system start-up times
- In general: sufficiently mature technology exists in most if not all aspects of systems to provide Petaflop/s in the very near future
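Each balance metric above is a simple ratio of two measured rates, e.g. sustained memory bandwidth (bytes/s) divided by sustained floating-point rate (flop/s). A tiny sketch with purely illustrative numbers, not measured PRACE results:

    def bytes_per_flop(stream_gb_per_s, sustained_gflop_per_s):
        """System balance: sustained memory bytes moved per sustained flop."""
        return stream_gb_per_s / sustained_gflop_per_s

    # Illustrative values only (not PRACE measurements):
    for name, bw, flops in [("vector node", 256.0, 100.0), ("x86 node", 35.0, 90.0)]:
        print(f"{name}: {bytes_per_flop(bw, flops):.2f} B/flop")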

19 User application benchmarks
- User applications have been identified as being representative of current and future usage of major European HPC systems
- Initial choice of representative applications based on results of surveys of applications, systems usage and key users
- Good spread of application areas, algorithm classes, geographical distribution and prototype coverage
- Updated after actual scaling results from PRACE prototypes

Application areas covered (% share, from the slide's chart): Particle Physics 23.5, Computational Chemistry 22.1, Condensed Matter Physics 14.2, CFD 8.6, Earth & Climate 7.8, Astronomy & Cosmology 5.8, Other 5.8, Life Sciences 5.3, Computational Engineering 3.7, Plasma Physics 3.3

- Application benchmarks used were not I/O intensive:
  - Measure performance of the basic architecture
  - No undue influence from a particular I/O system; in some cases I/O systems were not representative of production systems

20 User application benchmarks
- To make meaningful comparisons, benchmarks were run on partitions of prototypes that have the same nominal peak performance, e.g. 10 Tflop/s peak (see the sketch after this list):
  - 3000 cores of IBM BG/P
  - 1000 cores of Cray XT5
  - 500 cores of IBM Power6
  - 100 NEC SX-9 vector cores
  - 100 IBM Cells
- Run on a small number of partition sizes (e.g. 5, 10, 20, 40 Tflop/s)
- For each partition size, the same input data is used on all systems
- Partial coverage of the possible application/prototype combinations:
  - Compile or runtime failures
  - Lack of available experts for porting
  - Late availability of some prototypes
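The partition sizes quoted above follow directly from per-core peak rates: cores ≈ target peak / per-core peak. A sketch that reproduces the rounded slide numbers; the per-core rates are assumptions derived from the slide-9 clock rates, assuming 4 flops/cycle for the scalar CPUs:

    TARGET_GF = 10_000.0  # 10 Tflop/s nominal peak per partition

    # Assumed per-core peak in GFlop/s (illustrative, not slide data):
    per_core_gf = {
        "IBM BG/P": 0.85 * 4,   # 3.4 GF/core  -> ~2900 cores
        "Cray XT5": 2.7 * 4,    # 10.8 GF/core -> ~900 cores
        "IBM Power6": 4.7 * 4,  # 18.8 GF/core -> ~500 cores
        "NEC SX-9": 100.0,      # assumed ~100 GF per vector core -> 100 cores
    }

    for name, gf in per_core_gf.items():
        print(f"{name}: ~{TARGET_GF / gf:.0f} cores for a 10 Tflop/s partition")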

21 Application benchmark coverage
Applications: Code_Saturne, CP2K, CPMD, EUTERPE, GADGET, GROMACS, NAMD, NEMO, NS3D, QCD, Quantum_Espresso, WRF, ALYA, AVBP, BSIT, ELMER, GPAW, HELIUM, OCTOPUS, PEPC
Prototypes: IBM Power6, IBM BlueGene/P, Cray XT5b, Cray XT5s, Sun x86 cluster, Bull x86 cluster, IBM Cell, NEC SX-9, NEC x86 cluster
[Matrix marking, per application/prototype pair, the existence of benchmark data on a 10 Tflop/s peak partition; status: December]

22 General remarks on user application benchmarks
- The spread of performance numbers across different prototypes is not huge for most applications:
  - Many prototypes show application performance of the same order of magnitude (factors between 0.5 and 2); no orders-of-magnitude differences
  - Selecting a system for running these applications makes sense but is not crucial
- For some applications the spread of performance numbers across prototypes is large:
  - Selecting a system for running the application is very relevant
- The results obtained are very useful for selecting the right system/architecture for certain applications

23 Scaling of user application benchmarks
- Benchmark results give a good indication of the scaling properties of applications on the various prototype systems
- The combination of the application and the prototype architecture determines performance scaling
- It is very difficult to predict the scaling properties of application/prototype combinations beyond the tested sizes; performance modelling is required to make such predictions (see the sketch below)

[Figure: execution time (log scale) versus number of CPUs, for IBM BlueGene/P and IBM Power6]
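Given execution times like those in the figure, scaling quality can be summarized as speedup and parallel efficiency relative to the smallest run. A minimal sketch with invented timings, not PRACE measurements:

    def scaling_report(timings):
        """Print speedup and parallel efficiency relative to the smallest run."""
        base_cores = min(timings)
        base_time = timings[base_cores]
        for cores in sorted(timings):
            speedup = base_time / timings[cores]
            efficiency = speedup * base_cores / cores
            print(f"{cores:6d} cores: speedup {speedup:5.2f}, "
                  f"efficiency {efficiency:5.1%}")

    # Invented example timings in seconds:
    scaling_report({128: 100.0, 256: 52.0, 512: 28.0, 1024: 17.0})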

24 Final remarks
- A huge amount of data has been collected through synthetic and user benchmarks on the different prototype systems:
  - Detailed insight into a large number of important characteristics of systems and architectures; useful comparisons can be made using these benchmarks
  - An important set of reference material for the assessment of systems in the future
- Early access to new systems and architectures is important for testing under close-to-production conditions, both from the system side and from the application side

25 Final remarks cont'd
- It is much more difficult to derive clear and unambiguous recommendations for the selection of near-future multi-Petaflop/s systems:
  - A large number of variables must be considered in selecting an HPC system
  - No single system will fit all applications best; PRACE will offer a variety of systems and architectures
  - The assessed prototypes had very different system sizes and technical maturity
- The results do provide a good guideline for which system/architecture shows good performance on certain features or applications
- Scaling has been evaluated up to the scale of the prototype systems; predicting scaling beyond the size of the prototype is speculative or at most indicative

26 Thank you for your attention
Acknowledgements:
- PRACE partners with prototypes: BSC, CEA, CSC, FZJ, HLRS, SARA
- Task leaders: Jonathan Evans (BSC), Patrice Lucas (CEA), Mark Bull (EPCC)
- Vendors: Bull, Cray, IBM, NEC, SUN
- Funding: EC FP7, grant agreement n° RI
For more information: PRACE project, Axel Berg (axel@sara.nl)
