Accelerators in Technical Computing: Is it Worth the Pain?


1 Accelerators in Technical Computing: Is it Worth the Pain? A TCO Perspective
Sandra Wienke, Dieter an Mey, Matthias S. Müller
Center for Computing and Communication, JARA High-Performance Computing, RWTH Aachen University
Rechen- und Kommunikationszentrum (RZ)

2 Agenda
- Introduction
- Modeling
  - Total Cost of Ownership (TCO)
  - Comparison Metrics
- Case Study on Accelerators
  - Programming Models & System Types
  - TCO RWTH
  - Real-World Application
  - Results
- Conclusion & Outlook

3 Introduction
- Today: a variety of HPC clusters
  - Accelerators (NVIDIA GPU, Intel Xeon Phi) are used because of their promising performance-per-watt ratio
  - Comparing systems by performance or performance per watt is not sufficient for a purchase decision
- Total cost of ownership (TCO)
  - Acquisition costs, housing, operation costs, ...
  - Inclusion of manpower costs (administration & programming)
  - Comparison of costs per program run (application-dependent)
- Investigation of a real-world software package
  - OpenMP on Intel Sandy Bridge
  - OpenMP + LEO on Intel Xeon Phi
  - OpenCL, OpenACC on NVIDIA Fermi GPU
  - Impact of manpower effort/programming model?

4 Modeling: Total Cost of Ownership (TCO)
- Basis: a single compute node, extrapolated to the size of the cluster
- Investment: $I = TCO(n, \tau) = C_{ot}(n) + C_{pa}(n) \cdot \tau$
  - $n$: number of nodes
  - $\tau$: system lifetime
- One-time costs $C_{ot}$
  - Per node: HW acquisition, building/infrastructure, OS/env. installation
  - Per node type: OS/env. installation, programming effort
- Annual costs $C_{pa}$
  - Per node: HW maintenance, building/infrastructure, OS/env. maintenance, power consumption
  - Per node type: OS/env. maintenance, compiler/software, application maintenance
- TCO depends on architecture & application
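The cost model on this slide can be written down in a few lines. The following is a minimal sketch, assuming that every cost component is either proportional to the number of nodes or paid once per node type; the class name `CostSet` and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostSet:
    """One group of costs (one-time or annual); names are illustrative."""
    per_node: float       # scales with the number of nodes n
    per_node_type: float  # paid once per node type, e.g. programming effort

    def total(self, n: int) -> float:
        return self.per_node * n + self.per_node_type


def tco(n: int, tau: float, one_time: CostSet, annual: CostSet) -> float:
    """TCO(n, tau) = C_ot(n) + C_pa(n) * tau, with tau in years."""
    return one_time.total(n) + annual.total(n) * tau


# Hypothetical numbers, not taken from the slides.
one_time = CostSet(per_node=5000.0, per_node_type=3000.0)
annual = CostSet(per_node=700.0, per_node_type=1000.0)
example_tco = tco(n=10, tau=4, one_time=one_time, annual=annual)
print(example_tco)
```

Keeping per-node and per-node-type costs separate matters because only the former grow with the cluster size, so manpower costs such as programming effort weigh less on larger installations.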

5 Modeling: Comparison Metrics
- Costs per program run $C_{ppr}$
  - Includes investment/TCO & application performance
  - $C_{ppr}(n, \tau) = \frac{TCO(n, \tau)}{n \cdot n_{ex}(\tau)}$ with $n_{ex}(\tau) = \frac{k \cdot \tau}{t_{par}}$
  - $n$: number of nodes; $\tau$: system lifetime; $n_{ex}$: number of application executions; $k$: system usage rate; $t_{par}$: parallel runtime
- Baseline used for a system X: Intel Sandy Bridge (SNB) + OpenMP
  - $\frac{C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau)}{C_{ppr,OMP}(n_{OMP}, \tau)}$ is $< 0$ if X is beneficial and $\geq 0$ if OpenMP is beneficial
- Break-even investments
  - Minimum budget needed so that system X is beneficial over OpenMP on SNB
  - Solve for $I$ with a given fixed lifetime $\tau$: $C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau) = 0$ with $TCO(n, \tau) = I$
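The metrics on this slide can be sketched in Python as follows. The sketch rests on assumptions that are not stated on the slide: the investment is split into a cost per node and a cost per node type (both already summed over the lifetime), the node count is treated as a continuous quantity, and the lifetime is converted to seconds to match the runtime. Under those assumptions the break-even condition is linear in the investment and has a closed-form solution. All names and numbers are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def executions_per_node(k, tau_years, t_par_s):
    """n_ex(tau) = k * tau / t_par, with tau converted to seconds."""
    return k * tau_years * SECONDS_PER_YEAR / t_par_s


def cost_per_run(investment, n, k, tau_years, t_par_s):
    """C_ppr(n, tau) = TCO(n, tau) / (n * n_ex(tau))."""
    return investment / (n * executions_per_node(k, tau_years, t_par_s))


def relative_difference(c_x, c_omp):
    """Negative if system X is cheaper per run than the OpenMP baseline."""
    return (c_x - c_omp) / c_omp


def nodes_for(investment, system):
    """Assumed linear model: I = per_type + per_node * n (n continuous)."""
    return (investment - system["per_type"]) / system["per_node"]


def break_even_investment(x, omp):
    """Investment at which both systems have equal cost per run.

    Undefined (division by zero) if both systems have the same product
    of per-node cost and runtime.
    """
    a_x = 1.0 / (x["per_node"] * x["t_par"])
    a_omp = 1.0 / (omp["per_node"] * omp["t_par"])
    return (x["per_type"] * a_x - omp["per_type"] * a_omp) / (a_x - a_omp)


# Hypothetical systems: lifetime costs per node and per node type, runtime in s.
accel = {"per_node": 12000.0, "per_type": 6000.0, "t_par": 50.0}
base = {"per_node": 8000.0, "per_type": 1000.0, "t_par": 100.0}
K, TAU = 0.8, 4  # usage rate and lifetime in years


def relative_at(investment):
    c_x = cost_per_run(investment, nodes_for(investment, accel), K, TAU, accel["t_par"])
    c_o = cost_per_run(investment, nodes_for(investment, base), K, TAU, base["t_par"])
    return relative_difference(c_x, c_o)


break_even = break_even_investment(accel, base)
print(break_even, relative_at(10_000.0), relative_at(100_000.0))
```

In this example the accelerator has higher per-node and per-type costs but halves the runtime, so it is more expensive per run for small budgets and cheaper once the investment exceeds the break-even point.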

6 Case Study on Accelerators: Programming Models & System Types

Programming model | Accelerator | Host | Compiler
Serial; OpenMP (simple, vectorized) | none | 2x Intel Sandy Bridge, 16 cores, 2 GHz | Intel
LEO + OpenMP | Intel Xeon Phi 5110P, 60 cores | 2x Intel Sandy Bridge, 16 cores, 2 GHz | Intel
OpenACC | NVIDIA Tesla C2050 (Fermi), ECC on | 1x Intel Westmere, 4 cores, 2.4 GHz | PGI 12.9
OpenCL | NVIDIA Tesla C2050 (Fermi), ECC on | 1x Intel Westmere, 4 cores, 2.4 GHz | Intel

7 Case Study on Accelerators: TCO RWTH
- One-time costs
  - HW purchase: list prices from Bull
  - Building/infrastructure: treated as annual costs, since it is amortized over 25 years
  - OS/env. installation: -
  - Programming effort: charged at the daily cost of a full-time employee
- Annual costs
  - HW maintenance: 5% of HW purchase costs
  - Building/infrastructure: 200,000 € per year; costs per node: divide by 1.6 MW and multiply by the max. power consumption of each node
  - OS/env. maintenance: 4 admins, 75% of their time for maintenance of the cluster (~2300 nodes): 180,000 € / 2300 = 78 € per node and year
  - Software/compiler: -
  - Power: PUE 1.5, regional electricity costs 0.15 €/kWh
  - Application maintenance: - (small kernels)
- Given a lifetime of 4 years and an investment, the number of nodes and the number of executions (usage rate 80%) follow, and from them C_ppr
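The per-node annual costs listed on this slide are simple arithmetic, sketched below. The rates (200,000 per year over 1.6 MW, 180,000 over 2300 nodes, PUE 1.5, 0.15 per kWh, 5% HW maintenance) come from the slide. The node itself (price, maximum and average power draw) is hypothetical, and charging power for all 8760 hours of a year at the average draw is an assumption of this sketch, not something the slide states.

```python
HOURS_PER_YEAR = 8760


def building_cost(max_power_w, per_year=200_000.0, capacity_w=1.6e6):
    """Building share of one node: scaled by its max. power consumption."""
    return per_year / capacity_w * max_power_w


def admin_cost(total_per_year=180_000.0, nodes=2300):
    """OS/env. maintenance per node and year."""
    return total_per_year / nodes


def power_cost(avg_power_w, pue=1.5, price_per_kwh=0.15):
    """Electricity per node and year (assumes the node runs all year)."""
    return avg_power_w / 1000.0 * HOURS_PER_YEAR * pue * price_per_kwh


def hw_maintenance(hw_price, rate=0.05):
    """HW maintenance as a fraction of the purchase price."""
    return hw_price * rate


# Hypothetical node: price 6000, max. draw 400 W, average draw 300 W.
annual_per_node = {
    "building": building_cost(400.0),
    "admin": admin_cost(),
    "power": power_cost(300.0),
    "hw_maintenance": hw_maintenance(6000.0),
}
annual_total = sum(annual_per_node.values())
print(annual_per_node, annual_total)
```

For the hypothetical node, power dominates the annual per-node costs, which is why the measured power consumption of each programming-model variant enters the comparison.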

8 Case Study on Accelerators: Real-World Application
- Basis
  - Serial version
  - Small kernel
  - Assumption: homogeneous app. landscape
- KegelSpan (see footnote 2)
  - 3D simulation of the bevel gear cutting process
  - Kernel portion artificially increased from 25% to 90%
- Source: BMW, ZF, Klingelnberg

2 C. Brecher, C. Gorgels, and A. Hardjosuwito. Simulation based Tool Wear Analysis in Bevel Gear Cutting. In International Conference on Gears, volume of VDI-Berichte, pp , Düsseldorf, VDI Verlag, 2010.

9 Case Study on Accelerators: TCO Components of Application
[Figure: effort [days], runtime [s] and power consumption [W] for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), OpenMP-vec (SNB) and OpenMP-simp (SNB)]

10 Case Study on Accelerators: Results
[Figure: costs per program run relative to OMP-simp (in %) as a function of the investment, and break-even investment, for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi) and OpenMP-vec (SNB)]

11 Conclusion
- Are accelerators beneficial? It depends
  - TCO spreadsheet (see footnote 1) available for own computations
- Our results (with 90% kernel portion), relative to SNB-OMP (4 years, 250 K€):
  - Fermi GPU is beneficial over the 2-socket Intel SNB server: -17% C_ppr
  - Intel Xeon Phi results are disappointing for now: +4% C_ppr
    - Mainly due to high acquisition costs
    - NVIDIA Kepler probably similar
- Programming effort impacts the break-even investment (see OpenACC vs. OpenCL)
- Bigger codes: increase of kernel size ~ increase of break-even investment
  - Projections possible (e.g. hybrid codes)

1 Wienke, S., an Mey, D., Müller, M.S.: Accelerators for Technical Computing: Is it Worth the Pain? TCO Spreadsheet. campus.rwth-aachen.de/units/rz/hpc/public/shared%20documents/WienkeEtAl_Accelerators-TCO-Perspective.xlsx, 2013

12 Outlook
- Hybrid code implementation (compare to projections)
- Model extensions
  - New programming models & architectures (OpenMP 4.0, NVIDIA Kepler)
  - Network communication (MPI)
  - Mixed job execution (heterogeneous application landscape)
  - Assessment of decrease in runtime/gaining more results
- Comprehensive TCO calculation with predictive powers
  - Performance, power consumption, manpower
- Towards exascale computing, architectures might get more complex
  - More difficult to manage & program
  - Impact of manpower effort might get stronger

Thank you for your attention!
