Remote GPU virtualization: pros and cons of a recent technology. Federico Silla, Technical University of Valencia, Spain


1 Remote GPU virtualization: pros and cons of a recent technology. Federico Silla, Technical University of Valencia, Spain

2 The scope of this talk

3 1st Outline: What is remote GPU virtualization?

4 Basics of GPU computing: basic CUDA behavior
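To make the basic CUDA behavior concrete, here is a minimal sketch (my addition, not part of the slides) of the canonical CUDA flow: allocate device memory, copy the input host-to-device, launch a kernel, and copy the result back. These are exactly the calls a remote GPU virtualization framework has to intercept and forward.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *v, float f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= f;                              /* runs on the GPU */
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes), *d;
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    cudaMalloc((void **)&d, bytes);                    /* device allocation */
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   /* H2D copy */
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);       /* kernel launch */
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   /* D2H copy */

    printf("h[0] = %.1f\n", h[0]);                     /* prints 2.0 */
    cudaFree(d);
    free(h);
    return 0;
}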

5 Remote GPU virtualization: a software technology that enables a more flexible use of GPUs in computing facilities

6 Basics of remote GPU virtualization

7 Basics of remote GPU virtualization
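Conceptually, all of these frameworks split the CUDA stack into a client-side library that exports the CUDA API and a daemon on the node that owns the GPU. The stub below is a purely hypothetical illustration of that split; the opcode, message layout, and the send_all/recv_all/server_sock helpers are invented for this sketch and do not describe rcuda's actual, non-public wire protocol:

#include <stdint.h>
#include <stddef.h>

typedef int cudaError_t;          /* stand-in for the real CUDA type */

/* Assumed helpers: a connected socket to the GPU server and two
   loop-until-done wrappers around send()/recv(). */
extern int server_sock;
extern int send_all(int sock, const void *buf, size_t len);
extern int recv_all(int sock, void *buf, size_t len);

/* The client library replaces the CUDA runtime: each API call is packed,
   shipped to the remote daemon, executed there on the real GPU, and the
   result is sent back. */
cudaError_t cudaMalloc(void **devPtr, size_t size) {
    const uint32_t opcode = 1;                  /* hypothetical cudaMalloc opcode */
    send_all(server_sock, &opcode, sizeof opcode);
    send_all(server_sock, &size, sizeof size);

    cudaError_t err;
    uint64_t remote_ptr;
    recv_all(server_sock, &err, sizeof err);
    recv_all(server_sock, &remote_ptr, sizeof remote_ptr);
    *devPtr = (void *)(uintptr_t)remote_ptr;    /* opaque handle, only valid server-side */
    return err;
}

Since the application is dynamically linked against this library instead of NVIDIA's, it needs no source changes, and the node running it does not need a GPU at all.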

8 2nd Outline: Why is remote GPU virtualization needed?

9 Outline: What is the problem with GPU-enabled clusters?

10 Characteristics of GPU-based clusters. A GPU-enabled cluster is a set of independent, self-contained nodes that leverage the shared-nothing approach:
- nothing is directly shared among nodes (MPI is required for aggregating computing resources within the cluster)
- GPUs can only be used within the node they are attached to

11 First concern with accelerated clusters. Applications can only use the GPUs located within their node:
- non-accelerated applications keep GPUs idle in the nodes where they use all the cores
- a CPU-only application spreading over four nodes would make their GPUs unavailable for accelerated applications

12 Money leakage in current clusters? For some workloads, GPUs may be idle for significant periods of time:
- initial acquisition costs are not amortized
- space: GPUs reduce density
- energy: idle GPUs keep consuming power
[Plot: idle power (Watts) over time (s) for a 1-GPU node (two E5-2620V2 sockets, 32 GB DDR3 RAM, one Tesla K20) and a 4-GPU node (two E5-2620V2 sockets, 128 GB DDR3 RAM, four Tesla K20 GPUs).]

13 Second concern with accelerated clusters. Applications can only use the GPUs located within their node: multi-GPU applications running on a subset of nodes cannot make use of the tremendous GPU resources available at other cluster nodes (even if they are idle). [Diagram: a multi-GPU application spans some nodes; all the GPUs in the remaining nodes cannot be used by the multi-GPU application in execution.]

14 One more concern with accelerated clusters. Do applications completely squeeze the GPUs present in the cluster? Even if all GPUs are assigned to running applications, the computational resources inside the GPUs may not be fully used:
- applications presenting a low level of parallelism
- CPU code being executed (GPU assigned but not working)
- GPU-core stalls due to lack of data
- etc.

15 Why is performance lost in GPU clusters? In summary, there are scenarios where GPUs are available but cannot be used, and accelerated applications do not make use of GPUs 100% of the time. In conclusion, we are losing GPU cycles, thus reducing cluster performance.

16 We need something more in the cluster. The current model for using GPUs is too rigid. What is missing is some flexibility for using the GPUs in the cluster.

17 We need something more in the cluster. The current model for using GPUs is too rigid. What is missing is some flexibility for using the GPUs in the cluster: a way of seamlessly sharing GPUs across nodes in the cluster (remote GPU virtualization).

18 The remote GPU virtualization vision. Remote GPU virtualization allows a new view of a GPU deployment, moving from the usual physical configuration of the cluster to a logical configuration in which GPUs are reached through logical connections over the interconnect.

19 The remote GPU virtualization vision. Without GPU virtualization, each node sees only its real, local GPUs; with GPU virtualization, nodes also see virtualized remote GPUs: GPU virtualization allows all nodes to access all GPUs in the cluster.

20 Busy CPU cores are no longer a problem. [Diagram: the physical configuration vs. the logical configuration with logical connections to remote GPUs.]

21 Multi-GPU applications get benefit. GPU virtualization is also useful for multi-GPU applications: without GPU virtualization, only the GPUs in the node can be provided to the application; with GPU virtualization, many GPUs in the cluster can be provided to the application through logical connections.

22 Remote GPU virtualization frameworks. Several efforts have been made to implement remote GPU virtualization during the last years:
- publicly available: rcuda (CUDA 7.0), GVirtuS (CUDA 3.2), DS-CUDA (CUDA 4.1)
- not publicly available: vCUDA (CUDA 1.1), GViM (CUDA 1.1), GridCUDA (CUDA 2.3), V-GPU (CUDA 4.0)

23 Remote GPU virtualization frameworks. [Plot: H2D and D2H bandwidth for pageable and pinned memory across frameworks, measured on InfiniBand FDR with a Tesla K20.]

24 3rd Outline: Cons of remote GPU virtualization

25 Problem with remote GPU virtualization. The main GPU virtualization drawback is the reduced bandwidth to the remote GPU.
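A rough back-of-the-envelope comparison (my numbers, assuming FDR InfiniBand with 56 Gb/s signaling and 64/66b encoding versus a PCIe 3.0 x16 slot) shows the size of the gap:

FDR InfiniBand:  56 Gb/s x 64/66 ≈ 6.8 GB/s per direction
PCIe 3.0 x16:    ≈ 15.75 GB/s per direction

EDR InfiniBand (100 Gb/s, roughly 12 GB/s usable) narrows the gap considerably, which is consistent with the EDR results shown in the following slides.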

26 rcuda transfers are optimized. [Plot: H2D and D2H bandwidth for pageable and pinned memory; rcuda attains almost 100% of the available bandwidth for pinned-memory transfers in both directions.]
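The pageable/pinned distinction in the plot is simply how the host buffer was allocated. A minimal sketch (standard CUDA runtime API):

#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t bytes = 64 << 20;                 /* 64 MiB test buffer */
    float *pageable = (float *)malloc(bytes);      /* ordinary, pageable memory */
    float *pinned, *dev;

    cudaMallocHost((void **)&pinned, bytes);       /* page-locked (pinned) memory */
    cudaMalloc((void **)&dev, bytes);

    cudaMemcpy(dev, pageable, bytes, cudaMemcpyHostToDevice); /* staged through a bounce buffer */
    cudaMemcpy(dev, pinned,   bytes, cudaMemcpyHostToDevice); /* DMA-ed directly */

    cudaFree(dev);
    cudaFreeHost(pinned);
    free(pageable);
    return 0;
}

Pinned buffers can be transferred directly by the DMA engine (and pipelined by the middleware), which is why both local CUDA and rcuda approach the full link bandwidth with them, while pageable buffers pay for an extra staging copy.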

27 Rodinia performance with rcuda. [Plot: Rodinia benchmark results on InfiniBand EDR with a Tesla K40.]

28 Application performance with rcuda. [Plot: application results on InfiniBand EDR with a Tesla K40.]

29 4th Outline: Pros of remote GPU virtualization

30 1: more GPUs for a single application. As many GPUs as there are in the cluster may be provided to a single application.

31 1: more GPUs for a single application

32 1: more GPUs for a single application. MonteCarlo Multi-GPU (from the NVIDIA samples). [Plots: one metric where higher is better, one where lower is better.]
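Such scaling is possible because CUDA exposes devices through a flat index, so a multi-GPU code like the MonteCarlo sample just iterates over whatever cudaGetDeviceCount reports. A minimal sketch of that enumeration pattern (under rcuda, the count may include GPUs located in other nodes):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);   /* with rcuda, possibly every GPU in the cluster */
    for (int dev = 0; dev < count; dev++) {
        cudaSetDevice(dev);       /* later calls target this (possibly remote) GPU */
        /* ... allocate buffers, launch kernels, gather this device's partial result ... */
        printf("work scheduled on device %d of %d\n", dev, count);
    }
    return 0;
}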

33 2: increased cluster performance. GPUs can be shared among jobs running in remote clients. [Diagram: App 1 through App 9 sharing the cluster's GPUs.]

34 2: increased cluster performance. Test bench for studying rcuda performance at cluster level:
- SLURM used as job scheduler (SLURM v15.08 includes support for rcuda!)
- InfiniBand ConnectX-3 based cluster
- dual-socket E5-2620v2 Intel Xeon based nodes: 1 node without GPU, hosting the main SLURM controller, and 8 nodes with one NVIDIA K20 GPU each
- four applications used: LAMMPS, GPU-Blast, MCUDA-MEME, and GROMACS (no GPU)
- three workload sizes: small, medium, large

35 2: increased cluster performance

36 3: more performance with less cost. Let's reduce the amount of GPUs in the cluster. [Plots annotated 43% less, 41% less, and 42% less.]

37 4: reduced energy consumption

38 Ongoing work. A 16-node cluster is being used. Analysis of different GPU assignment policies:
- based on GPU memory occupancy
- based on GPU utilization
More applications used for tests:
- short execution time: GPU-Blast (21 s), LAMMPS (15 s), MCUDA-MEME (165 s), GROMACS (167 s)
- long execution time: NAMD (11 m), BarraCUDA (10 m), GPU-LIBSVM (5 m), MUMmer (5 m)
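The two assignment policies above could be sketched as follows. This is a hypothetical illustration, not rcuda's scheduler code: the gpu_stats_t table and its fields are invented here, and a real implementation would fill them from NVML queries or the middleware's own accounting.

#include <stddef.h>

typedef struct {
    size_t free_mem;      /* free GPU memory, in bytes        */
    double utilization;   /* busy fraction, 0.0 (idle) to 1.0 */
} gpu_stats_t;

/* Policy 1: place the job on the GPU with the most free memory. */
int pick_by_memory(const gpu_stats_t *g, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (g[i].free_mem > g[best].free_mem) best = i;
    return best;
}

/* Policy 2: place the job on the least-utilized GPU. */
int pick_by_utilization(const gpu_stats_t *g, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (g[i].utilization < g[best].utilization) best = i;
    return best;
}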

39 5: easier cluster upgrade. A cluster without GPUs may be easily upgraded to use GPUs with rcuda.

40 5: easier cluster upgrade. A cluster without GPUs may be easily upgraded to use GPUs with rcuda.

41 6: task migration. Box A has 4 GPUs but only one is busy; Box B has 8 GPUs but only two are busy. 1. Move the jobs from Box B to Box A and switch off Box B. 2. Migration should be transparent to applications (decided by the global scheduler). TRUE GREEN COMPUTING.

42 rcuda is the enabling technology for:
- High-throughput computing: sharing remote GPUs makes applications execute slower, BUT more throughput (jobs/time) is achieved. Datacenter administrators can choose between HPC and HTC.
- Green computing: GPU migration and application migration allow devoting just the required computing resources to the current workload.
- More flexible system upgrades: GPU and node updates become independent from each other; attaching GPU boxes to non-GPU-enabled clusters is possible.

43 Get a free copy of rcuda (more than 600 requests worldwide). The rcuda Team: Carlos Reaño, Federico Silla, Fernando Campos, José Duato, Javier Prades. rcuda is owned by Technical University of Valencia.
