Deploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain

Size: px

Start display at page:

Download "Deploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain"

Godfrey Gallagher
5 years ago
Views:

1 Deploying remote virtualization with rcuda Federico Silla Technical University of Valencia Spain

2 st Outline What is remote virtualization? HPC ADMINTECH /53

3 It deals with s, obviously! HPC ADMINTECH /53

4 Basics of computing Basic behavior of CUDA HPC ADMINTECH /53

5 Remote virtualization No HPC ADMINTECH /53

6 Remote virtualization A software technology that enables a more flexible use of s in computing facilities No HPC ADMINTECH /53

7 Basics of remote virtualization HPC ADMINTECH /53

8 Basics of remote virtualization HPC ADMINTECH /53

9 Remote virtualization envision Remote virtualization allows a new vision of a deployment, moving from the usual cluster configuration: Physical configuration Interconnection to the following one: Logical connections Logical configuration HPC ADMINTECH 2016 Interconnection 9/53

10 nd Outline Why is remote virtualization needed? HPC ADMINTECH /53

11 Outline Which is the problem with -enabled clusters? HPC ADMINTECH /53

12 Characteristics of -based clusters A -enabled cluster is a set of independent self-contained nodes that leverage the shared-nothing approach: Nothing is directly shared among nodes (MPI required for aggregating computing resources across the cluster, included s) s can only be used within the node they are attached to Interconnection HPC ADMINTECH /53

13 First concern with accelerated clusters Applications can only use the s located within their node: Non-accelerated applications keep s idle in the nodes where they use all the cores A -only application spreading over these four nodes would make their s unavailable for accelerated applications Interconnection HPC ADMINTECH /53

14 Second concern with accelerated clusters Applications can only use the s located within their node: non-mpi multi- applications running on a subset of nodes cannot make use of the tremendous resources available at other cluster nodes (even if they are idle) Non-MPI multi- application All these s cannot be used by the multi- application being executed Interconnection HPC ADMINTECH /53

15 One more concern with accelerated clusters Do applications completely squeeze the s available in the cluster? When a is assigned to an application, computational resources inside the may not be fully used Application presenting low level of parallelism code being executed ( assigned working) -core stall due to lack of data etc Interconnection HPC ADMINTECH /53

16 usage of -Blast assigned but not used assigned but not used HPC ADMINTECH /53

17 usage of CUDA-MEME utilization is far away from maximum HPC ADMINTECH /53

18 Why performance in clusters is lost? In summary There are scenarios where s are available but cannot be used Accelerated applications do not make use of s 100% of the time In conclusion cycles are lost, thus reducing cluster performance HPC ADMINTECH /53

19 We need something more in the cluster The current model for using s is too rigid What is missing is some flexibility for using the s in the cluster HPC ADMINTECH /53

20 We need something more in the cluster The current model for using s is too rigid What is missing is some flexibility for using the s in the cluster A way of seamlessly sharing s across nodes in the cluster (remote virtualization) HPC ADMINTECH /53

Virtualized remote s virtualization allows all nodes

21 Remote virtualization envision Real local s Without virtualization Interconnection With virtualization Virtualized remote s virtualization allows all nodes to access all s Interconnection HPC ADMINTECH /53

22 Remote virtualization frameworks Several efforts have been made to implement remote virtualization during the last years: rcuda (CUDA 7.0) GVirtuS (CUDA 3.2) DS-CUDA (CUDA 4.1) vcuda (CUDA 1.1) GViM (CUDA 1.1) GridCUDA (CUDA 2.3) V- (CUDA 4.0) Publicly available NOT publicly available HPC ADMINTECH /53

23 rd Outline Cons of remote virtualization? HPC ADMINTECH /53

24 Problem with remote virtualization The main drawback of virtualization is the reduced bandwidth to the remote No HPC ADMINTECH /53

25 Using InfiniBand networks HPC ADMINTECH /53

26 Optimized transfers within rcuda H2D pageable D2H pageable rcuda FDR Orig rcuda EDR Orig rcuda FDR Opt rcuda EDR Opt Almost 100% of available BW H2D pinned Almost 100% of available BW D2H pinned HPC ADMINTECH /53

27 rcuda optimizations on applications Several applications executed with CUDA and rcuda K20 and FDR InfiniBand K40 and EDR InfiniBand Lower is better HPC ADMINTECH /53

28 th Outline Pros of remote virtualization? HPC ADMINTECH /53

1: more s for a single application virtualization is useful for multi- applications Only the s in the node can be provided to the application Without

29 1: more s for a single application virtualization is useful for multi- applications Only the s in the node can be provided to the application Without virtualization Interconnection With virtualization Many s in the cluster can be provided to the application Logical connections Interconnection HPC ADMINTECH /53

30 1: more s for a single application 64 s! HPC ADMINTECH /53

31 1: more s for a single application MonteCarlo Multi- (from NVIDIA samples) FDR InfiniBand + NVIDIA Tesla K20 Higher is better Lower is better HPC ADMINTECH /53

32 2: busy cores do not block s Physical Interconnection configuration Logical connections Logical Interconnection configuration HPC ADMINTECH /53

33 3: cheaper cluster upgrade Let s suppose that a cluster without s needs to be upgraded to use s No s require large power supplies Are power supplies already installed in the nodes large enough? s require large amounts of space Does current form factor of the nodes allow to install s? The answer to both questions is usually NO HPC ADMINTECH /53

34 3: cheaper cluster upgrade Approach 1: augment the cluster with some CUDA enabled nodes only those -enabled nodes can execute accelerated applications No -enabled HPC ADMINTECH /53

35 3: cheaper cluster upgrade Approach 2: augment the cluster with some rcuda servers all nodes can execute accelerated applications -enabled HPC ADMINTECH /53

36 3: cheaper cluster upgrade Dual socket E5-2620v2 Intel Xeon + 32GB RAM + K20 FDR InfiniBand based cluster 16 nodes without + 1 node with 4 s HPC ADMINTECH /53

37 3: cheaper cluster upgrade Applications used for tests: -Blast (21 seconds; 1 ; 1599 MB) LAMMPS (15 seconds; 4 s; 876 MB) Non- MCUDA-MEME (165 seconds; 4 s; 151 MB) GROMACS (167 seconds) NAMD (11 minutes) BarraCUDA (10 minutes; 1 ; 3319 MB) -LIBSVM (5 minutes; 1; 145 MB) MUMmer (5 minutes; 1; 2804 MB) Set 1 Set 2 Short execution time Long execution time Three workloads: Set 1 Set 2 Set 1 + Set 2 Three workload sizes: Small (100 jobs) Medium (200 jobs) Large (400 jobs) HPC ADMINTECH /53

38 3: cheaper cluster upgrade HPC ADMINTECH /53

39 4: task migration Box A has 4 s but only one is busy Box B has 8 s but only two are busy 1. Move jobs from Box B to Box A and switch off Box B 2. Migration should be transparent to applications (decided by the global scheduler) Box B Box A Migration is performed at granularity HPC ADMINTECH /53

40 4: task migration Job granularity instead of granularity HPC ADMINTECH /53

41 5: virtual machines can easily access s The is assigned by using PCI passthrough exclusively to a single virtual machine Concurrent usage of the is not possible HPC ADMINTECH /53

42 5: virtual machines can easily access s High performance network available Low performance network available HPC ADMINTECH /53

43 5: virtual machines can easily access s KVM 1.6% 2.5% 0.5% 0.07% 6.1% Xen FDR InfiniBand + K20!! 2.9% 0.7% 2.3% HPC ADMINTECH /53

44 th Outline What happens at the cluster level? HPC ADMINTECH /53

45 rcuda at cluster level Slurm s can be shared among jobs running in remote clients Job scheduler required for coordination Slurm was selected App 1 App 2 App 3 App 4 App 5 App 6 App 7 App 8 App 9 HPC ADMINTECH /53

Applications for studying rcuda+slurm Applications used for tests: -Blast (21 seconds; 1 ; 1599 MB) LAMMPS (15 seconds; 4 s; 876 MB) Non- MCUDA-MEME (165 seconds; 4 s; 151 MB) GROMACS (167 seconds)

46 Applications for studying rcuda+slurm Applications used for tests: -Blast (21 seconds; 1 ; 1599 MB) LAMMPS (15 seconds; 4 s; 876 MB) Non- MCUDA-MEME (165 seconds; 4 s; 151 MB) GROMACS (167 seconds) NAMD (11 minutes) BarraCUDA (10 minutes; 1 ; 3319 MB) -LIBSVM (5 minutes; 1; 145 MB) MUMmer (5 minutes; 1; 2804 MB) Set 1 Set 2 Short execution time Long execution time Three workloads: Set 1 Set 2 Set 1 + Set 2 Three workload sizes: Small (100 jobs) Medium (200 jobs) Large (400 jobs) HPC ADMINTECH /53

with the Slurm scheduler 8 nodes node with the Slurm scheduler

47 Test bench for studying rcuda+slurm Dual socket E5-2620v2 Intel Xeon + 32GB RAM + K20 FDR InfiniBand based cluster 4 nodes node with the Slurm scheduler 8 nodes node with the Slurm scheduler 16 nodes node with the Slurm scheduler HPC ADMINTECH /53

48 Increased cluster performance with Slurm Lower is better Lower is better Higher is better Results for 16 nodes with Set 1 HPC ADMINTECH /53

49 Increased cluster performance with Slurm Lower is better Lower is better Higher is better Results for 16 nodes with Set 1+2 HPC ADMINTECH /53

50 th Outline in summary HPC ADMINTECH /53

Similar performance with smaller investment 5.

51 Pros and cons of rcuda Cons: 1. Reduced bandwidth to remote (really a concern??) Pros: 1. Many s for a single application 2. Concurrent access to virtual machines 3. Increased cluster throughput 4. Similar performance with smaller investment 5. Easier (cheaper) cluster upgrade 6. Migration of jobs 7. Reduced energy consumption 8. Increased utilization rcuda is a development by Technical University of Valencia HPC ADMINTECH /53

52 Get a free copy of rcuda at More than 650 requests world rcuda is a development by Technical University of Valencia HPC ADMINTECH /53

53 Thanks! Questions? rcuda is a development by Technical University of Valencia HPC ADMINTECH /53

Is remote GPU virtualization useful? Federico Silla Technical University of Valencia Spain

Is remote GPU virtualization useful? Federico Silla Technical University of Valencia Spain Is remote virtualization useful? Federico Silla Technical University of Valencia Spain st Outline What is remote virtualization? HPC Advisory Council Spain Conference 2015 2/57 We deal with s, obviously!