Opportunities of the rCUDA remote GPU virtualization middleware. Federico Silla, Universitat Politècnica de València, Spain

1 Opportunities of the rCUDA remote GPU virtualization middleware. Federico Silla, Universitat Politècnica de València, Spain

2 Outline, part 1: What is rCUDA? HPC Advisory Council China Conference

3 GPUs are the focus! rCUDA: remote CUDA

4 Basics of GPU computing: basic behavior of CUDA

5 Basics of GPU computing

6 Remote GPU virtualization (figure label: "No GPU")

7 rCUDA: remote CUDA. rCUDA is a software technology that enables a more flexible use of GPUs in computing facilities. (Figure: applications App 1 to App 9; label "No GPU".) rCUDA is a development by Universitat Politècnica de València, Spain.

8 Basics of rCUDA

9 Basics of rCUDA
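The basics-of-rCUDA slides were figures and did not survive extraction. The general idea of remote GPU virtualization is as follows:
- The application calls a library that exposes the CUDA API but executes nothing locally.
- That library forwards each call over the network to a server process on a node that owns a real GPU.
- The server runs the call and returns the result.

The Python sketch below models that request/response path, assuming an in-process function call in place of the network. All names are hypothetical and are not rCUDA's interface or wire protocol: `FakeGPU`, `Server`, `ClientStub`, `malloc`, `memcpy_h2d` and `memcpy_d2h`. `pickle` is only a convenient stand-in for a real serialization format.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class FakeGPU:
    """Hypothetical stand-in for the physical GPU attached to the server node."""
    memory: dict = field(default_factory=dict)
    next_handle: int = 1

    def malloc(self, size):
        handle = self.next_handle
        self.next_handle += 1
        self.memory[handle] = bytearray(size)
        return handle

    def memcpy_h2d(self, handle, data):
        self.memory[handle][:len(data)] = data

    def memcpy_d2h(self, handle, size):
        return bytes(self.memory[handle][:size])


class Server:
    """Decodes each request, runs it on the local GPU and encodes the result."""

    def __init__(self):
        self.gpu = FakeGPU()

    def handle(self, message: bytes) -> bytes:
        name, args = pickle.loads(message)  # demo only: never unpickle untrusted data
        result = getattr(self.gpu, name)(*args)
        return pickle.dumps(result)


class ClientStub:
    """Offers a CUDA-like API to the application but runs nothing locally."""

    def __init__(self, transport):
        self.transport = transport  # callable bytes -> bytes; a network link in practice

    def _call(self, name, *args):
        return pickle.loads(self.transport(pickle.dumps((name, args))))

    def malloc(self, size):
        return self._call("malloc", size)

    def memcpy_h2d(self, handle, data):
        self._call("memcpy_h2d", handle, bytes(data))

    def memcpy_d2h(self, handle, size):
        return self._call("memcpy_d2h", handle, size)


server = Server()
client = ClientStub(server.handle)  # in-process "network" keeps the sketch self-contained
buf = client.malloc(4)
client.memcpy_h2d(buf, b"\x01\x02\x03\x04")
print(client.memcpy_d2h(buf, 4))  # b'\x01\x02\x03\x04'
```

The application code only ever talks to `ClientStub`. That is what allows the GPU to live in another node without changes to the application itself.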

10 Remote GPU virtualization envision. Remote GPU virtualization allows a new vision of a GPU deployment, moving from the usual cluster configuration (figure: physical configuration, nodes 1 to n attached to an interconnection) to the following one (figure: logical configuration, nodes 1 to n with logical connections over the interconnection).
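In the logical configuration, an application sees a list of GPUs, numbered 0 to n-1, that may physically sit in any node. The client middleware therefore needs a mapping from logical device numbers to (node, GPU) pairs. The sketch below builds such a mapping, assuming a hypothetical "host:gpu" notation. The names `RemoteGPU` and `build_logical_view` and the spec syntax are illustrative only. rCUDA's actual configuration mechanism is documented in its user guide and is not reproduced here.

```python
from typing import List, NamedTuple


class RemoteGPU(NamedTuple):
    host: str   # cluster node that physically holds the GPU
    index: int  # GPU number inside that node


def build_logical_view(specs: List[str]) -> List[RemoteGPU]:
    """Turn hypothetical "host:gpu" strings into logical devices 0..n-1."""
    view = []
    for spec in specs:
        host, sep, idx = spec.rpartition(":")
        if not sep or not host:
            raise ValueError(f"bad GPU spec: {spec!r}")
        view.append(RemoteGPU(host, int(idx)))
    return view


# The application on node1 sees four GPUs; three of them live in other nodes.
view = build_logical_view(["node1:0", "node2:0", "node2:1", "node3:0"])
for dev_id, gpu in enumerate(view):
    print(f"logical GPU {dev_id} -> {gpu.host}, GPU {gpu.index}")
```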

11 Outline, part 2: Is rCUDA useful?

12 Characteristics missing in GPUs
Can we make even better use of GPUs with rCUDA? Which characteristics do we miss from GPUs?
1. Many GPUs in a single box
2. Easily sharing a given GPU (or GPUs)

13 Characteristics missing in GPUs. 1. Why many GPUs in a single box? Traditionally, in order to use many GPUs, applications had to use MPI: GPUs can only be used within the node they are attached to, and nothing is directly shared among nodes (MPI is required to aggregate computing resources within the cluster). A non-MPI application running in one node can only use the GPUs in that node. (Figure: nodes 1 to n joined by an interconnection.)

14 Characteristics missing in GPUs. 1. Many GPUs in a single box: the number of GPUs is limited by the physical space inside the node.

15 Many GPUs in a single box: K40 GPUs and EDR InfiniBand. (Figure: MonteCarlo multi-GPU program running on 10 NVIDIA Tesla K40 GPUs; lower is better.)

16 Many GPUs in a single box: K20 GPUs and FDR InfiniBand. (Figure: MonteCarlo multi-GPU program running on 14 NVIDIA Tesla K20 GPUs; lower is better.)
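The slides do not show how the MonteCarlo program divides its work among the GPUs. A common pattern is to split the total number of random paths evenly across all visible devices and combine the partial results at the end. The sketch below illustrates that pattern under clear simplifications:
- The workload is a swapped-in toy that estimates pi, not the program from the slides.
- The function names are hypothetical.
- The slices run one after another in Python.

In a real multi-GPU program, each slice would be launched on a different device. With remote GPU virtualization, those devices could be in other nodes while the program stays a single, non-MPI process.

```python
import random


def split_paths(total_paths: int, n_gpus: int) -> list:
    """Give each GPU an (almost) equal share of the Monte Carlo paths."""
    base, extra = divmod(total_paths, n_gpus)
    return [base + (1 if i < extra else 0) for i in range(n_gpus)]


def count_hits(paths: int, seed: int) -> int:
    """Stand-in for the per-GPU kernel: darts landing inside the unit quarter circle."""
    rng = random.Random(seed)
    return sum(1 for _ in range(paths) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)


n_gpus = 10  # e.g. the 10 remote K40 GPUs exposed to a single process
shares = split_paths(100_003, n_gpus)
hits = [count_hits(n, seed=i) for i, n in enumerate(shares)]  # one slice per GPU
pi_estimate = 4 * sum(hits) / sum(shares)
print(shares[:4], sum(shares))  # [10001, 10001, 10001, 10000] 100003
print(pi_estimate)
```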

17 Many GPUs in a single box: 64 GPUs!!

18 Many GPUs in a single box (work in progress!!): K20 GPUs

19 Many GPUs in a single box: deep learning (work in progress!!). The training stage of a deep learning application can be accelerated by providing it with many GPUs. Development is still in progress.

20 Characteristics missing in GPUs
Can we make even better use of GPUs with rCUDA? Which characteristics do we miss from GPUs?
1. Many GPUs in a single box
2. Easily sharing a given GPU (or GPUs)

21 Characteristics missing in GPUs. 2. Easily sharing a given GPU: why should we be interested in sharing GPUs among applications?

22 GPU usage of GPU-Blast (figure: GPU assigned but not used; NVIDIA Tesla K20)

23 GPU usage of LAMMPS (figure: GPU assigned but not used; NVIDIA Tesla K20)

24 Characteristics missing in GPUs. Which characteristics do we miss from GPUs?
1. Many GPUs in a single box
2. Easily sharing a given GPU (or GPUs)
The remote GPU virtualization technique can efficiently address these concerns.

25 Characteristics missing in GPUs (figure: nodes 1 to n joined by an interconnection). The remote GPU virtualization technique can efficiently address these concerns.

26-28 Characteristics missing in GPUs (animated figure over the interconnection). The remote GPU virtualization technique can efficiently address these concerns.
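The GPU-Blast and LAMMPS slides show GPUs that are assigned to a job but left largely unused. Once GPUs are reachable over the network, several jobs can share one GPU instead of each holding a GPU exclusively. The sketch below packs jobs onto GPUs using first-fit decreasing by memory footprint. It makes three assumptions, all labeled here:
- Memory is the only constraint; compute contention is ignored.
- The job footprints are made up.
- The function name is hypothetical.

This is not presented as rCUDA's scheduling policy.

```python
def first_fit_decreasing(jobs, gpu_mem_gb, n_gpus):
    """Place (name, memory_gb) jobs on GPUs, letting several jobs share one GPU."""
    free = [gpu_mem_gb] * n_gpus
    placement = {}
    for name, mem in sorted(jobs, key=lambda job: job[1], reverse=True):
        for gpu, avail in enumerate(free):
            if mem <= avail:
                free[gpu] -= mem
                placement[name] = gpu
                break
        else:
            placement[name] = None  # no GPU has room: the job waits in the queue
    return placement


# Hypothetical memory footprints; 12 GB matches the memory of a Tesla K40.
jobs = [("gpu-blast", 2), ("lammps", 3), ("app3", 5), ("app4", 4)]
placement = first_fit_decreasing(jobs, gpu_mem_gb=12, n_gpus=4)
print(placement)  # {'app3': 0, 'app4': 0, 'lammps': 0, 'gpu-blast': 1}
print("GPUs in use:", len(set(placement.values())))  # 2 instead of 4
```

With exclusive assignment, the same four jobs would occupy four GPUs. The packed placement leaves two GPUs free for other work.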

29 Outline, part 3: How about the performance of rCUDA?

30 Performance of rCUDA: CUDA vs. rCUDA bandwidth for CPU-to-GPU (host-to-device, H2D) copies, pinned and pageable, and GPU-to-CPU (device-to-host, D2H) copies, pinned and pageable. (Figure label: "Used by applications".)
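Pageable host memory cannot be transferred by DMA directly, so CUDA stages a pageable copy through a pinned buffer. With a remote GPU, the network adds one more stage. A generic way to hide the cost of several stages is to split the buffer into chunks and pipeline them, so that all stages work at once. The model below compares a serial copy with a pipelined one. It assumes equal-sized chunks and ignores per-chunk overhead, which is why very small chunks look better here than they would in practice. The stage bandwidths and function names are hypothetical. Whether and how rCUDA chunks its transfers is not described in these slides.

```python
import math


def serial_time(size_mb, stage_bw):
    """Whole buffer crosses each stage in turn, with no overlap (seconds)."""
    return sum(size_mb / bw for bw in stage_bw)


def pipelined_time(size_mb, chunk_mb, stage_bw):
    """Buffer split into chunks so that all stages work at the same time.

    The first chunk crosses every stage; after that, one chunk leaves the
    pipeline per bottleneck-stage interval. Per-chunk overheads are ignored.
    """
    n_chunks = math.ceil(size_mb / chunk_mb)
    per_stage = [chunk_mb / bw for bw in stage_bw]
    return sum(per_stage) + (n_chunks - 1) * max(per_stage)


# Hypothetical bandwidths (MB/s) for: host staging copy, network, server copy to GPU.
stages = [8000, 6000, 10000]
print(round(serial_time(1024, stages), 4))        # about 0.401 s
print(round(pipelined_time(1024, 4, stages), 4))  # about 0.172 s, close to 1024/6000
```

In this model, the pipelined copy approaches the speed of the slowest stage (here the network). A serial copy instead pays the sum of all stages.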

31 Performance of rCUDA (figure panels: H2D pageable and D2H pageable bandwidth; higher is better)

32 Performance of rCUDA (figure: CUDA, rCUDA scenario 1 and rCUDA scenario 2)

33 Performance of rCUDA: rCUDA in scenario 2 (GPUs located at different nodes); higher is better.

34 Performance of applications using rCUDA (figure panels: K20 and FDR InfiniBand; K40 and EDR InfiniBand; lower is better)

35 Performance of applications using rCUDA: EDR InfiniBand and P100 (figure panels: BarraCUDA and CUDA-MEME; lower is better)

36 Outline, part 4: Other benefits of rCUDA

37 Easily sharing a GPU among VMs. A GPU is assigned to a VM by using PCI passthrough. The assignment is exclusive to a single virtual machine, so concurrent usage of the GPU is not possible.

38 Easily sharing a GPU among VMs (figure panels: high-performance network available; low-performance network available)

39 Overhead of rCUDA within KVM VMs: FDR InfiniBand + K20!! (Figure: lower is better; labeled overheads of 1.6%, 2.5%, 0.5% and 0.07%.)

40 Server consolidation with rCUDA (figure: GPU utilization (%), with several GPUs switched off). rCUDA provides support for migrating jobs from one GPU in the cluster to another GPU located in the same or a different cluster node. Migration is transparent to applications: only the GPU part of the application is migrated.
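Job migration makes server consolidation possible. Work running on lightly loaded GPUs can be moved elsewhere, and the GPUs left idle can be switched off. The greedy policy below only illustrates the idea and is not rCUDA's policy. It has three assumptions:
- Utilizations are expressed as a percentage of one GPU.
- Utilizations add up linearly when jobs share a GPU, which is a simplification.
- A GPU is drained only if all of its jobs fit on other active GPUs.

The function name and the load figures are hypothetical.

```python
def consolidate(load, capacity=100):
    """Greedy consolidation sketch.

    `load` maps GPU id -> list of job utilizations (% of one GPU). GPUs are
    visited from least to most loaded; a GPU is drained only if *all* of its
    jobs fit on other active GPUs, so that it can be switched off afterwards.
    """
    gpus = {g: list(jobs) for g, jobs in load.items()}
    migrations = []
    for src in sorted(gpus, key=lambda g: sum(gpus[g])):
        if not gpus[src]:
            continue  # already idle
        used = {g: sum(jobs) for g, jobs in gpus.items() if g != src and jobs}
        plan = []
        for job in sorted(gpus[src], reverse=True):
            fits = [g for g, u in used.items() if u + job <= capacity]
            if not fits:
                break  # this GPU cannot be emptied; leave it untouched
            dst = max(fits, key=used.get)  # best fit: the fullest GPU with room
            used[dst] += job
            plan.append((job, dst))
        else:
            for job, dst in plan:  # commit the migrations and free the source GPU
                gpus[dst].append(job)
                migrations.append((src, dst, job))
            gpus[src] = []
    idle = sorted(g for g, jobs in gpus.items() if not jobs)
    return gpus, migrations, idle


# Hypothetical utilization figures for four GPUs.
gpus, migrations, idle = consolidate({0: [20], 1: [30], 2: [50], 3: [10]})
print(migrations)  # [(3, 2, 10), (0, 2, 20)]
print(idle)        # [0, 3]: these GPUs can be switched off
```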


41 Example of migration performance. The GPU-Blast application is migrated up to 5 times among K40 GPUs. The aggregated volume of data is 1300 MB (consisting of 9 memory regions). Lower is better. The reference line is the execution time of the application when using CUDA with a local GPU and without any migration.
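Migrating only the GPU part of an application means moving its device memory regions (nine in the GPU-Blast example) to the destination GPU. The application must keep working with the device pointers it already holds. The sketch below shows one possible way to do this: copy every region, release it on the source, and return an old-to-new handle map that a middleware layer could use to translate pointers. `Device`, `alloc` and `migrate` are hypothetical names, and the region sizes are toy values. The slides do not describe the actual rCUDA mechanism, for example how device pointers stay valid or how in-flight kernels are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy GPU memory: opaque handle -> bytes."""
    regions: dict = field(default_factory=dict)
    next_handle: int = 0x1000

    def alloc(self, data: bytes) -> int:
        handle = self.next_handle
        self.next_handle += 0x1000
        self.regions[handle] = bytearray(data)
        return handle


def migrate(src: Device, dst: Device):
    """Copy every live region from src to dst and release it on src.

    Returns the old-handle -> new-handle map the middleware would use to keep
    the application's device pointers valid, plus the number of bytes moved.
    """
    remap, moved = {}, 0
    for old, data in list(src.regions.items()):
        remap[old] = dst.alloc(bytes(data))
        moved += len(data)
        del src.regions[old]
    return remap, moved


src, dst = Device(), Device(next_handle=0x100000)
handles = [src.alloc(bytes([i]) * (i + 1)) for i in range(9)]  # 9 regions, 45 bytes in total
remap, moved = migrate(src, dst)
print(len(remap), moved)  # 9 45
```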

42 Increasing heterogeneity: rCUDA clients and servers can use different processor architectures (figure: rCUDA clients and rCUDA servers on ARM, x86 and IBM Power processors).
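When clients and servers run on different processor architectures, the messages they exchange cannot depend on the host's byte order, struct padding or pointer width. A common approach is a wire format with fixed-width fields and an explicit byte order. The sketch below encodes a request header with Python's `struct` module. The "<" prefix forces little-endian byte order with standard sizes and no padding on every platform. The header layout (opcode, device id, payload length) and the function names are hypothetical and are not rCUDA's protocol.

```python
import struct

# Hypothetical wire header: opcode (u32), device id (u32), payload length (u64).
# "<" = little-endian, standard sizes, no alignment padding, on every platform.
HEADER = struct.Struct("<IIQ")


def encode_request(opcode: int, device: int, payload: bytes) -> bytes:
    return HEADER.pack(opcode, device, len(payload)) + payload


def decode_request(message: bytes):
    opcode, device, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return opcode, device, payload


msg = encode_request(7, 2, b"kernel-args")
print(HEADER.size, decode_request(msg))  # 16 (7, 2, b'kernel-args')
```

Using 64-bit fields for lengths, and likewise for device addresses, keeps the format identical whether the peer is a 32-bit or 64-bit system.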

43 Get a free copy of rCUDA at … (more than 850 requests worldwide)

44 Tony Díaz, Pablo Higueras, Javier Prades, Carlos Reaño, Jaime Sierra, Federico Silla

45 Thanks! Questions?
