rcuda: hybrid CPU-GPU clusters Federico Silla Technical University of Valencia Spain

Size: px

Start display at page:

Download "rcuda: hybrid CPU-GPU clusters Federico Silla Technical University of Valencia Spain"

Opal Alexander
5 years ago
Views:

1 rcuda: hybrid - clusters Federico Silla Technical University of Valencia Spain

2 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

3 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

4 Current computing needs Many applications require a lot of computing resources Execution time is usually increased Applications are accelerated to get their execution time reduced computing has experienced a remarkable growth in the last years HPC ADMINTECH /49

5 Introducing s into clusters HPC ADMINTECH /49

6 Basics of computing Basic behavior of CUDA HPC ADMINTECH /49

7 Characteristics of -based clusters A -enabled cluster is a set of independent self-contained nodes. The cluster follows the shared-nothing approach: Nothing is directly shared among nodes (MPI required for aggregating computing resources within the cluster, included s) s can only be used within the node they are attached to node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

8 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

9 First concern with accelerated clusters Non-accelerated applications keep s idle in the nodes where they use all the cores Hybrid MPI shared-memory non-accelerated applications usually span to all the cores in a node (across n nodes) A -only application spreading over these nodes will make their s unavailable for accelerated applications node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

10 First concern with accelerated clusters (II) Accelerated applications keep s idle in the nodes where they execute Hybrid MPI shared-memory non-accelerated applications usually span to all the cores in a node (across n nodes) An accelerated application using just one core may avoid other jobs to be dispatched to this node node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

11 First concern with accelerated clusters (II) Accelerated applications keep s idle in the nodes where they execute Hybrid MPI shared-memory non-accelerated applications usually span to all the cores in a node (across n nodes) An accelerated MPI application using just one core per node may keep part of the cluster busy node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

12 Second concern with accelerated clusters Non-MPI multi- applications cannot make use of the tremendous resources available across the cluster (even if those resources are idle) Non-MPI multi- application All these s cannot be used by the multi- application being executed node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

13 One more concern with accelerated clusters Do applications completely squeeze the s available in the cluster? When a is assigned to an application, computational resources inside the may not be fully used Application presenting low level of parallelism code being executed ( assigned working) -core stall due to lack of data etc node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

14 usage of -Blast assigned but not used assigned but not used HPC ADMINTECH /49

15 usage of LAMMPS assigned but not used HPC ADMINTECH /49

16 Sharing a among jobs K20 LAMMPS: 876 MB mcuda-meme: 151 MB BarraCUDA: 3319 MB MUMmer: 2104 MB -LIBSVM: 145 MB HPC ADMINTECH /49

17 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

18 Remote virtualization envision Remote virtualization allows a new vision of a deployment, moving from the usual cluster configuration: node 1 node 2 node 3 node n Physical configuration Interconnection node 1 Logical connections node 2 node 3 node n Logical configuration Interconnection HPC ADMINTECH /49

19 Remote virtualization envision Real local s node 1 node 2 node 3 node n Without virtualization Interconnection With virtualization Virtualized remote s n virtualization allows all nodes to share all s Interconnection HPC ADMINTECH /49

20 Basics of remote virtualization HPC ADMINTECH /49

21 Basics of remote virtualization HPC ADMINTECH /49

22 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

Remote virtualization frameworks Several efforts have been made to implement remote virtualization during the last years: rcuda (CUDA 8.0) GVirtuS (CUDA 3.2) DS-CUDA (CUDA 4.

23 Remote virtualization frameworks Several efforts have been made to implement remote virtualization during the last years: rcuda (CUDA 8.0) GVirtuS (CUDA 3.2) DS-CUDA (CUDA 4.1) vcuda (CUDA 1.1) GViM (CUDA 1.1) GridCUDA (CUDA 2.3) V- (CUDA 4.0) Publicly available NOT publicly available rcuda is a development by Technical University of Valencia HPC ADMINTECH /49

24 Remote virtualization frameworks FDR InfiniBand + K20!! H2D pageable D2H pageable H2D pinned D2H pinned HPC ADMINTECH /49

25 P2P copy support within rcuda FDR InfiniBand + K20!! HPC ADMINTECH /49

26 Application performance with rcuda Several applications executed with CUDA and rcuda K20 and FDR InfiniBand K40 and EDR InfiniBand Lower is better HPC ADMINTECH /49

27 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

28 1: more s for a single application virtualization is useful for multi- applications node 1 node 2 node 3 node n Only the s in the node can be provided to the application Without virtualization Interconnection With virtualization Many s in the cluster can be provided to the application Logical connections node 1 node 2 node 3 node n Interconnection HPC ADMINTECH /49

29 1: more s for a single application 64 s! HPC ADMINTECH /49

30 1: more s for a single application MonteCarlo Multi- (from NVIDIA samples) FDR InfiniBand + NVIDIA Tesla K20 Higher is better Lower is better HPC ADMINTECH /49

31 2: task migration Job granularity instead of granularity HPC ADMINTECH /49

32 2: task migration The -Blast application is migrated up to 5 times among K40 s The aggregated volume of data is 1300 MB (consisting of 9 memory regions) The Reference line is the execution time of the application when using CUDA with a local and without any migration HPC ADMINTECH /49

33 3: virtual machines can easily access s The is assigned by using PCI passthrough exclusively to a single virtual machine Concurrent usage of the is not possible HPC ADMINTECH /49

34 3: virtual machines can easily access s High performance network available Low performance network available HPC ADMINTECH /49

35 3: virtual machines can easily access s FDR InfiniBand + K20!! LAMMPS CUDA-MEME CUDASW++ -BLAST HPC ADMINTECH /49

36 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them HPC ADMINTECH /49

37 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them 4GB K40 (12GB) HPC ADMINTECH /49

38 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them 3GB 4GB K40 (12GB) HPC ADMINTECH /49

39 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them 3GB 4GB 4GB K40 (12GB) HPC ADMINTECH /49

40 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them 3GB 4GB 4GB 4GB K40 (12GB) HPC ADMINTECH /49

41 rcuda can provide advanced support rcuda can be used to provide VMs with concurrent access to one or more s with advanced support Why is more support for VMs needed? Because shared s may run out of memory and cause applications to abort VMs blindly allocate and release memory independently from each other without any kind of coordination among them 3GB 4GB 4GB Conclusion: In addition to access the, it is required to coordinate how 4GBVMs access the accelerator K40 (12GB) HPC ADMINTECH /49

42 4: cheaper cluster upgrade Let s suppose that a cluster without s needs to be upgraded to use s No s require large power supplies Are power supplies already installed in the nodes large enough? s require large amounts of space Does current form factor of the nodes allow to install s? The answer to both questions is usually NO HPC ADMINTECH /49

43 4: cheaper cluster upgrade Lower is better Lower is better Higher is better HPC ADMINTECH /49

44 5: increased cluster performance with Slurm s can be shared among jobs running in remote clients Job scheduler required for coordination Slurm was enhanced App 1 App 2 App 3 App 4 App 5 App 6 App 7 App 8 App 9 HPC ADMINTECH /49

45 5: increased cluster performance with Slurm Lower is better Lower is better Higher is better HPC ADMINTECH /49

46 Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that? 5. What can I do with rcuda? 6. Additional activities HPC ADMINTECH /49

47 Additional activities Latest rcuda SC 16: Support for CUDA 8.0 P2P memory copies Improved support for cudnn On going developments: Improved support for deep learning Support for 64-bit ARM architectures Support for Matlab Support for popular graphics rendering suites Developments we are considering Support for PowerPC architectures and much more!! rcuda is a development by Technical University of Valencia HPC ADMINTECH /49

48 Get a free copy of rcuda at More than 750 requests world rcuda is a development by Technical University of Valencia HPC ADMINTECH /49

49 Thanks! Questions? rcuda is a development by Technical University of Valencia HPC ADMINTECH /49

Deploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain

Deploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain Deploying remote virtualization with rcuda Federico Silla Technical University of Valencia Spain st Outline What is remote virtualization? HPC ADMINTECH 2016 2/53 It deals with s, obviously! HPC ADMINTECH