Introduction

The Mass Open Cloud (MOC) is a collaborative project between higher education, government, non-profit entities, and industry. The mission of the MOC is "to create a self-sustaining, at-scale public cloud based on the Open Cloud Exchange model." It serves as a marketplace for industry partners as well as a place for researchers and industry to innovate and expose innovations to real users. To learn more about the MOC, please visit https://massopen.cloud/about/

The Center for Geographic Analysis (CGA) at Harvard University is a collaborator of the MOC. The CGA has moved its services, including the Billion Objects Platform, WorldMap, and HyperMap, along with their development platform, to the MOC OpenStack cloud. As GPU computing power has advanced over the past years, both the CGA and the MOC see potential benefits in using GPUs for data analytics and visualization. In supporting the CGA, the MOC took on the challenge of becoming the first OpenStack cloud provider to deploy MapD with GPUs enabled. Another MOC project, the Boston Children's Hospital Imaging Collaboration, also saw the need for GPUs. That project is a joint effort between the MOC, Boston Children's Hospital, and Red Hat, a core industry partner. As a result, Red Hat generously provided funding for three Dell R730 servers and three NVIDIA Tesla P100 server GPUs. This set of hardware is shared between that project and the CGA.

Hardware and OpenStack Cloud

The three Dell R730 servers were each installed with an NVIDIA Tesla P100 server GPU. The servers were integrated into the MOC Engage1 cluster, one of the three clusters in the Mass Open Cloud, and then deployed into the existing OpenStack cloud as compute nodes. In Engage1, in addition to these three new GPU servers, there are 10 OpenStack compute nodes and 2 controller nodes. The OpenStack version is Pike. A half-petabyte Ceph storage system running the Jewel release is also deployed. The network is a Brocade (now Broadcom) bifurcated network. In addition, an MIT HPC setup consisting of 200+ servers is also part of this network.

The configuration of each Dell R730:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
Stepping:              1
CPU MHz:               2347.656
CPU max MHz:           3200.0000

CPU min MHz:           1200.0000
BogoMIPS:              4000.12
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K

$ uname -a
Linux e1-gcompute-11.eng1.moc.edu 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

MapD instance on a GPU OpenStack compute node

For this POC effort, the scenario we have is three OpenStack compute nodes, each with a GPU card, and three projects sharing this setup. The fastest and simplest approach is to use the PCI passthrough feature in OpenStack. This implies no sharing of the GPU and a maximum of three instances, each running on one of the three compute/GPU nodes.

Figure 1. OpenStack GPU flavor created for this POC

By default, the OpenStack Compute scheduler decides which compute node an instance runs on. In this case, since there are three GPU compute nodes and only three GPU-flavor instances, assigning each instance to a GPU compute node is quite intuitive. To create an instance on OpenStack, a customized flavor with the following configuration was created (a sketch of the commands involved appears after the specification):

Flavor name: m2.md_large_gpu
CPU: 16 virtual cores
Memory: 64GB
Hard Drive: 200GB
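A minimal sketch of how such a flavor can be wired to the passthrough device; the alias name "p100" is an assumption rather than a value taken from the MOC deployment, while 10de:15f8 is the PCI vendor/device ID of the Tesla P100 PCIe card:

# Assumed [pci] settings in /etc/nova/nova.conf on the GPU compute nodes (Pike):
#   [pci]
#   passthrough_whitelist = { "vendor_id": "10de", "product_id": "15f8" }
#   alias = { "vendor_id": "10de", "product_id": "15f8", "device_type": "type-PCI", "name": "p100" }

# Create the custom flavor and attach one GPU via the alias:
$ openstack flavor create m2.md_large_gpu --vcpus 16 --ram 65536 --disk 200
$ openstack flavor set m2.md_large_gpu --property "pci_passthrough:alias"="p100:1"

With this property set, any instance booted from m2.md_large_gpu is scheduled onto a node that still has a free P100 to pass through.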

This flavor is modeled after p3.2xlarge, the second most modest EC2 instance type MapD offers on Amazon Web Services. To explore other MapD configurations, see MapD Database & Visual Analytics Platform.

Figure 2. The configuration of the OpenStack GPU instance

Deploy RHEL 7 on the MapD instance

For the operating system, we chose RHEL 7 with subscriptions, for easier deployment. Before installing the NVIDIA driver and MapD, it is highly recommended to update the Red Hat server so that all the dependencies are lined up. Note that rebooting the virtual machine instance will also reboot the physical system, because the virtual machine has direct passthrough access to the P100 PCIe device. If you have other virtual machines running on this physical system, please back them up before restarting.
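A minimal sketch of that update step, assuming the instance is already registered with subscription-manager:

# Bring the kernel and packages up to date so the driver's dependencies
# (kernel-devel, gcc, and friends) line up, then reboot into the new kernel:
$ sudo yum update -y
$ sudo reboot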

Deploy NVIDIA CUDA

We installed the following NVIDIA Tesla driver and CUDA versions, the latest at the time:

NVIDIA 375.66
CUDA 9.1.85-1

To see more versions, go to http://us.download.nvidia.com/tesla/. NVIDIA also provides drivers for different Linux distros. For CUDA, go to http://developer.download.nvidia.com/compute/cuda/repos/.

1. wget "http://us.download.nvidia.com/tesla/375.66/nvidia-diag-driver-local-repo-rhel7-375.66-1.x86_64.rpm"
2. sudo rpm -ivh nvidia-diag-driver-local-repo-rhel7-375.66-1.x86_64.rpm
3. curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.1.85-1.x86_64.rpm
4. sudo rpm -ivh cuda-repo-rhel7-9.1.85-1.x86_64.rpm
5. sudo yum clean expire-cache
6. sudo yum install cuda-drivers
7. sudo reboot

Figure 3. Selected NVIDIA CUDA for Linux distros
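Once the instance is back up, nvidia-smi, which ships with the driver, is a quick sanity check before the lspci test below:

# Lists the driver version and the P100 if the kernel module loaded correctly:
$ nvidia-smi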

To check whether the driver installation was successful, run this in the terminal:

# lspci -vnnn | perl -lne 'print if /^\d+\:.+(\[\S+\:\S+\])/' | grep VGA
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8] (prog-if 00 [VGA controller])

If successful, the terminal should return a GPU controller as a second line of output, similar to the line above.

Deploy MapD libraries

After deploying the OpenStack instance, we deployed MapD on the virtual machine using the MapD CentOS 7 recipe: https://www.mapd.com/docs/latest/getting-started/centos7-yum-gpu-ce-recipe/#centos7-yum-gpu-ce-recipe

Testing

To access the MapD dashboard, go to port 9092 of the MapD instance.

Figure 4. A new MapD dashboard running on the MOC

By following the MapD CentOS 7 recipe, a user can ingest MapD's sample data, such as the 2015 NYC tree data or the 2008 US flights data, into the MapD database. The figure below shows an example visualization of the 2008 flights dataset.
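A condensed sketch of that ingestion step, following the layout used by the MapD CE packages of that era; the /opt/mapd install prefix, the systemd unit names, and the insert_sample_data helper should be treated as assumptions and checked against the recipe:

# Start the database and the web dashboard, then load a sample dataset:
$ sudo systemctl enable --now mapd_server mapd_web_server
$ cd /opt/mapd
$ sudo ./insert_sample_data
# The script prompts for a dataset; the NYC tree and US flights data
# are among the choices it loads into MapD Core.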

Detours

In this section, we discuss the challenges we encountered in getting the system and the software up and running.

1. Incorrect cables

The Dell systems and the NVIDIA P100 cards were ordered and shipped separately, which is not typical for Dell. This was done for economic reasons, to take advantage of the NVIDIA education discount; thanks to MapD for its initial support of this transaction. The cable that shipped with the server turned out to be an incorrect one. As a result, when rebooting the system, error code PSU0036 was displayed on the console.

Figure 6. Error caused by using an incorrect cable

Furthermore, the operating system itself also detected this issue during a driver deployment.

Figure 7. Error messages detected by a driver

2. Multiple MapD users

While testing the functionality of the instance, we discovered that if two users use the same account to access MapD via port 9092, there will be rendering errors. Therefore, in order to test, we created two additional users (a sketch follows below). These users can independently access the data and perform analytics and data visualization using MapD Core.
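A sketch of how such accounts can be created from mapdql, MapD Core's SQL client; the user names and passwords here are placeholders, not the accounts used on the MOC, and the binary path plus the default superuser credentials follow the stock MapD CE install:

$ /opt/mapd/bin/mapdql -u mapd -p HyperInteractive
mapdql> CREATE USER analyst1 (password = 'ChangeMe1', is_super = 'false');
mapdql> CREATE USER analyst2 (password = 'ChangeMe2', is_super = 'false');

Each account can then log in to the dashboard on port 9092 in its own session, avoiding the rendering errors seen with a shared account.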