Introduction

The Mass Open Cloud (MOC) is a collaborative project between higher education, government, non-profit entities, and industry. The mission of the MOC is "to create a self-sustaining, at-scale public cloud based on the Open Cloud Exchange model." It will serve as a marketplace for industry partners, as well as a place for researchers and industry to innovate and expose innovation to real users. To learn more about the MOC, please visit https://massopen.cloud/about/.

The Center for Geographic Analysis (CGA) at Harvard University is a collaborator of the MOC. The CGA has moved its services, including the Billion Objects Platform, WorldMap, and HyperMap, along with their development platform, to the MOC OpenStack cloud. As GPU computing power has advanced over the past years, both the CGA and the MOC see potential benefits in using GPUs for data analytics and visualization. In supporting the CGA, the MOC took on the challenge of becoming the first OpenStack cloud provider to deploy MapD with GPUs enabled.

Another MOC project, the Boston Children's Hospital Imaging Collaboration, also saw the need for GPUs. The project is a joint effort between the MOC, Boston Children's Hospital, and Red Hat, a core industry partner. As a result, Red Hat generously provided funding for three Dell R730 servers and three NVIDIA Tesla P100 server GPUs. This set of hardware is shared between that project and the CGA.

Hardware and OpenStack Cloud

Each of the three Dell R730 servers was fitted with an NVIDIA Tesla P100 server GPU. The servers were integrated into the MOC Engage1 cluster, one of the three clusters in the Mass Open Cloud, after which we deployed all three GPU servers into the already existing OpenStack cloud as compute nodes. In Engage1, in addition to these three new GPU servers, there are 10 OpenStack compute nodes and 2 controller nodes. The OpenStack version is Pike. A half-petabyte Ceph storage cluster running the Jewel release is also deployed.
The network is a Brocade (now Broadcom) bifurcated network. In addition, an MIT HPC setup consisting of 200+ servers is also part of this network.

The configuration of each Dell R730:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
Stepping:              1
CPU MHz:               2347.656
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4000.12
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K

$ uname -a
Linux e1-gcompute-11.eng1.moc.edu 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

MapD instance on a GPU OpenStack compute node

For this POC effort, we have three OpenStack compute nodes, each with a GPU card, and three projects that share this setup. The fastest and simplest approach is to use the PCI passthrough feature in OpenStack. This implies no sharing of the GPU and a maximum of three instances, each running on one of the three compute/GPU nodes.

Figure 1. OpenStack GPU flavor created for this POC

By default, the OpenStack Compute scheduler decides which compute node an instance runs on. In this case, since there are three GPU compute nodes and only three GPU-flavor instances, assigning each instance to a GPU compute node is straightforward. To create an instance on OpenStack, a customized flavor with the following configuration was created:

Flavor name: m2.md_large_gpu
CPU: 16 virtual cores
Memory: 64GB
Hard Drive: 200GB
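As a concrete illustration, a flavor like the one above could be created with the OpenStack CLI along the following lines. This is a dry-run sketch that only prints the commands; the PCI alias name "p100" and the pci_passthrough:alias property value are assumptions, and the alias must match the [pci] alias configured in nova.conf on the controller and compute nodes.

```shell
# Dry-run sketch: print the OpenStack CLI commands that would create the
# m2.md_large_gpu flavor and tie it to a passthrough GPU.
# The alias name "p100" is an assumption; it must match nova.conf's [pci] alias.
FLAVOR=m2.md_large_gpu
RAM_MB=$((64 * 1024))   # the CLI takes RAM in MB: 64 GB -> 65536

echo "openstack flavor create $FLAVOR --vcpus 16 --ram $RAM_MB --disk 200"
echo "openstack flavor set $FLAVOR --property pci_passthrough:alias=p100:1"
```

With such a flavor in place, booting an instance against it is an ordinary server-create call; provided the scheduler's PciPassthroughFilter is enabled, the instance will only land on a host exposing a matching PCI device.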
This flavor is modeled after p3.2xlarge, MapD's second most modest EC2 instance type on Amazon Web Services. To explore other MapD configurations, see MapD Database & Visual Analytics Platform.

Figure 2. The configuration of the OpenStack GPU instance

Deploy RHEL 7 on the MapD instance

For the operating system, we chose RHEL 7 with subscriptions, for easier deployment. Before the NVIDIA driver and MapD installations, it is highly recommended to update the Red Hat server so that all the dependencies are lined up. Note that rebooting the virtual machine instance will also reboot the physical system, because the virtual machine has direct passthrough access to the P100 PCIe pins. If you have other virtual machines running on this physical system, please back them up before restarting.
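A minimal sketch of that preparation step follows. The commands are the standard RHEL registration and update steps, printed rather than executed here, since subscription credentials and any attach options are site-specific:

```shell
# Dry-run sketch: standard RHEL 7 registration and update steps before the
# NVIDIA/MapD installs. Printed only; registration details are site-specific.
PREP_CMDS='sudo subscription-manager register --auto-attach
sudo yum -y update
sudo reboot'
echo "$PREP_CMDS"
```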
Deploy NVIDIA CUDA

We installed the following NVIDIA Tesla driver and CUDA versions, the latest at the time:

NVIDIA 375.66
CUDA 9.1.85-1

To see more versions, go to http://us.download.nvidia.com/tesla/. NVIDIA also provides drivers for different Linux distros. For CUDA, go to http://developer.download.nvidia.com/compute/cuda/repos/.

1. wget "http://us.download.nvidia.com/tesla/375.66/nvidia-diag-driver-local-repo-rhel7-375.66-1.x86_64.rpm"
2. sudo rpm -ivv nvidia-diag-driver-local-repo-rhel7-375.66-1.x86_64.rpm
3. curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.1.85-1.x86_64.rpm
4. sudo rpm -i cuda-repo-rhel7-9.1.85-1.x86_64.rpm
5. sudo yum clean expire-cache
6. sudo yum install cuda-drivers
7. sudo reboot

Figure 3. Selected NVIDIA CUDA for Linux distros
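After the final reboot step, the driver can be sanity-checked with the utilities it installs. A dry-run sketch (the commands are printed rather than executed, since they need the GPU host to run):

```shell
# Dry-run sketch: post-reboot checks that the Tesla driver loaded correctly.
# nvidia-smi ships with the driver; /proc/driver/nvidia/version shows its version.
CHECKS='nvidia-smi
cat /proc/driver/nvidia/version'
echo "$CHECKS"
```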
To check whether the driver installation was successful, run this in the terminal:

# lspci -vnnn | perl -lne 'print if /^\d+\:.+(\[\S+\:\S+\])/' | grep VGA
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8] (prog-if 00 [VGA controller])

If successful, the terminal should return a GPU controller, similar to the above (second line).

Deploy MapD libraries

After deploying an OpenStack instance, we deployed MapD on the virtual machine using the following MapD CentOS 7 recipe:

https://www.mapd.com/docs/latest/getting-started/centos7-yum-gpu-ce-recipe/#centos7-yum-gpu-ce-recipe

Testing

To access the MapD dashboard, go to port 9092 of the MapD instance.

Figure 4. A new MapD dashboard running on the MOC

By following this MapD CentOS 7 recipe, a user can ingest MapD's sample data, such as the 2015 NYC tree data or the 2008 US flights data, into the MapD database. The figure below is an example visualization of the 2008 flights dataset.
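As a small illustration of the access path, the dashboard URL for an instance is simply its address plus port 9092, the default for MapD's web server. The IP below is a documentation placeholder (TEST-NET-3), not a real MOC address:

```shell
# Sketch: constructing the MapD dashboard URL for an instance.
# 203.0.113.10 is a documentation placeholder, not a real address.
MAPD_HOST=203.0.113.10
MAPD_URL="http://$MAPD_HOST:9092"
echo "$MAPD_URL"
# A quick reachability probe from a client machine could be:
#   curl -sI "$MAPD_URL" | head -n 1
```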
Detours

In this section, we discuss the challenges we encountered in getting the system, as well as the software, up and running.

1. Incorrect Cables

The Dell systems and the NVIDIA P100 cards were ordered and shipped separately, which is not typical for Dell. This was done for economic reasons, to take advantage of the NVIDIA education discount; thanks go to MapD for their initial support of this transaction. The cable that shipped with the server turned out to be an incorrect one. As a result, when rebooting the system, the error code PSU0036 was displayed on the console.

Figure 6. Error caused by using an incorrect cable

Furthermore, the operating system itself also detected this issue during a driver deployment.

Figure 7. Error messages detected by a driver

2. Multiple MapD Users

While testing the functionality of the instance, we discovered that if two users use the same account to access MapD via port 9092, there will be rendering errors. Therefore, in order to test, we created two other
users. These users can independently access the data and perform analytics and data visualization using MapD Core.
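For reference, additional accounts like these can be created through MapD Core's SQL interface. The sketch below prints the SQL only; the user name and password are placeholders, and the exact CREATE USER options should be checked against the MapD Core documentation for your version:

```shell
# Sketch: SQL for adding a second, non-superuser MapD account (placeholders).
# It would be piped into mapdql, e.g.: echo "$SQL" | mapdql mapd -u mapd -p <admin password>
SQL="CREATE USER cga_user (password = 'changeme', is_super = 'false');"
echo "$SQL"
```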