Running Docker* Containers on Intel Xeon Phi Processors

Similar documents
Clear CMOS after Hardware Configuration Changes

Intel Unite Plugin Guide for VDO360 Clearwater

Intel Unite. Intel Unite Firewall Help Guide

Installation Guide and Release Notes

Hetero Streams Library (hstreams Library) User's Guide

Intel QuickAssist for Windows*

Intel Unite Solution Version 4.0

Intel & Lustre: LUG Micah Bhakti

Intel Xeon W-3175X Processor Thermal Design Power (TDP) and Power Rail DC Specifications

Localized Adaptive Contrast Enhancement (LACE)

Intel Unite Solution Intel Unite Plugin for WebEx*

Modernizing Meetings: Delivering Intel Unite App Authentication with RFID

Intel Software Guard Extensions SDK for Linux* OS. Installation Guide

Intel Unite Plugin for Logitech GROUP* and Logitech CONNECT* Devices INSTALLATION AND USER GUIDE

Intel Open Network Platform Release 2.0 Hardware and Software Specifications Application Note. SDN/NFV Solutions with Intel Open Network Platform

HAProxy* with Intel QuickAssist Technology

Intel Unite Solution. Plugin Guide for Protected Guest Access

Intel Unite Solution. Linux* Release Notes Software version 3.2

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

Intel System Information Retrieval Utility

Intel Unite Solution Version 4.0

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

Intel Compute Card Slot Design Overview

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Omni-Path Fabric Manager GUI Software

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Installation Guide and Release Notes

Intel Integrated Native Developer Experience 2015 (OS X* host)

High Performance Computing The Essential Tool for a Knowledge Economy

Intel Omni-Path Fabric Manager GUI Software

White Paper. May Document Number: US

Intel Visual Compute Accelerator Product Family

Intel Visual Compute Accelerator Product Family

Intel Software Guard Extensions Platform Software for Windows* OS Release Notes

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel Unite Solution Version 4.0

Intel Speed Select Technology Base Frequency - Enhancing Performance

IPSO 6LoWPAN IoT Software for Yocto Project* for Intel Atom Processor E3800 Product Family

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Intel System Event Log Viewer Utility

Intel Unite Solution. Plugin Guide for Protected Guest Access

Intel Unite. Enterprise Test Environment Setup Guide

Intel Security Dev API 1.0 Production Release

Intel Unite Solution Intel Unite Plugin for Ultrasonic Join

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Bei Wang, Dmitry Prohorov and Carlos Rosales

Intel Parallel Studio XE 2019 Update 1

Intel System Debugger 2018 for System Trace Linux* host

Intel QuickAssist for Windows*

Movidius Neural Compute Stick

Omni-Path Cluster Configurator

Intel Celeron Processor J1900, N2807 & N2930 for Internet of Things Platforms

Intel Parallel Studio XE 2016

Intel Omni-Path Fabric Manager GUI Software

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

Andreas Dilger High Performance Data Division RUG 2016, Paris

IXPUG 16. Dmitry Durnov, Intel MPI team

Intel Unite Solution Version 4.0

Jomar Silva Technical Evangelist

Intel Many Integrated Core (MIC) Architecture

Zhang, Hongchao

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs

Installation Guide and Release Notes

Bitonic Sorting Intel OpenCL SDK Sample Documentation

6th Generation Intel Core Processor Series

Fast-track Hybrid IT Transformation with Intel Data Center Blocks for Cloud

Intel Parallel Studio XE 2018

Intel Server Board S2600CW2S

Välkommen. Intel Anders Huge

Optimization of Lustre* performance using a mix of fabric cards

Small File I/O Performance in Lustre. Mikhail Pershin, Joe Gmitter Intel HPDD April 2018

Intel Manycore Platform Software Stack (Intel MPSS)

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready

SELINUX SUPPORT IN HFI1 AND PSM2

The Intel Processor Diagnostic Tool Release Notes

BIOS Implementation of UCSI

Intel Omni-Path Fabric Switches

DIY Security Camera using. Intel Movidius Neural Compute Stick

Optimizing the operations with sparse matrices on Intel architecture

Intel Server Board S5520HC

LNet Roadmap & Development. Amir Shehata Lustre * Network Engineer Intel High Performance Data Division

Intel Ethernet Controller I350 Frequently Asked Questions (FAQs)

Device Firmware Update (DFU) for Windows

Virtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs

Installation Guide and Release Notes

Intel Firmware Support Package (Intel FSP) for Intel Xeon Processor D Product Family (formerly Broadwell-DE), Gold 001

An introduction to today s Modular Operating System

Intel Quark Microcontroller Software Interface Pin Multiplexing

Achieving 2.5X 1 Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

Intel True Scale Fabric Switches Series

pymic: A Python* Offload Module for the Intel Xeon Phi Coprocessor

Transcription:

Running Docker* Containers on Intel Xeon Phi Processors White Paper March 2017 Revision 001 Document Number: 335644-001US

Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Intel technologies features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. Warning: Altering PC clock or memory frequency and/or voltage may (i) reduce system stability and use life of the system, memory and processor; (ii) cause the processor and other system components to fail; (iii) cause reductions in system performance; (iv) cause additional heat or other damage; and (v) affect system data integrity. Intel assumes no responsibility that the memory, included if used with altered clock frequencies and/or voltages, will be fit for any particular purpose. Check with memory manufacturer for warranty and additional details. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel is a sponsor and member of the Benchmark XPRT Development Community, and was the major developer of the XPRT family of benchmarks. Principled Technologies is the publisher of the XPRT family of benchmarks. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, Xeon Phi, Intel Advanced Vector Extensions 512 (Intel AVX-512) and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright 2017, Intel Corporation. All Rights Reserved. 2 Document Number: 335644-001US

Contents 1 Introduction... 5 1.1 Content and Scope... 5 1.2 References... 5 2 Running micperf as a Container... 7 2.1 About micperf... 7 2.2 Prerequisites... 8 2.2.1 Docker Runtime... 8 2.2.2 Runtime Libraries... 8 2.2.3 Intel Xeon Phi Processor Software... 8 2.3 Building micperf Container... 8 2.4 Running micperf Container... 10 2.4.1 Security Configuration... 10 2.4.2 Running the Container... 11 3 Performance in the Container... 13 3.1 Configuration... 13 3.2 Performance Results... 13 4 Summary... 14 5 Sources... 15 5.1 Dockerfile* Configuration File... 15 Tables Table 1 Difference Between Multiple Runs on Bare Metal... 13 Table 2 Difference Between Bare Metal and Container Results... 13 Document Number: 335644-001US 3

Revision History Document Number Revision Number Description Date 335644 001 Initial release of the document. March 2017 4 Document Number: 335644-001US

1 Introduction 1.1 Content and Scope This document intends to present Intel s experiences with running Docker* containers on Intel Xeon Phi processors. This paper will not describe all the details about Docker container building and execution. Readers can find this information in the Docker documentation [1]. Containers are becoming ever more popular as a mechanism of running applications and services on Linux*, and Docker remains among the most popular solutions. Its containers are pretty much independent from the underlying hardware. For HPC systems, they offer separation of their virtual environment, where an application is executed, from the native OS services (bare metal) in a way similar to the classical virtual machines but with less overhead and complexity. In order to evaluate Docker containers on Intel Xeon Phi processors, Intel containerized and ran the micperf application, which is a set of benchmarks created to demonstrate the performance of Intel Xeon Phi processors and coprocessors. Section 2 describes micperf as well as the process of containerization and container execution, Section 3 contains performance results, Section 4 contains the summary, and Section 5 shows the configuration files. 1.2 References [1] Docker Inc., "Docker Documentation," 25 January 2017. [Online]. Available: https://docs.docker.com/. [2] Intel Corporation, "Intel Xeon Phi Processor Software," [Online]. Available: https://software.intel.com/en-us/articles/xeon-phi-software. [Accessed 26 01 2017]. [3] Docker Inc., "Install Docker on Linux distributions," [Online]. Available: https://docs.docker.com/engine/installation/linux/. [4] Intel Corporation, "Intel Parallel Studio XE 2017," [Online]. Available: https://software.intel.com/en-us/intel-parallel-studio-xe. [Accessed 26 01 2017]. [5] Intel Corporation, "Intel Math Kernel Library (Intel MKL)," [Online]. Available: https://software.intel.com/en-us/intel-mkl. [6] Intel Corporation, "Intel MPI Library," [Online]. Available: https://software.intel.com/en-us/intel-mpi-library. Document Number: 335644-001US 5

[7] Intel Corporation, "Intel Xeon Phi Processor Software User s Guide," [Online]. Available: https://software.intel.com/en-us/articles/xeon-phisoftware. 6 Document Number: 335644-001US

2 Running micperf as a Container 2.1 About micperf There are many benchmarks which can be used to measure the performance of Intel Xeon Phi processors. Micperf is a software package designed to incorporate a variety of benchmarks into a simple user experience with a single interface for execution and a unified means of data inspection. It can be downloaded as a part of the Intel Xeon Phi Processor Software [2]. The micperf package targets a range of users including engineers interested in performance regression testing while implementing modifications to hardware, firmware, drivers, or operating system software. In addition, micperf also has other applications that can be employed by users and hardware manufacturers for demonstration and system verification purposes. The micprun executable, the primary application in the micperf package, executes the following benchmarks on the Intel Xeon Phi Processor: Linpack HPLinpack HPCG SGEMM DGEMM STREAM These benchmarks were carefully chosen to demonstrate performance in all of the major bottlenecks of the system. Their implementations use the Intel Xeon Phi processor s features, like its support for Intel Advanced Vector Extensions 512 (Intel AVX-512) vector instructions or MCDRAM memory, to fully utilize the processor s capabilities and demonstrate its effectiveness. The processor excels at dense level basic linear algebra calculations, and the Linpack, HPLinpack, HPCG, DGEMM, and SGEMM benchmarks demonstrate these capabilities. All of these benchmarks compute with double precision floating point numbers, except SGEMM, which computes in single precision. The STREAM benchmark measures the bus bandwidth between the processor s main memory and the computational registers. Micperf depends on several runtime libraries, and the results heavily depend on their exact version. To simplify the execution of micperf benchmarks and to ensure that the right dependencies are used, for the purpose of this white paper, a containerized version of this package was created that can be executed on any Linux distribution supported by Intel Xeon Phi processors without the need to install extra packages. Document Number: 335644-001US 7

2.2 Prerequisites 2.2.1 Docker Runtime To create and run containers, Docker must be installed. Some Linux distributions already include the Docker runtime software for example Red Hat* Enterprise Linux* 7.2 used for experiments described here includes Docker version 1.9.1. The latest version of the software can be downloaded from the Docker servers. During the Intel experiments, Docker* version 1.12.5 was used. For instructions on how to update Docker, refer to its documentation [3]. 2.2.2 Runtime Libraries Micperf uses existing benchmark implementations and various runtime libraries optimized for best performance on Intel Xeon Phi processors. They are delivered as part of the Intel Parallel Studio product [4]. However, installing the entire Intel Parallel Studio inside the micperf container is not required. Instead, only the following runtime libraries are needed: Intel Math Kernel Library (Intel MKL) L runtime (see [5] for downloading instruction) Intel Parallel Studio runtime (see [6] for installation) 2.2.3 Intel Xeon Phi Processor Software Micperf is a part of Intel Xeon Phi Processor Software. This software can be downloaded from reference [2]. Separate tar archives are available for various OS distributions. Once the right tar file is downloaded, it should be unpacked. It contains multiple RPM files. All of the files are not needed in the container. For micperf, only memkind and micperf RPM files are needed: xppsl-memkind- <version-and-build>,x86_64.rpm xppsl-micperf-<version-andbuild>,x86_64.rpm. Those files should be copied to the directory where the container is built. 2.3 Building micperf Container Building a micperf container can be performed on any Linux-based node, not necessarily equipped with an Intel Xeon Phi processor. To create a micperf container, a new directory was created where the Intel MKL runtime distribution file was copied, the Intel MPI distribution file, and the OpenMP* library. Then, a typical Dockerfile* configuration file was created. The content of this file is described below, while the complete file is shown in Section 5.1. 8 Document Number: 335644-001US

The Dockerfile starts with information about the source container (in the Intel case, it is the CentOS* 7.2 container): FROM centos:7.2.1511 An optional instruction informs about the container s maintainer: MAINTAINER <maintainer>@<domain> If the container is built in a corporate network environment, proxy addresses to the access public repositories may be required: ENV http_proxy="http://<proxy.address:port>/" ENV https_proxy="https:// <proxy.address:port>/" Some environment variables used within the container are also specified. MKLROOT indicates where the Intel MKL runtime is installed. LD_LIBRARY_PATH is modified to include the OpenMP runtime. ENV MKLROOT="/opt/mkl" ENV LD_LIBRARY_PATH="/opt/intel/psxe_runtime/linux/compiler/lib/intel64 _lin:${ld_library_path}" Next, specify the filenames for the installation versions of the Intel MKL runtime. The exact names should be changed when new versions of those libraries are used. # MKL runtime filename ARG MKL_RUNTIME_FILENAME="l_mklb_p_2017.1.013" The filenames of the latest versions of the micperf RPMs must be specified as well: # Memkind and micperf RPM names ARG MEMKIND="xppsl-memkind-1.5.0-4036.x86_64.rpm" ARG MICPERF="xppsl-micperf-1.5.0-4036.x86_64.rpm" The Intel MKL runtime is installed by unpacking the distribution tarball to the desired location within the container image. # install mkl runtime ADD ${MKL_RUNTIME_FILENAME}.tgz ${MKLROOT} Once all dependencies are installed, install micperf into the newly created container image. # Install memkind from local RPM ADD ${MEMKIND}. RUN yum install -y./${memkind} && \ rm -f ${MEMKIND} Document Number: 335644-001US 9

# Install micperf from the local RPM ADD ${MICPERF}. RUN yum install -y./${micperf} && \ rm -f./${micperf} Finally, the Intel Parallel Studio script is run to configure environmental variables and to run the command shell: # Set Intel MPI enviroment variables and run command shell CMD source /opt/intel/psxe_runtime/linux/bin/psxevars.sh && /bin/bash Once the Dockerfile is defined, build the container. The container creation requires root privileges. $ sudo docker build -t <user>/<project>:micper-mkl-1-132. As a result of the above operation, a container is labeled as micperf:01. That container can be pushed to the container repository: $ sudo docker push <user>/<project>:micper-mkl-1-132 2.4 Running micperf Container To obtain meaningful performance results, the micperf container should be executed on the Intel Xeon Phi processor. Details on how to run the micperf container are presented below. 2.4.1 Security Configuration Docker version 1.9.x, delivered with Red Hat Enterprise Linux 7.3, does not have any security restrictions that affects the micperf container execution. However, extra security features were added starting from Docker version 1.10. Therefore, on the newest versions of Docker, execution of certain system calls within the container is blocked. Micperf uses two system calls, mbind() and set_mempolicy(), that are blocked by the default configuration of Docker 1.12.x. Those system calls are needed to enforce the memory allocation from MCDRAM instead of DDR. To change the security configuration, the default security configuration file was downloaded from Docker github: raw.githubusercontent.com/docker/docker/v1.12.3/profiles/seccomp/default.js on. Based on that file, a new custom security configuration file was created called micperf.json and added the following: { 10 Document Number: 335644-001US

}, { }, "name": "mbind", "action": "SCMP_ACT_ALLOW", "args": [] "name": "set_mempolicy", "action": "SCMP_ACT_ALLOW", "args": [] This new policy allows the use of mbind() and set_mempolicy() system calls from the container. 2.4.2 Running the Container So far the micperf container was prepared and the optional security policy configuration. Now the micperf container can be executed on the Intel Xeon Phi processor. The latest version of the micperf container may be uploaded first. This is optional, since the run command will upload the container if a specified name and version is not available. $ docker pull <user>/<project>:micper-mkl-1-132 Now the container can be run. The simplest way is to run it in the interactive mode and run the micprun command within the container. First, the iterative container is started: $ docker run -it <user>/<project>:micper-mkl-1-132 If Docker version 1.10 or newer is installed, a custom security policy (described in the previous section) must be used: $ docker run --security-opt seccomp=micperf.json -it \ <user>/<project>:micper-mkl-1-132 Once the container prompt is shown, an entire set of all available benchmarks can be run: root@98df59cd3def# micprun Optionally, you can select a benchmark, for example: root@98df59cd3def# micprun - k dgemm Micperf tries to detect coprocessors using dmidecode and lspci commands that are intentionally not included in the container. Warning messages are generated, but they can be ignored. root@98df59cd3def# micprun -k linpack WARNING: Could not find dmidecode. Document Number: 335644-001US 11

WARNING: Could not find lspci in path, and it is not located in /sbin/lspci. 12 Document Number: 335644-001US

3 Performance in the Container 3.1 Configuration Performance tests were executed on a system with one Intel Xeon Phi processor 7250 at 1.40 GHz (68 cores) with hyper threading enabled, 48 GB DDR and 16 GB MCDRAM (flat mode, quadrant). The installed OS (bare metal) software: Red Hat Enterprise Linux release 7.2 (Maipo), XPPSL 1.5.0, kernel 3.10.0-327.36.3.el7.xppsl_1.5.0.3509.x86_64. 3.2 Performance Results The micperf benchmarks were executed in the container and on the bare metal system. The goal of the performance test was to compare the results on both configurations. The performance results of benchmarks such as SGEMM and DGEMM may differ slightly between runs. As presented in Table 1, the difference may be greater than 1%. Table 1 Difference Between Multiple Runs on Bare Metal Benchmark Diff Between Runs (Gflops) Relative Difference % sgemm 45 1.03% dgemm 158 8.01% Table 2 compares the average benchmark results obtained in the bare metal (Native OS) and inside the container. It shows that the performance impact of the containers can be ignored, as the difference between bare metal and container average results are smaller than between multiple executions of the same benchmark on the bare metal node. Table 2 Difference Between Bare Metal and Container Results Benchmark Difference Between Bare Metal and Container Difference in % sgemm [GFlops] -5-0.11% dgemm [GFlops] 4 0.21% stream [GB/s] 0 0.00% HPLinpack [GFlops] 0 0.00% linpack [GFlops] 4 0.21% HPCG [GFlops] 0.11 0.23% Document Number: 335644-001US 13

4 Summary This paper presented how to create and run a Docker container with the micperf benchmark on the Intel Xeon Phi processor. The performance results obtained within the container are comparable to results obtained on the native operating system (bare metal). It proves that important hardware features specific for Intel Xeon Phi processor, like new Intel AVX-512 instructions or MCDRAM memory running in the flat mode, are available in the container. The modifications to the default security policy, which were required to use MCDRAM, were very limited and did not affect overall system security. 14 Document Number: 335644-001US

5 Sources 5.1 Dockerfile* Configuration File FROM centos:7.2.1511 MAINTAINER <maintainer>@<domain> ENV http_proxy="http://<your_proxy>:<your_proxy_port>/" ENV https_proxy=<your_proxy>:<your_proxy_port>/ ENV MKLROOT="/opt/mkl" ENV LD_LIBRARY_PATH="/opt/intel/psxe_runtime/linux/compiler/lib/intel64 _lin:${ld_library_path}" # MKL runtime filename ARG MKL_RUNTIME_FILENAME="l_mklb_p_2017.1.132" # Memkind and micperf RPM names ARG MEMKIND="xppsl-memkind-1.5.0-4036.x86_64.rpm" ARG MICPERF="xppsl-micperf-1.5.0-4036.x86_64.rpm" # install mkl runtime + benchmarks ADD ${MKL_RUNTIME_FILENAME}.tgz ${MKLROOT} # Install Intel Paralell Studio runtime from a repository # This will install MPI and OpenMP runtimes RUN rpm --import http://yum.repos.intel.com/2017/setup/rpm-gpg-keyintel-psxe-runtime-2017 && \ rpm -Uhv http://yum.repos.intel.com/2017/setup/intel-psxeruntime-2017-reposetup-1-0.noarch.rpm && \ yum -y install intel-mpi-runtime-64 && \ yum -y install intel-openmp-runtime-64 # Install memkind from local RPM ADD ${MEMKIND}. RUN yum install -y./${memkind} && \ rm -f ${MEMKIND} # Install micperf from the local RPM ADD ${MICPERF}. RUN yum install -y./${micperf} && \ rm -f./${micperf} # Set Intel MPI enviroment variables and run command shell CMD source /opt/intel/psxe_runtime/linux/bin/psxevars.sh && /bin/sh Document Number: 335644-001US 15