HPC IN CONTAINERS: WHY CONTAINERS, WHY HPC, WHY NVIDIA GTC 18, S8642, Monday March 26, 11am


1 HPC IN CONTAINERS: WHY CONTAINERS, WHY HPC, WHY NVIDIA
GTC 18, S8642, Monday March 26, 11am
CJ Newburn, HPC Architect, NVIDIA Compute SW Principal Engineer

2 OUTLINE
- Motivation
- What NVIDIA is doing
- Collaborations
- Requested feedback
- Call to action

3 WHY CONTAINERS: MOTIVATIONAL STORIES
War stories from the trenches
- Hard to configure and install HPC apps
- App updates get delayed
- Lack of a reference design
- Many variants, some better than others
- Experimental/simulation hybrid: molecular modeling as a service
- Will a given app run on a new platform?
- Better startup times with fewer libs loaded from bottlenecked metadata servers
- Encapsulating pipelines reduces complexity


4 RUNNING A GPU APPLICATION
Customer pain points
DL application, two target environments:
- RHEL 7.3, CUDA 8.0, Driver 375, 4x Pascal, Python 2.7
- Ubuntu, CUDA 9.0, Driver 384, 4x Volta, Python 3.5
Pain points:
- "This framework requires installing 6 dependencies from sources"
- "I want to train my model on the cluster but it's running RHEL 7"
- "Some machines in the cluster have different NVIDIA hardware & drivers"
- "How do I deploy a DL model/application at scale?"

5 EXPERIMENTATION+MODELING
HPC modeling as a service
- Experimenters
  - Run equipment to collect raw data
  - Challenge: what's signal vs. noise?
  - Scientists who don't do code or SW administration
- Augmenting with modeling
  - Model helps filter out noise: more accurate with less processing time
  - Provide container, e.g. NAMD on 1 GPU in a few hours

6 EASING THE TRANSITION TO IMPROVED SYSTEMS
Try before you buy, on your own workload
- Cloud
- Legacy
- Latest GPUs

7 TRIMMING LIB DEPENDENCES VIA CONTAINERS
Container is a good fit for applying special optimization steps
- Size of dependent libraries can become huge; 4x can make fit in RAMdisk for faster access
- Metadata server I/O can become a bottleneck, e.g. with 20 job groups
- Trim away shared libs and Python include searches
- Fix/patch to merge data locally and move to Lustre at the end of the job avoids conflicts
- RAMdisk access improvements can greatly reduce startup time, even with copy
Relevant example
- ATLAS (CERN) simulations on Titan, courtesy of Sergey Panitkin of BNL
- Container build defines mount points, installs special versions with perf optimizations
- Optimizing for size and using RAMdisk halved setup time, reduced runtime by >2 minutes (9%)
- Background info for this use case courtesy of Adam Simpson, ORNL

8 PIPELINE EXAMPLE
Moving toward HPC as a service vs. becoming an app mechanic
Pipeline stages: index, map, sort, index, report
- Consider a pipeline of many processes
- Each could have its own dependences and require its own setup
- But each stage or the whole set of stages could be containerized
- Some relevant work: snakemake, SCI-F by Vanessa Sochat, Stanford: The Scientific Filesystem, Containers in HPC Symposium at UCAR, Boulder CO
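As a rough illustration of containerizing each stage of such a pipeline, the sketch below builds one `docker run` command line per stage. The stage names are the ones on the slide; the image naming scheme (`example/pipeline-<stage>`) and the `/data` mount point are hypothetical and only serve the example.

```python
# Stage names are from the slide; image names and the /data mount are hypothetical.
STAGES = ["index", "map", "sort", "index", "report"]


def stage_command(stage, workdir, image_prefix="example/pipeline-"):
    """Build the argv for running one pipeline stage in its own container."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/data",      # share the working directory between stages
        f"{image_prefix}{stage}",      # each stage carries its own dependences
    ]


def pipeline_commands(workdir):
    """One container invocation per stage, in pipeline order."""
    return [stage_command(stage, workdir) for stage in STAGES]


if __name__ == "__main__":
    for argv in pipeline_commands("/scratch/job1"):
        print(" ".join(argv))
```

Each stage then only needs its own image to be correct; the host needs a container runtime and the shared data directory.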

9 WHY HIGH-PERFORMANCE COMPUTING
We in HPC care about performance; democratizing HPC
- Performance can depend on
  - Tuning: discover and apply best-known methods
  - Getting the latest version
- We are making a transition from HPC for experts to HPC for the masses
- Breadth of adoption may strongly depend on ease of use
- The time is ripe!

10 PROBLEMS ADDRESSED VIA CONTAINERIZATION
Making it easier for users, admins and developers
- Portability
- Repeatability
- Resource isolation
- New telemetry surface
- Bare metal performance, vs. VMs
- Parameterizability and control over runtime

11 DESIGNED FOR GPU-ACCELERATED SYSTEMS
RUN ON PASCAL- & VOLTA-POWERED SYSTEMS
- Workstations
- Supercomputing clusters
- Cloud computing

12 OPENMPI DOCKERFILE VARIANTS
Real examples: lots of ways, some better than others
Callouts: enable many versions with parameters to a common interface; parameters vary; control environment; different compilers; bad layering

Variant built from source, version as a parameter:
    RUN OPENMPI_VERSION=3.0.0 && \
        wget -q -O - mpi.org/software/ompi/v3.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz | tar -xzf - && \
        cd openmpi-${OPENMPI_VERSION} && \
        ./configure --enable-orterun-prefix-by-default --with-cuda --with-verbs \
            --prefix=/usr/local/mpi --disable-getpwuid && \
        make -j"$(nproc)" install && \
        cd .. && rm -rf openmpi-${OPENMPI_VERSION} && \
        echo "/usr/local/mpi/lib" >> /etc/ld.so.conf.d/openmpi.conf && ldconfig
    ENV PATH /usr/local/mpi/bin:$PATH

Variant from distribution packages (functional, simpler, but not CUDA or IB aware):
    RUN apt-get update \
        && apt-get install -y --no-install-recommends \
            libopenmpi-dev \
            openmpi-bin \
            openmpi-common \
        && rm -rf /var/lib/apt/lists/*
    ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib

Variant with PGI compilers, source copied from the build context:
    COPY openmpi /usr/local/openmpi
    WORKDIR /usr/local/openmpi
    RUN /bin/bash -c "source /opt/pgi/license.txt && CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-cuda --prefix=/usr/local/openmpi"
    RUN /bin/bash -c "source /opt/pgi/license.txt && make all install"

Variant that logs each build step:
    RUN mkdir /logs
    RUN wget -nv && \
        tar -xzf openmpi tar.gz && \
        cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
            --enable-mpi-cxx --prefix=/usr 2>&1 | tee /logs/openmpi_config && \
        make -j 32 2>&1 | tee /logs/openmpi_make && make install 2>&1 | tee /logs/openmpi_install && cd /tmp \
        && rm -rf openmpi-*

Variant that adds the tarball in a separate layer:
    WORKDIR /tmp
    ADD /tmp
    RUN tar -xzf openmpi tar.gz && \
        cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
            --enable-mpi-cxx --prefix=/usr && \
        make -j 32 && make install && cd /tmp \
        && rm -rf openmpi-*

Variant with PGI compilers, downloaded tarball:
    RUN wget -q -O - | tar -xjf - && \
        cd openmpi && \
        CXX=pgc++ CC=pgcc FC=pgfortran F77=pgfortran ./configure --prefix=/usr/local/openmpi --with-cuda=/usr/local/cuda --with-verbs --disable-getpwuid && \
        make -j4 install && \
        rm -rf /openmpi

13 WHAT NVIDIA IS DOING
- Enabling
- Offerings
- Technology collaboration

14 SCOPE OF ENABLING PLANS
Better, more up-to-date results with less effort
- Ecosystem: nurture a collaborative ecosystem around HPC containers
- Registry: host containerized applications, CUDA base containers
- Ingredients, recipes: recommend and validate best practices
- HPC containers: easily derive application containers from these
- Container technologies: GPU enabled
- System SW: OS, container runtime, scheduler are GPU enabled
- Recommended platforms: known-good solutions for HPC apps

15 MAKING IT EASIER WITH HPC CONTAINERS
Potentially easier for non-expert end users
- NVIDIA has experience collaborating with developers to containerize HPC apps
  - Identifying, improving, creating ingredients
  - Developing and optimizing recipes
- Codify those learnings
  - Dockerfiles and other recipe files with tuned steps for each recommended ingredient
  - Careful layering, for the sake of minimizing size, maximizing cacheability
  - Validated combinations in specific HPC base containers from which app containers are derived
  - Recipes for building platforms: container runtime, scheduler, OS, system
  - Consistent approach to documentation

16 HPC APPS CONTAINERS ON NVIDIA GPU CLOUD
- CANDLE
- CHROMA*
- GAMESS
- GROMACS
- LAMMPS
- Lattice Microbes
- MILC*
- NAMD
- RELION
*Coming soon
[Figures: rapid container addition; rapid user adoption]

17 NVIDIA GPU CLOUD FOR HPC VISUALIZATION
- IndeX
- ParaView with NVIDIA IndeX
- ParaView with NVIDIA OptiX
- ParaView with NVIDIA Holodeck
- VMD

18 NVIDIA CONTAINER RUNTIME
Enables GPU support in popular container runtimes
Diagram labels:
- Containerized applications: Caffe, NAMD, TensorFlow, MILC
- Container runtime: Docker; LXC, CRI-O, etc.
- OCI runtime interface
- nvidia-container-runtime
- libnvidia-container
- CUDA, NVML, NVIDIA driver
Notes:
- NVIDIA-Docker makes GPU containers truly portable
- Integrates Linux container internals instead of wrapping specific runtimes (e.g. Docker)
- Better integration into the container ecosystem: Kubernetes (CRI), HPC (rootless)
- 2M downloads

19 KUBERNETES ON NVIDIA GPUs
- GPU enhancements to mainline Kubernetes: get features faster than community releases
- Updated with each release of K8s (current version is v1.9), in close collaboration with the community to upstream changes
- Minimize friction to adoption of Kubernetes on GPUs
- Fully open-source
Stack: Kubernetes, NVIDIA Container Runtime, NVIDIA driver

20 HPC CONTAINER MAKER - HPCCM
Pronounced "h-p-see-um"
- Collect and codify best practices
- Make recipe file creation easy, repeatable, modular, qualifiable
- Using this as a reference and a vehicle to drive collaboration
- Container implementation neutral
- Write Python code that calls primitives and building blocks vs. roll your own
- Leverage latest and greatest building blocks

21 BIG PICTURE
HPC Container Maker: hpccm
- hpccm input recipe: Python file with references to primitives and parameterized building blocks
- hpccm CLI tool: script that transforms the input recipe into a container recipe file, using primitive and building block implementations (reference recipe implementations)
- Container spec file: Dockerfile or Singularity recipe file
- Container build: $ docker build (also Buildkit, buildah, ...) or $ singularity build, on top of base images
- Container image: Docker image or Singularity image, with docker2singularity between them

22 HPCCM CONCEPTS AND TERMINOLOGY
- hpccm input recipe file: what hpccm ingests
- container recipe file: what hpccm produces, e.g. Dockerfile, Singularity recipe file
- primitive: line in the hpccm input recipe file that has a 1:1 mapping with a primitive implementation line in the container recipe file
- building block: line in the hpccm input recipe file with a 1:many primitive mapping; the mapping is codified in the hpccm implementation; these are parameterized
- recipe implementations: collection of implementations of primitives and building blocks
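The difference between a primitive (1:1) and a building block (1:many) can be sketched with a toy model. This is not hpccm's implementation: the function names `shell`, `environment` and `toy_openmpi`, the `/tmp/openmpi-<version>` source path, and the exact steps are assumptions made for illustration.

```python
# Toy model of the terminology above; not the real hpccm implementation.

def shell(commands):
    """Primitive: one input line maps to exactly one RUN instruction."""
    return "RUN " + " && \\\n    ".join(commands)


def environment(variables):
    """Primitive: one input line maps to exactly one ENV instruction."""
    pairs = " \\\n    ".join(f"{key}={value}" for key, value in variables.items())
    return "ENV " + pairs


def toy_openmpi(version, prefix="/usr/local/openmpi", cuda=True):
    """Building block: one parameterized line expands to several primitives."""
    src = f"/tmp/openmpi-{version}"  # hypothetical unpacked source location
    configure = f"./configure --prefix={prefix}"
    if cuda:
        configure += " --with-cuda"
    return [
        shell([f"cd {src}", configure, "make -j4", "make -j4 install", f"rm -rf {src}"]),
        environment({"PATH": f"{prefix}/bin:$PATH"}),
    ]


if __name__ == "__main__":
    print("\n".join(toy_openmpi("3.0.0")))
```

The recipe author writes one line (`toy_openmpi("3.0.0")`), and the tuned sequence of steps lives in one place where it can be reviewed and improved.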

23 RECIPES INCLUDED WITH CONTAINER MAKER
Shown in current build order
- HPC base recipe with GNU compilers: Ubuntu, CUDA 9.0, Python 2 and 3, GNU compilers (upstream), Mellanox OFED, OpenMPI, FFTW, HDF
- HPC base recipe with PGI compilers: Ubuntu, CUDA 9.0, Python 2 and 3, PGI compilers, Mellanox OFED, OpenMPI, FFTW, HDF
- HPC application samples coming

24 BUILDING AN HPC APPLICATION IMAGE
Analogous workflows for Singularity
1. Use the HPC base image as your starting point
   Base recipe -> Dockerfile -> Base image -> App Dockerfile
2. Generate a Dockerfile from the HPC base recipe and manually edit it to add the steps to build your application
   Base recipe -> Dockerfile -> App Dockerfile
3. Copy the HPC base recipe file and add your application build steps to the recipe
   Base recipe -> App recipe

25 HIGHER LEVEL ABSTRACTION
Building block encapsulates simplified best practices, avoids duplication

hpccm input recipe:
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi

Generated Dockerfile:
    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp tar.bz2 && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --with-cuda --without-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

26 hpccm input recipe:
    """
    HPC Base image

    Contents:
      CUDA version 9.0
      FFTW version
      GNU compilers (upstream)
      HDF5 version
      Mellanox OFED version
      OpenMPI version
      Python 2 and 3 (upstream)
    """
    Stage0 += comment(__doc__, reformat=False)
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')

    # Python (use upstream)
    Stage0 += apt_get(ospackages=['python', 'python3'])

    # Compilers (use upstream)
    Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])

    # Create a toolchain
    tc = hpccm.toolchain(CC='gcc', CXX='g++', F77='gfortran', F90='gfortran', FC='gfortran', CUDA_HOME='/usr/local/cuda')

    # Mellanox OFED
    ofed = mlnx_ofed(version=' ')
    Stage0 += ofed

    # OpenMPI
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi

    # FFTW
    fftw = fftw(version='3.3.7', toolchain=tc)
    Stage0 += fftw

    # HDF5
    hdf5 = hdf5(version='1.10.1', toolchain=tc)
    Stage0 += hdf5

Generated Dockerfile:
    #
    # HPC Base image
    #
    # Contents:
    #   CUDA version 9.0
    #   FFTW version
    #   GNU compilers (upstream)
    #   HDF5 version
    #   Mellanox OFED version
    #   OpenMPI version
    #   Python 2 and 3 (upstream)
    #
    FROM nvidia/cuda:9.0-devel AS devel

    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            python \
            python3 && \
        rm -rf /var/lib/apt/lists/*

    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            gcc \
            g++ \
            gfortran && \
        rm -rf /var/lib/apt/lists/*

    # Mellanox OFED version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            libnl \
            libnl-route \
            libnuma1 \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp x86_64.tgz && \
        tar -x -f /tmp/mlnx_ofed_linux ubuntu16.04-x86_64.tgz -C /tmp -z && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libibverbs1_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libibverbs-dev_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libmlx5-1_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/ibverbs-utils_*_amd64.deb && \
        rm -rf /tmp/mlnx_ofed_linux ubuntu16.04-x86_64.tgz /tmp/mlnx_ofed_linux ubuntu16.04-x86_64

    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc \
            openssh-client \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

    # FFTW version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            make \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp ftp://ftp.fftw.org/pub/fftw/fftw tar.gz && \
        tar -x -f /tmp/fftw tar.gz -C /tmp -z && \
        cd /tmp/fftw && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/fftw --enable-shared --enable-openmp --enable-threads --enable-sse2 && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/fftw tar.gz /tmp/fftw
    ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH

    # HDF5 version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            make \
            wget \
            zlib1g-dev && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/hdf tar.bz2 -C /tmp -j && \
        cd /tmp/hdf && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/hdf5 --enable-cxx --enable-fortran && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/hdf tar.bz2 /tmp/hdf
    ENV PATH=/usr/local/hdf5/bin:$PATH \
        HDF5_DIR=/usr/local/hdf5 \
        LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH

27 MULTI-STAGE BUILDS

hpccm input recipe:
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
    ...
    # OpenMPI
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi
    ...
    ######
    # Runtime image
    ######
    Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')
    ...
    # OpenMPI
    Stage1 += ompi.runtime()
    ...

Generated Dockerfile:
    FROM nvidia/cuda:9.0-devel AS devel
    ...
    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc \
            openssh-client \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    ...
    FROM nvidia/cuda:9.0-runtime
    ...
    # OpenMPI
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            openssh-client && \
        rm -rf /var/lib/apt/lists/*
    COPY --from=0 /usr/local/openmpi /usr/local/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    ...
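The idea behind `ompi.runtime()` can be sketched as a building block that knows both how to build itself and how to be copied into a slimmer runtime stage. The class below is a hypothetical stand-in, not hpccm's `openmpi` class; the build step is reduced to a single line and the stage name `devel` is passed explicitly.

```python
class ToyOpenMPI:
    """Hypothetical stand-in for a building block with a runtime() method."""

    def __init__(self, prefix="/usr/local/openmpi"):
        self.prefix = prefix

    def build(self):
        # Build stage: compile from source (download and cleanup steps elided).
        return [f"RUN ./configure --prefix={self.prefix} && make -j4 install"]

    def runtime(self, _from="0"):
        # Runtime stage: copy the installed tree only, no compilers or sources.
        return [
            f"COPY --from={_from} {self.prefix} {self.prefix}",
            f"ENV PATH={self.prefix}/bin:$PATH \\\n"
            f"    LD_LIBRARY_PATH={self.prefix}/lib:$LD_LIBRARY_PATH",
        ]


def two_stage_dockerfile(block):
    """Assemble a devel stage followed by a runtime stage."""
    lines = ["FROM nvidia/cuda:9.0-devel AS devel"]
    lines += block.build()
    lines += ["", "FROM nvidia/cuda:9.0-runtime"]
    lines += block.runtime(_from="devel")
    return "\n".join(lines)


if __name__ == "__main__":
    print(two_stage_dockerfile(ToyOpenMPI()))
```

Because the same object produces both stages, the install prefix used in the build stage and the path copied into the runtime stage cannot drift apart.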

28 PARAMETERIZED BUILDING BLOCKS
Parameters enable specialization; implementations invoke Python code
Also: apt-get, FFTW, HDF5, Linux OFED, PGI compiler

    openmpi(check=False,                  # run "make check"?
            configure_opts=['disable-getpwuid', 'enable-orterun-prefix-by-default'],
            cuda=True,
            directory=,                   # path to source in build context
            infiniband=True,
            ospackages=['file', 'hwloc', 'openssh-client', 'wget'],
            prefix='/usr/local/openmpi',
            toolchain=toolchain(),
            version= )                    # version to download

    mlnx_ofed(ospackages=['libnl-3-200', 'libnl-route-3-200', 'libnuma1', 'wget'],
              packages=['libibverbs1', 'libibverbs-dev', 'libmlx5-1', 'ibverbs-utils'],
              version= )                  # version to download

Full recipe documentation can be found in RECIPES.md
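How parameters such as `cuda` and `infiniband` might specialize the generated build step can be sketched as follows. The function `toy_configure_line` is hypothetical; the flag names are the ones that appear in the generated Dockerfiles in this deck, plus the standard autoconf `--without-cuda` counterpart.

```python
def toy_configure_line(prefix="/usr/local/openmpi", cuda=True, infiniband=True,
                       configure_opts=("--disable-getpwuid",
                                       "--enable-orterun-prefix-by-default")):
    """Translate building-block parameters into a ./configure command line.

    Hypothetical helper for illustration; not part of hpccm.
    """
    opts = [f"--prefix={prefix}"]
    opts.extend(configure_opts)
    # Boolean parameters select between paired configure switches.
    opts.append("--with-cuda" if cuda else "--without-cuda")
    opts.append("--with-verbs" if infiniband else "--without-verbs")
    return "./configure " + " ".join(opts)


if __name__ == "__main__":
    print(toy_configure_line())
    print(toy_configure_line(infiniband=False))
```

A recipe author flips one keyword argument instead of editing a long shell command by hand.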

29 CONTAINER IMPLEMENTATION ABSTRACTION
Single source to either Docker or Singularity

Container Builder primitive | Dockerfile | Singularity recipe file
baseimage(image='ubuntu:16.04') | FROM ubuntu:16.04 | Bootstrap: docker / From: ubuntu:16.04
shell(commands=['a', 'b', 'c']) | RUN a && \ b && \ c | %post / a / b / c
copy(src='a', dest='b') | COPY a b | %files / a b
("/" marks a line break within a cell)
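The mapping above can be sketched as a small renderer that takes one list of primitives and emits either format. This is a toy with a hypothetical `render` function and a made-up `(name, args)` recipe representation, not hpccm's code; it covers only the three primitives shown on the slide.

```python
# Toy renderer for three primitives; a sketch, not hpccm's implementation.

def render(recipe, fmt):
    """Render a list of (primitive, args) pairs as 'docker' or 'singularity'."""
    if fmt not in ("docker", "singularity"):
        raise ValueError(f"unknown format: {fmt}")
    out = []
    for name, args in recipe:
        if name == "baseimage":
            if fmt == "docker":
                out.append(f"FROM {args['image']}")
            else:
                out.append(f"Bootstrap: docker\nFrom: {args['image']}")
        elif name == "shell":
            if fmt == "docker":
                out.append("RUN " + " && \\\n    ".join(args["commands"]))
            else:
                out.append("%post\n" + "\n".join("    " + c for c in args["commands"]))
        elif name == "copy":
            if fmt == "docker":
                out.append(f"COPY {args['src']} {args['dest']}")
            else:
                out.append(f"%files\n    {args['src']} {args['dest']}")
        else:
            raise ValueError(f"unknown primitive: {name}")
    return "\n\n".join(out)


RECIPE = [
    ("baseimage", {"image": "ubuntu:16.04"}),
    ("shell", {"commands": ["a", "b", "c"]}),
    ("copy", {"src": "a", "dest": "b"}),
]

if __name__ == "__main__":
    print(render(RECIPE, "docker"))
    print()
    print(render(RECIPE, "singularity"))
```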

30 FULL PROGRAMMING LANGUAGE AVAILABLE
Conditional branching, validation, etc. in hpccm input recipe

    # get and validate precision
    VALID_PRECISION = ['single', 'double', 'mixed']
    precision = os.environ.get('LAMMPS_PRECISION', 'single')
    if precision not in VALID_PRECISION:
        raise ValueError('Invalid precision')
    Stage0 += shell(commands=[f'make -f Makefile.linux.{precision}', ])

Courtesy of Logan Herche
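The same validation can be factored into a plain function so that it can be exercised without hpccm or a container build. The function name `lammps_make_command` and the injectable `env` argument are assumptions for this sketch; the variable name, default, and allowed values follow the slide.

```python
import os

VALID_PRECISION = ("single", "double", "mixed")


def lammps_make_command(env=None):
    """Return the make command for the precision requested via the environment.

    env is any mapping; it defaults to os.environ (hypothetical helper).
    """
    if env is None:
        env = os.environ
    precision = env.get("LAMMPS_PRECISION", "single")
    if precision not in VALID_PRECISION:
        # Fail at recipe-generation time instead of deep inside the image build.
        raise ValueError(f"Invalid precision: {precision!r}")
    return f"make -f Makefile.linux.{precision}"


if __name__ == "__main__":
    print(lammps_make_command({"LAMMPS_PRECISION": "double"}))
```

Rejecting a bad value before any container recipe is written keeps a typo from costing a full image build.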


31 hpccm ENVISIONED FLOW
Accelerating container creation and usage
Diagram labels:
- HPC devel container + app source -> app test binary
- App final binary + HPC runtime container -> app image
- NVIDIA registry (NGC): app image
- Validated GPU-enabled technologies: container runtime, scheduler, OS
- GPU clusters

32 V0.5 OUTSIDE-CONTAINER TRADE-OFFS
Situation and environment-based choices

Ingredient | Choices | Choice factors
CUDA | Version 9.0 | Supports Kepler through Volta, highest performance
Container runtimes | Docker, LXC, Shifter, Singularity | Docker has best GPU support today. NVIDIA is investing in LXC for rootless
Orchestration & scheduling | SLURM, Kubernetes | SLURM widely used in HPC; Kubernetes widely used in cloud
GPU enablement | NVIDIA Container Runtime SDK | OCI compliant, enables multiple container runtimes, multi-node support
OS | Ubuntu 16.04, CentOS 7 | Application-based choice. Ubuntu has more testing for GPU-enabled containers. CentOS uses RPMs.

33 SAMPLE DOCKERFILES
We're in the process of normalizing our containers with respect to these devel and runtime HPC container offerings
- GROMACS
- MILC

34 GROMACS DOCKERFILE PART 1; BUILD STAGE
    # Initialize build stage
    FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            ca-certificates cmake file git hwloc \
            libibverbs-dev openssh-client python wget && \
        rm -rf /var/lib/apt/lists/*

    # Install OpenMPI
    RUN mkdir -p /tmp && \
        wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && \
        ./configure --prefix=/opt/openmpi --enable-mpi-cxx --with-cuda \
            --with-verbs && \
        make -j32 && \
        make -j32 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH

35 GROMACS DOCKERFILE PART 2
    # Build GROMACS
    RUN mkdir -p /gromacs/install && \
        mkdir -p /gromacs/builds && \
        mkdir -p /tmp && git -C /tmp clone --depth=1 --branch v2018 \
        && \
        mv /tmp/gromacs /gromacs/src && \
        cd /gromacs/builds && \
        CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_INSTALL_PREFIX=/gromacs/install \
            -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
            -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_MPI=OFF \
            -DGMX_OPENMP=ON -DGMX_PREFER_STATIC_LIBS=ON \
            -DMPIEXEC_PREFLAGS=--allow-run-as-root \
            -DREGRESSIONTEST_DOWNLOAD=ON && \
        make -j && \
        make install && \
        make check

36 GROMACS DOCKERFILE PART 3; RUNTIME STAGE
    # Initialize release stage
    FROM nvidia/cuda:9.0-runtime-ubuntu16.04

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            libgomp1 \
            libibverbs-dev \
            openssh-client \
            python && \
        rm -rf /var/lib/apt/lists/*

    # Copy OpenMPI from build
    COPY --from=devel /opt/openmpi /opt/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH

    # Copy GROMACS from build
    COPY --from=devel /gromacs/install /gromacs/install
    ENV PATH=$PATH:/gromacs/install/bin
    WORKDIR /workspace

37 MILC DOCKERFILE PART 1; BUILD STAGE
    # Initialize build stage
    FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            autoconf automake ca-certificates cmake dapl2-utils \
            file git hwloc ibutils ibverbs-utils \
            infiniband-diags libdapl-dev libibcm-dev \
            libibmad5 libibverbs-dev libibverbs1 \
            libmlx4-1 libmlx4-dev libmlx5-1 libmlx5-dev \
            libnuma-dev librdmacm-dev librdmacm1 opensm \
            openssh-client rdmacm-utils wget && \
        rm -rf /var/lib/apt/lists/*

38 MILC DOCKERFILE PART 2
    # Build OpenMPI
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp tar.bz2 && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && \
        ./configure --prefix=/opt/openmpi --enable-mpi-cxx \
            --with-cuda --with-verbs && \
        make -j32 && \
        make -j32 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH

39 MILC DOCKERFILE PART 3
    # Build QUDA
    WORKDIR /quda
    RUN mkdir -p /tmp && git -C /tmp clone --depth=1 --branch release/0.8.x && \
        mv /tmp/quda /quda/src && \
        mkdir -p /quda/build && \
        cd /quda/build && \
        cmake ../src -DCMAKE_BUILD_TYPE=RELEASE \
            -DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_DOMAIN_WALL=ON \
            -DQUDA_DIRAC_STAGGERED=ON \
            -DQUDA_DIRAC_TWISTED_CLOVER=ON \
            -DQUDA_DIRAC_TWISTED_MASS=ON -DQUDA_DIRAC_WILSON=ON \
            -DQUDA_FORCE_GAUGE=ON -DQUDA_FORCE_HISQ=ON \
            -DQUDA_GPU_ARCH=sm_70 -DQUDA_INTERFACE_MILC=ON \
            -DQUDA_INTERFACE_QDP=ON -DQUDA_LINK_HISQ=ON \
            -DQUDA_MPI=ON && \
        make -j32 && \
        rm -rf /quda/src

40 MILC DOCKERFILE PART 4
    # Build MILC
    RUN mkdir -p /tmp && \
        git -C /tmp clone --depth=1 && \
        mv /tmp/milc /milc && \
        cd /milc/ks_imp_rhmc/ && \
        cp /milc/Makefile /milc/ks_imp_rhmc/ && \
        sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile && \
        sed -i 's/\(WANT_.*_GPU\)\(.*\)=.*/\1\2= true/g' Makefile && \
        sed -i 's/QUDA_HOME\(.*\)=.*/QUDA_HOME\1= \/quda\/build/g' Makefile && \
        sed -i 's/CUDA_HOME\(.*\)=.*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile && \
        sed -i 's/#\?MPP =.*/MPP = true/g' Makefile && \
        sed -i 's/#\?CC =.*/CC = mpicc/g' Makefile && \
        sed -i 's/LD\(\s+\)=.*/LD\1= mpicxx/g' Makefile && \
        sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile && \
        sed -i 's/WANTQIO =.*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile && \
        sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile && \
        C_INCLUDE_PATH=/quda/build/include make su3_rhmd_hisq

41 MILC DOCKERFILE PART 5; RUNTIME STAGE
    # Initialize release stage
    FROM nvidia/cuda:9.0-runtime-ubuntu16.04

    # Install packages and cleanup (multi-node MPI is enabled)
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            libibverbs1 \
            libnuma1 \
            librdmacm1 \
            openssh-client && \
        rm -rf /var/lib/apt/lists/*

    # Copy OpenMPI from build stage
    COPY --from=devel /opt/openmpi /opt/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:/milc:$PATH

    # Copy MILC from build stage
    COPY --from=devel /milc/ks_imp_rhmc/su3_rhmd_hisq /milc/su3_rhmd_hisq

    # Copy examples into container
    COPY examples /workspace/examples
    WORKDIR /workspace

42 MILC HPCCM INPUT RECIPE FILE
Header
    Ubuntu 16.04, CUDA 9.0, QUDA, MPI and MILC.
    Build with: nvidia-docker build -t milc .
    Run with: nvidia-docker run -it milc

43 MILC HPCCM INPUT RECIPE FILE, PART 1
    # pylint: disable=invalid-name, undefined-variable, used-before-assignment
    # pylama: ignore=E0602
    gpu_arch = USERARG.get('GPU_ARCH', 'sm_70')

    # add docstring to Dockerfile
    Stage0 += comment(__doc__.strip(), reformat=False)

    ###############################################################################
    # Devel stage
    ###############################################################################
    Stage0.name = 'devel'
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', AS=Stage0.name)
    Stage0 += apt_get(ospackages=['autoconf', 'automake', 'cmake', 'git', 'ca-certificates'])
    Stage0 += ofed()

    mpi_prefix = '/opt/openmpi'
    ompi = openmpi(configure_opts=['--enable-mpi-cxx'], prefix=mpi_prefix, parallel=32, version="3.0.0")
    Stage0 += ompi

44 MILC HPCCM INPUT RECIPE FILE, PART 2: QUDA
    # build QUDA
    git = hpccm.git()
    quda_build_dir = '/quda/build'
    Stage0 += workdir(directory="/quda")
    Stage0 += shell(commands=[git.clone_step(repository=" branch="release/0.8.x"),
                              'mv /tmp/quda /quda/src',
                              'mkdir -p {}'.format(quda_build_dir),
                              'cd {}'.format(quda_build_dir),
                              ('cmake ../src ' +
                               '-DCMAKE_BUILD_TYPE=RELEASE ' +
                               '-DQUDA_DIRAC_CLOVER=ON ' +
                               '-DQUDA_DIRAC_DOMAIN_WALL=ON ' +
                               '-DQUDA_DIRAC_STAGGERED=ON ' +
                               '-DQUDA_DIRAC_TWISTED_CLOVER=ON ' +
                               '-DQUDA_DIRAC_TWISTED_MASS=ON ' +
                               '-DQUDA_DIRAC_WILSON=ON ' +
                               '-DQUDA_FORCE_GAUGE=ON ' +
                               '-DQUDA_FORCE_HISQ=ON ' +
                               '-DQUDA_GPU_ARCH={} '.format(gpu_arch) +
                               '-DQUDA_INTERFACE_MILC=ON ' +
                               '-DQUDA_INTERFACE_QDP=ON ' +
                               '-DQUDA_LINK_HISQ=ON ' +
                               '-DQUDA_MPI=ON'),
                              'make -j32',
                              'rm -rf /quda/src'])

45 MILC HPCCM INPUT RECIPE FILE, PART 3: MILC
    # build MILC
    Stage0 += shell(commands=[git.clone_step(repository="
                              'mv /tmp/milc_qcd /milc',
                              'cd /milc/ks_imp_rhmc/',
                              'cp /milc/Makefile /milc/ks_imp_rhmc/',
                              r"sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile",
                              r"sed -i 's/\(WANT_.*_GPU\)\(.*\)=.*/\1\2= true/g' Makefile",
                              r"sed -i 's/QUDA_HOME\(.*\)=.*/QUDA_HOME\1= \/quda\/build/g' Makefile",
                              r"sed -i 's/CUDA_HOME\(.*\)=.*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile",
                              r"sed -i 's/#\?MPP =.*/MPP = true/g' Makefile",
                              r"sed -i 's/#\?CC =.*/CC = mpicc/g' Makefile",
                              r"sed -i 's/LD\(\s+\)=.*/LD\1= mpicxx/g' Makefile",
                              r"sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile",
                              r"sed -i 's/WANTQIO =.*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile",
                              r"sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile",
                              'C_INCLUDE_PATH={}/include make su3_rhmd_hisq'.format(quda_build_dir)])

46 MILC HPCCM INPUT RECIPE FILE, PART 4: RELEASE
    ###############################################################################
    # Release stage
    ###############################################################################
    Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')
    Stage1 += apt_get(ospackages=['libnuma1', 'ssh', 'libibverbs1', 'librdmacm1'])
    Stage1 += ompi.runtime()
    Stage1 += copy(_from=Stage0.name, src='/milc/ks_imp_rhmc/su3_rhmd_hisq', dest='/milc/su3_rhmd_hisq')
    Stage1 += copy(src='examples', dest='/workspace/examples')
    Stage1 += workdir(directory='/workspace')

47 COLLABORATIONS
Nurturing a communal effort
- Mellanox [Yong Qin]
  - Negligible overhead from using containers
  - Working on resolving driver versioning issues
  - Collaborating on best recipes, including for multi-node
- Dell [Nishanth Dandapanthula]
  - Negligible overhead from using containers
  - Ease of use
- Universities and labs
  - Evaluations, feedback, use cases

48 RDMA Performance
[Figure: RDMA BW (EDR), BW (MB/sec) vs. size (bytes), for Host, Singularity, Docker]
[Figure: RDMA latency (EDR), latency (us) vs. size (bytes), for Host, Singularity, Docker]
- RDMA performance is in line between host and containers
- Images built with hpccm
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies

49 RDMA Performance (Cont.)
[Figure: RDMA latency (99th percentile), latency (us) vs. size (bytes), for Host, Singularity, Docker]
- Larger variations observed on small message sizes due to runtime overheads from containers
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies

50 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC

51 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC

52 Image built with hpccm. Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC

53 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC

54 KEY ISSUES
Working with the community to deliver leading reference solutions
Addressing key issues across the ecosystem to increase container adoption
- Developers
  - Posting containerized HPC apps to our registry
  - Infrastructure for making containers: recipes, scripts (hpccm), validated images
- Admins
  - Driver matching
  - Multi-node containers
- End users
  - Working with OEMs to assure best performance
  - Using containers from our registry

55 DRIVER VERSIONING
One down, more to go
- Problem
  - Container doesn't know which kernel driver versions are installed on the target platform
  - Mismatches may be problematic, e.g. CUDA or MOFED user and kernel drivers
- One approach
  - Appropriate kernel driver is loaded into the container with container runtime enabling
- Relevance
  - There are available solutions for Docker and Singularity for the CUDA driver case
  - This issue is being actively worked in the Mellanox driver case
- Plans for Mellanox drivers
  - Nail down support matrix
  - Share test cases for regression suite, based on hpccm input recipe, platform config
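A mismatch check of the kind described above might look like the sketch below. The table pairs CUDA 8.0 with driver 375 and CUDA 9.0 with driver 384, as on the "Running a GPU application" slide; treating those numbers as minimum driver branches is an assumption of this example, and `driver_ok` is a hypothetical helper, not an NVIDIA tool.

```python
# Pairs taken from the "Running a GPU application" slide; treating them as
# minimum driver branches is an assumption made for this sketch.
MIN_DRIVER_FOR_CUDA = {"8.0": 375, "9.0": 384}


def driver_ok(container_cuda, host_driver):
    """Return True if the host driver branch is new enough for the container's CUDA."""
    required = MIN_DRIVER_FOR_CUDA.get(container_cuda)
    if required is None:
        raise KeyError(f"no driver requirement recorded for CUDA {container_cuda}")
    # Compare only the major branch number, e.g. "384.81" -> 384.
    host_major = int(str(host_driver).split(".")[0])
    return host_major >= required


if __name__ == "__main__":
    print(driver_ok("9.0", "384.81"))
    print(driver_ok("9.0", "375.26"))
```

A scheduler or launch wrapper could run such a check before starting a job, so a mismatch is reported up front instead of surfacing as a runtime failure inside the container.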

56 YOUR FEEDBACK
- What value do you see in using containers that motivates you to containerize your app?
- What ingredients do you most want to see
  - inside of an HPC container?
  - outside of an HPC container?
- What are your pain points around developing containers?
- What pain points do you hear about for deploying containers?
- Are you willing to try out
  - NVIDIA's containerized HPC apps?
  - the hpccm infrastructure that helps with containerization of HPC apps?

57 CALL TO ACTION
Try - OSS project
Find this content at GTC website for Monday Mar 26 11am by CJ Newburn
- App developers
  - Build your containers with HPCCM & deploy on NGC, offer feedback
  - Take the opportunity to focus efforts, collaborate around a reference
- System admins
  - Make your cluster container ready with Docker, LXC and/or Singularity runtimes
- Application users
  - Pull and run containers from ngc.nvidia.com
  - Enjoy HPC apps with greater ease and confidence
- OEMs
  - Build container-ready systems with NGC

58 FAQ
- Supported systems: the containers must run on Pascal, Volta, and newer GPU-powered systems
- Testing & performance: NVIDIA may QA and benchmark the container
- License agreement: developer has to comply with all the app license requirements
- Ownership: container developer owns and retains all the rights, title, and interest in and to HPC containers
- Support: developer must provide technical support to the end user of the container
- Cost: NVIDIA will host the containers on NGC for free
- Container removal: both NVIDIA and the container developer have the right to take down the container at any time for any reason

59 REQUEST FOR FEEDBACK: HPC RUNTIME CONTAINER V0.5
Design and validate HPC container, derive from there for apps

Kind of ingredient | Recommended choice | Rationale, alternatives
OS | Ubuntu | Aligned with DL, current focus. Alternative: CentOS 7
CUDA version | 9.0 | Backwards compatible [driver upgrade]
CUDA type | Runtime | For deployment, not development
Compiler | PGI, gcc [and Intel] runtimes | Actual compiler not needed for most usages
Comms libraries | CUDA-aware OpenMPI, MOFED libs | OpenMPI CUDA enabling is underway via UCX
Scientific libraries | FFTW [and MKL] | Most commonly used
Infrastructure | Python 2 and 3, HDF | Commonly used, may add more tools

60 REQUEST FOR FEEDBACK: HPC DEVEL CONTAINER V0.5
Design and validate HPC container, derive from there for apps

Kind of ingredient | Recommended choice | Rationale, alternatives
OS | Ubuntu | Aligned with DL, current focus. Alternative: CentOS 7
CUDA version | 9.0 | Backwards compatible [driver upgrade]
CUDA type | Devel | For development, include CUDA toolkit
Compiler | PGI, gcc [and Intel] compilers | Compiler and its license in private images only
Comms libraries | CUDA-aware OpenMPI, MOFED libs | OpenMPI CUDA enabling is underway via UCX
Scientific libraries | FFTW [and MKL] | Most commonly used
Infrastructure | Python 2 and 3, HDF | Commonly used, may add more tools

61 Container Namespace Isolation

Aspect | Docker | Singularity
Namespace isolation | Share almost nothing | Share almost everything
File system (mount) | Isolated by default; can bind mount host volumes | $HOME, /proc, /sys, /tmp, etc., from host by default; can bind mount other host volumes
PID | Isolated | Shared
Network | Isolated; can be expanded with full support | Shared with limited support

Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies

62 MPI

Aspect | Docker | Singularity
MPI library | Inside of container | Outside of container (host)
MPI program binary | Inside of container | Inside of container
Network | Container | Host
Security | Docker daemon | Inherited from host

Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies
