HPC IN CONTAINERS: WHY CONTAINERS, WHY HPC, WHY NVIDIA
GTC 18, S8642, Monday March 26, 11am
1 CJ Newburn, HPC Architect, NVIDIA Compute SW Principal Engineer
2 OUTLINE
- Motivation
- What NVIDIA is doing
- Collaborations
- Requested feedback
- Call to action
3 WHY CONTAINERS: MOTIVATIONAL STORIES
War stories from the trenches:
- HPC apps are hard to configure and install, so app updates get delayed
- There is no reference design; many variants exist, some better than others
- Hybrid experiment/simulation: molecular modeling as a service
- Will a given app run on a new platform?
- Better startup times, with fewer libs loaded from bottlenecked metadata servers
- Encapsulating pipelines reduces complexity
4 RUNNING A GPU APPLICATION: CUSTOMER PAIN POINTS
The same DL application has to run on differently configured systems:
- RHEL 7.3, CUDA 8.0, driver 375, 4x Pascal, Python 2.7
- Ubuntu, CUDA 9.0, driver 384, 4x Volta, Python 3.5
Pain points:
- "This framework requires installing 6 dependencies from sources."
- "I want to train my model on the cluster, but it's running RHEL 7."
- "Some machines in the cluster have different NVIDIA hardware and drivers."
- "How do I deploy a DL model/application at scale?"
5 EXPERIMENTATION + MODELING: HPC MODELING AS A SERVICE
Experimenters:
- Run equipment to collect raw data
- Challenge: what's signal vs. noise?
- Are scientists who don't write code or do SW administration
Augmenting with modeling:
- The model helps filter out noise: more accurate, with less processing time
- Provide a container, e.g. NAMD on 1 GPU in a few hours
6 EASING THE TRANSITION TO IMPROVED SYSTEMS
Try before you buy, on your own workload.
Diagram: Legacy, Cloud, Latest GPUs
7 TRIMMING LIBRARY DEPENDENCES VIA CONTAINERS
A container is a good fit for applying special optimization steps:
- The size of dependent libraries can become huge; shrinking it 4x can make it fit in a RAM disk for faster access
- Metadata server I/O can become a bottleneck, e.g. with 20 job groups
- Trim away shared libs and Python include searches
- Fix/patch to merge data locally and move it to Lustre at the end of the job; this avoids conflicts
- RAM disk access improvements can greatly reduce startup time, even including the copy
Relevant example: ATLAS (CERN) simulations on Titan, courtesy of Sergey Panitkin of BNL
- The container build defines mount points and installs special versions with performance optimizations
- Optimizing for size and using a RAM disk halved setup time and reduced runtime by >2 minutes (9%)
Background info for this use case courtesy of Adam Simpson, ORNL
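The RAM-disk step above can be sketched in Python: copy an already-trimmed library directory onto a RAM-backed path once at job start, then put that copy first on the loader path so the application stops hitting the metadata server for every library. This is a minimal sketch, not the ATLAS setup; the `stage_libs` helper, the `/dev/shm` default and the shared-object filename filter are assumptions for illustration.

```python
import os
import re
import shutil
import tempfile

# Matches libfoo.so, libfoo.so.1, libfoo.so.1.2.3 (assumed naming convention).
_SHARED_OBJ = re.compile(r"\.so(\.\d+)*$")


def stage_libs(src_dir, ramdisk_root="/dev/shm"):
    """Copy shared libraries from src_dir into a fresh directory under ramdisk_root.

    Returns (staged_dir, env), where env is a copy of os.environ whose
    LD_LIBRARY_PATH lists the staged directory first.
    """
    staged = tempfile.mkdtemp(prefix="libs-", dir=ramdisk_root)
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        # Copy only regular files that look like shared objects.
        if os.path.isfile(path) and _SHARED_OBJ.search(name):
            shutil.copy2(path, staged)
    env = dict(os.environ)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = staged + (os.pathsep + old if old else "")
    return staged, env
```

A job script would call this once per node and launch the application with the returned `env`; cleaning up the staged directory at the end of the job is left to the caller.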
8 PIPELINE EXAMPLE
Moving toward HPC as a service vs. becoming an app mechanic
Diagram: index, map, sort, index, report
- Consider a pipeline of many processes
- Each could have its own dependences and require its own setup
- But each stage, or the whole set of stages, could be containerized
Some relevant work: Snakemake; SCI-F by Vanessa Sochat, Stanford: "The Scientific Filesystem", Containers in HPC Symposium at UCAR, Boulder, CO
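Containerizing each stage separately can be sketched as follows: every stage gets its own image (and therefore its own dependences), and all stages share one host directory so each stage's output feeds the next. The image names and stage commands are hypothetical placeholders; only the `docker run --rm -v` flags are standard Docker CLI.

```python
import shlex
from typing import List, Tuple

# Hypothetical (stage, image, command) table for the index/map/sort/index/report
# pipeline on the slide; images and commands are placeholders, not real images.
PIPELINE: List[Tuple[str, str, str]] = [
    ("index", "example/indexer:1.0", "index /data/ref.fa"),
    ("map", "example/mapper:1.0", "map /data/reads.fq /data/ref.fa"),
    ("sort", "example/sorter:1.0", "sort /data/mapped.out"),
    ("index", "example/indexer:1.0", "index /data/sorted.out"),
    ("report", "example/reporter:1.0", "report /data"),
]


def stage_commands(pipeline, data_dir):
    """Build one `docker run` argument vector per stage.

    All stages mount the same host directory at /data, so outputs of one
    stage are inputs to the next.
    """
    cmds = []
    for name, image, command in pipeline:
        argv = ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image]
        argv += shlex.split(command)
        cmds.append((name, argv))
    return cmds


if __name__ == "__main__":
    for name, argv in stage_commands(PIPELINE, "/scratch/run1"):
        print(name, ":", shlex.join(argv))
```

Passing each `argv` to `subprocess.run(argv, check=True)` in order would execute the pipeline; containerizing the whole pipeline in one image instead trades per-stage isolation for a single artifact.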
9 WHY HIGH-PERFORMANCE COMPUTING
We in HPC care about performance; democratizing HPC
- Performance can depend on:
  - Tuning: discovering and applying best-known methods
  - Getting the latest version
- We are making a transition from HPC for experts to HPC for the masses
- Breadth of adoption may strongly depend on ease of use
- The time is ripe!
10 PROBLEMS ADDRESSED VIA CONTAINERIZATION
Making it easier for users, admins and developers:
- Portability
- Repeatability
- Resource isolation
- New telemetry surface
- Bare-metal performance, vs. VMs
- Parameterizability and control over runtime
11 DESIGNED FOR GPU-ACCELERATED SYSTEMS
Run on Pascal- and Volta-powered systems:
- Workstations
- Supercomputing clusters
- Cloud computing
12 OPENMPI DOCKERFILE VARIANTS
Real examples: lots of ways, some better than others. Call-outs on the slide: parameters vary; control the environment.

Enable many versions, with parameters to a common interface:
    RUN OPENMPI_VERSION=3.0.0 && \
        wget -q -O - mpi.org/software/ompi/v3.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz | tar -xzf - && \
        cd openmpi-${OPENMPI_VERSION} && \
        ./configure --enable-orterun-prefix-by-default --with-cuda --with-verbs \
            --prefix=/usr/local/mpi --disable-getpwuid && \
        make -j"$(nproc)" install && \
        cd .. && rm -rf openmpi-${OPENMPI_VERSION} && \
        echo "/usr/local/mpi/lib" >> /etc/ld.so.conf.d/openmpi.conf && ldconfig
    ENV PATH /usr/local/mpi/bin:$PATH

Functional, simpler, but not CUDA or IB aware:
    RUN apt-get update \
        && apt-get install -y --no-install-recommends \
            libopenmpi-dev \
            openmpi-bin \
            openmpi-common \
        && rm -rf /var/lib/apt/lists/*
    ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib

Different compilers:
    COPY openmpi /usr/local/openmpi
    WORKDIR /usr/local/openmpi
    RUN /bin/bash -c "source /opt/pgi/license.txt && CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-cuda --prefix=/usr/local/openmpi"
    RUN /bin/bash -c "source /opt/pgi/license.txt && make all install"

Another variant, logging each build step:
    RUN mkdir /logs
    RUN wget -nv && \
        tar -xzf openmpi tar.gz && \
        cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
            --enable-mpi-cxx --prefix=/usr 2>&1 | tee /logs/openmpi_config && \
        make -j 32 2>&1 | tee /logs/openmpi_make && make install 2>&1 | tee /logs/openmpi_install && cd /tmp \
        && rm -rf openmpi-*

Bad layering (the tarball added by ADD stays in its own image layer, even though a later RUN deletes it):
    WORKDIR /tmp
    ADD /tmp
    RUN tar -xzf openmpi tar.gz && \
        cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
            --enable-mpi-cxx --prefix=/usr && \
        make -j 32 && make install && cd /tmp \
        && rm -rf openmpi-*

Another variant, with PGI compilers:
    RUN wget -q -O - | tar -xjf - && \
        cd openmpi && \
        CXX=pgc++ CC=pgcc FC=pgfortran F77=pgfortran ./configure --prefix=/usr/local/openmpi \
            --with-cuda=/usr/local/cuda --with-verbs --disable-getpwuid && \
        make -j4 install && \
        rm -rf /openmpi
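"Enable many versions, with parameters to a common interface" can be sketched as a small Python function that emits one Dockerfile RUN instruction for any Open MPI version. This is a hypothetical helper, not part of any tool; the download URL layout is an assumption to verify against open-mpi.org. Download, build, install and cleanup share a single layer, which avoids the "bad layering" problem above.

```python
def openmpi_run_step(version, prefix="/usr/local/openmpi", cuda=True,
                     verbs=True, parallel=4):
    """Return one Dockerfile RUN instruction that builds Open MPI `version`.

    All steps are chained with && in a single RUN, so the tarball and the
    build tree are deleted in the same layer that created them.
    """
    major_minor = ".".join(version.split(".")[:2])  # e.g. "3.0.0" -> "3.0"
    tarball = f"openmpi-{version}.tar.bz2"
    # Assumed download layout; check it before relying on it.
    url = f"https://www.open-mpi.org/software/ompi/v{major_minor}/downloads/{tarball}"
    opts = [f"--prefix={prefix}", "--disable-getpwuid",
            "--enable-orterun-prefix-by-default"]
    opts.append("--with-cuda" if cuda else "--without-cuda")
    opts.append("--with-verbs" if verbs else "--without-verbs")
    steps = [
        f"wget -q -P /tmp {url}",
        f"tar -x -f /tmp/{tarball} -C /tmp -j",
        f"cd /tmp/openmpi-{version}",
        "./configure " + " ".join(opts),
        f"make -j{parallel} install",
        f"rm -rf /tmp/{tarball} /tmp/openmpi-{version}",
    ]
    return "RUN " + " && \\\n    ".join(steps)


if __name__ == "__main__":
    print(openmpi_run_step("3.0.0"))
```

Because every variant is produced by the same function, changing the version or toggling CUDA/InfiniBand support is a parameter change rather than another hand-edited Dockerfile.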
13 WHAT NVIDIA IS DOING
- Enabling
- Offerings
- Technology collaboration
14 SCOPE OF ENABLING PLANS
Better, more up-to-date results with less effort
- Ecosystem: nurture a collaborative ecosystem around HPC containers
- Registry: host containerized applications and CUDA base containers
- Ingredients, recipes: recommend and validate best practices
- HPC containers: easily derive application containers from these
- Container technologies: GPU enabled
- System SW: OS, container runtime and scheduler are GPU enabled
- Recommended platforms: known-good solutions for HPC apps
15 MAKING IT EASIER WITH HPC CONTAINERS
Potentially easier for non-expert end users
NVIDIA has experience collaborating with developers to containerize HPC apps:
- Identifying, improving and creating ingredients
- Developing and optimizing recipes
Codify those learnings:
- Dockerfiles and other recipe files with tuned steps for each recommended ingredient
- Careful layering, to minimize size and maximize cacheability
- Validated combinations in specific HPC base containers, from which app containers are derived
- Recipes for building platforms: container runtime, scheduler, OS, system
- A consistent approach to documentation
16 HPC APP CONTAINERS ON NVIDIA GPU CLOUD
CANDLE, CHROMA*, GAMESS, GROMACS, LAMMPS, Lattice Microbes, MILC*, NAMD, RELION
*Coming soon
Rapid container addition; rapid user adoption
17 NVIDIA GPU CLOUD FOR HPC VISUALIZATION
- IndeX
- ParaView with NVIDIA IndeX
- ParaView with NVIDIA OptiX
- ParaView with NVIDIA Holodeck
- VMD
18 NVIDIA CONTAINER RUNTIME
Enables GPU support in popular container runtimes
Diagram components: containerized applications (Caffe, NAMD, TensorFlow, MILC) and CUDA; container runtimes (Docker, LXC, CRI-O, etc.); OCI runtime interface; nvidia-container-runtime; libnvidia-container; NVIDIA driver; NVML
- NVIDIA-Docker makes GPU containers truly portable
- Integrates Linux container internals instead of wrapping specific runtimes (e.g. Docker)
- Better integration into the container ecosystem: Kubernetes (CRI), HPC (rootless)
- 2M downloads
19 KUBERNETES ON NVIDIA GPUs
- GPU enhancements to mainline Kubernetes: get features faster than community releases
- Updated with each release of K8s (current version is v1.9), in close collaboration with the community to upstream changes
- Minimizes friction to adoption of Kubernetes on GPUs
- Fully open source
Stack: Kubernetes / NVIDIA Container Runtime / NVIDIA driver
20 HPC CONTAINER MAKER: HPCCM (pronounced "h-p-see-um")
- Collect and codify best practices
- Make recipe file creation easy, repeatable, modular and qualifiable
- Use this as a reference and a vehicle to drive collaboration
- Container implementation neutral
- Write Python code that calls primitives and building blocks, rather than rolling your own
- Leverage the latest and greatest building blocks
21 BIG PICTURE
- hpccm input recipe: a Python file with references to primitives and parameterized building blocks
- hpccm CLI tool: a script that transforms the input recipe into a container recipe file, using the recipe implementations (primitive and building block implementations)
- Container spec file: a Dockerfile or a Singularity recipe file, each starting from base images
- Container build: $ docker build (or Buildkit, buildah, ...) produces a Docker image; $ singularity build produces a Singularity image; docker2singularity converts a Docker image into a Singularity image
22 HPCCM CONCEPTS AND TERMINOLOGY
- hpccm input recipe file: what hpccm ingests
- container recipe file: what hpccm produces, e.g. a Dockerfile or a Singularity recipe file
- primitive: a line in the hpccm input recipe file that maps 1:1 to a line in the container recipe file, via the primitive's implementation
- building block: a line in the hpccm input recipe file with a 1:many mapping to primitives; the mapping is codified in the hpccm implementation, and building blocks are parameterized
- recipe implementations: the collection of implementations of primitives and building blocks
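The primitive/building-block distinction can be illustrated with a few toy classes: a primitive renders to exactly one output line, while a building block expands into several primitives. These classes are hypothetical and are not hpccm's implementation; they only mimic the `Stage0 += ...` idiom used in the recipes.

```python
from typing import List


class Primitive:
    """1:1 mapping: one input-recipe line becomes one container-recipe line."""

    def __init__(self, line: str):
        self.line = line

    def render(self) -> List[str]:
        return [self.line]


class BuildingBlock:
    """1:many mapping: one parameterized input line expands to several primitives."""

    def primitives(self) -> List[Primitive]:
        raise NotImplementedError

    def render(self) -> List[str]:
        out: List[str] = []
        for p in self.primitives():
            out.extend(p.render())
        return out


class OsPackages(BuildingBlock):
    """Hypothetical building block: install OS packages in one cleaned-up layer."""

    def __init__(self, packages: List[str]):
        self.packages = packages

    def primitives(self) -> List[Primitive]:
        pkgs = " ".join(self.packages)
        return [
            Primitive("# OS packages: " + pkgs),
            Primitive("RUN apt-get update -y && "
                      f"apt-get install -y --no-install-recommends {pkgs} && "
                      "rm -rf /var/lib/apt/lists/*"),
        ]


class Stage:
    """Ordered collection of primitives and building blocks; supports `stage += item`."""

    def __init__(self):
        self.items = []

    def __iadd__(self, item):
        self.items.append(item)
        return self

    def __str__(self):
        return "\n".join(line for item in self.items for line in item.render())


if __name__ == "__main__":
    stage = Stage()
    stage += Primitive("FROM ubuntu:16.04")
    stage += OsPackages(["python", "python3"])
    print(stage)
```

The design point is that best practices (single layer, apt cache cleanup) live once in the building block, so every recipe that uses it inherits them.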
23 RECIPES INCLUDED WITH CONTAINER MAKER
Shown in current build order
HPC base recipe with GNU compilers:
- Ubuntu
- CUDA 9.0
- Python 2 and 3
- GNU compilers (upstream)
- Mellanox OFED
- OpenMPI
- FFTW
- HDF5
HPC base recipe with PGI compilers:
- Ubuntu
- CUDA 9.0
- Python 2 and 3
- PGI compilers
- Mellanox OFED
- OpenMPI
- FFTW
- HDF5
HPC application samples coming
24 BUILDING AN HPC APPLICATION IMAGE
Analogous workflows apply for Singularity.
1. Use the HPC base image as your starting point (base recipe, then Dockerfile, then base image; the app Dockerfile builds on the base image)
2. Generate a Dockerfile from the HPC base recipe and manually edit it to add the steps to build your application (base recipe, then Dockerfile, then app Dockerfile)
3. Copy the HPC base recipe file and add your application build steps to the recipe (base recipe, then app recipe)
25 HIGHER LEVEL ABSTRACTION
A building block encapsulates simplified best practices and avoids duplication. These two lines of hpccm input:
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi
generate this Dockerfile snippet:
    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp tar.bz2 && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --with-cuda --without-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
26 HPC BASE RECIPE
hpccm input recipe (Python):
    """
    HPC Base image

    Contents:
      CUDA version 9.0
      FFTW version
      GNU compilers (upstream)
      HDF5 version
      Mellanox OFED version
      OpenMPI version
      Python 2 and 3 (upstream)
    """

    Stage0 += comment(__doc__, reformat=False)
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')

    # Python (use upstream)
    Stage0 += apt_get(ospackages=['python', 'python3'])

    # Compilers (use upstream)
    Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])

    # Create a toolchain
    tc = hpccm.toolchain(CC='gcc', CXX='g++', F77='gfortran', F90='gfortran',
                         FC='gfortran', CUDA_HOME='/usr/local/cuda')

    # Mellanox OFED
    ofed = mlnx_ofed(version=' ')
    Stage0 += ofed

    # OpenMPI
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi

    # FFTW
    fftw = fftw(version='3.3.7', toolchain=tc)
    Stage0 += fftw

    # HDF5
    hdf5 = hdf5(version='1.10.1', toolchain=tc)
    Stage0 += hdf5

Generated Dockerfile:
    #
    # HPC Base image
    #
    # Contents:
    #   CUDA version 9.0
    #   FFTW version
    #   GNU compilers (upstream)
    #   HDF5 version
    #   Mellanox OFED version
    #   OpenMPI version
    #   Python 2 and 3 (upstream)
    #

    FROM nvidia/cuda:9.0-devel AS devel

    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            python \
            python3 && \
        rm -rf /var/lib/apt/lists/*

    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            gcc \
            g++ \
            gfortran && \
        rm -rf /var/lib/apt/lists/*

    # Mellanox OFED version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            libnl \
            libnl-route \
            libnuma1 \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp x86_64.tgz && \
        tar -x -f /tmp/mlnx_ofed_linux ubuntu16.04-x86_64.tgz -C /tmp -z && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libibverbs1_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libibverbs-dev_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/libmlx5-1_*_amd64.deb && \
        dpkg --install /tmp/mlnx_ofed_linux ubuntu16.04-x86_64/debs/ibverbs-utils_*_amd64.deb && \
        rm -rf /tmp/mlnx_ofed_linux ubuntu16.04-x86_64.tgz /tmp/mlnx_ofed_linux ubuntu16.04-x86_64

    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc \
            openssh-client \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

    # FFTW version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            make \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp ftp://ftp.fftw.org/pub/fftw/fftw tar.gz && \
        tar -x -f /tmp/fftw tar.gz -C /tmp -z && \
        cd /tmp/fftw && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/fftw --enable-shared --enable-openmp --enable-threads --enable-sse2 && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/fftw tar.gz /tmp/fftw
    ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH

    # HDF5 version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            make \
            wget \
            zlib1g-dev && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/hdf tar.bz2 -C /tmp -j && \
        cd /tmp/hdf && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/hdf5 --enable-cxx --enable-fortran && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/hdf tar.bz2 /tmp/hdf
    ENV PATH=/usr/local/hdf5/bin:$PATH \
        HDF5_DIR=/usr/local/hdf5 \
        LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH
27 MULTI-STAGE BUILDS
hpccm input recipe:
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
    ...
    # OpenMPI
    ompi = openmpi(version='3.0.0', toolchain=tc)
    Stage0 += ompi
    ...

    ######
    # Runtime image
    ######
    Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')
    ...
    # OpenMPI
    Stage1 += ompi.runtime()
    ...

Generated Dockerfile:
    FROM nvidia/cuda:9.0-devel AS devel
    ...
    # OpenMPI version
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            file \
            hwloc \
            openssh-client \
            wget && \
        rm -rf /var/lib/apt/lists/*
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
        make -j4 && \
        make -j4 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    ...

    FROM nvidia/cuda:9.0-runtime
    ...
    # OpenMPI
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            openssh-client && \
        rm -rf /var/lib/apt/lists/*
    COPY --from=0 /usr/local/openmpi /usr/local/openmpi
    ENV PATH=/usr/local/openmpi/bin:$PATH \
        LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    ...
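The `ompi.runtime()` idea can be sketched as a building block that knows which pieces the runtime stage needs: a few runtime OS packages, a COPY of the installed tree from the build stage, and the matching environment, but no compilers, tarballs or build trees. `OpenMPIBlock` and `multi_stage` are hypothetical stand-ins written for this illustration, not hpccm code.

```python
from typing import List, Sequence


class OpenMPIBlock:
    """Hypothetical building block that also describes its runtime requirements."""

    def __init__(self, prefix: str = "/usr/local/openmpi",
                 runtime_packages: Sequence[str] = ("hwloc", "openssh-client")):
        self.prefix = prefix
        self.runtime_packages = list(runtime_packages)

    def runtime(self, _from: str = "0") -> List[str]:
        """Lines for the runtime stage: runtime packages, COPY of the
        installed prefix from build stage `_from`, and the environment."""
        pkgs = " ".join(self.runtime_packages)
        return [
            "# OpenMPI",
            "RUN apt-get update -y && "
            f"apt-get install -y --no-install-recommends {pkgs} && "
            "rm -rf /var/lib/apt/lists/*",
            f"COPY --from={_from} {self.prefix} {self.prefix}",
            f"ENV PATH={self.prefix}/bin:$PATH \\",
            f"    LD_LIBRARY_PATH={self.prefix}/lib:$LD_LIBRARY_PATH",
        ]


def multi_stage(devel_image: str, runtime_image: str,
                devel_lines: List[str], block: OpenMPIBlock) -> str:
    """Assemble a two-stage Dockerfile: full build first, slim runtime second."""
    return "\n".join([f"FROM {devel_image} AS devel", *devel_lines, "",
                      f"FROM {runtime_image}", *block.runtime()])


if __name__ == "__main__":
    print(multi_stage("nvidia/cuda:9.0-devel", "nvidia/cuda:9.0-runtime",
                      ["# ... build OpenMPI here ..."], OpenMPIBlock()))
```

Only the final stage ends up in the shipped image, which is why keeping build-only dependences in the devel stage shrinks the result.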
28 PARAMETERIZED BUILDING BLOCKS
Parameters enable specialization; implementations invoke Python code. Also available: apt-get, FFTW, HDF5, Linux OFED, PGI compiler.
    openmpi(check=False,                    # run make check?
            configure_opts=['--disable-getpwuid',
                            '--enable-orterun-prefix-by-default'],
            cuda=True,
            directory='',                   # path to source in build context
            infiniband=True,
            ospackages=['file', 'hwloc', 'openssh-client', 'wget'],
            prefix='/usr/local/openmpi',
            toolchain=toolchain(),
            version=...)                    # version to download

    mlnx_ofed(ospackages=['libnl-3-200', 'libnl-route-3-200', 'libnuma1', 'wget'],
              packages=['libibverbs1', 'libibverbs-dev', 'libmlx5-1', 'ibverbs-utils'],
              version=...)                  # version to download
Full recipe documentation can be found in RECIPES.md.
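29 CONTAINER IMPLEMENTATION ABSTRACTION
Single source to either Docker or Singularity:
- hpccm primitive: baseimage(image='ubuntu:16.04')
  Dockerfile: FROM ubuntu:16.04
  Singularity recipe file: Bootstrap: docker / From: ubuntu:16.04
- hpccm primitive: shell(commands=['a', 'b', 'c'])
  Dockerfile: RUN a && \ b && \ c (one command per continued line)
  Singularity recipe file: %post section with a, b and c on separate lines
- hpccm primitive: copy(src='a', dest='b')
  Dockerfile: COPY a b
  Singularity recipe file: %files section containing "a b"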
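The single-source idea can be sketched by rendering the same three primitives from the slide into both formats. This is a toy model with hypothetical function names, not hpccm's implementation. One real difference it surfaces: a Dockerfile interleaves instructions in order, while a Singularity recipe groups commands into sections such as `%files` and `%post`, so a generator cannot always preserve the exact interleaving of copy and shell steps.

```python
from typing import Dict, List, Tuple

# A primitive is modeled as (kind, args); only the slide's three kinds are covered.
Prim = Tuple[str, Dict]


def to_dockerfile(prims: List[Prim]) -> str:
    lines = []
    for kind, args in prims:
        if kind == "baseimage":
            lines.append(f"FROM {args['image']}")
        elif kind == "shell":
            # One RUN; each command on its own continued line.
            lines.append("RUN " + " && \\\n    ".join(args["commands"]))
        elif kind == "copy":
            lines.append(f"COPY {args['src']} {args['dest']}")
        else:
            raise ValueError(f"unknown primitive: {kind}")
    return "\n".join(lines)


def to_singularity(prims: List[Prim]) -> str:
    header: List[str] = []
    sections: Dict[str, List[str]] = {"%files": [], "%post": []}
    for kind, args in prims:
        if kind == "baseimage":
            header += ["Bootstrap: docker", f"From: {args['image']}"]
        elif kind == "shell":
            sections["%post"] += ["    " + c for c in args["commands"]]
        elif kind == "copy":
            sections["%files"].append(f"    {args['src']} {args['dest']}")
        else:
            raise ValueError(f"unknown primitive: {kind}")
    out = list(header)
    for name, body in sections.items():
        if body:  # emit only non-empty sections
            out += ["", name] + body
    return "\n".join(out)


if __name__ == "__main__":
    recipe = [("baseimage", {"image": "ubuntu:16.04"}),
              ("shell", {"commands": ["a", "b", "c"]}),
              ("copy", {"src": "a", "dest": "b"})]
    print(to_dockerfile(recipe))
    print()
    print(to_singularity(recipe))
```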
30 FULL PROGRAMMING LANGUAGE AVAILABLE
Conditional branching, validation, etc. in the hpccm input recipe:
    import os

    # get and validate precision
    VALID_PRECISION = ['single', 'double', 'mixed']
    precision = os.environ.get('LAMMPS_PRECISION', 'single')
    if precision not in VALID_PRECISION:
        raise ValueError('Invalid precision')

    Stage0 += shell(commands=[f'make -f Makefile.linux.{precision}'])
Courtesy of Logan Herche
31 hpccm ENVISIONED FLOW
Accelerating container creation and usage
Diagram components: HPC devel container; app source; app binary; HPC app runtime test container; app final binary; HPC runtime container; app image; NVIDIA registry (NGC); validated GPU-enabled technologies (container runtime, scheduler, OS); GPU clusters
32 V0.5 OUTSIDE-CONTAINER TRADE-OFFS
Situation- and environment-based choices
Ingredient | Choices | Choice factors
CUDA version | 9.0 | Supports Kepler through Volta; highest performance
Container runtimes | Docker, LXC, Shifter, Singularity | Docker has the best GPU support today; NVIDIA is investing in LXC for rootless
Orchestration & scheduling | SLURM, Kubernetes | SLURM is widely used in HPC; Kubernetes is widely used in the cloud
GPU enablement | NVIDIA Container Runtime SDK | OCI compliant; enables multiple container runtimes; multi-node support
OS | Ubuntu 16.04, CentOS 7 | Application-based choice; Ubuntu has more testing for GPU-enabled containers; CentOS uses RPMs
33 SAMPLE DOCKERFILES
We're in the process of normalizing our containers with respect to these devel and runtime HPC container offerings:
- GROMACS
- MILC
34 GROMACS DOCKERFILE PART 1: BUILD STAGE
    # Initialize build stage
    FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            ca-certificates cmake file git hwloc \
            libibverbs-dev openssh-client python wget && \
        rm -rf /var/lib/apt/lists/*

    # Install OpenMPI
    RUN mkdir -p /tmp && \
        wget -q --no-check-certificate -P /tmp && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && \
        ./configure --prefix=/opt/openmpi --enable-mpi-cxx --with-cuda \
            --with-verbs && \
        make -j32 && \
        make -j32 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH
35 GROMACS DOCKERFILE PART 2
    # Build GROMACS
    RUN mkdir -p /gromacs/install && \
        mkdir -p /gromacs/builds && \
        mkdir -p /tmp && git -C /tmp clone --depth=1 --branch v2018 \
            && \
        mv /tmp/gromacs /gromacs/src && \
        cd /gromacs/builds && \
        CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_INSTALL_PREFIX=/gromacs/install \
            -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
            -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_MPI=OFF \
            -DGMX_OPENMP=ON -DGMX_PREFER_STATIC_LIBS=ON \
            -DMPIEXEC_PREFLAGS=--allow-run-as-root \
            -DREGRESSIONTEST_DOWNLOAD=ON && \
        make -j && \
        make install && \
        make check
36 GROMACS DOCKERFILE PART 3: RUNTIME STAGE
    # Initialize release stage
    FROM nvidia/cuda:9.0-runtime-ubuntu16.04

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            libgomp1 \
            libibverbs-dev \
            openssh-client \
            python && \
        rm -rf /var/lib/apt/lists/*

    # Copy OpenMPI from build
    COPY --from=devel /opt/openmpi /opt/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH

    # Copy GROMACS from build
    COPY --from=devel /gromacs/install /gromacs/install
    ENV PATH=$PATH:/gromacs/install/bin
    WORKDIR /workspace
37 MILC DOCKERFILE PART 1: BUILD STAGE
    # Initialize build stage
    FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            autoconf automake ca-certificates cmake dapl2-utils \
            file git hwloc ibutils ibverbs-utils \
            infiniband-diags libdapl-dev libibcm-dev \
            libibmad5 libibverbs-dev libibverbs1 \
            libmlx4-1 libmlx4-dev libmlx5-1 libmlx5-dev \
            libnuma-dev librdmacm-dev librdmacm1 opensm \
            openssh-client rdmacm-utils wget && \
        rm -rf /var/lib/apt/lists/*
38 MILC DOCKERFILE PART 2
    # Build OpenMPI
    RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp tar.bz2 && \
        tar -x -f /tmp/openmpi tar.bz2 -C /tmp -j && \
        cd /tmp/openmpi && \
        ./configure --prefix=/opt/openmpi --enable-mpi-cxx \
            --with-cuda --with-verbs && \
        make -j32 && \
        make -j32 install && \
        rm -rf /tmp/openmpi tar.bz2 /tmp/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:$PATH
39 MILC DOCKERFILE PART 3
    # Build QUDA
    WORKDIR /quda
    RUN mkdir -p /tmp && git -C /tmp clone --depth=1 --branch release/0.8.x && \
        mv /tmp/quda /quda/src && \
        mkdir -p /quda/build && \
        cd /quda/build && \
        cmake ../src -DCMAKE_BUILD_TYPE=RELEASE \
            -DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_DOMAIN_WALL=ON \
            -DQUDA_DIRAC_STAGGERED=ON \
            -DQUDA_DIRAC_TWISTED_CLOVER=ON \
            -DQUDA_DIRAC_TWISTED_MASS=ON -DQUDA_DIRAC_WILSON=ON \
            -DQUDA_FORCE_GAUGE=ON -DQUDA_FORCE_HISQ=ON \
            -DQUDA_GPU_ARCH=sm_70 -DQUDA_INTERFACE_MILC=ON \
            -DQUDA_INTERFACE_QDP=ON -DQUDA_LINK_HISQ=ON \
            -DQUDA_MPI=ON && \
        make -j32 && \
        rm -rf /quda/src
40 MILC DOCKERFILE PART 4
    # Build MILC
    RUN mkdir -p /tmp && \
        git -C /tmp clone --depth=1 && \
        mv /tmp/milc /milc && \
        cd /milc/ks_imp_rhmc/ && \
        cp /milc/Makefile /milc/ks_imp_rhmc/ && \
        sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile && \
        sed -i 's/\(WANT_.*_GPU\)\(.*\)=.*/\1\2= true/g' Makefile && \
        sed -i 's/QUDA_HOME\(.*\)=.*/QUDA_HOME\1= \/quda\/build/g' Makefile && \
        sed -i 's/CUDA_HOME\(.*\)=.*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile && \
        sed -i 's/#\?MPP =.*/MPP = true/g' Makefile && \
        sed -i 's/#\?CC =.*/CC = mpicc/g' Makefile && \
        sed -i 's/LD\(\s\+\)=.*/LD\1= mpicxx/g' Makefile && \
        sed -i 's/PRECISION = [0-9]\+/PRECISION = 2/g' Makefile && \
        sed -i 's/WANTQIO =.*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile && \
        sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile && \
        C_INCLUDE_PATH=/quda/build/include make su3_rhmd_hisq
41 MILC DOCKERFILE PART 5: RUNTIME STAGE
    # Initialize release stage
    FROM nvidia/cuda:9.0-runtime-ubuntu16.04

    # Install packages and cleanup
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            hwloc \
            libibverbs1 \
            libnuma1 \
            librdmacm1 \
            openssh-client && \
        rm -rf /var/lib/apt/lists/*

    # Copy OpenMPI from build stage (multi-node MPI is enabled)
    COPY --from=devel /opt/openmpi /opt/openmpi
    ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
        PATH=/opt/openmpi/bin:/milc:$PATH

    # Copy MILC from build stage
    COPY --from=devel /milc/ks_imp_rhmc/su3_rhmd_hisq /milc/su3_rhmd_hisq

    # Copy examples into container
    COPY examples /workspace/examples
    WORKDIR /workspace
42 MILC HPCCM INPUT RECIPE FILE: HEADER
    """
    Ubuntu 16.04, CUDA 9.0, QUDA, MPI and MILC.

    Build with:
      nvidia-docker build -t milc .

    Run with:
      nvidia-docker run -it milc
    """
43 MILC HPCCM INPUT RECIPE FILE, PART 1
    # pylint: disable=invalid-name, undefined-variable, used-before-assignment
    # pylama: ignore=E0602
    gpu_arch = USERARG.get('GPU_ARCH', 'sm_70')

    # add docstring to Dockerfile
    Stage0 += comment(__doc__.strip(), reformat=False)

    ###############################################################################
    # Devel stage
    ###############################################################################
    Stage0.name = 'devel'
    Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', AS=Stage0.name)

    Stage0 += apt_get(ospackages=['autoconf', 'automake', 'cmake', 'git',
                                  'ca-certificates'])

    Stage0 += ofed()

    mpi_prefix = '/opt/openmpi'
    ompi = openmpi(configure_opts=['--enable-mpi-cxx'], prefix=mpi_prefix,
                   parallel=32, version="3.0.0")
    Stage0 += ompi
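The recipe reads the target GPU architecture from `USERARG`, falling back to `sm_70`, so one recipe can build images for different GPUs. How hpccm fills `USERARG` is not shown on the slide; the sketch below assumes a repeated `--userarg KEY=VALUE` command-line option and uses a hypothetical `parse_userargs` helper to show the pattern end to end.

```python
import argparse


def parse_userargs(argv):
    """Collect repeated --userarg KEY=VALUE options into a dict (hypothetical CLI)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--userarg", action="append", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    userargs = {}
    for item in args.userarg:
        key, sep, value = item.partition("=")  # split at the first '=' only
        if not sep or not key:
            parser.error(f"expected KEY=VALUE, got {item!r}")
        userargs[key] = value
    return userargs


# Same lookup and default as the MILC recipe, fed from a simulated command line.
USERARG = parse_userargs(["--userarg", "GPU_ARCH=sm_60"])
gpu_arch = USERARG.get("GPU_ARCH", "sm_70")
quda_flag = "-DQUDA_GPU_ARCH={}".format(gpu_arch)
```

With no `--userarg` given, `gpu_arch` falls back to `sm_70` (Volta), matching the Dockerfile version of this build.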
44 MILC HPCCM INPUT RECIPE FILE, PART 2: QUDA
    # build QUDA
    git = hpccm.git()
    quda_build_dir = '/quda/build'
    Stage0 += workdir(directory="/quda")
    Stage0 += shell(commands=[git.clone_step(repository=" branch="release/0.8.x"),
                              'mv /tmp/quda /quda/src',
                              'mkdir -p {}'.format(quda_build_dir),
                              'cd {}'.format(quda_build_dir),
                              ('cmake ../src ' +
                               '-DCMAKE_BUILD_TYPE=RELEASE ' +
                               '-DQUDA_DIRAC_CLOVER=ON ' +
                               '-DQUDA_DIRAC_DOMAIN_WALL=ON ' +
                               '-DQUDA_DIRAC_STAGGERED=ON ' +
                               '-DQUDA_DIRAC_TWISTED_CLOVER=ON ' +
                               '-DQUDA_DIRAC_TWISTED_MASS=ON ' +
                               '-DQUDA_DIRAC_WILSON=ON ' +
                               '-DQUDA_FORCE_GAUGE=ON ' +
                               '-DQUDA_FORCE_HISQ=ON ' +
                               '-DQUDA_GPU_ARCH={} '.format(gpu_arch) +
                               '-DQUDA_INTERFACE_MILC=ON ' +
                               '-DQUDA_INTERFACE_QDP=ON ' +
                               '-DQUDA_LINK_HISQ=ON ' +
                               '-DQUDA_MPI=ON'),
                              'make -j32',
                              'rm -rf /quda/src'])
45 MILC HPCCM INPUT RECIPE FILE, PART 3: MILC
    # build MILC
    Stage0 += shell(commands=[git.clone_step(repository="
                              'mv /tmp/milc_qcd /milc',
                              'cd /milc/ks_imp_rhmc/',
                              'cp /milc/Makefile /milc/ks_imp_rhmc/',
                              r"sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile",
                              r"sed -i 's/\(WANT_.*_GPU\)\(.*\)=.*/\1\2= true/g' Makefile",
                              r"sed -i 's/QUDA_HOME\(.*\)=.*/QUDA_HOME\1= \/quda\/build/g' Makefile",
                              r"sed -i 's/CUDA_HOME\(.*\)=.*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile",
                              r"sed -i 's/#\?MPP =.*/MPP = true/g' Makefile",
                              r"sed -i 's/#\?CC =.*/CC = mpicc/g' Makefile",
                              r"sed -i 's/LD\(\s\+\)=.*/LD\1= mpicxx/g' Makefile",
                              r"sed -i 's/PRECISION = [0-9]\+/PRECISION = 2/g' Makefile",
                              r"sed -i 's/WANTQIO =.*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile",
                              r"sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile",
                              'C_INCLUDE_PATH={}/include make su3_rhmd_hisq'.format(quda_build_dir)])
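Because the recipe is ordinary Python, Makefile rewrites like the sed calls above can also be expressed and unit-tested in Python before they are baked into a build. The sketch below is adapted, not copied: it covers a subset of the edits, and anchoring each pattern at the start of a line is an assumption made so that similarly named variables (for example `MY_CC`) are left alone. Note that Python regexes use `\s+` and `\d+`, whereas GNU sed basic regexes need `\s\+` and `[0-9]\+`.

```python
import re

# Subset of the MILC Makefile edits, as (pattern, replacement) pairs applied
# line by line via re.MULTILINE; anchoring with ^ is an assumption.
EDITS = [
    (r"^(WANTQUDA\s*)=.*$", r"\1= true"),
    (r"^(QUDA_HOME\s*)=.*$", r"\1= /quda/build"),
    (r"^(CUDA_HOME\s*)=.*$", r"\1= /usr/local/cuda"),
    (r"^#?MPP\s*=.*$", "MPP = true"),
    (r"^#?CC\s*=.*$", "CC = mpicc"),
    (r"^(PRECISION\s*)=\s*\d+", r"\1= 2"),
]


def configure_makefile(text: str) -> str:
    """Apply every edit in order and return the rewritten Makefile text."""
    for pattern, repl in EDITS:
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    return text
```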
46 MILC HPCCM INPUT RECIPE FILE, PART 4: RELEASE
    ###############################################################################
    # Release stage
    ###############################################################################
    Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')

    Stage1 += apt_get(ospackages=['libnuma1', 'ssh', 'libibverbs1', 'librdmacm1'])

    Stage1 += ompi.runtime()

    Stage1 += copy(_from=Stage0.name, src='/milc/ks_imp_rhmc/su3_rhmd_hisq',
                   dest='/milc/su3_rhmd_hisq')

    Stage1 += copy(src='examples', dest='/workspace/examples')

    Stage1 += workdir(directory='/workspace')
47 COLLABORATIONS
Nurturing a communal effort
Mellanox [Yong Qin]
- Negligible overhead from using containers
- Working on resolving driver versioning issues
- Collaborating on best recipes, including for multi-node
Dell [Nishanth Dandapanthula]
- Negligible overhead from using containers
- Ease of use
Universities and labs
- Evaluations, feedback, use cases
48 RDMA PERFORMANCE
[Figure: two panels, "RDMA BW (EDR)" (bandwidth in MB/sec vs. message size in bytes) and "RDMA Latency (EDR)" (latency in µs vs. message size in bytes), each comparing Host, Singularity and Docker.]
- RDMA performance is in line between the host and containers
- Images built with hpccm
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies
49 RDMA PERFORMANCE (CONT.)
[Figure: "RDMA Latency (99th percentile)", latency in µs vs. message size in bytes, comparing Host, Singularity and Docker.]
- Larger variations are observed at small message sizes, due to runtime overheads from containers
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies
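The 99th-percentile latency on this slide exposes tail jitter that an average hides. A minimal way to compute it from raw per-message latency samples is the nearest-rank method sketched below; which percentile definition the benchmark tool on the slide uses is not stated, so this is one common choice, not necessarily the one behind the figure.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Integer arithmetic first avoids float rounding in the rank computation.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, with 98 samples of 1.0 µs plus one of 5.0 µs and one of 50.0 µs, the median is 1.0 µs and the mean is only 1.53 µs, but the 99th percentile is 5.0 µs: a few slow messages dominate the tail while barely moving the mean.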
50 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, Dell EMC
51 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, Dell EMC
52 Image built with hpccm. Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, Dell EMC
53 Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, Dell EMC
54 KEY ISSUES
Working with the community to deliver leading reference solutions
Addressing key issues across the ecosystem to increase container adoption
Developers
- Posting containerized HPC apps to our registry
- Infrastructure for making containers: recipes, scripts (hpccm), validated images
Admins
- Driver matching
- Multi-node containers
End users
- Working with OEMs to assure best performance
- Using containers from our registry
55 DRIVER VERSIONING
One down, more to go
Problem
- A container doesn't know which kernel driver versions are installed on the target platform
- Mismatches between user-space and kernel drivers may be problematic, e.g. for CUDA or MOFED
One approach
- The appropriate driver is loaded into the container, with container runtime enabling
Relevance
- Solutions are available for Docker and Singularity for the CUDA driver case
- This issue is being actively worked on for the Mellanox driver case
Plans for Mellanox drivers
- Nail down the support matrix
- Share test cases for a regression suite, based on the hpccm input recipe and platform config
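One way a container can at least detect a CUDA mismatch is to compare the host's driver version against the minimum its CUDA toolkit needs, for example in an entrypoint script before launching the app. The sketch below is an illustration, not part of the NVIDIA container runtime; the minimum Linux driver versions in the table are taken from NVIDIA's CUDA Toolkit release notes as best recalled and should be verified against the current notes.

```python
# Minimum Linux driver version per CUDA toolkit build, per NVIDIA's CUDA
# Toolkit release notes (verify before relying on these values).
MIN_DRIVER = {
    "8.0.61": (375, 26),
    "9.0.76": (384, 81),
}


def parse_driver(version: str):
    """'384.111' -> (384, 111); tuples compare component by component."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(cuda_version: str, driver_version: str) -> bool:
    """True if the host driver meets the toolkit's minimum driver version."""
    try:
        required = MIN_DRIVER[cuda_version]
    except KeyError:
        raise ValueError(f"no minimum driver recorded for CUDA {cuda_version}")
    return parse_driver(driver_version) >= required
```

On a GPU host, the driver version string can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. Comparing integer tuples rather than strings matters: as strings, "384.111" would sort below "384.81".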
56 YOUR FEEDBACK
- What value do you see in using containers that would motivate you to containerize your app?
- What ingredients do you most want to see inside an HPC container? Outside an HPC container?
- What are your pain points around developing containers?
- What pain points do you hear about for deploying containers?
- Are you willing to try out NVIDIA's containerized HPC apps? The hpccm infrastructure that helps with containerization of HPC apps?
57 CALL TO ACTION
Try it: OSS project. Find this content on the GTC website for Monday, Mar 26, 11am, by CJ Newburn.
App developers
- Build your containers with HPCCM and deploy on NGC; offer feedback
- Take the opportunity to focus efforts and collaborate around a reference
System admins
- Make your cluster container-ready with Docker, LXC and/or Singularity runtimes
Application users
- Pull and run containers from ngc.nvidia.com
- Enjoy HPC apps with greater ease and confidence
OEMs
- Build container-ready systems with NGC
58 FAQ
- Supported systems: the containers must run on Pascal, Volta and newer GPU-powered systems
- Testing & performance: NVIDIA may QA and benchmark the container
- License agreement: the developer has to comply with all the app license requirements
- Ownership: the container developer owns and retains all right, title, and interest in and to HPC containers
- Support: the developer must provide technical support to the end user of the container
- Cost: NVIDIA will host the containers on NGC for free
- Container removal: both NVIDIA and the container developer have the right to take down the container at any time, for any reason
59 REQUEST FOR FEEDBACK: HPC RUNTIME CONTAINER V0.5
Design and validate an HPC container, then derive app containers from it
Kind of ingredient | Recommended choice | Rationale, alternatives
OS | Ubuntu | Aligned with DL, current focus; alternative: CentOS 7
CUDA version | 9.0 | Backwards compatible [driver upgrade]
CUDA type | Runtime | For deployment, not development
Compiler | PGI, gcc [and Intel] runtimes | Actual compiler not needed for most usages
Comms libraries | CUDA-aware OpenMPI, MOFED libs | OpenMPI CUDA enabling is underway via UCX
Scientific libraries | FFTW [and MKL] | Most commonly used
Infrastructure | Python 2 and 3, HDF5 | Commonly used; may add more tools
60 REQUEST FOR FEEDBACK: HPC DEVEL CONTAINER V0.5
Design and validate an HPC container, then derive app containers from it
Kind of ingredient | Recommended choice | Rationale, alternatives
OS | Ubuntu | Aligned with DL, current focus; alternative: CentOS 7
CUDA version | 9.0 | Backwards compatible [driver upgrade]
CUDA type | Devel | For development; includes the CUDA toolkit
Compiler | PGI, gcc [and Intel] compilers | Compiler and its license in private images only
Comms libraries | CUDA-aware OpenMPI, MOFED libs | OpenMPI CUDA enabling is underway via UCX
Scientific libraries | FFTW [and MKL] | Most commonly used
Infrastructure | Python 2 and 3, HDF5 | Commonly used; may add more tools
61 Container Namespace Isolation
Namespace | Docker | Singularity
Isolation | Shares almost nothing | Shares almost everything
File system (mount) | Isolated by default; can bind mount host volumes | $HOME, /proc, /sys, /tmp, etc. mounted from the host by default; can bind mount other host volumes
PID | Isolated | Shared
Network | Isolated; can be expanded, with full support | Shared, with limited support
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies 61
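On Linux you can check which namespaces a containerized process actually shares with the host: each `/proc/<pid>/ns/<type>` entry is a symlink such as `mnt:[4026531840]`, and two processes share a namespace exactly when those inode numbers match. The sketch below is a minimal, hypothetical helper set (names are ours); take a snapshot on the host and another inside the container, then compare. Per the table, you would expect few matches under Docker defaults and most matches under Singularity defaults, though site configuration (for example user namespaces) can change this.

```python
import json
import os

NS_TYPES = ("mnt", "pid", "net", "uts", "ipc", "user")


def parse_ns_link(link: str) -> tuple:
    """Parse a namespace symlink target such as 'mnt:[4026531840]'."""
    kind, _, rest = link.partition(":")
    if not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError(f"unexpected namespace link: {link!r}")
    return kind, int(rest[1:-1])


def snapshot(pid: str = "self") -> dict:
    """Map namespace type -> inode number for one process (Linux only)."""
    result = {}
    for kind in NS_TYPES:
        try:
            result[kind] = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))[1]
        except OSError:
            continue  # missing on this kernel/OS, or not permitted
    return result


def shared_namespaces(a: dict, b: dict) -> list:
    """Namespace types whose inode numbers match in both snapshots."""
    return sorted(k for k in a.keys() & b.keys() if a[k] == b[k])


if __name__ == "__main__":
    # Run once on the host and once in the container; save both and compare.
    print(json.dumps(snapshot(), sort_keys=True))
```

Comparing inode numbers avoids guessing from indirect symptoms (hostname, visible PIDs) and covers every namespace type the kernel exposes.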
62 MPI
Aspect | Docker | Singularity
MPI library | Inside the container | Outside the container (host)
MPI program binary | Inside the container | Inside the container
Network | Container | Host
Security | Docker daemon | Inherited from host
Courtesy of Yong Qin, Mellanox. 2018 Mellanox Technologies 62
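The two columns correspond to two launch patterns. With Singularity, the host's `mpirun` starts one container instance per rank (`mpirun ... singularity exec <image> <app>`), so the host and container MPI generally need to be compatible. With Docker, `mpirun` runs inside the container against the image's own MPI library. The sketch below only builds argument lists; the image names and `./app` are hypothetical, GPU and network flags are omitted, and the Docker variant covers a single node only. Open MPI's `mpirun` refuses to run as root without `--allow-run-as-root`, which matters because Docker containers run as root by default.

```python
import shlex


def singularity_hybrid_cmd(np: int, image: str, app: str) -> list:
    """Host mpirun launches the app inside a container for each rank."""
    return ["mpirun", "-np", str(np), "singularity", "exec", image, app]


def docker_single_node_cmd(np: int, image: str, app: str) -> list:
    """mpirun runs inside one container, using the MPI library in the image.

    --allow-run-as-root is an Open MPI flag needed because the container
    runs as root by default.
    """
    return ["docker", "run", "--rm", image,
            "mpirun", "--allow-run-as-root", "-np", str(np), app]


def as_shell(cmd: list) -> str:
    """Quote an argv list for display or copy-paste into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Image names below are placeholders, not published images.
    print(as_shell(singularity_hybrid_cmd(4, "app.simg", "./app")))
    print(as_shell(docker_single_node_cmd(4, "myorg/app:latest", "./app")))
```

Building argv lists rather than strings keeps the commands safe to pass to `subprocess.run` without a shell, and `as_shell` is only used for display.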