Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E

Similar documents
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Deep Learning Frameworks with Spark and GPUs

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Index. Bessel function, 51 Big data, 1. Cloud-based version-control system, 226 Containerization, 30 application, 32 virtualize processes, 30 31

Chris Calloway for Triangle Python Users Group at Caktus Group December 14, 2017

Employing HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN)

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

Managing Deep Learning Workflows

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

Connecting ArcGIS with R and Conda. Shaun Walbridge

LSST software stack and deployment on other architectures. William O Mullane for Andy Connolly with material from Owen Boberg

Shifter: Fast and consistent HPC workflows using containers

Getting Started with Python

KNIME Python Integration Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on )

Onto Petaflops with Kubernetes

IBM Data Science Experience (DSX) Partner Application Validation Quick Guide

Python With Data Science

Cloudera Data Science Workbench

Using jupyter notebooks on Blue Waters. Roland Haas (NCSA / University of Illinois)

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Frameworks in Python for Numeric Computation / ML

DATA SCIENCE USING SPARK: AN INTRODUCTION

Supporting GPUs in Docker Containers on Apache Mesos

Specialist ICT Learning

CONTAINERIZING JOBS ON THE ACCRE CLUSTER WITH SINGULARITY

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

Important DevOps Technologies (3+2+3days) for Deployment

Opportunities for container environments on Cray XC30 with GPU devices

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

CS/Math 471: Intro. to Scientific Computing

Latest from the Lab: What's New Machine Learning Sam Buhler - Machine Learning Product/Offering Manager

Cloudera Data Science Workbench

Investigating Containers for Future Services and User Application Support

Guillimin HPC Users Meeting December 14, 2017

Intel Distribution for Python* и Intel Performance Libraries

Introduction to Computer Vision Laboratories

Intel Distribution For Python*

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS

Machine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich

Blurring the Line Between Developer and Data Scientist

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

Apache Hadoop 3. Balazs Gaspar Sales Engineer CEE & CIS Cloudera, Inc. All rights reserved.

CloudMan cloud clusters for everyone

Deep Learning on AWS with TensorFlow and Apache MXNet

About Intellipaat. About the Course. Why Take This Course?

IBM Data Science Experience (DSX) Partner Application Validation Quick Guide

Homework 01 : Deep learning Tutorial

Building (Better) Data Pipelines using Apache Airflow

What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018

BUILDING A GPU-FOCUSED CI SOLUTION

Parallel and Distributed Computing with MATLAB Gerardo Hernández Manager, Application Engineer

Migrate from Netezza Workload Migration

Containers. Pablo F. Ordóñez. October 18, 2018

Analytics and Visualization

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

Parsl: Developing Interactive Parallel Workflows in Python using Parsl

An exceedingly high-level overview of ambient noise processing with Spark and Hadoop

Harnessing the Power of Python in ArcGIS Using the Conda Distribution. Shaun Walbridge Mark Janikas Ting Lee

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch

Singularity for GPU and Deep Learning

CERN openlab & IBM Research Workshop Trip Report

Facilitating Collaborative Analysis in SWAN

Numba: A Compiler for Python Functions

SAP Vora - AWS Marketplace Production Edition Reference Guide

Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules. Singularity overview. Vanessa HAMAR

Shifter and Singularity on Blue Waters

Getting Started with Hadoop

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

WITH INTEL TECHNOLOGIES

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

OpenFOAM Scaling on Cray Supercomputers Dr. Stephen Sachs GOFUN 2017

Hadoop. Introduction / Overview

Using EasyBuild and Continuous Integration for Deploying Scientific Applications on Large Scale Production Systems

XC Series Shifter User Guide (CLE 6.0.UP02) S-2571

OpenShift Roadmap Enterprise Kubernetes for Developers. Clayton Coleman, Architect, OpenShift

Cloud Computing & Visualization

Scientific Python. 1 of 10 23/11/ :00

HPC & BD Survey: Results for participants who use resources outside of NJIT

Python ecosystem for scientific computing with ABINIT: challenges and opportunities. M. Giantomassi and the AbiPy group

Machine Learning. Bridging the OT IT Gap for Machine Learning with Ignition and AWS Greengrass

Analytics Platform for ATLAS Computing Services

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS

COPYRIGHT DATASHEET

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG

Building a Secure Data Environment. Maureen Dougherty Center for High-Performance Computing University of Southern California

TensorFlow: A System for Learning-Scale Machine Learning. Google Brain

Utilizing Container Technology to Streamline Data Science

StreamSets Control Hub Installation Guide

Declarative Modeling for Cloud Deployments

Integrate MATLAB Analytics into Enterprise Applications

Using DC/OS for Continuous Delivery

Singularity: Containers for High-Performance Computing. Grigory Shamov Nov 21, 2017

NVIDIA DEEP LEARNING INSTITUTE

DevOps Technologies. for Deployment

Big Data Analytics Tools. Applied to ATLAS Event Data

Modeling Uncertainty with nanohub tools using BOINC as a Computational Resource

Big Data Analytics at OSC

Transcription:

Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc

Overview Supported Technologies Cray PE Python Support Shifter Urika-XC Anaconda Python Spark Intel BigDL machine learning library Dask Distributed TensorFlow Jupyter Notebooks Use Cases Met Office JADE Platform Nowcasting

Supported Technologies Overview Python in Cray PE Shifter Containers for XC Urika-XC - Analytics for Cray XC systems Anaconda Apache Spark Dask Distributed Tensorflow Jupyter Notebooks

Python in Cray PE > module load cray-python > python example.py > echo "Coming December > module load craypython/2.7.13.1.102 > python2 example2.py > module load craypython/3.6.1.1.102 > python3 example3.py Currently single module bundling Python 2 and 3 Stock Python distributions Version numbering matches PE versions From December release separate Python 2 and 3 modules Exact version numbers may change in release but will correspond to Python versions Have corresponding python2 and python3 commands Both contain a python command, most recently loaded module takes precedence module load cray-python without explicit version will use Python 2 Built against relevant Cray libraries (libsci and mpt) Includes common data science libraries: numpy, scipy, mpi4py and dask

Shifter Container support for HPC Originally developed at NERSC Officially supported by Cray since CLE 6.0UP02 Containers allow for encapsulating applications and their dependencies Supports common image repositories and formats e.g. Dockerfile and Docker repositories Key features for HPC environments Provides better security model than Docker Users run as themselves inside the container System admin can control mount points Enables MPI and GPU support for containerized applications Loopback Cache feature avoids metadata overheads

Urika-XC Provides a suite of analytics and data science software that runs on the XC platform Requirements: CLE 6.0UP02 Shifter Provided as a module module load analytics Dynamically creates analytics clusters in the context of an existing WLM reservation Spark and/or Dask clusters Added in 1.1 release: Google TensorFlow and Intel BigDL for machine learning Jupyter Notebooks for interactive development module load analytics salloc -N 10 start_analytics spark-shell

Urika-XC - Python Dependency Management > conda create -n example > source activate example (example) > conda install pandas (example) > python --version Python 3.6.2 :: Continuum Analytics, Inc. > source deactivate example > conda create -n py2 python=2.6 > source activate py2 (py2) > python --version Python 2.6.9 :: Continuum Analytics, Inc. For analytics products we use Anaconda to manage dependencies Provides users the ability to manage isolated Python environments with their desired package and Python versions Environments can be consumed by relevant applications e.g. Spark, Dask etc. As of 1.1 release: Can optionally use Intel Distribution of Python if preferred Add --idp flag to start_analytics

Urika-XC - Apache Spark Popular in-memory analytics package http://spark.apache.org Currently version 2.1.0 1.1 release will upgrade to 2.2.0 Always run as part of Urika-XC jobs Supports jobs written in multiple languages Scala/Java Python R Interactive Scala, Python and R shells available Can also batch submit Integrates with Anaconda environments for dependency management Also includes the Intel BigDL machine learning libraries Currently version 0.3.0

Urika-XC - Dask Distributed Python > salloc -N 10 start_analytics --dask --dask-env example --dask-workers 8 --daskcores 2 Popular distributed Python analytics package https://distributed.readthedocs.io/en/latest/ Optionally runs as part of Urika-XC jobs Co-exists with Spark cluster Integrates with Anaconda for dependency management

Urika-XC - TensorFlow Popular machine learning framework Originally developed at Google, now open source Version 1.3 New feature in our 1.1 release Machine learning workflows can be written in Python Supports two modes of distribution Google grpc (TCP/IP) Cray MPI (via Cray developed plugin) Supports both CPU and GPU nodes Our analytics module provides several helper scripts for launching distributed jobs

Urika-XC - Jupyter Notebooks Provides UIs for data scientists to develop analytics workflows in Mix code, prose, images etc in a single document Can be shared for collaboration Offloads execution to underlying analytics framework Spark/Dask/TensorFlow/Bi gdl etc SSH tunnels used to expose notebook server to outside world i.e. user laptops

Use Cases - Met Office JADE Platform Met Office JADE Data Analysis platform Collaboration with Met Office Informatics Lab Goal to replace powerful desktops with analysis environment accessed via web browser Leverages : DASK distributed python engine Jupyter interactive notebooks IRIS Python Library for Meteorology and Climatology Developed/prototyped on AWS Will be able to run on their XC systems once Urika-XC 1.1 is released

Use Cases - Nowcasting Create very short term forecasts on demand Goals: Investigate the utility of machine learning for very short term (0-1 hour) precipitation forecasts Gain insights into the full deep learning workflow to drive product roadmap Approach Machine learning used to train a convolutional neural network on historical observational data for a region Once trained model can be used to generate a very short-term forecast based upon current/past observations Past observations useful as we can compare predictions with subsequent observations for evaluation i.e. Train once, Predict many Python implementation of workflow encapsulated in a Jupyter Notebook

Use Case - Nowcasting - Sample Results

Questions? rvesse@cray.com mikeri@cray.com aheye@cray.com

Nowcasting Case Study Additional Details and Results

Nowcasting Data Pipeline Amazon S3 NOAA Bucket 1. Load Radar Files 4. Save Tensor Cray Sonexion TM 5. Process Tensor 2. Parallelize Radar files to Spark Cray CS- Storm TM Py-ART Cray Urika-GX TM : 3. Generate 3D Projection of Raw Radar Data BigDL

Nowcasting Preliminary Results 0.8 FAR: False Alarm 0.7 Rate lower is better 0.6 CSI: Critical Success 0.5 Index higher is 0.4 better 0.3 POD: Probability of 0.2 Detection higher is 0.1 better 0 ConvLSTM vs Persistence, KTLH station T+10 T+20 T+30 T+40 T+50 T+60 NN-FAR NN-CSI NN-POD P-FAR P-CSI P-POD NN = ConvLSTM P = Persistence

Nowcasting - Further Results Station: KTLX, Oklahoma City 1 Timestep = 10 minutes Blue: Predictions made by DL Red: Constant Prediction of Persistence 0.6 0.4 0.2 0 KTLX False-Alarm Rate (FAR) KTLX Critical Sucess Rate (CSI) 0.8 0.6 0.4 0.2 0 1 2 3 4 5 6 1 2 3 4 5 6 NN Predictions Persistence NN Predictions Persistence KTLX Probability of Detection (POD) 0.8 0.6 0.4 0.2 0 1 2 3 4 5 6 NN Predictions Persistence

Nowcasting - Further Results KTLX Mean Squared Error (MSE) KTLX Mean Absolute Error (MAE) 0.012 0.6 0.01 0.5 0.008 0.4 0.006 0.3 0.004 0.2 0.002 0.1 0 1 2 3 4 5 6 0 1 2 3 4 5 6 NN Predictions Persistence NN Predictions Persistence