Using Rmpi within the HPC4Stats framework
Using Rmpi within the HPC4Stats framework
Dorit Hammerling, Analytics and Integrative Machine Learning Group, National Center for Atmospheric Research (NCAR)
Based on work by Doug Nychka (Applied Mathematics and Statistics Department, Colorado School of Mines) with contributions from Daniel Milroy (Department of Computer Science, CU Boulder), Brian Vanderwende (IT Consulting Services Group, NCAR), Sophia Chen (Department of Computer Science, Brown University), and Nathan Lenssen (Columbia University)
October 11, 2018
Overview
- Context and introduction
- Strategies for parallel analysis: find the right platform for your problem; NCAR's HPC system as an example; choices for splitting up the data
- Rmpi as a general strategy
- HPC4Stats framework
In a nutshell: leverage computational infrastructure to conduct embarrassingly parallel data analysis in R using existing tools.
Typical data amenable to parallel inference
- Daily data for 35 years: 12,775 values per grid cell
- 288 longitudes × 192 latitudes: 55,296 grid cells
- 12,775 × 55,296 = 706,406,400 data points (2.83 GB; a quick size check follows below)
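The sizes above follow from simple arithmetic; here is a minimal R sketch reproducing them, assuming the values are stored in single precision (4 bytes each), which matches the 2.83 GB figure.

    valuesPerCell <- 365 * 35            # 12,775 daily values over 35 years
    nCells <- 288 * 192                  # 55,296 grid cells
    nPoints <- valuesPerCell * nCells    # 706,406,400 data points
    nPoints * 4 / 1e9                    # ~2.83 GB at 4 bytes per value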
Fitting a Generalized Pareto distribution
This is a complementary approach to block maxima for Extreme Value Analysis. For data above a given threshold µ, fit a probability density of the form

    f(x) = (1/σ) [1 + ξ(x − µ)/σ]^(−(1/ξ + 1))   for x ≥ µ,

where σ is the scale parameter and ξ is the shape parameter. We ignore all the data below the threshold and fit just the tail. Having selected the threshold, estimate σ and ξ by maximum likelihood.
Fitting a Generalized Pareto distribution: R code

    library(extRemes)   # provides fevd() and return.level()

    tailProb <- 0.01          # tail probability used in extremes fitting
    returnLevelYear <- 100    # years used for the return level

    Y <- dataset[lonIndex, latIndex, ]        # daily series for one grid cell
    threshold <- quantile(Y, 1 - tailProb)
    frac <- sum(Y > threshold) / length(Y)    # fraction of data above the threshold

    GPFit <- fevd(Y, threshold = threshold, type = "GP", method = "MLE")
    ReturnLevel <- return.level(GPFit, returnLevelYear, do.ci = TRUE)

Depending on your machine this takes somewhere from 0.3 to 1 second.
Why use HPC systems for statistical computing?
Doing repetitive tasks can take a lot of time. Even short tasks add up quickly:
- 0.33 seconds for one location corresponds to approx. 5 hours for 55,000 locations.
- 1 second for one location corresponds to approx. 15 hours for 55,000 locations.
And that is for a single data set. Often we want to analyze hundreds of data sets and test different parameters. And we might not have to worry about obtaining the data: moving the analysis to the data is becoming more and more common.
NCAR's high performance computing (HPC) system
Cheyenne (online since 2017):
- 5.34 petaflops peak
- 145,152 cores with 64 or 128 GB memory per node
- 313 TB total memory
- 100 Gb/s interconnects
- Core-hours available to the NSF research community
- Simple application process for graduate students
Cores and nodes on HPC systems
Cores on one node usually share memory (and cache). Memory between nodes is typically not shared, but can be accessed over the interconnect. Understanding the basics of the architecture and interconnects can be really helpful!
Relevant details: memory and parallelization tools
Memory available on compute nodes. Two classes of CPU nodes on Cheyenne:
- Standard nodes with 64 GB of memory (46 GB usable).
- Large-memory nodes with 128 GB of memory (110 GB usable).
Data Analysis cluster: many GPUs, but limited memory; configured for deep learning applications.
You need to know what is installed and how it is configured!
- Rmpi: limits on workers? What physical interconnect is it using?
- MATLAB Distributed Computing Server
- Spark for Python or Scala
Application benchmarking
Even if one knows the architecture very well and has data on low-level benchmarks, application benchmarking is critical. Application benchmarking uses code as close as possible to the real production code (including I/O operations!).
For large data sets, how you read in and distribute the data matters! Typical options (a sketch of the block option follows below):
- All the data at once
- In blocks: e.g. one latitude or longitude band at a time
- Smallest possible unit: e.g. one grid cell at a time
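As an illustration of the block option, here is a minimal sketch that reads one latitude band at a time, assuming the data sit in a NetCDF file with a lon × lat × time variable; the file and variable names ("precip.nc", "PRECT") and the dimension name "lat" are hypothetical.

    library(ncdf4)

    nc <- nc_open("precip.nc")
    nLat <- nc$dim$lat$len

    for (j in 1:nLat) {
      # Read all longitudes and all times for latitude band j only.
      band <- ncvar_get(nc, "PRECT",
                        start = c(1, j, 1),
                        count = c(-1, 1, -1))   # -1 means "the full extent"
      # ... fit the GPD to each grid cell in this band ...
    }
    nc_close(nc)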
Technical report, data, and code available: Accelerating CMIP Data Analysis with Parallel Computing in R, NCAR Technical Note NCAR/TN-534+CODE (Milroy, Chen, Vanderwende, and Hammerling).
Rmpi in a picture (schematic figure of the supervisor/worker model)
Rmpi overview
- An R interface (wrapper) to MPI.
- A convenient way to run many R tasks on many cores.
- Little knowledge of MPI and parallel computing needed.
- Uses a supervisor/worker model: all are full-fledged R sessions, but the worker sessions only receive instructions from the supervisor.
- The task assigned to each worker is an R function that is passed a unique index. The index is used to determine exactly what task to do. (In our case the index determines a range of grid boxes; a sketch follows below.)
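As an illustration, here is a minimal sketch of such a task function, assuming each task ID maps to a contiguous block of longitude indices; the block size and the helper fitOneCell() are hypothetical, and dataset is assumed to have been broadcast to the workers.

    blockSize <- 8   # longitudes per task (hypothetical choice)

    doTask <- function(taskID) {
      # Map the task ID to a contiguous range of longitude indices.
      lonRange <- ((taskID - 1) * blockSize + 1):(taskID * blockSize)
      results <- list()
      for (lonIndex in lonRange) {
        for (latIndex in 1:dim(dataset)[2]) {
          # fitOneCell() would wrap the GPD fit shown earlier for one grid cell.
          results[[paste(lonIndex, latIndex)]] <- fitOneCell(lonIndex, latIndex)
        }
      }
      results
    }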
Schematic of using Rmpi
1. LIBRARIES: install/load all the libraries you need.
2. SETUP: read in any common data sets (e.g. climate data, lat/lon grids), define any input functions, and define objects to control the computation.
3. WORKERS: spawn them and broadcast objects to them.
4. APPLY: mpi.iapplyLB applies a single R function (e.g. doTask) to a sequence of IDs.
5. SAVE RESULTS.
A minimal end-to-end sketch of these steps follows below.
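The following is a minimal sketch of the five steps, not the actual HPC4Stats supervisor script; the file paths and task counts are placeholders, and doTask is the sketch from the previous slide.

    # 1. LIBRARIES
    library(Rmpi)

    # 2. SETUP: common data and the task function (paths are hypothetical).
    load("../data/dataset.RData")
    nWorkers <- mpi.universe.size() - 1   # leave one process for the supervisor

    # 3. WORKERS: spawn them and broadcast the objects they need
    # (any helpers such as fitOneCell() would be broadcast the same way).
    mpi.spawn.Rslaves(nslaves = nWorkers)
    mpi.bcast.Robj2slave(dataset)
    mpi.bcast.Robj2slave(doTask)

    # 4. APPLY: load-balanced apply of doTask over the task IDs.
    nTasks <- 288 / 8                     # e.g. 8-longitude blocks, as sketched above
    results <- mpi.iapplyLB(1:nTasks, doTask)

    # 5. SAVE RESULTS and shut down.
    save(results, file = "../output/results.RData")
    mpi.close.Rslaves()
    mpi.quit()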
HPC4Stats: a reusable framework to implement Rmpi
- The batch script that calls Rmpi does not need to be changed.
- Organizes the information broadcast to the workers.
- Particular parallel tasks are determined by a short R namelist (a hypothetical sketch follows below).
- Working directories default to a specific organization.
- (Relatively) simple to switch between a laptop/cluster and an HPC system.
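The slides do not show a namelist, so the following is purely a hypothetical sketch, assuming the .rnl file is ordinary R code sourced by the supervisor script; apart from nWorkers (mentioned later in the details slide), every name here is invented for illustration.

    # template.rnl -- hypothetical namelist sketch; only nWorkers appears in the slides.
    nWorkers <- 35                          # at least one less than the processes requested
    dataFile <- "../data/dataset.RData"     # common data set to broadcast
    outputFile <- "../output/results.RData" # where the supervisor saves results
    taskIDs <- 1:36                         # task indices handed to doTask()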
HPC4Stats layout
Assumes three required subdirectories, plus two optional ones:
- batch: holds R namelists (.rnl), example batch scripts (.pbs), README files, examples, and output from R scripts.
- src: the batchsupervisor.r script and any other source code.
- output: where output is saved (usually in R binary format).
- data (optional): where any common data sets are located.
- plots (optional): location for plots.
HPC4Stats batch execution
We assume that batch jobs are executed from the batch directory. To submit an R batch job:
1. Create a specific R namelist and export the name of this file:
   export HPC4StatsNAMELIST=template.rnl
2. Execute the R batch command:
   R CMD BATCH --no-save ../src/supervisorbatch.r template.rout
On a local machine: use a terminal window to export the namelist and to submit the batch command.
On an HPC system: create a batch job wrapper using a queue script (e.g. PBS) and submit it to the queuing system; a hedged sketch of such a wrapper follows below.
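As an illustration, here is a minimal PBS wrapper sketch around the two commands above; the job name, project code, queue, resource line, and MPI launcher are site-specific assumptions rather than the actual HPC4Stats script.

    #!/bin/bash
    #PBS -N hpc4stats                       # job name (hypothetical)
    #PBS -A PROJECT_CODE                    # project/allocation code (placeholder)
    #PBS -q regular                         # queue name varies by system
    #PBS -l select=1:ncpus=36:mpiprocs=36   # one 36-core node (assumption)
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR                       # return to the batch directory
    export HPC4StatsNAMELIST=template.rnl
    # The MPI launch command varies by system (e.g. mpiexec_mpt on Cheyenne).
    mpiexec R CMD BATCH --no-save ../src/supervisorbatch.r template.rout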
Some details
- The doTask function is expected to take only a task ID; the function needs to figure out from this what to do.
- Keep in mind that any objects broadcast to the workers will be found by this function through the usual way R searches for objects.
- Make sure that the number of workers (nWorkers in the R namelist) is at least one less than the number of processes requested.
- The R namelist is included as part of the output object.
- Keep in mind that while there are many defaults, most choices can be changed through the R namelist. Avoid changing the batchsupervisor.r script. Be creative with your namelist structure!
Exercise for this afternoon
Running an Rmpi example using HPC4Stats:
- Make sure you have Rmpi (and the required MPI and compiler libraries) installed on your laptop!
- Download the directory tree HPC4Stats Asheville.
- We will start with the README file and take it from there!