R on BioHPC. RStudio, Parallel R and BioconductoR. Updated for
2 Today we'll be looking at
3 Why R?
- The dominant statistics environment in academia
- Large number of packages to do a lot of different analyses
- Excellent uptake in Bioinformatics: specialist packages
- (Relatively) easy to accomplish complex stats work
- Very active development right now: R Foundation, R Consortium, Revolution Analytics, RStudio, Microsoft
4 Why not R?
- Quirky language: painful for e.g. Python programmers
- Generally thought to be quite slow, except for optimized linear algebra
- Complex, old-fashioned documentation
- Parallelization packages can be complex / outdated
...but it's getting better quickly.
5 Exciting Recent Developments in R
6 RStudio
An IDE for R, on the web.
BioHPC: optimized R, access to cluster storage, persistent sessions.
7 When to use RStudio
- Development work with small datasets
- Creating R Markdown documents
- Working with Shiny for dataset visualizations
- Any small, short-running data analysis tasks
Large datasets, very long running jobs, parallel code? Must use R on the cluster.
8 Using R on the cluster / clients
    module load R/3.2.1-intel
Latest version, optimized, same as used by rstudio.biohpc.swmed.edu.
9 Installing Packages
We have a set of common packages pre-installed in the R module. You can install your own into your home directory (~/R):
    install.packages(c("microbenchmark", "data.table"))
Some packages need additional libraries and won't compile successfully.
- Ask us to install them for you (biohpc-help@utsouthwestern.edu)
This is for packages from CRAN. BioconductoR packages install differently. See later!
10 Our R is faster than standard downloads (mkl_test.r)
Compiled using the Intel compiler and Intel Math Kernel Library.
Task | Standard R | BioHPC R | Speedup
Matrix Multiplication | | | x
Cholesky Decomposition | | | x
SVD | | | x
PCA | | | x
LDA | | | x
This is on a cluster node; speedup is less on clients with fewer CPU cores.
For your own Mac or PC see
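The contents of mkl_test.r are not shown on the slide, so the following is only a rough sketch of how such a comparison might be timed. The matrix size and the choice of operations are assumptions made for illustration; running the same script under a standard R build and under the MKL-linked module would give the two columns of the table.

```r
# Illustrative timing of a few dense linear algebra operations.
# The size n is a hypothetical value, kept small so the sketch runs quickly.
set.seed(42)
n <- 300
a <- matrix(rnorm(n * n), n, n)
spd <- crossprod(a) + diag(n)   # symmetric positive definite input for chol()

timings <- c(
  matmul   = system.time(a %*% a)[["elapsed"]],
  cholesky = system.time(chol(spd))[["elapsed"]],
  svd      = system.time(svd(a))[["elapsed"]]
)
print(timings)
```

Elapsed times this short are noisy; larger matrices or repeated runs would be needed for a fair comparison.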
11 Benchmarking functions in R (and compiling them) (functions.r)
    library(compiler)
    f <- function(n, x) for (i in 1:n) x <- (1 + sin(x))^(cos(x))
    g <- cmpfun(f)
    library(microbenchmark)
    compare <- microbenchmark(f(1000, 1), g(1000, 1), times = 1000)
    library(ggplot2)
    autoplot(compare)
- Compiling a function that is called often can increase speed
- The microbenchmark package allows you to benchmark functions
12 For speed always vectorize! (functions.r)
    # Loop form: evaluates stdnorm() one element at a time
    distnorm <- function() {
      x <- seq(-5, 5, 0.01)
      y <- rep(NA, length(x))
      for (i in 1:length(x)) {
        y[i] <- stdnorm(x[i])
      }
      return(list(x = x, y = y))
    }

    # Vector form: one call to stdnorm() on the whole vector
    vdistnorm <- function() {
      x <- seq(-5, 5, 0.01)
      y <- stdnorm(x)
      return(list(x = x, y = y))
    }
54x speedup!
- Function compilation improved the median time somewhat (< 2x)
- Using the vector form was much faster
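The slide calls stdnorm() without showing its definition. As a minimal runnable sketch, the version below assumes stdnorm is the standard normal density (an assumption, not confirmed by the slides) and checks that the loop and vector forms return the same values.

```r
# Assumption: stdnorm is the standard normal density; the slide does not define it.
stdnorm <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)

# Loop form: one call per element
distnorm <- function() {
  x <- seq(-5, 5, 0.01)
  y <- rep(NA_real_, length(x))
  for (i in seq_along(x)) y[i] <- stdnorm(x[i])
  list(x = x, y = y)
}

# Vector form: a single call on the whole vector
vdistnorm <- function() {
  x <- seq(-5, 5, 0.01)
  list(x = x, y = stdnorm(x))
}

loop_result <- distnorm()
vec_result <- vdistnorm()
same <- isTRUE(all.equal(loop_result$y, vec_result$y))
print(same)
```

Both forms compute the same numbers; only the number of R-level function calls differs, which is where the time goes in the loop form.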
13 Explicit Parallelization in R
Our optimized R automatically parallelizes linear algebra on a single machine, which is enough in a lot of cases! Always prefer vector/matrix form over for loops and apply functions to get the most out of these optimizations.
If you need more options you can control the parallelization:
    library(parallel)     # Single-node and cluster parallelization, apply functions and explicit execution
    library(doParallel)   # Simple parallel foreach loops
Can run parallel code on a single node (multicore) or across nodes (MPI).
14 Our Example Application (mc_parallel.r)
    # Define a function that performs a random walk with a
    # specified bias that decays
    rw2d <- function(n, mu, sigma) {
      steps <- matrix(NA_real_, nrow = n, ncol = 2)
      for (i in 1:n) {
        steps[i, 1] <- rnorm(1, mean = mu, sd = sigma)
        steps[i, 2] <- rnorm(1, mean = mu, sd = sigma)
        mu <- mu / 2
      }
      return( apply(steps, 2, cumsum) )
    }
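In the spirit of the earlier advice to vectorize, the random walk itself can also be written without the explicit loop. The rw2d_vec function below is a hypothetical variant that is not part of the original slides; it draws all steps in one rnorm() call, in the same order as the loop, so with the same seed the two versions should agree.

```r
# Loop version, as on the slide
rw2d <- function(n, mu, sigma) {
  steps <- matrix(NA_real_, nrow = n, ncol = 2)
  for (i in 1:n) {
    steps[i, 1] <- rnorm(1, mean = mu, sd = sigma)
    steps[i, 2] <- rnorm(1, mean = mu, sd = sigma)
    mu <- mu / 2
  }
  apply(steps, 2, cumsum)
}

# Hypothetical vectorized variant (not from the slides)
rw2d_vec <- function(n, mu, sigma) {
  means <- mu / 2^(0:(n - 1))               # bias halves at every step
  draws <- rnorm(2 * n, mean = rep(means, each = 2), sd = sigma)
  steps <- matrix(draws, ncol = 2, byrow = TRUE)  # row i holds step i (x, y)
  apply(steps, 2, cumsum)
}

set.seed(1); walk_loop <- rw2d(50, 3, 1)
set.seed(1); walk_vec  <- rw2d_vec(50, 3, 1)
print(all.equal(walk_loop, walk_vec))
```

The slides keep the loop version as the workload for the parallel examples, so this variant is only an aside on what vectorizing this particular function could look like.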
15 A bigger task (mc_parallel.r)
    # Generate random walks of lengths between 1000 and 5000
    # foreach loop
    system.time( results <- foreach(l=1000:5000) %do% rw2d(l, 3, 1) )
    # user system elapsed
    #
    # Apply
    system.time( results <- lapply( 1000:5000, rw2d, 3, 1) )
    # user system elapsed
    #
16 Start a cluster (of R slave workers on a single machine) (mc_parallel.r)
Single node, multiple cores running multiple R slaves
    # Parallel, single node
    library(parallel)
    library(doParallel)
    # Create a cluster of workers using all cores
    cl <- makeCluster( detectCores() )
    # Tell foreach with %dopar% to use this cluster
    registerDoParallel(cl)
    stopCluster(cl)
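A minimal sketch of the same cluster lifecycle using only the base parallel package is shown below. The worker count of 2 and the toy squaring task are assumptions chosen to keep the example light; the slide itself uses detectCores() workers and registers the cluster with foreach.

```r
library(parallel)

# Two workers keep the sketch light; the slide uses detectCores().
cl <- makeCluster(2)

# tryCatch(..., finally = ...) makes sure the workers are released
# even if the parallel call fails.
squares <- tryCatch(
  parSapply(cl, 1:10, function(i) i^2),
  finally = stopCluster(cl)
)
print(squares)
```

Stopping the cluster in a finally clause avoids leaving idle R worker processes behind on a shared node.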
17 R parallel vs MKL conflict
- Intel MKL tries to use all cores for every linear algebra operation
- R is running multiple iterations of a loop in parallel using all cores
- If used together, too many threads/processes are launched: far more than cores!
    export OMP_NUM_THREADS=1            # on terminal before running R
    Sys.setenv(OMP_NUM_THREADS="1")     # within R
~5% improvement by disabling MKL multi-threading
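One way to apply the within-R setting around a parallel section is sketched below. It saves the previous value and restores it afterwards; the placeholder comment marks where the parallel code would go. Whether an already-running session's BLAS picks up the change is an assumption that depends on the library, so exporting the variable in the shell before starting R is the safer route.

```r
# Remember the current setting (NA if the variable is not set)
old <- Sys.getenv("OMP_NUM_THREADS", unset = NA)

# Limit threaded linear algebra to one thread. Set this before creating
# the cluster so that worker processes inherit it.
Sys.setenv(OMP_NUM_THREADS = "1")
threads_during <- Sys.getenv("OMP_NUM_THREADS")

# ... create the cluster and run the parallel section here ...

# Restore the previous state
if (is.na(old)) {
  Sys.unsetenv("OMP_NUM_THREADS")
} else {
  Sys.setenv(OMP_NUM_THREADS = old)
}
print(threads_during)
```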
18 This time in parallel! (mc_parallel.sh)
    cl <- makeCluster( detectCores() )
    registerDoParallel(cl)
    Sys.setenv(OMP_NUM_THREADS="1")
    # Generate random walks of lengths between 1000 and 5000
    # Parallel foreach loop
    system.time( results <- foreach(l=1000:5000) %dopar% rw2d(l, 3, 1) )
    # user system elapsed
    #
    # 5x Speedup
    # Parallel apply
    system.time( results <- parLapply( cl, 1000:5000, rw2d, 3, 1) )
    # user system elapsed
    #
    # 9x Speedup
    stopCluster(cl)
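The parallel apply can be tried at a much smaller scale with the sketch below. The 2 workers, the short walk lengths (100 to 120) and the seed are assumptions made so the example runs quickly; clusterSetRNGStream() is added here, beyond what the slide shows, so that the random numbers drawn on the workers are reproducible.

```r
library(parallel)

# Random walk from the example application slide
rw2d <- function(n, mu, sigma) {
  steps <- matrix(NA_real_, nrow = n, ncol = 2)
  for (i in 1:n) {
    steps[i, 1] <- rnorm(1, mean = mu, sd = sigma)
    steps[i, 2] <- rnorm(1, mean = mu, sd = sigma)
    mu <- mu / 2
  }
  apply(steps, 2, cumsum)
}

lengths_to_run <- 100:120          # far shorter than the slide's 1000:5000

Sys.setenv(OMP_NUM_THREADS = "1")  # avoid oversubscription, as on the slide
cl <- makeCluster(2)               # the slide uses detectCores()
clusterSetRNGStream(cl, 123)       # reproducible random streams on the workers

results <- tryCatch(
  parLapply(cl, lengths_to_run, rw2d, 3, 1),
  finally = stopCluster(cl)
)
print(length(results))
```

At this size the cluster start-up cost will likely outweigh any gain; the speedups quoted on the slide were measured on the full 1000:5000 workload.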
19 MPI parallelization for really big jobs
- MPI is available on R/3.1.2-intel only
- We will continue to use the simple parallel and doParallel packages
- Lots online about snow; this is now behind the scenes in new versions of R
- Please join us for coffee to discuss MPI projects using R
- Work in progress: optimizations with your help
20 MPI parallelization: easy! (mpi_parallel.r)
Just one change in R code!
    cl <- makeCluster( 128, type="MPI" )
Number of MPI tasks = cores per node * nodes (or less if RAM limited)
- 48 cores per node for 256GB partition
- 32 cores per node for other partitions
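The task-count rule above can be written out as a small calculation. The helper mpi_tasks() and its RAM arguments are hypothetical names introduced for illustration only; the per-task memory figure in the second call is an assumed value, not one from the slides.

```r
# Hypothetical helper: MPI tasks = nodes * cores per node,
# reduced when memory per node cannot hold one task per core.
mpi_tasks <- function(nodes, cores_per_node,
                      ram_per_node_gb = NULL, ram_per_task_gb = NULL) {
  per_node <- cores_per_node
  if (!is.null(ram_per_node_gb) && !is.null(ram_per_task_gb)) {
    per_node <- min(per_node, floor(ram_per_node_gb / ram_per_task_gb))
  }
  nodes * per_node
}

# 4 nodes with 32 cores each, no memory limit: 4 * 32 = 128 tasks
print(mpi_tasks(4, 32))

# 4 nodes of the 256GB partition (48 cores), assuming 16 GB per task:
# floor(256 / 16) = 16 tasks per node, so 4 * 16 = 64 tasks
print(mpi_tasks(4, 48, ram_per_node_gb = 256, ram_per_task_gb = 16))
```

The first call matches the 128 passed to makeCluster() on the slide.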
21 MPI parallelization: submitting the job (mpi_parallel.sh)
    #!/bin/bash
    #SBATCH --job-name R_MPI_TEST
    # Number of nodes required to run this job
    #SBATCH -N 4
    # Distribute n tasks per node
    #SBATCH --ntasks-per-node=32
    #SBATCH -t 0-2:0:0
    #SBATCH -o job_%j.out
    #SBATCH -e job_%j.err
    #SBATCH --mail-type ALL
    #SBATCH --mail-user david.trudgian@utsouthwestern.edu
    module load R/3.2.1-intel
    ulimit -l unlimited
    # No mpirun!
    R --vanilla < mpi_parallel.r
    # END OF SCRIPT
22 MPI Performance
    # Sequential (with MKL multi-threading)
    system.time( results <- lapply( 1000:10000, rw2d, 3, 1) )
    # user system elapsed
    #
    # Parallel apply, 4 nodes, 128 MPI tasks
    system.time( results <- parLapply( cl, 1000:10000, rw2d, 3, 1) )
    # user system elapsed
    #
x Speedup
23 BioconductoR
A comprehensive set of Bioinformatics related packages for R: software and datasets.
24 BioconductoR
Base packages installed, plus some commonly used extras.
Install additional packages to home directory:
    source("…")
    biocLite('limma')
Ask for packages that fail to compile.
25 BioconductoR
Bioconductor workflows are fantastic tutorials.
26 BioconductoR Example
DEMO: RNA-Seq Analysis & UCSC Genome Browser
See bioconductor.rmd
27 Dallas R Users Group
University of Dallas, Irving, Saturdays (Accessible by DART Orange Line)