CS 470 Spring Mike Lam, Professor. Performance Analysis
1 CS 470 Spring 2018. Mike Lam, Professor. Performance Analysis
2 Performance analysis: Why do we parallelize our programs?
3 Performance analysis: Why do we parallelize our programs? So that they run faster!
4 Performance analysis: How do we evaluate whether we've done a good job in parallelizing a program?
5 Performance analysis: How do we evaluate whether we've done a good job in parallelizing a program? Asymptotic analysis (e.g., distributed sum) and empirical analysis.
6 Empirical analysis issues
- How do you measure time-to-solution accurately? CPU cycles, OS clock "ticks", wall time, etc.
- How do you compare across systems? Differing CPUs, memories, OSes, etc.
- How do you compare against the original? A 1-core parallel version will likely be slower.
- How do you assess scalability? Does performance improve as you add cores?
- How do you quantify the improvement? Is there a limit to how far we can improve performance?
7 Experimental methods
- Measure wall time for specific code regions of interest
  - Ignore startup and I/O time if not relevant
- Make sure you have a high-resolution timer!
  - /usr/bin/time -v for whole programs
  - gettimeofday() from sys/time.h for Pthreads
  - omp_get_wtime() for OpenMP
  - MPI_Wtime() for MPI
- Use barriers if necessary to make sure all threads/processes have finished before you stop a timer
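The timers named on this slide are C APIs. As a rough Python analogue only (not the course's code), the sketch below times one region with `time.perf_counter()`, a high-resolution wall-clock timer, and uses a `threading.Barrier` so the timer starts after all threads are ready and stops only after every thread has finished. The function name `timed_region` and the sleep-based workload are hypothetical.

```python
import threading
import time

def timed_region(num_threads, work):
    """Return the wall time (seconds) for `work(tid)` run by `num_threads` threads."""
    # One extra party so the timing thread takes part in both barrier rounds.
    barrier = threading.Barrier(num_threads + 1)

    def worker(tid):
        barrier.wait()   # round 1: wait until every thread has started
        work(tid)        # the region of interest
        barrier.wait()   # round 2: announce that this thread has finished

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()

    barrier.wait()                       # thread startup is excluded from the timing
    start = time.perf_counter()
    barrier.wait()                       # returns only after all workers are done
    elapsed = time.perf_counter() - start

    for t in threads:
        t.join()
    return elapsed

if __name__ == "__main__":
    # Hypothetical workload: each thread sleeps for 50 ms.
    print(f"{timed_region(4, lambda tid: time.sleep(0.05)):.4f} s")
```

Without the second barrier, the timing thread could stop the clock while some workers are still running, which is the mistake the last bullet warns about.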
8 Experimental methods
- Control for variance
  - Do all experiments on the same machine or cluster
  - Maximum of one thread per core and one job per node
  - Our cluster can support 8 threads per node (or 16 if hyperthreading, but this is not recommended)
- Run multiple trials and use minimum time
  - Avoid OS interference or noise
- Track variance to measure system noise
  - If your variance is low or if your slowest and fastest times are relatively close, the difference between them is probably just noise!
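A minimal sketch of the "multiple trials, minimum time, track variance" advice, written in Python for illustration. The helper names `run_trials` and `summarize` and the relative "spread" measure are assumptions, not part of the slides.

```python
import statistics
import time

def run_trials(fn, trials=5):
    """Call `fn()` `trials` times and return the list of wall times in seconds."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Return the minimum time plus simple indicators of system noise."""
    fastest, slowest = min(times), max(times)
    return {
        "min": fastest,   # the run least disturbed by OS interference
        "max": slowest,
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        # Relative gap between slowest and fastest run (0.0 means identical).
        "spread": (slowest - fastest) / fastest if fastest > 0 else 0.0,
    }

if __name__ == "__main__":
    # Hypothetical workload standing in for the code region of interest.
    print(summarize(run_trials(lambda: sum(range(100_000)), trials=5)))
```

Reporting the minimum alongside the spread makes it easy to see whether a difference between two configurations is larger than the run-to-run noise.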
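9 Empirical analysis
- $T_S$ = serial time
- $T_P$ = parallel time
- $p$ = # of processes
- $S$ = speedup = $\frac{T_S}{T_P}$ (should increase as $p$ grows)
- $E$ = efficiency = $\frac{S}{p} = \frac{T_S}{p \, T_P}$ (usually decreases as $p$ grows)
10 Empirical analysis
- $T_S$ = serial time
- $T_P$ = parallel time
- $p$ = # of processes
- $S$ = speedup = $\frac{T_S}{T_P}$ (should increase as $p$ grows)
- $E$ = efficiency = $\frac{S}{p} = \frac{T_S}{p \, T_P}$ (usually decreases as $p$ grows)
- $r$ = serial % of original program
- $T_P = \frac{(1 - r) \, T_S}{p} + r \, T_S$
- $S$ = speedup = $\frac{T_S}{\frac{(1 - r) \, T_S}{p} + r \, T_S}$
11 Empirical analysis
- $T_S$ = serial time
- $T_P$ = parallel time
- $p$ = # of processes
- $S$ = speedup = $\frac{T_S}{T_P}$ (should increase as $p$ grows)
- $E$ = efficiency = $\frac{S}{p} = \frac{T_S}{p \, T_P}$ (usually decreases as $p$ grows)
- $r$ = serial % of original program
- $T_P = \frac{(1 - r) \, T_S}{p} + r \, T_S$
- $S$ = speedup = $\frac{T_S}{\frac{(1 - r) \, T_S}{p} + r \, T_S}$
- Amdahl's Law: $S \to \frac{1}{r}$ as $p$ increases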
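The definitions on these slides (speedup = serial time / parallel time, efficiency = speedup / number of processes) can be sketched in a few lines of Python. The function names and all timings in the usage example are made up for illustration and are not measurements from the course cluster.

```python
def speedup(t_serial, t_parallel):
    """S = T_S / T_P."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """E = S / p = T_S / (p * T_P)."""
    return speedup(t_serial, t_parallel) / p

if __name__ == "__main__":
    t_serial = 100.0                                  # hypothetical serial time (s)
    measured = {1: 105.0, 2: 55.0, 4: 30.0, 8: 18.0}  # hypothetical T_P per process count
    for p, t_parallel in measured.items():
        s = speedup(t_serial, t_parallel)
        e = efficiency(t_serial, t_parallel, p)
        print(f"p={p:2d}  S={s:5.2f}  E={e:4.2f}")
```

In this invented data set the speedup grows with the process count while the efficiency falls, which is the usual pattern the slides describe.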
12 Amdahl's Law
- $p$ = # of processors
- $r$ = serial % of program
- $S$ = speedup = $\frac{T_S}{\frac{(1 - r) \, T_S}{p} + r \, T_S}$
- Amdahl's Law: $S \to \frac{1}{r}$ as $p$ increases
  - r = 50%: speedup limited to 2x
  - r = 25%: speedup limited to 4x
  - r = 10%: speedup limited to 10x
  - r = 5%: speedup limited to 20x
- Speedup is limited in inverse proportion to the serial %
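To check the limits listed above numerically, here is a small Python sketch of the Amdahl's Law model. Dividing the slide's speedup formula through by the serial time gives S = 1 / ((1 - r)/p + r), where r is the serial fraction and p the number of processors. The function name `amdahl_speedup` is an assumption.

```python
def amdahl_speedup(r, p):
    """Predicted speedup for serial fraction `r` (0 < r <= 1) on `p` processors."""
    return 1.0 / ((1.0 - r) / p + r)

if __name__ == "__main__":
    for r in (0.50, 0.25, 0.10, 0.05):
        row = "  ".join(f"p={p}: {amdahl_speedup(r, p):6.2f}" for p in (2, 8, 64, 1024))
        print(f"r={r:.2f}  {row}  limit: {1.0 / r:.0f}x")
```

Each row approaches 1/r from below as p grows, so adding processors beyond a certain point buys very little.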
13 Scaling
- Generally, we don't care about any particular $T_P$
  - Or about how it compares to $T_S$ (except as a sanity check)
- More important: how $T_P$, $S$, and $E$ change as $p$ increases
  - And/or as the problem size increases
  - Similar to asymptotic analysis in CS 240
- In general, a program is scalable if $E$ remains fixed as $p$ and the problem size increase at fixed rates
- Most common: graph $T_P$ on y-axis vs. $p$ on logarithmic x-axis
14 Scaling
- Strong scaling: as $p$ increases, $T_P$ decreases
  - Linear speedup: same rate of change (2x procs, half time)
  - Sublinear (most common) / superlinear (exceedingly rare) speedup
- Weak scaling: as $p$ increases AND the problem size increases proportionally, $T_P$ stays roughly the same
[Figure: two plots, "Strong scaling" and "Weak scaling", each with curves marked "good" and "bad"; the weak-scaling plot varies p and the problem size together.]
15 Scaling
- Alternatively:
  - Strong scaling means we can keep the efficiency fixed without increasing the problem size
  - Weak scaling means we can keep the efficiency fixed by increasing the problem size at the same rate as the process/thread count
- $E$ = efficiency = $\frac{S}{p} = \frac{T_S}{p \, T_P}$ (usually decreases as $p$ grows)
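One way to turn the two notions of scaling into numbers is sketched below in Python. Both helpers use the 1-process time as the baseline, which is an assumption (the slides define efficiency against the serial time). The weak-scaling measure t1 / tp is a common convention rather than something stated in the slides, and all timings are invented.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed problem size: ideal time on p processes is t1 / p, so E = t1 / (p * tp)."""
    return t1 / (p * tp)

def weak_scaling_efficiency(t1, tp):
    """Problem size grows in proportion to p: ideal time stays at t1, so E = t1 / tp."""
    return t1 / tp

if __name__ == "__main__":
    # Hypothetical strong-scaling runs: same input, more processes.
    strong = {1: 80.0, 2: 42.0, 4: 23.0, 8: 14.0}
    # Hypothetical weak-scaling runs: input size doubled whenever p doubles.
    weak = {1: 10.0, 2: 10.4, 4: 11.1, 8: 12.5}
    for p in strong:
        es = strong_scaling_efficiency(strong[1], strong[p], p)
        ew = weak_scaling_efficiency(weak[1], weak[p])
        print(f"p={p}  strong E={es:4.2f}  weak E={ew:4.2f}")
```

A program that scales well in either sense keeps the corresponding value close to 1.0 as p grows.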
16 Cluster access
- Detailed instructions online: w3.cs.jmu.edu/lam2mo/cs470/cluster.html
- Connect to login node via SSH
  - Hostname: login.cluster.cs.jmu.edu
  - User/password: (your e-id and password)
- Recommended conveniences
  - Set up public/private key access from stu
  - Set up .ssh/config entries
  - Install Spack for access to more software
17 Cluster access
Things to play with:
- "squeue" or "watch squeue" to see jobs
- "srun <command>" to run an interactive job
  - Use -n <p> to launch p processes
  - Use -N <n> to request n nodes (defaults to p/8)
  - The given <command> will run in every process
- "salloc <command>" to run an interactive MPI job
  - Use -n <p> to launch p MPI processes

    srun hostname
    srun -n 4 hostname
    srun -n 16 hostname
    srun -N 4 hostname
    srun sleep 5
    srun -N 2 sleep 5
    salloc -n 1 mpirun /shared/mpi-pi/mpipi
    salloc -n 2 mpirun /shared/mpi-pi/mpipi
    salloc -n 4 mpirun /shared/mpi-pi/mpipi
    salloc -n 8 mpirun /shared/mpi-pi/mpipi
    salloc -n 16 mpirun /shared/mpi-pi/mpipi
    (etc.)

What's the max n?
18 Job management
- SLURM (Simple Linux Utility for Resource Management) is a piece of system software outside the OS (a.k.a. middleware) that handles job submission and scheduling on our cluster
- An interactive job takes control of your terminal
  - Run with srun or salloc
  - You may interact with it (provide standard input, etc.)
  - You also have to wait for it to finish
  - Similar to a foreground shell job
- A batch job runs in the background without interaction
  - Create a shell script and run it with sbatch
  - Sends output to a file (named slurm-<jobid>.out by default)
  - Use squeue to check whether it has finished
19 Batch jobs
To run a batch job on the cluster, create a shell script and run it with sbatch. Bash example:

    #!/bin/bash
    #
    #SBATCH --job-name=hostname
    #SBATCH --nodes=1
    #SBATCH --ntasks=1

    <your commands go here>
20 Running experiments
Common experimentation patterns in Bash:

    # run 5 times
    for i in $(seq 1 5); do
        <cmd>
    done

    # run common thread counts
    for t in <thread counts>; do
        OMP_NUM_THREADS=$t <cmd>
    done
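The same two patterns (repeat a command several times, and sweep over thread counts via `OMP_NUM_THREADS`) can also be driven from Python, which makes it easy to keep the minimum time per configuration. This is only a sketch: `time_command` is a hypothetical helper, the thread counts are illustrative because the slide's own list did not survive, and the placeholder command is the Python interpreter doing nothing.

```python
import os
import subprocess
import sys
import time

def time_command(cmd, threads, trials=5):
    """Run `cmd` `trials` times with OMP_NUM_THREADS=threads; return the minimum wall time."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)   # raises if the command fails
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute the real program.
    cmd = [sys.executable, "-c", "pass"]
    for t in (1, 2, 4):                            # hypothetical thread counts
        print(f"threads={t}  min time={time_command(cmd, t, trials=3):.4f} s")
```

Timing the whole process this way includes startup and I/O, so for fine-grained measurements a timer inside the program around the region of interest is usually the better choice.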