Performance Metrics: Measuring Performance
1. Measuring Performance
How should the performance of a parallel computation be measured? Traditional measures like MIPS and MFLOPS really don't cut it; new ways to measure parallel performance are needed: speedup and efficiency.
2. Measures of Performance for Parallel Programs
- Speedup: how much faster than a sequential program?
- Efficiency: how efficiently are the processors utilized?

Speedup is the most often used measure of parallel performance. If T_s(N) is the best possible serial time and T(P,N) is the time taken by a parallel algorithm on a problem of size N with P processors, then

    Speedup: S(P,N) = T_s(N) / T(P,N)
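As a quick illustration, here is a minimal Python sketch of this definition (the timing values are hypothetical, not taken from the slides):

```python
def speedup(t_serial, t_parallel):
    """S(P,N) = T_s(N) / T(P,N): best serial time over parallel time."""
    return t_serial / t_parallel

# Hypothetical timings for a fixed problem size N: the best serial run
# takes 120 s, the parallel version on P = 8 processors takes 20 s.
print(speedup(120.0, 20.0))  # 6.0, i.e. 6x faster using 8 processors
```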
3. Measuring Speedup
Empirically: time programs on sequential and parallel machines. Timings must be done under appropriate conditions (e.g., no time sharing).
Theoretically: use estimates of the times for different operations to build an estimate of the total time for the program. Useful for comparing different program organizations.

Read between the lines: exactly what is meant by T_s(N), the time taken to run the fastest serial algorithm on one processor? One processor of the parallel computer? The fastest serial machine available? A parallel algorithm run on a single processor? Is the serial algorithm really the best one? To keep things fair, T_s(N) should be the best possible time in the serial world.
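A minimal sketch of the empirical approach in Python (the workload and repetition count are placeholders; real measurements would also need the quiet, non-time-shared machine noted above):

```python
import time

def measure(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs; taking the
    minimum filters out interference from other processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workload: sum of squares of the first n integers.
t = measure(lambda n: sum(i * i for i in range(n)), 1_000_000)
print(f"best of 5 runs: {t:.4f} s")
```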
4. Practical Speedup
A slightly different definition of speedup also exists: the time taken by the parallel algorithm on one processor divided by the time taken by the parallel algorithm on P processors. However, this is misleading, since many parallel algorithms contain extra operations to accommodate the parallelism (e.g., communication). The result is that the one-processor time exceeds T_s(N), thus exaggerating the speedup.

Components of Execution Time
- Execution of standard instructions (arithmetic, logical, branch)
- Input and output
- Communications time

    T_tot = T_comp + T_io + T_comm
5. Time for Computation
Computation time depends on the number of instructions executed and the time per instruction:

    T_comp = (number of instructions) x (time per instruction)

Although different instructions take different times, we can approximate by using averages for a given machine.

Input/Output
Its importance depends on the problem: it is critical for high-I/O problems (opening an avenue of research into parallel I/O systems and storage) and less important for computationally intensive problems such as simulations.
6. Communications Time
Dependent on the link/switch technology and on the network topology. Components:
- start-up time, or time for the smallest message to get through (latency)
- time for an additional unit of a message to get through (rate = inverse of bandwidth)

Communications time for a message of length k:

    T_comm = T_l + k * T_c

where T_l is the latency (setup time) and T_c is the per-unit rate. Over a multi-hop path, T_c = d * T_r, where d is the distance and T_r is the link rate.
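A small Python sketch tying together the total-time decomposition and the message-cost model above (all numeric constants are made-up illustrative values, not measurements from the slides):

```python
def comm_time(k, t_l, t_c):
    """T_comm = T_l + k * T_c for a message of k units."""
    return t_l + k * t_c

def total_time(n_instr, t_instr, t_io, t_comm):
    """T_tot = T_comp + T_io + T_comm, with T_comp = n_instr * t_instr."""
    return n_instr * t_instr + t_io + t_comm

# Hypothetical machine: 2 ns per (average) instruction, 50 us message
# latency, per-unit rate T_c = d * T_r over a 3-hop path at 10 ns/unit.
t_c = 3 * 10e-9
t_comm = comm_time(k=1024, t_l=50e-6, t_c=t_c)
print(total_time(n_instr=10**6, t_instr=2e-9, t_io=0.0, t_comm=t_comm))
```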
7. Typical Proportions of Times
- T_c is 1 to 10 times T_comp
- T_l is 100 to 1000 times T_comp (distributed-memory MIMD), or worse
- T_r is comparable to T_c for nearest-neighbor communications on a SIMD machine

SIMD Communications
- Nearest neighbor
- Distant (two hop, three hop, etc.)
- Global (reduce): times depend on the underlying network; the best possible is log P steps (P = number of PEs), as the sketch below illustrates
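Here is a minimal Python sketch of a tree reduction, in which each of the ceil(log2 P) rounds halves the number of active PEs (a generic illustration of the log P bound, not any particular machine's network):

```python
import math

def tree_reduce(values):
    """Pairwise tree reduction: combines P values in ceil(log2 P) rounds."""
    rounds = 0
    while len(values) > 1:
        # In each round, adjacent pairs combine in parallel.
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_reduce(list(range(16)))
print(total, rounds)             # 120 in 4 rounds for P = 16
print(math.ceil(math.log2(16)))  # 4: matches the log P bound
```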
8. Factors That Limit Speedup
- Software overhead: even with a completely equivalent algorithm, software overhead arises in the concurrent implementation.
- Load balancing: speedup is generally limited by the speed of the slowest node, so an important consideration is to ensure that each node performs the same amount of work (see the toy illustration below).
- Communication overhead: assuming that communication and calculation cannot be overlapped, any time spent communicating data between processors directly degrades the speedup.

Linear Speedup
Whichever definition is used, the ideal is linear speedup: a speedup of P using P processors. In practice the speedup falls short of this ideal. Superlinear speedup results when unfair values are used for T_s(N) or from differences in the nature of the hardware used.
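The load-balancing point can be made concrete: with no overlap, the parallel time is the maximum over the nodes, so one overloaded node drags down the whole speedup (the work assignments below are made up for illustration):

```python
# Toy work assignments (time units) for 4 nodes; one node got twice the work.
work = [25, 25, 25, 50]
t_serial = sum(work)          # 125 units on a single processor
t_parallel = max(work)        # 50 units: limited by the slowest node
print(t_serial / t_parallel)  # speedup 2.5 instead of the ideal 4
```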
9. Speedup Curves
[Figure: speedup vs. number of processors, showing superlinear, linear, and typical speedup curves]

Efficiency
Speedup does not measure how efficiently the processors are being used: is it worth using 100 processors to get a speedup of 2? Efficiency is defined as the ratio of the speedup to the number of processors required to achieve it:

    E(P,N) = S(P,N) / P

Efficiency is bounded from above by 1.
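A minimal sketch of the metric (the 100-processor numbers echo the rhetorical question above; they are illustrative, not measured):

```python
def efficiency(speedup, p):
    """E(P,N) = S(P,N) / P, bounded above by 1."""
    return speedup / p

# Is it worth using 100 processors to get a speedup of 2?
print(efficiency(2.0, 100))   # 0.02 -- only 2% of the machine does useful work
print(efficiency(80.0, 100))  # 0.80 -- a far healthier utilization
```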
10. Example
[Table: processors, time (secs), speedup, efficiency -- the numeric values did not survive transcription]
[Figure: speedup vs. processors, comparing linear and actual speedup]
11. Timing for Assignment 1
Let's look at FindMax and PeopleWave.

FindMax.pm
MODULE FindMax;
CONST N = 3;
CONFIGURATION grid[1..N],[1..N];
CONNECTION
  right: grid[i,j] <-> grid[i,j+1]:left;
  up:    grid[i,j] <-> grid[i+1,j]:down;
VAR
  i : INTEGER;
  value, buffer : grid OF INTEGER;
12. FindMax.pm (continued)
BEGIN
  value := ID(grid);                 (* each PE starts with its own ID *)
  FOR i := 1 TO N-1 DO               (* fold row values together *)
    buffer := MOVE.left(value);
    IF buffer > value THEN value := buffer END;
  END; (* FOR *)
  FOR i := 1 TO N-1 DO               (* fold column values together *)
    buffer := MOVE.down(value);
    IF buffer > value THEN value := buffer END;
  END; (* FOR *)
  i := value<:1,1:>;                 (* read the result from PE [1,1] *)
  WriteInt(i,10); WriteLn;
END FindMax.
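For readers without a Parallaxis compiler, here is a rough sequential Python re-creation of the same data movement (it uses wraparound shifts for brevity, unlike the bounded grid connections above, but the max-folding idea is the same):

```python
N = 3
# Each PE starts with a distinct value (the Parallaxis code uses ID(grid)).
value = [[i * N + j + 1 for j in range(N)] for i in range(N)]

# Phase 1: N-1 shifts along each row, keeping the running max, so every
# cell ends up holding its row's maximum.
for _ in range(N - 1):
    moved = [[row[(j + 1) % N] for j in range(N)] for row in value]
    value = [[max(v, m) for v, m in zip(vr, mr)] for vr, mr in zip(value, moved)]

# Phase 2: the same trick along the columns propagates the global maximum.
for _ in range(N - 1):
    moved = [value[(i + 1) % N] for i in range(N)]
    value = [[max(v, m) for v, m in zip(vr, mr)] for vr, mr in zip(value, moved)]

print(value[0][0])  # 9: the global maximum of the 3x3 grid
```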
13. PeopleWave.pm
MODULE PeopleWave;
CONST
  GRIDSIZE = 4;
  NUMBEROFNEIGHBORS = 8;
CONFIGURATION grid[1..GRIDSIZE],[1..GRIDSIZE];
CONNECTION
  dir[0] : grid[i,j] <-> grid[i-1, j  ]:dir[4];
  dir[1] : grid[i,j] <-> grid[i-1, j+1]:dir[5];
  dir[2] : grid[i,j] <-> grid[i,   j+1]:dir[6];
  dir[3] : grid[i,j] <-> grid[i+1, j+1]:dir[7];
VAR
  i, k         : INTEGER;
  WaveElement  : grid OF INTEGER;
  Averager     : grid OF INTEGER;
  AllNeighbors : grid OF INTEGER;
  OneNeighbor  : grid OF INTEGER;

BEGIN
  (* Initialize the wave *)
  IF DIM(grid,1) = GRIDSIZE THEN WaveElement := 1 ELSE WaveElement := 0 END;
  (* Divisor for the new WaveElement is normally 3, but only 2 on edges *)
  IF (DIM(grid,2) = 1) OR (DIM(grid,2) = GRIDSIZE) OR
     (DIM(grid,1) = 1) OR (DIM(grid,1) = GRIDSIZE) THEN
    Averager := 2
  ELSE
    Averager := 3
  END;
  WriteInt(WaveElement, 1);

  FOR i := 1 TO GRIDSIZE DO
    (* retrieve and average (weighted) info about neighbors *)
    AllNeighbors := WaveElement;
    FOR k := 0 TO NUMBEROFNEIGHBORS-1 DO
      OneNeighbor := 0;
      SEND.dir[(k+4) MOD 8] (WaveElement, OneNeighbor);
      IF k < 5 THEN
        (* apply the template *)
        AllNeighbors := AllNeighbors + OneNeighbor;
      ELSE
        AllNeighbors := AllNeighbors - OneNeighbor;
      END; (* IF *)
    END; (* FOR k *)
    WaveElement := AllNeighbors DIV Averager;
14. PeopleWave.pm (continued)
    IF WaveElement >= 1 THEN WaveElement := 1 ELSE WaveElement := 0 END;
    WriteInt(WaveElement, 1);
  END; (* FOR i *)
END PeopleWave.

Asymptotic Analysis
Analysis as a variable increases toward infinity. Speedup S(P,N) depends on two variables, so there are three possible types of limit:
- Fixed P, N increases
- Fixed N, P increases
- Both P and N increase in some fixed relationship
15. P Fixed, N Increases
Fixed number of processors, size of problem increases:
- Should see more computation relative to communication, reducing overheads
- Should see asymptotic time similar to a single processor, thus improving efficiency
- May be exceptions if the complexity of the communications grows with problem size

N Fixed, P Increases
In other words, what happens as we use more and more processors to solve a given problem? Amdahl's Law -- the law of diminishing returns -- is based on the observation that every problem has a part which must be computed in sequence. Let the whole problem need time 1 on a single processor. Let s be the necessarily sequential part and p the parallel part, with s + p = 1. Assume the parallel part can be arbitrarily divided with no overhead or communications time. What is the speedup?
16. N Fixed, P Increases

    S(P,N) = 1 / (s + p/P)

In the limit as P --> infinity, S(P,N) --> 1/s. This says that no matter how many processors we have, the speedup is limited by 1/s.

Amdahl's Law
A parallel computation has two types of operations: those which must be executed in serial and those which can be executed in parallel. Amdahl's law states that the speedup of a parallel algorithm is effectively limited by the number of operations which must be performed sequentially.
17. Amdahl's Law
Let the time taken to do the serial calculations be some fraction σ of the total time (0 < σ <= 1); the parallelizable portion is then 1-σ of the total. Assuming linear speedup of the parallel portion on N processors:

    T_serial = σ T_1
    T_parallel = (1-σ) T_1 / N

By substitution:

    Speedup <= 1 / (σ + (1-σ)/N)

Consequences of Amdahl's Law
Say we have a program containing 100 operations, each taking 1 time unit. Suppose σ = 0.2 and we use 80 processors:

    Speedup = 100 / (20 + 80/80) = 100 / 21 < 5

A speedup of only about 5 is possible no matter how many processors are available (the limit is 1/σ = 5). So why bother with parallel computing? Just wait for a faster processor.
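A few lines of Python reproduce these numbers (only the function name is invented; the formula and the σ = 0.2 example come from the slides):

```python
def amdahl_speedup(sigma, n):
    """Speedup <= 1 / (sigma + (1 - sigma)/n) for serial fraction sigma."""
    return 1.0 / (sigma + (1.0 - sigma) / n)

print(amdahl_speedup(0.2, 80))     # ~4.76, i.e. 100/21 from the slide
print(amdahl_speedup(0.2, 10**9))  # ~5.0: the 1/sigma ceiling
```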
18. Avoiding Amdahl
There are several ways to avoid Amdahl's law:
- Concentrate on parallel algorithms with small serial components
- Note that Amdahl's law is incomplete, in that it does not take problem size into account

Amdahl's Law and Assignment 1
Let's look at PeopleWave again.
19-20. PeopleWave.pm
(The original slides repeat the full PeopleWave.pm listing here; see the listing under slides 13-14 above.)
21. Classifying Parallel Programs
Parallel programs can be placed into broad categories based on their expected speedups:
- Trivially parallel: assumes complete parallelism with no overhead due to communication
- Divide and conquer: N log N speedup
- Communication-bound parallelism