Parallel Numerics, WT 2017/2018: 1. Introduction



Scope
Revise standard numerical methods in view of parallel computation: change the method or the implementation!
Required knowledge: numerics, parallel programming, graphs.
Literature:
- Dongarra, Duff, Sorensen, van der Vorst: Numerical Linear Algebra for High-Performance Computers (MAT 657f)
- Pacheco: A User's Guide to MPI (web); Parallel Programming with MPI
- Schüle: Paralleles Rechnen
- Source for some images: comp/
Additional topics: parallel in time, matrix formats, CG on GPU, energy, resilience.

Main aspects to keep in mind: parallel efficiency (single instructions, algorithms), accuracy, stability and resilience, energy consumption. Be aware of certain effects when writing programs, such as data access patterns, breakdown of processors, idle processors, ...

Why parallel computing? Huge problems like weather prediction, quantum simulation, data mining, uncertainty quantification, modelling the universe, ... (TOP500, SuperMUC)

Contents
1 Introduction: 1.1 Computer Science Aspects; 1.2 Numerical Problems; 1.3 Graphs; 1.4 Loop Manipulations
2 Elementary Linear Algebra Problems: 2.1 BLAS; 2.2 Matrix-Vector Operations; 2.3 Matrix-Matrix Product
3 Linear Systems of Equations with Dense Matrices: 3.1 Gaussian Elimination; 3.2 Parallelization; 3.3 QR Decomposition with Householder Matrices
4 Sparse Matrices: 4.1 General Properties, Storage; 4.2 Sparse Matrices and Graphs; 4.3 Reordering; 4.4 Gaussian Elimination for Sparse Matrices
5 Iterative Methods for Sparse Matrices: 5.1 Stationary Methods; 5.2 Nonstationary Methods; 5.3 Preconditioning
6 Domain Decomposition: 6.1 Overlapping Domain Decomposition; 6.2 Non-overlapping Domain Decomposition; 6.3 Schur Complements
7 Eigenvalues, Fourier Transform, Quantum Computing

1.1 Computer Science Aspects of Parallel Numerics
Pipelining in the CPU. Goal: acceleration of single instructions. To note: all ops are created equal. Elementary operations are carried out in pipelines: divide the task into smaller subtasks; each subtask is executed on a piece of hardware that operates concurrently with the other stages of the pipeline. Compare: a production line in car manufacturing! Addition pipeline (figure).

Visualization: pipelining, time steps 1 to 5 (figures).

Visualization: pipelining. Startup time = k clock units (here k = 4); afterwards, one result per clock unit. Total time: k·τ + n·τ = (k + n)·τ for n operands (τ = one clock unit), considering only one pipeline and one vector operation!
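
To make the count concrete, here is a minimal sketch (plain C; k, the loop bounds and τ = 1 are illustrative assumptions, not from the slides) comparing an unpipelined execution, where each operation occupies all k stages alone, with a pipelined one that pays the k-cycle startup only once:

    /* Sketch: unpipelined vs. pipelined time for n operations on a
       k-stage pipeline, in clock units (tau = 1). Illustrative only. */
    #include <stdio.h>

    int main(void) {
        int k = 4;                            /* pipeline depth */
        for (int n = 1; n <= 1000; n *= 10) {
            double t_seq  = (double)k * n;    /* every op passes all k stages alone */
            double t_pipe = (double)k + n;    /* k startup cycles, then 1 result/cycle */
            printf("n=%5d  speedup=%.2f\n", n, t_seq / t_pipe);
        }
        return 0;                             /* speedup approaches k as n grows */
    }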

Advantages of Pipelines
Pipeline filled: one result is achieved per clock unit. All additions should be organized such that the pipeline is always filled! Pipeline nearly empty (e.g., at the beginning of the computation): not efficient! Major task for the CPU: organize all operations such that the operands are just in time at the right position to fill the pipeline and keep it full.

CPU pipelining (figures).

Pipelining: General Steps
- Instruction fetch: get the next command
- Decoding: analyse the instruction, compute the addresses of the operands
- Operand fetch: get the values of the next operands
- Execution step: carry out the command on the operands
- Result write: write the result to memory
Pipelining of these steps, and also inside each step. Chaining pipelines: αx + y combines the multiplication αx and the addition.

Special case: vector scaling. For a set of data the same operation has to be executed on all components:
(y_1, y_2, ..., y_n)^T = α · (x_1, x_2, ..., x_n)^T
for j = 1, 2, ..., n: y_j = α·x_j
Total costs: startup time + vector length · clock time ≈ (pipeline length + vector length) · τ. If the CPU is well organized, the startup time disappears!
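
As a sketch in plain C (function name illustrative), the scaling loop has no dependence between iterations, which is exactly what lets the multiply pipeline stay full:

    /* Vector scaling y = alpha * x: all iterations are independent,
       so the hardware can stream them through the pipeline. */
    #include <stddef.h>

    void scal(size_t n, double alpha, const double *x, double *y) {
        for (size_t j = 0; j < n; ++j)
            y[j] = alpha * x[j];   /* no iteration depends on another */
    }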

Example of a modern CPU architecture (figure). Source: A. Heinecke and C. Trinitis: Cache-oblivious Matrix Algorithms in the Age of Multi- and Many-Cores. Concurrency and Computation: Practice and Experience, pp. 1-20, 2012, DOI: /cpe

Problem: Data Dependency
Example: Fibonacci numbers x_0 = 0, x_1 = 1, x_2 = x_0 + x_1, ..., x_i = x_{i-2} + x_{i-1}. After filling in x_0 and x_1, the next pair needs x_2, which is known only after the first computation is finished! The pipeline always contains only one pair, i.e., it is nearly empty all the time! A similar problem arises for recursive subroutine calls.
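
A small C sketch (sizes illustrative) contrasting the two situations: the Fibonacci loop carries a dependence from one iteration to the next, while a componentwise loop does not:

    /* Sketch: loop-carried dependence (stalls the pipeline) vs.
       independent iterations (keep it full). */
    #include <stdio.h>
    #define N 16

    int main(void) {
        double x[N], y[N];
        x[0] = 0.0; x[1] = 1.0;
        for (int i = 2; i < N; ++i)
            x[i] = x[i-2] + x[i-1];     /* needs the two previous results */
        for (int i = 0; i < N; ++i)
            y[i] = 2.0 * x[i] + 1.0;    /* uses only inputs: independent */
        printf("x[%d] = %g, y[%d] = %g\n", N-1, x[N-1], N-1, y[N-1]);
        return 0;
    }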

Memory organization (figure).

Cache Idea
Cache as a memory buffer between large, slow and small, fast memory. Considering the data flow (last used data), try to predict(!) which data will be requested in the next step: keep last used data in the fast cache, because it is likely that the same data will be used again; keep the neighbourhood of last used data in the fast cache. Memory is organized in chunks: together with the current data, put its surroundings into the cache (cache line). Possible discrepancy between intention and reality!

Cache Idea
Cache hit: the data requested by the higher level is found in the cache: use the cached data and transfer it to the higher level. Done.
Cache miss: the data requested by the higher level is not found in the cache: look for the data in the lower (slower) memory, copy the related data to the cache (removing the oldest cache entry), and transfer it to the higher level.
Temporal locality: data referenced at some point in time will be referenced again in the near future. If data needs to be reused, do it as soon as possible!
Spatial locality: data close to recently referenced data is likely to be referenced soon. Work blockwise to exploit neighbourhood!
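
A hedged C sketch of spatial and temporal locality in action: a blocked (tiled) matrix transpose; the matrix size N and tile size B are illustrative assumptions:

    /* Sketch: blocked matrix transpose. Working on B x B tiles reuses
       each loaded cache line before it is evicted (spatial and
       temporal locality); N is divisible by B here. */
    #define N 1024
    #define B 32

    void transpose_blocked(const double *a, double *t) {
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; ++i)       /* inside one tile */
                    for (int j = jj; j < jj + B; ++j)
                        t[j * N + i] = a[i * N + j];
    }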

Mapping between Memory and Cache
Direct mapping: position in the cache = memory address modulo cache size. Disadvantage: immediate replacement of data in the cache when two addresses map to the same position.
Associative mapping: partition the cache into blocks; write data to the direct-mapped address in one of the blocks; replace the oldest data in the block.
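
A minimal sketch of the direct-mapped rule (line size and set count are illustrative assumptions, not from the slides):

    /* Direct mapping: an address determines exactly one cache set via
       a modulo operation. */
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_SIZE 64        /* bytes per cache line */
    #define NUM_SETS  512       /* lines in the direct-mapped cache */

    int main(void) {
        uintptr_t addr = 0x12345678;
        uintptr_t set  = (addr / LINE_SIZE) % NUM_SETS;
        printf("address 0x%lx -> set %lu\n",
               (unsigned long)addr, (unsigned long)set);
        /* Two addresses LINE_SIZE*NUM_SETS bytes apart land in the same
           set and evict each other immediately: the disadvantage above. */
        return 0;
    }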

Cyclic data distribution. Main memory is often organized in banks connected by a bus: per cycle, n operands can be fetched out of the n banks. Storing vectors: x_1 in bank 1, x_2 in bank 2, .... This allows one-step access to x.

Parallel Processors. Classical von Neumann model: code and data in memory! The control unit fetches instructions and data from memory and sequentially coordinates the operations.

Flynn Taxonomy
Classification of computer systems by the number of instruction streams and data streams:
SISD: single instruction stream, single data stream. Conventional serial computers.
SIMD: a single instruction stream handles multiple data streams. All PEs synchronously execute the same instruction stream on different sets of data. E.g., the SSE (Streaming SIMD Extensions) instruction set, e.g. for graphics processing.
MISD: the machine is capable of processing a single data stream using multiple instruction streams. Uncommon architecture: heterogeneous systems operate on the same data stream and must agree on the result, e.g., for fault tolerance.
MIMD: multiple instruction streams on multiple data streams. General-purpose parallel computers.

Parallel Computation. MIMD architecture: multiple instructions, multiple data (compare to SISD = single instructions, single data, etc.).

Shared Memory Parallelization (SMP) (figure).

Cache Coherence

    parallel for (i = proc_number; i < n; i = i + 2)
        x[i] = 2.;

Two threads, proc_number 0 and 1: thread 1 changes x(0, 2, 4, ...) and thread 2 changes x(1, 3, 5, ...). Each changing step of thread 1 also changes data that is contained in the cache of thread 2 (and vice versa); otherwise the data in the two caches would no longer be consistent! To retain the right values in both caches after each changing step, the value in the other cache also has to be renewed. This leads to a dramatic increase of computational time, possibly even slower than the sequential computation!
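
An OpenMP sketch of the same effect (array size and thread count are illustrative): the interleaved distribution makes both threads write into the same cache lines, while a contiguous blocked distribution avoids the coherence traffic:

    /* Sketch: interleaved vs. blocked distribution of x over 2 threads.
       Both variants compute the same result. */
    #include <omp.h>
    #define N 1000000
    double x[N];

    void interleaved(void) {     /* bad: neighbouring x[i] share a cache line */
        #pragma omp parallel num_threads(2)
        {
            int me = omp_get_thread_num();
            for (int i = me; i < N; i += 2) x[i] = 2.0;
        }
    }

    void blocked(void) {         /* better: one contiguous chunk per thread */
        #pragma omp parallel for schedule(static) num_threads(2)
        for (int i = 0; i < N; ++i) x[i] = 2.0;
    }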

Distributed Memory Parallelization (DMP). Virtual shared memory: distributed data, but organized as shared memory.

Nonuniform Memory Access (NUMA): a cluster of multiple CPUs. Different types of communication: shared memory and distributed memory!

Topology of the processor/memory interconnection (DMP). Mesh (array, grid): p processors, longest path 2(√p - 1). Historically first approaches: vectorization and topology of processors.

Topology of the processor/memory interconnection
The time for sending data from one processor to another depends on the topology of the connection network (worst-case time):
Topology   | worst-case time
Mesh       | 2(√p - 1)
Vector     | p - 1
Ring/Torus | p/2
Tree       | 2(ld(p) - 1)
Hypercube  | ld(p)
Definition: diameter = largest distance between two processors.
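
The table translates directly into formulas; a small C sketch (ld = log2, the value of p is illustrative):

    /* Worst-case communication distances (diameters) as functions of
       the processor count p, following the table above. */
    #include <math.h>
    #include <stdio.h>

    double diam_mesh(double p)      { return 2.0 * (sqrt(p) - 1.0); }
    double diam_vector(double p)    { return p - 1.0; }
    double diam_ring(double p)      { return floor(p / 2.0); }
    double diam_tree(double p)      { return 2.0 * (log2(p) - 1.0); }
    double diam_hypercube(double p) { return log2(p); }

    int main(void) {
        double p = 1024.0;
        printf("mesh %g, vector %g, ring %g, tree %g, hypercube %g\n",
               diam_mesh(p), diam_vector(p), diam_ring(p),
               diam_tree(p), diam_hypercube(p));
        return 0;
    }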

Communication topology: diameter = 2(√p - 1) (Manhattan distance).

Hypercube: diameter = dimension. Example: a vector x_i = x_{i_p ... i_1 i_0} mapped onto the hypercube via the binary digits of its index.
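
A short C sketch of the bit view (dimension k illustrative): node ids are k-bit numbers, neighbours differ in exactly one bit, and routing fixes the differing bits one by one, so at most k hops are needed:

    /* Hypercube neighbours: flip one index bit per dimension. */
    #include <stdio.h>

    int main(void) {
        int k = 4;               /* dimension -> 2^k = 16 nodes */
        unsigned node = 0x5;     /* binary 0101 */
        for (int d = 0; d < k; ++d)
            printf("neighbour across dim %d: 0x%x\n", d, node ^ (1u << d));
        return 0;
    }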

Different Topologies
Topology            | p (nodes)   | diameter(G) | degree(G) | edges(G)
G_1(n) (vector)     | n           | n - 1       | 2         | n - 1
T_1(n) (ring)       | n           | ⌊n/2⌋       | 2         | n
G_2(n,n) (mesh)     | n²          | 2n - 2      | 4         | 2n² - 2n
T_2(n,n) (torus)    | n²          | 2⌊n/2⌋      | 4         | 2n²
BT(h) (binary tree) | 2^(h+1) - 1 | 2h          | 3         | 2^(h+1) - 2
HC(k) (hypercube)   | 2^k         | k           | k         | k·2^(k-1)
(G: grid, T: torus, BT: binary tree, HC: hypercube)

Network based on switches: 3-level Omega network. Blocking network: the simultaneous connections P0→P6 and P1→P7 are not possible! A turn-over of switches is necessary; there is always only one possible position for each pair of outgoing/ingoing switches.

Network based on switches: crossbar. Direct, independent connections between all processors. Nonblocking!

Performance Analysis
Definitions:
f: parallel fraction of the program
1 - f: sequential fraction of the program
t_p: wall clock time to execute the job on p parallel processors
Using p processors in parallel we can achieve speedup. The speedup S_p := t_1 / t_p is the ratio of the execution times with 1 vs. p PEs. In the ideal case we observe t_1 = p·t_p. (Mind the choice of algorithm for t_1!)
Efficiency using p processors: E_p = S_p / p, with 0 ≤ E_p ≤ 1.
E_p ≈ 1: very well parallelizable, because then S_p ≈ p, i.e. t_1 ≈ p·t_p (the problem scales).
E_p ≈ 0: bad, because E_p = S_p / p = t_1 / (p·t_p), i.e. t_1 ≪ p·t_p.
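
A hedged OpenMP sketch of how S_p and E_p are obtained in practice (the loop, sizes and thread counts are illustrative choices):

    /* Measure t_1 and t_p for a trivially parallel loop and report
       S_p = t_1/t_p and E_p = S_p/p. */
    #include <omp.h>
    #include <stdio.h>
    #define N 10000000
    static double a[N];

    double run(int p) {
        double t = omp_get_wtime();
        #pragma omp parallel for num_threads(p)
        for (long i = 0; i < N; ++i) a[i] = 2.0 * i;
        return omp_get_wtime() - t;
    }

    int main(void) {
        double t1 = run(1);
        for (int p = 2; p <= 8; p *= 2) {
            double tp = run(p), S = t1 / tp;
            printf("p=%d  S_p=%.2f  E_p=%.2f\n", p, S, S / p);
        }
        return 0;
    }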

Amdahl's Law
Assumption: for fixed problem size the program can be separated into a parallel fraction f and a sequential fraction 1 - f:
t_p = f·t_1 / p + (1 - f)·t_1
(first term: ideally parallel; second term: strongly sequential)
Amdahl's law:
S_p = t_1 / t_p = 1 / (f/p + (1 - f)) = p / (f + (1 - f)·p) → 1 / (1 - f) for p → ∞
E_p = S_p / p = 1 / (f + (1 - f)·p) → 0 for p → ∞
We always have a small portion of our algorithm that is not parallelizable, so the efficiency will always be zero in the limit!
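
Evaluating the law numerically makes the saturation visible; a minimal C sketch (f = 0.95 is an illustrative choice):

    /* Amdahl speedup S_p = 1 / ((1-f) + f/p); saturates at 1/(1-f). */
    #include <stdio.h>

    double amdahl(double f, double p) { return 1.0 / ((1.0 - f) + f / p); }

    int main(void) {
        double f = 0.95;                      /* 95% parallelizable */
        for (double p = 1; p <= 1024; p *= 4)
            printf("p=%6.0f  S_p=%6.2f\n", p, amdahl(f, p));
        printf("limit 1/(1-f) = %.2f\n", 1.0 / (1.0 - f));
        return 0;
    }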

Amdahl's Law: speedup depending on p; saturation at 1/(1 - f). Stay away from saturation!

Gustafson's Law
Assume that the problem can be solved in 1 unit of time on a parallel machine with p processors, and that the fraction f' is well parallelizable with speedup p. Define f' as the execution time of the parallelizable part in t_p = 1. Compared with this parallel implementation, a uniprocessor would need
t_1 = (1 - f') + f'·p
for the same job.
Speedup: S'_p = t_1 / t_p = (1 - f' + f'·p) / 1 = p + (1 - p)·(1 - f')
Efficiency: E'_p = S'_p / p = (1 - f') / p + f' → f' for p → ∞
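
The analogous sketch for Gustafson's scaled speedup (f' = 0.95 again illustrative) shows linear growth in p and efficiency tending to f' instead of 0:

    /* Gustafson scaled speedup S'_p = (1 - f') + f' * p. */
    #include <stdio.h>

    double gustafson(double fp, double p) { return (1.0 - fp) + fp * p; }

    int main(void) {
        double fp = 0.95;
        for (double p = 1; p <= 1024; p *= 4)
            printf("p=%6.0f  S'_p=%7.2f  E'_p=%.2f\n",
                   p, gustafson(fp, p), gustafson(fp, p) / p);
        return 0;
    }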

Summary:
Amdahl (fixed problem size; sequential part 1 - f, parallel part f):
t_p = (1 - f)·t_1 + f·t_1/p,  S_p = 1 / ((1 - f) + f/p),  E_p = 1 / ((1 - f)·p + f)
Gustafson (fixed parallel time; sequential part 1 - f', parallel part f'):
t_1 = ((1 - f') + f'·p)·t_p,  S'_p = (1 - f') + f'·p,  E'_p = (1 - f')/p + f'

Comparison Amdahl vs. Gustafson
Speedup:
            f = f' = 0 | p → ∞     | f = f' = 1
Amdahl:     1          | 1/(1 - f) | p
Gustafson:  1          | O(p)      | p
Efficiency:
            f = f' = 0 | p → ∞ | f = f' = 1
Amdahl:     1/p        | 0     | 1
Gustafson:  1/p        | f'    | 1

Who Needs Communication?
The communication between tasks depends on the problem:
No need for communication: some problems can be decomposed and executed in parallel with virtually no need for tasks to share data. They are often called embarrassingly parallel, as they are straightforward; very little inter-task communication is required.
Need for communication: most parallel applications are not so simple and do require tasks to share data with each other; changes to neighbouring data have a direct effect on that task's data. Usually there also always exists a sequential part, e.g. I/O.
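
An MPI sketch of an embarrassingly parallel job (the integrand and n are illustrative): each rank works on a disjoint slice and only a single reduction at the end is needed:

    /* Approximate pi by numerical integration; no data is shared during
       the loop, only one MPI_Reduce at the end. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        long n = 100000000;
        double h = 1.0 / n, local = 0.0, pi;
        for (long i = rank; i < n; i += size) {   /* disjoint work */
            double xm = (i + 0.5) * h;
            local += 4.0 / (1.0 + xm * xm);
        }
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("pi approx %.12f\n", pi * h);
        MPI_Finalize();
        return 0;
    }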

Further Keywords
Load balancing: the job should be distributed such that all processors are busy all the time! Avoid idle processors (efficiency).
Deadlock: two or more processors wait indefinitely for an event that can be caused only by one of themselves: each processor is waiting for results of the other processor!
Race condition: non-deterministic outcome/results depending on the chronology of the parallel tasks (shared memory).
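
A classic deadlock in MPI, as a hedged sketch (assumes exactly 2 ranks; do not run as-is): both ranks issue a blocking receive first and wait for a message the other never gets to send; swapping the send/receive order on one rank (or using MPI_Sendrecv) breaks the cycle:

    /* Deadlock: both ranks block in MPI_Recv, neither reaches MPI_Send. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, buf = 0, other;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;                    /* partner rank (0 <-> 1) */

        MPI_Recv(&buf, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);         /* blocks forever */
        MPI_Send(&rank, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }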
