
1 Part One: Introduction to Parallel Computing 10th August 2007

2 Part 1 - Contents
- Reasons for parallel computing
- Goals and limitations
- Criteria for High Performance Computing
- Overview of parallel computer architectures
- Examples of problems demanding parallel processing: well and hard to parallelize
- Relations between algorithmic complexity and parallel computing
- Measures of parallel computing
- Reachable speedups (Amdahl's Law)
- Finding an optimal number of processors
- Typical parallel applications

3 Reasons for Parallel Computing
Parallel computing is often considered the main direction for high performance computing. Specific goals can be:
- Solve problems in a shorter, acceptable time
- Find solutions for big problems (large set of input variables, very high accuracy) in acceptable time
- Map a problem into the memory of a computer
Most of these aspects are supported by progress in technology and processor architecture, but there are limitations.

4 Limiting Factors
Traditionally, performance growth was driven by:
- Packing more and more functions into a processor chip
- Scaling up the processor's clock frequency
Limiting factors, i.e. aspects working against this:
- The area of a processor chip (die area) cannot be enlarged without increasing the time for signal propagation.
- Clock frequency is bounded: signals often must propagate across the chip area within a single clock cycle.
- A further increase of functional density leads to structures that measure only a few atoms.
- More functionality and higher clock frequencies cause more energy consumption and more heating of the processor.

5 How these limitations materialize
Example: 20 cm/ns is a typical signal velocity in copper (electrical wires). A future processor chip could be clocked at 100 GHz; a clock period then is 0.01 ns, so the signal distance within one period is only 0.2 cm.
Thus, the only ways for a performance increase are:
- Parallel utilization of smaller sub-components within a processor chip, not in a common clock domain
- Using multiple processors
- Using multiple computers
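The arithmetic of this example can be checked in a few lines; a minimal Python sketch (variable names are ours, not from the slides):

# Signal distance per clock cycle at a given frequency,
# assuming the slide's 20 cm/ns propagation velocity in copper.
velocity_cm_per_ns = 20.0
freq_ghz = 100.0
period_ns = 1.0 / freq_ghz                  # 100 GHz -> 0.01 ns per cycle
distance_cm = velocity_cm_per_ns * period_ns
print(period_ns, "ns,", distance_cm, "cm")  # 0.01 ns, 0.2 cm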

6 Criteria
(Diagram: better algorithm (fewer operations), degree of parallel processing, mapping onto processors and the memory hierarchy.)
- Algorithm with a minimal number of computation steps: often several algorithms exist, differing in the number of computation steps and in memory consumption.
- Mapping to the processor architecture: use of operations directly corresponding to instructions, memory locality.
- Parallelization: decomposition into independently executable operation streams / use of vector operations.

7 Overview: Parallel Computer Architectures (1)
Definition of a parallel computer by T. Ungerer: A parallel computer consists of multiple processing units that work in a coordinated way and (at least partly) simultaneously in order to solve a problem cooperatively.

8 Overview: Parallel Computer Architectures (2)
Classification (Flynn, 1966): a coarse classification based on the number of independent instruction streams and the number of data streams.

                                     Data streams: Single (SD)   Multiple (MD)
Instruction streams: Single (SI)     SISD                        SIMD
Instruction streams: Multiple (MI)   -                           MIMD

The von Neumann computer is SISD. SIMD and MIMD are extensions of the von Neumann architecture, and both are parallel computers.

9 Overview: Parallel Computer Architectures (3)
MIMD:
- Shared memory multiprocessor systems
  - Servers with many processors (as usual today), with different application components being executed on different processors, e.g. database and web server
  - Multi-core processors
- Distributed memory multiprocessor systems - distributed systems
  - Blade systems (networked computer blades)
  - Cluster computers
  - Networked workstations, used with a parallel runtime environment (e.g. MPI)
  - Parallel computers connected by wide area networks: GRID

10 Overview: Parallel Computer Architectures (4)
SIMD:
- Array computers: a high number of identically structured arithmetical units working synchronously under the control of a single control unit.
- Vector computers: one or multiple specialized arithmetical units that work in a pipeline mode for fast floating point calculations (arithmetical pipelining).

11 Classification
(Diagram: classification tree, annotated with architecture classes and technical traits.)
Parallel computers
- von Neumann
  - MIMD
    - Distributed memory: network topologies, routing
    - Shared memory (UMA, NUMA): cache coherency & memory consistency
  - SIMD
    - Array computers
    - Vector computers
- non-von Neumann
  - Dataflow computers
  - Systolic arrays

12 Example (1/3)
Polynomial: y = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0
Algorithm A1: separate calculation of powers and products
(1) Powers x^1, x^2, x^3, x^4: 3 multiplications
(2) Products a_i x^i: 4 multiplications
(3) Summation: 4 additions
(4) Control (loop over i): 3 additions
A1 requires 7 multiplications and 7 additions.

13 Example (2/3)
Algorithm A2: stepwise calculation (Horner scheme)
(1) i := 1; n := 4;
(2) z := a[n];
(3) z1 := x * z + a[n-i];
(4) i := i + 1;
(5) if (i <= n) { z := z1; goto (3); }
(6) result := z1;
A2 requires 4 multiplications and 8 additions (4 in the evaluation itself, 4 for the loop counter). A2 is the better algorithm compared to A1, because it needs fewer operations.
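For reference, both algorithms as runnable Python; a sketch in which the coefficient list a = [a_0, ..., a_n] and the function names are ours, not from the slides:

def poly_a1(a, x):
    # A1: compute the powers first, then the products, then the sum
    n = len(a) - 1
    powers = [x ** i for i in range(n + 1)]       # x^0 .. x^n
    return sum(ai * p for ai, p in zip(a, powers))

def poly_a2(a, x):
    # A2 (Horner): y = (((a4*x + a3)*x + a2)*x + a1)*x + a0
    z = a[-1]
    for ai in reversed(a[:-1]):
        z = z * x + ai
    return z

coeffs = [5, 4, 3, 2, 1]                          # a_0 .. a_4, illustrative
assert poly_a1(coeffs, 3) == poly_a2(coeffs, 3)   # both give 179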

14 Example (3/3)
Question: can A1 and A2 be parallelized?
A1: y = a_4*x*x*x*x + a_3*x*x*x + a_2*x*x + a_1*x + a_0
A2: y = (((a_4 * x + a_3) * x + a_2) * x + a_1) * x + a_0
(Figure: dependency graphs of the operations of both algorithms.)
A1 delivers its result after 5 time steps, since independent multiplications and additions can run in parallel. A2 delivers its result after 8 time steps.
A1 is the better one in terms of parallel execution; A2 cannot be parallelized, due to data dependencies.

15 Complexity of Algorithms (1)
Definition: time complexity = number of computation steps related to the problem size.
- n ... size of the input data
- T(n) ... exact number of computation steps
- O(n): order of complexity (without constant factors; contains only the dominating functions of n)
Example: T(n) = n + 3n^2 is in O(n^2).

16 Complexity of Algorithms (2)
Hierarchy of complexities:
- Useful algorithms: O(1), O(log n)
- Still useful algorithms: O(n), O(n log n), polynomial
- Critical, practically useless algorithms: O(2^n), O(n!)
Parallel execution of an algorithm is beneficial if:
- its complexity is in the range between logarithmic and polynomial, and
- the algorithm contains a high degree of independent calculations.

17 Complexity and Parallel Computing
(Figure: scaling the problem size; left: single processor, right: linearly growing number of processors.)

18 Complexity and Parallel Computing
Examples:
- Scalar product, O(n): the number of processors needed corresponds directly to the scaled vector size: n_new = d * n_old  =>  p_new = d * p_old.
- Matrix multiplication, O(n^3): parallel matrix multiplication allows bigger problem sizes in constant time: n_new = d * n_old  =>  p_new = d^3 * p_old.
- Generate and test binary numbers of length n, O(2^n): practically not scalable: n_new = n_old + 1  =>  p_new = 2 * p_old.
- Traveling Salesman, O(n!): practically not scalable: n_new = n_old + 1  =>  p_new = n_new * p_old.
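These growth factors follow from requiring T(n_new)/p_new = T(n_old)/p_old; a small Python check (the helper name is ours):

import math

def proc_factor(T, n_old, n_new):
    # processors needed to keep the run time constant grow with the work T(n)
    return T(n_new) / T(n_old)

print(proc_factor(lambda n: n, 100, 200))                  # O(n):   2.0
print(proc_factor(lambda n: n ** 3, 100, 200))             # O(n^3): 8.0
print(proc_factor(lambda n: 2 ** n, 20, 21))               # O(2^n): 2.0 per +1
print(proc_factor(lambda n: math.factorial(n), 10, 11))    # O(n!): 11.0 per +1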

19 A Good Example (1/2) Matrix Multiplication C = A * B

for i:=0 to n-1
  for j:=0 to n-1
    c[i,j] := 0
    for k:=0 to n-1
      c[i,j] := c[i,j] + a[i,k] * b[k,j]
    endfor
  endfor
endfor

Complexity order: O(n^3)
Parallel algorithm: input partitioning - the outer two loops (i, j) are split, and different processes/threads cover these different areas (see the sketch below).
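A minimal sketch of this input partitioning in Python, splitting the i-loop across worker processes (all names are ours; a real implementation would use optimized kernels instead of pure Python):

from concurrent.futures import ProcessPoolExecutor

def rows_block(args):
    # one worker computes its assigned rows of C = A * B
    a, b, rows = args
    n = len(a)
    return [(i, [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)])
            for i in rows]

def parallel_matmul(a, b, p):
    n = len(a)
    blocks = [list(range(r, n, p)) for r in range(p)]   # split the i-loop
    c = [None] * n
    with ProcessPoolExecutor(max_workers=p) as ex:
        for part in ex.map(rows_block, [(a, b, blk) for blk in blocks]):
            for i, row in part:
                c[i] = row
    return c

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul(A, B, 2))   # [[19, 22], [43, 50]]

Because of process start-up and data transfer overhead this only pays off for large matrices; the point here is the partitioning pattern.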

20 A Good Example (2/2) Matrix Multiplication
The table shows the number of steps, written as the product of steps per loop:

n    input size        T_1(n)              T_2(n)              T_4(n)              T_8(n)
10   2*10^2 = 200      10*10*10 = 1000     5*10*10 = 500       5*5*10 = 250        5*5*5 = 125
20   2*20^2 = 800      20*20*20 = 8000     10*20*20 = 4000     10*10*20 = 2000     5*10*20 = 1000
40   2*40^2 = 3200     40*40*40 = 64000    20*40*40 = 32000    20*20*40 = 16000    10*20*40 = 8000
80   2*80^2 = 12800    80*80*80 = 512000   40*80*80 = 256000   40*40*80 = 128000   20*40*80 = 64000

The problem size can be increased, but doubling the problem size requires the processor number to be increased by a factor of 8.

21 A Bad Example (1/2) Traveling Salesman (TSP)
Input: n objects; for each pair of objects i, j a distance cost d_{i,j} in {1, 2, ..., n}
Required result: a permutation p of the objects (p(i) = i-th element) such that
  sum_{i=1}^{n-1} d_{p(i),p(i+1)} + d_{p(n),p(1)}
is minimal.
Time complexity: T = (n-1)!; T = (n-1)!/2 for the symmetric TSP.
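A brute-force solver makes the (n-1)! behaviour concrete; a Python sketch (names are ours, dist assumed to be an n x n cost matrix):

from itertools import permutations

def tsp_bruteforce(dist):
    # fix the start object and enumerate all (n-1)! remaining orders
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1))
        cost += dist[tour[-1]][tour[0]]     # the return edge closes the tour
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour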

22 A Bad Example (2/2)
Experiment: provide (n-1) processors for a problem of size n.

n    T_1(n)            T_{n-1}(n)
4    3! = 6            6/3 = 2
5    4! = 24           24/4 = 6
6    5! = 120          120/5 = 24
10   9! = 362880       362880/9 = 40320
11   10! = 3628800     3628800/10 = 362880

By using n processors we are able to process a problem of size n + 1 in the time a single-processor machine needs for a problem of size n.

23 Measures to Evaluate Parallel Computing: Speedup
Parameters:
- p ... number of processors used
- T_1 ... time steps needed for execution on a single processor
- T_p ... time steps for execution on a parallel computer with p processors
Speedup - how many times faster the program runs:
  S_p = T_1 / T_p
Speedup is normally in the range 1 ... p. If S_p > p, this is caused by additional effects, e.g. better memory utilization or a parallel operating system.

24 Measures: Efficiency
Efficiency - utilization of parallelism:
  E_p = S_p / p
Normally, E_p is in the range 0 ... 1. Ideal algorithms exhibit E_p = 1, independently of p. When E_p on a realistic machine does not sink with an increasing number of processors, we call the system scalable (scalability).

25 Measures: Scaleup
Scaleup - how much more data can be processed in a fixed period of time:
- m ... size of the small problem
- n ... size of the big problem, computed with p processors
  SC_p = n / m, where T_1(m) = T_p(n)
Scaleup depends directly on the time complexity of the algorithm.
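All three measures applied to the matrix-multiplication table of slide 20; a Python sketch (function names are ours):

def speedup(t1, tp):            # S_p = T_1 / T_p
    return t1 / tp

def efficiency(t1, tp, p):      # E_p = S_p / p
    return speedup(t1, tp) / p

t1, t8 = 1000, 125              # T_1(10) and T_8(10) from the table on slide 20
print(speedup(t1, t8))          # 8.0
print(efficiency(t1, t8, 8))    # 1.0 -> ideal utilization of the i/j loop split
# Scaleup: the table gives T_8(20) = 1000 = T_1(10), so SC_8 = 20/10 = 2.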

26 Measures: Reachable Speedup (1)
Ideally, with p processors we can gain a speedup of p - but not always, because most algorithms contain (small) sequential parts.
- a ... fraction of the work that can be parallelized on p processors
- b ... fraction that remains sequential (e.g. due to data dependencies)
a and b are fractions of time consumption related to the entire execution time on a single processor, thus a + b = 1.
Using the speedup formula and normalizing T_1(n) to 1, we obtain:
  S_p = T_1(n) / T_p(n) = (a + b) / (b + a/p) = 1 / ((1 - a) + a/p)

27 Measures: Reachable Speedup (2)
Amdahl's Law (1967):
  S_p = 1 / (b + a/p)
Maximal speedup, using an infinite number of processors:
  lim_{p -> infinity} S_p = 1/b
(Plot: reachable speedup for b varied from 0 to 1; x-axis is b.)
Even a low fraction of non-parallelizable operations may significantly limit the reachable speedup.
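Amdahl's Law in a few lines of Python (names are ours):

def amdahl_speedup(b, p):
    # S_p = 1 / (b + a/p) with a = 1 - b
    a = 1.0 - b
    return 1.0 / (b + a / p)

for b in (0.01, 0.05, 0.1, 0.5):
    print(b, round(amdahl_speedup(b, 1024), 1), "limit:", 1 / b)
# even b = 0.1 caps the speedup at 1/b = 10, however large p gets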

28 Measures: Reachable Speedup (3)
  lim_{p -> infinity} S_p = 1/b
Example: with b = 0.1, the maximum speedup is 10, independently of how many processors are used.

29 Measures: Reachable Speedup (4)
(Plot: speedup over the number of processors p, curves for several b values.)
Even a sequential fraction b > 0.05 means that a noticeable speedup increase is only reached up to a certain number of processors p_x. The bigger b gets, the smaller p_x is.

30 Measures: Optimal number of processors (1)
We use another measure:
  F_p = (S_p * E_p) / T_1
- F_p grows with increasing speedup,
- but F_p sinks with decreasing efficiency.
The division by T_1 only normalizes F_p and is not really necessary in our scope.
F_p reaches a maximum when the optimal number of processors is used.

31 Measures: Optimal number of processors (2)
Apply Amdahl's Law to S_p and calculate E_p and F_p.
(Plot: F_p for several b values - fractions of non-parallelizable operations.)
F_p reaches a maximum when the optimal number of processors is used, so search for the top points of the curves.

32 Measures: Optimal number of processors (3)
Analytical approach: with T_1 normalized to 1,
  F_p = S_p * E_p = (S_p)^2 * (1/p) = (1 / (b + a/p))^2 * (1/p)
set the derivative to zero:
  dF_p/dp = d/dp [ (1 / (b + a/p))^2 * (1/p) ] = 0

33 Measures: Optimal number of processors (4)
We obtain:
  p_opt = ( a^2 / (1 - 2a + a^2) )^(1/2) = a / (1 - a) = a / b
Examples for the optimal p using the analytical approach (b, a, optimal p): the values follow directly from p_opt = a/b, e.g. b = 0.1, a = 0.9 gives p_opt = 9.
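The analytic optimum can be cross-checked by maximizing F_p numerically; a Python sketch (names are ours):

def F(b, p):
    # F_p = (S_p)^2 / p, with T_1 normalized to 1
    a = 1.0 - b
    s = 1.0 / (b + a / p)
    return s * s / p

b = 0.1
p_analytic = (1 - b) / b                          # a/b = 9.0
p_numeric = max(range(1, 1000), key=lambda p: F(b, p))
print(p_analytic, p_numeric)                      # 9.0 9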

34 Typical Parallel Applications (1)
All common applications exhibit a very high fraction of parallelizable operations (b very small).
Linear algebra: operations with vectors and matrices
- Systems of linear equations: A x = b
- Solvers may work in a direct way, e.g. the Gaussian elimination algorithm
- Iterative solvers, e.g. Gauss-Seidel iteration; some very efficient solvers exist for sparse coefficient matrices A
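As an illustration of a parallelizable iterative solver, here is a minimal Jacobi iteration; we deliberately use Jacobi instead of the Gauss-Seidel iteration named above, because Jacobi's component updates are mutually independent and thus parallel-friendly (names and the convergence assumption of a diagonally dominant A are ours):

def jacobi(A, b, iters=100):
    # all n component updates read only the old x, so they can run in parallel
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

print(jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))   # ~[0.0909, 0.6364]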

35 Typical Parallel Applications (2)
Solution of differential equations: equations that contain x, a function y(x) and its derivatives y'(x).
- Numerical solution uses discrete differences instead of symbolic differentiation.
- Approximated values for different values of x can be calculated in parallel (Runge-Kutta algorithm).
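A minimal finite-difference sketch using the forward Euler method, the simplest (first-order) member of the Runge-Kutta family (names and example values are ours):

def euler(f, x0, y0, h, steps):
    # y(x+h) is approximated by y(x) + h * y'(x), the discrete difference
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)
        x += h
    return y

print(euler(lambda x, y: y, 0.0, 1.0, 0.001, 1000))   # ~e = 2.7169...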

36 Typical Parallel Applications (3)
Image processing:
- Local operators, e.g. spectrum spreading or smoothing, can be executed on different image parts in parallel
- Object matching, e.g. detection of geometric forms
- Finding similar blocks in different images for the detection of object movements
- (Soft) real-time multimedia

37 Summary Part 1
- High performance computing with parallel computers
- Goals: solve a problem in a shorter time (speedup), or solve bigger problems in a specified/acceptable time (scaleup)
- Different parallel computer architectures: multiprocessors (shared memory), distributed systems, vector processors, array computers
- Scaleup depends directly on the time complexity of the algorithm; parallelization helps if the order of the time complexity is polynomial or lower
- Speedup is limited by the sequential fraction of operations
- Common parallel applications have a very small sequential fraction of operations
