CME342 - Parallel Methods in Numerical Analysis

1 CME342 - Parallel Methods in Numerical Analysis Parallel Architectures April 2, 2014 Lecture 2

2 Announcements 1. Subscribe to the mailing list: go to lists.stanford.edu and follow the directions in the 1st handout; subscribe to the list named cme342-class. 2. Sign your name up on the list I am passing around if you have not done so. If you are not on the list, you will not have preferred access to cluster resources / GPUs. 3. If you missed lecture notes or handouts, they are on the web page now. 2

3 Some units 1 MFlop = 10^6 flops/sec, 1 MByte = 10^6 bytes; 1 GFlop = 10^9 flops/sec, 1 GByte = 10^9 bytes; 1 TFlop = 10^12 flops/sec, 1 TByte = 10^12 bytes. 3

4 Performance goals 4

5 Microprocessor performance 5

6 Top 500 List November

7 Top 500 List Historical Trends 7

8 Parallel Computers A parallel computer is a collection of CPUs/processing units that cooperate to solve a problem. It can solve a problem faster, or just solve a bigger problem. How large is the collection? How is the memory organized? How do the units communicate and transfer information? This is not a course on parallel architectures: this lecture will be just an overview. You need some knowledge of the underlying hardware to use parallel computers efficiently. 8

9 Categorization of Parallel Architectures Control mechanism: instruction stream and data stream. Process granularity. Address space organization. Interconnection network: static or dynamic. 9

10 Control mechanism (Flynn's taxonomy) SISD: Single Instruction stream, Single Data stream. SIMD: Single Instruction stream, Multiple Data stream. MIMD: Multiple Instruction stream, Multiple Data stream. MISD: Multiple Instruction stream, Single Data stream. 10

11 SIMD Multiple processing elements are under the supervision of a control unit. Examples: Thinking Machines CM-2, MasPar MP-2, Quadrics. SIMD extensions are now present in commercial microprocessors (MMX or SSE in Intel x86, 3DNow! in AMD K6 and Athlon, AltiVec in Motorola G4). 11

12 MIMD Each processing element is capable of executing a different program, independently of the other processors. Most multiprocessors can be classified in this category. 12

13 Process granularity Coarse grain: Cray C90, Fujitsu. Medium grain: IBM SP2, CM-5, clusters. Fine grain: CM-2, Quadrics, Blue Gene/L? 13

14 Address space: Single address space: Uniform Memory Access (UMA), as in SMPs, or Non-Uniform Memory Access (NUMA). Local address spaces: message passing. 14

15 SMP architecture (one CPU + cache per processor, connected by a bus or crossbar switch to shared memory and I/O). An SMP uses shared system resources (memory, I/O) that can be accessed equally from all the processors. Cache coherence is maintained. 15

16 NUMA architecture (each CPU + cache has its own local memory; all are connected by a bus or crossbar switch). Shared address space. Memory latency varies depending on whether you access local or remote memory. Cache coherence is maintained using a hardware or software protocol. 16

17 Message-passing / Distributed Memory (each node has its own CPU, cache, and memory; nodes are connected by a communication network). Local address space. No cache coherence. 17

18 Hybrids? (each node couples a CPU, cache, and memory with a GPU and its device memory over PCIx). Notice that, currently, a CPU can have many cores that share memory within one processor (and a node can have multiple such processors). In addition, accelerator cards with separate memory can be present (heterogeneous architectures: GPUs and others). No cache coherence. Communication network. 18

19 Dynamic interconnections Crossbar switching: the most expensive and extensive interconnection. Bus connected: processors are connected to memory through a common datapath. Multistage interconnection: butterfly, Omega network, perfect shuffle, etc. 19
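
The slide above only names the multistage topologies; as an illustration (not part of the lecture), the short C sketch below computes the perfect-shuffle permutation that wires consecutive stages of an Omega network on N = 2^n inputs. The function and driver are my own, assumed names.

/* Hedged sketch: the perfect shuffle rotates the n-bit node index left by
 * one; output line x of one stage feeds position perfect_shuffle(x) of the
 * next stage of switches. */
#include <stdio.h>

static unsigned perfect_shuffle(unsigned x, unsigned nbits)
{
    unsigned mask = (1u << nbits) - 1u;
    return ((x << 1) | (x >> (nbits - 1))) & mask;  /* left rotation */
}

int main(void)
{
    unsigned nbits = 3;  /* an 8-input network, log2(8) = 3 stages */
    for (unsigned x = 0; x < (1u << nbits); x++)
        printf("line %u -> position %u\n", x, perfect_shuffle(x, nbits));
    return 0;
}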

20 Static interconnection networks Complete interconnection, star interconnection, linear array, mesh (2D/3D mesh, 2D/3D torus), tree and fat-tree network, hypercube network. 20

21 Characteristics of static networks Diameter: maximum distance between any two processors in the network. Complete connection: D = 1; linear array: D = N - 1; ring: D = N/2; 2D mesh: D = 2(sqrt(N) - 1); 2D torus: D = 2*floor(sqrt(N)/2); hypercube: D = log N. 21

22 Characteristics of static networks Bisection width: the minimum number of communication links that have to be removed to partition the network in half. Channel rate: peak rate at which a single wire can deliver bits. Channel bandwidth: the product of channel rate and channel width. Bisection bandwidth B: the product of bisection width and channel bandwidth. 22
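
As a hedged illustration of how the metrics on slides 21-22 combine (not from the lecture; the processor count and link bandwidth below are assumed for the example), this C sketch evaluates diameter and bisection bandwidth for a ring, a sqrt(N) x sqrt(N) 2D mesh, and a hypercube with N processors.

/* Hedged sketch: diameter (slide 21) and bisection bandwidth = bisection
 * width * channel bandwidth (slide 22) for three static topologies. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    int N = 64;                 /* number of processors (assumed) */
    double channel_bw = 1.0e9;  /* bytes/s per link (assumed) */
    int side = (int) sqrt((double) N);

    int ring_diam  = N / 2;                   int ring_bisect  = 2;
    int mesh_diam  = 2 * (side - 1);          int mesh_bisect  = side;
    int hyper_diam = (int) round(log2((double) N));
    int hyper_bisect = N / 2;

    printf("ring:      D = %2d, B = %g bytes/s\n", ring_diam,  ring_bisect  * channel_bw);
    printf("2D mesh:   D = %2d, B = %g bytes/s\n", mesh_diam,  mesh_bisect  * channel_bw);
    printf("hypercube: D = %2d, B = %g bytes/s\n", hyper_diam, hyper_bisect * channel_bw);
    return 0;
}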

23 Linear array, ring, mesh, torus Processors are arranged as a d-dimensional grid or torus. 23

24 Tree, Fat-tree Tree network: there is only one path between any pair of processors. Fat-tree network: increases the number of communication links close to the root. 24

25 Hypercube 1-D 2-D 3-D 25

26 Binary Reflected Gray code: G(i,d) denotes the i-th entry in the sequence of d-bit Gray codes. The (d+1)-bit code is derived from the d-bit code by reflecting the table and prefixing the reflected entries with 1 and the original entries with 0. 26
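
The reflect-and-prefix construction above has a convenient closed form: the i-th binary reflected Gray code is i XOR (i >> 1). The C sketch below (an illustration I added, not lecture code) prints the 3-bit table generated this way; it matches the sequence obtained by reflection.

/* Hedged sketch: G(i,d) via the closed form i ^ (i >> 1). */
#include <stdio.h>

static unsigned gray(unsigned i) { return i ^ (i >> 1); }

int main(void)
{
    int d = 3;  /* print the 3-bit (8-entry) table */
    for (unsigned i = 0; i < (1u << d); i++) {
        unsigned g = gray(i);
        printf("G(%u,%d) = ", i, d);
        for (int b = d - 1; b >= 0; b--)       /* print d bits, MSB first */
            putchar(((g >> b) & 1u) ? '1' : '0');
        putchar('\n');
    }
    return 0;
}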

27 Example of BRG code 1-bit: 0, 1; 2-bit: 00, 01, 11, 10; 3-bit: 000, 001, 011, 010, 110, 111, 101, 100. Mapping an 8-processor ring onto an 8-processor hypercube: ring node i goes to hypercube node G(i,3). 27

28 Embedding other networks into hypercubes The hypercube is a rich topology; many other networks can be easily mapped onto it. Mapping a linear array into a hypercube: a linear array (or ring) of 2^d processors can be embedded into a d-dimensional hypercube by mapping processor i onto processor G(i,d) of the hypercube. Mapping a 2^r x 2^s mesh onto a hypercube: processor (i,j) ---> G(i,r) G(j,s) (juxtaposition denotes concatenation). 28
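
To make the two embeddings concrete, the following C sketch (my illustration, not from the slides) maps a ring of 2^d processors and a 2^r x 2^s mesh onto hypercube node numbers using the Gray code; adjacent ring or mesh processors land on hypercube nodes whose labels differ in exactly one bit.

/* Hedged sketch of the embeddings described on slide 28. */
#include <stdio.h>

static unsigned gray(unsigned i) { return i ^ (i >> 1); }       /* G(i,d) */

/* ring / linear-array processor i -> hypercube node G(i,d) */
static unsigned ring_to_hypercube(unsigned i) { return gray(i); }

/* mesh processor (i,j) -> concatenation G(i,r) G(j,s):
 * the high r bits are G(i,r), the low s bits are G(j,s) */
static unsigned mesh_to_hypercube(unsigned i, unsigned j, unsigned s)
{
    return (gray(i) << s) | gray(j);
}

int main(void)
{
    unsigned r = 2, s = 2;  /* a 4 x 4 mesh embedded in a 4-D hypercube */
    for (unsigned i = 0; i < (1u << r); i++)
        for (unsigned j = 0; j < (1u << s); j++)
            printf("mesh (%u,%u) -> node %u\n", i, j, mesh_to_hypercube(i, j, s));
    printf("ring 3 -> node %u\n", ring_to_hypercube(3));
    return 0;
}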

29 Trade-off among different networks (minimum latency, maximum bandwidth per processor, wires, switches, example):
Completely connected: constant latency, constant bandwidth, O(p*p) wires
Crossbar: constant latency, constant bandwidth, O(p) wires, O(p*p) switches (e.g. Cray)
Bus: constant latency, O(1/p) bandwidth, O(p) wires, O(p) switches (e.g. SGI Challenge)
Mesh: O(sqrt p) latency, constant bandwidth, O(p) wires (e.g. Intel ASCI Red)
Hypercube: O(log p) latency, constant bandwidth, O(p log p) wires (e.g. SGI Origin)
Switched: O(log p) latency, constant bandwidth, O(p log p) wires, O(p log p) switches (e.g. IBM SP-2)

30 Beowulf Cluster built with commodity hardware components: PC hardware (x86, Alpha, PowerPC), commercial high-speed interconnect (100Base-T, Gigabit Ethernet, Myrinet, SCI, InfiniBand), Linux or FreeBSD operating system. 30

31 Appleseed: PowerPC cluster 31

32 Clusters of SMPs The next generation of supercomputers will have thousands of SMP nodes connected. Increase the computational power of the single node; keep the number of nodes low. New programming approach needed: MPI + threads (OpenMP, Pthreads, ...). Examples: ASCI White, Compaq SC, IBM SP

33 Multithreaded architecture The MTA system provides scalable shared memory, in which every processor has equal access to every memory location: no concerns about the layout of memory. Each MTA processor has up to 128 RISC-like virtual processors. Each virtual processor is a hardware stream with its own instruction counter, register set, stream status word, and target and trap registers. A different hardware stream is activated every clock period. (Now a Cray product.) 33

34 Earth Simulator Between 2002 and 2004, the Earth Simulator was the most powerful supercomputer in the world: 40 Teraflops of peak performance, 10 Terabytes of memory. Uses a crossbar switch to connect 640 nodes: 3,000 km of cables! Each node has 8 vector processors. Sustained performance of up to 20 TFlops on climate simulation and 15 TFlops on a 4096^3 isotropic turbulence simulation. 34

35 BlueGene/L From 2004 to 2007, BGL was the #1 computer on the TOP500 supercomputer list: 32 x 32 x 64 3D torus, 131,000 processors, system-on-a-chip design with low-cost, low-power processors, 360 teraflops peak, 280 teraflops sustained (Linpack), 32 tebibytes of memory. 35

36 Tianhe-1 and -2 National Super Computer Center in Guangzhou 3,120,000 cores Intel Xeon E5-2692, 2.2GHz Intel Xeon Phi 31S1P accelerator cards TH Express-2 interconnect Linpack performance: 33.8 PFlop/s Power: 18.8 MWatt Intel CC compiler, Intel MKL MPICH2 for communication 36

37 Programming models Shared memory: automatic parallelization, Pthreads, compiler directives (OpenMP). Message passing: MPI (Message Passing Interface), PVM (Parallel Virtual Machine). HPF: High Performance Fortran. 37

38 Pthreads POSIX threads: standard definition but non-standard implementations. Hard to code. 38

39 More Recent Models Extracting high performance from Graphics Processing Units (GPUs): CUDA for NVIDIA cards, OpenCL for CPUs / GPUs / DSPs / FPGAs, OpenACC for heterogeneous systems. And the list goes on... what should you do? 39

40 OpenMP New de facto standard, available on all major platforms. Easy to implement. Single source for parallel and serial code. 40

41 MPI Standard parallel interface: MPI-1. MPI-2: extends MPI-1 with one-sided communication. Need to rewrite the code. Code portable across all architectures. 41
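
The lecture's MPI example (slides 45-46) uses collectives; as a hedged sketch of the one-sided communication added by MPI-2 (my example, not lecture code), rank 0 below writes directly into a memory window exposed by rank 1. Run with at least two processes.

/* Hedged sketch: MPI-2 one-sided communication with MPI_Put. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* every rank exposes one int through the window */
    MPI_Win_create(&value, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        int msg = 42;
        /* write msg into rank 1's window at displacement 0 */
        MPI_Put(&msg, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1) printf("rank 1 received %d\n", value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}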

42 PVM Parallel Virtual Machine: another popular message-passing interface. Useful in environments with multiple vendors and for the MPMD approach. 42

43 Simple code to compute π

      program compute_pi
      integer n, i
      double precision w, x, sum, pi, f, a
c function to integrate
      f(a) = 4.d0 / (1.d0 + a*a)
      print *, 'Enter number of intervals: '
      read *, n
c calculate the interval size
      w = 1.0d0/n
      sum = 0.0d0
      do i = 1, n
         x = w * (i - 0.5d0)
         sum = sum + f(x)
      end do
      pi = w * sum
      print *, 'computed pi = ', pi
      stop
      end

44 OpenMP code

      program compute_pi
      integer n, i
      double precision w, x, sum, pi, f, a
c function to integrate
      f(a) = 4.d0 / (1.d0 + a*a)
      print *, 'Enter number of intervals: '
      read *, n
c calculate the interval size
      w = 1.0d0/n
      sum = 0.0d0
!$OMP PARALLEL DO PRIVATE(x), SHARED(w)
!$OMP& REDUCTION(+: sum)
      do i = 1, n
         x = w * (i - 0.5d0)
         sum = sum + f(x)
      end do
!$OMP END PARALLEL DO
      pi = w * sum
      print *, 'computed pi = ', pi
      stop
      end

45 MPI code

      program compute_pi
      include 'mpif.h'
      double precision mypi, pi, w, sum, x, f, a
      integer n, myid, numprocs, i, rc, ierr
c function to integrate
      f(a) = 4.d0 / (1.d0 + a*a)
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
      if ( myid .eq. 0 ) then
         print *, 'Enter number of intervals: '
         read *, n
      endif
      call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
c calculate the interval size
      w = 1.0d0/n
      sum = 0.0d0
      do i = myid+1, n, numprocs
         x = w * (i - 0.5d0)
         sum = sum + f(x)
      enddo
      mypi = w * sum

46 MPI code (continued)

c collect all the partial sums
      call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0,
     $                MPI_COMM_WORLD, ierr)
c node 0 prints the answer
      if (myid .eq. 0) then
         print *, 'computed pi = ', pi
      endif
      call MPI_FINALIZE(rc)
      stop
      end

47 Pthreads code

#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>
#include <pthread.h>
#include <stdlib.h>

float sumall, width;
int i, iend;
private int ibegin;
private int cut;
private float x;
private float xsum;
void *do_work();

main(argc, argv)
int argc;
char *argv[];
{
    /* Pi - Program loops over slices in interval, summing area of each slice */
    struct tms time_buf;
    float ticks, t1, t2;
    int intrvls, numthreads, istart;
    pthread_t pt[32];

    /* get intervals from command line */
    intrvls = atoi(argv[1]);
    numthreads = atoi(getenv("PL_NUM_THREADS"));
    printf(" intervals = %d PL_NUM_THREADS = %d \n", intrvls, numthreads);

    /* get number of clock ticks per second and initialize timer */
    ticks = sysconf(_SC_CLK_TCK);
    t2 = times(&time_buf);

    /* - - Compute width of cuts */
    width = 1. / intrvls;
    sumall = 0.0;

48 Pthreads code (continued)

    /* - - Loop over interval, summing areas */
    istart = 1;
    iend = intrvls / numthreads;
    for (i = 0; i < numthreads - 1; i++)
    {
        pthread_create(&pt[i], pthread_attr_default, do_work, (void *) istart);
        istart += iend;
    }
    do_work(istart);
    istart += iend;
    for (i = 0; i < numthreads - 1; i++)
    {
        pthread_join(pt[i], NULL);
    }
    /* - - finish any remaining slices */
    iend = intrvls - (intrvls / numthreads) * numthreads;
    if (iend) do_work(istart);
    /* - - Finish overall timing and write results */
    t1 = times(&time_buf);
    printf("time in main = %20.14e sum = %20.14f \n", (t1 - t2) / ticks, sumall);
    printf("error = %20.15e \n", sumall);
}

void *do_work(istart)
int istart;
{
    ibegin = istart;
    xsum = 0.0;
    for (cut = ibegin; cut < ibegin + iend; cut++)
    {
        x = (((float) cut) - .5) * width;
        xsum += width * 4. / (1. + x * x);
    }
    sumall += xsum;
}

49 PVM code 1/3

      program compute_pi_master
      include '~/pvm3/include/fpvm3.h'
      parameter (NTASKS = 5)
      parameter (INTERVALS = 1000)
      integer mytid
      integer tids(NTASKS)
      real sum, area
      real width
      integer i, numt, msgtype, bufid, bytes, who, info

      sum = 0.0
C Enroll in PVM
      call pvmfmytid(mytid)
C spawn off NTASKS workers
      call pvmfspawn('comppi.worker', PVMDEFAULT, ' ',
     +               NTASKS, tids, numt)
      width = 0.0
      i = 0
C Multi-cast initial dummy message to workers
      msgtype = 0
      call pvmfinitsend(0, info)
      call pvmfpack(INTEGER4, i, 1, 1, info)
      call pvmfpack(REAL4, width, 1, 1, info)
      call pvmfmcast(NTASKS, tids, msgtype, info)
C compute interval width
      width = 1.0 / INTERVALS
C for each interval: 1) receive area from worker, 2) add area to sum,
C 3) send worker new interval number and width
      do i = 1, INTERVALS
         call pvmfrecv(-1, -1, bufid)
         call pvmfbufinfo(bufid, bytes, msgtype, who, info)
         call pvmfunpack(REAL4, area, 1, 1, info)
         sum = sum + area
         call pvmfinitsend(PVMDATADEFAULT, info)
         call pvmfpack(INTEGER4, i, 1, 1, info)
         call pvmfpack(REAL4, width, 1, 1, info)
         call pvmfsend(who, msgtype, info)
      enddo

50 PVM code 2/3

C Signal to workers that tasks are done
      i = -1
      call pvmfinitsend(0, info)
      call pvmfpack(INTEGER4, i, 1, 1, info)
      call pvmfpack(REAL4, width, 1, 1, info)

C Collect the last NTASKS areas and send the completion signal
      do i = 1, NTASKS
         call pvmfrecv(-1, -1, bufid)
         call pvmfbufinfo(bufid, bytes, msgtype, who, info)
         call pvmfunpack(REAL4, area, 1, 1, info)
         sum = sum + area
         call pvmfsend(who, msgtype, info)
      enddo

      print 10, sum
 10   format(1x, 'computed value of Pi is ', F8.6)

      call pvmfexit(info)
      end

51 PVM code 3/3

      include '~/pvm3/include/fpvm3.h'
      integer mytid, master
      real area
      real width, int_val, height
      integer int_num
C Enroll in PVM
      call pvmfmytid(mytid)
C who is sending me work?
      call pvmfparent(master)

C receive first job from the master
      call pvmfrecv(-1, -1, info)
      call pvmfunpack(INTEGER4, int_num, 1, 1, info)
      call pvmfunpack(REAL4, width, 1, 1, info)

C While I've not been sent the signal to quit, I'll keep processing
 40   if (int_num .eq. -1) goto 50
C compute interval value from interval number
      int_val = int_num * width
C compute height of given rectangle
      height = F(int_val)
C compute area
      area = height * width
C send area back to master
      call pvmfinitsend(PVMDATADEFAULT, info)
      call pvmfpack(REAL4, area, 1, 1, info)
      call pvmfsend(master, 9, info)
C Wait for next job from master
      call pvmfrecv(-1, -1, info)
      call pvmfunpack(INTEGER4, int_num, 1, 1, info)
      call pvmfunpack(REAL4, width, 1, 1, info)
      goto 40
C all done
 50   call pvmfexit(info)
      end

      REAL FUNCTION F(X)
      REAL X
      F = 4.0/(1.0+X**2)
      RETURN
      END
