CS475 Parallel Programming

1 CS475 Parallel Programming: Dense Matrix Multiply
  Wim Bohm, Colorado State University
  Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.

2 Block mapping a matrix onto p PEs
  Blocked / Checkerboard: n x n matrix on p PEs
    Map n/sqrt(p) x n/sqrt(p) blocks onto PEs
    Maps well on a 2D mesh
    Finest granularity: 1 element per PE (p = n*n)
  Many matrix algorithms allow block formulation
    Matrix add
    Matrix multiply

3 n x n Matrix Multiply
  for i = 0 to n-1
    for j = 0 to n-1
      Cij = 0
      for k = 0 to n-1
        Cij += Aik * Bkj
  We do not consider recursive algorithms that run in less than O(n^3) time (such as Strassen). These could be the top level, driving the O(n^3) algorithms described here.
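The loop nest above can be sketched directly in C. This is a minimal illustration, not code from the slides: the function name `matmul_ijk` and the row-major `int` storage are assumptions chosen to match the deck's later use of `int` blocks.

```c
#include <stddef.h>

/* Hypothetical helper: C = A * B for n x n matrices stored row-major.
   Element (i,j) lives at index i*n + j. */
void matmul_ijk(size_t n, const int *A, const int *B, int *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int sum = 0;                       /* Cij = 0 */
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;                /* inner product of row i and column j */
        }
    }
}
```

Each Cij is finished before the next one starts, which is why this ordering is the inner product formulation.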

4 n x n Matrix Multiply
  for i = 0 to n-1
    for j = 0 to n-1
      Cij = 0
      for k = 0 to n-1
        Cij += Aik * Bkj
  The outer i,j indices of A and B determine the target C element or block

5 n x n Matrix Multiply
  for i = 0 to n-1
    for j = 0 to n-1
      Cij = 0
      for k = 0 to n-1
        Cij += Aik * Bkj
  A and B elements or blocks are multiplied together only if their inner indices (k) are equal
  + is associative and commutative, and * distributes over +, so the order in which we compute and accumulate the block products does not matter!

6 n^2 inner products
  Cij = Ai* . B*j
  [Figure: row i of A (Ai*) and column j of B (B*j) combine into element Cij of C]

7 n outer products
  for k = 0 to n-1
    forall i in [0,n-1]
      forall j in [0,n-1]
        Cij += Aik * Bkj
  Outer i,j indices of A and B are the target C indices
  Inner indices k of A and B are equal
  [Figure: column k of A (A*k) times row k of B (Bk*) updates all of C]
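The outer product ordering can be sketched in C by moving the k loop outermost. The name `matmul_outer` and the row-major layout are assumptions for illustration; the two `forall` loops are written as ordinary sequential loops here, although all their iterations are independent for a fixed k.

```c
#include <stddef.h>

/* Hypothetical helper: C += A * B as a sum of n outer products.
   The caller must zero C first. Matrices are n x n, row-major. */
void matmul_outer(size_t n, const int *A, const int *B, int *C)
{
    for (size_t k = 0; k < n; k++) {           /* one outer product per k */
        for (size_t i = 0; i < n; i++) {       /* forall i */
            for (size_t j = 0; j < n; j++) {   /* forall j */
                /* column k of A times row k of B touches every Cij once */
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
```

The result equals the inner product version because only the order of the additions changes.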

8 3x3 example: inner product
  A00 A01 A02       B00 B01 B02
  A10 A11 A12   X   B10 B11 B12   =
  A20 A21 A22       B20 B21 B22

  A00 B00 + A01 B10 + A02 B20    A00 B01 + A01 B11 + A02 B21    A00 B02 + A01 B12 + A02 B22
  A10 B00 + A11 B10 + A12 B20    A10 B01 + A11 B11 + A12 B21    A10 B02 + A11 B12 + A12 B22
  A20 B00 + A21 B10 + A22 B20    A20 B01 + A21 B11 + A22 B21    A20 B02 + A21 B12 + A22 B22

9 3x3 example: outer 1
  Same A, B and product as in the 3x3 inner product example. Outer product 1 (k = 0: column 0 of A times row 0 of B) contributes the term Ai0 B0j to every Cij.

10 3x3 example: outer 2
  Outer product 2 (k = 1: column 1 of A times row 1 of B) contributes the term Ai1 B1j to every Cij.

11 3x3 example: outer 3
  Outer product 3 (k = 2: column 2 of A times row 2 of B) contributes the term Ai2 B2j to every Cij.

12 Blocked outer product
  [Figure: block column A*k of A and block row Bk* of B update every block Cij of C]

14 Blocked outer product
  Notice: the width of an A block row does not need to be equal to the width of a B block column
  [Figure: block column A*k of A and block row Bk* of B update every block Cij of C]

15 Blocked Matrix Multiply
  The standard inner product algorithm can be blocked
  p processors: n/sqrt(p) x n/sqrt(p) sized blocks
  PEij has blocks Aij and Bij and computes block Cij
  Cij needs Aik and Bkj for every block index k = 0 to sqrt(p)-1
  Assuming a blocked data distribution for all three matrices, some form of communication is needed
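Before any communication enters the picture, the block formulation itself can be sketched sequentially in C. The function name `matmul_blocked`, the block size parameter `bs`, and the assumption that `bs` divides `n` are illustrative choices, not part of the slides.

```c
#include <stddef.h>

/* Hypothetical sketch: C += A * B computed block by block.
   Matrices are n x n, row-major; bs is the block side and must divide n.
   The caller must zero C first. */
void matmul_blocked(size_t n, size_t bs, const int *A, const int *B, int *C)
{
    for (size_t bi = 0; bi < n; bi += bs) {
        for (size_t bj = 0; bj < n; bj += bs) {
            for (size_t bk = 0; bk < n; bk += bs) {
                /* block C(bi,bj) += block A(bi,bk) * block B(bk,bj) */
                for (size_t i = bi; i < bi + bs; i++)
                    for (size_t j = bj; j < bj + bs; j++)
                        for (size_t k = bk; k < bk + bs; k++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
```

In the parallel setting, the two outer loops correspond to the PE grid (PEij owns the `(bi, bj)` block of C), and the `bk` loop is where the A and B blocks of other PEs are needed.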

16 [Figure, built up over slides 16 to 18: block row Ai* of A and block column B*j of B are multiplied block pair by block pair and accumulated into block Cij of C, etc.]

19 Simple Block Matrix Multiply
  All PEs in a row need all row blocks of A
    One block: n^2/p
    All-to-all block broadcast of A in a row of PEs: O(sqrt(p) * (n^2/p))
  All PEs in a column need all column blocks of B
    All-to-all block broadcast of B in column PEs: O(sqrt(p) * (n^2/p))
  Compute block Cij in PEij: n^3/p time
  Space use for A and B
    Per PE: 2*sqrt(p)*(n^2/p) = 2n^2/sqrt(p)
    Total: 2n^2 sqrt(p), a replication factor of sqrt(p)
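The space figures on this slide follow from a short calculation. After the broadcasts each PE holds sqrt(p) blocks of A (its block row) and sqrt(p) blocks of B (its block column), each of n^2/p elements. The numeric instance at the end (n = 1024, p = 16) is an assumed example, not taken from the slides.

```latex
\begin{align*}
M_{\text{PE}}    &= 2\sqrt{p}\cdot\frac{n^2}{p} = \frac{2n^2}{\sqrt{p}} \\
M_{\text{total}} &= p \cdot M_{\text{PE}} = 2n^2\sqrt{p} \\
\frac{M_{\text{total}}}{2n^2} &= \sqrt{p} \qquad \text{(replication factor)} \\[4pt]
n = 1024,\ p = 16:\quad
M_{\text{PE}} &= \frac{2\cdot 1024^2}{4} = 524288, \qquad
M_{\text{total}} = 16 \cdot 524288 = 8388608 = 4 \cdot (2 \cdot 1024^2)
\end{align*}
```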

20 Cannon's Matrix Multiply
  Avoids space overhead
    A processor has no more than 1 A block, 1 B block, and 1 C block
    Interleaves block moves and computation
  PEij computes block Cij
  Initial alignment of data
    Circular left shift block Aij by i steps
    Circular up shift block Bij by j steps
  Interleave computation and communication
    Compute: block matrix multiplication
    Communicate: circular shift left A blocks, circular shift up B blocks
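The alignment and shift pattern can be checked with a small sequential simulation in C. This is a sketch under stated assumptions, not the parallel algorithm itself: the grid side `Q` is fixed at 4, a single `int` stands in for each block, and the local arrays `a` and `b` model what each PE currently holds. The name `cannon` is hypothetical.

```c
#define Q 4   /* assumed grid side, sqrt(p); one element stands in for one block */

/* Hypothetical sequential simulation of Cannon's algorithm.
   a[i][j] and b[i][j] model the single A and B block held by PE(i,j). */
void cannon(int A[Q][Q], int B[Q][Q], int C[Q][Q])
{
    int a[Q][Q], b[Q][Q], ta[Q][Q], tb[Q][Q];
    int i, j, step;

    /* Initial alignment: row i of A shifts left by i, column j of B shifts up by j */
    for (i = 0; i < Q; i++) {
        for (j = 0; j < Q; j++) {
            a[i][j] = A[i][(j + i) % Q];
            b[i][j] = B[(i + j) % Q][j];
            C[i][j] = 0;
        }
    }

    for (step = 0; step < Q; step++) {
        /* Compute: every PE multiplies its current pair and accumulates */
        for (i = 0; i < Q; i++)
            for (j = 0; j < Q; j++)
                C[i][j] += a[i][j] * b[i][j];

        /* Communicate: A blocks shift left by one, B blocks shift up by one */
        for (i = 0; i < Q; i++) {
            for (j = 0; j < Q; j++) {
                ta[i][j] = a[i][(j + 1) % Q];
                tb[i][j] = b[(i + 1) % Q][j];
            }
        }
        for (i = 0; i < Q; i++) {
            for (j = 0; j < Q; j++) {
                a[i][j] = ta[i][j];
                b[i][j] = tb[i][j];
            }
        }
    }
}
```

At step s, PE(i,j) holds A[i][k] and B[k][j] with k = (i + j + s) mod Q, so the inner indices always match and every k is visited exactly once.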

21 Initial state: PEij owns blocks ij
  A00 A01 A02 A03      B00 B01 B02 B03
  A10 A11 A12 A13      B10 B11 B12 B13
  A20 A21 A22 A23      B20 B21 B22 B23
  A30 A31 A32 A33      B30 B31 B32 B33

22 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B01
                       B10 B11
                       B20 B21
                       B30 B31
  The first row of A and the first column of B are in the right place, but which B block does A01 need to compute? B11
  So rotate B*1 up by 1

23 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11
                       B10 B21
                       B20 B31
                       B30 B01

24 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11 B02
                       B10 B21 B12
                       B20 B31 B22
                       B30 B01 B32
  Which B block does A02 need to compute? B22
  So rotate B*2 up by 2

25 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11 B22
                       B10 B21 B32
                       B20 B31 B02
                       B30 B01 B12

26 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11 B22 B33
                       B10 B21 B32 B03
                       B20 B31 B02 B13
                       B30 B01 B12 B23
  and rotate B*3 up by 3

27 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11 B22 B33
                       B10 B21 B32 B03
                       B20 B31 B02 B13
                       B30 B01 B12 B23
  and now align the A rows with B

28 Cannon: align blocks so all can compute
  A00 A01 A02 A03      B00 B11 B22 B33
  A11 A12 A13 A10      B10 B21 B32 B03
  A22 A23 A20 A21      B20 B31 B02 B13
  A33 A30 A31 A32      B30 B01 B12 B23
  Now all blocks are in the right place to multiply
  For the next step: the As cyclic shift left, and the Bs cyclic shift up

29 Cannon: align blocks so all can compute
  A01 A02 A03 A00      B10 B21 B32 B03
  A12 A13 A10 A11      B20 B31 B02 B13
  A23 A20 A21 A22      B30 B01 B12 B23
  A30 A31 A32 A33      B00 B11 B22 B33

30 Cannon: align blocks so all can compute
  A02 A03 A00 A01      B20 B31 B02 B13
  A13 A10 A11 A12      B30 B01 B12 B23
  A20 A21 A22 A23      B00 B11 B22 B33
  A31 A32 A33 A30      B10 B21 B32 B03

31 Cannon: align blocks so all can compute
  A03 A00 A01 A02      B30 B01 B12 B23
  A10 A11 A12 A13      B00 B11 B22 B33
  A21 A22 A23 A20      B10 B21 B32 B03
  A32 A33 A30 A31      B20 B31 B02 B13

32 Cost of Cannon's Matrix Multiply
  Initial data alignment
    Aligning A or B: worst distance * size ≈ sqrt(p) * n^2/p
    Total ≈ 2 * sqrt(p) * n^2/p
  Interleave computation and communication
    Compute: total = n^3/p
    Communicate: A blocks circular shift left, B blocks circular shift up
      Total cost = number of shifts * size ≈ 2 * sqrt(p) * n^2/p
  Space: 3n^2/p per PE (A, B and C)
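Adding up the terms on this slide gives a rough compute-to-communication ratio. The symbols T_align, T_shift, T_comm and T_comp are names introduced here for illustration; the derivation counts elements moved and multiply-adds performed, using only the estimates stated above.

```latex
\begin{align*}
T_{\text{align}} &\approx 2\sqrt{p}\cdot\frac{n^2}{p}, \qquad
T_{\text{shift}} \approx 2\sqrt{p}\cdot\frac{n^2}{p} \\
T_{\text{comm}}  &= T_{\text{align}} + T_{\text{shift}} \approx \frac{4n^2}{\sqrt{p}} \\
T_{\text{comp}}  &= \frac{n^3}{p} \\
\frac{T_{\text{comp}}}{T_{\text{comm}}} &\approx \frac{n^3/p}{4n^2/\sqrt{p}} = \frac{n}{4\sqrt{p}}
\end{align*}
```

Under these estimates the computation dominates once n is large compared with 4 sqrt(p).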

33 Fox's Matrix Multiply
  Also avoids space overhead
    Interleaves broadcasts for A blocks, block shifts for B, and computation
  Initial data distribution: standard block
  Initial computation
    Broadcast Aii in row i
    Compute: block matrix multiplication
  for j = 1 to sqrt(p) - 1
    Circular up shift B blocks
    Broadcast Aik block in row i, where k = (j+i) mod sqrt(p)
    Compute: block matrix multiplication and add to C block
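The broadcast and shift schedule above can be simulated sequentially in C. As a sketch under stated assumptions: the grid side `Q` is fixed at 4, one `int` stands in for each block, the row broadcast is modeled by reading `A[i][k]` once per row, and the name `fox` is hypothetical.

```c
#define Q 4   /* assumed grid side, sqrt(p); one element stands in for one block */

/* Hypothetical sequential simulation of Fox's algorithm.
   b[i][j] models the B block currently held by PE(i,j). */
void fox(int A[Q][Q], int B[Q][Q], int C[Q][Q])
{
    int b[Q][Q], tb[Q][Q];
    int i, j, s;

    /* Standard block distribution: PE(i,j) starts with Bij and Cij = 0 */
    for (i = 0; i < Q; i++) {
        for (j = 0; j < Q; j++) {
            b[i][j] = B[i][j];
            C[i][j] = 0;
        }
    }

    for (s = 0; s < Q; s++) {
        if (s > 0) {
            /* Circular up shift of the B blocks */
            for (i = 0; i < Q; i++)
                for (j = 0; j < Q; j++)
                    tb[i][j] = b[(i + 1) % Q][j];
            for (i = 0; i < Q; i++)
                for (j = 0; j < Q; j++)
                    b[i][j] = tb[i][j];
        }
        for (i = 0; i < Q; i++) {
            int k = (s + i) % Q;     /* s = 0 selects the diagonal block Aii */
            int abcast = A[i][k];    /* PE(i,k) broadcasts Aik along row i */
            for (j = 0; j < Q; j++)
                C[i][j] += abcast * b[i][j];
        }
    }
}
```

After s up shifts PE(i,j) holds B[(i+s) mod Q][j], which matches the broadcast block A[i][(i+s) mod Q], so the inner indices agree at every step.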

34 Initial state: PEij owns blocks ij
  A00 A01 A02 A03      B00 B01 B02 B03
  A10 A11 A12 A13      B10 B11 B12 B13
  A20 A21 A22 A23      B20 B21 B22 B23
  A30 A31 A32 A33      B30 B31 B32 B33

35 Fox: broadcast diagonal block in row
  A00 A01 A02 A03      B00 B01 B02 B03
  A10 A11 A12 A13      B10 B11 B12 B13
  A20 A21 A22 A23      B20 B21 B22 B23
  A30 A31 A32 A33      B30 B31 B32 B33

36 Fox: next diagonal, B rotates up
  A00 A01 A02 A03      B10 B11 B12 B13
  A10 A11 A12 A13      B20 B21 B22 B23
  A20 A21 A22 A23      B30 B31 B32 B33
  A30 A31 A32 A33      B00 B01 B02 B03

37 Fox: next diagonal, B rotates up
  A00 A01 A02 A03      B20 B21 B22 B23
  A10 A11 A12 A13      B30 B31 B32 B33
  A20 A21 A22 A23      B00 B01 B02 B03
  A30 A31 A32 A33      B10 B11 B12 B13

38 Fox: next diagonal, B rotates up
  A00 A01 A02 A03      B30 B31 B32 B33
  A10 A11 A12 A13      B00 B01 B02 B03
  A20 A21 A22 A23      B10 B11 B12 B13
  A30 A31 A32 A33      B20 B21 B22 B23

39 Cost of Fox's Matrix Multiply
  A: sqrt(p) times, sqrt(p) broadcasts of blocks sized n^2/p (one-to-sqrt(p)): total volume n^2 (all of A)
  B: sqrt(p) circular shifts
    Each circular shift (nearest neighbor): volume = n^2/p
  Computation time: O(n^3/p)

40 Dekel, Nassimi, Sahni Matrix Multiply
  3D mesh formulation: Z planes have equal k values
    A's columns distributed/replicated over Y planes
    B's rows distributed/replicated over X planes
    Lots of data replication
  Do all point-to-point multiplies in parallel
  Collapse: sum reduction up / down the Z planes
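The two phases (all multiplies at once, then a sum reduction along Z) can be sketched sequentially in C. The size `N`, the array `prod` that models the 3D mesh, and the function name `dns` are assumptions for illustration; one element stands in for one block.

```c
#define N 3   /* assumed problem size for the sketch: N x N x N virtual PEs */

/* Hypothetical sequential model of the Dekel, Nassimi, Sahni scheme.
   prod[k][i][j] models the value held by the PE at position (i,j) in Z plane k. */
void dns(int A[N][N], int B[N][N], int C[N][N])
{
    int prod[N][N][N];
    int i, j, k;

    /* Phase 1: plane k receives column k of A and row k of B, and all
       N*N*N multiplies are independent of each other */
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                prod[k][i][j] = A[i][k] * B[k][j];

    /* Phase 2: sum reduction along the Z direction collapses the planes into C */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += prod[k][i][j];
        }
    }
}
```

Each plane k is exactly one outer product of column k of A with row k of B; the reduction adds the planes together.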

41 Dekel, Nassimi, Sahni Matrix Multiply
  [Figure: 3D mesh with labels "Replicate A", "Replicate B", "Block multiply" and "Sum reduce up"]

42 Sum Reduce 1
  [Figure: plane 1 of the mesh holds the products Ai1 * B1j]

43 Sum Reduce k
  [Figure: plane k adds the products Aik * Bkj]

44 Sum Reduce n
  [Figure: plane n adds the products Ain * Bnj]

45 Cij += Ai* B*j
  [Figure: the reduction along Z leaves Cij as the sum of all products Aik * Bkj]

46 Dekel, Nassimi, Sahni Matrix Multiply
  [Figure: the result blocks C11 ... Cnn sit on one face of the mesh]

47 MPI 2x2 block matrix multiply

48 four processes, four blocks per matrix
  PE 0: A00 B00 C00        PE 1: A01 B01 C01
  PE 2: A10 B10 C10        PE 3: A11 B11 C11

49 exchange rows
  before:
    PE 0: A00 B00 C00        PE 1: A01 B01 C01
    PE 2: A10 B10 C10        PE 3: A11 B11 C11
  after:
    PE 0: A00,A01 B00 C00    PE 1: A00,A01 B01 C01
    PE 2: A10,A11 B10 C10    PE 3: A10,A11 B11 C11

50 exchange columns
  PE 0: A00 A01  B00 B10  C00    PE 1: A00 A01  B01 B11  C01
  PE 2: A10 A11  B00 B10  C10    PE 3: A10 A11  B01 B11  C11

51 multiply
  PE 0: A00 A01  B00 B10  C00    PE 1: A00 A01  B01 B11  C01
  PE 2: A10 A11  B00 B10  C10    PE 3: A10 A11  B01 B11  C11

52 gather
  C00 C01
  C10 C11

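53 multiply a block
  /* block size */
  #define b 8

  /* A, B, C are int*. Computes C += A * B for one b x b block; */
  /* the caller must initialize C.                               */
  void multblock(int C[b][b], int A[b][b], int B[b][b])
  {
      int i, j, k;
      for (i = 0; i < b; i++) {
          for (j = 0; j < b; j++) {
              for (k = 0; k < b; k++)
                  C[i][j] += A[i][k] * B[k][j];
          }
      }
  }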

54 sequential main: initialize
  int main(int argc, char *argv[])
  {
      int i, j, k, ioff, joff;
      int A00[b][b], A01[b][b], A10[b][b], A11[b][b];
      int B00[b][b], B01[b][b], B10[b][b], B11[b][b];
      int C00[b][b], C01[b][b], C10[b][b], C11[b][b];

      /* initialize A, B and C blocks */
      for (i = 0, ioff = b; i < b; i++, ioff++) {
          for (j = 0, joff = b; j < b; j++, joff++) {
              A00[i][j] = i + j;
              A01[i][j] = i + joff;
              A10[i][j] = ioff + j;
              A11[i][j] = ioff + joff;
              B00[i][j] = i - j;
              B01[i][j] = i - joff;
              B10[i][j] = ioff - j;
              B11[i][j] = ioff - joff;
              C00[i][j] = 0;
              C01[i][j] = 0;
              C10[i][j] = 0;
              C11[i][j] = 0;
          }
      }

55 initial A and B
  [Figure: initial contents of A and B]

56 sequential main: compute
      multblock(C00, A00, B00);
      multblock(C00, A01, B10);
      printf("\n C00: ");
      printblock(C00);

      multblock(C01, A00, B01);
      multblock(C01, A01, B11);
      printf("\n C01: ");
      printblock(C01);

      multblock(C10, A10, B00);
      multblock(C10, A11, B10);
      printf("\n C10: ");
      printblock(C10);

      multblock(C11, A10, B01);
      multblock(C11, A11, B11);
      printf("\n C11: ");
      printblock(C11);

57 MPI code
  All PEs declare all blocks (easiest)
  Each PE initializes its A, B and C blocks
  Exchange A row blocks
  Exchange B col blocks
  Compute
  Gather C blocks
  I only used blocking sends and recvs, making sure the sends and recvs are correctly ordered

58 pe 0: initialize
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
      MPI_Comm_size(MPI_COMM_WORLD, &p);
      MPI_Barrier(MPI_COMM_WORLD);

      switch (my_id) {
      case 0:
          printf("pe0: Init\n");
          /* Initialize A00, B00 and C00 */
          for (i = 0; i < b; i++) {
              for (j = 0, joff = b; j < b; j++, joff++) {
                  A00[i][j] = i + j;
                  B00[i][j] = i - j;
                  C00[i][j] = 0;
              }
          }

59 some exchanges
          /* Row Exchange 0 <--> 1 */
          printf("pe0: <--> PE1: Row Exchange\n");
          MPI_Recv((int *)A01, b*b, MPI_INT, 1, 1, MPI_COMM_WORLD, &status);
          MPI_Send((int *)A00, b*b, MPI_INT, 1, 2, MPI_COMM_WORLD);

          /* Col Exchange 0 <--> 2 */
          printf("pe0: <--> PE2: Col Exchange\n");
          MPI_Recv((int *)B10, b*b, MPI_INT, 2, 3, MPI_COMM_WORLD, &status);
          MPI_Send((int *)B00, b*b, MPI_INT, 2, 4, MPI_COMM_WORLD);

  Row exchange in PE1 (PE1 sends first and PE0 receives first, so the blocking calls pair up):
          /* Row Exchange 0 <--> 1 */
          printf("pe1: <--> PE0: Row Exchange\n");
          MPI_Send((int *)A01, b*b, MPI_INT, 0, 1, MPI_COMM_WORLD);
          MPI_Recv((int *)A00, b*b, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
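The exchange partners in these listings (PE0 swaps A blocks with PE1 and B blocks with PE2) follow from the rank layout on the 2x2 grid, where block (i,j) lives on rank 2*i + j. The helper names below are hypothetical and are not part of the slide code; they only make the partner arithmetic explicit.

```c
/* Assumed layout from the slides: block (i,j) of the 2 x 2 grid is on rank 2*i + j. */
int block_row(int rank) { return rank / 2; }
int block_col(int rank) { return rank % 2; }

/* Partner in the same block row: same i, other j (exchanges A blocks). */
int row_partner(int rank)
{
    return 2 * block_row(rank) + (1 - block_col(rank));
}

/* Partner in the same block column: other i, same j (exchanges B blocks). */
int col_partner(int rank)
{
    return 2 * (1 - block_row(rank)) + block_col(rank);
}
```

With these helpers, the per-rank `switch` cases differ only in which named blocks are sent and received, not in how the partner is chosen.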

60 pe 0: computes
          /* Block Multiply C00 = A00*B00 + A01*B10 */
          multblock(C00, A00, B00);
          multblock(C00, A01, B10);

61 pe 0: gathers
          /* Gather */
          printf("pe0: Gather C01 <-- PE1\n");
          MPI_Recv((int *)C01, b*b, MPI_INT, 1, 5, MPI_COMM_WORLD, &status);
          printf("pe0: Gather C10 <-- PE2\n");
          MPI_Recv((int *)C10, b*b, MPI_INT, 2, 6, MPI_COMM_WORLD, &status);
          printf("pe0: Gather C11 <-- PE3\n");
          MPI_Recv((int *)C11, b*b, MPI_INT, 3, 7, MPI_COMM_WORLD, &status);

62 pe 0: prints
          /* Print */
          printf("pe0: Print blocks\n");
          printf("\n C00: ");
          printblock(C00);
          printf("\n C01: ");
          printblock(C01);
          printf("\n C10: ");
          printblock(C10);
          printf("\n C11: ");
          printblock(C11);
          printf("\n");
          break;

63 all PEs happy
  EXIT:
      MPI_Finalize();


High-Performance Parallel Computing High-Performance Parallel Computing P. (Saday) Sadayappan Rupesh Nasre Course Overview Emphasis on algorithm development and programming issues for high performance No assumed background in computer architecture;

More information

LU Decomposition Method

LU Decomposition Method SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS LU Decompositio Method Jamie Traha, Autar Kaw, Kevi Marti Uiversity of South Florida Uited States of America kaw@eg.usf.edu http://umericalmethods.eg.usf.edu Itroductio

More information

CSE 230 Intermediate Programming in C and C++ Arrays and Pointers

CSE 230 Intermediate Programming in C and C++ Arrays and Pointers CSE 230 Intermediate Programming in C and C++ Arrays and Pointers Fall 2017 Stony Brook University Instructor: Shebuti Rayana http://www3.cs.stonybrook.edu/~cse230/ Definition: Arrays A collection of elements

More information

Lecture 14. Performance Profiling Under the hood of MPI Parallel Matrix Multiplication MPI Communicators

Lecture 14. Performance Profiling Under the hood of MPI Parallel Matrix Multiplication MPI Communicators Lecture 14 Performance Profiling Under the hood of MPI Parallel Matrix Multiplication MPI Communicators Announcements 2010 Scott B. Baden / CSE 160 / Winter 2010 2 Today s lecture Performance Profiling

More information

Module 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space.

Module 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space. The Lecture Contains: Iteration Space Iteration Vector Normalized Iteration Vector Dependence Distance Direction Vector Loop Carried Dependence Relations Dependence Level Iteration Vector - Triangular

More information

Array Processing { Part II. Multi-Dimensional Arrays. 1. What is a multi-dimensional array?

Array Processing { Part II. Multi-Dimensional Arrays. 1. What is a multi-dimensional array? Array Processing { Part II Multi-Dimensional Arrays 1. What is a multi-dimensional array? A multi-dimensional array is simply a table (2-dimensional) or a group of tables. The following is a 2-dimensional

More information

CS 111: Program Design I Lecture 15: Modules, Pandas again. Robert H. Sloan & Richard Warner University of Illinois at Chicago March 8, 2018

CS 111: Program Design I Lecture 15: Modules, Pandas again. Robert H. Sloan & Richard Warner University of Illinois at Chicago March 8, 2018 CS 111: Program Desig I Lecture 15: Modules, Padas agai Robert H. Sloa & Richard Warer Uiversity of Illiois at Chicago March 8, 2018 PYTHON STANDARD LIBRARY & BEYOND: MODULES Extedig Pytho Every moder

More information

Computers and Scientific Thinking

Computers and Scientific Thinking Computers ad Scietific Thikig David Reed, Creighto Uiversity Chapter 15 JavaScript Strigs 1 Strigs as Objects so far, your iteractive Web pages have maipulated strigs i simple ways use text box to iput

More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

High Performance Computing Lecture 41. Matthew Jacob Indian Institute of Science

High Performance Computing Lecture 41. Matthew Jacob Indian Institute of Science High Performance Computing Lecture 41 Matthew Jacob Indian Institute of Science Example: MPI Pi Calculating Program /Each process initializes, determines the communicator size and its own rank MPI_Init

More information

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:...

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:... ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 016 Solutions Name:... Answer questions in space provided below questions. Use additional paper if necessary but make sure

More information

On Spectral Theory Of K-n- Arithmetic Mean Idempotent Matrices On Posets

On Spectral Theory Of K-n- Arithmetic Mean Idempotent Matrices On Posets Iteratioal Joural of Sciece, Egieerig ad echology Research (IJSER), Volume 5, Issue, February 016 O Spectral heory Of -- Arithmetic Mea Idempotet Matrices O Posets 1 Dr N Elumalai, ProfRMaikada, 3 Sythiya

More information

Lecture 13. Writing parallel programs with MPI Matrix Multiplication Basic Collectives Managing communicators

Lecture 13. Writing parallel programs with MPI Matrix Multiplication Basic Collectives Managing communicators Lecture 13 Writing parallel programs with MPI Matrix Multiplication Basic Collectives Managing communicators Announcements Extra lecture Friday 4p to 5.20p, room 2154 A4 posted u Cannon s matrix multiplication

More information

CS 111: Program Design I Lecture 20: Web crawling, HTML, Copyright

CS 111: Program Design I Lecture 20: Web crawling, HTML, Copyright CS 111: Program Desig I Lecture 20: Web crawlig, HTML, Copyright Robert H. Sloa & Richard Warer Uiversity of Illiois at Chicago November 8, 2016 WEB CRAWLER AGAIN Two bits of useful Pytho sytax Do't eed

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Optimal Mapped Mesh on the Circle

Optimal Mapped Mesh on the Circle Koferece ANSYS 009 Optimal Mapped Mesh o the Circle doc. Ig. Jaroslav Štigler, Ph.D. Bro Uiversity of Techology, aculty of Mechaical gieerig, ergy Istitut, Abstract: This paper brigs out some ideas ad

More information

211: Computer Architecture Summer 2016

211: Computer Architecture Summer 2016 211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University

More information

Systems I. Optimizing for the Memory Hierarchy. Topics Impact of caches on performance Memory hierarchy considerations

Systems I. Optimizing for the Memory Hierarchy. Topics Impact of caches on performance Memory hierarchy considerations Systems I Optimizing for the Memory Hierarchy Topics Impact of caches on performance Memory hierarchy considerations Cache Performance Metrics Miss Rate Fraction of memory references not found in cache

More information

Xbar/R Chart for x1-x3

Xbar/R Chart for x1-x3 Chapter 6 Selected roblem Solutios Sectio 6-5 6- a) X-bar ad Rage - Iitial Study Chartig roblem 6- X-bar Rage ----- ----- UCL:. sigma 7.4 UCL:. sigma 5.79 Ceterlie 5.9 Ceterlie.5 LCL: -. sigma.79 LCL:

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Memory Hierarchy Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Time (ns) The CPU-Memory Gap The gap widens between DRAM, disk, and CPU speeds

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Leica Lino Accurate, self-levelling point and line lasers

Leica Lino Accurate, self-levelling point and line lasers Leica Lio Accurate, self-levellig poit ad lie lasers Setup, Switch o, Ready! With the Leica Lio everythig is plumb ad perfectly aliged Leica Lios project lies or poits to millimeter accuracy, leavig your

More information

A Study on the Performance of Cholesky-Factorization using MPI

A Study on the Performance of Cholesky-Factorization using MPI A Study o the Performace of Cholesky-Factorizatio usig MPI Ha S. Kim Scott B. Bade Departmet of Computer Sciece ad Egieerig Uiversity of Califoria Sa Diego {hskim, bade}@cs.ucsd.edu Abstract Cholesky-factorizatio

More information

SPIRAL DSP Transform Compiler:

SPIRAL DSP Transform Compiler: SPIRAL DSP Trasform Compiler: Applicatio Specific Hardware Sythesis Peter A. Milder (peter.milder@stoybroo.edu) Fraz Frachetti, James C. Hoe, ad Marus Pueschel Departmet of ECE Caregie Mello Uiversity

More information

Chapter 8 Dense Matrix Algorithms

Chapter 8 Dense Matrix Algorithms Chapter 8 Dense Matrix Algorithms (Selected slides & additional slides) A. Grama, A. Gupta, G. Karypis, and V. Kumar To accompany the text Introduction to arallel Computing, Addison Wesley, 23. Topic Overview

More information

CS 177. Lists and Matrices. Week 8

CS 177. Lists and Matrices. Week 8 CS 177 Lists and Matrices Week 8 1 Announcements Project 2 due on 7 th March, 2015 at 11.59 pm Table of Contents Lists Matrices Traversing a Matrix Construction of Matrices 3 Just a list of numbers 1D

More information

Homework 1 Solutions MA 522 Fall 2017

Homework 1 Solutions MA 522 Fall 2017 Homework 1 Solutios MA 5 Fall 017 1. Cosider the searchig problem: Iput A sequece of umbers A = [a 1,..., a ] ad a value v. Output A idex i such that v = A[i] or the special value NIL if v does ot appear

More information

Instruction and Data Streams

Instruction and Data Streams Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad

More information

Basic Structure and Low Level Routines

Basic Structure and Low Level Routines SUZAKU Pattern Programming Framework Specification 1 - Structure and Low-Level Patterns B. Wilkinson, March 17, 2016. Suzaku is a pattern parallel programming framework developed at UNC-Charlotte that

More information

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le

Fundamentals of Media Processing. Shin'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dinh Le Fudametals of Media Processig Shi'ichi Satoh Kazuya Kodama Hiroshi Mo Duy-Dih Le Today's topics Noparametric Methods Parze Widow k-nearest Neighbor Estimatio Clusterig Techiques k-meas Agglomerative Hierarchical

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #15 3/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline

More information

Attendance (2) Performance (3) Oral (5) Total (10) Dated Sign of Subject Teacher

Attendance (2) Performance (3) Oral (5) Total (10) Dated Sign of Subject Teacher Attendance (2) Performance (3) Oral (5) Total (10) Dated Sign of Subject Teacher Date of Performance:... Actual Date of Completion:... Expected Date of Completion:... ----------------------------------------------------------------------------------------------------------------

More information

IMAGE-BASED MODELING AND RENDERING 1. HISTOGRAM AND GMM. I-Chen Lin, Dept. of CS, National Chiao Tung University

IMAGE-BASED MODELING AND RENDERING 1. HISTOGRAM AND GMM. I-Chen Lin, Dept. of CS, National Chiao Tung University IMAGE-BASED MODELING AND RENDERING. HISTOGRAM AND GMM I-Che Li, Dept. of CS, Natioal Chiao Tug Uiversity Outlie What s the itesity/color histogram? What s the Gaussia Mixture Model (GMM? Their applicatios

More information

State-space feedback 6 challenges of pole placement

State-space feedback 6 challenges of pole placement State-space feedbac 6 challeges of pole placemet J Rossiter Itroductio The earlier videos itroduced the cocept of state feedbac ad demostrated that it moves the poles. x u x Kx Bu It was show that whe

More information

9 x and g(x) = 4. x. Find (x) 3.6. I. Combining Functions. A. From Equations. Example: Let f(x) = and its domain. Example: Let f(x) = and g(x) = x x 4

9 x and g(x) = 4. x. Find (x) 3.6. I. Combining Functions. A. From Equations. Example: Let f(x) = and its domain. Example: Let f(x) = and g(x) = x x 4 1 3.6 I. Combiig Fuctios A. From Equatios Example: Let f(x) = 9 x ad g(x) = 4 f x. Fid (x) g ad its domai. 4 Example: Let f(x) = ad g(x) = x x 4. Fid (f-g)(x) B. From Graphs: Graphical Additio. Example:

More information