Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Parallelization
Saman Amarasinghe
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Spring 2010

2 Outline
Why Parallelism
Parallel Execution
Parallelizing Compilers
Dependence Analysis
Increasing Parallelization Opportunities
3 Moore's Law
[Chart: uniprocessor performance (vs. VAX-11/780) and number of transistors over time; performance grew at roughly 25%/year, then 52%/year through the 486/Pentium/P2/P3/P4 era, then ??%/year. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006, and David Patterson.]
Saman Amarasinghe, MIT, Fall 2006

4 Uniprocessor Performance (SPECint)
[Chart: same performance and transistor-count data, showing single-processor SPECint performance flattening after the P4 and Itanium 2. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006, and David Patterson.]

5 Multicores Are Here!
[Chart: number of cores per chip (1 up to 512) versus year. Single-core: 8086, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Power4, PA-8800, Opteron, Tanglewood, PExtreme, Power6, Yonah. Multicore: Xeon MP, Opteron 4P, Xbox360, Cell, Niagara, Raw, Broadcom, Cavium Octeon, Raza XLR, Cisco CSR-1, Intel Tflops, Picochip, Ambric.]
6 Issues with Parallelism
Amdahl's Law: any computation can be analyzed in terms of a portion that must be executed sequentially, Ts, and a portion that can be executed in parallel, Tp. Then for n processors: T(n) = Ts + Tp/n. T(infinity) = Ts, thus the maximum speedup is (Ts + Tp)/Ts.
Load Balancing: the work is distributed among processors so that all processors are kept busy while the parallel task is executed.
Granularity: the size of the parallel regions between synchronizations, or the ratio of computation (useful work) to communication (overhead).
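Not from the slides — a minimal Python sketch of Amdahl's law as stated above, using the slide's Ts/Tp/n symbols:

```python
def speedup(ts, tp, n):
    """Amdahl's law: speedup on n processors over 1 processor.
    ts = sequential portion, tp = parallelizable portion (time units)."""
    t1 = ts + tp          # T(1)
    tn = ts + tp / n      # T(n) = Ts + Tp/n
    return t1 / tn

# With a 10% sequential portion (ts=1, tp=9), 10 processors give about 5.26x,
# and no processor count can exceed the (ts + tp) / ts = 10x upper bound.
print(speedup(1, 9, 10))
print((1 + 9) / 1)
```

Even a huge processor count barely beats 10 processors here, which is why the sequential portion dominates scaling.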
7 Outline
Why Parallelism
Parallel Execution
Parallelizing Compilers
Dependence Analysis
Increasing Parallelization Opportunities

8 Types of Parallelism
Instruction Level Parallelism (ILP): scheduling and hardware
Task Level Parallelism (TLP): mainly by hand
Loop Level Parallelism (LLP) or Data Parallelism: hand or compiler generated
Pipeline Parallelism: hardware or streaming
Divide and Conquer Parallelism: recursive functions

9 Why Loops?
90% of the execution time is in 10% of the code, mostly in loops
If parallel, can get good performance
Load balancing
Relatively easy to analyze

10 Programmer Defined Parallel Loop
FORALL: no loop-carried dependences, fully parallel
FORACROSS: some loop-carried dependences
11 Parallel Execution Example
FORPAR I = 0 to N
  A[I] = A[I] + 1
Block distribution: the program gets mapped into
  Iters = ceiling(N/NUMPROC);
  FOR P = 0 to NUMPROC-1
    FOR I = P*Iters to MIN((P+1)*Iters, N)
      A[I] = A[I] + 1
SPMD (Single Program, Multiple Data) code:
  If(myPid == 0) {
    Iters = ceiling(N/NUMPROC);
  }
  Barrier();
  FOR I = myPid*Iters to MIN((myPid+1)*Iters, N)
    A[I] = A[I] + 1
  Barrier();
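Not from the slides — a small Python sketch of the block distribution, computing the half-open iteration range each processor would execute (the function name `block_ranges` is my own):

```python
def block_ranges(n, numproc):
    """Split iterations 0..n-1 into contiguous blocks, one per processor,
    mirroring the slide's Iters = ceiling(N/NUMPROC) mapping."""
    iters = -(-n // numproc)   # ceiling division
    return [(min(p * iters, n), min((p + 1) * iters, n))
            for p in range(numproc)]

# Processor p executes iterations [p*Iters, MIN((p+1)*Iters, N)).
for p, (lo, hi) in enumerate(block_ranges(10, 4)):
    print(p, list(range(lo, hi)))
```

The ranges are half-open here; the last processor simply gets a shorter block when NUMPROC does not divide N.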
12 Parallel Execution Example
FORPAR I = 0 to N
  A[I] = A[I] + 1
Block distribution: the program gets mapped into
  Iters = ceiling(N/NUMPROC);
  FOR P = 0 to NUMPROC-1
    FOR I = P*Iters to MIN((P+1)*Iters, N)
      A[I] = A[I] + 1
Code that forks a function:
  Iters = ceiling(N/NUMPROC);
  ParallelExecute(func);
  void func(integer myPid) {
    FOR I = myPid*Iters to MIN((myPid+1)*Iters, N)
      A[I] = A[I] + 1
  }

13 Parallel Execution
SPMD
  Need all the processors to execute the control flow
  Extra synchronization overhead, or redundant computation on all processors, or both
  Stack: private or shared?
Fork
  Local variables are not visible within the function
  Either make the variables used/defined in the loop body global, or pass and return them as arguments
  Function call overhead

14 Parallel Thread Basics
Create separate threads
  Create an OS thread; (hopefully) it will be run on a separate core
  pthread_create(&thr, NULL, &entry_point, NULL)
  Overhead in thread creation
  Create a separate stack
  Get the OS to allocate a thread
Thread pool
  Create all the threads (= num cores) at the beginning
  Keep N-1 idling on a barrier while execution is sequential
  Get them to run parallel code by each executing a function
  Back to the barrier when the parallel region is done
15 Outline
Why Parallelism
Parallel Execution
Parallelizing Compilers
Dependence Analysis
Increasing Parallelization Opportunities
16 Parallelizing Compilers
Finding FORALL loops out of FOR loops
Examples:
  FOR I = 0 to 5
    A[I] = A[I] + 1
  FOR I = 0 to 5
    A[I] = A[I+6] + 1
  FOR I = 0 to 5
    A[2*I] = A[2*I + 1] + 1

17 Iteration Space
N deep loops = n-dimensional discrete cartesian space
Normalized loops: assume step size = 1
  FOR I = 0 to 6
    FOR J = I to 7
Iterations are represented as coordinates in the iteration space: i = [i1, i2, i3, ..., in]

18 Iteration Space
N deep loops = n-dimensional discrete cartesian space
Normalized loops: assume step size = 1
  FOR I = 0 to 6
    FOR J = I to 7
Iterations are represented as coordinates in the iteration space
Sequential execution order of iterations = lexicographic order:
  [0,0], [0,1], [0,2], ..., [0,6], [0,7],
  [1,1], [1,2], ..., [1,6], [1,7],
  [2,2], ..., [2,6], [2,7],
  ...
  [6,6], [6,7]
19 Iteration Space
N deep loops = n-dimensional discrete cartesian space
Normalized loops: assume step size = 1
  FOR I = 0 to 6
    FOR J = I to 7
Iterations are represented as coordinates in the iteration space
Sequential execution order of iterations = lexicographic order
Iteration i is lexicographically less than j, i < j, iff there exists c such that i1 = j1, i2 = j2, ..., i(c-1) = j(c-1) and ic < jc
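Not from the slides — a Python sketch of the triangular iteration space above and the lexicographic-order definition (function names are my own):

```python
def iterations():
    """Iteration space of FOR I = 0 to 6 / FOR J = I to 7, in program order."""
    return [(i, j) for i in range(7) for j in range(i, 8)]

def lex_less(i, j):
    """i < j lexicographically: equal up to some position c, then i[c] < j[c]."""
    for a, b in zip(i, j):
        if a != b:
            return a < b
    return False   # equal vectors are not strictly less

space = iterations()
# Sequential execution order is exactly lexicographic (sorted) order.
assert space == sorted(space)
print(space[:4], "...", space[-2:])
```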
20 Iteration Space
An affine loop nest:
  Loop bounds are integer linear functions of constants, loop-constant variables, and outer loop indexes
  Array accesses are integer linear functions of constants, loop-constant variables, and loop indexes
  FOR I = 0 to 6
    FOR J = I to 7

21 Iteration Space
Affine loop nest: the iteration space is a set of linear inequalities:
  0 <= I
  I <= 6
  I <= J
  J <= 7

22 Data Space
M-dimensional arrays = m-dimensional discrete cartesian space (a hypercube)
  Integer A(10)
  Float B(5, 6)
23 Dependences
True dependence:   a = ... ; ... = a
Anti dependence:   ... = a ; a = ...
Output dependence: a = ... ; a = ...
Definition: a data dependence exists between dynamic instances i and j iff
  either i or j is a write operation,
  i and j refer to the same variable, and
  i executes before j
How about array accesses within loops?
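Not from the slides — a brute-force Python sketch answering the slide's closing question for single loops: enumerate the dynamic instances and check whether any write touches a location some read also touches. The subscripts are passed as functions of the loop variable (names are my own):

```python
def has_dependence(write_index, read_index, lo, hi):
    """Does some iteration's write of A[write_index(i)] touch a location
    that some iteration's read of A[read_index(j)] also touches?"""
    writes = {write_index(i) for i in range(lo, hi + 1)}
    reads = {read_index(j) for j in range(lo, hi + 1)}
    return bool(writes & reads)

# FOR I = 0 to 5: A[I+1] = A[I] + 1 -> iteration I+1 reads what I wrote
print(has_dependence(lambda i: i + 1, lambda i: i, 0, 5))          # True
# FOR I = 0 to 5: A[2*I] = A[2*I+1] + 1 -> even writes, odd reads
print(has_dependence(lambda i: 2 * i, lambda i: 2 * i + 1, 0, 5))  # False
```

This is exponentially wasteful compared with the symbolic tests the lecture develops next, but it makes the definition concrete.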
24 Outline
Why Parallelism
Parallel Execution
Parallelizing Compilers
Dependence Analysis
Increasing Parallelization Opportunities
25 Array Accesses in a Loop
FOR I = 0 to 5
  A[I] = A[I] + 1
[Diagram: iteration space and data space]

26 Array Accesses in a Loop
FOR I = 0 to 5
  A[I] = A[I] + 1
[Diagram: each iteration I reads and writes only A[I]; no access pair crosses iterations]

27 Array Accesses in a Loop
FOR I = 0 to 5
  A[I+1] = A[I] + 1
[Diagram: iteration I writes A[I+1], which iteration I+1 then reads]

28 Array Accesses in a Loop
FOR I = 0 to 5
  A[I] = A[I+2] + 1
[Diagram: iteration I reads A[I+2], which iteration I+2 later overwrites]

29 Array Accesses in a Loop
FOR I = 0 to 5
  A[2*I] = A[2*I+1] + 1
[Diagram: writes touch even elements, reads touch odd elements; no location is shared]
30 Distance Vectors
A loop has a distance d if there exists a data dependence from iteration i to j and d = j - i.
  dv = [0]: FOR I = 0 to 5: A[I] = A[I] + 1
  dv = [1]: FOR I = 0 to 5: A[I+1] = A[I] + 1
  dv = [2]: FOR I = 0 to 5: A[I] = A[I+2] + 1
  dv = [*]: FOR I = 0 to 5: A[I] = A[0] + 1
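Not from the slides — a brute-force Python sketch that recovers the flow-dependence distances d = j - i for a single loop (anti- and output dependences would be enumerated analogously; the function name is my own):

```python
def distances(write_index, read_index, lo, hi):
    """All flow-dependence distances j - i where iteration i writes the
    location that a later (or the same) iteration j reads."""
    ds = set()
    for i in range(lo, hi + 1):
        for j in range(i, hi + 1):
            if write_index(i) == read_index(j):
                ds.add(j - i)
    return sorted(ds)

print(distances(lambda i: i,     lambda i: i, 0, 5))  # [0]
print(distances(lambda i: i + 1, lambda i: i, 0, 5))  # [1]
print(distances(lambda i: i,     lambda i: 0, 0, 5))  # many distances -> dv = [*]
```

A set of distances that varies with the iteration, as in the last case, is what the slide summarizes as dv = [*].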
31 Multi-Dimensional Dependence
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I, J-1] + 1
dv = [0, 1]

32 Multi-Dimensional Dependence
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I, J-1] + 1
dv = [0, 1]
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I+1, J] + 1
dv = [1, 0]
33 Outline
Dependence Analysis
Increasing Parallelization Opportunities
34 What is the Dependence?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1

35 What is the Dependence?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1
[Diagram: dependence arrows in the I-J iteration space]

36 What is the Dependence?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1
dv = [1, -1]

37 What is the Dependence?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1
FOR I = 1 to n
  FOR J = 1 to n
    A[I] = A[I-1] + 1

38 What is the Dependence?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1
dv = [1, -1]
FOR I = 1 to n
  FOR J = 1 to n
    B[I] = B[I-1] + 1
dv = [1, *] (the read of B[I-1] can be satisfied from any J iteration of the previous I)

39 What is the Dependence?
FOR i = 1 to N-1
  FOR j = 1 to N-1
    A[i,j] = A[i,j-1] + A[i-1,j];
dv = [0, 1] and [1, 0]
40 Recognizing FORALL Loops
Find data dependences in the loop
For every pair of array accesses to the same array:
  if the first access has at least one dynamic instance (an iteration) in which it refers to a location in the array that the second access also refers to in at least one of the later dynamic instances (iterations),
  then there is a data dependence between the statements.
(Note that an access can pair with itself: output dependences.)
Definition: a loop-carried dependence is a dependence that crosses a loop boundary.
If there are no loop-carried dependences, the loop is parallelizable.

41 Data Dependence Analysis
I: Distance Vector method
II: Integer Programming

42 Distance Vector Method
The i-th loop is parallelizable iff, for all dependences d = [d1, ..., di, ..., dn], either one of d1, ..., d(i-1) is > 0, or all of d1, ..., di = 0.
43 Is the Loop Parallelizable?
dv = [0]: FOR I = 0 to 5: A[I] = A[I] + 1      Yes
dv = [1]: FOR I = 0 to 5: A[I+1] = A[I] + 1    No
dv = [2]: FOR I = 0 to 5: A[I] = A[I+2] + 1    No
dv = [*]: FOR I = 0 to 5: A[I] = A[0] + 1      No
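Not from the slides — a direct Python transcription of the distance-vector test (slide 42), using 0-based loop positions; "*" entries, being possibly positive, would simply be treated as > 0 (function name is my own):

```python
def loop_parallelizable(i, dvs):
    """Loop at position i (0-based) is parallelizable iff every dependence d
    has some d[k] > 0 with k < i, or d[0..i] all zero."""
    for d in dvs:
        if any(dk > 0 for dk in d[:i]):
            continue            # dependence carried by an outer loop
        if all(dk == 0 for dk in d[:i + 1]):
            continue            # dependence not carried at this level
        return False
    return True

# dv = [0]: no loop-carried dependence -> Yes; dv = [1] -> No.
print(loop_parallelizable(0, [(0,)]), loop_parallelizable(0, [(1,)]))
# dv = [1, 0]: outer loop No, inner loop Yes (the outer loop carries it).
print(loop_parallelizable(0, [(1, 0)]), loop_parallelizable(1, [(1, 0)]))
```

This reproduces the Yes/No table on slide 44: dv = [0, 1] makes the I loop parallel and the J loop not, and dv = [1, 0] does the opposite.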
44 Are the Loops Parallelizable?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I, J-1] + 1
dv = [0, 1]: I loop Yes, J loop No
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I+1, J] + 1
dv = [1, 0]: I loop No, J loop Yes

45 Are the Loops Parallelizable?
FOR I = 1 to n
  FOR J = 1 to n
    A[I, J] = A[I-1, J+1] + 1
dv = [1, -1]: I loop No, J loop Yes
FOR I = 1 to n
  FOR J = 1 to n
    B[I] = B[I-1] + 1
dv = [1, *]: I loop No, J loop Yes
46 Integer Programming Method
Example:
  FOR I = 0 to 5
    A[I+1] = A[I] + 1
Is there a loop-carried dependence between A[I+1] and A[I]?
  Are there two distinct iterations iw and ir such that A[iw+1] is the same location as A[ir]?
  Exist integers iw, ir: 0 <= iw, ir <= 5, iw != ir, iw + 1 = ir
Is there a dependence between A[I+1] and A[I+1]?
  Are there two distinct iterations i1 and i2 such that A[i1+1] is the same location as A[i2+1]?
  Exist integers i1, i2: 0 <= i1, i2 <= 5, i1 != i2, i1 + 1 = i2 + 1

47 Integer Programming Method
Formulation:
  FOR I = 0 to 5
    A[I+1] = A[I] + 1
Does there exist an integer vector i such that A i <= b, where A is an integer matrix and b is an integer vector?

48 Iteration Space
N deep loops = n-dimensional discrete cartesian space
  FOR I = 0 to 5
    A[I+1] = A[I] + 1
Affine loop nest: the iteration space is a set of linear inequalities:
  0 <= I
  I <= 5

49 Integer Programming Method
Formulation: does there exist an integer vector i such that A i <= b?
Our problem formulation for A[I] and A[I+1]:
  Exist integers iw, ir: 0 <= iw, ir <= 5, iw != ir, iw + 1 = ir
iw != ir is not an affine function:
  divide into 2 problems: problem 1 with iw < ir, and problem 2 with ir < iw
  if either problem has a solution, there exists a dependence
How about iw + 1 = ir?
  Add two inequalities to a single problem: iw + 1 <= ir, and ir <= iw + 1
50 Integer Programming Formulation
FOR I = 0 to 5
  A[I+1] = A[I] + 1
Problem 1:
  0 <= iw
  iw <= 5
  0 <= ir
  ir <= 5
  iw < ir
  iw + 1 <= ir
  ir <= iw + 1

51 Integer Programming Formulation
Problem 1, with every constraint rewritten in the form (expression <= constant):
  0 <= iw        ->  -iw <= 0
  iw <= 5        ->   iw <= 5
  0 <= ir        ->  -ir <= 0
  ir <= 5        ->   ir <= 5
  iw < ir        ->   iw - ir <= -1
  iw + 1 <= ir   ->   iw - ir <= -1
  ir <= iw + 1   ->  -iw + ir <= 1

52 Integer Programming Formulation
Problem 1 as A x <= b, with x = [iw, ir]:
  A = [ -1  0
         1  0
         0 -1
         0  1
         1 -1
         1 -1
        -1  1 ]
  b = [ 0, 5, 0, 5, -1, -1, 1 ]
...and problem 2 with ir < iw.
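Not from the slides — an exhaustive stand-in for the integer-programming solver, good enough for the tiny bounded systems above (a real compiler would use Fourier-Motzkin elimination or an ILP solver; the function name is my own):

```python
from itertools import product

def feasible(A, b, lo=0, hi=5):
    """Is there an integer vector x in [lo, hi]^n with A x <= b componentwise?"""
    n = len(A[0])
    for x in product(range(lo, hi + 1), repeat=n):
        if all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b)):
            return True
    return False

# Problem 1 from the slides: 0 <= iw, ir <= 5, iw < ir, iw + 1 = ir.
A = [[-1, 0], [1, 0], [0, -1], [0, 1], [1, -1], [1, -1], [-1, 1]]
b = [0, 5, 0, 5, -1, -1, 1]
print(feasible(A, b))   # True, e.g. iw = 0, ir = 1 -> dependence exists
```

An infeasible system, such as the one for A[2*I] = A[2*I+1] (where 2*iw = 2*ir + 1 has no integer solution), would come back False, proving independence.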
53 Generalization
An affine loop nest:
  FOR i1 = fl1(c1...ck) to fu1(c1...ck)
    FOR i2 = fl2(i1, c1...ck) to fu2(i1, c1...ck)
      ...
      FOR in = fln(i1...i(n-1), c1...ck) to fun(i1...i(n-1), c1...ck)
        A[fa1(i1...in, c1...ck), fa2(i1...in, c1...ck), ..., fam(i1...in, c1...ck)]
Solve 2n problems of the form:
  i1 = j1, i2 = j2, ..., i(n-1) = j(n-1), in < jn
  i1 = j1, i2 = j2, ..., i(n-1) = j(n-1), jn < in
  i1 = j1, i2 = j2, ..., i(n-1) < j(n-1)
  i1 = j1, i2 = j2, ..., j(n-1) < i(n-1)
  ...
  i1 = j1, i2 < j2
  i1 = j1, j2 < i2
  i1 < j1
  j1 < i1
54 Outline
Why Parallelism
Parallel Execution
Parallelizing Compilers
Dependence Analysis
Increasing Parallelization Opportunities
55 Increasing Parallelization Opportunities
Scalar Privatization
Reduction Recognition
Induction Variable Identification
Array Privatization
Loop Transformations
Granularity of Parallelism
Interprocedural Parallelization

56 Scalar Privatization
Example:
  FOR i = 1 to n
    X = A[i] * 3;
    B[i] = X;
Is there a loop-carried dependence?
What is the type of dependence?

57 Privatization
Analysis: any anti- and output loop-carried dependences
Eliminate by assigning in a local context:
  FOR i = 1 to n
    integer Xtmp;
    Xtmp = A[i] * 3;
    B[i] = Xtmp;
Eliminate by expanding into an array:
  FOR i = 1 to n
    Xtmp[i] = A[i] * 3;
    B[i] = Xtmp[i];

58 Privatization
Need a final assignment to maintain the correct value after the loop nest.
Eliminate by assigning in a local context:
  FOR i = 1 to n
    integer Xtmp;
    Xtmp = A[i] * 3;
    B[i] = Xtmp;
    if(i == n) X = Xtmp
Eliminate by expanding into an array:
  FOR i = 1 to n
    Xtmp[i] = A[i] * 3;
    B[i] = Xtmp[i];
  X = Xtmp[n];

59 Another Example
How about loop-carried true dependences?
Example:
  FOR i = 1 to n
    X = X + A[i];
Is this loop parallelizable?
60 Reduction Recognition
Analysis:
  Only associative operations
  The result is never used within the loop
Transformation:
  Integer Xtmp[NUMPROC];
  Barrier();
  FOR i = myPid*Iters to MIN((myPid+1)*Iters, n)
    Xtmp[myPid] = Xtmp[myPid] + A[i];
  Barrier();
  If(myPid == 0) {
    FOR p = 0 to NUMPROC-1
      X = X + Xtmp[p];
  }
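Not from the slides — a sequential Python simulation of the reduction transformation: each "processor" accumulates into its private Xtmp slot (zero-initialized, which the slide leaves implicit), and the slots are combined afterwards. Function name is my own:

```python
from math import ceil

def parallel_sum(a, numproc):
    """Reduction recognition: per-processor partial sums, then a combine."""
    iters = ceil(len(a) / numproc) if a else 0
    xtmp = [0] * numproc                 # one private partial sum per processor
    for pid in range(numproc):           # these blocks could run concurrently
        for i in range(pid * iters, min((pid + 1) * iters, len(a))):
            xtmp[pid] += a[i]
    return sum(xtmp)                     # sequential combine (slide's last loop)

data = list(range(1, 11))
print(parallel_sum(data, 4))   # 55, same as sum(data)
```

The transformation is valid only because + is associative; the partial sums can be combined in any order.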
61 Induction Variables
Example:
  FOR i = 0 to N
    A[i] = 2^i;
After strength reduction:
  t = 1
  FOR i = 0 to N
    A[i] = t;
    t = t*2;
What happened to the loop-carried dependences?
Need to do the opposite of this!
  Perform induction variable analysis
  Rewrite IVs as a function of the loop variable
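Not from the slides — a Python sketch contrasting the two forms: the strength-reduced loop carries a true dependence on t, while the closed form 2**i makes every iteration independent and hence parallelizable (function names are my own):

```python
def strength_reduced(n):
    """t = t*2 each iteration: a loop-carried true dependence on t."""
    a, t = [0] * (n + 1), 1
    for i in range(n + 1):
        a[i] = t
        t = t * 2
    return a

def closed_form(n):
    """Induction-variable analysis rewrites t as a function of i: t = 2**i.
    Each iteration is now independent, so the loop can run in parallel."""
    return [2 ** i for i in range(n + 1)]

assert strength_reduced(10) == closed_form(10)
```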
62 Array Privatization
Similar to scalar privatization
However, the analysis is more complex:
  Array data dependence analysis: checks if two iterations access the same location
  Array data flow analysis: checks if two iterations access the same value
Transformations: similar to scalar privatization
  Private copy for each processor, or expand with an additional dimension

63 Loop Transformations
A loop may not be parallel as is
Example:
  FOR i = 1 to N-1
    FOR j = 1 to N-1
      A[i,j] = A[i,j-1] + A[i-1,j];
64 Loop Transformations
A loop may not be parallel as is
Example:
  FOR i = 1 to N-1
    FOR j = 1 to N-1
      A[i,j] = A[i,j-1] + A[i-1,j];
After loop skewing (i_new = i_old + j_old - 1, j_new = j_old):
  FOR i = 1 to 2*N-3
    FORPAR j = max(1, i-N+2) to min(i, N-1)
      A[i-j+1, j] = A[i-j+1, j-1] + A[i-j, j];
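Not from the slides — a Python check that the skewed loop computes exactly the same array as the original. In the skewed form, both values each iteration reads (A[i-j+1, j-1] and A[i-j, j]) come from wavefront i-1, so the inner j loop has no dependences between its iterations and can be a FORPAR (function names are my own):

```python
def original(n):
    """FOR i = 1 to N-1 / FOR j = 1 to N-1: A[i,j] = A[i,j-1] + A[i-1,j]"""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i][j - 1] + a[i - 1][j]
    return a

def skewed(n):
    """Same computation after skewing: i_new = i_old + j_old - 1."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, 2 * n - 2):                              # 1 .. 2*N-3
        for j in range(max(1, i - n + 2), min(i, n - 1) + 1):  # parallelizable
            a[i - j + 1][j] = a[i - j + 1][j - 1] + a[i - j][j]
    return a

assert original(8) == skewed(8)
```

Since no two iterations of the inner loop of `skewed` touch the same wavefront entries, they could be dispatched to different processors without changing the result.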
65 Granularity of Parallelism
Example:
  FOR i = 1 to N-1
    FOR j = 1 to N-1
      A[i,j] = A[i,j] + A[i-1,j];
Gets transformed into:
  FOR i = 1 to N-1
    Barrier();
    FOR j = 1 + myPid*Iters to MIN((myPid+1)*Iters, N-1)
      A[i,j] = A[i,j] + A[i-1,j];
    Barrier();
Inner loop parallelism can be expensive:
  Startup and teardown overhead of parallel regions
  A lot of synchronization
  Can even lead to slowdowns

66 Granularity of Parallelism
Inner loop parallelism can be expensive
Solutions:
  Don't parallelize if the amount of work within the loop is too small, or
  Transform into outer-loop parallelism

67 Outer Loop Parallelism
Example:
  FOR i = 1 to N-1
    FOR j = 1 to N-1
      A[i,j] = A[i,j] + A[i-1,j];
After loop transpose (interchange):
  FOR j = 1 to N-1
    FOR i = 1 to N-1
      A[i,j] = A[i,j] + A[i-1,j];
Gets mapped into:
  Barrier();
  FOR j = 1 + myPid*Iters to MIN((myPid+1)*Iters, N-1)
    FOR i = 1 to N-1
      A[i,j] = A[i,j] + A[i-1,j];
  Barrier();

68 Unimodular Transformations
Interchange, reverse, and skew
Use a matrix transformation: I_new = A I_old
Interchange:
  [i_new, j_new] = [[0, 1], [1, 0]] [i_old, j_old]
Reverse:
  [i_new, j_new] = [[-1, 0], [0, 1]] [i_old, j_old]
Skew:
  [i_new, j_new] = [[1, 1], [0, 1]] [i_old, j_old]
69 Legality of Transformations
A unimodular transformation with matrix A is valid iff, for all dependence vectors v, the first non-zero entry in A v is positive.
Example:
  FOR i = 1 to N-1
    FOR j = 1 to N-1
      A[i,j] = A[i,j] + A[i-1,j];
dv = [1, 0]
  Interchange [[0, 1], [1, 0]]: A v = [0, 1], first non-zero positive: legal
  Reverse [[-1, 0], [0, 1]]: A v = [-1, 0], first non-zero negative: illegal
  Skew [[1, 1], [0, 1]]: A v = [1, 0], first non-zero positive: legal
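Not from the slides — a Python sketch of the legality test, applied to the dependence vectors [0,1] and [1,0] of the earlier example FOR i / FOR j: A[i,j] = A[i,j-1] + A[i-1,j] (function names are my own):

```python
def legal(A, dvs):
    """Unimodular transformation A is legal iff, for every dependence
    vector v, the first non-zero entry of A v is positive."""
    def matvec(M, v):
        return [sum(a * x for a, x in zip(row, v)) for row in M]
    def lex_positive(v):
        for x in v:
            if x != 0:
                return x > 0
        return False   # a nonzero v cannot map to zero (A is invertible)
    return all(lex_positive(matvec(A, v)) for v in dvs)

dvs = [(0, 1), (1, 0)]
interchange = [[0, 1], [1, 0]]   # swaps i and j
reverse     = [[-1, 0], [0, 1]]  # runs i backwards
skew        = [[1, 1], [0, 1]]   # i_new = i + j
print(legal(interchange, dvs))   # True
print(legal(reverse, dvs))       # False: [1,0] maps to [-1,0]
print(legal(skew, dvs))          # True
```

Reversal is the one that fails here: it would execute the consumer iteration before the producer along the i axis.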
70 Interprocedural Parallelization
Function calls will make a loop unparallelizable
  Reduction of available parallelism
  A lot of inner-loop parallelism
Solutions:
  Interprocedural analysis
  Inlining

71 Interprocedural Parallelization
Issues:
  Same function reused many times
  Analyze a function on each trace: possibly exponential
  Analyze a function once: unrealizable-path problem
Interprocedural analysis:
  Need to update all the analyses
  Complex analysis
  Can be expensive
Inlining:
  Works with existing analyses
  Large code bloat: can be very expensive

72 Summary
Multicores are here; need parallelism to keep the performance gains
Programmer-defined or compiler-extracted parallelism
Automatic parallelization of loops with arrays
  Requires data dependence analysis
  Iteration space and data space abstraction
  An integer programming problem
Many optimizations that'll increase parallelism

73 MIT OpenCourseWare
http://ocw.mit.edu
6.035 Computer Language Engineering
Spring 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms
More informationMemory Systems and Performance Engineering
SPEED LIMIT PER ORDER OF 6.172 Memory Systems and Performance Engineering Fall 2010 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide
More informationCompiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7
Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,
More informationAutotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT
Autotuning John Cavazos University of Delaware What is Autotuning? Searching for the best code parameters, code transformations, system configuration settings, etc. Search can be Quasi-intelligent: genetic
More informationSimone Campanoni Loop transformations
Simone Campanoni simonec@eecs.northwestern.edu Loop transformations Outline Simple loop transformations Loop invariants Induction variables Complex loop transformations Simple loop transformations Simple
More informationLoop Transformations! Part II!
Lecture 9! Loop Transformations! Part II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Loop Unswitching Hoist invariant control-flow
More informationEvaluating Performance Via Profiling
Performance Engineering of Software Systems September 21, 2010 Massachusetts Institute of Technology 6.172 Professors Saman Amarasinghe and Charles E. Leiserson Handout 6 Profiling Project 2-1 Evaluating
More informationLecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture
Lecture Topics ECE 486/586 Computer Architecture Lecture # 5 Spring 2015 Portland State University Quantitative Principles of Computer Design Fallacies and Pitfalls Instruction Set Principles Introduction
More informationWhy do we care about parallel?
Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The
More informationParallelisation. Michael O Boyle. March 2014
Parallelisation Michael O Boyle March 2014 1 Lecture Overview Parallelisation for fork/join Mapping parallelism to shared memory multi-processors Loop distribution and fusion Data Partitioning and SPMD
More informationLinear Loop Transformations for Locality Enhancement
Linear Loop Transformations for Locality Enhancement 1 Story so far Cache performance can be improved by tiling and permutation Permutation of perfectly nested loop can be modeled as a linear transformation
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationParallel Architecture. Hwansoo Han
Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range
More informationELE 455/555 Computer System Engineering. Section 4 Parallel Processing Class 1 Challenges
ELE 455/555 Computer System Engineering Section 4 Class 1 Challenges Introduction Motivation Desire to provide more performance (processing) Scaling a single processor is limited Clock speeds Power concerns
More informationMIT OpenCourseWare Multicore Programming Primer, January (IAP) Please use the following citation format:
MIT OpenCourseWare http://ocw.mit.edu 6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation format: Rodric Rabbah, 6.189 Multicore Programming Primer, January (IAP) 2007.
More informationParallel Programming with OpenMP. CS240A, T. Yang
Parallel Programming with OpenMP CS240A, T. Yang 1 A Programmer s View of OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for defining multi-threaded shared-memory programs
More informationMIT OpenCourseWare Multicore Programming Primer, January (IAP) Please use the following citation format:
MIT OpenCourseWare http://ocw.mit.edu 6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation format: Saman Amarasinghe, 6.189 Multicore Programming Primer, January (IAP)
More informationMulti-core Architectures. Dr. Yingwu Zhu
Multi-core Architectures Dr. Yingwu Zhu Outline Parallel computing? Multi-core architectures Memory hierarchy Vs. SMT Cache coherence What is parallel computing? Using multiple processors in parallel to
More informationMulticore and Parallel Processing
Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 xkcd/619 2 Pitfall: Amdahl s Law Execution time after improvement
More informationKismet: Parallel Speedup Estimates for Serial Programs
Kismet: Parallel Speedup Estimates for Serial Programs Donghwan Jeon, Saturnino Garcia, Chris Louie, and Michael Bedford Taylor Computer Science and Engineering University of California, San Diego 1 Questions
More informationSimulating ocean currents
Simulating ocean currents We will study a parallel application that simulates ocean currents. Goal: Simulate the motion of water currents in the ocean. Important to climate modeling. Motion depends on
More informationCompilation for Heterogeneous Platforms
Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey
More informationCSCI-580 Advanced High Performance Computing
CSCI-580 Advanced High Performance Computing Performance Hacking: Matrix Multiplication Bo Wu Colorado School of Mines Most content of the slides is from: Saman Amarasinghe (MIT) Square-Matrix Multiplication!2
More informationApplication-Platform Mapping in Multiprocessor Systems-on-Chip
Application-Platform Mapping in Multiprocessor Systems-on-Chip Leandro Soares Indrusiak lsi@cs.york.ac.uk http://www-users.cs.york.ac.uk/lsi CREDES Kick-off Meeting Tallinn - June 2009 Application-Platform
More informationThe Shift to Multicore Architectures
Parallel Programming Practice Fall 2011 The Shift to Multicore Architectures Bernd Burgstaller Yonsei University Bernhard Scholz The University of Sydney Why is it important? Now we're into the explicit
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 1 Introduction Li Jiang 1 Course Details Time: Tue 8:00-9:40pm, Thu 8:00-9:40am, the first 16 weeks Location: 东下院 402 Course Website: TBA Instructor:
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationOpenMP - III. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16. HPAC, RWTH Aachen
OpenMP - III Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 OpenMP References Using OpenMP: Portable Shared Memory Parallel Programming. The MIT
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationCSE373: Data Structures & Algorithms Lecture 22: Parallel Reductions, Maps, and Algorithm Analysis. Kevin Quinn Fall 2015
CSE373: Data Structures & Algorithms Lecture 22: Parallel Reductions, Maps, and Algorithm Analysis Kevin Quinn Fall 2015 Outline Done: How to write a parallel algorithm with fork and join Why using divide-and-conquer
More informationProgrammation Concurrente (SE205)
Programmation Concurrente (SE205) CM1 - Introduction to Parallelism Florian Brandner & Laurent Pautet LTCI, Télécom ParisTech, Université Paris-Saclay x Outline Course Outline CM1: Introduction Forms of
More informationDesigning Parallel Programs. This review was developed from Introduction to Parallel Computing
Designing Parallel Programs This review was developed from Introduction to Parallel Computing Author: Blaise Barney, Lawrence Livermore National Laboratory references: https://computing.llnl.gov/tutorials/parallel_comp/#whatis
More informationAdministration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture
CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.26A ACES Email: pingali@cs.utexas.edu TA: Xin Sui Email: xin@cs.utexas.edu University of Texas, Austin Fall
More informationParallelism. CS6787 Lecture 8 Fall 2017
Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does
More informationParallel Programming. OpenMP Parallel programming for multiprocessors for loops
Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory
More informationSimple Machine Model. Lectures 14 & 15: Instruction Scheduling. Simple Execution Model. Simple Execution Model
Simple Machine Model Fall 005 Lectures & 5: Instruction Scheduling Instructions are executed in sequence Fetch, decode, execute, store results One instruction at a time For branch instructions, start fetching
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationMemory Systems and Performance Engineering. Fall 2009
Memory Systems and Performance Engineering Fall 2009 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide illusion of fast larger memory
More informationCOMP Parallel Computing. SMM (2) OpenMP Programming Model
COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel
More informationControl flow graphs and loop optimizations. Thursday, October 24, 13
Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange
More informationPatterns of Parallel Programming with.net 4. Ade Miller Microsoft patterns & practices
Patterns of Parallel Programming with.net 4 Ade Miller (adem@microsoft.com) Microsoft patterns & practices Introduction Why you should care? Where to start? Patterns walkthrough Conclusions (and a quiz)
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance
More informationWorkloads Programmierung Paralleler und Verteilter Systeme (PPV)
Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment
More informationCMSC 714 Lecture 4 OpenMP and UPC. Chau-Wen Tseng (from A. Sussman)
CMSC 714 Lecture 4 OpenMP and UPC Chau-Wen Tseng (from A. Sussman) Programming Model Overview Message passing (MPI, PVM) Separate address spaces Explicit messages to access shared data Send / receive (MPI
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationAdministration. Prerequisites. CS 395T: Topics in Multicore Programming. Why study parallel programming? Instructors: TA:
CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.126A ACES Email: pingali@cs.utexas.edu TA: Aditya Rawal Email: 83.aditya.rawal@gmail.com University of Texas,
More informationParallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville
Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information
More informationEE/CSCI 451 Midterm 1
EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming
More information