Linear Loop Transformations for Locality Enhancement


1 Linear Loop Transformations for Locality Enhancement

2 Story so far: Cache performance can be improved by tiling and permutation. Permutation of a perfectly nested loop can be modeled as a linear transformation on the iteration space of the loop nest. Legality of permutation can be determined from the dependence matrix of the loop nest. Transformed code can be generated using an ILP calculator.

3 Theory for permutations applies to other loop transformations that can be modeled as linear transformations: skewing, reversal, scaling. Transformation matrix: T (a non-singular matrix). Dependence matrix: D, in which each column is a distance/direction vector. Dependence matrix of transformed program: T*D. Legality: every column of T*D must be lexicographically positive. Small complication with code generation if scaling is included.

4 Overview of lecture: Linear loop transformations: permutation, skewing, reversal, scaling. Compositions of linear loop transformations: the matrix viewpoint. Locality matrix: a model for temporal and spatial locality. Two key ideas: 1. Height reduction. 2. Making a loop nest fully permutable. A unified algorithm for improving cache performance. Lecture based on work we did for HP's compiler product line.

5 Key algorithm: finding a basis for the null space of an integer matrix. Definition: a vector x is in the null space of a matrix M if Mx = 0. It is easy to find a basis for the null space of a matrix in column-echelon form. For a general matrix M: find a unimodular matrix U so that MU = L is in column-echelon form. If x is a vector in the null space of L, then Ux is in the null space of M. If B is a basis for the null space of L, then UB is a basis for the null space of M.
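A minimal Python sketch of this algorithm (an illustration, not the lecture's own implementation): reduce M to column-echelon form by unimodular column operations, accumulating those operations in U, then read the null space basis off the columns of U that correspond to zero columns of L.

```python
def column_echelon(M):
    """Reduce M to column-echelon form L with M*U = L, U unimodular.

    Only column swaps and integer column subtractions are used, so U
    stays unimodular (determinant +1 or -1)."""
    rows, cols = len(M), len(M[0])
    L = [row[:] for row in M]
    U = [[int(i == j) for j in range(cols)] for i in range(cols)]

    def swap(a, b):
        for A in (L, U):
            for row in A:
                row[a], row[b] = row[b], row[a]

    def addmul(dst, src, q):  # column dst -= q * column src
        for A in (L, U):
            for row in A:
                row[dst] -= q * row[src]

    p = 0  # next pivot column
    for r in range(rows):
        if p >= cols:
            break
        while True:
            nz = [j for j in range(p, cols) if L[r][j] != 0]
            if not nz:
                break  # row r has no pivot; move to the next row
            # Euclidean step: pivot on the smallest entry, reduce the rest
            swap(p, min(nz, key=lambda j: abs(L[r][j])))
            for j in range(p + 1, cols):
                if L[r][j] != 0:
                    addmul(j, p, L[r][j] // L[r][p])
            if all(L[r][j] == 0 for j in range(p + 1, cols)):
                p += 1
                break
    return L, U

def null_space_basis(M):
    # zero columns of L correspond to columns of U spanning the null space
    L, U = column_echelon(M)
    cols = len(M[0])
    zero = [j for j in range(cols) if all(row[j] == 0 for row in L)]
    return [[U[i][j] for i in range(cols)] for j in zero]

print(null_space_basis([[1, 2]]))  # -> [[-2, 1]]
```

The example matches the next slide: for M = (1 2), the reduction produces U = [[1, -2], [0, 1]], L = (1 0), and the single basis vector (-2 1).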

6 Example: M*U = L:
(1 2) * [[1, -2], [0, 1]] = (1 0)
Basis for null space of L: (0 1)^T. Basis for null space of M: U * (0 1)^T = (-2 1)^T.

7 Some observations: We have used reduction to column-echelon form: MU = L. We can also view this as a matrix decomposition: M = LQ, where Q is the inverse of U from the previous step. Analogy: QR factorization of real matrices. Unimodular matrices are like orthogonal matrices in the reals. Special form of this decomposition: M = LQ where all diagonal elements of L are non-negative. This is called a pseudo-Hermite normal form of the matrix.

8 Motivating example: wavefront code:
DO I = 1,N
  DO J = 1,N
    X(I,J) = X(I-1,J+1) ...
Dependence matrix = (1 -1)^T. Dependence between two iterations => the iterations touch the same location => potential for exploiting data reuse! But there are about N iteration points between executions of dependent iterations. Can we schedule dependent iterations closer together?

9 For now, focus only on reuse opportunities between dependent iterations. This exploits only one kind of temporal locality. There are other kinds of locality that are important: iterations that read from the same location (input dependences), and spatial locality. Both are easy to add to the basic model, as we will see later.

10 Exploiting temporal locality in wavefront: we have studied two transformations, permutation and tiling. Permutation is illegal! Tiling is illegal!

11 Height Reduction

12 One solution: schedule iterations along 45-degree planes! The transformation is legal, and dependent iterations are scheduled close together, so it is good for locality. Can we view this in terms of loop transformations? Loop skewing followed by loop reversal.

13 Loop skewing: a linear loop transformation. Skewing of inner loop by outer loop:
(U V)^T = [[1, 0], [k, 1]] (I J)^T   (k is some fixed integer)
Skewing of the inner loop by an outer loop is always legal. New dependence vectors: compute T*D. In this example, D = (1 -1)^T and (for k = 1) T*D = (1 0)^T. This skewing has changed the dependence vector, but it has not brought dependent iterations closer together...
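The legality test used on this and the following slides is mechanical. A small Python sketch (not the slides' own code): apply T to the dependence matrix and test lexicographic positivity of each resulting column.

```python
def matmul(T, D):
    # columns of D are dependence vectors; T*D is the transformed matrix
    return [[sum(T[i][k] * D[k][j] for k in range(len(D)))
             for j in range(len(D[0]))] for i in range(len(T))]

def lex_positive(v):
    # lexicographically positive: the first nonzero entry is > 0
    for x in v:
        if x != 0:
            return x > 0
    return False

def legal(T, D):
    TD = matmul(T, D)
    return all(lex_positive([TD[i][j] for i in range(len(TD))])
               for j in range(len(TD[0])))

D = [[1], [-1]]                    # wavefront dependence (1, -1)
inner_skew = [[1, 0], [1, 1]]      # skew inner loop by outer loop (k = 1)
print(matmul(inner_skew, D))       # -> [[1], [0]]  legal, reuse no closer
print(legal(inner_skew, D))        # -> True
print(legal([[0, 1], [1, 0]], D))  # permutation -> False (illegal)
```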

14 Skewing outer loop by inner loop:
(U V)^T = [[1, k], [0, 1]] (I J)^T
Skewing of an outer loop by an inner loop is not necessarily legal. In this example, D = (1 -1)^T and (for k = 1) T*D = (0 -1)^T, which is incorrect. Dependent iterations are closer together (good), but the program is illegal (bad). How do we fix this??

15 Loop reversal: a linear loop transformation. U = [-1][I]:
DO I = 1, N          DO U = -N, -1
  X(I) = I+2    vs     X(-U) = -U + 2
Transformation matrix = [-1]. Another example: 2-D loop, reverse the inner loop:
(U V)^T = [[1, 0], [0, -1]] (I J)^T
Legality of loop reversal: apply the transformation matrix to all dependences and verify that they remain lexicographically positive. Code generation: easy.

16 Need for composite transformations. Transformation: skewing followed by reversal. In the final program, dependent iterations are close together! Composition of linear transformations = another linear transformation! The composite transformation matrix is
[[1, 1], [0, -1]] = [[1, 0], [0, -1]] * [[1, 1], [0, 1]]
How do we synthesize this composite transformation??

17 Some facts about permutation/reversal/skewing: Transformation matrices for permutation/reversal/skewing are unimodular. Any composition of these transformations can be represented by a unimodular matrix. Any unimodular matrix can be decomposed into a product of permutation/reversal/skewing matrices. Legality of composite transformation T: check that T*D is lexicographically positive. (Proof: T3(T2(T1 D)) = (T3 T2 T1) D.) Code generation algorithm: Original bounds: A I <= b. Transformation: U = T I. New bounds: compute from A T^(-1) U <= b.

18 Synthesizing composite transformations using matrix-based approaches: rather than reason about sequences of transformations, we can reason about the single matrix that represents the composite transformation. Enabling abstraction: the dependence matrix.

19 Height reduction: move reuse into inner loops! The dependence vector is (1 -1)^T. We prefer not to have dependent iterations in different outer loop iterations. So the dependence vector in the transformed program should look like (0 *)^T. So T (1 -1)^T = (0 *)^T. This says the first row of T is orthogonal to (1 -1)^T. So the first row of T can be chosen to be (1 1). What about the second row?

20 The second row of T (call it s) should satisfy the following properties: s should be linearly independent of the first row of T (non-singular matrix); T*D should be a lexicographically positive vector; the determinant of the matrix should be +1 or -1 (unimodularity). One choice: s = (0 -1), giving T = [[1, 1], [0, -1]].
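A quick check of this choice (a Python sketch, not from the slides): the transformed dependence becomes (0 1)^T, which keeps dependent iterations in the same outer iteration, and the determinant is -1.

```python
def lex_positive(v):
    # lexicographically positive: the first nonzero entry is > 0
    for x in v:
        if x != 0:
            return x > 0
    return False

T = [[1, 1], [0, -1]]   # first row (1 1) kills the dependence; s = (0 -1)
d = (1, -1)             # wavefront dependence vector
Td = [T[0][0]*d[0] + T[0][1]*d[1],
      T[1][0]*d[0] + T[1][1]*d[1]]
det = T[0][0]*T[1][1] - T[0][1]*T[1][0]
print(Td, det)          # -> [0, 1] -1
assert lex_positive(Td) and det in (1, -1)
```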

21 General questions: 1. How do we choose the first few rows of the transformation matrix to push reuses down? 2. How do we fill in the rest of the matrix to get a unimodular matrix?

22 Choosing the first few rows of the transformation matrix. For now, assume the dependence matrix D has only distances in it. Algorithm: let B be a basis for the null space of D^T (the vectors orthogonal to every dependence column); use the transpose of the basis vectors as the first few rows of the transformation matrix. Example: study the running example in the previous slides.

23 Extension to direction vectors: easy. Consider D = (+ + 2 -2)^T. Let the first row of the transformation be (r1 r2 r3 r4). If this is orthogonal to D, it is clear that r1 and r2 must be zero, since they multiply direction entries. Ignore the 1st and 2nd rows of D and find a basis for the null space of the remaining rows: (1 1). Pad with zeros to get the partial transformation: (0 0 1 1). General picture: knock out the rows of D with direction entries, use the algorithm for distance vectors, and pad the resulting null space vectors with 0s.

24 Filling in the rest of the matrix. Completion procedure: generate a non-singular matrix T given the first few rows of this matrix. It turns out to be easier if we drop the unimodularity requirement. Problem: how do we generate code if the transformation matrix is a general non-singular matrix?

25 Completion procedure. Given: a dependence matrix D and a partial transformation P (the first few rows). Precondition: the rows of P are linearly independent, and P D >= 0 (no dependences violated yet). Goal: generate a non-singular matrix T that extends P and that satisfies all dependences in D (T D lexicographically positive).

26 Completion algorithm (iterative):
1. From D, delete all dependences d for which P d is lexicographically positive.
2. If D is now empty, use null space basis determination to add more rows to P to generate a non-singular matrix.
3. Otherwise, let k be the first row of the (reduced) D that has a non-zero entry. Use e_k (the unit row vector with 1 in column k and zero everywhere else) as the next row of the transformation.
4. Repeat from step (1).
Proof: need to show that e_k is linearly independent of the existing rows of the partial transformation and does not violate any dependences. Easy (see the Li/Pingali paper).
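The four steps above can be sketched compactly in Python (an illustration only: the null-space step here is a brute-force search over a small box of integer vectors, where the lecture uses the echelon-form algorithm):

```python
from itertools import product

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False

def matvec(P, d):
    return [sum(P[i][k] * d[k] for k in range(len(d))) for i in range(len(P))]

def complete(P, D, search=3):
    """Extend partial transformation P to a non-singular T with T*d
    lexicographically positive for every dependence column d in D."""
    T = [row[:] for row in P]
    n = len(P[0])
    deps = [d for d in D if not lex_positive(matvec(T, d))]   # step 1
    while len(T) < n:
        if not deps:
            # step 2: add a row from the null space of T
            cand = next(c for c in product(range(-search, search + 1), repeat=n)
                        if any(c) and all(sum(r[i] * c[i] for i in range(n)) == 0
                                          for r in T))
            T.append(list(cand))
        else:
            # step 3: e_k for the first row k with a non-zero entry
            k = next(i for i in range(n) if any(d[i] != 0 for d in deps))
            T.append([int(i == k) for i in range(n)])
            deps = [d for d in deps if not lex_positive(matvec(T, d))]
    return T

# wavefront: partial transformation (1 1), dependence (1, -1)
print(complete([[1, 1]], [[1, -1]]))   # -> [[1, 1], [1, 0]]
```

The result matches the wavefront slide later in the lecture: the unsatisfied dependence (1, -1) has its first non-zero entry in row 1, so e_1 = (1 0) becomes the second row.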

27 Example: partial transformation = (1 0 0). A dependence column d is satisfied by the partial transformation when P d is lexicographically positive; those columns are deleted. The next row of the transformation is e_k, where k is the first row of the remaining columns with a non-zero entry. The columns still unsatisfied form the new partial dependence matrix.

28 All dependences are now satisfied, so we are left with the problem of completing the transformation with a row independent of the existing ones. A basis for the null space of the partial transformation supplies the remaining row, giving a non-singular T.

29 Problem: the transformation matrix T we obtain is non-singular but may not be unimodular. Two solutions: (1) use the pseudo-Hermite normal form decomposition T = LQ, where L is lower-triangular with positive diagonal elements, and use Q as the transformation matrix; (2) develop code generation technology for non-singular matrices. Question: how do we know this is legal, and is good for locality? First, let us understand what transformations are modeled by non-singular matrices that are not unimodular.

30 Loop scaling: change the step size of a loop.
DO I = 1,N          DO U = 2,2N,2
  Y(I) = I     vs     Y(U/2) = U/2
Scaling matrix: [k] (here [2]), a non-unimodular transformation!

31 Non-singular linear loop transformations: permutation/skewing/reversal/scaling. Scaling, like reversal, is not important by itself. Any non-singular integer matrix can be decomposed into a product of permutation/skewing/reversal/scaling matrices. Standard decompositions: (pseudo-)Hermite form T = L Q, where Q is unimodular and L is lower triangular with positive diagonal elements; (pseudo-)Smith normal form T = U D V, where U and V are unimodular and D is diagonal with positive elements. Including scaling in the repertoire of loop transformations makes it easier to synthesize transformations but complicates code generation.

32 Code generation with a non-singular transformation T. Recall code generation for a unimodular matrix U: original bounds A I <= b; transformation J = U I; new bounds computed from A U^(-1) J <= b. Key problem here: T^(-1) is not necessarily an integer matrix.

33 Difficulties. Example:
DO I = 1,3
  DO J = 1,3
    A(4J-2I+3, I+J) = J
Transformed code for (U V)^T = [[-2, 4], [1, 1]] (I J)^T:
DO U = -2,10,2
  DO V = -U/2 + 3*max(1, ceil((U/2+1)/2)), -U/2 + 3*min(3, floor((U/2+3)/2)), 3
    A(U+3, V) = (U+2V)/6
Key problems: how do we determine integer lower and upper bounds? (Rounding up from rational bounds is not correct.) How do we determine step sizes? Solution: use the Hermite normal form decomposition T = L Q.

34 Running example: factorize T = L Q:
[[-2, 4], [1, 1]] = [[2, 0], [-1, 3]] * [[-1, 2], [0, 1]]
Initial iteration space:
DO I = 1,3
  DO J = 1,3
    A(4J-2I+3, I+J) = J
Auxiliary iteration space (transform by Q):
DO P = -1,5
  DO Q = max(1, ceil((P+1)/2)), min(3, floor((P+3)/2))
    A(2P+3, 3Q-P) = Q
Final iteration space (transform by L):
DO U = -2,10,2
  DO V = -U/2 + 3*max(1, ceil((U/2+1)/2)), -U/2 + 3*min(3, floor((U/2+3)/2)), 3
    A(U+3, V) = (U+2V)/6

35 Using a non-singular matrix T as the transformation matrix: given T = L Q, where L is triangular with positive diagonal elements and Q is unimodular, use Q as the transformation to generate bounds for the auxiliary space, then read off bounds for the final iteration space from L. Running example:
[[-2, 4], [1, 1]] = [[2, 0], [-1, 3]] * [[-1, 2], [0, 1]]
Auxiliary bounds: -1 <= p <= 5, max(1, ceil((p+1)/2)) <= q <= min(3, floor((p+3)/2)).
Final space: (u v)^T = L (p q)^T, so -2 <= u <= 10 (step 2) and
-u/2 + 3*max(1, ceil((u/2+1)/2)) <= v <= -u/2 + 3*min(3, floor((u/2+3)/2)) (step 3).
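A sanity check of the running example (Python sketch; the matrices are the ones above, the loop bounds are as reconstructed for this transcription): the factorization should hold, and the final loop nest should scan exactly the image of the original 3x3 iteration space under T.

```python
import math

T = [[-2, 4], [1, 1]]
L = [[2, 0], [-1, 3]]    # lower triangular, positive diagonal
Q = [[-1, 2], [0, 1]]    # unimodular: determinant -1

# check the factorization T = L * Q
prod = [[sum(L[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == T

# image of the original 3x3 iteration space under T
image = {(-2*i + 4*j, i + j) for i in range(1, 4) for j in range(1, 4)}

# the final loop nest, bounds read off from L and Q
scanned = set()
for u in range(-2, 11, 2):
    lo = -u//2 + 3*max(1, math.ceil((u//2 + 1) / 2))
    hi = -u//2 + 3*min(3, math.floor((u//2 + 3) / 2))
    for v in range(lo, hi + 1, 3):
        scanned.add((u, v))
assert scanned == image
print("final nest scans exactly the image of the iteration space")
```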

36 Step sizes of loops: diagonal entries of L. Change of variables in the body:
(i j)^T = T^(-1) (u v)^T = [[-1/6, 4/6], [1/6, 2/6]] (u v)^T
It is good to use strength reduction to replace the integer divisions by additions and multiplications. Complete transformation with a non-singular matrix is needed for generating code for distributed-memory parallel machines and similar problems (see Wolfe's book). Bottom line: the transformation synthesis procedure can focus on producing a non-singular matrix. Unimodularity is not important.

37 Solution 2 to code generation with non-singular transformation matrices. Lemma: if L is an n x n triangular matrix with positive diagonal elements, L maps lexicographically positive vectors into lexicographically positive vectors. Proof: consider d = L v where v is a lexicographically positive vector, and show that d is lexicographically positive as well. Intuition: L does not change the order in which iterations are done, it just renumbers them! Auxiliary space iterations are performed in the same order as the corresponding iterations in the final space. So the memory system behavior of the auxiliary program is the same as the memory system behavior of the final program!
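The lemma is easy to spot-check exhaustively over a small box of integer vectors (Python sketch; this particular L is an arbitrary lower-triangular example, not from the slides):

```python
from itertools import product

def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False

# an arbitrary lower-triangular matrix with positive diagonal elements
L = [[2, 0, 0],
     [-1, 3, 0],
     [4, -2, 1]]

def apply(L, v):
    return [sum(L[i][j] * v[j] for j in range(len(v))) for i in range(len(L))]

# every lexicographically positive vector in the box must stay positive
for v in product(range(-3, 4), repeat=3):
    if lex_positive(v):
        assert lex_positive(apply(L, v)), v
print("L preserved lexicographic positivity on every sampled vector")
```

The proof idea is visible in the check: if the first nonzero entry of v is v_k > 0, then rows above k of L v are zero (lower-triangularity) and entry k is L[k][k] * v_k > 0 (positive diagonal).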

38 For memory system optimization, we can use a non-singular transformation matrix T as follows: compute the Hermite normal form T = L Q and use Q as the transformation matrix!

39 Putting it all together for the wavefront example:
DO I = 1,N
  DO J = 1,N
    X(I,J) = X(I-1,J+1) ...
Dependence matrix = (1 -1)^T. Find rows orthogonal to the space spanned by the dependence vectors, and use them as the partial transformation. In this example, the orthogonal subspace is spanned by the vector (1 1). Apply the completion procedure to the partial transformation to generate the full transformation: we would add the row (1 0). So the transformation is [[1, 1], [1, 0]]!

40 General picture: exploit dependences + spatial locality + read-only data reuse. 1. A dependence matrix D with dependence vectors. 2. A locality matrix L (containing D) with locality vectors. The locality matrix has additional vectors to account for spatial locality and read-only data reuse. Intuitively, we would like to scan along directions in L in the innermost loops, provided dependences permit it.

41 Modeling spatial locality: scan iteration space points so that we get unit stride accesses (or some multiple thereof). Example from Assignment 1:
DO I = ...
  DO J = ...
    C[I,J] = A[I,J] + C[J,I]
If arrays are stored in column-major order, we should interchange the two loops. How do we determine this in general?

42 Spatial locality matrix: a matrix containing a basis for the null space of A'. Suppose we do iteration i1 and then iteration i2. Array element accessed in iteration i1 = X[A i1 + a]; array element accessed in iteration i2 = X[A i2 + a]. Example: array stored in column-major order, reference X[A i + a]. We want these two accesses to be within a column of X, so A'(i1 - i2) = 0, where A' is A with its first row deleted. So we want to solve A' R = 0, where R is the "reuse direction". Solution: R is any vector in the null space of A'.
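A brute-force Python sketch of this computation (small search box, illustration only): delete the first row of the access matrix and collect the integer vectors it annihilates.

```python
from itertools import product

def spatial_reuse_directions(A, search=2):
    """Vectors R with A' R = 0, where A' is the access matrix A with its
    first row deleted (column-major storage: only the first subscript of
    the array reference may vary between the two accesses)."""
    A_trunc = A[1:]
    n = len(A[0])
    return [list(r) for r in product(range(-search, search + 1), repeat=n)
            if any(r) and all(sum(row[j] * r[j] for j in range(n)) == 0
                              for row in A_trunc)]

# reference A(I,J): the access matrix is the identity, so the reuse
# direction is along I -- scan I innermost for column-major storage
dirs = spatial_reuse_directions([[1, 0], [0, 1]])
print([1, 0] in dirs)  # -> True
```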

43 In what direction(s) can we move so that we get accesses along the first dimension of the array for the most part? Example: for the reference A[I,J], the 2 x 2 access matrix A is the identity, so A' = (0 1) and the spatial locality matrix is (1 0)^T.

44 Example from Assignment 1: exploiting spatial locality in the MVM accesses to A.
for I = 1,N
  for J = 1,N
    Y(I) = Y(I) + A(I,J)*X(J)
The reference matrix is the 2 x 2 identity. The truncated matrix is (0 1). So the spatial locality matrix is (1 0)^T.

45 For spatial locality: perform height reduction on the locality matrix. For our example, we get the transformation [[0, 1], [1, 0]]. That is, we would permute the loops, which is correct.

46 Read-only data reuse: add reuse directions to the locality matrix.
DO I = 1,N
  DO J = 1,N
    DO K = 1,N
      C[I,J] = C[I,J] + A[I,K]*B[K,J]
Consider the reuse vector for reference A[I,K]: iterations (I1, J1, K1) and (I2, J2, K2), with all indices between 1 and N, touch the same element of A when I1 = I2 and K1 = K2. The reuse vector is (d1, d2, d3) = (I2 - I1, J2 - J1, K2 - K1).

47 The reuse vector is (0 1 0)^T. Considering all the references, we get the locality matrix: one column per reuse and spatial locality direction. For MMM, these columns span the entire iteration space.

48 General algorithm for height reduction:
Compute the locality matrix L, which contains all vectors along which we would like to exploit locality (dependences + read-only data reuse + spatial locality).
Determine a basis for the null space of L^T and use that as the first few rows of the transformation.
Call the completion algorithm to generate a complete transformation T.
Do a pseudo-Hermite form decomposition of the transformation matrix and use the unimodular part as the transformation.
Flexibility: if the null space of L^T has only the zero vector in it, it may be good to drop some of the "less important" read-only reuse and spatial locality vectors from consideration.

49 In some codes, height reduction may fail since the locality vectors span the entire space. Example: MMM as shown in the previous slide (actually even MVM, if we want to exploit locality in both matrix A and vectors x, y). We can still exploit most of the locality provided we can tile the loops. Tiling is not always legal... Solution: apply linear loop transformations to enable tiling.

50 Linear loop transformations to enable tiling

51 Tiling is legal if the loops are fully permutable (all permutations of the loops are legal). In general, tiling is not legal: for the wavefront, D = (1 -1)^T and tiling is illegal! Tiling is legal if all entries in the dependence matrix are non-negative. Can we always convert a perfectly nested loop nest into a fully permutable loop nest? When we can, how do we do it?

52 Theorem: if all dependence vectors are distance vectors, we can convert the entire loop nest into a fully permutable loop nest. General idea: skew inner loops by outer loops sufficiently to make all negative entries non-negative. Example: wavefront. The dependence matrix is (1 -1)^T. The dependence matrix of the transformed program must have all non-negative entries. The first row of the transformation can be (1 0). The second row of the transformation can be (m 1) (for any m >= 1).

53 Transformation to make the first row with negative entries into a row with non-negative entries (in the picture, that row has negative entries -m, -n, -k; row a holds the first positive entry p2 above -n in its column, and row b holds the first positive entries p1 and p3 above -m and -k):
(a) For each negative entry in the first row with negative entries, find the first positive number in the corresponding column; assume the rows of these positive entries are a, b, etc. as shown above.
(b) Skew the row with negative entries by appropriate multiples of rows a, b, ...
For our example, the multiple of row a = ceiling(n/p2) and the multiple of row b = ceiling(max(m/p1, k/p3)). The transformation is the identity with these multiples entered in the skewed row, in the columns of rows a and b.

54 General algorithm for making a loop nest fully permutable:
1. If all entries in the dependence matrix are non-negative, done.
2. Otherwise, apply the algorithm on the previous slide to the first row with negative entries.
3. Generate the new dependence matrix.
4. If there are no negative entries, done. Otherwise, go to step (2).
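For distance-only dependence matrices, steps (1)-(4) can be sketched directly in Python (an illustration, not the lecture's implementation); the same row operations are mirrored on an identity matrix to accumulate the skewing transformation T:

```python
import math

def make_fully_permutable(D):
    """Skew rows so every entry of the dependence matrix is non-negative.

    Assumes every column of D is a lexicographically positive distance
    vector. Returns (T, TD): the accumulated skewing transformation and
    the transformed dependence matrix."""
    n, cols = len(D), len(D[0])
    TD = [row[:] for row in D]
    T = [[int(i == j) for j in range(n)] for i in range(n)]
    for r in range(n):
        for j in range(cols):
            while TD[r][j] < 0:
                # first positive entry above in this column; it exists
                # because the column is lex positive and rows above are
                # already non-negative
                a = next(i for i in range(r) if TD[i][j] > 0)
                k = math.ceil(-TD[r][j] / TD[a][j])
                for c in range(cols):   # skew row r of D by k * row a
                    TD[r][c] += k * TD[a][c]
                for c in range(n):      # mirror the operation on T
                    T[r][c] += k * T[a][c]
    return T, TD

# wavefront: D = column (1, -1); skewing inner by outer gives (1, 0)
T, TD = make_fully_permutable([[1], [-1]])
print(T)   # -> [[1, 0], [1, 1]]
print(TD)  # -> [[1], [0]]
```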

55 Result of tiling the transformed wavefront: tiling generates a 4-deep loop nest. Not as nice as the height reduction solution, but it will work fine for locality enhancement except at tile boundaries (and the number of boundary points is small compared to the number of interior points).

56 What happens with direction vectors? In general, we cannot make the loop nest fully permutable; the best we can do is to make some of the loops fully permutable. Example: D = (+ - +)^T, where the second entry is a negative direction. We try to make the outermost loops fully permutable, so we would interchange the second and third loops, and then tile the first two loops only. Idea: the algorithm will find bands of fully permutable loop nests.

57 Example for the general algorithm: a 5-deep loop nest whose dependence matrix (shown on the slide) mixes distances and directions. Begin the first band of fully permutable loops with the first loop. The second row has a -ve direction which cannot be knocked out, but we can interchange the fifth row with the second row to continue the band.

58 The new second loop can be part of the first fully permutable band. Knock out the -ve distances in the third row by adding 2 * (second row) to the third row.

59 We cannot make the band any bigger, so terminate the band, drop all satisfied dependences, and start the second band. In this case, all dependences are satisfied, so the last two loops form the second band of fully permutable loops.

60 How do we determine the transformation matrix for performing these operations? Every time you apply a transformation to the dependence matrix, apply the same transformation to T. If you have n loops, start with T = I (n x n). At the end, T is your transformation matrix. (Proof sketch: (U2 (U1 D)) = (U2 (U1 I)) D.)

61 For the previous example: start with the 5 x 5 identity and interchange the second and fifth rows.

62 Then add 2 * (second row) to the third row.

63 Conversion to fully permutable bands:
1. Compute the dependence matrix D. Current-row = row 1.
2. Begin a new fully permutable band.
3. If all entries in D are non-negative, add all remaining loops to the current band and exit.
4. Otherwise, find the first row r with one or more negative entries. Each negative entry must be guarded by a positive entry above it in its dependence vector. Add all loops between current-row and row (r-1) to the current band.
5. If row r does not have both positive and negative directions, fix it as follows: if all direction entries are negative, negate the row. At this point, all direction entries must be positive, and any remaining negative entries must be distances. If negative entries remain, determine appropriate multiples of the

64 guard rows to be added to that row (treating + guard entries as 1) to convert all negative entries in row r to strictly positive. Update the transformation matrix and add row r to the current band. Current-row = r+1. Go to step 3.
6. Otherwise, see if a row below the current row without both positive and negative directions can be interchanged legally with the current row. If so, perform the interchange, update the transformation matrix, and go to step 4. Otherwise, terminate the current fully permutable band, drop all satisfied dependence vectors, and go to step 2.

65 Ye Grande Locality Enhancement Algorithm:
1. Compute the locality matrix L.
2. Perform height reduction by finding a basis for the null space of L^T. If the null space is too small, you have the option of dropping the less important locality vectors.
3. Compute the resulting dependence matrix after height reduction. If all entries are non-negative, declare the loop nest to be fully permutable.
4. Otherwise, apply the algorithm to convert to bands of fully permutable loop nests.
5. Perform the pseudo-Hermite normal form decomposition of the resulting transformation matrix, and use the unimodular part to perform the code transformation.
6. Tile the fully permutable bands, choosing appropriate tile sizes.

66 Summary: Height reduction can also be viewed as outer-loop parallelization: if no dependences are carried by the outer loops, the outer loops are parallel! First paper on transformation synthesis in this style: Leslie Lamport (1969) on multiprocessor scheduling. The algorithm here is based on Wei Li's Cornell PhD thesis (Wei is now a compiler hot-shot at Intel!). A version of this algorithm was implemented by us in HP's compiler product line. It is easy to combine height reduction and conversion to fully permutable band form into a single transformation matrix synthesis, but you can work it out for yourself.


More information

Unconstrained Optimization

Unconstrained Optimization Unconstrained Optimization Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi August 13, 2013 1 Denitions Economics is a science of optima We maximize utility functions, minimize

More information

CSC D70: Compiler Optimization Memory Optimizations

CSC D70: Compiler Optimization Memory Optimizations CSC D70: Compiler Optimization Memory Optimizations Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry, Greg Steffan, and

More information

4 Integer Linear Programming (ILP)

4 Integer Linear Programming (ILP) TDA6/DIT37 DISCRETE OPTIMIZATION 17 PERIOD 3 WEEK III 4 Integer Linear Programg (ILP) 14 An integer linear program, ILP for short, has the same form as a linear program (LP). The only difference is that

More information

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 In this lecture, we describe a very general problem called linear programming

More information

Simone Campanoni Loop transformations

Simone Campanoni Loop transformations Simone Campanoni simonec@eecs.northwestern.edu Loop transformations Outline Simple loop transformations Loop invariants Induction variables Complex loop transformations Simple loop transformations Simple

More information

A Loop Transformation Theory and an Algorithm to Maximize Parallelism

A Loop Transformation Theory and an Algorithm to Maximize Parallelism 452 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 2, NO. 4, OCTOBER 1991 A Loop Transformation Theory and an Algorithm to Maximize Parallelism Michael E. Wolf and Monica S. Lam, Member, IEEE

More information

Systems of Inequalities

Systems of Inequalities Systems of Inequalities 1 Goals: Given system of inequalities of the form Ax b determine if system has an integer solution enumerate all integer solutions 2 Running example: Upper bounds for x: (2)and

More information

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into 2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into the viewport of the current application window. A pixel

More information

Module 10. Network Simplex Method:

Module 10. Network Simplex Method: Module 10 1 Network Simplex Method: In this lecture we shall study a specialized simplex method specifically designed to solve network structured linear programming problems. This specialized algorithm

More information

i=1 i=2 i=3 i=4 i=5 x(4) x(6) x(8)

i=1 i=2 i=3 i=4 i=5 x(4) x(6) x(8) Vectorization Using Reversible Data Dependences Peiyi Tang and Nianshu Gao Technical Report ANU-TR-CS-94-08 October 21, 1994 Vectorization Using Reversible Data Dependences Peiyi Tang Department of Computer

More information

Array Optimizations in OCaml

Array Optimizations in OCaml Array Optimizations in OCaml Michael Clarkson Cornell University clarkson@cs.cornell.edu Vaibhav Vaish Cornell University vaibhav@cs.cornell.edu May 7, 2001 Abstract OCaml is a modern programming language

More information

LARP / 2018 ACK : 1. Linear Algebra and Its Applications - Gilbert Strang 2. Autar Kaw, Transforming Numerical Methods Education for STEM Graduates

LARP / 2018 ACK : 1. Linear Algebra and Its Applications - Gilbert Strang 2. Autar Kaw, Transforming Numerical Methods Education for STEM Graduates Triangular Factors and Row Exchanges LARP / 28 ACK :. Linear Algebra and Its Applications - Gilbert Strang 2. Autar Kaw, Transforming Numerical Methods Education for STEM Graduates Then there were three

More information

Multiple View Geometry in Computer Vision

Multiple View Geometry in Computer Vision Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Projective 3D Geometry (Back to Chapter 2) Lecture 6 2 Singular Value Decomposition Given a

More information

Lecture 11 Loop Transformations for Parallelism and Locality

Lecture 11 Loop Transformations for Parallelism and Locality Lecture 11 Loop Transformations for Parallelism and Locality 1. Examples 2. Affine Partitioning: Do-all 3. Affine Partitioning: Pipelining Readings: Chapter 11 11.3, 11.6 11.7.4, 11.9-11.9.6 1 Shared Memory

More information

Data-centric Transformations for Locality Enhancement

Data-centric Transformations for Locality Enhancement Data-centric Transformations for Locality Enhancement Induprakas Kodukula Keshav Pingali September 26, 2002 Abstract On modern computers, the performance of programs is often limited by memory latency

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert

More information

: How to Write Fast Numerical Code ETH Computer Science, Spring 2016 Midterm Exam Wednesday, April 20, 2016

: How to Write Fast Numerical Code ETH Computer Science, Spring 2016 Midterm Exam Wednesday, April 20, 2016 ETH login ID: (Please print in capital letters) Full name: 263-2300: How to Write Fast Numerical Code ETH Computer Science, Spring 2016 Midterm Exam Wednesday, April 20, 2016 Instructions Make sure that

More information

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Technical Report ANU-TR-CS-92- November 7, 992 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer

More information

COMP 558 lecture 19 Nov. 17, 2010

COMP 558 lecture 19 Nov. 17, 2010 COMP 558 lecture 9 Nov. 7, 2 Camera calibration To estimate the geometry of 3D scenes, it helps to know the camera parameters, both external and internal. The problem of finding all these parameters is

More information

An Improved Measurement Placement Algorithm for Network Observability

An Improved Measurement Placement Algorithm for Network Observability IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper

More information

Computational Geometry

Computational Geometry Lecture 12: Lecture 12: Motivation: Terrains by interpolation To build a model of the terrain surface, we can start with a number of sample points where we know the height. Lecture 12: Motivation: Terrains

More information

Finding two vertex connected components in linear time

Finding two vertex connected components in linear time TO 1 Finding two vertex connected components in linear time Guy Kortsarz TO 2 ackward edges Please read the DFS algorithm (its in the lecture notes). The DFS gives a tree. The edges on the tree are called

More information

Computing and Informatics, Vol. 36, 2017, , doi: /cai

Computing and Informatics, Vol. 36, 2017, , doi: /cai Computing and Informatics, Vol. 36, 2017, 566 596, doi: 10.4149/cai 2017 3 566 NESTED-LOOPS TILING FOR PARALLELIZATION AND LOCALITY OPTIMIZATION Saeed Parsa, Mohammad Hamzei Department of Computer Engineering

More information

2. Mathematical Notation Our mathematical notation follows Dijkstra [5]. Quantication over a dummy variable x is written (Q x : R:x : P:x). Q is the q

2. Mathematical Notation Our mathematical notation follows Dijkstra [5]. Quantication over a dummy variable x is written (Q x : R:x : P:x). Q is the q Parallel Processing Letters c World Scientic Publishing Company ON THE SPACE-TIME MAPPING OF WHILE-LOOPS MARTIN GRIEBL and CHRISTIAN LENGAUER Fakultat fur Mathematik und Informatik Universitat Passau D{94030

More information

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Parallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Spring 2 Parallelization Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Outline Why Parallelism Parallel Execution Parallelizing Compilers

More information

AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1

AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 Virgil Andronache Richard P. Simpson Nelson L. Passos Department of Computer Science Midwestern State University

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

Lecture 5: Duality Theory

Lecture 5: Duality Theory Lecture 5: Duality Theory Rajat Mittal IIT Kanpur The objective of this lecture note will be to learn duality theory of linear programming. We are planning to answer following questions. What are hyperplane

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

Lecture 18 Restoring Invariants

Lecture 18 Restoring Invariants Lecture 18 Restoring Invariants 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning In this lecture we will implement heaps and operations on them. The theme of this lecture is reasoning

More information

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities

Outline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Parallelization Outline Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Moore s Law From Hennessy and Patterson, Computer Architecture:

More information

c Nawaaz Ahmed 000 ALL RIGHTS RESERVED

c Nawaaz Ahmed 000 ALL RIGHTS RESERVED LOCALITY ENHANCEMENT OF IMPERFECTLY-NESTED LOOP NESTS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulllment of the Requirements for the Degree of Doctor

More information

Exploring Parallelism At Different Levels

Exploring Parallelism At Different Levels Exploring Parallelism At Different Levels Balanced composition and customization of optimizations 7/9/2014 DragonStar 2014 - Qing Yi 1 Exploring Parallelism Focus on Parallelism at different granularities

More information

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers.

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers. CS 612 Software Design for High-performance Architectures 1 computers. CS 412 is desirable but not high-performance essential. Course Organization Lecturer:Paul Stodghill, stodghil@cs.cornell.edu, Rhodes

More information

Lecture 3: Camera Calibration, DLT, SVD

Lecture 3: Camera Calibration, DLT, SVD Computer Vision Lecture 3 23--28 Lecture 3: Camera Calibration, DL, SVD he Inner Parameters In this section we will introduce the inner parameters of the cameras Recall from the camera equations λx = P

More information

An Ecient Approximation Algorithm for the. File Redistribution Scheduling Problem in. Fully Connected Networks. Abstract

An Ecient Approximation Algorithm for the. File Redistribution Scheduling Problem in. Fully Connected Networks. Abstract An Ecient Approximation Algorithm for the File Redistribution Scheduling Problem in Fully Connected Networks Ravi Varadarajan Pedro I. Rivera-Vega y Abstract We consider the problem of transferring a set

More information

Essential constraints: Data Dependences. S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2

Essential constraints: Data Dependences. S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2 Essential constraints: Data Dependences S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2 Essential constraints: Data Dependences S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2 S2

More information

Computational Methods CMSC/AMSC/MAPL 460. Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science

Computational Methods CMSC/AMSC/MAPL 460. Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science Computational Methods CMSC/AMSC/MAPL 460 Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science Some special matrices Matlab code How many operations and memory

More information

The Polyhedral Model (Transformations)

The Polyhedral Model (Transformations) The Polyhedral Model (Transformations) Announcements HW4 is due Wednesday February 22 th Project proposal is due NEXT Friday, extension so that example with tool is possible (see resources website for

More information

15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs

15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs 15.082J and 6.855J Lagrangian Relaxation 2 Algorithms Application to LPs 1 The Constrained Shortest Path Problem (1,10) 2 (1,1) 4 (2,3) (1,7) 1 (10,3) (1,2) (10,1) (5,7) 3 (12,3) 5 (2,2) 6 Find the shortest

More information

Dual-fitting analysis of Greedy for Set Cover

Dual-fitting analysis of Greedy for Set Cover Dual-fitting analysis of Greedy for Set Cover We showed earlier that the greedy algorithm for set cover gives a H n approximation We will show that greedy produces a solution of cost at most H n OPT LP

More information

Lecture Notes for Chapter 2: Getting Started

Lecture Notes for Chapter 2: Getting Started Instant download and all chapters Instructor's Manual Introduction To Algorithms 2nd Edition Thomas H. Cormen, Clara Lee, Erica Lin https://testbankdata.com/download/instructors-manual-introduction-algorithms-2ndedition-thomas-h-cormen-clara-lee-erica-lin/

More information

Ranking Functions. Linear-Constraint Loops

Ranking Functions. Linear-Constraint Loops for Linear-Constraint Loops Amir Ben-Amram 1 for Loops Example 1 (GCD program): while (x > 1, y > 1) if x

More information

Column and row space of a matrix

Column and row space of a matrix Column and row space of a matrix Recall that we can consider matrices as concatenation of rows or columns. c c 2 c 3 A = r r 2 r 3 a a 2 a 3 a 2 a 22 a 23 a 3 a 32 a 33 The space spanned by columns of

More information

Tilings of the Euclidean plane

Tilings of the Euclidean plane Tilings of the Euclidean plane Yan Der, Robin, Cécile January 9, 2017 Abstract This document gives a quick overview of a eld of mathematics which lies in the intersection of geometry and algebra : tilings.

More information

Introduction to Software Engineering

Introduction to Software Engineering Introduction to Software Engineering (CS350) Lecture 17 Jongmoon Baik Testing Conventional Applications 2 Testability Operability it operates cleanly Observability the results of each test case are readily

More information

Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland

Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland Calibrating a Structured Light System Dr Alan M. McIvor Robert J. Valkenburg Machine Vision Team, Industrial Research Limited P.O. Box 2225, Auckland New Zealand Tel: +64 9 3034116, Fax: +64 9 302 8106

More information

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7

Compiler Optimizations. Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 Compiler Optimizations Chapter 8, Section 8.5 Chapter 9, Section 9.1.7 2 Local vs. Global Optimizations Local: inside a single basic block Simple forms of common subexpression elimination, dead code elimination,

More information

AM 121: Intro to Optimization Models and Methods Fall 2017

AM 121: Intro to Optimization Models and Methods Fall 2017 AM 121: Intro to Optimization Models and Methods Fall 2017 Lecture 10: Dual Simplex Yiling Chen SEAS Lesson Plan Interpret primal simplex in terms of pivots on the corresponding dual tableau Dictionaries

More information

The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs

The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs Frankie Smith Nebraska Wesleyan University fsmith@nebrwesleyan.edu May 11, 2015 Abstract We will look at how to represent

More information

Notes for Lecture 20

Notes for Lecture 20 U.C. Berkeley CS170: Intro to CS Theory Handout N20 Professor Luca Trevisan November 13, 2001 Notes for Lecture 20 1 Duality As it turns out, the max-flow min-cut theorem is a special case of a more general

More information

Segmentation: Clustering, Graph Cut and EM

Segmentation: Clustering, Graph Cut and EM Segmentation: Clustering, Graph Cut and EM Ying Wu Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208 yingwu@northwestern.edu http://www.eecs.northwestern.edu/~yingwu

More information

Optimizing Cache Performance in Matrix Multiplication. UCSB CS240A, 2017 Modified from Demmel/Yelick s slides

Optimizing Cache Performance in Matrix Multiplication. UCSB CS240A, 2017 Modified from Demmel/Yelick s slides Optimizing Cache Performance in Matrix Multiplication UCSB CS240A, 2017 Modified from Demmel/Yelick s slides 1 Case Study with Matrix Multiplication An important kernel in many problems Optimization ideas

More information

Reducing Memory Requirements of Nested Loops for Embedded Systems

Reducing Memory Requirements of Nested Loops for Embedded Systems Reducing Memory Requirements of Nested Loops for Embedded Systems 23.3 J. Ramanujam Λ Jinpyo Hong Λ Mahmut Kandemir y A. Narayan Λ Abstract Most embedded systems have limited amount of memory. In contrast,

More information

Increasing Parallelism of Loops with the Loop Distribution Technique

Increasing Parallelism of Loops with the Loop Distribution Technique Increasing Parallelism of Loops with the Loop Distribution Technique Ku-Nien Chang and Chang-Biau Yang Department of pplied Mathematics National Sun Yat-sen University Kaohsiung, Taiwan 804, ROC cbyang@math.nsysu.edu.tw

More information

Lecture 57 Dynamic Programming. (Refer Slide Time: 00:31)

Lecture 57 Dynamic Programming. (Refer Slide Time: 00:31) Programming, Data Structures and Algorithms Prof. N.S. Narayanaswamy Department of Computer Science and Engineering Indian Institution Technology, Madras Lecture 57 Dynamic Programming (Refer Slide Time:

More information

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations Ying Hu Clark Barrett Benjamin Goldberg Department of Computer Science New York University yinghubarrettgoldberg

More information

Chapter 3. Quadric hypersurfaces. 3.1 Quadric hypersurfaces Denition.

Chapter 3. Quadric hypersurfaces. 3.1 Quadric hypersurfaces Denition. Chapter 3 Quadric hypersurfaces 3.1 Quadric hypersurfaces. 3.1.1 Denition. Denition 1. In an n-dimensional ane space A; given an ane frame fo;! e i g: A quadric hypersurface in A is a set S consisting

More information

Cluster algebras and infinite associahedra

Cluster algebras and infinite associahedra Cluster algebras and infinite associahedra Nathan Reading NC State University CombinaTexas 2008 Coxeter groups Associahedra and cluster algebras Sortable elements/cambrian fans Infinite type Much of the

More information

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 9: Performance Issues in Shared Memory. The Lecture Contains:

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 9: Performance Issues in Shared Memory. The Lecture Contains: The Lecture Contains: Data Access and Communication Data Access Artifactual Comm. Capacity Problem Temporal Locality Spatial Locality 2D to 4D Conversion Transfer Granularity Worse: False Sharing Contention

More information

Outline. Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis

Outline. Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis Memory Optimization Outline Issues with the Memory System Loop Transformations Data Transformations Prefetching Alias Analysis Memory Hierarchy 1-2 ns Registers 32 512 B 3-10 ns 8-30 ns 60-250 ns 5-20

More information

Catalan Numbers. Table 1: Balanced Parentheses

Catalan Numbers. Table 1: Balanced Parentheses Catalan Numbers Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles November, 00 We begin with a set of problems that will be shown to be completely equivalent. The solution to each problem

More information

COT 6936: Topics in Algorithms! Giri Narasimhan. ECS 254A / EC 2443; Phone: x3748

COT 6936: Topics in Algorithms! Giri Narasimhan. ECS 254A / EC 2443; Phone: x3748 COT 6936: Topics in Algorithms! Giri Narasimhan ECS 254A / EC 2443; Phone: x3748 giri@cs.fiu.edu http://www.cs.fiu.edu/~giri/teach/cot6936_s12.html https://moodle.cis.fiu.edu/v2.1/course/view.php?id=174

More information

Linear Equations in Linear Algebra

Linear Equations in Linear Algebra 1 Linear Equations in Linear Algebra 1.2 Row Reduction and Echelon Forms ECHELON FORM A rectangular matrix is in echelon form (or row echelon form) if it has the following three properties: 1. All nonzero

More information

/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang

/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang 600.469 / 600.669 Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang 9.1 Linear Programming Suppose we are trying to approximate a minimization

More information

Module 18: Loop Optimizations Lecture 35: Amdahl s Law. The Lecture Contains: Amdahl s Law. Induction Variable Substitution.

Module 18: Loop Optimizations Lecture 35: Amdahl s Law. The Lecture Contains: Amdahl s Law. Induction Variable Substitution. The Lecture Contains: Amdahl s Law Induction Variable Substitution Index Recurrence Loop Unrolling Constant Propagation And Expression Evaluation Loop Vectorization Partial Loop Vectorization Nested Loops

More information