AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM

Virgil Andronache, Richard P. Simpson, Nelson L. Passos
Department of Computer Science, Midwestern State University
Wichita Falls, TX
(andron simpson passos)@abacus.mwsu.edu

ABSTRACT

A significant amount of research has been done towards finding a technique that will allow maximum use of parallel capabilities in the case of nested loops. The last couple of years have seen a number of theoretical results that attempt to gain the largest possible amount of parallelism. In this paper, one such technique - multidimensional retiming - is integrated with a novel implementation of the way the indices are used to control the loop iterations, providing a practical approach to the parallelization problem. Previous approaches have considered the problem as a whole and emphasized the use of multiple processors, without optimizing the execution for an individual processor. This paper looks at an efficient implementation of nested loops for a single processor. The theory and the algorithm used to achieve the desired result are presented. A detailed example illustrates the use of the technique.

1. INTRODUCTION

Numerous studies have shown that in most computation-intensive applications the largest amount of time is spent executing loop structures. As such, a considerable amount of work has been done to improve the efficiency with which loops are executed. The focus of that work has been directed towards achieving some degree of parallelism within or between loops, whether by software (compilers) or hardware (architecture) means. However, these approaches have considered the problem as a whole and emphasized the use of multiple processors, without optimizing the execution for an individual processor. This paper looks at an efficient implementation of nested loops for a single processor.

The software side of the solution is rather diverse in itself. A series of loop transformation techniques using such tools as linear algebra [13], index shifting [9] and software pipelining [2,4,5,6] have proven successful to varying degrees. Still, each of the above-mentioned techniques has its disadvantages. Linear algebra techniques require fairly large matrix manipulations before a transformed loop can be obtained. Index shifting provides no information about the instructions that need to be moved outside the loop at either end (prologue and epilogue) in order for the overall loop structure to remain unchanged.

This work was supported in part by the National Science Foundation under Grant No. MIP

Finally, software pipelining optimizes the loop body without considering the nested control structures. At the same time, new twists on old techniques, such as scheduling algorithms [8], have also brought about improvements. However, here again the focus is on the loop body rather than the control structures. Other approaches worth mentioning involve changing the dependency graph of a loop [10] and the use of the hyperplane method for obtaining parallelism [1,7]. The technique described by Midkiff and Padua [10] deals with each level of the nested control structure individually, rather than as a block. At the same time, the wavefront method [1] applies to multi-processor environments instead of the more commonly found single processor - multiple functional unit case.

From the hardware standpoint, the most common approach has been to employ two different processing modules to perform separate functions in the scheduling process [3,4], e.g. a dataflow instruction scheduling unit and a pipelined instruction execution unit. Still, the use of two different units may not be necessary in dealing with nested control structures. In this study, common constructs of the form

    for i = 1 to 10
      for j = 1 to 10
        ...

which require a conventional row-wise execution are converted to a more efficient format, represented by the command:

    for (i,j) = (1,1) to (10,10)
      ...

where (i,j) is a number formed by the concatenation of the two indices, i and j.
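As a rough illustration of what this fused construct buys, the following C sketch replaces the two counters of the conventional nest with a single packed counter. The 8-bit packing (i in the high byte, j in the low byte) is an illustrative assumption matching the 1-to-255 index range discussed in Section 3; the simple row-wise stepping shown here is only for comparison, since the transformed loops of Section 3 step along diagonals instead.

    /* A fused loop counter: (i,j) packed as (i << 8) | j, replacing the
       two counters of the conventional nest with one counter and one
       end-of-loop test.  The packing and the row-wise stepping are
       illustrative assumptions only. */
    #include <stdio.h>

    int main(void) {
        unsigned ij = (1u << 8) | 1u;           /* (i,j) = (1,1)   */
        const unsigned end = (10u << 8) | 10u;  /* (i,j) = (10,10) */
        for (;;) {
            printf("%u %u\n", ij >> 8, ij & 0xFFu);
            if (ij == end) break;               /* one end-of-loop test */
            ij = ((ij & 0xFFu) == 10u)
                     ? (((ij >> 8) + 1u) << 8) | 1u  /* reset j, advance i */
                     : ij + 1u;                      /* regular increment  */
        }
        return 0;
    }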

In order to verify the efficiency of the new construct and to present the transformation technique used in our method, the next section provides a set of basic concepts. Section 3 describes the algorithm used in the implementation process. Section 4 presents an example of applying the method, followed by a summary of the topics discussed, which finalizes the paper.

2. BACKGROUND

There are several important concepts when dealing with nested loops. Among them, the iteration space of a nested loop is the set of all the points in the loop, i.e. all valid combinations of the control variables. A hyperplane is a subset of the iteration space whose dimension is one less than the iteration space dimension (a line in a 2-D space, an (n-1)-dimensional subspace in an n-dimensional iteration space). The global schedule is the general direction in which the loop is traversed, and the local schedule is the direction of the execution sequence along a hyperplane.

Nested loops written solely with the aid of constructs currently available in most programming languages execute by starting at what could be described as a corner of the iteration space and proceeding in a direction parallel to one of its edges (in 2-D, either row-wise or column-wise). It can be shown that all such possible execution paths can be treated in a similar manner. As such, this paper will show the process involved when the local execution sequence follows a left-to-right order with respect to the global schedule [11].

The restructuring of the nested loop will make use of loop transformations. The technique used is multidimensional retiming [12]. Its main advantages compared with other techniques are that it requires no unrolling or hand-written optimal code, and it does provide for a prologue and an epilogue [1,2,5,6,7,9,10,13]. For example, in the loop:

    for i = 1 to 100
      for j = 1 to 100
        a[i,j] = b[i,j-1] + b[i-1,j]
        b[i,j] = a[i,j]

since in the second instruction of the loop body b makes use of the just-computed a, the two instructions inside the loop cannot be executed in parallel. The multidimensional retiming technique can produce a new code equivalent to the pseudo-code below:

    prologue
    control instructions:
      a[i+1,j-1] = b[i+1,j-2] + b[i,j-1]
      b[i,j] = a[i,j]
    epilogue

In this new code, since the result computed by a is not immediately used by b, it can now be seen that the loop body is fully parallel. As a side effect of the method, there are two sets of instructions that have been moved outside of the loop - the prologue and the epilogue [12]. In order to obtain such parallelism, the loop structure and execution have been shifted such that the direction in which the loop is traversed is no longer parallel to its edges. This change in direction has led to modifying the control instructions, as will be seen in the next section. The same change makes it reasonable to assume that an execution pattern that does not follow the original row-wise (or column-wise) pattern will improve the degree of parallelism available and therefore the execution time of the loop.

Two of the tools used in obtaining the new loop structure are dependency vectors and scheduling vectors. A dependency vector is a vector that indicates the relation, in terms of loop iterations, between a computed value and the values it depends on. In the original code, the dependency vectors of a with regard to b are (0,1) and (1,0), while the dependency vector of b with regard to a is (0,0). A scheduling vector is a vector that indicates the global schedule.

3. TRANSFORMATION

As seen in the previous section, traversing a nested loop in the traditional order will often not yield the best performance. Therefore, a different way of executing the loop needs to be found, in which the processing order is not parallel to the iteration space edges. The best direction for the loop execution is obtained through a modified version of multidimensional loop retiming. The next step is calculating the necessary increments for the control variables such that the loop is traversed in the direction required by the results of the previous step. Finally, there is the question of reducing the two control variables into one construct.

In order to find the new direction of execution, without loss of generality, a two-dimensional nested loop is assumed, with the format:

    for i = 0 to m
      for j = 0 to n
        a[i,j] = f(a[i',j'])

In this case, any time a new value a[i,j] is computed, the value(s) a[i',j'] must have been computed previously. It follows, therefore, that i' ≤ i and, if i' = i, then j' < j. Thinking of i and j in terms of a Cartesian coordinate system with i on the horizontal axis and j on the vertical axis, if we examine the possible locus of (i-i', j-j'), we find it to be the right half of the plane, inclusive of the positive j axis, but exclusive of the negative one. It can be proven that there exists a scheduling vector (x,y) such that y = mx, with 0 < m < infinity, where m = max{-d_y/d_x : (d_x,d_y) ∈ D}, D being the set of all dependency vectors. Thus, when y = 1, x = 1/m. Any point with i coordinate greater than x and j coordinate equal to 1 falls within the sector, and is thus a universal scheduling vector. Therefore, if we choose the point (⌈1/m⌉, 1), we have a scheduling vector of integer coordinates. (The corresponding retiming vector is (1, -⌈1/m⌉).) Therefore, by choosing an execution direction of gradient -1/⌈1/m⌉, we can obtain better results than by traversing the loop in the traditional fashion.
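This selection rule can be exercised numerically. Rather than transcribing the max{-d_y/d_x} expression, the C sketch below searches directly for the smallest integer slope s that makes (s,1) a strictly valid scheduling vector (positive dot product with every nonzero dependency vector) and reports the retiming vector (1,-s). The dependence set used is the one from the example of Section 4, for which the search returns s = 1 (any s > 0 is valid; the paper picks 3).

    /* Sketch: find the smallest integer s such that the scheduling vector
       (s,1) satisfies (s,1).d > 0 for every nonzero dependency vector d,
       then report the retiming vector (1,-s).  This checks the validity
       condition directly instead of evaluating the closed form in the text. */
    #include <stdio.h>

    typedef struct { int di, dj; } Dep;

    static int valid(int s, const Dep *D, int n) {
        for (int k = 0; k < n; k++) {
            if (D[k].di == 0 && D[k].dj == 0)
                continue;   /* the (0,0) dependence is the one retiming removes */
            if (s * D[k].di + D[k].dj <= 0)
                return 0;
        }
        return 1;
    }

    int main(void) {
        Dep D[] = { {0,1}, {1,0}, {0,0} };   /* dependence vectors of the example */
        int n = (int)(sizeof D / sizeof D[0]);
        for (int s = 1; s <= 1024; s++)
            if (valid(s, D, n)) {
                printf("scheduling vector (%d,1), retiming vector (1,%d)\n", s, -s);
                break;
            }
        return 0;
    }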

In order to advance from one point in the iteration space to the next and to initialize the next execution sequence, two increments need to be calculated; they are somewhat equivalent to the regular increment and the reset at the end of the inner loop. The regular increment is easily obtained: it is 1 for the outer index and -⌈1/m⌉ for the inner index. The calculation for the other increment has to be divided into three distinct parts. Due to the index changes brought about by retiming, the retimed execution cannot be applied to the first and last columns, or to the first and last ⌈1/m⌉ rows. These will have to be part of prologues and epilogues as the execution progresses.

In the first part, the start of each diagonal execution sequence has the outer index set at 0, and the end is reached when the inner index reaches 0. In the second part, the start has the inner index at its maximum, and the end is reached when the inner index reaches 0. Finally, the third set of execution sequences starts with the inner index at its maximum and ends with the outer index at its maximum. Considering the iteration space of the loop as a two-dimensional array, a visual representation is given in Figure 1.

[Figure 1. The three stages of loop execution]

The end of each stage can be easily determined, as Figure 1 shows: two of the stopping points are "corners" of the nested loop, while the third can be calculated. The first stage ends at the end of the execution sequence starting at the top left-hand corner of the iteration space (the point (0, max(inner index))). Thus the last point in that execution sequence (the end of the first stage) is given by:

    inner index: max(inner index) mod ⌈1/m⌉
    outer index: max(inner index) div ⌈1/m⌉

In the Cartesian plane, the point (outer index, inner index) and the slope ⌈1/m⌉ determine a straight line. Multiplying the slope by the x coordinate and adding the y coordinate gives the y value of the function when x is 0. Since in this stage all that is needed is a stepwise increase in the inner index, 1 is added. So, the increment at the end of each execution sequence sets the inner index to

    ⌈1/m⌉ * outer index + inner index + 1

As a result of the traversal technique, the outer index at the start of each execution sequence is equal to 0.
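Before working through the remaining stages, the overall order that the three stages produce can be summarized compactly: along an execution sequence the iteration point moves by (+1, -s) with s = ⌈1/m⌉, so the quantity s*i + j is constant along each sequence. The C sketch below is an assumption-level illustration of that order: it enumerates every point of the space, one sequence per output line, without separating out the prologue and epilogue rows and columns.

    /* Enumerate an (M+1) x (N+1) iteration space (outer i = 0..M, inner
       j = 0..N) along execution sequences of direction (1,-s).  Since
       s*i + j is invariant along a sequence, each value of c = s*i + j
       identifies one sequence; sweeping c from 0 to s*M + N covers every
       iteration point exactly once. */
    #include <stdio.h>

    static void traverse(int M, int N, int s) {
        for (int c = 0; c <= s * M + N; c++) {
            int imin = (c > N) ? (c - N + s - 1) / s : 0;  /* ceil((c-N)/s) */
            int imax = (c / s < M) ? c / s : M;            /* floor(c/s)    */
            for (int i = imin; i <= imax; i++)
                printf("(%d,%d) ", i, c - s * i);          /* j = c - s*i   */
            printf("\n");                                  /* next sequence */
        }
    }

    int main(void) { traverse(9, 9, 3); return 0; }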

The second stage ends in the lower right-hand corner of the iteration space, so the value of each index at the end of the stage is given by:

    inner index: 0
    outer index: max(outer index)

The increment at the end of each execution sequence sets the outer index to

    (⌈1/m⌉ * outer index + inner index - max(inner index)) div ⌈1/m⌉ + 1

The first terms are the previous value of the inner index, as calculated by the formula in stage 1. The entire parenthesis then calculates how many multiples of the slope lie outside the iteration space considered (since the outer index changes by one for every change of ⌈1/m⌉ in the inner index). Then, since the first point inside the iteration space would be at the next multiple, 1 is added. The inner index is set to

    max(inner index) - ⌈1/m⌉ + (⌈1/m⌉ * outer index + inner index) mod ⌈1/m⌉ + 1

It is fairly obvious that the first iteration point in an execution sequence cannot be further down than ⌈1/m⌉ - 1 units from max(inner index) in the iteration space. Also, since the access is still in sequential order with respect to the inner index, there is a cyclic nature to the starting inner-index coordinate of each execution sequence. Since the first cycle starts ⌈1/m⌉ - 1 units down, and following a similar logic as for the outer index, the formula results quite naturally.

Finally, the third stage ends when the upper right-hand corner is reached. Thus, the value of each index at the end of the stage is given by:

    inner index: max(inner index)
    outer index: max(outer index)

The increment at the end of each execution sequence sets the inner index to

    inner index + 1 + ⌈1/m⌉ * ((max(inner index) - inner index - 1) div ⌈1/m⌉)

By a process similar to that of stage 2, this index is calculated by going one cell up in the iteration space and calculating how many multiples of the slope can be found between that cell and the inner index limit. The outer index is then set to

    max(outer index) - (max(inner index) - inner index - 1) div ⌈1/m⌉

The parenthesis calculates how many multiples of the slope fit between the point just above the end of the previous execution sequence and the inner index limit of the iteration space. That is the number of units away from the outer index limit at which the outer index of the next execution sequence should start.
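The same traversal can be written in the incremental style the stages describe: one regular increment (i,j) += (1,-s), plus a reset that starts the next execution sequence when the increment would leave the iteration space. The C sketch below derives the reset from the invariant c = s*i + j rather than from the three stage-specific formulas above; it is equivalent in the points it visits, not a transcription of the paper's control code.

    /* One step of the diagonal traversal: the regular increment when the
       next point is inside the space, otherwise a reset to the start of
       the next execution sequence (the next value of c = s*i + j).
       Returns 0 when the whole space has been visited. */
    #include <stdio.h>

    static int step(int *i, int *j, int M, int N, int s) {
        if (*i + 1 <= M && *j - s >= 0) {            /* regular increment */
            *i += 1; *j -= s;
            return 1;
        }
        int c = s * *i + *j + 1;                     /* next sequence     */
        if (c > s * M + N) return 0;                 /* space exhausted   */
        *i = (c > N) ? (c - N + s - 1) / s : 0;      /* sequence start    */
        *j = c - s * *i;
        return 1;
    }

    int main(void) {
        int i = 0, j = 0, count = 1;
        while (step(&i, &j, 9, 9, 3)) count++;
        printf("visited %d points\n", count);        /* 100 for 10 x 10   */
        return 0;
    }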

The algorithm for the transformed code is represented by the pseudo-code that follows:

    Algorithm
      Calculate the end points of each stage.
      Combine each of the three end points and form three bitmasks.
      Prologue.
      Initialize the first execution sequence.
      Set the starting mask at (0, ⌈1/m⌉).
      repeat
        if (mask > first bitmask)
          Execute loop instructions
        if (mask < first bitmask)
          if (mask > 0)
            Execute loop instructions
          Set end-of-sequence values and the prologue of the next sequence
      until (mask = first bitmask)
      Execute loop instructions
      Switch to the second bitmask and generate the next address.
      repeat
        if (mask > second bitmask)
          Execute loop instructions
        if (mask < second bitmask)
          Set end-of-sequence values and the prologue of the next sequence
      until (mask = second bitmask)
      Execute loop instructions
      Switch to the third bitmask and calculate the next address.
      repeat
        if (mask < third bitmask)
          Execute loop instructions
        if (mask > third bitmask)
          Set end-of-sequence values and the prologue of the next sequence
      until (mask = third bitmask)
      Execute loop instructions
      Epilogue.
    End.

An additional step consists of simply concatenating the two indices into one comprehensive bitmask. Consider the Intel Pentium architecture: the Pentium processor already has registers that can be accessed both as one single register and as two halves. Such an arrangement is very well suited to the process described in this paper if we consider loops with control indices in the range 1 to 255. All that is needed is that the indices be stored in the two halves of a register and a suitable increment be calculated for each case.

In the first two stages presented above, since the end point of each execution sequence is at a point where the inner index is a value less than ⌈1/m⌉, we store the stage-ending index combination with the inner index in the higher-order bits of the register and the outer index in the lower-order bits. Then, if the current combination of indices results in a lower number than the stage-ending one, a regular index increase is required. If the number is higher, the start of a new execution sequence is signaled. Finally, if the two numbers are equal, the stage is over. In the third stage, the end of an execution sequence is signaled by the outer index reaching its maximum value. Thus, the same process as above is employed, with the difference that the outer index is stored in the higher-order bits of the register and the inner index in the lower-order bits. At each stage, the increment for both indices can be performed by storing the individual increments in the respective halves of a register.
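A minimal C sketch of this packed-index control follows, for the stage-3 layout (outer index in the high byte). The start and end points are hypothetical values, not taken from the paper's example; the point is that the combined move (outer,inner) += (1,-s) becomes a single 16-bit addition and the end-of-sequence test a single unsigned comparison.

    /* Both 8-bit indices packed into one 16-bit value, as in the two
       halves of an x86 16-bit register.  With the outer index in the
       high byte, the combined step (outer,inner) += (1,-s) is the single
       constant (1 << 8) - s, and end-of-sequence detection is one compare. */
    #include <stdint.h>
    #include <stdio.h>

    #define PACK(hi, lo) ((uint16_t)(((uint16_t)(hi) << 8) | (uint16_t)(lo)))

    int main(void) {
        const int s = 3;                           /* slope, i.e. ceil(1/m) */
        uint16_t cur  = PACK(7, 7);                /* (outer,inner) = (7,7), hypothetical */
        uint16_t end  = PACK(9, 1);                /* sequence end (9,1), hypothetical    */
        uint16_t step = (uint16_t)((1 << 8) - s);  /* +1 outer, -s inner, one addition    */

        while (cur < end) {                        /* one unsigned compare */
            cur += step;                           /* one add moves both   */
            printf("outer=%u inner=%u\n", cur >> 8, cur & 0xFFu);
        }
        /* For stages 1 and 2 the text packs the inner index in the high byte
           instead; the combined step is then (uint16_t)(1 - (s << 8)). */
        return 0;
    }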

4. EXAMPLE

Consider the following nested loop:

    for i = 0 to 9
      for j = 0 to 9
        a[i,j] = b[i,j-1] + b[i-1,j]
        b[i,j] = a[i,j]

In this case, the dependence vectors are (0,1), (1,0) and (0,0). The (0,0) vector is the reason the two instructions cannot be executed in parallel, since it indicates an immediate dependence between the operands involved. The two dependence vectors that determine the retiming are then (0,1) and (1,0). In that case, the intersection of the relevant half-planes is the entire first quadrant, so any scheduling vector (m,1) with m > 0 is a valid scheduling vector. Assuming the scheduling vector selected is (3,1), we then have a retiming vector of (1,-3). The resulting structure is as follows:

    prologue
    (j,i) = (0,3)
    bitmask = (3,2)
    initialize first execution sequence
    repeat
      if ((j,i) > bitmask)
        (j,i) = (j,i) + (-3,1)
      if ((j,i) < bitmask)
        if ((j,i) > 0)
          (j,i) = (3*i+j-2, 1)
        Initialize the next execution sequence
    until ((j,i) = bitmask)

    bitmask = (3,8)
    (j,i) = (4,2)    // the values are those of the previous bitmask

    repeat
      if ((j,i) > bitmask)
        (j,i) = (j,i) + (-3,1)
      if ((j,i) < bitmask)
        (j,i) = (6 - 3 + (3*i+j) mod 3 + 1, (3*i+j-6) div 3 + 1)
        Initialize the next execution sequence
    until ((j,i) = bitmask)

    bitmask = (8,6)
    (i,j) = (8,4)
    repeat
      if ((i,j) < bitmask)
        (i,j) = (i,j) + (1,-3)
      if ((i,j) > bitmask)
        (i,j) = (8 - (6-j-1) div 3, j + 1 + 3*((6-j-1) div 3))
        Initialize the next execution sequence
    until ((i,j) = bitmask)
    epilogue

Three things should be noted. First, the index calculation is rather elaborate. However, the parallelism obtained as a result should provide sufficient compensation for the time spent in calculating the indices, especially if the following two conditions are met:

    i.  the loops are sufficiently large;
    ii. the slope is chosen to be the smallest power of 2 larger than 1/m, so that the div and mod operations reduce to shifts and masks.

Second, the total number of comparisons needed to decide whether the loop has reached its conclusion is lowered by the approach presented above from O(n² + n) to O(n²). Finally, the question remains of the prologues and epilogues, which are the next step in completely solving the parallelization problem. Determining the prologues and epilogues, together with an efficient execution algorithm, is the focus of our research in the near future.
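To close the example, the Section 4 parameters can be checked mechanically: with a 10 x 10 iteration space and slope 3, enumerating the execution sequences must visit each of the 100 points exactly once, which is the property the bitmask-driven control above implements. A small self-contained C check, using the same diagonal enumeration as the earlier sketches rather than the bitmask code itself:

    /* Check: sequences c = 3*i + j over a 10 x 10 space cover each
       iteration point exactly once. */
    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        enum { M = 9, N = 9, S = 3 };
        int seen[M + 1][N + 1] = {{0}};
        for (int c = 0; c <= S * M + N; c++)      /* one c per sequence */
            for (int i = 0; i <= M; i++) {
                int j = c - S * i;                /* direction (1,-S)   */
                if (j >= 0 && j <= N) seen[i][j]++;
            }
        for (int i = 0; i <= M; i++)
            for (int j = 0; j <= N; j++)
                assert(seen[i][j] == 1);          /* each point once    */
        printf("all 100 points visited exactly once\n");
        return 0;
    }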

5. SUMMARY

This paper has presented a new technique for executing nested loops, transforming the traditional structures into a new one that allows access along directions not parallel to the edges of the iteration space. What this approach brings to loop execution is the possibility of exploiting parallelism between instructions that would otherwise not be accessible. The steps taken in the loop transformation process have been presented, together with the construct necessary for the technique to be viable. The algorithm used was described in some detail, and an example illustrating its use was given. As a result of the transformation, a large percentage of the nested loop can be fully parallelized, regardless of the dependencies that exist within the loop.

REFERENCES

[1] A. Aiken, A. Nicolau. Fine-Grain Parallelization and the Wavefront Method. Languages and Compilers for Parallel Computing, Cambridge, Massachusetts, MIT Press, 1990.
[2] A. Aiken, A. Nicolau. Resource-Constrained Software Pipelining. IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, December.
[3] A. Aiken, A. Nicolau. Perfect Pipelining: A New Loop Parallelization Technique. European Symposium on Programming.
[4] G.R. Gao, Z. Paraskevas. Compiling for Dataflow Software Pipelining. Languages and Compilers for Parallel Computing, Cambridge, Massachusetts, MIT Press, 1991.
[5] R. Govindarajan, E.R. Altman, G.R. Gao. A Framework for Resource-Constrained Rate-Optimal Software Pipelining. IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 11, November.
[6] R.B. Jones, V.H. Allan. Software Pipelining: An Evaluation of Enhanced Pipelining. Proceedings of the 24th Annual International Symposium on Microarchitecture, 1991.
[7] L. Lamport. The Parallel Execution of DO Loops. Communications of the ACM, Vol. 17, No. 2, February.
[8] T.-F. Lee, A.C.-H. Wu, Y.-L. Lin. A Transformation-Based Method for Loop Folding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 13, No. 4, April.
[9] L.-S. Liu, C.-W. Ho, J.-P. Sheu. On the Parallelism of Nested For-Loops Using Index Shift Method. International Conference on Parallel Processing.
[10] S.P. Midkiff, D.A. Padua. Compiler Generated Synchronization for Do Loops. International Conference on Parallel Processing.
[11] N.L. Passos, E.H.-M. Sha. Synthesis of Multi-Dimensional Applications in VHDL. International Conference on Computer Design.
[12] N.L. Passos, E.H.-M. Sha. Achieving Full Parallelism Using Multidimensional Retiming. IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 11, November.
[13] J. Ramanujam. Non-Unimodular Transformations of Nested Loops. IEEE Proceedings on Supercomputing, November 1992.
