Incremental Tree Height Reduction For High Level Synthesis *

Alexandru Nicolau+    Roni Potasman++
+ Information and Computer Science Department
++ Dept. of Electrical and Computer Engineering
University of California, Irvine, CA

* This work was supported in part by NSF grant CCR and ONR grant N K.

Abstract

A new local and incremental Tree Height Reduction (THR) technique for parallelization of application programs is presented. Although THR was introduced many years ago, it has not been widely used in HLS scheduling systems. The two main reasons for that were the inability of most systems to compact beyond basic blocks of the program, thus limiting the strength of THR, and the fact that traditionally THR required a global view of the program, which made it either inefficient or impossible to integrate into local transformations. THR has several interesting properties: while known compaction techniques yield a constant factor of speed-up (even with unlimited resources), THR has a speed-up of O(n/log n). Furthermore, THR is able to compact programs when other techniques fail (due to data dependency between operations). The capability of our system to integrate THR with beyond-basic-blocks compaction and with loop pipelining means that more operations may be considered for THR, which may yield much more aggressive compaction.

1 Introduction

Tree Height Reduction is a well known technique for reducing the height of an expression tree from O(n) to O(log n) by balancing its subtrees. The height of the tree is the number of steps needed to compute the expression. Suppose the following schedule is given and the hardware constraints allow the use of only 2 adders.

cycle 1: r1 := r0 + c1;
cycle 2: r2 := r1 + c2;
cycle 3: r3 := r2 + c3;
cycle 4: r4 := r3 + r2; r5 := r3 + c4;

Without any semantic transformation four steps are required to execute this program. However, using THR the schedule may be rewritten as:

cycle 1: r1 := r0 + c1; t1 := c2 + c3;
cycle 2: r2 := r1 + c2; r3 := r1 + t1;
cycle 3: r4 := r3 + r2; r5 := r3 + c4;

which produces the same results under the same hardware constraints but reduces the time needed to execute it from 4 to 3 steps. In general, the potential speed-up factor of THR (when the tree is fully balanced) is O(n/log n), limited only by the resources available. Although THR was introduced many years ago [KuMuCh72, Ku78], it has not been widely used in HLS or other scheduling systems. Two main reasons contributed to that: first, THR is effective only when there is a long enough chain of operations which are data dependent. Unfortunately, the original THR was only applicable to operations within basic blocks. Since the average number of operations within a basic block is 4-5 [TjFl70] (and not all of these will always form a chain), potential speed-up in basic blocks is limited. Second, the traditional implementation of THR required global information about the whole expression to be reduced. This prevented integration of THR into any of the existing local and incremental parallelizing transformations (List Scheduling, Trace Scheduling, Percolation Scheduling, etc.) which can be used for HLS.
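The rewriting above can be sketched mechanically. The following Python fragment is our illustration, not the authors' implementation: it balances a chain of '+' leaves pairwise and reports the resulting tree height, reproducing the drop from 3 addition steps to 2 for the chain feeding r3.

import math

def chain_height(n_leaves):
    # A left-leaning chain ((r0 + c1) + c2) + c3 needs one add per extra leaf.
    return n_leaves - 1

def balanced_height(n_leaves):
    # After full balancing the adds form a binary tree of logarithmic depth.
    return math.ceil(math.log2(n_leaves))

def balance(leaves):
    # Build a balanced '+' tree (nested tuples) by pairing subtrees level by level.
    level = list(leaves)
    while len(level) > 1:
        nxt = [('+', level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd leaf is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(chain_height(4), balanced_height(4))   # 3 2: the saved step of the example
print(balance(['r0', 'c1', 'c2', 'c3']))     # ('+', ('+', 'r0', 'c1'), ('+', 'c2', 'c3'))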
In [PLNG90] we presented a global (beyond basic blocks) approach to scheduling. By designing a set of incremental transformations for THR that integrate into our system of local transformations we overcome the previous problems associated with THR. In this context incremental and local THR has some important advantages: it is less ad hoc than the global one, it has more general application, it is easier to implement, and it interfaces very well with other local parallelizing transformations and enables better control of resources. Furthermore, the local and incremental aspect of our technique will exploit potential opportunities wherever they are interspersed in the program; so even in a program that is not as regular as the above example, we may still benefit from local opportunities interspersed throughout the program. The application of our THR is controlled by the resources available such that it only 'fills' unused resources. Thus, the traditional concern that THR may degrade performance by generating redundant code that cannot translate into speed-up (due to limited resources) is completely eliminated by our incremental approach.
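Since THR must only 'fill' unused resources, every candidate rewrite is guarded by a per-cycle resource check. A minimal sketch of such a guard, under our own assumed encoding of an operation as an (op-type, def, uses) tuple:

def has_free_unit(node_ops, op_type, resources):
    # node_ops:  (op_type, def, uses) tuples already scheduled in this cycle.
    # resources: functional units available per cycle, e.g. {'ADD': 2, 'MUL': 1}.
    used = sum(1 for (t, _d, _u) in node_ops if t == op_type)
    return used < resources.get(op_type, 0)

cycle = [('ADD', 'r1', ('r0', 'c1'))]            # one of two adders is busy
print(has_free_unit(cycle, 'ADD', {'ADD': 2}))   # True: THR may insert here

THR inserts its extra operations only where such a check succeeds on every node it touches (condition 5 of Section 4.3), which is what rules out code growth that cannot pay for itself.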

In performing THR care must be taken not to violate numerical and other properties of the code. However, in most cases this process can be applied without detrimental side effects. Although our implementation of THR can handle pipelined operations, for simplicity we assume throughout the algorithm description that all operations are one-cycle operations. The extension of the incremental THR algorithm to pipelined operations is straightforward. To our knowledge, this is the first published local and incremental THR algorithm working across basic blocks.

2 Previous work

The use of THR in HLS systems is relatively rare. In Flamel [Tr87] an idea similar to THR is implemented by an algorithm called level compression. This algorithm picks (heuristically) a node about halfway along the critical path and tries to move it up so that the height of the path may be reduced. As height reduction proceeds, certain nodes are frozen to prevent them from taking part in further reductions. Nodes chosen for later moving must be about halfway along non-frozen portions of critical paths. This height reduction procedure usually cannot do better than approximately halve the original height. The moving process stops when there is no further height reduction or when resource bounds are exceeded at some level. In this algorithm care must be taken to avoid doing transformations that increase the graph's height.

3 THR Applications

Although [Ku78] claims that applying THR to multi-operation machines would be quite disappointing, we found a large span of applications of THR in the context of HLS. Obviously, if one considers only basic blocks, the chain of dependencies is not long enough to expose the strength of THR, but by looking at the global RTL-level parallelism we are able to go past conditional jumps and have a longer chain of operations, which improves the potential parallelism. This is particularly noticeable when combined with loop pipelining, where operations from different iterations make this chain even longer. While space limitations prevent a detailed example using pipelining with THR, the benchmark results reflect this. The two most attractive applications of THR are digital filters and array computations. Digital filters are potential candidates for THR since they have chains of additions (resulting from the different delay elements) and since they usually implement loops which may be pipelined, so that many more operations may be exposed to THR. The most frequent array computations suitable for THR are sums of vector elements, dot products and simple recurrences, where one can find a chain of dependent operations. Although these chains of dependencies are simple, they prevent any reduction to parallel form without an algorithmic change. Given enough resources, THR will reduce the computation time for all these examples from O(n) to O(log n).

4 Algorithm Description

4.1 Background

The idea behind tree height reduction is to try to compact a program at the expense of additional computation. In a design where execution of more than one operation per cycle is possible, it is natural to utilize all available (unused) resources in order to increase performance. Hence, THR adds more operations to the program, which can be executed by these free resources, such that the total execution time of the program is reduced.
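For a feel of the pay-off, the vector-sum case from Section 3 can be evaluated in the pairwise order that a fully applied THR produces. This sketch (ours, assuming unlimited adders) counts the parallel addition steps:

def pairwise_sum(values):
    # Sum a vector in O(log n) parallel steps instead of an O(n) chain.
    steps, level = 0, list(values)
    while len(level) > 1:
        # All additions at one level are mutually independent: one parallel step.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, steps = nxt, steps + 1
    return level[0], steps

print(pairwise_sum(list(range(16))))   # (120, 4): 16 elements in 4 steps, not 15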
THR takes advantage of the associativity and distributivity properties of arithmetic operations. For simplicity, we only present the algorithm with addition, multiplication and subtraction. It can be extended easily to programs with divisions and logical (AND, OR) operations as well.

4.2 Definitions

In this section we define some notations used later in the algorithm:

Program: The program is represented by a control flow graph (CFG). The vertices (nodes) correspond to operations executed in each cycle. The edges represent flow of control from one node to its successor. Initially all nodes contain a single operation. Making a program more parallel involves compaction of several operations into one node while preserving the semantics of the sequential program.

Operation: Each operation has a type (op-type) and variables which are called uses variables (for operands read) and a def variable (for the operand written). For the operation a := b * c the def is a and the uses are b and c. The op-type is multiplication.

Current-op: The operation currently being examined (or the operation we are trying to schedule earlier than its current cycle).

Selected-path: The path selected for THR.

Later-definer and earlier-definer: The operations defining the uses of current-op. In a := b * c, the operations defining b and c are called the definers of a. Suppose the following program is given:

cycle (k):   b := d + e; h := d * g;
cycle (k+1): c := h - e;
cycle (k+2): a := b * c;

We call the operation (c := h - e) the later-definer of operation (a := b * c), while the operation (b := d + e) is called the earlier-definer of the current-op.

Available variable: A variable is said to be available in cycle (k) if it is defined at cycle (k-1) or earlier. In the example above, c is available in cycle (k+2) while b is available in cycle (k+1).

Percolate Operations: Scheduling operations as soon as possible using the Percolation Scheduling (PS) transformations. PS does the actual operation motion between nodes, ensuring that program correctness is preserved. Thus, PS would check for data dependency preservation on all paths through the affected code and modify the schedule accordingly. While this is extremely simple in the absence of conditional jumps, the general transformations are nontrivial. Detailed discussion and description of the PS transformations are found in [Ni85, PLNG90].
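To make these definitions concrete, here is one possible encoding in Python; the class and query names are ours, not the paper's:

from dataclasses import dataclass

@dataclass
class Operation:
    op_type: str        # 'ADD', 'SUB', 'MUL', ...
    defv: str           # the def variable (operand written)
    uses: tuple         # the uses variables (operands read)
    cycle: int          # the node (cycle) currently holding the operation

def definers(current_op, ops):
    # Operations defining the uses of current_op, later-definer first.
    ds = [op for op in ops if op.defv in current_op.uses]
    return sorted(ds, key=lambda op: op.cycle, reverse=True)

def available(var, at_cycle, ops):
    # A variable is available in cycle k if defined at cycle k-1 or earlier.
    return any(op.defv == var and op.cycle <= at_cycle - 1 for op in ops)

prog = [Operation('ADD', 'b', ('d', 'e'), 1), Operation('MUL', 'h', ('d', 'g'), 1),
        Operation('SUB', 'c', ('h', 'e'), 2), Operation('MUL', 'a', ('b', 'c'), 3)]
later, earlier = definers(prog[3], prog)[:2]
print(later.defv, earlier.defv)                           # c b
print(available('b', 2, prog), available('c', 2, prog))   # True False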

4.3 Algorithm in detail

Our local and incremental THR algorithm can be invoked in one of two ways. First, if during the incremental process, in which PS is trying to move an operation up from a node to its predecessor, a dependency is encountered, then THR is invoked to incrementally change the code to allow the motion. Alternatively, incremental THR can be invoked in the final phase of the compaction process, after all data-independent operations have moved up as high as possible and there are still unused resources to fill.

When activated for a particular operation, the algorithm checks whether it could be scheduled earlier than its current cycle by introduction of a new operation which can be performed early enough so it can be used to eliminate the dependency on the later-definer and advance the schedule of the current-op. Since each node may have more than one predecessor node (several incoming paths), incremental THR should be performed with respect to selected paths in the program. On different paths, each operation may have a different later-definer and a different earlier-definer, thus each path should be considered separately. Although it is usually sufficient to check only adjacent nodes in the program, and hence preserve locality, it turns out that in order to achieve optimality (in the presence of sufficient resources) the whole chain of operations on the path has to be checked. This process is not needed when the resources are limited. The following algorithmic description refers to the optimal reduction on each path.

The algorithm analyzes two cases differently: the first is when the associativity property of operations is used, which happens whenever the current-op and its later-definer constitute one of the following pairs: ADD/ADD, ADD/SUB, SUB/ADD, SUB/SUB and MUL/MUL. The other case is when current-op is MUL and its later-definer is either ADD or SUB, where the distributivity property is used. In either case we try to hoist current-op from its current node (cycle) to a predecessor node, which eventually may reduce the length of the program.

Necessary and sufficient conditions for an operation to be hoisted:

1. One of its definers must be available at least two cycles earlier than itself on the path selected.
2. Current-op's later-definer has a definer which is available at least two cycles earlier than current-op's cycle on that path.
3. If current-op is ADD or SUB then the later-definer has to be either ADD or SUB. (These legal combinations constitute a legal chain.) If, on the other hand, current-op is MUL, the later-definer may be either MUL, ADD or SUB.
4. Both current-op and its definers have two uses variables.
5. All relevant nodes on the path (into which new operations are added) have free resources.

This does not mean that the algorithm needs to consider all paths; we may simply concentrate on only one or several important paths. Due to the incremental nature of the transformations we can stop at any point in the process and still have correct code.

Procedures

procedure THRAnalysis(selected-path)
    for each node n in the selected-path do
        reset back-track flag
        for each operation in n do
            if current-op meets the conditions then begin
                switch   /* find which case it is */
                    case associativity:  AssociativityAnalysis(current-op)
                    case distributivity: DistributivityAnalysis(current-op)
                percolate operations on the path
        if back-track is set
            recheck predecessor node
        else
            check next node
end (THRAnalysis)

The back-track flag causes backtracking to the previous node. This node has to be rechecked due to the possible creation of new legal chains of operations following the pushing of multiplications upward. These chains may create further THR opportunities. See Example 1.
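A Python rendering of this driver, under the encoding sketched in Section 4.2; the five callables are stubs standing in for the paper's condition check, the two case analyses and Percolation Scheduling:

def thr_analysis(path, meets_conditions, is_associative_case,
                 assoc, distrib, percolate):
    # path: list of nodes, each node a mutable list of operations.
    i = 0
    while i < len(path):
        back_track = False
        for op in list(path[i]):                 # copy: the cases edit the node
            if not meets_conditions(op, path):
                continue
            if is_associative_case(op):          # ADD/ADD, ..., MUL/MUL pairs
                assoc(op, path)
            else:                                # MUL over ADD/SUB
                distrib(op, path)
                back_track = True                # new MULs may form new chains
            percolate(path)                      # hoist operations as early as legal
        # Each distributivity rewrite consumes its current-op, so the
        # back-tracking recheck of the predecessor node terminates.
        i = max(i - 1, 0) if back_track else i + 1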
procedure AssociativityAnalysis(current-op)
    if current-op is SUB AND later-definer is its subtrahend
        set sign-flag
    earliest-op = Find-Highest-Avail-Op(current-op)
    if succeeded to find such an operation then begin
        /* add new operations recursively to path */
        Climb-Up(modified op-type, earliest-op's earlier-definer,
                 current-op's later-definer)
        remove current-op from list
end (AssociativityAnalysis)

Sign-flag controls the correct addition of SUB operations into the program. We need to flip the operands whenever we find a SUB whose later-definer is its subtrahend.

procedure DistributivityAnalysis(current-op)
    /* the procedure is called when current-op is of the form d := a * (b + c).
       In this case we do not try to hoist d, but rather use the distributivity
       property and convert d into d := a * b + a * c. */
    /* add first additive (a * b) */
    add new operation with (MUL type, later-definer's earlier-definer,
        current-op's earlier-definer) into later-definer's node
    /* add second additive (a * c) */
    add new operation with (MUL type, later-definer's later-definer,
        current-op's earlier-definer) into later-definer's node
    /* add modified current-op (d) */
    add new operation with (later-definer's type, first-additive,
        second-additive) into current-op's node
    remove current-op from list
    set back-track flag
end (DistributivityAnalysis)
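Concretely, for current-op d := a * (b + c) the distributivity case manufactures the two products and rebuilds d. A self-contained sketch on (op-type, def, uses) tuples, with temporary names supplied by the caller (all naming is ours):

def distribute(current, later, fresh):
    # current = ('MUL', d, (x, a)) where x is later's def;
    # later   = (ty, x, (b, c)) with ty in {'ADD', 'SUB'}.
    _mul, d, uses = current
    ty, x, (b, c) = later
    a = uses[1] if uses[0] == x else uses[0]   # the MUL's non-chain operand
    t1, t2 = fresh(), fresh()
    first  = ('MUL', t1, (b, a))               # a * b, into later-definer's node
    second = ('MUL', t2, (c, a))               # a * c, into later-definer's node
    rebuilt = (ty, d, (t1, t2))                # d := t1 +/- t2; set back-track
    return [first, second], rebuilt

names = iter(['t1', 't2'])
print(distribute(('MUL', 'd', ('x', 'a')), ('ADD', 'x', ('b', 'c')),
                 lambda: next(names)))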

procedure Find-Highest-Avail-Op(selected-path)
    /* searches along the selected-path for the earliest operation which meets
       the conditions of Section 4.3. For correctness preservation, each time
       a SUB is found and its later-definer is the subtrahend, the operation's
       sign is flipped. */
end (Find-Highest-Avail-Op)

procedure Climb-Up(type, first-op, second-op)
    /* adds new operations into selected-path once the earliest operation that
       meets the conditions has been found by the previous procedure. Calls
       itself recursively until it reaches the later-definer of current-op.
       The addition of the modified current-op is done by the higher-level
       calling procedure. */
    add new operation with (type, first-op, second-op)
    if (didn't reach current-op's later-definer)
        Climb-Up(first-op's type, first-op's later-definer,
                 the newly added operation)
end (Climb-Up)

5 Examples

In this section we present two examples, the first to clarify the algorithm and the second to show its application.

Example 1: Suppose the following program is given, and assume that a0 and all the c's are available at the first cycle.

cycle 1: a1 := a0 * c1;
cycle 2: a2 := a1 * c2;
cycle 3: a3 := a2 - c3;
cycle 4: a4 := a3 * c4;
cycle 5: a5 := a4 - c5;

Step 1: Let us begin, for example, with the third operation (a3 := a2 - c3). Its earlier-definer is not defined in the previous instruction, so execute AssociativityAnalysis(). The op-type is SUB, so set sign-flag and call Find-Highest-Avail-Op(). But since current-op violates condition 3, quit the procedure.

Step 2: Current-op is (a4 := a3 * c4). Its type is MUL and its later-definer is SUB, so DistributivityAnalysis() is called. Three operations are added into the tree: (1) a MUL operation whose uses are the later-definer's earlier-definer (c3) and current-op's earlier-definer (c4); this operation gets a new def (t1) and is inserted into the later-definer's cycle. (2) Another MUL whose uses are (a2) and (c4) and whose def is t2; it is inserted into the later-definer's cycle. (3) The reconstruction of current-op with the type of the later-definer (SUB) and with uses which are the operations just added; its def is current-op's def. The back-track flag is set. After this step and percolation, the code contains:

t1 := c3 * c4; t2 := a2 * c4; a4 := t2 - t1;

Step 3: Since the back-track flag is set, cycle 3 is rechecked. (a3 := a2 - c3) cannot be hoisted for the same reason mentioned in step 1 above, so check the next operation in this cycle (t2 := a2 * c4). The operation is MUL and its later-definer (a2) is MUL, hence Find-Highest-Avail-Op() finds the highest such operation, which is (a2 := a1 * c2). Now, using Climb-Up(), operations are added as follows: a MUL (t3) operation whose uses are the highest op's earlier-definer (c2) and current-op's earlier-definer (c4) is added. Then another MUL, whose uses are the later-definer's later-definer (a1) and t3, is added. After this step and percolation we get:

t1 := c3 * c4; t3 := c2 * c4; t2 := a1 * t3; a4 := t2 - t1;

Step 4: Consider (a5 := a4 - c5) as the current-op. Since it is SUB we use AssociativityAnalysis() and get:

t1 := c3 * c4; t3 := c2 * c4; t2 := a1 * t3; t4 := t1 + c5; a4 := t2 - t1; a5 := t2 - t4;

Note that if resources didn't allow one of the steps (e.g., if only two subtractors were available per cycle), the incremental THR would have stopped without allowing a5 to move up, but would still produce a one-cycle gain.
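The net effect of Example 1 can be checked numerically; this small test (ours) evaluates the original chain and the reduced code on random integers and confirms that a4 and a5 agree:

import random

def original(a0, c):
    a1 = a0 * c[1]; a2 = a1 * c[2]; a3 = a2 - c[3]
    a4 = a3 * c[4]; a5 = a4 - c[5]
    return a4, a5

def reduced(a0, c):
    # Final code after steps 1-4; the t's depend only on constants.
    t1 = c[3] * c[4]; t3 = c[2] * c[4]; t4 = t1 + c[5]
    t2 = (a0 * c[1]) * t3
    return t2 - t1, t2 - t4        # a4, a5

c = [random.randint(-9, 9) for _ in range(6)]
a0 = random.randint(-9, 9)
assert original(a0, c) == reduced(a0, c)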
Example 2: This example shows how incremental THR works across basic blocks. Suppose the following program:

[Figure: control-flow graph of the original 8-cycle program, a chain of additions (r1 := r0 + c0; r2 := r1 + c1; ...; r4 := r3 + c3; ...) split into three basic blocks by two conditional jumps (one of them testing r3 > 0), with liveness annotations such as "{r3, r4, r5, r6} are dead here" and "{r5} live here".]

This program segment has 3 basic blocks separated by conditional jumps. Conventional THR (within basic block boundaries) on this program fails since there are not enough operations in each of these 3 chains to produce any speed-up. But applying our incremental THR beyond the conditionals yields a significant compaction (from 8 cycles to 3), as shown below. We assume that the two (independent) conditionals can be executed in one cycle. If there is no hardware support for the execution of two conditional jumps in one cycle, the second conditional will be deferred by one cycle, yielding a 4-cycle schedule.

cycle 1: r1 := r0 + c0; t1 := c1 + c2; t2 := c3 + c4
cycle 2: r2 := r1 + c1; r3 := r1 + t1; t3 := t2 + c5
cycle 3: r4 := r3 + c3; r5 := r3 + t2; r6 := r3 + t3

This example clarifies that the ability to move operations beyond basic blocks extends the potential length of chains of operations, which is crucial for the applicability of THR.

6 Experiments

This section details the results obtained by applying the incremental THR on the fifth order elliptic filter example [PaKn89] and the Sehwa example presented in [PaPa88]. In the following tables FDS and FDLS stand for Force Directed (List) Scheduling, PBS for Percolation Based Synthesis [PLNG90] and PBST for PBS with THR.

[Tables 1-3, giving the schedule lengths obtained by each method, are not reproduced in this transcription.]

A. Fifth Order Elliptic Filter: Table 1 refers to the non-pipelined case, where the model assumes that the execution unit has to be flushed before the succeeding operation can be issued. Table 2 is for the pipelined case, where the functional units can accept a new input each cycle. The results for the elliptic filter show that even though incremental THR is powerful when applied to the loop body, it may yield further parallelization when combined with loop pipelining.

B. Sehwa: The Sehwa example is an implementation of a digital filter with 16 points. Using the same semantics as [PaPa88], our system reduces the schedule from 6 time steps to 5. Using structural pipelining rather than functional pipelining (see [PLNG90]), incremental THR reduces the schedule from 10 steps to 8, as shown in Table 3.

7 Discussion and conclusion

With the advance in optimizing compiler techniques, and especially those with local transformations (like Percolation Scheduling), it is shown that by using THR there is a real possibility to compact programs across basic block limits even when conventional dependency analysis would appear to preclude further speed-up. There is one possible drawback to using THR: the notion of numerical stability. The problem is illustrated by the code sequence a := b - c followed by d := a * e. This may be transformed (during THR) into d := b * e - c * e. If the values of b or c are too large but the value of their difference is still small, the order in which the expression is evaluated may be significant. We argue that the algorithm presented here may be used selectively: in cases where numerical stability is violated, the user may disallow THR.
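The instability is easy to reproduce in floating point. In this small demonstration (ours), the difference b - c is exact, but distributing the multiplication forces c * e to round:

b, c, e = 1.0e16, 1.0e16 - 2.0, 3.0
d_before = (b - c) * e     # d := a * e with a := b - c: exactly 6.0
d_after  = b * e - c * e   # THR-distributed form: c * e rounds away the answer
print(d_before, d_after)   # 6.0 8.0 with IEEE-754 doubles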
References

[AlKe82] J. R. Allen and K. Kennedy. PFC: A program to convert Fortran to parallel form. Technical Report MASC TR 82-6, Rice University, 1982.

[Ku78] D. J. Kuck. The Structure of Computers and Computations, Vol. I. New York: Wiley, 1978.

[KuMuCh72] D. J. Kuck, Y. Muraoka and S. C. Chen. On the number of operations simultaneously executable in Fortran-like programs and their resulting speedup. IEEE Trans. on Computers, C-21, 12, December 1972.

[Ni85] A. Nicolau. Uniform Parallelism Exploitation in Ordinary Programs. Proceedings of the 1985 International Conference on Parallel Processing, 1985.

[PaKn89] P. G. Paulin and J. P. Knight. Force-Directed Scheduling for the Behavioral Synthesis of ASIC's. IEEE Trans. on CAD, Vol. 8, No. 6, June 1989.

[PaPa88] N. Park and A. C. Parker. Sehwa: A Software Package for Synthesis of Pipelines from Behavioral Specifications. IEEE Trans. on CAD, Vol. 7, No. 3, March 1988.

[PLNG90] R. Potasman, J. Lis, A. Nicolau and D. Gajski. Percolation Based Synthesis. Proc. of the ACM/IEEE 27th Design Automation Conference, June 1990.

[TjFl70] G. S. Tjaden and M. J. Flynn. Detection and parallel execution of independent instructions. IEEE Trans. on Computers, Vol. 19, No. 10, October 1970.

[Tr87] H. Trickey. Flamel: A High-Level Hardware Compiler. IEEE Trans. on CAD, Vol. 6, No. 2, March 1987.
