PPS : A Pipeline Path-based Scheduler. 46, Avenue Felix Viallet, Grenoble Cedex, France.


Maher Rahmouni, Ahmed A. Jerraya
Laboratoire TIMA/INPG, 46, Avenue Felix Viallet, Grenoble Cedex, France
rahmouni@verdon.imag.fr

Abstract

This paper presents a scheduling algorithm that improves on other approaches when dealing with the synthesis of control-flow dominated behavioral descriptions. It achieves this through the use of a constraint-driven path-based scheduling algorithm. The sub-optimality of the original path-based algorithms when dealing with loops is overcome through a new technique for pipelining different loop iterations during execution path generation. Results show that the algorithm always generates the fastest solution in terms of clock cycles.

1 Introduction

Path-based scheduling algorithms (PBS) have proved themselves to be much more efficient than classical approaches when dealing with descriptions of control-flow dominated circuits. The first application of PBS to synthesis was made by Camposano [,,] and was based on algorithms first proposed for microcode compaction []. PBS generates an As Fast As Possible (AFAP) schedule for a description containing many different possible execution paths. This is achieved through a complex clique covering technique that identifies the minimum number of cuts necessary for all paths in order to satisfy the constraints (user-imposed or data-dependent).

This approach, however, tends to be sub-optimal when the input description contains many loops. The problem is that, in Camposano's approach, all loop feedback edges are broken, so no advantage can be taken of the fact that different loop iterations can be pipelined, which implies potential parallelism beyond loop boundaries. Two other path-based approaches, namely DLS [] and LDS [8], attempt to overcome this problem by leaving loop feedback edges intact. However, their approach is rather simplistic as they only consider one iteration. In addition, they do not cut the generated paths in an optimal way, as they use an As Late As Possible (ALAP) scheduling technique to do this. Nevertheless, results published for these algorithms show that by treating loops more efficiently, improvements on the original path-based approach can be made, even though the path cuts are not optimal.

In this paper, we propose a path-based scheduling algorithm that uses a clique covering technique to cut the paths in an optimal fashion while, at the same time, using a new technique for pipelining loop iterations in order to identify any parallelism that may exist beyond loop boundaries. This loop pipelining goes beyond that proposed in both DLS and LDS. The algorithm assumes that a loop may execute 0 times (when the loop condition is false, for example), once, or two or more times, and generates different paths for each of these cases. Thus potential parallelism over several iterations of a loop can be identified. The algorithm, called Pipelined Path-based Scheduling (PPS), is presented in the rest of this paper.

The paper is organized as follows. The basic concepts are reviewed in Section 2. In Section 3, we analyze two previous path-based algorithms, PBS and DLS, to deduce our new algorithm, PPS. The algorithm is illustrated in Section 4. In Section 5, we show some experimental results that indicate the improvements that may be obtained using PPS. Finally, conclusions and perspectives are summarized in Section 6.

2 Basic Concepts

In this section, we briefly outline some of the basic concepts necessary for developing the algorithm.

2.1 Path-based Concepts

Control flow graphs are the most suitable representation for modeling control-flow dominated designs containing many (possibly nested) loops, global exceptions, multiple waits on events and procedure calls; in other words, features that reflect the inherent properties of controllers. A control flow graph (CFG) is a graph $G = (V, E)$, where the nodes $V$ represent operations such as assignments, additions and procedure calls, and the edges $E$ represent the precedence relation. An edge $(v_i, v_j) \in E$ means that $v_j$ is executed after $v_i$. If an operation represented by $v$ has more than one successor, represented by $(v_1, \ldots, v_n)$, one of them is executed. The selection of the successor depends on the conditions attached to, and indicated on, the edges $(v, v_1), \ldots, (v, v_n)$.

Paths in path-based scheduling represent sequences of nodes $(v_1, \ldots, v_n)$ such that all these nodes can be executed in the same control step. Each path has a header, which is the first node of the sequence, and a successor, which follows the last node in the sequence. Paths with the same header are merged into the same state. A transition is made between states $S_i$ and $S_j$ if and only if there is a path $P$ in $S_i$ whose successor is the header of all the paths in $S_j$.

2.2 Cost Function

Data-dependent loops and loops with non-static bounds in control flow graphs introduce a major problem for the cost function of the scheduling result, because the number of iterations of each loop is unknown. The number of states or transitions generated by the schedule does not reflect the real total execution time of an algorithm. The right way to evaluate these algorithms is to define a new metric representing the expected number of clock cycles of a schedule [8]. This metric includes:

Branch Probability: a branch probability is associated with each edge in the control flow graph. These probabilities are computed by simulating the given behavioral description with a large set of different possible inputs.

Path Probability: Let $P = (v_1, \ldots, v_n)$ and $v_m = \mathrm{Succ}(P)$. The probability of executing such a path is:

$$\mathrm{Prob}(P) = \prod_{i=1}^{n-1} p(v_i, v_{i+1}) \cdot p(v_n, v_m) \qquad (1)$$

State Transition Probability: Let $(P_i, \ldots, P_j)$ be the paths having $S_k$ as entry state and $S_l$ as destination state. The probability of a state transition from $S_k$ to $S_l$ is:

$$p_s(k, l) = \sum_{r=i}^{j} \mathrm{Prob}(P_r) \qquad (2)$$

Expected Number of Clock Cycles of a Schedule: Let $S = (S_1, \ldots, S_n)$ be the set of states resulting from the schedule. The expected number of clock cycles needed to execute the corresponding input behavioral description is:

$$X_{sch} = \sum_{i=1}^{n} X_i \qquad (3)$$

where $X_i$ is a random variable representing the expected number of times the state $S_i$ is executed during an execution of the behavioral description. Computing $X_i$, $\forall i \in (1, \ldots, n)$, is equivalent to solving the following set of linear equations:

$$X_1 = 1 \qquad (4)$$

$$X_i = \sum_{j} X_j \cdot p_s(j, i) \qquad (5)$$

$\forall j$ such that $\exists$ a path $P$ from $S_j$ to $S_i$.

3 Scheduling of Control Dominated Descriptions

For control-flow dominated circuit descriptions, the most suitable scheduling techniques are those based on optimizing the different execution paths that may occur. The classic Path-based Scheduling (PBS) algorithm uses a minimum clique covering technique to generate an As Fast As Possible (AFAP) schedule. Other path-based algorithms, such as DLS, use a simpler ALAP schedule, preferring to concentrate on the optimization of loop execution. Both algorithms dramatically improve on the performance of data-flow dominated scheduling techniques for control-dominated circuits, but their results differ depending on the input description.

Take, for example, the control-flow graph of figure (a). This represents a simple algorithm containing two possible execution paths starting at node and terminating at node. We assume that nodes and are synchronization nodes.
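As an illustration of the cost function defined earlier (path probability and expected number of clock cycles), the following Python sketch evaluates a path probability and solves the linear equations for the expected state visit counts. The function names, the dictionary and matrix encodings, and the three-state example are assumptions made for illustration and do not come from the paper; exact rational arithmetic is used only to keep the example easy to check.

```python
from fractions import Fraction


def path_probability(path, succ, p):
    """Product of the branch probabilities along `path`, times the
    probability of the edge from the last node to `succ`.
    `p` maps an edge (u, v) to its branch probability (assumed encoding)."""
    prob = Fraction(1)
    for u, v in zip(path, path[1:]):
        prob *= p[(u, v)]
    return prob * p[(path[-1], succ)]


def expected_cycles(p_s, start=0):
    """Solve X_start = 1 and X_i = sum_j X_j * p_s[j][i] for the other states.
    `p_s[j][i]` is the transition probability from state j to state i.
    Returns (visit counts per state, their sum). Assumes the system is
    non-singular, i.e. every loop has a non-zero exit probability."""
    n = len(p_s)
    rows = []  # augmented matrix [A | b]
    for i in range(n):
        row = [Fraction(0)] * (n + 1)
        if i == start:
            row[i] = Fraction(1)
            row[n] = Fraction(1)
        else:
            for j in range(n):
                row[j] -= Fraction(p_s[j][i])
            row[i] += 1
        rows.append(row)
    # Gauss-Jordan elimination with exact fractions
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    x = [rows[i][n] for i in range(n)]
    return x, sum(x)


# Hypothetical branch probabilities for a path (1, 2, 3) with successor 4.
edges = {(1, 2): Fraction(1, 2), (2, 3): Fraction(1), (3, 4): Fraction(3, 4)}
prob = path_probability([1, 2, 3], 4, edges)

# Hypothetical schedule: S0 -> S1, S1 repeats with probability 9/10, S1 -> S2.
p_s = [
    [0, 1, 0],
    [0, Fraction(9, 10), Fraction(1, 10)],
    [0, 0, 0],
]
visits, total = expected_cycles(p_s)
print(prob, visits, total)
```

In this hypothetical schedule the loop state is visited ten times on average, so the expected execution time is 12 clock cycles even though the schedule has only three states, which is why state or transition counts alone are a poor cost measure.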
Synchronization nodes represent statements such as "wait until S='';" in VHDL. The treatment of such synchronization statements is different in PBS and DLS.

Figure : (a) Simple CFG input representation (b) Result of PBS scheduling (c) Result of DLS scheduling

The result of scheduling the description of figure (a) using PBS is shown in figure (b). We can see that, under certain conditions, the entire algorithm can be executed in a single state (S). Depending on the evaluation of the conditional of node, either state or S will be executed if the corresponding synchronization statement is false. Using the statistical analysis described in Section 2.2 and the probability of taking each branch as shown in figure (b), we deduce that the execution time of this algorithm in terms of clock cycles will be.0.

For DLS, the execution paths will always be broken when a synchronization statement is encountered. Thus, DLS will generate a circuit that executes according to the state diagram of figure (c). This diagram also has three states, but note that the algorithm will always take at least two states to execute. Again according to our statistical analysis, the execution time of this circuit will be.. Thus, PBS outperforms DLS in this case in terms of execution time (the numbers of states and transitions are equal).

A more realistic control-flow dominated circuit description contains many (possibly nested) loops. In such cases, DLS tends to produce better results than PBS. Suppose we take the input graph representation shown in figure (a). For the sake of clarity, we limit this graph to a single loop. We assume that there is a constraint violation between nodes and. As we are dealing with behavioral descriptions, it is not imperative to have a synchronization statement within a loop body.

Figure : (a) CFG with a single loop (b) Result of PBS scheduling (c) Result of DLS scheduling

The result of scheduling this representation using PBS is shown in figure (b). PBS breaks all loops in the input description by removing feedback edges. This implies that each loop will require at least one state (control step) to execute. Using statistical analysis, the number of clock cycles necessary to execute this schedule is 00. DLS does not break loop feedback edges and can therefore potentially execute the loop in a single state. Even if a loop requires several states, the fact that we can pipeline different iterations of the loop means that the execution time of an algorithm containing loops scheduled with DLS will be faster than for PBS. This is validated in figure (c) (even for such a small example). We see that, using the same statistics as for PBS, the execution time of the algorithm using DLS is 0 clock cycles. In the next section, we present an algorithm that combines the advantages of both of these approaches and generates schedules that always execute in the minimum number of cycles.

4 Pipeline Path-based Scheduling

In order to benefit from the advantages offered by both PBS and DLS, we have developed a new algorithm that uses clique covering to generate an AFAP schedule while at the same time using loop feedback edges to pipeline different iterations of a loop. In order to describe this algorithm, we will use the input graph representation shown in figure (a).

The algorithm, known as Pipelined Path-based Scheduling (PPS), considers that a given loop will be executed 0, 1, or 2 or more times. If there are no loops, the paths are generated in accordance with the PBS approach. Thus, for the example presented in figure, PPS will generate the same results as PBS. If, on the other hand, loops exist, paths will be generated assuming the loop executes 0 times, once and twice. This technique is not unlike that of loop unrolling []. Unrolling loops twice allows us to detect any interdependencies that may exist between different iterations of the same loop. As there will always be a dependency between different iterations of the same node, therefore implying a path interval, it is unnecessary to unroll loops more than twice. Of course, if the loop bounds are known at compile time, we can fully unroll the loop.

The different paths for the input graph of figure (a) are shown in figure. The subscripts correspond to the loop iteration from which the node is taken. Thus, a node with subscript 0 belongs to the case of zero loop iterations (the loop condition is false), subscript 1 marks a node taken from the first iteration of the loop, and subscript 2 a node taken from the second iteration. We still assume that a constraint violation is present between nodes and, implying that all paths must be broken between these nodes for all loop iterations. Other path intervals are due to constraint violations (data-dependencies) between different iterations of the same node (nodes 0 and for example). Nevertheless, we can still benefit from loop pipelining, as shown in figure by the intervals generated using clique covering.

Figure : (a) Paths before and after cuts generated using PPS scheduling on the input graph of figure (c), (b) Corresponding state diagram

[Figure : Benchmark Results. Designs: GCD, PREFETCH, MMULT. Columns: Constraints, Loops, S/L Paths, States, Trans, Expck.]

The path {,,,} is deduced from Path. This is due to the fact that the loop is unrolled twice. The operations and of iteration i can be executed in the same state as operations and of iteration i.
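The path generation described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PPS implementation: the function names, the (node, iteration) labelling and the example graph are hypothetical, constraint violations are modelled as ordered node pairs, and a greedy left-to-right cut on a single path stands in for the clique covering that PPS applies across all paths.

```python
def unrolled_paths(prefix, body, suffix, max_iters=2):
    """Enumerate one path per assumed iteration count (0, 1, 2).
    Loop-body nodes are tagged with their iteration number; nodes
    outside the loop carry the tag None."""
    paths = []
    for k in range(max_iters + 1):
        path = [(v, None) for v in prefix]
        for it in range(1, k + 1):
            path += [(v, it) for v in body]
        path += [(v, None) for v in suffix]
        paths.append(path)
    return paths


def conflicts(earlier, later, violations):
    """Two nodes cannot share a state if they are the same node taken from
    different iterations, or if (earlier, later) is a listed violation."""
    (name_a, iter_a), (name_b, iter_b) = earlier, later
    same_node_other_iteration = name_a == name_b and iter_a != iter_b
    return same_node_other_iteration or (name_a, name_b) in violations


def greedy_cut(path, violations):
    """Cut a single path into segments, starting a new segment as soon as
    the next node conflicts with a node already in the current segment."""
    segments, current = [], []
    for node in path:
        if any(conflicts(prev, node, violations) for prev in current):
            segments.append(current)
            current = []
        current.append(node)
    if current:
        segments.append(current)
    return segments


# Hypothetical graph: node a, then a loop with body (b, c), then node d,
# with a constraint violation between b and the c that follows it.
paths = unrolled_paths(["a"], ["b", "c"], ["d"])
segments = greedy_cut(paths[2], {("b", "c")})
for segment in segments:
    print(segment)
```

In this hypothetical example the twice-unrolled path is cut into three segments, and node c of the first iteration shares a segment with node b of the second iteration, which is the kind of overlap across the iteration boundary that loop pipelining exploits.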
5 Experimental Results

We have executed our algorithm on several benchmarks and other published examples. The results show that PPS always finds the best solution under the same constraints when compared with the two other path-based approaches, represented by PBS [] and DLS []. Three benchmarks were chosen: PREFETCH [], GCD (Greatest Common Divisor) and MMULT [9], which represents the computation of the function ab mod n, given 0 a, b n, and lg(n). For each of them, figure shows the results produced by PPS and the two other algorithms. Loops is the number of loops, S/L Paths gives the shortest/longest path, States the number of states and Trans the number of transitions. Expck is the expected number of clock cycles needed to execute the algorithm.

The results in figure show that PPS always produces the best result of the three algorithms. Furthermore, in small examples such as GCD, the clique covering technique and the ALAP technique sometimes generate the same cuts, thus producing the same results.

Of more interest is the performance of PPS when treating large realistic examples. Figure shows how PPS performed when dealing with two relatively complex designs: a send process of the X. [8] and a telephone answering machine []. Both of these examples contain complex control structures. ANSWER is a very large example, making it difficult to compute the probabilities for every possible execution path and to solve the linear equations to produce the expected number of clock cycles. Nonetheless, with PPS, we save states compared to and 9 compared to.

[Figure : PPS performance using large examples. Designs: X., ANSWER. Columns: Constraints, Chaining, Loops, States, Trans, Expck.]

6 Conclusions

In this paper, we have presented a new algorithm which combines clique covering techniques and a new loop pipelining technique to optimally schedule control dominated designs. Our method optimizes both the cuts (and therefore the number of states) and the loop execution time by taking into account any parallelism that may exist beyond iteration boundaries. Experimental results obtained by executing PPS on several examples show that the new technique is a considerable improvement on previous path-based approaches.

Acknowledgements

The authors would like to thank Dr Kevin O'Brien and Jean Frehel for their help during this work.

References

[1] J. A. Fisher, "Trace scheduling: A technique for global microcode compaction", IEEE T. CAD, Vol C-0, July 98.
[2] R. Camposano, "Path-Based Scheduling for Synthesis", IEEE T. CAD, Vol 0(), pp8-9, January 99.
[3] R.A. Bergamaschi, R. Camposano, M. Payer, "Area And Performance Optimizations in Path-Based Scheduling", Proc. EDAC'9, pp0-0, Amsterdam, February 99.
[4] R. Camposano, R.A. Bergamaschi, "Synthesis Using Path-Based Scheduling: Algorithms And Exercises", Proc. 7th DAC, pp0-, Orlando, June 990.
[5] G. Goossens, J. Vandewalle, H. D. Man, "Loop optimization in register-transfer scheduling for DSP-systems", Proc. th DAC, pp8-8, June 989.
[6] K. O'Brien, M. Rahmouni, A.A. Jerraya, "A VHDL-Based Scheduling Algorithm for Control-Flow Dominated Design", th Intl. Workshop On High-Level Synthesis, California, November 99.
[7] K. O'Brien, M. Rahmouni, A.A. Jerraya, "DLS: A Scheduling Algorithm for High-Level Synthesis in VHDL", Proc. EDAC'9, Paris, France, February 99.
[8] S. Bhattacharya, S. Dey, F. Brglez, "Performance Analysis and Optimization of Schedules for Conditional and Loop Intensive Specifications", st Design Automation Conference, pp. 9-9, 99.
[9] Howard Trickey, "Compiling Pascal Programs into Silicon", Ph D Thesis, Department of Computer Science, Stanford University, July 98.


A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information
