Area. f(t) f(t *) A min. T min. T T * T max Latency

Size: px
Start display at page:

Download "Area. f(t) f(t *) A min. T min. T T * T max Latency"

Transcription

1 Ecient Optimal Design Space Characterization Methodologies STEPHEN A. BLYTHE Rensselaer Polytechnic Institute and ROBERT A. WALKER Kent State University 1 One of the primary advantages of a high-level synthesis system is its ability to explore the design space. This paper presents several methodologies for design space exploration that compute all optimal tradeo points for the combined problem of scheduling, clock length determination, and module selection. We discuss how each methodology takes advantage of both the structure within the design space itself as well as the structure of, and interaction between, each of the three subproblems. 1. INTRODUCTION For many years, one of the most compelling reasons for developing high-level synthesis systems [Gajski et al. 1994] [De Micheli 1994] has been the desire to quickly explore a wide range of designs for the same behavioral description. Given a set of designs, two metrics are commonly used to evaluate their quality: area (ideally total area, but often only functional unit area), and time (the schedule length, or latency). Finding the optimal tradeo curve between these two metrics is called design space exploration. Design space exploration is generally considered too dicult to solve optimally in a reasonable amount of time, so the problem is usually limited to computing either lower bounds [Timmer et al. 1993] or estimates [Chen and Jeng 1991] on the optimal tradeo curve for some set of time or resource constraints. Moreover, the design space is usually determined by solving only the scheduling and functional unit allocation subproblems. The design space exploration methodology described here goes beyond traditional design space exploration in several ways. First, all optimal tradeo points are computed so that the design space is completely characterized. Second, these optimal tradeo points represent optimal solutions to the time-constrained scheduling (TCS) and resource-constrained scheduling (RCS) problems, rather than lower bounds or estimates. Third, the tradeo points are computed in a manner that supports more realistic module libraries by incorporating clock length determination and module selection into the methodology. Finally, these tradeo points are 0 This material is based upon work supported by the National Science Foundation under Grant No. MIP Author's addresses: Stephen A. Blythe, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180; Robert A. Walker, Department of Mathematics and Computer Science, Kent State University, Kent, OH 44242

2 2 S. A. Blythe and R. A. Walker Area A max f(t) f(t *) A min T min T T * T max Latency Fig. 1. Example design space showing Pareto points. The shaded regions show the two distinct clusters of Pareto points that many tradeo curves exhibit. computed in an ecient manner through careful pruning of the search space during the design cycle. The resulting methodology can also be extended to include additional subproblems. 1.1 The Design Space The process of exploring the design space can be viewed as solving either the timeconstrained scheduling (TCS) problem (minimizing the functional unit area) for a range of time constraints, or the resource-constrained scheduling (RCS) problem (minimizing the latency) for a range of resource constraints. Although there is a tradeo between latency and area, the tradeo curve is not smooth due to the nite combinations of the library modules available [McFarland 1987]. Consider the design space shown in Figure 1 { this curve can be described by the set of points f(t; f(t ))g, where f(t ) is the minimum area required for a given time constraint T (i.e., the optimal solution to that TCS problem). To ensure that this curve is completely characterized, one could exhaustively solve the TCS problem optimally for every time constraint T from T min (the critical path length) to T max (the time constraint corresponding to the module selection / allocation with the minimum area). However, this brute-force approach is not very ecient. Fortunately, the number of points needed to fully characterize the optimal tradeo curve is much smaller. The curve can be completely characterized by the set f(t ; f(t ))g of optimal tradeo points (shown by black dots in Figure 1) { those points for which there is no design with a smaller latency and the same area, and no design with a smaller area and the same latency. Such points are called Pareto points [De Micheli 1994] [Brayton and Spence 1984], and can be formally dened as follows: f(t ) < f(t? k); 1 k T? T min f(t ) f(t + k); 1 k T max? T Therefore, the design space exploration problem can be solved more eciently by

3 Ecient Optimal Design Space Characterization Methodologies 3 nding only the Pareto points in the design space. Furthermore, many optimal tradeo curves contain two distinct clusters of such Pareto points, as shown by the shaded regions in Figure 1: one where the latency is small and the area is large, and another where the area is small and the latency is large. This paper explores several approaches to nd all the Pareto points in an ecient manner. First, Section 2 describes a basic methodology that explores the latency axis to nd the Pareto points in the design space through repeatedly applying a TCS methodology. Section 3 discusses how to extend this problem to incorporate the clock length determination problem, and Section 4 discusses the incorporation of module selection. In Section 5, a related approach based on solving the RCS problem repeatedly while taking advantage of the structure of the module selection problem is discussed, and a comparison is made with the TCS based method. Then Section 6 examines the advantages and disadvantages of combining the two approaches in a manner similar to Timmer's bounding methodology [Timmer et al. 1993]. Sections 7 and 8 describes techniques for pivoting between the TCS and RCS-based methodologies to take advantage of the Pareto point clusters described above. In Section 9, results for a fairly complex module library are presented, and why this pivoting method works well for such a library is discussed. Lastly, Section 10 gives a summary of this work and suggests some future directions that it may take. 2. LATENCY-AXIS EXPLORATION To nd all of the Pareto points, either the TCS problem could be solved repeatedly for various time constraints, or the RCS problem could be solved repeatedly for various resource constraints. Our methodology repeatedly solves the TCS problem 2, which leads to two subproblems: (1) determining which time constraints to explore, and (2) determining how to eciently explore the design space at each time constraint. Ideally, we want to avoid exhaustively searching all time constraints in the feasible range [T min ; T max ]. If the module set and clock length are specied a priori, then the TCS problem need only be solved for those time constraints that are a multiple of the clock length, since any other time constraint could be replaced by the smaller of the two time constraints that it would lie between without any increase in area. As a simple example, consider the design space exploration problem for the DIF- FEQ example [Paulin and Knight 1989], using library A from Table 1 (the \trivial" library 1 found in [Timmer et al. 1993]) and a clock length of 100. The minimum time constraint is 600 (the length of the critical path), the maximum time constraint is 1300 (the latency required for a feasible schedule with 1 mult and 1 alu1), so the only time constraints that must be explored are those in the set f600; 700; 800; 900; 1000; 1100; 1200; 1300g. Given that set of time constraints, our Voyager design space exploration system [Chaudhuri et al. 1997] eciently characterizes the design space as follows. The main loop (see Figure 2) scans the time constraints in the direction of in- 2 Note that, although we are solving only the TCS problem, this methodology is not limited to solving only that problem, and could be extended to include register allocation, interconnect allocation, control unit design, etc.

4 4 S. A. Blythe and R. A. Walker Design Space Exploration: area cur MAXINT compute T min and T max compute all candidate time constraints T i in [T min; T max] for each T i from T min to T max if (latency(asap) > f(t i)) (T i; f(t i)) is not a feasible schedule else compute the lower bound lb on f(t i) if (lb >= area cur) else (T i; f(t i)) is not a Pareto point /* np-lb */ compute the upper bound ub on f(t i) if (lb = ub) (T i; f(t i)) is a Pareto point /* P-lbub */ area cur = lb else compute the LP-relaxed lower bound rlb on f(t i) if (rlb >= area cur) (T i; f(t i)) is not a Pareto point /* np-rlb */ else if (rlb = ub) (T i; f(t i)) is a Pareto point /* P-rlbub */ area cur = rlb else calculate IP solution ip = f(t i). if (ip < area cur) (T i; f(t i)) is a Pareto point /* P-ILP */ area cur = ip else (T i; f(t i)) is not a Pareto point/* np-ilp */ Fig. 2. Voyager's main design space exploration loop creasing latency. At each time constraint, an ASAP schedule is rst calculated to determine if a feasible schedule exists for that time constraint and clock length. If so, then it uses a heuristic to compute a lower bound on the functional unit (FU) area; if this area is the same as or larger 3 than the previous area, then that solution is not a Pareto point. This is the case for time constraints 900, 1000, 1100, and 1200 in Figure 3. However, if the lower bound is smaller than the previous area, then it is a potential 3 In the problem as specied so far, the area will never be larger. However, it may be larger when the clock length determination and module selection problems are incorporated into the methodology as described in Sections 3 and 4.

5 Ecient Optimal Design Space Characterization Methodologies 5 MODULE AREA DELAY (ns) OPERATIONS mult f*g alu f+;?; <g Table 1. Library A { Timmer's \trivial" library Optimal (Pareto-Based) Curve Optimal Solutions Lower Bounds Area Latency Fig. 3. Results from DIFFEQ using library A Pareto point. The methodology then computes an upper bound on the area, and compares it to the lower bound. If the two are equal, then the point is an optimal solution, and a Pareto point (e.g., time constraint 800 in Figure 3); if not, then the results are still inconclusive (e.g., time constraint 700). It then uses a tighter (but more computationally-intensive) FU lower-bounding method based on LPrelaxation, and tries this procedure again (in this example, determining that time constraint 700 corresponds to a Pareto point). If this method also fails, then it solves a carefully-developed ILP formulation [Chaudhuri et al. 1994] to determine the optimal solution, using the bounds determined earlier to reduce the search space for that solution. Thus our base methodology quickly determines whether or not each time constraint corresponds to a Pareto point by carefully pruning the search space. It rst computes a small set of time constraints to explore. Increasingly tighter heuristics are then used to try to determine if each time constraint corresponds to a Pareto point. Only if those heuristics fail is a more computationally-intensive ILP formulation used. Unfortunately, assuming the module set and clock length are specied a priori is unrealistic with complex module libraries. Accordingly, Section 3 describes how this base methodology can be extended to include clock length determination and Section 4 describes the incorporation of module selection. 3. ADDING CLOCK DETERMINATION As described earlier, our base methodology explores a set of time constraints, determining whether or not the solution to each TCS problem is a Pareto point. The problem was simplied by assuming the clock length was known a priori, whereas

6 6 S. A. Blythe and R. A. Walker recent work has shown that not only is determining the system clock length a dif- cult problem [Chen and Jeng 1991; Chaiyakul et al. 1992; Narayan and Gajski 1992; Corazao et al. 1993; Jha et al. 1995; Chaudhuri et al. 1997]), but the choice of the clock length has a signicant impact on the resulting design. Therefore, the problem of clock length determination must be folded into the design space exploration problem. 3.1 Prior Work As described in [Chaudhuri et al. 1997], the clock determination problem is usually ignored in favor of ad hoc decisions or estimates. For example, several early synthesis systems used the delay of the slowest functional unit as the estimated clock length, a choice which favored the use of chaining and disallowed multi-cycling. A heuristic method for nding the clock length was given in [Narayan and Gajski 1992], but the result may not be optimal. To guarantee that the optimal clock length is chosen, 4 the scheduling problem could be solved repeatedly for every possible clock length { a very computationallyintensive task. Fortunately, such an exhaustive search is not necessary, as the set of candidate clock lengths to be scheduled can be reduced. In [Corazao et al. 1993], one method for reducing that set is given. A tighter method was introduced in [Chen and Jeng 1991], and later proven correct in [Jha et al. 1995] and [Chaudhuri et al. 1997] { this method computes a small set of candidate clock lengths (one of which must be the optimal clock length) by taking the ceiling of the integral divisors of each of the functional unit delays. 3.2 Pruning the Candidate Clock Lengths Even these integral-divisor methods can lead to a set of candidate clock lengths so large that it becomes too time-consuming to solve the TCS problem for each clock length at each time constraint. Fortunately, the set of candidate clock lengths can be reduced even further, as described below. Definition 1. For a given clock length c, the slack s k of a module of type k with execution delay d k is given by s k (c) = c dk? d k ; c Theorem 1. Given a clock length c, if there exists a clock length c such that s k (c ) s k (c) for all module types k in the current module selection, then c can be replaced by c without lengthening the schedule. Proof: Since s k (c ) s k (c) for all modules k, the same quality will hold for operations in a schedule using these modules. Thus all operations in the schedule using c could be scheduled at least as soon, if not sooner, in a schedule using c because all operations will be capable of executing faster (or equally as fast) in the schedule using c. Thus, changing the clock length to c can only improve the schedule.2 4 Actually, this is only the data path component of the system clock length; the nal clock length includes controller and interconnect delays as well.

7 clocks time constr. design points np-lb np-rlb np-ilp P-lb P -rlb P-ILP Ecient Optimal Design Space Characterization Methodologies 7 MODULE AREA DELAY (ns) OPERATIONS mul f*g alu f+;?; <g Table 2. Library B { Narayan's library Clock Length slack(*) slack(+) replaced by { { { / { / / Table 3. Slack values found in library B To demonstrate the use of this theorem, consider library B, shown in Table 2 (the VDP100 library from [Narayan and Gajski 1992], augmented with areas similar to those of library A). Assuming a technology limit of 17ns on the shortest clock length, integral divisor methods give the set CK = f163; 82; 55; 48; 41; 33; 28; 24; 21; 19; 17g of candidate clock lengths, with the corresponding slack values shown in Table 3. Consider the clock length of 33ns, found as d163=5e = 33. When a multiplier is scheduled using this clock length, there will be a slack of 2ns. There are several clock lengths whose slack for the multiplier is smaller, but the slack corresponding to the alu1 is always larger. However, a clock length of 55ns has the same slack as the multiplier, and less slack for the alu1. Therefore, Theorem 1 says that any schedule that uses a clock length of 33ns can be shortened by using a clock length of 55ns (without increasing the number of functional units). When Theorem 1 is applied to the full set of candidate clock lengths, the set is reduced to CK 0 = f163; 82; 55; 24g. Note that when two sets of slack values are equivalent, the shorter clock length is dropped since it would tend to result in a larger controller. 3.3 Exploring the Candidate Clock Lengths Once the pruned set CK 0 of candidate clock lengths has been computed, the integral multiples of each of those clock lengths give the time constraints to explore. Then, for each such time constraint and candidate clock length, the methodology outlined in Figure 2 can be applied. The eciency of the search at each time Table 4. Statistics from solving DIFFEQ using library B

8 8 S. A. Blythe and R. A. Walker Optimal (Pareto-Based) Curve Optimal Solutions Lower Bounds Area Latency Fig. 4. Results from DIFFEQ using library B Design Space Exploration w/ Clock Determination: area cur MAXINT compute pruned set CK 0 of candidate clock lengths compute T min and T max for each c j in CK 0 compute all candidate time constraints T i in [T min; T max] for each T i from T min to T max using each c j in CK 0 inducing T i determine if (T i; f(t i)) is a Pareto point (see Figure 2) Fig. 5. Voyager's design space exploration loop with clock determination constraint can be improved by observing that each time constraint was derived as an integral multiple of one or more clock lengths, so only those inducing clock lengths need be explored at that time constraint. The resulting methodology is outlined in Figure 5. Using library B and the DIFFEQ example, this methodology generates the design space shown in Figure 4. From the pruned set CK 0 = f163; 82; 55; 24g of candidate clock lengths, 49 time constraints were generated, and 50 time constraint / clock length pairs were explored (note that there was only a single time constraint with more than one candidate clock length). Two corresponded to infeasible schedules, while the other 48 had to be examined to determine if they were Pareto points. As Table 4 shows (the headings of the last six columns correspond to labels in Figure 2), the vast majority of the solutions were determined to be either Pareto or non-pareto points using the bounding heuristics { only two were solved using the tighter LP-relaxation lower bounding method and no solutions required an ILP approach! Note also that in several cases (time constraints in the range ), the

9 Ecient Optimal Design Space Characterization Methodologies 9 MODULE AREA DELAY (ns) OPERATIONS mult f*g alu f+;?; <g sub f-g add f+g alu f+;?; <g sub f-g add f+g Table 5. Library C { Timmer's library 2 lower bound diered from the optimal solution, so methods based solely on lowerbounding would incorrectly characterize the design space. Finally, Figure 4 also demonstrates the importance of systematically examining all relevant clock lengths in the design space. At a time constraint of 652, the inducing clock length of 163ns leads to a solution with an area of 500, whereas the previous time constraint had a lower area of 400. Although the point (652, 500) is optimal with respect to its time constraint and a xed clock length of 163 ns, it is not a Pareto point, and is thus rejected by the line labeled /* np-lb */ in Figure ADDING MODULE SELECTION While adding clock length determination to the base methodology is an important step toward supporting more complex libraries, the methodology must also be extended to cover libraries that oer a number of possible module sets. Again, we would prefer to avoid an exhaustive search of all possible module sets, yet we must ensure that we do not miss any combination of a time constraint, clock length, and module set that corresponds to a Pareto point. 4.1 Prior Work Over the years, a variety of methods have been employed to determine the appropriate module set. One method, described in [Jain et al. 1988], generates a number of module sets, and then selects the best one. Another method, presented in [Timmer et al. 1993], computes an initial module set through a MILP formulation, and determines its validity by scheduling; if no viable schedule is found, then the set (and its allocation) are updated, and the scheduling process is repeated. 5 As with some of the previous work on clock length determination, using such techniques to determine a single module set before (and independently of) scheduling cannot guarantee a globally optimal solution. Instead of trying to nd a single module set, the method found in [Chen and Jeng 1991] exhaustively explores all possible module sets. Since this method also exhaustively explores all integral divisor based clock lengths, its computational complexity is too large for optimal scheduling, so only estimates are computed. 4.2 Exploring Dierent Module Sets Fortunately, such an exhaustive search is not necessary. Many of the possible module sets can be eliminated since they are incapable of implementing all the 5 This method also incorporates the type mapping problem into the MILP formulation { something our methodology does not yet handle. See Section 9.

10 10 S. A. Blythe and R. A. Walker 5000 library clocks time constr. design points np-lb np-rlb np-ilp P-lb P-rlb P-ILP 4500 Optimal (Pareto-Based) Curve Optimal Solutions Lower Bounds Area Latency Fig. 6. Results from DIFFEQ using library C MODULE AREA DELAY (ns) OPERATIONS alu f; +;?;<; >g mul f*g add f+g sub f-g cmp f<; >g Table 6. Library D { an articial complex library operation types found in the data ow graph. For example, in the case of the DIFFEQ, the module set must be capable of performing the operations f+;?; ; <g; any module sets that do not can be eliminated. Moreover, the number of module sets that must be explored at each time constraint can be reduced (as was the number of candidate clock lengths) by observing that each time constraint was derived as an integral multiple of a clock length derived from one or more specic modules. Therefore, only those module sets that contain at least one of those modules must be explored at that time constraint. Using library C, shown in Table 5 (library 2 from [Timmer et al. 1993]), and the DIFFEQ example, the methodology described above generates the design space shown in Figure 6. There are 32 possible module sets, but only 1 pruned candidate clock length (100ns) and 9 time constraints, resulting in 288 TCS problems to solve. 32 resulted in infeasible schedules (i.e., no solution was possible), and as before, the vast majority of the solutions were determined to be either Pareto or non-pareto points using the bounding heuristics. C D Table 7. Statistics from solving DIFFEQ using libraries C and D

11 Ecient Optimal Design Space Characterization Methodologies Optimal (Pareto-Based) Curve Optimal Solutions Lower Bounds Area Latency Fig. 7. Results from DIFFEQ using library D As another example, consider library D, shown in Table 6 (an articial library slightly less complex than library C, but with more realistic module delays). Using that library, and the DIFFEQ example, the methodology described above generates the design space shown in Figure 7. Here there were 16 possible module sets, 9 integral-divisor candidate clock lengths, and 131 time constraints { almost 19,000 combinations. Even after pruning the candidate clock lengths, there were 6 pruned candidate clock lengths, and 93 time constraints { almost 9,000 combinations. However, the methodology had to solve only 1522 TCS problems (an average of 1.35 clock lengths and module selections at each time constraint). 183 of those were infeasible, and again, the vast majority of the solutions were determined to be either Pareto or non-pareto points using the bounding heuristics. Moreover, this entire procedure took only 1.5 hours of wall-clock time. Without such a careful pruning of the search space, this problem could not have been solved optimally in a reasonable amount of time. Furthermore, with these 4 module delays, there are many resulting designs that lie above the optimal tradeo curve. Although these designs are optimal solutions for a particular clock length and module set, they are not Pareto points, so it is very important that the methodology correctly explores the design space. For example, [Timmer et al. 1993] presents a method that begins at time constraint T max and alternately performs time and area lower-bounding to nd a stair-step tradeo curve. Even if that methodology is enhanced to alternate between optimally solving the resource-constrained and time-constrained scheduling problems, it would only nd the Pareto-based tradeo curve in the absence of the combined module selection and clock length determination problem. If this combined problem was included, the enhanced methodology would fail to nd the Pareto-based curve if one of the points found by time-constrained scheduling is a suboptimal point that lies above the optimal Pareto-based curve. Such a point (which would have a non-minimal area) would then be used by resource-constrained scheduling to nd the minimal latency with this (non-minimal) area, thus compounding the problem and giving an erroneous design curve that actually lies above the optimal area curve based on

12 12 S. A. Blythe and R. A. Walker Module Area Delay Operations alu f; +;?;<;>g mul f*g add f+g sub f-g cmp f<; >g Table 8. An articial complex library the Pareto points. 5. AREA-AXIS EXPLORATION Viewing the previous approach as a latency-axis exploration methodology, an alternative approach is based on the area axis: the Pareto points can be found by determining a set of area constraints to explore, and then optimally solving the RCS problem at each area constraint. Again, it is necessary to reduce the number of constraints to explore, as searching the entire integral range along the area axis would be prohibitively expensive. Given such a reduced set of area constraints, the algorithm outlined in Figure 2 can be modied to explore the area axis instead { starting with the point that has the smallest possible area (and the largest latency) and continuing until it reaches the point with the largest possible area (and minimal latency), solving the RCS problem to minimize the schedule latency at each area constraint. The various solutions can then be determined to be either Pareto or non-pareto points using heuristics and exact techniques together in a manner similar to that described in Section 2. In much the same way that any one time constraint can be induced by more than one clock length, each area constraint can correspond to more than one module set / resource allocation. Enumerating all possible allocations whose resulting area is within A max for each module set gives an initial set of candidate area / resource constraints that could be used in the area-axis methodology. The size of this initial set can then be reduced by noting that it may contain overly loose resource constraints { for example, a resource constraint of 3 adders would be too loose for a behavioral description with only one addition operation. In general, we can reduce the set of candidate resource constraints by upper bounding the number of independent paths in the DFG that could require resources of type T and could possibly be executed in parallel. The resulting upper bound would then be the maximal number of type T resources needed in any allocation for any schedule using any clock length. Although greatly dependent on the module library and DFG in use, applying even such simple heuristics can reduce the number of area constraints to explore by 80% - 85%. Given this basic area-axis methodology, the clock length determination and module selection problems can be incorporated much as they were in the basic latencyaxis methodology. However, in the area-axis methodology, it is easier to solve module selection than clock length determination at each area constraint, as the candidate module selections have the more pronounced eect.

13 Ecient Optimal Design Space Characterization Methodologies 13 DIFF AR EWF Latency Axis Area Axis Timmer Based 14:07 2:36 5: :08 8:50 12: :45 223:22 211: Table 9. Results from axis-based and neighborhood-based Timmer-like exploration using the library from Table Latency-Axis vs. Area-Axis Exploration Due to eect that the inducing clock lengths have on the other problems, it is easier to solve the clock length determination problem than it is to solve the module selection problem at each time constraint in the latency-axis methodology. This is reected by the fact that there are often more candidate module selections than candidate clock lengths at each time constraint. The opposite is true of the area-axis methods - each area constraint is derived from a module selection and allocation, without concern for the clock length determination problem. This frequently results in a single module selection with many clock length candidates at each point considered along the area axis. Results for both the latency-axis and area-axis methodologies are given in the middle two columns of Table 9, which show results for three dierent benchmarks (DIFFEQ, AR-lattice, and Elliptic Wave Filter). In each cell of the table, the execution time 6 is given (as minutes:seconds), along with the total number of points explored (i.e., the number of TCS or RCS problems solved). Note that neither approach is universally better than the other. Most of the time, it was faster to use area-axis exploration, but for the EWF example, several of the RCS problems were quite time-consuming to solve optimally. However, expressing those problems as TCS problems tended to be signicantly faster, resulting in faster latency-axis exploration. Furthermore, the simplicity of the module library used tends to lend itself to giving fewer area constraints, which is a signicant factor in the speed of either method. Thus it is unclear how to determine, a priori, which axis to choose for exploration. 6. A TIMMER-LIKE EXPLORATION While both of the axis-based methods described in Section 5 explore the design space in a reasonably eective manner, neither is universally eective. Furthermore, each method still has a fairly large number of TCS/RCS problems to solve. To increase the eciency further, one approach would be to combine the two methods, using a search methodology similar to the one employed in [Timmer et al. 1993]. As described in that paper, Timmer's methodology solves only the lower bounding problem, but it could easily be adapted to solve the scheduling problem instead. When adapted in this manner, Timmer's methodology essentially alternates between solving TCS and RCS problems, as shown in Figure 8. First, given a clock length and a minimal area (resource) constraint, the RCS problem would be solved, 6 The executions times are based on a Sun SPARC-20 running the Solaris Operating System.

14 14 S. A. Blythe and R. A. Walker A TCS B C D Timmer Curve Timmer Pareto Candidates True Optimal Curve True Pareto Points RCS TCS RCS The pitfall of Timmer's method when considering clock length and module set determi- Fig. 8. nation minimizing the latency for that resource constraint. The resulting latency would then be used as a time constraint and the corresponding TCS problem solved, minimizing the number of resources. Then another RCS problem would be solved using that minimized number of resources, etc. Since every RCS problem solved by this methodology nds a Pareto point, the number of TCS/RCS problems to be solved is reduced considerably over the axis-based methods. As presented, this Timmer-like methodology does not consider the clock length determination problem, but as has been shown in [Chaudhuri et al. 1995] [Chen and Jeng 1991], the clock length has a signicant impact on resulting designs. In fact, failure to incorporate clock length determination can result in overlooking Pareto points in the complete optimal design space. Because Timmer's methodology assumes a single clock length, the resulting Pareto points are optimal relative only to that one clock length and may not fully characterize the design space since time constraints induced by other clock lengths frequently result in smaller areas (TCS solutions). This situation is depicted in Figure 8, where Pareto points C and D were not found with Timmer's methodology, since they correspond to dierent clock lengths than the one being used, and Pareto point A (also corresponding to a dierent clock length) was missed in favor of the false Pareto point B. However, the clock length determination problem can be incorporated into Timmer's method by considering a neighborhood of time constraints around each Timmer Pareto candidate as follows. For each candidate clock length, the induced time constraint closest to, without exceeding, that Pareto point's time constraint is found. Then, instead of solving a single TCS problem at a single time constraint, the minimum area resulting from solving the TCS problem at each of these new time constraints is found, and used as the area constraint for the next RCS problem. Unfortunately, the information from previous schedules can no longer be used to prune the search space as described in Section 2, which puts this method at distinct disadvantage over the axis-based methods. This eect can be seen in the last column of Table 9, where the execution times for our neighborhood-based Timmer-like methodology can exceed those of the axis-based methodologies, even though there is a decrease in the number of design points explored.

15 Ecient Optimal Design Space Characterization Methodologies 15 Design Space Exploration using Simple Pivoting: input percentage of time constraints to explore, perc generate candidate time constraints in range [T min; T max] locate T such that j[t min; T ]j = perc j[t min; T max]j explore [T min; T ] using the latency axis methodology using A corresponding to the point (T ; A ) generate candidate area constraints in range [A min; A ] explore [A min; A ] using the area axis methodology Fig. 9. Voyager's Simple Pivoting Methodology DIFF AR EWF % time constraints explored :07 3:58 2:24 2:01 1:38 1:49 2: :08 11:04 7:45 5:57 4:41 9:24 8: :45 7:01 3:01 1:29 221:46 209:51 223: Table 10. Results from simple pivoting 7. PIVOTING BETWEEN LATENCY-AXIS AND AREA-AXIS EXPLORATION Another approach to combining latency-axis and area-axis exploration is to consider the structure of the tradeo curve. As shown in Figure 1, a large number of the Pareto points are clustered into two regions: one where the latency is small and the area is large, and another where the area is small and the latency is large. This phenomenon is also illustrated in Figure 10. Here, the latency-axis methodology (exploring the latency axis in the direction of increasing latency) would nd many Pareto points fairly quickly, but would then waste a considerable amount of time exploring time constraints that do not correspond to Pareto points until the high-latency cluster of Pareto points is reached. Using the area-axis methodology (exploring the area axis in the direction of increasing area) has a similar shortcoming. However, this shortcoming can be overcome by pivoting between the two axisbased methods { using the latency-axis methodology to explore the high-area / lowlatency cluster, and using the area-axis methodology to explore the high-latency / low-area cluster. This process is outlined in Figure 9. When exploring the latency axis in the direction of decreasing latency, the most obvious method of pivoting is to simply switch from latency-axis exploration to area-axis exploration after exploring a certain percentage (perc) of the latency axis. Note that after making this switch, the area-axis methodology must still explore the area axis in the direction of increasing area (so that information from previous schedules can be used to prune the search space as described in Section 2), but now it can stop when it reaches the last Pareto point found by the latency-axis methodology. The results of performing this pivoting process for various percentages of the

16 16 S. A. Blythe and R. A. Walker AR Optimal (Pareto-Based) Curve EWF Optimal (Pareto-Based) Curve Area Latency Fig. 10. The EWF and AR optimal Pareto-base curves when using the library from Table 8 latency axis are presented in Table 10. Note that the 0% column corresponds to an immediate pivot to area-axis exploration, and the 100% column corresponds to using solely latency-axis exploration. Not surprisingly, for the tradeo curves depicted in Figure 10, the percentages that result in the fastest execution times are fairly low (10%-20%), since most of the low-latency cluster of Pareto points are within the rst 20% of the latency axis. Unfortunately, however, there is no consistent percentage that will always correspond to the best pivot point for every tradeo curve, regardless of whether the execution time 7 or the number of points being explored is the quantity being minimized. Thus, a better method for deciding where to pivot must be found. 8. DYNAMIC PIVOTING Since the best pivot point cannot be determined a priori, it must be determined dynamically during the exploration process. Since the tradeo curve often exhibits two clusters of Pareto points as described earlier, one approach would be to determine when a cluster is being left, and pivot while exploring the next few points that are not members of either cluster. When exploring the latency axis, this pivot would occur when the curve begins to \atten out" into a roughly horizontal line. One simple method of implementing this dynamic pivot is outlined in Figure 11, in which a window W of constant size (W size ) is kept. This window contains the last n design points explored, many of which were pruned as non-pareto points. If the area of the rst element in the window (A 1 ) is not signicantly larger than the area corresponding to the current time constraint (A n ) at any point during latency-axis exploration, the current point is selected as the pivot point. In other words, if a Pareto point has not been found recently, the curve is attening out and the pivot from latency-axis exploration to area-axis exploration is made. 7 Once again, the EWF example contains several points that are computationally more expensive when solved as RCS problems { thus the dramatic execution time increase between 20% and 10% despite the decrease in points explored.

17 Ecient Optimal Design Space Characterization Methodologies 17 Design Space Exploration using Dynamic Window Pivoting: input tolerance percentage tol of time constraints in window W size = tol j[t min; T max]j generate initial window W = [T 1; T n] = [T min; T ] such that jw j = W size explore window W using latency axis methodology using A 1 and A n from the points (T 1; A 1) and (T n; A n) in W while ((A 1? A n)=a 1 > tol) remove T 1 from W append next T i to W calculate A n using TCS method generate nal window W a as the area constraints [A min; A n] explore W a using area axis methodology Fig. 11. Voyager's Dynamic Pivoting Methodology DIFF AR EWF % time constraints in window 0(a) (t) 2:36 1:48 2:23 4:40 5:20 14: :50 10:08 6:20 7:34 8:21 48: :22 256:05 256:15 2:50 3:50 55: Table 11. Results from dynamic window-based pivoting The results of applying this dynamic window-based pivoting are given in Table 11. To determine when a \signicant" change in area was reached, the size of the current window was compared to the change in area over that window. If the percentage change in area was smaller than the size of the window as a percentage of the total number of time constraints, we pivoted to using the area axis; otherwise we continued using the latency axis. Unfortunately, there was no consistent window size that yielded the best result for every case. In general, a window size of 10%-15% of the time constraints generally seemed to give good results. Looking at Table 11, the rst example shown (DIFFEQ) is small enough that 5% of the time constraints is statistically insignicant, leading to results that are dominated by the area-axis exploration. However, the EWF results give a strong argument for using dynamic pivoting { here a bad a priori choice of using only latency-axis exploration or area-axis exploration (as shown in Table 10) could lead to a signicantly larger execution time than 15% dynamic pivoting. 9. FURTHER RESULTS In all of our results so far, we have used the library presented in Table 8. That library has a number of dierent delays, which complicates any design space exploration methodology that considers clock length determination. However, it has only two alternatives for each operation type, leading to a fairly small number of

18 18 S. A. Blythe and R. A. Walker Module Area Delay Operations mul f*g mul f*g mul f*g sub f-,<g sub f-,<g sub f-,<g add f+g add f+g add f+g Table 12. A module selection intensive library DIFF AR EWF Latency Area Timmer Pivot (15%) 44:22 4:37 32:58 6: :32 213:17 313:27 192: :59 232:01 298:46 43: Table 13. Results when using a library with many module selections 4000 AR Optimal (Pareto-Based) Curve EWF Optimal (Pareto-Based) Curve Area Latency Fig. 12. The EWF and AR optimal Pareto-base curves when using the library from Table 12

19 Ecient Optimal Design Space Characterization Methodologies 19 module selection candidates. Now consider Table 12, which is the opposite: it has fewer unique functional unit delays, and several of those delays are multiples of each other. Both of these factors result in fewer resulting candidate clock lengths (for example, several functional units have 50 as a candidate clock length). However, this library has a much larger number of module selection candidates. Results using this library are presented in Table 13. Compared to results using the previous library when the latency-axis methodology is used, there is signicantly more time being spent exploring the latency axis, since the module selection problem is now much more dicult and latency-axis exploration gets most of its time savings due to the structure of the clock length determination problem. However, for area-axis exploration the results are now generally faster, reecting the savings due to considering the structure of the module selection problem. Again, as with the rst library, EWF gives several RCS problems that are time consuming to solve optimally (while the corresponding TCS problems are not as time consuming), thus dramatically increasing overall run time for the area axis. Note that in all cases the number of area constraints to solve is much higher { thus the savings in execution time must result from the fact that there is more structure to each constraint along the area axis for this library. As with the prior library, Timmer-like exploration (even our neighborhood-based Timmer-like exploration) once again fails to produce faster run times for this library, although in this case the primary contributing factor is not only clock length determination but the number of possible module selection candidates at each of the generated time constraints. For AR and EWF, the pivoting method once again gave the best execution times 8, but this time it also explored fewer points than the Timmer-like method! Finally, note that the resulting design spaces for these two benchmarks are also much more complex for this library, as can be seen in Figure 12. The added complexity of these plots is directly attributable to the complexity of the module selection problem { many more area constraints exist, leading to more Pareto points being derived from the corresponding resource constraints. 10. CONCLUSIONS AND FUTURE WORK This paper has examined the process of design space exploration, reducing that process to one of characterizing the optimal latency-area tradeo curve by nding all the Pareto points on that curve. For the combined problem of scheduling, clock length determination, and module selection, we have presented several exploration methodologies: dedicated latency-axis or area-axis exploration, a Timmer-like exploration method, and two methods (one static, one dynamic) for pivoting between the two axis-based methods. Each of these methodologies takes advantage of the structure found along both the latency and are axes by carefully pruning a large number of sub-optimal solutions at each level of the design cycle, making it possible to use optimal scheduling techniques rather than bounds or estimates. Furthermore, 8 The DIFFEQ benchmark once again gives skewed results as its size does not allow a statistically signicant number of Pareto points to be incorporated within the 15% design window, thus not allowing the pivoting method to take full advantage of the structure in the resulting design space.

20 20 S. A. Blythe and R. A. Walker we discussed how the tradeo curve is dominated by two clusters of Pareto points, and how that structure, along with the structure of the combined problem, can be used to more eciently nd the Pareto points. Tests using various benchmarks and dierent module libraries have shown the importance of considering the clock length determination and module selection problem in conjunction with the scheduling problem. When these subproblems are not considered in conjunction like this (often such subproblems are resolved prior to and independently of scheduling), we have shown that results do not accurately reect the optimal tradeo curve. In many cases, methods that do not consider this combined problem entirely miss globally optimal points. Although the methodologies presented solve the design space exploration problem optimally, they could also be used to generate a preliminary characterization by replacing the optimal scheduler with a heuristic scheduler or lower bound estimate. 9 The reductions in the number of constraints to explore would be similar to that found in the optimal case, but the amount if execution time would be lower at the expense of optimality. At present, although these methodologies allow us to handle more realistic module libraries than most previous methodologies since they consider clock length determination and module selection, they do not consider the type mapping problem. That is, they assume that all operations of a given type are mapped to a single functional unit type (found by module selection). To more more fully take advantage of module libraries, and thus to more completely characterize the optimal tradeo curve, the methodologies (in particular the scheduling portions) must be enhanced to handle the complete type-mapping problem. When nding optimal solutions to this type mapping problem, it will crucial to nd tight heuristic bounding techniques for the type mapping problem so that the axis-based methods can maintain their eciency through pruning methods. Furthermore, a more realistic model of the resulting design must be developed so that the methodology also incorporates registers, interconnect issues, controller eects, etc. REFERENCES Blythe, S. A. and Walker, R. A Towards a Practical Methodology for Completely Characterizing the Optimal Design Space. In Proc. of the 9th International Symposium on System-Level Synthesis (La Jolla, California, Nov ). IEEE Computer Society Press. Brayton, R. K. and Spence, R Sensitivity and Optimization. Computer-aided design of electronic circuits. Elsevier Science Publishing Co., INC., 52 Vandervilt Avenue, New York, NY 10017, USA. Chaiyakul, V., Wu, A. C.-H., and Gajski, D. D Timing Models for High Level Synthesis. In Proc. of the European Design Automation Conference (EuroDAC) (Hamburg, Germany, Feb. 1992), pp. 60{65. IEEE Computer Society Press. Chaudhuri, S., Blythe, S. A., and Walker, R. A An Exact Methodology for Scheduling in a 3D Design Space. In Proc. of the 8th International Symposium on System- Level Synthesis (Cannes, France, Sept ), pp. 78{83. IEEE Computer Society Press. Chaudhuri, S., Blythe, S. A., and Walker, R. A A Solution Methodology for 9 Note that the bounding methodology described here would more fully characterize the design space than the one described in [Timmer et al. 1993], for the reasons explained in Section 4.2.

21 Ecient Optimal Design Space Characterization Methodologies 21 Exact Design Space Exploration in a Three Dimensional Design Space. IEEE Transactions on VLSI Systems 5, 1 (March), (to appear). Chaudhuri, S., Walker, R. A., and Mitchell, J. E Analyzing and Exploiting the Structure of the Constraints in the ILP Approach to the Scheduling Problem. IEEE Transactions on VLSI Systems 2, 4 (Dec.), 456{471. Chen, L.-G. and Jeng, L.-G Optimal Module Set and Clock Cycle Selection for DSP Synthesis. In Proc. of 1991 IEEE International Symp. on Circuits and Systems. (Singapore, June ), pp. 2200{2203. IEEE Computer Society Press. Corazao, M., Khalaf, M., Guerra, L., Potkonjak, M., and Rabaey, J. M Instruction Set Mapping for Performance Optimization. In Proc. of the IEEE/ACM International Conference on Computer-Aided Design (Santa Clara, California, Nov ), pp. 518{521. IEEE Computer Society Press. De Micheli, G Synthesis and Optimization of Digital Circuits. McGraw-Hill series in electrical and computer engineering. McGraw-Hill, New York, NY, USA. Gajski, D. D., Vahid, F., Narayan, S., and Gong, J Specication and Design of Embedded Systems. P T R Prentice-Hall, Englewood Clis, NJ 07632, USA. Jain, R., Parker, A. C., and Park, N Module Selection for Pipeline Synthesis. In Proc. of the 25th ACM/IEEE Design Automation Conf. (Anaheim, California, June ), pp. 542{547. IEEE Computer Society Press. Jha, P., Parameswaran, S., and Dutt, N Reclocking for High Level Synthesis. In Proc. of the Asia-South Pacic Conference on Design Automation (ASP-DAC) (Makuhari Messe, Chiba, Japan, Aug. 29-Sept ). IEEE Computer Society Press. McFarland, M. C Reevaluating the Design Space for Register Transfer Hardware Synthesis. In Proc. of the IEEE/ACM International Conference on Computer-Aided Design (Santa Clara, California, Nov ), pp. 262{265. IEEE Computer Society Press. Narayan, S. and Gajski, D. D System Clock Estimation based on Clock Slack Minimization. In Proc. of the European Design Automation Conference (EuroDAC) (Hamburg, Germany, Feb. 1992), pp. 66{71. IEEE Computer Society Press. Paulin, P. G. and Knight, J. P Force Directed Scheduling for the Behavioral Synthesis of ASICs. IEEE Transactions on Computer-Aided Design 8, 6 (June), 661{679. Timmer, A. H., Heijiligers, M. J. M., and Jess, J. A. G Fast System-Level Area- Delay Curve Prediction. In Proc. of 1st APCHDLSA (1993), pp. 198{ Proc. of the European Design Automation Conference (EuroDAC) (Hamburg, Germany, Feb. 1992). IEEE Computer Society Press.

Area. A max. f(t) f(t *) A min

Area. A max. f(t) f(t *) A min Abstract Toward a Practical Methodology for Completely Characterizing the Optimal Design Space One of the most compelling reasons for developing highlevel synthesis systems has been the desire to quickly

More information

Area. f(t) f(t *) A min. T min. T T* T max Latency

Area. f(t) f(t *) A min. T min. T T* T max Latency This material is based upon work supported by the National Science Foundation under Grant No. MIP-9423953 while the authors were with the Department of Computer Science at Rensselaer Polytechnic Institute.

More information

Area Upper Bounds Optimal Designs Lower Bounds. Feasible Designs. Schedule Length. Area. Schedule Length. Clock Length

Area Upper Bounds Optimal Designs Lower Bounds. Feasible Designs. Schedule Length. Area. Schedule Length. Clock Length IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 1, MARCH 1997 1 A Solution Methodology for Exact Design Space Exploration in a Three-Dimensional Design Space 1 Samit Chaudhuri

More information

2 <3> <2> <1> (5,6) 9 (5,6) (4,5) <1,3> <1,2> <1,1> (4,5) 6 <1,1,4> <1,1,3> <1,1,2> (5,7) (5,6) (5,5)

2 <3> <2> <1> (5,6) 9 (5,6) (4,5) <1,3> <1,2> <1,1> (4,5) 6 <1,1,4> <1,1,3> <1,1,2> (5,7) (5,6) (5,5) A fast approach to computing exact solutions to the resource-constrained scheduling problem M. NARASIMHAN and J. RAMANUJAM 1 Louisiana State University This paper presents an algorithm that substantially

More information

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi Incorporating the Controller Eects During Register Transfer Level Synthesis Champaka Ramachandran and Fadi J. Kurdahi Department of Electrical & Computer Engineering, University of California, Irvine,

More information

Kalev Kask and Rina Dechter. Department of Information and Computer Science. University of California, Irvine, CA

Kalev Kask and Rina Dechter. Department of Information and Computer Science. University of California, Irvine, CA GSAT and Local Consistency 3 Kalev Kask and Rina Dechter Department of Information and Computer Science University of California, Irvine, CA 92717-3425 fkkask,dechterg@ics.uci.edu Abstract It has been

More information

Mahsa Vahidi and Alex Orailoglu. La Jolla CA of alternatives needs to be explored to obtain the

Mahsa Vahidi and Alex Orailoglu. La Jolla CA of alternatives needs to be explored to obtain the Metric-Based Transformations for Self Testable VLSI Designs with High Test Concurrency Mahsa Vahidi and Alex Orailoglu Department of Computer Science and Engineering University of California, San Diego

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca Datapath Allocation Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca e-mail: baruch@utcluj.ro Abstract. The datapath allocation is one of the basic operations executed in

More information

PARAS: System-Level Concurrent Partitioning and Scheduling. University of Wisconsin. Madison, WI

PARAS: System-Level Concurrent Partitioning and Scheduling. University of Wisconsin. Madison, WI PARAS: System-Level Concurrent Partitioning and Scheduling Wing Hang Wong and Rajiv Jain Department of Electrical and Computer Engineering University of Wisconsin Madison, WI 53706 http://polya.ece.wisc.edu/~rajiv/home.html

More information

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics Chunho Lee, Miodrag Potkonjak, and Wayne Wolf Computer Science Department, University of

More information

9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM

9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM 9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM Whereas the simplex method is effective for solving linear programs, there is no single technique for solving integer programs. Instead, a

More information

Worst-case running time for RANDOMIZED-SELECT

Worst-case running time for RANDOMIZED-SELECT Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case

More information

PPS : A Pipeline Path-based Scheduler. 46, Avenue Felix Viallet, Grenoble Cedex, France.

PPS : A Pipeline Path-based Scheduler. 46, Avenue Felix Viallet, Grenoble Cedex, France. : A Pipeline Path-based Scheduler Maher Rahmouni Ahmed A. Jerraya Laboratoire TIMA/lNPG,, Avenue Felix Viallet, 80 Grenoble Cedex, France Email:rahmouni@verdon.imag.fr Abstract This paper presents a scheduling

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Zhou B. B., Brent R. P. and Tridgell A. y Computer Sciences Laboratory The Australian National University Canberra,

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica A New Register Allocation Scheme for Low Power Data Format Converters Kala Srivatsan, Chaitali Chakrabarti Lori E. Lucke Department of Electrical Engineering Minnetronix, Inc. Arizona State University

More information

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load Testability Insertion in Behavioral Descriptions Frank F. Hsu Elizabeth M. Rudnick Janak H. Patel Center for Reliable & High-Performance Computing University of Illinois, Urbana, IL Abstract A new synthesis-for-testability

More information

Accelerated SAT-based Scheduling of Control/Data Flow Graphs

Accelerated SAT-based Scheduling of Control/Data Flow Graphs Accelerated SAT-based Scheduling of Control/Data Flow Graphs Seda Ogrenci Memik Computer Science Department University of California, Los Angeles Los Angeles, CA - USA seda@cs.ucla.edu Farzan Fallah Fujitsu

More information

100-hour Design Cycle: A Test Case. keeping the specication close to the conceptualization. 2. Use of standard languages for input specications.

100-hour Design Cycle: A Test Case. keeping the specication close to the conceptualization. 2. Use of standard languages for input specications. 100-hour Design Cycle: A Test Case Daniel D. Gajski, Loganath Ramachandran, Peter Fung 3, Sanjiv Narayan 1 and Frank Vahid 2 University of California, Irvine, CA 3 Matsushita Electric Works, Research and

More information

9/24/ Hash functions

9/24/ Hash functions 11.3 Hash functions A good hash function satis es (approximately) the assumption of SUH: each key is equally likely to hash to any of the slots, independently of the other keys We typically have no way

More information

An Appropriate Search Algorithm for Finding Grid Resources

An Appropriate Search Algorithm for Finding Grid Resources An Appropriate Search Algorithm for Finding Grid Resources Olusegun O. A. 1, Babatunde A. N. 2, Omotehinwa T. O. 3,Aremu D. R. 4, Balogun B. F. 5 1,4 Department of Computer Science University of Ilorin,

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines B. B. Zhou, R. P. Brent and A. Tridgell Computer Sciences Laboratory The Australian National University Canberra,

More information

GSAT and Local Consistency

GSAT and Local Consistency GSAT and Local Consistency Kalev Kask Computer Science Department University of California at Irvine Irvine, CA 92717 USA Rina Dechter Computer Science Department University of California at Irvine Irvine,

More information

MERL { A MITSUBISHI ELECTRIC RESEARCH LABORATORY. Empirical Testing of Algorithms for. Variable-Sized Label Placement.

MERL { A MITSUBISHI ELECTRIC RESEARCH LABORATORY. Empirical Testing of Algorithms for. Variable-Sized Label Placement. MERL { A MITSUBISHI ELECTRIC RESEARCH LABORATORY http://www.merl.com Empirical Testing of Algorithms for Variable-Sized Placement Jon Christensen Painted Word, Inc. Joe Marks MERL Stacy Friedman Oracle

More information

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Network. Department of Statistics. University of California, Berkeley. January, Abstract Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,

More information

An Algorithm for the Allocation of Functional Units from. Realistic RT Component Libraries. Department of Information and Computer Science

An Algorithm for the Allocation of Functional Units from. Realistic RT Component Libraries. Department of Information and Computer Science An Algorithm for the Allocation of Functional Units from Realistic RT Component Libraries Roger Ang rang@ics.uci.edu Nikil Dutt dutt@ics.uci.edu Department of Information and Computer Science University

More information

High-Level Test Synthesis. Tianruo Yang and Zebo Peng. Department of Computer and Information Science

High-Level Test Synthesis. Tianruo Yang and Zebo Peng. Department of Computer and Information Science An Ecient Algorithm to Integrate Scheduling and Allocation in High-Level Synthesis Tianruo Yang and Zebo Peng Department of Computer and Information Science Linkoping University, S-581 83, Linkoping, Sweden

More information

An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling

An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.1, January 2009 81 An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling Jun-yong

More information

Test Set Compaction Algorithms for Combinational Circuits

Test Set Compaction Algorithms for Combinational Circuits Proceedings of the International Conference on Computer-Aided Design, November 1998 Set Compaction Algorithms for Combinational Circuits Ilker Hamzaoglu and Janak H. Patel Center for Reliable & High-Performance

More information

The only known methods for solving this problem optimally are enumerative in nature, with branch-and-bound being the most ecient. However, such algori

The only known methods for solving this problem optimally are enumerative in nature, with branch-and-bound being the most ecient. However, such algori Use of K-Near Optimal Solutions to Improve Data Association in Multi-frame Processing Aubrey B. Poore a and in Yan a a Department of Mathematics, Colorado State University, Fort Collins, CO, USA ABSTRACT

More information

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies David S. Dixon University of New Mexico, Albuquerque NM 87131, USA Abstract. A friendship game in game theory is a network

More information

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract Algorithms for an FPGA Switch Module Routing Problem with Application to Global Routing Shashidhar Thakur y Yao-Wen Chang y D. F. Wong y S. Muthukrishnan z Abstract We consider a switch-module-routing

More information

a) wire i with width (Wi) b) lij C coupled lij wire j with width (Wj) (x,y) (u,v) (u,v) (x,y) upper wiring (u,v) (x,y) (u,v) (x,y) lower wiring dij

a) wire i with width (Wi) b) lij C coupled lij wire j with width (Wj) (x,y) (u,v) (u,v) (x,y) upper wiring (u,v) (x,y) (u,v) (x,y) lower wiring dij COUPLING AWARE ROUTING Ryan Kastner, Elaheh Bozorgzadeh and Majid Sarrafzadeh Department of Electrical and Computer Engineering Northwestern University kastner,elib,majid@ece.northwestern.edu ABSTRACT

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

Stochastic branch & bound applying. target oriented branch & bound method to. optimal scenario tree reduction

Stochastic branch & bound applying. target oriented branch & bound method to. optimal scenario tree reduction Stochastic branch & bound applying target oriented branch & bound method to optimal scenario tree reduction Volker Stix Vienna University of Economics Department of Information Business Augasse 2 6 A-1090

More information

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA A taxonomy of race conditions. D. P. Helmbold, C. E. McDowell UCSC-CRL-94-34 September 28, 1994 Board of Studies in Computer and Information Sciences University of California, Santa Cruz Santa Cruz, CA

More information

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139 Enumeration of Full Graphs: Onset of the Asymptotic Region L. J. Cowen D. J. Kleitman y F. Lasaga D. E. Sussman Department of Mathematics Massachusetts Institute of Technology Cambridge, MA 02139 Abstract

More information

n = 2 n = 1 µ λ n = 0

n = 2 n = 1 µ λ n = 0 A Comparison of Allocation Policies in Wavelength Routing Networks Yuhong Zhu, George N. Rouskas, Harry G. Perros Department of Computer Science, North Carolina State University Abstract We consider wavelength

More information

Richard E. Korf. June 27, Abstract. divide them into two subsets, so that the sum of the numbers in

Richard E. Korf. June 27, Abstract. divide them into two subsets, so that the sum of the numbers in A Complete Anytime Algorithm for Number Partitioning Richard E. Korf Computer Science Department University of California, Los Angeles Los Angeles, Ca. 90095 korf@cs.ucla.edu June 27, 1997 Abstract Given

More information

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers , October 20-22, 2010, San Francisco, USA Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers I-Lun Tseng, Member, IAENG, Huan-Wen Chen, and Che-I Lee Abstract Longest-path routing problems,

More information

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract Implementations of Dijkstra's Algorithm Based on Multi-Level Buckets Andrew V. Goldberg NEC Research Institute 4 Independence Way Princeton, NJ 08540 avg@research.nj.nec.com Craig Silverstein Computer

More information

3 INTEGER LINEAR PROGRAMMING

3 INTEGER LINEAR PROGRAMMING 3 INTEGER LINEAR PROGRAMMING PROBLEM DEFINITION Integer linear programming problem (ILP) of the decision variables x 1,..,x n : (ILP) subject to minimize c x j j n j= 1 a ij x j x j 0 x j integer n j=

More information

Symbolic Modeling and Evaluation of Data Paths*

Symbolic Modeling and Evaluation of Data Paths* Symbolic Modeling and Evaluation of Data Paths* Chuck Monahan Forrest Brewer Department of Electrical and Computer Engineering University of California, Santa Barbara, U.S.A. 1. Introduction Abstract We

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Effects of Technology Mapping on Fault Detection Coverage in Reprogrammable FPGAs

Effects of Technology Mapping on Fault Detection Coverage in Reprogrammable FPGAs Syracuse University SURFACE Electrical Engineering and Computer Science College of Engineering and Computer Science 1995 Effects of Technology Mapping on Fault Detection Coverage in Reprogrammable FPGAs

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli

under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli Interface Optimization for Concurrent Systems under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli Abstract The scope of most high-level synthesis eorts to date has

More information

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809

PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA. Laurent Lemarchand. Informatique. ea 2215, D pt. ubo University{ bp 809 PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA Laurent Lemarchand Informatique ubo University{ bp 809 f-29285, Brest { France lemarch@univ-brest.fr ea 2215, D pt ABSTRACT An ecient distributed

More information

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman

An Embedded Wavelet Video. Set Partitioning in Hierarchical. Beong-Jo Kim and William A. Pearlman An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) 1 Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

A robust optimization based approach to the general solution of mp-milp problems

A robust optimization based approach to the general solution of mp-milp problems 21 st European Symposium on Computer Aided Process Engineering ESCAPE 21 E.N. Pistikopoulos, M.C. Georgiadis and A. Kokossis (Editors) 2011 Elsevier B.V. All rights reserved. A robust optimization based

More information

OUT. + * * + + * * + c1 c2. c4 c5 D2 OUT

OUT. + * * + + * * + c1 c2. c4 c5 D2  OUT Techniques for Functional Test Pattern Execution Inki Hong and Miodrag Potkonjak UCLA Computer Science Department Los Angeles, CA 90095-596 USA Abstract Functional debugging of application specic integrated

More information

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture

A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture A Simple Placement and Routing Algorithm for a Two-Dimensional Computational Origami Architecture Robert S. French April 5, 1989 Abstract Computational origami is a parallel-processing concept in which

More information

System-Level Synthesis of Application Specific Systems using A* and Generalized Force-Directed Heuristics

System-Level Synthesis of Application Specific Systems using A* and Generalized Force-Directed Heuristics System-Level Synthesis of Application Specific Systems using A* and Generalized Force-Directed Heuristics Chunho Lee, Miodrag Potkonjak, and Wayne Wolff Computer Science Department, University of California,

More information

Global Solution of Mixed-Integer Dynamic Optimization Problems

Global Solution of Mixed-Integer Dynamic Optimization Problems European Symposium on Computer Arded Aided Process Engineering 15 L. Puigjaner and A. Espuña (Editors) 25 Elsevier Science B.V. All rights reserved. Global Solution of Mixed-Integer Dynamic Optimization

More information

CPSC 320 Sample Solution, Playing with Graphs!

CPSC 320 Sample Solution, Playing with Graphs! CPSC 320 Sample Solution, Playing with Graphs! September 23, 2017 Today we practice reasoning about graphs by playing with two new terms. These terms/concepts are useful in themselves but not tremendously

More information

RPUSM: An Effective Instruction Scheduling Method for. Nested Loops

RPUSM: An Effective Instruction Scheduling Method for. Nested Loops RPUSM: An Effective Instruction Scheduling Method for Nested Loops Yi-Hsuan Lee, Ming-Lung Tsai and Cheng Chen Department of Computer Science and Information Engineering 1001 Ta Hsueh Road, Hsinchu, Taiwan,

More information

B553 Lecture 12: Global Optimization

B553 Lecture 12: Global Optimization B553 Lecture 12: Global Optimization Kris Hauser February 20, 2012 Most of the techniques we have examined in prior lectures only deal with local optimization, so that we can only guarantee convergence

More information

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.

More information

Submitted for TAU97 Abstract Many attempts have been made to combine some form of retiming with combinational

Submitted for TAU97 Abstract Many attempts have been made to combine some form of retiming with combinational Experiments in the Iterative Application of Resynthesis and Retiming Soha Hassoun and Carl Ebeling Department of Computer Science and Engineering University ofwashington, Seattle, WA fsoha,ebelingg@cs.washington.edu

More information

MOST computations used in applications, such as multimedia

MOST computations used in applications, such as multimedia IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 2005 1023 Pipelining With Common Operands for Power-Efficient Linear Systems Daehong Kim, Member, IEEE, Dongwan

More information

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Debatosh Debnath Dept. of Computer Science & Engineering Oakland University, Rochester Michigan 48309, U.S.A. debnath@oakland.edu

More information

On the Rectangle Escape Problem

On the Rectangle Escape Problem CCCG 2013, Waterloo, Ontario, August 8 10, 2013 On the Rectangle Escape Problem Sepehr Assadi Ehsan Emamjomeh-Zadeh Sadra Yazdanbod Hamid Zarrabi-Zadeh Abstract Motivated by a PCB routing application,

More information

A Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology.

A Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology. A Fast Recursive Mapping Algorithm Song Chen and Mary M. Eshaghian Department of Computer and Information Science New Jersey Institute of Technology Newark, NJ 7 Abstract This paper presents a generic

More information

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T.

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T. Document Image Restoration Using Binary Morphological Filters Jisheng Liang, Robert M. Haralick University of Washington, Department of Electrical Engineering Seattle, Washington 98195 Ihsin T. Phillips

More information

Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing

Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Jeffrey T. Muehring and John K. Antonio Deptartment of Computer Science, P.O. Box 43104, Texas Tech University, Lubbock, TX

More information

15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs

15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs 15.082J and 6.855J Lagrangian Relaxation 2 Algorithms Application to LPs 1 The Constrained Shortest Path Problem (1,10) 2 (1,1) 4 (2,3) (1,7) 1 (10,3) (1,2) (10,1) (5,7) 3 (12,3) 5 (2,2) 6 Find the shortest

More information

Reclocking for High Level Synthesis

Reclocking for High Level Synthesis Reclocking for High Level Synthesis Pradip Jha Sri Parameswaran Nikil Dutt Information and Computer Science Dept of ECE Information and Computer Science University of California, Irvine The University

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A.

An Embedded Wavelet Video Coder. Using Three-Dimensional Set. Partitioning in Hierarchical Trees. Beong-Jo Kim and William A. An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT) Beong-Jo Kim and William A. Pearlman Department of Electrical, Computer, and Systems Engineering Rensselaer

More information

Constraint Analysis and Heuristic Scheduling Methods

Constraint Analysis and Heuristic Scheduling Methods Constraint Analysis and Heuristic Scheduling Methods P. Poplavko, C.A.J. van Eijk, and T. Basten Eindhoven University of Technology, Department of Electrical Engineering, Eindhoven, The Netherlands peter@ics.ele.tue.nl

More information

Efficient QoS Partition and Routing in Multiservice IP Networks

Efficient QoS Partition and Routing in Multiservice IP Networks Efficient QoS Partition and Routing in Multiservice IP Networks I. Atov H. T. Tran R. J. Harris RMIT University Melbourne Box 2476V Vic. 3001, Australia Abstract This paper considers the combined problem

More information

Retiming and Clock Scheduling for Digital Circuit Optimization

Retiming and Clock Scheduling for Digital Circuit Optimization 184 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Retiming and Clock Scheduling for Digital Circuit Optimization Xun Liu, Student Member,

More information

Algorithms for Integer Programming

Algorithms for Integer Programming Algorithms for Integer Programming Laura Galli November 9, 2016 Unlike linear programming problems, integer programming problems are very difficult to solve. In fact, no efficient general algorithm is

More information

A B. A: sigmoid B: EBA (x0=0.03) C: EBA (x0=0.05) U

A B. A: sigmoid B: EBA (x0=0.03) C: EBA (x0=0.05) U Extending the Power and Capacity of Constraint Satisfaction Networks nchuan Zeng and Tony R. Martinez Computer Science Department, Brigham Young University, Provo, Utah 8460 Email: zengx@axon.cs.byu.edu,

More information

An Order-2 Context Model for Data Compression. With Reduced Time and Space Requirements. Technical Report No

An Order-2 Context Model for Data Compression. With Reduced Time and Space Requirements. Technical Report No An Order-2 Context Model for Data Compression With Reduced Time and Space Requirements Debra A. Lelewer and Daniel S. Hirschberg Technical Report No. 90-33 Abstract Context modeling has emerged as the

More information

Using Local Trajectory Optimizers To Speed Up Global. Christopher G. Atkeson. Department of Brain and Cognitive Sciences and

Using Local Trajectory Optimizers To Speed Up Global. Christopher G. Atkeson. Department of Brain and Cognitive Sciences and Using Local Trajectory Optimizers To Speed Up Global Optimization In Dynamic Programming Christopher G. Atkeson Department of Brain and Cognitive Sciences and the Articial Intelligence Laboratory Massachusetts

More information

Worst Case Execution Time Analysis for Synthesized Hardware

Worst Case Execution Time Analysis for Synthesized Hardware Worst Case Execution Time Analysis for Synthesized Hardware Jun-hee Yoo ihavnoid@poppy.snu.ac.kr Seoul National University, Seoul, Republic of Korea Xingguang Feng fengxg@poppy.snu.ac.kr Seoul National

More information

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Power-Mode-Aware Buffer Synthesis for Low-Power

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

size, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a

size, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a Multi-Layer Incremental Induction Xindong Wu and William H.W. Lo School of Computer Science and Software Ebgineering Monash University 900 Dandenong Road Melbourne, VIC 3145, Australia Email: xindong@computer.org

More information

High Level Synthesis

High Level Synthesis High Level Synthesis Design Representation Intermediate representation essential for efficient processing. Input HDL behavioral descriptions translated into some canonical intermediate representation.

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu,

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu, Mining N-most Interesting Itemsets Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang Department of Computer Science and Engineering The Chinese University of Hong Kong, Hong Kong fadafu, wwkwongg@cse.cuhk.edu.hk

More information

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli Eect of fan-out on the Performance of a Single-message cancellation scheme Atul Prakash (Contact Author) Gwo-baw Wu Seema Jetli Department of Electrical Engineering and Computer Science University of Michigan,

More information

Data Wordlength Optimization for FPGA Synthesis

Data Wordlength Optimization for FPGA Synthesis Data Wordlength Optimization for FPGA Synthesis Nicolas HERVÉ, Daniel MÉNARD and Olivier SENTIEYS IRISA University of Rennes 6, rue de Kerampont 22300 Lannion, France {first-name}.{name}@irisa.fr Abstract

More information

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley Department of Computer Science Remapping Subpartitions of Hyperspace Using Iterative Genetic Search Keith Mathias and Darrell Whitley Technical Report CS-4-11 January 7, 14 Colorado State University Remapping

More information

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the Heap-on-Top Priority Queues Boris V. Cherkassky Central Economics and Mathematics Institute Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Andrew V. Goldberg NEC Research Institute 4 Independence

More information

(RC) utilize CAD tools to perform the technology mapping of a extensive amount of time is spent for compilation by the CAD

(RC) utilize CAD tools to perform the technology mapping of a extensive amount of time is spent for compilation by the CAD Domain Specic Mapping for Solving Graph Problems on Recongurable Devices? Andreas Dandalis, Alessandro Mei??, and Viktor K. Prasanna University of Southern California fdandalis, prasanna, ameig@halcyon.usc.edu

More information

HIGH-LEVEL SYNTHESIS

HIGH-LEVEL SYNTHESIS HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms

More information

However, no results are published that indicate the applicability for cycle-accurate simulation purposes. The language RADL [12] is derived from earli

However, no results are published that indicate the applicability for cycle-accurate simulation purposes. The language RADL [12] is derived from earli Retargeting of Compiled Simulators for Digital Signal Processors Using a Machine Description Language Stefan Pees, Andreas Homann, Heinrich Meyr Integrated Signal Processing Systems, RWTH Aachen pees[homann,meyr]@ert.rwth-aachen.de

More information

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Helga Ingimundardóttir University of Iceland March 28 th, 2012 Outline Introduction Job Shop Scheduling

More information

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a Preprint 0 (2000)?{? 1 Approximation of a direction of N d in bounded coordinates Jean-Christophe Novelli a Gilles Schaeer b Florent Hivert a a Universite Paris 7 { LIAFA 2, place Jussieu - 75251 Paris

More information

Solve the Data Flow Problem

Solve the Data Flow Problem Gaining Condence in Distributed Systems Gleb Naumovich, Lori A. Clarke, and Leon J. Osterweil University of Massachusetts, Amherst Computer Science Department University of Massachusetts Amherst, Massachusetts

More information

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE [HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE International Conference on Computer-Aided Design, pp. 422-427, November 1992. [HaKa92b] L. Hagen and A. B.Kahng,

More information

Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No.

Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No. Fundamentals of Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture No. # 13 Transportation Problem, Methods for Initial Basic Feasible

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD CAR-TR-728 CS-TR-3326 UMIACS-TR-94-92 Samir Khuller Department of Computer Science Institute for Advanced Computer Studies University of Maryland College Park, MD 20742-3255 Localization in Graphs Azriel

More information

Computer Technology Institute. Patras, Greece. In this paper we present a user{friendly framework and a

Computer Technology Institute. Patras, Greece. In this paper we present a user{friendly framework and a MEASURING SOFTWARE COMPLEXITY USING SOFTWARE METRICS 1 2 Xenos M., Tsalidis C., Christodoulakis D. Computer Technology Institute Patras, Greece In this paper we present a user{friendly framework and a

More information