Issues and Approaches to Coarse-Grain Reconfigurable Architecture Development

1 Issues and Approaches to Coarse-Grain Reconfigurable Architecture Development
Ken Eguro and Scott Hauck, Department of Electrical Engineering, University of Washington, Seattle, WA, USA
Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '03)
Presented by Marko Neitola, Electronics Lab, Dept. of Electrical and Information Engineering, University of Oulu, Finland

2 Contents: Introduction; Implications of Domain-Specialized Devices; Difficulties of Functional Unit Allocation; Functional Unit Allocation; Functional Unit Allocation Results; Conclusions

3 1. Introduction
The effects that fundamental architecture decisions have on specialized reconfigurable devices are largely unknown and difficult to quantify. The central question is how to consider all applications in a domain simultaneously and determine the most appropriate overall number and ratio of the different functional units. The paper studies how this problem manifests itself during the development of an encryption-specialized FPGA architecture. It focuses on functional unit allocation, i.e., determining the most appropriate quantity and ratio of functional units across the domain. Three algorithms for functional unit allocation are presented. The concerns raised by the paper need to be addressed by future CAD tools.

4 2. Implications of Domain-Specialized Devices
Fine-grained logic resources are replaced with a set of coarse-grained functional units. This improves system performance, but developers must consider closely how a given implementation would use the provided resources. Because the logic elements are no longer universally flexible, several things are not obvious: what the most appropriate set of functional units is, what the routing architecture should be, what the implications are for the necessary CAD tools, and how these factors might affect each other. The paper focuses primarily on functional unit allocation and sets aside other optimization issues such as operator identification and optimization.

5 3. Difficulties of Functional Unit Allocation
Determining the best way to allocate additional resources is difficult: performance versus area curves do not offer enough information, and the relationships between functional unit demands across a domain are complicated. As a case study, the 15 candidate algorithms of the Advanced Encryption Standard (AES) competition are analyzed to illustrate the issues that make component allocation difficult. The resource requirements for implementing each algorithm are considered at natural unrolling points, ranging from relatively small, time-multiplexed implementations to completely unrolled ones. Four main factors obscure the relationship between hardware resources and performance.

6 Fully rolled implementations Ratio Complications

7 Factor 1 of 4: Equalization Difficulties

8 Factor 2 of 4: Complexity Disparity (Unevenness)

9 Factor 3 of 4: Scaling Behavior

10 Factor 4 of 4: Availability of Functional Units
The last problem in estimating performance from available resources is that if a particular implementation requires more functional units of a certain type than are available, the needed functionality can often be emulated with combinations of other, under-utilized units. For example, a regular bit permutation could be accomplished with a mixture of shifting and masking. Although this flexibility may improve resource utilization, it also dramatically increases the number of designs to be evaluated.
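To make the shift-and-mask remark concrete, here is a minimal Python sketch of two regular bit permutations built only from shifts, masks, and ORs. The function names and word widths are illustrative assumptions and do not come from the paper.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word width, for illustration only


def rotate_left32(x, k):
    """Rotate a 32-bit word left by k bits using two shifts and a mask."""
    k %= 32
    return ((x << k) | (x >> (32 - k))) & MASK32


def swap_nibbles8(x):
    """Swap the high and low 4-bit halves of an 8-bit value."""
    return ((x & 0x0F) << 4) | ((x & 0xF0) >> 4)
```

Each call stands in for what a dedicated permutation unit would do in one step, which is the trade-off the slide describes: fewer specialized units, but more operations on the remaining ones.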

11 4. Functional Unit Allocation
Solving the allocation problem means simultaneously addressing hardware requirements while maximizing usability and maintaining hard or soft area and performance constraints. Three functional unit allocation algorithms are presented to solve this problem. The first algorithm addresses hard performance constraints. The second and third algorithms attempt to maximize the overall performance given softer constraints.

$$\text{Cost} = \sum_{i=0}^{N-1} CC_i \tag{1}$$

$$\text{Cost} = \sum_{i=0}^{N-1} \begin{cases} CC_i & \text{if algorithm } i \text{ fits on the architecture} \\ PC \cdot A & \text{otherwise} \end{cases} \tag{2}$$

$$\text{Cost} = \sum_{i=0}^{N-1} CC_i + \text{Area Penalty} \tag{3}$$

12 4.1 Performance-Constrained Algorithm - Starting Point
The functional unit requirements of three encryption algorithms are given over a range of performance levels. A hard throughput constraint is given: 4 clock cycles per block.

13 4.1 Performance-Constrained Algorithm - Determine the Slowest Implementation
Determine the slowest implementation of each algorithm that still satisfies the minimum throughput requirement. Then eliminate the implementations below the performance threshold.

14 4.1 Performance-Constrained Algorithm - Minimum Number of Each Resource Type
Based on this subset of implementations, determine the minimum number of each resource type. If possible, unroll the algorithms further to make better use of the available resources.
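The selection steps of the performance-constrained algorithm can be sketched in Python as follows. This is one reading of the slides, not the authors' implementation: the data layout (a dict mapping each algorithm to a list of `(cycles_per_block, unit_counts)` pairs), the function name, and the unit names in the example are all hypothetical, and the final "unroll further" step is left out.

```python
def allocate_for_throughput(domain, max_cycles):
    """Return the units needed so every algorithm meets max_cycles per block.

    domain: {algorithm: [(cycles_per_block, {unit_type: count}), ...]}
    """
    allocation = {}
    for name, impls in domain.items():
        # Drop implementations that miss the throughput constraint.
        feasible = [impl for impl in impls if impl[0] <= max_cycles]
        if not feasible:
            raise ValueError(f"{name} cannot meet {max_cycles} cycles per block")
        # Keep the slowest implementation that still meets the constraint.
        _, units = max(feasible, key=lambda impl: impl[0])
        # The architecture needs, per unit type, the largest demand seen.
        for unit, count in units.items():
            allocation[unit] = max(allocation.get(unit, 0), count)
    return allocation


# Hypothetical requirements for three algorithms at several unrolling levels.
example_domain = {
    "X": [(8, {"mux": 2, "rot": 1}), (4, {"mux": 4, "rot": 2}), (1, {"mux": 16, "rot": 8})],
    "Y": [(6, {"xor": 2, "ram": 1}), (3, {"xor": 4, "ram": 2})],
    "Z": [(4, {"mux": 1, "xor": 6}), (2, {"mux": 2, "xor": 12})],
}
example_allocation = allocate_for_throughput(example_domain, max_cycles=4)
```

With the 4 cycles per block constraint from the slides, the sketch picks the 4-cycle version of X, the 3-cycle version of Y, and the 4-cycle version of Z, then takes the per-type maximum of their demands.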

15 4.2 Area-Constrained Algorithm

$$\text{Cost} = \sum_{i=0}^{N-1} \begin{cases} CC_i & \text{if algorithm } i \text{ fits on the architecture} \\ PC \cdot A & \text{otherwise} \end{cases}$$
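A small Python sketch of how this cost could be evaluated for one candidate architecture is given below. Several points are assumptions, since the slide does not define them: CC_i is taken to be the cycle count of the fastest implementation of algorithm i that fits, PC and A are treated as plain numbers supplied by the caller and multiplied together, and the data layout and all names are hypothetical.

```python
def fits(units_needed, architecture):
    """True if the architecture provides at least the requested units."""
    return all(architecture.get(u, 0) >= n for u, n in units_needed.items())


def best_cycles(impls, architecture):
    """Cycles per block of the fastest fitting implementation, or None."""
    fitting = [cycles for cycles, units in impls if fits(units, architecture)]
    return min(fitting) if fitting else None


def exclusion_cost(domain, architecture, pc, a):
    """Sum CC_i over algorithms that fit; charge pc * a for each that does not."""
    total = 0
    for impls in domain.values():
        cycles = best_cycles(impls, architecture)
        total += cycles if cycles is not None else pc * a
    return total


# Hypothetical two-algorithm domain: Y needs more RAM than is provided.
demo_domain = {
    "X": [(8, {"mux": 2}), (4, {"mux": 4})],
    "Y": [(6, {"ram": 3})],
}
demo_cost = exclusion_cost(demo_domain, {"mux": 4, "ram": 1}, pc=10, a=5)
```

In the example, X contributes 4 cycles and the excluded algorithm Y contributes the penalty term, so a search over architectures can trade dropping an algorithm against speeding up the rest.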

16 4.3 Improved Area-Constrained Algorithm - Starting Point
Eliminate any implementations below the given performance threshold, then randomly choose a throughput level for each algorithm and determine the minimum hardware requirements. Unroll the algorithms further, if possible.

17 4.3 Improved Area-Constrained Algorithm - Evaluation
Evaluate the throughput and penalize any excessive area required by the resulting architecture. In the example, Cost = 106.

$$\text{Cost} = \sum_{i=0}^{N-1} CC_i + \text{Area Penalty} = \sum_{i=0}^{N-1} CC_i + PC \cdot \frac{CA}{MA}$$

18 4.3 Improved Area-Constrained Algorithm - Choose The Implementation Randomly choose a new implementation for one algorithm (here Z), and determine the hardware requirements for the new configuration.

19 4.3 Improved Area-Constrained Algorithm - Accept the New State According to the Cost Function
Old state: Cost = 106. New state: Cost = 34. Despite the lower performance, the new state is accepted because its cost is lower.

$$\text{Cost} = \sum_{i=0}^{N-1} CC_i + \text{Area Penalty} = \sum_{i=0}^{N-1} CC_i + PC \cdot \frac{CA}{MA}$$
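The random-move loop described on slides 16 to 19 can be sketched in Python as follows. It is a simplified illustration under stated assumptions, not the paper's method: the acceptance rule here is plain "accept only if the cost drops" (the slides say only that acceptance follows the cost function), the area penalty is taken as PC times CA/MA with CA computed from hypothetical per-unit areas, and the threshold filtering and further-unrolling steps are omitted.

```python
import random


def required_units(choice, domain):
    """Minimum units of each type needed to host every chosen implementation."""
    need = {}
    for name, idx in choice.items():
        for unit, count in domain[name][idx][1].items():
            need[unit] = max(need.get(unit, 0), count)
    return need


def cost(choice, domain, unit_area, pc, max_area):
    """Total cycles of the chosen implementations plus the assumed area penalty."""
    cycles = sum(domain[name][idx][0] for name, idx in choice.items())
    area = sum(unit_area[u] * n for u, n in required_units(choice, domain).items())
    return cycles + pc * area / max_area


def search(domain, unit_area, pc, max_area, steps=1000, seed=0):
    """Re-pick one algorithm's implementation at random; keep it if cost drops."""
    rng = random.Random(seed)
    choice = {name: rng.randrange(len(impls)) for name, impls in domain.items()}
    current = cost(choice, domain, unit_area, pc, max_area)
    for _ in range(steps):
        name = rng.choice(sorted(domain))
        candidate = dict(choice)
        candidate[name] = rng.randrange(len(domain[name]))
        new_cost = cost(candidate, domain, unit_area, pc, max_area)
        if new_cost < current:  # assumed greedy acceptance rule
            choice, current = candidate, new_cost
    return choice, current


# Hypothetical domain: (cycles_per_block, unit demand) per implementation.
sa_domain = {
    "X": [(8, {"mux": 2}), (4, {"mux": 4})],
    "Z": [(4, {"xor": 6}), (2, {"xor": 12})],
}
sa_area = {"mux": 1, "xor": 1}
best_choice, best_cost = search(sa_domain, sa_area, pc=100, max_area=10)
```

As on slide 19, a slower configuration can win: with a large penalty constant, the smaller implementations of X and Z give a lower total cost than the faster, larger ones.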

20 5. Functional Unit Allocation Results
Minimum Throughput Results of Functional Unit Selection (maximum number of clock cycles per block vs. area)

21 Performance Results of Functional Unit Selection Across the Domain (total number of clock cycles required by all algorithms vs. area)

22 Minimum Throughput Results of Functional Unit Selection on a Limited Domain

23 Performance Results of Functional Unit Selection Across a Limited Domain

24 Conclusions
The development of a coarse-grained reconfigurable architecture raises several unique and unaddressed design problems. The paper presents three techniques for allocating functional units that attempt to balance performance and area constraints on domains with vastly different hardware requirements. The first produces architectures under a guaranteed hard performance requirement. The second allows versatility to be traded for better average throughput. The third produces efficient structures that take advantage of softer area constraints. Future development: as functional units become more specialized and domains grow, there will be a need for resource utilization optimization techniques and for CAD tools that are aware of these issues.