Storage Binding in RTL synthesis


Storage Binding in RTL Synthesis
Pei Zhang, Daniel D. Gajski
Technical Report ICS-01-37, August 10th, 2001
Center for Embedded Computer Systems, Department of Information and Computer Science, University of California, Irvine, Irvine, CA 92697-3425, USA
(949) 824-8059
{pzhang, gajski}@ics.uci.edu

Abstract

In this report, we present the implementation of storage binding, which is one key process in high-level (RTL) synthesis. In previous related work, storage binding is based on isolated registers, or uses 0-1 integer linear programming (ILP) for multiple-port memories to get an optimal result. In this report, we introduce two new approaches that use a graph-partitioning algorithm and a grouping method to map variables into the register files and memories that are normally used in industry.

Contents

1 Introduction
2 Target architecture and five styles in RTL model
  2.1 Target architecture
  2.2 Five styles in RTL model
3 Project goal
4 Implementation of register binding
  4.1 Data structure of our implementation
  4.2 Approach one
    4.2.1 Get variable information
    4.2.2 Bind arrays to memories
    4.2.3 Clique-partitioning
    4.2.4 Group cliques to register files
    4.2.5 Adjustment
    4.2.6 Example
  4.3 Approach two
    4.3.1 Split all variables into small groups
    4.3.2 Clique-Partitioning in small groups
    4.3.3 Adjustment
    4.3.4 Example
5 Experiments
6 Summary
7 Future works
Reference

List of Figures

1 Target architecture
2 Synthesis tasks in RTL synthesis
3 CDFG data structure
4 The procedure of storage binding approach 1
5 Get variable information
6 Graph-partitioning algorithm
7 CDFG of the example
8 Lifetimes of variables
9 Clique-partitioning result
10 Binding result using approach 1
11 The procedure of storage binding approach 2
12 Result of sorted lifetime variables
13 Result of clique-partitioning in each L_i
14 Binding result using approach 2
15 Square-root approximation
16 Process of graph-partitioning
17 Experiment results (Approach 1)
18 Experiment results (Approach 2)

List of Tables

1 Simple example of 3 cliques
2 Variables Lifetime Table
3 Priority weight Table
4 Comparison of different approaches (1)
5 Comparison of different approaches (2)

List of Algorithms

1 Approach 1 of Storage Binding Using Multiple Port Register Files
2 Approach 2 of Storage Binding Using Multiple Port Register Files

1 Introduction

High-level (RTL) synthesis is normally divided into four separate tasks: scheduling, storage binding, functional unit binding and interconnection binding. Storage binding binds the variables to storage units such as registers, register files and memories. Recently, there has been a trend for designers to use register files rather than isolated registers in storage binding. There are several approaches to storage binding using register files, but all of them are too complicated and time-consuming, and not feasible for large-scale designs.

In this report, we describe some new approaches to storage binding using register files in high-level (RTL) synthesis. The rest of the report is organized as follows: Section 2 shows the five styles of RTL description; Section 3 gives the goal of this project; Section 4 describes the implementation, algorithms and data structure of the RTL storage binding. Experimental results for our algorithms are given in Section 5. Section 6 draws a conclusion and Section 7 gives some directions for future work.

[Figure 1: Target architecture. The datapath contains input multiplexers, two register files, a memory, four busses, pipeline registers, functional units (ALU, multiplier, divider) and the datapath output, and is driven by control signals from the control unit.]

2 Target architecture and five styles in RTL model

2.1 Target architecture

Our architecture is shown in figure 1. Since RTL synthesis focuses on the data-path, we only show the data-path part of a general design, which is controlled by the control signals from the control unit. It is composed of register files, memories, busses, functional units and multiplexers. Register files and memories get data from the inputs and the internal busses, and send data to other internal busses. The data on the busses is then the input of the functional units. After the functional units finish their tasks, they send data to busses that feed the inputs of register files and memories, or the data-path outputs.

Here, instead of isolated registers, our design is based on register files and memories as storage units, because register files and memories are more structured, modular and dense, and require less chip area thanks to their regular layout structure. The registers above and below the functional units, as well as the latches between the functional units, are used for pipelining.

2.2 Five styles in RTL model

We use a Finite State Machine with Data (FSMD) to describe the RTL model. An FSMD is an FSM with assignment statements added to each state. The RTL model has two views: a behavioral RTL view and a structural RTL view. The behavioral RTL view specifies the operations performed in each clock cycle without explicitly modeling the units in the component's data-path, and is obtained by scheduling the operations in the C code into clock cycles. The structural RTL view explicitly models the scheduling of register transfers into clock cycles and the allocation and binding of operations, variables and interconnections to functional units, register files/memories and internal busses respectively. Going from the behavioral RTL view to the structural RTL view involves the three binding tasks of high-level synthesis: storage binding, functional unit binding and interconnection binding.
In [GAJS00], the RTL model is divided into 5 well-defined styles that represent the refinement steps (scheduling, storage binding, functional unit binding and interconnection binding) from the behavioral RTL view (style 1) to the structural RTL view (style 5).

Style 1: Behavioral RTL (unmapped RTL). Behavioral RTL only specifies the change of values for some variables in each state. States, transitions and assignment statements are in no way related to any implementation. The scheduling task focuses on this style.

Style 2: Storage-mapped RTL. The variables in style 1 can be of two types. One type is variables whose value is used in the same state in which that value is assigned. These variables will be implemented as wires or busses in the final implementation. The other type is variables whose values are assigned in one state and used in another state. The states between the value assignment and its last usage define the lifetime of each variable. These variables must be mapped to storage units such as registers, register files and memories in the final implementation. Thus style 2 represents an RTL description in which variables of the second type with non-overlapping lifetimes are grouped and assigned to storage units. The storage binding task implements the transfer from style 1 to style 2.

Style 3: Function-mapped RTL. In style 3, the operators and/or functions with non-overlapping lifetimes are grouped into functional units, and a control encoding is assigned to each operation in the functional unit. The functional unit binding task implements the transfer from style 2 to style 3.

Style 4: Connection-mapped RTL. Similarly to style 2, the variables with non-overlapping lifetimes that represent wires as well as inputs and outputs to storage elements and functional units are grouped and assigned to busses. Syntactically, there is no difference between wires and busses. The interconnection binding task implements the transfer from style 3 to style 4.

Style 5: Exposed-control (structural) RTL. In style 5, the FSMD implementation is described in two parts: a netlist of data-path components and a control unit that controls the data-path components using control signals in each state.

3 Project goal

Following the 5 styles in RTL, we divide high-level (RTL) synthesis into several separate tasks: scheduling, storage binding, functional unit binding and interconnection binding (as shown in figure 2). Storage binding, functional unit binding and interconnection binding can be performed in any order, which makes the whole synthesis flow more flexible.

From RTL style 1 to RTL style 2, variables with non-overlapping lifetimes need to be grouped and assigned to storage units, such as registers, register files and memories. Since the storage units usually occupy a substantial silicon area in a microchip, we generally try to reduce the number of storage units by merging several variables into one storage unit, which leads to a smaller area.

[Figure 2: Synthesis tasks in RTL synthesis. Blocks: RTL description, C++/HDL compiler, FSMD/CDFG, scheduling, storage binding, function binding, connection binding, RTL code generator, library.]

Traditional storage binding uses isolated registers, which is not efficient. Here we use multi-port register files and memories as storage units. Generally, the goal of storage binding with register files is to design procedures that enable fast, automatic binding of variables to storage units. After storage binding, variables with non-overlapping lifetimes are assigned to the minimum number of register file modules and the minimum number of registers in each register file, at the minimum cost of interconnection. [AHCH92] uses 0-1 integer linear programming (ILP) to group variables into multi-port memories, which is very difficult and time-consuming when the design is large. Here we introduce two other approaches, which use clique-partitioning and grouping to make the whole task much easier while producing similar results.

In our implementation, the number of register files and memories is fixed before the storage binding task; this is so-called resource-constrained storage binding. The quality metric is therefore not the minimum number of register files. We use the following metrics to compare the results:

1. The number of registers in each register file and the total number of registers in all register files;
2. The port usage in every register file.

After the storage binding, we can give users feedback on the binding results. Users can then add or remove register files, or change the size of the register files, to achieve a higher usage of resources.

In our implementation, we make the following assumptions:

1. All variables have the same type;
2. All given register files are the same;
3. The number of given register files is sufficient for storage binding, so we do not try to minimize the number of register files.

4 Implementation of storage binding

Instead of ILP, we have two approaches for storage binding using register files. Both approaches are based on clique-partitioning and a grouping method. They are similar, except that they order the steps of the procedure differently.

4.1 Data structure of our implementation

We use the CDFG as our basic data structure, which is also used in other parts of high-level synthesis. Figure 3 gives the class data structure of our CDFG. The details of the CDFG data structure can be found in [DOGA01].

[Figure 3: CDFG data structure]

4.2 Approach one - Grouping after clique-partitioning

The procedure of storage binding approach one is described in figure 4. The algorithm related to register files is shown in algorithm 1.

[Figure 4: The procedure of storage binding approach 1. Input (the scheduled states; bound or unbound function nodes) -> get storage info from HLS_FSMD -> find arrays and map them to MEM -> clique-partitioning on the remaining variables -> group cliques into the given number of register files -> check whether ports and registers in all register files are available (if not, adjust the cliques in the register files and check again) -> output the bound storage nodes.]

Algorithm 1: Approach 1 of Storage Binding Using Multiple Port Register Files

    for all v ∈ L do                 // Get lifetimes of variables
        Start(v); End(v);
    endfor
    C = CliquePartitioning(L);       // Clique-partitioning first for all variables
    for all r ∈ C do                 // Get lifetimes of cliques (registers)
        Start(r); End(r);
    endfor
    SORT(C);                         // sort the cliques (registers) in C in ascending order with their
                                     // start times, Start(r), as the primary key and end times,
                                     // End(r), as the secondary key
    n = NumofRF(RF);
    Set NumofRinRF(i) = number of registers in each RF_i;    // i = 1,2..n
    Set Availableport(i) = number of ports in each RF_i;     // i = 1,2..n
    C_1 = C_2 = ... = C_n = Φ; reg_index(i) = 0;
    while C ≠ Φ do                   // Divide C into C_1, C_2 ... C_n, n = number of register files
        for i = 1; i++; i <= n do
            temp_reg = first r in C;
            ADD(C_i, temp_reg);
            reg_index(i)++;
            C = DELETE(C, temp_reg);
            if C = Φ then break; endif
        endfor
    endwhile
    // Check register number and port number
    while any NumofRinRF(i) < reg_index(i) or Availableport(i) < port(i) do
        if NumofRinRF(i) < reg_index(i) or Availableport(i) < port(i) then
            MOVE(C_i, C_j, oneRegin C_i);    // C_j has the least number of cliques (registers)
        endif
    endwhile

There are five major steps in this approach:

1. Get the variable information that will be used in the following steps;
2. Bind arrays to memory;
3. Clique-partition the variables;
4. Group the cliques into the register files;
5. Adjustment.

4.2.1 Get variable information

First, we get all variable information from the class HLS_FSMD. The procedure for getting the variable information is explained in figure 5.

[Figure 5: Get variable information. For every state, get the subgraph of the current state; for every node in the subgraph, skip control nodes and add the node information of storage nodes into VAR_LIST; continue with the next node and the next state until the last state is reached.]

4.2.2 Bind arrays to memories

Since the structure of memories is very suitable for arrays, we find the arrays among all variables and assign them to the given memories.

4.2.3 Clique-partitioning

The following steps apply to the remaining variables, which will be assigned to register files. Let L be the set of all variables that need to be bound:

L = {v1, v2, ..., vm}, where m is the number of variables.

The procedure of the clique-partitioning algorithm [GAJS97] is described in figure 6. Here we have to define the lifetime of a variable. It is defined as the set of states in which that variable is alive. The alive states include the state following the state in which it is assigned a new value (write state), every state in which it is used on the right-hand side of an assignment statement (read state), and all states on each path between the write state and a read state.
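The lifetime definition above can be illustrated with a minimal Python sketch. It assumes a straight-line schedule in which states are numbered consecutively along a single path, so "all states on each path" reduces to a range of state numbers. The function names `lifetime` and `overlaps` are illustrative only and are not part of the report's implementation.

```python
def lifetime(write_state, read_states):
    """Set of states in which a variable is alive.

    Assumption: a straight-line schedule with consecutively numbered states,
    so the alive states run from the state after the write state through the
    last read state.
    """
    if not read_states:
        return set()  # never read: nothing has to be stored
    last_read = max(read_states)
    return set(range(write_state + 1, last_read + 1))


def overlaps(lifetime_a, lifetime_b):
    """True if two lifetimes share at least one state."""
    return bool(lifetime_a & lifetime_b)


# A variable written in state 2 and read in states 3, 4 and 6.
x = lifetime(2, [3, 4, 6])
# A variable written in state 3 and read in state 5.
t = lifetime(3, [5])
# A variable written in state 6 and read in state 7.
u = lifetime(6, [7])

print(sorted(x), sorted(t), sorted(u))
print(overlaps(x, t), overlaps(x, u))
```

Under this sketch, a variable that is read only in the state in which it is written gets an empty lifetime, which matches the style 2 distinction between variables implemented as wires and variables that need storage.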

Then the compatibility graph is generated. The compatibility graph consists of nodes and edges, in which each node represents a variable and each edge between two nodes represents compatibility (priority edge) or incompatibility (incompatibility edge) in merging the variables represented by these two nodes. An incompatibility edge indicates variables with overlapping lifetimes, while a priority edge indicates variables with non-overlapping lifetimes. Each priority edge has a weight w that can be represented as:

w = s + d    (1)

s: the number of different functional units that use both nodes as left or right operands;
d: the number of different functional units that generate results for both nodes.

[Figure 6: Clique-partitioning algorithm. Start -> create compatibility graph -> merge highest priority nodes -> update compatibility graph -> all nodes incompatible? (no: merge again; yes: stop).]

We should consider two conditions when we calculate w: before functional unit binding and after functional unit binding. Before functional unit binding, we can treat operations as different functional units if they have different symbolic descriptions, for example + and /. After functional unit binding, the same symbolic operation may be mapped to different functional units, so those operations can also be treated as different functional units.

Using the clique-partitioning algorithm, we obtain the minimum number of required registers. After graph partitioning, the variables are grouped into different cliques, which form the set C. Every clique has its own lifetime, which is the union of the lifetimes of the variables in the clique. The lifetime of a clique need not consist of contiguous states. For example, if a clique has two variables whose lifetimes are states 1~3 and states 5~7 respectively, then the lifetime of this clique is states 1~3 and 5~7. The lifetimes of the cliques are used in the following step.

4.2.4 Group cliques to register files

A register file includes several registers and multiple in/out ports. Since the number of register files is user-given and fixed, we need a method that assigns the cliques to the register files such that no register file has register or port conflicts. Suppose the number of given register files is n. We use the sorted clique lifetimes to distribute the cliques, in the following steps:

1. We sort the cliques (registers) in C in ascending order with their start times, Start(r), as the primary key and end times, End(r), as the secondary key;
2. We distribute the cliques modulo n, splitting C into small groups C_1, C_2, ..., C_n. The first clique in C is assigned to C_1, the second one goes to C_2, and so on; the (n+1)-th clique goes to C_1 again. This process is repeated until all cliques in C are used. C_1, C_2, ..., C_n correspond to register file 1, register file 2, ..., register file n respectively.

4.2.5 Adjustment

We should consider not only the number of registers in each register file, but also the ports of each register file. The number of cliques in a C_i may be larger than the number of registers in the register file. The ports of the register files could also have conflicts. When we assign different cliques to the registers of a register file, these cliques can have overlapping lifetimes. This works fine if, in every state, the number of cliques whose lifetimes include this state does not exceed the number of ports of the target register file.

For example, consider three cliques whose lifetimes are shown in table 1.

Table 1: Simple example of 3 cliques. The columns S1 to S5 mark the states in which each clique is alive: Clique 1 is alive in three states, Clique 2 in two states and Clique 3 in three states; all three are alive in S3.

In state 3, all 3 cliques have overlapping lifetimes. So given a register file with two ports, only two of these three cliques can be assigned to this register file. Besides the lifetimes, we should also consider the in ports of the register file, which correspond to the write states of the variables.

So in this step, we make some adjustments to the cliques in each register file:

1. Find the C_i that has register or port conflicts;
2. Find the C_j that has the least number of registers and the least port usage;
3. Move the clique that causes the conflict from C_i to C_j and check whether there is any conflict in C_i or C_j; if so, go to step 1 again.

4.2.6 Example

Here we use an example to explain the approach. Figure 7 gives the CDFG of the example.

[Figure 7: CDFG of the example (states S1 to S4)]

The lifetimes of the variables are given in figure 8. Here L = {v1, v2, ..., v13}.

[Figure 8: Lifetimes of variables (v1 to v13 over states S1 to S4)]

Then we do the clique-partitioning and get the results shown in figure 9. We group all variables into 5 cliques. We let C = {r1, r2, r3, r4, r5}.
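The sorting and modulo-n distribution of Section 4.2.4 can be sketched in a few lines of Python. The (start, end) values below are hypothetical, since the lifetimes of r1 to r5 are only given in a figure; the helper names `sort_by_lifetime` and `distribute` are likewise assumptions made for illustration.

```python
def sort_by_lifetime(lifetimes):
    """Sort clique names by start time (primary key) and end time (secondary key).

    lifetimes maps a clique name to a (start, end) pair of state numbers.
    """
    return sorted(lifetimes, key=lambda name: lifetimes[name])


def distribute(sorted_items, n):
    """Distribute items round-robin (modulo n) into n groups C_1 .. C_n."""
    groups = [[] for _ in range(n)]
    for k, item in enumerate(sorted_items):
        groups[k % n].append(item)
    return groups


# Hypothetical lifetimes for five cliques, listed in arbitrary order.
cliques = {
    "r4": (2, 4),
    "r1": (1, 2),
    "r5": (3, 4),
    "r3": (1, 4),
    "r2": (1, 3),
}

order = sort_by_lifetime(cliques)
groups = distribute(order, 3)  # three register files
print(order)
print(groups)
```

With five sorted cliques and n = 3, the first and fourth clique land in the first group, the second and fifth in the second group, and the third in the last group. Spreading the cliques by start time in this way tends to separate cliques that are alive in the same states, which is what the adjustment step relies on.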

[Figure 9: Clique-partitioning result (cliques r1 to r5 over states S1 to S4)]

Three register files, each with one inport, two outports and four registers, are given as storage units. So we divide C into C_1, C_2 and C_3 using the method in 4.2.4. We get

C_1 = {r1, r4}, C_2 = {r2, r5}, C_3 = {r3}

Since there are no conflicts in any register file, the final binding result is as shown in figure 10.

[Figure 10: Binding results using approach 1 (register files RF1, RF2, RF3)]

4.3 Approach two - Clique-partitioning after grouping

Besides approach one, we have another approach for the storage binding. The procedure is shown in figure 11. The algorithm related to register files is shown in algorithm 2.

[Figure 11: The procedure of storage binding approach 2. Input (the scheduled states; bound or unbound function nodes) -> get storage info from HLS_FSMD -> find arrays and map them to MEM -> group variables into small groups, as many as there are given register files -> clique-partition in each small group -> check whether ports and registers in all register files are available (if not, adjust the variables in the small groups and check again) -> output the bound storage nodes.]

There are also five major steps in this approach:

1. Get the variable information that will be used in the following steps;
2. Bind arrays to memory;
3. Split all variables into small groups. The number of small groups equals the number of given register files;
4. Clique-partition the variables in each group;
5. Adjustment.

The first two steps are the same as in approach one.
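Both approaches rely on the clique-partitioning procedure of Section 4.2.3 (create the compatibility graph, merge the highest-priority nodes, update the graph, stop when all nodes are incompatible). The following Python sketch is a simplified greedy variant written for illustration, not the exact algorithm of [GAJS97]: it assumes that lifetimes are given as sets of states, that the priority of merging two nodes is the sum of the pairwise weights w of their member variables, and that ties are broken by node order. The function name `clique_partition` and the example data are hypothetical.

```python
from itertools import combinations


def clique_partition(lifetimes, weight=lambda a, b: 0):
    """Greedy clique-partitioning sketch.

    lifetimes: dict mapping a variable name to the set of states in which
               it is alive.
    weight:    priority weight w(a, b) of two variables (assumed symmetric).
    Returns a sorted list of cliques, each a sorted list of variable names.
    """
    # Every node starts as one variable: (members, alive states).
    nodes = [([v], set(states)) for v, states in lifetimes.items()]
    while True:
        best = None
        for i, j in combinations(range(len(nodes)), 2):
            members_i, states_i = nodes[i]
            members_j, states_j = nodes[j]
            if states_i & states_j:
                continue  # incompatibility edge: lifetimes overlap
            # Assumed priority of a merge: sum of member-pair weights.
            w = sum(weight(a, b) for a in members_i for b in members_j)
            if best is None or w > best[0]:
                best = (w, i, j)
        if best is None:
            break  # all remaining nodes are mutually incompatible
        _, i, j = best
        merged = (nodes[i][0] + nodes[j][0], nodes[i][1] | nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in nodes)


# Hypothetical lifetimes of four variables.
example = {"a": {1}, "b": {1}, "c": {2, 3}, "d": {3}}

# Without weights, ties are broken by node order.
print(clique_partition(example))

# A weight that favors sharing a register between a and d.
favor_ad = lambda p, q: 1 if {p, q} == {"a", "d"} else 0
print(clique_partition(example, favor_ad))
```

In both runs the four variables fit into two cliques (registers); the weight only changes which compatible variables end up sharing a register, which is how the priority edges steer the result toward cheaper interconnection.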

4.3.1 Split all variables into small groups

Instead of doing the clique-partitioning first, as in approach one, we split the variables first. We let L be the set of all variables that need to be bound:

L = {v1, v2, ..., vm}, where m is the number of variables.

Suppose the number of given register files is n. Then we split L into L_1, L_2, ..., L_n. We again use the sorted variable lifetimes to distribute the variables:

1. We sort the variables in L in ascending order with their start times, Start(v), as the primary key and end times, End(v), as the secondary key;
2. We distribute the variables modulo n, splitting L into small groups L_1, L_2, ..., L_n. The first variable in L is assigned to L_1, the second one goes to L_2, and so on; the (n+1)-th variable goes to L_1 again. This process is repeated until all variables in L are used. L_1, L_2, ..., L_n correspond to register file 1, register file 2, ..., register file n respectively.

Using this method, we spread the variables by their lifetimes, which is helpful in the following steps.

4.3.2 Clique-Partitioning in small groups

Then, in each small group, we run the clique-partitioning algorithm described in 4.2.3.

Algorithm 2: Approach 2 of Storage Binding Using Multiple Port Register Files

    for all v ∈ L do                 // Get lifetimes of variables
        Start(v); End(v);
    endfor
    SORT(L);                         // sort the variables in L in ascending order with their start
                                     // times, Start(v), as the primary key and end times, End(v),
                                     // as the secondary key
    n = NumofRF(RF);
    Set NumofRinRF(i) = number of registers in each RF_i;    // i = 1,2..n
    L_1 = L_2 = ... = L_n = Φ;
    while L ≠ Φ do                   // Divide L into L_1, L_2 ... L_n, n = number of register files
        for i = 1; i++; i <= n do
            temp_var = first v in L;
            ADD(L_i, temp_var);
            L = DELETE(L, temp_var);
            if L = Φ then break; endif
        endfor
    endwhile
    for i = 1; i++; i <= n do        // Clique-partitioning in every L_i
        Clique(i) = CliquePartitioning(L_i);    // Get number of cliques in each L_i
    endfor
    while any NumofRinRF(i) < Clique(i) do
        if NumofRinRF(i) < Clique(i) then
            MOVE(L_i, L_j, oneVarin L_i);       // L_j has the least number of cliques (registers)
            Clique(i) = CliquePartitioning(L_i);
            Clique(j) = CliquePartitioning(L_j);
        endif
    endwhile

4.3.3 Adjustment

This approach also needs some adjustments, similar to 4.2.5, to deal with register and port conflicts. The differences are that here we move a variable rather than a clique, and that we need to do the clique-partitioning again after moving a variable.

4.3.4 Example

We use the same example as in approach one, given three register files with one inport, two outports and four registers each.

First we get the lifetimes of all variables in L and sort them. The result is shown in figure 12.

[Figure 12: Result of sorted lifetime variables (states S1 to S4)]

Then, using the method of 4.3.1, we get:

L_1 = {v1, v4, v7, v10, v13}
L_2 = {v2, v5, v8, v11}
L_3 = {v3, v6, v9, v12}

The results of the clique-partitioning in each L_i are shown in figure 13.

[Figure 13: Results of clique-partitioning in L_i (registers r11, r12, r21, r22, r31, r32 in RF1, RF2, RF3)]

Last, we get the binding result in figure 14.

5 Experiment

To test our implementation, we use the square-root approximation (SRA) as our example:

$\sqrt{a^2 + b^2} \approx \max((0.875x + 0.5y),\ x)$    (2)

where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$.

The procedure of SRA is described in figure 15, which has 8 states.
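As a quick check of equation (2), the following Python sketch evaluates the approximation in floating point and in an integer shift-and-add form. The shift version assumes that 0.875x is computed as x - (x >> 3) and 0.5y as y >> 1, and that absolute values of a and b are taken first; the function names `sra` and `sra_shift` are illustrative only.

```python
def sra(a, b):
    """Square-root approximation of equation (2) in floating point."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    return max(0.875 * x + 0.5 * y, x)


def sra_shift(a, b):
    """Integer version using shifts (assumed: x >> 3 for 0.125x, y >> 1 for 0.5y)."""
    t1, t2 = abs(a), abs(b)
    x, y = max(t1, t2), min(t1, t2)
    t3 = x >> 3        # 0.125x
    t4 = y >> 1        # 0.5y
    t5 = x - t3        # 0.875x
    t6 = t4 + t5       # 0.875x + 0.5y
    return max(t6, x)  # never report less than the larger operand


print(sra(3, 4))          # exact value of sqrt(3^2 + 4^2) is 5
print(sra_shift(24, 32))  # exact value of sqrt(24^2 + 32^2) is 40
print(sra(0, 8))          # the outer max keeps the result at x when y is small
```

The intermediate names t1 to t6 in `sra_shift` are chosen so that each assignment corresponds to one register transfer, which is the granularity at which storage binding looks at the design.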

[Figure 14: Binding results using approach 2, for the example of Section 4.3.4 (register files RF1, RF2, RF3)]

[Figure 15: Square-root approximation. s0: a = In1, b = In2 (on Start); s1: t1 = |a|, t2 = |b|; s2: x = max(t1, t2), y = min(t1, t2); s3: t3 = x >> 3, t4 = y >> 1 (t3 = 0.125x, t4 = 0.5y); s4: t5 = x - t3 (t5 = 0.875x); s5: t6 = t4 + t5; s6: t7 = max(t6, x); s7: Done = 1, Out = t7.]

Table 2 and table 3 give the variables' lifetimes and priority weights respectively.

Table 2: Variables Lifetime Table

      S0  S1  S2  S3  S4  S5  S6  S7
a         X
b         X
t1            X
t2            X
x                 X   X   X   X
y                 X
t4                    X   X
t3                    X
t5                        X
t6                            X
t7                                X

Table 3: Priority weight Table (rows and columns: a, b, t1, t2, x, y, t4, t3, t5, t6, t7)

For approach one, figure 16 gives the clique-partitioning of the SRA. After clique-partitioning, we get three cliques:

Clique 1 (t4), lifetime: s4~s5
Clique 2 (b, t1, x, t7), lifetime: s1~s7
Clique 3 (a, t2, t3, t5, t6, y), lifetime: s1~s6

Then we sort the lifetimes of these cliques and get the sequence Clique 2, Clique 3, Clique 1. The given resources are two register files with two out ports and one in port each. So we group Clique 2 and Clique 1 into register file 1, and Clique 3 into register file 2. After checking for register and port conflicts, we find that this assignment is feasible.

From another point of view, the lifetimes of all three cliques include states s4 and s5, that is, s4 and s5 are covered 3 times. So these three cliques cannot all be assigned to the same register file. Also, clique 1 and clique 3, as well as clique 2 and clique 3, cannot be assigned to the same register file, since their write states overlap.
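The overlap argument above can be reproduced with a short Python sketch that counts, for every state, how many cliques of a group are alive and compares the count with the number of out ports. The clique lifetimes are the ones listed in the text; the names `alive_per_state` and `conflict_states` are assumptions made for illustration, and write-state (in port) conflicts are not modeled here.

```python
from collections import Counter

# Clique lifetimes as listed in the text (state numbers s1..s7).
clique_lifetimes = {
    "clique1": range(4, 6),  # s4~s5
    "clique2": range(1, 8),  # s1~s7
    "clique3": range(1, 7),  # s1~s6
}


def alive_per_state(group, lifetimes):
    """Number of cliques of the group that are alive in each state."""
    counts = Counter()
    for clique in group:
        counts.update(lifetimes[clique])
    return counts


def conflict_states(group, lifetimes, ports):
    """States in which more cliques are alive than there are ports."""
    counts = alive_per_state(group, lifetimes)
    return sorted(state for state, k in counts.items() if k > ports)


# All three cliques in one register file with two out ports.
print(conflict_states(["clique1", "clique2", "clique3"], clique_lifetimes, 2))

# The grouping chosen in the text: cliques 2 and 1 in RF1, clique 3 in RF2.
print(conflict_states(["clique2", "clique1"], clique_lifetimes, 2))
print(conflict_states(["clique3"], clique_lifetimes, 2))
```

Putting all three cliques into one register file exceeds two ports exactly in s4 and s5, while the chosen grouping has no state with more than two cliques alive in the same register file.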

[Figure 16: Process of clique-partitioning]

After considering these issues, the final binding result is as shown in figure 17.

[Figure 17: Experiment results (Approach 1). Registers: (t4), (b, t1, x, t7), (a, t2, t3, t5, t6, y)]

If we use approach 2, we get the results shown in figure 18.

[Figure 18: Experiment results (Approach 2)]

Table 4 and table 5 give the comparison of the two approaches.

Table 4: Comparison of different approaches (1)

            Total number    Register usage    Register usage
            of registers    in RF1            in RF2
Approach 1  3               50%               25%
Approach 2  3               50%               25%

Table 5: Comparison of different approaches (2)

            Inport usage    Outport usage    Inport usage    Outport usage
            in RF1          in RF1           in RF2          in RF2
Approach 1  62.5%           43.75%           75%             37.5%
Approach 2  75%             43.75%           62.5%           37.5%

6 Summary

In this report, we use register files for storage binding in high-level (RTL) synthesis. In the implementation, the clique-partitioning algorithm and a grouping method are used to obtain the minimum number of registers and to assign these registers to different register files. The two approaches give similar results: in the SRA experiment, both use 3 registers in total, with the same register usage and comparable port usage in each register file.

7 Future works

Here we also give some directions for future work. First, we can use different types of variables and register files in storage binding, which is the general case in real designs. Second, we can consider the interconnection cost during storage binding. Third, we should decide which ports of a register file are used for which register in each state (port binding). This work should be combined with the scheduling and interconnection binding work.

References

[GAJS00] D. Gajski: RTL Design and Methodology, University of California, Irvine, Technical Report ICS-00-35, November 2000.
[GWDL92] D. Gajski et al.: High-Level Synthesis: Introduction to Chip and System Design, Kluwer Academic Publishers, 1992.
[AHCH92] I. Ahmad, C. Y. R. Chen: Grouping Variables into Multiport Memories for ASIC Data Path Synthesis, ASIC Conference and Exhibit, 1992.
[DOGA01] Dongwan Shin et al.: CDFG Representation for SpecC RTL, University of California, Irvine, Technical Report ICS-01-50, June 2001.
[GAJS97] D. Gajski: Principles of Digital Design, Prentice-Hall, Inc., 1997.