Folding
Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C.
Fall 2010
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.tw/~ldvan/

Outline
- Introduction
- Folding Transformation
- Register Minimization Techniques
- Register Minimization in Folded Architecture
- Conclusions

Introduction (1/2)
The folding transformation systematically determines the control circuits of DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.
- It is used to synthesize DSP architectures that operate with a single clock or with multiple clocks.
- It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
- It can lead to an architecture that uses a large number of registers; register minimization techniques are therefore presented as well.

Introduction (2/2)
[Figure]


Folding Transformation (1/3)
Folding is a systematic technique for designing the control circuits of hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notation:
- $U$, $V$: nodes (operations) of the original DFG.
- $H_U$, $H_V$: nodes (functional units) of the folded DFG that execute $U$ and $V$.
- $W(x)$: the $x$-th iteration of node $W$.
- $U \xrightarrow{e} V$: an edge $e$ from node $U$ to node $V$.
- $w(e)$: the number of delays on edge $e$.
- Folding factor $N$: the number of operations that share one FU.
- Folding set: an ordered set of operations executed by the same FU. The position of operation $U$ in its folding set is the folding order $u$ of $U$ ($0 \le u \le N-1$).
Folding sets are typically obtained from a scheduling and allocation algorithm (see Appendix B), and they define the underlying folding transformation.
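A minimal Python sketch of how folding sets fix the schedule: the operation in position $u$ of a folding set executes its $l$-th iteration at time $Nl + u$ on the shared FU. The function name `folding_schedule` and the use of `None` for an idle slot are hypothetical conventions of this sketch; the folding sets are the biquad sets $S_1$ and $S_2$ from the example in this chapter.

```python
def folding_schedule(folding_sets, N, l=0):
    """Return op -> (FU, folding order u, execution time N*l + u) for iteration l.

    folding_sets maps a functional-unit name to an ordered list of N slots;
    a slot may be None if the FU is idle at that position (our convention).
    """
    schedule = {}
    for fu, slots in folding_sets.items():
        if len(slots) != N:
            raise ValueError(f"folding set {fu} must have exactly N = {N} slots")
        for u, op in enumerate(slots):
            if op is not None:
                schedule[op] = (fu, u, N * l + u)
    return schedule


# Biquad folding sets: S1 holds the additions, S2 the multiplications, N = 4
sets = {"S1": [4, 2, 3, 1], "S2": [5, 8, 6, 7]}
for op, (fu, u, t) in sorted(folding_schedule(sets, N=4, l=2).items()):
    print(f"operation {op}: FU {fu}, u = {u}, executes at t = {t}")
```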

Folding Transformation (2/3)
- $P_U$: the number of pipeline stages of $H_U$; $P_U = 0$ means that $H_U$ is not pipelined.
- Folding equation: $D_F(U \xrightarrow{e} V)$ is the number of cycles for which the result of $H_U$ must be stored before $H_V$ uses it, i.e., the number of delays on the folded edge $H_U \to H_V$:
$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + u + P_U] = N w(e) - P_U + v - u$$
- $D_F$ can be negative for the original DFG; in that case the DFG must be retimed before folding.
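The folding equation can be checked numerically. This hypothetical helper pair (the names are ours) computes $D_F$ both from the closed form and as the gap between the time the result of $H_U$ is ready and the time $H_V$ needs it, confirming that the iteration index $l$ cancels. The parameters are those of edge 1 → 2 in the biquad example ($N = 4$, $P_1 = 1$, $u = 3$, $v = 1$).

```python
def folding_delay(N, w, P_U, u, v):
    """Folding equation: D_F(U -e-> V) = N*w(e) - P_U + v - u."""
    return N * w - P_U + v - u


def folding_delay_from_times(N, w, P_U, u, v, l):
    """Same quantity, computed from the folded schedule of iteration l."""
    ready = N * l + u + P_U      # H_U starts U(l) at N*l + u; result ready P_U cycles later
    needed = N * (l + w) + v     # H_V starts V(l + w(e)) at N*(l + w(e)) + v
    return needed - ready        # number of cycles the result must be stored


print(folding_delay(4, 0, 1, 3, 1))   # -3: w(e) = 0 before retiming, infeasible
print(folding_delay(4, 1, 1, 3, 1))   # 1: w(e) = 1 after retiming
# The closed form agrees with the schedule-based value for every iteration l
print(all(folding_delay_from_times(4, 1, 1, 3, 1, l) == 1 for l in range(10)))
```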

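Folding Transformation (3/3)
[Figure: the edge $U(l) \xrightarrow{w(e)} V(l + w(e))$ of the original DFG and its $N$-folded counterpart $H_U \to H_V$. $H_U$ executes $U(l)$ at time $Nl + u$; the result is ready $P_U$ cycles later, is held for $D_F(U \xrightarrow{e} V)$ cycles, and is used by $H_V$ executing $V(l + w(e))$ at time $N(l + w(e)) + v$.]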

Folding Retimed Biquad Filter (1/2)
- Folding factor $N = 4$.
- Folding sets $S_1 = \{4, 2, 3, 1\}$ and $S_2 = \{5, 8, 6, 7\}$, where $S_1$ contains all addition operations and $S_2$ contains all multiplication operations.
- Addition and multiplication require 1 and 2 u.t., respectively; 1-stage pipelined adders and 2-stage pipelined multipliers are available, so $P_U = 1$ for additions and $P_U = 2$ for multiplications.

Folding Retimed Biquad Filter (2/2)
[Figure: folding equations]

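Retiming (1/3)
What happens if a folding equation $D_F$ is negative? A negative $D_F$ means that $H_V$ would need the result of $H_U$ before it is produced, so the folding is infeasible. The remedy is to retime (move delay elements in) the original DFG prior to folding.
Constraint: every folding equation of the retimed DFG must be nonnegative,
$$D'_F(U \xrightarrow{e} V) = N w_r(e) - P_U + v - u \ge 0. \qquad (1)$$
Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives
$$r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}.$$
Since the retiming values of the nodes must be integers, this is rewritten as
$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor.$$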
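These inequalities form a system of difference constraints, which can be solved as a single-source shortest-path problem. The sketch below assumes Bellman-Ford (a standard choice, not necessarily the method used to produce these slides) and a hypothetical function name `retime_for_folding`. The unretimed biquad edge weights are back-computed from the retimed weights and retiming values in this example via $w(e) = w_r(e) - r(V) + r(U)$.

```python
def retime_for_folding(edges, N, P, order):
    """Solve r(U) - r(V) <= floor(D_F(U -> V) / N) for all edges with Bellman-Ford.

    edges: (U, V, w) triples of the ORIGINAL DFG.
    P:     node -> pipeline stages P_U of the FU that executes it.
    order: node -> folding order.
    Returns node -> integer retiming value, or None if no solution exists.
    """
    nodes = {n for U, V, _ in edges for n in (U, V)}
    # Each difference constraint r(U) - r(V) <= c becomes a graph edge V -> U of weight c.
    constraint_edges = []
    for U, V, w in edges:
        d_f = N * w - P[U] + order[V] - order[U]
        constraint_edges.append((V, U, d_f // N))  # // floors toward -infinity
    # Starting every distance at 0 is equivalent to a virtual source with 0-weight edges.
    r = {n: 0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in constraint_edges:
            if r[a] + c < r[b]:
                r[b] = r[a] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after |V| passes: negative cycle, constraints infeasible


# Unretimed biquad DFG (U, V, w(e)); N = 4, adders P = 1, multipliers P = 2
edges = [(1, 2, 0), (1, 5, 1), (1, 6, 1), (1, 7, 2), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 0), (7, 3, 0), (8, 4, 0)]
P = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}
print(dict(sorted(retime_for_folding(edges, 4, P, order).items())))
```

For these inputs the shortest-path solution is r(1) = -1, r(2) = 0, r(3) = -1, r(4) = 0, r(5) = -1, r(6) = -1, r(7) = -2, r(8) = -1, the same values as in Retiming (3/3). In general any feasible solution of the constraints is a valid retiming; Bellman-Ford simply returns one particular solution.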

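Retiming (2/3)
Example (biquad filter, $N = 4$):
$$D_F(1 \to 2) = N w(e) - P_U + v - u = 0 - 1 + 1 - 3 = -3$$
$$r(1) - r(2) \le \left\lfloor \frac{D_F(1 \to 2)}{N} \right\rfloor = \left\lfloor \frac{-3}{4} \right\rfloor = -1$$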
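The floor in this bound matters when $D_F$ is negative. In Python both `math.floor` and the `//` operator round toward negative infinity, whereas truncating toward zero (as `int()` does here, and as integer division does in C, C++ and Java) would give 0, a constraint that is too loose to guarantee a nonnegative folding delay.

```python
import math

d_f, N = -3, 4                 # D_F(1 -> 2) before retiming, folding factor N = 4
print(math.floor(d_f / N))     # -1: the bound used above
print(d_f // N)                # -1: Python's // also rounds toward -infinity
print(int(d_f / N))            # 0: truncation toward zero gives the wrong bound
```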

Retiming (3/3)
$r(1) = -1$, $r(2) = 0$, $r(3) = -1$, $r(4) = 0$, $r(5) = -1$, $r(6) = -1$, $r(7) = -2$, $r(8) = -1$
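Applying these retiming values is one update per edge, $w_r(e) = w(e) + r(V) - r(U)$. The hypothetical helper `apply_retiming` (our name) below makes this explicit and rejects any edge that would end up with a negative number of delays; the unretimed biquad edge weights are back-computed from the retimed weights used in this example's folding equations.

```python
def apply_retiming(edges, r):
    """Return retimed edges with w_r(e) = w(e) + r(V) - r(U); each must stay >= 0."""
    retimed = []
    for U, V, w in edges:
        w_r = w + r[V] - r[U]
        if w_r < 0:
            raise ValueError(f"illegal retiming: edge {U}->{V} would get {w_r} delays")
        retimed.append((U, V, w_r))
    return retimed


# Retiming values listed above and the unretimed biquad edges (U, V, w(e))
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}
edges = [(1, 2, 0), (1, 5, 1), (1, 6, 1), (1, 7, 2), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 0), (7, 3, 0), (8, 4, 0)]
for U, V, w_r in apply_retiming(edges, r):
    print(f"{U} -> {V}: w_r = {w_r}")
```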


Lifetime Analysis
Lifetime analysis computes the minimum number of registers required to implement a DSP algorithm in hardware. Two variants are considered:
- Linear lifetime analysis
- Circular lifetime analysis
The number of live variables is computed at each time unit, and the maximum number of live variables over all time units gives the minimum number of registers. The registers are then assigned with the forward-backward register allocation technique.
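A minimal linear lifetime analysis in Python. It assumes (our convention) that a variable occupies a register during the half-open interval $T_{\text{input}} \le t < T_{\text{output}}$; the lifetimes below are made up for illustration and do not come from the slides.

```python
def live_counts(lifetimes):
    """Linear lifetime analysis: number of live variables at every time unit.

    lifetimes maps a variable to (T_input, T_output); it is counted as live
    for T_input <= t < T_output, i.e. while it waits in a register.
    """
    horizon = max(t_out for _, t_out in lifetimes.values())
    counts = [0] * horizon
    for t_in, t_out in lifetimes.values():
        for t in range(t_in, t_out):
            counts[t] += 1
    return counts


# Hypothetical lifetimes, for illustration only
lifetimes = {"a": (0, 3), "b": (1, 5), "c": (4, 6)}
counts = live_counts(lifetimes)
print(counts, "-> minimum registers:", max(counts))  # counts = [1, 2, 2, 1, 2, 1], max = 2
```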

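Linear Lifetime Analysis
[Figure: linear lifetime chart of variables $\{a, b, c\}$ over three iterations with $N = 6$; the periodicity is implicit.]
The numbers of live variables per time unit are $\{0, 1, 2, 2, 2, 2, 2, 2\}$, so the maximum, and hence the minimum number of registers, is 2.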

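Matrix Transpose Example (1/3)
$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \xrightarrow{\text{transpose}} \begin{bmatrix} a & d & g \\ b & e & h \\ c & f & i \end{bmatrix}$$
The elements arrive serially in row-major order (a, b, c, d, e, f, g, h, i) and must leave in column-major order (a, d, g, b, e, h, c, f, i).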

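Matrix Transpose Example (2/3)
- $T_{\text{zlout}}$: zero-latency output time, i.e., the time at which an element would be output if the transposer had zero latency.
- $T_{\text{diff}} = T_{\text{zlout}} - T_{\text{input}}$
- $T_{\text{output}} = T_{\text{zlout}} + \max\{-T_{\text{diff}}\}$
Adding $\max\{-T_{\text{diff}}\}$ delays every output just enough that no element is output before it has arrived.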

Matrix Transpose Example (3/3)
[Figure: linear lifetime chart and circular lifetime chart]
The minimum number of registers is 4.
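A sketch that reproduces the matrix transpose numbers, assuming a 3×3 matrix streamed in row-major order, one element per cycle starting at t = 0, and the half-open lifetime convention $T_{\text{input}} \le t < T_{\text{output}}$. Elements are indexed as (row, column), so a = (0, 0), b = (0, 1), and so on up to i = (2, 2). For circular lifetime analysis each live cycle is folded modulo the period of 9 samples. The function names are hypothetical.

```python
def transpose_lifetimes(n):
    """Lifetimes of an n x n transpose: row-major input, column-major output."""
    t_input, t_zlout = {}, {}
    for i in range(n):
        for j in range(n):
            t_input[(i, j)] = i * n + j   # element (i, j) arrives in row-major order
            t_zlout[(i, j)] = j * n + i   # zero-latency output slot in column-major order
    # T_diff = T_zlout - T_input; the latency max{-T_diff} makes every output causal
    latency = max(t_input[e] - t_zlout[e] for e in t_input)
    return {e: (t_input[e], t_zlout[e] + latency) for e in t_input}, latency


def circular_max_live(lifetimes, period):
    """Circular lifetime analysis: fold each live cycle modulo the period."""
    counts = [0] * period
    for t_in, t_out in lifetimes.values():
        for t in range(t_in, t_out):
            counts[t % period] += 1
    return max(counts)


lifetimes, latency = transpose_lifetimes(3)
print("latency:", latency)
for elem, (t_in, t_out) in sorted(lifetimes.items()):
    print(elem, t_in, "->", t_out)
print("minimum registers:", circular_max_live(lifetimes, period=9))
```

The computed latency is 4, so the lifetimes run from a: 0 → 4 to i: 8 → 12 (g has a zero lifetime, 6 → 6), and every time slot of the circular chart holds exactly 4 live variables.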

Procedure of Forward-Backward Register Allocation
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner until it is dead or reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations; hence, the register positions are hashed with period N.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is completed.
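A simplified Python sketch of forward-backward allocation, applied to the matrix transpose lifetimes (a: 0 → 4 through i: 8 → 12) with 4 registers and period 9. Registers are indexed from 0, so index 0 is R1. Several details are our assumptions rather than the slides': at most one variable enters R1 per time step, a variable that reaches the last register while still alive moves back to the lowest-numbered free register, and the sketch raises `ValueError` instead of searching for an alternative allocation.

```python
def forward_backward(lifetimes, num_regs, period):
    """Simplified forward-backward register allocation (sketch).

    lifetimes: var -> (t_in, t_out); a variable occupies a register for t_in <= t < t_out.
    Returns (t mod period, register index) -> var. Register index 0 is R1.
    """
    table = {}  # periodic occupancy, hashed with the period N
    pos = {}    # var -> register it occupied at the previous time step
    t_start = min(t_in for t_in, _ in lifetimes.values())
    t_end = max(t_out for _, t_out in lifetimes.values())
    for t in range(t_start, t_end):
        new_pos, backward = {}, []
        # Forward: every variable still alive shifts one register to the right.
        for var, reg in pos.items():
            if t < lifetimes[var][1]:
                if reg + 1 < num_regs:
                    new_pos[var] = reg + 1
                else:
                    backward.append(var)  # reached the last register but still alive
        # New variables enter R1 at the beginning of their lifetime.
        for var, (t_in, t_out) in lifetimes.items():
            if t_in == t and t_out > t:
                if 0 in new_pos.values():
                    raise ValueError(f"two variables enter R1 at t={t}")
                new_pos[var] = 0
        # Backward: move to a free register, also respecting other iterations (hashing).
        busy = set(new_pos.values()) | {reg for (row, reg) in table if row == t % period}
        for var in backward:
            free = [reg for reg in range(num_regs - 1) if reg not in busy]
            if not free:
                raise ValueError(f"no free register for {var} at t={t}")
            new_pos[var] = free[0]
            busy.add(free[0])
        # Record this time step in the periodic table, checking for collisions.
        for var, reg in new_pos.items():
            key = (t % period, reg)
            if key in table:
                raise ValueError(f"R{reg + 1} at t={t} already holds {table[key]}")
            table[key] = var
        pos = new_pos
    return table


lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
alloc = forward_backward(lifetimes, num_regs=4, period=9)
for row in range(9):
    print(row, [alloc.get((row, reg), "-") for reg in range(4)])
```

With 4 registers every one of the 9 periodic time slots holds 4 variables (36 register-cycles in total); asking for 3 registers raises a `ValueError` at t = 3, consistent with the lifetime analysis.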


Procedure of Register Minimization in Folded Architectures
Step 1: Perform retiming for folding.
Step 2: Write the folding equations.
Step 3: Use the folding equations to construct a lifetime table.
Step 4: Draw the lifetime chart and determine the required number of registers.
Step 5: Perform forward-backward register allocation.
Step 6: Draw the folded architecture that uses the minimum number of registers.

Folded Architecture Example
[Figure]

Folded Architecture for Matrix Transpose Example
[Figure]

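Biquad Filter Example (1/4)
Step 1: Retiming. Without retiming, the folding is invalid because several folding equations are negative:
$D_F(1 \to 2) = -3$, $D_F(6 \to 4) = -4$, $D_F(8 \to 4) = -3$, $D_F(7 \to 3) = -3$.
[Figure: biquad filter DFG before and after retiming]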

Biquad Filter Example (2/4)
Step 2: Folding equations, $D_F(U \xrightarrow{e} V) = N w(e) - P_U + v - u$:
$$\begin{aligned}
D_F(1 \to 2) &= 4(1) - 1 + 1 - 3 = 1 \\
D_F(1 \to 5) &= 4(1) - 1 + 0 - 3 = 0 \\
D_F(1 \to 6) &= 4(1) - 1 + 2 - 3 = 2 \\
D_F(1 \to 7) &= 4(1) - 1 + 3 - 3 = 3 \\
D_F(1 \to 8) &= 4(2) - 1 + 1 - 3 = 5 \\
D_F(3 \to 1) &= 4(0) - 1 + 3 - 2 = 0 \\
D_F(4 \to 2) &= 4(0) - 1 + 1 - 0 = 0 \\
D_F(5 \to 3) &= 4(0) - 2 + 2 - 0 = 0 \\
D_F(6 \to 4) &= 4(1) - 2 + 0 - 2 = 0 \\
D_F(7 \to 3) &= 4(1) - 2 + 2 - 3 = 1 \\
D_F(8 \to 4) &= 4(1) - 2 + 0 - 1 = 1
\end{aligned}$$
Step 3: Construct the lifetime table:
$$T_{\text{input}} = u + P_U, \qquad T_{\text{output}} = u + P_U + \max_V \{ D_F(U \to V) \}$$
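A sketch that carries the biquad example from the folding equations through the lifetime table to the register count, assuming the half-open lifetime convention $T_{\text{input}} \le t < T_{\text{output}}$ and folding each live cycle modulo $N = 4$ (circular lifetime analysis). The retimed edge weights are read off the folding equations above; the variable names are ours.

```python
N = 4
P = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}       # adders 1-4, multipliers 5-8
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}   # from S1 = {4,2,3,1}, S2 = {5,8,6,7}
# Retimed biquad edges (U, V, w_r(e))
edges = [(1, 2, 1), (1, 5, 1), (1, 6, 1), (1, 7, 1), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 1), (7, 3, 1), (8, 4, 1)]

# Step 2: folding equations D_F(U -> V) = N*w(e) - P_U + v - u
D_F = {(U, V): N * w - P[U] + order[V] - order[U] for U, V, w in edges}
assert min(D_F.values()) >= 0, "retiming did not make every folding delay nonnegative"

# Step 3: lifetime table, T_input = u + P_U, T_output = T_input + max_V D_F(U -> V)
lifetime = {}
for U in sorted({U for U, _ in D_F}):
    t_in = order[U] + P[U]
    lifetime[U] = (t_in, t_in + max(d for (src, _), d in D_F.items() if src == U))

# Step 4: circular lifetime chart, live cycles folded modulo N
live = [0] * N
for t_in, t_out in lifetime.values():
    for t in range(t_in, t_out):
        live[t % N] += 1

for U, (t_in, t_out) in lifetime.items():
    print(f"n{U}: {t_in} -> {t_out}")
print("live per slot:", live, "-> minimum registers:", max(live))
```

Only the outputs of nodes 1, 7 and 8 need storage (lifetimes 4 → 9, 5 → 6 and 3 → 4); nodes 3, 4, 5 and 6 have zero lifetime because all of their folding delays are 0. Folding the live cycles modulo 4 gives the counts [2, 2, 1, 2], so two registers suffice.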

Biquad Filter Example (3/4)
Step 4: Draw the lifetime chart.
Step 5: Perform register allocation.
[Figure: lifetime chart and register allocation table, folding factor N = 4]
The minimum number of registers is 2.

Biquad Filter Example (4/4)
Step 6: Draw the folded architecture.
[Figure]

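IIR Filter Example (1/4)
Step 1: Retiming. Without retiming, the folding is invalid because some folding equations are negative:
$D_F(3 \to 1) = -3$, $D_F(4 \to 1) = -2$.
[Figure: IIR filter DFG before and after retiming]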

IIR Filter Example (2/4)
Step 2: Folding equations, $D_F(U \xrightarrow{e} V) = N w(e) - P_U + v - u$, evaluated on the retimed DFG:
$D_F(1 \to 2) = 0$, $D_F(2 \to 3) = 5$, $D_F(2 \to 4) = 2$, $D_F(3 \to 1) = 1$, $D_F(4 \to 1) = 0$.
Step 3: Construct the lifetime table:
$$T_{\text{input}} = u + P_U, \qquad T_{\text{output}} = u + P_U + \max_V \{ D_F(U \to V) \}$$

IIR Filter Example (3/4)
Step 4: Draw the lifetime chart.
Step 5: Perform register allocation.
[Figure: lifetime chart and register allocation table, folding factor N = 2]
The minimum number of registers is 3.

IIR Filter Example (4/4)
Step 6: Draw the folded architecture.
[Figure]

Conclusions
- Presented a systematic transformation for designing time-multiplexed architectures.
- Explored folding techniques to reduce the number of functional units.
- Explored register minimization techniques to reduce the number of registers.

References
- K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.
- S. Y. Huang, Handout of textbook, 2004.