Optimized Design Platform for High Speed Digital Filter using Folding Technique


Volume-2, Issue-1, January-February, 2014, pp. 19-30, IASTER 2013 www.iaster.

1 Shreyas Patel, 2 Prof. J.S. Rani Alex
1 Department of SENSE, VIT University, Chennai, India
2 Department of SENSE, VIT University, Chennai, India

ABSTRACT

An implementation of a DSP system must satisfy the sampling rate constraint while requiring little area and power. A reasonable way to optimize the design platform using different algorithms is therefore much needed. In this paper an optimized platform is designed using lifetime analysis, one of the techniques of the folding algorithm, to minimize the number of registers such that synthesizable RTL is obtained. Folding techniques can be used to synthesize DSP architectures that operate with single or multiple clocks and use fewer registers and functional units, resulting in an integrated circuit that occupies a small silicon area. A technique is presented for computing the minimum number of registers, allocating the data to these registers and obtaining synthesizable RTL code for the folded architecture.

Keywords: Folding Architecture, RTL (Register Transfer Level), Register Minimization, Lifetime Analysis.

I. INTRODUCTION

In today's VLSI world, designers must deliver circuits with high performance and small area within a short design time. CAD tools play a very important role in meeting this requirement. The ASIC design process starts from a given specification, from which high-level functional blocks are obtained; these are later used to derive the circuit-level design. In the present work, a 3-tap IIR filter model is designed in MATLAB Simulink using the Xilinx blockset and System Generator, which automatically generates synthesizable RTL code and a design report covering speed, area, power and registers. The folding technique provides a means of trading area for time in a DSP architecture.
A DSP architecture consists of adders and multipliers. In CMOS technology multipliers consume more power, so the structure should be implemented with one adder and one multiplier using the folding technique, with a minimum number of registers. Previous work [1] reduced the clock period using the retiming method; it reported that the clock period was minimized but that the number of registers increased. In this paper the technique is applied to a folded retimed filter to reduce the number of registers.

First, a 3-tap IIR folded retimed filter is designed in MATLAB Simulink using the Xilinx blockset, and synthesizable RTL code is obtained automatically, which saves design time; the number of registers used is read from the synthesis report. Next, the iteration bound is found using the longest path matrix (LPM) and minimum cycle mean (MCM) algorithms in MATLAB. Then the folded retimed architecture of the 3-tap IIR filter is obtained (manually) and the iteration bound is checked again using the LPM and MCM algorithms; the iteration bound and loop bound must remain the same (MATLAB). The folded structure requires more registers, so the lifetime analysis technique, which is part of the folding technique, is used to minimize the registers (manually). Finally, a folded structure is designed according to the lifetime analysis, HDL code is written for it, and the synthesis report of the folded structure is compared with the previous synthesis result (Xilinx).

II. FOLDING TECHNIQUE

Folding can be used to reduce the number of hardware functional units by a factor of N at the expense of increasing the computation time by a factor of N. While the folding transformation reduces the number of functional units in the architecture, it may also lead to an architecture that uses a large number of registers. To avoid an architecture with an excessive number of registers, the lifetime analysis technique can be used to compute the minimum number of registers required to implement a folded DSP architecture. Using register minimization along with the folding transformation not only reduces the number of functional units but also keeps the area as small as possible [8]. Fig-1 shows an example in which 2 addition operations can be time-multiplexed on a single pipelined hardware adder [9].

Fig-1 DSP program with 2 addition operations [9]

$y(n) = a(n) + b(n) + c(n)$ (1) [8]

In Fig-2, the 2 addition operations are time-multiplexed on a single pipelined adder.

Fig-2 A folded architecture: 2 addition operations are folded to a single hardware adder with 1 stage of pipelining [9]

Table-1 Operation of the first six cycles of the folded hardware [8][9]

In Table-1, in cycle 0 the 0-th samples a(0) and b(0) are switched into the adder. In cycle 1 [8], the sum (a(0)+b(0)) is switched into the adder along with c(0). In cycle 2 the sum (a(0)+b(0)+c(0)) is output and the intermediate result (a(1)+b(1)) is computed by the adder [8]. This process continues as shown in Table-1 [8].

The use of the systematic folding technique is explained by folding the 2-tap retimed IIR filter shown in Fig-4. Assume that addition and multiplication require 1 and 2 time units, respectively. The filter is folded with folding factor N=4 [8]. A folding factor of N=4 means that the iteration period of the folded hardware is 4 time units, i.e. each node of the filter is executed exactly once every 4 time units in the folded architecture [8].
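The time-multiplexing described for Fig-2 and Table-1 can be sketched in Python as a cycle-by-cycle model. This is only an illustrative behavioural model, not the authors' HDL: the function name `folded_adder`, the `trace` format and the sample values are assumptions, and the single pipeline stage is modelled as one register on the adder output.

```python
def folded_adder(a, b, c):
    """Compute y(n) = a(n) + b(n) + c(n) on ONE adder with one pipeline stage.

    Even cycles add a(n) + b(n); odd cycles add that partial sum to c(n).
    The adder result becomes visible one cycle after its inputs are applied.
    """
    pipeline_reg = None      # adder output register (1 pipeline stage)
    outputs = []             # completed y(n) values, in order
    trace = []               # (cycle, left input, right input, adder output)

    for cycle in range(2 * len(a) + 1):
        n, phase = divmod(cycle, 2)
        adder_out = pipeline_reg          # result of the previous cycle's inputs

        if phase == 0:
            # Even cycle: the previous full sum (if any) leaves the adder,
            # and a new pair a(n), b(n) is switched in.
            if adder_out is not None:
                outputs.append(adder_out)
            left, right = (a[n], b[n]) if n < len(a) else (None, None)
        else:
            # Odd cycle: the partial sum a(n) + b(n) is fed back with c(n).
            left, right = adder_out, c[n]

        pipeline_reg = None if left is None else left + right
        trace.append((cycle, left, right, adder_out))

    return outputs, trace


if __name__ == "__main__":
    y, trace = folded_adder([1, 2, 3], [10, 20, 30], [100, 200, 300])
    for row in trace:
        print(row)
    print("outputs:", y)
```

In this model each output needs two cycles of the single adder, which is the area-for-time trade described above: one adder instead of two, at twice the number of cycles per sample.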

For a folded system to be realizable, $D_F(U \rightarrow V) \ge 0$ must hold for every edge in the DFG (data flow graph), which implies

$D_F(U \rightarrow V) = N\,w_r(e) - P_U + v - u \ge 0$ (2)

where $P_U$ is the processing time (number of pipeline stages) of the functional unit executing node U, $w_r(e)$ is the number of delays on edge e, and u and v are the folding orders of U and V. Consider an edge going from node 1 at instance (S1/3) to node 2 at instance (S1/1):

$4(0) - 1 + 1 - 3 = -3$ (before folding, with no delay on the edge)
$4(1) - 1 + 1 - 3 = 1$ (after folding, with one delay on the edge)

Fig-2(A) Retimed Biquad Filter with Valid Folding Structure

Fig-2(B) The Folded Biquad Filter using 1 Adder and 1 Multiplier [8]

As shown in Fig-2(B), the number of adders and multipliers is reduced. Consider node 1 in Fig-2(A): its input enters the adder at instance 4l+3 and the filter output is produced at instance 4l+1. Compare this operation with Fig-2(B): as per equation (2) the delay is 1 unit, so in Fig-2(B) the sample at IN enters the adder at instance {3} and, after 1 delay, is input to the adder again at instance {1}. This structure gives the same functionality as Fig-2(A), but its drawback is that it requires a larger number of delays (registers).

III. LIFETIME ANALYSIS

Lifetime analysis is a folding technique used to compute the minimum number of registers required to implement a DSP algorithm in hardware [8]. A data sample is live from the time it is produced through the time it is consumed. After the variable is consumed it is dead [10]. A variable occupies one register during each time unit in which it is live [10]. In lifetime analysis, the number of live variables at each time unit is determined, and the maximum number of variables live at any one time unit [10] is the minimum number of registers required to implement the DSP program [8]. The folded architecture without lifetime analysis, shown in Fig-2(B), requires 6 registers, 1 adder and 1 multiplier. Since retiming for folding has already been performed, the next step is to construct the lifetime table shown in Table-2. The lifetime table has one entry for each node in the DFG, specifying the lifetime ($T_{input} \rightarrow T_{output}$) of that node.

$T_{input} = u + P_U$ (3)

$T_{output} = u + P_U + \max_V \{ D_F(U \rightarrow V) \}$ (4)

The time $T_{input}$ for node U is $u + P_U$, where u is the folding order of U and $P_U$ is the number of pipeline stages in the functional unit that executes U [9]. This value of $T_{input}$ is the time unit in which the node produces its data in hardware for the 0-th iteration of the DSP program [8]. For example, $T_{input}$ for node 1 in Fig-3 is 3+1=4. The time $T_{output}$ for node U is $u + P_U + \max_V \{ D_F(U \rightarrow V) \}$, where $\max_V \{ D_F(U \rightarrow V) \}$ is the longest folded path delay among all edges that begin at node U [9]. From the equations for $T_{input}$ and $T_{output}$ the table shown in Table-2 is developed.

Table-2 Lifetime for the Retimed Biquad Filter (columns: Node, $T_{in}$, $T_{out}$)

Fig-3 Lifetime Chart [8]

Table-3 The Allocation Table for the Folded Biquad Filter [8]

The linear lifetime chart shown in Fig-3 is drawn from the lifetimes in Table-2. Finally, the allocation of data variables to registers is shown in Table-3. With lifetime analysis fewer registers are needed than with the folding technique alone: the same folded architecture is obtained with 2 registers, as shown in Fig-4 [8].
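Equations (3) and (4) and the live-variable count can be worked through in a short Python sketch. The node values below are hypothetical and are not the entries of Table-2 (only node A's $T_{input}$ = 3+1 = 4 mirrors the example in the text). The sketch also assumes the usual lifetime-chart convention that a variable is not live in the cycle in which it is produced but is live in the cycle in which it is consumed, and it wraps lifetimes modulo the folding factor N to account for periodic operation.

```python
def lifetime(u, p_u, folded_delays):
    """Return (T_input, T_output) for a node, per equations (3) and (4).

    u             : folding order of the node
    p_u           : pipeline stages of the functional unit executing it
    folded_delays : D_F(U -> V) for every edge leaving the node
    """
    t_in = u + p_u
    t_out = t_in + max(folded_delays, default=0)
    return t_in, t_out


def min_registers(lifetimes, n):
    """Count live variables per cycle (modulo the folding factor n).

    Assumed convention: a variable is live in cycles T_input+1 .. T_output.
    Returns (maximum live count, per-cycle live counts).
    """
    live = [0] * n
    for t_in, t_out in lifetimes:
        for cycle in range(t_in + 1, t_out + 1):
            live[cycle % n] += 1      # periodic schedule: wrap into one period
    return max(live), live


if __name__ == "__main__":
    N = 4  # folding factor used for the biquad example
    # Hypothetical nodes: (folding order, pipeline stages, folded edge delays)
    nodes = {
        "A": (3, 1, [5, 2]),
        "B": (1, 2, [0]),
        "C": (0, 2, [2, 1]),
    }
    table = {name: lifetime(*spec) for name, spec in nodes.items()}
    for name, (t_in, t_out) in table.items():
        print(f"node {name}: {t_in} -> {t_out}")
    registers, per_cycle = min_registers(table.values(), N)
    print("live variables per cycle:", per_cycle)
    print("minimum registers:", registers)
```

A node whose $T_{input}$ equals its $T_{output}$ (node B here) is consumed as soon as it is produced and therefore needs no register under this convention.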


Fig-4 A Folded Biquad Filter Architecture Implementing the DFG Using Minimum Number of Registers [8][9]

As Fig-4 shows, the same biquad filter is implemented using 1 adder and 1 multiplier with two registers, and data is allocated to the registers using switching activity.

IV. DESIGN AND ANALYSIS

The main goal of this paper is to reduce the number of registers used in a retimed folded 3-tap IIR filter written in HDL. As a baseline for comparison with the hand-written HDL code, the retimed folded 3-tap IIR filter is first designed in MATLAB Simulink using Xilinx System Generator, as shown in Fig-5(a); the output for 5 discrete samples is shown in Fig-5(b). System Generator is a system-level modeling tool that facilitates FPGA hardware design. It extends Simulink in many ways to provide a modeling environment that is well suited to hardware design.

Fig-5(a) Implementation of Retimed Folded 3-TAP IIR Filter in MATLAB Simulink using System Generator

Fig-5(b) Output of 3-TAP IIR Filter with 5 Samples

System Generator automatically compiles the design into a low-level representation. The design is compiled and simulated using System Generator. The code is generated automatically and synthesized in Xilinx to find the number of registers used in the retimed folded 3-tap IIR filter. The synthesis report is shown in Fig-5(c).

Fig-5(c) Automatic Synthesis Report generated by System Generator

According to the synthesis report, the number of slice registers generated by System Generator is 48, so our aim is to reduce the number of registers by writing HDL for the folded structure. For the folded structure the calculations are done analytically and using MATLAB. The 3-tap IIR filter is designed using a dataflow graph. A dataflow graph gives detailed information without a hardware implementation and is able to represent any algorithm. A DFG is a directed graph G(V,E) with a set of nodes V and a set of edges E. The nodes V are subdivided into computational nodes, input nodes and output nodes [1].

Fig-6 (a) 3-TAP IIR filter (b) Dataflow graph of 3-TAP IIR filter

In the dataflow graph representation, each node represents a computation with its computation time, each directed edge represents a data path, and each edge has a non-negative number of delays associated with it. The MATLAB implementation of the dataflow graph is shown in Fig-7. The filter is folded with folding factor N=6, which means that the iteration period of the folded hardware is 6 time units: each node of the 3-tap IIR filter is executed exactly once per iteration period. The iteration bound can be found using the LPM (longest path matrix) and MCM (minimum cycle mean) algorithms. Both algorithms are implemented in MATLAB to check the iteration bound before and after folding, since it is a property of the folding transformation that the loop bound and iteration bound should not change after delays are added to a path.
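The LPM check mentioned above can be illustrated with a minimal Python sketch (the paper itself uses MATLAB, and its code is not reproduced here). The sketch assumes the first matrix $L^{(1)}$ is already known: entry (i, j) is the longest computation time over zero-delay paths from delay element i to delay element j, or -1 if no such path exists. Higher-order matrices are built from it, and the iteration bound is the largest diagonal entry of $L^{(m)}$ divided by m. The matrix `L1_EXAMPLE` is an illustrative four-delay example, not the DFG of Fig-7.

```python
from fractions import Fraction

NO_PATH = -1  # marker for "no zero-delay path between these delay elements"


def iteration_bound_lpm(l1):
    """Iteration bound from the first longest-path matrix L^(1).

    Builds L^(m) for m = 1..d (d = number of delay elements) and returns
    max over i, m of L^(m)[i][i] / m as an exact Fraction, or None if the
    graph has no loops.
    """
    d = len(l1)
    lm = [row[:] for row in l1]
    bound = None

    for m in range(1, d + 1):
        if m > 1:
            # L^(m)[i][j] = max_k ( L^(1)[i][k] + L^(m-1)[k][j] ),
            # taken only over k where both partial paths exist.
            lm = [
                [
                    max(
                        (l1[i][k] + lm[k][j]
                         for k in range(d)
                         if l1[i][k] != NO_PATH and lm[k][j] != NO_PATH),
                        default=NO_PATH,
                    )
                    for j in range(d)
                ]
                for i in range(d)
            ]
        for i in range(d):
            if lm[i][i] != NO_PATH:
                candidate = Fraction(lm[i][i], m)
                bound = candidate if bound is None else max(bound, candidate)

    return bound


# Illustrative L^(1) for a DFG with four delay elements (assumed data).
L1_EXAMPLE = [
    [-1, 0, -1, -1],
    [4, -1, 0, -1],
    [5, -1, -1, 0],
    [5, -1, -1, -1],
]

if __name__ == "__main__":
    print("iteration bound:", iteration_bound_lpm(L1_EXAMPLE))
```

Running the same function on the $L^{(1)}$ matrices of the original and the retimed graph is one way to confirm that the iteration bound is unchanged.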

Fig-7 Dataflow Graph of 3-Tap IIR Filter Using MATLAB

In the present paper the folded delay $D_f(U \rightarrow V)$ of every edge is calculated as per equation (2). Some edges may get a negative value, as shown in Table-4. An edge with negative $D_f(U \rightarrow V)$ can be made non-negative by changing the number of delays on the edge, since each delay added (removed) increases (decreases) $D_f(U \rightarrow V)$ by N. While adding delays, the properties of the filter should not be affected.

Table-4 Folding Equation for Folding Constraint for DFG (columns: $D_f(U \rightarrow V)$, Delay)

In Table-4 some edges get a negative value. To make these values non-negative, a delay (register) is added to each such edge. The resulting retimed 3-tap IIR filter with a valid folding structure is shown in Fig-8. Adding delays increases the latency, but the functionality is preserved and two properties must hold: 1. the loop bound remains the same; 2. the iteration bound remains the same. The iteration bound and loop bound of the folded architecture can be checked using the LPM and MCM algorithms, as shown in Fig-8(a) and Fig-8(b).
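The folding constraint of equation (2) and the delay adjustment described above can be sketched in Python as follows. The function names are hypothetical, and the only numbers used are those of the worked example in Section II (N=4, $P_U$=1, u=3, v=1). The sketch only computes how many extra delays would lift a negative folded delay to a non-negative value; whether those delays can be added without changing the filter is a retiming question that it does not check.

```python
def folded_delay(n, w, p_u, u, v):
    """Equation (2): D_F(U -> V) = N*w(e) - P_U + v - u."""
    return n * w - p_u + v - u


def extra_delays_needed(n, w, p_u, u, v):
    """Smallest number of additional delays on the edge that makes the
    folded delay non-negative (each added delay contributes +N)."""
    d = folded_delay(n, w, p_u, u, v)
    if d >= 0:
        return 0
    return -(d // n)   # integer ceiling of -d / n for negative d


if __name__ == "__main__":
    # Edge from node 1 at (S1/3) to node 2 at (S1/1), adder with P_U = 1, N = 4.
    before = folded_delay(4, 0, 1, 3, 1)
    extra = extra_delays_needed(4, 0, 1, 3, 1)
    after = folded_delay(4, 0 + extra, 1, 3, 1)
    print("before:", before, "extra delays:", extra, "after:", after)
```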

Fig-8 Retimed 3-TAP IIR Filter with Valid Folding Architecture

Fig-8(a) Verified Iteration Bound using LPM after Adding Delay

Fig-8(b) Verified Iteration Bound using MCM after Adding Delay


In the folded structure, {(s1/0), (s1/1), (s1/2), (s1/3), (s1/4), (s1/5), (s2/0), (s2/1), (s2/2), (s2/3), (s2/4), (s2/5)} are the assumed instances at particular times. For the folded structure the delay of each edge is calculated again. As described under lifetime analysis above, a linear lifetime chart is used to represent the lifetimes of the variables graphically in a linear fashion; the lifetimes are calculated as per equations (3) and (4) and are shown in Table-5.

Table-5 Lifetime Chart (columns: Node, $T_{input}$, $T_{output}$)

Fig-9 Life Time Chart

The vertical lines in Fig-9 represent the clock cycles and the horizontal lines represent the activation of a node at a particular clock cycle. For example, the sample leaving node 1 (Fig-8) should be activated at the 6th clock cycle and must reach node 6 with 9 delays. When writing the HDL code, Table-6 gives the information about data allocation in the registers.

Table-6 Data Allocation in Register for Every Clock Cycle


Fig-10 Folded Architecture of 3-TAP IIR Filter Using Lifetime Chart

Fig-10 shows the folded structure of the 3-tap IIR filter. To represent this structure as a digital design for writing HDL, the switches must be replaced by multiplexers, and RAM is needed to store the filter coefficients and the multiplier outputs that are used later. Fig-11 shows the implementation of Fig-10 (the folded architecture of the 3-tap IIR filter) as a digital design.

Fig-11 3-TAP IIR Filter Folded Digital Design

The 3-tap IIR filter with the folded structure, using 4 registers, 1 adder and 1 multiplier, is implemented in Xilinx with HDL code, and its synthesis and design summary report is compared with the report generated by System Generator.

V. SIMULATION RESULT

Fig-12 3-tap IIR Folded Filter using Xilinx Simulation Tool


Fig-13 RTL Schematic View of Folded 3-TAP IIR Filter

Fig-14 Synthesis Report of Folded Digital Design in Xilinx

The synthesis report in Fig-14 shows that the number of registers is reduced, with a usage of 5 look-up tables. Previous work [1] optimized the clock period using the retiming technique; its disadvantage is that, although retiming reduces the clock period, the reported number of registers increases, as shown in Fig-15. The present design reduces the number of registers, as can be seen from the synthesis report.

Fig-15 Previous Work Simulation Result [1]

VI. CONCLUSION

In this work an optimized design platform is developed for a digital filter. Optimization is performed in two ways: first by folding and second by the lifetime analysis technique. Folding reduces the number of functional units and the critical path but increases the number of registers, so the lifetime analysis method is chosen, which reduces the critical path and the functional units as well as the registers and generates synthesizable HDL. Overall, the process reduces the area occupied by registers.

VII. REFERENCES

[1] Deepa Yagain, Dr. Vijaya Krishna A, "Design Optimization Platform for Synthesizable High Speed Digital Filters Using Retiming Technique", IEEE-ICSE2012 Proc., 2012, Kuala Lumpur, Malaysia.
[2] Daniel D. Gajski, Lognath Ramachandran, IEEE Design & Test, vol. 11, no. 4 (Oct. 1994), IEEE Computer Society Press, Los Alamitos, CA, USA, pp.
[3] Zahra Jeddi and Esmail Amini, "Power optimization of Sequential Circuits by Retiming and Rewiring", IEEE, 2006.
[4] Ozgur Sinanoglu and Vishwani D. Agrawal, "Retiming Scan Circuit to Eliminate Timing Penalty", IEEE.
[5] A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power CMOS digital design", IEEE J. Solid-State Circuits, vol. 27, pp. , Apr.
[6] Zahra Jeddi and Esmail Amini, "Power optimization of Sequential Circuits by Retiming and Rewiring", IEEE.
[7] K. K. Parhi, "Synthesis of Control Circuits in Folded Pipelined DSP Architectures", IEEE Jl. of Solid-State Circuits, vol. SC-27, no. 1, pp.
[8] Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", ISBN: .
[9] Pierre Coulon, "Postgraduate Course on Signal Processing in Communications", Fall 99.
[10] S. Srinivasan, "A novel architecture for lifting-based discrete wavelet transform for JPEG2000 standard suitable for VLSI implementation", 16th International Conference on VLSI Design 2003, Proceedings ICVD-03.
