University of Massachusetts Amherst Department of Electrical & Computer Engineering


ECE 696 Independent Study, Fall 2005
Final Report
Title: Efficient RTL Synthesis of DSP Algorithms
Supervisor: Prof. Maciej Ciesielski
Submitted by: Tariq Bashir Ahmad, ID: 20716451, tbashir@ecs.umass.edu

1. Introduction

Hardware design starts with initial specifications and culminates in the hardware implementation. It is a continuous process of refining specifications, algorithms and hardware details until the implementation meets the specifications.

1.a. Conventional DSP Hardware Design Flow

Figure 1 shows the general form of a typical DSP hardware design flow. DSP design has traditionally been divided into two types of activities: system/algorithm development and hardware/software implementation. These tasks have been carried out by two different groups of engineers who often have little connection or interaction. As a result, the transition from the system level to the implementation level is not seamless.

Figure 1: Conventional DSP hardware design flow

The flow originates with algorithm developers and system engineers. Algorithm developers create, analyze and refine the required DSP algorithms using mathematical analysis tools at the behavioral level, often without considering the underlying system architecture or hardware implementation details. The system designer is concerned with defining the functionality and architecture of the design so that it adheres to the product specification and interface standards. According to the market research firm Forward Concepts, as well as reports in FPGA and Programmable Logic Journal, the majority of DSP system designers and algorithm developers use the MATLAB language from The MathWorks. Hardware designers, in contrast, take the specifications created by the systems engineers and algorithm developers and are tasked with creating a physical implementation of the DSP design. If the target of the DSP algorithm is an FPGA, structured ASIC, ASIC or SoC, the first task is to create a register transfer level (RTL) model in a hardware description language (HDL) such as Verilog or VHDL. The hardware designer must understand communications theory and signal processing well enough to interpret the written specification provided by the systems engineer. Creating an RTL model and a simulation testbench usually takes many months, because the manually created RTL must be verified to match the MATLAB model exactly. Once the RTL model and simulation environment are in place, the hardware designer works with the systems engineers and algorithm developers to analyze the performance, area and functionality of the hardware realization of the DSP system. It is quite common for the original algorithms and system architecture to be modified at this point, because the systems engineers had no visibility into the physical design domain during algorithm development.
The iteration continues (refine the algorithms and system architecture, update the written specification, modify the RTL models and testbenches, and resimulate) until the hardware realization meets the DSP system requirements. The design flow then continues with a standard top-down FPGA and/or ASIC design flow, using logic synthesis and, ultimately, physical design tools to place and route the netlist in a given FPGA or ASIC device.

1.b. New DSP Hardware Design Flow

Figure 2 shows the new DSP hardware design flow; the rest of this report discusses the advantages of this approach. Several aspects of the figure deserve comment. First, two separate groups are no longer needed to carry out the DSP design; one group can do it. The first step is the same as in the conventional flow and is often called the floating-point simulation of the algorithm. Once the algorithm has been verified at the behavioral level, for example in MATLAB, it can be mapped directly to a vendor-dependent representation. If the target hardware is a Xilinx FPGA, the Xilinx System Generator blockset for Simulink can be used to represent the algorithm; if the target is an Altera CPLD/FPGA, the Altera DSP Builder blockset for Simulink can be used instead. The algorithm is then simulated in fixed point to check whether the fixed-point results are close to those of the earlier floating-point simulation. Bit widths are adjusted until the fixed-point results match the floating-point results closely enough.

At this point the algorithm is ready to go into RTL. Xilinx System Generator, Altera DSP Builder and the Synplify DSP blocksets convert the algorithm to RTL with a single click, so using the vendor blocksets for the target FPGA eliminates manual, time-consuming RTL translation. After the translation, HDL simulation can be run with tools such as Mentor Graphics ModelSim. Throughout the flow, results are compared against the initial floating-point simulation, which serves as the golden reference. Once HDL simulation succeeds, the algorithm can be synthesized using the various tools shown in Figure 2, and then implemented and verified on the target FPGA. The vendor blocksets also make hardware-in-the-loop (HIL) simulation possible, that is, real-time verification of the algorithm, after it has been implemented on the FPGA, against the golden reference.

Figure 2: New DSP hardware design flow
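The fixed-point refinement loop described above (simulate in fixed point, compare with the floating-point golden reference, adjust bit widths) can be sketched in Python as follows. This is a tool-independent illustration under stated assumptions, not how System Generator, DSP Builder or Synplify DSP work internally: it assumes round-to-nearest quantization, considers only fractional bits (no overflow or saturation), and uses a hypothetical sinusoidal stimulus and an arbitrary error tolerance.

```python
import math


def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits.

    Assumption: round-to-nearest (Python's round, ties to even); no overflow or saturation.
    """
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_error(reference: list[float], frac_bits: int) -> float:
    """Worst-case absolute difference between the golden reference and its quantized copy."""
    return max(abs(v - quantize(v, frac_bits)) for v in reference)


def min_frac_bits(reference: list[float], tolerance: float, max_bits: int = 32) -> int:
    """Grow the fractional word length until the fixed-point signal matches the reference."""
    for bits in range(1, max_bits + 1):
        if max_error(reference, bits) <= tolerance:
            return bits
    raise ValueError("tolerance not reachable within max_bits")


# Hypothetical floating-point golden reference: a sampled sinusoid.
golden = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
bits = min_frac_bits(golden, tolerance=1e-3)
print(f"{bits} fractional bits, max error {max_error(golden, bits):.2e}")
```

A real flow would also size the integer part, choose overflow behavior and check internal signals, not just one output, but the stopping rule is the same: stop once the fixed-point results agree with the golden reference within an acceptable tolerance.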

In short, the new approach to DSP design is quicker, less error-prone and more efficient.

2. Analysis of the New Approach to the DSP Design Flow

The new approach aims to provide efficiency in terms of:
a) Time: going from algorithm to implementation is quicker than with the old approach.
b) Hardware resources: the design is optimized to meet timing and area constraints.
c) Correctness: fewer errors are introduced, and integration is seamless.

Figure 3 illustrates these points.

Figure 3: Comparison of the old and new DSP design approaches (new approach versus old approach)

Figure 3 illustrates that the old DSP design approach had a wall of abstraction between the algorithmic level and the implementation level. The new approach bridges this gap by tightly integrating the two at the behavioral level.

2.a. How Is This Done?

Two decades ago, hardware engineers captured their designs manually or as hierarchical gate-level schematics. As electronic systems became larger and more complex, visualizing, capturing, exploring, understanding and debugging designs at this level of abstraction became increasingly difficult, resource-intensive, time-consuming and inefficient. The solution at that time was to move to a higher level of abstraction in the form of HDLs, which could be used to specify designs at RTL. These representations were much more concise, easier to capture and understand, and much faster to simulate. The remaining problem was to bridge the gap between the abstract RTL and the implementation-level netlist. Thus, the real boost to productivity came when RTL representations were used in conjunction with logic synthesis technology. Given implementation-level requirements such as timing and area utilization, the synthesis engine could rapidly explore a very large number of implementation alternatives and perform appropriate optimizations to ensure that the design met its objectives (see Figure 4).

Figure 4: Logic synthesis, introduced in the 1980s

Today's DSP designers face a similar situation. They can capture, simulate and analyze their algorithms at a high level of abstraction using Simulink. Until recently, however, there was no automated way to quickly and easily migrate these designs to the implementation level. The answer is true DSP synthesis (a term coined by Synplicity, Inc. in a white paper describing its solution for large DSP designs), in the form of Synplify DSP, Xilinx System Generator or Altera DSP Builder (see Figure 5).

Figure 5: DSP synthesis, which is being introduced now

The essence of this approach is to treat DSP design like general-purpose hardware design on FPGAs, rather than implementing the design on a DSP processor such as those from Texas Instruments. The debate over which approach is better will continue.

3. Model-Based Design with Simulink

Figure 6 introduces the term model-based design, coined by The MathWorks, Inc. It rests on the fact that Simulink models and blocksets, both those that ship with Simulink and those provided by vendors such as Xilinx, Altera or Synplicity, support the hardware design flow. The figure shows the hardware design flow starting from the algorithm level. The MathWorks uses the term executable specification for the floating-point simulation of the algorithm, i.e. the golden reference, because such a specification is concrete, as opposed to a handwritten specification. From the executable specification all the way to the hardware implementation, Simulink blocks are available that make the design flow automated, seamless and abstract. The flow is complete because verification, testing and debugging are also incorporated.

Figure 6: Model-based design with Simulink

Figure 7 shows the advantages of model-based design, which makes it easier to capture large designs in simple steps, with less debugging and manual work.

Figure 7: Advantages of model-based design

Figure 8 shows the values of various metrics when applied to model-based design, illustrating its success in terms of innovation, quality, cost and time to market.

Figure 8: The value of model-based design

Figure 9 summarizes model-based design. Two terms need some explanation. Bit-true modeling means that the fixed-point model produces exactly the same bit-level values as the hardware implementation. Cycle-accurate modeling means that these values also appear at the same clock cycles (time instants) as in the hardware.

Figure 9: Summary of model-based design

4. A Note about FPGAs versus DSPs

As mentioned earlier, digital signal processors are the main competitors of FPGAs for DSP designs. Both have pros and cons that make one more suitable than the other, depending on the design specifications:

DSPs: facing a tough time due to enhanced CPUs; costly, as it is IP; serial processing; power hungry; limited customization; integrated with MATLAB; mature and easy to design with because of rich libraries and templates.
FPGAs: gaining in popularity and applications; cheap; parallel processing; also consume power; unlimited customization; also integrated with MATLAB; evolving, with more features being added.

To conclude, when cost and power are not the main factors, DSPs are preferable, because DSP technology is more mature and DSPs are easier to design with. Further, as more and more signal processing applications come into the picture, DSPs are the first choice for implementation. FPGAs are the second choice for new signal processing applications, as their signal processing capabilities are still evolving. In the future, however, FPGAs and DSPs are expected to compete neck and neck.

5. Application: Design of a 12-Tap FIR Low-Pass Filter at a 140 MHz Sample Rate Using Synplify DSP, with Synthesis in Synplify Pro

Specifications:
Response type = Low pass
Design method = FIR
Filter order = Minimum order
Wpassband = 0.1*pi
Wstopband = 0.5*pi
Errorpassband = 0.1
Errorstopband = 0.001

Design flow:

Figure 10: Design flow of a typical application (in this case, FIR filter design)

We apply two sinusoids of different frequencies to the filter. The filter passes one sinusoid and suppresses the other. Since the sample rate fs is 140 MHz, the highest frequency the filter can handle without aliasing is fs/2 = 70 MHz. Each sinusoid has the form sin(2*pi*k*(fs/2)*t) in the time domain, where k is a fraction of the Nyquist frequency fs/2:
Sine1 = sin(2*pi*(fs/2)*0.05*t) = sin(2*pi*3.5e6*t)
Sine2 = sin(2*pi*(fs/2)*0.95*t) = sin(2*pi*66.5e6*t)
The FIR filter specifications are to pass frequencies up to 0.1*fs/2 = 7 MHz and to stop frequencies from 0.5*fs/2 = 35 MHz (Wstopband = 0.5*pi).

5.a. Design Capture

Figure 11 shows FDATool (the Filter Design and Analysis Tool) in MATLAB. The tool can design many kinds of filters from a set of specifications; here it is used to design the FIR filter described above. Once the specifications are entered and the Design Filter button is pressed, MATLAB computes the filter coefficients while keeping the number of coefficients to a minimum (here, 12). The figure also shows the magnitude response of the FIR filter.

Figure 11: Filter design using FDATool in MATLAB
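As a quick check of the frequency arithmetic given above for the test stimulus, the sketch below computes the tone and band-edge frequencies from the sample rate and generates the two-tone input. The helper names are hypothetical; the stopband edge is taken from the specification list (Wstopband = 0.5*pi), and the stimulus is sampled at t = n/fs.

```python
import math

FS = 140e6           # sample rate from the text, in Hz
NYQUIST = FS / 2     # 70 MHz: highest frequency representable without aliasing


def at_fraction_of_nyquist(k: float) -> float:
    """Frequency in Hz of a point placed at fraction k of the Nyquist frequency."""
    return k * NYQUIST


f1 = at_fraction_of_nyquist(0.05)       # Sine1: passband tone
f2 = at_fraction_of_nyquist(0.95)       # Sine2: stopband tone
f_pass = at_fraction_of_nyquist(0.1)    # Wpassband = 0.1*pi
f_stop = at_fraction_of_nyquist(0.5)    # Wstopband = 0.5*pi (specification list)


def two_tone(n_samples: int) -> list[float]:
    """Sum of Sine1 and Sine2 sampled at FS, i.e. t = n / FS."""
    return [math.sin(2 * math.pi * f1 * n / FS) + math.sin(2 * math.pi * f2 * n / FS)
            for n in range(n_samples)]


for name, f in [("Sine1", f1), ("Sine2", f2), ("passband edge", f_pass), ("stopband edge", f_stop)]:
    print(f"{name}: {f / 1e6:.1f} MHz")
```

Sine1 at 3.5 MHz lies well inside the passband and Sine2 at 66.5 MHz lies well inside the stopband, both below fs/2, so neither tone aliases.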

Figure 12 shows the 12 coefficients of the FIR filter. Note the symmetry of the coefficients, an important property of linear-phase FIR filters [3]. This property can be exploited to halve the number of multiplications needed to implement the filter: the two input samples that share a coefficient are added first and multiplied once, so only half of the coefficients require a multiplier, since the other half have the same values.

Figure 12: Filter coefficients (note the symmetry of the filter coefficients)

The coefficients of a linear-phase FIR filter of type I, II, III or IV are symmetric or anti-symmetric [3]. MATLAB can therefore convert an FIR structure with 12 coefficients into one with 6 coefficients, reducing the number of multiply and accumulate operations (MACs), but only if the filter is a linear-phase FIR filter of one of these four types. In general, a DSP design should use as few MACs as possible, which requires a tool that can factor out as many common coefficients as possible, regardless of symmetry or anti-symmetry. We observed that once the symmetry of a linear-phase FIR filter is disturbed, MATLAB no longer recognizes the common coefficients. For example, if the coefficients are

[0.0135 -0.0000 -0.0151 -0.0099 0.0105 0.0181 -0.0000 -0.0209 -0.0140 0.0152 0.0269 -0.0000 -0.0330 -0.0230 0.0264 0.0499 -0.0000 -0.0750 -0.0619 0.0929 0.3007 0.3974 0.3007 0.0929 -0.0619 -0.0750 -0.0000 0.0499 0.0264 -0.0230 -0.0330 -0.0000 0.0269 0.0152 -0.0140 -0.0209 -0.0000 0.0181 0.0105 -0.0099 -0.0151 -0.0000 0.0135]

MATLAB recognizes the symmetry, since this is a 43-tap (odd-length, symmetric) type I linear-phase FIR filter. But if some of the coefficients are shuffled, as in

[-0.0099 -0.0000 -0.0151 0.0135 0.0105 0.0181 -0.0000 -0.0209 -0.0140 0.0152 0.0269 -0.0000 -0.0330 -0.0230 0.0264 0.0152 0.0499 -0.0000 -0.0750 -0.0619 0.0929 0.3007 0.3974 0.3007 0.0929 -0.0619 -0.0750 -0.0000 0.0499 0.0264 -0.0230 -0.0330 -0.0000 0.0269 -0.0140 -0.0209 -0.0000 0.0181 0.0105 -0.0099 -0.0151 -0.0000 0.0135]

MATLAB is not able to reduce the number of MACs, even though the shuffled vector contains the same coefficient values as the symmetric one.

Figure 13 shows the filter designed above inserted between the input and output ports of the Synplify DSP block. Everything between the input and output ports must come from the Synplify DSP blockset in order to generate HDL and synthesize it.

Figure 13: Inserting the filter into the design
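The saving from coefficient symmetry discussed earlier in this section can be illustrated with a folded (symmetric) FIR structure: the two delayed samples that share a coefficient are added first and then multiplied once. The Python sketch below, using the 43-tap example coefficients from the text, compares a direct-form filter with a folded one and counts multipliers per output sample. It illustrates the general technique only, not MATLAB's or Synplify's internal implementation, and assumes a zero initial filter state.

```python
# 43-tap example coefficients from the text (odd length, symmetric: type I linear phase).
H = [0.0135, -0.0000, -0.0151, -0.0099, 0.0105, 0.0181, -0.0000, -0.0209, -0.0140,
     0.0152, 0.0269, -0.0000, -0.0330, -0.0230, 0.0264, 0.0499, -0.0000, -0.0750,
     -0.0619, 0.0929, 0.3007, 0.3974, 0.3007, 0.0929, -0.0619, -0.0750, -0.0000,
     0.0499, 0.0264, -0.0230, -0.0330, -0.0000, 0.0269, 0.0152, -0.0140, -0.0209,
     -0.0000, 0.0181, 0.0105, -0.0099, -0.0151, -0.0000, 0.0135]


def is_symmetric(h: list[float]) -> bool:
    """True if h[k] == h[N-1-k] for every k."""
    n = len(h)
    return all(h[k] == h[n - 1 - k] for k in range(n // 2))


def fir_direct(h: list[float], x: list[float]) -> list[float]:
    """Direct form: one multiplication per tap per output sample (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_folded(h: list[float], x: list[float]) -> list[float]:
    """Folded form: pre-add the two samples sharing a coefficient, then multiply once."""
    if not is_symmetric(h):
        raise ValueError("folding needs symmetric coefficients")
    n_taps, half = len(h), len(h) // 2

    def sample(i: int) -> float:
        return x[i] if i >= 0 else 0.0  # zero initial state

    y = []
    for n in range(len(x)):
        acc = sum(h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k))) for k in range(half))
        if n_taps % 2:  # odd length: the centre tap has no partner
            acc += h[half] * sample(n - half)
        y.append(acc)
    return y


def multipliers_per_output(h: list[float], folded: bool) -> int:
    return (len(h) + 1) // 2 if folded else len(h)


x = [1.0 if n % 7 == 0 else -0.5 for n in range(100)]  # arbitrary test input
max_diff = max(abs(a - b) for a, b in zip(fir_direct(H, x), fir_folded(H, x)))
print(f"max |direct - folded| = {max_diff:.1e}")
print(multipliers_per_output(H, folded=False), "->", multipliers_per_output(H, folded=True))  # 43 -> 22
```

For the 12-tap filter of Figure 12 the same count gives 12 -> 6. The folded form only works when the pairing h[k] = h[N-1-k] holds, which is why a shuffled coefficient vector defeats it even though its values are unchanged.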

5.b. Simulation

Once the filter is in place, as shown in Figure 13, it can be turned into a subsystem to make the design more hierarchical. The design is then ready to simulate, which requires applying test vectors at the input and observing the response. As mentioned before, the test input is the sum of two sinusoids of different frequencies; the filter should pass the low-frequency sinusoid and reject the high-frequency one. At the input and output, the time response and the frequency response can be observed using a scope and a spectrum analyzer, respectively, as Figure 14 shows. Note that Figure 14 represents the floating-point simulation, i.e. the golden reference, which is the first step.

Figure 14: Adding stimuli and analysis components

Figure 15 shows the input and output in the frequency domain. The left-hand plot shows two peaks, corresponding to the two input sine waves. The right-hand plot shows only one peak, because the low-pass filter suppresses the other.

Figure 15: Input and output in the frequency domain. One sinusoid peak is suppressed at the output.

Figure 16 shows the same as Figure 15, but in the time domain. The left-hand plot shows the sum of the two input sine waves; the right-hand plot shows a single sinusoid, because the filter suppresses the high-frequency one.

Figure 16: Input and output in the time domain. One sinusoid is suppressed at the output.
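The two-tone experiment can also be reproduced outside Simulink in a few lines of Python. Because the coefficients of Figure 12 are only shown graphically, this sketch substitutes a hypothetical 31-tap Hamming-windowed-sinc low-pass (cutoff 0.3*pi), not the equiripple FDATool design, and it measures each tone's amplitude with a single DFT bin instead of a spectrum analyzer. Frequencies are expressed in cycles per sample (3.5 MHz / 140 MHz = 0.025 and 66.5 MHz / 140 MHz = 0.475).

```python
import cmath
import math

F1, F2 = 0.025, 0.475  # tone frequencies in cycles/sample (3.5 MHz and 66.5 MHz at 140 MHz)


def hamming_lowpass(n_taps: int = 31, cutoff: float = 0.15) -> list[float]:
    """Hypothetical stand-in filter: Hamming-windowed sinc, cutoff in cycles/sample (0.15 = 0.3*pi)."""
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]  # normalize to unity gain at DC


def fir(h: list[float], x: list[float]) -> list[float]:
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def tone_amplitude(x: list[float], f: float) -> float:
    """Amplitude at f from a single DFT bin; exact when x spans whole cycles of every tone."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * n) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)


h = hamming_lowpass()
skip, window_len = len(h) - 1, 400   # 400 samples = 10 cycles of F1 and 190 cycles of F2
x = [math.sin(2 * math.pi * F1 * n) + math.sin(2 * math.pi * F2 * n) for n in range(skip + window_len)]
y = fir(h, x)
x_ss, y_ss = x[skip:], y[skip:]      # drop the start-up transient of the filter
for label, sig in [("input", x_ss), ("output", y_ss)]:
    print(label, round(tone_amplitude(sig, F1), 3), round(tone_amplitude(sig, F2), 3))
```

The input shows two tones of amplitude 1, as in the left-hand plots of Figures 15 and 16; at the output the 3.5 MHz tone keeps roughly unit amplitude while the 66.5 MHz tone is strongly attenuated.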

5.c. Conversion to Fixed Point

The floating-point simulation confirms that the design behaves as a low-pass filter, so it is now time to convert it to fixed point and prepare it for HDL translation. Figure 17 shows the output in the time and frequency domains after the input and output are quantized to 12 bits. The result is clearly very different from the golden reference. This is caused by quantization error: too few bits are used to represent the inputs, filter coefficients and outputs.

Figure 17: Error in the output signal due to quantization error (fewer bits are used than necessary)

The remedy is to increase the number of bits until the quantization error falls within acceptable limits. In Figure 18 the number of bits is doubled, from 12 to 24.

Figure 18: Increasing the bit length to reduce quantization error (more bits are used)

When the design is now simulated in fixed point, the results agree with the floating-point simulation, which shows that the fixed-point conversion was successful.

Figure 19: Output corresponding to the above bit length; the output matches the earlier floating-point simulation result.
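The effect of word length on quantization error can be sketched as follows. The coefficients are hypothetical (the actual 12-tap coefficients appear only in Figure 12), and the sketch assumes a signed fractional format with one sign bit (values in [-1, 1)), round-to-nearest with saturation, full-precision products and sums, and rounding only of the input, coefficients and final output. It is not meant to reproduce Figure 17 exactly, only to show how the error shrinks as the word length grows from 12 to 24 bits.

```python
import math

H = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]  # hypothetical low-pass coefficients (sum = 1)


def to_fixed(value: float, word_bits: int) -> float:
    """Signed fractional format with word_bits bits (1 sign bit): round to nearest, saturate."""
    scale = 1 << (word_bits - 1)
    code = max(-scale, min(scale - 1, round(value * scale)))
    return code / scale


def fir(h: list[float], x: list[float]) -> list[float]:
    # Products and sums are kept at full (double) precision, like a wide accumulator.
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def max_fixed_point_error(word_bits: int, n_samples: int = 500) -> float:
    """Worst-case difference between the fixed-point output and the floating-point golden reference."""
    x = [0.5 * (math.sin(2 * math.pi * 0.025 * n) + math.sin(2 * math.pi * 0.475 * n))
         for n in range(n_samples)]
    golden = fir(H, x)
    hq = [to_fixed(c, word_bits) for c in H]            # quantized coefficients
    xq = [to_fixed(v, word_bits) for v in x]            # quantized input
    yq = [to_fixed(v, word_bits) for v in fir(hq, xq)]  # quantized output
    return max(abs(g - q) for g, q in zip(golden, yq))


for bits in (12, 24):
    print(f"{bits}-bit words: max error {max_fixed_point_error(bits):.2e}")
```

Each extra bit roughly halves the rounding step, so doubling the word length from 12 to 24 bits reduces the worst-case error by several orders of magnitude. A badly chosen binary-point position (too few fractional bits, or overflow) can make a 12-bit result far worse than this idealized sketch suggests.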

5.d. Optimization

The design is now ready to go to RTL. Because it was built with Synplify DSP, it can be optimized using folding, retiming and multichannelization. None of these options was used in the FIR design example, as shown in Figure 20. Figure 21 shows the technology and RTL specification options. Once the Run button is pressed, the RTL for the design is generated.

Figure 20: Choosing optimization options
Figure 21: Choosing implementation options

5.e. Synthesis

The design is synthesized using Synplify Pro; Figure 22 shows the result. Synthesis produced a transposed direct form implementation of the FIR filter [3], with data flowing from input to output. The design is implemented with the smallest number of multiply and accumulate (MAC) blocks, and this reduction was performed during synthesis. The transposed direct form implementation is also less immune to noise. During synthesis, the tool explored a variety of possibilities and arrived at a good solution: only 3 multipliers are used instead of 6 (12/2, owing to symmetry), because after quantization some filter coefficients are zero and some are equal to 1.

Figure 22: Synthesis result: transposed direct form implementation of the FIR filter (highly efficient and optimized)
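For reference, the transposed direct form can be modeled behaviorally as below: each input sample is multiplied by all coefficients at once, and partial sums travel through a chain of registers toward the output, so only one adder sits between registers. The sketch also counts the multipliers that remain after folding a symmetric filter and treating coefficients equal to 0 or +/-1 as wires or negations, which is one plausible reading of the "3 multipliers instead of 6" result above. The coefficients are hypothetical, and this is not Synplify Pro's algorithm.

```python
def fir_direct(h: list[float], x: list[float]) -> list[float]:
    """Reference direct form: y[n] = sum_k h[k] * x[n-k] (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def fir_transposed(h: list[float], x: list[float]) -> list[float]:
    """Transposed direct form: every coefficient multiplies the current sample, and partial
    sums move one register toward the output per sample (one adder between registers)."""
    regs = [0.0] * (len(h) - 1)  # regs[i] holds the partial sum feeding adder i
    y = []
    for sample in x:
        products = [c * sample for c in h]
        y.append(products[0] + (regs[0] if regs else 0.0))
        for i in range(len(regs) - 1):   # update toward the output; regs[i + 1] is still the old value
            regs[i] = products[i + 1] + regs[i + 1]
        if regs:
            regs[-1] = products[-1]
    return y


def multipliers_after_folding(h: list[float]) -> int:
    """Symmetric h: only the first ceil(N/2) coefficients need hardware after folding,
    and those equal to 0 or +/-1 become wires or negations instead of multipliers."""
    return sum(1 for c in h[: (len(h) + 1) // 2] if c not in (0.0, 1.0, -1.0))


h = [0.25, -0.5, 1.0, 0.0, 1.0, -0.5, 0.25]   # hypothetical quantized, symmetric coefficients
x = [float((3 * n) % 5 - 2) for n in range(50)]
print(max(abs(a - b) for a, b in zip(fir_direct(h, x), fir_transposed(h, x))))
print(multipliers_after_folding(h), "multipliers for", len(h), "taps")
```

Both structures compute the same output; the transposed form is attractive in hardware because its critical path contains one multiplier and one adder regardless of the number of taps.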

5.f. Conclusions

Although Synplify Pro is effective in reducing multiply and accumulate (MAC) operations, it cannot always detect symmetry and cannot factor out common coefficients. For example, for a multiply-accumulate expression such as

Out = 2*5 + 2*6

Synplify Pro does not factor it as

Out = 2*(5 + 6)

and therefore uses two multipliers instead of one, as shown in the following figure.

Figure 22b: Synthesis of the expression Out = 2*5 + 2*6

Moreover, when the same filter designed in section 5 was designed using Xilinx System Generator and synthesized using Synplify Pro, the result was very different. This suggests that Synplify Pro cannot take a DSP design in a vendor-independent form and synthesize it equally well. Further:
a) Not all major DSP algorithms have been implemented in Synplify DSP.
b) Only vendor library blocks can be inserted to complete the design.
c) The vendors provide few examples and limited support.
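The factoring that Synplify Pro missed is an application of the distributive law, a*b + a*c = a*(b + c). The sketch below groups product terms that share a coefficient and counts multipliers before and after; it treats the operands as symbolic signals (hypothetical names a, b and c, with the example values 2, 5 and 6). This is only a syntactic grouping, far weaker than the Taylor Expansion Diagram approach discussed in section 6.

```python
from collections import defaultdict


def factor_common_coefficients(terms: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (coefficient, operand) product terms by coefficient: a*b + a*c -> a*(b + c)."""
    groups = defaultdict(list)
    for coeff, operand in terms:
        groups[coeff].append(operand)
    return dict(groups)


def evaluate_flat(terms, env):
    """Sum of products as written: one multiplication per term."""
    return sum(env[c] * env[o] for c, o in terms)


def evaluate_factored(groups, env):
    """Factored form: one multiplication per distinct coefficient."""
    return sum(env[c] * sum(env[o] for o in ops) for c, ops in groups.items())


terms = [("a", "b"), ("a", "c")]          # Out = a*b + a*c
env = {"a": 2, "b": 5, "c": 6}            # the example values from the text
groups = factor_common_coefficients(terms)
print(len(terms), "multipliers before,", len(groups), "after")   # 2 multipliers before, 1 after
print(evaluate_flat(terms, env), evaluate_factored(groups, env))  # 22 22
```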

The tools are nascent and will continue to evolve. Although the Synplify DSP blockset, Xilinx System Generator and Altera DSP Builder become part of Simulink, they are not yet as easy to use as Simulink itself.

6. Our Approach to DSP Synthesis

6.a. What Are the Objectives?

a) To implement DSP transforms optimized in terms of DSP hardware, that is, to reduce the number of multiply, add, or multiply and accumulate (MAC) operations.
b) To do this prior to RTL synthesis, as a behavioral transformation step.

6.b. How Will This Be Done?

The goal is to take any DSP transform matrix generated in MATLAB and analyze its structure. Take, for example, the discrete cosine transform (DCT) used for image compression. The DCT operation can be written in matrix form as

Y = MX

where M is the transform matrix, X is the input and Y is the output. Regardless of the size of M, its structure is analyzed to find common entries and entries equal to 1, -1 or zero. If such entries exist, they are replaced with symbolic values. The symbolic matrix is then passed to the TEDify package [4], which exploits commonality using Taylor Expansion Diagrams [5] and generates output expressions optimized in terms of operators. Note that M need not correspond to any particular DSP transform: it can be any user-defined matrix of any size, obtained from a user-defined transformation. Once the output expressions from TEDify are obtained, the correctness of Y = MX can be verified by applying inputs and comparing the outputs.
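The matrix-analysis step described above (replace repeated entries with symbols, treat 0 and +/-1 specially, then factor) can be sketched in Python for a small case. This is an assumed, simplified stand-in for the MATLAB front end and is not TEDify's algorithm: it only groups inputs that share a coefficient magnitude within each output row, which already yields an expression of the form Y0 = C0*(x0 + x1 + x2 + x3). The 4-point orthonormal DCT-II definition, the rounding used to match equal entries, and all function names are assumptions. The last lines check Y_optimized against Y_original = MX for one input vector.

```python
import math
from collections import defaultdict


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II matrix (assumed definition): M[k][j] = c_k * cos(pi*(2j+1)*k / (2n))."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
            for k in range(n)]


def symbolize(m, digits=9):
    """Replace each entry by (sign, symbol). Equal magnitudes (after rounding) share a symbol;
    0 becomes sign 0 and +/-1 becomes symbol None, since neither needs a multiplier."""
    symbol_of, value_of, rows = {}, {}, []
    for row in m:
        sym_row = []
        for v in row:
            mag = round(abs(v), digits)
            sign = 0 if mag == 0 else (1 if v > 0 else -1)
            if mag in (0, 1):
                sym_row.append((sign, None))
                continue
            if mag not in symbol_of:
                symbol_of[mag] = f"C{len(symbol_of)}"
                value_of[symbol_of[mag]] = abs(v)
            sym_row.append((sign, symbol_of[mag]))
        rows.append(sym_row)
    return rows, value_of


def factor_rows(sym_rows):
    """y_i = sum over symbols s of C_s * (sum of +/- x_j that use C_s)."""
    factored = []
    for sym_row in sym_rows:
        groups = defaultdict(list)
        for j, (sign, name) in enumerate(sym_row):
            if sign != 0:
                groups[name].append((sign, j))
        factored.append(dict(groups))
    return factored


def multiplications(factored):
    return sum(1 for groups in factored for name in groups if name is not None)


def evaluate(factored, value_of, x):
    out = []
    for groups in factored:
        total = 0.0
        for name, terms in groups.items():
            s = sum(sign * x[j] for sign, j in terms)
            total += s if name is None else value_of[name] * s
        out.append(total)
    return out


M = dct_matrix(4)
sym_rows, value_of = symbolize(M)
factored = factor_rows(sym_rows)
x = [1.0, 2.0, 3.0, 4.0]
y_original = [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]  # Y = MX
y_optimized = evaluate(factored, value_of, x)
print(factored[0])   # row 0 groups all inputs under C0: y0 = C0*(x0 + x1 + x2 + x3)
print(multiplications(factored), "multiplications instead of", len(M) ** 2)
print(max(abs(a - b) for a, b in zip(y_original, y_optimized)))
```

For the 4-point DCT, three distinct magnitudes remain and the multiplication count drops from 16 to 6. Sharing work across rows (for example, reusing x0 - x3 in two outputs) is the kind of further optimization a TED-based tool aims at and this sketch does not attempt.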

Figure 23: Optimization approach using TEDify (blocks: M, TEDify, Y_original = MX, Y_optimized, Error)

6.c. Summary of the MATLAB-TEDify Interface

a) Generate a transformation matrix M in the MATLAB environment and perform the computation Y_original = MX.
b) Pass M to the TEDify package, which performs the optimization and returns the output Y_optimized as expressions in X.
c) Plug in values of X and check whether Y_optimized matches Y_original.

7. What Is Next?

Given output expressions from TEDify, e.g.

Y0 = C0*(x0 + x1 + x2 + x3)

such code can easily be converted to RTL, and the conversion can be automated with a dedicated script. Once the RTL code has been generated, it can be synthesized, and the result can be expected to be optimized in terms of hardware resources.
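A script of the kind suggested above could start as small as the hypothetical Python sketch below, which turns one factored output expression into a Verilog-style continuous assignment. The input data structure is an assumption, not TEDify's actual output format, and a real generator would also have to emit module ports, word lengths, fixed-point scaling and constant definitions for C0, C1, and so on.

```python
def emit_assign(output: str, groups: dict) -> str:
    """Render one factored expression as a Verilog-style continuous assignment.

    groups maps a coefficient name (None meaning a coefficient of 1) to a list of
    (sign, input_name) pairs, e.g. {"C0": [(1, "x0"), (1, "x1")]} -> C0 * (x0 + x1).
    """
    products = []
    for coeff, terms in groups.items():
        sum_expr = ""
        for sign, name in terms:
            if not sum_expr:
                sum_expr = name if sign > 0 else f"-{name}"
            else:
                sum_expr += f" + {name}" if sign > 0 else f" - {name}"
        products.append(f"({sum_expr})" if coeff is None else f"{coeff} * ({sum_expr})")
    return f"assign {output} = {' + '.join(products)};"


print(emit_assign("y0", {"C0": [(1, "x0"), (1, "x1"), (1, "x2"), (1, "x3")]}))
# assign y0 = C0 * (x0 + x1 + x2 + x3);
print(emit_assign("y1", {"C1": [(1, "x0"), (-1, "x3")], "C2": [(1, "x1"), (-1, "x2")]}))
# assign y1 = C1 * (x0 - x3) + C2 * (x1 - x2);
```

Because the factored form is emitted directly, the generated RTL contains one multiplier per distinct coefficient in each expression, and logic synthesis then only has to map the adders and multipliers rather than rediscover the factoring.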

8. MATLAB Framework for TEDify

Figure 24 shows the MATLAB framework for TEDify, in which one can choose a well-known DSP transform or generate a user-defined transform, which is eventually passed to TEDify.

Figure 24: Menu for DSP transform generation (still under development)

Once the user chooses the transform type, he or she is asked for the transform matrix size, as shown in Figure 25.

Figure 25: Menu for DSP transform size

Once the user chooses the transform and the transform size (Figures 24 and 25), two files are generated. One of them contains symbolic values in matrix form, that is, as rows and columns; Figure 26 shows such a file. The first line describes the transform type, e.g. DCT; the second line specifies the size of the transform, e.g. 16 by 16; the matrix of symbolic values follows.

Figure 26: Symbolic constants generated for a 16 by 16 DCT; this file is the input to TEDify.

9. Conclusions and Future Work

We have seen that our MATLAB-TEDify interface can do what FPGA vendor tools do. Moreover, integrating it with any vendor tool would greatly help the vendor improve hardware optimization. The menu of the MATLAB-TEDify interface is still under development and will be expanded to cover FIR/IIR filtering and any other user-defined transform. An interface will be developed for TEDify to export its results to MATLAB, and another interface has to be developed to generate RTL.

Acknowledgements

I am very grateful to Prof. Maciej Ciesielski for his constant feedback and support. I also want to thank Daniel Gomez-Prado and Jeremie Guillot for their help in understanding the TEDify package.

References
1) "Simplifying DSP Hardware Development within a MATLAB-based Design Flow," Compiler magazine, Synopsys. http://www.synopsys.com/news/pubs/compiler/art2_easypath-sep05.html
2) "FPGAs: Fast track to DSP," white paper, Mentor Graphics.
3) Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, Discrete-Time Signal Processing, Prentice Hall, 1999.
4) TEDify Package, http://tango.ecs.umass.edu/ted/doc/html/
5) D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot, E. Boutillon, "High level transformations using Taylor Expansion Diagrams," DATE 2006.
6) Allen Kinast, "Designing Digital Signal Processing with FPGAs," Mentor Graphics, Feb. 2003.
7) Douang Phanthavong, Manish Bansal, Mandar Chitnis, D.J. Wang, "Optimization techniques for efficient implementation of DSP on FPGAs," Mentor Graphics.