Chapter 9: Integration of Full ASIP and its FPGA Implementation

9.1 Introduction

A top-level module has been created for the ASIP in VHDL, in which all the blocks have been instantiated at the Register Transfer Level (RTL). Appropriate VHDL code has been written for all the memories (control, data and program memory) so that they infer the BlockRAMs available on the FPGA. A constraint file was created to initialize all the inferred BlockRAMs. The next step is to synthesize the fully integrated ASIP, translating the RT-level code into a gate-level netlist and optimizing it for the target FPGA. Synthesis and implementation have been done in Xilinx's Project Navigator, which is the primary user interface for the Integrated Software Environment (ISE). Each step of the design process, from design entry to downloading of the design onto the FPGA chip, has been managed in the Xilinx Project Navigator environment. The FPGA design flow is shown in Figure 9.1.

9.2 FPGA Implementation

The steps for implementation of the ASIP in the FPGA are given below.

9.2.1 Design and Constraints Entry

A top-level design is created in which all the functional blocks/units used in the ASIP micro-architecture, including the BlockRAMs that are used for implementing the memories, are instantiated and interconnected. As described in Chapter 7, all the individual functional blocks had earlier been designed in VHDL at RT-level and functionally verified. Besides functional verification, they were also individually synthesized (as a trial) to check their gate counts and whether the synthesized gate-level circuit met the clock speed requirement of 25 MHz. The top-level design goes as an input to the synthesis tool. A User Constraint File (UCF) was also created to initialize all the memories, i.e. the control store, the data memories (integer and floating-point) and the program memory.
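As an illustration of the memory initialization described in Section 9.2.1, the following is a minimal Python sketch of how 16-bit words might be packed into the 256-bit INIT strings of one 4-kbit BlockRAM and written out as UCF-style constraint lines. It is not the constraint file actually used for the ASIP. The 256 x 16 organization, the placement of the lowest address in the least significant digits, and the instance name prog_mem are all assumptions made for the example.

```python
# Sketch only: pack 16-bit words into INIT_00..INIT_0F strings for one
# 4-kbit BlockRAM, assumed here to be organized as 256 words x 16 bits.
# The instance name passed to ucf_lines() is hypothetical.

WORD_BITS = 16
WORDS_PER_INIT = 256 // WORD_BITS   # 16 words fit in one 256-bit INIT string
NUM_INIT = 16                       # 16 strings x 256 bits = 4096 bits


def init_strings(words):
    """Return 16 hex strings (64 hex digits each) for up to 256 words."""
    capacity = WORDS_PER_INIT * NUM_INIT
    if len(words) > capacity:
        raise ValueError("too many words for one 4-kbit BlockRAM")
    padded = list(words) + [0] * (capacity - len(words))
    rows = []
    for r in range(NUM_INIT):
        chunk = padded[r * WORDS_PER_INIT:(r + 1) * WORDS_PER_INIT]
        # Assumption: the lowest address occupies the rightmost hex digits.
        rows.append("".join(f"{w & 0xFFFF:04X}" for w in reversed(chunk)))
    return rows


def ucf_lines(instance, words):
    """Format the INIT strings as UCF-style INST ... INIT_xx lines."""
    return [f'INST "{instance}" INIT_{r:02X} = {s};'
            for r, s in enumerate(init_strings(words))]
```

A call such as ucf_lines("prog_mem", program_words) would return sixteen lines, one per INIT attribute, which could then be checked against the tool documentation before being placed in a constraint file.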
9.2.2 Design Verification through Functional Simulation

Functional simulation tests the functional correctness of the design. Time is saved in subsequent design steps if functional simulation is performed early in the design flow. Functional tests were run on the integrated design to verify the correctness of its functionality before final implementation onto the FPGA. The test data consisted of each individual instruction (with several different operand values) and different test sequences of instructions. The instructions were placed in the program memory through its initialization, and the data was similarly placed in the appropriate data memories. The program counter too was initialized. The ASIP design was then simulated for a number of clock cycles, and its output (the state of the data in the data memory) was compared against the corresponding output of the instruction set simulator after execution of each instruction.

[Figure 9.1: FPGA Design Flow. Flow stages: Design Entry and Constraints Entry; Design Synthesis; Design Implementation; Download to a Xilinx Device. Design Verification stages: Functional Verification through Simulation; Static Timing Analysis; Back Annotation; Timing Simulation; In-circuit Verification.]
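The comparison against the instruction set simulator described in Section 9.2.2 can be sketched as follows. This is a minimal Python illustration, not the scripts actually used. It assumes that both the RTL simulation and the instruction set simulator can dump the data memory after every instruction as an address-to-value mapping; the function name and trace format are hypothetical.

```python
def first_mismatch(rtl_trace, iss_trace):
    """Compare per-instruction data-memory snapshots from two sources.

    Each trace is a list of dicts {address: value}, one dict per executed
    instruction (an assumed dump format). Returns None when the traces
    agree, otherwise (instruction_index, address, rtl_value, iss_value)
    for the first difference found.
    """
    if len(rtl_trace) != len(iss_trace):
        raise ValueError("traces cover a different number of instructions")
    for index, (rtl, iss) in enumerate(zip(rtl_trace, iss_trace)):
        # Check every address that appears in either snapshot.
        for addr in sorted(set(rtl) | set(iss)):
            if rtl.get(addr) != iss.get(addr):
                return (index, addr, rtl.get(addr), iss.get(addr))
    return None
```

Reporting the first differing instruction, rather than only a pass/fail result, points directly at the instruction whose execution diverged.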
9.2.3 Synthesis

After the design was successfully verified functionally, the next step was to translate the design into a gate-level netlist and optimize it for the target FPGA. This was achieved through design synthesis, in which the top-level design description (comprising instantiations and interconnections of RT-level functional blocks) was translated into a structural netlist and optimized for the FPGA device. Synthesis was done with XST (Xilinx Synthesis Technology), a Xilinx tool that synthesizes HDL designs to create a gate-level netlist. XST was invoked from within the Project Navigator. Post-synthesis simulation was also carried out on the generated gate-level netlist to check its correctness.

9.2.4 Design Implementation

The next step in the FPGA design flow is design implementation. Design implementation begins with the mapping of a logical design file to a specific device and is complete when the physical design is successfully routed and a bitstream is generated. The implementation of the design consists of taking the synthesized netlist through translation, mapping, place and route, and finally configuration, as shown in Figure 9.2. Static Timing Analysis (STA) was performed to check whether the design met the timing requirement of 25 MHz (the clock frequency constraint). Post-place-and-route simulation was also carried out after back annotation of delays, by including the SDF (Standard Delay Format) file. The FPGA implementation steps shown in Figure 9.2 are described below:

1) Translation: Performs all the steps necessary to read a gate-level netlist in XNF (Xilinx Netlist Format) and creates a Xilinx Native Generic Database (NGD) file that describes the logical design in terms of logic elements such as AND gates, OR gates, decoders, flip-flops and RAMs. This output NGD file can be mapped to the desired device family.
2) Mapping: Maps a logical design to a Xilinx FPGA. The input to mapping is the NGD (Native Generic Database) file generated by the translation step, which contains a logical description of the design in terms of both the hierarchical components used to develop the design and the lower-level Xilinx primitives. Mapping assigns the logic to the components (logic cells, I/O cells and other components) of the target FPGA device. The output of mapping is an NCD (Native Circuit Description) file, which is a physical representation of the design mapped to the components in the Xilinx FPGA.

3) Placement and Routing: Places and routes the mapped NCD file. For FPGAs, the PAR command line program takes the mapped NCD file as input, places and routes the design, and outputs a placed and routed NCD file, which is used by the bitstream generator, BitGen.

4) Bitstream Generation or Configuration: For FPGAs, the BitGen command line program produces a bitstream for Xilinx device configuration. BitGen takes a fully placed and routed NCD file as its input and produces a configuration bitstream, a binary file with a .bit extension. The BIT file contains all of the configuration information from the NCD file defining the internal logic and interconnections of the FPGA, plus device-specific information from other files associated with the target device.

[Figure 9.2: FPGA Implementation Flow. Blocks shown: XNF engine, translation, mapping, placement and routing, configuration.]
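The hand-off of files between the four implementation steps can be summarized in a small Python sketch. It only models the chain of intermediate files described above and does not invoke any tool. The tool names NGDBuild and MAP are the usual ISE command line programs for the first two steps and are not named in the text above, so they should be read as assumptions here; PAR and BitGen are as described in steps 3 and 4.

```python
from collections import namedtuple

# One entry per implementation step: what it reads and what it writes.
Stage = namedtuple("Stage", "name tool consumes produces")

FLOW = [
    Stage("Translation", "NGDBuild", "netlist", "NGD"),            # tool name assumed
    Stage("Mapping", "MAP", "NGD", "NCD (mapped)"),                # tool name assumed
    Stage("Placement and Routing", "PAR", "NCD (mapped)", "NCD (routed)"),
    Stage("Bitstream Generation", "BitGen", "NCD (routed)", "BIT"),
]


def check_chain(flow):
    """True when every stage consumes exactly what the previous one produced."""
    return all(a.produces == b.consumes for a, b in zip(flow, flow[1:]))


def describe(flow):
    """Return the file hand-off chain as a single readable string."""
    return " -> ".join([flow[0].consumes] + [s.produces for s in flow])
```

Writing the flow as data makes the dependency explicit: the bitstream can only be regenerated after every earlier stage has been rerun on the changed netlist.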
9.2.5 Overall Synthesis Results of the ASIP

The following results were obtained from the synthesis of the ASIP design:

Equivalent gate count: 286,713
Number of BlockRAMs used: 15 out of 16 (each of size 4 kb)
Minimum clock period (clock cycle duration): 36.129 ns

Since a 25 MHz clock has a period of 40 ns, the minimum clock period of 36.129 ns meets the clock speed requirement.

9.3 Implementation of a Real-time Demonstration System for the ASIP

The developed ASIP is a computational engine implementing Klatt's parametric speech synthesizer. It computes 60 speech samples (of 16 bits each) as output from a parametric frame (consisting of 60 parameter values of 16 bits each) as input, in real time (within 5 milliseconds). A stand-alone ASIP thus appears only as a number cruncher. It was therefore considered more interesting to rig a live text-to-speech conversion system around the ASIP for demonstration. Such a demonstration system must provide facilities for the following tasks:

a) A means of Hindi text entry.
b) Conversion of the entered Hindi text into a sequence of parameter frames (60 parameter values per frame).
c) Transfer of parameter frames to the ASIP, when requested by the ASIP. The ASIP implements Klatt's parametric speech synthesizer in real time, at a parameter frame rate of 200 frames per second.
d) Transfer of computed speech samples from the ASIP to an audio-codec (whose output is connected to the speaker) for playing the speech.

Such a demonstration system was rigged using a PC and a specially selected commercial FPGA-based prototyping board. Besides the FPGA (XSB-300E, which could accommodate the ASIP design), the board also contained a programmable clock generator, 256 KB of SRAM and an audio-codec, all of which were required for developing the demonstration system.

The task of Hindi text entry was handled through the PC keyboard (and the PC-resident software).
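The timing figures quoted in Sections 9.2.5 and 9.3 can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in this chapter (25 MHz clock, 36.129 ns minimum period, 60 samples per 5 ms frame); the derived quantities, such as the implied output sample rate and the number of clock cycles available per frame, are consequences of those numbers rather than figures reported by the synthesis tool.

```python
CLOCK_HZ = 25_000_000        # clock supplied to the FPGA
MIN_PERIOD_NS = 36.129       # minimum clock period reported by synthesis
SAMPLES_PER_FRAME = 60       # speech samples computed per parameter frame
FRAME_PERIOD_US = 5_000      # real-time budget per frame (5 ms)


def timing_summary():
    """Derive clock slack and real-time budgets from the quoted figures."""
    target_period_ns = 1e9 / CLOCK_HZ
    frames_per_second = 1_000_000 // FRAME_PERIOD_US
    cycles_per_frame = CLOCK_HZ * FRAME_PERIOD_US // 1_000_000
    return {
        "target_period_ns": target_period_ns,
        "slack_ns": target_period_ns - MIN_PERIOD_NS,
        "max_clock_mhz": 1e3 / MIN_PERIOD_NS,
        "frames_per_second": frames_per_second,
        "samples_per_second": SAMPLES_PER_FRAME * frames_per_second,
        "cycles_per_frame": cycles_per_frame,
        "cycles_per_sample": cycles_per_frame / SAMPLES_PER_FRAME,
    }
```

Under these figures the ASIP has 125,000 clock cycles in which to compute each frame of 60 samples, and the frame rate of 200 frames per second implies an output rate of 12,000 samples per second.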
The task of converting the entered Hindi text into a sequence of parameter frames was also performed on the PC, through software developed for the purpose by another research team at CEERI-Delhi Centre.

The PC-generated sequence of parameter frames was transferred to the ASIP in two stages. The frames were first transferred from the PC to the SRAM available on the proto-board, using the software tool GXSLOAD, which was available as part of the software tool kit supplied by the vendor of the board. They were then transferred from the SRAM to the ASIP with the help of a specially designed interface controller, one of whose functions was to interface the SRAM available on the proto-board to the ASIP input port. The other function of the interface controller was to interface the output port (16-bit parallel) of the ASIP to the input port (serial) of the audio-codec available on the proto-board.

For the demonstration system, the circuit implemented in the FPGA comprises the following three components and their interconnections:

1) The ASIP.
2) The interface controller.
3) STARTUP_VIRTEX.

The STARTUP_VIRTEX component (a component from the Xilinx library) allows the use of an external 'reset' signal. In this case, the signal is wired to pin 122 of the FPGA, which on the proto-board is wired to a DIP switch identified as S5-8, where S5 is a set of 8 DIP switches. When this switch (S5-8) is moved to the position marked 'ON', the system is reset and stays in reset until the switch is moved to the other position. The STARTUP_VIRTEX component also allows the use of the user clock for the start-up phase of the configuration. This allows precise synchronization of the timing signals generated from the user clock. This feature is also made use of in the design.

The interface controller component is responsible for interfacing the ASIP with the external SRAM and the audio-codec available on the proto-board. It reads parameters from the SRAM when the ASIP requests a new parameter and transfers them to the ASIP. It also takes the speech samples (16 bits each) generated by the ASIP, buffers them in a simple double buffer and sends them as a serial stream to the audio-codec at the appropriate instants. It also generates the 3 clocks required by the audio-codec, using synchronous counters to divide the clock supplied to the FPGA. The audio-codec requires data in 2's complement representation and is used in its default mode, which is 16 bits, MSB first. It may be mentioned here that the FPGA is supplied with a 25 MHz clock by programming the clock generator on the proto-board.
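Two of the interface controller functions described above, the MSB-first serialization of 16-bit 2's complement samples and the derivation of slower clocks with a synchronous counter, can be illustrated by a behavioural model. The following Python sketch is not the VHDL of the interface controller. The function names are hypothetical, and the division ratio in the usage note is an arbitrary example, since the ratios used for the three codec clocks are not given in this chapter.

```python
def to_twos_complement_16(sample):
    """Encode a signed sample as a 16-bit 2's complement word."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample does not fit in 16 bits")
    return sample & 0xFFFF


def serialize_msb_first(sample):
    """Return the 16 bits of a sample in transmission order, MSB first."""
    word = to_twos_complement_16(sample)
    return [(word >> bit) & 1 for bit in range(15, -1, -1)]


def divided_clock(divide_by, n_input_cycles):
    """Model a synchronous counter dividing a clock by an even factor.

    Returns the output level after each rising edge of the input clock.
    The output toggles every divide_by / 2 input cycles.
    """
    if divide_by < 2 or divide_by % 2:
        raise ValueError("this sketch handles even division ratios only")
    half = divide_by // 2
    count, level, levels = 0, 0, []
    for _ in range(n_input_cycles):
        count += 1
        if count == half:
            count = 0
            level ^= 1
        levels.append(level)
    return levels
```

For example, serialize_msb_first(-1) yields sixteen 1 bits, and divided_clock(4, 8) yields one output period for every four input cycles.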