Accelerating Nios II Systems with the C2H Compiler Tutorial
|
|
- Sophia Allen
- 6 years ago
- Views:
Transcription
1 Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008, Version 8.0 Tutorial Introduction The Nios II C2H Compiler is a powerful tool that generates hardware accelerators for software functions. The C2H Compiler enhances design productivity by allowing you to use a compiler to accelerate software algorithms in hardware. You can quickly prototype hardware functional changes in C, and explore hardware-software design tradeoffs in an efficient, iterative process. The C2H Compiler is well suited to improving computational bandwidth as well as memory throughput. It is possible to achieve substantial performance gains with minimal engineering effort. This tutorial teaches you how to use the C2H Compiler to accelerate a fast Fourier transform (FFT), yielding a large performance gain over a purely software based approach. Table of Contents Introduction...1 Prerequisites... 2 Hardware & Software Requirements... 2 Getting the Hardware and Software Files... 2 FFT Background...3 Analyzing the FFT Code...4 Creating a Build Report... 4 Performance Metrics... 5 Unoptimized Accelerator...7 Optimizing the FFT...8 Adding On-Chip Buffers... 8 Building the Accelerator... 9 Optimized Performance Metrics Downloading and Running the Accelerated System Bottlenecks...12 Computational Bottlenecks Memory Bottlenecks Narrow Memory Access Random Access to SDRAM Multiple Master Port Memory Stalls Restructuring Code to Optimize the Accelerator...14 Data Buffering Optimizations Fast Memory Double Buffering Master Port Minimization Sine and Cosine Data Buffering Bit Reversal Buffering SDRAM Memory Access Optimizations Calculation Stage Optimizations Conclusion...19 Altera Corporation 1 TU-N2C2H-1.3
2 Introduction Prerequisites To make effective use of this tutorial, you should be familiar with the following topics: ANSI C syntax and usage Defining and generating Nios II hardware systems with SOPC Builder Compiling Nios II hardware systems with the Altera Quartus II development software Creating, compiling, and running Nios II software projects Nios II C2H Compiler theory of operation f To familiarize yourself with the basics of the C2H Compiler, refer to the Nios II C2H Compiler User Guide, especially chapters Introduction to the C2H Compile, and Getting Started Tutorial. To learn about defining, generating and compiling Nios II systems, refer to the Nios II Hardware Development Tutorial. To learn about Nios II software projects, refer to the Nios II Software Development Tutorial, available in the Nios II IDE help system. Hardware & Software Requirements This tutorial requires you to have the following software and hardware: Quartus II development software version 7.2 or later, installed on a Windows or Linux computer. Nios II Embedded Design Suite (EDS) version 7.2 or later One of the following Nios II development boards: Stratix II Edition Cyclone II Edition A JTAG download cable compatible with your target hardware, such as a USB-Blaster cable. Getting the Hardware and Software Files The tutorial software files are available on the Nios II literature page. A link to the software files appears next to Accelerating Nios II Systems with the C2H Compiler Tutorial (this document), at The hardware and software files are distributed in a zip file. Extract the design files included in the file c2h_tutorial.zip to a new directory on your host computer. Be sure to recreate the directory structure on your local file system (for example, turn on Use folder names in the WinZip application). If you are targeting a Cyclone II development board, the files in the c2h_fft_cyclone_ii subdirectory are relevant to you. If you are targeting a Stratix II development board, the files in the c2h_fft_stratix_ii subdirectory are relevant. The remainder of this document refers to the relevant directory as <tutorial install dir>. The <tutorial install dir> folder contains a Quartus II project and a software folder. The software folder contains two subdirectories: c2h_fft and c2h_fft_syslib. c2h_fft is the main software project and contains the following files: sw_only_fft.c, sw_only_fft.h These files implement a 256-point radix-two FFT that can run on a Nios II processor without any hardware acceleration. 2 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
3 FFT Background accelerator_optimized_fft.c, accelerator_optimized_fft.h These files implement a 256-point radix-two FFT that is functionally equivalent to the FFT defined in sw_only_fft.c. These files have been modified to run optimally in a hardware accelerator based system. For details about the optimizations, see the Restructuring Code to Optimize the Accelerator section. pound_defines.h This file contains macros used by the FFT as well as pragma directives passed to the C2H Compiler. the_top_file.c This file contains function main(). This is the top-level benchmark which runs both the software only and accelerated FFT functions and compares the performance of the two approaches. testdata.dat, results.dat These files contain the input data to the FFT and the expected output results. twiddles.dat This data file contains precalculated sine and cosine terms used in the software FFT calculation. To import the software projects, perform the following steps: 1. Launch the Nios II IDE, and import the application and system library projects. a. Click Import in the Nios II IDE File menu. b. Select Existing Altera Nios II Project into Workspace, and click Next. c. Browse to the <tutorial install dir>/software directory, select the c2h_fft folder, and click OK. d. Click Finish. e. Repeat the above steps for the c2h_fft_syslib project.! If the import dialog box does not automatically locate the SOPC Builder system file, browse to the <tutorial install dir> directory, select FFT_system.ptf, and click OK. 2. The C2H Compiler ignores the Configuration setting. FFT Background The fast Fourier transform (FFT) is a highly efficient method for calculating the discrete Fourier transform (DFT). The DFT is used in signal processing applications for a range of purposes, such as analyzing the frequency components of signals and data compression. The DFT is a computationally intensive function. A naïve (non-fft) implementation of an n-point DFT requires n 2 complex multiplications. The FFT algorithm achieves its efficiency gains by decomposing the DFT into a number of smaller DFTs and exploiting the symmetry and periodicity of the sub stages to reduce the number of calculations. An n-point FFT only requires n log 2 n complex multiplications. Cutting down the number of complex multiplications improves the FFT performance, often by several orders of magnitude, depending on the order of the transform. A full description of the FFT algorithm is beyond the scope of this tutorial. Here are some basic facts about the FFT algorithm to be aware of: The FFT operates on complex data. It performs calculations simultaneously on real and imaginary components of the data. The algorithm implements complex multiplication as four multiplications, one addition and one subtraction. Altera Corporation 3 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
4 Analyzing the FFT Code One of the fundamental operations in the FFT algorithm is the butterfly calculation. The butterfly calculation either breaks a larger DFT into smaller DFTs, or recombines smaller DFTs into a larger. The name butterfly comes from the shape of the dataflow diagram describing the operation. f You can find more information at under "Butterfly (FFT algorithm)". The FFT function uses a technique called bit reversal to rearrange the input points so that the outputs are in the correct order. Conventional software FFT implementations obtain some of their speed by pre-calculating sine and cosine terms used in the butterfly calculations. These sine and cosine terms are called twiddle factors. The example design included with this tutorial is based on a radix-two implementation of the FFT function. This a decimation-in-time FFT, which decomposes the original 256-point DFT into two 128-point DFTs, which it then breaks down to four 64-point DFTs, and so on, until ultimately it evaluates 128 two-point DFTs. Analyzing the FFT Code A typical first step with the C2H Compiler is to accelerate the C function without restructuring the code. This approach rarely yields optimal performance. However, it provides performance metrics which allow you to identify the system bottlenecks. Creating a Build Report The C2H Compiler provides tools to help you analyze your C code. Carry out the following steps to see an analysis of the FFT function. 1. Launch the Nios II IDE if it is not already running. 2. Open the file sw_only_fft.c in the application project. 3. Highlight the function name software_only_fft, right-click, and click Accelerate with the Nios II C2H Compiler, as shown in Figure The Nios II IDE displays the C2H view. Expand the folders labeled c2h_fft and sw_only_fft(), and select Use Software implementation and Analyze all accelerators. Make sure the settings are as follows: Use software implementation for all accelerators Use hardware accelerator in place of software implementation. Flush data cache before each call The message Build report cannot be displayed is normal at this point. 5. Right click the c2h_fft application project, and then click Build Project. As part of the build process, the Nios II IDE performs the following steps: Analyzes the function to be accelerated, determining the mapping from C constructs to hardware, and computing performance metrics. Compiles the software executable, using the software implementation of software_only_fft(). Depending on the speed of your platform, this can take ten to twenty minutes. While you wait for the build to complete, you might wish to read ahead in the Unoptimized Accelerator section. This section describes how the C2H Compiler accelerates software_only_fft() without optimizations. 4 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
5 Analyzing the FFT Code Figure 1. Accelerating the Function! The Nios II compiler displays the following message: ignoring #pragma altera_accelerate connect_variable. You can disregard this warning. The C2H Compiler uses the altera_accelerate pragma to limit the number of master ports, as described in the Restructuring Code to Optimize the Accelerator section. It has no meaning for the Nios II compiler. Performance Metrics Although the C2H Compiler can accelerate unmodified ANSI C code, you typically need to modify the code to build a fully optimized hardware design. To gain insight into which portions of the function might need optimization, look at the build report generated by the C2H Compiler. The build report appears in the C2H view after you build the project. To see the performance metrics, expand the folders labeled Build Report, Performance, and The accelerated function contains 5 loops, and then expand each folder labeled file:../sw_only_fft.c line:n Loop CPLI=m, as shown in Figure 2 on page 6. Altera Corporation 5 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
6 Analyzing the FFT Code The most important performance metrics are: Cycles per loop iteration (CPLI) the number of clock cycles per loop iteration in the best case. The lowest possible value of CPLI is 1. This means that an iteration of the loop occurs every clock cycle, assuming no stalling for inner loops or memory access. A loop with CPLI = 1 has the maximum throughput possible without fundamental restructuring beyond the scope of this tutorial. Loop latency the initial overhead when the accelerator enters the state machine implementing a C loop. The accelerator must fill its pipeline before the first result is ready, and the loop latency is the number of clock cycles it requires to do so. Figure 2. Performance Metrics for Non-Optimized FFT 6 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
7 Unoptimized Accelerator The results in Figure 2 show that most of the values of loop latency and CPLI are high. The next section provides an overview of how the C2H Compiler generates the unoptimized accelerator. Unoptimized Accelerator The C2H Compiler translates C constructs to their hardware equivalents in a straightforward way. C code is usually designed assuming serial execution on a CPU, and therefore is not optimal for parallel execution on hardware. The FFT function uses the following operations: Memory Access Multiplication Division Addition and subtraction Bit Shifting Iteration with counter All of the operations listed above translate to hardware constructs. f For further information about C2H hardware transforms, refer to the chapter C-to-Hardware Mappings Reference in the Nios II C2H Compiler User Guide. Figure 3 illustrates the system that the C2H Compiler builds when you accelerate the unoptimized FFT function. Figure 3. Hardware Accelerated System Nios II SDRAM FFT Accelerator In the next section, you accelerate the FFT algorithm with optimizations for the C2H Compiler. Altera Corporation 7 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
8 Optimizing the FFT Optimizing the FFT This tutorial includes a modified version of the FFT algorithm, accelerator_optimized_fft(). It is optimized according to techniques described later in this tutorial. In this section, you perform the following steps: Accelerate the function accelerator_optimized_fft() Build the software project Compile the hardware design, which includes the accelerator Run the accelerated FFT, comparing its performance with the software implementation The following sections guide you through the process of creating the optimized accelerator. Adding On-Chip Buffers Perform the following steps to prepare the hardware project for optimized acceleration: 1. Start the Quartus II development software, and open the Quartus II project (2s60_fft_acceleration.qpf or 2c35_fft_acceleration.qpf). 2. Start SOPC Builder. 3. Add four on-chip memories to the SOPC Builder system with the following properties: Memory Type = RAM Dual-Port Access enabled Memory Width = 16 bits Total Memory Size = 512 bytes Read Latency = 1 (this applies to both slave ports) The accelerator uses these on-chip memories to buffer the input and output data for the FFT. The Restructuring Code to Optimize the Accelerator section discusses the reasons for the memory settings. 4. Give the memories the following names: BufferRAM1 BufferRAM2 BufferRAM3 BufferRAM4 5. Add two on-chip memories to the system with the following properties. Memory Type = RAM Dual Port Access disabled Memory Width = 16 bits Total Memory Size = 512 bytes Read Latency = 1 8 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
9 Optimizing the FFT The accelerated code uses these on-chip buffers to store sine and cosine terms. The Restructuring Code to Optimize the Accelerator section discusses the reasons for the memory settings. 6. Give these memories the following names: CosRAM SinRAM 7. Disconnect all on-chip memory slave ports. By default, SOPC Builder connects the on-chip memories to the Nios II processor's instruction and data master ports. The C2H Compiler connects the memory's slave ports to the accelerator at build time. 8. Set each memory's base address as shown in Table 1. In the case of dual-port memories, set both slave ports to the same base address. Be sure to lock the base address of all on-chip memories in the system. Table 1. Memory Addresses BufferRAM1 BufferRAM2 BufferRAM3 BufferRAM4 CosRAM SinRAM Memory Name 0x x x x x x00000A00 Base Address! It is important to use the exact names shown in Table 1, because the FFT source code refers to them explicitly. 9. Verify that your system resembles the system depicted in Figure 4. Disregard the messages stating that the slave ports are not connected to any master port. The C2H Compiler connects them later, when it builds the accelerator. 10. Exit SOPC Builder, making sure to save the system when prompted. You do not need to generate the SOPC Builder system at this point. The C2H Compiler generates it for you after it builds the accelerator. Building the Accelerator Perform the following steps to build the optimized accelerator. 1. Launch the Nios II IDE, if it is not already running. 2. Remove the accelerator from software_only_fft(). a. In the C2H view, select the function name, right-click, and click Remove C2H. The Nios II IDE prompts you: Do you really want to remove the function "software_only_fft()" from the list of functions to accelerate? b. Click Yes. A message appears saying The accelerator has been removed. Rebuild the project to update the SOPC Builder system. c. Click OK. Altera Corporation 9 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
10 Optimizing the FFT Figure 4. SOPC Builder System 3. In the file accelerator_optimized_fft.c, accelerate the function accelerator_optimized_fft(), as you accelerated software_only_fft()in the Creating a Build Report section on page 4. This time, leave Build software, generate SOPC Builder system, and run Quartus II compilation selected (the default). 4. Expand the c2h_fft and accelerator_optimized_fft() folder icons in the C2H view, and make sure the following settings are turned on: Build software, generate SOPC Builder system, and run Quartus II compilation Use hardware accelerator in place of software implementation. Flush data cache before each call 5. Build the application project again. As part of the build process, the Nios II IDE performs the following steps: Analyzes the function to be accelerated, determining the mapping from C constructs to hardware, and computing performance metrics 10 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
11 Optimizing the FFT Integrates the accelerator into the SOPC Builder system Generates the HDL Compiles the system in Quartus II Creates a wrapper function to invoke the accelerator Compiles the software executable The build process might take 20 to 40 minutes, depending on the speed of your platform. While you wait, you might wish to read ahead in the Bottlenecks and Restructuring Code to Optimize the Accelerator sections. These sections describe how accelerator_optimized_fft() is optimized to improve the accelerator performance.! The Nios II compiler displays the following message: ignoring #pragma altera_accelerate connect_variable. You can disregard this warning. The C2H Compiler uses the altera_accelerate pragma to limit the number of master ports, as described in the Restructuring Code to Optimize the Accelerator section. It has no meaning for the Nios II compiler. Optimized Performance Metrics After you accelerate the function, the Nios II IDE displays a new set of performance metrics in the C2H view. The new performance metrics show that the latency of most loops is lower, and CPLI=1 for each for loop in the design. Even though the calculation stage consists of three nested loops, each loop has CPLI=1, minimizing stalling of the outer loops. To review the accelerator that the C2H Compiler has added to the SOPC Builder System, open the design in SOPC Builder. The system connections and on-chip memory base addresses appear. You might notice that the on-chip RAM slave ports are not visibly connected. This is normal. SOPC Builder hides accelerator master ports, because they are often so numerous that the connection grid is unreadable. You cannot use SOPC Builder to edit master-slave connections inserted by the C2H Compiler. Downloading and Running the Accelerated System Perform the following steps to download and run the accelerated system. 1. After the compilation has finished, download the resulting FPGA configuration file (.sof) to the development board using the Quartus II programmer. 2. Return to the Nios II IDE, right-click the c2h_fft application project, point to Run As, and click Nios II Hardware to run the software project on the development board. The example executes 1000 iterations of the unaccelerated and accelerated FFT functions, and verifies that the output data from the last run of each is valid. After the software is finished, the results of the FFT benchmark appear in the Console view of the Nios II IDE. The console output for the Nios II Cyclone II development board resembles Figure 5. Altera Corporation 11 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
12 Bottlenecks Figure 5. FFT Benchmark Sample Output FFT Benchmark Starting (this will take up to 20 seconds) - Running 1000 iterations for both software and hardware. - Each iteration runs a 256 point radix 2 FFT transformation. --Performance Counter Report-- Total Time: seconds ( clock-cycles) Section % Time (sec) Time (clocks) Occurrences Software Only HW Accelerated The software only output data is correct The hardware accelerated output data is correct The report details the performance of the FFT on the Nios II processor using unaccelerated software, followed by the performance results from the FFT accelerated with the C2H Compiler. In the example shown in Figure 5, the C2H Compiler has improved the performance of the FFT calculation by a factor of approximately 16. The remainder of this tutorial describes the techniques used to achieve this performance improvement. Bottlenecks This tutorial shows three common types of performance bottlenecks which can occur in unoptimized accelerators, discussed in the following sections: Computational Bottlenecks Memory Bottlenecks Multiple Master Port Memory Stalls Computational Bottlenecks The FFT is subject to a common type of computational bottleneck caused by mismatched data widths. software_only_fft(), designed for a Nios II system with plenty of memory and a 32-bit data path, uses 32- bit signed data types to avoid overflow and underflow. However, when you implement a design in hardware, wide data paths consume more logic than narrow ones, which can reduce f MAX for the entire design. When accelerating software functions in hardware, it is best to tailor the data width to your exact data range requirements. Selection of signed or unsigned data types also plays a role in the performance of the design. When possible, use unsigned values. Converting an unsigned value to signed is trivial, but the opposite conversion requires extra logic. Memory Bottlenecks Memory bottlenecks cause the largest performance penalty in the unoptimized hardware accelerator. This tutorial exemplifies two common types of memory bottleneck: 12 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
13 Bottlenecks Narrow Memory Access Random Access to SDRAM The following sections discuss these types of memory bottleneck. Narrow Memory Access The SDRAM used in the example design has a 32 bit data width. However, the FFT software function uses 16 bit data types. Therefore each time the accelerator fetches a variable from memory, half of the memory bandwidth is wasted, because the function only uses 16 bits. For the best performance, access high latency memory devices with bus transfers of the same width as the memory device. Random Access to SDRAM The SDRAM device used in the example design suffers from long latency times. SDRAM devices achieve their highest bandwidth when they are accessed sequentially. By contrast, SRAM based memories typically have no performance penalty for random (non-sequential) access. There are two situations in which the FFT algorithm accesses the SDRAM non-sequentially: Single Master Port Random Access Multiple Master Port Random Access The following sections discuss each of these situations. Single Master Port Random Access The FFT algorithm uses a technique called bit reversal to rearrange the input points so that the outputs are in the correct order. The bit reversal values form a pattern of array indices beginning with the following values: 0, 128, 64, 192, and 32. The algorithm uses these values as array indices for each input point read from SDRAM. This causes poor memory performance, because the indices are not sequential. Multiple Master Port Random Access The hardware accelerator contains multiple Avalon-MM master ports, all of which compete for access to the SDRAM. Each master port accesses independent locations within memory. This results in non-sequential memory accesses when the Avalon interconnect fabric arbitrates between master ports. This type of random access is typical of any system with multiple master ports. Multiple Master Port Memory Stalls Another memory bottleneck results from the physical limitations of memory interfaced with multiple master ports. Only one master port can access the memory at a time. If a second master port tries to access memory when the first is in control, the second master port must wait. This stalls the pipeline. The FFT hardware accelerator must read data, sine and cosine terms from memory, and also write results back to memory. These multiple types of memory access cause memory stalls. One state in the pipelined transform can be starved of input data when master ports belonging to other states gain access to the SDRAM. Altera Corporation 13 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
14 Restructuring Code to Optimize the Accelerator Restructuring Code to Optimize the Accelerator To improve accelerator performance, optimize your code to allow independent reading and writing tasks to occur in parallel. This optimization technique requires fast data buffering, which you can implement using on-chip memory available in the FPGA. This is why you add the on-chip memories in the Adding On-Chip Buffers section. Data Buffering Optimizations This example illustrates several types of data buffering optimizations: Fast Memory Double Buffering Master Port Minimization Sine and Cosine Data Buffering Bit Reversal Buffering Fast Memory The accelerated FFT stores data in fast memory with a low, fixed latency. The on-chip memories have a latency of 1, the lowest available. Low latency lets the calculation stage access the data rapidly, because it reduces the number of states the C2H Compiler must create for the state machines that schedule the memory access. Fixed latency means that the accelerator need not access the memory sequentially to achieve the highest throughput. Double Buffering The software FFT implementation performs in-place data calculations. The software implementation uses the same memory buffer to load input values, save intermediate calculation results, and store output values. In-place calculations save memory resources, but often slow performance when translated to hardware. Full concurrency is difficult to achieve with a single buffer, because of the memory stalls described in Multiple Master Port Memory Stalls on page 13. The optimized FFT accelerator uses on-chip memory to buffer the input and output data, so that the calculation portion of the accelerator can operate independently of SDRAM. To achieve full concurrency, the transformation phase of the FFT must be able to read and write at the same time. The optimized FFT achieves this with double buffering. Double buffering, also known as ping-pong buffering, allows one master port in the FFT accelerator to read input data from one buffer while another master port writes results into another buffer. This type of data buffering avoids memory stalls. Figure 6 illustrates a simplified form of the double buffering scheme used in the optimized FFT function. In this FFT implementation, double buffering requires a total of 4 buffers. This is because the FFT performs calculations on real and imaginary input data simultaneously. The FFT uses one pair of buffers for real data, and one pair for imaginary data. The real and imaginary calculations are independent of one another, so the accelerator can perform them concurrently. This section describes the buffering method for the real input data. The algorithm handles imaginary data exactly the same. 14 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
15 Restructuring Code to Optimize the Accelerator Figure 6. Hardware Accelerated Data Buffering Scheme Nios II FFT Accelerator BufferRAM1 Calculation Read Buffer SDRAM BufferRAM2 Calculation Write Buffer Figure 7 illustrates the double buffering scheme used in the optimized code. Figure 7. Double Buffering BufferRAM1 BufferRAM2 Stage Number Read Buffer Write Buffer 1 BufferRAM1 BufferRAM2 2 BufferRAM2 BufferRAM1 3 BufferRAM1 BufferRAM2 dataflow 8 BufferRAM2 BufferRAM1 Altera Corporation 15 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
16 Restructuring Code to Optimize the Accelerator The 256-point FFT in this example decomposes into eight stages. During the first stage the read buffer is BufferRAM1 and the write buffer is BufferRAM2, as shown in Example 1. Example 1. accelerator_optimized_fft.c: Initial Buffer State 52 /* Assign the ping pong buffers default address locations */ 53 BufferedRealCalcDataRead = BufferRAM1; 54 BufferedRealCalcDataReadPort2 = BufferRAM1; 55 BufferedRealCalcDataWrite = BufferRAM2; 56 BufferedRealCalcDataWritePort2 = BufferRAM2; The algorithm swaps the buffers after each stage, as shown in Example 2. Example 2. accelerator_optimized_fft.c: Swapping Buffers 144 BufferedRealCalcDataRead = BufferedRealCalcDataWrite; 145 BufferedRealCalcDataWrite = BufferedRealCalcDataReadPort2; 146 BufferedRealCalcDataReadPort2 = BufferedRealCalcDataRead; 147 BufferedRealCalcDataWritePort2 = BufferedRealCalcDataWrite; When all eight stages are complete, the results of the FFT are in BufferRAM1. The function copies the results back to SDRAM, as shown in Example 3. Example 3. accelerator_optimized_fft.c: Results Stored in SDRAM 155 /* returning the interleaved results to sdram 156 * Since the data is 16 bit and interleaved we'll stick the real and 157 * imaginary parts together and send them off to sdram */ 158 for(outputcounter = 0; outputcounter < NUM_POINTS; outputcounter++) { 159 tempoutputptr[outputcounter] = (((alt_u32)(bufferedimagcalcdataread[outputcounter]) & 0x0000FFFF)<<16) ((alt_u32)bufferedrealcalcdataread[outputcounter] & 0x0000FFFF); 160 } The accelerator maintains data integrity, because it always reads from memory written to on the previous loop iteration. Because there is an even number of stages, the finished data ends up in BufferRAM1. This buffering scheme improves performance, because when the accelerator is performing butterfly calculations, it can read and write data concurrently. Also, the on-chip buffers generate no excess wait states which could cause pipeline stalls. The input consists of 256 data points, each 16 bits wide. This means that each of the two on-chip memory buffers is 512 bytes long. As the memory block type is set to Automatic in SOPC Builder, the Quartus II development software allocates it to a single M4K block if available. In the tutorial, the real data flows through buffers BufferRAM1 and BufferRAM2, while the imaginary data flows through buffers BufferRAM3 and BufferRAM4. The double buffer scheme described above connects two master ports to each of the buffers (the calculation read master port and the calculation write master port). 16 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
17 Restructuring Code to Optimize the Accelerator Master Port Minimization When designing buffer schemes for the C2H Compiler, it is important to consider how many master ports are connected to each memory. The more master ports that are connected to a memory the higher the likelihood of degrading system f MAX, and causing memory stalls as master ports contend for access to the memory. There is also a higher likelihood of master ports competing for the same resources when large numbers of master ports are used. Therefore it is important to balance the number of master ports connected to each memory with the throughput needs of the system. When multiple master ports are connected to a slave port, SOPC Builder must generate logic to arbitrate among them. This logic, if too complex, causes a reduction in f MAX. In this example, only the FFT accelerator needs to access the on-chip memories, so the Nios II processor is not connected to them. The accelerated code uses pragma directives to connect only the master ports that are required. Sine and Cosine Data Buffering The software FFT implementation in this tutorial stores the sine and cosine terms (twiddle factors) in SDRAM. Storing these values in low latency on-chip memory increases performance. In this tutorial, the sine terms are in SinRAM, and the cosine terms are in CosRAM, as shown in Example 4. The Quartus II development software initializes the memories from files SinRAM.hex and CosRAM.hex, which contain the precalculated sine and cosine terms. Example 4. accelerator_optimized_fft.c: Sine and Cosine Data 63 /* Point the Cosine and Sine Tables to the CosRAM and SinRAM on-chip memory 64 * buffers. These memories are local to the accelerator and are not shared 65 * with the Nios II processor. */ 66 CosineTable = CosRAM; 67 SineTable = SinRAM; Bit Reversal Buffering The optimized FFT algorithm reads input data sequentially from SDRAM and stores it in the read buffer. It then accesses the read buffer non-sequentially, using the bit reversal indexes. The read buffer is implemented in onchip memory, which has no latency penalty for random access. The bit reversal pattern is the only part of the FFT function that accesses memory in a non-sequential order. This means that all of the accelerator's SDRAM accesses are sequential, taking advantage of the SDRAM's optimal bandwidth. SDRAM Memory Access Optimizations It is important to use the full data width of the available memory interface. In the FFT, each real and imaginary data point is 16 bits wide. However, the SDRAM device used in this example has a 32 bit interface. For example, the unoptimized code makes two separate accesses to SDRAM when reading the data samples, as shown in Example 5. This wastes half the bandwidth of the SDRAM interface. Altera Corporation 17 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
18 Restructuring Code to Optimize the Accelerator Example 5. sw_only_fft.c: Unoptimized SDRAM Access 24 // Re-order samples using bit reversal 25 for (i = 0; i < NUM_POINTS; i++) { 26 bit_rev_index = bitrev(i); 27 reversed_realdata[bit_rev_index] = InData[2*i]; 28 reversed_imaginarydata[bit_rev_index] = InData[2*i+1]; 29 } Each data point is a complex number, stored as a pair (real and imaginary). Thus a single 32 bit read from SDRAM can access the entire pair. The optimized hardware accelerator uses this fact in the input stage, reading data pairs and storing the real and imaginary components concurrently into separate buffers. The accelerator uses the same optimization in the output phase, storing the real and imaginary components into SDRAM in a single 32 bit write. This code is optimized for the C2H Compiler by reading from the SDRAM into a temporary variable and then writing each half of the temporary variable into the appropriate buffer, as shown in Example 6. Example 6. accelerator_optimized_fft.c: Optimizing SDRAM Access 71 /* Calculate the bitreversal index and read 72 * 32 bits of data from the input buffer in SDRAM (real and imaginary pair). 73 * Split the data read into half and write them into real and imaginary 74 * buffers concurrently */ 75 for (inputcounter = 0; inputcounter < NUM_POINTS; inputcounter++) { 76 bit_rev_index = bitrev(inputcounter); tempinput = tempinputptr[inputcounter]; 79 BufferedRealCalcDataRead[bit_rev_index] = (alt_16)(tempinput & 0x0000FFFF); 80 BufferedImagCalcDataRead[bit_rev_index] = (alt_16)((tempinput & 0xFFFF0000)>>16); 81 } Calculation Stage Optimizations The FFT function uses complex multiplications to calculate the outputs from each butterfly calculation. The algorithm implements complex multiplication as four multiplications, one addition and one subtraction. To improve the throughput of the FFT butterfly calculation, the accelerator uses four separate hardware multipliers, so that all the mathematical operations occur on a single clock cycle, as shown in Example 7. This improves the computational bandwidth of the accelerator. However, all the inputs to the computation stage come from on-chip memory buffers. To maximize the computational throughput, the memory buffers must be able to match the throughput. Example 7. accelerator_optimized_fft.c: Using Parallel Multipliers 113 /* Scale twiddle products to accomodate 16 bit storage */ 114 /* CosReal, SinReal, temp1, and temp2 are all registers so no 115 * waiting occurs here (this happens concurrently) */ 116 trealdata = (( CosReal * temp1 ) + ( SinReal * temp2 ))>> PRESCALE; 117 timagdata = (( CosReal * temp2 ) - ( SinReal * temp1 ))>> PRESCALE; To improve the throughput of the memory buffers, the read and write buffers are implemented as dual-port memories. Dual-port buffers are helpful because the butterfly calculation uses two inputs for every output. The 18 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
19 Conclusion two inputs always come from different memory locations. Therefore the accelerator can retrieve them simultaneously without collision. Example 8 shows the optimized code. Example 8. accelerator_optimized_fft.c: Using Dual-Port Memory 104 /* using temps (regs) to allow this to happen concurrently since 105 * these are DP RAM accesses that do not overlap. We are using read 106 * pointers here so that the write pointers at the bottom can work in 107 * parallel */ 108 temp1 = BufferedRealCalcDataRead[l]; 109 temp2 = BufferedImagCalcDataRead[l]; 110 temp3 = BufferedRealCalcDataReadPort2[butterfly_index]; 111 temp4 = BufferedImagCalcDataReadPort2[butterfly_index]; Conclusion With a few straightforward code optimizations, the Nios II C2H Compiler can sharply improve the computational bandwidth and memory throughput of a software algorithm. In the case of an FFT, we apply the following optimizations: Use 16-bit integers in place of 32-bit integers Use unsigned integers in place of signed integers Fetch data from SDRAM 32 bits at a time Avoid non-sequential SDRAM access by buffering data in SRAM Avoid multi-master-port memory stalls by buffering data and constants in multiple memories Facilitate pipelining with double buffering and dual-port RAM Reduce wait states by using low-latency on-chip RAM The optimized accelerator is up to 50 times faster than a software-only implementation. Altera Corporation 19 August 2008 Accelerating Nios II Systems with the C2H Compiler Tutorial
20 Conclusion 101 Innovation Drive San Jose, CA (408) Technical support: Product literature: Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. 20 Altera Corporation Accelerating Nios II Systems with the C2H Compiler Tutorial August 2008
Nios II Embedded Design Suite 6.1 Release Notes
December 2006, Version 6.1 Release Notes This document lists the release notes for the Nios II Embedded Design Suite (EDS) version 6.1. Table of Contents: New Features & Enhancements...2 Device & Host
More information24K FFT for 3GPP LTE RACH Detection
24K FFT for GPP LTE RACH Detection ovember 2008, version 1.0 Application ote 515 Introduction In GPP Long Term Evolution (LTE), the user equipment (UE) transmits a random access channel (RACH) on the uplink
More information4K Format Conversion Reference Design
4K Format Conversion Reference Design AN-646 Application Note This application note describes a 4K format conversion reference design. 4K resolution is the next major enhancement in video because of the
More informationFPGAs Provide Reconfigurable DSP Solutions
FPGAs Provide Reconfigurable DSP Solutions Razak Mohammedali Product Marketing Engineer Altera Corporation DSP processors are widely used for implementing many DSP applications. Although DSP processors
More informationActive Serial Memory Interface
Active Serial Memory Interface October 2002, Version 1.0 Data Sheet Introduction Altera Cyclone TM devices can be configured in active serial configuration mode. This mode reads a configuration bitstream
More informationSimultaneous Multi-Mastering with the Avalon Bus
Simultaneous Multi-Mastering with the Avalon Bus April 2002, ver. 1.1 Application Note 184 Introduction The Excalibur Development Kit, featuring the Nios embedded processor version 2.1 supports an enhanced
More informationIntroduction to the Altera SOPC Builder Using Verilog Design
Introduction to the Altera SOPC Builder Using Verilog Design This tutorial presents an introduction to Altera s SOPC Builder software, which is used to implement a system that uses the Nios II processor
More informationIntroduction to the Altera SOPC Builder Using Verilog Designs. 1 Introduction
Introduction to the Altera SOPC Builder Using Verilog Designs 1 Introduction This tutorial presents an introduction to Altera s SOPC Builder software, which is used to implement a system that uses the
More informationSimple Excalibur System
Excalibur Solutions Simple Excalibur System August 2002, ver. 1.0 Application Note 242 Introduction This application note describes a simple Excalibur system design that consists of software running on
More informationDDR and DDR2 SDRAM Controller Compiler User Guide
DDR and DDR2 SDRAM Controller Compiler User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com Operations Part Number Compiler Version: 8.1 Document Date: November 2008 Copyright 2008 Altera
More informationDebugging Nios II Systems with the SignalTap II Logic Analyzer
Debugging Nios II Systems with the SignalTap II Logic Analyzer May 2007, ver. 1.0 Application Note 446 Introduction As FPGA system designs become more sophisticated and system focused, with increasing
More informationNios II Custom Instruction User Guide Preliminary Information
Nios II Custom Instruction User Guide Preliminary Information 101 Innovation Drive San Jose, CA 95134 (408) 544-7000 http://www.altera.com Copyright 2008 Altera Corporation. All rights reserved. Altera,
More informationPractical Hardware Debugging: Quick Notes On How to Simulate Altera s Nios II Multiprocessor Systems Using Mentor Graphics ModelSim
Practical Hardware Debugging: Quick Notes On How to Simulate Altera s Nios II Multiprocessor Systems Using Mentor Graphics ModelSim Ray Duran Staff Design Specialist FAE, Altera Corporation 408-544-7937
More informationSimulating Nios II Embedded Processor Designs
Simulating Nios II Embedded Processor Designs May 2004, ver.1.0 Application Note 351 Introduction The increasing pressure to deliver robust products to market in a timely manner has amplified the importance
More informationWhite Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices
Introduction White Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices One of the challenges faced by engineers designing communications equipment is that memory devices
More informationNios II Embedded Design Suite 7.1 Release Notes
Nios II Embedded Design Suite 7.1 Release Notes May 2007, Version 7.1 Release Notes This document contains release notes for the Nios II Embedded Design Suite (EDS) version 7.1. Table of Contents: New
More informationFFT MegaCore Function User Guide
FFT MegaCore Function User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com MegaCore Version: 11.0 Document Date: May 2011 Copyright 2011 Altera Corporation. All rights reserved. Altera, The
More informationDisassemble the machine code present in any memory region. Single step through each assembly language instruction in the Nios II application.
Nios II Debug Client This tutorial presents an introduction to the Nios II Debug Client, which is used to compile, assemble, download and debug programs for Altera s Nios II processor. This tutorial presents
More informationEstimating Nios Resource Usage & Performance
Estimating Nios Resource Usage & Performance in Altera Devices September 2001, ver. 1.0 Application Note 178 Introduction The Excalibur Development Kit, featuring the Nios embedded processor, includes
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationWhite Paper. Floating-Point FFT Processor (IEEE 754 Single Precision) Radix 2 Core. Introduction. Parameters & Ports
White Paper Introduction Floating-Point FFT Processor (IEEE 754 Single Precision) Radix 2 Core The floating-point fast fourier transform (FFT) processor calculates FFTs with IEEE 754 single precision (1
More informationExercise 1 In this exercise you will review the DSSS modem design using the Quartus II software.
White Paper DSSS Modem Lab Background The direct sequence spread spectrum (DSSS) digital modem reference design is a hardware design that has been optimized for the Altera APEX DSP development board (starter
More informationDSP Development Kit, Stratix II Edition
DSP Development Kit, Stratix II Edition August 2005, Development Kit version 1.1.0 Errata Sheet This document addresses known errata and documentation changes the DSP Development Kit, Stratix II Edition
More informationNios II Performance Benchmarks
Subscribe Performance Benchmarks Overview This datasheet lists the performance and logic element (LE) usage for the Nios II Classic and Nios II Gen2 soft processor, and peripherals. Nios II is configurable
More informationUsing MicroC/OS-II RTOS with the Nios II Processor Tutorial Preliminary Information
Using MicroC/OS-II RTOS with the Nios II Processor Tutorial Preliminary Information 101 Innovation Drive San Jose, CA 95134 (408) 544-7000 http://www.altera.com Copyright 2004 Altera Corporation. All rights
More informationPCI Express Multi-Channel DMA Interface
2014.12.15 UG-01160 Subscribe The PCI Express DMA Multi-Channel Controller Example Design provides multi-channel support for the Stratix V Avalon Memory-Mapped (Avalon-MM) DMA for PCI Express IP Core.
More informationVideo and Image Processing Suite
Video and Image Processing Suite December 2006, Version 7.0 Errata Sheet This document addresses known errata and documentation issues for the MegaCore functions in the Video and Image Processing Suite,
More informationWhite Paper AHB to Avalon & Avalon to AHB Bridges
White Paper AHB to & to AHB s Introduction For years, system designers have been manually connecting IP peripheral functions to embedded processors, taking anywhere from weeks to months to accomplish.
More informationDDR & DDR2 SDRAM Controller Compiler
DDR & DDR2 SDRAM Controller Compiler August 2007, Compiler Version 7.1 Errata Sheet This document addresses known errata and documentation issues for the DDR and DDR2 SDRAM Controller Compiler version
More informationLaboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication
Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Introduction All processors offer some form of instructions to add, subtract, and manipulate data.
More informationUsing the Serial FlashLoader With the Quartus II Software
Using the Serial FlashLoader With the Quartus II Software July 2006, ver. 3.0 Application Note 370 Introduction Using the Joint Test Action Group () interface, the Altera Serial FlashLoader (SFL) is the
More informationDDR & DDR2 SDRAM Controller Compiler
DDR & DDR2 SDRAM Controller Compiler May 2006, Compiler Version 3.3.1 Errata Sheet This document addresses known errata and documentation issues for the DDR and DDR2 SDRAM Controller Compiler version 3.3.1.
More informationNIOS II Processor Booting Methods In MAX 10 Devices
2015.01.23 AN-730 Subscribe MAX 10 device is the first MAX device series which supports Nios II processor. Overview MAX 10 devices contain on-chip flash which segmented to two types: Configuration Flash
More informationDesign Verification Using the SignalTap II Embedded
Design Verification Using the SignalTap II Embedded Logic Analyzer January 2003, ver. 1.0 Application Note 280 Introduction The SignalTap II embedded logic analyzer, available exclusively in the Altera
More informationNIOS CPU Based Embedded Computer System on Programmable Chip
1 Objectives NIOS CPU Based Embedded Computer System on Programmable Chip EE8205: Embedded Computer Systems This lab has been constructed to introduce the development of dedicated embedded system based
More informationCyclone II FPGA Family
ES-030405-1.3 Errata Sheet Introduction This errata sheet provides updated information on Cyclone II devices. This document addresses known device issues and includes methods to work around the issues.
More informationLegacy SDRAM Controller with Avalon Interface
Legacy SDRAM Controller with Avalon Interface January 2003, Version 1.0 Data Sheet Introduction PTF Assignments SDRAM is commonly used in cost-sensitive applications requiring large amounts of memory.
More informationNIOS CPU Based Embedded Computer System on Programmable Chip
NIOS CPU Based Embedded Computer System on Programmable Chip 1 Lab Objectives EE8205: Embedded Computer Systems NIOS-II SoPC: PART-I This lab has been constructed to introduce the development of dedicated
More informationWhite Paper Taking Advantage of Advances in FPGA Floating-Point IP Cores
White Paper Recently available FPGA design tools and IP provide a substantial reduction in computational resources, as well as greatly easing the implementation effort in a floating-point datapath. Moreover,
More informationDDR & DDR2 SDRAM Controller Compiler
DDR & DDR2 SDRAM Controller Compiler march 2007, Compiler Version 7.0 Errata Sheet This document addresses known errata and documentation issues for the DDR and DDR2 SDRAM Controller Compiler version 7.0.
More informationDSP Builder. DSP Builder v6.1 Issues. Error When Directory Pathname is a Network UNC Path
March 2007, Version 6.1 Errata Sheet This document addresses known errata and documentation changes for DSP Builder version 6.1. Errata are functional defects or errors which may cause DSP Builder to deviate
More informationAvalon Streaming Interface Specification
Avalon Streaming Interface Specification 101 Innovation Drive San Jose, CA 95134 www.altera.com Document Version: 1.3 Document Date: June 2007 Copyright 2005 Altera Corporation. All rights reserved. Altera,
More informationPOS-PHY Level 4 MegaCore Function
POS-PHY Level 4 MegaCore Function November 2004, MegaCore Version 2.2.2 Errata Sheet Introduction This document addresses known errata and documentation changes for version v2.2.2 of the POS-PHY Level
More informationDSP Builder Release Notes
April 2006, Version 6.0 SP1 Release Notes These release notes for DSP Builder version 6.0 SP1 contain the following information: System Requirements New Features & Enhancements Errata Fixed in This Release
More informationDDR & DDR2 SDRAM Controller
DDR & DDR2 SDRAM Controller December 2005, Compiler Version 3.3.1 Release Notes These release notes for the DDR and DDR2 SDRAM Controller Compiler version 3.3.1 contain the following information: System
More informationDDR & DDR2 SDRAM Controller
DDR & DDR2 SDRAM Controller October 2005, Compiler Version 3.3.0 Release Notes These release notes for the DDR and DDR2 SDRAM Controller Compiler version 3.3.0 contain the following information: System
More informationTable 1 shows the issues that affect the FIR Compiler v7.1.
May 2007, Version 7.1 Errata Sheet This document addresses known errata and documentation issues for the Altera, v7.1. Errata are functional defects or errors, which may cause an Altera MegaCore function
More informationRLDRAM II Controller MegaCore Function
RLDRAM II Controller MegaCore Function November 2006, MegaCore Version 1.0.0 Errata Sheet This document addresses known errata and documentation issues for the RLDRAM II Controller MegaCore function version
More informationMaking Qsys Components. 1 Introduction. For Quartus II 13.0
Making Qsys Components For Quartus II 13.0 1 Introduction The Altera Qsys tool allows a digital system to be designed by interconnecting selected Qsys components, such as processors, memory controllers,
More informationNios Soft Core Embedded Processor
Nios Soft Core Embedded Processor June 2000, ver. 1 Data Sheet Features... Preliminary Information Part of Altera s Excalibur TM embedded processor solutions, the Nios TM soft core embedded processor is
More informationAN 549: Managing Designs with Multiple FPGAs
AN 549: Managing Designs with Multiple FPGAs October 2008 AN-549-1.0 Introduction Managing designs that incorporate multiple FPGAs raises new challenges that are unique compared to designs using only one
More informationExcalibur Solutions DPRAM Reference Design
Excalibur Solutions DPRAM Reference Design August 22, ver. 2.3 Application Note 173 Introduction The Excalibur devices are excellent system development platforms, offering flexibility, performance, and
More informationNios II C2H Compiler User Guide
Nios II C2H Compiler User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com Nios II C2H Compiler Version: 7.1 Document Version: 1.2 Document Date: May 2007 Copyright 2007 Altera Corporation.
More informationRemote Drive. Quick Start Guide. System Level Solutions, Inc. (USA) Murphy Avenue San Martin, CA (408) Version : 0.1.
Remote Drive Quick Start Guide, Inc. (USA) 14100 Murphy Avenue San Martin, CA 95046 (408) 852-0067 http://www.slscorp.com Version : 0.1.1 Date : July 17, 2007 Copyright 2007,.All rights reserved. SLS,
More informationCORDIC Reference Design. Introduction. Background
CORDIC Reference Design June 2005, ver. 1.4 Application Note 263 Introduction The co-ordinate rotation digital computer (CORDIC) reference design implements the CORDIC algorithm, which converts cartesian
More information2. System Interconnect Fabric for Memory-Mapped Interfaces
2. System Interconnect Fabric for Memory-Mapped Interfaces QII54003-8.1.0 Introduction The system interconnect fabric for memory-mapped interfaces is a high-bandwidth interconnect structure for connecting
More information9. Functional Description Example Designs
November 2012 EMI_RM_007-1.3 9. Functional Description Example Designs EMI_RM_007-1.3 This chapter describes the example designs and the traffic generator. Two independent example designs are created during
More information2.5G Reed-Solomon II MegaCore Function Reference Design
2.5G Reed-Solomon II MegaCore Function Reference Design AN-642-1.0 Application Note The Altera 2.5G Reed-Solomon (RS) II MegaCore function reference design demonstrates a basic application of the Reed-Solomon
More informationUsing the Nios Development Board Configuration Controller Reference Designs
Using the Nios Development Board Controller Reference Designs July 2006 - ver 1.1 Application Note 346 Introduction Many modern embedded systems utilize flash memory to store processor configuration information
More informationUTOPIA Level 2 Slave MegaCore Function
UTOPIA Level 2 Slave MegaCore Function October 2005, Version 2.5.0 Release Notes These release notes for the UTOPIA Level 2 Slave MegaCore function contain the following information: System Requirements
More informationMICROTRONIX AVALON MULTI-PORT FRONT END IP CORE
MICROTRONIX AVALON MULTI-PORT FRONT END IP CORE USER MANUAL V1.0 Microtronix Datacom Ltd 126-4056 Meadowbrook Drive London, ON, Canada N5L 1E3 www.microtronix.com Document Revision History This user guide
More informationWhite Paper Assessing FPGA DSP Benchmarks at 40 nm
White Paper Assessing FPGA DSP Benchmarks at 40 nm Introduction Benchmarking the performance of algorithms, devices, and programming methodologies is a well-worn topic among developers and research of
More informationError Correction Code (ALTECC_ENCODER and ALTECC_DECODER) Megafunctions User Guide
Error Correction Code (ALTECC_ENCODER and ALTECC_DECODER) Megafunctions User Guide 11 Innovation Drive San Jose, CA 95134 www.altera.com Software Version 8. Document Version: 2. Document Date: June 28
More informationFPGA Design Security Solution Using MAX II Devices
White Paper FPGA Solution Using MAX II Devices Introduction SRAM-based FPGAs are volatile devices. They require external memory to store the configuration data that is sent to them at power up. It is possible
More informationSimulating the ASMI Block in Your Design
2015.08.03 AN-720 Subscribe Supported Devices Overview You can simulate the ASMI block in your design for the following devices: Arria V, Arria V GZ, Arria 10 Cyclone V Stratix V In the Quartus II software,
More informationSystem Debugging Tools Overview
9 QII53027 Subscribe About Altera System Debugging Tools The Altera system debugging tools help you verify your FPGA designs. As your product requirements continue to increase in complexity, the time you
More informationNios DMA. General Description. Functional Description
Nios DMA January 2003, Version 1.1 Data Sheet General Functional The Nios DMA module is an Altera SOPC Builder library component included in the Nios development kit. The DMA module allows for efficient
More informationNios II C2H Compiler User Guide
Nios II C2H Compiler User Guide 101 Innovation Drive San Jose, CA 95134 www.altera.com Nios II C2H Compiler Version: 7.2 Document Date: October 2007 Copyright 2007 Altera Corporation. All rights reserved.
More informationByteBlaster II Download Cable User Guide
ByteBlaster II Download Cable User Guide 101 Innovation Drive San Jose, CA 95134 (408) 544-7000 http://www.altera.com UG-BBII81204-1.1 P25-10324-00 Document Version: 1.1 Document Date: December 2004 Copyright
More informationMAX 10 User Flash Memory User Guide
MAX 10 User Flash Memory User Guide Subscribe Last updated for Quartus Prime Design Suite: 16.0 UG-M10UFM 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents MAX 10 User Flash Memory
More informationUniversity of Massachusetts Amherst Computer Systems Lab 2 (ECE 354) Spring Lab 1: Using Nios 2 processor for code execution on FPGA
University of Massachusetts Amherst Computer Systems Lab 2 (ECE 354) Spring 2007 Lab 1: Using Nios 2 processor for code execution on FPGA Objectives: After the completion of this lab: 1. You will understand
More informationAN 831: Intel FPGA SDK for OpenCL
AN 831: Intel FPGA SDK for OpenCL Host Pipelined Multithread Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Intel FPGA SDK for OpenCL Host Pipelined Multithread...3 1.1
More informationAN 464: DFT/IDFT Reference Design
Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents About the DFT/IDFT Reference Design... 3 Functional Description for the DFT/IDFT Reference Design... 4 Parameters for the
More informationIntroduction. Design Hierarchy. FPGA Compiler II BLIS & the Quartus II LogicLock Design Flow
FPGA Compiler II BLIS & the Quartus II LogicLock Design Flow February 2002, ver. 2.0 Application Note 171 Introduction To maximize the benefits of the LogicLock TM block-based design methodology in the
More informationFFT/IFFT Block Floating Point Scaling
FFT/IFFT Block Floating Point Scaling October 2005, ver. 1.0 Application Note 404 Introduction The Altera FFT MegaCore function uses block-floating-point (BFP) arithmetic internally to perform calculations.
More informationStratix II vs. Virtex-4 Performance Comparison
White Paper Stratix II vs. Virtex-4 Performance Comparison Altera Stratix II devices use a new and innovative logic structure called the adaptive logic module () to make Stratix II devices the industry
More informationIntel High Level Synthesis Compiler
Intel High Level Synthesis Compiler Best Practices Guide Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1. Intel HLS Compiler
More informationExcalibur Solutions Using the Expansion Bus Interface. Introduction. EBI Characteristics
Excalibur Solutions Using the Expansion Bus Interface October 2002, ver. 1.0 Application Note 143 Introduction In the Excalibur family of devices, an ARM922T processor, memory and peripherals are embedded
More informationIntel High Level Synthesis Compiler
Intel High Level Synthesis Compiler User Guide Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1....3 2. Overview of the
More informationUSB BitJetLite Download Cable
USB BitJetLite Download Cable User Guide, Inc. (USA) 14100 Murphy Avenue San Martin, CA 95046 (408) 852-0067 http://www.slscorp.com Product Version: 1.0 Document Version: 1.0 Document Date: Copyright 2010,.All
More informationStratix vs. Virtex-II Pro FPGA Performance Analysis
White Paper Stratix vs. Virtex-II Pro FPGA Performance Analysis The Stratix TM and Stratix II architecture provides outstanding performance for the high performance design segment, providing clear performance
More informationECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework
ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework 1 Introduction and Motivation This lab will give you an overview of how to use the LegUp high-level synthesis framework. In LegUp, you
More informationUsing Tightly Coupled Memory with the Nios II Processor
Using Tightly Coupled Memory with the Nios II Processor TU-N2060305-1.2 This document describes how to use tightly coupled memory in designs that include a Nios II processor and discusses some possible
More informationBoard Update Portal based on Nios II Processor with EPCQ (Arria 10 GX FPGA Development Kit)
Board Update Portal based on Nios II Processor with EPCQ (Arria 10 GX FPGA Development Kit) Date: 1 December 2016 Revision:1.0 2015 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY,
More informationAN 829: PCI Express* Avalon -MM DMA Reference Design
AN 829: PCI Express* Avalon -MM DMA Reference Design Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Latest document on the web: PDF HTML Contents Contents 1....3 1.1. Introduction...3 1.1.1.
More informationTable 1 shows the issues that affect the FIR Compiler, v6.1. Table 1. FIR Compiler, v6.1 Issues.
December 2006, Version 6.1 Errata Sheet This document addresses known errata and documentation issues for the Altera FIR Compiler, v6.1. Errata are functional defects or errors, which may cause an Altera
More informationCyclone II FFT Co-Processor Reference Design
Cyclone II FFT Co-Processor Reference Design May 2005 ver. 1.0 Application Note 375 Introduction f The fast Fourier transform (FFT) co-processor reference design demonstrates the use of an Altera FPGA
More informationNios II Custom Instruction User Guide
Nios II Custom Instruction User Guide Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Nios II Custom Instruction Overview...4 1.1 Custom Instruction Implementation... 4
More informationAN 370: Using the Serial FlashLoader with the Quartus II Software
AN 370: Using the Serial FlashLoader with the Quartus II Software April 2009 AN-370-3.1 Introduction Using the interface, the Altera Serial FlashLoader (SFL) is the first in-system programming solution
More informationNios II Studio Help System
Nios II Studio Help System 101 Innovation Drive San Jose, CA 95134 www.altera.com Nios II Studio Version: 8.1 Beta Document Version: 1.2 Document Date: November 2008 UG-01042-1.2 Table Of Contents About
More informationFFT Co-Processor Reference Design
FFT Co-Processor Reference Design October 2004 ver. 1.0 Application Note 363 Introduction f The Fast Fourier Transform (FFT) co-processor reference design demonstrates the use of an Altera FPGA as a high-performance
More informationWhite Paper Using the MAX II altufm Megafunction I 2 C Interface
White Paper Using the MAX II altufm Megafunction I 2 C Interface Introduction Inter-Integrated Circuit (I 2 C) is a bidirectional two-wire interface protocol, requiring only two bus lines; a serial data/address
More information4. TriMatrix Embedded Memory Blocks in HardCopy IV Devices
January 2011 HIV51004-2.2 4. TriMatrix Embedded Memory Blocks in HardCopy IV Devices HIV51004-2.2 This chapter describes TriMatrix memory blocks, modes, features, and design considerations in HardCopy
More informationSupporting Custom Boards with DSP Builder
Supporting Custom Boards with DSP Builder April 2003, ver. 1.0 Application Note 221 Introduction As designs become more complex, verification becomes a critical, time consuming process. To address the
More informationIntroduction to Simulation of VHDL Designs Using ModelSim Graphical Waveform Editor. 1 Introduction. For Quartus Prime 16.1
Introduction to Simulation of VHDL Designs Using ModelSim Graphical Waveform Editor For Quartus Prime 16.1 1 Introduction This tutorial provides an introduction to simulation of logic circuits using the
More information9. Building Memory Subsystems Using SOPC Builder
9. Building Memory Subsystems Using SOPC Builder QII54006-6.0.0 Introduction Most systems generated with SOPC Builder require memory. For example, embedded processor systems require memory for software
More informationSystem-on-a-Programmable-Chip (SOPC) Development Board
System-on-a-Programmable-Chip (SOPC) Development Board Solution Brief 47 March 2000, ver. 1 Target Applications: Embedded microprocessor-based solutions Family: APEX TM 20K Ordering Code: SOPC-BOARD/A4E
More informationUniversity Program 3 Kit
University Program 3 Kit VLSI Tutorial : LEDs & Push Buttons Version 02.00 System Level Solutions Inc. (USA) 14702 White Cloud Ct. Morgan Hill, CA 95037 2 System Level Solutions Copyright 2003-2005 System
More informationMultimedia Decoder Using the Nios II Processor
Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra
More informationPCI Express Compiler. System Requirements. New Features & Enhancements
April 2006, Compiler Version 2.1.0 Release Notes These release notes for the PCI Express Compiler version 2.1.0 contain the following information: System Requirements New Features & Enhancements Errata
More information