On Using Simulink to Program SRC-6 Reconfigurable Computer

Size: px
Start display at page:

Download "On Using Simulink to Program SRC-6 Reconfigurable Computer"

Transcription

1 In Proc. 9 th Military and Aerospace Programmable Logic Devices (MAPLD) International Conference September, 2006, Washington, DC. On Using Simulink to Program SRC-6 Reconfigurable Computer David Meixner, Volodymyr Kindratenko, David Pointer National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC) addresses: dmeixner@uiuc.edu, kindr@ncsa.uiuc.edu, pointer@ncsa.uiuc.edu Abstract By design, the SRC-6 reconfigurable computer is programmed in the MAP C programming language within the framework provided by the SRC Carte development environment. The functionality of the original language can be extended via third party subroutines, called macros. These macros, typically implemented in Verilog Hardware Description Language, are brought into the MAP C program via configuration files that define the interface between the macros and the MAP C language. In this paper, we describe a process of using the Verilog source for SRC macros generated from the MathWorks Simulink designs built using Xilinx System Generator for DSP and Xilinx Blockset. We also describe an example application that takes advantage of this programming model. Introduction In this paper we describe a process of programming the SRC-6 reconfigurable computer [1] using MathWorks Simulink [2] with Xilinx System Generator for DSP [3] and Xilinx Blockset [4]. By design, the SRC-6 is programmed in the SRC MAP C programming language [5]. Code development for the SRC-6 platform closely resembles code development for a conventional microprocessor-based system, except for explicit code to support data transfer between the system memory and FPGA-controlled memory. The SRC Carte development environment [5] allows the developer to bring in third-party subroutines, called macros, that can be used to extend the functionality of the original language. These macros are typically implemented in Verilog Hardware Description Language (HDL) and are brought into the MAP C program via configuration files that define the interface between the macros and MAP C language. We describe a method of using the Verilog source for SRC macros generated from the MathWork s Simulinkbased designs. The rest of the paper is organized as follows. First, we provide a brief overview of the SRC-6 reconfigurable computer and its programming model. Next, we present steps necessary to integrate a Simulink-based design with MAP C source code (part of this work also appears in [6]). We briefly discuss issues related to numerical accuracy of using fixed-precision arithmetic instead of floating-point arithmetic. We then present an application example which shows the benefits of using Simulink instead of the native MAP C implementation. SRC-6 reconfigurable computer The SRC-6 MAPstation used in this work consists of a dual- CPU Xeon board, one MAP Series C and one MAP Series E processor, and an 8 GB common memory module, all interconnected with a 1.4 GB/s low latency Hi-Bar switch (Figure 1). The SNAP Series B interface board is used to connect the CPU board to the Hi-Bar switch. SNAP Memory µp PCI-X Dual Xeon 2.8 GHz, 1 GB memory SRC Hi-Bar 4-port Switch MAPC Common Memory MAPE Figure 1: SRC-6 MAPstation used in the course of this study. The MAP Series C processor module consists of two user FPGAs, one control FPGA, and memory (Figure 2). There are six banks (A-F) of on-board memory (); each bank is bits wide and 4 MB deep for a total of 24 MB. There is an additional 4 MB of dual-ported memory dedicated to data transfer between the two FPGAs. The two user FPGAs in the MAP Series C are Xilinx Virtex-II XC2V6000 FPGAs. The FPGA clock rate of 100 MHz is set from within the SRC programming environment. The MAP Series E processor module is identical to the Series C module with the exception of the user FPGAs: The two user FPGAs in the MAP Series E are Xilinx Virtex-II Pro XC2VP100 chips. Distributed SRAM in User Logic GB/s Block SRAM in User Logic GB/s User FPGA GB/s sustained payload A B Control FPGA C 2400 MB/s each GPIO 1.4 GB/s sustained payload D Dual-ported Memory (4 MB) E 192 On-Board SRAM GB/s Microprocessor Memory MB/s F User FPGA 1 Corresponding author Figure 2: MAP Series C processor module. 1

2 Software for the SRC MAPstation is developed in the MAP C programming language using the Carte version 2.1 programming environment. The Intel C (icc) version 8.1 compiler is used to generate the CPU side of the combined CPU/MAP executable. The SRC MAP C compiler produces the hardware description of the FPGA design for our final, combined CPU/MAP target executable. This intermediate hardware description of the FPGA design is passed to Xilinx ISE place and route tools to produce the FPGA bit file. Finally, the linker is invoked to combine the CPU code and the FPGA hardware bit file into a unified executable. Figure 3 provides an overview of the code development process. and run the System Generator. Its parameters should be set up as follows: Compilation Type: HDL Netlist Part: the FPGA to be used, e.g., Virtex2 xc2v6000-4ff1517 for MAP Series C processor Target Directory: the directory where the Verilog files will be outputted Hardware Description Language: Verilog FPGA Clock period (ns): 10 Source (C or Fortran) Source (MAP C) Macro source (VHDL/Verilog Verilog) C/Fortran compiler Verilog source MAP compiler Logic synthesis binary objects MAP library binary objects netlist Linker Place & route Unified executable µp P code FPGA code µp P code Figure 3: SRC-6 code development process. bitstream Figure 4: Simple Simulink Example, addmult.mdl. Functionality of the MAP C language can be extended via third-party user macros written in an HDL, such as Verilog, and integrated with the Carte programming environment via a black box file that defines the interface information for all of the user macros and an info file that establishes the mapping between operators and calls in the source program and the macros and signal names in the Verilog code generated by the MAP compiler. We use this feature of the Carte development environment to bring in Verilog source generated from Simulink-based designs. MathWorks Simulink model design The Simulink model one wishes to incorporate into SRC-6 s Carte framework should be created using the Xilinx Blockset for Simulink. Figure 4 shows a simple Simulink example which takes three inputs: a, b, and c, and outputs q=(a+b)*c. For this example, all signals have a bit width of 40 and a binary point of 30. The input and output ports are 'gateway in' and 'gateway out' blocks, respectively. They should be labeled in lowercase letters using the same names as the variables to be used in the resulting macro. The 'gateway in' blocks also allow one to specify the bit width and binary point of the inputs; however, they do not perform the actual data type conversion. The programmer is responsible for performing the required data type conversion; more on this follows Once the Simulink model is created, the next step is to set up Once the model generation is complete, several files are created (here <design> is the name of the model): <design>_files.v contains most of the HDL for the design. <design>_clk_wrapper.v an HDL wrapper that drives clocks and clock enables. conv_pkg.v contains constants and functions used by <design>_fils.v..edn files implementation of the parts of the design. In the above example, the following files are produced: addmult_files.v, addmult_clk_wrapper.v, conv_pkg.v, adder_subtracter_virtex2_7_0_84f1dba84ee809b9.edn, and multiplier_virtex2_7_0_b018b3a1b259a550.edn. These files need to be copied into the macro directory of the target MAP C implementation. Integration with Carte framework First, <design>_clk_wrapper.v file needs to be modified to include <deisgn>_files.v in addition to conv_pkg.v: --System Generator code here-- `include "conv_pkg.v" `include "addmult_files.v" /* line added manually */ module addmult_clk_wrapper (a, b, c, ce, ce_clr, clk, q); --System Generator code here-- 2

3 Note that some Xilinx blocks (i.e. FIR filters) cannot be generated using Verilog, so VHDL has to be used instead. When using VHDL source instead of Verilog, one needs to include the <design>_clk_wrapper.prj file in the macros directory instead of adding the "`include <design>_files.v" line, and modify the following line: "set_option -disable_io_insertion false" so that it is set to "true". Next, a black box definition for the design is created. It includes the inputs and outputs defined in the Simulink model, as well as the signals ce, ce_clr, and clk, created by the System Generator. If the design does not have any delays, the signals ce, ce_clr, and clk will not be generated and should therefore not be included. For this example, the multiplier has a delay of 5, so the clock signals are generated. The black box definition appears as follows (remember that we are using 40-bit fixed-point numbers): module addmult_clk_wrapper (a, b, c, ce, ce_clr, clk, q); input [39:0] a; input [39:0] b; input [39:0] c; input ce; input ce_clr; input clk; output [39:0] q; endmodule The next step is to create the info file. This file links the operators and calls from the source program to macros and signal names in the HDL code. Note that the inputs and outputs are -bits in the source code, but we only use the bottom 40-bits in our HDL code. The clk signal should always be mapped to CLOCK. Lastly, the debug function can be included as well. The resultant info file is as follows: BEGIN_DEF "addmult" MACRO = "addmult_clk_wrapper"; STATEFUL = NO; EXTERNAL = NO; PIPELINED = YES; LATENCY = 5; INPUTS = 3: I0 = INT BITS (a[39:0]) // explicit input I1 = INT BITS (b[39:0]) // explicit input I2 = INT BITS (c[39:0]) // explicit input ; OUTPUTS = 1: O0 = INT BITS (q[39:0]) // explicit output ; IN_SIGNAL : 1 BITS "ce" = "1'b1"; IN_SIGNAL : 1 BITS "ce_clr" = "1'b0"; IN_SIGNAL : 1 BITS "clk" = "CLOCK"; END_DEF Next, we add the path for <design>_clk_wrapper.v to the MACROS line of the Makefile. For this example, the line would read: MACROS = macros/addmult_clk_wrapper.v The last step is to provide a MAP C function prototype: void addmult(int_t a, int_t b, int_t c, int_t *q); Now the macro can be called from MAP C code just as any other macro: addmult(a, b, c, &q); Numerical conversion Since Simulink performs all calculations with fixed-point arithmetic, floating-point numbers must be converted to fixed-point representation in order to take advantage of variable resolution. This conversion can occur either on the microprocessor side before the data is transferred from the main memory to the MAP or on the MAP before the data is used by the Simulink-based design. We have implemented subroutines both in C for the execution on the microprocessor and in MAP C for the inclusion in the MAP s algorithm implementation. Appendix A includes C subroutines that convert between the floating-point and fixed-point representations. When converting between floating-point and fixed-point numerical representations, there can be a loss of numerical resolution. The size (number of bits) and the binary point of the fixed-point numbers determine its range and precision, so a larger range or higher precision requires more bits to be used. This is a tradeoff that must be taken into consideration by the programmer. For instance, in the above example the variables are 40 bits wide with the decimal point placed after the 30 th bit. This allows for an accuracy of e-10 and a maximum range of just under ±2 10, since there are 10 bits for the integer and 30 bits for the decimal. Example application The ability to use fixed-point arithmetic with a user-defined bit width allows one to save the FPGA resources when compared to the use of native floating-point data types. This, in turn, allows one to implement additional compute logic on the chip, thus shortening the overall calculation time. We demonstrate this in the following application. In astronomy, the two-point angular correlation function (TPACF), ω(θ), encodes the frequency distribution of angular separations, θ, between objects on the celestial sphere as compared to randomly distributed objects across the same space [7]. Qualitatively, a positive value of ω(θ) indicates that objects are found more frequently at angular separations of θ than expected for randomly distributed 3

4 objects, and ω(θ)=0 indicates a random distribution of objects. Precise computation of the TPACF can involve calculation of autocorrelation and cross-correlation components for different angular separations θ for a large number of celestial objects. As an example, the problem of computing the autocorrelation function for this particular application can be expressed as follows: Input: Set of points x 1,, x n with Cartesian coordinates distributed on the surface of the 3-sphere and a small number b of bins: [θ 0, θ 1 ), [θ 1, θ 2 ),, [θ b-1, θ b ]. Output: For each bin, the number of unique pairs of points (x i, x j ) for which the dot product is in the respective bin: B k = {ij: θ k-1 <= x i x j < θ k. This problem can be solved in log(b)(n-1)n/2 steps by sequentially looping through all unique pairs of points in the data set, computing their dot product, and applying a binary search algorithm to identify the bin the dot product belongs to. The following is the core of the algorithm written in C: for (int i = 0; i < n-1; i++) { for (int j = i+1; j < n; j++) { double dot = x[i] * x[j] + y[i] * y[j] + z[i] * z[j]; int k, min = 0, max = nbins; while (max > min+1) { k = (min + max) / 2; if (dot >= binb[k]) max = k; else min = k; ; if (dot >= binb[min]) bin[min] += 1; else if (dot < binb[max]) bin[max+1] += 1; else bin[max] += 1; The core can be implemented in MAP C as is, however, it will not constitute an efficient FPGA implementation because only the most inner loop used for the binary search algorithm will be pipelined by the MAP C compiler. This inefficiency can be avoided if the number of bins b is known ahead of time and thus the binary search loop can be manually unrolled. As a result, the next most inner loop will be fully pipelined, and thus the entire problem can be solved in just (n-1)n/2 steps. For example, assuming that b<32, the following is an efficient MAP C implementation of the autocorrelation core: for (i = 0; i < n-1; i++) { pi_x = x[i]; pi_y = y[i]; pi_z = z[i]; #pragma loop noloop_dep for (j = i+1; j < n; j++) { cg_count_ceil_32 (1, 0, j == (i+1), 3, &bank); dot = pi_x * x[j] + pi_y * y[j] + pi_z * z[j]; select_pri_8_32val( (dot < bv31), 31, (dot < bv30), 30, (dot < bv29), 29, (dot < bv28), 28, (dot < bv27), 27, (dot < bv26), 26, (dot < bv25), 25, (dot < bv24), 24, (dot < bv23), 23, (dot < bv22), 22, (dot < bv21), 21, (dot < bv20), 20, (dot < bv19), 19, (dot < bv18), 18, (dot < bv17), 17, (dot < bv16), 16, (dot < bv15), 15, (dot < bv14), 14, (dot < bv13), 13, (dot < bv12), 12, (dot < bv11), 11, (dot < bv10), 10, (dot < bv09), 9, (dot < bv08), 8, (dot < bv07), 7, (dot < bv06), 6, (dot < bv05), 5,(dot < bv04), 4, (dot < bv03), 3, (dot < bv02), 2, (dot < bv01), 1, 0, &indx); if (bank == 0) bin1a[indx] += 1; else if (bank == 1) bin2a[indx] += 1; else if (bank == 2) bin3a[indx] += 1; else bin4a[indx] += 1; In this implementation, double-precision floating-point arithmetic is used in the calculation of the dot product, point coordinates are stored in the banks, care is taken to avoid read-after-write data dependency, and select_pri_8_32val macro is used to implement a sequence of if/if else statements. The dataset and random samples used to calculate TPACF in this study are the sample of photometrically classified quasars, and the random catalogs, first analyzed in this context by [7]. The dataset and each of the random realizations contains points (n=97178). We use five bins per decade of scale with θ min =0.01 arcminutes and θ max =10000 arcminutes. Thus, angular separations are spread across 6 decades of scale and require 30 bins (b=30). Covering this range of scales requires the use of doubleprecision floating-point arithmetic as single-precision floating-point numbers are not sufficient to accurately compute θ values smaller than 1 arcminute. However, a closer look at the numerical range of the bin boundaries shows that just 41 bits of the mantissa are sufficient to cover the required range of scales. Thus, instead of using doubleprecision floating-point arithmetic, we can use fixed-point arithmetic via macros implemented in Simulink. These macros will replace the dot product calculation and select_pri_8_32val macro as follows: dot_product(pi_x, pi_y, pi_z, x[j], y[j], z[j], &dot); bin_mapper(dot, bv01, bv02, bv03, bv04, bv05, bv06, bv07, bv08, bv09, bv10, bv11, bv12, bv13, bv14, bv15, bv16, bv17, bv18, bv19, bv20, bv21, bv22, bv23, bv24, bv25, bv26, bv27, bv28, bv29, bv30, bv31, bv32, &indx); In this implementation, point coordinates are stored in the banks as before. However, instead of the doubleprecision floating-point representation we use 48 bits fixedpoint representation with 42 bits allocated for the fractional part and 6 bits allocated for the integer part and the sign. The few extra bits in excess of the 41 bits previously 4

5 mentioned as sufficient to represent the bin boundaries are used to eliminate effects associated with rounding and overflow. The conversion between the point coordinates stored in the double-precision floating-point format and the fixed-point format takes place on the CPU before the data is translated to the banks. The overhead associated with the data type conversion is negligible compared to the overall computational time. The Simulink-based dot product macro implementation is shown in Figure 5 and the bin mapping macro design is shown in Figure 6 and Figure 7. Table 1 shows reports for loop pipelining, FPGA resource utilization, and place and route time for the complete design. Thus, SLICE utilization is reduced by 22% as compared to the original design. This was expected because 31 double-precision floating-point comparison operators are now replaced with much smaller 48-bits-wide fixed-point comparison operators and a cascade of 1-bit adders. Figure 5: Dot product macro implemented in Simulink. Figure 7: One of four sub-components from the bin mapping macro. Note that MULT18X18s utilization has increased from 12% to 18%. This was not expected; it can be explained, however, by a less efficient implementation of the multipliers in Xilinx Blockset. In addition, the time needed to place and route the entire design on the chip is reduced by a factor of 3. This is mainly due to the smaller overall size of the design. So, what did we gain by implementing parts of this application in Simulink rather than in native MAP C? It turns out that when using the native MAP C implementation, we are able to place only two such compute kernels on the FPGA before we run out of available SLICEs. However, when using the Simulink-based kernel implementation, because of the reduced usage of the SLICEs, we are able to place 4 compute kernels per chip, thus achieving twice the performance of the native MAP C implementation. Figure 6: Bin mapping macro implemented in Simulink. Original Modified pipeline depth (clocks) MULT18X18s 12% 18% RAMB16s 5% 5% SLICEs 63% 41% PAR time (minutes) Table 1. Comparison of two implementations. Conclusions We have demonstrated how a Simulink-based design created with Xilinx System Generator and Xilinx Blockset can be integrated with the native SRC MAP C code. The ability to introduce Simulink-based designs into the Carte framework opens up new possibilities in programming the SRC-6 system. The main advantage of using Simulink-based designs is the ability to use the fixed-point numeric type, which is not directly available in MAP C. This leads to reduced FPGA resource utilization as one can avoid the need to use larger numerical types for problems that require a reduced numerical range. Other benefits, although not explored in this paper, include the ability to use low-level FPGA resources (e.g., BRAM) directly and access to Xilinx IP cores, such as FFT and CORDIC algorithms, etc. We should point out that instead of using Simulink, one can implement the same functionality using an HDL directly. However, using HDL is a more involved process as it requires a set of skills typically possessed by those involved 5

6 with hardware design, whereas the Simulink environment provides a higher level of abstraction and is more familiar to software developers. Acknowledgements This work was funded by the National Science Foundation grant SCI We would like to thank Jon Huppenthal, David Caliga, Dan Poznanovic, and Dr. Jeff Hammes, all from SRC Computers Inc., for their help and support with the SRC-6 system. The TPACF work was performed in collaboration with Dr. Adam Myers and Dr. Robert Brunner from the Department of Astronomy at the University of Illinois at Urbana-Champaign and funded by NASA grants NAG , NAG , and NNG06GH15G. Special thanks to Trish Barker from NCSA s Office of Public Affairs for help in preparing this publication. References [1] SRC Computers Inc., Colorado Springs, CO, SRC Systems and Servers Datasheet, [2] [3] [4] [5] SRC Computers Inc., Colorado Springs, CO, SRC C Programming Environment v 2.1 Guide, [6] D. Meixner, V. Kindratenko, D. Pointer, Running Simulinkbased Designs on SRC-6, The High Performance Embedded Computing (HPEC 06). [7] A. Myers, R. Brunner, G. Richards, R. Nichol, D. Schneider, D. Vanden Berk, R. Scranton, A. Gray, and J. Brinkmann, First Measurement of the Clustering Evolution of Photometrically Classified Quasars, The Astrophysical Journal, 2006, 638, 622. Appendix A /* double2fix input: flp - a -bit floating-point number binpt - the placement of the decimal point output: a -bit integer holding the fixed-point representation */ long long double2fix (double flp, int binpt) { if (binpt == 63 && flp == -1) return ((long long)1 << 63); union {double d; long long l; temp; long long sign, exp, man, fixed; temp.d = flp; if (temp.d == 0) return 0; sign = temp.l >> 63; exp = ((temp.l >> 52) & 0x7FF) ; man = temp.l & 0xFFFFFFFFFFFFF; if (binpt-(52-exp) > 0) fixed = (man 0x ) << (binpt-(52-exp)) & 0x7FFFFFFFFFFFFFFF; else fixed = (man 0x ) >> ((52-exp)-binpt); if (sign < 0) fixed = ~fixed+1; return fixed; /* fix2double input: fixed - a -bit integer holding a fixed-point number binpt - the placement of the decimal point width - the bit width of the fixed-point number output: a double holding the floating-point representation */ double fix2double (long long fixed, int binpt, int width) { union {double d; long long l; flp; long long sign = 0; long long exp = 1023; long long man; if ((binpt == 63) && (fixed == (long long)1<<63)) return -1; else if ((fixed >> (width-1)) & 1) { if (width!= ) fixed = fixed ((long long)-1 << width); fixed = ~fixed+1; sign = 1; man = fixed; if (binpt == 63) { binpt--; exp--; if (fixed == 0) exp = 0; else if ((fixed >> binpt) > 1) { while ((fixed >> binpt) > 1) { fixed = fixed >> 1; exp++; else { while ((fixed >> binpt) < 1) { fixed = fixed << 1; exp--; if ((man >> 52) < 1 && man!= 0) { while (man >> 52 < 1) man = man << 1; else { while ((man >> 52) > 1) man = man >> 1; man = man & 0xFFFFFFFFFFFFF; flp.l = (sign << 63) (exp << 52) man; return flp.d; 6

Implementing Simulink Designs on SRC-6 System

Implementing Simulink Designs on SRC-6 System Implementing Simulink Designs on SRC-6 System 1. Introduction David Meixner, Volodymyr Kindratenko 1, David Pointer Innovative Systems Laboratory National Center for Supercomputing Applications University

More information

Accelerating Cosmology Applications

Accelerating Cosmology Applications Accelerating Cosmology Applications from 80 MFLOPS to 8 GFLOPS in 4 steps Volodymyr Kindratenko Innovative Systems Lab (ISL) (NCSA) Robert Brunner and Adam Myers Department of Astronomy University of Illinois

More information

Dynamic load-balancing on multi-fpga systems a case study

Dynamic load-balancing on multi-fpga systems a case study Dynamic load-balancing on multi-fpga systems a case study Volodymyr Kindratenko Innovative Systems Lab (ISL) (NCSA) Robert Brunner and Adam Myers Department of Astronomy University of Illinois at Urbana-Champaign

More information

Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware

Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware Robert J. Brunner 1,2, Volodymyr V. Kindratenko 2, and Adam D. Myers 1 1) Department of Astronomy, 2) National Center for Supercomputing

More information

Implementation of the two-point angular correlation function on a high-performance reconfigurable computer

Implementation of the two-point angular correlation function on a high-performance reconfigurable computer Scientific Programming 17 (2009) 247 259 247 DOI 10.3233/SPR-2009-0286 IOS Press Implementation of the two-point angular correlation function on a high-performance reconfigurable computer Volodymyr V.

More information

Accelerating Cosmological Data Analysis with Graphics Processors Dylan W. Roeh Volodymyr V. Kindratenko Robert J. Brunner

Accelerating Cosmological Data Analysis with Graphics Processors Dylan W. Roeh Volodymyr V. Kindratenko Robert J. Brunner Accelerating Cosmological Data Analysis with Graphics Processors Dylan W. Roeh Volodymyr V. Kindratenko Robert J. Brunner University of Illinois at Urbana-Champaign Presentation Outline Motivation Digital

More information

Tools for Reconfigurable Supercomputing. Kris Gaj George Mason University

Tools for Reconfigurable Supercomputing. Kris Gaj George Mason University Tools for Reconfigurable Supercomputing Kris Gaj George Mason University 1 Application Development for Reconfigurable Computers Program Entry Platform mapping Debugging & Verification Compilation Execution

More information

Basic Xilinx Design Capture. Objectives. After completing this module, you will be able to:

Basic Xilinx Design Capture. Objectives. After completing this module, you will be able to: Basic Xilinx Design Capture This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: List various blocksets available in System

More information

Mitrion-C Application Development on SGI Altix 350/RC100

Mitrion-C Application Development on SGI Altix 350/RC100 In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 07), April 23-25, 2006, Napa, California Mitrion-C Application Development on SGI Altix 350/RC100 Volodymyr V. Kindratenko,

More information

IMPLICIT+EXPLICIT Architecture

IMPLICIT+EXPLICIT Architecture IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly

More information

Tutorial - Using Xilinx System Generator 14.6 for Co-Simulation on Digilent NEXYS3 (Spartan-6) Board

Tutorial - Using Xilinx System Generator 14.6 for Co-Simulation on Digilent NEXYS3 (Spartan-6) Board Tutorial - Using Xilinx System Generator 14.6 for Co-Simulation on Digilent NEXYS3 (Spartan-6) Board Shawki Areibi August 15, 2017 1 Introduction Xilinx System Generator provides a set of Simulink blocks

More information

An Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers

An Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers An Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers Allen Michalski 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George Mason University

More information

Considerations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer

Considerations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer Considerations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer Russ Duren and Douglas Fouts Naval Postgraduate School Abstract: The architecture and programming environment

More information

Optimize DSP Designs and Code using Fixed-Point Designer

Optimize DSP Designs and Code using Fixed-Point Designer Optimize DSP Designs and Code using Fixed-Point Designer MathWorks Korea 이웅재부장 Senior Application Engineer 2013 The MathWorks, Inc. 1 Agenda Fixed-point concepts Introducing Fixed-Point Designer Overview

More information

Modeling and implementation of dsp fpga solutions

Modeling and implementation of dsp fpga solutions See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/228877179 Modeling and implementation of dsp fpga solutions Article CITATIONS 9 READS 57 4

More information

Performance and Overhead in a Hybrid Reconfigurable Computer

Performance and Overhead in a Hybrid Reconfigurable Computer Performance and Overhead in a Hybrid Reconfigurable Computer Osman Devrim Fidanci 1, Dan Poznanovic 2, Kris Gaj 3, Tarek El-Ghazawi 1, Nikitas Alexandridis 1 1 George Washington University, 2 SRC Computers

More information

Parallel FIR Filters. Chapter 5

Parallel FIR Filters. Chapter 5 Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture

More information

SECURE PARTIAL RECONFIGURATION OF FPGAs. Amir S. Zeineddini Kris Gaj

SECURE PARTIAL RECONFIGURATION OF FPGAs. Amir S. Zeineddini Kris Gaj SECURE PARTIAL RECONFIGURATION OF FPGAs Amir S. Zeineddini Kris Gaj Outline FPGAs Security Our scheme Implementation approach Experimental results Conclusions FPGAs SECURITY SRAM FPGA Security Designer/Vendor

More information

NCSA GPU programming tutorial day 4. Dylan Roeh

NCSA GPU programming tutorial day 4. Dylan Roeh NCSA GPU programming tutorial day 4 Dylan Roeh droeh2@illinois.edu Introduction Application: Two Point Angular Correlation Function The goal is, given a large set D of unit vectors, to compute a histogram

More information

First-hand experience on porting MATPHOT code to SRC platform

First-hand experience on porting MATPHOT code to SRC platform First-hand experience on porting MATPHOT code to SRC platform Volodymyr (Vlad) Kindratenko NCSA, UIUC kindr@ncsauiucedu Presentation outline What is MATPHOT? MATPHOT code Testbed code Implementations on

More information

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl Signal Processing Algorithms into Fixed Point FPGA Hardware Motivation

More information

Objectives. After completing this module, you will be able to:

Objectives. After completing this module, you will be able to: Signal Routing This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Describe how signals are converted through Gateway In

More information

Accelerating Scientific Applications with High-Performance Reconfigurable Computing (HPRC)

Accelerating Scientific Applications with High-Performance Reconfigurable Computing (HPRC) Accelerating Scientific Applications with High-Performance Reconfigurable Computing (HPRC) Volodymyr V. Kindratenko Innovative Systems Laboratory (ISL) (NCSA) University of Illinois at Urbana-Champaign

More information

Overview of ROCCC 2.0

Overview of ROCCC 2.0 Overview of ROCCC 2.0 Walid Najjar and Jason Villarreal SUMMARY FPGAs have been shown to be powerful platforms for hardware code acceleration. However, their poor programmability is the main impediment

More information

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China

More information

INTRODUCTION TO CATAPULT C

INTRODUCTION TO CATAPULT C INTRODUCTION TO CATAPULT C Vijay Madisetti, Mohanned Sinnokrot Georgia Institute of Technology School of Electrical and Computer Engineering with adaptations and updates by: Dongwook Lee, Andreas Gerstlauer

More information

A Library of Parameterized Floating-point Modules and Their Use

A Library of Parameterized Floating-point Modules and Their Use A Library of Parameterized Floating-point Modules and Their Use Pavle Belanović and Miriam Leeser Department of Electrical and Computer Engineering Northeastern University Boston, MA, 02115, USA {pbelanov,mel}@ece.neu.edu

More information

Introduction to DSP/FPGA Programming Using MATLAB Simulink

Introduction to DSP/FPGA Programming Using MATLAB Simulink دوازدهمين سمينار ساليانه دانشكده مهندسي برق فناوری های الکترونيک قدرت اسفند 93 Introduction to DSP/FPGA Programming Using MATLAB Simulink By: Dr. M.R. Zolghadri Dr. M. Shahbazi N. Noroozi 2 Table of main

More information

QP: A Heterogeneous Multi-Accelerator Cluster

QP: A Heterogeneous Multi-Accelerator Cluster 10th LCI International Conference on High-Performance Clustered Computing March 10-12, 2009; Boulder, Colorado QP: A Heterogeneous Multi-Accelerator Cluster Michael Showerman 1, Jeremy Enos, Avneesh Pant,

More information

Quixilica Floating Point FPGA Cores

Quixilica Floating Point FPGA Cores Data sheet Quixilica Floating Point FPGA Cores Floating Point Adder - 169 MFLOPS* on VirtexE-8 Floating Point Multiplier - 152 MFLOPS* on VirtexE-8 Floating Point Divider - 189 MFLOPS* on VirtexE-8 Floating

More information

Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization

Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization Sashisu Bajracharya, Deapesh Misra, Kris Gaj George Mason University Tarek El-Ghazawi The George Washington

More information

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation

More information

Intro to System Generator. Objectives. After completing this module, you will be able to:

Intro to System Generator. Objectives. After completing this module, you will be able to: Intro to System Generator This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Explain why there is a need for an integrated

More information

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers

More information

Developing a Data Driven System for Computational Neuroscience

Developing a Data Driven System for Computational Neuroscience Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation UG817 (v 13.2) July 28, 2011 Xilinx is disclosing this user guide, manual, release note, and/or specification

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering Winter/Summer Training Level 2 continues. 3 rd Year 4 th Year FIG-3 Level 1 (Basic & Mandatory) & Level 1.1 and

More information

Pipelined High Speed Double Precision Floating Point Multiplier Using Dadda Algorithm Based on FPGA

Pipelined High Speed Double Precision Floating Point Multiplier Using Dadda Algorithm Based on FPGA RESEARCH ARTICLE OPEN ACCESS Pipelined High Speed Double Precision Floating Point Multiplier Using Dadda Algorithm Based on FPGA J.Rupesh Kumar, G.Ram Mohan, Sudershanraju.Ch M. Tech Scholar, Dept. of

More information

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya, Chang Shu, Kris Gaj George Mason University Tarek El-Ghazawi The George

More information

Kintex-7: Hardware Co-simulation and Design Using Simulink and Sysgen

Kintex-7: Hardware Co-simulation and Design Using Simulink and Sysgen Kintex-7: Hardware Co-simulation and Design Using Simulink and Sysgen Version 1.2 April 19, 2013 Revision History Version Date Author Comments Version Date Author(s) Comments on Versions No Completed 1.0

More information

Implementation of a Quadrature Mirror Filter Bank on an SRC reconfigurable computer for real-time signal processing

Implementation of a Quadrature Mirror Filter Bank on an SRC reconfigurable computer for real-time signal processing Calhoun: The NPS Institutional Archive DSpace Repository Theses and Dissertations 1. Thesis and Dissertation Collection, all items 2006-09 Implementation of a Quadrature Mirror Filter Bank on an SRC reconfigurable

More information

Cover TBD. intel Quartus prime Design software

Cover TBD. intel Quartus prime Design software Cover TBD intel Quartus prime Design software Fastest Path to Your Design The Intel Quartus Prime software is revolutionary in performance and productivity for FPGA, CPLD, and SoC designs, providing a

More information

SDR Spring KOMSYS-F6: Programmable Digital Devices (FPGAs)

SDR Spring KOMSYS-F6: Programmable Digital Devices (FPGAs) SDR Spring 2006 KOMSYS-F6: Programmable Digital Devices (FPGAs) Lecture 4 Jan Hvolgaard Mikkelsen Aalborg University 2006 Agenda What was the story about VHDL? o Quick recap from Lecture 3. Illustration

More information

ISE Design Suite Software Manuals and Help

ISE Design Suite Software Manuals and Help ISE Design Suite Software Manuals and Help These documents support the Xilinx ISE Design Suite. Click a document title on the left to view a document, or click a design step in the following figure to

More information

SRC MAPstation Image Processing: Edge Detection

SRC MAPstation Image Processing: Edge Detection SRC MAPstation Image Processing: Edge Detection David Caliga, Director Software Applications SRC Computers, Inc. dcaliga@srccomputers.com Motivations The purpose of detecting sharp changes in image brightness

More information

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items (FFT_MIXED) November 26, 2008 Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E mail: info@dilloneng.com URL: www.dilloneng.com

More information

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs New Directions in Programming FPGAs for DSP Dr. Jim Hwang Xilinx, Inc. Agenda Introduction FPGA DSP platforms Design challenges New programming models for FPGAs System Generator Getting your math into

More information

Hardware Oriented Security

Hardware Oriented Security 1 / 20 Hardware Oriented Security SRC-7 Programming Basics and Pipelining Miaoqing Huang University of Arkansas Fall 2014 2 / 20 Outline Basics of SRC-7 Programming Pipelining 3 / 20 Framework of Program

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital

More information

Efficient design and FPGA implementation of JPEG encoder

Efficient design and FPGA implementation of JPEG encoder IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 5, Ver. II (Sep. - Oct. 2016), PP 47-53 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Efficient design and FPGA implementation

More information

Introduction to Field Programmable Gate Arrays

Introduction to Field Programmable Gate Arrays Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Historical introduction.

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

Mohamed Taher The George Washington University

Mohamed Taher The George Washington University Experience Programming Current HPRCs Mohamed Taher The George Washington University Acknowledgements GWU: Prof.\Tarek El-Ghazawi and Esam El-Araby GMU: Prof.\Kris Gaj ARSC SRC SGI CRAY 2 Outline Introduction

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

FPGA Implementation and Validation of the Asynchronous Array of simple Processors FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,

More information

Cover TBD. intel Quartus prime Design software

Cover TBD. intel Quartus prime Design software Cover TBD intel Quartus prime Design software Fastest Path to Your Design The Intel Quartus Prime software is revolutionary in performance and productivity for FPGA, CPLD, and SoC designs, providing a

More information

FPGA VHDL Design Flow AES128 Implementation

FPGA VHDL Design Flow AES128 Implementation Sakinder Ali FPGA VHDL Design Flow AES128 Implementation Field Programmable Gate Array Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure: 1. The interconnection

More information

University, Patiala, Punjab, India 1 2

University, Patiala, Punjab, India 1 2 1102 Design and Implementation of Efficient Adder based Floating Point Multiplier LOKESH BHARDWAJ 1, SAKSHI BAJAJ 2 1 Student, M.tech, VLSI, 2 Assistant Professor,Electronics and Communication Engineering

More information

Advanced Synthesis Techniques

Advanced Synthesis Techniques Advanced Synthesis Techniques Reminder From Last Year Use UltraFast Design Methodology for Vivado www.xilinx.com/ultrafast Recommendations for Rapid Closure HDL: use HDL Language Templates & DRC Constraints:

More information

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010 Digital Logic & Computer Design CS 434 Professor Dan Moldovan Spring 2 Copyright 27 Elsevier 5- Chapter 5 :: Digital Building Blocks Digital Design and Computer Architecture David Money Harris and Sarah

More information

FT-UNSHADES credits. UNiversity of Sevilla HArdware DEbugging System.

FT-UNSHADES credits. UNiversity of Sevilla HArdware DEbugging System. FT-UNSHADES Microelectronic Presentation Day February, 4th, 2004 J. Tombs & M.A. Aguirre jon@gte.esi.us.es, aguirre@gte.esi.us.es AICIA-GTE of The University of Sevilla (SPAIN) FT-UNSHADES credits UNiversity

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

VLSI Programming 2016: Lecture 3

VLSI Programming 2016: Lecture 3 VLSI Programming 2016: Lecture 3 Course: 2IMN35 Teachers: Kees van Berkel c.h.v.berkel@tue.nl Rudolf Mak r.h.mak@tue.nl Lab: Kees van Berkel, Rudolf Mak, Alok Lele www: http://www.win.tue.nl/~wsinmak/education/2imn35/

More information

Accelerating FPGA/ASIC Design and Verification

Accelerating FPGA/ASIC Design and Verification Accelerating FPGA/ASIC Design and Verification Tabrez Khan Senior Application Engineer Vidya Viswanathan Application Engineer 2015 The MathWorks, Inc. 1 Agenda Challeges with Traditional Implementation

More information

An Implementation of Double precision Floating point Adder & Subtractor Using Verilog

An Implementation of Double precision Floating point Adder & Subtractor Using Verilog IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE) e-issn: 2278-1676,p-ISSN: 2320-3331, Volume 9, Issue 4 Ver. III (Jul Aug. 2014), PP 01-05 An Implementation of Double precision Floating

More information

Implementation of Floating Point Multiplier Using Dadda Algorithm

Implementation of Floating Point Multiplier Using Dadda Algorithm Implementation of Floating Point Multiplier Using Dadda Algorithm Abstract: Floating point multiplication is the most usefull in all the computation application like in Arithematic operation, DSP application.

More information

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items (ULFFT) November 3, 2008 Product Specification Dillon Engineering, Inc. 4974 Lincoln Drive Edina, MN USA, 55436 Phone: 952.836.2413 Fax: 952.927.6514 E-mail: info@dilloneng.com URL: www.dilloneng.com Core

More information

Quixilica Floating-Point QR Processor Core

Quixilica Floating-Point QR Processor Core Data sheet Quixilica Floating-Point QR Processor Core With 13 processors on XC2V6000-5 - 20 GFlop/s at 100MHz With 10 processors on XC2V6000-5 - 15 GFlop/s at 97MHz With 4 processors on XC2V3000-5 - 81

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

ECE 699: Lecture 12. Introduction to High-Level Synthesis

ECE 699: Lecture 12. Introduction to High-Level Synthesis ECE 699: Lecture 12 Introduction to High-Level Synthesis Required Reading The ZYNQ Book Chapter 14: Spotlight on High-Level Synthesis Chapter 15: Vivado HLS: A Closer Look S. Neuendorffer and F. Martinez-Vallina,

More information

ESL design with the Agility Compiler for SystemC

ESL design with the Agility Compiler for SystemC ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing

More information

Xilinx DSP. High Performance Signal Processing. January 1998

Xilinx DSP. High Performance Signal Processing. January 1998 DSP High Performance Signal Processing January 1998 New High Performance DSP Alternative New advantages in FPGA technology and tools: DSP offers a new alternative to ASICs, fixed function DSP devices,

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj

Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj Outline RSA ECM Reconfigurable Computing Platforms, Languages and Programming Environments Partitioning t ECM Code between HDLs and HLLs Implementation

More information

Stratix II vs. Virtex-4 Performance Comparison

Stratix II vs. Virtex-4 Performance Comparison White Paper Stratix II vs. Virtex-4 Performance Comparison Altera Stratix II devices use a new and innovative logic structure called the adaptive logic module () to make Stratix II devices the industry

More information

AL8253 Core Application Note

AL8253 Core Application Note AL8253 Core Application Note 6-15-2012 Table of Contents General Information... 3 Features... 3 Block Diagram... 3 Contents... 4 Behavioral... 4 Synthesizable... 4 Test Vectors... 4 Interface... 5 Implementation

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

A Framework to Improve IP Portability on Reconfigurable Computers

A Framework to Improve IP Portability on Reconfigurable Computers A Framework to Improve IP Portability on Reconfigurable Computers Miaoqing Huang, Ivan Gonzalez, Sergio Lopez-Buedo, and Tarek El-Ghazawi NSF Center for High-Performance Reconfigurable Computing (CHREC)

More information

Tutorial. CASPER Reference Design

Tutorial. CASPER Reference Design Tutorial Author: Henry Chen December 18, 2009 (v1.1) Hardware Platforms Used: IBOB FPGA Clock Rate: 100MHz Sampling Rate: N/A Software Environment: TinySH This tutorial walks through the process of building

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

System-on Solution from Altera and Xilinx

System-on Solution from Altera and Xilinx System-on on-a-programmable-chip Solution from Altera and Xilinx Xun Yang VLSI CAD Lab, Computer Science Department, UCLA FPGAs with Embedded Microprocessors Combination of embedded processors and programmable

More information

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005 Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:

More information

Prototyping Architectural Support for Program Rollback Using FPGAs

Prototyping Architectural Support for Program Rollback Using FPGAs Prototyping Architectural Support for Program Rollback Using FPGAs Radu Teodorescu and Josep Torrellas http://iacoma.cs.uiuc.edu University of Illinois at Urbana-Champaign Motivation Problem: Software

More information

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer

Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya 1, Chang Shu 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George

More information

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm 455 A High Speed Binary Floating Point Multiplier Using Dadda Algorithm B. Jeevan, Asst. Professor, Dept. of E&IE, KITS, Warangal. jeevanbs776@gmail.com S. Narender, M.Tech (VLSI&ES), KITS, Warangal. narender.s446@gmail.com

More information

ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC

ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC ERROR MODELLING OF DUAL FIXED-POINT ARITHMETIC AND ITS APPLICATION IN FIELD PROGRAMMABLE LOGIC Chun Te Ewe, Peter Y. K. Cheung and George A. Constantinides Department of Electrical & Electronic Engineering,

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

H100 Series FPGA Application Accelerators

H100 Series FPGA Application Accelerators 2 H100 Series FPGA Application Accelerators Products in the H100 Series PCI-X Mainstream IBM EBlade H101-PCIXM» HPC solution for optimal price/performance» PCI-X form factor» Single Xilinx Virtex 4 FPGA

More information

Verilog Module 1 Introduction and Combinational Logic

Verilog Module 1 Introduction and Combinational Logic Verilog Module 1 Introduction and Combinational Logic Jim Duckworth ECE Department, WPI 1 Module 1 Verilog background 1983: Gateway Design Automation released Verilog HDL Verilog and simulator 1985: Verilog

More information

EITF35: Introduction to Structured VLSI Design

EITF35: Introduction to Structured VLSI Design EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field

More information

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram:

The CPU and Memory. How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: The CPU and Memory How does a computer work? How does a computer interact with data? How are instructions performed? Recall schematic diagram: 1 Registers A register is a permanent storage location within

More information

An Efficient Implementation of Floating Point Multiplier

An Efficient Implementation of Floating Point Multiplier An Efficient Implementation of Floating Point Multiplier Mohamed Al-Ashrafy Mentor Graphics Mohamed_Samy@Mentor.com Ashraf Salem Mentor Graphics Ashraf_Salem@Mentor.com Wagdy Anis Communications and Electronics

More information

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc. Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW

More information

High-Performance Linear Algebra Processor using FPGA

High-Performance Linear Algebra Processor using FPGA High-Performance Linear Algebra Processor using FPGA J. R. Johnson P. Nagvajara C. Nwankpa 1 Extended Abstract With recent advances in FPGA (Field Programmable Gate Array) technology it is now feasible

More information

FPGA Polyphase Filter Bank Study & Implementation

FPGA Polyphase Filter Bank Study & Implementation FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes

More information