Improving Reconfiguration Speed for Dynamic Circuit Specialization using Placement Constraints


Amit Kulkarni, Tom Davidson, Karel Heyse, and Dirk Stroobandt
ELIS department, Computer Systems Lab, Ghent University
Sint-Pietersnieuwstraat 41, Ghent B-9000, Belgium
{Amit.Kulkarni, Tom.Davidson, Karel.Heyse, Dirk.Stroobandt}

Abstract

Dynamic Circuit Specialization (DCS) is an optimization technique used for implementing a parameterized application on an FPGA. An application is said to be parameterized when some of its inputs, called parameters, change infrequently compared to the other inputs. Instead of implementing these parameter inputs as regular inputs, the DCS approach implements them as constants and optimizes the design for these constants. When the parameter values change, the design is re-optimized for the new constant values by reconfiguring the FPGA. Earlier investigation has shown that run-time reconfiguration speed is the limiting factor of DCS implementations on Xilinx FPGAs. We propose to constrain the design's placement and to use a custom Xilinx HWICAP driver to improve the reconfiguration speed at the cost of a small reduction in design performance. We use the Xilinx Virtex-5 and Zynq as experimental platforms, and an 8-bit FIR filter with different tap configurations as our parameterized design, whose filter coefficient values are the infrequently changing inputs. The reconfiguration speed improves by a factor of 14 with only a 6% decrease in performance.

I. INTRODUCTION

Partial run-time reconfiguration is the ability to modify some logic blocks of an FPGA while the rest of it remains active. One of the commercially available technologies, developed by Xilinx, is called Partial Reconfiguration (PR) and has been on the market for quite a while. Because of its reconfiguration overhead, the advantage of using PR is greatly diminished.
The authors of [1] developed a technique called Dynamic Circuit Specialization (DCS), a partial reconfiguration technique tailored to parameterized applications. DCS uses run-time reconfiguration to specialize the parameterized design depending on the values of the infrequently changing inputs (the parameters). Hence, for every change in the parameter value, a new specialized bitstream is generated and the FPGA is reconfigured with it. A detailed implementation of the DCS tool flow on a self-reconfigurable platform is described in [2].

The DCS tool flow consists of two stages: the generic stage and the specialization stage. In the generic stage, the design with parameterized inputs, described in a Hardware Description Language (HDL), is processed to yield a Partial Parameterized Configuration (PPC), which contains the bitstream expressed in the form of Boolean functions. In the specialization stage, the Boolean functions are evaluated for a specific parameter value by the Specialized Configuration Generator (SCG) to generate a specialized bitstream. Usually the SCG is implemented on an embedded processor. The embedded processor is responsible for swapping the specialized bitstream into the configuration memory using the HWICAP.

Our experiments with DCS implementations on a self-reconfigurable platform have shown that the HWICAP is the main bottleneck for the reconfiguration speed, since its throughput is not high enough to match the speed of the embedded processor used during the reconfiguration process. However, experiments described in [3] have shown that the bottleneck depends on the experimental setup and on the different components that participate in the reconfiguration process. The Xilinx HWICAP driver function "XhwIcap_setClb_bits" is used to reconfigure the truth table entries of a single LookUp Table (LUT) at run time.
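The specialization stage described above can be illustrated with a small sketch. It is not the authors' SCG: the representation of the PPC as a list of Python callables, the parameter name "p", and the helper names are assumptions made purely for illustration. Each truth table bit of a toy 2-input TLUT is a Boolean function of one parameter, and evaluating all functions for a parameter value yields the specialized truth table.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a PPC: one Boolean function per truth table bit.
BitFunction = Callable[[Dict[str, int]], int]


def make_bit_function(a: int, b: int) -> BitFunction:
    """Truth table bit for inputs (a, b): XOR when parameter p is 1, AND when p is 0."""
    def bit(params: Dict[str, int]) -> int:
        return (a ^ b) if params["p"] else (a & b)
    return bit


def specialize_tlut(bit_functions: List[BitFunction], params: Dict[str, int]) -> List[int]:
    """Evaluate every Boolean function for one parameter assignment."""
    return [f(params) & 1 for f in bit_functions]


# Entry i of the truth table corresponds to a = bit 0 of i, b = bit 1 of i.
ppc = [make_bit_function(i & 1, (i >> 1) & 1) for i in range(4)]

and_table = specialize_tlut(ppc, {"p": 0})
xor_table = specialize_tlut(ppc, {"p": 1})
print(and_table, xor_table)
```

A real TLUT on the devices considered here has 6 inputs and 64 truth table entries, and the resulting bits still have to be written into the configuration memory through the HWICAP.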
With the existing column-based Xilinx FPGA architectures, however, we propose to reconfigure multiple LUTs at the same time. We do this by using design placement constraints to cluster the bits that have to be changed into the same reconfiguration columns and by customizing the "XhwIcap_setClb_bits" function. This gives a significant improvement in reconfiguration speed, at the cost of a slight reduction in the performance of the design. In this paper we show the trade-off between the design performance and the reconfiguration speed achieved by employing placement constraints and a custom HWICAP driver. We use the custom HWICAP driver along with the placement constraints on the Xilinx Virtex-5 and Zynq FPGAs for implementing 8-bit FIR filters using DCS.

In Section II, we describe the reconfiguration process of DCS. A brief overview of the column-based FPGA architectures is presented in Section III. In Section IV, the details of the Xilinx HWICAP driver used for reconfiguration (the "XhwIcap_setClb_bits" function) are described. In Section V, the use of placement constraints for the parameterized design is described. In Section VI, we present the main idea of improving the "XhwIcap_setClb_bits" driver. In Section VII, we briefly describe the experiments with parameterized designs, tabulate the improved reconfiguration speed, and discuss the trade-off between reconfiguration speed and design performance. Finally, we conclude in Section VIII.

/14/$31.00 © 2014 IEEE

II. RUN-TIME RECONFIGURATION FOR DYNAMIC CIRCUIT SPECIALIZATION

In this section, we briefly explain how run-time reconfiguration is used in Dynamic Circuit Specialization. In [4], it is explained how the parameterized design is mapped onto virtual LUTs called Tunable LUTs (TLUTs). TLUTs are virtual versions of conventional LUTs whose truth table entries are expressed as Boolean functions of the parameters. The bitstream of a parameterized design is thus expressed as a Boolean function of the parameters, resulting in a parameterized configuration. For every change in the parameter input values, the SCG generates a new specialized bitstream by evaluating the corresponding Boolean functions. Usually, the SCG is implemented on an embedded hard-core processor such as the PowerPC or on an embedded soft-core processor such as the MicroBlaze. The specialized bitstream represents the truth table entries of the TLUTs.

Once the specialized bitstream is generated, it has to be swapped into the FPGA configuration memory to reconfigure the LUTs that correspond to the virtual TLUTs. The swapping is done by using the HWICAP as a configuration interface on a Xilinx FPGA. The HWICAP is accessible for reconfiguration through its driver function "XhwIcap_setClb_bits" [5]. More information on this driver is given in Section IV. The main advantage of this driver is that it can reconfigure a specific LUT when provided with its location coordinates, so any LUT can be accessed for reconfiguration. The only disadvantage is that "XhwIcap_setClb_bits" needs to be called for every single LUT, even though there are good opportunities to reconfigure multiple LUTs with a single function call. To understand how this driver works, it is necessary to understand the Xilinx column-based FPGA architecture first.

TABLE I. XILINX FPGA DEVICE DETAILS

Device name                          XC5VFX70T-FFG1136           XC7Z020-CLG484-1
Board name                           ML507 Evaluation Platform   ZedBoard
Hard-core processor                  PowerPC 440 Core            ARM Cortex-A9
  Clock frequency                    400 MHz                     667 MHz
Soft-core processor                  MicroBlaze (8.20.b)         MicroBlaze (8.40.a)
  Clock frequency                    100 MHz                     100 MHz
LUT inputs                           6                           6
LUT entries                          64                          64
HWICAP type                          XPS HWICAP (5.01.a)         AXI HWICAP (2.03.a)
HWICAP clock (MHz)
HWICAP throughput (non-DMA) (MB/s)
HWICAP port width (bits)
Number of clock regions              16                          6
Number of CLBs in one CLB column     20                          50
Frame size (32-bit words)            41                          101

III. XILINX COLUMN BASED ARCHITECTURE

We consider the modern column-based FPGA architectures from Xilinx for our experiments. Our experiments are limited to the Virtex-5 and the Zynq FPGAs only. However, the idea of improving the reconfiguration speed can be applied to any column-based Xilinx FPGA. The specifications related to reconfiguration are tabulated in Table I.

A Xilinx FPGA contains an array of Configurable Logic Blocks (CLBs), which encapsulate LUTs, flip-flops and multiplexers. Each CLB contains 8 LUTs and is capable of realizing combinational and sequential logic. The array of CLBs is divided into a number of clock regions. Each clock region contains CLB columns with a fixed number of CLBs, and the height of a CLB column is the same in all clock regions. Multiple CLB columns lie adjacent to each other, thus forming CLB rows as shown in Figure 1. Other columns, such as DSP and BRAM columns, exist in between the CLB columns.

Fig. 1. Column based FPGA architecture

Frame Structure

A frame is the smallest addressable element of an FPGA configuration. It can be viewed as a vertical stack of a fixed number of bits spanning the complete height of a row [6] [7]. A fixed data size of 2 words (1 word = 32 bits) is assigned to each CLB within the entire frame. This means that a set of LUT entries present in one CLB can be configured within those 2 words. However, the complete configuration data of an entire CLB containing multiple LUTs spans multiple frames, and each frame has its own unique frame address [6]. It should be noted that there is one extra word, called the "HCLK config word", for each column within one frame, as shown in Figure 2.
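The frame layout just described (2 words per CLB plus one HCLK config word per frame) fixes the frame size once the CLB column height is known. The sketch below is only an arithmetic check under that description; the function and constant names are invented for illustration, and the column heights of 20 and 50 CLBs are the values this paper reports for the Virtex-5 and the Zynq.

```python
WORD_BITS = 32            # 1 word = 32 bits
WORDS_PER_CLB = 2         # configuration words per CLB within one frame
HCLK_WORDS_PER_FRAME = 1  # the extra "HCLK config word"


def frame_size_words(clbs_per_column: int) -> int:
    """Number of 32-bit words in one frame for a given CLB column height."""
    return clbs_per_column * WORDS_PER_CLB + HCLK_WORDS_PER_FRAME


def frame_size_bits(clbs_per_column: int) -> int:
    """Same frame size expressed in bits."""
    return frame_size_words(clbs_per_column) * WORD_BITS


for device, clbs in (("Virtex-5", 20), ("Zynq", 50)):
    print(device, frame_size_words(clbs), "words,", frame_size_bits(clbs), "bits")
```

Because at least one whole frame is transferred per access, a taller column means more words per access for the same HWICAP throughput.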

Fig. 2. Frame structure of column based Xilinx FPGA

A single frame can contain truth table entries of multiple LUTs that are located in a single CLB column. In the Virtex-5 there are 20 CLBs in one column and hence a total of 20 × 2 + 1 = 41 words in one frame. Similarly, in the Zynq family there are 50 CLBs in one column, so a total of 50 × 2 + 1 = 101 words exist in one frame. The frame size plays an important role during the reconfiguration process. Since a frame is the smallest addressable element, at least one frame has to be accessed via the HWICAP for every reconfiguration. Thus the time taken to reconfigure a LUT is affected by the frame size. For a fixed HWICAP throughput, an increase in frame size increases the reconfiguration time and thus reduces the reconfiguration speed.

IV. THE "XhwIcap_setClb_bits" DRIVER

This is the HWICAP driver function used to reconfigure the actual LUTs that serve as virtual TLUTs in the DCS implementation. The function accepts the TLUT location coordinates and the specialized bits (truth table values) as inputs. It first generates a frame address from the given TLUT location coordinates, which targets the frames that contain the truth table entries of the corresponding TLUT. The complete reconfiguration occurs in 3 steps:

1) Read frames: With the help of the frame address, the multiple frames containing all the truth table entries of one TLUT are read from the configuration memory.
2) Modify frames: The current truth table entries of the TLUT are replaced with the specialized truth table bits.
3) Write back the frames: With the help of the same frame address, the specialized truth table values are updated in the TLUT by writing the modified frames back into the configuration memory of the FPGA.

The frames are accessed through the HWICAP, which has a fixed throughput. All 3 steps of the reconfiguration process have to be executed to reconfigure a single TLUT, which is inefficient. The HWICAP with its fixed throughput is a bottleneck and hence limits the reconfiguration speed. Our approach is to improve "XhwIcap_setClb_bits" so that multiple TLUTs can be modified within a single read and write activity on the frames.

V. PLACEMENT CONSTRAINTS TO IMPROVE RECONFIGURATION SPEED

The main aim of using placement constraints is to force multiple TLUTs to cluster all their truth table entries in a minimal number of frames. Placement constraints restrict where the design's logic is placed by forcing the placer to use a certain area of the FPGA. We have described the correlation between the CLB columns and the frame structure in Section III. Our approach is to force more TLUTs to be placed in a single CLB column so that their truth table entries can be reconfigured with a minimal number of frame accesses.

We have used the "AREA_GROUP" constraint [8]. This constraint allows us to specify that certain parts of the design can only be placed in a pre-determined rectangular region of the FPGA's CLBs. To determine the exact size of this rectangular region, the maximum length of the CLB column and the minimum width of the CLB rows have to be considered. The maximum length of the CLB column is equal to its height in a given clock region (50 for the Zynq and 20 for the Virtex-5), which ensures that more TLUTs fit in the specified area, while the minimum width ensures that we use the minimal number of CLB columns possible. The exact area constraint differs for the two targeted FPGAs.

We first used the constraint to place the TLUTs in the exact minimum number of CLB columns, determined by the number of LUTs present in a column. For example, in the Zynq each column has 200 LUTs. Therefore, to place the 64-tap FIR filter (1536 TLUTs), 8 columns are sufficient. However, with 8 columns the router was not able to route the design. Hence we increased the width of the rectangular area by adding columns until the router was able to route the whole design. The width of the rectangular area in terms of CLB columns for the different configurations of the FIR filter is tabulated in Table III.

For a 64-tap FIR filter, the average number of TLUTs clustered in a single CLB column of the Zynq is 110, which is 52% of the total LUTs available in a single CLB column, and a maximum of 156 TLUTs are clustered in a single column, which is 75%. The remaining LUTs are not part of the reconfiguration process and are therefore used for the non-reconfigurable parts of the design. Similarly, for the Virtex-5, the average number of TLUTs clustered in a single CLB column is 41, which is 55% of the total LUTs available in a single CLB column, and a maximum of 60 TLUTs are clustered in a single column, which is 78%. Table II shows the percentage of TLUTs clustered.

TABLE II. TLUTS CLUSTER RATE OF 64-TAP FIR FILTER IN A SINGLE CLB COLUMN

                   Virtex-5              Zynq
                   Average   Maximum     Average   Maximum
Clustered TLUTs    55%       78%         52%       75%
Remaining LUTs     45%       22%         48%       25%
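The lower bound on the region width used in this section can be sketched as a simple ceiling division. This is only an illustration of the arithmetic in the 64-tap example; the helper name is hypothetical, and the figure of 200 LUTs per column is the one quoted in the text. As noted above, the routable width found in practice was larger than this bound.

```python
import math


def min_columns(num_tluts: int, luts_per_column: int) -> int:
    """Smallest number of CLB columns whose LUTs can hold all TLUTs."""
    return math.ceil(num_tluts / luts_per_column)


LUTS_PER_COLUMN = 200  # value quoted in the text for one column

# TLUT counts of the three FIR filter configurations used in the paper.
for taps, tluts in ((16, 384), (32, 768), (64, 1536)):
    print(f"{taps}-tap FIR: at least {min_columns(tluts, LUTS_PER_COLUMN)} columns")
```

Starting from this bound and widening the region one column at a time mirrors the procedure described for making the design routable.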

VI. IMPROVING "XhwIcap_setClb_bits" DRIVER

Once multiple TLUTs are placed within a single column, we modified the "XhwIcap_setClb_bits" driver to exploit the existing frame structure, which depends on the column-based Xilinx FPGA architecture. If multiple TLUTs of a parameterized design are placed in a single column, a single frame holds a certain set of truth table entries of each of these TLUTs, while all 64 entries of a single TLUT are spread over multiple frames. We have modified "XhwIcap_setClb_bits" and renamed it "XhwIcap_custom_setClb_bits". The reconfiguration process takes place in 3 steps:

1) Read frames: With the help of the frame address, the multiple frames containing all the truth table entries of multiple TLUTs are read from the configuration memory. Since multiple TLUTs are placed in a single column, the truth table values of multiple TLUTs are read with a single read activity.
2) Modify frames: The current truth table entries of multiple TLUTs are replaced with the specialized truth table bits, which are generated by the SCG. Thus multiple TLUTs are specialized in a single attempt.
3) Write back the frames: With the help of the same frame address, the specialized truth table values are updated in multiple TLUTs by writing the modified frames back into the configuration memory of the FPGA. This updates all the truth table entries of the multiple TLUTs that are placed in a single column.

Hence, with a single read-frames activity, multiple TLUTs can be reconfigured. This is efficient because reading and writing back the frames for each individual TLUT is avoided, in contrast to the conventional "XhwIcap_setClb_bits" driver. If the number of TLUTs in a parameterized design is higher than what fits in a single CLB column, multiple CLB columns containing multiple TLUTs can be used to achieve the gain in reconfiguration speed.
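The difference between the two read-modify-write schemes can be made concrete with a toy model. The sketch below is not the Xilinx driver and does not model real frame addressing: `ToyConfigMemory`, its methods and the `(column, row, truth_table)` update tuples are all hypothetical, and one "column access" stands in for the set of frames that hold the truth tables of that column. It only counts how many read and write accesses each scheme issues.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Update = Tuple[int, int, int]  # (column, row, truth_table), hypothetical format


class ToyConfigMemory:
    """Toy configuration memory that counts column-level frame accesses."""

    def __init__(self) -> None:
        self.columns: Dict[int, Dict[int, int]] = defaultdict(dict)
        self.reads = 0
        self.writes = 0

    def read_column(self, col: int) -> Dict[int, int]:
        self.reads += 1
        return dict(self.columns[col])

    def write_column(self, col: int, data: Dict[int, int]) -> None:
        self.writes += 1
        self.columns[col] = dict(data)


def reconfigure_per_lut(mem: ToyConfigMemory, updates: Iterable[Update]) -> None:
    """Conventional scheme: one read-modify-write per TLUT."""
    for col, row, table in updates:
        frames = mem.read_column(col)
        frames[row] = table
        mem.write_column(col, frames)


def reconfigure_per_column(mem: ToyConfigMemory, updates: Iterable[Update]) -> None:
    """Clustered scheme: one read-modify-write per column, covering all its TLUTs."""
    by_column: Dict[int, list] = defaultdict(list)
    for col, row, table in updates:
        by_column[col].append((row, table))
    for col, items in by_column.items():
        frames = mem.read_column(col)
        for row, table in items:
            frames[row] = table
        mem.write_column(col, frames)


# 100 TLUTs clustered in 2 columns (50 per column), all given the same table.
updates = [(col, row, 0xFFFF) for col in range(2) for row in range(50)]
slow, fast = ToyConfigMemory(), ToyConfigMemory()
reconfigure_per_lut(slow, updates)
reconfigure_per_column(fast, updates)
print(slow.reads, slow.writes, fast.reads, fast.writes)
```

In this model both schemes leave the memory in the same state; only the number of accesses through the (fixed-throughput) interface differs, which is the effect the placement constraints are meant to enable.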
The main concern with using placement constraints is the design performance. Strict placement constraints can hinder the design performance, so there is a trade-off between the reconfiguration speed and the design performance that needs to be investigated.

VII. EXPERIMENTS AND RESULTS

In this section, we present our experiments followed by their results and compare them to the conventional DCS implementation. We used an 8-bit FIR filter with three different tap configurations as a parameterized design. Each filter tap contains two 4-bit multipliers and each multiplier is mapped onto 12 TLUTs [2]. The FIR filter configurations are listed in Table IV. Figure 3 shows the structure of the filter: all coefficients are the parameterized inputs. For every infrequent change in a coefficient value, a specialized bitstream is generated and the filter taps containing the multiplications are reconfigured accordingly. The reconfiguration time is tabulated in Table V and the corresponding bar graph is depicted in Figure 4.

Fig. 3. k-taps, 8-bit FIR filter

TABLE III. DIMENSIONS FOR THE PLACEMENT CONSTRAINTS
(Columns: 16-tap FIR, 32-tap FIR, 64-tap FIR. Row: number of TLUTs to be clustered.)
Note: Above dimensions are in the form of Length × Width of the CLB columns.

TABLE IV. FIR FILTER CONFIGURATIONS

Taps   Multipliers   TLUTs
16     32            384
32     64            768
64     128           1536

TABLE V. RECONFIGURATION TIME IN MILLISECONDS
(Columns: 16-tap FIR, 384 TLUTs; 32-tap FIR, 768 TLUTs; 64-tap FIR, 1536 TLUTs. Entries present here: 37.7 as the first value and 72.9 as the last value.)
Note: Above values are in the form of Without placement constraints / With placement constraints.

The figure shows that the FIR implementation without placement constraints needs less reconfiguration time for the Virtex-5 than for the Zynq. The main reason is the larger frame size of the Zynq compared to the Virtex-5, and thus the higher number of words to be reconfigured for the Zynq [5].
A significant improvement in reconfiguration speed can be noticed after introducing the placement constraints and using the "XhwIcap_custom_setClb_bits" driver. On average, the reconfiguration time is reduced by a factor of 14 because of the reduced number of read-frames and write-frames calls compared to the "XhwIcap_setClb_bits" driver. We used the placement constraints so that the TLUTs are placed within the minimal number of CLB columns possible. The dimensions of the rectangular region of the placement constraints are tabulated in Table III. Since there are more CLBs in the CLB columns of the Zynq than in those of the Virtex-5, the Zynq can incorporate more TLUTs within a column. Therefore we notice in Table III that the number of columns (the width in CLB columns) used to constrain the TLUTs in the Zynq is lower than in the Virtex-5.

Fig. 4. Reconfiguration time comparison

Fig. 5. Clock frequency of the FIR filter with various tap configurations

TABLE VI. MAXIMUM CLOCK (MHZ) THE DESIGN CAN SUPPORT
(Columns: 16-tap FIR, 384 TLUTs; 32-tap FIR, 768 TLUTs; 64-tap FIR, 1536 TLUTs.)
Note: Above values are in the form of Without placement constraints / With placement constraints.

The improvement in the reconfiguration speed comes at the cost of a reduction in the design performance. Introducing the placement constraints gives the design a longer critical path than the conventional implementation. This decreases the maximum clock frequency the design can support, as observed in Table VI. Figure 5 shows the bar graph of the design performance of the various profiles. Figure 6 depicts the variation of clock frequency as a function of the number of TLUTs for a FIR filter implementation using a hard-core processor. Clearly, an increase in the number of TLUTs decreases the design performance. The overall average deterioration in design performance is about 6 MHz (or a deterioration of 6%). The same kind of response is observed in the other implementations.

Fig. 6. Design Performance of FIR filter

Functional Density

The effect of introducing the placement constraints to improve the reconfiguration speed in DCS can be best explained using the functional density curve [9]. The functional density is defined as the number of computations (N) that can be performed per unit area (A) and unit time (T), as shown in equation (1).

$$F_d = \frac{N}{A \, T} \tag{1}$$

In our experiments, the computations are all the operations in the FIR filter. The value of A depends on the resources of the FPGA used by the FIR filter (mainly TLUTs). The value of T is composed of the reconfiguration time, the execution time and the time to specialize. A higher functional density signifies a more efficient usage of the implementation area. The functional density curve is plotted against the rate of change of the input parameters.
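Equation (1) and the composition of T described above can be written down directly. The sketch below is a minimal illustration with made-up numbers in arbitrary units; the function and argument names are assumptions, not values or code from the experiments. It only shows that, with N, A and the execution time held fixed, a shorter reconfiguration time yields a higher functional density.

```python
def functional_density(computations: float, area: float, exec_time: float,
                       reconfig_time: float, specialize_time: float) -> float:
    """F_d = N / (A * T), with T = execution + reconfiguration + specialization time."""
    total_time = exec_time + reconfig_time + specialize_time
    return computations / (area * total_time)


# Hypothetical numbers, chosen only to show the trend.
slow_reconfig = functional_density(100, 10, exec_time=5.0,
                                   reconfig_time=14.0, specialize_time=2.0)
fast_reconfig = functional_density(100, 10, exec_time=5.0,
                                   reconfig_time=1.0, specialize_time=2.0)
print(slow_reconfig, fast_reconfig)
```

The longer the parameters stay unchanged, the larger the execution time becomes relative to the other two terms, so the benefit of faster reconfiguration shrinks and the design performance starts to dominate.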
We plot the functional density of the FIR filter in three different forms:

1) Generic: FIR filter implementation without DCS.
2) DCS without placement constraints: FIR filter implementation using DCS without placement constraints.
3) DCS with placement constraints: FIR filter implementation using DCS with placement constraints.

Figure 7 depicts the corresponding three curves. The x-axis represents the average time (in clock cycles) between two parameter value changes. The Generic implementation has no variation in functional density since it uses a fixed number of resources. The functional density for the DCS with placement constraints rises well before the functional density of the DCS without placement constraints. This shows that improving the reconfiguration speed allows the parameters to change faster with the same gain in area, compared to the DCS whose reconfiguration speed is slow. However, since the design performance is slightly reduced, the functional density beyond point B is lower than that of the DCS without placement constraints, which forms the main trade-off. Hence it makes sense to use the placement constraints in the range of parameter changes between point A and point B. If the parameters change too fast, the generic implementation is the suitable choice.

Fig. 7. Functional Density

In our future research work, we will try to push the crossover point A of the functional density of the DCS towards the left, so that the curve rises earlier than the other curves, resulting in a significantly higher functional density for more frequently changing parameters (expressed in clock cycles between changes). This can be achieved by further improving the reconfiguration speed.

VIII. CONCLUSION

To improve the reconfiguration speed in DCS implementations using parameterized reconfiguration, we constrained the TLUTs of the FIR filter to the minimal number of columns possible. We have also modified the existing Xilinx HWICAP driver so that the frames are read and written only once to reconfigure multiple TLUT entries. We have shown that there is a drastic improvement in the reconfiguration speed, but that it comes at the cost of a slight reduction in the performance of the design. Functional density curves were used to discuss the impact of the improved reconfiguration speed and the slight reduction in design performance. The experiments were done on the Virtex-5 and the Zynq platforms. In typical cases, if the FPGA resources are underutilized during the DCS implementation, it is suitable to use placement constraints to improve the reconfiguration speed. This gives the parameterized design the flexibility to change its parameters more frequently than the conventional DCS implementation allows. It should also be noted that the design performance is degraded slightly, and designers should check whether this is acceptable within the given timing budget.

REFERENCES

[1] K. Bruneel, W. Heirman, and D. Stroobandt, "Dynamic data folding with parameterizable configurations," ACM Transactions on Design Automation of Electronic Systems, vol. 16, no. 4.
[2] K. Bruneel, F. Abouelella, and D. Stroobandt, "Automatically mapping applications to a self-reconfiguring platform," in Design, Automation & Test in Europe Conference & Exhibition, DATE '09, April 2009.
[3] K. Papadimitriou, A. Dollas, and S. Hauck, "Performance of partial reconfiguration in FPGA systems: A survey and a cost model," ACM Trans. Reconfigurable Technol. Syst., vol. 4, no. 4, pp. 36:1-36:24, Dec.
[4] K. Heyse, T. Davidson, E. Vansteenkiste, K. Bruneel, and D. Stroobandt, "Efficient implementation of virtual coarse grained reconfigurable arrays on FPGAs," in Proceedings of the 23rd International Conference on Field Programmable Logic and Applications. Piscataway, NJ, USA: IEEE, 2013.
[5] A. Kulkarni, K. Heyse, T. Davidson, and D. Stroobandt, "Performance Evaluation of Dynamic Circuit Specialization on Xilinx FPGAs," in Proceedings of the 11th FPGAworld Conference, ser. FPGAworld '14.
[6] FPGA Configuration User Guide (ug191), com/support/documentation/user_guides/ug191.pdf.
[7] 7 Series FPGAs Configuration User Guide (ug470), com/support/documentation/user_guides/ug470_7series_config.pdf.
[8] Constraints Guide (cgd 10.1), books/docs/cgd/cgd.pdf.
[9] A. DeHon, "Reconfigurable architectures for general-purpose computing," Cambridge, MA, USA, Tech. Rep.


More information

RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION

RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch II. Physics Institute Dept. of Electronic, Computer and

More information

Parallel FIR Filters. Chapter 5

Parallel FIR Filters. Chapter 5 Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture

More information

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS.

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS. INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS Arulalan Rajan 1, H S Jamadagni 1, Ashok Rao 2 1 Centre for Electronics Design and Technology, Indian Institute of Science, India (mrarul,hsjam)@cedt.iisc.ernet.in

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications Pallavi R. Yewale ME Student, Dept. of Electronics and Tele-communication, DYPCOE, Savitribai phule University, Pune,

More information
