A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris
School of Electrical & Computer Engineering, National Technical University of Athens, Greece
{harrys, ksiop,

Abstract. This paper introduces a novel methodology for enabling rapid exploration of memory hierarchies on FPGA devices. The methodology is supported in software by a new open-source tool framework, named NAROUTO. Among other things, the proposed framework enables critical tasks during architecture design, such as memory hierarchy exploration and floor-planning. Furthermore, NAROUTO is the only available solution for power/energy evaluation of different memory organizations. Experimental results show that the NAROUTO framework leads to significant area, power (about 82%) and performance (about 46%) improvements compared to existing solutions.

Keywords: FPGA, CAD Tool, Exploration Framework.

1 Introduction

Recently, reconfigurable architectures, and more specifically Field-Programmable Gate Arrays (FPGAs), have become efficient alternatives to Application-Specific Integrated Circuits (ASICs) due to their inherent re-programmability. Apart from logic and routing infrastructure, FPGA platforms include more complex components (e.g. memory blocks, DSP cores, embedded CPUs, etc.) that further improve their efficiency. One of the most important tasks in designing an efficient FPGA device is architecture-level exploration, which determines the architecture of the building blocks/components as well as their optimal organization. This problem becomes even more important nowadays, due to the increased complexity posed by additional (heterogeneous) IP blocks. Many tools have been released to automate this exploration procedure, starting from synthesis and technology mapping [1, 2], through placement and routing (P&R) [3], up to power/energy estimation [5].
Since these tools support only devices consisting of configurable logic blocks (CLBs) and routing infrastructure, they cannot be employed for architecture-level exploration of FPGA platforms that also contain more complex IP blocks (e.g. memories, DSP cores, etc.). Although commercial frameworks (e.g. [6]) support heterogeneity and power estimation, they allow only a small degree of architecture-level exploration. Recently, two frameworks that support application mapping onto FPGAs with such IP cores were introduced [1, 11]. These frameworks rely on a commercial synthesizer [6], while the P&R step is performed with [4]. Even though these solutions alleviate the lack of heterogeneity support, they do not report power consumption (dynamic or static) or energy dissipation.

This work was supported by the HiPEAC Grant entitled "On Providing Dynamic Reliability Improvement in FPGA".

In this paper we propose a new framework that supports architecture-level exploration and power estimation for FPGAs incorporating different memory hierarchies and organizations, in terms of delay, area and power/energy consumption. More specifically, the contributions of this work, compared to prior publications, are as follows:

- Introduction of a novel methodology for exploring memory hierarchies and organizations targeting FPGAs.
- Extension of an existing power/energy estimation tool [5, 9] so that it also handles designs with different types, as well as multiple instantiations, of memory blocks.
- Development of a new open-source tool framework, named NAROUTO (publicly available at [10]), that supports the proposed methodology in software.

The rest of the paper is organized as follows: Section 2 highlights the dominant problems of existing tools regarding heterogeneity support. The proposed framework is described in Section 3, while Section 4 discusses the experimental results that demonstrate its efficiency. Finally, conclusions are summarized in Section 5.

2 LIMITATIONS IN HETEROGENEITY SUPPORT

In this section we highlight the main limitations in heterogeneity support of the available frameworks [1, 11]. In both solutions, synthesis and technology mapping are performed with Altera Quartus II [6], while placement and routing (P&R) is performed by the VPR 5.0 tool [4]. As we show later, these tools cannot handle, in a push-button manner, designs containing functionality that is mapped onto the heterogeneous IP blocks of modern FPGAs (e.g. memories, DSPs, CPUs, etc.).
Even though Quartus can produce its synthesis output in BLIF (Berkeley Logic Interchange Format) [7] with IP blocks, the resulting netlist is not logically equivalent to the original RTL description, since every IP block of the design is translated into a blackbox (BB) instantiation. More specifically, whenever the functionality of an IP block cannot be mapped onto LUTs and F/Fs, the block is replaced with a BB. A BB provides the same number of input/output pins as the IP block it replaces, so that signals propagate transparently. BBs are incorporated into BLIF files to enable academic frameworks to handle state-of-the-art designs, which often contain many heterogeneous blocks. Since the BLIF format has no built-in support for these heterogeneous blocks, several serious limitations must be overcome so that the application's functionality is not disturbed during synthesis and technology mapping. For instance, for a design with an 8,192 × 8-bit RAM block, existing synthesis and technology mapping tools produce a BLIF file that contains 8,192 unique BBs, since the synthesis output for a memory block is reported at word level. The limitations of such an approach are summarized as follows:

- The application's functionality is altered, since the BBs (both their total number and their connections to the remaining BBs/CLBs) do not correspond to the application's initial RTL description.
- For a given design, all BBs are marked with the same keyword (.blackbox) regardless of their actual functionality. This implies that each design can employ at most one type of BB. However, real applications use numerous BBs, each with its own characteristics (e.g., size, throughput, power/energy consumption, etc.).

- The increased number of BBs leads to delay, power and area penalties due to the additional routing infrastructure needed for signal communication.

3 PROPOSED FRAMEWORK

This section describes the proposed framework, named NAROUTO [9], which is depicted in Figure 1. The framework enables architecture-level exploration of FPGAs with memory blocks in terms of delay, area and power/energy consumption. To support the NAROUTO framework in software, a number of new open-source CAD tools were developed. Due to lack of space, the algorithms behind each step of the NAROUTO framework are not detailed here; more information can be found in [9].

3.1 Synthesis and Technology Mapping

The first step of the NAROUTO framework deals with the application's synthesis and technology mapping. These tasks are performed by the Quartus tool [6], whose output is a hierarchical netlist in BLIF format [7]. This format is a prerequisite not only for the remaining tools of the NAROUTO framework, but also for the majority of academic tools targeting FPGAs. To extract the technology-mapped netlist, in which the heterogeneous IP blocks (e.g. memories, DSPs, etc.) are replaced with BBs, the following assignment is used in the Quartus tool (the value is quoted because an unquoted semicolon would terminate the Tcl command):

    set_global_assignment -name INI_VARS "no_add_ops=on;dump_blif_after_lut_map=on"

3.2 Generation of input files for power estimation

The next step generates the activity files used to estimate power/energy consumption. Since the existing version of the ACE 2.0 tool [5] cannot process BLIF netlists containing BBs, we developed a preprocessing step that enables the calculation of static probabilities and transition densities, from primary inputs to primary outputs, for all nets of a design with BBs.
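The BB-removal idea behind this preprocessing step (performed by the Hb_for_ACE tool described in this subsection) can be sketched as follows. This is a minimal Python illustration, not the Hb_for_ACE implementation: the `BlackBox` and `Netlist` classes and the `cut_blackboxes` function are hypothetical, and the netlist is held in memory rather than parsed from BLIF. The default activity values (static probability 0.5, transition density 0.2) are the ones the paper assumes for BB-connected nets.

```python
from dataclasses import dataclass, field

# Default (static probability, transition density) assumed for BB-connected nets.
DEFAULT_ACTIVITY = (0.5, 0.2)


@dataclass
class BlackBox:
    name: str
    inputs: list[str]   # nets that drive the BB
    outputs: list[str]  # nets driven by the BB


@dataclass
class Netlist:
    primary_inputs: list[str]
    primary_outputs: list[str]
    blackboxes: list[BlackBox] = field(default_factory=list)


def cut_blackboxes(netlist, overrides=None):
    """Remove every BB: its input nets become primary outputs and its output
    nets become primary inputs. Returns the BB-free netlist together with the
    activity assumed on each new primary input (designer overrides win)."""
    overrides = overrides or {}
    pis = list(netlist.primary_inputs)
    pos = list(netlist.primary_outputs)
    activity = {}
    for bb in netlist.blackboxes:
        for net in bb.inputs:       # BB input pin -> new primary output
            if net not in pos:
                pos.append(net)
        for net in bb.outputs:      # BB output pin -> new primary input
            if net not in pis:
                pis.append(net)
                activity[net] = overrides.get(net, DEFAULT_ACTIVITY)
    return Netlist(pis, pos), activity


# Usage: a RAM blackbox fed by two internal nets and driving one net.
ram = BlackBox("ram0", inputs=["n_addr", "n_we"], outputs=["n_dout"])
flat, act = cut_blackboxes(Netlist(["clk", "a"], ["y"], [ram]))
print(flat.primary_inputs, flat.primary_outputs, act)
# ['clk', 'a', 'n_dout'] ['y', 'n_addr', 'n_we'] {'n_dout': (0.5, 0.2)}
```

Exposing BB outputs as primary inputs gives an activity estimator a starting point on those nets; the default values act only as placeholders until the designer supplies better ones.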
The new tool, named Hb_for_ACE, first processes the application's netlist by removing all BB instantiations from the BLIF file, and then connects the BB input and output pins to the BLIF's primary outputs and primary inputs, respectively. After Hb_for_ACE is applied, the resulting design contains no BBs, so the ACE 2.0 tool can be employed. For the calculation of the power/energy consumption of BBs, we assume that BBs are connected through nets with static probability 0.5 and transition density 0.2, unless the designer provides different values.

3.3 Technology mapping onto target FPGA

Next, the BLIF netlist (with BBs) retrieved from technology mapping is mapped onto the target FPGA. For this purpose we use the HBT-VPACK tool [4] in conjunction with a new set of tools that handle BBs efficiently. More specifically, the new tools overcome the limitation of the Quartus tool that splits a single heterogeneous block into multiple (partial) BBs. The following paragraphs describe the main features of the newly developed tools in more detail.

Figure 1: The proposed NAROUTO Framework

BlackBox_Profiler

The BlackBox_Profiler identifies the number of individual BBs incorporated in a design, as well as the specifications of each of them (e.g. functionality, size, number of pins, etc.). This is accomplished by finding all the partial BBs that belong to a unique IP block, which is feasible because all the partial BBs of an IP block share the same control and communication signals with the rest of the FPGA components (e.g. the read/write enable inputs of a RAM). After the instantiations of the different BBs are identified, the specifications of each of them are retrieved from a technology library (e.g. based on datasheets).

BlackBox_Packing

The output of BlackBox_Profiler guides how to cluster all the partial BBs belonging to the same IP block into a single BB. In the NAROUTO framework this task, referred to as Single-Packing (SP), is performed by the BlackBox_Packing tool. To further improve the flexibility of the proposed framework, BlackBox_Profiler supports one more level of packing, referred to as Full-Packed (FP). The goal of this additional packing is to cluster all the BBs of the same type into a larger super-BB. For instance, assume that a design requires 16 RAM blocks of 1 Kbyte each. The BLIF netlist produced by Quartus will report that the design has 16,384 (16 × 1,024) BBs, each of which actually corresponds to 1 byte. After applying SP with NAROUTO, the resulting netlist incorporates 16 BBs, each corresponding to 1 Kbyte. With the second level of packing (FP), the netlist contains only 1 BB with a size of 16 Kbytes. Hence, the NAROUTO framework makes it possible to evaluate different memory hierarchies.

Pin_Multiplexing

Apart from generating an excessively high number of partial BBs per IP block, Quartus also gives each of these BBs many more I/O pins than actually exist in the IP block. This forces the target FPGA to incorporate wider routing channels, which in turn leads to higher delay, power and area overheads. To overcome this limitation, we developed a new tool, named Pin_Multiplexing, which aims to reduce the input/output (I/O) pins of BBs. During this task we do not merge many signals into one, since this would undermine the structural and functional integrity of the final netlist.
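A rough way to reason about the recursive pin reduction performed by Pin_Multiplexing is to count how many levels of k:1 multiplexing bring a BB's pin count down to the pin count of the corresponding IP core. The Python sketch below is hypothetical: `mux_levels` is not part of NAROUTO, each level is modeled as mapping every group of k signals onto one pin, and select-line and timing overheads are ignored. The input of 162 pins is the average pre-multiplexing BB pin demand reported in Section 4; the target of 31 is an arbitrary illustrative value.

```python
import math


def mux_levels(pins: int, target: int, k: int = 2) -> tuple[int, int]:
    """Return (levels, remaining_pins) for recursive k:1 pin multiplexing.

    Simplified model: one level maps every group of k signals onto a single
    pin, i.e. pins -> ceil(pins / k). Select lines are not counted.
    """
    if k < 2 or target < 1:
        raise ValueError("need k >= 2 and target >= 1")
    levels = 0
    while pins > target:
        pins = math.ceil(pins / k)
        levels += 1
    return levels, pins


print(mux_levels(162, 31))       # (3, 21) with 2:1 multiplexers
print(mux_levels(162, 31, k=4))  # (2, 11) with 4:1 multiplexers
```

Wider multiplexers reach the target in fewer levels, at the cost of more logic per multiplexing CLB.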
Rather than merging signals, Pin_Multiplexing reduces pins by implementing a set of multiplexers in CLBs. More specifically, the input signals of a BB first pass through multiplexing CLBs, and the resulting multiplexed signals are the actual inputs of the BB. Similarly, the output signals of a BB are multiplexed and pass through demultiplexing CLBs before connecting to the rest of the netlist. More information about this multiplexing/demultiplexing strategy can be found in [9]. Depending on the design specifications, the inputs/outputs of a BB can be multiplexed recursively several times to further reduce the number of required pins. This makes it possible to derive BBs with the same number of I/O pins as the corresponding IP cores (these values were already extracted from the component library by BlackBox_Profiler).

3.4 Placement and Routing

The last step of the proposed framework deals with the application's P&R, after which delay, power/energy and area metrics are extracted to evaluate the design implementation. This task is accomplished by a new tool, named HBVPR, which is based on the VPR [4] and Powermodel [5], [8] tools.

As part of HBVPR, we developed an additional tool that automatically generates the XML descriptions of target FPGA architectures. This tool allows the generation of architectural templates for FPGAs with many types of BBs, each of which might have different properties (e.g. number of pins, functionality, size, etc.).

4 EXPERIMENTAL RESULTS

This section provides a number of qualitative and quantitative comparisons between the proposed framework (NAROUTO) and two approaches from the relevant literature [1, 11], using a number of DSP applications from [10]. Table 1 gives a qualitative comparison of the frameworks discussed throughout this paper. Based on this table, NAROUTO supports designs with BBs more efficiently, while its power/energy estimation feature is not offered by the existing frameworks.

Table 1: Qualitative comparison of supported features
Feature | NAROUTO | [6], [12]
Different types of BBs | Unlimited | 1
Realistic number of BBs | Yes | No
Realistic number of I/Os per BB | Yes | No
Power estimation | Yes | No
Part of complete framework | Yes | Yes
Open source | Yes | Yes

The target FPGA used in this paper has clusters (CLBs) of 10 4-input LUTs, with 22 inputs and 10 outputs per CLB. Both the FPGA array size and the routing channel width are set to the minimum values for which each application is routable. Table 2 summarizes some statistics about mapping the applications onto such an FPGA device with the NAROUTO framework.

Table 2: Employed benchmark suite from [11]
Benchmark | Functionality | 4-LUTs | F/Fs | RAM bits | I/Os
oc_aes_core_inv | Encryption | | | |
oc_ata_ocidec3 | Processor | | | |
oc_hdlc | Processor | | | |
oc_minirisc | Processor | | | |
oc_oc8051 | Processor | | | |
os_blowfish | Encryption | | | |
Average | | | | |

Next, we discuss three possible floor-plans for the memory blocks.
These floor-plans, depicted in Figure 2, correspond to FPGA architectures where the memories are assigned to the borders of the device (Figure 2(a)), to the center (Figure 2(b)), or uniformly distributed over the FPGA (Figure 2(c)). The three alternative floor-plans are denoted as Border, Center and Uniform, respectively. Note that different floor-plans lead to different application-mapping performance, since each of them imposes a different placement and routing. To quantify these floor-plans, we place and route a number of applications onto FPGA devices whose memory blocks are assigned according to Figure 2.
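To make the three floor-plans concrete, the hypothetical helper below places memory blocks in a simplified column-based model of the FPGA array, assuming that memories occupy whole columns, and returns the indices of the memory columns for the Border, Center and Uniform cases. The column model and the `memory_columns` name are illustrative assumptions; they are not the output format of NAROUTO's XML architecture generator.

```python
def memory_columns(width: int, n_mem: int, plan: str) -> list[int]:
    """Indices (0..width-1) of the array columns that hold memory blocks."""
    if not 0 < n_mem <= width:
        raise ValueError("need 0 < n_mem <= width")
    if plan == "border":
        # Split the memory columns between the left and right edges.
        left = (n_mem + 1) // 2
        right = n_mem - left
        return list(range(left)) + list(range(width - right, width))
    if plan == "center":
        # Contiguous block of memory columns in the middle of the array.
        start = (width - n_mem) // 2
        return list(range(start, start + n_mem))
    if plan == "uniform":
        # One memory column in the middle of each of n_mem equal slices.
        step = width / n_mem
        return [int(step * i + step / 2) for i in range(n_mem)]
    raise ValueError(f"unknown floor-plan: {plan}")


for plan in ("border", "center", "uniform"):
    print(plan, memory_columns(12, 3, plan))
# border [0, 1, 11]
# center [4, 5, 6]
# uniform [2, 6, 10]
```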

Table 4 summarizes the performance and power metrics of the employed benchmark suite under the three candidate floor-plans for memory blocks. As this table shows, whenever memories are uniformly distributed over the FPGA (Figure 2(c)), applications are mapped at higher operating frequencies (smaller delay), but this selection also imposes the highest power consumption. On the other hand, when the aim is a power-aware FPGA architecture, memories should be floor-planned at the center of the device (Figure 2(b)), since this selection leads to lower power dissipation with an almost negligible performance penalty.

Figure 2: Different floor-plans for memory blocks: (a) placed in borders, (b) placed in center, and (c) uniformly distributed.

Since target FPGA devices have to meet both timing and power constraints, the most suitable memory floor-plan is selected under both criteria. For this purpose, Table 5 gives the Energy Delay Product (EDP) for a number of applications. These results show that assigning the memory blocks to the center of the FPGA leads to the minimum EDP value. More specifically, the EDP reduction is up to 33% compared to the floor-plan where memories are uniformly distributed over the device. Hence, this organization of memory blocks is assumed for the rest of this paper. Note that, apart from these three candidate floor-plans, any other floor-plan can also be explored with the NAROUTO framework.

Table 4: Exploration results for topology selection of memory blocks
Benchmark | Frequency, Border (MHz) | Frequency, Center (MHz) | Frequency, Uniform (MHz) | Power, Border (mW) | Power, Center (mW) | Power, Uniform (mW)
oc_aes_core_inv | | | | | |
oc_ata_ocidec3 | | | | | |
oc_hdlc | | | | | |
oc_minirisc | | | | | |
oc_oc8051 | | | | | |
os_blowfish | | | | | |
Average | | | | | |

Table 6 shows the required number of BBs for different designs.
The second column corresponds to the number of BBs as retrieved from Quartus synthesis (i.e. with the existing approaches [6, 12]), while the third and fifth columns give the corresponding values after SP and FP, respectively. For some designs, the BlackBox_Profiler clusters all the BBs into one BB during the first level of packing (SP); hence, FP yields no further reduction. The fourth column shows the estimated RAM bits of each BB after SP. The corresponding value after FP for a given design is obtained by summing all the partial values (shown in the fourth column).

Table 6 supports our claim that neither the Quartus synthesizer [6] nor the existing frameworks [1, 11] can handle designs with BBs efficiently. More specifically, this is the first publicly available framework that supports realistic application mapping onto heterogeneous FPGA architectures, by clustering BBs with the same functionality and type (e.g. memories with different sizes) into a super-BB (e.g. a BlockRAM). Based on the experimental results, existing approaches require on average 68.5 BBs per application, while the proposed one (after SP) leads to only 5 BBs. Note that the additional partial BBs used in [1, 11] introduce constraints during P&R, which in turn result in delay, power/energy and area overheads.

Table 5: Exploration results for topology selection of memory blocks
Benchmark | EDP, Border (×10^-6) | EDP, Center (×10^-6) | EDP, Uniform (×10^-6)
oc_aes_core_inv | | |
oc_ata_ocidec3 | | |
oc_hdlc | | |
oc_minirisc | | |
oc_oc8051 | | |
os_blowfish | | |
Average | | |
Ratio | | |

Table 7 summarizes the I/O pins of all the BBs of each design, before and after pin multiplexing, as derived from SP. These results show that before multiplexing (which corresponds to the solution retrieved from [1, 11]) there is an average demand of 162 I/O pins for BBs, while after pin multiplexing the pin requirement drops to 30.5, a reduction of about 80% in the number of pins.

Table 6: Number and size of BBs before and after packing
Benchmark | Existing [6], [12]: # of BBs | SP: # of BBs | SP: Size of BBs | FP: # of BBs
oc_aes_core_inv | | | | 1
oc_ata_ocidec3 | | | |
oc_hdlc | | | | 1
oc_minirisc | | | | 1
oc_oc8051 | | | |
os_blowfish | | | | 1
Average | | | | 1

Such an unrealistic pin demand, posed by [1, 11], introduces among others constraints that prevent some of the benchmarks from being mapped onto heterogeneous FPGAs.
These constraints are mainly tied to the routing channel width, which in many cases (especially when the number of BB I/O pins is extremely high) exceeded the maximum value the design tools could handle.
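The clustering performed by BlackBox_Profiler and the two packing levels (SP and FP) whose effect Table 6 reports can be sketched as below. This is a hypothetical Python illustration rather than NAROUTO's algorithm: partial BBs are grouped by their type and the set of control nets they share (e.g. a RAM's write enable), SP emits one BB per group whose size is the sum of its partial BBs, and FP sums the SP sizes per BB type, mirroring how the FP size in Table 6 is obtained. The data reuses the 16 × 1 Kbyte RAM example of Section 3.3; `PartialBB`, `single_pack` and `full_pack` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialBB:
    kind: str                # BB type, e.g. "ram" (hypothetical tag)
    control: frozenset[str]  # shared control nets, e.g. {"we3"}
    bits: int                # storage bits contributed by this partial BB


def single_pack(partials):
    """SP: merge partial BBs with the same type and control nets into one BB."""
    groups = defaultdict(int)
    for p in partials:
        groups[(p.kind, p.control)] += p.bits
    return [(kind, bits) for (kind, _), bits in groups.items()]


def full_pack(packed):
    """FP: merge all SP BBs of the same type into one super-BB."""
    totals = defaultdict(int)
    for kind, bits in packed:
        totals[kind] += bits
    return dict(totals)


# 16 RAMs of 1,024 x 8 bits; word-level synthesis yields one partial BB per word.
partials = [PartialBB("ram", frozenset({f"we{r}"}), 8)
            for r in range(16) for _ in range(1024)]
sp = single_pack(partials)
fp = full_pack(sp)
print(len(partials), len(sp), sp[0], fp)
# 16384 16 ('ram', 8192) {'ram': 131072}
```

Keying the groups on the control-net set reflects the observation that all partial BBs of one IP block share the same control signals; including the type in the key keeps, for example, RAMs and DSPs apart.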

To quantify the gains of the proposed framework in terms of delay and power consumption, Figures 3 and 4 plot these metrics for different applications. For each design, these graphs show three solutions, namely (i) Initial [1, 11], (ii) SP (Single Packed), and (iii) FP (Full Packed).

Table 7: Total number of I/O pins for BBs before and after SP
Benchmark | Total pins of all BBs, before multiplexing | Total pins of all BBs, after multiplexing
oc_aes_core_inv | |
oc_ata_ocidec3 | |
oc_hdlc | |
oc_minirisc | |
oc_oc8051 | |
os_blowfish | |
Average | |

These results show that SP and FP lead to an average delay reduction of about 46% compared to the existing frameworks [1, 11]. Similarly, regarding power consumption, the proposed solutions (SP and FP) achieve an average reduction of about 82%. As already mentioned, these gains stem from the better handling of BBs inside designs. Since the designs retrieved with the NAROUTO framework incorporate fewer BBs, and hence fewer I/O pins around each of them, the proposed framework leads to smaller FPGA devices that, among other things, need fewer tracks per routing channel.

Figure 3: Delay evaluation for alternative application implementations.
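The percentages quoted in this section are relative reductions, and the floor-plan comparison relies on the energy-delay product. The paper does not spell out its EDP formula, so the sketch below assumes one common definition: delay D = 1/f at the reported operating frequency, energy per operation E = P · D, and EDP = E · D = P · D². The function names `edp` and `reduction_pct` are hypothetical.

```python
import math


def edp(power_mw: float, freq_mhz: float) -> float:
    """Energy-delay product in J*s, assuming D = 1/f and E = P * D."""
    delay_s = 1.0 / (freq_mhz * 1e6)
    energy_j = power_mw * 1e-3 * delay_s
    return energy_j * delay_s


def reduction_pct(baseline: float, candidate: float) -> float:
    """Relative reduction of `candidate` with respect to `baseline`, in %."""
    return 100.0 * (baseline - candidate) / baseline


# Average BB pin demand: 162 before vs 30.5 after multiplexing.
print(round(reduction_pct(162, 30.5), 1))  # 81.2
# Under this definition, halving the frequency at equal power quadruples EDP.
print(math.isclose(edp(100, 50), 4 * edp(100, 100)))  # True
```

With this definition, the pin reduction from 162 to 30.5 works out to about 81%, consistent with the "about 80%" stated for Table 7.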

5 CONCLUSIONS

A novel methodology, together with its supporting tool framework, for enabling rapid memory exploration in FPGA devices was proposed. The framework handles designs with IP cores more efficiently than existing solutions, and it is the first tool that also provides power/energy consumption measurements. Experimental results show average gains in delay and power consumption of about 46% and 82%, respectively, compared to relevant solutions, while different memory floor-plans lead to an EDP reduction of up to 33%.

Figure 4: Power consumption for alternative application implementations.

REFERENCES

[1] J. Pistorius et al., "Benchmarking method and designs targeting logic synthesis for FPGAs," Proc. IWLS.
[2] M. Gao, J.H. Jiang, Y. Jiang, Y. Li, S. Sinha, and R. Brayton, "MVSIS," International Workshop on Logic Synthesis.
[3] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research," Int. Workshop on Field-Programmable Logic and Applications, 1997.
[4] J. Luu et al., "VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling," Int. Symp. on FPGAs.
[5] K. Poon et al., "A detailed power model for field-programmable gate arrays," ACM Trans. on Design Automation of Electronic Systems (TODAES), Vol. 10, No. 2, April.
[6] Altera Corporation, Quartus II Software.
[7] Berkeley Logic Interchange Format (BLIF), University of California, Berkeley.
[8] P. Jamieson et al., "An Energy and Power Consumption Analysis of FPGA Routing Architectures," Field-Programmable Technology.
[9] C. Sidiropoulos, "Development of a design framework for Power/Energy consumption estimation in heterogeneous FPGA architectures," Master thesis, NTUA, Greece, 2010 (available at
[10] Altera Corporation, Quartus-II University Interface Program.
[11] S. Dai and E. Bozorgzadeh, "CAD Tool for FPGAs with Embedded Hard Cores for Design Space Exploration of Future Architectures," 14th Symp. FCCM.


More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors RPack: Rability-Driven packing for cluster-based FPGAs E. Bozorgzadeh S. Ogrenci-Memik M. Sarrafzadeh Computer Science Department Department ofece Computer Science Department UCLA Northwestern University

More information

A Software-Supported Methodology for Designing General-Purpose Interconnection Networks for Reconfigurable Architectures

A Software-Supported Methodology for Designing General-Purpose Interconnection Networks for Reconfigurable Architectures A Software-Supported Methodology for Designing General-Purpose Interconnection Networks for Reconfigurable Architectures Kostas Siozios, Dimitrios Soudris and Antonios Thanailakis Abstract Modern applications

More information

What is Xilinx Design Language?

What is Xilinx Design Language? Bill Jason P. Tomas University of Nevada Las Vegas Dept. of Electrical and Computer Engineering What is Xilinx Design Language? XDL is a human readable ASCII format compatible with the more widely used

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

CHAPTER 7 FPGA IMPLEMENTATION OF HIGH SPEED ARITHMETIC CIRCUIT FOR FACTORIAL CALCULATION

CHAPTER 7 FPGA IMPLEMENTATION OF HIGH SPEED ARITHMETIC CIRCUIT FOR FACTORIAL CALCULATION 86 CHAPTER 7 FPGA IMPLEMENTATION OF HIGH SPEED ARITHMETIC CIRCUIT FOR FACTORIAL CALCULATION 7.1 INTRODUCTION Factorial calculation is important in ALUs and MAC designed for general and special purpose

More information

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture Roman Lysecky Department of Electrical and Computer Engineering University of Arizona Dynamic HW/SW Partitioning Initially execute application in software only 5 Partitioned application executes faster

More information

A Configurable Multi-Ported Register File Architecture for Soft Processor Cores

A Configurable Multi-Ported Register File Architecture for Soft Processor Cores A Configurable Multi-Ported Register File Architecture for Soft Processor Cores Mazen A. R. Saghir and Rawan Naous Department of Electrical and Computer Engineering American University of Beirut P.O. Box

More information

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time

More information

Introduction to VHDL Design on Quartus II and DE2 Board

Introduction to VHDL Design on Quartus II and DE2 Board ECP3116 Digital Computer Design Lab Experiment Duration: 3 hours Introduction to VHDL Design on Quartus II and DE2 Board Objective To learn how to create projects using Quartus II, design circuits and

More information

IMPROVING MEMORY AND VALIDATION SUPPORT IN FPGA ARCHITECTURE EXPLORATION. Andrew Somerville

IMPROVING MEMORY AND VALIDATION SUPPORT IN FPGA ARCHITECTURE EXPLORATION. Andrew Somerville IMPROVING MEMORY AND VALIDATION SUPPORT IN FPGA ARCHITECTURE EXPLORATION by Andrew Somerville Bachelor of Computer Science, University of New Brunswick, 2010 A Thesis Submitted in Partial Fulfillment of

More information

Lab 3 Verilog Simulation Mapping

Lab 3 Verilog Simulation Mapping University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences 1. Motivation Lab 3 Verilog Simulation Mapping In this lab you will learn how to use

More information

SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION

SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION SYNTHETIC CIRCUIT GENERATION USING CLUSTERING AND ITERATION Paul D. Kundarewich and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, ON, M5S G4, Canada {kundarew,

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

Don t expect to be able to write and debug your code during the lab session.

Don t expect to be able to write and debug your code during the lab session. EECS150 Spring 2002 Lab 4 Verilog Simulation Mapping UNIVERSITY OF CALIFORNIA AT BERKELEY COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Lab 4 Verilog Simulation Mapping

More information

On pin-to-wire routing in FPGAs. Niyati Shah

On pin-to-wire routing in FPGAs. Niyati Shah On pin-to-wire routing in FPGAs by Niyati Shah A thesis submitted in conformity with the requirements for the degree of Master of Applied Science and Engineering Graduate Department of Electrical & Computer

More information

Vdd Programmability to Reduce FPGA Interconnect Power

Vdd Programmability to Reduce FPGA Interconnect Power Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany

More information

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department

More information

CAD dependent Estimation of Optimal k-value in FSM onto k-lut FPGA mappings, based on standard benchmark networks

CAD dependent Estimation of Optimal k-value in FSM onto k-lut FPGA mappings, based on standard benchmark networks CAD dependent Estimation of Optimal k-value in FSM onto k-lut FPGA mappings, based on standard benchmark networks DOKOUZYANNIS STAVROS 1 ARZOUMANIDIS EFSEVIOS 2 Aristotle University of Thessaloniki Department

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

FPGA Clock Network Architecture: Flexibility vs. Area and Power

FPGA Clock Network Architecture: Flexibility vs. Area and Power FPGA Clock Network Architecture: Flexibility vs. Area and Power Julien Lamoureux and Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, B.C.,

More information

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I Overview Anti-fuse and EEPROM-based devices Contemporary SRAM devices - Wiring - Embedded New trends - Single-driver wiring -

More information

Efficient SAT-based Boolean Matching for FPGA Technology Mapping

Efficient SAT-based Boolean Matching for FPGA Technology Mapping Efficient SAT-based Boolean Matching for FPGA Technology Mapping Sean Safarpour, Andreas Veneris Department of Electrical and Computer Engineering University of Toronto Toronto, ON, Canada {sean, veneris}@eecg.toronto.edu

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

FAST time-to-market, steadily decreasing cost, and

FAST time-to-market, steadily decreasing cost, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 10, OCTOBER 2004 1015 Power Estimation Techniques for FPGAs Jason H. Anderson, Student Member, IEEE, and Farid N. Najm, Fellow,

More information

FIELD programmable gate arrays (FPGAs) provide an attractive

FIELD programmable gate arrays (FPGAs) provide an attractive IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 2005 1035 Circuits and Architectures for Field Programmable Gate Array With Configurable Supply Voltage Yan Lin,

More information

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering

More information

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric. SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA

More information

An automatic tool flow for the combined implementation of multi-mode circuits

An automatic tool flow for the combined implementation of multi-mode circuits An automatic tool flow for the combined implementation of multi-mode circuits Brahim Al Farisi, Karel Bruneel, João M. P. Cardoso and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat

More information

Hierarchical Design Using Synopsys and Xilinx FPGAs

Hierarchical Design Using Synopsys and Xilinx FPGAs White Paper: FPGA Design Tools WP386 (v1.0) February 15, 2011 Hierarchical Design Using Synopsys and Xilinx FPGAs By: Kate Kelley Xilinx FPGAs offer up to two million logic cells currently, and they continue

More information

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders Vol. 3, Issue. 4, July-august. 2013 pp-2266-2270 ISSN: 2249-6645 Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders V.Krishna Kumari (1), Y.Sri Chakrapani

More information

Challenges of FPGA Physical Design

Challenges of FPGA Physical Design Challenges of FPGA Physical Design Larry McMurchie 1 and Jovanka Ciric Vujkovic 2 1 Principal Engineer, Solutions Group, Synopsys, Inc., Mountain View, CA, USA 2 R&D Manager, Solutions Group, Synopsys,

More information

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput Reduce FPGA Power With Automatic Optimization & Power-Efficient Design Vaughn Betz & Sanjay Rajput Previous Power Net Seminar Silicon vs. Software Comparison 100% 80% 60% 40% 20% 0% 20% -40% Percent Error

More information

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu

Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs. Yingxuan Liu Toward More Efficient Annealing-Based Placement for Heterogeneous FPGAs by Yingxuan Liu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department

More information

Embedded Programmable Logic Core Enhancements for System Bus Interfaces

Embedded Programmable Logic Core Enhancements for System Bus Interfaces Embedded Programmable Logic Core Enhancements for System Bus Interfaces Bradley R. Quinton, Steven J.E. Wilton Dept. of Electrical and Computer Engineering University of British Columbia {bradq,stevew}@ece.ubc.ca

More information

POWER OPTIMIZATION USING BODY BIASING METHOD FOR DUAL VOLTAGE FPGA

POWER OPTIMIZATION USING BODY BIASING METHOD FOR DUAL VOLTAGE FPGA POWER OPTIMIZATION USING BODY BIASING METHOD FOR DUAL VOLTAGE FPGA B.Sankar 1, Dr.C.N.Marimuthu 2 1 PG Scholar, Applied Electronics, Nandha Engineering College, Tamilnadu, India 2 Dean/Professor of ECE,

More information

THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS

THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS Chi Wai Yu 1, Julien Lamoureux 2, Steven J.E. Wilton 2, Philip H.W. Leong 3, Wayne Luk 1 1 Dept

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Using Bus-Based Connections to Improve Field-Programmable Gate Array Density for Implementing Datapath Circuits

Using Bus-Based Connections to Improve Field-Programmable Gate Array Density for Implementing Datapath Circuits Using Bus-Based Connections to Improve Field-Programmable Gate Array Density for Implementing Datapath Circuits Andy Ye and Jonathan Rose The Edward S. Rogers Sr. Department of Electrical and Computer

More information

AN 567: Quartus II Design Separation Flow

AN 567: Quartus II Design Separation Flow AN 567: Quartus II Design Separation Flow June 2009 AN-567-1.0 Introduction This application note assumes familiarity with the Quartus II incremental compilation flow and floorplanning with the LogicLock

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Area Efficient SAD Architecture for Block Based Video Compression Standards

Area Efficient SAD Architecture for Block Based Video Compression Standards IJCAES ISSN: 2231-4946 Volume III, Special Issue, August 2013 International Journal of Computer Applications in Engineering Sciences Special Issue on National Conference on Information and Communication

More information

Best Practices for Incremental Compilation Partitions and Floorplan Assignments

Best Practices for Incremental Compilation Partitions and Floorplan Assignments Best Practices for Incremental Compilation Partitions and Floorplan Assignments December 2007, ver. 1.0 Application Note 470 Introduction The Quartus II incremental compilation feature allows you to partition

More information

A System-Level Stochastic Circuit Generator for FPGA Architecture Evaluation

A System-Level Stochastic Circuit Generator for FPGA Architecture Evaluation A System-Level Stochastic Circuit Generator for FPGA Architecture Evaluation Cindy Mark, Ava Shui, Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver,

More information

DESIGN STRATEGIES & TOOLS UTILIZED

DESIGN STRATEGIES & TOOLS UTILIZED CHAPTER 7 DESIGN STRATEGIES & TOOLS UTILIZED 7-1. Field Programmable Gate Array The internal architecture of an FPGA consist of several uncommitted logic blocks in which the design is to be encoded. The

More information

Design Methodologies

Design Methodologies Design Methodologies 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 Complexity Productivity (K) Trans./Staff - Mo. Productivity Trends Logic Transistor per Chip (M) 10,000 0.1

More information

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol SIM 2011 26 th South Symposium on Microelectronics 167 A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol 1 Ilan Correa, 2 José Luís Güntzel, 1 Aldebaro Klautau and 1 João Crisóstomo

More information

Paper ID # IC In the last decade many research have been carried

Paper ID # IC In the last decade many research have been carried A New VLSI Architecture of Efficient Radix based Modified Booth Multiplier with Reduced Complexity In the last decade many research have been carried KARTHICK.Kout 1, MR. to reduce S. BHARATH the computation

More information

Graduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE

Graduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE FPGA Design with Xilinx ISE Presenter: Shu-yen Lin Advisor: Prof. An-Yeu Wu 2005/6/6 ACCESS IC LAB Outline Concepts of Xilinx FPGA Xilinx FPGA Architecture Introduction to ISE Code Generator Constraints

More information

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems FPGA-Based Rapid Prototyping of Digital Signal Processing Systems Kevin Banovic, Mohammed A. S. Khalid, and Esam Abdel-Raheem Presented By Kevin Banovic July 29, 2005 To be presented at the 48 th Midwest

More information

New Successes for Parameterized Run-time Reconfiguration

New Successes for Parameterized Run-time Reconfiguration New Successes for Parameterized Run-time Reconfiguration (or: use the FPGA to its true capabilities) Prof. Dirk Stroobandt Ghent University, Belgium Hardware and Embedded Systems group Universiteit Gent

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

Improving Logic Obfuscation via Logic Cone Analysis

Improving Logic Obfuscation via Logic Cone Analysis Improving Logic Obfuscation via Logic Cone Analysis Yu-Wei Lee and Nur A. Touba Computer Engineering Research Center University of Texas, Austin, TX 78712 ywlee@utexas.edu, touba@utexas.edu Abstract -

More information

TIERS: Topology IndependEnt Pipelined Routing and Scheduling for VirtualWire TM Compilation

TIERS: Topology IndependEnt Pipelined Routing and Scheduling for VirtualWire TM Compilation TIERS: Topology IndependEnt Pipelined Routing and Scheduling for VirtualWire TM Compilation Charles Selvidge, Anant Agarwal, Matt Dahl, Jonathan Babb Virtual Machine Works, Inc. 1 Kendall Sq. Building

More information

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Multi-domain Communication Scheduling For FPGA-based Logic Emulation

Multi-domain Communication Scheduling For FPGA-based Logic Emulation Multi-domain Communication Scheduling For -based Logic Emulation Abstract Communication scheduling is a technique used by many parallel verification systems to pipeline data signals across shared physical

More information

Low energy and High-performance Embedded Systems Design and Reconfigurable Architectures

Low energy and High-performance Embedded Systems Design and Reconfigurable Architectures Low energy and High-performance Embedded Systems Design and Reconfigurable Architectures Ass. Professor Dimitrios Soudris School of Electrical and Computer Eng., National Technical Univ. of Athens, Greece

More information

Digital Design Methodology (Revisited) Design Methodology: Big Picture

Digital Design Methodology (Revisited) Design Methodology: Big Picture Digital Design Methodology (Revisited) Design Methodology Design Specification Verification Synthesis Technology Options Full Custom VLSI Standard Cell ASIC FPGA CS 150 Fall 2005 - Lec #25 Design Methodology

More information

An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs

An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs An FPGA Design And Implementation Framework Combined With Commercial VLSI CADs ReCoSoC 2013 Qian Zhao Motoki Amagasaki Masahiro Iida Morihiro Kuga Toshinori Sueyoshi (, Japan) Background FPGA IP core development

More information

Placement Strategies for 2.5D FPGA Fabric Architectures

Placement Strategies for 2.5D FPGA Fabric Architectures Placement Strategies for 2.5D FPGA Fabric Architectures Chirag Ravishankar 3100 Logic Dr. Longmont, Colorado Email: chiragr@xilinx.com Dinesh Gaitonde 2100 Logic Dr. San Jose, California Email: dineshg@xilinx.com

More information

Verilog Simulation Mapping

Verilog Simulation Mapping 1 Motivation UNIVERSITY OF CALIFORNIA AT BERKELEY COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Lab 4 Verilog Simulation Mapping In this lab you will learn how to use

More information

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

An Efficient Carry Select Adder with Less Delay and Reduced Area Application An Efficient Carry Select Adder with Less Delay and Reduced Area Application Pandu Ranga Rao #1 Priyanka Halle #2 # Associate Professor Department of ECE Sreyas Institute of Engineering and Technology,

More information

Combinational and Sequential Mapping with Priority Cuts

Combinational and Sequential Mapping with Priority Cuts Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton Department of EECS, University of California, Berkeley {alanmi, smcho, satrajit, brayton@eecs.berkeley.edu

More information

Prog. Logic Devices Schematic-Based Design Flows CMPE 415. Designer could access symbols for logic gates and functions from a library.

Prog. Logic Devices Schematic-Based Design Flows CMPE 415. Designer could access symbols for logic gates and functions from a library. Schematic-Based Design Flows Early schematic-driven ASIC flow Designer could access symbols for logic gates and functions from a library. Simulator would use a corresponding library with logic functionality

More information

Digital Design Methodology

Digital Design Methodology Digital Design Methodology Prof. Soo-Ik Chae Digital System Designs and Practices Using Verilog HDL and FPGAs @ 2008, John Wiley 1-1 Digital Design Methodology (Added) Design Methodology Design Specification

More information

A Time-Multiplexed FPGA

A Time-Multiplexed FPGA A Time-Multiplexed FPGA Steve Trimberger, Dean Carberry, Anders Johnson, Jennifer Wong Xilinx, nc. 2 100 Logic Drive San Jose, CA 95124 408-559-7778 steve.trimberger @ xilinx.com Abstract This paper describes

More information

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic EN2911X: Reconfigurable Computing Topic 01: Programmable Logic Prof. Sherief Reda School of Engineering, Brown University Fall 2012 1 FPGA architecture Programmable interconnect Programmable logic blocks

More information