Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre
1 Hoplite-DSP: Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs. Chethan Kumar H B and Nachiket Kapre, nachiket@ieee.org
2 Hoplite: FPL 201 paper, Jan Gray co-author. Specs: 60 LUTs + 100 FFs, 2.9ns clock. Smallest FPGA router available + RTL code.
6 32b payload + Virtex-6 240T. [Table: Router | LUTs | FFs | Clock. Rows: Penn (1.7K, 41, 4.ns), CMU (1.K, ns), Hoplite FPL (ns). Ratio annotations: 2x, x, 1.x]
12 47b payload + Virtex-7 T. [Table: Router | LUTs | FFs | Clock. Rows: Hoplite (FPL 201), Hoplite-DSP (FPL). Surviving entries: ns, ns. Ratio annotations: x, 8x, ~, + 1 DSP]
14 Motivation. Close the gap vs. embedded NoCs: do we really want clean-slate hard NoCs? Return resources to the FPGA application: reduce NoC overheads. Find clever ways to reuse existing FPGA elements.
15 Outline. (1) Adapting the Hoplite arch. to the DSP. (2) Scaling to 2D layouts using DSP carry chains. (3) Performance and resource evaluation.
17 Hoplite: overview of switch organization. NoC organised as a unidirectional torus. Each switch has 2 inputs and 2 outputs into the network, plus a PE connection. Uses deflection routing: no buffering, no allocation, etc. (Figure from: Jan Gray)
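The switch organization above can be sketched as a small behavioural model. This is a minimal illustration, not the authors' RTL: the `Packet` fields, the `route` and `delivered_here` helpers, and in particular the arbitration policy (North traffic always wins the South port, and a West packet that cannot turn is deflected East to go around the ring again) are assumptions chosen to show bufferless, dimension-ordered deflection routing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Packet:
    dst_x: int    # destination column (hypothetical field name)
    dst_y: int    # destination row (hypothetical field name)
    payload: int


def route(x: int,
          west: Optional[Packet],
          north: Optional[Packet],
          pe: Optional[Packet]) -> Tuple[Optional[Packet], Optional[Packet], bool]:
    """One routing decision for the router in column x.

    Returns (east_out, south_out, pe_accepted). Nothing is buffered:
    every network input leaves on some output in the same cycle.
    """
    east: Optional[Packet] = None
    south: Optional[Packet] = None

    # North traffic is already in its destination column, so it only
    # ever continues South (assumed to have priority on that port).
    if north is not None:
        south = north

    # West traffic travels East until its column matches, then turns
    # South. If South is taken it is deflected East instead.
    if west is not None:
        if west.dst_x == x and south is None:
            south = west
        else:
            east = west

    # The PE injects only when the output it needs is still free.
    pe_accepted = False
    if pe is not None:
        if pe.dst_x == x and south is None:
            south, pe_accepted = pe, True
        elif pe.dst_x != x and east is None:
            east, pe_accepted = pe, True

    return east, south, pe_accepted


def delivered_here(x: int, y: int, south: Optional[Packet]) -> bool:
    """The South output doubles as the PE exit when the address matches."""
    return south is not None and (south.dst_x, south.dst_y) == (x, y)
```

With both network inputs contending for South, the West packet is deflected rather than stored, which is why the router needs no buffers or allocators.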
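18 Hoplite internals. [Figure: router datapath with W, N and PE inputs and E and S/PE outputs; output multiplexers driven by sel0 and sel1,2 from the DOR logic]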
19 Hoplite summary. Bulk of the LUT footprint comes from the LUT blocks (6-LUTs among them) that implement the packet multiplexers. DOR logic: a handful of LUTs; it only reads address fields and valid signals. Inter-router links: pipelined registers. Idea: move (1) multiplexers + (2) registers into the Xilinx DSP block.
20 Xilinx DSP block. [Figure: DSP block diagram with inputs A, B (18b), C, D and PCIN; multiplexers X, Y, Z feeding the ALU; outputs P and PCOUT; control inputs INMODE, OPMODE, ALUMODE]
22 Xilinx DSP block: programmable elements. Xilinx DSP block is very versatile! Typical use case: signal processing, streaming computations => mainly arithmetic. INMODE: 27b multiplexer between A and D. OPMODE: multiplexers between A:B, C. Exploit cascade links PCIN/PCOUT!
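To make the "DSP block as a multiplexer" idea concrete, here is a minimal behavioural sketch. It is an illustration under stated assumptions: the function names are hypothetical, the select values are symbolic stand-ins for whatever OPMODE encoding the real block uses (no actual control bits are modelled), and the 48-bit datapath with an 18-bit B port is assumed from the DSP48 family.

```python
MASK48 = (1 << 48) - 1   # assumed 48-bit P/C/PCIN datapath
B_WIDTH = 18             # assumed width of the B port


def concat_ab(a: int, b: int) -> int:
    """Form the A:B concatenation: A in the upper bits, B in the low 18."""
    return ((a << B_WIDTH) | (b & ((1 << B_WIDTH) - 1))) & MASK48


def dsp_as_mux(select: str, a: int, b: int, c: int, pcin: int) -> tuple:
    """Pass exactly one source through the ALU unchanged.

    `select` is a symbolic name ("A:B", "C" or "PCIN"); on hardware the
    choice would be made by dynamically driven control inputs.
    Returns (P, PCOUT): the fabric output and the cascade output.
    """
    sources = {
        "A:B": concat_ab(a, b),
        "C": c & MASK48,
        "PCIN": pcin & MASK48,
    }
    p = sources[select]
    return p, p  # the cascade output mirrors P in this simplified model


# Example: forward the word arriving on the cascade input.
p, pcout = dsp_as_mux("PCIN", a=0, b=0, c=0, pcin=0x1234)
```

Changing `select` from cycle to cycle is what turns an arithmetic block into a packet switch: the ALU adds the chosen operand to zero, so the data passes through untouched.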
23 Input + Multiplexer Mapping. [Figure: the Hoplite router (W, N and PE inputs; E and S/PE outputs; DOR logic driving sel0 and sel1,2) shown next to the DSP block (A, B, C, D, PCIN, X/Y/Z multiplexers, ALU, P, PCOUT; OPMODE, ALUMODE, INMODE), with the DSP ports labelled WEST, PE, N, S/PE and EAST]
27 Multi-cycling. Problem: Hoplite has two outputs (three in fact, with the S/PE output port shared). Solution: multi-pump the DSP block, which runs at 2x the frequency of the PEs. First sub-cycle: resolve the EAST output. Second sub-cycle: resolve the SOUTH/PE output.
28 First cycle. [Figure: DSP block configuration in the first sub-cycle, with labels PE Input, West Input and East Output]
29 Second cycle. [Figure: DSP block configuration in the second sub-cycle, with labels North Input, PE Input, West Input and South/PE Output]
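The two sub-cycles can be sketched as one shared multiplexer that is evaluated twice per PE clock. This is a simplified model: `shared_mux` and `pe_cycle` are hypothetical names, and the select values are assumed to come from the DOR logic, which is not modelled here.

```python
def shared_mux(select: str, inputs: dict) -> int:
    """The single physical multiplexer (the DSP block in the slides)."""
    return inputs[select]


def pe_cycle(inputs: dict, sel_east: str, sel_south: str) -> dict:
    """One PE clock period = two sub-cycles of the faster DSP clock.

    Sub-cycle 0 resolves the EAST output and sub-cycle 1 resolves the
    SOUTH/PE output, reusing the same multiplexer both times.
    """
    schedule = [("EAST", sel_east), ("SOUTH/PE", sel_south)]
    outputs = {}
    for sub_cycle, (port, select) in enumerate(schedule):
        # The inputs are held stable across both sub-cycles; only the
        # select changes between them.
        outputs[port] = shared_mux(select, inputs)
    return outputs


# Example: West continues East while North continues South.
result = pe_cycle({"W": 10, "N": 20, "PE": 30}, sel_east="W", sel_south="N")
```

The cost of sharing is clock rate: the multiplexer must run at twice the PE frequency so that both outputs are ready within one PE cycle.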
31 DSP columnar layout. [Figure: a DSP column next to user logic; PCIN/PCOUT use dedicated cascade routes within the column, while P, A:B and C connect to the DOR logic and user logic through the programmable FPGA interconnect]
32 Layout considerations. FPGA DSPs are organised into vertical columns: ~100s of DSPs in a column, ~10s of columns. Restrictions: 1. Cascade links only extend within a column. 2. Horizontal links must use general interconnect. Key question: adjusting NoC size vs. DSP count; use pass-through DSPs.
33 Embedded layout. [Figure: one DSP column, top to bottom: Top-Turn DSPs (PCIN to P), Router DSPs, Pass-thru DSPs (PCOUT to PCIN), Router DSPs, Pass-thru DSPs (PCOUT to PCIN), Router DSPs, Bottom-Turn DSPs (A:B to PCOUT); vertical links use the cascade, horizontal links use the fabric]
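The column organization above suggests a simple way to assign roles to the DSP sites of one column. The sketch below is only an illustration: `plan_column`, the role strings, the even spacing of routers, and the convention that index 0 is the bottom of the column are all assumptions, not the placement flow used by the authors.

```python
def plan_column(column_height: int, routers: int) -> list:
    """Assign a role to every DSP site in one column (index 0 = bottom).

    The two end sites turn between fabric and cascade, `routers` sites
    are spread evenly over the interior, and every remaining site is a
    pass-through that simply forwards the cascade.
    """
    interior = column_height - 2
    if routers < 1 or routers > interior:
        raise ValueError("need between 1 and column_height - 2 routers")

    roles = ["pass-thru"] * column_height
    roles[0] = "bottom-turn"   # fabric (A:B) into the cascade
    roles[-1] = "top-turn"     # cascade (PCIN) back out to the fabric
    for i in range(routers):
        # Integer spacing keeps router sites distinct and inside the interior.
        roles[1 + (i * interior) // routers] = "router"
    return roles


# Example: a hypothetical 10-site column carrying 4 routers.
plan = plan_column(10, 4)
```

A smaller NoC on the same column simply leaves more sites as pass-throughs, which is the size-versus-DSP-count tradeoff raised in the layout considerations.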
34 Comparing Xilinx Virtex-6 and Virtex-7 layouts. [Figures: 8x8 NoC (ML60 board); 16x16 NoC (VC707 board)]
36 LUTs vs DSPs. Simple tradeoff: substantially fewer LUTs vs. DSPs. Importantly, FFs are absorbed into the DSP. Power and effective BW for random traffic are mostly identical.
38 Commentary on hard NoCs. Area: hard router = 12.4 LABs; 1 Altera DSP block = 11.9 LABs (Stratix-III); Hoplite-DSP marginally smaller. Speed: hard router ~996 MHz; Hoplite-DSP ~60 MHz (multi-pumped); Hoplite-DSP limits freq advantage to 3x. Power: hard router ~1.8 W; Hoplite-DSP model ~1.1W, 1% activity; Hoplite-DSP uses ~0% less power. Hard NoC figures: Abdelfattah + Betz [TRETS2014] (extrapolated results for b-wide 1VC).
39 Wish-list for DSPs Gen2. Configurable cascades: b switched bidirectional routing instead of just cascades (approach hard NoC wiring); option to skip DSP blocks (segment lengths). DOR routing: pattern detection logic with multiple masks (similar to Altera DSP units). SIMD multiplexing: fracturing b-wide lanes into multiple lanes.
40 Conclusions. Hoplite muxes mapped to DSP blocks: use the dynamic OPMODE feature. Reduce cost by x LUTs, 8x FFs per router. Exploit cascade links to absorb NoC wiring. Significantly close the gap with hard NoCs.
41 Embedded layout. Three kinds of DSPs. Router DSPs: a small fraction of the DSPs, used for the switching fabric. Pass-thru DSPs (PCOUT to PCIN): glorified pipelined wires. Multi-pumping: 0% back to user fabric. Corner-turn DSPs connect cascades to fabric: Top-Turn DSPs (PCIN to P) and Bottom-Turn DSPs (A:B to PCOUT).
42 Physical FPGA layout. [Figure: 2x2 NoC (ML60 board), showing Corner-Turn and Pass-Thru DSPs with fabric and cascade links]
47 Efficiency. DSPs are less efficient than the LUT-based design!