Customising Field Programmability
1 Customising Field Programmability
Wayne Luk, Department of Computing, Imperial College London
FPL, 1 September
2 Overview
1. how it begins
2. customisation + field programmability
3. customisation: fab time
4. customisation: design time
5. customisation: run time
6. research directions
7. customise FPL
8. summary
3 1. How it begins
4 FPL 91
- attendance at FPL 91: 90
- attendance at FPL 10: 220 at dinner tonight
5 Technical papers
- paper on run-time reconfiguration
6 Technical papers
- remember the RAMP project?
7 FPL 91
- Bank of England inflation calculator: £275 in 1991 = £440 in 2009 = 536 Euros in 2010
- registration fee for FPL 10: 500 Euros!
9 Program comparison
- 1991: 24 regular papers, 26 poster papers, selected papers in ...; visit: Oxford Story
- 2010: ... submissions, 60 regular papers, 43 short papers, 9 PhD Forum papers; visit: The Last Supper
11 Commercial impact
- how many companies formed or benefited from related research published at FPL?
- paper published before or within a year of company start
- authors may or may not be involved in company
- answer: at least 8, all still around except one
13 Some companies from FPL (year, first author of paper: topic, company)
- 91, Page: synthesis tool, Celoxica
- 95, Gokhale: synthesis tool, Impulse Accel. Tech.
- 97, Betz: device tool, Right Track (bought by Altera: hear him tomorrow!)
- 99, Kastrup: device, Silicon Hive
- 00, Mencer: synthesis tool, Maxeler
- 02, de Sousa: IP cores, Core Works
- 03, Teifel: device, Achronix
- 07, Quinton: post-silicon debug, Veridae
14 2. Customisation + field programmability
- reconfigurable computing: reconfigurable technology + custom computing
- computing with field-programmable devices, e.g. Field Programmable Gate Arrays (FPGAs)
- meet requirements by customisation of
  - parallelism: space
  - reconfiguration: time
- reconfigurable computing systems: wide range, from supercomputers to portable systems
15 Systems with Reconfigurable Technology
- supercomputer: SGI Altix 4700 (multi-paradigm, ccNUMA)
- FP6 hArtes project
- particle physics: CERN LHC
- source: SGI, Sony, FP6 hArtes, CERN
16 Customisation: when
- fabrication time: pre-fab options
  - customise field-programmable fabric
  - fewer post-fab options: less flexible, more efficient
- design time: pre-execution options
  - customise initial mapping from application to fabric
  - fewer execution options: less flexible, more efficient
- run time
  - customise mapping to fabric during execution
  - more execution options: dynamic scheduling
17 Customisation: some benefits
- high-level generic design
  - requirements + context: multiple designs
  - optimise: options + parameters + abstraction levels
- facilitate design composition
  - help optimisation and verification
  - modularity of building blocks + interfaces
- platform-based evolution
  - re-use of un-verified design: risky
  - automate re-verification after changes
18 3. Customisation: fab time
- FPL 97: VPR
  - FPGA device architecture: homogeneous
  - device-specific mapping tools
  - the most-cited FPL paper so far (from Google Scholar)
- FPGA 09: VPR
  - different types of blocks in columns + routing
  - single-driver routing, architecture files
  - compare customised device with real device?
- new FPGA customised for particular application
  - FPL 07: hybrid FPGA customised for floating-point applications
  - coarse-grained blocks too complex for VPR
19 Generic hybrid FPGA architecture
- example: floating-point hybrid FPGA
- coarse-grained units implement the datapath
  - bus-based operations, constant bit-width
  - dominated by multiplication and addition
- fine-grained units implement control logic and state machines
  - adopt existing FPGA fabric
20 Coarse-grained fabric
- D=9, M=4, R=3, F=3, 2 add, 2 mul: best density over benchmarks
21 Modelling challenges
- compare hybrid FPGA with current technology
- fine-grained units: same as current FPGA
- existing modelling tools cannot support the specialised coarse-grained units
- new approach: Virtual Embedded Blocks
  - synthesis tools: produce delay and area of specialised units
  - model specialised units by arrays of existing FPGA cells with similar delay or area
  - can use existing low-level tools in mapping applications to hybrid FPGA, e.g. place and route
22 Virtual Embedded Blocks
- [figure: an embedded block in ASIC (W x L, delay tpd) and its equivalent VEB built from LCs (W' x L', delay tpd'), matched so that W L ≈ W' L' or tpd ≈ tpd'; VEBs distributed in a virtual FPGA]
- dummy blocks to model a coarse-grained block's area or delay
- timing analyzer: determine the hybrid's performance, e.g. fine-to-coarse routing delay
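The sizing idea behind a Virtual Embedded Block can be sketched in a few lines. This is an illustration only, not the authors' tool flow: the function names, the units and the near-square footprint rule are assumptions, and a real flow would take areas and delays from synthesis results.

```python
import math

def veb_cells_for_area(block_area, cell_area):
    """Smallest number of FPGA cells whose total area covers the embedded block."""
    if block_area <= 0 or cell_area <= 0:
        raise ValueError("areas must be positive")
    return math.ceil(block_area / cell_area)

def veb_cells_for_delay(block_delay, cell_delay):
    """Length of a chain of FPGA cells whose delay is closest to the block delay."""
    if block_delay <= 0 or cell_delay <= 0:
        raise ValueError("delays must be positive")
    return max(1, round(block_delay / cell_delay))

def veb_footprint(n_cells):
    """Near-square W' x L' grid (in cells) with room for at least n_cells (assumed shape rule)."""
    width = math.ceil(math.sqrt(n_cells))
    length = math.ceil(n_cells / width)
    return width, length

# Hypothetical numbers: a 1000-unit block modelled with 30-unit cells.
cells = veb_cells_for_area(1000.0, 30.0)
print(cells, veb_footprint(cells))
```

An area-matched VEB and a delay-matched VEB generally need different cell counts, which is why the slide speaks of modelling area or delay rather than both at once.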
23 Comparison: double precision
- [table, values lost in transcription: area (slices) and delay (ns) of the floating-point hybrid FPGA and of XC2V, with area and delay ratios (times), for benchmarks bfly, dscg, fir, mm, ode, syn, bgm, plus geometric mean]
- geometric mean compared with Virtex-II: ... times area reduction, 4 times speedup
24 Comparison: energy reduction (FPL 08)
- energy reduced by 14 times on average
25 4. Customisation: design time
- FPL 91: program to gates
  - program construct for parallelism + communication
  - syntax-directed compiler: simple but not optimised
  - basis for Celoxica's Handel-C tool; now Mentor
- FPL 95: data parallel C; FCCM 00: Streams-C
  - automate parallelisation, roots of Impulse-C
- FPL 01: GALADRIEL and NENYA compilers
  - expose parallelism in hierarchical model
  - support multi-memories + region-based scheduling
- FPL 09, DATE 10: combine two optimisations
  - goal-directed: e.g. geometric programming
  - syntax-directed: e.g. loop tiling
26 Customisation approach
- flow: sequential design -> parser -> pre-transform (inlining, Ptr2Array, loop merge, loop coalesce, ...) -> dependence analysis -> design exploration and transform (data reuse, MapReduce, pipelining) -> power-efficient design
- inputs: speed spec, platform spec
- pre-transformations: enable and facilitate code analysis and design exploration
- geometric programming model for low-power design exploration
27 Design space formulation
- design parameters: N-level nested loops, R array references
- target performance: execution time T
- target platform: F types of resources, each has Res_f

$$
\begin{aligned}
\min \quad & Power(\rho_{ij}, k_l, ii, freq) \\
\text{subject to} \quad & Speed(\rho_{ij}, k_l, ii, freq) \le T \\
& Resource_f(\rho_{ij}, k_l, ii) \le Res_f \\
& \rho_{ij} \in \{1, 2\}, \quad 1 \le k_l \le L_l, \quad ii \ge 1 \\
& \text{for } i = 1, \dots, R, \quad j = 1, \dots, E_i, \quad l = 1, \dots, N, \quad f \in F
\end{aligned}
$$

- $\rho_{ij}$: data reuse variable
- $k_l$: MapReduce parallel partition variable
- $ii$: pipelining initiation interval
- $freq$: system clock frequency
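The shape of this constrained minimisation can be illustrated with a brute-force search over a tiny design space. This is only a sketch: the talk uses geometric programming rather than enumeration, and `explore`, `toy_power`, `toy_speed` and `toy_resource` are hypothetical stand-ins (one loop level, one array reference), not the models from the slides.

```python
from itertools import product

def explore(rho_choices, k_choices, ii_choices, freq_choices,
            power, speed, resource, time_limit, resource_limit):
    """Return (power, (rho, k, ii, freq)) for the cheapest feasible point, or None."""
    best = None
    for point in product(rho_choices, k_choices, ii_choices, freq_choices):
        rho, k, ii, freq = point
        if speed(rho, k, ii, freq) > time_limit:
            continue  # misses the execution-time target T
        if resource(rho, k, ii) > resource_limit:
            continue  # does not fit the platform
        cost = power(rho, k, ii, freq)
        if best is None or cost < best[0]:
            best = (cost, point)
    return best

# Hypothetical toy models, chosen only to show the trade-offs.
def toy_power(rho, k, ii, freq):
    off_chip = 200 if rho == 1 else 100   # data reuse (rho = 2) cuts off-chip power
    return freq * k + 5 * k + off_chip    # on-chip power grows with freq and parallelism

def toy_speed(rho, k, ii, freq):
    return 1000 * ii / (k * freq)         # time falls with parallelism and frequency

def toy_resource(rho, k, ii):
    return 4 * k + (8 if rho == 2 else 0) # parallel units plus a reuse buffer

best = explore([1, 2], [1, 2, 4], [1, 2], [50, 100],
               toy_power, toy_speed, toy_resource,
               time_limit=10, resource_limit=20)
print(best)
```

With these toy numbers the search settles on data reuse enabled, a single partition and the higher clock: just enough speed to meet the deadline at the lowest modelled power.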
28 Power model
- [figure: FPGA containing processing units (PUs) and RAMs, connected to off-chip RAMs]

$$
Power = P_{dynamic} = P_{off} + P_{on}
$$

- the expressions for $P_{off}$ and $P_{on}$ use $V_{dd}$, $I$, #off_accesses, #total_cycles, #par, #dsp, #bram, $P_{par}$, $P_{dsp}$, $P_{bram}$ and $freq$; their exact form is not recoverable from the transcript
29 Capturing parallelisation + data re-use
- parallelisation:

$$
\#par = \prod_{l=1}^{N} k_l, \qquad \#dsp = x_{dsp} \prod_{l=1}^{N} k_l
$$

- data reuse:

$$
\#bram = d \sum_{i=1}^{R} \sum_{j=1}^{E_i} \log_2(\rho_{ij}) \, B_{ij}
$$

- #off_accesses: a double sum over $i = 1, \dots, R$ and $j = 1, \dots, E_i$ of terms in $\log_2 \rho_{ij}$ and $C_{ij}$; exact form not recoverable from the transcript
30 Speed model

$$
\#total\_cycles = cyc_s + cyc_{in} + cyc_r + \#off\_accesses
$$

- $cyc_s$: number of cycles of statements outside the innermost loop
- $cyc_{in}$: number of cycles of the innermost loop after pipelining
- $cyc_r$: number of cycles of the result reduce phase
- the component expressions use $v_l$, $k_l$, $W_s$, $W_r$, $ii$, $I$, $d_i$, $\log_2 k$ and $notfull$; their exact form is not recoverable from the transcript
31 Resource model
- computational resource constraint:

$$
x_{dsp} \prod_{l=1}^{N} k_l \le Rec_{dsp}
$$

- memory constraint:

$$
d \sum_{i=1}^{R} \sum_{j=1}^{E_i} \log_2(\rho_{ij}) \, B_{ij} \le Rec_{ram}
$$
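A minimal sketch of how the two resource constraints could be checked for one candidate design. It assumes the operators lost in transcription are a product over loop levels and a double sum over array references, and it treats `x_dsp` and `d` as plain scale factors because the transcript does not define them. The function names and the sample numbers are hypothetical; only the limits of 192 DSP blocks and 552 RAM blocks come from the experiment settings.

```python
import math

def dsp_usage(k, x_dsp):
    """x_dsp * prod(k_l): DSP blocks used by all parallel partitions."""
    return x_dsp * math.prod(k)

def ram_usage(rho, B, d):
    """d * sum over references (i, j) of log2(rho_ij) * B_ij.

    rho_ij = 1 contributes nothing; rho_ij = 2 contributes B_ij.
    """
    return d * sum(math.log2(r) * b
                   for rho_i, B_i in zip(rho, B)
                   for r, b in zip(rho_i, B_i))

def fits(k, rho, B, x_dsp, d, rec_dsp, rec_ram):
    """True when both the computational and the memory constraint hold."""
    return dsp_usage(k, x_dsp) <= rec_dsp and ram_usage(rho, B, d) <= rec_ram

# Hypothetical design point checked against the XC4VFX140 limits.
k = [2, 4, 1]              # partitions per loop level
rho = [[1, 2], [2]]        # reuse choice per array reference
B = [[10, 20], [5]]        # assumed RAM blocks per reuse option
print(fits(k, rho, B, x_dsp=3, d=2, rec_dsp=192, rec_ram=552))
```

Because every term is a product of variables or a logarithm of one, constraints of this shape are the kind that a geometric programming formulation can handle after a change of variables.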
32 Experimental results
- [table lost in transcription] #Refs: number of array references
33 Experiment settings
- FPGA-based platform: Xilinx XC4VFX140 with 192 DSP blocks and 552 RAM blocks
- clock frequency: 1~100 MHz
- speed constraint: up to 158x
- design exploration time: within 2 mins
- power results: Xilinx Power Estimator
34 Result: matrix multiplication
- [figure lost in transcription: design points labelled (ρ1j, ρ2j, k1, k2, k3, ii, frequency), with the minimum-energy point marked]
35 FP6 hArtes: Harmonic toolchain
36 FP6 hArtes: bigger picture
- hArtes: Holistic Approach to Reconfigurable real Time Embedded Systems
- flow: algorithm exploration tools -> .c source code -> hArtes tool-chain -> DSP, GPP, FPGA
- 16 partners in 5 European countries: Atmel, FAITAL, Fraunhofer, Imperial, INRIA, Leaff, P. di Bari, P. di Milano, Scaleo, Segula, Thales, Thomson, TU Delft, Uni. di Ferrara, Uni. P. delle Marche, Uni. d'Avignon
37 5. Customisation: run time
1. pre-deployment
   - reconfigure for design/debug, not while deployed
2. in-field upgrade
   - update and reboot
   - reconfiguration time/energy overhead: irrelevant
3. infrequent run-time reconfiguration
   - occasional adaptation to changing environment
   - reconfiguration time/energy overhead: not critical
4. frequent run-time reconfiguration
   - frequent reconfiguration: core functionality
   - reconfiguration time/energy overhead: critical
38 Design approaches
- FPL 91, 93, 00: Dynamic Circuit Switching
  - run-time reconfigurable design, applications
  - estimate reconfiguration latency
- FPL 03, FPL 05: branch/phase-optimised
  - more resources to speed up more frequent branch
  - optimise for program phases
- ARC 09, FCCM 10: analytical optimisation of parallelism for speed/energy
  - more parallelism: faster execution, slower configuration
  - more parallelism: less execution energy, more configuration energy
39 Performance
- parallelism increases processing throughput
- parallelism slows down reconfiguration
40 Optimising parallelism
- total processing time for workload n
- find optimal degree of parallelism
41 How parallelism affects performance
- small workload: reconfiguration penalty drives to serial implementations
- large workload: benefits from parallelism
42 Optimisation steps
1. derive Φ_app, s and n
2. obtain Φ_config from target technology
3. develop one design and determine t_p and A
4. calculate t_r, t_{r,e} and t_{p,e}
5. calculate p_opt and find a feasible value
6. calculate t_total and verify throughput
7. implement design with p from step 5 and verify its actual throughput
8. calculate buffer size, check device utilisation
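The trade-off described on the preceding slides, and steps 5 and 6 above, can be sketched with a deliberately simple timing model. The model is an assumption, not the one from the talk: reconfiguration time is taken to grow linearly with the degree of parallelism p, and processing time to shrink as 1/p. The names `t_item`, `t_reconf_unit` and `p_max` are hypothetical and do not correspond to the slide's symbols.

```python
import math

def total_time(p, n, t_item, t_reconf_unit):
    """Assumed model: reconfiguration cost p * t_reconf_unit plus processing n * t_item / p."""
    return p * t_reconf_unit + n * t_item / p

def optimal_parallelism(n, t_item, t_reconf_unit, p_max):
    """Best integer degree of parallelism in [1, p_max] under the assumed model."""
    # Setting the derivative of total_time to zero gives the real-valued optimum.
    p_real = math.sqrt(n * t_item / t_reconf_unit)
    # The model is convex in p, so the best feasible integer is a clamped neighbour.
    candidates = {min(max(1, c), p_max)
                  for c in (math.floor(p_real), math.ceil(p_real))}
    return min(sorted(candidates),
               key=lambda p: total_time(p, n, t_item, t_reconf_unit))

# Hypothetical numbers: the optimum moves from serial to parallel as the workload grows.
for n in (1, 100, 10000):
    p = optimal_parallelism(n, t_item=1.0, t_reconf_unit=4.0, p_max=16)
    print(n, p, total_time(p, n, 1.0, 4.0))
```

The small workload lands on a serial design and the large one on the widest feasible design, matching the qualitative behaviour stated on the slides; the device limit `p_max` plays the role of the feasibility check in step 5.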
43 Case study: GSM channel filter
- optimising the FIR filter for maximum performance
44 Extend: parallelism to optimise energy
- total energy for workload n
- find optimal degree of parallelism

$$
E_{total} = E_c + E_o + E_R
$$

- the slide expands $E_{total}$ and gives a closed form for $p_{opt}$ in terms of $P_o$, $t_{p,e}$, $s$, $n$, $P_r$ and $t_{r,e}$; the exact expressions are not recoverable from the transcript
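A generic version of this optimisation, given as an assumption rather than as the slide's formula: suppose the energy for workload n splits into a part that falls as 1/p (execution) and a part that rises with p (configuration), with hypothetical lumped constants a, b and c that do not appear in the talk.

```latex
E(p) = \frac{a\,n}{p} + b\,p + c,
\qquad
\frac{dE}{dp} = -\frac{a\,n}{p^{2}} + b = 0
\;\Longrightarrow\;
p_{opt} = \sqrt{\frac{a\,n}{b}}
```

Under this assumed form the optimal degree of parallelism grows with the square root of the workload, and a larger configuration cost b pushes the optimum towards more serial designs.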
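45 [figure: energy per sample E [nJ/sample] and area (LUTs) against degree of parallelism p, with curves E_total,n, E_p,n and E_r,n and the point p_opt marked]
- CoreGen design: E = 33.4 nJ
- p = 20: E_model = 16.7 nJ, E = 17.7 nJ
- -47% compared with parallel design
- -49% compared with non-reconfig design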
46 6. Some research directions
- theory and models
  - analysis of reconfiguration and parallelism
- design automation
  - standard optimisation: apply automatically
  - floorplan for parametric relocatable modules
  - scalable design re-use, placement strategies
- run-time performance tuning and validation
  - benefits vs overhead
- applications of reconfiguration
  - overcome process variation, autonomous systems
47 Autonomous systems
- control strategy
  - make decisions to optimise itself
  - model of world: planning and action
  - understand trade-offs: e.g. reconfigure or not
- event-driven just-in-time reconfiguration
  - component meta-data description
  - assemble + tune partially-optimised components
  - hide reconfiguration latency
- extensions
  - self-optimisation
  - self-verification
48 Self-optimisation + self-verification
- optimise and verify: hardware + software
  - meet requirements efficiently and demonstrably
- self-optimising and self-verifying design (SOSV)
  - preserve property in design composition
- self?
  - aware of context
  - capable of planning
  - effective external control
- 2 stages
  - pre-deployment: building design, compile time
  - post-deployment: operational, run time
49 Pre-deployment: example of choices
- circuit technology: e.g. ASIC or FPGA
- input/output: options
- memory: hierarchy + options
- interconnect: e.g. bus, switch, network-on-chip
- granularity: configurable unit, custom instruction
- synchronisation: e.g. clock domains, self-timed
- parallelism: processors, hardware/software
- data representation optimisation
- post-deployment optimisation/verification
50 Post-deployment: situation-specific
- optimisation and verification opportunities
  - design upgrade
  - run-time conditions, e.g. noise, process variation
  - program phase optimisation
- optimisation and verification process
  - light-weight: on-site, e.g. proof-carrying code
  - heavy-weight: remotely, verified by signature
- run-time system deals with exceptions
  - error diagnosis facilities
51 Example: dynamic power optimisation
- power surge while self-optimising/verifying
52 Challenges: theory + practice
- for: productive
- automate: evolutionary vs disruptive
- SOSV design: specify + analyse requirements
- composable description: design + context
- multi-level capture: domain-specific constraints
- open standard: design, optim/verify programs
53 7. Customise FPL
- aim
  - could only attend one conference? FPL!
  - platform for great research + education
  - cost of attendance: not a barrier
- provide your opinions: they will count!
  - cost of registration
  - number of pages for papers
  - paper review: authors' response?
54 8. Summary
- self * (optimising + verifying) = trusted re-use
  - unify: autonomic, self-test, dynamic optim., RTR
  - improve: design efficiency + designer productivity
- self-optimising self-verifying design platform
  - customisable: cloud + embedded computing
  - autonomous: system-on-chip + network of ASOCs
  - applications: ubiquitous, dependable, secure, robust
- new generation of designers
  - building blocks + tools: made smarter
  - specify, analyse, adapt: requirements + search
- but do not forget...
56 FPL is us!
- let's make FPL even better...
  - better papers
  - better review + feedback on papers
  - better interactions at conference
  - lower cost to attend
  - more companies benefit from FPL research
- and... see you in the next 20 FPLs at least!
More informationReconfigurable Computing. Design and implementation. Chapter 4.1
Reconfigurable Computing Design and implementation Chapter 4.1 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software Software-Co-Design Reconfigurable Computing In System Integration Reconfigurable
More informationFILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas
FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given
More informationA VARIETY OF ICS ARE POSSIBLE DESIGNING FPGAS & ASICS. APPLICATIONS MAY USE STANDARD ICs or FPGAs/ASICs FAB FOUNDRIES COST BILLIONS
architecture behavior of control is if left_paddle then n_state
More informationasoc: : A Scalable On-Chip Communication Architecture
asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by
More informationAgenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs
New Directions in Programming FPGAs for DSP Dr. Jim Hwang Xilinx, Inc. Agenda Introduction FPGA DSP platforms Design challenges New programming models for FPGAs System Generator Getting your math into
More informationCustomising Parallelism and Caching for Machine Learning
Customising Parallelism and Caching for Machine Learning Andreas Fidjeland and Wayne Luk Department of Computing Imperial College London akf99,wl @doc.ic.ac.uk Abstract Inductive logic programming is an
More informationCOPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design
COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm
More informationSYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous
More informationGPU vs FPGA : A comparative analysis for non-standard precision
GPU vs FPGA : A comparative analysis for non-standard precision Umar Ibrahim Minhas, Samuel Bayliss, and George A. Constantinides Department of Electrical and Electronic Engineering Imperial College London
More informationCS310 Embedded Computer Systems. Maeng
1 INTRODUCTION (PART II) Maeng Three key embedded system technologies 2 Technology A manner of accomplishing a task, especially using technical processes, methods, or knowledge Three key technologies for
More informationSDSoC: Session 1
SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the
More informationReconfigurable Computing. Design and Implementation. Chapter 4.1
Design and Implementation Chapter 4.1 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design In System Integration System Integration Rapid Prototyping Reconfigurable devices (RD) are usually
More informationFPGAs: High Assurance through Model Based Design
FPGAs: High Assurance through Based Design AADL Workshop 24 January 2007 9:30 10:00 Yves LaCerte Rockwell Collins Advanced Technology Center 400 Collins Road N.E. Cedar Rapids, IA 52498 ylacerte@rockwellcollins.cm
More informationAdaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process
Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers {john.c.aino, ken.smith,
More informationVerification and Validation of X-Sim: A Trace-Based Simulator
http://www.cse.wustl.edu/~jain/cse567-06/ftp/xsim/index.html 1 of 11 Verification and Validation of X-Sim: A Trace-Based Simulator Saurabh Gayen, sg3@wustl.edu Abstract X-Sim is a trace-based simulator
More informationDigital Design Methodology (Revisited) Design Methodology: Big Picture
Digital Design Methodology (Revisited) Design Methodology Design Specification Verification Synthesis Technology Options Full Custom VLSI Standard Cell ASIC FPGA CS 150 Fall 2005 - Lec #25 Design Methodology
More informationESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder
ESE532: System-on-a-Chip Architecture Day 8: September 26, 2018 Spatial Computations Today Graph Cycles (from Day 7) Accelerator Pipelines FPGAs Zynq Computational Capacity 1 2 Message Custom accelerators
More informationEmbedded Systems: Hardware Components (part I) Todor Stefanov
Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System
More informationIntroduction to Parallel Programming Models
Introduction to Parallel Programming Models Tim Foley Stanford University Beyond Programmable Shading 1 Overview Introduce three kinds of parallelism Used in visual computing Targeting throughput architectures
More informationA Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning
A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationExploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture Ramadass Nagarajan Karthikeyan Sankaralingam Haiming Liu Changkyu Kim Jaehyuk Huh Doug Burger Stephen W. Keckler Charles R. Moore Computer
More informationFPGA Polyphase Filter Bank Study & Implementation
FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationEECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history
More informationFPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationIntroduction to reconfigurable systems
Introduction to reconfigurable systems Reconfigurable system (RS)= any system whose sub-system configurations can be changed or modified after fabrication Reconfigurable computing (RC) is commonly used
More informationDigital Design Methodology
Digital Design Methodology Prof. Soo-Ik Chae Digital System Designs and Practices Using Verilog HDL and FPGAs @ 2008, John Wiley 1-1 Digital Design Methodology (Added) Design Methodology Design Specification
More informationUltra-Fast NoC Emulation on a Single FPGA
The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo
More informationAbstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE
A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany
More informationHow Much Logic Should Go in an FPGA Logic Block?
How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca
More informationAutomated RTR Temporal Partitioning for Reconfigurable Embedded Real-Time System Design
Automated RTR Temporal Partitioning for Reconfigurable Embedded Real-Time System Design C. Tanougast, Y. Berviller, P. Brunet and S. Weber L. I. E. N. Laboratoire d Instrumentation Electronique de Nancy
More information