Generation of Multigrid-based Numerical Solvers for FPGA Accelerators
1 Generation of Multigrid-based Numerical Solvers for FPGA Accelerators
Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler
Hardware/Software Co-Design, System Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
HiStencils, Amsterdam, The Netherlands; January 20, 2015
2 Motivation
Multigrid methods are widely used:
- Solution of discretized PDEs
- Preconditioners for other iterative solvers
and highly scalable:
- O(N) operations
- but also on a device scale: from embedded hardware to supercomputers!
But: The most efficient implementation varies greatly with the numerical problem and the target architecture!
Our research: Generation of multigrid-based solvers and automatic application of domain- and target-specific optimizations
Christian Schmitt FAU Generation of Multigrid-based Numerical Solvers for FPGA Accelerators HiStencils15 1
3 Motivation
Code generator works for supercomputers, e.g., on JUQUEEN (TOP500 #8) [1]:
[Figure: total runtime in seconds over number of cores, up to 256k cores]
... but can it also work on the other end of the scale, i.e., for energy-efficient embedded devices such as FPGAs?
[1] Christian Schmitt et al. ExaSlang: A Domain-Specific Language for Highly Scalable Multigrid Solvers. In: Proceedings of the 4th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC). (New Orleans, LA, USA). IEEE Computer Society, Nov. 17, 2014, pp ISBN: DOI: /WOLFHPC
4 Basic Multigrid Ideas
Multigrid method:
1. Pre-smoothing
2. Calculation of residual
3. Restriction
4. Recursive call(s) or solve (at coarsest level)
5. Prolongation
6. Correction
7. Post-smoothing
[Figure: multigrid cycle diagrams, grid level plotted over time]
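The seven steps can be sketched as a recursive V-cycle. The following is a minimal illustration for the 1D Poisson problem with homogeneous Dirichlet boundaries, not the code produced by the generator. The names `Grid`, `smooth`, `restrict_fw`, `prolongate`, and `vcycle`, the damping factor 2/3, full-weighting restriction, and linear interpolation are all assumptions chosen for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nodal values on a 1D grid with 2^k + 1 points; both end points are
// homogeneous Dirichlet boundaries and stay at zero.
using Grid = std::vector<double>;

// Residual r = f - A u with A u = (2 u_i - u_{i-1} - u_{i+1}) / h^2.
Grid residual(const Grid& u, const Grid& f, double h) {
    Grid r(u.size(), 0.0);
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h);
    return r;
}

// Damped Jacobi sweeps: u <- u + omega * D^{-1} * (f - A u), with D = 2 / h^2.
void smooth(Grid& u, const Grid& f, double h, int sweeps) {
    const double omega = 2.0 / 3.0;  // assumed damping factor for this 1D sketch
    for (int s = 0; s < sweeps; ++s) {
        const Grid r = residual(u, f, h);
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            u[i] += omega * (h * h / 2.0) * r[i];
    }
}

// Full-weighting restriction from the fine grid to the next coarser grid.
Grid restrict_fw(const Grid& fine) {
    const std::size_t nc = (fine.size() - 1) / 2;  // number of coarse intervals
    Grid coarse(nc + 1, 0.0);
    for (std::size_t i = 1; i < nc; ++i)
        coarse[i] = 0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1];
    return coarse;
}

// Linear interpolation from the coarse grid to the next finer grid.
Grid prolongate(const Grid& coarse) {
    Grid fine(2 * (coarse.size() - 1) + 1, 0.0);
    for (std::size_t i = 0; i < coarse.size(); ++i) fine[2 * i] = coarse[i];
    for (std::size_t i = 0; i + 1 < coarse.size(); ++i)
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1]);
    return fine;
}

// One V(2,2) cycle for A u = f on a grid with mesh width h.
void vcycle(Grid& u, const Grid& f, double h) {
    assert(u.size() == f.size() && u.size() >= 3);
    if (u.size() == 3) {           // coarsest level: a single unknown, solved exactly
        u[1] = f[1] * h * h / 2.0;
        return;
    }
    smooth(u, f, h, 2);                        // 1. pre-smoothing
    const Grid r = residual(u, f, h);          // 2. calculation of residual
    const Grid rc = restrict_fw(r);            // 3. restriction
    Grid ec(rc.size(), 0.0);
    vcycle(ec, rc, 2.0 * h);                   // 4. recursive call or coarsest solve
    const Grid e = prolongate(ec);             // 5. prolongation
    for (std::size_t i = 1; i + 1 < u.size(); ++i) u[i] += e[i];  // 6. correction
    smooth(u, f, h, 2);                        // 7. post-smoothing
}

// Maximum norm, used to monitor the residual.
double max_norm(const Grid& v) {
    double m = 0.0;
    for (double x : v) m = std::max(m, std::fabs(x));
    return m;
}
```

Calling `vcycle` repeatedly on a grid with 65 points and mesh width 1/64 reduces the residual by a roughly constant factor per cycle, which is the behaviour that makes the overall cost O(N).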
5 Basic Multigrid Ideas
[Figure panels: Residual on fine grid; Smoother applied; Residual on coarse grid]
6 FPGA Basics
Field Programmable Gate Arrays
- Array of lookup tables and registers: configurable logic blocks (CLBs)
- Switch matrices to connect CLBs
- Trade-off between performance of hardware (ASIC) and flexibility of software
- Programming via Hardware Description Language, e.g., VHDL, Verilog
7 FPGA Basics
Spread across chip: Hard-IP cores
- Block RAM
  - Distributed memory
  - ~1000, 1-2 kb each
  - Very high memory bandwidth
- DSP Blocks
  - Dedicated multiplier/adder units
  - Typically bit
- Clock: ~500 MHz
- High-speed serial I/O
- PCIe hard IP for communication with host PC
- Off-chip DDR3
- Soft IP support for various protocols (InfiniBand, Ethernet, ...)
8 FPGA Basics
Basic principle: Spatial Computing, e.g., stream processing
- Temporal Computing: Sequential execution
- Spatial Computing: Parallel execution
[Figure: execution of both models plotted over time]
9 FPGA Basics
Traditional Workflow
- Hand-coding HDL
- Register Transfer Level (RTL)
- Post Place & Route (PPnR)
- Upload configuration file to FPGA
High-level Synthesis (HLS)
- Behavioral description (algorithm, math. model), often in a subset of C/C++
- Conversion to structured description
  - Register-transfer level (RTL)
  - Connected blocks

    #include <hls_stream.h>  // hls::stream (Vivado HLS)
    // processmimo and BorderPadding belong to the authors' support library;
    // its header is not shown on the slide.

    struct Smoother_1Kernel {
        // Computes one smoothed value from 3x3 windows of right-hand side and solution.
        double operator()(double rhsdata_1[3][3], double solutiondata_1[3][3]) {
            // 5-point stencil applied to the solution window
            double temp1 = 4.0f * solutiondata_1[1][1] - solutiondata_1[2][1] - solutiondata_1[0][1]
                         - solutiondata_1[1][2] - solutiondata_1[1][0];
            double temp2 = rhsdata_1[1][1] - temp1;  // residual at the window center
            double temp3 = solutiondata_1[1][1] + temp2 / 2;
            return temp3;
        }
    };

    void Smoother_1(hls::stream<double>& rhs_in, hls::stream<double>& data,
                    hls::stream<double>& sol, hls::stream<double>& rhs_out) {
        struct Smoother_1Kernel Smoother_1_inst;
        processmimo<16384, 32, 32, 3>(rhs_in, data, sol, rhs_out, 32, 32,
                                      Smoother_1_inst, BorderPadding::BORDER_CLAMP);
    }
10 ExaSlang
Multi-layered DSL
- Description of multigrid-based numerical solvers
- Layer 4 aimed at computer scientists
  - Explicitly parallel by providing simple communication statements
  - Definition of arrays, stencils, loops
  - Explicit addressing of different multigrid levels
Level Specifications
- Referencing of multigrid levels
- Used to implement multigrid recursion
11 ExaSlang
Fields and Layouts
- Represent multi-dimensional arrays
- Size(s) determined automatically
- Layout determines datatype, communication
- Multiple fields can have the same layout

    Layout NoComm {
      ghostlayers = [ 0, 0 ]
      duplicatelayers = [ 1, 1 ]
    }
    Field Solution <global, NoComm,
12 ExaSlang
Stencils

    Stencil {
      [ 0, 0] => 4.0, [ 1, 0] => -1.0, [-1, 0] => -1.0
      [ 0, 1] => -1.0, [ 0, -1] => -1.0
    }

Loops

    loop over { = + ((( 1.0 / * 0.8) * - * }

- Bounds of loop determined by field
- One loop can be mapped to one kernel function
- Arguments for function via dependency analysis
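To make the stencil and loop concrete, here is a plain C++ sketch of one damped Jacobi sweep that uses the five stencil entries listed on the slide. The loop body on the slide is incomplete in this transcription, so the update rule below (solution plus 0.8 divided by the diagonal entry, times the residual) is an assumed reading of it. `Offset`, `Field`, `apply_stencil`, and `jacobi_sweep` are hypothetical names, not ExaSlang or generator output.

```cpp
#include <cassert>
#include <vector>

// One stencil entry: offset relative to the center point and its weight.
struct Offset { int dx; int dy; double weight; };

// The 5-point stencil from the slide; the first entry is the diagonal.
const std::vector<Offset> kStencil = {
    {0, 0, 4.0}, {1, 0, -1.0}, {-1, 0, -1.0}, {0, 1, -1.0}, {0, -1, -1.0}};

using Field = std::vector<std::vector<double>>;  // indexed as field[y][x]

// Applies the stencil to the field at the interior point (x, y).
double apply_stencil(const Field& u, int x, int y) {
    double sum = 0.0;
    for (const Offset& o : kStencil) sum += o.weight * u[y + o.dy][x + o.dx];
    return sum;
}

// One damped Jacobi sweep over all interior points (assumed update rule):
// next = sol + (1 / diag) * omega * (rhs - stencil * sol).
Field jacobi_sweep(const Field& sol, const Field& rhs, double omega) {
    assert(sol.size() == rhs.size() && sol.size() >= 3 && sol[0].size() >= 3);
    Field next = sol;  // boundary values are carried over unchanged
    const int ny = static_cast<int>(sol.size());
    const int nx = static_cast<int>(sol[0].size());
    const double diag = kStencil[0].weight;
    for (int y = 1; y < ny - 1; ++y) {
        for (int x = 1; x < nx - 1; ++x) {
            const double r = rhs[y][x] - apply_stencil(sol, x, y);
            next[y][x] = sol[y][x] + (1.0 / diag) * omega * r;
        }
    }
    return next;
}
```

Reading from `sol` and writing to `next` keeps the sweep a true Jacobi iteration, which is also what allows every point to be computed independently in a streaming kernel.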
13 Mapping to FPGAs
- Resolve stencil applications per multigrid level
- Map loop over statements to separate IP cores
- Dependency analysis: Add fields to IP core inputs/outputs
- Calculate field (stream) sizes for IP core
- Replace loop over statements with process statements
- Connect IP cores with streams and insert copy/split kernels to duplicate streams
- Add iteration intervals from simulation
- Resource sharing optimizations
14 Mapping to FPGAs
Kernels connected via streams:
[Figure: dataflow graph of the level-8 kernels and streams; labels include data_in, rhs_in, pre8, smooth8_1 to smooth8_4, sol8_1, sol8_2, rhs8, residual8, res8, cpy8, res8_1, res8_2, sm8, corr8, correction, up8, downsample8, upsample8, post8, and data_out]
FIFO buffers needed between cores for different stream sizes, i.e., downsampling and upsampling
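A stream element can be consumed only once, so a value needed by two kernels has to be duplicated explicitly; the labels cpy8, res8_1, and res8_2 in the graph suggest such a copy kernel for the residual. The sketch below shows the idea in plain C++, with `std::queue<double>` standing in for a hardware FIFO such as `hls::stream<double>`. `Stream` and `copy_kernel` are hypothetical names and not part of the authors' support library.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Software stand-in for a hardware FIFO stream (assumption for this sketch).
using Stream = std::queue<double>;

// Reads `count` elements from `in` and writes each one to both outputs,
// so that two downstream kernels can consume the same data.
void copy_kernel(Stream& in, Stream& out_a, Stream& out_b, std::size_t count) {
    assert(in.size() >= count);  // a queue cannot block, so the data must be present
    for (std::size_t i = 0; i < count; ++i) {
        const double v = in.front();
        in.pop();
        out_a.push(v);
        out_b.push(v);
    }
}
```

In hardware the two consumers may drain their streams at different rates, which is one reason the FIFO depths between cores matter.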
15 Mapping to FPGAs
Stages can be reused due to FIFO buffering:
[Figure: 8-stage multigrid solver. Stages per level (e.g., smooth8, residual8, restrict8, prolong8, correct8; smooth7; restrict2, prolong2; smooth1) are connected through buffers (B). Annotated iteration intervals range from II=1 on level 8 and II=4 on level 7 to II=4096 and II=16384 on the coarsest levels.]
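The iteration intervals annotated in the figure (1, 4, ..., 4096, 16384) are consistent with a factor of four per level, which is what one would expect in 2D: each coarser grid holds a quarter of the points, so its stage can take four times as long per element without stalling the finest level. The helper below reproduces that pattern. It is only an assumed reading of the figure; the slides state that the actual intervals come from simulation, and `iteration_interval` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Iteration interval for a stage on `level`, assuming the finest level runs
// at II = 1 and every coarsening step divides the point count by 2^dims.
std::uint64_t iteration_interval(int finest_level, int level, int dims = 2) {
    assert(dims > 0 && level >= 0 && level <= finest_level);
    assert(dims * (finest_level - level) < 64);  // result must fit into 64 bits
    return std::uint64_t{1} << (dims * (finest_level - level));
}
```

For the 8-level solver this gives 1 on level 8, 4 on level 7, 4096 on level 2, and 16384 on level 1, matching the values in the figure.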
16 Results
Setup
- V(2,2) solver for Poisson's equation
- 8 multigrid levels (fixed)
- Jacobi smoothers
- Jacobi applied multiple times for coarse grid solving
- Input grid size of
High-level synthesis
- Xilinx Vivado HLS v14.2
- Small support library to help with IP core instantiation [2]
- Buffer sizes calculated using external simulation tool
[2] Moritz Schmid et al. An Image Processing Library for C-based High-Level Synthesis. In: Proceedings of the 24th International Conference on Field Programmable Logic and Applications (FPL). (Munich, Germany). Sept. 2-4, 2014
17 Results
Resource usage on FPGAs for double precision:

FPGA          LUTs  FFs  DSPs  BRAMs  F max [MHz]
Kintex-7 [3]  %     43%  111%  124%
Virtex-7 [4]  %     29%  33%   53%

- Sharing of stages due to FIFO buffers
- Increase coarser stages' iteration intervals (II) for resource sharing
- Double precision not possible for Kintex-7
- More stages could be added for Virtex-7
[3] Estimation by Vivado HLS. Place & Route not possible due to resource constraints.
[4] PPnR result
18 Results
Performance figures for a single V-cycle

Target             Runtime [ms]  Throughput [Vps] [5]
FPGA [6]
Intel i7-3770 [7]

[5] V-cycles per second
[6] Performance is the same for Kintex-7 (XC7K325T) and Virtex-7 (XC7VX485T). Single precision on Kintex-7, double precision on Virtex-7.
[7] Intel i7-3770, 3.40 GHz, single thread. Besides AVX, no optimizations were applied by our code generator. Double precision. Code is memory-bandwidth bound. Compiled with -O3.
19 Summary
Conclusions
- Code generation for FPGAs based on HDL
- ExaStencils code generator flexible enough to emit code for a fundamentally different computing model
- Performance of mid-range FPGAs already promising
Future work
- Research (algorithmic) optimization potential
- Smarter grid traversal for 3D
- Automatic calculation of buffer sizes
- Partitioning among multiple FPGA boards
- Automatic application of HLS and hardware synthesis
- Automatic re-configuration at runtime if convergence prediction insufficient
20 Thanks for listening. Questions?
ExaStencils: Advanced Stencil Code Engineering
ExaStencils is funded by the German Research Foundation (DFG) as part of the Priority Program 1648 (Software for Exascale Computing).
More informationEnergy scalability and the RESUME scalable video codec
Energy scalability and the RESUME scalable video codec Harald Devos, Hendrik Eeckhaut, Mark Christiaens ELIS/PARIS Ghent University pag. 1 Outline Introduction Scalable Video Reconfigurable HW: FPGAs Implementation
More informationTowards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationPerformance and accuracy of hardware-oriented. native-, solvers in FEM simulations
Robert Strzodka, Stanford University Dominik Göddeke, Universität Dortmund Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations Number of slices
More informationHardware/Software Codesign
Hardware/Software Codesign SS 2016 Prof. Dr. Christian Plessl High-Performance IT Systems group University of Paderborn Version 2.2.0 2016-04-08 how to design a "digital TV set top box" Motivating Example
More informationInternational Training Workshop on FPGA Design for Scientific Instrumentation and Computing November 2013
2499-20 International Training Workshop on FPGA Design for Scientific Instrumentation and Computing 11-22 November 2013 High-Level Synthesis: how to improve FPGA design productivity RINCON CALLE Fernando
More informationHRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing
HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing Mingyu Gao and Christos Kozyrakis Stanford University http://mast.stanford.edu HPCA March 14, 2016 PIM is Coming Back End of Dennard
More informationFCUDA: Enabling Efficient Compilation of CUDA Kernels onto
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs October 13, 2009 Overview Presenting: Alex Papakonstantinou, Karthik Gururaj, John Stratton, Jason Cong, Deming Chen, Wen-mei Hwu. FCUDA:
More informationISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation
ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation UG817 (v 13.2) July 28, 2011 Xilinx is disclosing this user guide, manual, release note, and/or specification
More informationPINE TRAINING ACADEMY
PINE TRAINING ACADEMY Course Module A d d r e s s D - 5 5 7, G o v i n d p u r a m, G h a z i a b a d, U. P., 2 0 1 0 1 3, I n d i a Digital Logic System Design using Gates/Verilog or VHDL and Implementation
More informationDeveloping Dynamic Profiling and Debugging Support in OpenCL for FPGAs
Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs ABSTRACT Anshuman Verma Virginia Tech, Blacksburg, VA anshuman@vt.edu Skip Booth, Robbie King, James Coole, Andy Keep, John Marshall
More informationIntegrating GPUs as fast co-processors into the existing parallel FE package FEAST
Integrating GPUs as fast co-processors into the existing parallel FE package FEAST Dominik Göddeke Universität Dortmund dominik.goeddeke@math.uni-dortmund.de Christian Becker christian.becker@math.uni-dortmund.de
More informationTowards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing
Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de
More informationDATA REUSE ANALYSIS FOR AUTOMATED SYNTHESIS OF CUSTOM INSTRUCTIONS IN SLIDING WINDOW APPLICATIONS
Georgios Zacharopoulos Giovanni Ansaloni Laura Pozzi DATA REUSE ANALYSIS FOR AUTOMATED SYNTHESIS OF CUSTOM INSTRUCTIONS IN SLIDING WINDOW APPLICATIONS Università della Svizzera italiana (USI Lugano), Faculty
More informationIntroduction to Field Programmable Gate Arrays
Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Historical introduction.
More informationTable 1: Example Implementation Statistics for Xilinx FPGAs
logijpge Motion JPEG Encoder January 10 th, 2018 Data Sheet Version: v1.0 Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: support@logicbricks.com
More informationPerformance Verification for ESL Design Methodology from AADL Models
Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr
More information