NCSA Reconfigurable Systems Summer Institute July, Michael Babst DSPlogic, Inc x705. DSPlogic Proprietary
Transcription
1 Practical Reconfigurable Computing: Reconfigurable Computing Made Easy! Today. NCSA Reconfigurable Systems Summer Institute, July 2005. Michael Babst, DSPlogic, Inc., x705
2 Typical Design Flow (the old way)
- Benchmark performance, profile execution and I/O
- Partition algorithm
- Define CPU/FPGA messaging scheme
- Learn VHDL or Verilog
- Design FPGA
- Code application I/O (interface with custom vendor cores)
- Verify I/O, then synthesize/place/route FPGA
- Optimize I/O for bandwidth/latency
- Code application core (verify)
- Synthesize/place/route FPGA (application + I/O)
- Fiddle with I/O and application until the FPGA builds
- Verify application + I/O, then synthesize/place/route FPGA
- Done!
- Make a small change to the application
- Become an expert in VHDL/Verilog, hardware design and timing diagrams
- Redesign I/O, modularize design, isolate I/O from application
- Optimize speed and timing
- Working design!
3 Typical Design Flow (the old way)
- Here comes the Virtex-4!
- Re-partition algorithm
- Redesign I/O
- Get the idea?
- RC can be a challenge, even for a hardware designer
4 Practical Reconfigurable Computing
- Customers' #1 requirement in selecting (or not selecting) an RC platform: usability of tools
- Currently available methods
  - High-level C programming: Mitrion-C, SystemC, Handel-C
  - Hard-core HDL: VHDL, Verilog
  - Model-based / dataflow design: Viva, Simulink
- Customers need a solution that 1) works today and 2) is easy to use
5 Agenda
- Rapid RC Development Kit
- Reconfigurable Computing I/O (RCIO) API
- RCIO API Implementation
  - RCIO FPGA Core / SW Library
  - Cray XD1 Platform
- Application Examples
  - FFT
  - Point Projection
  - Floating-point multiply/add/subtract
6 Rapid RC Development Kit
7 DSPlogic Rapid RC Development Kit
Key development kit components:
- RCIO Software Library: RCIO API implementation, platform optimized
- RCIO FPGA Core: RCIO core implementation, platform optimized
- Matlab/Simulink Interface: DSPlogic RC Blockset for Matlab/Simulink
- RCIO FPGA Builder: automated FPGA implementation, platform optimized; reliable one-click build process
- Xilinx System Generator
- Application Example
8 Rapid RC Development Kit Design Flow
Design steps:
- Algorithm (Matlab/Simulink)
- CPU/FPGA partition, specify dataflow (Simulink)
- Implement algorithm on the processor: call RCIO API functions
- Design data processor (DSPlogic RC Blockset) and verify data processor output
- DSPlogic RCIO FPGA Builder
- Result: fully integrated, verified, seamless application
Benefits:
- Rapid implementation
- Simplified CPU/FPGA messaging
- Low overhead; bandwidth or latency optimization
- Familiar, industry-standard modeling environment
- Stream or block processing
- Common API: portable, reusable, upgradeable code
- Optimized core libraries; efficient resource utilization
- Integrated algorithm verification
- RCIO API: transparent interface
- Integrated bitstream generation
- Simplified high-speed design
- Maximum processing and I/O throughput
- Floating-point capability
9 Model-based FPGA Design Advantages
- VHDL not required (but possible)
- Integrated algorithm verification
- Industry-standard, familiar environment
- Easy integration of IP cores
- Clear view of algorithm architecture
- Optimized core libraries
- Highly efficient use of FPGA resources
- Ease of FPGA design verification
- Automatic design documentation
- Integrated bitstream generation
- Compatible with high-level languages (HLL); behavioral descriptions possible
10 DSPlogic/Xilinx System Generator Integrated Environment: Direct Algorithm to Platform FPGA!
11 Reconfigurable Computing I/O (RCIO) API
12 Reconfigurable Computing I/O API
A simple, future-proof CPU/FPGA messaging interface
- Transparent, portable interface
- Dramatically reduces FPGA development time
- Quickly achieve optimum latency and bandwidth, which is crucial to performance
- Application portability, reusability and upgradeability
- Future-proof: easy migration to newer, higher-performance FPGAs
- Separate data and control message paths
- Low-overhead block and stream processing
- Multiple CPU/FPGA support
[Diagram: the User Software Application calls rcio_send(), rcio_receive(), rcio_appcfg() and rcio_appstat(). The RCIO Library and FPGA Core connect it to the User FPGA Application through the Input Data Bus, Output Data Bus, Control Registers and Status Registers, plus a RAM Interface to Platform RAM.]
13 Reconfigurable Computing I/O API
- Multiple processor / multiple FPGA support
- Transparent, portable application interface!
[Diagram: CPU #1 through CPU #N each run a User Application on top of the RCIO SW library; the CPUs communicate with each other via MPI, PVM, etc. A platform-specific connection fabric links them to FPGA #1 through FPGA #K, each containing an RCIO FPGA Core and a User Application.]
14 Data Message Structure (64-bit)
- A dataset is divided into M messages (MSG 0 to MSG M-1)
- Each message contains B blocks (BLK 0 to BLK B-1)
- Each data block contains K 64-bit words (word 0 [63:0] to word K-1 [63:0])
- The word format is user-definable, for example two real32 values, four int16 values, one real64 value, or a custom layout
- I/O often limits performance
  - Use care with CPU/FPGA algorithm partitioning
  - Consider smaller data types
- Easily group processing blocks into messages for optimal I/O bandwidth
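The dataset / message / block / word hierarchy above reduces to a little integer arithmetic. The following is a minimal sketch of that sizing, assuming every block has the same number of words; the function names are hypothetical helpers for illustration and are not part of the RCIO API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of 64-bit words in one message made of
 * B blocks with K words per block. */
static size_t words_per_message(size_t blocks_per_msg, size_t words_per_block)
{
    return blocks_per_msg * words_per_block;
}

/* Hypothetical helper: size of one message in bytes (8 bytes per word). */
static size_t message_bytes(size_t blocks_per_msg, size_t words_per_block)
{
    return words_per_message(blocks_per_msg, words_per_block) * 8u;
}

/* Hypothetical helper: number of messages M needed to carry a dataset of
 * total_blocks blocks when each message holds blocks_per_msg blocks.
 * The last message may be only partly filled, hence the rounding up. */
static size_t messages_needed(size_t total_blocks, size_t blocks_per_msg)
{
    assert(blocks_per_msg > 0);
    return (total_blocks + blocks_per_msg - 1) / blocks_per_msg;
}
```

For example, 1000 blocks grouped 64 per message need 16 messages, and a message of 64 blocks with 9 words each is 576 words (4608 bytes). Larger messages amortize per-message overhead, which is the grouping trade-off the slide refers to.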
15 Software API - Data Interface
- rcio_send(): send a single message to the FPGA
- rcio_receive(): receive a single message from the FPGA
- rcio_stream(): blocking, bandwidth-optimized function
  - Break dataset into messages
  - Send all messages to FPGA
  - Receive all result messages from FPGA
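The three steps listed for rcio_stream() (break the dataset into messages, send them, collect the results) can be illustrated with a small software model. This is only a sketch under stated assumptions: the real RCIO signatures are not given in this deck, so sketch_send(), sketch_receive() and sketch_stream() are hypothetical stand-ins, and the "FPGA" is a loopback that doubles every word. The real function is described as bandwidth-optimized and presumably overlaps sending and receiving; this model is strictly serial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_WORDS 1024

/* Hypothetical one-message buffer standing in for the FPGA side. */
static uint64_t sketch_fifo[SKETCH_MAX_WORDS];
static size_t sketch_fifo_len = 0;

/* Hypothetical stand-in for a single-message send: the fake "FPGA"
 * doubles each word and holds the result until it is received. */
static void sketch_send(const uint64_t *msg, size_t nwords)
{
    assert(nwords <= SKETCH_MAX_WORDS);
    for (size_t i = 0; i < nwords; i++)
        sketch_fifo[i] = msg[i] * 2u;
    sketch_fifo_len = nwords;
}

/* Hypothetical stand-in for a single-message receive: copies the pending
 * result message out and returns its length in words. */
static size_t sketch_receive(uint64_t *msg)
{
    size_t n = sketch_fifo_len;
    for (size_t i = 0; i < n; i++)
        msg[i] = sketch_fifo[i];
    sketch_fifo_len = 0;
    return n;
}

/* Blocking stream: split the dataset into messages of at most msg_words
 * words, send each one, and receive its result. Returns the message count. */
static size_t sketch_stream(const uint64_t *in, uint64_t *out,
                            size_t total_words, size_t msg_words)
{
    assert(msg_words > 0 && msg_words <= SKETCH_MAX_WORDS);
    size_t nmsg = 0;
    for (size_t off = 0; off < total_words; off += msg_words) {
        size_t remaining = total_words - off;
        size_t n = remaining < msg_words ? remaining : msg_words;
        sketch_send(in + off, n);
        size_t got = sketch_receive(out + off);
        assert(got == n);
        (void)got;
        nmsg++;
    }
    return nmsg;
}
```

Streaming 10 words with a message size of 4 produces three messages (4, 4 and 2 words), and each output word is twice the corresponding input word.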
16 Software API - Control Interface
Messaging control functions:
- rcio_config(): initialize / configure the CPU/FPGA communications link
- rcio_status(): return status of the communications link
  - fpgastatus
  - nmsgreceived, nmsgreturned
  - FIFO levels, overflow/underflow
- rcio_close(): close the CPU/FPGA communications link
User application control and status commands:
- rcio_appcfg(): write to a user-definable application control register
- rcio_appstat(): read a user-definable application status register
17 RCIO Hardware Abstraction Layer (HAL) API
Signals between the DSPlogic RCIO Core and the User FPGA Application (data processor):
- Input data message FIFO: in_ready, in_data, in_write, in_start, in_length
- Output data message FIFO: out_ready, out_data, out_write, out_start, out_length
- Control I/O: ctrl_reg(0-7), stat_reg(0-7), ib_depth, ob_depth
- clk, rst
- Platform-specific RAM / peripherals
18 RCIO Hardware Abstraction Layer (HAL) API
- FPGA design tool-independent
- Supports wrappers in all design environments
  - High-level design tools: Xilinx System Generator, etc.
  - Custom VHDL / Verilog
  - High-level C: Mitrion-C, SystemC, Handel-C, SystemVerilog, etc.
19 RCIO FPGA Core / SW Library Implementation for the Cray XD1 Supercomputer
20 Seamless Cray XD1 CPU-FPGA Messaging
- Directly link CPU and FPGA applications!
- Total hardware abstraction!
[Diagram: on the processor, the application and its memory call the DSPlogic RCIO SW Library, e.g. rcio_send(fpga_id, *datap, txmsglen). Data crosses the RAP to the DSPlogic RCIO FPGA Core, which presents the ready, data, write, start and length signals to the application processing logic in the FPGA.]
21 RCIO FPGA Core and SW Library
[Diagram, Cray XD1: on the Opteron, the user CPU application (user data, memory, control) uses the DSPlogic RCIO Library on top of the Cray AA API, with a 2 MB result buffer. The RAP links the Opteron and the Application Accelerator at 1.42 GB/s (max), 1.1 GB/s (typical) in each direction. In the FPGA, the Cray RT Core feeds the DSPlogic RCIO Core (input data FIFO, output data FIFO, control), which serves the user FPGA application (process data, control). The Cray QDRII Core provides access to 8 MB of RAM.]
- Transparent interface
22 Cray XD1 Performance
- Demonstrated performance and usability on multiple applications in multiple design environments
- Extremely modular: multiple applications at full speed (200 MHz)
- Fastest CPU/FPGA interface available for the Cray XD1!
- Applications immediately benefit from API enhancements
[Figure: combined (send/receive) throughput with symmetric send/receive rates, in MBytes/sec versus message length (64-bit words).]
[Table: achievable data rates in MBytes/sec, including dataset sizes > 2 MB, with Send, Rcv and Total columns for these rows: theoretical max (not achievable), typical send-only application, typical receive-only application, typical send/receive application, and DSPlogic RCIO send/receive (using rcio_stream()). The only surviving value is 1400, next to the theoretical max row.]
23 Application Examples: FFT (VHDL-based design flow)
24 FFT Accelerator
- FFT length: 32 to [value missing in source]
- Fixed-point, full-precision
- Complex FFT
- VHDL design flow
- Device utilization: 14k slices (60% V2P50), 186 block RAMs (80% V2P50)
- Software API: include the RCIO API, plus an additional application-specific library function fft_init(fft length, direction)
- FFT usage: rcio_config(), fft_init(), rcio_stream(), rcio_close()
25 FFT Performance Improvement
- ~10x improvement possible today!
- Complex FFT, FPGA vs. FFTW on AMD 246
- Performance depends on data types; 64-bit word layouts shown, as transcribed:
  - in: Im1[15:0], Re1[15:0], Im0[15:0], Re0[15:0]
  - in: unused, Re0[15:0], unused, Re0[15:0]
  - out: Im1[15:0], Re1[15:0], Im0[15:0], Re0[15:0]
  - out: Im[31:0], Re[31:0]
- Additional speed comes at the expense of considering scaling / dynamic range effects
- Accuracy similar to single-precision floating-point algorithms
[Figure: two plots of T(fftw)/T(fpga) versus Nfft, each with curves for R=1.4G, R=1.1G and R=800M.]
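The packed 16-bit layout is what lets one 64-bit word carry two complex samples instead of one. A minimal sketch of that packing follows, assuming the fields appear most-significant first in the order listed on the slide (Im1 in bits 63:48 down to Re0 in bits 15:0); the exact bit positions and the function names are assumptions, not something the deck specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two complex 16-bit samples into one 64-bit word.
 * Assumed layout: [63:48] Im1, [47:32] Re1, [31:16] Im0, [15:0] Re0. */
static uint64_t pack_complex16_pair(int16_t re0, int16_t im0,
                                    int16_t re1, int16_t im1)
{
    /* Cast through uint16_t first so negative values are not
     * sign-extended across the neighbouring fields. */
    return  (uint64_t)(uint16_t)re0
         | ((uint64_t)(uint16_t)im0 << 16)
         | ((uint64_t)(uint16_t)re1 << 32)
         | ((uint64_t)(uint16_t)im1 << 48);
}

/* Extract 16-bit field number index (0 = bits 15:0, 3 = bits 63:48).
 * The uint16_t to int16_t conversion relies on the usual two's
 * complement wraparound. */
static int16_t unpack_field16(uint64_t word, unsigned index)
{
    assert(index < 4);
    return (int16_t)(uint16_t)(word >> (16u * index));
}
```

Packing halves the number of words per sample pair, which matters because the FFT is I/O constrained; the cost is the scaling and dynamic-range care the slide mentions.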
26 FFT Summary
- Performance is I/O constrained
- > 10x speed gains are achievable today
- FPGA performance enhancement increases with FFT length
- Multiple FFTs utilize the pipeline and provide efficiency
- FFT L2-norm accuracy ~10^-5, similar to other single-precision algorithms
- Modular architecture: separate I/O and application optimization
- Rapid application development
- Message latency limits speed improvement for single computations of small FFT sizes
27 Application Examples: Dirt Code - Point Projection (model-based design example)
28 Original Code
struct s_point pointprojection(struct s_plane plane, struct s_point p1)
{
    //
    // Get the projection of point p1 on the plane
    //   v = p1 - plane.p; result = v - (v*plane.n)plane.n + plane.p
    //
    double temp;
    struct s_point vec1, proj;
    //
    // Get the vector (vec1) from a point in the plane to p1
    //
    vec1.x = p1.x-plane.p.x;
    vec1.y = p1.y-plane.p.y;
    vec1.z = p1.z-plane.p.z;
    //
    // temp = vec1 dot plane.n
    //
    temp = plane.n.x*vec1.x + plane.n.y*vec1.y + plane.n.z*vec1.z;
    //
    // Get the global coordinates for p1's projection
    //
    proj.x = vec1.x - temp*plane.n.x + plane.p.x;
    proj.y = vec1.y - temp*plane.n.y + plane.p.y;
    proj.z = vec1.z - temp*plane.n.z + plane.p.z;
    return proj;
}
Courtesy David Raila / Youssef Hashash
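The slide omits the type definitions, so the routine does not compile on its own. Below is a self-contained sketch of the same computation, assuming that s_point holds three doubles and that s_plane holds a point p on the plane and a unit normal n; both struct layouts are guesses inferred from how the fields are used, not taken from the original source.

```c
/* Assumed layout: a 3-D point or vector. */
struct s_point { double x, y, z; };

/* Assumed layout: a plane given by a point p on it and a unit normal n. */
struct s_plane { struct s_point p; struct s_point n; };

/* Project p1 onto the plane: proj = v - (v . n) n + p, with v = p1 - p. */
struct s_point pointprojection(struct s_plane plane, struct s_point p1)
{
    struct s_point vec1, proj;
    double temp;

    /* Vector from a point in the plane to p1. */
    vec1.x = p1.x - plane.p.x;
    vec1.y = p1.y - plane.p.y;
    vec1.z = p1.z - plane.p.z;

    /* Signed distance of p1 from the plane along the normal. */
    temp = plane.n.x * vec1.x + plane.n.y * vec1.y + plane.n.z * vec1.z;

    /* Remove the normal component and move back to global coordinates. */
    proj.x = vec1.x - temp * plane.n.x + plane.p.x;
    proj.y = vec1.y - temp * plane.n.y + plane.p.y;
    proj.z = vec1.z - temp * plane.n.z + plane.p.z;
    return proj;
}
```

As a quick check, projecting the point (1, 2, 3) onto the plane z = 0 (p at the origin, n = (0, 0, 1)) gives (1, 2, 0). The result is a true orthogonal projection only when n has unit length.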
29 Original Code
struct s_point pointprojection(struct s_plane plane, struct s_point p1)
[Diagram: the function body redrawn as a dataflow pipeline with inputs plane.p, p1 and plane.n and output proj. Delay elements keep the operands aligned between stages.]
Stage producing vec1:
    vec1.x = p1.x-plane.p.x;
    vec1.y = p1.y-plane.p.y;
    vec1.z = p1.z-plane.p.z;
Stage producing temp (dot product):
    temp = plane.n.x*vec1.x + plane.n.y*vec1.y + plane.n.z*vec1.z;
Stage producing proj:
    proj.x = vec1.x - temp*plane.n.x + plane.p.x;
    proj.y = vec1.y - temp*plane.n.y + plane.p.y;
    proj.z = vec1.z - temp*plane.n.z + plane.p.z;
    return proj;
30 Message Formats
Input format, in_data[63:0], one value per 64-bit word, 9 words per block:
Block      64-bit offsets  Contents
Block 0    0 to 8          pt[0].x, pt[0].y, pt[0].z, plane_n[0].x, plane_n[0].y, plane_n[0].z, plane_p[0].x, plane_p[0].y, plane_p[0].z
Block 1    9 to 17         pt[1].x, pt[1].y, pt[1].z, plane_n[1].x, plane_n[1].y, plane_n[1].z, plane_p[1].x, plane_p[1].y, plane_p[1].z
:          :               :
Block P-1  9P-9 to 9P-1    pt[P-1].x, pt[P-1].y, pt[P-1].z, plane_n[P-1].x, plane_n[P-1].y, plane_n[P-1].z, plane_p[P-1].x, plane_p[P-1].y, plane_p[P-1].z
Output format, out_data[63:0], one value per 64-bit word, 3 words per block:
Block      64-bit offsets  Contents
Block 0    0 to 2          proj[0].x, proj[0].y, proj[0].z
Block 1    3 to 5          proj[1].x, proj[1].y, proj[1].z
:          :               :
Block P-1  3P-3 to 3P-1    proj[P-1].x, proj[P-1].y, proj[P-1].z
Number of projections per message: P = 64*k, 6 <= k <= 113
Theoretical maximum projections per second (I/O limited):
- No packing: ~200 MHz / 9 = 22.2 M projections/sec
- With packing: ~200 MHz / 3 = 66.7 M projections/sec
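The offsets and the I/O-limited rates above follow directly from the 9-words-in, 3-words-out block layout. The sketch below spells out that arithmetic, assuming one 64-bit word transferred per 200 MHz clock cycle; the enum and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { IN_WORDS_PER_BLOCK = 9, OUT_WORDS_PER_BLOCK = 3 };

/* Position of each value inside an unpacked input block. */
enum in_field {
    IN_PT_X, IN_PT_Y, IN_PT_Z,
    IN_PLANE_N_X, IN_PLANE_N_Y, IN_PLANE_N_Z,
    IN_PLANE_P_X, IN_PLANE_P_Y, IN_PLANE_P_Z
};

/* 64-bit word offset of one input value within a message. */
static size_t in_offset(size_t block, enum in_field field)
{
    return block * IN_WORDS_PER_BLOCK + (size_t)field;
}

/* 64-bit word offset of proj[block].x / .y / .z (axis 0, 1, 2). */
static size_t out_offset(size_t block, unsigned axis)
{
    assert(axis < 3);
    return block * OUT_WORDS_PER_BLOCK + axis;
}

/* I/O-limited upper bound: words per second divided by the number of
 * words that must be transferred per projection. */
static double max_projections_per_sec(double words_per_sec,
                                      unsigned words_per_projection)
{
    assert(words_per_projection > 0);
    return words_per_sec / words_per_projection;
}
```

With 200e6 words per second, 9 words per projection gives about 22.2 million projections per second; packing the nine 16-bit inputs into 3 words raises the bound to about 66.7 million.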
31 Data Format / Precision
- Data width: use 64-bit for future flexibility; trade off data packing vs. software/firmware rework
- Input data: din[15:0]
- Output data: dout[63:0]
- Precision
  - Input: 16-bit signed integer
  - Output: 52-bit signed integer (sign-extended to 64 bits)
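The 52-bit output width can be motivated by tracking how many bits each step of the projection needs when the inputs are 16-bit signed integers. The following is a rough software model of such a full-precision integer datapath, not the actual FPGA design; the function names are hypothetical, and the widths in the comments are conservative bounds that happen to agree with the Fix_17_0 and Fix_51_0 labels in the core diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Integer model of the projection with 16-bit signed inputs.
 * p1, n and p each hold x, y, z; proj receives the three results. */
static void pointprojection_fix(const int16_t p1[3], const int16_t n[3],
                                const int16_t p[3], int64_t proj[3])
{
    int32_t vec1[3];
    int64_t temp = 0;

    /* Difference of two 16-bit values needs 17 bits. */
    for (int i = 0; i < 3; i++)
        vec1[i] = (int32_t)p1[i] - (int32_t)p[i];

    /* Each 16 x 17 bit product needs 33 bits; summing three needs 35. */
    for (int i = 0; i < 3; i++)
        temp += (int64_t)n[i] * vec1[i];

    /* temp * n needs 35 + 16 = 51 bits; the final add/subtract needs 52. */
    for (int i = 0; i < 3; i++)
        proj[i] = (int64_t)vec1[i] - temp * n[i] + p[i];
}

/* Returns 1 if value is representable as a signed integer of the given
 * width, 0 otherwise. */
static int fits_signed_bits(int64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t limit = (int64_t)1 << (bits - 1);
    return value >= -limit && value < limit;
}
```

Because every intermediate is kept at full width, the integer result is exact, and storing it in a 64-bit word only requires sign extension. Note that with integer normals the output is scaled differently from the floating-point routine, so the software side has to account for that scaling.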
32 Point Projection High-level Diagram
[Figure: Simulink model titled "DSPlogic Rapid Reconfigurable Computing Development Kit, Point Projection Demonstration". A testbench_source block drives the core's input ports (in_data as sfix64, in_write, in_start, in_length_m1), and a testbench_verify block checks its outputs (out_data, out_write, out_start, out_length_m1, in_ready). A userparameters block sets blkspermessage (uint16). The core also exposes ib_depth, ob_depth, control registers dp_ctrl_reg_0 to dp_ctrl_reg_7 and status registers dp_stat_reg_0 to dp_stat_reg_7 (ufix64). The model contains the DSPlogic RCIO FPGA Builder and System Generator blocks. Copyright 2005 DSPlogic, Inc. All Rights Reserved.]
33 Point Projection Core
[Figure: System Generator model of the projection unit. Inputs (Fix_16_0): plane_n_x/y/z, p1_x/y/z and plane_p_x/y/z, plus Bool in_en and in_start. Outputs: proj_x/y/z (Fix_64_0), out_en and out_start.]
- getvec block (Stage1Delay = 2): vec1.x = p1.x-plane.p.x; vec1.y = p1.y-plane.p.y; vec1.z = p1.z-plane.p.z; outputs v1_x/y/z are Fix_17_0
- v1_dot_plane_n block (Stage2Delay = 12): temp = plane.n.x*vec1.x + plane.n.y*vec1.y + plane.n.z*vec1.z; the products temp_x_planen_x/y/z are Fix_51_0
- proj block (Stage3Delay = 6): proj.x = vec1.x - temp*plane.n.x + plane.p.x; proj.y = vec1.y - temp*plane.n.y + plane.p.y; proj.z = vec1.z - temp*plane.n.z + plane.p.z
- Delay blocks delay1 to delay4 keep plane_n, v1 and plane_p aligned with the pipeline
- in_en and in_start pass through matching z^-2, z^-12 and z^-6 delays to produce out_en and out_start
34 Performance Results
- Computation rate (conservative)
[Figure: millions of projections/s versus message size (64-bit words).]
- Input data type precision: ~10^-5
- Computation accuracy: full precision, L2-norm error = 0
- 4x additional speed improvement (60M projections/sec) with data packing
- Next steps
  - Move more functionality into the FPGA
  - Partition algorithm for lower I/O bandwidth
35 Summary
- Rapid RC Development Kit: a practical design method available today
- Reconfigurable Computing I/O (RCIO) API: offers many benefits, including portability
- RCIO API implementation
  - Cray XD1 platform: fastest FPGA/CPU interface available
  - Additional platforms coming
- Application examples: FFT, point projection, floating-point multiply/add/subtract
- Future
  - Hardware co-simulation
  - CPU/FPGA I/O standardization
  - RCIO API enhancements
  - High-level language integration
36 Contact Information Michael Babst x705
Evaluation of running FFTs on the Cray XD1 with attached FPGAs
Evaluation of running FFTs on the Cray XD1 with attached FPGAs Michael Babst DSPlogic, Inc. 13017 Wisteria Drive, #420, Germantown, MD 20874 Phone (301) 977-5970 Mike.Babst@dpslogic.com Roderick Swift
More informationMATLAB/Simulink 기반의프로그래머블 SoC 설계및검증
MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증 이웅재부장 Application Engineering Group 2014 The MathWorks, Inc. 1 Agenda Introduction ZYNQ Design Process Model-Based Design Workflow Prototyping and Verification Processor
More informationMetropolitan Road Traffic Simulation on FPGAs
Metropolitan Road Traffic Simulation on FPGAs Justin L. Tripp, Henning S. Mortveit, Anders Å. Hansson, Maya Gokhale Los Alamos National Laboratory Los Alamos, NM 85745 Overview Background Goals Using the
More informationSupport for Programming Reconfigurable Supercomputers
Support for Programming Reconfigurable Supercomputers Miriam Leeser Nicholas Moore, Albert Conti Dept. of Electrical and Computer Engineering Northeastern University Boston, MA Laurie Smith King Dept.
More informationIntegrated Workflow to Implement Embedded Software and FPGA Designs on the Xilinx Zynq Platform Puneet Kumar Senior Team Lead - SPC
Integrated Workflow to Implement Embedded Software and FPGA Designs on the Xilinx Zynq Platform Puneet Kumar Senior Team Lead - SPC 2012 The MathWorks, Inc. 1 Agenda Integrated Hardware / Software Top
More informationESL design with the Agility Compiler for SystemC
ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing
More informationFPGA Solutions: Modular Architecture for Peak Performance
FPGA Solutions: Modular Architecture for Peak Performance Real Time & Embedded Computing Conference Houston, TX June 17, 2004 Andy Reddig President & CTO andyr@tekmicro.com Agenda Company Overview FPGA
More informationImplementing MATLAB Algorithms in FPGAs and ASICs By Alexander Schreiber Senior Application Engineer MathWorks
Implementing MATLAB Algorithms in FPGAs and ASICs By Alexander Schreiber Senior Application Engineer MathWorks 2014 The MathWorks, Inc. 1 Traditional Implementation Workflow: Challenges Algorithm Development
More informationIntro to System Generator. Objectives. After completing this module, you will be able to:
Intro to System Generator This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Explain why there is a need for an integrated
More informationHardware-Software Co-Design and Prototyping on SoC FPGAs Puneet Kumar Prateek Sikka Application Engineering Team
Hardware-Software Co-Design and Prototyping on SoC FPGAs Puneet Kumar Prateek Sikka Application Engineering Team 2015 The MathWorks, Inc. 1 Agenda Integrated Hardware / Software Top down Workflow for SoC
More informationReconOS: An RTOS Supporting Hardware and Software Threads
ReconOS: An RTOS Supporting Hardware and Software Threads Enno Lübbers and Marco Platzner Computer Engineering Group University of Paderborn marco.platzner@computer.org Overview the ReconOS project programming
More informationNCBI BLAST accelerated on the Mitrion Virtual Processor
NCBI BLAST accelerated on the Mitrion Virtual Processor Why FPGAs? FPGAs are 10-30x faster than a modern Opteron or Itanium Performance gap is likely to grow further in the future Full performance at low
More informationTools for Reconfigurable Supercomputing. Kris Gaj George Mason University
Tools for Reconfigurable Supercomputing Kris Gaj George Mason University 1 Application Development for Reconfigurable Computers Program Entry Platform mapping Debugging & Verification Compilation Execution
More informationCray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:
Cray events! Cray User Group (CUG):! When: May 16-19, 2005! Where: Albuquerque, New Mexico - USA! Registration: reserved to CUG members! Web site: http://www.cug.org! Cray Technical Workshop Europe:! When:
More informationA Framework to Improve IP Portability on Reconfigurable Computers
A Framework to Improve IP Portability on Reconfigurable Computers Miaoqing Huang, Ivan Gonzalez, Sergio Lopez-Buedo, and Tarek El-Ghazawi NSF Center for High-Performance Reconfigurable Computing (CHREC)
More informationUsing FPGAs in Supercomputing Reconfigurable Supercomputing
Using FPGAs in Supercomputing Reconfigurable Supercomputing Why FPGAs? FPGAs are 10 100x faster than a modern Itanium or Opteron Performance gap is likely to grow further in the future Several major vendors
More informationModel-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany
Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation
More informationReconOS: Multithreaded Programming and Execution Models for Reconfigurable Hardware
ReconOS: Multithreaded Programming and Execution Models for Reconfigurable Hardware Enno Lübbers and Marco Platzner Computer Engineering Group University of Paderborn {enno.luebbers, platzner}@upb.de Outline
More informationOptimize DSP Designs and Code using Fixed-Point Designer
Optimize DSP Designs and Code using Fixed-Point Designer MathWorks Korea 이웅재부장 Senior Application Engineer 2013 The MathWorks, Inc. 1 Agenda Fixed-point concepts Introducing Fixed-Point Designer Overview
More informationThe Cray XD1. Technical Overview. Amar Shan, Senior Product Marketing Manager. Cray XD1. Cray Proprietary
The Cray XD1 Cray XD1 Technical Overview Amar Shan, Senior Product Marketing Manager Cray Proprietary The Cray XD1 Cray XD1 Built for price performance 30 times interconnect performance 2 times the density
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationMiddleware Challenges for Reconfigurable Computing
Middleware Challenges for Reconfigurable Computing Cray User Group Conference Albuquerque, New Mexico May 16-19 2005 Contents Introduction The Promise of Reconfigurable Computing RC Integration Challenges
More informationModel-Based Design for Video/Image Processing Applications
Model-Based Design for Video/Image Processing Applications The MathWorks Agenda Model-Based Design From MATLAB and Simulink to Altera FPGA Step-by-step design and implementation of edge detection algorithm
More informationISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation
ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation UG817 (v 13.2) July 28, 2011 Xilinx is disclosing this user guide, manual, release note, and/or specification
More informationMulti-Gigahertz Parallel FFTs for FPGA and ASIC Implementation
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain
More informationVivado HLx Design Entry. June 2016
Vivado HLx Design Entry June 2016 Agenda What is the HLx Design Methodology? New & Early Access features for Connectivity Platforms Creating Differentiated Logic 2 What is the HLx Design Methodology? Page
More informationMaster s Thesis Presentation Hoang Le Director: Dr. Kris Gaj
Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj Outline RSA ECM Reconfigurable Computing Platforms, Languages and Programming Environments Partitioning t ECM Code between HDLs and HLLs Implementation
More informationHigh-Level Synthesis with LabVIEW FPGA
High-Level Synthesis with LabVIEW FPGA National Instruments Agenda Introduction NI RIO technology LabVIEW FPGA & IP Builder RIO Hardware Platform Application 2 An Ideal Embedded Architecture Processor
More informationFPGA 101. Field programmable gate arrays in action
FPGA 101 Field programmable gate arrays in action About me Karsten Becker Head of electronics @Part-Time Scientists PhD candidate @TUHH FPGA Architecture 2 What is an FPGA Programmable Logic Programmable
More informationYet Another Implementation of CoRAM Memory
Dec 7, 2013 CARL2013@Davis, CA Py Yet Another Implementation of Memory Architecture for Modern FPGA-based Computing Shinya Takamaeda-Yamazaki, Kenji Kise, James C. Hoe * Tokyo Institute of Technology JSPS
More informationParallel FIR Filters. Chapter 5
Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture
More informationESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)
ESE Back End 2.0 D. Gajski, S. Abdi (with contributions from H. Cho, D. Shin, A. Gerstlauer) Center for Embedded Computer Systems University of California, Irvine http://www.cecs.uci.edu 1 Technology advantages
More informationINT G bit TCP Offload Engine SOC
INT 10011 10 G bit TCP Offload Engine SOC Product brief, features and benefits summary: Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx/Altera FPGAs or Structured ASIC flow.
More informationIntroduction to Field Programmable Gate Arrays
Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Historical introduction.
More informationDesigning and Targeting Video Processing Subsystems for Hardware
1 Designing and Targeting Video Processing Subsystems for Hardware 정승혁과장 Senior Application Engineer MathWorks Korea 2017 The MathWorks, Inc. 2 Pixel-stream Frame-based Process : From Algorithm to Hardware
More informationOn Using Simulink to Program SRC-6 Reconfigurable Computer
In Proc. 9 th Military and Aerospace Programmable Logic Devices (MAPLD) International Conference September, 2006, Washington, DC. On Using Simulink to Program SRC-6 Reconfigurable Computer David Meixner,
More informationNew Software-Designed Instruments
1 New Software-Designed Instruments Nicholas Haripersad Field Applications Engineer National Instruments South Africa Agenda What Is a Software-Designed Instrument? Why Software-Designed Instrumentation?
More informationBasic Xilinx Design Capture. Objectives. After completing this module, you will be able to:
Basic Xilinx Design Capture This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: List various blocksets available in System
More informationIntroduction to C and HDL Code Generation from MATLAB
Introduction to C and HDL Code Generation from MATLAB 이웅재차장 Senior Application Engineer 2012 The MathWorks, Inc. 1 Algorithm Development Process Requirements Research & Design Explore and discover Design
More informationUser Manual for FC100
Sundance Multiprocessor Technology Limited User Manual Form : QCF42 Date : 6 July 2006 Unit / Module Description: IEEE-754 Floating-point FPGA IP Core Unit / Module Number: FC100 Document Issue Number:
More informationA GPU-Inspired Soft Processor for High- Throughput Acceleration
A GPU-Inspired Soft Processor for High- Throughput Acceleration Jeffrey Kingyens and J. Gregory Steffan Electrical and Computer Engineering University of Toronto 1 FGPA-Based Acceleration In-socket acceleration
More informationParallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C
Parallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C Tarek El-Ghazawi, Olivier Serres, Samy Bahra, Miaoqing Huang and Esam El-Araby Department of Electrical
More informationTen Reasons to Optimize a Processor
By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor
More informationExperts in Application Acceleration Synective Labs AB
Experts in Application Acceleration 1 2009 Synective Labs AB Magnus Peterson Synective Labs Synective Labs quick facts Expert company within software acceleration Based in Sweden with offices in Gothenburg
More informationDeveloping and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors
Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors Paul Ekas, DSP Engineering, Altera Corp. pekas@altera.com, Tel: (408) 544-8388, Fax: (408) 544-6424 Altera Corp., 101
More informationAn Overview of a Compiler for Mapping MATLAB Programs onto FPGAs
An Overview of a Compiler for Mapping MATLAB Programs onto FPGAs P. Banerjee Department of Electrical and Computer Engineering Northwestern University 2145 Sheridan Road, Evanston, IL-60208 banerjee@ece.northwestern.edu
More informationReconfigurable Computing - (RC)
Reconfigurable Computing - (RC) Yogindra S Abhyankar Hardware Technology Development Group, C-DAC Outline Motivation Architecture Applications Performance Summary HPC Fastest Growing Sector HPC, the massive
More informationLogiCORE IP FIFO Generator v6.1
DS317 April 19, 2010 Introduction The Xilinx LogiCORE IP FIFO Generator is a fully verified first-in first-out (FIFO) memory queue for applications requiring in-order storage and retrieval. The core provides
More informationFIFO Generator v13.0
FIFO Generator v13.0 LogiCORE IP Product Guide Vivado Design Suite Table of Contents IP Facts Chapter 1: Overview Native Interface FIFOs.............................................................. 5
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationSystem Level Design with IBM PowerPC Models
September 2005 System Level Design with IBM PowerPC Models A view of system level design SLE-m3 The System-Level Challenges Verification escapes cost design success There is a 45% chance of committing
More informationFPGA VHDL Design Flow AES128 Implementation
Sakinder Ali FPGA VHDL Design Flow AES128 Implementation Field Programmable Gate Array Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure: 1. The interconnection
More informationAvnet Speedway Design Workshop