DATA REUSE ANALYSIS FOR AUTOMATED SYNTHESIS OF CUSTOM INSTRUCTIONS IN SLIDING WINDOW APPLICATIONS
|
|
- Clarissa Boyd
- 5 years ago
- Views:
Transcription
1 Georgios Zacharopoulos Giovanni Ansaloni Laura Pozzi DATA REUSE ANALYSIS FOR AUTOMATED SYNTHESIS OF CUSTOM INSTRUCTIONS IN SLIDING WINDOW APPLICATIONS Università della Svizzera italiana (USI Lugano), Faculty of Informatics IMPACT 2017 Jan 23, 2017 Stockholm, Sweden
2 INTRODUCTION 2 MOORE S LAW - DENNARD SCALING Expectation Performance should keep increasing Reality Performance is NOT increasing anymore
3 INTRODUCTION BREAKDOWN OF DENNARD SCALING Single core > Multiple cores Dark Silicon 3
4 INTRODUCTION 4 HETEROGENEOUS ERA Multiple cores > Heterogeneous Computing Need for Specialised HW [1] [1]
5 BACKGROUND 5 SPECIALISATION AND PERFORMANCE ASICS CPU + HW ACCELERATORS SPECIALISATION GENERAL PURPOSE CPU DOMAIN SPECIFIC PROCESSORS (ADSPS) APPLICATION SPECIFIC INSTRUCTION-SET PROCESSORS (ASIPS) PERFORMANCE
6 RELATED WORK 6 DATA TRANSFER IS THE BOTTLENECK Accelerate data transfer between accelerators and [4] [5] memory Custom storage as an optimisation [6] [7] Identify Data reuse for building custom storage [4] G. Gutin, A. Johnstone, J. Reddington, E. Scott, and A. Yeo. An algorithm for finding input-output constrained convex sets in an acyclic digraph. J. Discrete Algorithms, 13:47 58, [5] E. Giaquinta, A. Mishra, and L. Pozzi. Maximum convex subgraphs under I/O constraint for automatic identification of custom instructions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(3): , [6] P. Biswas, N. Dutt, P. Ienne, and L. Pozzi. Automatic identification of application-specific functional units with architecturally visible storage. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pages , Mar [7] M.Haaß,L.Bauer,andJ.Henkel.Automatic Custom Instruction Identification in Memory Streaming Algorithms. In Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, pages 1 9, Oct
7 RELATED WORK 7 DATA TRANSFER IS THE BOTTLENECK Rely on source-to-source transformations [8] [9] [10] Limited Parallelism [8] [9] Large Internal Buffers [10] [8] W. Meeus and D. Stroobandt. Automating data reuse in high-level synthesis. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pages 1 4, Mar [9] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the 2013 ACM/SIGDA 21st International Symposium on Field Programmable Gate Arrays, pages 29 38, Feb [10] M. Schmid, O. Reiche, F. Hannig, and J. Teich. Loop coarsening in C-based high-level synthesis. In Proceedings of the 26th International Conference on Application-specific Systems, Architectures and Processors, pages , July 2015.
8 RELATED WORK 8 DATA TRANSFER IS THE BOTTLENECK Rely on source-to-source transformations [8] [9] [10] Limited Parallelism [8] [9] Large Internal Buffers [10] Multiple Datapaths Area efficient Accelerators [8] W. Meeus and D. Stroobandt. Automating data reuse in high-level synthesis. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pages 1 4, Mar [9] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the 2013 ACM/SIGDA 21st International Symposium on Field Programmable Gate Arrays, pages 29 38, Feb [10] M. Schmid, O. Reiche, F. Hannig, and J. Teich. Loop coarsening in C-based high-level synthesis. In Proceedings of the 26th International Conference on Application-specific Systems, Architectures and Processors, pages , July 2015.
9 METHODOLOGY 9 SW ANALYSIS FOR DATA REUSE void Sobelsystem() { outer_loop:for(i = 1; i < 100; i++) { inner_loop:for(j = 1; j < 100; j++) { Sobelmodule(image[i-1][j-1], image[i][j-1], image[i+1][j+1], image[i-1][j], image[i+1][j], image[i-1][j+1], image[i][j+1], image[i+1][j-1], image[i][j]); LOADS } } }
10 METHODOLOGY 10 SW ANALYSIS FOR DATA REUSE
11 METHODOLOGY 11 SW ANALYSIS FOR DATA REUSE
12 METHODOLOGY 12 SW ANALYSIS FOR DATA REUSE
13 METHODOLOGY 13 WINDOW 1
14 METHODOLOGY 14 WINDOW 2
15 METHODOLOGY 15 WINDOW 1 WINDOW 2 UNROLLING
16 METHODOLOGY 16 WINDOW 1 WINDOW 2 UNROLLING PIPELINING
17 METHODOLOGY 17 WINDOW 1 WINDOW 2 UNROLLING PIPELINING
18 METHODOLOGY 18 COMPILERS Perform Source Code Analysis Identify Data Reuse in Sliding Window Applications
19 METHODOLOGY 19 COMPILERS Perform Source Code Analysis Identify Data Reuse in Sliding Window Applications [2] [3] [2] C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2nd International Symposium on Code generation and optimization, Palo Alto, California, Mar [3] T. Grosser, H. Zheng, R. Aloor, A. Simbürger, A. Größlinger, and L.-N. Pouchet. Polly-polyhedral optimization in llvm. In Proceedings of the First International Workshop on Polyhedral Compilation Techniques (IMPACT), volume 2011, 2011.
20 METHODOLOGY 20 LLVM POLLY SCOP ANALYSIS FOR DATA REUSE WINDOW SIZE
21 METHODOLOGY 21 LLVM POLLY SCOP ANALYSIS FOR DATA REUSE STRIDE
22 METHODOLOGY 22 LLVM POLLY SCOP ANALYSIS FOR DATA REUSE FRAME SIZE
23 METHODOLOGY 23 HW IMPLEMENTATION Parameters retrieved with LLVM Polly Analysis Pass: Horizontal and Vertical Window Size Stride Iteration Domain (Frame Size)
24 METHODOLOGY 24 HW IMPLEMENTATION
25 METHODOLOGY 25 HW IMPLEMENTATION AVAILABLE INPUT DATA WIDTH
26 METHODOLOGY 26 HW IMPLEMENTATION AVAILABLE INPUT DATA WIDTH Buffer Size: Horizontal Width = Input Width (INw) INw
27 METHODOLOGY 27 HW IMPLEMENTATION Buffer Size: Horizontal Width = Input Width (INw) Example: INw INw = Horizontal Win. Size + 1
28 METHODOLOGY 28 HW IMPLEMENTATION Buffer Size: Horizontal Width = Input Width (INw) Vertical Width = Vertical Win. Size Example: INw INw = Horizontal Win. Size + 1
29 METHODOLOGY 29 HW IMPLEMENTATION Buffer Size: Horizontal Width = Input Width (INw) Vertical Width = Vertical Win. Size Example: INw INw = Horizontal Win. Size + 1
30 METHODOLOGY 30 WINDOW 1 WINDOW 2 BUFFER
31 METHODOLOGY 31 WINDOW 1 WINDOW 2 STRIDE - DISCARD TOPMOST ROW STRIDE - STORE NEW ROW
32 METHODOLOGY 32 WINDOW 1 WINDOW 2
33 METHODOLOGY 33 WINDOW 1 WINDOW 2
34 METHODOLOGY 34 WINDOW 1 WINDOW 2
35 METHODOLOGY 35 WINDOW 1 WINDOW 2
36 METHODOLOGY 36 DISPLACEMENT WINDOW 1 WINDOW 2
37 METHODOLOGY 37 DISPLACEMENT WINDOW 1 WINDOW 2 Horiz. Displacement = INw - Horiz. Win. Size + 1
38 METHODOLOGY 38 DISPLACEMENT WINDOW 1 WINDOW 2 Horiz. Displacement = INw - Horiz. Win. Size + 1 Example: INw = Horizontal Win. Size + 1 Horiz. Displacement = 2
39 EXPERIMENTAL SETUP 39 APPROACHES COMPARED ROCCC Vivado HLS Our Approach
40 EXPERIMENTAL SETUP 40 APPROACHES COMPARED ROCCC Vivado HLS Our Approach PIPELINING DATA REUSE
41 EXPERIMENTAL SETUP 41 APPROACHES COMPARED ROCCC Vivado HLS Our Approach WINDOW 1 WINDOW 2 PIPELINING PIPELINING DATA REUSE
42 EXPERIMENTAL SETUP 42 APPROACHES COMPARED ROCCC Vivado HLS Our Approach WINDOW 1 WINDOW 2 PIPELINING ROCCC PIPELINING DATA REUSE
43 EXPERIMENTAL SETUP 43 APPROACHES COMPARED ROCCC Vivado HLS Our Approach WINDOW 1 WINDOW 2 PIPELINING ROCCC PIPELINING DATA REUSE VIVADO Manual Rewrite
44 EXPERIMENTAL SETUP 44 APPROACHES COMPARED ROCCC Vivado HLS Our Approach WINDOW 1 WINDOW 2 PIPELINING ROCCC PIPELINING DATA REUSE VIVADO Manual Rewrite OUR
45 EXPERIMENTAL SETUP 45 APPROACHES COMPARED ROCCC Vivado HLS Our Approach PIPELINING DATA REUSE UNROLLING DATA REUSE ROCCC VIVADO Manual Rewrite OUR
46 EXPERIMENTAL SETUP 46 APPROACHES COMPARED ROCCC WINDOW 1 WINDOW 2 Vivado HLS Our Approach UNROLLING PIPELINING DATA REUSE UNROLLING DATA REUSE ROCCC VIVADO Manual Rewrite OUR
47 EXPERIMENTAL SETUP 47 APPROACHES COMPARED ROCCC WINDOW 1 WINDOW 2 Vivado HLS Our Approach UNROLLING PIPELINING DATA REUSE UNROLLING DATA REUSE ROCCC VIVADO Manual Rewrite OUR
48 EXPERIMENTAL SETUP 48 APPROACHES COMPARED ROCCC WINDOW 1 WINDOW 2 Vivado HLS Our Approach UNROLLING PIPELINING DATA REUSE UNROLLING DATA REUSE ROCCC VIVADO Manual Rewrite OUR
49 EXPERIMENTAL SETUP 49 APPROACHES COMPARED ROCCC WINDOW 1 WINDOW 2 Vivado HLS Our Approach UNROLLING PIPELINING DATA REUSE UNROLLING DATA REUSE ROCCC VIVADO Manual Rewrite OUR
50 EXPERIMENTAL SETUP 50 APPROACHES COMPARED ROCCC Vivado HLS Our Approach ROCCC PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH VIVADO Manual Rewrite OUR
51 EXPERIMENTAL SETUP 51 APPROACHES COMPARED INPUT WIDTH ROCCC Vivado HLS BUFFER Our Approach PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH ROCCC VIVADO Manual Rewrite OUR
52 EXPERIMENTAL SETUP 52 APPROACHES COMPARED INPUT WIDTH ROCCC Vivado HLS BUFFER Our Approach PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH ROCCC VIVADO Manual Rewrite OUR
53 EXPERIMENTAL SETUP 53 APPROACHES COMPARED INPUT WIDTH ROCCC Vivado HLS BUFFER Our Approach PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH ROCCC VIVADO Manual Rewrite OUR
54 EXPERIMENTAL SETUP 54 APPROACHES COMPARED INPUT WIDTH ROCCC Vivado HLS BUFFER Our Approach PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH ROCCC VIVADO Manual Rewrite OUR
55 EXPERIMENTAL SETUP 55 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation)
56 EXPERIMENTAL SETUP 56 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) 3 CONFIGURATIONS
57 EXPERIMENTAL SETUP 57 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) 3 CONFIGURATIONS INPUT WIDTH # DATAPATHS
58 EXPERIMENTAL SETUP 58 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) 3 CONFIGURATIONS INPUT WIDTH # DATAPATHS WINDOW SIZE SAD MAX FILTER SOBEL 4X4 8X8 3X3
59 EXPERIMENTAL SETUP 59 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS WINDOW SIZE SINGLE DATAPATH SDP SAD MAX FILTER SOBEL 4X4 8X8 3X3
60 EXPERIMENTAL SETUP 60 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS SAD MAX FILTER SOBEL WINDOW SIZE 4X4 8X8 3X3 SINGLE DATAPATH SDP 4, 1 8, 1 3, 1
61 EXPERIMENTAL SETUP 61 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS WINDOW SIZE SINGLE DATAPATH SDP MULTIPLE DATAPATHS MDP SAD MAX FILTER SOBEL 4X4 8X8 3X3 4, 1 8, 1 3, 1
62 EXPERIMENTAL SETUP 62 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS WINDOW SIZE SINGLE DATAPATH SDP MULTIPLE DATAPATHS MDP SAD 4X4 4, 1 8, 5 MAX FILTER 8X8 8, 1 16, 9 SOBEL 3X3 3, 1 8, 6
63 EXPERIMENTAL SETUP 63 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS WINDOW SIZE SINGLE DATAPATH SDP MULTIPLE DATAPATHS MDP EXTRA MULT. DATAPATHS XMDP SAD 4X4 4, 1 8, 5 MAX FILTER 8X8 8, 1 16, 9 SOBEL 3X3 3, 1 8, 6
64 EXPERIMENTAL SETUP 64 OUR APPROACH Xilinx Virtex7 FPGA (HDL Simulation) INPUT WIDTH # DATAPATHS WINDOW SIZE SINGLE DATAPATH SDP MULTIPLE DATAPATHS MDP EXTRA MULT. DATAPATHS XMDP SAD 4X4 4, 1 8, 5 16, 13 MAX FILTER 8X8 8, 1 16, 9 24, 17 SOBEL 3X3 3, 1 8, 6 16, 14
65 EXPERIMENTS 65 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD
66 EXPERIMENTS 66 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH VIVADO_NOREW
67 EXPERIMENTS 67 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD PIPELINING DATA REUSE UNROLLING DATA REUSE PARAMETRIC I/O WIDTH VIVADO_REW Manual Rewrite
68 EXPERIMENTS 68 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD Exec. Time to process a 100x100 frame
69 EXPERIMENTS 69 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD Exec. Time to process a 100x100 frame
70 EXPERIMENTS 70 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD Exec. Time to process a 100x100 frame
71 EXPERIMENTS 71 EXECUTION TIME COMPARISON ROCCC Vivado_norew Vivado_norev Vivado_rew Vivado_rev CISC=Conf.1 SDP Conf.1========= CISC=Conf.2 Conf.2 MDP Conf.3 CISC=Conf.3 XMDP ns Sobel MaxFilter BlockSAD Exec. Time to process a 100x100 frame
72 EXPERIMENTS 72 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
73 EXPERIMENTS 73 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
74 EXPERIMENTS 74 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
75 EXPERIMENTS 75 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
76 EXPERIMENTS 76 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
77 EXPERIMENTS 77 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
78 EXPERIMENTS 78 AREA COMPARISON 45666!"""" 7,8)9$:;$.&8 7,8)9$:;$.&C!"""" 7,8)9$:.&8 7,8)9$:.&C 6<#6=6$;>?! SDP 6$;>?!=========!"""!""" MDP 6<#6=6$;>?A 6$;>?A XMDP!"" #$%&' ()*+,'-&. /'$01#23!"" #$%&' ()*+,'-&. /'$01#23 )B FLIP FLOPS LUTS
79 EXPERIMENTS 79 SPEEDUP AND AREA 74 7& & :9 ;/0,'"(0/<0,-'=>?@)A ;/0,'"(0/<0,-'=BB)A SOBEL MAX FILTER SAD SOBEL MAX FILTER SAD MDP vs Vivado_Rew!"#$%&''()%''*+(,-"./01 XMDP vs Vivado_Rew!"#$%2''()%''*+(,-"./01
80 CONCLUSIONS 80 DATA REUSE ANALYSIS BASED ON POLLY Optimize HLS of Accelerators for Sliding Window Applications Use Polly for Better HW Designs
81 CONCLUSIONS 81 DATA REUSE ANALYSIS BASED ON POLLY Optimize HLS of Accelerators for Sliding Window Applications Use Polly for Better HW Designs Exploit Data Locality
82 CONCLUSIONS 82 DATA REUSE ANALYSIS BASED ON POLLY Optimize HLS of Accelerators for Sliding Window Applications Use Polly for Better HW Designs Exploit Data Locality Design Efficient Buffers
83 CONCLUSIONS 83 DATA REUSE ANALYSIS BASED ON POLLY Optimize HLS of Accelerators for Sliding Window Applications Use Polly for Better HW Designs Exploit Data Locality Design Efficient Buffers Better Use of Resources
84 CONCLUSIONS 84 DATA REUSE ANALYSIS BASED ON POLLY Optimize HLS of Accelerators for Sliding Window Applications Use Polly for Better HW Designs Exploit Data Locality Design Efficient Buffers Better Use of Resources Better Performance
Data Reuse Analysis for Automated Synthesis of Custom Instructions in Sliding Window Applications
Data Reuse Analysis for Automated Synthesis of Custom Instructions in Sliding Window Applications Georgios Zacharopoulos, Giovanni Ansaloni, Laura Pozzi Faculty of Informatics University of Lugano (USI)
More informationA 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation
A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation Abstract: The power budget is expected to limit the portion of the chip that we can power ON at the upcoming technology nodes. This problem,
More informationAn Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers
An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers Jason Cong *, Peng Li, Bingjun Xiao *, and Peng Zhang Computer Science Dept. &
More informationA Lost Cycles Analysis for Performance Prediction using High-Level Synthesis
A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,
More informationRun Fast When You Can: Loop Pipelining with Uncertain and Non-uniform Memory Dependencies
Run Fast When You Can: Loop Pipelining with Uncertain and Non-uniform Memory Dependencies Junyi Liu, John Wickerson, Samuel Bayliss, and George A. Constantinides Department of Electrical and Electronic
More informationPolyhedral-Based Data Reuse Optimization for Configurable Computing
Polyhedral-Based Data Reuse Optimization for Configurable Computing Louis-Noël Pouchet 1 Peng Zhang 1 P. Sadayappan 2 Jason Cong 1 1 University of California, Los Angeles 2 The Ohio State University February
More informationSODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou
SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou University of California, Los Angeles 1 What is stencil computation? 2 What is Stencil Computation? A sliding
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationEfficient Hardware Acceleration on SoC- FPGA using OpenCL
Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA
More informationGeneration of Multigrid-based Numerical Solvers for FPGA Accelerators
Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System
More informationElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer
More informationCHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier
CHAPTER 3 METHODOLOGY 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier The design analysis starts with the analysis of the elementary algorithm for multiplication by
More informationECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University
ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm
More informationEARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION
UNIVERSITA DEGLI STUDI DI NAPOLI FEDERICO II Facoltà di Ingegneria EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION Alessandro Cilardo, Paolo Durante, Carmelo Lofiego, and Antonino Mazzeo
More informationFrequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM
More informationESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder
ESE532: System-on-a-Chip Architecture Day 8: September 26, 2018 Spatial Computations Today Graph Cycles (from Day 7) Accelerator Pipelines FPGAs Zynq Computational Capacity 1 2 Message Custom accelerators
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationAn Overlay Architecture for FPGA-based Industrial Control Systems Designed with Functional Block Diagrams
R2-7 SASIMI 26 Proceedings An Overlay Architecture for FPGA-based Industrial Control Systems Designed with Functional Block Diagrams Taisei Segawa, Yuichiro Shibata, Yudai Shirakura, Kenichi Morimoto,
More informationExploring Automatically Generated Platforms in High Performance FPGAs
Exploring Automatically Generated Platforms in High Performance FPGAs Panagiotis Skrimponis b, Georgios Zindros a, Ioannis Parnassos a, Muhsen Owaida b, Nikolaos Bellas a, and Paolo Ienne b a Electrical
More informationImplementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures
Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada, V6T
More informationHigh-Level Synthesis of Programmable Hardware Accelerators Considering Potential Varieties
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE.,, (VDEC) 113 8656 7 3 1 CREST E-mail: hiroaki@cad.t.u-tokyo.ac.jp, fujita@ee.t.u-tokyo.ac.jp SoC Abstract
More informationIntegrating MRPSOC with multigrain parallelism for improvement of performance
Integrating MRPSOC with multigrain parallelism for improvement of performance 1 Swathi S T, 2 Kavitha V 1 PG Student [VLSI], Dept. of ECE, CMRIT, Bangalore, Karnataka, India 2 Ph.D Scholar, Jain University,
More informationMOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden
High Level Synthesis with Catapult MOJTABA MAHDAVI 1 Outline High Level Synthesis HLS Design Flow in Catapult Data Types Project Creation Design Setup Data Flow Analysis Resource Allocation Scheduling
More informationHigh Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx
High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth
More informationLab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm
ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Lab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm 1 Introduction
More informationInternational Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-2 E-ISSN:
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-6, Issue-2 E-ISSN: 2347-2693 Implementation Sobel Edge Detector on FPGA S. Nandy 1*, B. Datta 2, D. Datta 3
More informationHow Much Logic Should Go in an FPGA Logic Block?
How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca
More informationHIGH-LEVEL SYNTHESIS
HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms
More informationESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM
ESE532: System-on-a-Chip Architecture Day 20: April 3, 2017 Pipelining, Frequency, Dataflow Today What drives cycle times Pipelining in Vivado HLS C Avoiding bottlenecks feeding data in Vivado HLS C Penn
More informationResource-Efficient Scheduling for Partially-Reconfigurable FPGAbased
Resource-Efficient Scheduling for Partially-Reconfigurable FPGAbased Systems Andrea Purgato: andrea.purgato@mail.polimi.it Davide Tantillo: davide.tantillo@mail.polimi.it Marco Rabozzi: marco.rabozzi@polimi.it
More informationCourse Overview Revisited
Course Overview Revisited void blur_filter_3x3( Image &in, Image &blur) { // allocate blur array Image blur(in.width(), in.height()); // blur in the x dimension for (int y = ; y < in.height(); y++) for
More informationFPGA Implementation and Validation of the Asynchronous Array of simple Processors
FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,
More informationRuntime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays
Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann
More informationVivado HLx Design Entry. June 2016
Vivado HLx Design Entry June 2016 Agenda What is the HLx Design Methodology? New & Early Access features for Connectivity Platforms Creating Differentiated Logic 2 What is the HLx Design Methodology? Page
More informationNew Approach for Affine Combination of A New Architecture of RISC cum CISC Processor
Volume 2 Issue 1 March 2014 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org New Approach for Affine Combination of A New Architecture
More informationUsing FPGAs as Microservices
Using FPGAs as Microservices David Ojika, Ann Gordon-Ross, Herman Lam, Bhavesh Patel, Gaurav Kaul, Jayson Strayer (University of Florida, DELL EMC, Intel Corporation) The 9 th Workshop on Big Data Benchmarks,
More informationFPGA Matrix Multiplier
FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri
More informationHello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator.
Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator. 1 In this talk I will first briefly introduce what HLS is, and
More informationOverview of ROCCC 2.0
Overview of ROCCC 2.0 Walid Najjar and Jason Villarreal SUMMARY FPGAs have been shown to be powerful platforms for hardware code acceleration. However, their poor programmability is the main impediment
More informationDRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric
DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based
More informationMaximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman
Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael
More informationIntroduction to Neural Networks
ECE 5775 (Fall 17) High-Level Digital Design Automation Introduction to Neural Networks Ritchie Zhao, Zhiru Zhang School of Electrical and Computer Engineering Rise of the Machines Neural networks have
More informationA Configurable Multi-Ported Register File Architecture for Soft Processor Cores
A Configurable Multi-Ported Register File Architecture for Soft Processor Cores Mazen A. R. Saghir and Rawan Naous Department of Electrical and Computer Engineering American University of Beirut P.O. Box
More informationDesign Space Exploration Using Parameterized Cores
RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR Design Space Exploration Using Parameterized Cores Ian D. L. Anderson M.A.Sc. Candidate March 31, 2006 Supervisor: Dr. M. Khalid 1 OUTLINE
More informationEECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history
More informationSynthesizable FPGA Fabrics Targetable by the VTR CAD Tool
Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design
More informationLossless Compression using Efficient Encoding of Bitmasks
Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA
More informationOutline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic
More informationA Methodology for Predicting Application- Specific Achievable Memory Bandwidth for HW/SW-Codesign
Göbel, M.; Elhossini, A.; Juurlink, B. A Methodology for Predicting Application- Specific Achievable Memory Bandwidth for HW/SW-Codesign Conference paper Accepted manuscript (Postprint) This version is
More informationAdvanced FPGA Design Methodologies with Xilinx Vivado
Advanced FPGA Design Methodologies with Xilinx Vivado Alexander Jäger Computer Architecture Group Heidelberg University, Germany Abstract With shrinking feature sizes in the ASIC manufacturing technology,
More informationC vs. VHDL: Benchmarking CAESAR Candidates Using High- Level Synthesis and Register- Transfer Level Methodologies
C vs. VHDL: Benchmarking CAESAR Candidates Using High- Level Synthesis and Register- Transfer Level Methodologies Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, and Kris Gaj George
More informationBig Data Systems on Future Hardware. Bingsheng He NUS Computing
Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big
More informationSpecializing Hardware for Image Processing
Lecture 6: Specializing Hardware for Image Processing Visual Computing Systems So far, the discussion in this class has focused on generating efficient code for multi-core processors such as CPUs and GPUs.
More informationEECS150 - Digital Design Lecture 13 - Accelerators. Recap and Outline
EECS150 - Digital Design Lecture 13 - Accelerators Oct. 10, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)
More informationAdvanced Synthesis Techniques
Advanced Synthesis Techniques Reminder From Last Year Use UltraFast Design Methodology for Vivado www.xilinx.com/ultrafast Recommendations for Rapid Closure HDL: use HDL Language Templates & DRC Constraints:
More informationCo-synthesis and Accelerator based Embedded System Design
Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationFPGA briefing Part II FPGA development DMW: FPGA development DMW:
FPGA briefing Part II FPGA development FPGA development 1 FPGA development FPGA development : Domain level analysis (Level 3). System level design (Level 2). Module level design (Level 1). Academical focus
More informationExperience with the NetFPGA Program
Experience with the NetFPGA Program John W. Lockwood Algo-Logic Systems Algo-Logic.com With input from the Stanford University NetFPGA Group & Xilinx XUP Program Sunday, February 21, 2010 FPGA-2010 Pre-Conference
More informationCustom Instruction Generation Using Temporal Partitioning Techniques for a Reconfigurable Functional Unit
Custom Instruction Generation Using Temporal Partitioning Techniques for a Reconfigurable Functional Unit Farhad Mehdipour, Hamid Noori, Morteza Saheb Zamani, Kazuaki Murakami, Koji Inoue, Mehdi Sedighi
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationDRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric
DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based
More informationTechnology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas
Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist
More informationPolly Polyhedral Optimizations for LLVM
Polly Polyhedral Optimizations for LLVM Tobias Grosser - Hongbin Zheng - Raghesh Aloor Andreas Simbürger - Armin Grösslinger - Louis-Noël Pouchet April 03, 2011 Polly - Polyhedral Optimizations for LLVM
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationThe Design of Sobel Edge Extraction System on FPGA
The Design of Sobel Edge Extraction System on FPGA Yu ZHENG 1, * 1 School of software, Beijing University of technology, Beijing 100124, China; Abstract. Edge is a basic feature of an image, the purpose
More informationTheory and Algorithm for Generalized Memory Partitioning in High-Level Synthesis
Theory and Algorithm for Generalized Memory Partitioning in High-Level Synthesis Yuxin Wang ayerwang@pku.edu.cn Peng Li peng.li@pku.edu.cn Jason Cong,3,, cong@cs.ucla.edu Center for Energy-Efficient Computing
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (I)
COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap
More informationFPGA Entering the Era of the All Programmable SoC
FPGA Entering the Era of the All Programmable SoC Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates on Cost Page 3 Design Cost Estimated Chip
More informationECE 5775 (Fall 17) High-Level Digital Design Automation. More Pipelining
ECE 5775 (Fall 17) High-Level Digital Design Automation More Pipelining Announcements HW 2 due Monday 10/16 (no late submission) Second round paper bidding @ 5pm tomorrow on Piazza Talk by Prof. Margaret
More informationFPGA design with National Instuments
FPGA design with National Instuments Rémi DA SILVA Systems Engineer - Embedded and Data Acquisition Systems - MED Region ni.com The NI Approach to Flexible Hardware Processor Real-time OS Application software
More informationOptimizing FPGA-based Accelerator Design for Deep Convolutional Neural Network
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Network Chen Zhang 1, Peng Li 3, Guangyu Sun 1,2, Yijin Guan 1, Bingjun Xiao 3, Jason Cong 1,2,3 1 Peking University 2 PKU/UCLA Joint
More informationII. LITERATURE SURVEY
Hardware Co-Simulation of Sobel Edge Detection Using FPGA and System Generator Sneha Moon 1, Prof Meena Chavan 2 1,2 Department of Electronics BVUCOE Pune India Abstract: This paper implements an image
More informationThroughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu
More informationChip Design for Turbo Encoder Module for In-Vehicle System
Chip Design for Turbo Encoder Module for In-Vehicle System Majeed Nader Email: majeed@wayneedu Yunrui Li Email: yunruili@wayneedu John Liu Email: johnliu@wayneedu Abstract This paper studies design and
More informationFPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011
FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level
More informationEnergy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package
High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016
NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering
More informationParallelized Radix-4 Scalable Montgomery Multipliers
Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper
More informationSynthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool
Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool Md. Abdul Latif Sarker, Moon Ho Lee Division of Electronics & Information Engineering Chonbuk National University 664-14 1GA Dekjin-Dong
More informationESL design with the Agility Compiler for SystemC
ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing
More informationAn Automated flow for Arithmetic Component Generation in Field-Programmable Gate Arrays
An Automated flow for Arithmetic Component Generation in Field-Programmable Gate Arrays ALASTAIR M. SMITH, GEORGE A. CONSTANTINIDES, PETER Y. K. CHEUNG Imperial College London, UK State of the art configurable
More informationReconfigurable Computing. Introduction
Reconfigurable Computing Tony Givargis and Nikil Dutt Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally
More informationField Programmable Gate Array (FPGA)
Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems
More informationFPGA system development What you need to think about. Frédéric Leens, CEO
FPGA system development What you need to think about Frédéric Leens, CEO About Byte Paradigm 2005 : Founded by 3 ASIC-SoC-FPGA engineers as a Design Center for high-end FPGA and board design. 2007 : GP
More informationFast FPGA prototyping with Software Development Kit for FPGA (SDK4FPGA)
Fast FPGA prototyping with Software Development Kit for FPGA (SDK4FPGA) Andrea Suardi cas.ee.ic.ac.uk/projects/sdk4fpga This research has been supported by EPSRC Impact Acceleration grant number EP/K503733/1
More informationii Copyright c Harikrishna Samala 2009 All Rights Reserved
ii Copyright c Harikrishna Samala 2009 All Rights Reserved iii Abstract Methodology to Derive Resource Aware Context Adaptable Architectures for Field Programmable Gate Arrays by Harikrishna Samala, Master
More informationAnalyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis
Paper Analyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis Sebastian Kaltenstadler Ulm University Ulm, Germany sebastian.kaltenstadler@missinglinkelectronics.com
More informationPINE TRAINING ACADEMY
PINE TRAINING ACADEMY Course Module A d d r e s s D - 5 5 7, G o v i n d p u r a m, G h a z i a b a d, U. P., 2 0 1 0 1 3, I n d i a Digital Logic System Design using Gates/Verilog or VHDL and Implementation
More informationPerformance Verification for ESL Design Methodology from AADL Models
Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr
More informationScalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA
Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School
More informationTowards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA
Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Junzhong Shen, You Huang, Zelong Wang, Yuran Qiao, Mei Wen, Chunyuan Zhang National University of Defense Technology,
More informationExploring Logic Block Granularity for Regular Fabrics
1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu
More informationECE 5775 High-Level Digital Design Automation, Fall 2016 School of Electrical and Computer Engineering, Cornell University
ECE 5775 High-Level Digital Design Automation, Fall 2016 School of Electrical and Computer Engineering, Cornell University Optical Flow on FPGA Ian Thompson (ijt5), Joseph Featherston (jgf82), Judy Stephen
More informationUnlocking FPGAs Using High- Level Synthesis Compiler Technologies
Unlocking FPGAs Using High- Leel Synthesis Compiler Technologies Fernando Mar*nez Vallina, Henry Styles Xilinx Feb 22, 2015 Why are FPGAs Good Scalable, highly parallel and customizable compute 10s to
More informationFast and Flexible High-Level Synthesis from OpenCL using Reconfiguration Contexts
Fast and Flexible High-Level Synthesis from OpenCL using Reconfiguration Contexts James Coole, Greg Stitt University of Florida Department of Electrical and Computer Engineering and NSF Center For High-Performance
More informationA Novel Deadlock Avoidance Algorithm and Its Hardware Implementation
A ovel Deadlock Avoidance Algorithm and Its Hardware Implementation + Jaehwan Lee and *Vincent* J. Mooney III Hardware/Software RTOS Group Center for Research on Embedded Systems and Technology (CREST)
More informationMulti-Gigahertz Parallel FFTs for FPGA and ASIC Implementation
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain
More informationModel-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany
Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation
More informationChristophe HURIAUX. Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration
Mid-term Evaluation March 19 th, 2015 Christophe HURIAUX Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration Accélérateurs matériels reconfigurables embarqués avec reconfiguration
More information