Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case
|
|
- Elmer Hodge
- 5 years ago
- Views:
Transcription
1 XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV July 14 th - Samos Island (Greece) Carlo Sau, Luigi Raffo DIEE Università degli Studi di Cagliari Francesca Palumbo POLCOMING Università degli Studi di Sassari Endri Bezati, Simone Casale-Brunet, Marco Mattavelli SCI STI MM École Politecnique Fédérale de Lausanne Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case
2 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 1
3 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 2
4 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the devices. 3 SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes
5 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes devices. 3
6 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes devices. need of new design flows to address this complex scenario 3
7 Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. 4
8 Flexibility Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. Reconfigurable Paradigm (RP) to hardware design: specialized computing platforms, capable of changing configuration to serve the targeted computations. GPP DSP RP ASIC Performance 4
9 Flexibility Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. Reconfigurable Paradigm (RP) to hardware design: specialized computing platforms, capable of changing configuration to serve the targeted computations. The more the hardware is specialized, the more it is difficult to program. GPP DSP RP ASIC Performance FINE- GRAINED (FG) Bit-level COARSE- GRAINED (CG) Word-level Flexibility Reconf. Speed Config. Storage 4
10 Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (1) A dataflow program is a directed graph of functional units (actors) exchanging sequences of data (tokens) through dedicated channels. actions state Actors encapsulate their state and communicate exclusively by sending and receiving tokens. Such an absence of race conditions, along with the intrinsic modularity of the dataflow graphs, make it possible to explicit the algorithmic parallelism of the programs. 5
11 Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (2) CAL[1] = Cal Actor Language, high-level programming language for describing dataflow actors. Decoder Composition Mechanism VTL [2] (RVC-CAL FUs) Abstract Decoder Model (XDF + RVC-Cal) Decoder Implementation Standard RVC Model Proprietary Tool Library Decoder Solution Proprietary Decoder [1] MPEG-B part 4 (2009), [2] MPEG-C part 4 (2010) 6
12 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications A C D B 7
13 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock 7
14 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock A C D B B E D 7
15 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock A B C D 2:1 HW CG reconfigurable platform A clock C B E D B D E 7
16 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: low area and power (portability) flexibility (multimediality) performances (high efficiency) code reusability (fast evolution) 8
17 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) 8
18 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but 8
19 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but what happens if the design complexity grows? 8
20 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but what happens if the design complexity grows? 8
21 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) code reusability (fast evolution) need of an automated design flow dataflow modularity (VTL) perfect matching, but what happens if the design complexity grows? 8
22 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 9
23 Automated Design Flow Functionalities required by the automated design flow: 10
24 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition A C D B B E D A B C E D 10
25 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition dataflow network buffers optimization A C D B B E D A C B D E A B A B C E C E D D 10
26 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition dataflow network buffers optimization A C D B B E D A C B D E A B A B C E C E D D multi-dataflow corresponding RTL generation A C A B D B E C E clock D 10
27 Composition: the Multi-Dataflow Composer (1) ORCC back-ends front-end C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11
28 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification ORCC CAL Actors back-ends front-end XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11
29 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11
30 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification 1:1 ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). Source Code 11
31 Composition: the Multi-Dataflow Composer (1) front-end ORCC back-ends C Java LLVM 11
32 front-end back-end Composition: the Multi-Dataflow Composer (1) ORCC back-ends front-end C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11
33 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11
34 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end Intermediate Representation C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11
35 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end Intermediate Representation C Java LLVM Multi- Dataflow XDF Graph MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11
36 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications n:1 ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graphs C Java LLVM Multi- Dataflow XDF Graph MDC HW Reconfigurable Platform (HDL) HDL Components Library MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11
37 Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D B dataflow β B E D Composer (2) 12
38 MDC front-end Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D Composer (2) multi-dataflow specification A C B dataflow β B s0 s1 D B E D E 12
39 MDC front-end Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D Composer (2) multi-dataflow specification A C B dataflow β B s0 s1 D B E D E MDC back-end A B 0 1 s0 HW CG reconfigurable platform C E s1 0 1 clock D config LUT α β s1 0 1 s
40 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end TURNUS is a design space exploration framework for heterogeneous parallel systems. It provides high-level modeling and simulation methods and tools for system level performances estimation and optimization. 13
41 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13
42 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13
43 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end Simulation back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13
44 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end Simulation back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. Execution Causation Traces 13
45 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13
46 Optimization: TURNUS Co-Exploration Framework Actions weights TURNUS front-end back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13
47 Optimization: TURNUS Co-Exploration Framework Actions weights TURNUS front-end Design Space Exploration back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13
48 Optimization: TURNUS Co-Exploration Framework Actions weights Execution Causation Traces TURNUS front-end Design Space Exploration back-end Optimal FIFO Sizes The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. 13
49 RTL Generation: Xronos High Level Synthesis (1) Xronos front-end back-end Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14
50 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification CAL Actors Xronos front-end back-end XDF Graph Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14
51 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification CAL Actors Xronos front-end Intermediate Representation back-end XDF Graph Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14
52 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification 1:1 CAL Actors Xronos Actors.v front-end Intermediate Representation back-end Top.vhd XDF Graph Test.v Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14
53 RTL Generation: Xronos High Level Synthesis (2) RVC-CAL dataflow specification A A HW platform C D C D B B clock 15
54 RTL Generation: Xronos High Level Synthesis (2) RVC-CAL dataflow specification A A HW platform C D C D B B clock queue_actor_x_in1 in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_count in1_data in1_send in1_ack in1_count Actor_X out1_data out1_send out1_ack out1_rdy out1_count fanout_actor_x_out1 in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_rdy out_count queue_actor_x_inn in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_count inn_data inn_send inn_ack inn_count outm_data outm_send outm_ack outtm_rdy outm_count fanout_actor_x_outm in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_rdy out_count 15
55 front-end back-end Tools Integration XDF CAL RVC-CAL specifications ORCC front-end IR back-ends XDF multi dataflow MDC START: MDC compeses the multi-dataflow network 16
56 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework XDF CAL RVC-CAL specifications ORCC front-end IR back-ends XDF multi dataflow MDC STEP 1: TURNUS generates the execution causation traces 16
57 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS XRONOS High Level Synthesis XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 2: Xronos generates the actions weights 16
58 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 3: TURNUS generates the FIFO optimal sizes 16
59 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis HDL Components Library HW CG Reconfigurable Platform (with SBox queues/fanouts) XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 4: Xronos generates the RVC compliant reconfigurable hardware platform 16
60 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis HDL Components Library HW CG Reconfigurable Platform (with SBox queues/fanouts) XDF front-end CAL RVC-CAL specifications ORCC IR back-ends STEP 5: MDC generates the optimized reconfigurable hardware platform XDF multi dataflow MDC opt HW CG Reconfigurable Platform (without Sbox queues/fanouts) 16
61 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 17
62 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra 18
63 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output PARSER: syntactical parser variable length coding TEX V MOT V intra 18
64 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra TEX (residual decoding): AC-DC prediction inverse scan inverse quantization IDCT 18
65 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra MOT (motion compensation): framebuffer interpolation residual error addition 18
66 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra MERGER 420: 420 decoded components merging 18
67 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19
68 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19
69 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19
70 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19
71 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19
72 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20
73 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20
74 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20
75 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20
76 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21
77 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21
78 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21
79 video sequence resolution Performance Results Results are given in frames per second (fps) and are obtained as the average of fps for the intra and the full decoder configurations. design parallel reconf Δ% opt_reconf Δ% QCIF CIF Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 22
80 video sequence resolution Performance Results Results are given in frames per second (fps) and are obtained as the average of fps for the intra and the full decoder configurations. design parallel reconf Δ% opt_reconf Δ% QCIF CIF Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 22
81 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 23
82 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. 24
83 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. 24
84 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. An automated design flow has been assembled, by the integration of different RVC-CAL tools (MDC, TURNUS, Xronos). 24
85 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. An automated design flow has been assembled, by the integration of different RVC-CAL tools (MDC, TURNUS, Xronos). The proposed design flow has been validated on a real use case involving two configurations of the MPEG-4 Simple Profile decoder. Results highlighted the effectiveness of the approach by achieving more than 25% of saving in terms of resource utilization and more than 20% of saving in terms power consumption. The generated reconfigurable designs present a very small performance penalty (5-6%) with respect to the original decoder designs. 24
86 Acknowledgements The research leading to these results has received funding from: the Region of Sardinia L.R.7/2007 under grant agreement CRP [RPCT Project]. the Region of Sardinia, Young Researchers Grant, POR Sardegna FSE , L.R.7/2007 Promotion of the scientific research and technological innovation in Sardinia 25
87 XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV July 14 th - Samos Island (Greece) Carlo Sau DIEE Università degli Studi di Cagliari carlo.sau@diee.unica.it Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case
Department of Electrical and Computer Engineering, University of Maryland
Department of Electrical and Computer Engineering, University of Maryland OUTLINE Introduction: Problem statement Background Goals Co-processing units generation: Approach and baseline Multi-Dataflow Composer
More informationOutline. Embedded Systems and Rationale. Background. Multi-Dataflow Composer Tool: Features and Achievements. Summary, Coming Soon and Whish List
Outline Embedded Systems and Rationale Systems Complexity Hardware-Software Co-Design Design Flow Automation Power Management Background Dataflow Model of Computation Reconfigurable Video Coding Technologies
More informationResearch Article Modelling and Automated Implementation of Optimal Power Saving Strategies in Coarse-Grained Reconfigurable Architectures
Journal of Electrical and Computer Engineering Volume 2016, Article ID 4237350, 27 pages http://dx.doi.org/10.1155/2016/4237350 Research Article Modelling and Automated Implementation of Optimal Power
More informationMPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology
MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology Hussein Aman-Allah, Ehab Hanna, Karim Maarouf, Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland
More informationHIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS
HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale
More informationIntroduction to Reconfigurable Video Coding. Euee S. Jang Hanyang University
Introduction to Reconfigurable Video Coding Euee S. Jang Hanyang University 1 VISION OF RVC 2 Reconfigurable Video Coding RVC Standards for Algorithm/Architecture Co- Design from the standardization level.
More informationSystem-on-Chip Architecture for Mobile Applications. Sabyasachi Dey
System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution
More informationA Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures
Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures
More informationBuffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms
Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Małgorzata Michalska, Endri Bezati, Simone Casale-Brunet, Marco Mattavelli EPFL
More informationHeriot-Watt University
Heriot-Watt University Heriot-Watt University Research Gateway Shared-variable synchronization approaches for dynamic dataflow programs Modas, Apostolos; Casale-Brunet, Simone; Stewart, Robert James; Bezati,
More informationUpcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.
Upcoming Video Standards Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Outline Brief history of Video Coding standards Scalable Video Coding (SVC) standard Multiview Video Coding
More informationRVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING. 2 IETR/INSA Rennes F-35708, Rennes, France
RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING Endri Bezati 1, Marco Mattavelli 1, Mickaël Raulet 2 1 Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland {firstname.lastname}@epfl.ch
More informationAn LLVM-based decoder for MPEG Reconfigurable Video Coding
An LLVM-based decoder for MPEG Reconfigurable Video Coding Jérôme Gorin, Matthieu Wipliez, Jonathan Piat, Françoise Préteux, Mickaël Raulet To cite this version: Jérôme Gorin, Matthieu Wipliez, Jonathan
More informationThe Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006
The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking
More informationComputing in the age of parallelism: Challenges and opportunities
Computing in the age of parallelism: Challenges and opportunities Jorn W. Janneck jwj@cs.lth.se Computer Science Dept. Lund University Multicore Day, 23 Sep 2013 why we are here Original data collected
More informationCompilers and Code Optimization EDOARDO FUSELLA
Compilers and Code Optimization EDOARDO FUSELLA Contents LLVM The nu+ architecture and toolchain LLVM 3 What is LLVM? LLVM is a compiler infrastructure designed as a set of reusable libraries with well
More informationFunctional modeling style for efficient SW code generation of video codec applications
Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,
More informationExploring the Design Space of HEVC Inverse Transforms with Dataflow Programming
Indonesian Journal of Electrical Engineering and Computer Science Vol. 6, No. 1, April 2017, pp. 104 ~ 109 DOI: 10.11591/ijeecs.v6.i1.pp104-109 104 Exploring the Design Space of HEVC Inverse Transforms
More informationFPGA implementations of Histograms of Oriented Gradients in FPGA
FPGA implementations of Histograms of Oriented Gradients in FPGA C. Bourrasset 1, L. Maggiani 2,3, C. Salvadori 2,3, J. Sérot 1, P. Pagano 2,3 and F. Berry 1 1 Institut Pascal- D.R.E.A.M - Aubière, France
More informationDNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs
IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei
More informationFPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.
Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics
More informationA Novel Design Framework for the Design of Reconfigurable Systems based on NoCs
Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction
More informationA unified hardware/software co-synthesis solution for signal processing systems
A unified hardware/software co-synthesis solution for signal processing systems Endri Bezati, Hervé Yviquel, Mickaël Raulet, Marco Mattavelli To cite this version: Endri Bezati, Hervé Yviquel, Mickaël
More informationHigh-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms
J Real-Time Image Proc (2014) 9:251 262 DOI 10.1007/s11554-013-0326-5 SPECIAL ISSUE High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms Endri
More informationSynthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study
Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickael Raulet To cite this
More informationRTL Coding General Concepts
RTL Coding General Concepts Typical Digital System 2 Components of a Digital System Printed circuit board (PCB) Embedded d software microprocessor microcontroller digital signal processor (DSP) ASIC Programmable
More informationLearning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing
3-. 3-.2 Learning Outcomes Spiral 3 Hardware/Software Interfacing I understand the PicoBlaze bus interface signals: PORT_ID, IN_PORT, OUT_PORT, WRITE_STROBE I understand how a memory map provides the agreement
More informationA Lost Cycles Analysis for Performance Prediction using High-Level Synthesis
A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,
More informationAn FPGA Based Adaptive Viterbi Decoder
An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture
More informationRFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015
RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,
More informationEarly Performance-Cost Estimation of Application-Specific Data Path Pipelining
Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Jelena Trajkovic Computer Science Department École Polytechnique de Montréal, Canada Email: jelena.trajkovic@polymtl.ca Daniel
More informationCache Aware Optimization of Stream Programs
Cache Aware Optimization of Stream Programs Janis Sermulins, William Thies, Rodric Rabbah and Saman Amarasinghe LCTES Chicago, June 2005 Streaming Computing Is Everywhere! Prevalent computing domain with
More informationDataflow programming for heterogeneous computing systems
Dataflow programming for heterogeneous computing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de Tutorial: Algorithmic specification, tools
More informationAn HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication
2018 IEEE International Conference on Consumer Electronics (ICCE) An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication Ahmet Can Mert, Ercan Kalali, Ilker Hamzaoglu Faculty
More informationEARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION
UNIVERSITA DEGLI STUDI DI NAPOLI FEDERICO II Facoltà di Ingegneria EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION Alessandro Cilardo, Paolo Durante, Carmelo Lofiego, and Antonino Mazzeo
More informationFPGA Implementation and Validation of the Asynchronous Array of simple Processors
FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,
More informationA Dedicated Hardware Solution for the HEVC Interpolation Unit
XXVII SIM - South Symposium on Microelectronics 1 A Dedicated Hardware Solution for the HEVC Interpolation Unit 1 Vladimir Afonso, 1 Marcel Moscarelli Corrêa, 1 Luciano Volcan Agostini, 2 Denis Teixeira
More informationOrcc: multimedia development made easy
Orcc: multimedia development made easy Hervé Yviquel, Antoine Lorence, Khaled Jerbi, Gildas Cocherel, Alexandre Sanchez, Mickaël Raulet To cite this version: Hervé Yviquel, Antoine Lorence, Khaled Jerbi,
More informationA Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems
A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time
More informationHigh Data Rate Fully Flexible SDR Modem
High Data Rate Fully Flexible SDR Modem Advanced configurable architecture & development methodology KASPERSKI F., PIERRELEE O., DOTTO F., SARLOTTE M. THALES Communication 160 bd de Valmy, 92704 Colombes,
More informationReconfigurable Platform Composer Tool Project
48th nnual Meeting ssociazione Gruppo Italiano di lettronica Reconfigurable Platform omposer Tool Project rancesca Palumbo University of assari, PolomIng Information ngineering Group arlo au, Tiziana anni,
More informationGeneration of Multigrid-based Numerical Solvers for FPGA Accelerators
Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System
More informationEnergy Optimization of FPGA-Based Stream- Oriented Computing with Power Gating
Energy Optimization of FPGA-Based Stream- Oriented Computing with Power Gating Mohammad Hosseinabady and Jose Luis Nunez-Yanez Department of Electrical and Electronic Engineering University of Bristol,
More informationPerformance Verification for ESL Design Methodology from AADL Models
Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationAscenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005
Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:
More informationRISPP: Rotating Instruction Set Processing Platform
RISPP: Rotating Instruction Set Processing Platform Lars Bauer, Muhammad Shafique and Jörg Henkel Chair for Embedded Systems (CES) University of Karlsruhe (TH) Development of Embedded Systems Typical:
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field
More informationERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing
ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing Daniel Chang Chris Jenkins, Philip Garcia, Syed Gilani, Paula Aguilera, Aishwarya Nagarajan, Michael Anderson, Matthew
More informationMulti-Gigahertz Parallel FFTs for FPGA and ASIC Implementation
Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain
More informationESL design with the Agility Compiler for SystemC
ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing
More informationBreaking the Bitstream Decryption of FPGAs
Breaking the Bitstream Decryption of FPGAs 05. Sep. 2012 Amir Moradi Embedded Security Group, Ruhr University Bochum, Germany Acknowledgment Christof Paar Markus Kasper Timo Kasper Alessandro Barenghi
More informationReconOS: An RTOS Supporting Hardware and Software Threads
ReconOS: An RTOS Supporting Hardware and Software Threads Enno Lübbers and Marco Platzner Computer Engineering Group University of Paderborn marco.platzner@computer.org Overview the ReconOS project programming
More informationYet Another Implementation of CoRAM Memory
Dec 7, 2013 CARL2013@Davis, CA Py Yet Another Implementation of Memory Architecture for Modern FPGA-based Computing Shinya Takamaeda-Yamazaki, Kenji Kise, James C. Hoe * Tokyo Institute of Technology JSPS
More informationFPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)
FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor
More informationProceedings Chapter. Reference. Efficient scheduling policies for dynamic data flow programs executed on multi-core. MICHALSKA, Malgorzata, et al.
Proceedings Chapter Efficient scheduling policies for dynamic data flow programs executed on multi-core MICHALSKA, Malgorzata, et al. Abstract An important challenge of dataflow program implementations
More informationAVC Entropy Coding for MPEG Reconfigurable Video Coding
AVC Entropy Coding for MPEG Reconfigurable Video Coding Hussein Aman-Allah and Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland {hussein.aman-allah, ihab.amer}@epfl.ch
More informationH.264 AVC 4k Decoder V.1.0, 2014
SOC H.264 AVC 4k Video Decoder Datasheet System-On-Chip (SOC) Technologies 1. Key Features 1. Profile: High profile 2. Resolution: 4k (3840x2160) 3. Frame Rate: up to 60fps 4. Chroma Format: 4:2:0 or 4:2:2
More informationSide-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel?
Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel? 11. Sep 2013 Ruhr University Bochum Outline Power Analysis Attack Masking Problems in hardware Possible approaches
More informationTowards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing
Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de
More informationIntegration of reduction operators in a compiler for FPGAs
Integration of reduction operators in a compiler for FPGAs Manjukumar Harthikote Matha Department of Electrical and Computer Engineering Colorado State University Outline Introduction to RC system and
More informationThe Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS V Release 10[1].
Lekha IP 3GPP LTE FEC Encoder IP Core V1.0 The Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS 36.212 V 10.5.0 Release 10[1]. 1.0 Introduction The Lekha IP 3GPP LTE FEC Encoder IP Core
More informationHardware Software Co-design and SoC. Neeraj Goel IIT Delhi
Hardware Software Co-design and SoC Neeraj Goel IIT Delhi Introduction What is hardware software co-design Some part of application in hardware and some part in software Mpeg2 decoder example Prediction
More informationRuntime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays
Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann
More informationLars Schor, and Lothar Thiele ETH Zurich, Switzerland
Iuliana Bacivarov, Wolfgang Haid, Kai Huang, Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Efficient i Execution of KPN on MPSoC Efficiency regarding speed-up small memory footprint portability
More informationJust-in-time adaptive decoder engine: a universal video decoder based on MPEG RVC
Just-in-time adaptive decoder engine: a universal video decoder based on MPEG RVC Jérôme Gorin, Hervé Yviquel, Françoise Prêteux, Mickaël Raulet To cite this version: Jérôme Gorin, Hervé Yviquel, Françoise
More informationSpiral 3-1. Hardware/Software Interfacing
3-1.1 Spiral 3-1 Hardware/Software Interfacing 3-1.2 Learning Outcomes I understand the PicoBlaze bus interface signals: PORT_ID, IN_PORT, OUT_PORT, WRITE_STROBE I understand how a memory map provides
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationMulti-Level Computing Architectures (MLCA)
Multi-Level Computing Architectures (MLCA) Faraydon Karim ST US-Fellow Emerging Architectures Advanced System Technology STMicroelectronics Outline MLCA What s the problem Traditional solutions: VLIW,
More informationMetodologie di Progettazione Hardware e Software
POLITECNICO DI MILANO Metodologie di Progettazione Hardware e Software Reconfigurable Computing - Design Flow - Marco D. Santambrogio marco.santabrogio@polimi.it Outline 2 Retargetable Compiler Basic Idea
More informationSynthesizing Hardware from Dataflow Programs
Synthesizing Hardware from Dataflow Programs Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickaël Raulet To cite this version: Jörn W. Janneck, Ian D. Miller, David
More informationFPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationFPGAs for Image Processing
FPGAs for Image Processing A DSL and program transformations Rob Stewart Greg Michaelson Idress Ibrahim Deepayan Bhowmik Andy Wallace Paulo Garcia Heriot-Watt University 10 May 2016 What I will say 1.
More informationSignal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University
Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl Signal Processing Algorithms into Fixed Point FPGA Hardware Motivation
More informationSHA3 Core Specification. Author: Homer Hsing
SHA3 Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1 January 29, 2013 This page has been intentionally left blank. www.opencores.org Rev 0.1 ii Rev. Date Author Description 0.1 01/29/2013
More informationElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer
More informationIntroduction. L25: Modern Compiler Design
Introduction L25: Modern Compiler Design Course Aims Understand the performance characteristics of modern processors Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript
More informationMapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis
Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis University of Thessaly Greece 1 Outline Introduction to AVS
More informationEnhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension
Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information
More informationTable 1: Example Implementation Statistics for Xilinx FPGAs
logijpge Motion JPEG Encoder January 10 th, 2018 Data Sheet Version: v1.0 Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: support@logicbricks.com
More informationMultimedia Decoder Using the Nios II Processor
Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra
More informationMAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015
MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video
More informationA qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers.
A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms Xilinx Research OUTLINE Historical Perspective Conventional FPGAs Applications and Programming
More informationScalable Extension of HEVC 한종기
Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions
More informationCPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline
CPE/EE 422/522 Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices Dr. Rhonda Kay Gaede UAH Outline Introduction Field-Programmable Gate Arrays Virtex Virtex-E, Virtex-II, and Virtex-II
More informationDevelopment of tools supporting. MEANDER Design Framework
Development of tools supporting FPGA reconfigurable hardware MEANDER Design Framework Presentation Outline Current state of academic design tools Proposed design flow Proposed graphical user interface
More informationWhat is Xilinx Design Language?
Bill Jason P. Tomas University of Nevada Las Vegas Dept. of Electrical and Computer Engineering What is Xilinx Design Language? XDL is a human readable ASCII format compatible with the more widely used
More informationA Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms
A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,
More informationA HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA
Institute of Microelectronic Systems A HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA J. Dürre, D. Paradzik and H. Blume FPGA 2018 Outline Pedestrian detection with
More informationTHE H.264 ADVANCED VIDEO COMPRESSION STANDARD
THE H.264 ADVANCED VIDEO COMPRESSION STANDARD Second Edition Iain E. Richardson Vcodex Limited, UK WILEY A John Wiley and Sons, Ltd., Publication About the Author Preface Glossary List of Figures List
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationMethod We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training
Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering Winter/Summer Training Level 2 continues. 3 rd Year 4 th Year FIG-3 Level 1 (Basic & Mandatory) & Level 1.1 and
More informationFPGA based High Performance CAVLC Implementation for H.264 Video Coding
FPGA based High Performance CAVLC Implementation for H.264 Video Coding Arun Kumar Pradhan Trident Academy of Technology Bhubaneswar,India Lalit Kumar Kanoje Trident Academy of Technology Bhubaneswar,India
More informationIntroduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013
Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.
More informationIterative Refinement on FPGAs
Iterative Refinement on FPGAs Tennessee Advanced Computing Laboratory University of Tennessee JunKyu Lee July 19 th 2011 This work was partially supported by the National Science Foundation, grant NSF
More informationMulti processor systems with configurable hardware acceleration
Multi processor systems with configurable hardware acceleration Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline Motivations
More informationA 1-GHz Configurable Processor Core MeP-h1
A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface
More informationBuilt-In Self-Test for Regular Structure Embedded Cores in System-on-Chip
Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Srinivas Murthy Garimella Master s Thesis Defense Thesis Advisor: Dr. Charles E. Stroud Committee Members: Dr. Victor P. Nelson
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationAn automatic tool flow for the combined implementation of multi-mode circuits
An automatic tool flow for the combined implementation of multi-mode circuits Brahim Al Farisi, Karel Bruneel, João M. P. Cardoso and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat
More information