Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case

Size: px
Start display at page:

Download "Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case"

Transcription

1 XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV July 14 th - Samos Island (Greece) Carlo Sau, Luigi Raffo DIEE Università degli Studi di Cagliari Francesca Palumbo POLCOMING Università degli Studi di Sassari Endri Bezati, Simone Casale-Brunet, Marco Mattavelli SCI STI MM École Politecnique Fédérale de Lausanne Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case

2 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 1

3 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 2

4 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the devices. 3 SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes

5 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes devices. 3

6 Problem Statement Electronic devices are converging to platforms that are: portable: dimensions and battery life limits have to be taken into consideration. multimedial: different applications have to be executed, very often at the same time. higly efficient: real time response is needed in many cases.. easily evolvable: technology and algorithms fast evolution have to be followed by the SOURCE: Merging Media 2013 in Vancouver Canada by Gary Hayes devices. need of new design flows to address this complex scenario 3

7 Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. 4

8 Flexibility Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. Reconfigurable Paradigm (RP) to hardware design: specialized computing platforms, capable of changing configuration to serve the targeted computations. GPP DSP RP ASIC Performance 4

9 Flexibility Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability Systems are required to be flexible and efficient. Reconfigurable Paradigm (RP) to hardware design: specialized computing platforms, capable of changing configuration to serve the targeted computations. The more the hardware is specialized, the more it is difficult to program. GPP DSP RP ASIC Performance FINE- GRAINED (FG) Bit-level COARSE- GRAINED (CG) Word-level Flexibility Reconf. Speed Config. Storage 4

10 Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (1) A dataflow program is a directed graph of functional units (actors) exchanging sequences of data (tokens) through dedicated channels. actions state Actors encapsulate their state and communicate exclusively by sending and receiving tokens. Such an absence of race conditions, along with the intrinsic modularity of the dataflow graphs, make it possible to explicit the algorithmic parallelism of the programs. 5

11 Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (2) CAL[1] = Cal Actor Language, high-level programming language for describing dataflow actors. Decoder Composition Mechanism VTL [2] (RVC-CAL FUs) Abstract Decoder Model (XDF + RVC-Cal) Decoder Implementation Standard RVC Model Proprietary Tool Library Decoder Solution Proprietary Decoder [1] MPEG-B part 4 (2009), [2] MPEG-C part 4 (2010) 6

12 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications A C D B 7

13 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock 7

14 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock A C D B B E D 7

15 Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications HW platform A A C D 1:1 C D B B clock A B C D 2:1 HW CG reconfigurable platform A clock C B E D B D E 7

16 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: low area and power (portability) flexibility (multimediality) performances (high efficiency) code reusability (fast evolution) 8

17 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) 8

18 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but 8

19 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but what happens if the design complexity grows? 8

20 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) dataflow modularity (VTL) code reusability (fast evolution) perfect matching, but what happens if the design complexity grows? 8

21 Generation of Multi-Dataflow Graphs (2) Challenges of the modern electronic devices design: resource sharing low area and power (portability) CG reconfiguration flexibility (multimediality) dataflow parallelism highlighting performances (high efficiency) code reusability (fast evolution) need of an automated design flow dataflow modularity (VTL) perfect matching, but what happens if the design complexity grows? 8

22 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 9

23 Automated Design Flow Functionalities required by the automated design flow: 10

24 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition A C D B B E D A B C E D 10

25 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition dataflow network buffers optimization A C D B B E D A C B D E A B A B C E C E D D 10

26 Automated Design Flow Functionalities required by the automated design flow: multi-dataflow network composition dataflow network buffers optimization A C D B B E D A C B D E A B A B C E C E D D multi-dataflow corresponding RTL generation A C A B D B E C E clock D 10

27 Composition: the Multi-Dataflow Composer (1) ORCC back-ends front-end C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11

28 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification ORCC CAL Actors back-ends front-end XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11

29 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). 11

30 Composition: the Multi-Dataflow Composer (1) RVC-CAL specification 1:1 ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graph C Java LLVM Open RVC-CAL Compiler (ORCC) is a framework able to generate, from an RVC-CAL specification, the corresponding source code for different target platforms (hardware, software or mixed). Source Code 11

31 Composition: the Multi-Dataflow Composer (1) front-end ORCC back-ends C Java LLVM 11

32 front-end back-end Composition: the Multi-Dataflow Composer (1) ORCC back-ends front-end C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11

33 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11

34 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end Intermediate Representation C Java LLVM MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11

35 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications CAL Actors XDF Graphs ORCC back-ends front-end Intermediate Representation C Java LLVM Multi- Dataflow XDF Graph MDC MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11

36 front-end back-end Composition: the Multi-Dataflow Composer (1) RVC-CAL specifications n:1 ORCC CAL Actors back-ends front-end Intermediate Representation XDF Graphs C Java LLVM Multi- Dataflow XDF Graph MDC HW Reconfigurable Platform (HDL) HDL Components Library MDC exploits the ORCC frontend to generate, from a set of RVC-CAL specifications, the hardware description of a reconfigurable platform. 11

37 Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D B dataflow β B E D Composer (2) 12

38 MDC front-end Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D Composer (2) multi-dataflow specification A C B dataflow β B s0 s1 D B E D E 12

39 MDC front-end Composition: the Multi-Dataflow RVC-CAL dataflow specifications A dataflow α C D Composer (2) multi-dataflow specification A C B dataflow β B s0 s1 D B E D E MDC back-end A B 0 1 s0 HW CG reconfigurable platform C E s1 0 1 clock D config LUT α β s1 0 1 s

40 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end TURNUS is a design space exploration framework for heterogeneous parallel systems. It provides high-level modeling and simulation methods and tools for system level performances estimation and optimization. 13

41 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13

42 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13

43 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end Simulation back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. 13

44 Optimization: TURNUS Co-Exploration Framework RVC-CAL specification CAL Actors XDF Graph Input stimulus TURNUS front-end Simulation back-end It provides the execution causation traces for a given dataflow specification in relation to a particular input stimulus. The causation traces are graphs describing the dependencies among the actions executed during the input stimulus processing. Execution Causation Traces 13

45 Optimization: TURNUS Co-Exploration Framework TURNUS front-end back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13

46 Optimization: TURNUS Co-Exploration Framework Actions weights TURNUS front-end back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13

47 Optimization: TURNUS Co-Exploration Framework Actions weights TURNUS front-end Design Space Exploration back-end The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. Execution Causation Traces 13

48 Optimization: TURNUS Co-Exploration Framework Actions weights Execution Causation Traces TURNUS front-end Design Space Exploration back-end Optimal FIFO Sizes The causation traces postprocessing analyses the causation traces along with given action weights (latency of the actions for a specific target platform) in order to optimize the size of the FIFO memories in the dataflow network through a Design Space Exploration. 13

49 RTL Generation: Xronos High Level Synthesis (1) Xronos front-end back-end Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14

50 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification CAL Actors Xronos front-end back-end XDF Graph Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14

51 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification CAL Actors Xronos front-end Intermediate Representation back-end XDF Graph Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14

52 RTL Generation: Xronos High Level Synthesis (1) RVC-CAL specification 1:1 CAL Actors Xronos Actors.v front-end Intermediate Representation back-end Top.vhd XDF Graph Test.v Xronos is a framework for the generation of RTL descriptions from dataflow or sequential applications. It supports three basic data types (int, uint and bool), flow control statements (branches, loops, parallel blocks) and procedural abstractions (sequences of statements) called tasks. 14

53 RTL Generation: Xronos High Level Synthesis (2) RVC-CAL dataflow specification A A HW platform C D C D B B clock 15

54 RTL Generation: Xronos High Level Synthesis (2) RVC-CAL dataflow specification A A HW platform C D C D B B clock queue_actor_x_in1 in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_count in1_data in1_send in1_ack in1_count Actor_X out1_data out1_send out1_ack out1_rdy out1_count fanout_actor_x_out1 in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_rdy out_count queue_actor_x_inn in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_count inn_data inn_send inn_ack inn_count outm_data outm_send outm_ack outtm_rdy outm_count fanout_actor_x_outm in_data in_send in_ack in_rdy in_count out_data out_send out_ack out_rdy out_count 15

55 front-end back-end Tools Integration XDF CAL RVC-CAL specifications ORCC front-end IR back-ends XDF multi dataflow MDC START: MDC compeses the multi-dataflow network 16

56 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework XDF CAL RVC-CAL specifications ORCC front-end IR back-ends XDF multi dataflow MDC STEP 1: TURNUS generates the execution causation traces 16

57 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS XRONOS High Level Synthesis XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 2: Xronos generates the actions weights 16

58 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 3: TURNUS generates the FIFO optimal sizes 16

59 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis HDL Components Library HW CG Reconfigurable Platform (with SBox queues/fanouts) XDF front-end CAL RVC-CAL specifications ORCC IR back-ends XDF multi dataflow MDC STEP 4: Xronos generates the RVC compliant reconfigurable hardware platform 16

60 front-end back-end Tools Integration TRACES TURNUS Co- Exploration Framework WEIGHTS FIFO SIZES XRONOS High Level Synthesis HDL Components Library HW CG Reconfigurable Platform (with SBox queues/fanouts) XDF front-end CAL RVC-CAL specifications ORCC IR back-ends STEP 5: MDC generates the optimized reconfigurable hardware platform XDF multi dataflow MDC opt HW CG Reconfigurable Platform (without Sbox queues/fanouts) 16

61 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 17

62 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra 18

63 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output PARSER: syntactical parser variable length coding TEX V MOT V intra 18

64 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra TEX (residual decoding): AC-DC prediction inverse scan inverse quantization IDCT 18

65 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra MOT (motion compensation): framebuffer interpolation residual error addition 18

66 An MPEG-4 SP Decoder Use Case (1) TEX Y MOT Y input stream PARSER TEX U MOT U MERGER 420 video output TEX V MOT V intra MERGER 420: 420 decoded components merging 18

67 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19

68 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19

69 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19

70 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19

71 An MPEG-4 SP Decoder Use Case (2) Different designs composition in terms of functionality and role of the involved actors. design ordinary* SBox number of actors not shared** shared** overall intra full parallel reconf opt_reconf * computational actors (not SBoxes). ** referred only to the computational actors (not to SBoxes). 19

72 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20

73 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20

74 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20

75 resource Synthesis Results (1) Results retrieved from the Xilinx Synthesis Technology tool targeting a Xilinx Virtex FPGA board. design parallel reconf Δ% opt_reconf Δ% Slices Slice Regs Slice LUTs LUT-FF pairs BRAMs DSPs Max freq [MHz] 25,43 25, ,38 0 Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 20

76 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21

77 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21

78 resource power Synthesis Results (2) Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex FPGA board for a QCIF video sequence decoding. design parallel reconf Δ% opt_reconf Δ% Clock 0,382 0, , Logic 0,045 0, , Signals 0,050 0, , BRAMs 0,090 0, , TOT 0,567 0, , Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 21

79 video sequence resolution Performance Results Results are given in frames per second (fps) and are obtained as the average of fps for the intra and the full decoder configurations. design parallel reconf Δ% opt_reconf Δ% QCIF CIF Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 22

80 video sequence resolution Performance Results Results are given in frames per second (fps) and are obtained as the average of fps for the intra and the full decoder configurations. design parallel reconf Δ% opt_reconf Δ% QCIF CIF Δ% percentage increment/reduction between the parallel and the reconfigurable designs. 22

81 Introduction Problem Statement Two Step Problem Solving Outline Step 1: Coarse-Grained Reconfigurability Step 2: Dataflow Model of Computation and RVC-CAL Generation of Multi-Dataflow Graphs Automated Design Flow Composition: the Multi-Dataflow Composer Optimization: TURNUS Co-Exploration Framework RTL Generation: Xronos High Level Synthesis Tools Integration Experimental Results An MPEG-4 SP Decoder Use Case Synthesis Results Performance Results Conclusions 23

82 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. 24

83 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. 24

84 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. An automated design flow has been assembled, by the integration of different RVC-CAL tools (MDC, TURNUS, Xronos). 24

85 Conclusions Challenge of modern electronic systems development: coupling portability, flexibility and efficiency with the minimum design effort. A solution is possible by exploiting the dataflow programming paradigm along with the coarse-grained reconfigurability. An automated design flow has been assembled, by the integration of different RVC-CAL tools (MDC, TURNUS, Xronos). The proposed design flow has been validated on a real use case involving two configurations of the MPEG-4 Simple Profile decoder. Results highlighted the effectiveness of the approach by achieving more than 25% of saving in terms of resource utilization and more than 20% of saving in terms power consumption. The generated reconfigurable designs present a very small performance penalty (5-6%) with respect to the original decoder designs. 24

86 Acknowledgements The research leading to these results has received funding from: the Region of Sardinia L.R.7/2007 under grant agreement CRP [RPCT Project]. the Region of Sardinia, Young Researchers Grant, POR Sardegna FSE , L.R.7/2007 Promotion of the scientific research and technological innovation in Sardinia 25

87 XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV July 14 th - Samos Island (Greece) Carlo Sau DIEE Università degli Studi di Cagliari carlo.sau@diee.unica.it Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case

Department of Electrical and Computer Engineering, University of Maryland

Department of Electrical and Computer Engineering, University of Maryland Department of Electrical and Computer Engineering, University of Maryland OUTLINE Introduction: Problem statement Background Goals Co-processing units generation: Approach and baseline Multi-Dataflow Composer

More information

Outline. Embedded Systems and Rationale. Background. Multi-Dataflow Composer Tool: Features and Achievements. Summary, Coming Soon and Whish List

Outline. Embedded Systems and Rationale. Background. Multi-Dataflow Composer Tool: Features and Achievements. Summary, Coming Soon and Whish List Outline Embedded Systems and Rationale Systems Complexity Hardware-Software Co-Design Design Flow Automation Power Management Background Dataflow Model of Computation Reconfigurable Video Coding Technologies

More information

Research Article Modelling and Automated Implementation of Optimal Power Saving Strategies in Coarse-Grained Reconfigurable Architectures

Research Article Modelling and Automated Implementation of Optimal Power Saving Strategies in Coarse-Grained Reconfigurable Architectures Journal of Electrical and Computer Engineering Volume 2016, Article ID 4237350, 27 pages http://dx.doi.org/10.1155/2016/4237350 Research Article Modelling and Automated Implementation of Optimal Power

More information

MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology

MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology MPEG RVC AVC Baseline Encoder Based on a Novel Iterative Methodology Hussein Aman-Allah, Ehab Hanna, Karim Maarouf, Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland

More information

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS

HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale

More information

Introduction to Reconfigurable Video Coding. Euee S. Jang Hanyang University

Introduction to Reconfigurable Video Coding. Euee S. Jang Hanyang University Introduction to Reconfigurable Video Coding Euee S. Jang Hanyang University 1 VISION OF RVC 2 Reconfigurable Video Coding RVC Standards for Algorithm/Architecture Co- Design from the standardization level.

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures Procedia Computer Science Volume 51, 2015, Pages 2962 2966 ICCS 2015 International Conference On Computational Science A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures

More information

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms

Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Buffer Dimensioning for Throughput Improvement of Dynamic Dataflow Signal Processing Applications on Multi-Core Platforms Małgorzata Michalska, Endri Bezati, Simone Casale-Brunet, Marco Mattavelli EPFL

More information

Heriot-Watt University

Heriot-Watt University Heriot-Watt University Heriot-Watt University Research Gateway Shared-variable synchronization approaches for dynamic dataflow programs Modas, Apostolos; Casale-Brunet, Simone; Stewart, Robert James; Bezati,

More information

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Upcoming Video Standards Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Outline Brief history of Video Coding standards Scalable Video Coding (SVC) standard Multiview Video Coding

More information

RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING. 2 IETR/INSA Rennes F-35708, Rennes, France

RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING. 2 IETR/INSA Rennes F-35708, Rennes, France RVC-CAL DATAFLOW IMPLEMENTATIONS OF MPEG AVC/H.264 CABAC DECODING Endri Bezati 1, Marco Mattavelli 1, Mickaël Raulet 2 1 Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland {firstname.lastname}@epfl.ch

More information

An LLVM-based decoder for MPEG Reconfigurable Video Coding

An LLVM-based decoder for MPEG Reconfigurable Video Coding An LLVM-based decoder for MPEG Reconfigurable Video Coding Jérôme Gorin, Matthieu Wipliez, Jonathan Piat, Françoise Préteux, Mickaël Raulet To cite this version: Jérôme Gorin, Matthieu Wipliez, Jonathan

More information

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking

More information

Computing in the age of parallelism: Challenges and opportunities

Computing in the age of parallelism: Challenges and opportunities Computing in the age of parallelism: Challenges and opportunities Jorn W. Janneck jwj@cs.lth.se Computer Science Dept. Lund University Multicore Day, 23 Sep 2013 why we are here Original data collected

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA Contents LLVM The nu+ architecture and toolchain LLVM 3 What is LLVM? LLVM is a compiler infrastructure designed as a set of reusable libraries with well

More information

Functional modeling style for efficient SW code generation of video codec applications

Functional modeling style for efficient SW code generation of video codec applications Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,

More information

Exploring the Design Space of HEVC Inverse Transforms with Dataflow Programming

Exploring the Design Space of HEVC Inverse Transforms with Dataflow Programming Indonesian Journal of Electrical Engineering and Computer Science Vol. 6, No. 1, April 2017, pp. 104 ~ 109 DOI: 10.11591/ijeecs.v6.i1.pp104-109 104 Exploring the Design Space of HEVC Inverse Transforms

More information

FPGA implementations of Histograms of Oriented Gradients in FPGA

FPGA implementations of Histograms of Oriented Gradients in FPGA FPGA implementations of Histograms of Oriented Gradients in FPGA C. Bourrasset 1, L. Maggiani 2,3, C. Salvadori 2,3, J. Sérot 1, P. Pagano 2,3 and F. Berry 1 1 Institut Pascal- D.R.E.A.M - Aubière, France

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB. Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics

More information

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction

More information

A unified hardware/software co-synthesis solution for signal processing systems

A unified hardware/software co-synthesis solution for signal processing systems A unified hardware/software co-synthesis solution for signal processing systems Endri Bezati, Hervé Yviquel, Mickaël Raulet, Marco Mattavelli To cite this version: Endri Bezati, Hervé Yviquel, Mickaël

More information

High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms

High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms J Real-Time Image Proc (2014) 9:251 262 DOI 10.1007/s11554-013-0326-5 SPECIAL ISSUE High-level dataflow design of signal processing systems for reconfigurable and multicore heterogeneous platforms Endri

More information

Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study

Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study Synthesizing hardware from dataflow programs: An MPEG-4 simple profile decoder case study Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickael Raulet To cite this

More information

RTL Coding General Concepts

RTL Coding General Concepts RTL Coding General Concepts Typical Digital System 2 Components of a Digital System Printed circuit board (PCB) Embedded d software microprocessor microcontroller digital signal processor (DSP) ASIC Programmable

More information

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing 3-. 3-.2 Learning Outcomes Spiral 3 Hardware/Software Interfacing I understand the PicoBlaze bus interface signals: PORT_ID, IN_PORT, OUT_PORT, WRITE_STROBE I understand how a memory map provides the agreement

More information

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information

An FPGA Based Adaptive Viterbi Decoder

An FPGA Based Adaptive Viterbi Decoder An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture

More information

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,

More information

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Jelena Trajkovic Computer Science Department École Polytechnique de Montréal, Canada Email: jelena.trajkovic@polymtl.ca Daniel

More information

Cache Aware Optimization of Stream Programs

Cache Aware Optimization of Stream Programs Cache Aware Optimization of Stream Programs Janis Sermulins, William Thies, Rodric Rabbah and Saman Amarasinghe LCTES Chicago, June 2005 Streaming Computing Is Everywhere! Prevalent computing domain with

More information

Dataflow programming for heterogeneous computing systems

Dataflow programming for heterogeneous computing systems Dataflow programming for heterogeneous computing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction TU Dresden jeronimo.castrillon@tu-dresden.de Tutorial: Algorithmic specification, tools

More information

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication 2018 IEEE International Conference on Consumer Electronics (ICCE) An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication Ahmet Can Mert, Ercan Kalali, Ilker Hamzaoglu Faculty

More information

EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION

EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION UNIVERSITA DEGLI STUDI DI NAPOLI FEDERICO II Facoltà di Ingegneria EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION Alessandro Cilardo, Paolo Durante, Carmelo Lofiego, and Antonino Mazzeo

More information

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

FPGA Implementation and Validation of the Asynchronous Array of simple Processors FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,

More information

A Dedicated Hardware Solution for the HEVC Interpolation Unit

A Dedicated Hardware Solution for the HEVC Interpolation Unit XXVII SIM - South Symposium on Microelectronics 1 A Dedicated Hardware Solution for the HEVC Interpolation Unit 1 Vladimir Afonso, 1 Marcel Moscarelli Corrêa, 1 Luciano Volcan Agostini, 2 Denis Teixeira

More information

Orcc: multimedia development made easy

Orcc: multimedia development made easy Orcc: multimedia development made easy Hervé Yviquel, Antoine Lorence, Khaled Jerbi, Gildas Cocherel, Alexandre Sanchez, Mickaël Raulet To cite this version: Hervé Yviquel, Antoine Lorence, Khaled Jerbi,

More information

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time

More information

High Data Rate Fully Flexible SDR Modem

High Data Rate Fully Flexible SDR Modem High Data Rate Fully Flexible SDR Modem Advanced configurable architecture & development methodology KASPERSKI F., PIERRELEE O., DOTTO F., SARLOTTE M. THALES Communication 160 bd de Valmy, 92704 Colombes,

More information

Reconfigurable Platform Composer Tool Project

Reconfigurable Platform Composer Tool Project 48th nnual Meeting ssociazione Gruppo Italiano di lettronica Reconfigurable Platform omposer Tool Project rancesca Palumbo University of assari, PolomIng Information ngineering Group arlo au, Tiziana anni,

More information

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Generation of Multigrid-based Numerical Solvers for FPGA Accelerators Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System

More information

Energy Optimization of FPGA-Based Stream- Oriented Computing with Power Gating

Energy Optimization of FPGA-Based Stream- Oriented Computing with Power Gating Energy Optimization of FPGA-Based Stream- Oriented Computing with Power Gating Mohammad Hosseinabady and Jose Luis Nunez-Yanez Department of Electrical and Electronic Engineering University of Bristol,

More information

Performance Verification for ESL Design Methodology from AADL Models

Performance Verification for ESL Design Methodology from AADL Models Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005 Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:

More information

RISPP: Rotating Instruction Set Processing Platform

RISPP: Rotating Instruction Set Processing Platform RISPP: Rotating Instruction Set Processing Platform Lars Bauer, Muhammad Shafique and Jörg Henkel Chair for Embedded Systems (CES) University of Karlsruhe (TH) Development of Embedded Systems Typical:

More information

EITF35: Introduction to Structured VLSI Design

EITF35: Introduction to Structured VLSI Design EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field

More information

ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing

ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing Daniel Chang Chris Jenkins, Philip Garcia, Syed Gilani, Paula Aguilera, Aishwarya Nagarajan, Michael Anderson, Matthew

More information

Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation

Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation Doug Johnson, Applications Consultant Chris Eddington, Technical Marketing Synopsys 2013 1 Synopsys, Inc. 700 E. Middlefield Road Mountain

More information

ESL design with the Agility Compiler for SystemC

ESL design with the Agility Compiler for SystemC ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing

More information

Breaking the Bitstream Decryption of FPGAs

Breaking the Bitstream Decryption of FPGAs Breaking the Bitstream Decryption of FPGAs 05. Sep. 2012 Amir Moradi Embedded Security Group, Ruhr University Bochum, Germany Acknowledgment Christof Paar Markus Kasper Timo Kasper Alessandro Barenghi

More information

ReconOS: An RTOS Supporting Hardware and Software Threads

ReconOS: An RTOS Supporting Hardware and Software Threads ReconOS: An RTOS Supporting Hardware and Software Threads Enno Lübbers and Marco Platzner Computer Engineering Group University of Paderborn marco.platzner@computer.org Overview the ReconOS project programming

More information

Yet Another Implementation of CoRAM Memory

Yet Another Implementation of CoRAM Memory Dec 7, 2013 CARL2013@Davis, CA Py Yet Another Implementation of Memory Architecture for Modern FPGA-based Computing Shinya Takamaeda-Yamazaki, Kenji Kise, James C. Hoe * Tokyo Institute of Technology JSPS

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Proceedings Chapter. Reference. Efficient scheduling policies for dynamic data flow programs executed on multi-core. MICHALSKA, Malgorzata, et al.

Proceedings Chapter. Reference. Efficient scheduling policies for dynamic data flow programs executed on multi-core. MICHALSKA, Malgorzata, et al. Proceedings Chapter Efficient scheduling policies for dynamic data flow programs executed on multi-core MICHALSKA, Malgorzata, et al. Abstract An important challenge of dataflow program implementations

More information

AVC Entropy Coding for MPEG Reconfigurable Video Coding

AVC Entropy Coding for MPEG Reconfigurable Video Coding AVC Entropy Coding for MPEG Reconfigurable Video Coding Hussein Aman-Allah and Ihab Amer Laboratory of Microelectronic Systems (GR-LSM), EPFL CH-1015 Lausanne, Switzerland {hussein.aman-allah, ihab.amer}@epfl.ch

More information

H.264 AVC 4k Decoder V.1.0, 2014

H.264 AVC 4k Decoder V.1.0, 2014 SOC H.264 AVC 4k Video Decoder Datasheet System-On-Chip (SOC) Technologies 1. Key Features 1. Profile: High profile 2. Resolution: 4k (3840x2160) 3. Frame Rate: up to 60fps 4. Chroma Format: 4:2:0 or 4:2:2

More information

Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel?

Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel? Side-Channel Countermeasures for Hardware: is There a Light at the End of the Tunnel? 11. Sep 2013 Ruhr University Bochum Outline Power Analysis Attack Masking Problems in hardware Possible approaches

More information

Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing

Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de

More information

Integration of reduction operators in a compiler for FPGAs

Integration of reduction operators in a compiler for FPGAs Integration of reduction operators in a compiler for FPGAs Manjukumar Harthikote Matha Department of Electrical and Computer Engineering Colorado State University Outline Introduction to RC system and

More information

The Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS V Release 10[1].

The Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS V Release 10[1]. Lekha IP 3GPP LTE FEC Encoder IP Core V1.0 The Lekha 3GPP LTE FEC IP Core meets 3GPP LTE specification 3GPP TS 36.212 V 10.5.0 Release 10[1]. 1.0 Introduction The Lekha IP 3GPP LTE FEC Encoder IP Core

More information

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi Hardware Software Co-design and SoC Neeraj Goel IIT Delhi Introduction What is hardware software co-design Some part of application in hardware and some part in software Mpeg2 decoder example Prediction

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland

Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Iuliana Bacivarov, Wolfgang Haid, Kai Huang, Lars Schor, and Lothar Thiele ETH Zurich, Switzerland Efficient i Execution of KPN on MPSoC Efficiency regarding speed-up small memory footprint portability

More information

Just-in-time adaptive decoder engine: a universal video decoder based on MPEG RVC

Just-in-time adaptive decoder engine: a universal video decoder based on MPEG RVC Just-in-time adaptive decoder engine: a universal video decoder based on MPEG RVC Jérôme Gorin, Hervé Yviquel, Françoise Prêteux, Mickaël Raulet To cite this version: Jérôme Gorin, Hervé Yviquel, Françoise

More information

Spiral 3-1. Hardware/Software Interfacing

Spiral 3-1. Hardware/Software Interfacing 3-1.1 Spiral 3-1 Hardware/Software Interfacing 3-1.2 Learning Outcomes I understand the PicoBlaze bus interface signals: PORT_ID, IN_PORT, OUT_PORT, WRITE_STROBE I understand how a memory map provides

More information

Lecture 41: Introduction to Reconfigurable Computing

Lecture 41: Introduction to Reconfigurable Computing inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following

More information

Multi-Level Computing Architectures (MLCA)

Multi-Level Computing Architectures (MLCA) Multi-Level Computing Architectures (MLCA) Faraydon Karim ST US-Fellow Emerging Architectures Advanced System Technology STMicroelectronics Outline MLCA What s the problem Traditional solutions: VLIW,

More information

Metodologie di Progettazione Hardware e Software

Metodologie di Progettazione Hardware e Software POLITECNICO DI MILANO Metodologie di Progettazione Hardware e Software Reconfigurable Computing - Design Flow - Marco D. Santambrogio marco.santabrogio@polimi.it Outline 2 Retargetable Compiler Basic Idea

More information

Synthesizing Hardware from Dataflow Programs

Synthesizing Hardware from Dataflow Programs Synthesizing Hardware from Dataflow Programs Jörn W. Janneck, Ian D. Miller, David B. Parlour, Ghislain Roquier, Matthieu Wipliez, Mickaël Raulet To cite this version: Jörn W. Janneck, Ian D. Miller, David

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

FPGAs for Image Processing

FPGAs for Image Processing FPGAs for Image Processing A DSL and program transformations Rob Stewart Greg Michaelson Idress Ibrahim Deepayan Bhowmik Andy Wallace Paulo Garcia Heriot-Watt University 10 May 2016 What I will say 1.

More information

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl Signal Processing Algorithms into Fixed Point FPGA Hardware Motivation

More information

SHA3 Core Specification. Author: Homer Hsing

SHA3 Core Specification. Author: Homer Hsing SHA3 Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1 January 29, 2013 This page has been intentionally left blank. www.opencores.org Rev 0.1 ii Rev. Date Author Description 0.1 01/29/2013

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

Introduction. L25: Modern Compiler Design

Introduction. L25: Modern Compiler Design Introduction L25: Modern Compiler Design Course Aims Understand the performance characteristics of modern processors Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript

More information

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis

Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor. NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor NikolaosBellas, IoannisKatsavounidis, Maria Koziri, Dimitris Zacharis University of Thessaly Greece 1 Outline Introduction to AVS

More information

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension Hamid Noori, Farhad Mehdipour, Koji Inoue, and Kazuaki Murakami Institute of Systems, Information

More information

Table 1: Example Implementation Statistics for Xilinx FPGAs

Table 1: Example Implementation Statistics for Xilinx FPGAs logijpge Motion JPEG Encoder January 10 th, 2018 Data Sheet Version: v1.0 Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: support@logicbricks.com

More information

Multimedia Decoder Using the Nios II Processor

Multimedia Decoder Using the Nios II Processor Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra

More information

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video

More information

A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers.

A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers. A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms Xilinx Research OUTLINE Historical Perspective Conventional FPGAs Applications and Programming

More information

Scalable Extension of HEVC 한종기

Scalable Extension of HEVC 한종기 Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions

More information

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline CPE/EE 422/522 Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices Dr. Rhonda Kay Gaede UAH Outline Introduction Field-Programmable Gate Arrays Virtex Virtex-E, Virtex-II, and Virtex-II

More information

Development of tools supporting. MEANDER Design Framework

Development of tools supporting. MEANDER Design Framework Development of tools supporting FPGA reconfigurable hardware MEANDER Design Framework Presentation Outline Current state of academic design tools Proposed design flow Proposed graphical user interface

More information

What is Xilinx Design Language?

What is Xilinx Design Language? Bill Jason P. Tomas University of Nevada Las Vegas Dept. of Electrical and Computer Engineering What is Xilinx Design Language? XDL is a human readable ASCII format compatible with the more widely used

More information

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,

More information

A HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA

A HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA Institute of Microelectronic Systems A HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA J. Dürre, D. Paradzik and H. Blume FPGA 2018 Outline Pedestrian detection with

More information

THE H.264 ADVANCED VIDEO COMPRESSION STANDARD

THE H.264 ADVANCED VIDEO COMPRESSION STANDARD THE H.264 ADVANCED VIDEO COMPRESSION STANDARD Second Edition Iain E. Richardson Vcodex Limited, UK WILEY A John Wiley and Sons, Ltd., Publication About the Author Preface Glossary List of Figures List

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering Winter/Summer Training Level 2 continues. 3 rd Year 4 th Year FIG-3 Level 1 (Basic & Mandatory) & Level 1.1 and

More information

FPGA based High Performance CAVLC Implementation for H.264 Video Coding

FPGA based High Performance CAVLC Implementation for H.264 Video Coding FPGA based High Performance CAVLC Implementation for H.264 Video Coding Arun Kumar Pradhan Trident Academy of Technology Bhubaneswar,India Lalit Kumar Kanoje Trident Academy of Technology Bhubaneswar,India

More information

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013 Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.

More information

Iterative Refinement on FPGAs

Iterative Refinement on FPGAs Iterative Refinement on FPGAs Tennessee Advanced Computing Laboratory University of Tennessee JunKyu Lee July 19 th 2011 This work was partially supported by the National Science Foundation, grant NSF

More information

Multi processor systems with configurable hardware acceleration

Multi processor systems with configurable hardware acceleration Multi processor systems with configurable hardware acceleration Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline Motivations

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip

Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Srinivas Murthy Garimella Master s Thesis Defense Thesis Advisor: Dr. Charles E. Stroud Committee Members: Dr. Victor P. Nelson

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

An automatic tool flow for the combined implementation of multi-mode circuits

An automatic tool flow for the combined implementation of multi-mode circuits An automatic tool flow for the combined implementation of multi-mode circuits Brahim Al Farisi, Karel Bruneel, João M. P. Cardoso and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat

More information