ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III

Size: px

Start display at page:

Download "ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III"

Lora Briggs
5 years ago
Views:

1 ECE 587 Hardware/Software Co-Design Spring /28 ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III Professor Jia Wang Department of Electrical and Computer Engineering Illinois Institute of Technology April 3, 2018

2 ECE 587 Hardware/Software Co-Design Spring /28 Reading Assignment This lecture: Next lecture: we will talk about LLVM

3 ECE 587 Hardware/Software Co-Design Spring /28 Outline Connection Sharing Register Merging Chaining and Multi-Cycling Pipelining

4 ECE 587 Hardware/Software Co-Design Spring /28 Connection Sharing Interconnects do consume considerable amount of resource in modern chip designs. Consist of metal wires, vias, and buffers. Merge connections into buses to reduce resource usage Group connections not used at the same time Use tri-state buffer to connect connection sources to bus We may implement the selectors the same way as that s exact how big mux s are implemented.

5 ECE 587 Hardware/Software Co-Design Spring /28 Connection Sharing Interconnects do consume considerable amount of resource in modern chip designs. Consist of metal wires, vias, and buffers. Merge connections into buses to reduce resource usage Group connections not used at the same time Use tri-state buffer to connect connection sources to bus We may implement the selectors the same way as that s exact how big mux s are implemented.

6 ECE 587 Hardware/Software Co-Design Spring /28 Connection Usage Table and Compatibility Graph (Gajski et al.)

7 ECE 587 Hardware/Software Co-Design Spring /28 Bus Assignment (Gajski et al.)

ECE 587 Hardware/Software Co-Design Spring 2018

8 ECE 587 Hardware/Software Co-Design Spring /28 Updated Datapath with Buses (Gajski et al.)

9 ECE 587 Hardware/Software Co-Design Spring /28 Outline Connection Sharing Register Merging Chaining and Multi-Cycling Pipelining

10 ECE 587 Hardware/Software Co-Design Spring /28 Register Merging Organize registers with non-overlapping access times into register files Register input and output ports are shared to reduce the number of connections. Register-to-register delay may increase due to the overhead of decoding the addresses.

11 ECE 587 Hardware/Software Co-Design Spring /28 Register Merging Organize registers with non-overlapping access times into register files Register input and output ports are shared to reduce the number of connections. Register-to-register delay may increase due to the overhead of decoding the addresses.

12 ECE 587 Hardware/Software Co-Design Spring /28 Register Merging Organize registers with non-overlapping access times into register files Register input and output ports are shared to reduce the number of connections. Register-to-register delay may increase due to the overhead of decoding the addresses.

13 Register Access Table and Compatibility Graph ECE 587 Hardware/Software Co-Design Spring 2018 (Gajski et10/28 al.)

14 ECE 587 Hardware/Software Co-Design Spring /28 Updated Datapath with Register File (Gajski et al.)

15 ECE 587 Hardware/Software Co-Design Spring /28 Outline Connection Sharing Register Merging Chaining and Multi-Cycling Pipelining

16 ECE 587 Hardware/Software Co-Design Spring /28 Clock Period How to choose a clock period? Smaller clock period Most operations may take multiple cycles to complete. The overall computation takes more cycle to finish. Overhead associated with more states: execution time and power consumption Larger clock period Many operations may only need part of the clock cycle to complete, wasting the remaining slacks The overall execution time could be longer. Issues addressed by chaining and multi-cycling

17 ECE 587 Hardware/Software Co-Design Spring /28 Clock Period How to choose a clock period? Smaller clock period Most operations may take multiple cycles to complete. The overall computation takes more cycle to finish. Overhead associated with more states: execution time and power consumption Larger clock period Many operations may only need part of the clock cycle to complete, wasting the remaining slacks The overall execution time could be longer. Issues addressed by chaining and multi-cycling

18 ECE 587 Hardware/Software Co-Design Spring /28 Clock Period How to choose a clock period? Smaller clock period Most operations may take multiple cycles to complete. The overall computation takes more cycle to finish. Overhead associated with more states: execution time and power consumption Larger clock period Many operations may only need part of the clock cycle to complete, wasting the remaining slacks The overall execution time could be longer. Issues addressed by chaining and multi-cycling

19 ECE 587 Hardware/Software Co-Design Spring /28 Clock Period How to choose a clock period? Smaller clock period Most operations may take multiple cycles to complete. The overall computation takes more cycle to finish. Overhead associated with more states: execution time and power consumption Larger clock period Many operations may only need part of the clock cycle to complete, wasting the remaining slacks The overall execution time could be longer. Issues addressed by chaining and multi-cycling

20 ECE 587 Hardware/Software Co-Design Spring /28 Chaining and Multi-Cycling Chaining Allow the result of an operation to be used immediately within the cycle by another operation Compose more complex combinational operations from simpler ones Fully utilize the whole clock period Multi-cycling Support operations requiring multiple cycles to finish Otherwise we have to set the clock period to the longest completion time among all operations and thus may waste a lot of slacks Recall our simplified HLS flow supports multi-cycling but not chaining. These two optimizations may change the schedule and thus may need expensive verifications.

21 ECE 587 Hardware/Software Co-Design Spring /28 Chaining and Multi-Cycling Chaining Allow the result of an operation to be used immediately within the cycle by another operation Compose more complex combinational operations from simpler ones Fully utilize the whole clock period Multi-cycling Support operations requiring multiple cycles to finish Otherwise we have to set the clock period to the longest completion time among all operations and thus may waste a lot of slacks Recall our simplified HLS flow supports multi-cycling but not chaining. These two optimizations may change the schedule and thus may need expensive verifications.

22 ECE 587 Hardware/Software Co-Design Spring /28 Chaining and Multi-Cycling Chaining Allow the result of an operation to be used immediately within the cycle by another operation Compose more complex combinational operations from simpler ones Fully utilize the whole clock period Multi-cycling Support operations requiring multiple cycles to finish Otherwise we have to set the clock period to the longest completion time among all operations and thus may waste a lot of slacks Recall our simplified HLS flow supports multi-cycling but not chaining. These two optimizations may change the schedule and thus may need expensive verifications.

23 ECE 587 Hardware/Software Co-Design Spring /28 Chaining and Multi-Cycling Chaining Allow the result of an operation to be used immediately within the cycle by another operation Compose more complex combinational operations from simpler ones Fully utilize the whole clock period Multi-cycling Support operations requiring multiple cycles to finish Otherwise we have to set the clock period to the longest completion time among all operations and thus may waste a lot of slacks Recall our simplified HLS flow supports multi-cycling but not chaining. These two optimizations may change the schedule and thus may need expensive verifications.

24 FSMD Model w/o and w/ Chaining (Gajski et al.) ECE 587 Hardware/Software Co-Design Spring /28

25 ECE 587 Hardware/Software Co-Design Spring /28 Updated Datapath with Chaining (Gajski et al.)

26 ECE 587 Hardware/Software Co-Design Spring /28 FSMD Model w/o and w/ Multi-Cycling (Gajski et al.)

27 ECE 587 Hardware/Software Co-Design Spring /28 Updated Datapath with Multi-Cycling (Gajski et al.)

28 ECE 587 Hardware/Software Co-Design Spring /28 Outline Connection Sharing Register Merging Chaining and Multi-Cycling Pipelining

29 ECE 587 Hardware/Software Co-Design Spring /28 Pipelining If an operation takes more than 1 cycle to finish, part of the functional unit could be idling for at least one clock cycle. Waste of resource! Pipelining Introduce additional flip-flops/latches into the functional units so the idling parts can be reused for other operations. As if there are additional functional units You may also pipeline a unit for a higher frequency Cost of pipelining Additional internal sequential elements A small overhead to clock period Complications Pipelining changes cycle-to-cycle behavior of the design. It is generally not possible to pipeline a RTL design so we prefer to apply it in HLS.

30 ECE 587 Hardware/Software Co-Design Spring /28 Pipelining If an operation takes more than 1 cycle to finish, part of the functional unit could be idling for at least one clock cycle. Waste of resource! Pipelining Introduce additional flip-flops/latches into the functional units so the idling parts can be reused for other operations. As if there are additional functional units You may also pipeline a unit for a higher frequency Cost of pipelining Additional internal sequential elements A small overhead to clock period Complications Pipelining changes cycle-to-cycle behavior of the design. It is generally not possible to pipeline a RTL design so we prefer to apply it in HLS.

31 ECE 587 Hardware/Software Co-Design Spring /28 Pipelining If an operation takes more than 1 cycle to finish, part of the functional unit could be idling for at least one clock cycle. Waste of resource! Pipelining Introduce additional flip-flops/latches into the functional units so the idling parts can be reused for other operations. As if there are additional functional units You may also pipeline a unit for a higher frequency Cost of pipelining Additional internal sequential elements A small overhead to clock period Complications Pipelining changes cycle-to-cycle behavior of the design. It is generally not possible to pipeline a RTL design so we prefer to apply it in HLS.

32 ECE 587 Hardware/Software Co-Design Spring /28 Pipelining If an operation takes more than 1 cycle to finish, part of the functional unit could be idling for at least one clock cycle. Waste of resource! Pipelining Introduce additional flip-flops/latches into the functional units so the idling parts can be reused for other operations. As if there are additional functional units You may also pipeline a unit for a higher frequency Cost of pipelining Additional internal sequential elements A small overhead to clock period Complications Pipelining changes cycle-to-cycle behavior of the design. It is generally not possible to pipeline a RTL design so we prefer to apply it in HLS.

33 ECE 587 Hardware/Software Co-Design Spring /28 Functional Unit Pipelining (Gajski et al.)

34 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling with Pipelined Units A unit can be pipelined into multiple stages Assume the number of stages is the number clock cycles One set of new operands can be introduced per each clock cycle. As usual, they should persist throughout that clock cycle. Unlike multi-cycling, it is not necessary to maintain the inputs throughout the whole computation. The output will be ready after designated number of cycles. Following the same order you send in the operands The same as multi-cycling

35 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling with Pipelined Units A unit can be pipelined into multiple stages Assume the number of stages is the number clock cycles One set of new operands can be introduced per each clock cycle. As usual, they should persist throughout that clock cycle. Unlike multi-cycling, it is not necessary to maintain the inputs throughout the whole computation. The output will be ready after designated number of cycles. Following the same order you send in the operands The same as multi-cycling

36 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling with Pipelined Units A unit can be pipelined into multiple stages Assume the number of stages is the number clock cycles One set of new operands can be introduced per each clock cycle. As usual, they should persist throughout that clock cycle. Unlike multi-cycling, it is not necessary to maintain the inputs throughout the whole computation. The output will be ready after designated number of cycles. Following the same order you send in the operands The same as multi-cycling

37 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling Details (Gajski et al.) Performance can be improved by bypassing certain pipeline stages. e.g. the second stage of ALU for absolute value computation

ECE 587 Hardware/Software Co-Design Spring 2018 23/28

38 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling Details (Gajski et al.) Performance can be improved by bypassing certain pipeline stages. e.g. the second stage of ALU for absolute value computation

39 ECE 587 Hardware/Software Co-Design Spring /28 Datapath Pipelining (Gajski et al.) If interconnects/register files will consume significant amount of time, we may pipeline the whole datapath.

40 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling Details (Gajski et al.) Performance can be improved by forwarding outputs not yet stored to registers. e.g. forward ALUout to ALUIn(L) and ALUIn(R)

41 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling Details (Gajski et al.) Performance can be improved by forwarding outputs not yet stored to registers. e.g. forward ALUout to ALUIn(L) and ALUIn(R)

42 ECE 587 Hardware/Software Co-Design Spring /28 Control and Datapath Pipelining Registers at input/output (SR/CWR) to simplify timing, especially when interconnect delay matters (Gajski et al.) Internal register (PC) to mitigate memory access delays for programmable hardware.

43 ECE 587 Hardware/Software Co-Design Spring /28 Control and Datapath Pipelining Registers at input/output (SR/CWR) to simplify timing, especially when interconnect delay matters (Gajski et al.) Internal register (PC) to mitigate memory access delays for programmable hardware.

ECE 587 Hardware/Software Co-Design Spring 2018 26/28 Control and Datapath Pipelining Registers at input/output (SR/CWR) to simplify timing,

44 ECE 587 Hardware/Software Co-Design Spring /28 Control and Datapath Pipelining Registers at input/output (SR/CWR) to simplify timing, especially when interconnect delay matters (Gajski et al.) Internal register (PC) to mitigate memory access delays for programmable hardware.

45 ECE 587 Hardware/Software Co-Design Spring /28 Scheduling Details (Gajski et al.)

46 ECE 587 Hardware/Software Co-Design Spring /28 Summary Connections can be merged into buses Registers can be merged into register files to save ports and thus connections. Chaining and multi-cycling help to make full use of the clock period. Pipelining helps to reuse idling hardware components.

ECE 587 Hardware/Software Co-Design Lecture 11 Verification I

ECE 587 Hardware/Software Co-Design Spring 2018 1/23 ECE 587 Hardware/Software Co-Design Lecture 11 Verification I Professor Jia Wang Department of Electrical and Computer Engineering Illinois Institute