REGISTER FILE ACCESS REDUCTION BY DATA REUSE


1 REGISTER FILE ACCESS REDUCTION BY DATA REUSE
Hiroshi Takamura, Koji Inoue, Vasily G. Moshnyaga
Dept. of Electronics Engineering and Computer Science, Fukuoka University, Japan

2 Overview of the talk
Motivation of this work
The Data-Reuse approach
Experimental Results
Conclusion

3 Motivation of this work
Extending battery lifetime.
Lowering cost.
Reducing the energy consumption of microprocessors is necessary.

4 Power distribution in Motorola's M-core
Source: D. Gonzales, IEEE Micro, 19(4), 1999
Clock: 36%
Data path: 36%
Controller: 28%
Total: 100%
The register file takes 16% of the total power and 42% of the data path power!

5 Register File Energy Dissipation
Energy = (Nread + Nwrite) * Eacc
Nread: total number of RF reads
Nwrite: total number of RF writes
Eacc: average energy per RF access
Assumption: a read and a write consume equal energy.
Our goal: to lower N (both Nread and Nwrite) by architectural optimizations that exploit operand variation.
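The energy model above can be written as a short sketch. The function name `rf_energy` and the normalized default `e_acc = 1.0` are assumptions made for illustration; the access counts are those of the six-instruction example on slides 12 and 17.

```python
def rf_energy(n_read: int, n_write: int, e_acc: float = 1.0) -> float:
    """Register-file energy under the slide's model: (Nread + Nwrite) * Eacc.

    e_acc is the average energy per access; 1.0 is an assumed normalized
    value, since the slides give no absolute figure.
    """
    return (n_read + n_write) * e_acc


# Access counts taken from the six-instruction example (slides 12 and 17).
conventional = rf_energy(n_read=11, n_write=6)
proposed = rf_energy(n_read=2, n_write=5)
print(conventional, proposed)
```

Because reads and writes are assumed to cost the same, the energy is proportional to the total access count, which is why the approach targets N rather than Eacc.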

6 Problem of conventional RF operation
add $t0, $s1, $t1 (i)
mul $t3, $s1, $t1 (ii)
Instructions (i) and (ii) have the same first and second source operands ($s1 and $t1), and these values are not updated in between, yet a conventional register file performs 4 read accesses.
Therefore some RF reads are unnecessary.
[Figure: register file supplying Rs and Rt to the ALU]

7 Problem of conventional RF operation
[Figure: pipelined datapath with ID/EX, EX/MEM and MEM/WB registers, register file, ALU, data memory and forwarding unit]
Almost all results are provided to the following instructions via the forwarding unit, so they are consumed before they are written to the RF.
Therefore some RF writes are unnecessary.

8 Register file access reduction approach (reuse of the same source operand value): R-mode
add $t0, $s1, $t1 (i)
mul $t3, $s1, $t1 (ii)
Instruction (ii) has the same first and second source operands as (i), so their values can be reused instead of being read from the register file again.
[Figure: register file, Rs and Rt inputs, ALU and control]

9 Register file access reduction approach (operand swapping): S-mode
add $t0, $s1, $t1 (i)
mul $t3, $t1, $s1 (ii)
Instruction (ii) uses the same source operands as (i) in swapped order, so the values already read can be reused after swapping them.
[Figure: register file, Rs and Rt inputs, ALU and control]
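The R-mode and S-mode checks for two consecutive instructions can be sketched as follows. This is only an illustration: the function `classify_reuse` and the label "CONV" for the fallback case are hypothetical, only full matches of both source operands are handled, and it is assumed that neither register is written between the two instructions.

```python
def classify_reuse(prev_srcs, cur_srcs):
    """Compare the (first, second) source registers of two consecutive
    instructions and return (mode, rf_reads_needed).

    Assumption: only full two-operand matches are considered, and the
    registers are not written between the two instructions.
    """
    if cur_srcs == prev_srcs:
        return ("R", 0)          # same operands in the same positions
    if cur_srcs == prev_srcs[::-1]:
        return ("S", 0)          # same operands, positions swapped
    return ("CONV", len(cur_srcs))  # no reuse: read every source operand


# Slide 8: add $t0, $s1, $t1 followed by mul $t3, $s1, $t1
print(classify_reuse(("$s1", "$t1"), ("$s1", "$t1")))
# Slide 9: add $t0, $s1, $t1 followed by mul $t3, $t1, $s1
print(classify_reuse(("$s1", "$t1"), ("$t1", "$s1")))
```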

10 RF access reduction approach (delayed operand reuse): J-mode
sub $t3, $s1, $t1 (i)
lw $t2, 20($s2) (ii)
sub $t4, $t2, $t1 (iii)
The second source operand of (iii), $t1, was already read for (i), so it can be reused even though instruction (ii) lies in between.
[Figure: register file, Rs and Rt inputs, ALU and control]
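One way to picture delayed reuse is a latch on each ALU input that keeps its operand while the port is idle. The sketch below rests on assumptions not stated in the slides: the function `count_reads` is hypothetical, `None` marks an unused second operand (as for `lw`), a write to a latched register simply invalidates the latch, and forwarding is not modelled.

```python
def count_reads(program):
    """Count RF reads when each ALU input keeps its last operand.

    program: list of (dest, rs, rt) register names; rt is None when the
    instruction does not use the second source port.
    Assumption: forwarding is not modelled; a write to a latched register
    simply invalidates the held copy.
    """
    latch = [None, None]  # registers currently held at the Rs and Rt inputs
    reads = 0
    for dest, rs, rt in program:
        for port, src in enumerate((rs, rt)):
            if src is None:
                continue          # port idle: the latch keeps its operand
            if latch[port] != src:
                reads += 1        # miss: fetch from the register file
                latch[port] = src
        # The held copy of a register becomes stale once it is overwritten.
        latch = [None if reg == dest else reg for reg in latch]
    return reads


slide10 = [
    ("$t3", "$s1", "$t1"),  # sub $t3, $s1, $t1   (i)
    ("$t2", "$s2", None),   # lw  $t2, 20($s2)    (ii)
    ("$t4", "$t2", "$t1"),  # sub $t4, $t2, $t1   (iii)
]
print(count_reads(slide10))  # a conventional RF would perform 5 reads here
```

In this simplified model the second operand of (iii) is found in the latch, so one of the five conventional reads is avoided.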

11 Reduction of RF writes (write omission)
add $t1, $t1, $s1 (i)
sub $t1, $s1, $t1 (ii)
The result of (i) in $t1 is consumed by (ii) and then overwritten by (ii), so writing it to the register file is a useless write access.
[Figure: pipeline timing of (i) and (ii) over clock cycles c.c.1 to c.c.6, with stages IM, Reg, DM, Reg]
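The write-omission idea can be sketched as a scan over destination registers. The function `count_writes` and its `window` parameter are hypothetical, and the sketch assumes that every consumer inside the window receives the value through forwarding, as slide 7 suggests.

```python
def count_writes(dests, window=1):
    """Count RF writes that remain after omitting useless ones.

    dests: destination register of each instruction, in program order.
    A write is treated as useless if the same register is written again
    within `window` following instructions.
    Assumption: all readers inside that window get the value by forwarding.
    """
    writes = 0
    for i, dest in enumerate(dests):
        overwritten_soon = dest in dests[i + 1 : i + 1 + window]
        if not overwritten_soon:
            writes += 1
    return writes


# Slide 11: add $t1, $t1, $s1 followed by sub $t1, $s1, $t1
print(count_writes(["$t1", "$t1"]))
# Destinations of the six-instruction example on slide 12
print(count_writes(["$t0", "$t3", "$t1", "$t1", "$t2", "$t4"]))
```

With `window=0` nothing is omitted, which corresponds to the conventional register file.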

12 An example: number of accesses in a conventional register file
add $t0, $s1, $t1 (i)
mul $t3, $s1, $t1 (ii)
add $t1, $t1, $s1 (iii)
sub $t1, $s1, $t1 (iv)
lw $t2, 20($s1) (v)
sub $t4, $s1, $t1 (vi)
Number of accesses (CONV): Nread = 11, Nwrite = 6
[Table columns: CONV, R, S, RSJ, W+RSJ]

13 Operand reuse between consecutive instructions (R)
Same six-instruction example as slide 12.
[Table: Nread and Nwrite for CONV, R, S, RSJ, W+RSJ; values not recoverable]

14 Operand swapping (S)
Same six-instruction example as slide 12.
[Table: Nread and Nwrite for CONV, R, S, RSJ, W+RSJ; values not recoverable]

15 Operand reuse between non-consecutive instructions (J)
Same six-instruction example as slide 12.
[Table: Nread and Nwrite for CONV, R, S, RSJ, W+RSJ; values not recoverable]

16 Write omission (W)
Same six-instruction example as slide 12.
[Table: Nread and Nwrite for CONV, R, S, RSJ, W+RSJ; values not recoverable]

17 RF accesses by the proposed technique
Same six-instruction example as slide 12, comparing CONV with W+RSJ.
Number of reads: 11 to 2
Number of writes: 6 to 5
Total number of accesses: 17 to 7
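The counts above translate into reduction rates with simple arithmetic. The helper `reduction_rate` is a hypothetical name; the figures it produces describe only this six-instruction example, not the benchmark results reported later.

```python
def reduction_rate(before: int, after: int) -> float:
    """Percentage of accesses removed: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


# Counts for the six-instruction example on slide 17.
for label, before, after in [("reads", 11, 2), ("writes", 6, 5), ("total", 17, 7)]:
    print(f"{label}: {reduction_rate(before, after):.1f}%")
```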

18 Experimental Evaluation
Flexible Architecture Simulation Tool: cycle-accurate instruction simulation of a 5-stage RISC-type microprocessor (similar to MIPS).
It traces user-level instructions and records RF access information as well as each operand's total number of reuses.
32-entry RF (1 write port, 2 read ports).
SPEC95 and MediaBench benchmarks: adpcm_c, adpcm_d, compress, go, mpeg_d, mpeg_e, pegwit_g, pegwit_enc, pegwit_dec.
We described a simple RISC microprocessor in Verilog-HDL and synthesized it with Synopsys Design Compiler. A 0.35 µm process technology was assumed.
SUN UltraSparc-3 environment.

19 Benchmark description
Benchmarks: adc, ade, com_n, com_t, com_b, go, mpd, mpd, mpd, mpd, mpe, pegc, pege, pegd
Descriptions: adaptive voice coding; adaptive voice decoding; Unix file compression; a go-playing program; an MPEG-2 video decoding program; an MPEG-2 video encoding program; an MPEG-2 video encoding program; a public key generation program; a key decryption program
Instruction counts: 6,602,451; 8,024,540; 63,719,628; 4,275,434; 83,180,240,140; 24,522,085,063; 62,345, ,957,333; 10,711,481; 62,343,421; 1,463,074,731; 16,444,080; 38,408,699; 21,454,539

20 Reduction rate (%) for RF reads
RF access reduction: 62.7% (maximum)!

21 Reduction rate (%) for RF writes
[Chart: series 2inst and 1inst; benchmarks ade, com_n, com_b, mpd_m, mpd_v, mpe, pege]
RF access reduction: 60% (maximum)!

22 Reduction rate (%) for reads and writes
[Chart: series W+R, W+S, W+J, W+RSJ]
RF access reduction: 61% (maximum)!

23 Area comparison
[Chart: area relative to the conventional design (%), for Conventional, Read Reuse, and Read & Write Reuse]
Hardware overhead: +3.2% (maximum)!

24 Conclusion
We proposed a technique to reduce the energy dissipation of the register file by operand reuse.
Energy savings vary by application:
Read: 62% (max), 29% (average)
Write: 60% (2instr), 55% (1instr)
Total: 61% (max), 39% (average)
Hardware overhead: Read: 1.7%, Read & Write: 3.2%
Future work:
Verification at the cycle level
Evaluation based on detailed energy models
A detailed estimation of the control circuitry overhead
