L1 Data Cache Decomposition for Energy Efficiency

1 L1 Data Cache Decomposition for Energy Efficiency. Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas. University of Illinois at Urbana-Champaign.

2 Objective: Reduce L1 data cache energy consumption with no performance degradation. Partition the cache in multiple ways. Specialize for stack accesses. International Symposium on Low Power Electronics and Design, August

3 Outline: L1 D-Cache decomposition; Specialized Stack Cache; Pseudo Set-Associative Cache; Simulation Environment; Evaluation; Conclusions.

4 L1 D-Cache Decomposition: a Specialized Stack Cache (SSC) and a Pseudo Set-Associative Cache (PSAC).

5 Selection: The choice between PSAC and SSC is made in the decode stage for speed, based on the instruction address and opcode. A 2 Kbit table predicts the PSAC way. (Diagram: address and opcode select between PSAC and SSC.)
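The way-prediction table mentioned on this slide can be sketched as follows. This is a minimal illustration, not the authors' design: the slide gives only the 2 Kbit budget, so the 2-bit entries (enough for a 4-way PSAC), the indexing by low instruction-address bits, the 4-byte instruction size, and the names `WayPredictor`, `predict`, and `update` are all assumptions.

```python
TABLE_BITS = 2048                            # 2 Kbit budget stated on the slide
NUM_WAYS = 4                                 # assumed 4-way PSAC
BITS_PER_ENTRY = 2                           # assumed: log2(NUM_WAYS) bits per entry
NUM_ENTRIES = TABLE_BITS // BITS_PER_ENTRY   # 1024 entries under these assumptions


class WayPredictor:
    """Hypothetical table indexed by instruction address, read at decode."""

    def __init__(self):
        # Every entry starts out predicting way 0.
        self.table = [0] * NUM_ENTRIES

    def _index(self, pc):
        # Assumes 4-byte instructions: drop the two low bits, then wrap.
        return (pc >> 2) % NUM_ENTRIES

    def predict(self, pc):
        """Return the PSAC way predicted for the memory instruction at pc."""
        return self.table[self._index(pc)]

    def update(self, pc, actual_way):
        """Record the way that actually hit, for the next access from pc."""
        if not 0 <= actual_way < NUM_WAYS:
            raise ValueError("way out of range")
        self.table[self._index(pc)] = actual_way
```

Because the table is indexed by instruction address rather than data address, the prediction is available before the effective address is computed, which is consistent with doing the selection in the decode stage.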

6 Stack Cache: Small, direct-mapped cache, virtually tagged. Software optimizations are very important to reduce the stack cache size. Avoid thrashing: allocate large structs in the heap. Easy to implement.

7 SSC: Specialized Stack Cache. Pointers to reduce traffic: TOS (top of stack) reduces the number of write-backs; SRB (safe-region-bottom) reduces unnecessary line-fills on write misses. The region between TOS and SRB is safe (missing lines are uninitialized). Infrequent access. (Diagram: positions of TOS and SRB as the stack grows.)
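The two pointer checks described on this slide can be sketched as a pair of predicates. This is an illustrative reading only: it assumes a downward-growing stack (so data at addresses below TOS is dead), a 32-byte line, and that SRB sits at a higher address than TOS; the function names and `LINE_SIZE` are hypothetical.

```python
LINE_SIZE = 32  # assumed line size in bytes; not given on the slide


def needs_writeback(line_addr, dirty, tos):
    """Decide whether an evicted line must be written back.

    Assumes a downward-growing stack: a line lying entirely below TOS
    holds popped (dead) data, so the write-back can be skipped.
    """
    entirely_dead = line_addr + LINE_SIZE <= tos
    return dirty and not entirely_dead


def needs_linefill(line_addr, is_write, tos, srb):
    """Decide whether a miss must fetch the line from the next level.

    A write miss to a line entirely inside the safe region [tos, srb)
    touches uninitialized data, so the fill can be skipped.
    """
    in_safe_region = tos <= line_addr and line_addr + LINE_SIZE <= srb
    return not (is_write and in_safe_region)
```

For example, with `tos = 0x1000` and `srb = 0x1100`, a write miss to the line at `0x1020` allocates without a fill, while a read miss to the same line still fetches it.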

8 Pseudo Set-Associative Cache: Partition the cache in 4 ways (diagram: tag and data arrays). Evaluated activation policies: Sequential, FallBackReg, Phased Cache, FallBackPha, PredictPha.

9 Sequential (Calder 96). (Diagram: array activation in cycle 1, cycle 2, and cycle 3.)

10 Fallback-regular (Inoue 99). (Diagram: array activation in cycle 1 and cycle 2.)

11 Phased Cache (Hasegawa 95). (Diagram: array activation in cycle 1 and cycle 2.)

12 Fallback-phased (ours): Emphasis on energy reduction. (Diagram: array activation in cycle 1, cycle 2, and cycle 3.)

13 Predictive Phased (ours): Emphasis on performance. (Diagram: array activation in cycle 1 and cycle 2.)
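The trade-off between the activation policies can be made concrete with a small cost model for a cache hit. The slides give only the policy names and cycle counts, so the per-cycle behavior below is an assumed reading chosen to match those cycle counts, not a statement of the authors' exact design. Sequential is left out because the slide does not pin down its probe order. The names `Cost` and `hit_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    cycles: int      # cycles until the data is available
    tag_reads: int   # tag arrays activated
    data_reads: int  # data arrays activated


def hit_cost(policy, ways, predicted_correctly):
    """Array activations for one hit, under the assumed policy behavior."""
    if policy == "Phased":
        # Assumed: all tags first, then only the matching data array.
        return Cost(2, ways, 1)
    if policy == "FallBackReg":
        # Assumed: predicted way first, then all remaining ways in parallel.
        return Cost(1, 1, 1) if predicted_correctly else Cost(2, ways, ways)
    if policy == "FallBackPha":
        # Assumed: predicted way first, then a phased access to the rest.
        return Cost(1, 1, 1) if predicted_correctly else Cost(3, ways, 2)
    if policy == "PredictPha":
        # Assumed: all tags plus predicted data first, then the right data.
        return Cost(1, ways, 1) if predicted_correctly else Cost(2, ways, 2)
    raise ValueError(f"unknown policy: {policy}")
```

Under this reading, FallBackPha touches the fewest arrays when the prediction is right but pays a third cycle when it is wrong, while PredictPha always reads every tag and never takes more than two cycles, which matches the slides' "emphasis on energy reduction" and "emphasis on performance" labels.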

14 Simulation Environment. Baseline configuration: Processor: 1 GHz, R10000-like. L1: 32 KB, 2-way. L2: 512 KB, 8-way phased cache. Memory: 1 Rambus channel. Energy model: extended CACTI. Energy is for the data memory hierarchy only.

15 Applications. Multimedia: Mp3dec (MP3 decoder), Mp3enc (MP3 encoder). SPECint: Gzip (data compression), Crafty (chess game), MCF (traffic model). Scientific: Bsom (data mining), Blast (protein matching), Treeadd (Olden tree search).

16 Adding a Stack Cache. (Chart: delay, energy, and E*D, normalized to the baseline, for PLAIN and SSC stack caches of 256 B, 512 B, and 1 KB.) For the same size, the Specialized Stack Cache is always better.
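The E*D metric used in this and the following charts is the energy-delay product, reported relative to the baseline. A minimal sketch of that normalization is shown here; the function name is hypothetical and any numbers passed to it are placeholders, not results from the talk.

```python
def normalized_ed(energy, delay, base_energy, base_delay):
    """Energy-delay product of a design, normalized to the baseline.

    A value below 1.0 means the design improves on the baseline's E*D.
    """
    return (energy / base_energy) * (delay / base_delay)
```

A design that halves energy at unchanged delay yields `normalized_ed(50, 100, 100, 100) == 0.5`, and the baseline itself always evaluates to 1.0.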

17 Pseudo Set-Associative Cache. (Chart: delay, energy, and E*D, normalized to the baseline, for Sequential, 4-way FallBackReg, 4-way Phased, 4-way FallBackPha, and 4-way PredictPha.) PredictPha has the best delay and energy-delay product.

18 PSAC: 2-way vs. 4-way. (Chart: delay, energy, and E*D, normalized to the baseline, for Sequential, 2-way PredictPha, and 4-way PredictPha.) For E*D, the 4-way PSAC is better than the 2-way.

19 Pseudo Set-Associative + Specialized Stack Cache. (Chart: delay, energy, and E*D, normalized to the baseline, for PredictPha alone and for PredictPha combined with SSC256B, SSC512B, and SSC1KB.) Combining PSAC and SSC reduces E*D by 44% on average.

20 Area Constrained: small PSAC+SSC. (Chart: delay, energy, and E*D, normalized to the baseline, for a 3-way PredictPha, a 24KB 3-way PredictPha + SSC512B, and a 32KB 4-way PredictPha + SSC512B.) SSC plus a small PSAC delivers a cost-effective E*D design.

21 Energy Breakdown. (Chart: energy normalized to the baseline, broken down into SSC, L1, L2, and Mem, for BLAST, MCF, and MP3D under four configurations: Baseline, 4-way PSAC, SSC512B, and Comb.)

22 Conclusions. Stack cache: important for energy efficiency; SW optimization is required for stack caches; the Specialized Stack Cache extensions are effective. Pseudo Set-Associative Cache: 4-way is more effective than 2-way; Predictive Phased PSAC has the lowest E*D. Combining PSAC and SSC is effective: E*D is reduced by 44% on average.

23 Backup Slides

24 Cache Energy. (Chart: energy (pJ) versus cache size from 4K to 64K, one curve per associativity, including 1-way and 2-way.)

25 Extended CACTI: New sense amplifier. 15% bit-line swing for reads; full bit-line swing for writes. Different energy for reads, writes, line-fills, and write-backs. Multiple optimization parameters.

26 SSC Energy Overhead: Using TOS and SRB requires only a small amount of energy. The registers are updated at function call and return. The registers are checked on a cache miss.

27 Miss Rate. (Chart: miss rate, 0% to 12%, versus cache size from 4KB to 64KB, for BLAST, BSOM, CRAFTY, GZIP, MCF, MP3D, MP3E, and TREE.)

28 Overview

L1 Data Cache Decomposition for Energy Efficiency. Michael Huang, Jose Renau, Seung-Moon Yoo, and Josep Torrellas. University of Illinois at Urbana-Champaign, http://iacoma.cs.uiuc.edu. ABSTRACT: The L1 data