Low Power Cache Design


M. Bilal Paracha, Hisham Chowdhury, Ali Raza

Acknowledgements
- Ching-Long Su and Alvin M. Despain (University of Southern California), "Cache Design Trade-offs for Power and Performance Optimization: A Case Study"
- C.-L. Su and Alvin M. Despain, "Cache Designs for Energy Efficiency"
- Zhichun Zhu and Xiaodong Zhang (College of William and Mary), "Access-Mode Predictions for Low-Power Cache Design"
- M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, and K. Roy, "Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping", MICRO

Today's talk
- Abstract
- Introduction
  - Use of cache in microprocessors
  - Different designs to optimize cache energy and power
- Design Trade-offs for Power & Performance Optimization
  - Vertical Cache Partitioning
  - Horizontal Cache Partitioning
  - Gray Code Addressing
- Set-Associative Cache Energy Reduction
  - Way Prediction
  - Selective Direct-Mapping
- Access Mode Prediction (AMP)
  - Advantages over Way Prediction and Phased Cache
  - Different prediction techniques
- Evaluation Results
  - Cache Access Times
  - Miss Rates
  - Cache Energy
- Conclusion
- Acknowledgements

Abstract
- Usage of caches in modern microprocessors
- Caches are designed for high performance and ignore power
- Research activities towards low-power cache design

Introduction
- The cache uses 30-60% of processor energy in embedded systems
- Use of caches in high-performance machines
- Various designs to optimize energy

Use of cache in microprocessors
- High-performance products go mobile (notebooks, PDAs, etc.)
- Caches as temporary storage devices
- Design of components with low power

Designs to optimize cache energy

Vertical Cache Partitioning
Horizontal Cache Partitioning
- Block buffer
- Block hit/miss
- Block size
- Cache segments
- Cache sub-banks
- Reduction in cache accesses
- Hit time, an advantage

Gray Code Addressing
- Gray code vs 2's complement
- Minimizes bit switches
- 2's complement: 31 bits change
- Gray code: 16 bits change

Evaluation Results
- <dm,2>: a direct-mapped cache with block size 2
- <dm,4>: a direct-mapped cache with block size 4
- <dm,8>: a direct-mapped cache with block size 8
- <2lru,2>: a 2-way set-associative cache with block size 2
- <2lru,4>: a 2-way set-associative cache with block size 4
- <2lru,8>: a 2-way set-associative cache with block size 8
- <4lru,2>: a 4-way set-associative cache with block size 2
- <4lru,4>: a 4-way set-associative cache with block size 4
- <4lru,8>: a 4-way set-associative cache with block size 8
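
The bit-switch counts quoted for Gray code addressing can be checked with a short sketch. It assumes the figures refer to stepping sequentially through addresses 0 to 16, which the slides do not state explicitly; the helper names `to_gray` and `bit_switches` are illustrative only.

```python
def to_gray(n: int) -> int:
    """Convert a binary-coded integer to its reflected Gray code."""
    return n ^ (n >> 1)


def bit_switches(codes):
    """Total number of bits that flip between consecutive codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Assumption: sequential addresses 0..16 (17 addresses, 16 transitions).
addresses = list(range(17))

binary_switches = bit_switches(addresses)
gray_switches = bit_switches([to_gray(a) for a in addresses])

print("2's complement:", binary_switches)  # 31
print("Gray code:", gray_switches)         # 16
```

Under that assumption the sketch reproduces both numbers: Gray code flips exactly one bit per step, while plain binary counting flips several bits whenever a carry ripples.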

Cache Access Time
- A direct-mapped cache takes less time to access than a set-associative cache
- Access time for a 1 KB cache: 4.79 ns direct-mapped, 7.15 ns set-associative
- A 2-way set-associative cache is approximately 50% slower than a direct-mapped cache

Energy vs Cache Size

Energy Consumption

Reducing Set-Associative Cache Energy via Way Prediction and Selective Direct Mapping

Cache Access Energy Reduction Techniques
- Energy dissipation in the data array is much larger than in the tag array, so energy optimizations are applied to the data array only
- Selective direct mapping for D-caches
- Way prediction for I-caches

Different Design Techniques
a) Conventional parallel access

b) Sequential access
c) Way prediction
d) Selective direct mapping (DM)

Prediction Framework for Selective Direct Mapping (DM)

Different Cache Accessing Modes

Access Mode Prediction for Low Power Cache Design

Phased cache
- Compares the requested tag with all the tags in a particular set; only if a tag matches does it access the data
- Still consumes energy for all n tag reads on every access, so it is not efficient
- Steps:
  1. Access the set
  2. Access all n tags
  3. Access the data corresponding to the matching tag
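
The three phased-cache steps can be sketched as a small lookup that counts how many tag and data sub-arrays are read. This is a minimal illustration, not the design from the cited papers: the `Way` record and `phased_lookup` function are hypothetical names, and refill on a miss is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Way:
    """One way of a set (hypothetical layout for illustration)."""
    valid: bool
    tag: int
    data: bytes


def phased_lookup(cache_set: List[Way], tag: int) -> Tuple[Optional[bytes], int, int]:
    """Return (data or None, tag_reads, data_reads) for one phased access."""
    # Phase 1: read and compare all n tags in the selected set.
    tag_reads = len(cache_set)
    hit_way = next((w for w in cache_set if w.valid and w.tag == tag), None)
    if hit_way is None:
        return None, tag_reads, 0  # miss: no data array is read
    # Phase 2: read only the data array of the matching way.
    return hit_way.data, tag_reads, 1


ways = [
    Way(True, 0x1A, b"A"),
    Way(True, 0x2B, b"B"),
    Way(False, 0x00, b""),
    Way(True, 0x3C, b"C"),
]
hit = phased_lookup(ways, 0x2B)
miss = phased_lookup(ways, 0x99)
print(hit, miss)
```

On a hit in this 4-way example, four tags and one data array are read; a conventional parallel access would read all four data arrays as well.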

Way Prediction
- Accesses only the predicted tag and data
- Efficient when the hit rate is high
- Not very efficient on a miss (the rest of the tag and data elements must be accessed)
- Steps:
  1. Access the set
  2. Way prediction
  3. Access the predicted data and tag sub-array in the set
  4. Prediction correct?
     - Yes: proceed
     - No: compare the rest of the data and tag array

Access Mode Prediction (AMP)
- Prediction-based approach
- Way prediction is the better choice when the hit rate is very high
- When the hit rate is low, the phased cache approach is preferable
- AMP predicts whether a cache access will result in a hit or a miss; if it predicts a hit, way prediction is used, otherwise the phased cache approach is used
- The accuracy of the access mode predictor determines the efficiency of the approach

Power Consumption
- With perfect access mode prediction and perfect way prediction, power consumption reaches the lower bound for a conventional set-associative cache
- For a correctly predicted hit, the way-prediction cache consumes E_tag + E_data, compared with n × E_tag + E_data in the phased cache
- A miss in the way-prediction cache consumes (n + 1) × E_tag + (n + 1) × E_data, compared with (n + 1) × E_tag + E_data in the phased cache
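
The energy expressions above can be turned into a small model to see why AMP switches between the two modes. This is a sketch under stated assumptions: the function names are hypothetical, the values n = 4, E_tag = 1 and E_data = 4 are made-up units chosen only for illustration, and "hit" in way-prediction mode means a correctly predicted hit, since the slides give no figure for a hit in a mispredicted way.

```python
def way_prediction_energy(n: int, e_tag: int, e_data: int, hit: bool) -> int:
    """Energy of one access in way-prediction mode (formulas from the slides)."""
    if hit:  # correctly predicted hit: one tag and one data sub-array
        return e_tag + e_data
    return (n + 1) * e_tag + (n + 1) * e_data


def phased_energy(n: int, e_tag: int, e_data: int, hit: bool) -> int:
    """Energy of one access in phased mode (formulas from the slides)."""
    if hit:  # all n tags, then one data sub-array
        return n * e_tag + e_data
    return (n + 1) * e_tag + e_data


def amp_energy(n: int, e_tag: int, e_data: int, hit: bool, predicted_hit: bool) -> int:
    """AMP: use way prediction if a hit is predicted, otherwise phased access."""
    mode = way_prediction_energy if predicted_hit else phased_energy
    return mode(n, e_tag, e_data, hit)


# Illustrative values only (not measurements from the papers).
N, E_TAG, E_DATA = 4, 1, 4

print(way_prediction_energy(N, E_TAG, E_DATA, hit=True))   # 5
print(phased_energy(N, E_TAG, E_DATA, hit=True))           # 8
print(way_prediction_energy(N, E_TAG, E_DATA, hit=False))  # 25
print(phased_energy(N, E_TAG, E_DATA, hit=False))          # 9
```

With these numbers, way prediction is cheaper on a hit (5 vs 8) but far more expensive on a miss (25 vs 9), which is why the accuracy of the hit/miss prediction determines how much energy AMP saves.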
Different Predictors

Saturating counter
- Similar to the saturating counter used for branch prediction in project 2
- Maintains a two-bit counter that increments on a cache hit and decrements on a cache miss

Two-level adaptive predictor
- Adaptive two-level prediction using a global pattern history table (GAg)
  - A K-bit history register records the results of the most recent K accesses: 1 for a hit, 0 otherwise
  - The K-bit value indexes a global pattern history table with 2^K entries; each entry is a 2-bit saturating counter
- Per-address two-level prediction using a global pattern history table (PAg)
  - Each set has its own access history register
  - All history registers index a single pattern history table

Correlation predictor

Gshare predictor
- The XOR of the global access history with the current reference set provides the index into the global pattern history table

Misprediction rate of different predictors

Conclusion
- Cache designs can be modified to obtain maximum performance and optimal energy
- Experiments suggest that direct-mapped caches (instruction and data) consume less energy for dynamic logic
- Set-associative caches consume less energy for static logic
- Circuit-level techniques can no longer keep power dissipation under a reasonable level
- Power reduction is therefore done at the architectural level, through different schemes for reducing on-chip cache power
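
For reference, the saturating-counter and GAg hit/miss predictors described under Different Predictors can be sketched as below. The class names, the initial counter value (weakly "hit"), and the prediction threshold are assumptions made for this illustration; the slides do not specify them.

```python
class SaturatingCounter:
    """Two-bit counter: increments on a hit, decrements on a miss."""

    def __init__(self, bits: int = 2):
        self.max = (1 << bits) - 1
        self.value = self.max // 2 + 1  # assumption: start weakly predicting "hit"

    def predict_hit(self) -> bool:
        return self.value > self.max // 2

    def update(self, hit: bool) -> None:
        if hit:
            self.value = min(self.max, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


class GAgPredictor:
    """K-bit global history register indexing 2^K saturating counters."""

    def __init__(self, k: int):
        self.k = k
        self.history = 0
        self.table = [SaturatingCounter() for _ in range(1 << k)]

    def predict_hit(self) -> bool:
        return self.table[self.history].predict_hit()

    def update(self, hit: bool) -> None:
        self.table[self.history].update(hit)
        # Shift the outcome into the history: 1 for a hit, 0 for a miss.
        self.history = ((self.history << 1) | int(hit)) & ((1 << self.k) - 1)


def count_mispredictions(predictor, outcomes) -> int:
    """Replay a hit/miss trace and count wrong predictions."""
    wrong = 0
    for hit in outcomes:
        if predictor.predict_hit() != hit:
            wrong += 1
        predictor.update(hit)
    return wrong


# Synthetic trace: strictly alternating hit, miss, hit, miss, ...
trace = [True, False] * 10
counter_wrong = count_mispredictions(SaturatingCounter(), trace)
gag_wrong = count_mispredictions(GAgPredictor(k=2), trace)
print(counter_wrong, gag_wrong)
```

On this synthetic alternating trace the single counter mispredicts every miss, while the two-level predictor learns the pattern after one mistake. A gshare variant would differ only in the table index, XORing the history with the current set index.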

Questions?
