WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems

1 WALL: A Writeback-Aware LLC Management for PCM-based Main Memory Systems
Bahareh Pourshirazi*, Majed Valad Beigi†, Zhichun Zhu*, and Gokhan Memik†
* University of Illinois at Chicago, † Northwestern University
DATE 2018: IEEE/ACM Design, Automation, and Test in Europe, March 2018, Dresden, Germany

2 Introduction
Increasing demand for memory capacity
  Increasing number of cores on multicore processors
    Intel Sandy Bridge: 8 cores (16 threads)
    IBM POWER7: 8 cores (32 threads)
  Increasing data set sizes
    Graph, database, scientific workloads
Problems with DRAM
  Scalability limitations
    Scaling has slowed down
    Below 6nm seems difficult
  Periodic refresh operations

3 Phase Change Memory
Promising technology
  Denser than DRAM (3 2 )
  Non-volatile storage
Shortcomings, especially for WRITE operations
  Higher access latency (4 2 DRAM)
  Higher dynamic energy (2 40 DRAM)
  Limited write endurance
[Figure: PCM cell with wordline, bitline, and PCM storage element]

4 Existing Solutions
DRAM + PCM hybrid main memories
  DRAM as a cache to PCM
Modifications to PCM-based main memories
  Optimization on PCM architecture
  Reducing the number of writebacks from LLC to PCM
[Figure: two multicore systems, each with CPU cores and a shared LLC; one is backed by a DRAM cache in front of large PCM storage, the other by PCM main memory directly]

5 Summary of Contributions
In this work, we propose WALL, a novel Writeback-Aware LLC management scheme that
  reduces the number of writebacks from the Last Level Cache (LLC) to a PCM-based main memory
  consists of
    a writeback-aware set-balancing mechanism
    a writeback-aware replacement policy

6 Outline
Introduction
Background
Motivation
Evaluation Results
Conclusion

7 Background: Impact of reducing write traffic on PCM
Lifetime enhancement
  PCM lifetime is inversely proportional to write traffic
Performance improvement
  Writes increase latency of reads by .2 to .8 times [Arjomand_ISCA2016]
Reduction in energy consumption
  Writes consume ~0 of reads
[Figure: normalized lifetime versus write reduction (%)]
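
The lifetime relation on this slide can be made concrete with a small calculation. The sketch below is an illustration only: it assumes lifetime is limited purely by total write traffic with ideal wear distribution, and the function name `normalized_lifetime` is hypothetical, not taken from the slides.

```python
def normalized_lifetime(write_reduction: float) -> float:
    """Lifetime relative to the baseline when write traffic drops by
    `write_reduction` (a fraction in [0, 1)).

    Assumption: lifetime is inversely proportional to write traffic,
    so cutting traffic to (1 - r) of the baseline scales lifetime by
    1 / (1 - r).
    """
    if not 0.0 <= write_reduction < 1.0:
        raise ValueError("write_reduction must be in [0, 1)")
    return 1.0 / (1.0 - write_reduction)


if __name__ == "__main__":
    for percent in (0, 20, 50):
        factor = normalized_lifetime(percent / 100)
        print(f"{percent}% fewer writes -> {factor:.2f}x lifetime")
```

Under this idealized model the curve is non-linear: each additional point of write reduction buys more lifetime than the previous one. Measured lifetimes can differ because real wear is not uniform across cells.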

8 Related Work
WADE [Wang_TACO2013]
  Reduces the number of writebacks to PCM
  Partitions a set's blocks into frequent-writeback and non-frequent-writeback blocks
  Tries to keep the frequent-writeback blocks in the set
  Considers the set's blocks as the only replacement candidates
  Complex implementation
Set-Balancing Cache (SBC) [Rolán_Micro2009]
  Balances the pressure on cache sets to reduce miss rate
  Does not reduce writebacks

9 Motivation
Writebacks are not uniformly distributed among LLC sets
[Figure: percentage of writebacks versus percentage of sets for three workloads: sp from NAS, gcc from SPEC2006, and streamcluster from PARSEC]
A set with few writebacks can be used to store the dirty eviction victims of a set with many writebacks

10 Set Balancing
LLC sets are classified into three categories
  Writer: frequent writebacks
  Non-writer: infrequent writebacks, underutilized
  Neutral: neither writer nor non-writer
Each writer set is partnered with a non-writer set
[Figure: a writer set evicts its dirty LRU block; instead of being written to PCM main memory, the block is inserted into the partner non-writer set]

11 Set Balancing (cont.)
To determine set types, two counters are used
  Writeback counter
  Saturation counter [Rolán_Micro2009]
    Measures the degree to which a set can hold its working set
    Updated on each access, according to whether the access is a hit or a miss
Counter thresholds
  Writeback: τ_low_wb and τ_high_wb, computed from arithmetic means of the per-set writeback counts
  Saturation: τ_sat = K/4, where K is the set associativity
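
A saturation counter of this kind can be sketched as follows. The slide only says the counter reacts to hits and misses and that τ_sat = K/4; the update rule (increment on a miss, decrement on a hit) and the upper bound of 2K - 1 are assumptions modeled on the set-balancing cache idea, and the class and method names are hypothetical.

```python
class SaturationCounter:
    """Per-set counter approximating how well a set holds its working set.

    Assumptions (not stated on the slide): the counter goes up on a miss,
    down on a hit, and saturates in the range [0, 2K - 1].
    """

    def __init__(self, associativity: int):
        self.associativity = associativity
        self.max_value = 2 * associativity - 1  # assumed upper bound
        self.value = 0

    def on_access(self, hit: bool) -> None:
        if hit:
            self.value = max(0, self.value - 1)
        else:
            self.value = min(self.max_value, self.value + 1)

    def below_threshold(self) -> bool:
        """True when the counter is under tau_sat = K / 4."""
        return self.value < self.associativity / 4


if __name__ == "__main__":
    counter = SaturationCounter(associativity=16)
    for hit in (False, False, True, False):
        counter.on_access(hit)
    print(counter.value, counter.below_threshold())
```

A low value suggests the set has spare capacity, which is what makes it a candidate for hosting another set's victims.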

12 Set Balancing (cont.)
For a set with a writeback count of wb and a saturation counter of sat
  The set is a writer if wb > τ_high_wb
  The set is a non-writer if sat < τ_sat and wb < τ_low_wb
[Figure: percentage of sets classified as writer and non-writer under the wb and sat criteria for sp, ua, stream, dedup, gcc, mcf, mix, and mix2]
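
The classification rule on this slide can be sketched in a few lines. This is an illustration under assumptions: the comparison operators did not survive in the transcript, so strict inequalities are assumed here, and the names `SetType` and `classify_set` are hypothetical.

```python
from enum import Enum


class SetType(Enum):
    WRITER = "writer"
    NON_WRITER = "non-writer"
    NEUTRAL = "neutral"


def classify_set(wb: int, sat: int, tau_high_wb: float,
                 tau_low_wb: float, tau_sat: float) -> SetType:
    """Classify one LLC set from its writeback and saturation counters.

    Assumption: strict comparisons. A set that is neither a writer
    nor a non-writer is neutral.
    """
    if wb > tau_high_wb:
        return SetType.WRITER
    if sat < tau_sat and wb < tau_low_wb:
        return SetType.NON_WRITER
    return SetType.NEUTRAL


if __name__ == "__main__":
    K = 16                      # set associativity
    tau_sat = K / 4             # from the slides
    # The writeback thresholds below are made-up example values.
    print(classify_set(wb=40, sat=10, tau_high_wb=30, tau_low_wb=5, tau_sat=tau_sat))
    print(classify_set(wb=2, sat=1, tau_high_wb=30, tau_low_wb=5, tau_sat=tau_sat))
    print(classify_set(wb=10, sat=1, tau_high_wb=30, tau_low_wb=5, tau_sat=tau_sat))
```

Checking the writer condition first means a set with many writebacks is never treated as a partner candidate, whatever its saturation level.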

13 Replacement Policy
Frequent writeback block: a block that is frequently reused after being evicted from the cache
Frequent writeback blocks are given a second chance upon eviction to stay in the cache and be accessed
To avoid a performance penalty, the replacement policy is applied only to neutral and non-writer sets
[Figure: block life cycle in a non-writer or neutral set: INSERT with FV = 0, ACCESS, EVICT of the dirty LRU block, write to PCM main memory]
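
The second-chance idea can be sketched as a victim-selection routine. This is a minimal sketch, not the scheme's actual logic: it assumes that a dirty LRU block whose FV flag is set has the flag cleared and is moved to the MRU position while the search continues, and it does not model how FV gets set. `Block` and `choose_victim` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    tag: int
    dirty: bool = False
    fv: bool = False  # assumed "frequent writeback" flag, 0 on insertion


def choose_victim(blocks: List[Block]) -> Block:
    """Pick a victim from a non-empty set ordered MRU first, LRU last.

    A dirty LRU block with FV set gets one second chance: FV is cleared
    and the block moves to the MRU position. The loop is bounded, so a
    victim is always returned.
    """
    for _ in range(len(blocks)):
        candidate = blocks[-1]
        if candidate.dirty and candidate.fv:
            candidate.fv = False
            blocks.insert(0, blocks.pop())  # move to MRU, keep searching
        else:
            return blocks.pop()
    # Every block used its second chance; evict the current LRU block.
    return blocks.pop()


if __name__ == "__main__":
    cache_set = [Block(1), Block(2, dirty=True), Block(3, dirty=True, fv=True)]
    victim = choose_victim(cache_set)
    print("evicted tag:", victim.tag, "needs writeback:", victim.dirty)
    print("remaining (MRU first):", [b.tag for b in cache_set])
```

In the example, block 3 is spared once and block 2 is evicted instead; a dirty victim would still be written back to PCM.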

14 Design
Each set Set(n) carries a two-bit Set Type (ST) field, ST[0] and ST[1], that encodes writer, non-writer, or neutral
[Figure: eviction flow. Depending on the set type and the victim's dirty and FV bits, the victim is inserted into the partner of Set(n), written to PCM, or moved to the MRU position while another victim is found]
Storage overhead: less than 0.6% of the LLC capacity

15 Experimental Setup
Simulator: GEM5 integrated with NVMAIN
Cores: 8 cores, out-of-order, 2.0GHz
Caches: 32KB L1 (2 cycles), 256KB L2 (2 cycles), shared LLC 8MB (35 cycles), MOESI directory
PCM main memory: 4GB, 4 channels, rank/channel, 4 banks/rank; t_set = 50ns, t_reset = 00ns, t_rcd = 20ns; cell endurance = writes
Memory controllers: four; one read and write queue; write drain threshold: high = 80%, low = 50%
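
The high and low write drain thresholds suggest a hysteresis policy in the memory controller. The sketch below shows one common way such thresholds are used; the slides give only the two percentages, so the behavior is an assumption, and `WriteDrainPolicy` is a hypothetical name, not an NVMain or gem5 API.

```python
class WriteDrainPolicy:
    """Hysteresis between serving reads and draining queued writes.

    Assumption: draining starts once write-queue occupancy reaches the
    high threshold and stops once it falls to the low threshold.
    """

    def __init__(self, capacity: int, high_percent: int = 80, low_percent: int = 50):
        self.capacity = capacity
        self.high_percent = high_percent
        self.low_percent = low_percent
        self.draining = False

    def update(self, occupancy: int) -> bool:
        """Return True while the controller should prioritize writes."""
        # Integer arithmetic avoids floating-point edge cases at the thresholds.
        scaled = occupancy * 100
        if not self.draining and scaled >= self.high_percent * self.capacity:
            self.draining = True
        elif self.draining and scaled <= self.low_percent * self.capacity:
            self.draining = False
        return self.draining


if __name__ == "__main__":
    policy = WriteDrainPolicy(capacity=10)
    for occupancy in (7, 8, 6, 5, 7):
        print(occupancy, policy.update(occupancy))
```

The gap between the two thresholds keeps the controller from flipping between read and write mode on every request. Since reads stall while writes drain, fewer writebacks from the LLC mean fewer drain episodes.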

16 Experimental Setup: Workloads
Multi-threaded applications: NAS and PARSEC benchmarks
Multi-programmed workloads: SPEC CPU2006
We run each workload for 2 billion instructions after a two-billion-instruction cache warm-up phase
We compare WALL with:
  baseline: uses the LRU replacement policy
  double-way: a baseline with double the associativity
  WADE: the scheme proposed by Wang et al. [Wang_TACO2013]

17 LLC Writeback
Compared to baseline, reduced by 26.6% on average
  For writer sets, reduced by 39.5% on average
  For non-writer sets, increased from 0.4% to 3.%
  For neutral sets, reduced by 28.6% on average
[Figure: normalized writebacks for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average, broken down into non-writer, neutral, and writer sets]

18 LLC Writeback Reduction
Compared to baseline double-way, by 23.3% on average
Compared to WADE, by 6.4% on average
[Figure: LLC writeback reduction relative to double-way and WADE for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

19 MPKI
Compared to baseline, reduced by 2.4% on average
  For writer sets, reduced by 27.8% on average
  For non-writer sets, increased from 2.0% to 6.2%
  For neutral sets, increased from 57.3% to 59.%
[Figure: normalized MPKI for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average, broken down into non-writer, neutral, and writer sets]

20 MPKI
Compared to baseline double-way, increased by .0% on average
Compared to WADE, reduced by 0.3% on average
[Figure: normalized MPKI relative to double-way and WADE for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and Average]

21 Main Memory Energy
Compared to baseline, reduced by 9.2% on average
Compared to baseline double-way, reduced by 6.5% on average
Compared to WADE, reduced by .3% on average
[Figure: main memory energy relative to baseline, double-way, and WADE for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

22 IPC
Compared to baseline, improved by 6.7% on average
Compared to baseline double-way, improved by 4.9% on average
Compared to WADE, improved by 3.2% on average
[Figure: normalized IPC relative to baseline, double-way, and WADE for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

23 PCM Lifetime
Compared to baseline scheme, increased by .25 on average
Compared to baseline double-way, increased by .2 on average
Compared to WADE, increased by .7 on average
[Figure: lifetime (years) under baseline, double-way, WADE, and WALL for sp, ua, stream, dedup, gcc, mcf, mix, mix2, and GMEAN]

24 Conclusion
We proposed WALL to reduce the number of writebacks from the LLC to the PCM main memory
WALL includes:
  A set-balancing mechanism
    Uses the non-writer sets as storage for the writebacks of writer sets
  A writeback-aware replacement policy
    Keeps the frequently reused dirty lines in their sets
Results show that WALL can achieve
  Writeback reduction, by 26.6% on average
  PCM lifetime enhancement, by .25 on average
  Main memory energy efficiency, by 9.2% on average

25 Thank You! Questions?
