ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality


1 ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality. Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu

2 Executive Summary. Goal: Reduce average DRAM access latency with no modification to the existing DRAM chips. Observations: 1) A highly-charged DRAM row can be accessed with low latency. 2) A row's charge is restored when the row is accessed. 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL). Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again. ChargeCache: Low cost and no modifications to the DRAM; higher performance (9% on average for 8-core); lower DRAM energy (7.9% on average).

3 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

4 DRAM Stores Data as Charge. A DRAM cell's charge moves in three steps: 1. Sensing, 2. Restore, 3. Precharge. [Figure: CPU, memory controller, DRAM cell, and sense amplifier]

5 DRAM Charge over Time. [Figure: cell and sense-amplifier charge levels over time; ACT starts sensing, after which the cell becomes ready to access (tRCD) and, once restored, ready to precharge (tRAS); R/W and PRE follow]

6 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

7 Accessing Highly-charged Rows. [Figure: same charge-over-time plot as slide 5; a highly-charged cell reaches the ready-to-access and ready-to-precharge levels earlier during sensing and restore, shortening tRCD and tRAS]

8 Observation 1: A highly-charged DRAM row can be accessed with low latency (tRCD: 44% reduction, tRAS: 37% reduction). How does a row become highly-charged?

9 How Does a Row Become Highly-Charged? DRAM cells lose charge over time. Two ways of restoring a row's charge: 1) refresh operation, 2) access. [Figure: charge over time, replenished by refresh and access events]

10 Observation 2: A row's charge is restored when the row is accessed. How likely is a recently-accessed row to be accessed again?

11 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

12 Row Level Temporal Locality (RLTL). A recently-accessed DRAM row is likely to be accessed again. t-RLTL: the fraction of accesses to rows that were previously accessed within the last time t. [Figure: 8ms-RLTL is 86% for single-core and 97% for eight-core workloads]
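
A small sketch of how t-RLTL could be computed from a row-access trace may help make the definition concrete. This is an illustration only, not the paper's artifact; the Access record, the trace layout, and the computeRLTL function name are assumptions made here.

#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Access { uint64_t time_ns; uint64_t row; };  // assumed trace record

// t-RLTL: fraction of accesses that go to a row last accessed at most t_ns ago.
double computeRLTL(const std::vector<Access>& trace, uint64_t t_ns) {
    std::unordered_map<uint64_t, uint64_t> last_access;  // row -> last access time
    uint64_t covered = 0;
    for (const Access& a : trace) {
        auto it = last_access.find(a.row);
        if (it != last_access.end() && a.time_ns - it->second <= t_ns)
            ++covered;
        last_access[a.row] = a.time_ns;
    }
    return trace.empty() ? 0.0 : static_cast<double>(covered) / trace.size();
}

int main() {
    // Toy trace: rows 0 and 1; only the third access re-touches a row within 1 ms.
    std::vector<Access> trace = {{0, 0}, {100, 1}, {200, 0}, {9000000, 1}};
    std::printf("1ms-RLTL = %.2f\n", computeRLTL(trace, 1000000));  // prints 0.25
}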

13 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

14 Summary of the Observations. 1. A highly-charged DRAM row can be accessed with low latency. 2. A row's charge is restored when the row is accessed. 3. A recently-accessed DRAM row is likely to be accessed again: Row Level Temporal Locality (RLTL).

15 Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again.

16 ChargeCache Overview. The memory controller keeps recently-accessed row addresses in the ChargeCache. [Figure: for an example request stream A, D, A to DRAM rows A-F, each activation looks up the row address in the ChargeCache; on a miss, the controller uses the default timings; on a hit, it uses the lower timings]
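
The hit/miss decision sketched in the overview can be written down as a few lines of controller-side logic. This is a minimal sketch under assumptions, not the paper's implementation: a single fully-associative LRU table keyed by (bank, row), with the 11/28-cycle default and 7/20-cycle lowered tRCD/tRAS values used later in the evaluation.

#include <cstddef>
#include <cstdint>
#include <list>
#include <utility>

struct Timings { int tRCD; int tRAS; };  // in memory-controller cycles

class ChargeCacheSketch {
    std::list<std::pair<uint32_t, uint32_t>> entries;  // (bank, row), MRU first
    std::size_t capacity;
public:
    explicit ChargeCacheSketch(std::size_t cap) : capacity(cap) {}

    // On every activation: hit -> lowered timings, miss -> default timings.
    Timings onActivate(uint32_t bank, uint32_t row) {
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if (*it == std::make_pair(bank, row)) {
                entries.splice(entries.begin(), entries, it);  // move to MRU position
                return {7, 20};   // lowered tRCD/tRAS (assumed values)
            }
        }
        return {11, 28};          // default tRCD/tRAS
    }

    // On precharge: the just-closed row was restored, so record it as highly charged.
    void onPrecharge(uint32_t bank, uint32_t row) {
        entries.emplace_front(bank, row);
        if (entries.size() > capacity) entries.pop_back();  // evict the LRU entry
    }
};

For the example request stream A, D, A above, the second access to row A would hit in the table and be issued with the lowered timings, while the other accesses would miss and use the defaults.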

17 Area and Power Overhead (modeled with CACTI). Area: ~5 KB for a 128-entry ChargeCache, 0.24% of a 4MB Last-Level Cache (LLC) area. Power consumption: 0.15 mW on average (static + dynamic), 0.23% of the 4MB LLC power consumption.

18 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

19 Methodology. Simulator: DRAM simulator (Ramulator [Kim+, CAL'15]). Workloads: 22 single-core workloads (SPEC CPU2006, TPC, STREAM) and 20 multi-programmed 8-core workloads formed by randomly choosing from the single-core workloads; each core executes at least 1 billion representative instructions (PinPoints). System parameters: 1- and 8-core systems with a 4MB LLC; default tRCD/tRAS of 11/28 cycles.

20 Mechanisms Evaluated. Non-Uniform Access Time Memory Controller (NUAT) [Shin+, HPCA'14]: key idea is to access only recently-refreshed rows with lower timing parameters; recently-refreshed rows can be accessed faster, but only a small fraction (10-12%) of accesses go to recently-refreshed rows. ChargeCache: recently-accessed rows can be accessed faster, and a large fraction (86-97%) of accesses go to recently-accessed rows (RLTL); 128 entries per core, and on a hit tRCD/tRAS are lowered to 7/20 cycles. Upper bound, Low-Latency DRAM (LL-DRAM): works as ChargeCache with a 100% hit ratio, i.e., tRCD/tRAS of 7/20 cycles on all DRAM accesses.

21 Single-core Performance. [Figure: speedups of NUAT, ChargeCache, ChargeCache + NUAT, and LL-DRAM (upper bound)] ChargeCache improves single-core performance.

22 Eight-core Performance. [Figure: average speedups of NUAT (2.5%), ChargeCache (9%), ChargeCache + NUAT, and LL-DRAM (upper bound, 13%)] ChargeCache significantly improves multi-core performance.

23 DRAM Energy Reduction. [Figure: average and maximum DRAM energy savings for single-core and eight-core workloads] ChargeCache reduces DRAM energy.

24 Other Results in the Paper. Detailed analysis of the Row Level Temporal Locality phenomenon; ChargeCache hit-rate analysis; sensitivity studies (sensitivity to t in t-RLTL, ChargeCache capacity).

25 Outline 1. DRAM Operation Basics 2. Accessing Highly-charged Rows 3. Row Level Temporal Locality (RLTL) 4. ChargeCache 5. Evaluation 6. Conclusion

26 Conclusion. ChargeCache reduces average DRAM access latency at low cost. Observations: 1) A highly-charged DRAM row can be accessed with low latency. 2) A row's charge is restored when the row is accessed. 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL). Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again. ChargeCache: Low cost and no modifications to the DRAM; higher performance (9% on average for 8-core); lower DRAM energy (7.9% on average). Source code will be available in May.

27 ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality. Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu

28 Backup Slides

29 Detailed Design. Highly-charged Row Address Cache (HCRAC): 1) on PRE, insert the row address; 2) on ACT, look up the address; 3) invalidation mechanism.
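
The three steps above could be stitched together as in the following sketch. This is one possible implementation written for clarity, not necessarily the paper's exact invalidation hardware: here each HCRAC entry stores an insertion time, and a periodic sweep drops entries older than the caching duration, after which a row may no longer hold enough charge for the lowered timings.

#include <cstdint>
#include <unordered_map>

class HCRACSketch {
    // Key encodes (bank, row); value is the insertion cycle. Both are assumptions
    // made for this sketch, not the storage format of the actual HCRAC.
    std::unordered_map<uint64_t, uint64_t> insert_cycle;
    uint64_t caching_duration;  // how long a row stays "highly charged", in cycles
public:
    explicit HCRACSketch(uint64_t duration_cycles) : caching_duration(duration_cycles) {}

    // Step 1: on PRE, insert the row address of the just-closed (restored) row.
    void insertOnPrecharge(uint64_t key, uint64_t now) { insert_cycle[key] = now; }

    // Step 2: on ACT, look up the address; a fresh entry allows lowered timings.
    bool lookupOnActivate(uint64_t key, uint64_t now) const {
        auto it = insert_cycle.find(key);
        return it != insert_cycle.end() && now - it->second <= caching_duration;
    }

    // Step 3: invalidation mechanism -- periodically drop stale entries whose rows
    // have leaked too much charge for the lowered timings to remain safe.
    void invalidateExpired(uint64_t now) {
        for (auto it = insert_cycle.begin(); it != insert_cycle.end(); ) {
            if (now - it->second > caching_duration) it = insert_cycle.erase(it);
            else ++it;
        }
    }
};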

30 RLTL Distribution. [Figure: fraction of accesses for 0.125ms-, 0.25ms-, 0.5ms-, 1ms-, and 32ms-RLTL across the individual single-core workloads and on average]

31 Sensitivity to Capacity

32 Hit-rate Analysis

33 Sensitivity on t-rltl 33
