Memory Access Scheduling



1 Memory Access Scheduling
ECE 5900 Computer Engineering Seminar
Ying Xu
Mar 4, 2005
Instructor: Dr. Chigan
ECE 5900 spring 05

2 Outline
- Introduction
- Modern DRAM architecture
- Memory access scheduling
- Structure of access scheduler
- Scheduling policies
- Experimental results
- First-ready scheduling
- Aggressive reordering
- Conclusions

3 Introduction
- The bandwidth of memory chips has increased dramatically (SDRAM, DDR2)
- Media processors have streaming memory reference patterns
- Memory bandwidth is a bottleneck

4 Intro (contd)
- Pipelining memory accesses maximizes memory bandwidth
- Sequential accesses to different rows of the same bank can't be pipelined
- Memory access scheduling:
  - Reorders memory operations (bank precharge, row activation, column access)
  - Memory references are completed out of order

5 Intro (contd)

6 Characteristics of DRAM architecture
- DRAMs are not truly random-access devices
- They are three-dimensional memories: bank, row, column
- Three operations: bank precharge, row activation, column access
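
The three-dimensional organization can be sketched in Python. This is a minimal illustration, not taken from the paper: the geometry (4 banks, 4096 rows, 512 columns), the row:bank:column address layout, and the names `decompose` and `ops_needed` are all hypothetical. It shows why a single access costs one, two, or three DRAM operations depending on what the target bank's row buffer currently holds.

```python
from typing import List, NamedTuple, Optional

# Hypothetical geometry for illustration only: 512 columns, 4 banks, 4096 rows.
COL_BITS = 9
BANK_BITS = 2
ROW_BITS = 12


class DramAddress(NamedTuple):
    bank: int
    row: int
    column: int


def decompose(addr: int) -> DramAddress:
    """Split a flat word address into (bank, row, column), assuming a
    hypothetical row:bank:column bit layout (column bits lowest)."""
    column = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return DramAddress(bank, row, column)


def ops_needed(active_row: Optional[int], row: int) -> List[str]:
    """DRAM operations needed to access `row`, given the row currently held in
    the bank's row buffer (None means the bank is precharged)."""
    if active_row == row:
        return ["column"]                       # row hit: column access only
    if active_row is None:
        return ["activate", "column"]           # precharged bank: open the row first
    return ["precharge", "activate", "column"]  # row conflict: close, open, then access


addr = decompose(0x1234)
print(addr, ops_needed(3, addr.row))
```

Consecutive accesses to different rows of one bank each pay the full precharge/activate/column sequence, which is why they cannot be pipelined, while accesses to other banks can proceed in the meantime.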

7 DRAM organization

8 Resource constraints of DRAMs
- DRAM resources: internal banks, a single set of address lines, a single set of data lines
- Different operations place different demands on these resources

9 Bank state

10 Memory access scheduling
- The process of ordering DRAM operations, subject to resource constraints
- Simplest policy: service the oldest pending reference first
- This is inefficient: when the DRAM is not ready for the oldest reference, available resources are left idle
- A more sophisticated scheduling algorithm is needed

11 Memory access scheduler structure

12 Memory access scheduling policies

13 Memory access scheduling algorithm
- A combination of the policies used by the precharge manager, row arbiter, column arbiter, and address arbiter
- The address arbiter decides which of the selected precharge, row, and column operations to perform
- Choices: in-order, priority, precharge operation first, row operation first, column operation first
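
A minimal sketch of how an address arbiter might pick one operation from those proposed by the other units. The `Candidate` type, its `priority` field, the policy strings, and the tie-breaking by reference age are assumptions made for illustration; the slide does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """An operation proposed by the precharge manager, row arbiter, or column arbiter."""
    kind: str          # "precharge", "row", or "column"
    age: int           # arrival order of the reference it serves; smaller = older
    priority: int = 0  # hypothetical field, used only by the "priority" policy


# Maps each "X-first" policy to the operation kind it favours.
_KIND_FIRST = {"precharge-first": "precharge", "row-first": "row", "column-first": "column"}


def address_arbiter(candidates: Sequence[Candidate], policy: str) -> Optional[Candidate]:
    """Select the single operation that may use the shared address lines this cycle."""
    if not candidates:
        return None
    if policy == "in-order":
        return min(candidates, key=lambda c: c.age)  # oldest reference wins
    if policy == "priority":
        # highest priority wins; ties go to the older reference
        return max(candidates, key=lambda c: (c.priority, -c.age))
    if policy not in _KIND_FIRST:
        raise ValueError(f"unknown policy: {policy}")
    preferred = [c for c in candidates if c.kind == _KIND_FIRST[policy]]
    # favour the chosen kind; if none was proposed, fall back to all candidates
    return min(preferred or candidates, key=lambda c: c.age)
```

For example, given a precharge for the oldest reference, a row activation, and a column access, `"in-order"` issues the precharge while `"column-first"` issues the column access.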

14 Experimental setup
- The study targets streaming media processors
  - Streams lack temporal locality
  - Stream transfer bandwidth drives processor performance
- The Imagine stream processor is simulated
  - Processor frequency: 500 MHz
  - DRAM frequency: 125 MHz
  - Peak system bandwidth: 2 GB/s
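
A quick check of what these figures imply, assuming "2 GB/s" means 2 × 10^9 bytes per second: to reach peak bandwidth the memory system must deliver 16 bytes in every DRAM cycle, and each DRAM cycle spans 4 processor cycles.

```python
# Figures quoted on the slide; "2 GB/s" is read as 2e9 bytes/s (an assumption).
PROCESSOR_HZ = 500e6
DRAM_HZ = 125e6
PEAK_BYTES_PER_S = 2e9

bytes_per_dram_cycle = PEAK_BYTES_PER_S / DRAM_HZ           # data per DRAM clock at peak
processor_cycles_per_dram_cycle = PROCESSOR_HZ / DRAM_HZ    # clock ratio
bytes_per_processor_cycle = PEAK_BYTES_PER_S / PROCESSOR_HZ

print(bytes_per_dram_cycle, processor_cycles_per_dram_cycle, bytes_per_processor_cycle)
```

Any DRAM cycle in which no column access transfers data is lost peak bandwidth, which is what the scheduling policies below try to minimize.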

15 Experimental setup (contd)
- Benchmarks and media processing applications

16 In-order scheduling
- In-order access scheduler: no access reordering
- A column access is performed only for the oldest pending reference; the same holds for bank precharge and row activation
- Used as the baseline

17 First-ready scheduling
- Uses the ordered priority scheme for all units
- Subject to resource and timing constraints
- Schedules an operation for the oldest pending reference that satisfies those constraints
- Benefits:
  - Accesses targeting other banks can be performed while waiting for a precharge or row activation
  - Parallelism: multiple references in progress
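
The difference between in-order and first-ready scheduling can be illustrated with a toy model. Everything here is a hypothetical simplification, not the paper's simulator: the timings `T_PRECHARGE`, `T_ACTIVATE`, and `T_COLUMN` are made up, the data lines are not modelled, and only the shared address lines are represented (at most one command per cycle). Two banks each receive references to two different rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical timing parameters in DRAM cycles, for illustration only.
T_PRECHARGE = 3  # precharge occupies the bank
T_ACTIVATE = 3   # activation must finish before a column access
T_COLUMN = 1     # column access occupies the bank


@dataclass
class Bank:
    active_row: Optional[int] = None  # row in the row buffer; None = precharged
    busy_until: int = 0               # first cycle the bank accepts a new command


def next_op(bank: Bank, row: int) -> str:
    if bank.active_row == row:
        return "column"
    return "activate" if bank.active_row is None else "precharge"


def simulate(refs: List[Tuple[int, int]], first_ready: bool) -> int:
    """Cycles until every reference's column access has been issued.
    refs: (bank, row) pairs in arrival order."""
    banks: Dict[int, Bank] = {}
    pending = list(refs)
    t = 0
    while pending:
        # in-order: only the oldest reference may issue; first-ready: scan oldest first
        candidates = pending if first_ready else pending[:1]
        for i, (b, row) in enumerate(candidates):
            bank = banks.setdefault(b, Bank())
            if bank.busy_until > t:
                continue  # timing constraint: this bank is still busy
            op = next_op(bank, row)
            if op == "precharge":
                bank.active_row, bank.busy_until = None, t + T_PRECHARGE
            elif op == "activate":
                bank.active_row, bank.busy_until = row, t + T_ACTIVATE
            else:
                bank.busy_until = t + T_COLUMN
                pending.pop(i)  # the reference completes with its column access
            break  # shared address lines: one command per cycle
        t += 1
    return t


REFS = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(simulate(REFS, first_ready=False), simulate(REFS, first_ready=True))
```

With these made-up timings the in-order schedule takes 22 cycles and the first-ready schedule 13, because first-ready overlaps one bank's precharge and activation with work in the other bank. Only the relative difference is meaningful.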

18 Experimental results
- Sustained memory bandwidth increased by about 79%

19 Experimental results
- Sustained bandwidth increased by about 17%

20 Experimental results
- Sustained memory bandwidth increased by about 79%

21 Aggressive reordering
- Drawback of first-ready scheduling: it precharges a bank when the oldest pending reference targets a different row than the bank's active row, even if multiple pending references to the active row remain
- Aggressive reordering further increases sustained memory bandwidth
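
The drawback can be made concrete with two per-bank precharge tests. This is a hedged sketch with hypothetical function names; `pending_rows` lists the rows of one bank's pending references, oldest first, and is assumed non-empty. The reordering variant, paired with a row-hit-first choice of the next reference, is one simple way to keep the active row open while it still has hits; it is not necessarily one of the algorithms evaluated in the paper.

```python
from typing import Optional, Sequence


def first_ready_precharges(active_row: Optional[int], pending_rows: Sequence[int]) -> bool:
    """First-ready: precharge as soon as the oldest pending reference misses the active row."""
    return active_row is not None and len(pending_rows) > 0 and pending_rows[0] != active_row


def reordering_precharges(active_row: Optional[int], pending_rows: Sequence[int]) -> bool:
    """Reordering: keep the row open while any pending reference still hits it."""
    return active_row is not None and len(pending_rows) > 0 and active_row not in pending_rows


def next_reference(active_row: Optional[int], pending_rows: Sequence[int]) -> int:
    """Index of the reference to serve next: the oldest row hit if one exists,
    otherwise the oldest reference (assumes pending_rows is non-empty)."""
    for i, row in enumerate(pending_rows):
        if row == active_row:
            return i
    return 0


rows = [7, 5, 5]  # pending rows for one bank, oldest first; row 5 is open
print(first_ready_precharges(5, rows), reordering_precharges(5, rows), next_reference(5, rows))
```

Here first-ready would close row 5 to serve row 7, forcing two later reopenings, while the reordering variant serves both row-5 hits first.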

22 Possible reordering scheduling algorithm policies
- A large range of memory access schedulers is possible
- Four representative algorithms are considered

23 Experimental results
- Improves bandwidth by %

24 Experimental results
- Improves bandwidth by 27-30%

25 Experimental results
- Improves bandwidth by 85-93%

26 Row-first policy vs. column-first policy
- Address arbiter
  - Row-first: always select a row operation first
  - Column-first: always select a column operation first
- Little difference across all benchmarks
- Exception: FFT
  - This has less to do with the scheduling algorithm than with the characteristics of the benchmark itself
  - FFT is the most sensitive to stream load latency
  - The col/open policy allows a store stream to delay load streams

27 Open or closed precharge policy?
- Closed precharge policy: banks are precharged as soon as there are no pending references to the active row
- Open precharge policy: banks are precharged only when there are no pending references to the active row and there are pending references to other rows of the same bank
- The difference between the open and closed precharge policies is slight
- Benchmarks with random access patterns prefer the closed precharge policy
  - Little reference locality
  - No benefit to keeping the row open
- FFT prefers the open precharge policy
  - Numerous accesses to each row
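
The two precharge policies reduce to a per-bank decision. In this minimal sketch, `should_precharge` and its arguments are illustrative names, not from the paper; `pending_rows` holds the rows targeted by the bank's pending references.

```python
from typing import Optional, Sequence


def should_precharge(active_row: Optional[int], pending_rows: Sequence[int], policy: str) -> bool:
    """Decide whether a bank's precharge manager closes the active row now."""
    if active_row is None or active_row in pending_rows:
        return False  # no open row, or the open row still has references waiting
    if policy == "closed":
        return True  # close as soon as the active row has no pending references
    if policy == "open":
        # close only when some other row of this bank is waiting
        return len(pending_rows) > 0
    raise ValueError(f"unknown precharge policy: {policy}")
```

The policies differ only when a bank has no pending references at all: closed precharges speculatively, which pays off for random access patterns, while open leaves the row in place in case later references hit it, as in FFT.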

28 Effect of bank buffer size
- Row/closed scheduling algorithm

29 Conclusions
- Memory access scheduling greatly increases bandwidth utilization by:
  - Buffering memory references
  - Accessing internal banks in parallel
  - Maximizing the number of column accesses per row access
- First-ready scheduling algorithm: 79% bandwidth improvement on microbenchmarks, 40% on application traces
- Aggressive reordering algorithm: 144% bandwidth improvement on benchmarks, 30% on media processing applications, 93% on application traces

30 Conclusions
- The closed precharge policy is preferred by most benchmarks
- There is little difference in performance between the row-first and column-first policies
- For latency-sensitive applications, scheduling loads ahead of stores is preferred
- Banks are precharged as soon as the last column reference to an active row is completed

31 Paper reference
- Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens. "Memory access scheduling." Proceedings of the 27th Annual International Symposium on Computer Architecture; ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May.

32 Thank you!