Memory Controllers for Real-Time Embedded Systems. Benny Akesson Czech Technical University in Prague

Size: px

Start display at page:

Download "Memory Controllers for Real-Time Embedded Systems. Benny Akesson Czech Technical University in Prague"

Pamela Goodwin
5 years ago
Views:

1 Memory Controllers for Real-Time Embedded Systems Benny Akesson Czech Technical University in Prague

Trends in Embedded Systems Embedded systems get increasingly complex Increasingly

integrated in a device More applications execute concurrently Requires increased

platforms are heterogeneous multi-processor systems with distributed memory

2 Trends in Embedded Systems Embedded systems get increasingly complex Increasingly complex applications (more functionality) Growing number of applications integrated in a device More applications execute concurrently Requires increased system performance without increasing power The resulting complex contemporary platforms are heterogeneous multi-processor systems with distributed memory hierarchy to improve performance/power ratio Resources in the system are shared to reduce cost 2

3 Consumer Electronics Examples Source: International Technology Roadmap for Semiconductors (ITRS) Philips Set-Top Box Architecture (approximately one decade old) 3

4 Mixed Time-Criticality Applications have mixed time-criticality Firm real-time requirements (FRT) E.g. software-defined radio application Failure to satisfy requirement may violate correctness No deadline misses tolerable Soft real-time requirements (SRT) E.g. media decoder application Failure to satisfy requirement reduces quality of output Occassional deadline misses tolerable No real-time requirements (NRT) E.g. graphical user interface No timing requirements, but must be responsive 4

5 Simulation-Based Verification Simulation is typically used to verify SRT and NRT applications System simulated with a large set of inputs Resource sharing results in interference between applications Timing behaviors of applications in use-case inter-dependent All use-cases must be verified instead of all applications Verification must be repeated if applications are added or modified Verification by simulation is a slow process with poor coverage Verification is costly and effort is expected to increase in future! 5

6 Formal Verification Formal verification is alternative to simulation Provides analytical bounds on response time or throughput Considers all application inputs Covers all combinations of concurrently running applications Approach requires models of both applications and hardware Application models are not always available Behavior of dynamic applications is not captured accurately Most hardware is not designed with formal analysis in mind 6

7 Problem with SDRAM This presentation covers the CompSoC SDRAM controller SDRAM memories are particularly challenging resources The execution time of a request in an SDRAM is variable Worst-Case Execution Time (WCET) is pessimistic and guaranteed bandwidth is very low SDRAM bandwidth is scarce and must be efficiently utilized Additional interfaces cannot be added due to cost constraints 7

8 Problem Statement Complex systems have mixed time-criticality Firm, soft, and no real-time requirements in one system We refer to this as mixed real-time (MRT) requirements There are suitable SDRAM controllers for either FRT and SRT/NRT No good solutions for mixes between these types We develop a controller that reduce verification effort by enabling formal verification of FRT requirements enable independent verification by simulation for SRT/NRT while using the SDRAM in an efficient and power conscious manner 8

9 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 9

Cache L2 Cache Off-chip memory Off-chip memory

10 Distance from processor Access speed Memory Hierarchy Processor CPU Registers Registers L1 Cache L1 Cache L2 Cache L2 Cache Off-chip memory Off-chip memory Secondary memory Secondary memory (Hard disk) Size (capacity) 10

11 Memory Hierarchy Module Memory type Access Time Capacity Managed by Registers SRAM 1 cycle ~500B Software/compiler L1 Cache SRAM 1-3 cycles ~64KB Hardware L2 Cache SRAM 5-10 cycles 1-10MB Hardware Off-chip memory DRAM ~100 cycles ~10GB Software/OS Secondary memory Disk drive ~1000 cycles ~1TB Software/OS 11 Credits: J.Leverich, Stanford

12 Comparing SRAM and DRAM SRAM DRAM Fast (10x) Bitlines driven by transistors Large cell size (6-10x) 6 transistors Small cell size 1 capacitor and 1 transistor Bit stored as charge on capacitor Capacitor loses charge over time Period refresh required Hence the name dynamic RAM 12 Credits: J.Leverich, Stanford

13 Summary of Characteristics SRAM is preferable for register files and L1/L2 caches Simpler manufacturing (compatible with logic process) Fast random access (on-chip memory) No refreshes Lower density (6 transistors per cell) Higher cost DRAM is preferable for stand-alone memory chips Much higher capacity Higher density Lower cost Slower random access (off-chip memory) 13 Credits: J.Leverich, Stanford

14 SDRAM Architecture An SDRAM is organized in banks, rows and columns A row buffer in each bank stores a currently active (open) row Interface has a command bus, address bus, and a data bus Buses shared between banks to reduce the number of off-chip pins 14

15 Basic SDRAM Operation Memory map decodes address to bank, row, and column Row is activated and copied into the row buffer of the bank Read bursts and/or write bursts are issued to the active row Programmed burst length (BL) of 4 or 8 words Row is precharged and stored back into the memory array 15

16 Timing Constraints Timing constraints determine schedulability of commands More than 20 constraints on minimum time between commands E.g. activate-to-activate, activate-to-read/write, read/write-toprecharge, read-to-write, write-to-read, etc. Timing constraints for a DDR2-400 device 16

17 Memory Efficiency Execution times of requests are variable and traffic dependent Can vary by an order of magnitude Three reasons for overhead cycles: Activating and precharging (opening and closing) rows Switching direction of data bus from read to write Refreshing the memory Memory efficiency The fraction of clock cycles when requested data is transferred The exchange rate between peak bandwidth and net bandwidth High efficiency is required since bandwidth is a scarce resource 17

18 Bank-level Parallelism Enable DRAM access from different banks in parallel Reduces memory access latency and improves efficiency! 18 Credits: J.Leverich, Stanford

19 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 19

A General Memory Controller Architecture A general memory controller architecture consists of two parts The front-end buffers requests and responses per memory client

20 A General Memory Controller Architecture A general memory controller architecture consists of two parts The front-end buffers requests and responses per memory client schedules requests for memory access is independent of the memory type The back-end translates scheduled requests into SDRAM command sequence is dependent on the memory type 20

21 Front-End Front-end provides buffering and arbitration Arbiter can schedule requests in many different ways Round-Robin, Time-Division Multiplexing (TDM), priority scheduling Scheduled requests are sent to the back-end 21

Memory Map Back-end contains a memory map and a command generator Memory map decodes logical address to physical address Physical address is (bank, row, column) Decoding

22 Memory Map Back-end contains a memory map and a command generator Memory map decodes logical address to physical address Physical address is (bank, row, column) Decoding is done by slicing the bits in the logical address Many possibilities choice affects efficiency and power Logical addr. Memory map Physical addr. 0x10FF00 (2, 510, 128) 22

23 Command Generator Command generator generates and schedules commands Customized for a particular memory generation Programmable to handle different timing constraints 23

24 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 24

25 Firm Real-Time Controllers FRT requirements must be satisfied even in worst-case scenario Typical goals of firm real-time controllers: Maximize the worst-case net bandwidth Minimize the worst-case response time A trade-off between the two, since they are contradictory 25

26 Locality in FRT Controllers SDRAM performance is highly dependent on locality Request served quickly if it targets an open row No overhead of opening and closing rows FRT controllers are typically unable to exploit locality Locality has to be guaranteed also in worst case Difficult for a single executing application Requires intimate knowledge of memory accesses More or less impossible for multiple concurrent applications Memory accesses mixed by memory arbiter 26

27 Close-Page Policy FRT controllers use close-page policies [Paolieri, Reineke] Precharge banks immediately after each request Assumes that every request targets closed rows Benefits of policy Reduces worst-case overhead of opening/closing rows Increases guaranteed net bandwidth Drawbacks of policy Sacrifices best and average-case performance and power Limits best-case efficiency of DDR3-800 with 64B requests to 80% 27

28 Statically Scheduled Controllers Controllers are classified as statically or dynamically scheduled Depends on SDRAM command scheduling mechanism Statically scheduled controllers [Bayliss] Pre-compute SDRAM schedule at design time Bandwidth and execution time bounded by inspecting schedule Suitable for FRT requirements Restricted to applications with well-specified memory behavior 28

29 Dynamically Scheduled Controllers Dynamically scheduled FRT controllers [Lee, Paolieri, Shao] Schedule commands at run-time based on incoming requests Challenge is to analyze command scheduler Required to bound net bandwidth and execution times Analysis often assumes large fixed-size requests [Paolieri, Reineke] Large enough to exploit maximum bank-level parallelism by interleaving Requires B requests depending on memory device 29

30 Predictable Arbitration These FRT controllers all have bounded execution time Bounding response times requires predictable arbitration Bounds number of interfering requests from other memory clients Different controllers uses different arbiters Statically scheduled controllers uses a static schedule [Paolieri] employs Round-Robin arbitration Targeting homogeneous chip multi-processors [Akesson] supports a variety of predictable arbiters E.g. (Weighted) Round-Robin, Credit-Controlled Static-Priority, and Frame-Based Static-Priority Targets heterogeneous MPSoCs 30

31 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 31

32 Soft/No Real-Time Controllers Same controllers normally used for SRT/NRT requirements Dynamically scheduled high-performance controllers SRT applications are verified by simulation rather than formally Firm transaction-level guarantees are not necessary Sufficient to satisfy application-level deadlines with high probability May correspond to thousands of memory requests Typical goals of soft/no real-time controllers: Maximize the average net bandwidth Minimize the average response time A trade-off between the two, since they are contradictory 32

33 Locality in SRT Controllers SRT controllers do not have to guarantee locality Requires locality to offset miss penalties with high probability Open-page policies are common in SRT controllers Rows are speculatively kept open to exploit locality Average efficiency is hence typically higher than for FRT controllers Best-case memory efficiency is hence around 98% All requests are either reads or writes to the same row Efficiency losses only due to mandatory refresh activities 33

34 Flexibility SRT controllers are flexible and supports most memory traffic SRT controllers are dynamically scheduled Do not require formal analysis of supported memory traffic Enables supports of e.g. variable request sizes Fine-grained scheduling at level of single SDRAM bursts Reduces wasted data when serving small requests Reduces response times of sensitive clients Low worst-case memory efficiency Cannot guarantee locality or bank-level parallelism Worst-case efficiency about 16% for DDR3-800 with BL=8 words 34

35 Improving Memory Efficiency Memory efficiency is optimized using sophisticated mechanisms Preference for requests that target open rows Reduces overhead of opening and closing rows Increases response times for clients targeting closed rows Read/write grouping Reduces read/write switching overhead Increases response times for requests in wrong direction 35

36 Reducing Response Times Preference for reads over writes [Shao] Reads are often blocking while writes are posted Reduces stall cycles on processor Preemption of low-priority requests in service [Lee] Reduces response times of high-priority clients Increases response times of low-priority clients Reduces memory efficiency due to preemption overhead Interactions between mechanisms are complex Difficult to derive useful bounds on bandwidth and response times May even be difficult to guarantee the default 16% net bandwidth 36

37 Controller Types Firm Real-Time Controllers Soft/No Real-Time Controllers Mixed Real-Time Controllers 37

38 Mixed Real-Time Controllers MRT controllers must efficiently support FRT, SRT and NRT Current FRT controllers treat SRT/NRT clients like FRT clients Expensive both in terms of bandwidth and power Current SRT/NRT controllers treat FRT like SRT/NRT clients Guarantees are either not formally proven or very pessimistic Worst-case may be maximum observed case plus a safety margin Deadlines may be missed in corner cases MRT controllers are likely to evolve from current controllers Either from FRT controllers or SRT/NRT controllers 38

39 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 39

40 Positioning Predator is a mixed real-time hybrid controller Evolved from a firm real-time design Combines static and dynamic scheduling Statically computed memory patterns (sub-schedules) that are dynamically scheduled Two key ideas to predictability 1. Predictable WCET by using memory patterns 2. Predictable WCRT by using predictable arbitration 40

Predictable SDRAM Predictability through precomputed memory patterns Patterns are precomputed sequences of SDRAM commands There are six types of memory patterns Read, write, r/w switch, w/r switch,

41 Predictable SDRAM Predictability through precomputed memory patterns Patterns are precomputed sequences of SDRAM commands There are six types of memory patterns Read, write, r/w switch, w/r switch, refresh, and power down Dynamic request to pattern mapping: Read request read pattern (potentially first w/r switch) Write request write pattern (potentially first r/w switch) Refresh pattern issued when required Power down when idle 41

42 Memory Patterns Patterns enable scheduling at higher level than commands Less state and fewer constraints, making them easier to analyze Bounding memory efficiency Worst sequence of patterns is known (mapping & pattern lengths) Data transferred by patterns is known (by definition) Sequence of patterns for a DDR2-400 memory 42

43 Predictable Front-End Arbitration Controller works with any predictable front-end arbiter Example: Round-Robin, TDM, or our own CCSP arbiter Bounding response times Number of interfering requests is known (arbiter analysis) Worst-case request to pattern mapping is known (mapping) Pattern to cycle mapping is known (pattern lengths) Design provides bounds on net bandwidth and response times For any combination of supported memory and arbiter 43

Controller Architecture Front-end components Atomizer splits large requests into smaller pieces Delay Block contains request and response buffers Resource Bus schedules

44 Controller Architecture Front-end components Atomizer splits large requests into smaller pieces Delay Block contains request and response buffers Resource Bus schedules client according to arbiter policy Configuration Bus makes components run-time programmable Back-end Different components for SRAM and DRAM Real-Time Memory Controller 44

45 Pattern Structure Structure of basic access patterns (read and write patterns) At least one activate and precharge command per bank Access patterns of same type must be independent Banks are precharged immediately after access (close-page policy) Example read pattern for a DDR2-400 memory 45

46 Access Pattern Parameters There are three key access pattern parameters Determined at design time Number of banks to interleave (BI) request over Determines bank parallelism Memory efficiency and power consumption increase with BI Number of bursts to each bank, called burst count (BC) Number of words in a burst, the burst length (BL) BC & BL determine amount of data transferred per bank Increasing BC & BL amortizes overhead and increases efficiency 46

47 Access Granularity Parameters determine access granularity (AG) of the memory Amount of data accessed by a pattern AG = BI x BC x BL x word size Execution time of a request is proportional to AG Large requests increase WCET and WCRT Parameters are trade-off between efficiency, WCRT and power Automatically configured based on application requirements Read pattern for 16-bit DDR2-400 BI=4, BC=1, BL=8 AG=64 bytes 47

48 FRT Bandwidth / Power Trade-Off Bandwidth/ power trade-off for FRT applications For different values of BI and BC (BL=8) Labels denote BI 48

Conservative Open-Page Policy Controller uses a conservative open-page policy Page kept open as long as possible without sacrificing WCET Enables limited locality to be exploited Reduces execution

49 Conservative Open-Page Policy Controller uses a conservative open-page policy Page kept open as long as possible without sacrificing WCET Enables limited locality to be exploited Reduces execution time and power consumption Policy introduces three new types of read and write patterns With / without activates and precharges, respectively Pattern is chosen when it is known if following request hits or misses Worst-case analysis is only based on miss patterns Request arrivals: Request 1 Request 2 Request 3 Request 4 Request 5 (requests of the same color use the same page) Close-page policy ACT Read 1 PRE ACT Read 2 PRE ACT Read 3 PRE ACT Read 4 PRE ACT Read 5 PRE Open-page policy ACT Read 1 Read 2 Read 3 PRE ACT Read 4 PRE ACT Read 5 Conservative open-page policy ACT Read 1 Read 2 PRE ACT Read 3 PRE 49 ACT Read 4 PRE ACT Read 5 PRE

50 SRT Execution Time / Power Trade-Off SRT application is verified by SystemC system simulation Traffic generator running SRT H.263 trace with 32 B cache lines 2 FRT applications modeled with synthetic traffic (64, 128 B) Results normalized to best configuration Conservative OP policy reduces power and execution time Legend: Labels: (BI, BC) Colored points = close-page policy Thin symbol = violates power budget 50

51 Presentation Outline Introduction Memory Overview Memory Controller Architectures Predator SDRAM Controller Conclusions 51

52 Conclusions Complex SoCs have mixed real-time (MRT) requirements Mix of firm (FRT), soft (SRT), and no real-time (NRT) requirements There are suitable controllers for FRT and SRT/NRT, but not MRT Firm real-time controllers Maximize bandwidth bound and minimize response time bound Static, dynamic, or hybrid SDRAM command scheduling Close-page policies to reduce miss penalty Predictable arbitration Soft/no real-time controllers Maximize average bandwidth and minimize average response time Dynamically scheduled with sophisticated mechanisms Open-page policies to exploit locality 52

53 Conclusions We propose a mixed real-time SDRAM controller Predictability enables formal verification of FRT requirements Achieved by memory patterns and predictable arbitration Configurable trade-off between efficiency, response time, and power Uses a conservative open-page policy to exploit locality for SRT/NRT 53

Acknowledgements This work would not have been possible without the memory team

Eindhoven) Manil Dev Gomony (TU Eindhoven) Yonghui Li (TU Eindhoven) Anna

(TU Eindhoven) Jasper Kuijsten (TU Eindhoven) Getachew Teshome (TU Delft) Eelke

Foroutan (TIMA Laboratories) Gervin Thomas (TU Berlin) Winston Siauw (TU Delft)

54 Acknowledgements This work would not have been possible without the memory team Kees Goossens (TU Eindhoven) Karthik Chandrasekar (TU Delft) Sven Goossens (TU Eindhoven) Manil Dev Gomony (TU Eindhoven) Yonghui Li (TU Eindhoven) Anna Minaeva (CTU Prague) Tim Kouters (TU Eindhoven) Williston Hayes jr. (TU Eindhoven) Jasper Kuijsten (TU Eindhoven) Getachew Teshome (TU Delft) Eelke Strooisma (TU Delft) Markus Ringhofer (TU Graz) Anand Khot (TU Delft) Sahar Foroutan (TIMA Laboratories) Gervin Thomas (TU Berlin) Winston Siauw (TU Delft) This research is supported by EU grants T-CREST, FlexTiles, Cobra, & NL grant NEST. CA104 COBRA 54

55 References [Akesson] B. Akesson and K. Goossens. Architectures and Modeling of Predictable Memory Controllers for Improved System Integration. In Proc. DATE, 2011 [Bayliss] S. Bayliss and George A. Constantinides. "Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search. In Proc. FPT, 2009 [Lee] K. Lee, T. Lin, and C. Jen. An efficient quality-aware memory controller for multimedia platform SoC. In IEEE Trans. on Circuits and Systems for Video Technology,15(5), 2005 [Paolieri] M. Paolieri, E. Quinones, F. Cazorla, and M. Valero. An Analyzable Memory Controller for Hard Real-Time CMPs. In: Embedded Systems Letters, IEEE, 1(4), [Reineke] J. Reineke, Isaac Liu, Hiren Patel, Sungjun Kim, and Edward Lee. PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation. In: Proc. CODES+ISSS, 2011 [Shao] J. Shao and B. Davis. A burst scheduling access reordering mechanism. In Proc. HPCA,

56 Questions Thank you for your attention! Any questions? For the full story, please consult our book Memory Controllers for Real-Time Embedded Systems, available from Springer. 56

An introduction to SDRAM and memory controllers. 5kk73

An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part