Trends in Embedded System Design

2 Trends in Embedded System Design
MPSoC design gets increasingly complex
  Moore's law enables increased component integration
  Digital convergence creates a market for highly integrated devices
The resulting embedded systems
  have a large number of IP components
  run many different independent applications
System lifetime is decreasing
  Pressure to reduce cost and time to market

3 Example SoC
Philips Viper2 (PNX8550)
  Set-top box
Heterogeneous architecture
  MIPS processor + 2x TriMedia VLIW + hardware accelerators
Centralized memory with non-uniform memory access latency
  Central SDRAM memory
  Processors have caches
  Shared memory communication
Interconnect
  Direct wires for processors
  PMA + buses for accelerators
0.13 µm technology
~50 M transistors
> 70 IP blocks
[Figure: block diagram of the Viper2 IP blocks with their master (M) and slave (S) ports]

4 Real-time requirements
Embedded applications often have real-time requirements
  A certain computation must be finished before a deadline
There are three different types of real-time requirements
Hard real-time requirements
  Missing a deadline causes significant quality degradation (media application)
  Deadlines may be safety critical (automotive application)
Soft real-time requirements
  Missing deadlines has moderate impact on quality
  OK to miss some deadlines, but not too many
  Example: MP3 player
No real-time requirements
  Applications do not have deadlines
  Example: Graphical user interface

5 Verification
Applications share resources in the system to reduce cost
  Processors, memories, interconnect, etc.
Resource sharing results in interference between applications
  Timing behaviors of applications in a use-case are inter-dependent
Verification is typically done by system-level simulation
  All use-cases must be verified instead of all applications
  Verification must be repeated if applications are added or modified
  Slow process with poor coverage
Verification is costly and effort is expected to increase in future!

6 Formal Verification
Formal verification is an alternative to simulation
  Provides analytical bounds on latency or throughput
  Covers all combinations of concurrently running applications
Approach requires predictable systems
  Need performance models of both applications and hardware
  Application models are not always available
  Hardware typically not designed with formal analysis in mind
Interconnects and SRAM controllers with performance models exist
  Problem remains for complex resources, such as SDRAM controllers

7 Problem Statement
Current trends make it increasingly difficult to design embedded systems that satisfy application requirements
An SDRAM controller is required that satisfies the real-time requirements of embedded applications
The controller should reduce verification effort by
  enabling formal verification of real-time requirements
  enabling independent verification by simulation

8 Presentation Outline
Introduction
SDRAM overview
Predictable SDRAM controller
Composable SDRAM controller
Conclusions

9 About SDRAM
SDRAM is an essential system component
  Enables large storage capacity at low cost
  DRAM cell has 1 transistor and 1 capacitor vs. 6 transistors for SRAM
  A bit is represented by a high or low charge on the capacitor
  Charge dissipates due to leakage, hence the term dynamic RAM
  Capacity of up to a gigabyte per chip
SDRAM is off-chip memory
  Long access time compared to SRAM
  Off-chip pins are expensive in terms of area and power
  SDRAM bandwidth is scarce and must be efficiently utilized

10 SDRAM Architecture
An SDRAM is organized in banks, rows and columns
  A row buffer stores a currently active (open) row
Interface has a command bus, address bus, and a data bus
  Buses shared between banks to reduce the number of off-chip pins
Typical values DDR2/DDR3:
  4 or 8 banks
  8K to 65K rows / bank
  1K to 2K columns / row
  4, 8, 16 bits / column
  MHz
  32 MB to 1 GB density

11 Basic SDRAM Operation
Memory map decodes address to bank, row, and column
Row is activated and copied into the row buffer of the bank
Read bursts and/or write bursts are issued to the active row
  Programmed burst length (BL) of 4 or 8 words
Row is precharged and stored back into the memory array
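
The memory-map step can be sketched in Python as follows. This is a minimal illustration, not the controller's actual memory map: the geometry (4 banks, 8K rows, 1K columns), the row:bank:column bit ordering, and the name decode_address are all assumptions made for the example.

```python
# Hypothetical geometry, chosen from within the typical DDR2/DDR3 ranges.
NUM_BANKS = 4
NUM_ROWS = 8 * 1024
NUM_COLS = 1024


def decode_address(word_addr):
    """Split a word address into (bank, row, column).

    Assumed layout, most to least significant: row, bank, column.
    """
    if not 0 <= word_addr < NUM_BANKS * NUM_ROWS * NUM_COLS:
        raise ValueError("address outside the modelled memory")
    column = word_addr % NUM_COLS
    bank = (word_addr // NUM_COLS) % NUM_BANKS
    row = word_addr // (NUM_COLS * NUM_BANKS)
    return bank, row, column
```

With this ordering, consecutive blocks of 1K words fall into different banks, which is one common way to spread accesses over banks; other orderings are equally valid.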

12 Timing Constraints
Timing constraints determine when commands can be scheduled
  Typically minimum delays between commands
  Different delays for different memories
  More than 20 constraints to consider
Limits the efficiency of memory accesses
  Precharge, activate and read/write commands before data on bus

Parameter                   Abbr.   Cycles
ACT to RD/WR                tRCD    3
ACT to ACT (diff. banks)    tRRD    2
ACT to ACT (same bank)      tRAS    12
Read latency                tRL     3
RD to RD                    -       BL/2
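
As a rough sketch of how such minimum delays restrict scheduling, the Python fragment below computes the earliest cycle at which a command may be issued. It models only three of the constraints listed above, using the cycle counts from the slide; the constant names, the history format, and the function earliest_issue are assumptions made for illustration, and a real controller must check all constraints of its memory device.

```python
# Minimum delays in cycles, as listed on the slide (names are hypothetical).
ACT_TO_RDWR = 3        # ACT to RD/WR on the same bank
ACT_TO_ACT_DIFF = 2    # ACT to ACT on different banks
ACT_TO_ACT_SAME = 12   # ACT to ACT on the same bank


def earliest_issue(cmd, bank, history):
    """Return the earliest cycle at which (cmd, bank) may be issued.

    history is a list of (cycle, cmd, bank) tuples for commands already
    issued. Only the three constraints above are modelled.
    """
    earliest = 0
    for cycle, prev_cmd, prev_bank in history:
        if prev_cmd != "ACT":
            continue
        if cmd in ("RD", "WR") and prev_bank == bank:
            earliest = max(earliest, cycle + ACT_TO_RDWR)
        elif cmd == "ACT" and prev_bank == bank:
            earliest = max(earliest, cycle + ACT_TO_ACT_SAME)
        elif cmd == "ACT":
            earliest = max(earliest, cycle + ACT_TO_ACT_DIFF)
    return earliest
```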

13 Problem with SDRAM
The time to serve a request is variable and traffic dependent
  Is the appropriate row already open in the bank?
  Is the data bus in read or write mode?
  Is it time to refresh the memory?
  This makes it difficult to satisfy latency requirements
Offered bandwidth is also variable
  Difficult to guarantee that bandwidth requirements are satisfied
Worst-case bandwidth is very low
  Less than 40% of peak bandwidth for all DDR2 devices
  Lower for faster memories, such as DDR3
Efficiently satisfying hard real-time requirements is challenging!

15 Enabling Formal Verification
Predictable systems enable formal verification
  A system in which there is a useful bound on temporal behavior
Bounding the temporal behavior of a memory controller involves bounding bandwidth and latency
Next, we look at how this is done with current controllers

16 Statically Scheduled Controllers
Some memory controllers are statically scheduled
  Execute a static sequence of SDRAM commands
Statically scheduled controllers are:
predictable
  Latency of requests and available net bandwidth can be computed
  Formal verification possible
inflexible
  Cannot adapt to variations in traffic, such as changes in requested bandwidth, read/write ratio, etc.
not scalable
  Number of schedules to create, store and verify explodes

17 Dynamically Scheduled Controllers
Other controllers are dynamically scheduled
  SDRAM commands are scheduled dynamically at run-time
Dynamically scheduled controllers are:
flexible
  Adapt to variations in traffic
efficient
  Can reorder requests to fit the memory state
unpredictable
  Difficult to provide analytical bounds on bandwidth and latency
  Typically verified by simulation

18 Hybrid Controller
We propose a hybrid SDRAM controller
  Combines elements of predictable and flexible memory controllers
Increases flexibility compared to existing predictable controllers
  Does not have to know exact memory traffic at design time
  Possibly with a reduction in efficiency
[Figure: static, dynamic, and hybrid controllers compared on efficiency, predictability, and flexibility]

19 Predictable SDRAM
Predictability through precomputed memory access patterns
  Patterns are precomputed sub-schedules of SDRAM commands
There are five types of memory access patterns
  Read, write, r/w switch, w/r switch, and refresh patterns
Pattern to request mapping:
  Read request -> read pattern (potentially first w/r switch)
  Write request -> write pattern (potentially first r/w switch)
  Refresh pattern issued when required
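
The request-to-pattern mapping can be sketched in Python as below. This is an illustrative reading of the three rules on the slide, not the controller's implementation: the function name, the refresh_before parameter (request indices before which a refresh is assumed to be due), and the choice that the first request needs no switch pattern are all assumptions.

```python
def map_requests_to_patterns(requests, refresh_before=frozenset()):
    """Translate a sequence of 'read'/'write' requests into patterns.

    refresh_before is a hypothetical set of request indices before which
    a refresh pattern is due; a real controller derives this from a timer.
    """
    patterns = []
    direction = None  # assumption: no switch needed before the first request
    for index, request in enumerate(requests):
        if index in refresh_before:
            patterns.append("refresh")
        if request == "read":
            if direction == "write":
                patterns.append("w/r switch")
            patterns.append("read")
        elif request == "write":
            if direction == "read":
                patterns.append("r/w switch")
            patterns.append("write")
        else:
            raise ValueError("unknown request type: %r" % (request,))
        direction = request
    return patterns
```

For example, the request sequence read, read, write, read yields the patterns read, read, r/w switch, write, w/r switch, read.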

20 Memory Patterns
Patterns enable scheduling at a higher level than commands
  Less state and fewer constraints, making them easier to analyze
Read/write patterns are efficient without relying on locality
  Requires large requests (64 bytes for 16-bit memory with 4 banks)
Memory patterns let us provide a lower bound on bandwidth
  E.g. 660 MB/s from a 16-bit DDR2-400 with peak rate 800 MB/s (82%)
Read pattern for DDR
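
The figures on this slide can be checked with a few lines of Python. The sketch assumes that a request accesses every bank with one burst of length 8, which is one way to arrive at 64 bytes; the slide does not state this, and the function names are hypothetical.

```python
def request_size_bytes(width_bits, burst_length, banks):
    """Bytes moved when each bank is accessed with one burst (assumption)."""
    return width_bits // 8 * burst_length * banks


def peak_bandwidth_mb_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth in MB/s: bytes per transfer times transfer rate."""
    return width_bits // 8 * mega_transfers_per_s


def efficiency(guaranteed_mb_s, peak_mb_s):
    """Fraction of the peak bandwidth that is guaranteed."""
    return guaranteed_mb_s / peak_mb_s


size = request_size_bytes(width_bits=16, burst_length=8, banks=4)   # 64 bytes
peak = peak_bandwidth_mb_s(width_bits=16, mega_transfers_per_s=400)  # 800 MB/s
share = efficiency(660, peak)                                        # 0.825
```

The guaranteed 660 MB/s is therefore 82.5% of the 800 MB/s peak, which the slide quotes as 82%.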

21 Pattern generation
Memory patterns used to be made by hand
  Different patterns for different memories
  All timing constraints considered by designer
  Time consuming and error prone
Now patterns are automatically generated by a tool
  Created to satisfy all timing constraints, given a specification
Three heuristic algorithms
  Exhaustive algorithm may run for weeks, months, or years
  Other algorithms find efficient patterns in less than a second

22 Predictable Front-end Arbitration
Controller and analysis support any predictable arbiter
  Example: Round-Robin, TDM, or our own priority-based arbiter
  Latency computed in number of interfering requests
Latency bound in clock cycles is easily derived since:
  Request to pattern mapping is known (scheduling rules)
  Pattern to cycle mapping is known (length of patterns)
Design provides bounds on latency and bandwidth
  For any combination of DDR2/DDR3 memory and supported arbiter
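
A conservative way to turn a count of interfering requests into a cycle bound is sketched below in Python. This is not the analysis from the presentation: the pattern lengths are invented for the example, every request is pessimistically assumed to need a switch pattern, and the number of interfering refreshes is a parameter.

```python
# Hypothetical pattern lengths in clock cycles (not taken from the slides).
PATTERN_CYCLES = {
    "read": 16,
    "write": 16,
    "r/w switch": 2,
    "w/r switch": 4,
    "refresh": 32,
}


def worst_case_request_cycles(cycles=PATTERN_CYCLES):
    """Longest pattern sequence a single request can map to."""
    return max(cycles["w/r switch"] + cycles["read"],
               cycles["r/w switch"] + cycles["write"])


def latency_bound_cycles(interfering_requests, refreshes=1,
                         cycles=PATTERN_CYCLES):
    """Upper bound on cycles until a request finishes.

    Counts the interfering requests plus the request itself, each at its
    worst-case length, plus the assumed number of refresh patterns.
    """
    per_request = worst_case_request_cycles(cycles)
    return (interfering_requests + 1) * per_request + refreshes * cycles["refresh"]
```

With these example lengths, three interfering requests and one refresh give a bound of 4 * 20 + 32 = 112 cycles. The arbiter determines the number of interfering requests; the pattern lengths do the rest.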

24 Isolating Applications
Formal verification only applies if there is an application model
  Composability offers a complementary verification approach
System is composable if applications cannot interfere with each other
  Cannot affect each other's timing by even a single clock cycle
Composable systems provide isolation between applications
  Applications can be composed like pieces in a puzzle

25 Composability
Composability simplifies verification for three main reasons:
1. Applications can be verified by simulation in isolation
   Linear instead of exponential verification complexity
2. Increases simulation speed
   Only need to simulate application and its required resources
3. Incremental verification process
   Verification process can start when first application is available
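
The difference between linear and exponential verification complexity can be made concrete with a short Python sketch. It assumes that any non-empty subset of the applications may run concurrently and therefore counts as a use-case; the function name and application names are hypothetical.

```python
from itertools import combinations


def use_cases(applications):
    """All non-empty subsets of applications (assumed possible use-cases)."""
    return [subset
            for size in range(1, len(applications) + 1)
            for subset in combinations(applications, size)]


apps = ["audio", "video", "gui", "network"]
without_composability = len(use_cases(apps))  # 2**4 - 1 = 15 simulations
with_composability = len(apps)                # 4 simulations, one per application
```

Adding a fifth application roughly doubles the first number (to 31) but adds only one simulation in the composable case.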

26 Existing Approaches
There are currently three approaches to composable systems:
1. Not sharing any resources
   Trivially composable, but prohibitively expensive for SDRAM
2. Statically schedule all resource accesses at design time
   Requires a global notion of time
   Limited to applications that can be statically scheduled
3. Share resources at run-time using TDM
   Requires resource with constant execution time
   Cannot efficiently satisfy tight latency requirements

27 Overview of Approach
We propose a fourth approach to composability
  Idea is to delay responses and flow control until worst case
  Emulates maximum interference from other applications
  Makes applications temporally independent
Our approach to composability is based on predictability
  Cannot emulate worst-case interference unless you know what it is
Worst-case finishing time of a requestor in a composable system is unaffected when another requestor changes behavior.

28 Benefits of Approach
Makes no assumptions about the applications
  Works with all applications that cannot be formally verified
Works with any combination of predictable resource and arbiter
  Traditional approaches do not support SDRAM controllers or priority-based arbitration
Composability can be enabled/disabled per application at run-time
  Enables slack to improve performance of non-real-time applications
More efficient than using TDM
  Does not require constant (worst-case) execution time per request
  Memory never has to idle

29 Composable Resource Front End
Implemented by adding a Delay Block to the front-end
  Computes worst-case scheduling time and finishing time at arrival
  Worst-case scheduling time used to delay flow-control signal
  Responses held in block until worst-case finishing time
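
The behavior of the Delay Block can be sketched in Python as follows. This is a simplified model under stated assumptions: the worst-case scheduling and finishing times are taken to be fixed offsets from the arrival time, whereas the actual bounds depend on the arbiter and on earlier requests; the class, field, and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelayBlock:
    sched_bound: int   # assumed worst-case cycles from arrival to scheduling
    finish_bound: int  # assumed worst-case cycles from arrival to finishing

    def on_arrival(self, arrival):
        """Compute (worst-case scheduling time, worst-case finishing time)."""
        return arrival + self.sched_bound, arrival + self.finish_bound

    def release_response(self, arrival, actual_finish):
        """Cycle at which the response leaves the block.

        The response is held until the worst-case finishing time, so the
        result does not depend on how early the resource actually finished.
        """
        _, worst_case_finish = self.on_arrival(arrival)
        if actual_finish > worst_case_finish:
            raise ValueError("resource exceeded its worst-case bound")
        return worst_case_finish
```

Whether the resource finishes a request that arrived at cycle 100 in cycle 120 or in cycle 135, a block with finish_bound=40 releases the response at cycle 140, so the requestor cannot observe the interference caused by other applications.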

31 Conclusions
We propose a predictable and composable SDRAM controller
  Addresses the real-time verification problem in SoCs
Predictability enables formal verification
  Covers all possible combinations of applications
  Achieved by memory patterns and predictable arbitration
  Requires performance model of applications and hardware
Composability enables verification by simulation in isolation
  Only covers simulated traces
  We emulate worst-case interference from other applications
  Approach makes no assumptions about the application


More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

Variability Windows for Predictable DDR Controllers, A Technical Report

Variability Windows for Predictable DDR Controllers, A Technical Report Variability Windows for Predictable DDR Controllers, A Technical Report MOHAMED HASSAN 1 INTRODUCTION In this technical report, we detail the derivation of the variability window for the eight predictable

More information

High-Performance DDR3 SDRAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba

High-Performance DDR3 SDRAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba Application Note: Virtex-5 FPGAs XAPP867 (v1.2.1) July 9, 2009 High-Performance DD3 SDAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba Summary Introduction DD3 SDAM Overview This application note

More information

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course

More information

The RM9150 and the Fast Device Bus High Speed Interconnect

The RM9150 and the Fast Device Bus High Speed Interconnect The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information
