Hands-On Workshop: Memory Configuration and Throughput

1 Hands-On Workshop: Memory Configuration and Throughput
FTF-AUT-F0343
Ioseph Martinez, Senior Applications Engineer
APR
External Use

2 Session Introduction
This session reviews the challenges of working with the latest MCUs for automotive instrument cluster and graphics systems.
Interconnect complexity and throughput requirements have increased for systems that run graphical applications.
Understanding memory system configuration is important because it helps you select the right part for your project.
The needs of multimedia and graphics projects can be underestimated or overestimated if the memory system of the part is not correctly understood.

3 Session Objectives
After completing this session, you will be able to:
  Calculate bandwidth requirements for different systems
  Differentiate between the types of masters and slaves in a system, and how they access or are accessed by the rest of the system
  Perform memory bandwidth stress tests to achieve peak bandwidth in Freescale Vybrid controllers

4 Agenda
Introduction to Vybrid Controllers and Next-generation Cluster Systems
QuadSPI Memory Theory and Practice
DDR DRAM Theory and Practice
Internal SRAM Theory and Practice
Session Closure

5 Vybrid R Series System

6 Next-generation Cluster System

7 Key Differences
Next-generation cluster systems have internal flash memory, while Vybrid processors don't.
Next-generation cluster DDR2 can be 32 bits wide, while the Vybrid processor's DRAM interface is 16 bits wide.
The Vybrid processor has an L2 cache controller.
Vybrid processor ports for internal SRAM are all AXI. In next-generation clusters, some are AXI and others are AHB.
The Vybrid processor operates the core and some other masters at 400 MHz; the next-generation cluster system operates them at 320 MHz. System frequency is 133 MHz (Vybrid R Series) and 160 MHz, respectively.

8 About the Masters
Masters initiate and drive accesses to the slaves.
The A5 and M4 cores can consume some of the bandwidth, but their caches relieve the system of most of that load.
Most opcodes require more than one cycle to execute. Load is reduced based on the type of encoding used.
Masters may operate at different frequencies, depending on whether they are clocked at the system frequency or at a multiple of it.
The latency and peak bandwidth of each master also depend on the slave being accessed.

9 2D-ACE: Display Controller
Bandwidth per layer = pixel clock * bytes per pixel
A maximum of 6 layers can be blended in a single pixel.
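
The per-layer formula above can be sketched in C as follows. This is only an illustration: the function names are made up for this example, and rounding the colour depth up to whole bytes is an assumption about how sub-byte formats would be stored.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store one pixel at the given colour depth, rounded up
 * to whole bytes (assumption for depths that are not a multiple of 8). */
uint32_t bytes_per_pixel(uint32_t bits_per_pixel)
{
    assert(bits_per_pixel > 0u);
    return (bits_per_pixel + 7u) / 8u;
}

/* Per-layer fetch bandwidth in MByte/s:
 * pixel clock (MHz) * bytes per pixel. */
uint32_t layer_bandwidth_mbytes_per_s(uint32_t pixel_clock_mhz,
                                      uint32_t bits_per_pixel)
{
    return pixel_clock_mhz * bytes_per_pixel(bits_per_pixel);
}
```

For example, `layer_bandwidth_mbytes_per_s(9, 16)` gives the requirement of one 16 bpp layer on a 9 MHz pixel clock.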

10 Graphics Processing Unit: OpenVG 1.1
Full fixed-function hardware vector graphics GPU
Hardware tessellation: minimum CPU involvement
16x FSAA: photorealistic quality
Multiformat rendering
High-quality vector font rendering
Standard API: OpenVG 1.1
Output bandwidth = sysfreq * pixels
  200 Mpixels for Vybrid processor
  160 Mpixels for next-generation cluster
Input bandwidth = 4 x output bandwidth
[Block diagram labels: GC355 GPU Core, AHB, AXI, Host Interface, Memory Controller, Graphics Pipeline Front End, Vector Graphics Engine, Imaging Engine, VG Pixel Engine]
r0: 23-Sep-13

11 About the Slaves
Slaves are passive elements that stand by until a master in the system accesses them.
Some slaves are read-only, while others are read/write. Read/write can double the bandwidth.
Some slaves, such as external DRAM and QSPI, have higher latency for random accesses than others.
Some slaves have more than one instance of the same module.

12 Agenda
Introduction to Vybrid Controllers and Next-generation Cluster Systems
QuadSPI Memory Theory and Practice
DDR DRAM Theory and Practice
Internal SRAM Theory and Practice
Session Closure

13 QSPI Features
Dual QuadSPI architecture supports:
  Two external serial flashes per QuadSPI module
  Programmable sequence engine compatible with any serial flash
  Up to 4 chip selects
QuadSPI can control 2 x 4-bit serial flashes:
  Individual flash mode
  Parallel mode, enabling octal flash with data recombination internally in QuadSPI (reading only)
Flexible receive (Rx) buffering scheme:
  Sub-buffers allocated to specific masters
  Master prioritisation
  Pre-fetch capability
  Suspend and resume for lower-priority masters
Up to 100 MHz clock (200 MByte/s peak bandwidth) in next-generation cluster

14 QuadSPI Bandwidth
Serial interface bandwidth (b/w):
  Peak b/w = [66 MHz (sclk) * 4 (quad) * 2 (parallel mode) * 2 (DDR)] / [8 bits/byte] = 132 MByte/s
  Effective b/w: less than peak b/w, because of flash command overhead. The impact depends on the size of the data transferred.
[Timing diagram, in sclk cycles, from AXI read request to first databeat available on AHB. First access phases: Pre, Command, Addr, Mode, Dummy, Data, Post. Subsequent access phases: Pre, Addr, Mode, Dummy, Data, Post. One 64-bit databeat in 4 cycles.]
Effective bandwidth:
  Access 18-4 (same command in subsequent access)
  Effective b/w (128-byte access, XIP, 24-bit address) = (32/( )) % = MByte/s
  Effective b/w (128-byte access, XIP, 32-bit address) = 98.2 MByte/s
  Effective b/w (256-byte access, XIP, 32-bit address) = MByte/s
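
As a rough sketch of the two quantities on this slide, the C functions below compute the peak serial bandwidth and a simple effective-bandwidth model in which every burst pays a fixed number of overhead clock cycles. The model and the `overhead_cycles` parameter are assumptions made for illustration, since the slide's own cycle counts are not legible here. The 2 bytes per sclk figure follows from one 64-bit databeat in 4 cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Peak serial bandwidth in MByte/s:
 * sclk (MHz) * data lines * flashes in parallel * edges per clock / 8. */
uint32_t qspi_peak_mbytes_per_s(uint32_t sclk_mhz, uint32_t data_lines,
                                uint32_t parallel_flashes,
                                uint32_t edges_per_clock)
{
    return sclk_mhz * data_lines * parallel_flashes * edges_per_clock / 8u;
}

/* Effective bandwidth in kByte/s for one burst, assuming the burst costs
 * its data cycles plus a fixed overhead (command, address, mode, dummy).
 * overhead_cycles is a hypothetical parameter, not a datasheet value. */
uint32_t qspi_effective_kbytes_per_s(uint32_t peak_mbytes_per_s,
                                     uint32_t burst_bytes,
                                     uint32_t bytes_per_sclk,
                                     uint32_t overhead_cycles)
{
    assert(burst_bytes > 0u && bytes_per_sclk > 0u);
    uint32_t data_cycles = burst_bytes / bytes_per_sclk;
    return peak_mbytes_per_s * 1000u * data_cycles
           / (data_cycles + overhead_cycles);
}
```

With an assumed overhead of 22 cycles, a 128-byte burst at 2 bytes per sclk comes out at roughly 98.2 MByte/s, which happens to agree with the one effective figure that survives on the slide. Larger bursts amortise the same overhead over more data cycles, which is why the impact depends on transfer size.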

15 Laboratory 1, Part 1: QSPI: Flashing Memory
Step 1: Open Lab1.eww by double-clicking it
Step 2: Build the project (F7)
Step 3: Download the project (Ctrl+D)
Step 4: Debug the project and run it
Step 5: Wait for the program to erase and program the memories (this may take more than 30 seconds)
Step 6: Some colors will appear on the screen
How does it look?

16 Laboratory 1, Part 1: QSPI: Flashing Memory
Step 7: Break the code to debug: menu Debug > Break
Step 8: Go to menu View > Register and select DCU0
Step 9: Select DCU0_DIV_RATIO:DIV_RATIO
Step 10: Change the pixel clock to a lower frequency until the image looks correct on the screen (increase the value of the divider)
What is the divider value at which the image looks correct?
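
The trial-and-error search in Step 10 can also be estimated on paper. The sketch below looks for the smallest divider at which the display's fetch demand fits a memory bandwidth budget. It assumes the pixel clock is simply the source clock divided by the divider; the actual encoding of DIV_RATIO on the hardware may differ, and all names and numbers here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the smallest divider (1..max_divider) for which
 * pixel clock * bytes fetched per pixel fits the budget, or 0 if none does.
 * Assumes pixel_clock = source_clock / divider (illustrative only).
 * kHz * bytes gives kByte/s, so both sides use the same unit. */
uint32_t min_divider_for_budget(uint32_t source_clock_khz,
                                uint32_t bytes_per_pixel_all_layers,
                                uint32_t budget_kbytes_per_s,
                                uint32_t max_divider)
{
    assert(source_clock_khz > 0u && bytes_per_pixel_all_layers > 0u);
    for (uint32_t divider = 1u; divider <= max_divider; ++divider) {
        uint32_t pixel_clock_khz = source_clock_khz / divider;
        uint64_t demand = (uint64_t)pixel_clock_khz
                          * bytes_per_pixel_all_layers;
        if (demand <= budget_kbytes_per_s) {
            return divider;
        }
    }
    return 0u;
}
```

A result of 0 means no divider in the allowed range brings the demand under the budget, so the layer count or colour depth would have to change instead.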

17 Laboratory 1, Part 2: QSPI: Bits per Pixel
Step 1: Stop debugging. Open the image.c file.
Step 2: Comment out the following line: #define PROGRAM_GRAPHICS
Step 3: Select a different image with lower resolution by selecting (3) on the following line: #define IMGNUMBER (3)
Step 4: Rebuild, debug and run again
Step 5: If the image does not look correct, try to find a DCU0 clock divider at which it does.
What is the value at which the image looks correct?
Now try with: #define IMGNUMBER (1)

18 Screen Pixel Clock & QSPI Throughput
Screen pixel clock at 60 fps:
  WQVGA (480 x 272): approx. 9 MHz
  WVGA (800 x 480): approx. 32 MHz
QSPI clock max throughput:
  DDR, 8 bits at 100 MHz: 200 MB/s max in next-gen cluster
  DDR, 8 bits at 66 MHz: 132 MB/s max in Vybrid processor
Per-layer 2D-ACE required throughput:
  8 bpp at 9 MHz: 9 MB/s max. 22 layers can be blended in next-gen cluster, 14 layers in Vybrid processor
  16 bpp at 9 MHz: 18 MB/s max. 11 layers can be blended in next-gen cluster, 7 layers in Vybrid processor
  8 bpp at 32 MHz: 32 MB/s max. 6 layers can be blended in next-gen cluster, 4 layers in Vybrid processor
  16 bpp at 32 MHz: 64 MB/s max. 3 layers can be blended in next-gen cluster, 2 layers in Vybrid processor
(Theoretical/ideal use cases)
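
The layer counts on this slide follow from dividing the memory's peak throughput by the per-layer requirement and rounding down. A minimal sketch, assuming each row is a colour depth at a pixel clock and using an illustrative function name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of layers whose combined fetch demand fits the memory's peak
 * throughput: floor(memory MB/s / (pixel clock MHz * bytes per pixel)). */
uint32_t max_blend_layers(uint32_t memory_mbytes_per_s,
                          uint32_t pixel_clock_mhz,
                          uint32_t bits_per_pixel)
{
    assert(pixel_clock_mhz > 0u && bits_per_pixel >= 8u);
    uint32_t per_layer_mbytes_per_s = pixel_clock_mhz * (bits_per_pixel / 8u);
    return memory_mbytes_per_s / per_layer_mbytes_per_s;
}
```

These are bandwidth ceilings only. They ignore command overhead and any hardware limit on how many layers the display controller can blend in one pixel.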

19 Laboratory 1, Part 3: QSPI: 2D-ACE Blending
Step 1: Start over. Open the image.c file.
Step 2: Uncomment the following line: #define EXTRALAYER8BPP
Step 3: Rebuild, debug and run again
Does the image look correct?
Step 4: If the image does not look correct, try to find a DCU0 clock divider at which it does.

20 Laboratory 1, Part 3: QSPI: 2D-ACE Blending
Step 5: Stop debugging. Open the image.c file.
Step 6: Uncomment the following line: #define QUADREADS
Step 7: Select a different image with higher resolution by selecting (0) on the following line: #define IMGNUMBER (0)
Step 8: Rebuild, debug and run again
Does the image look correct?
Step 9: Stop debugging and uncomment the following line: #define EXTRALAYER16BPP
Step 10: Rebuild, debug and run again

21 Laboratory 1, Part 4: QSPI: Parallel Mode
Step 1: Start over. Open the image.c file.
Step 2: Uncomment the following line: #define PARALLELREADS
Step 3: Rebuild, debug and run again
Does the image look correct?

22 QuadSPI Memory Map: Serial and Parallel
Region   Start Address   End Address    Size (MB)
QSPI0    0x2000_0000     0x2FFF_FFFF    256
QSPI1    0x5000_0000     0x5FFF_FFFF    256
[Figure labels. Serial mode: flashes A1, A2, B1, B2 laid out from AMBA_BASE, with boundaries SFA1AD, SFA2AD, SFB1AD, SFB2AD. Parallel mode: A1 + B1 and A2 + B2 laid out from AMBA_BASE, with boundaries SFA2AD and SFB2AD.]
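
As a small illustration of the memory map above, the following C sketch classifies an address by QuadSPI region and checks the region size. The enum and function names are invented for this example, and the region bounds are taken directly from the table.

```c
#include <assert.h>
#include <stdint.h>

enum qspi_region { QSPI_NONE = 0, QSPI0_REGION = 1, QSPI1_REGION = 2 };

/* Maps a system address to the QuadSPI region that contains it. */
enum qspi_region qspi_region_of(uint32_t addr)
{
    if (addr >= 0x20000000u && addr <= 0x2FFFFFFFu) {
        return QSPI0_REGION;
    }
    if (addr >= 0x50000000u && addr <= 0x5FFFFFFFu) {
        return QSPI1_REGION;
    }
    return QSPI_NONE;
}

/* Size in MByte of an inclusive address range. */
uint32_t region_size_mbytes(uint32_t start, uint32_t end)
{
    assert(end >= start);
    return (uint32_t)(((uint64_t)end - start + 1u) >> 20);
}
```

A check of this kind can help confirm that an image buffer placed by the linker really lands in the flash region the lab expects.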

23 Laboratory 1, Part 4: QSPI: Parallel Mode
Step 4: Start over. Open the image.c file.
Step 5: Uncomment the following line: #define PROGRAM_GRAPHICS
Step 6: Rebuild, debug and run again
Step 7: Wait until something shows on the screen. This takes a while because the memory is being re-flashed.
Step 8: Close the debug session and comment out the line again: #define PROGRAM_GRAPHICS
Step 9: Rebuild, debug and run again
Does the image look correct?

24 Agenda
Introduction to Vybrid Controllers and Next-generation Cluster Systems
QuadSPI Memory Theory and Practice
DRAM Theory and Practice
Internal SRAM Theory and Practice
Session Closure

25 DRAM Controller
Next-generation cluster devices and Vybrid processors have different types of DRAM controllers:
  Next-gen: supports SDR (16-bit) and DDR (16/32-bit)
  Vybrid: supports LPDDR2 and DDR
On next-gen cluster devices, the A5, GPU and 2D-ACE have direct access to the DRAM for more efficient access.
Different data access patterns carry different penalties. Linear access is the most efficient.
Peak bandwidth is calculated as:
  Peak BW = Freq * BusWidth * Mode
  Mode = 2 for DDR; otherwise Mode = 1
Effective BW is complex to calculate, but it is acceptable to approximate it with an assumed efficiency level.
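
The peak formula and the efficiency approximation can be sketched in C as below. The names are illustrative, and the efficiency percentage is a placeholder that would have to come from measurement, such as the lab exercises in this session.

```c
#include <assert.h>
#include <stdint.h>

enum dram_mode { DRAM_SDR = 1, DRAM_DDR = 2 };

/* Peak BW in MByte/s = Freq (MHz) * BusWidth (bytes) * Mode. */
uint32_t dram_peak_mbytes_per_s(uint32_t clock_mhz, uint32_t bus_width_bits,
                                enum dram_mode mode)
{
    assert(bus_width_bits % 8u == 0u);
    return clock_mhz * (bus_width_bits / 8u) * (uint32_t)mode;
}

/* Effective BW approximated as a fixed percentage of peak. */
uint32_t dram_effective_mbytes_per_s(uint32_t peak_mbytes_per_s,
                                     uint32_t efficiency_percent)
{
    assert(efficiency_percent <= 100u);
    return peak_mbytes_per_s * efficiency_percent / 100u;
}
```

As a cross-check under assumed clocks, a 16-bit DDR interface at 400 MHz gives 1600 MByte/s, a 32-bit DDR interface at 320 MHz gives 2560 MByte/s, and a 16-bit SDR interface at 160 MHz gives 320 MByte/s. These match the DRAM throughput figures quoted in this deck, although the clock values themselves are inferred rather than stated.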

26 Screen Pixel Clock & DRAM Throughput
Screen pixel clock at 60 fps:
  WQVGA (480 x 272): approx. 9 MHz
  WVGA (800 x 480): approx. 32 MHz
DRAM clock max throughput:
  DDR: 2560 MB/s max in next-gen cluster
  DDR: 1600 MB/s max in Vybrid processor
  SDR: 320 MB/s max in next-gen cluster
Per-layer 2D-ACE required throughput:
  24 bpp at 9 MHz: 27 MB/s max. 94 layers can be blended in next-gen cluster, 59 layers in Vybrid processor, 11 with SDR memory
  32 bpp at 9 MHz: 36 MB/s max. 71 layers can be blended in next-gen cluster, 44 layers in Vybrid processor, 8 with SDR memory
  24 bpp at 32 MHz: 96 MB/s max. 26 layers can be blended in next-gen cluster, 16 layers in Vybrid processor, 3 with SDR memory
  32 bpp at 32 MHz: 128 MB/s max. 20 layers can be blended in next-gen cluster, 12 layers in Vybrid processor, 2 with SDR memory
(Theoretical/ideal use cases)

27 Laboratory 2: DRAM, Overhead
Step 1: Open Lab2.eww
Step 2: Build the project (F7)
Step 3: Download the project (Ctrl+D)
Step 4: Debug the project and run it
Step 5: Look at the serial console. What is the time spent on that function?
Step 6: Modify the size of the buffer: set BUFFERSMALLHEIGHT = 4
Step 7: Rebuild, debug and run again
What is the time spent on that function?

28 Laboratory 2: DRAM, GPU Write
Step 1: Start over. Open the image.c file.
Step 2: Comment out #define TESTOVERHEAD
Step 3: Uncomment #define TESTCLEAR
Step 4: Rebuild, debug and run again
What is the time spent on each of the two operations? Do the numbers make sense? What is the achieved BW?
Actual time = measured time - overhead
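
One way to turn the lab's console timings into an achieved bandwidth is sketched below. It applies the overhead correction from this slide and then divides the bytes moved by the corrected time. The function names and the use of microseconds are assumptions, since the lab's actual timer units are not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Actual time = measured time - overhead (both in microseconds). */
uint32_t actual_time_us(uint32_t measured_us, uint32_t overhead_us)
{
    assert(measured_us > overhead_us);
    return measured_us - overhead_us;
}

/* Achieved bandwidth in MByte/s. Bytes per microsecond equals
 * MByte/s when 1 MByte is taken as 10^6 bytes. */
uint32_t achieved_mbytes_per_s(uint64_t bytes_moved, uint32_t measured_us,
                               uint32_t overhead_us)
{
    return (uint32_t)(bytes_moved / actual_time_us(measured_us, overhead_us));
}
```

For instance, clearing an 800 x 480 buffer at 16 bpp writes 768000 bytes. If that hypothetically measured 1100 us with 100 us of overhead, the achieved bandwidth would be 768 MByte/s, which can then be compared with the theoretical peak.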

29 Laboratory 2: DRAM, GPU Write
Step 5: Start over. Open the image.c file.
Step 6: Uncomment #define TESTCLEAR
Step 7: Rebuild, debug and run again
What is the achieved BW for the 32bpp operations? How does it compare with the 16bpp operations?

30 Laboratory 2: DRAM, GPU Copy
Step 1: Start over. Open the image.c file.
Step 2: Uncomment #define TESTCOPY
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?

31 Laboratory 2: DRAM, GPU Blend
Step 1: Start over. Open the image.c file.
Step 2: Uncomment #define TESTBLEND
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?

32 Laboratory 2: DRAM, GPU Rotate
Step 1: Start over. Open the image.c file.
Step 2: Uncomment #define TESTROTATE
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?

33 Laboratory 2: DRAM, GPU QSPI
Step 1: Start over. Open the image.c file.
Step 2: Uncomment #define TESTQSPI
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?

34 Agenda
Introduction to Vybrid Controllers and Next-generation Cluster Systems
QuadSPI Memory Theory and Practice
DRAM Theory and Practice
Internal SRAM Theory and Practice
Session Closure

35 RAM Controller
On next-generation cluster devices there are two types of internal RAM:
  System RAM: uses AHB port
  Graphics RAM: uses AXI port
Peak bandwidth = Freq * BusWidth
Some features of the next-gen internal RAM controller:
  The 1.3 MByte graphics SRAM block does not natively support ECC
  FlexECC enables conversion of non-ECC SRAM into ECC SRAM
  1.3 MBytes of non-ECC SRAM converts to 1 MByte of ECC SRAM
  320 kBytes are sacrificed as a syndrome array
    128 kBytes contain the packed ECC syndromes
    192 kBytes become inaccessible
  A separate path from the RAM controller to the syndrome array allows data and syndrome to be fetched in parallel
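
The FlexECC sizes on this slide are consistent with one syndrome byte per eight data bytes. The sketch below reproduces the split under that assumption; the 1/8 ratio is inferred from the 128 kByte and 1 MByte figures rather than stated by the slide, and the struct and function names are illustrative. The 1344 kByte total is 1 MByte plus the 320 kBytes given up, which the slide rounds to 1.3 MByte.

```c
#include <assert.h>
#include <stdint.h>

struct flexecc_layout {
    uint32_t data_kbytes;      /* usable ECC-protected SRAM */
    uint32_t syndrome_kbytes;  /* packed ECC syndromes */
    uint32_t unused_kbytes;    /* remainder that becomes inaccessible */
};

/* Splits a non-ECC SRAM block into data, syndrome and unused parts,
 * assuming one syndrome byte per eight data bytes. */
struct flexecc_layout flexecc_partition(uint32_t total_kbytes,
                                        uint32_t data_kbytes)
{
    struct flexecc_layout layout;
    layout.data_kbytes = data_kbytes;
    layout.syndrome_kbytes = data_kbytes / 8u;
    assert(total_kbytes >= data_kbytes + layout.syndrome_kbytes);
    layout.unused_kbytes =
        total_kbytes - data_kbytes - layout.syndrome_kbytes;
    return layout;
}
```

Calling `flexecc_partition(1344, 1024)` yields 128 kBytes of syndromes and 192 kBytes unused, which together make up the 320 kBytes the slide describes as sacrificed.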

36 Screen Pixel Clock & SRAM Throughput
Screen pixel clock at 60 fps:
  WQVGA (480 x 272): approx. 9 MHz
  WVGA (800 x 480): approx. 32 MHz
SRAM clock max throughput:
  160 MHz: 1280 MB/s max in next-gen cluster
  133 MHz: 1064 MB/s max in Vybrid processor
Per-layer 2D-ACE required throughput:
  24 bpp at 9 MHz: 27 MB/s max. 47 layers can be blended in next-gen cluster, 39 layers in Vybrid processor
  32 bpp at 9 MHz: 36 MB/s max. 35 layers can be blended in next-gen cluster, 29 layers in Vybrid processor
  24 bpp at 32 MHz: 96 MB/s max. 13 layers can be blended in next-gen cluster, 11 layers in Vybrid processor
  32 bpp at 32 MHz: 128 MB/s max. 10 layers can be blended in next-gen cluster, 8 layers in Vybrid processor
(Theoretical/ideal use cases)

37 Laboratory 3: RAM GPU Operations
Step 1: Open Lab3.eww
Step 2: Rebuild, debug and run again
Step 3: Compare the bandwidth results of Lab 2 (DRAM) with those of Lab 3 (internal RAM)
Parameters to be tested and measured:
  #define TESTOVERHEAD
  #define TESTCLEAR
  #define TEST32BPP
  #define TESTCOPY
  #define TESTBLEND
  #define TESTROTATE
  #define TESTQSPI
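
The labs select tests by commenting and uncommenting `#define` lines. How image.c consumes these macros is not shown in this session, so the following is only a hypothetical sketch of the usual conditional-compilation pattern. The macro names come from the list above, while the bit constants and `enabled_tests` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Uncomment the tests to build, as the lab steps do in image.c. */
#define TESTOVERHEAD
#define TESTCLEAR
/* #define TEST32BPP */
/* #define TESTCOPY */

enum {
    TEST_BIT_OVERHEAD = 1 << 0,
    TEST_BIT_CLEAR    = 1 << 1,
    TEST_BIT_32BPP    = 1 << 2,
    TEST_BIT_COPY     = 1 << 3
};

/* Returns a bit mask of the tests compiled into this build. */
uint32_t enabled_tests(void)
{
    uint32_t mask = 0u;
#ifdef TESTOVERHEAD
    mask |= TEST_BIT_OVERHEAD;
#endif
#ifdef TESTCLEAR
    mask |= TEST_BIT_CLEAR;
#endif
#ifdef TEST32BPP
    mask |= TEST_BIT_32BPP;
#endif
#ifdef TESTCOPY
    mask |= TEST_BIT_COPY;
#endif
    assert(mask != 0u); /* at least one test should be selected */
    return mask;
}
```

Because the selection happens at compile time, every change to these lines requires the rebuild that each lab step calls for.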

38 Agenda
Introduction to Vybrid Controllers and Next-generation Cluster Systems
QuadSPI Memory Theory and Practice
DDR DRAM Theory and Practice
Internal SRAM Theory and Practice
Session Closure

39 Session Summary
Graphics systems require full awareness of maximum limits, latencies and effective bandwidth for optimal usage.
Each memory has different limitations and scenarios in which it is most efficient. The application has to be designed with this in mind.
Distributing utilization and bandwidth among the different memories for the different masters is an important requirement for graphics systems, because it typically offloads each slave and allows other masters to perform efficiently.

40 For Further Information

41 Session Closing
By now, you should be able to:
  Describe the general bandwidth requirements of a graphical application based on the system configuration, and use this knowledge to decide which type of platform best fits your designs
  Avoid the common problem of running out of bandwidth in a graphics application by using the different memories on a Freescale automotive microcontroller

42 Freescale Semiconductor, Inc.

Understanding Vybrid Architecture

Understanding Vybrid Architecture Freescale Semiconductor, Inc. Application Note Document Number: AN4947 Rev. 0, 07/2014 Understanding Vybrid Architecture by Jiri Kotzian and Rastislav Pavlanin Vybrid controller solutions are built on

More information

Hands-On Workshop: ARM Architectures Optimization Hints & Tips

Hands-On Workshop: ARM Architectures Optimization Hints & Tips Hands-On Workshop: ARM Architectures Optimization Hints & Tips FTF-AUT-F0337 Daniel McKenna Applications Engineer A P R. 2 0 1 4 TM External Use Agenda This hands-on session will take a typical application

More information

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM

More information

IoT, Wearable, Networking and Automotive Markets Driving External Memory Innovation Jim Cooke, Sr. Ecosystem Enabling Manager, Embedded Business Unit

IoT, Wearable, Networking and Automotive Markets Driving External Memory Innovation Jim Cooke, Sr. Ecosystem Enabling Manager, Embedded Business Unit IoT, Wearable, Networking and Automotive Markets Driving External Memory Innovation Jim Cooke, Sr. Ecosystem Enabling Manager, Embedded Business Unit JCooke@Micron.com 2016Micron Technology, Inc. All rights

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation

More information

Introduction to Embedded Graphics with Freescale Devices

Introduction to Embedded Graphics with Freescale Devices Freescale Semiconductor Document Number: AN5072 Application Note Rev 0, 02/2015 Introduction to Embedded Graphics with Freescale Devices by: Luis Olea and Ioseph Martinez 1 Introduction The purpose of

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

TAG Word 0 Word 1 Word 2 Word 3 0x0A0 D2 55 C7 C8 0x0A0 FC FA AC C7 0x0A0 A5 A6 FF 00

TAG Word 0 Word 1 Word 2 Word 3 0x0A0 D2 55 C7 C8 0x0A0 FC FA AC C7 0x0A0 A5 A6 FF 00 ELE 758 Final Examination 2000: Answers and solutions Number of hits = 15 Miss rate = 25 % Miss rate = [5 (misses) / 20 (total memory references)]* 100% = 25% Show the final content of cache using the

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

MAC57D5xx Start-Up Sequence

MAC57D5xx Start-Up Sequence Freescale Semiconductor Document Number: AN5285 Application Note Rev. 0, 05/2016 MAC57D5xx Start-Up Sequence by: Manuel Rodriguez 1 Introduction The MAC57D5xx family is the next generation platform of

More information

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye Negotiating the Maze Getting the most out of memory systems today and tomorrow Robert Kaye 1 System on Chip Memory Systems Systems use external memory Large address space Low cost-per-bit Large interface

More information

An Introduction to SPI-NOR Subsystem. By Vignesh R Texas Instruments India

An Introduction to SPI-NOR Subsystem. By Vignesh R Texas Instruments India An Introduction to SPI-NOR Subsystem By Vignesh R Texas Instruments India vigneshr@ti.com About me Software Engineer at Texas Instruments India Part of Linux team that works on supporting various TI SoCs

More information

Hello, and welcome to this presentation of the STM32L4 System Configuration Controller.

Hello, and welcome to this presentation of the STM32L4 System Configuration Controller. Hello, and welcome to this presentation of the STM32L4 System Configuration Controller. 1 Please note that this presentation has been written for STM32L47x/48x devices. The key differences with other devices

More information

Working with Live Video and Graphics

Working with Live Video and Graphics Working with Live Video and Graphics FTF-AUT-F0464 Oliver Tian Auto FAE MAY.2014 TM External Use Agenda Trend of Video and Graphics in Vehicle Roadmap of Cluster Introduction of Rainbow/Vybrid Working

More information

Computer Memory. Textbook: Chapter 1

Computer Memory. Textbook: Chapter 1 Computer Memory Textbook: Chapter 1 ARM Cortex-M4 User Guide (Section 2.2 Memory Model) STM32F4xx Technical Reference Manual: Chapter 2 Memory and Bus Architecture Chapter 3 Flash Memory Chapter 36 Flexible

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 16

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

SAMA5D2 Quad SPI (QSPI) Performance. Introduction. SMART ARM-based Microprocessor APPLICATION NOTE

SAMA5D2 Quad SPI (QSPI) Performance. Introduction. SMART ARM-based Microprocessor APPLICATION NOTE SMART ARM-based Microprocessor SAMA5D2 Quad SPI (QSPI) Performance APPLICATION NOTE Introduction The Atmel SMART SAMA5D2 Series is a high-performance, powerefficient embedded MPU based on the ARM Cortex

More information

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies

More information

Memory technology and optimizations ( 2.3) Main Memory

Memory technology and optimizations ( 2.3) Main Memory Memory technology and optimizations ( 2.3) 47 Main Memory Performance of Main Memory: Latency: affects Cache Miss Penalty» Access Time: time between request and word arrival» Cycle Time: minimum time between

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Chapter 6 Storage and Other I/O Topics

Chapter 6 Storage and Other I/O Topics Department of Electr rical Eng ineering, Chapter 6 Storage and Other I/O Topics 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Feng-Chia Unive ersity Outline 6.1 Introduction 6.2 Dependability,

More information

Effective System Design with ARM System IP

Effective System Design with ARM System IP Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction.

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction. AVR XMEGA TM Product Introduction 32-bit AVR UC3 AVR Flash Microcontrollers The highest performance AVR in the world 8/16-bit AVR XMEGA Peripheral Performance 8-bit megaavr The world s most successful

More information

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU New STM32 F7 Series World s 1 st to market, ARM Cortex -M7 based 32-bit MCU 7 Keys of STM32 F7 series 2 1 2 3 4 5 6 7 First. ST is first to sample a fully functional Cortex-M7 based 32-bit MCU : STM32

More information

ARM Multimedia IP: working together to drive down system power and bandwidth

ARM Multimedia IP: working together to drive down system power and bandwidth ARM Multimedia IP: working together to drive down system power and bandwidth Speaker: Robert Kong ARM China FAE Author: Sean Ellis ARM Architect 1 Agenda System power overview Bandwidth, bandwidth, bandwidth!

More information

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing

More information

Hands-On Workshop: An Introduction to OpenVG

Hands-On Workshop: An Introduction to OpenVG Hands-On Workshop: An Introduction to OpenVG FTF-AUT-F0342 Steve McAslan Senior Member of Technical Staff A P R. 2 0 1 4 TM External Use Agenda Introduction to computer graphics and the 2D-ACE Hands-on

More information

COMPUTER ARCHITECTURES

COMPUTER ARCHITECTURES COMPUTER ARCHITECTURES Random Access Memory Technologies Gábor Horváth BUTE Department of Networked Systems and Services ghorvath@hit.bme.hu Budapest, 2019. 02. 24. Department of Networked Systems and

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

Technical Note. Maximize SPI Flash Memory Design Flexibility With a Single Package. Introduction

Technical Note. Maximize SPI Flash Memory Design Flexibility With a Single Package. Introduction Technical Note Maximize SPI Flash Memory Design Flexibility With a Single Package TN-25-08: Maximize SPI Flash Memory Design Flexibility Introduction Introduction This technical note discusses how a single

More information

MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16

MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16 MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16 THE DATA CHALLENGE Performance Improvement (RelaLve) 4.4 ZB Total data created, replicated, and consumed in a single year

More information

Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory

Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory Memory Hierarchy Contents Memory System Overview Cache Memory Internal Memory External Memory Virtual Memory Memory Hierarchy Registers In CPU Internal or Main memory Cache RAM External memory Backing

More information

Hello, and welcome to this presentation of the STM32 Flash memory interface. It covers all the new features of the STM32F7 Flash memory.
