Hands-On Workshop: Memory Configuration and Throughput
Transcription
1 Hands-On Workshop: Memory Configuration and Throughput, FTF-AUT-F0343. Ioseph Martinez, Senior Applications Engineer. APR. External Use
2 Session Introduction
- This session reviews the challenges of working with the latest MCUs for automotive instrument cluster and graphics systems
- Interconnect complexity and throughput requirements have increased for systems that run graphical applications
- Understanding the memory system configuration is important because it helps you select the right part for your project: the needs of multimedia/graphics projects can be underestimated or overestimated if the memory system of the part is not correctly understood
3 Session Objectives
After completing this session, you will be able to:
- Calculate bandwidth requirements for different systems
- Differentiate between the different types of masters and slaves in a system, and how they access or are accessed in the system
- Perform memory bandwidth stress tests to achieve peak bandwidth in Freescale Vybrid controllers
4 Agenda: Introduction to Vybrid Controllers and Next-generation Cluster Systems; QuadSPI Memory Theory and Practice; DDR DRAM Theory and Practice; Internal SRAM Theory and Practice; Session Closure
5 Vybrid R Series System
6 Next-generation Cluster System
7 Key Differences
- Next-generation cluster systems have internal flash memory, while Vybrid processors don't
- Next-generation cluster DDR2 can be 32 bits wide, while on the Vybrid processor it is 16 bits wide
- The Vybrid processor has an L2 cache controller
- Vybrid processor ports for internal SRAM are all AXI. In next-generation clusters, some are AXI and others are AHB
- The Vybrid processor operates the core and some other masters at 400 MHz; the next-generation cluster system operates them at 320 MHz (the system frequency is 133 MHz on the R Series and 160 MHz on the next-generation cluster, respectively)
8 About the Masters
- Masters initiate and drive accesses to the slaves
- The A5 and M4 cores can consume some of the bandwidth, but their caches relieve the system of most of that load
- Most opcodes require more than 1 cycle to execute. The load is reduced depending on the type of instruction encoding used
- Masters may operate at different frequencies, depending on whether they are clocked at the system frequency or at a multiple of it
- The latency and peak bandwidth of each master also depend on the slave being accessed
9 2D-ACE: Display Controller
- Bandwidth per layer = pixel clock * bytes per pixel
- A maximum of 6 layers can be blended in a single pixel
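The per-layer rule on this slide can be sketched in C as follows. This is a minimal illustration, not code from the workshop projects: the function names are hypothetical, bandwidth is expressed in bytes per second, and the pixel depth is passed in bits so that common formats such as 8, 16, 24 and 32 bpp can be used directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per second one display layer has to fetch:
 * pixel clock (Hz) times bytes per pixel.
 * The result is rounded up so that sub-byte formats are not under-counted. */
uint64_t layer_bandwidth_bytes_per_s(uint64_t pixel_clock_hz,
                                     unsigned bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (pixel_clock_hz * bits_per_pixel + 7u) / 8u;
}

/* Worst-case fetch bandwidth for a pixel covered by several layers:
 * the sum of the per-layer bandwidths. bpp[] holds one depth per layer. */
uint64_t blended_bandwidth_bytes_per_s(uint64_t pixel_clock_hz,
                                       const unsigned *bpp, size_t layers)
{
    uint64_t total = 0;
    for (size_t i = 0; i < layers; i++) {
        total += layer_bandwidth_bytes_per_s(pixel_clock_hz, bpp[i]);
    }
    return total;
}
```

For example, a 16 bpp layer at a 9 MHz pixel clock needs 18 MB/s, and stacking an 8 bpp layer on top of it raises the worst case to 27 MB/s.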
10 Graphics Processing Unit: OpenVG 1.1
- Full fixed-function hardware vector graphics GPU
- Hardware tessellation: minimum CPU involvement
- 16x FSAA: photorealistic quality
- Multiformat rendering
- High-quality vector font rendering
- Standard API: OpenVG 1.1
- Output bandwidth = sysfreq * pixels: 200 Mpixels for Vybrid processor, 160 Mpixels for next-generation cluster
- Input bandwidth = 4 x output bandwidth
[Block diagram: GC355 GPU core with AHB/AXI host interface, memory controller, graphics pipeline front end, vector graphics engine, imaging engine and VG pixel engine. r0: 23-Sep-13]
11 About the Slaves
- Slaves are passive elements accessed by the masters in the system. They stand by until a master accesses them
- Some slaves are read-only while others are read/write. Read/write can double the bandwidth
- Some slaves have higher latency for random accesses than others (e.g. external DRAM and QSPI)
- Some slaves have more than 1 instance of the same module
12 Agenda: Introduction to Vybrid Controllers and Next-generation Cluster Systems; QuadSPI Memory Theory and Practice; DDR DRAM Theory and Practice; Internal SRAM Theory and Practice; Session Closure
13 QSPI Features
Dual QuadSPI architecture supports:
- Two external serial flashes per QuadSPI module
- Programmable sequence engine compatible with any serial flash
- Up to 4 chip selects
QuadSPI can control 2 x 4-bit serial flashes:
- Individual flash mode
- Parallel mode, enabling octal flash with data recombination internally in QuadSPI (reading only)
Flexible receive (Rx) buffering scheme:
- Sub-buffers allocated to specific masters
- Master prioritisation
- Pre-fetch capability
- Suspend and resume for lower-priority masters
Up to 100 MHz clock (200 MByte/s peak bandwidth) in the next-generation cluster
14 QuadSPI Bandwidth
Serial interface bandwidth (b/w):
- Peak b/w = [66 MHz (sclk) * 4 (quad) * 2 (parallel mode) * 2 (DDR)] / [8 bits/byte] = 132 MByte/s
- Effective b/w is less than peak b/w because of the flash command overhead; the impact depends on the size of the data transferred
[Timing diagram, in sclk cycles, from the AXI read request to the first data beat available on AHB. First access: Pre, Command, Addr, Mode, Dummy, Data, Post. Subsequent accesses: Pre, Addr, Mode, Dummy, Data, Post. A 64-bit data beat takes 4 cycles.]
Effective bandwidth:
- Access 18-4 (same command in subsequent access)
- Effective b/w (128-byte access, XIP, 24-bit address) = (32/( )) % = MByte/s
- Effective b/w (128-byte access, XIP, 32-bit address) = 98.2 MByte/s
- Effective b/w (256-byte access, XIP, 32-bit address) = MByte/s
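The peak figure above, and the way command overhead erodes it, can be sketched in C. This is an illustrative model only: the function names are hypothetical, and the overhead is reduced to a single count of non-data sclk cycles per access, which is an assumption rather than the cycle breakdown shown on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Peak serial bandwidth in bytes per second.
 * sclk_hz : serial clock frequency
 * lanes   : data lines per flash (4 for quad)
 * flashes : 2 in parallel mode, 1 in individual mode
 * ddr     : 2 if data moves on both clock edges, otherwise 1 */
uint64_t qspi_peak_bytes_per_s(uint64_t sclk_hz, unsigned lanes,
                               unsigned flashes, unsigned ddr)
{
    return sclk_hz * lanes * flashes * ddr / 8u;
}

/* Effective bandwidth of one access, assuming (simplified model) that
 * the access spends data_cycles moving data and overhead_cycles on
 * command, address, mode and dummy phases. */
uint64_t qspi_effective_bytes_per_s(uint64_t peak_bytes_per_s,
                                    unsigned data_cycles,
                                    unsigned overhead_cycles)
{
    assert(data_cycles > 0);
    return peak_bytes_per_s * data_cycles / (data_cycles + overhead_cycles);
}
```

With the values from the slide, `qspi_peak_bytes_per_s(66000000, 4, 2, 2)` gives 132 MByte/s. The model also shows why larger accesses are more efficient: the overhead is paid once per access, so it is amortized over more data cycles.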
15 Laboratory 1, Part 1: QSPI: Flashing memory
Step 1: Open Lab1.eww by double-clicking on it
Step 2: Build the project (F7)
Step 3: Download the project (Ctrl+D)
Step 4: Run the project in the debugger
Step 5: Wait for the program to erase and program the memories (this may take more than 30 seconds)
Step 6: Some colors will appear on the screen
How does it look?
16 Laboratory 1, Part 1: QSPI: Flashing memory
Step 7: Break the code to debug: menu Debug>Break
Step 8: Go to menu View>Register and select DCU0
Step 9: Select DCU0_DIV_RATIO:DIV_RATIO
Step 10: Change the pixel clock to a lower frequency until the image looks correct on the screen (increase the value of the divider)
What is the divider value at which the image looks correct?
17 Laboratory 1, Part 2: QSPI: Bits per pixel
Step 1: Stop debugging. Open the image.c file
Step 2: Comment out the following line: #define PROGRAM_GRAPHICS
Step 3: Select a different image with lower resolution by setting (3) on the following line: #define IMGNUMBER (3)
Step 4: Rebuild, debug and run again
Step 5: If the image does not look correct, try to find a DCU0 clock divider at which the image looks correct
What is the value at which the image looks correct?
Now try with: #define IMGNUMBER (1)
18 Screen Pixel Clock & QSPI Throughput
Screen pixel clock at 60 fps:
- WQVGA (480 x 272): ~9 MHz
- WVGA (800 x 480): ~32 MHz
QSPI clock max throughput:
- DDR, 8 bits @ 100 MHz: 200 MB/s max in next-gen cluster
- DDR, 8 bits @ 66 MHz: 132 MB/s max in Vybrid processor
Per-layer 2D-ACE required throughput:
- 8 bpp @ 9 MHz: 9 MB/s max; 22 layers can be blended in next-gen cluster, 14 layers in Vybrid processor
- 16 bpp @ 9 MHz: 18 MB/s max; 11 layers can be blended in next-gen cluster, 7 layers in Vybrid processor
- 8 bpp @ 32 MHz: 32 MB/s max; 6 layers can be blended in next-gen cluster, 4 layers in Vybrid processor
- 16 bpp @ 32 MHz: 64 MB/s max; 3 layers can be blended in next-gen cluster, 2 layers in Vybrid processor
(Theoretical/ideal use cases)
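The layer counts on this slide follow from an integer division of the memory's peak throughput by the per-layer requirement, rounded down. A minimal C sketch, with a hypothetical function name, is:

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole layers a memory can feed in the ideal case:
 * peak source throughput divided by the per-layer requirement,
 * rounded down. Both arguments are in bytes per second. */
unsigned max_blend_layers(uint64_t source_bytes_per_s,
                          uint64_t layer_bytes_per_s)
{
    assert(layer_bytes_per_s > 0);
    return (unsigned)(source_bytes_per_s / layer_bytes_per_s);
}
```

For instance, 132 MB/s of QSPI throughput divided by 18 MB/s per layer gives 7 layers. As the slide notes, these are theoretical limits: they ignore command overhead and any other master sharing the same memory.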
19 Laboratory 1, Part 3: QSPI 2D-ACE Blending
Step 1: Start over and open the image.c file
Step 2: Uncomment the following line: #define EXTRALAYER8BPP
Step 3: Rebuild, debug and run again
Does the image look correct?
Step 4: If the image does not look correct, try to find a DCU0 clock divider at which the image looks correct
20 Laboratory 1, Part 3: QSPI 2D-ACE Blending
Step 5: Stop debugging and open the image.c file
Step 6: Uncomment the following line: #define QUADREADS
Step 7: Select a different image with higher resolution by setting (0) on the following line: #define IMGNUMBER (0)
Step 8: Rebuild, debug and run again
Does the image look correct?
Step 9: Stop debugging and uncomment the following line: #define EXTRALAYER16BPP
Step 10: Rebuild, debug and run again
21 Laboratory 1, Part 4: QSPI Parallel Mode
Step 1: Start over and open the image.c file
Step 2: Uncomment the following line: #define PARALLELREADS
Step 3: Rebuild, debug and run again
Does the image look correct?
22 QuadSPI Memory Map: Serial and Parallel
Region  Start Address  End Address  Size (MB)
QSPI0   0x2000_0000    0x2FFF_FFFF  256
QSPI1   0x5000_0000    0x5FFF_FFFF  256
[Diagram: Serial Mode with regions A1, A2, B1 and B2 and boundaries AMBA_BASE, SFA1AD, SFA2AD, SFB1AD and SFB2AD; Parallel Mode with regions A1 + B1 and A2 + B2 and boundaries AMBA_BASE, SFA2AD and SFB2AD]
23 Laboratory 1, Part 4: QSPI Parallel Mode
Step 4: Start over and open the image.c file
Step 5: Uncomment the following line: #define PROGRAM_GRAPHICS
Step 6: Rebuild, debug and run again
Step 7: Wait until something shows on the screen. This takes a while because the memory is being re-flashed
Step 8: Close the debug session and comment out again: #define PROGRAM_GRAPHICS
Step 9: Rebuild, debug and run again
Does the image look correct?
24 Agenda: Introduction to Vybrid Controllers and Next-generation Cluster Systems; QuadSPI Memory Theory and Practice; DRAM Theory and Practice; Internal SRAM Theory and Practice; Session Closure
25 DRAM Controller
Next-generation cluster devices and Vybrid processors have different types of DRAM controllers:
- Next-gen: Supports SDR 16 MHz and DDR 16/ MHz
- Vybrid: Supports LPDDR2 & DDR MHz
In the next-gen cluster devices, the A5, GPU and 2D-ACE have direct access to the DRAM for more efficient access
There are penalties for some data access methods. The most efficient way is linear access
Peak bandwidth is calculated this way:
- Peak BW = Freq * BusWidth * Mode
- Mode = 2 if DDR, otherwise Mode = 1
Effective BW is complex to calculate, but it is acceptable to approximate it with an assumed efficiency level
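The peak-bandwidth formula and the idea of derating it by an efficiency level can be sketched in C as follows. The function and enum names are hypothetical, and the efficiency percentage is a caller-supplied estimate, not a figure from the workshop.

```c
#include <assert.h>
#include <stdint.h>

/* Mode factor from the formula: 1 for SDR, 2 for DDR. */
enum dram_mode { DRAM_SDR = 1, DRAM_DDR = 2 };

/* Peak BW = Freq * BusWidth * Mode, in bytes per second.
 * The bus width is given in bits and must be a whole number of bytes. */
uint64_t dram_peak_bytes_per_s(uint64_t clock_hz, unsigned bus_width_bits,
                               enum dram_mode mode)
{
    assert(bus_width_bits % 8u == 0);
    return clock_hz * (bus_width_bits / 8u) * (unsigned)mode;
}

/* Rough effective bandwidth: peak scaled by an assumed efficiency
 * (0..100 percent). The percentage is an estimate chosen by the caller. */
uint64_t dram_effective_bytes_per_s(uint64_t peak_bytes_per_s,
                                    unsigned efficiency_percent)
{
    assert(efficiency_percent <= 100u);
    return peak_bytes_per_s * efficiency_percent / 100u;
}
```

As an illustration under assumed settings, a 16-bit DDR interface clocked at 400 MHz yields 1600 MB/s, which is consistent with the Vybrid figure quoted on the next slide; derating it to a hypothetical 70% efficiency leaves 1120 MB/s.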
26 Screen Pixel Clock & DRAM Throughput
Screen pixel clock at 60 fps:
- WQVGA (480 x 272): ~9 MHz
- WVGA (800 x 480): ~32 MHz
DRAM clock max throughput:
- DDR: 2560 MB/s max in next-gen cluster
- DDR: 1600 MB/s max in Vybrid processor
- SDR: 320 MB/s max in next-gen cluster
Per-layer 2D-ACE required throughput:
- 24 bpp @ 9 MHz: 27 MB/s max; 94 layers can be blended in next-gen cluster, 59 layers in Vybrid processor, 11 with SDR memory
- 32 bpp @ 9 MHz: 32 MB/s max; 80 layers can be blended in next-gen cluster, 50 layers in Vybrid processor, 10 with SDR memory
- 24 bpp @ 32 MHz: 96 MB/s max; 26 layers can be blended in next-gen cluster, 16 layers in Vybrid processor, 3 with SDR memory
- 32 bpp @ 32 MHz: 128 MB/s max; 20 layers can be blended in next-gen cluster, 12 layers in Vybrid processor, 2 with SDR memory
(Theoretical/ideal use cases)
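The pixel clocks quoted for each screen are higher than width x height x 60 because the panel is also clocked during its blanking intervals. A minimal C sketch of that relationship follows. The function name is hypothetical, and the blanking values in the example are assumptions for illustration; a real design takes them from the panel datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Pixel clock in Hz for a panel: total pixels per frame, including
 * horizontal and vertical blanking, times the refresh rate. */
uint64_t pixel_clock_hz(unsigned h_active, unsigned h_blank,
                        unsigned v_active, unsigned v_blank,
                        unsigned fps)
{
    assert(fps > 0);
    return (uint64_t)(h_active + h_blank) * (v_active + v_blank) * fps;
}
```

With an assumed 45 blanking pixels per line and 14 blanking lines per frame, a 480 x 272 panel at 60 fps needs about 9.0 MHz, in line with the figure quoted for WQVGA. The active pixels alone account for only about 7.8 MHz.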
27 Laboratory 2: DRAM, Overhead
Step 1: Open Lab2.eww
Step 2: Build the project (F7)
Step 3: Download the project (Ctrl+D)
Step 4: Run the project in the debugger
Step 5: Look at the serial console. What is the time spent on that function?
Step 6: Modify the size of the buffer: set BUFFERSMALLHEIGHT = 4
Step 7: Rebuild, debug and run again
What is the time spent on that function?
28 Laboratory 2: DRAM, GPU Write
Step 1: Start over and open the image.c file
Step 2: Comment out #define TESTOVERHEAD
Step 3: Uncomment #define TESTCLEAR
Step 4: Rebuild, debug and run again
What is the time spent on each of the two operations? Do the numbers make sense? What is the achieved BW?
Actual time = measured time - overhead
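The achieved bandwidth asked for in this lab can be derived from the number of bytes the operation moves and the corrected time. The sketch below is a hypothetical helper, not part of the lab project; it assumes times are reported in microseconds and that the caller knows how many bytes the operation reads and writes in total.

```c
#include <assert.h>
#include <stdint.h>

/* Achieved bandwidth in bytes per second.
 * Actual time = measured time - overhead, both in microseconds.
 * bytes_moved is the total traffic of the operation; for a copy this
 * would include both the bytes read and the bytes written. */
uint64_t achieved_bytes_per_s(uint64_t bytes_moved,
                              uint64_t measured_us,
                              uint64_t overhead_us)
{
    assert(measured_us > overhead_us);
    return bytes_moved * 1000000u / (measured_us - overhead_us);
}
```

As a made-up example, clearing an 800 x 480 buffer at 16 bpp writes 768000 bytes; if that took 1100 us with 100 us of overhead, the achieved bandwidth would be 768 MB/s.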
29 Laboratory 2: DRAM, GPU Write
Step 5: Start over and open the image.c file
Step 6: Uncomment #define TESTCLEAR
Step 7: Rebuild, debug and run again
What is the achieved BW for the 32bpp operations? How does it compare to the 16bpp operations?
30 Laboratory 2: DRAM, GPU Copy
Step 1: Start over and open the image.c file
Step 2: Uncomment #define TESTCOPY
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?
31 Laboratory 2: DRAM, GPU Blend
Step 1: Start over and open the image.c file
Step 2: Uncomment #define TESTBLEND
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?
32 Laboratory 2: DRAM, GPU Rotate
Step 1: Start over and open the image.c file
Step 2: Uncomment #define TESTROTATE
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?
33 Laboratory 2: DRAM, GPU QSPI
Step 1: Start over and open the image.c file
Step 2: Uncomment #define TESTQSPI
Step 3: Rebuild, debug and run again
What is the achieved BW for the operations?
34 Agenda: Introduction to Vybrid Controllers and Next-generation Cluster Systems; QuadSPI Memory Theory and Practice; DRAM Theory and Practice; Internal SRAM Theory and Practice; Session Closure
35 RAM Controller
On next-generation cluster devices there are two types of internal RAM:
- System RAM: uses an AHB port
- Graphics RAM: uses an AXI port
Peak bandwidth = Freq * BusWidth
Some features of the next-gen internal RAM controller:
- The 1.3 MByte graphics SRAM block does not natively support ECC
- FlexECC enables conversion of non-ECC SRAM into ECC SRAM
- 1.3 MBytes of non-ECC SRAM converts to 1 MByte of ECC SRAM
- 320 kbytes are sacrificed as a syndrome array: 128 kbytes contain the packed ECC syndromes and 192 kbytes become inaccessible
- A separate path from the RAM controller to the syndrome array allows parallel fetch of data and syndrome
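The FlexECC figures above can be checked with a small accounting sketch in C. Two assumptions are made here and are not stated on the slide: the 1.3 MByte block is taken to be 1344 kbytes (1 MByte plus 320 kbytes), and the syndrome size is modelled as one eighth of the protected data, which is what the 128 kbytes per 1 MByte ratio implies. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* How a non-ECC SRAM block is divided once ECC is enabled (sizes in KB). */
struct flexecc_layout {
    uint32_t data_kb;      /* ECC-protected, usable data */
    uint32_t syndrome_kb;  /* packed ECC syndromes */
    uint32_t unusable_kb;  /* remainder that becomes inaccessible */
};

/* Splits total_kb of SRAM for data_kb of protected data, assuming
 * (not from the slide) one syndrome byte per eight data bytes. */
struct flexecc_layout flexecc_split(uint32_t total_kb, uint32_t data_kb)
{
    struct flexecc_layout layout;

    assert(data_kb + data_kb / 8u <= total_kb);
    layout.data_kb = data_kb;
    layout.syndrome_kb = data_kb / 8u;
    layout.unusable_kb = total_kb - data_kb - layout.syndrome_kb;
    return layout;
}
```

Calling `flexecc_split(1344, 1024)` reproduces the split on the slide: 128 kbytes of syndromes and 192 kbytes that become inaccessible.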
36 Screen Pixel Clock & SRAM Throughput
Screen pixel clock at 60 fps:
- WQVGA (480 x 272): ~9 MHz
- WVGA (800 x 480): ~32 MHz
SRAM clock max throughput:
- 160 MHz: 1280 MB/s max in next-gen cluster
- 133 MHz: 1064 MB/s max in Vybrid processor
Per-layer 2D-ACE required throughput:
- 24 bpp @ 9 MHz: 27 MB/s max; 47 layers can be blended in next-gen cluster, 39 layers in Vybrid processor
- 32 bpp @ 9 MHz: 32 MB/s max; 40 layers can be blended in next-gen cluster, 33 layers in Vybrid processor
- 24 bpp @ 32 MHz: 96 MB/s max; 13 layers can be blended in next-gen cluster, 11 layers in Vybrid processor
- 32 bpp @ 32 MHz: 128 MB/s max; 10 layers can be blended in next-gen cluster, 8 layers in Vybrid processor
(Theoretical/ideal use cases)
37 Laboratory 3: RAM GPU Operations
Step 1: Open Lab3.eww
Step 2: Rebuild, debug and run
Step 3: Compare the results with those for DRAM (Lab 2 vs. Lab 3) in terms of BW
Parameters to be tested and measured:
#define TESTOVERHEAD
#define TESTCLEAR
#define TEST32BPP
#define TESTCOPY
#define TESTBLEND
#define TESTROTATE
#define TESTQSPI
38 Agenda: Introduction to Vybrid Controllers and Next-generation Cluster Systems; QuadSPI Memory Theory and Practice; DDR DRAM Theory and Practice; Internal SRAM Theory and Practice; Session Closure
39 Session Summary
- Graphics systems require full awareness of maximum limits, latencies and effective bandwidth for optimal usage
- Each memory has different limitations and scenarios in which it is most efficient. The application has to be designed with this in mind
- Distributing utilization and bandwidth among the different memories for the different masters is an important requirement for graphics systems, because it typically offloads each slave and allows the other masters to perform efficiently
40 For Further Information
41 Session Closing
By now, you should be able to:
- Describe the general bandwidth requirements of a graphical application based on the system configuration
- Use this knowledge to decide what type of platform best fits your designs
- Avoid the common problem of running out of bandwidth for a graphics application by using the different memories on a Freescale automotive microcontroller
42 Freescale Semiconductor, Inc.
More informationSpring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand
Main Memory & DRAM Nima Honarmand Main Memory Big Picture 1) Last-level cache sends its memory requests to a Memory Controller Over a system bus of other types of interconnect 2) Memory controller translates
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationMobile HW and Bandwidth
Your logo on white Mobile HW and Bandwidth Andrew Gruber Qualcomm Technologies, Inc. Agenda and Goals Describe the Power and Bandwidth challenges facing Mobile Graphics Describe some of the Power Saving
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationPC-based data acquisition II
FYS3240 PC-based instrumentation and microcontrollers PC-based data acquisition II Data streaming to a storage device Spring 2015 Lecture 9 Bekkeng, 29.1.2015 Data streaming Data written to or read from
More informationCannon Mountain Dr Longmont, CO LS6410 Hardware Design Perspective
LS6410 Hardware Design Perspective 1. S3C6410 Introduction The S3C6410X is a 16/32-bit RISC microprocessor, which is designed to provide a cost-effective, lowpower capabilities, high performance Application
More informationHands-on Workshop: Driving Displays Part 4 - The Latest ColdFire MCU, the MCF5227x
November 2008 Hands-on Workshop: Driving Displays Part 4 - The Latest ColdFire MCU, the MCF5227x PZ111 Shen Li Application Engineer owners. Freescale Semiconductor, Inc. 2008. Agenda MCF5227x Intro MCF5227x
More informationTechnology in Action
Technology in Action Chapter 9 Behind the Scenes: A Closer Look at System Hardware 1 Binary Language Computers work in binary language. Consists of two numbers: 0 and 1 Everything a computer does is broken
More informationHigh-Speed NAND Flash
High-Speed NAND Flash Design Considerations to Maximize Performance Presented by: Robert Pierce Sr. Director, NAND Flash Denali Software, Inc. History of NAND Bandwidth Trend MB/s 20 60 80 100 200 The
More informationViews of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)
CS6290 Memory Views of Memory Real machines have limited amounts of memory 640KB? A few GB? (This laptop = 2GB) Programmer doesn t want to be bothered Do you think, oh, this computer only has 128MB so
More informationArchitectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad
nc. Application Note AN1801 Rev. 0.2, 11/2003 Performance Differences between MPC8240 and the Tsi106 Host Bridge Top Changwatchai Roy Jenevein risc10@email.sps.mot.com CPD Applications This paper discusses
More informationRX600. Direct Drive LCD KIT. Product Overview. Renesas Electronics America Inc. Carmelo Sansone. Tuesday, February, 2011 Rev. 1.
RX600 Direct Drive LCD KIT Product Overview Renesas Electronics America Inc. Carmelo Sansone Tuesday, February, 2011 Rev. 1.3 2010 Renesas Electronics America Inc. All rights reserved. 00000-A Outline
More informationDesigning with External Flash Memory on Renesas Platforms
Designing with External Flash Memory on Renesas Platforms Douglas Crane, Segment Manager Micron Technology Class ID: CL23A Renesas Electronics America Inc. Douglas Crane Doug is a 27 year veteran in the
More informationChapter 1 Microprocessor architecture ECE 3120 Dr. Mohamed Mahmoud http://iweb.tntech.edu/mmahmoud/ mmahmoud@tntech.edu Outline 1.1 Computer hardware organization 1.1.1 Number System 1.1.2 Computer hardware
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More information3D Graphics in Future Mobile Devices. Steve Steele, ARM
3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and
More informationNitro240/260 CPU Board Scalable 680x0 VME board for I/O intensive applications
Nitro240/260 CPU Board Scalable 680x0 VME board for I/O intensive applications Nitro260 features a 50 MHz MC68060 CISC processor with superscalar pipeline architecture for maximum integer and floating
More informationSTM32F7 series ARM Cortex -M7 powered Releasing your creativity
STM32F7 series ARM Cortex -M7 powered Releasing your creativity STM32 high performance Very high performance 32-bit MCU with DSP and FPU The STM32F7 with its ARM Cortex -M7 core is the smartest MCU and
More informationTechniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company
Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities
More informationPollard s Attempt to Explain Cache Memory
Pollard s Attempt to Explain Cache Start with (Very) Basic Block Diagram CPU (Actual work done here) (Starting and ending data stored here, along with program) Organization of : Designer s choice 1 Problem
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationCENG3420 Lecture 08: Memory Organization
CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory
More informationFlash Memory Summit 2011
1 Billion cores Memory Summit 2011 Session 302: Nonvolatile Design Challenges and Methodologies The Processor s role in maximizing performance and reducing energy consumption Neil Robinson Tensilica At
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationStorage. Hwansoo Han
Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics
More informationThe Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA
The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in
More informationIntroduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses
Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses 1 Most of the integrated I/O subsystems are connected to the
More informationCROSSOVER TO MEMORY EXPANSION WITH ADESTO ECOXiP AND NXP S i.mx RT CROSSOVER PROCESSORS
CROSSOVER TO MEMORY EXPANSION WITH ADESTO ECOXiP AND NXP S i.mx RT CROSSOVER PROCESSORS Donnie Garcia, NXP Semiconductor: Solutions Architect Eyal Barzilay, Adesto Technologies: System and Software INTRODUCTION
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance
More informationID 730L: Getting Started with Multimedia Programming on Linux on SH7724
ID 730L: Getting Started with Multimedia Programming on Linux on SH7724 Global Edge Ian Carvalho Architect 14 October 2010 Version 1.0 Mr. Ian Carvalho System Architect, Global Edge Software Ltd. Responsible
More informationRemote Keyless Entry In a Body Controller Unit Application
38 Petr Cholasta Remote Keyless Entry In a Body Controller Unit Application Many of us know this situation. When we leave the car, with a single click of a remote control we lock and secure it until we
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationKeyStone II. CorePac Overview
KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit
More information