Hands-On Workshop: ARM Architectures Optimization Hints & Tips


1 Hands-On Workshop: ARM Architectures Optimization Hints & Tips
FTF-AUT-F0337
Daniel McKenna, Applications Engineer
APR
External Use

2 Agenda
This hands-on session takes a typical application project and applies various optimization techniques to it.
- Cortex A5 and Platform
- Example: Introduction to Benchmark
- DMA
- Example: Using DMA
- Cortex M4 and Dual Core
- Example: Using secondary core
- Cache
- Example: Using Cache
- Example: Revisiting Benchmark

3 The Class
The main goal is to explain the key architectural and system module features that can be tuned to optimize an application, so that designers can make informed decisions when designing their system software.
The class is not:
- about coding techniques
- about compiler optimisation
- a step-by-step guide (there is no universal set of rules!)

4 Cortex A5 and Platform

5 Freescale Vybrid Controllers

6 A5 Processor Core Details
ARM Cortex-A5 Application Processor
- Operating at n:1 clock ratio compared to platform, where n = {2,3}
- Core frequencies = 266.6, MHz
- 8-stage pipeline provides 1.57 DMIPS per MHz performance
- Includes double precision FPU, NEON media engine, TrustZone
- Support for ARM and THUMB instruction sets
- 32K I- and 32K D-Caches with 32-byte cache line size
- Optional support for 512K L2 cache with 32-byte cache line size
  - Partially enabled via an e-fuse
- Single 64-bit AXI system bus interface
- CoreSight debug and trace system
- Generic Interrupt Controller (GIC)

7 ARM Cortex-A5 Processor Pipeline

8 CPU Diagram
[Diagram: CPU block diagram, with the path to the bus interface marked]
- One bus interface, but Harvard architecture cache

9 Platform Diagram

10 CoreLink Network Interconnect (NIC301)
[Diagram: masters and slaves connected through the NIC301 bus and the NIC301 internal switches]
- High-performance, optimized AMBA-compliant network infrastructure
- NIC301 optimised for 64-bit AXI interface
  - CA5, GPU, 2D-ACE: 64-bit AXI interface
- Optimised master-to-slave connections
  - Configurable at integration
  - Reduces power and area of NIC301
- Optimised for multi-master data bandwidth, not data latency
- NIC301 has read/write buffering to improve throughput on larger data transfers
- Typ. 5-cycle latency for access between master and internal RAM

11 AXI vs AHB
AXI: optimised for maximum throughput
- Burst-based, split transaction bus
- Ability to issue multiple outstanding addresses
- Out-of-order transaction completion
AHB: optimised for minimum latency
- Burst transfers, split transaction
- Master will wait for slave to respond
- Support for SPLIT transactions, but predominantly in-order transaction completion

12 Platform Masters
Bus masters generate traffic.

Module         Protocol   Ports   Data Width
Cortex-A5      AXI        1       64
Cortex-M4      AHB-Lite   2       64
DMA2x          AHB-Lite   1       64
DCU (x2)       AXI        1       64
USB OTG (x2)   AHB-Lite   1       32
Open VG        AXI        1       64
ENET           AHB-Lite   1       32
VIU3           AHB-Lite   1       64
eSDHC (x2)     AHB-Lite   1       32
NFC            AHB-Lite   1       32
CAAM           AXI        1       32
DAP            AHB-Lite   1       32

13 Platform Slaves
Internal SRAM, total 1.5MB:
- 1MB GRAM (no ECC)
- 2 x 256KB SRAM with ECC (32-bit boundary)

Module                    Protocol   Ports   Data Width
ROM                       AHB-Lite   1       64
FlexBus                   AHB-Lite   1       32
QuadSPI (x2)              AHB-Lite   1       64
OCRAM (x3)                AXI        1       64
CM4_TCM                   AHB-Lite   1       64
SDRAMC (DDR)              AXI        2       64
Peripheral Bridges (x2)   AHB-Lite   1       64

14 DRAM Port Split
DRAM has two ports. Accesses are split as:
- PORT 0: CM4, DCU0, USB, VIU, DMA0, ENET0, eSDHC0, NFC, DBG
- PORT 1: CA5, GPU, DCU1, DMA1, ENET1, USB, eSDHC

15 NIC Arbitration
Known as Quality of Service (QoS)
- Primary arbitration: higher QoS = higher priority
- Secondary arbitration: least recently granted
- Default (recommended) configuration: latency-critical masters are given higher priority

Arbitration priorities (abridged):

Bus Master         Default QoS
Cortex-M4 Core     1
Cortex-M4 System   2
Cortex-A5          0
DMA                4
DCU                14
Open VG            12
VIU3               13
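The two arbitration rules on this slide can be sketched as a small host-side model. This is only an illustration, not the NIC301 implementation: the `master_t` structure, the `nic_arbitrate` function, and the use of a time stamp to track "least recently granted" are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one bus master competing for a slave port. */
typedef struct {
    const char *name;
    uint8_t     qos;          /* higher value = higher priority */
    uint32_t    last_granted; /* time stamp of the most recent grant */
    int         requesting;   /* non-zero when the master wants the bus */
} master_t;

/* Returns the index of the winning master, or -1 if nobody is requesting.
 * Primary rule: the highest QoS value wins.
 * Secondary rule: among equal QoS values, the least recently granted wins. */
int nic_arbitrate(master_t *m, size_t count, uint32_t now)
{
    int winner = -1;

    assert(m != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!m[i].requesting)
            continue;
        if (winner < 0 ||
            m[i].qos > m[winner].qos ||
            (m[i].qos == m[winner].qos &&
             m[i].last_granted < m[winner].last_granted)) {
            winner = (int)i;
        }
    }
    if (winner >= 0)
        m[winner].last_granted = now; /* remember the grant for tie-breaking */
    return winner;
}
```

With the default values from the table, a DMA request (QoS 4) wins over a simultaneous Cortex-A5 request (QoS 0), and a DCU request (QoS 14) wins over both.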

16 Example 1: Getting Started
- Locate and launch the example 1 project.
- Note that the main loop contains 3 main steps:
  - A memory copy
  - Data transmission via SPI
  - Mathematical algorithm
- Run the code and note the A5 core loop time.

17 DMA

18 Introduction
- 2 x 32-channel DMA
- Transfers between memory and memory
- Transfers between memory and peripheral
- Alternative to interrupts
- Transfer data with no core intervention

19 Peripherals: Selecting a DMA source
[Diagram: DMA Channel Mux. Inputs: Disabled, peripheral sources Source #1 to Source #53, and the always-enabled sources Always On #1 to Always On #10. Outputs: DMA Channel #0 to DMA Channel #31]
- Software selects which DMA sources connect to the 32 DMA channels
- A DMA request for a channel can be initiated by:
  - A peripheral (example: ADC conversion result ready to be put into queue)
  - Software (example: set a bit to initiate a block move)

20 DMA Configuration

21 eDMA Mux: Channel Configuration
Channel Configuration Registers [0:15]
- TRIG (PIT-generated DMA request) is only available for channels 0:7
- PIT timers 1:8 provide the TRIG for DMA channels 0:7

22 eDMA TCD: Transfer Control Descriptor Introduction
- Source Address (saddr)
- Signed Source Address Offset (soff)
- Transfer Attributes (smod, ssize, dmod, dsize)
- Inner Minor Byte Count (nbytes)
- Destination Address (daddr)
- Signed Destination Address Offset (doff)
- Current Major Iteration Count (citer)
- Last Source Address Adjustment (slast)
- Last Destination Address Adjustment (dlast_sga)
- Major Iteration Counter (biter)
- Channel Control Status (bwc, linkch, done, active, e_link, e_sg, dreq, int_half, int_maj, start)

23 Peripherals: DMA TCD Loops
[Diagram: example memory array. One major loop made of three minor loops; each DMA request triggers one minor loop, and the current major loop iteration count (citer) counts down 3, 2, 1]
- Each DMA request initiates one minor loop transfer (iteration) without CPU intervention
- DMA arbitration can occur after each minor loop
- One level of minor loop DMA preemption is allowed
- The number of minor loops in a major loop = beginning iteration count (biter)
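A minimal software model can make the minor/major loop relationship concrete. The sketch below is an assumption-laden illustration: `tcd_model_t` reuses the TCD field names from the slides but is not the hardware register layout, it uses one `size` for both source and destination, and `dma_service_request` is a hypothetical function standing in for what the DMA engine does per request.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified software model of a TCD (illustrative layout only). */
typedef struct {
    const uint8_t *saddr;  /* current source address */
    uint8_t       *daddr;  /* current destination address */
    int32_t        soff;   /* signed offset added to saddr after each transfer */
    int32_t        doff;   /* signed offset added to daddr after each transfer */
    uint32_t       size;   /* bytes moved per transfer (same for src and dest here) */
    uint32_t       nbytes; /* bytes moved per minor loop, i.e. per DMA request */
    uint16_t       citer;  /* minor loops remaining in the current major loop */
    uint16_t       biter;  /* minor loops per major loop (reload value) */
    int32_t        slast;  /* adjustment added to saddr after the major loop */
    int32_t        dlast;  /* adjustment added to daddr after the major loop */
    int            done;   /* set when the major loop completes */
} tcd_model_t;

/* Services one DMA request: runs exactly one minor loop. */
void dma_service_request(tcd_model_t *t)
{
    assert(t->size != 0 && t->nbytes % t->size == 0 && t->citer != 0);

    for (uint32_t moved = 0; moved < t->nbytes; moved += t->size) {
        memcpy(t->daddr, t->saddr, t->size);
        t->saddr += t->soff;
        t->daddr += t->doff;
    }

    if (--t->citer == 0) {      /* major loop complete */
        t->saddr += t->slast;   /* typically loops back to the start */
        t->daddr += t->dlast;
        t->citer = t->biter;    /* reload for the next major loop */
        t->done = 1;
    }
}
```

Copying a 12-byte buffer with nbytes = 4 and biter = citer = 3 therefore takes three requests; setting slast = dlast = -12 returns both addresses to their starting points afterwards.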


25 Peripherals: DMA TCD Terms
- addr: starting address
- size: size of one data transfer
- nbytes: number of bytes transferred in one minor loop (often the same value as size)
- offset: number of bytes added to the current address after each transfer (often the same value as size)
- last: number of bytes added to the current address after the major loop (typically used to loop back)
Each DMA source and destination has its own addr, size, offset, and last.
Peripheral queues typically have size and offset equal to nbytes.

26 Peripherals: DMA Example (Different Size and Offset)
Transferring alternating half-words
- DMA source: ADC output queue (in SRAM)
- DMA destination: SPI transmit
[Diagram: eQADC output queue holding A/D Value 1, Time Stamp 1, A/D Value 2, Time Stamp 2, A/D Value 3, Time Stamp 3, with addr, size, and offset marked. The DMA transfers A/D Value 1, A/D Value 2, and A/D Value 3 to SPI Tx]
Time stamps are not transferred because each DMA source read takes 2 bytes and the source address is then incremented by 4 bytes.
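The effect of a source size of 2 bytes combined with a source offset of 4 bytes can be checked with a few lines of host code. This is a sketch under assumptions: `dma_copy_values` is a hypothetical stand-in for the DMA engine, and the `sent` array stands in for successive writes to a fixed SPI transmit register (destination offset 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SSIZE = 2, SOFF = 4 }; /* read 2 bytes, then advance the source by 4 */

/* Reads `count` half-words from an interleaved {value, time stamp} queue.
 * Each value is recorded in `sent` in the order it would reach the
 * transmit register. */
void dma_copy_values(const uint8_t *saddr, size_t count, uint16_t *sent)
{
    assert(saddr != NULL && sent != NULL);
    for (size_t i = 0; i < count; i++) {
        uint16_t tx_reg;
        memcpy(&tx_reg, saddr, SSIZE); /* one 16-bit source read */
        sent[i] = tx_reg;
        saddr += SOFF;                 /* skips the following time stamp */
    }
}
```

For a queue laid out as value, time stamp, value, time stamp, and so on, only the values end up in `sent`.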


28 Peripherals: DMA TCD Scatter-Gather Feature
- Allows a DMA channel to use multiple TCDs
- Enables a DMA channel to scatter the DMA data to multiple destinations or gather it from multiple sources
- Example use: linked list of LIN messages
[Diagram: TCD A (in flash or SRAM) and TCD B (in flash or SRAM), linked through the sga (Scatter Gather Address) field]
Sequence:
1. Initialization: load TCD A from flash into DMA channel x's TCD
2. 1st DMA request or START=1: executes TCD A; TCD B loads automatically
3. 2nd DMA request or START=1: executes TCD B; TCD A loads automatically
Option: TCD B could automatically execute with the 1st DMA request if the TCD's start bit is set.
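The sequence above can be sketched as a host-side model in which the channel holds a working copy of a descriptor and reloads it from the `sga` link when the descriptor completes. All names here (`sg_tcd_t`, `dma_channel_t`, `channel_init`, `channel_request`) are hypothetical, and each descriptor is simplified to a single block copy per request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified descriptor: one request moves the whole block. */
typedef struct sg_tcd {
    const uint8_t       *saddr;
    uint8_t             *daddr;
    uint32_t             nbytes;
    int                  e_sg;  /* scatter-gather enabled for this TCD */
    const struct sg_tcd *sga;   /* descriptor to load once this one completes */
} sg_tcd_t;

/* The channel holds a working copy of the active descriptor. */
typedef struct {
    sg_tcd_t tcd;
} dma_channel_t;

/* Step 1: load the first descriptor into the channel. */
void channel_init(dma_channel_t *ch, const sg_tcd_t *first)
{
    assert(ch != NULL && first != NULL);
    ch->tcd = *first;
}

/* Steps 2 and 3: execute the active descriptor, then load the next one. */
void channel_request(dma_channel_t *ch)
{
    assert(ch != NULL);
    memcpy(ch->tcd.daddr, ch->tcd.saddr, ch->tcd.nbytes);
    if (ch->tcd.e_sg && ch->tcd.sga != NULL)
        ch->tcd = *ch->tcd.sga; /* automatic reload, no CPU involvement */
}
```

Linking TCD A to TCD B and TCD B back to TCD A gives the alternating behaviour described in the sequence: the first request executes A, the second executes B, and A is active again afterwards.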

29 Peripherals: DMA Channel Linking
A DMA channel can link to another DMA channel, i.e., set the start bit of another TCD:
- At the end of every minor loop (except the last one)
- And/or at the end of the major loop
- Also enables linked lists

TCD control fields by desired link behavior:
- Link at end of minor loop
  - citer.e_link: Enable channel-to-channel linking on minor loop completion (current iteration)
  - citer.linkch: Link channel number when linking at end of minor loop (current iteration)
- Link at end of major loop
  - major.e_link: Enable channel-to-channel linking on major loop completion
  - major.linkch: Link channel number when linking at end of major loop

30 Peripherals: DMA Other Control & Status Fields (1 of 2)
TCD control field names and descriptions:
- start: Control bit to explicitly start the channel when using a software-initiated DMA service. Automatically cleared by hardware after service begins. Note: do not set START if the DMA request comes from hardware.
- active: Status bit indicating the channel is currently in execution.
- done: Status bit indicating major loop completion. Set by hardware as CITER reaches 0. Cleared by software if using a software-initiated DMA service request.
- d_req: Control bit to disable the DMA request at major loop completion. Clears the channel enable bit, DMAERQ, at major loop completion so no additional DMA requests are recognized until the channel is enabled again. (Important for FIFOs later.)

31 Peripherals: DMA Other Control & Status Fields (2 of 2)
TCD control field names and descriptions:
- BWC[0:1]: Control bits for throttling bandwidth control of a channel.
- e_sg: Control bit to enable the scatter-gather feature.
- int_half: Control bit to enable an interrupt when the major loop is half complete (DONE = 0).
- int_major: Control bit to enable an interrupt when the major loop completes (DONE = 1).

BWC (Bandwidth Control) forces the eDMA to stall after the completion of each read/write access to control the bus request bandwidth seen by the crossbar.

BWC   Effect
00    No DMA engine stalls (for inner loop)
01    Reserved
10    DMA engine stalls for 4 cycles after each R/W
11    DMA engine stalls for 8 cycles after each R/W

32 Peripherals: DMA Register and TCD Memory Map (1 of 2)
- Channels must be enabled before any DMA request is recognized
- Errors are caused by a bad configuration or a bus error
- 8-bit registers to easily modify a single channel's:
  - Enable request bit
  - Enable error interrupt bit
  - Interrupt request (clear only)
  - Error (clear only)
  - TCD start bit (set only)
  - TCD done bit (clear only)

Registers shown in the memory map (bits 0 to 31):
- Control Register (CR)
- Error Status Register (ESR)
- Enable Request Register Low (ERQRL)
- Enable Error IRQ Register Low (EEIRL)
- Set Enable Request Register (SERQR)
- Clear Enable Request Register (CERQR)
- Set Enable Error Interrupt Register (SEEIR)
- Clear Enable Error Interrupt Register (CEEIR)
- Clear Interrupt Request Register (CINTR)
- Clear Error Register (CERR)
- Set Start Bit Register (SSBR)
- Clear Done Status Bit Register (CDSBR)
- Interrupt Request Low (INTRL): signals the presence of an interrupt request for a channel
- Error Low (ERRL): signals the presence of an error on a channel

33 Peripherals: DMA Register and TCD Memory Map (2 of 2)
- Priority Registers (PR) hold each channel's priority assignment and whether that channel can be preempted by a higher-priority channel
  - One register per channel: Channel 0 Priority Register to Channel 15 Priority Register
- 16 Transfer Control Descriptors
  - One per channel
  - 8 words per TCD

34 ASRC DMA Workflow: Support for Polling, Interrupt or DMA
[Diagram: 8kHz 24-bit input samples 1 to 5 enter the Input FIFO through the ASRDI register on the input clock; the engine produces 48kHz output samples 1 to 10 and onward in the Output FIFO, read through the ASRDO register on the output clock. Both FIFOs raise DMA requests at a DMA-driven threshold.]

TCD field   Input channel (memory to ASRDI)                  Output channel (ASRDO to memory)
saddr       start of memory queue                            ASRC ASRDO register
daddr       ASRC ASRDI register                              start of memory
nbytes      4 bytes (minor loop size; # bytes per request)   4 bytes (minor loop size; # bytes per request)
biter       = citer = 5 (# minor loops in major loop)        = citer = 30 (# minor loops in major loop)
d_req       0 (keep channel enabled after major loop)        0 (keep channel enabled after major loop)
ssize       32 bits (read 4 bytes per transfer)              32 bits (read 4 bytes per transfer)
soff        4 bytes (src. addr. increment after transfer)    0 bytes (src. addr. increment after transfer)
slast       -20 (restart saddr to start when done)           0 (disabled)
smod        0 (disabled)                                     0 (disabled)
dsize       32 bits (write 4 bytes per transfer)             32 bits (write 4 bytes per transfer)
doff        0 bytes (add 0 to dest. addr. after transfer)    4 bytes (dest. addr. increment after transfer)
dlast       0 bytes (disabled)                               -120 bytes (restart daddr to start when done)
dmod        0 (disabled)                                     0 (disabled)

35 Peripherals: DMA Channel Arbitration
Default is round-robin mode. Other options include:
- Fixed-priority arbitration
  - Pro: lowest latency for higher priorities. Prevents lower-priority tasks from using too much DMA bandwidth when a high-priority deadline is due
  - Con: potential to use up DMA bandwidth if a channel is always active in the highest priority group
- Pre-emption
  - Normally transfers must complete before another channel can initiate a transfer
  - Fixed priority allows one level of preemption
Note: 0 is the lowest priority
[Diagram: example of fixed channel arbitration across Chan0 to Chan15]
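The difference between the two arbitration modes can be sketched with two selection functions. This is an illustrative model only: the function names, the `pending` bit mask, and in particular the ascending scan direction used for round robin are assumptions, not a description of the eDMA hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 16

/* Fixed priority: among requesting channels, pick the one with the highest
 * priority value (0 is the lowest priority). Bit n of `pending` is set when
 * channel n has a request. Returns -1 when nothing is pending. */
int pick_fixed_priority(uint16_t pending, const uint8_t prio[NUM_CHANNELS])
{
    int winner = -1;

    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if ((pending >> ch) & 1u) {
            if (winner < 0 || prio[ch] > prio[winner])
                winner = ch;
        }
    }
    return winner;
}

/* Round robin: scan starts just after the channel serviced last, so every
 * requesting channel is reached before any channel is serviced twice.
 * Pass last = -1 when no channel has been serviced yet. */
int pick_round_robin(uint16_t pending, int last)
{
    assert(last >= -1 && last < NUM_CHANNELS);
    for (int step = 1; step <= NUM_CHANNELS; step++) {
        int ch = (last + step) % NUM_CHANNELS;
        if ((pending >> ch) & 1u)
            return ch;
    }
    return -1;
}
```

With channels 2 and 5 both pending and channel 5 holding the higher priority value, fixed priority always selects channel 5 while it keeps requesting, which is the bandwidth risk noted above; round robin alternates between the two.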

36 Vybrid Modules with DMA Support
UART, SPI, PDB, PORT, SAI (I2S), ADC, RLE, FTM, QSPI, SPDIF, I2C

37 Review of Key Points
- DMA is often an alternative to CPU interrupts
- A peripheral flag causes a DMA request to transfer data
- Multiple different transfers can take place with a single DMA request
- Allows combining multiple peripherals into a new "super peripheral"

38 Example 2 and 3: DMA Usage
Launch example 2
- Enable DMA to perform memcpy
- Change from 32-bit to 64-bit transfers
Launch example 3
- Enable DMA to load data to SPI buffer

39 Dual Core: M4 Core

40 More on Processor Core Details
ARM Cortex-M4 Real-Time Control Processor
- Operating at 1:1 clock ratio compared to platform
- Core frequencies = MHz
- 3-stage pipeline provides 1.25 DMIPS per MHz performance
- Includes single precision FPU, DSP & SIMD ISA extensions
- 16K code and 16K system caches with 32-byte cache line size
- 64 Kbytes of tightly-coupled memory split equally across TCM{L,U}
- Modified Harvard 64-bit AHB system bus interface
- 64-bit AHB backdoor port to TCMs
- CoreSight debug and trace system
- Nested Vector Interrupt Controller (NVIC)

41 ARM Cortex-M4 Processor Pipeline

42 Memory Map
M4 core has a static 4GB linear memory map:
- Standard across all Cortex-M cores
[Diagram: M4 core memory map up to 0xFFFFFFFF, showing the CODE region (ROM/RAM, internal/external flash), internal/external RAM, peripherals, and the internal Private Peripheral Bus, with the INT and SYS connections of the M4 core]

43 Aliased Locations
- Often a subset of the full region, e.g. QSPI
- Note that the DRAM alias has an offset of 0x80_0000

M4 Alias                  Memory Map Address        Region Descriptor
0x0080_0000-0x0FFF_FFFF   0x8080_0000-0xDFFF_FFFF   External DDR
0x1000_0000-0x17FF_FFFF   0x2000_0000-0x2FFF_FFFF   QSPI0
0x1800_0000-0x1EFF_FFFF   0x3000_0000-0x3EFF_FFFF   FlexBus
0x1F00_0000-0x1F07_FFFF   0x3F00_0000-0x3F07_FFFF   OCRAM SysRAM0 and 1
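The alias table can be turned into a small lookup helper, which is handy when comparing addresses seen in an M4 debugger with the system memory map. This is a sketch: the `m4_alias_to_map` function and `alias_region_t` type are hypothetical, and the translation assumes that each alias range maps linearly onto the start of the corresponding memory map range, which the table suggests but does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    alias_start; /* first address of the M4 alias range */
    uint32_t    alias_end;   /* last address of the M4 alias range (inclusive) */
    uint32_t    map_start;   /* first address of the memory map range */
    const char *region;
} alias_region_t;

/* Values taken from the Aliased Locations table. */
static const alias_region_t alias_table[] = {
    { 0x00800000u, 0x0FFFFFFFu, 0x80800000u, "External DDR" },
    { 0x10000000u, 0x17FFFFFFu, 0x20000000u, "QSPI0" },
    { 0x18000000u, 0x1EFFFFFFu, 0x30000000u, "FlexBus" },
    { 0x1F000000u, 0x1F07FFFFu, 0x3F000000u, "OCRAM SysRAM0 and 1" },
};

/* Translates an M4 alias address to its memory map address.
 * Returns 1 and writes *map_addr on success, 0 if the address is not
 * inside any alias range. Assumes a linear mapping within each range. */
int m4_alias_to_map(uint32_t alias, uint32_t *map_addr)
{
    assert(map_addr != NULL);
    for (size_t i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++) {
        const alias_region_t *r = &alias_table[i];
        if (alias >= r->alias_start && alias <= r->alias_end) {
            *map_addr = r->map_start + (alias - r->alias_start);
            return 1;
        }
    }
    return 0;
}
```

Under that assumption, the first aliased DDR address 0x0080_0000 corresponds to 0x8080_0000, and an OCRAM alias such as 0x1F00_0100 corresponds to 0x3F00_0100.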

44 M4 Local Memory Controller
- TCM SRAM blocks are 32KB each.
- RAM and cache are 64 bits wide and provide zero-wait-state accesses

45 Dual Core: Launching M4

46 Dual Core Example
- Create linker file to put M4 code at specific location 0x3f
- Also use linker file to define specific start address 0x3f
- In A5 project settings, linker configuration pulls in M4 binary at specific location 0x3f
- In A5 code, set SRC->GPR2 to M4 execution start address 0x3f
- In A5 code, enable M4 clock: CCM->CCOWR = 0x15a5a
- Compile M4 project as flat binary
- Compile A5 project (which includes M4 code) and load into memory

47 Dual Core Example Continued
- BootROM boots A5 core code
- A5 core configures clocks, DDR, and other basic init
- A5 core enables M4 clock
- A5 code continues
- M4 code starts

48 Example 4: Dual Core
- Use the secondary M4 core to execute the algorithm1() function in parallel
- Move the function to different memory locations and compare execution time

49 M4 Cache

50 Cache basics
- Cache is a fast local memory that contains a copy of the slower main memory
- Caches operate on two principles of locality:
  - Spatial locality: an access to one location is likely to be followed by accesses to adjacent locations (e.g. sequential code or an array)
  - Temporal locality: an access to an area of memory is likely to be repeated within a short time period (e.g. a code loop)
- To minimize the quantity of control information stored, the spatial locality property is used to group several locations together under the same tag. This logical block is commonly known as a cache line.
- Not all memory regions are suitable for caching, e.g. peripherals

51 Cache terminology
- Line = smallest loadable unit of cache
- Index = part of the memory address which is used to find the address in cache
- Way = subdivision of cache; all ways are the same size
- Set = a group of lines with the same index from different ways
- Tag = identifies the main memory address of the associated line
M4 caches:
- Code Cache: 16KB
- System Cache: 16KB
- Line size: 32 bytes
- Ways: 2
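The terms above can be tied to the M4 cache parameters with a short calculation. A 16KB, 2-way cache with 32-byte lines has 16384 / (32 x 2) = 256 sets, so an address splits into a 5-bit line offset, an 8-bit index, and the remaining upper bits as the tag. The sketch below assumes conventional set-associative indexing on the low address bits; the `cache_split` helper and its type are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CACHE_BYTES = 16 * 1024,                       /* code or system cache */
    LINE_BYTES  = 32,
    WAYS        = 2,
    SETS        = CACHE_BYTES / (LINE_BYTES * WAYS) /* 256 sets */
};

typedef struct {
    uint32_t tag;    /* identifies which memory line is held */
    uint32_t index;  /* selects the set */
    uint32_t offset; /* byte position inside the 32-byte line */
} cache_addr_t;

/* Splits an address into tag, index, and line offset. */
cache_addr_t cache_split(uint32_t addr)
{
    cache_addr_t c;

    assert(SETS == 256);
    c.offset = addr % LINE_BYTES;
    c.index  = (addr / LINE_BYTES) % SETS;
    c.tag    = addr / (LINE_BYTES * SETS);
    return c;
}
```

Two addresses that are 8KB apart (32 bytes x 256 sets) share the same index and differ only in the tag, so they compete for the two ways of one set.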

52 Cache Modes
Memory map divided into sections with a cache attribute:

Attribute       Cache Write Miss   Cache Write Hit
Non-Cache       Bus                N/A
Write Through   Bus                Cache and Bus
Write Back      Read-to-write      Cache

Example:

Location             Address                   Cache Attribute
IPS0 - Peripherals   0x4000_0000-0x4006_FFFF   Non-Cache
OCRAM SysRAM0        0x3F00_0000-0x3F03_FFFF   Write Back
DDR Alias            0x0800_0000-0x0FFF_FFFF   Write Through

53 Cache Potential Issues
- Issues can arise if another master accesses a cached location
  - E.g. DMA, A5
- If the location could have been cached, perform a push operation to write modified cache lines back to the bus

54 M4 Cache Initialisation
Cache is disabled by default. Steps to initialise:
1. Invalidate both ways: LMEM_PxCCR[INVW1 and INVW0]
2. Start the invalidate using: LMEM_PxCCR[GO]
3. Wait until the GO bit is cleared
4. Enable the cache using: LMEM_PxCCR[ENCACHE]

55 Example 5: Cache
- Enable Code and/or System Cache on the M4 core.
- Move the function to different memory locations and compare execution time

56 Example 6: Revisiting Example 1
- We can now apply what was learned in Examples 2-5 to our initial code.
- Open Example 6 and note the A5 run time
- Try to perform further optimisations

57 Freescale Semiconductor, Inc.


LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

C66x KeyStone Training HyperLink

C66x KeyStone Training HyperLink C66x KeyStone Training HyperLink 1. HyperLink Overview 2. Address Translation 3. Configuration 4. Example and Demo Agenda 1. HyperLink Overview 2. Address Translation 3. Configuration 4. Example and Demo

More information

Design Choices for FPGA-based SoCs When Adding a SATA Storage }

Design Choices for FPGA-based SoCs When Adding a SATA Storage } U4 U7 U7 Q D U5 Q D Design Choices for FPGA-based SoCs When Adding a SATA Storage } Lorenz Kolb & Endric Schubert, Missing Link Electronics Rudolf Usselmann, ASICS World Services Motivation for SATA Storage

More information

Effective System Design with ARM System IP

Effective System Design with ARM System IP Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera

More information

ARM Cortex-M4 Architecture and Instruction Set 1: Architecture Overview

ARM Cortex-M4 Architecture and Instruction Set 1: Architecture Overview ARM Cortex-M4 Architecture and Instruction Set 1: Architecture Overview M J Brockway January 25, 2016 UM10562 All information provided in this document is subject to legal disclaimers. NXP B.V. 2014. All

More information

Accessing I/O Devices Interface to CPU and Memory Interface to one or more peripherals Generic Model of IO Module Interface for an IO Device: CPU checks I/O module device status I/O module returns status

More information
