Course Introduction

Purpose: This course provides an overview of the SH-2A 32-bit RISC CPU core built into newer microcontrollers in the popular SH-2 series.

Objectives:
- Acquire knowledge about the CPU's register banks
- Gain an understanding of the SH-2A's on-chip cache memory
- Review some helpful programming suggestions

Content: 13 pages, 2 questions
Learning Time: 20 minutes

SH-2A/SH2A-FPU Register Banks

The SH-2A and SH2A-FPU CPU cores have register banks that:
- Provide high-speed saving and restoring of registers, which is particularly useful for improving interrupt-processing performance
- Can be saved to automatically when an interrupt is accepted, with banking enabled on an interrupt-priority-level basis
- Can be restored using the RESBANK instruction

[Figure: SH-2A CPU block diagram: superscalar RISC design (two instructions fetched and executed simultaneously), 5-stage pipeline, general, system, and control registers, register banks, hardware multiplier, on-chip cache, FPU (SH2A-FPU only), CPU instruction fetch bus, CPU data fetch bus, clock]

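Nineteen Registers Are Banked

The banked registers are the general registers R0 to R14, GBR, the two MAC registers (MACH and MACL), and the procedure register (PR): 15 + 1 + 2 + 1 = 19. Banking is controlled through the IBCR and IBNR registers.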
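As a sketch of what one bank holds, the C struct below lists the 19 banked registers as 32-bit fields. It is a hypothetical software model (the type name `bank_entry_t` and the macro `BANKED_REG_COUNT` are ours); the actual storage layout of a bank inside the CPU is not described in this course.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one register-bank entry: the 19 banked registers,
   each 32 bits wide. R15 (the stack pointer) is not in the banked set. */
typedef struct {
    uint32_t r[15];  /* general registers R0 to R14 */
    uint32_t gbr;    /* global base register */
    uint32_t mach;   /* multiply-and-accumulate register, high word */
    uint32_t macl;   /* multiply-and-accumulate register, low word */
    uint32_t pr;     /* procedure register (subroutine return address) */
} bank_entry_t;

/* Number of 32-bit registers one bank saves. */
#define BANKED_REG_COUNT (sizeof(bank_entry_t) / sizeof(uint32_t))

/* 15 + 1 + 2 + 1 = 19, checked at compile time. */
static_assert(BANKED_REG_COUNT == 19, "one bank holds 19 registers");
```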

Number of Register Banks

The SH-2A/SH2A-FPU architecture supports up to 512 banks, but the typical number is about 15. When all banks are full, register contents are automatically saved to and restored from the stack instead. Exceptions can be generated when:
- An attempt is made to bank registers when all banks are full (overflow)
- A RESBANK instruction attempts to restore register contents when all banks are empty (underflow)
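The save/restore rules above can be sketched as a small C model that only tracks bookkeeping, not register contents. Assumptions: 15 banks (the typical count quoted above), saves and restores are last-in, first-out, and the boolean `overflow_exception` is a hypothetical stand-in for whatever configuration selects an overflow exception instead of stacking. The names `bank_save` and `bank_restore` are illustrative, not a Renesas API.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BANKS 15  /* "typical" bank count quoted above; an assumption */

typedef enum {
    SAVED_TO_BANK, SAVED_TO_STACK,
    RESTORED_FROM_BANK, RESTORED_FROM_STACK,
    BANK_OVERFLOW, BANK_UNDERFLOW
} bank_result_t;

typedef struct {
    int banks_used;           /* banks currently holding a saved context */
    int stack_frames;         /* contexts pushed to the stack after the banks filled */
    bool overflow_exception;  /* hypothetical switch: raise an exception instead of stacking */
} bank_state_t;

/* Interrupt accepted with banking enabled: save one context. */
bank_result_t bank_save(bank_state_t *s) {
    assert(s->banks_used >= 0 && s->banks_used <= NUM_BANKS);
    if (s->banks_used < NUM_BANKS) {
        s->banks_used++;
        return SAVED_TO_BANK;
    }
    if (s->overflow_exception)
        return BANK_OVERFLOW;      /* all banks full: overflow */
    s->stack_frames++;             /* all banks full: fall back to the stack */
    return SAVED_TO_STACK;
}

/* RESBANK: restore the most recently saved context (last in, first out). */
bank_result_t bank_restore(bank_state_t *s) {
    assert(s->stack_frames >= 0);
    if (s->stack_frames > 0) {     /* the newest contexts went to the stack */
        s->stack_frames--;
        return RESTORED_FROM_STACK;
    }
    if (s->banks_used > 0) {
        s->banks_used--;
        return RESTORED_FROM_BANK;
    }
    return BANK_UNDERFLOW;         /* all banks empty: underflow */
}
```

In this model, stacked contexts are always the newest ones, so RESBANK drains the stack before touching the banks; an underflow can therefore only occur once both are empty.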

Question

Is the following statement true or false?
"When the ISR begins executing, it stacks the CPU contents in RAM, a process aided by register banking."
True / False

On-Chip 16KB Cache Memory

- Built-in cache controller
- Separate operand (data) and instruction caches, 8KB each
- Four-way set associative
- 128 entries per way
- 16-byte cache line size
- Operand cache: ways 2 and 3 are lockable
- Write modes: write-back and write-through, selectable
- LRU replacement algorithm, which helps minimize the impact of cache line replacement
- Prefetch capability: PREF instruction
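The cache figures above are mutually consistent, which a few lines of C can check: 4 ways × 128 entries × 16 bytes gives 8KB per cache, and with 32-bit addresses the line size and entry count leave 21 tag bits. This is plain arithmetic under the listed parameters; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-cache parameters from the list above (operand and instruction caches alike). */
enum { WAYS = 4, ENTRIES_PER_WAY = 128, LINE_BYTES = 16, ADDRESS_BITS = 32 };

/* Base-2 logarithm of an exact power of two. */
static unsigned log2_pow2(uint32_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

uint32_t cache_bytes(void) { return (uint32_t)WAYS * ENTRIES_PER_WAY * LINE_BYTES; }
unsigned offset_bits(void) { return log2_pow2(LINE_BYTES); }       /* byte within a line */
unsigned index_bits(void)  { return log2_pow2(ENTRIES_PER_WAY); }  /* entry number */
unsigned tag_bits(void)    { return ADDRESS_BITS - index_bits() - offset_bits(); }
```

Here `cache_bytes()` is 8192 (two such caches make the 16KB total), and the 4 offset bits, 7 index bits, and 21 tag bits correspond to address bits 3 to 0, 10 to 4, and 31 to 11.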

Structure of the Operand Cache

The operand cache is organized as four ways (banks).

Address and Data Sections (Operand Cache)

Within each way, both the address and data sections of the cache are divided into 128 entries.

Cache Line (Operand Cache)

The data section of each entry is a cache line of 16 bytes (four 4-byte longwords).

V: Valid Bit in Address Array (Operand Cache)

V indicates whether the data in the cache is valid (V = 1). Important: flush the cache before using it; flushing clears the V bits to 0.

U: Has Data Been Written To? (Operand Cache)

The U bit is present only in the operand cache. It indicates whether the entry has been written to in write-back mode (U = 1 when it has been written to).

LRU: Cache Housekeeper (Operand Cache)

The LRU bits record the usage order of the four ways in which an entry can be stored. This matters because up to four data or instruction entries with the same entry address can be registered in the cache. When replacement is necessary, the LRU bits identify the least recently used way.
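To make the V, U, and LRU fields concrete, here is a hypothetical C model of one set (the same entry number across the four ways). Assumptions: LRU is represented as an explicit recency list rather than the hardware's bit encoding, which this course does not describe, and the flush simply clears V and U without writing anything back, as for a flush performed before first use. All type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4
#define ENTRIES 128

/* One way's address-array entry (hypothetical model). */
typedef struct {
    uint32_t tag;  /* address bits 31 to 11 */
    bool v;        /* valid: the line holds valid data */
    bool u;        /* updated: written in write-back mode (operand cache only) */
} addr_entry_t;

/* One set: the same entry number in all four ways, plus its LRU state.
   LRU is modeled as a recency list; the hardware's bit encoding is not shown. */
typedef struct {
    addr_entry_t way[WAYS];
    uint8_t order[WAYS];  /* order[0] = most recently used way, order[WAYS-1] = least */
} cache_set_t;

/* Flush: invalidate every line (V = 0, U = 0) and reset the LRU state.
   This model writes nothing back. */
void cache_flush(cache_set_t sets[], int n) {
    for (int i = 0; i < n; i++) {
        for (int w = 0; w < WAYS; w++) {
            sets[i].way[w] = (addr_entry_t){ .tag = 0, .v = false, .u = false };
            sets[i].order[w] = (uint8_t)w;
        }
    }
}

/* Record an access to way w: move it to the most-recently-used position. */
void lru_touch(cache_set_t *set, int w) {
    assert(w >= 0 && w < WAYS);
    int pos = 0;
    while (set->order[pos] != w)
        pos++;
    for (; pos > 0; pos--)
        set->order[pos] = set->order[pos - 1];
    set->order[0] = (uint8_t)w;
}

/* Way to replace when a new line must be loaded: the least recently used. */
int lru_victim(const cache_set_t *set) { return set->order[WAYS - 1]; }
```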

Seven Bits = 128 Entries (Operand Cache)

Entries are selected using bits 10 to 4 of the memory address; seven bits give 2^7 = 128 entries. (The four LSBs, bits 3 to 0, select a byte within the 16-byte line, so they are always 0 in a line's address.)

Tag Address (Operand Cache)

Bits 31 to 11 of the address are stored as the tag address in the cache.
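The address split described above can be written as shifts and masks. A minimal sketch, assuming 32-bit addresses; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for the operand cache (hypothetical helper names). */
static inline uint32_t line_offset(uint32_t addr)  { return addr & 0xFu; }          /* bits 3..0: byte in line */
static inline uint32_t cache_entry(uint32_t addr)  { return (addr >> 4) & 0x7Fu; }  /* bits 10..4: entry 0..127 */
static inline uint32_t cache_tag(uint32_t addr)    { return addr >> 11; }           /* bits 31..11: tag */
static inline uint32_t line_address(uint32_t addr) { return addr & ~0xFu; }         /* four LSBs forced to 0 */

/* Reassembling the three fields gives back the original address. */
static inline uint32_t join_fields(uint32_t tag, uint32_t entry, uint32_t offset) {
    assert(entry < 128u && offset < 16u);
    return (tag << 11) | (entry << 4) | offset;
}
```

For example, address 0x12345678 splits into offset 0x8, entry 0x67, and tag 0x2468A. Addresses 2KB apart (128 entries × 16 bytes), such as 0x1000 and 0x1800, select the same entry with different tags, which is why four ways are needed to hold several of them at once.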

V = 1: Cache Hit; V = 0: Cache Miss (Operand Cache)

When the tag comparison shows a match and the V bit is 1, a cache hit occurs. If there is no match, or the matching entry's V bit is 0, a cache miss occurs.
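The hit test above compares bits 31 to 11 of the address with the tag stored in each of the four ways at the selected entry. In hardware the four comparisons happen in parallel; this C sketch loops over them. `tag_entry_t` and `lookup` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 4

/* Tag and V bit for one way at the selected entry (hypothetical model). */
typedef struct {
    uint32_t tag;  /* address bits 31 to 11 */
    bool v;        /* valid bit */
} tag_entry_t;

/* Compare the address tag with each way's stored tag.
   Returns the way number on a hit, or -1 on a miss. */
int lookup(const tag_entry_t set[WAYS], uint32_t addr) {
    assert(set != NULL);
    uint32_t tag = addr >> 11;
    for (int w = 0; w < WAYS; w++) {
        if (set[w].tag == tag && set[w].v)  /* match AND V = 1: hit */
            return w;
    }
    return -1;  /* no match, or match with V = 0: miss */
}
```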

Cache Read Hits/Misses

Read hit
- Data is transferred from the cache to the CPU.

Read miss
- An external bus cycle starts and the cache entry is updated.
- The data is transferred to the CPU at the same time that it is loaded into the cache.
- The V bit is set and the LRU is updated. For the operand cache, the U bit is cleared to 0.
- If the U bit of the replaced line was 1, its original contents are copied to the write-back buffer before the cache is updated. After the cache fill, the write-back buffer is written to external memory.
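The read-miss sequence can be traced with a small C model of a single operand-cache entry across four ways. It is a sketch under stated assumptions: every address is assumed to map to the same entry, the victim is always the LRU way (way locking and any preference for invalid ways are not modeled), and the model counts external traffic instead of moving data. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

typedef struct { uint32_t tag; bool v, u; } line_t;

/* Hypothetical model of ONE operand-cache entry (set) across four ways. */
typedef struct {
    line_t way[WAYS];
    int lru_order[WAYS];   /* [0] = most recently used way, [WAYS-1] = least */
    int external_reads;    /* line fills from external memory */
    int writebacks;        /* write-back buffer flushes to external memory */
} set_model_t;

void set_model_init(set_model_t *s) {
    for (int w = 0; w < WAYS; w++) {
        s->way[w] = (line_t){ .tag = 0, .v = false, .u = false };
        s->lru_order[w] = w;
    }
    s->external_reads = 0;
    s->writebacks = 0;
}

static void touch(set_model_t *s, int w) {  /* mark way w most recently used */
    assert(w >= 0 && w < WAYS);
    int pos = 0;
    while (s->lru_order[pos] != w)
        pos++;
    for (; pos > 0; pos--)
        s->lru_order[pos] = s->lru_order[pos - 1];
    s->lru_order[0] = w;
}

/* Returns true on a read hit, false on a read miss. */
bool read_access(set_model_t *s, uint32_t addr) {
    uint32_t tag = addr >> 11;
    for (int w = 0; w < WAYS; w++) {
        if (s->way[w].v && s->way[w].tag == tag) {  /* read hit: data comes from the cache */
            touch(s, w);
            return true;
        }
    }
    int victim = s->lru_order[WAYS - 1];                /* read miss: replace the LRU way */
    bool dirty = s->way[victim].v && s->way[victim].u;  /* U = 1: old line goes to the write-back buffer */
    s->external_reads++;                                /* external bus cycle; CPU receives data during the fill */
    s->way[victim] = (line_t){ .tag = tag, .v = true, .u = false };  /* V set, U cleared */
    touch(s, victim);                                   /* LRU updated */
    if (dirty)
        s->writebacks++;                                /* after the fill: buffer written to external memory */
    return false;
}
```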

Operand Cache Write Hits/Misses

Write hit
- Write-back mode: Data is written to the cache and no external access occurs. The U bit is set and the LRU is updated.
- Write-through mode: Data is written to the cache and an external write cycle is issued. The U bit is not set; the LRU is updated.

Write miss
- Write-back mode: An external cycle starts and the entry is updated. If the U bit of the replaced way is 1, the original cache line is first written to the write-back buffer and then the cache is updated. After the cache update, the write-back buffer is written to external memory.
- Write-through mode: No cache write occurs; there is external memory access only.
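The four write cases can be sketched with the same kind of single-entry C model. Assumptions, beyond what the list above states: all addresses map to the same entry, the victim is the LRU way, a write-back miss fills the line from external memory and then marks it dirty (U = 1) because it now holds written data, and the model counts bus traffic rather than moving data. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

typedef enum { WRITE_BACK, WRITE_THROUGH } write_mode_t;
typedef struct { uint32_t tag; bool v, u; } line_t;

/* Hypothetical model of ONE operand-cache entry (set) across four ways. */
typedef struct {
    line_t way[WAYS];
    int lru_order[WAYS];   /* [0] = most recently used way, [WAYS-1] = least */
    int external_reads;    /* line fills */
    int external_writes;   /* write-through writes to external memory */
    int writebacks;        /* write-back buffer flushes */
} set_model_t;

void set_model_init(set_model_t *s) {
    for (int w = 0; w < WAYS; w++) {
        s->way[w] = (line_t){ .tag = 0, .v = false, .u = false };
        s->lru_order[w] = w;
    }
    s->external_reads = s->external_writes = s->writebacks = 0;
}

static void touch(set_model_t *s, int w) {  /* mark way w most recently used */
    assert(w >= 0 && w < WAYS);
    int pos = 0;
    while (s->lru_order[pos] != w)
        pos++;
    for (; pos > 0; pos--)
        s->lru_order[pos] = s->lru_order[pos - 1];
    s->lru_order[0] = w;
}

/* Returns true on a write hit, false on a write miss. */
bool write_access(set_model_t *s, uint32_t addr, write_mode_t mode) {
    uint32_t tag = addr >> 11;
    for (int w = 0; w < WAYS; w++) {
        if (s->way[w].v && s->way[w].tag == tag) {
            if (mode == WRITE_BACK)
                s->way[w].u = true;      /* cache only, no external access; U set */
            else
                s->external_writes++;    /* cache and external memory; U unchanged */
            touch(s, w);                 /* LRU updated in both modes */
            return true;
        }
    }
    if (mode == WRITE_THROUGH) {
        s->external_writes++;            /* miss: external access only, no cache write */
        return false;
    }
    int victim = s->lru_order[WAYS - 1];                /* write-back miss: replace the LRU way */
    bool dirty = s->way[victim].v && s->way[victim].u;  /* old line to write-back buffer first */
    s->external_reads++;                                /* external cycle updates the entry (assumed line fill) */
    s->way[victim] = (line_t){ .tag = tag, .v = true, .u = true };  /* assumed: new line is dirty */
    touch(s, victim);
    if (dirty)
        s->writebacks++;                                /* after the update: buffer written to external memory */
    return false;
}
```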

Question

Match each SH-2A cache term to its description.

Terms: A. Operand cache; B. V bit; C. U = 1; D. Cache hit

Descriptions:
- Indicates when the data in the cache is valid (B)
- Occurs when the comparison shows a match and V is 1 (D)
- Ways 2 and 3 can be locked (A)
- Indicates that the entry has been written to in write-back mode (C)

Ten Helpful Programming Tips

1. Locate branch destinations on longword boundaries.
2. For the next three instructions after an instruction that loads from memory, use registers other than the load's destination register.
3. For the next three instructions after a 32-bit multiply instruction, use registers other than the multiply result register.
4. Use local (automatic, stack-based) variables wherever possible.
5. Use modular programming.
6. Be careful with constants; use 8-bit constants if possible (an 8-bit immediate fits in a single 16-bit MOV instruction).
7. Avoid unnecessary MAC and FPU operations, which can stall the pipeline.
8. Place functions that call each other close together.
9. Try to align instructions on 32-bit boundaries.
10. Convert byte and word values to signed long integers.
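Tip 10 can be illustrated in C. On a 32-bit core, arithmetic on 8-bit or 16-bit variables may force the compiler to re-truncate or re-extend results after each operation, whereas 32-bit locals avoid that. This is a sketch: whether extra extension instructions actually appear depends on the compiler and optimization level, `int32_t` stands in for the SH's 32-bit `long`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow version: the 8-bit counter and accumulator may need to be
   truncated after every increment and add. */
uint8_t sum_bytes_narrow(const uint8_t *p, uint8_t n) {
    uint8_t sum = 0;
    for (uint8_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Tip 10 applied: do the arithmetic in signed 32-bit integers and
   narrow only once, at the end. */
uint8_t sum_bytes_wide(const uint8_t *p, int32_t n) {
    assert(n >= 0);
    int32_t sum = 0;
    for (int32_t i = 0; i < n; i++)
        sum += p[i];
    return (uint8_t)sum;  /* same result modulo 256 as the narrow version */
}
```

Both functions return the same value; the difference is only in how many extension instructions the compiler may need to emit inside the loop.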

Course Summary

- Register banks of the SH-2A and SH2A-FPU RISC CPU cores
- On-chip cache memory
- Suggestions for efficient programming