Chapter 13 Reduced Instruction Set Computers

Contents
- Instruction execution characteristics
- Use of a large register file
- Compiler-based register optimization
- Reduced instruction set architecture
- RISC pipelining
- MIPS R4000
- SPARC
- RISC vs CISC controversy

Major Advances in Computers
- The family concept
  - IBM System/360 in 1964, DEC PDP-8
  - Separates architecture from implementation
- Microprogrammed control unit
  - Idea by Wilkes in 1951
  - Produced by IBM S/360 in 1964
  - Each machine instruction is interpreted as a sequence of microinstructions
- Cache memory
  - IBM S/360 Model 85 in 1968

Major Advances in Computers
- Microprocessors
  - Intel 4004 in 1971
- Pipelining
  - Introduces parallelism into instruction execution
- Multiple processors
- RISC architecture
  - Large number of general-purpose registers (GPRs)
  - Use of compiler techniques to optimize register usage
  - Limited and simple instruction set
  - Emphasis on optimizing the instruction pipeline

Comparison of processors

13.1 Execution Characteristics
Evolution of program execution
- Problems with software development
  - High cost and unreliability
- More powerful and complex languages are developed
  - Can express algorithms more concisely
  - Take care of the detail
- Semantic gap problem
  - Difference between HLL operations and machine instructions
  - Leads to execution inefficiency, excessive program size, and compiler complexity
- Architectures designed to close this gap
  - Large instruction sets, many addressing modes, and HLL statements implemented in hardware
- The alternative: a simple architecture (RISC)

Execution Characteristics
Aspects of instruction execution
- Operations performed
  - Functions performed by the CPU
- Operands used
  - Operand characteristics determine memory organization and addressing modes
- Execution sequencing
  - Determines control and pipeline organization

Relative Dynamic Frequency of HLL Operations

Analysis
- Assignment statements predominate
  - Simple movement of data is highly important
- Conditional statements are also frequent
  - The sequence control mechanism is important
- These results do not reveal which statements use the most time in the execution of a typical program
  - Which statements cause the execution of the most machine-language instructions?

Weighted Relative Dynamic Frequency
- For the data in columns four and five, each value in the second and third columns is multiplied by the number of machine instructions produced by the compiler and then normalized.
- The sixth and seventh columns are obtained by multiplying the frequency of occurrence of each statement type by the relative number of memory references caused by each statement.
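The weighting described above can be sketched in a few lines of Python. The statement frequencies and per-statement instruction counts below are made-up illustration values, not the measurements from the table, and the function name weighted_frequency is an assumption of this sketch.

```python
def weighted_frequency(dynamic_freq, cost_per_statement):
    """Weight each statement's dynamic frequency by its cost, then normalize to percent.

    cost_per_statement can be machine instructions per statement or
    memory references per statement; the procedure is the same.
    """
    weighted = {k: dynamic_freq[k] * cost_per_statement[k] for k in dynamic_freq}
    total = sum(weighted.values())
    return {k: 100.0 * v / total for k, v in weighted.items()}


# Hypothetical numbers for illustration only (not the slide's table).
dynamic_freq = {"ASSIGN": 50, "CALL": 10, "IF": 40}          # percent of statements executed
instr_per_statement = {"ASSIGN": 2, "CALL": 20, "IF": 5}     # machine instructions generated

result = weighted_frequency(dynamic_freq, instr_per_statement)
for statement, share in result.items():
    print(f"{statement}: {share:.1f}% of executed machine instructions")
```

With these made-up inputs, CALL accounts for only 10% of executed statements but 40% of executed machine instructions, which is the kind of shift the weighted columns are meant to expose.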

Operands
- Dynamic frequency of occurrence of classes of variables
  - Mainly local scalar variables
  - Optimization should concentrate on accessing local variables
- One study shows that each instruction references, on average, 0.5 operands in memory and 1.4 registers
  - Fast operand accessing is important

Procedure Calls Procedure Arguments and Local Scalar Variables

Procedure Calls
- Most time-consuming operations
- Two important aspects
  - Number of parameters passed
  - Depth of nesting
- Studies show
  - Number of words required per procedure is not large
  - Nesting is not that deep
  - Operand references are highly localized

Implications
- Conclusions of experiments
  - An instruction set architecture close to the HLL is not the most effective design
  - We need to optimize performance of the most time-consuming parts of programs
- Characteristics of RISC architecture
  - Large number of registers
    + To optimize operand referencing
  - Careful design of pipelines
    + There are many conditional branch and call instructions
  - Simplified (reduced) instruction set

13.2 Use of Large Register File
- The number of registers is limited, so
  - We need to keep the most frequently accessed operands in registers
  - We need to minimize register-memory operations
- Software solution
  - Require the compiler to allocate registers
  - Allocate based on the most used variables in a given time
  - Requires sophisticated program analysis
- Hardware solution
  - Have more registers
  - Thus more variables will be in registers

Register Windows
- Organization of a large set of registers
  - Most registers for local scalars
  - A few for global variables
- The definition of "local" changes with each procedure call
  - Only a few parameters and local variables per procedure
  - The depth of nested calls stays within a relatively narrow range
  - So multiple small sets of registers can be used
- Register window
  - Consists of 3 fixed-size areas
    + Parameter, local, and temporary registers
  - Circular buffer of register windows
  - Used in SPARC and IA-64
  - An N-window register file can hold N-1 procedure activations
  - Berkeley RISC uses 8 windows of 16 registers each

Overlapping Register Windows

Circular Buffer diagram

Operation of Circular Buffer
- When a call is made, the current window pointer (CWP) is moved to show the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer (SWP) identifies the window most recently saved to memory, which is the next one to be restored
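A minimal Python sketch of this behaviour follows. It only tracks pointers and counters, not register contents; the class name WindowFile, its fields, and the choice to count overflow and underflow interrupts are assumptions of the sketch, not details of any particular processor.

```python
class WindowFile:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, n_windows):
        self.n = n_windows
        self.capacity = n_windows - 1  # N windows hold at most N-1 activations
        self.cwp = 0                   # current window pointer
        self.swp = 0                   # saved window pointer (oldest resident window)
        self.resident = 1              # activations currently held in registers
        self.spilled = 0               # windows currently saved in memory
        self.overflows = 0             # interrupts taken on call
        self.underflows = 0            # interrupts taken on return

    def call(self):
        if self.resident == self.capacity:
            # All usable windows are occupied: save the oldest one to memory.
            self.spilled += 1
            self.swp = (self.swp + 1) % self.n
            self.resident -= 1
            self.overflows += 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self):
        if self.resident == 1 and self.spilled > 0:
            # The caller's window was saved earlier: restore it from memory.
            self.swp = (self.swp - 1) % self.n
            self.spilled -= 1
            self.resident += 1
            self.underflows += 1
        self.cwp = (self.cwp - 1) % self.n
        self.resident -= 1


# Nine nested calls on an 8-window file, then nine returns.
wf = WindowFile(8)
for _ in range(9):
    wf.call()
for _ in range(9):
    wf.ret()
print(wf.overflows, wf.underflows, wf.cwp)
```

In this model the first six calls fit in registers; only the calls that go deeper than the file can hold trigger a save, which is why a modest number of windows is enough when call nesting stays shallow.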

Global Variables
- Two options
  - Assign memory locations
    + But there may be frequently accessed global variables
  - Assign a set of global registers
    + Fixed and available to all procedures

Large Register File vs Cache

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

13.3 Compiler-Based Register Optimization
- Applies when only a small number of registers is available
  - The compiler is responsible for optimized usage
- Approach
  - Each variable is assigned to a symbolic (virtual) register
  - Symbolic registers whose usage does not overlap can share the same real register
  - Which symbolic registers map to which real registers?
- Graph coloring can be used here
  - If two symbolic registers are live during the same program fragment, they are joined by an edge to depict interference
  - Try to color the graph with n colors, where n is the number of registers
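The coloring step can be sketched with a simple greedy heuristic in Python. This is one common approximation, not necessarily the algorithm a given compiler uses; the interference edges, the register names A to F, and the function names are hypothetical.

```python
from collections import defaultdict


def build_interference(edges):
    """Build a symmetric adjacency map from pairs of simultaneously live symbolic registers."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def color_registers(interference, n_registers):
    """Greedy coloring: visit nodes by decreasing degree, give each the lowest free register.

    Symbolic registers that cannot be colored are returned as spills
    (they would be kept in memory instead).
    """
    assignment, spilled = {}, []
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for node in order:
        used = {assignment[nb] for nb in interference[node] if nb in assignment}
        free = [r for r in range(n_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Hypothetical interference graph over six symbolic registers.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
graph = build_interference(edges)

regs3, spilled3 = color_registers(graph, 3)   # three real registers
regs2, spilled2 = color_registers(graph, 2)   # only two real registers
print(regs3, spilled3)
print(regs2, spilled2)
```

With three real registers every symbolic register in this example gets a color; with two, the triangle A-B-C cannot be colored and one of its members has to be spilled to memory.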

Graph Coloring Problem
- Given a graph consisting of nodes and edges, assign colors to nodes such that adjacent nodes have different colors, and do this in such a way as to minimize the number of different colors
- Historical background: graph theory began with the Königsberg bridges problem, solved by Euler

Graph Coloring Approach

13.4 RISC Architecture
Why CISC?
- Richer instruction sets
  - Larger number of instructions and more complex instructions
- Reasons
  - To simplify compilers
  - To improve performance
- Can CISC simplify compilers?
  - If there are machine instructions that resemble HLL statements, the task can be simplified
  - But complex instructions are hard to exploit
  - Optimizing the code is much more difficult with a complex instruction set

RISC Architecture
Why CISC?
- Is a CISC program smaller? Not quite
  - A CISC program may have fewer instructions
  - But each instruction is longer
    + Longer opcodes and address fields
- Is a CISC program faster?
  - A complex HLL operation will execute more quickly as a single machine instruction than as a series of more primitive instructions
    + But the use of complex instructions is quite limited
  - The control unit must be made more complex

Code Size Relative to RISC I

RISC Architecture
Characteristics of RISC
- Common properties
  - One instruction per cycle
  - Register-to-register operations
  - Simple addressing modes
  - Simple instruction formats
- One machine instruction per machine cycle
  - Machine instructions are directly executed by hardware
- Register-to-register operations
  - Only LOAD/STORE instructions access memory
  - Instruction set and control unit can be simplified
    + RISC may have only one or two ADD instructions
    + VAX (CISC) has 25 different ADD instructions

RISC Architecture
- Simple addressing modes
  - Almost all instructions use simple register addressing
- Simple instruction formats
  - Only a few formats are used
  - Instruction length is fixed and aligned on word boundaries

Two Comparisons

RISC Architecture
Benefits of RISC approach
- Performance
  - More effective optimizing compilers can be developed
  - Most instructions are directly executed by hardware
  - Instruction pipelining can be applied more effectively
  - More responsive to interrupts
- VLSI implementation
  - Single-chip processor
  - Devote chip area to those activities that occur frequently
    + Simple instructions and local scalars
    + RISC I devotes 6% of its area to the control unit
    + A typical CISC devotes about 50%
  - Design-and-implementation time

Design and Layout Effort

RISC vs CISC
- Not clear cut
  - Many designs borrow from both philosophies
  - PowerPC and Pentium II
- Typical characteristics of RISC
  - Single instruction size of 4 bytes
  - Small number of addressing modes, <5
  - No indirect addressing
  - No operations that combine load/store with arithmetic
  - No more than one memory-addressed operand per instruction
  - No support for arbitrary alignment of data
  - Number of memory accesses for a data address is 1
  - At least 32 integer registers
  - At least 16 floating-point registers

Characteristics of Some Processors

13.5 RISC Pipelining
Pipelining with regular instructions
- Most instructions are register to register
  - Two phases of execution
    + I: Instruction fetch
    + E: Execute (ALU operation with register input and output)
- For load and store, three phases
    + I: Instruction fetch
    + E: Execute (calculate memory address)
    + D: Memory (register-to-memory or memory-to-register operation)

Effects of Pipelining

Optimization of Pipelining
- Simple and regular instructions make pipelining work efficiently
- Problems to deal with
  - Data dependency
  - Branch instructions
- Delayed branch
  - Normal branch
  - Delayed branch
    + No need to clear the pipeline
  - Optimized delayed branch
    + A clock cycle can be saved

Normal and Delayed Branch

Use of Delayed Branch

Optimization of Pipelining
- For conditional branches, this procedure cannot be blindly applied
  - Still, the compiler can seek to insert a useful instruction in the delay slot
- Delayed load
  - The target register for the load is locked
  - Execution continues until it reaches an instruction requiring that register
  - Try to rearrange instructions so that useful work can be done while loading

13.6 MIPS R4000
System
- Grew out of the MIPS project developed at Stanford
- R2000, R3000, and R4000
- 64-bit machine
  - Bigger address space
  - Easy to handle double-precision FP numbers
- Partitioned into two sections
  - CPU
  - Coprocessor for memory management
- 32 64-bit registers
  - R0 always contains 0
- 128 KB cache
  - Instruction + Data

MIPS R4000
Instruction set
- 94 instructions with a single 32-bit format
  - Simplifies instruction fetch and decode
- All data operations are register-register
- Load/store instructions for memory references
  - lw r2, 128(r3)
    + Load the word at address r3 + 128 into r2
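To make the fixed 32-bit format concrete, the following Python sketch packs and unpacks the load instruction shown above. It assumes the standard MIPS I-type layout (6-bit opcode, 5-bit base register, 5-bit target register, 16-bit immediate) and the standard lw opcode 0x23; the helper names are illustrative.

```python
LW_OPCODE = 0x23  # standard MIPS opcode for "load word"


def encode_i_type(opcode, rs, rt, imm):
    """Pack an I-type instruction: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_i_type(word):
    """Unpack the four I-type fields; every field sits at a fixed bit position."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


# lw r2, 128(r3): base register rs = r3, target register rt = r2, offset = 128
word = encode_i_type(LW_OPCODE, rs=3, rt=2, imm=128)
fields = decode_i_type(word)
print(f"{word:#010x}", fields)
```

Because the fields never move, the decoder is a handful of shifts and masks, which is the sense in which a single format simplifies fetch and decode.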

MIPS Instruction Formats

MIPS R4000
Evolution of MIPS pipeline
- One instruction per clock cycle initially
- Two improvements on this
  - Superscalar
    + Multiple pipelines so that two or more instructions at the same stage can be processed simultaneously
    + Dependencies arise between instructions in different pipelines
    + Extra logic is required to handle this
  - Superpipeline
    + Makes use of more fine-grained pipeline stages
      o More instructions can be in the pipeline
    + Overhead associated with transferring instructions from one stage to the next
    + Used in the MIPS R4000

MIPS R4000
Evolution of MIPS pipeline (cont'd)
- 5-stage R3000 pipeline
  - Stages
    + Instruction fetch
    + Source operand fetch from register
    + ALU operation or data operand address generation
    + Data memory reference
    + Write back into register
  - The 60-ns clock cycle is divided into two 30-ns stages
    + Some operations require 60 ns, while others require 30 ns
  - There is also parallelism within the execution of a single instruction
    + Instruction decode and register fetch
  - Functions performed in each stage

Enhancing the R3000 Pipeline

R3000 Pipeline Stages

MIPS R4000
Evolution of MIPS pipeline (cont'd)
- R4000 technical advances over R3000
  - 30-ns clock cycle
  - Halved access time to registers
  - On-chip cache
- 8-stage R4000 pipeline
  - Pipeline advances two stages per clock cycle
  - Stages
    + Instruction fetch first half
    + Instruction fetch second half
    + Register file
    + Instruction execute
    + Data cache first
    + Data cache second
    + Tag check
    + Write back

13.7 SPARC
Scalable Processor Architecture
- Designed by Sun and made public
SPARC register set
- Uses register windows
  - Each window has 24 registers
  - Consists of 3 parts: Ins, Locals, Outs
- Example: 8-window implementation
  - 136 physical registers in total: 8 globals plus 16 per window, because the Outs of one window overlap the Ins of the next
  - 8 global registers (0 ~ 7)
    + R0 is hardwired to 0
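The register count can be checked with a short Python sketch that maps a logical register number in a given window to a physical register. The numbering used here (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins) follows the usual SPARC convention, but the physical layout and the direction in which windows advance on a call are simplifying assumptions of this sketch, and the function name is hypothetical.

```python
def physical_index(window, r, n_windows=8):
    """Map logical register r (0..31) of a window to a physical register index.

    Assumed layout: physical 0..7 are globals; each window then owns
    8 ins followed by 8 locals; its outs are the ins of the next window.
    """
    if not 0 <= r < 32:
        raise ValueError("logical register must be in 0..31")
    if r < 8:                       # globals, shared by all windows
        return r
    if r < 16:                      # outs = ins of the next window
        return 8 + 16 * ((window + 1) % n_windows) + (r - 8)
    if r < 24:                      # locals, private to this window
        return 8 + 16 * window + 8 + (r - 16)
    return 8 + 16 * window + (r - 24)   # ins


# Count the distinct physical registers reachable from all 8 windows.
distinct = {physical_index(w, r) for w in range(8) for r in range(32)}
print(len(distinct))

# A caller's first out register and the callee's first in register coincide.
print(physical_index(0, 8) == physical_index(1, 24))
```

Each window sees 8 + 24 = 32 registers, but because adjacent windows share 8 of them, parameters are passed without copying any data.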

SPARC Register Window Layout

Eight Register Windows Forming Circular Stack in SPARC

SPARC
Instruction set
- Types of instructions
  - Load/store
  - Shift
  - Boolean
  - Arithmetic
  - Jump/branch
  - Others
- Most instructions are register-register
  - Rd <-- Rs1 op S2
  - S2 can be either a register or a 13-bit immediate operand
- Synthesizing other addressing modes

Synthesizing Other Addressing Modes with SPARC Addressing

SPARC
Instruction format
- 69 instructions with a single 32-bit format
- Types of formats
  - Call format
    + 30 bits + 00 --> 32-bit displacement in twos complement form
  - Branch format
    + 22 bits + 00 --> 24-bit displacement in twos complement form
    + Annul bit
      o Specifies what to do for the delay slot instruction
  - SETHI format
    + Used to build a 32-bit value in a register
    + Sets the 22 high-order bits and zeros the 10 low-order bits
  - Floating-point format
  - General format
    + Used for load, store, arithmetic, and logical operations
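The displacement and SETHI arithmetic above can be sketched in Python. The helper names are illustrative; the sketch assumes that displacements are relative to the address of the call or branch instruction itself and that the low 10 bits left zero by SETHI are filled in by a following OR instruction.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def call_target(pc, disp30):
    """30-bit field with two zero bits appended gives a 32-bit displacement."""
    return (pc + (disp30 << 2)) & MASK32


def branch_target(pc, disp22):
    """22-bit field with two zero bits appended gives a signed 24-bit displacement."""
    return (pc + (sign_extend(disp22, 22) << 2)) & MASK32


def sethi(imm22):
    """SETHI result: 22 high-order bits from the instruction, 10 low-order zeros."""
    return (imm22 & 0x3FFFFF) << 10


def split_constant(value):
    """Split a 32-bit constant into the SETHI field and the 10 bits for a later OR."""
    value &= MASK32
    return value >> 10, value & 0x3FF


hi22, lo10 = split_constant(0x12345678)
rebuilt = sethi(hi22) | lo10              # SETHI followed by OR with the low bits
back_one = branch_target(0x1000, 0x3FFFFF)  # displacement field of all ones is -1 word
print(hex(rebuilt), hex(back_one))
```

Appending two zero bits works because every instruction is 4 bytes long and word aligned, so the low two bits of any instruction address are always zero.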

SPARC Instruction Formats

13.8 RISC vs CISC Controversy
- Comparison categories
  - Quantitative
    + Compare program sizes and execution speeds
  - Qualitative
    + Examine issues of high-level language support and use of VLSI real estate
- Problems
  - No pair of RISC and CISC machines is directly comparable
  - No definitive set of test programs
  - Difficult to separate hardware effects from compiler effects
  - Most comparisons were done on toy rather than production machines
  - Most commercial devices are mixtures