REDUCED INSTRUCTION SET COMPUTERS (RISC)

Similar documents
Lecture 4: RISC Computers

Lecture 4: RISC Computers

Lecture 8: RISC & Parallel Computers. Parallel computers

RISC Principles. Introduction

RISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.

Chapter 13 Reduced Instruction Set Computers

Lecture 4: Instruction Set Design/Pipelining

SUPERSCALAR AND VLIW PROCESSORS

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

55:132/22C:160, HPCA Spring 2011

ECE 486/586. Computer Architecture. Lecture # 7

William Stallings Computer Organization and Architecture. Chapter 12 Reduced Instruction Set Computers

CPU Structure and Function

Lecture 4: Instruction Set Architecture

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

Cpu Architectures Using Fixed Length Instruction Formats

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

Major Advances (continued)

RISC Architecture Ch 12

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved.

COS 140: Foundations of Computer Science

COMPUTER ORGANIZATION & ARCHITECTURE

Supplement for MIPS (Section 4.14 of the textbook)

Pipelining, Branch Prediction, Trends

ARSITEKTUR SISTEM KOMPUTER. Wayan Suparta, PhD 17 April 2018

Chapter 2: Instructions How we talk to the computer

Evolution of ISAs. Instruction set architectures have changed over computer generations with changes in the

Instruction Set Principles and Examples. Appendix B

EC 413 Computer Organization

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

William Stallings Computer Organization and Architecture

Computer Architecture and Organization

CISC / RISC. Complex / Reduced Instruction Set Computers

Reminder: tutorials start next week!

Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures

CPU Structure and Function

General Purpose Processors

instruction set computer or RISC.

Main Points of the Computer Organization and System Software Module

COMPUTER ORGANIZATION & ARCHITECTURE

Computer Architecture and Data Manipulation. Von Neumann Architecture

Instruction Set Architecture (ISA)

CS4617 Computer Architecture

Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

ECE 486/586. Computer Architecture. Lecture # 8

Instruction Sets Ch 9-10

Instruction Sets Ch 9-10

Real instruction set architectures. Part 2: a representative sample

Computer Architecture

Chapter 5:: Target Machine Architecture (cont.)

Computer Systems Laboratory Sungkyunkwan University

Chapter 5. A Closer Look at Instruction Set Architectures

Lecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles.

CHAPTER 5 A Closer Look at Instruction Set Architectures

Superscalar Processors Ch 14

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Processing Unit CS206T

Overview. EE 4504 Computer Organization. This section. Items to be discussed. Section 9 RISC Architectures. Reading: Text, Chapter 12

Computer Architecture

Chapter 4. The Processor

Chapter 12. CPU Structure and Function. Yonsei University

Page 1. Structure of von Nuemann machine. Instruction Set - the type of Instructions

ISA: The Hardware Software Interface

Overview. EE 4504 Computer Organization. Much of the computer s architecture / organization is hidden from a HLL programmer

Computer Organization CS 206 T Lec# 2: Instruction Sets

Chapter 5. A Closer Look at Instruction Set Architectures

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Computer Architecture and Organization. Instruction Sets: Addressing Modes and Formats

ISA and RISCV. CASS 2018 Lavanya Ramapantulu

Addressing Modes. Immediate Direct Indirect Register Register Indirect Displacement (Indexed) Stack

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

Reduced Instruction Set Computer

Chapter 5. A Closer Look at Instruction Set Architectures

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise

CHAPTER 5 A Closer Look at Instruction Set Architectures

Chapter 4. The Processor

Micro-programmed Control Ch 17

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions

Micro-programmed Control Ch 15

EC-801 Advanced Computer Architecture

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors

Instruction Sets: Characteristics and Functions Addressing Modes

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

Micro-programmed Control Ch 15

Spring 2014 Midterm Exam Review

Instruction Set. Instruction Sets Ch Instruction Representation. Machine Instruction. Instruction Set Design (5) Operation types

I ve been getting this a lot lately So, what are you teaching this term? Computer Organization. Do you mean, like keeping your computer in place?

CS 5803 Introduction to High Performance Computer Architecture: RISC vs. CISC. A.R. Hurson 323 Computer Science Building, Missouri S&T

COMP2121: Microprocessors and Interfacing. Instruction Set Architecture (ISA)

Review of instruction set architectures

CSCE 5610: Computer Architecture

CA226 Advanced Computer Architecture

Transcription:

Datorarkitektur Fö 5/6-1 Datorarkitektur Fö 5/6-2 What are RISCs and why do we need them? REDUCED INSTRUCTION SET COMPUTERS (RISC) RISC architectures represent an important innovation in the area of computer organization. The RISC architecture is an attempt to produce more CPU power by simplifying the instruction set of the CPU. The opposed trend to RISC is that of complex instruction set computers (CISC). 1. Why do we need RISCs? 2. Some Typical Features of Current Programs Both RISC and CISC architectures have been developed as an attempt to cover the semantic gap. 3. Main Characteristics of RISC Architectures 4. Are RISCs Really Better than CISCs? 5. Examples Datorarkitektur Fö 5/6-3 Datorarkitektur Fö 5/6-4 The Semantic Gap The Semantic Gap (cont d) growing abstraction level High-level language (HLL) the semantic gap Machine language (machine architecture fully visible) In order to improve the efficiency of software development, new and powerful programming languages have been developed (Ada, C++, Java). They provide: high level of abstraction, conciseness, power. By this evolution the semantic gap grows. Problem: How should new HLL programs be compiled and executed efficiently on a processor architecture? Two possible answers: 1. The CISC approach: design very complex architectures including a large number of instructions and addressing modes; include also instructions close to those present in HLL. 2. The RISC approach: simplify the instruction set and adapt it to the real requirements of user programs.

Datorarkitektur Fö 5/6-5 Datorarkitektur Fö 5/6-6 Evaluation of Program Execution or What are Programs Doing Most of the Time? Several studies have been conducted to determine the execution characteristics of machine instruction sequences generated from HLL programs. Aspects of interest: 1. The frequency of operations performed. 2. The types of operands and their frequency of use. 3. Execution sequencing (frequency of jumps, loops, subprogram calls). Frequency of Instructions Executed Frequency distribution of executed machine instructions: - moves: 33% - conditional branch: 20% - Arithmetic/logic: 16% - others: Between 0.1% and 10% Addressing modes: the overwhelming majority of instructions uses simple addressing modes, in which the address can be calculated in a single cycle (register, register indirect, displacement); complex addressing modes (memory indirect, indexed+indirect, displacement+indexed, stack) are used only by ~18% of the instructions. Operand Types 74 to 80% of the operands are scalars (integers, reals, characters, etc.) which can be hold in registers; the rest (20-26%) are arrays/structures; 90% of them are global variables; 80% of the scalars are local variables. The majority of operands are local variables of scalar type, which can be stored in registers. Datorarkitektur Fö 5/6-7 Datorarkitektur Fö 5/6-8 Procedure Calls (cont d) Procedure Calls Investigations have been also performed about the percentage of the total execution time spent executing a certain HLL instruction. It turned out that comparing HLL instructions, most of the time is spent executing CALLs and RETURNs. Even if only 15% of the executed HLL instructions is a CALL or RETURN, they are executed most of the time, because of their complexity. A CALL or RETURN is compiled into a relatively long sequence of machine instructions with a lot of memory references. int x, y; float z; void proc3(int a, int b, int c) { - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - void proc2(int k) { int j, q; - - - - - - - - - - - - - - - - - proc3(j, 5, q); - - - - - - - - - - - - - - - - - - - - - void proc1() { int i; - - - - - - - - - - - - - - - - - proc2(i); - - - - - - - - - - - - - - - - - - - - - main () { proc1(); Some statistics concerning procedure calls: Only 1.25% of called procedures have more than six parameters. Only 6.7% of called procedures have more than six local variables. Chains of nested procedure calls are usually short and only very seldom longer than 6.

Datorarkitektur Fö 5/6-9 Datorarkitektur Fö 5/6-10 Main Characteristics of RISC Architectures Conclusions from Evaluation of Program Execution An overwhelming preponderance of simple (ALU and move) operations over complex operations. Preponderance of simple addressing modes. Large frequency of operand accesses; on average each instruction references 1.9 operands. Most of the referenced operands are scalars (so they can be stored in a register) and are local variables or parameters. Optimizing the procedure CALL/RETURN mechanism promises large benefits in speed. These conclusions have been at the starting point to the Reduced Instruction Set Computer (RISC) approach. The instruction set is limited and includes only simple instructions. - The goal is to create an instruction set contain-ing instructions that execute quickly; most of the RISC instructions are executed in a single machine cycle (after fetched and decoded). Pipeline operation (without memory reference): - RISC instructions, being simple, are hard-wired, while CISC architectures have to use microprogramming in order to implement complex instructions. - Having only simple instructions results in reduced complexity of the control unit and the data path; as a consequence, the processor can work at a high clock frequency. - The pipelines are used efficiently if instructions are simple and of similar execution time. - Complex operations on RISCs are executed as a sequence of simple RISC instructions. In the case of CISCs they are executed as one single or a few complex instruction. Datorarkitektur Fö 5/6-11 Datorarkitektur Fö 5/6-12 Let s see some small example: Assume: - we have a program with 80% of executed instructions being simple and 20% complex; - on a CISC machine simple instructions take 4 cycles, complex instructions take 8 cycles; cycle time is 100 ns (10-7 s); - on a RISC machine simple instructions are executed in one cycle; complex operations are implemented as a sequence of instructions; we consider on average 14 instructions (14 cycles) for a complex operation; cycle time is 75 ns (0.75 * 10-7 s). How much time takes a program of 1 000 000 instructions? CISC: (10 6 *0.80*4 + 10 6 *0.20*8)*10-7 = 0.48 s RISC: (10 6 *0.80*1 + 10 6 *0.20*14)*0.75*10-7 = 0.27 s complex operations take more time on the RISC, but their number is small; because of its simplicity, the RISC works at a smaller cycle time; with the CISC, simple instructions are slowed down because of the increased data path length and the increased control complexity. Load-and-store architecture - Only LOAD and STORE instructions reference data in memory; all other instructions operate only with registers (are register-to-register instructions); thus, only the few instructions accessing memory need more than one cycle to execute (after fetched and decoded). Pipeline operation with memory reference: CA: compute address TR: transfere Instructions use only few addressing modes - Addressing modes are usually register, direct, register indirect, displacement. Instructions are of fixed length and uniform format - This makes the loading and decoding of instructions simple and fast; it is not needed to wait until the length of an instruction is known in order to start decoding the following one; - Decoding is simplified because opcode and address fields are located in the same position for all instructions

Datorarkitektur Fö 5/6-13 Datorarkitektur Fö 5/6-14 A large number of registers is available - Variables and intermediate results can be stored in registers and do not require repeated loads and stores from/to memory. - All local variables of procedures and the passed parameters can be stored in registers (see slide 8 for comments on possible number of variables and parameters). What happens when a new procedure is called? Is the above strategy realistic? - The strategy is realistic, because the number of local variables in procedures is not large. The chains of nested procedure calls is only exceptionally larger than 6. - If the chain of nested procedure calls becomes large, at a certain call there will be no registers to be assigned to the called procedure; in this case local variables and parameters have to be stored in memory. - Normally the registers have to be saved in memory (they contain values of variables and parameters for the calling procedure); at return to the calling procedure, the values have to be again loaded from memory. This takes a lot of time. - If a large number of registers is available, a new set of registers can be allocated to the called procedure and the register set assigned to the calling one remains untouched. Why is a large number of registers typical for RISC architectures? - Because of the reduced complexity of the processor there is enough space on the chip to be allocated to a large number of registers. This, usually, is not the case with CISCs. Datorarkitektur Fö 5/6-15 Datorarkitektur Fö 5/6-16 The delayed load problem (cont d) The delayed load problem LOAD instructions (similar to the STORE) require memory access and their execution cannot be completed in a single clock cycle. However, in the next cycle a new instruction is started by the processor. Let us consider the following sequence: LOAD R1,X loads from address X into R1 ADD R2,R1 R2 R2 + R1 ADD R4,R3 R4 R4 + R3 SUB R2,R4 R2 R2 - R4 LOAD R1,X ADD R2,R1 ADD R4,R3 FI DI stall EI This happens in the pipeline: Using a smart compiler, it can be possible to avoid the stall in the pipeline. The solution, that has similarities with the delayed branching, is called delayed load. LOAD R1,X ADD R2,R1 ADD R4,R3 SUB R2,R4 FI DI stall EI But a smart compiler can do something with this code!

Datorarkitektur Fö 5/6-17 Datorarkitektur Fö 5/6-18 The delayed load problem (cont d) Are RISCs Really Better than CISCs? The compiler reorganises the code and tries to find an instruction that does not depend on the loaded value and that can be placed after the LOAD: LOAD R1,X loads from address X into R1 ADD R4,R3 R4 R4 + R3 ADD R2,R1 R2 R2 + R1 SUB R2,R4 R2 R2 - R4 No stall in the pipeline: LOAD R1,X ADD R4,R3 ADD R2,R1 SUB R2,R4 If there is no instruction to move after the LOAD, like in the following sequence, the stall is unavoidable: LOAD R1,X loads from address X into R1 ADD R2,R1 R2 R2 + R1 ADD R4,R2 R4 R4 + R2 SUB R3,R4 R3 R3 - R4 RISC architectures have several advantages and they were discussed throughout this lecture. However, a definitive answer to the above question is difficult to give. A lot of performance comparisons have shown that benchmark programs are really running faster on RISC processors than on processors with CISC characteristics. However, it is difficult to identify which feature of a processor produces the higher performance. Some "CISC fans" argue that the higher speed is not produced by the typical RISC features but because of technology, better compilers, etc. An argument in favour of the CISC: the simpler instruction set of RISC processors results in a larger memory requirement compared to the similar program compiled for a CISC architecture. Most of the current processors are not typical RISCs or CISCs but try to combine advantages of both approaches Datorarkitektur Fö 5/6-19 Datorarkitektur Fö 5/6-20 CISC Architectures: Some Processor Examples VAX 11/780 Nr. of instructions: 303 Instruction size: 2-57 Instruction format: not fixed Addressing modes: 22 Number of general purpose registers: 16 Pentium Nr. of instructions: 235 Instruction size: 1-11 Instruction format: not fixed Addressing modes: 11 Number of general purpose registers: 8 (32-bit mode), 16 (64-bit mode) Some Processor Examples RISC Architectures: Sun SPARC Nr. of instructions: 52 Instruction size: 4 Instruction format: fixed Addressing modes: 2 Number of general purpose registers: up to 520 PowerPC Nr. of instructions: 206 Instruction size: 4 Instruction format: not fixed (but small differences) Addressing modes: 2 Number of general purpose registers: 32 ARM Nr. of instructions (in the standard set, not the Thumb): 122 (many versions only provide a subset) Instruction size: 4 (standard), 2 (Thumb) Instruction format: fixed (different for regular and Thumb) Addressing modes: 3 Number of general purpose registers: 27 (16 can be used at any one time) Current ARM processors can execute both the standard 32 bit instruction set and the 16 bit Thumb instruction set. Thumb contains a subset of the 32- bit set, encoded into 16-bit instructions.

Datorarkitektur Fö 5/6-21 Summary Both RISCs and CISCs try to solve the same problem: to cover the semantic gap. They do it in different ways. CISCs are going the traditional way of implementing more and more complex instructions. RISCs try to simplify the instruction set. Innovations in RISC architectures are based on a close analysis of a large set of widely used programs. The main features of RISC architectures are: reduced number of simple instructions, few addressing modes, load-store architecture, instructions are of fixed length and format, a large number of registers is available. One of the main concerns of RISC designers was to maximise the efficiency of pipelining. Present architectures often include both RISC and CISC features.