Lecture 4: RISC Computers


Zebo Peng, IDA, LiTH

Lecture outline:
- Introduction
- Program execution features
- RISC characteristics
- RISC vs. CISC

Introduction
- Reduced Instruction Set Computer (RISC) is an important innovation in computer architecture.
  It is an attempt to produce more computation power by simplifying the instruction set of the CPU.
- The opposite trend to RISC is the Complex Instruction Set Computer (CISC), i.e., the conventional computers.
- Both RISC and CISC architectures aim to address problems caused by the semantic gap, in order to reduce the ever-growing software costs.
- However, they use very different approaches.

The Semantic Gap
- In order to improve the efficiency of software development, powerful high-level programming languages have been developed (e.g., Ada, C++, and Java).
  They support higher levels of abstraction.
- This evolution has increased the semantic gap between programming languages and machine languages.
[Figure: the growing semantic gap between high-level languages (HLL, increasing abstraction) and machine language; labels: HLL, CISC, machine language.]

Main Features of CISC
- CISC attempts to make machine-language (ML) instructions similar to HLL statements.
  Ex. the CASE (switch) instruction on the VAX computer.
- A large number of instructions (> 200), with complex instructions and data types.
- Many and complex addressing modes (e.g., indirect addressing is often used).
- Microprogramming techniques are used to implement the complicated control unit.
- The memory bottleneck is a major problem, due to complex addressing modes and instructions with multiple memory accesses.

Arguments for CISC
- A rich instruction set should simplify the compiler by providing instructions that match HLL statements.
  This works fine if the number of high-level languages is small.
- If a program is smaller in size, due to the use of complex ML instructions, it has better performance:
  Smaller programs take up less memory space and need fewer instruction fetch cycles.
  Fewer instructions are executed, which may lead to a shorter execution time.
- Program execution efficiency is improved by implementing complex operations in microcode rather than machine code.

Microprogrammed Control
- Microprogramming is a technique used to implement the control unit.
- The basic idea is to implement the control unit as a microprogram execution machine (a computer inside a computer).
  The set of micro-operations occurring in one micro-clock cycle defines a micro-instruction.
  A sequence of micro-instructions is called a microprogram.
- The execution of a machine instruction becomes the execution of a sequence of micro-instructions.
  This is similar to the way a C++ statement is implemented by a sequence of machine instructions.
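The "computer inside a computer" idea can be illustrated with a small sketch. Everything here is hypothetical and chosen only for illustration: the opcode names, the control-signal names, and the contents of the control store do not come from any real processor.

```python
# Hypothetical control store: each machine opcode maps to a microprogram,
# i.e. a list of micro-instructions. Each micro-instruction is the set of
# micro-operations (control signals) issued in one micro-clock cycle.
CONTROL_STORE = {
    "LOAD": [{"mar<-addr"}, {"mem_read"}, {"reg<-mbr"}],
    "ADD": [{"alu_add", "reg<-alu"}],
    "STORE": [{"mar<-addr", "mbr<-reg"}, {"mem_write"}],
}


def run_microprogram(opcode):
    """Return the control signals issued in each micro-cycle for one instruction."""
    microprogram = CONTROL_STORE[opcode]
    trace = []
    micro_pc = 0  # micro-program counter, stepped by +1 each micro-cycle
    while micro_pc < len(microprogram):
        trace.append(sorted(microprogram[micro_pc]))
        micro_pc += 1
    return trace


def count_micro_cycles(program):
    """Total number of micro-cycles needed to execute a list of opcodes."""
    return sum(len(run_microprogram(opcode)) for opcode in program)
```

With this assumed control store, `count_micro_cycles(["LOAD", "LOAD", "ADD", "STORE"])` adds up the lengths of the four microprograms. A real control unit would also handle branching inside microprograms, which this sketch leaves out.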

Hardware Implementation
[Figure: microprogrammed control unit. Blocks: instruction register, decoder, sequencing logic (inputs: clock, flags), control address register (µPC, +1), control memory (read), control buffer register, and a decoder producing the control signals C0, C1, ..., Cn.]

Microcode Storage and Execution
- Microcode is stored in a micro-memory which is much faster than a cache.
- Since the micro-memory stores only programs, a ROM is often used.
  Microprograms are, therefore, sometimes called firmware.
- Execution of a complex instruction in microcode, e.g., COM:
  fi di load load add sub store
- Execution of the same operation as a sequence of machine code:
  LOAD LOAD ADD SUB STORE
[Figure: timing diagrams over clock cycles 0-19 showing the fi, di, load, add, sub, and store steps for the two cases.]

Microcode Storage and Execution (Cont'd)
- By executing a complex operation in microcode instead of machine code, the execution time is reduced from 37 to 21 cycles!
- In this way, complexity has moved from the software level to the hardware level, since the microcode memory becomes bigger due to this new instruction!
- If too many such instructions are added, the microcode memory will explode, leading to a longer micro-clock cycle time and lower CPU performance.

Problems with CISC
- A large instruction set requires complex and time-consuming hardware steps to decode and execute instructions.
- Some complex machine instructions may not match HLL statements exactly, in which case they may be of little use.
  This becomes a major problem as the number of high-level languages grows.
- Instruction sets designed with specialized instructions for several high-level languages will not be efficient when executing programs of a given language.
- It will also lead to a complex design task, and thus a longer time-to-market.
QUESTION: Is CISC a good trend for architecture?

Evaluation of Program Execution
- What are programs actually doing most of the time?
- In particular, what are the execution characteristics of machine instruction sequences generated from HLL programs?
- Aspects of interest:
  The frequency of operations performed.
  The types of operands and their frequency of use.
  Control flow frequency: branches, loops, and subprogram calls.

Evaluation Results
- Frequency of machine instructions executed:
  moves of data: 33%
  conditional branches: 20%
  arithmetic/logic operations: 16%
  (these three classes together: ca. 70%)
  others: between 0.1% and 10%
- Addressing modes:
  The majority of instructions use simple addressing modes.
  Complex addressing modes (memory indirect, indexed+indirect, etc.) are used by only ~18% of the instructions.

Evaluation Results (Cont'd)
- Operand types:
  74-80% of the operands are scalars (integers, reals, characters, etc.).
  The rest (20-26%) are arrays/structures, and 90% of them are global variables.
  About 80% of the scalars are local variables.
- Conclusion: the majority of operands are local variables of scalar type, which can be stored in registers.

Procedure Calls
- Studies have also been done on the percentage of the execution time spent executing different HLL statements.
- Most of the time is spent executing CALLs and RETURNs in HLL programs.
- Even if only 15% of the executed HLL statements are CALLs or RETURNs, they consume most of the execution time, because of their complexity.
- A CALL or RETURN is compiled into a relatively long sequence of machine instructions with a lot of memory references.

Evaluation Conclusions
- An overwhelming dominance of simple (ALU and move) operations over complex operations.
- Dominance of simple addressing modes.
- High frequency of operand accesses; on average each instruction references 1.9 operands.
- Most of the referenced operands are scalars (and can therefore be stored in registers) and are local variables or parameters.
- Optimizing the procedure CALL/RETURN mechanism promises large benefits in performance improvement.
RISC computers are developed to take advantage of the above results, and thus deliver better performance!

Main Characteristics of RISC
- A small number of simple instructions (desirably < 100).
  Only simple and small decode and execution hardware is required.
  A hard-wired controller is used, rather than microprogramming.
  The CPU takes less silicon area to implement, and also runs faster.
- Execution of one instruction per clock cycle.
  The instruction pipeline performs more efficiently due to simple instructions and similar execution patterns.
- Complex operations are executed as a sequence of simple instructions.
  In the case of CISC they are executed as a single complex instruction or a few complex instructions.
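The "one instruction per clock cycle" goal refers to pipeline throughput. A minimal sketch of the standard idealized timing model follows; it assumes a pipeline with equal-length stages and no hazards or stalls, and the function names are made up for this example.

```python
def sequential_cycles(n_instructions, n_stages):
    """Cycles needed if each instruction passes through all stages alone."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles needed in an ideal pipeline: fill it once, then one result per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions, n_stages):
    """Speedup of the ideal pipeline over sequential execution."""
    return sequential_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

For a long instruction stream the speedup approaches the number of stages, which is why simple instructions with similar execution patterns matter: they keep every stage busy in every cycle.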

Main Characteristics of RISC (Cont'd)
- Load-and-store architecture.
  Only LOAD and STORE instructions reference data in memory.
  All other instructions operate only on registers (register-to-register instructions).
- Only a few simple addressing modes are used.
  Ex. register, direct, register indirect, and displacement.
- Instructions are of fixed length and uniform format.
  Loading and decoding of instructions are simple and fast.
  There is no need to wait until the length of an instruction is known in order to start decoding it.
  Decoding is simplified because the opcode and address fields are located in the same position for all instructions.

Main Characteristics of RISC (Cont'd)
- A large number of registers can be implemented.
  Variables and intermediate results can be stored in registers and do not require repeated loads from and stores to memory.
  All local variables of procedures and the passed parameters can also be stored in registers.
- The large number of registers is possible because the reduced complexity of the processor leaves silicon space on the chip to implement them.
  This is usually not the case with CISC machines.
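The benefit of a fixed-length, uniform format can be sketched in a few lines. The 32-bit layout below (6-bit opcode, two 5-bit register fields, 16-bit immediate) is a hypothetical format invented for this example, not the encoding of any particular processor.

```python
# Hypothetical fixed 32-bit format: opcode[31:26] rd[25:21] rs1[20:16] imm[15:0]
INSTRUCTION_BYTES = 4


def encode(opcode, rd, rs1, imm):
    """Pack the four fields into one 32-bit instruction word."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rd & 0x1F) << 21)
        | ((rs1 & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


def decode(word):
    """Extract every field with fixed shifts and masks, with no length lookup."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


def split_words(code):
    """Split raw code bytes into instruction words; boundaries are known up front."""
    return [
        int.from_bytes(code[i:i + INSTRUCTION_BYTES], "big")
        for i in range(0, len(code), INSTRUCTION_BYTES)
    ]
```

Because every instruction is four bytes, `split_words` can find all instruction boundaries without decoding anything, and `decode` never depends on the opcode to locate a field. With variable-length instructions neither shortcut is available.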

Register Windows
- A large number of registers is usually very useful.
- However, if the contents of all registers must be saved at every procedure call, more registers mean a longer delay.
- A solution to this problem is to divide the register file into a set of fixed-size windows.
  Each window is assigned to a procedure.
  Windows for adjacent procedures overlap to allow parameter passing.

Main Advantages of RISC
- Best support is given by optimizing the most used and most time-consuming architecture aspects:
  Frequently executed instructions.
  Simple memory references.
  Procedure call/return.
  Pipeline design.
- Consequently, we have:
  High performance for many applications;
  Less design complexity;
  Reduced power consumption; and
  Reduced design cost and time-to-market (newer technology).

Criticism of RISC
- An operation might need two, three, or more instructions to accomplish.
  More memory access cycles might be needed.
  Execution speed may be reduced for certain applications.
- It usually leads to longer programs, which need larger memory space to store.
  This is less of a problem now with the development of memory technologies.
- RISC makes it more difficult to write machine code and assembly programs.
  Programming becomes more time-consuming.
  Luckily, very little low-level programming is done nowadays.
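The first criticism can be made concrete with a toy comparison of one memory-to-memory instruction against its load-store equivalent. Both instruction sets below are hypothetical (the `ADDM` mnemonic and the register names are invented), and the model only counts instructions and data accesses to memory; it does not model timing.

```python
def run_load_store(program, memory):
    """Run a tiny load-store program; return (instruction count, data accesses)."""
    regs = {}
    data_accesses = 0
    for op, *args in program:
        if op == "LOAD":      # LOAD rd, addr
            rd, addr = args
            regs[rd] = memory[addr]
            data_accesses += 1
        elif op == "STORE":   # STORE rs, addr
            rs, addr = args
            memory[addr] = regs[rs]
            data_accesses += 1
        elif op == "ADD":     # ADD rd, ra, rb (registers only)
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return len(program), data_accesses


def run_mem_to_mem(program, memory):
    """Run hypothetical ADDM dst, a, b instructions that work directly on memory."""
    data_accesses = 0
    for op, dst, a, b in program:
        if op != "ADDM":
            raise ValueError(f"unknown opcode {op}")
        memory[dst] = memory[a] + memory[b]
        data_accesses += 3  # two operand reads and one result write
    return len(program), data_accesses


risc_program = [
    ("LOAD", "r1", "A"),
    ("LOAD", "r2", "B"),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", "C"),
]
cisc_program = [("ADDM", "C", "A", "B")]
```

Computing C = A + B takes four instructions in the load-store version and one in the memory-to-memory version, so the program is longer and more instruction fetches are needed, even though the number of data accesses is the same in this small case.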

RISC vs. CISC
- Studies have shown that benchmark programs often run faster on RISC processors than on CISC machines.
- However, it is difficult to identify the particular RISC feature(s) that really produce the higher performance.
- Some "CISC fans" argue that the higher speed is not produced by the typical RISC features but by newer technology, better compilers, etc.
- An argument in favor of CISC: the simpler RISC instruction set results in a larger memory requirement for RISC than for CISC.
- Most recent processors are not typical RISC or CISC, but combine advantages of both approaches.
  Ex. new versions of PowerPC and recent Pentium processors.

An Exception: Embedded Processors
- Embedded processors implement a dedicated function within a larger mechanical or electrical system, often with real-time computing constraints.
- CISC is unsuitable (complex and less predictable).
- RISC reduces power consumption (mobile devices).
  RISC gives a better MIPS/watt ratio.
  RISC decreases heat dissipation (more reliable).
- RISC simplifies the hardware and its design.
- ARM processors with RISC architecture have been widely used in smart phones and tablet computers (e.g., iPad).

Historical CISC/RISC Examples

Characteristic                 IBM370   VAX     i486    SPARC    MIPS
Type                           CISC     CISC    CISC    RISC     RISC
Year developed                 1973     1978    1989    1987     1991
No. of instructions            208      303     235     69       94
Instruction size (bytes)       2-6      2-57    1-12    4        4
Addressing modes               4        22      11      1        1
No. of G.P. registers          16       16      8       40-520   32
Control memory size (Kbits)    420      480     246     -        -

A CISC Example: Pentium I
- 32-bit integer processor:
  Registers: 8 general; 6 for segmented addresses (16 bits); 2 status/control; 1 instruction pointer (PC).
  Two instruction pipelines:
    the u-pipe can execute any instruction;
    the v-pipe executes only simple instructions (control is hardwired!).
  On-chip floating-point unit.
  Pre-fetch buffer for instructions (16 bytes of instructions).
  Branch target buffer (512-entry branch history table).
  Microprogrammed control.
- Instruction set:
  Number of instructions: 235 (+ MMX instructions),
    e.g., FYL2XP1 computes y · log2(x + 1) in 313 clock cycles.
  Instruction size: 1-12 bytes (3.2 on average).
  Addressing modes: 11.

CISC: Pentium I (Cont'd)
- Memory organization:
  Address length: 32 bits (4 GB space).
  Memory is segmented for protection purposes.
  Virtual memory is supported by paging.
  On-chip separate 8 KB I-cache and 8 KB D-cache.
    The D-cache can be configured as write-through or write-back.
  L2 cache: 256 KB or 512 KB.
- Instruction execution:
  Execution models: reg-reg, reg-mem, and mem-mem.
  Five-stage instruction pipeline: Fetch, Decode1, Decode2 (address generation), Execute, and Write-Back.

A RISC Example: SPARC
- Scalable Processor Architecture, a 32-bit processor:
  An Integer Unit (IU) to execute all instructions.
  A Floating-Point Unit (FPU), a co-processor working concurrently with the IU for arithmetic operations on floating-point numbers.
- 69 basic instructions.
- A linear, 32-bit virtual-address space (4 GB).
- 40 to 520 general-purpose 32-bit registers,
  grouped into 2 to 32 overlapping register windows + 8 global registers.
  Each group has 16 registers.
- Scalability.
  The latest high-end processor (2016) is Fujitsu's SPARC64 X+, which is designed for 64-bit operation.
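The register counts quoted for SPARC (40 to 520 registers for 2 to 32 windows) follow from 8 globals plus 16 registers per window, and the overlap between adjacent windows can be sketched as an index mapping. The visible numbering used here (0-7 global, 8-15 out, 16-23 local, 24-31 in) follows the usual SPARC convention; the physical layout and the choice that a call moves to window + 1 are simplifying assumptions made for this sketch, not the actual hardware behaviour.

```python
N_GLOBALS = 8
REGS_PER_WINDOW = 16  # 8 "in" + 8 "local" registers owned by each window


def physical_register_count(n_windows):
    """Total registers: the globals plus 16 registers per window."""
    return N_GLOBALS + REGS_PER_WINDOW * n_windows


def physical_index(window, reg, n_windows):
    """Map visible register `reg` (0..31) of `window` to a physical register index.

    Assumption: a procedure call switches from window w to window w + 1
    (modulo n_windows), so the caller's "out" registers are the callee's "in"
    registers.
    """
    if not 0 <= reg < 32:
        raise ValueError("visible register number must be in 0..31")
    if reg < 8:                       # globals, shared by all windows
        return reg
    base = N_GLOBALS + REGS_PER_WINDOW * (window % n_windows)
    if reg < 16:                      # outs: stored as the ins of the next window
        next_base = N_GLOBALS + REGS_PER_WINDOW * ((window + 1) % n_windows)
        return next_base + (reg - 8)
    if reg < 24:                      # locals: private to this window
        return base + 8 + (reg - 16)
    return base + (reg - 24)          # ins: shared with the caller's outs
```

Passing a parameter then costs no memory traffic: the caller writes it to one of its out registers, and after the window switch the callee reads the same physical register as an in register. The windows form a circular buffer, so the last window's outs wrap around to the first window's ins.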

Summary
- Both RISCs and CISCs try to address issues related to the semantic gap, with different approaches.
- CISCs follow the traditional way of implementing more and more complex instructions.
- RISCs try to simplify the instruction set, while improving the instruction execution performance.
- The development of RISC architectures is based on a close analysis of typical program execution patterns.
- The main features of RISC architectures are:
  a reduced number of simple instructions and few addressing modes;
  instructions with fixed length and format;
  load-store architecture;
  efficient implementation of pipelining;
  a large number of registers, often organized in windows.
- New architectures often include both RISC and CISC features.