Prof. Hakim Weatherspoon, CS 3410, Spring 2015, Computer Science, Cornell University. See P&H Appendix 2.16-2.18, and 2.21


Prof. Hakim Weatherspoon, CS 3410, Spring 2015, Computer Science, Cornell University. See P&H Appendix 2.16-2.18, and 2.21

There is a Lab Section this week: C Lab2
Project1 (PA1) is due next Monday, March 9th
Prelim today
  Starts at 7:30pm sharp
  Go to location based on netid:
    [a-g]*  MRS146: Morrison Hall 146
    [h-l]*  RRB125: Riley Robb Hall 125
    [m-n]*  RRB105: Riley Robb Hall 105
    [o-s]*  MVRG71: M Van Rensselaer Hall G71
    [t-z]*  MVRG73: M Van Rensselaer Hall G73

Prelim1 today:
  Time: We will start at 7:30pm sharp, so come early
  Location: on previous slide
  Closed Book: cannot use electronic devices or outside material
  Practice prelims are online in CMS
  Material covered: everything up to end of this week
    Everything up to and including data hazards
    Appendix B (logic, gates, FSMs, memory, ALUs)
    Chapter 4 (pipelined [and non] MIPS processor with hazards)
    Chapter 2 (Numbers / Arithmetic, simple MIPS instructions)
    Chapter 1 (Performance)
    HW1, Lab0, Lab1, Lab2, C Lab0, C Lab1

[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Also shown: PC and +4 adder, jump/branch target computation, register file, control, immediate extend, hazard detection, forward unit, ALU, data memory.]

C -> compiler -> MIPS assembly -> assembler -> machine code -> CPU -> Circuits -> Gates -> Transistors -> Silicon

C:
  int x = 10;
  x = 2 * x + 15;

MIPS assembly:
  addi r5, r0, 10    # r5 = r0 + 10   (r0 = 0)
  muli r5, r5, 2     # r5 = r5 * 2, assembled as r5 = r5 << 1
  addi r5, r5, 15    # r5 = r5 + 15

Machine code:
  00100000000001010000000000001010    # op = addi, r0, r5, 10
  00000000000001010010100001000000    # op = r-type, r5, r5, shamt = 1, func = sll
  00100000101001010000000000001111    # op = addi, r5, r5, 15

The same stack, with its layers labeled:
  High Level Languages:                 int x = 10; x = 2 * x + 15;
  (compiler)
  MIPS assembly:                        addi r5, r0, 10; muli r5, r5, 2; addi r5, r5, 15
  (assembler)
  Instruction Set Architecture (ISA):   machine code, e.g. 00100000000001010000000000001010
  CPU, Circuits, Gates, Transistors, Silicon

Instruction Set Architectures: ISA Variations, and CISC vs RISC
Next Time: Program Structure and Calling Conventions

Is MIPS the only possible instruction set architecture (ISA)? What are the alternatives?

ISA defines the permissible instructions
  MIPS: load/store, arithmetic, control flow, ...
  ARMv7: similar to MIPS, but more shift, memory, & conditional ops
  ARMv8 (64 bit): even closer to MIPS, no conditional ops
  VAX: arithmetic on memory or registers, strings, polynomial evaluation, stacks/queues, ...
  Cray: vector operations, ...
  x86: a little of everything

Accumulators
  Early stored program computers had one register!
    EDSAC (Electronic Delay Storage Automatic Calculator) in 1949
    Intel 8008 in 1972 was an accumulator machine
  One register is two registers short of a MIPS instruction!
  Requires a memory based operand addressing mode
  Example instruction: add 200
    Add the accumulator to the word in memory at address 200
    Place the sum back in the accumulator
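The accumulator style above can be sketched in C as a toy machine with a single register. This is only an illustration: the `AccMachine` type, the memory size, and the `acc_load`/`acc_store` helpers are hypothetical names, not part of any real ISA discussed in the slides.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256  /* hypothetical memory size for this sketch */

/* Hypothetical one-register machine: the accumulator is the only register. */
typedef struct {
    int32_t acc;
    int32_t mem[MEM_WORDS];
} AccMachine;

/* "add addr": add the word at Mem[addr] to the accumulator,
 * and place the sum back in the accumulator. */
void acc_add(AccMachine *m, unsigned addr) {
    assert(addr < MEM_WORDS);
    m->acc += m->mem[addr];
}

/* With one register, every other operand must come from memory,
 * so load and store each name a memory address as well. */
void acc_load(AccMachine *m, unsigned addr) {
    assert(addr < MEM_WORDS);
    m->acc = m->mem[addr];
}

void acc_store(AccMachine *m, unsigned addr) {
    assert(addr < MEM_WORDS);
    m->mem[addr] = m->acc;
}
```

Computing `Mem[202] = Mem[201] + Mem[200]` takes three instructions here (load 201, add 200, store 202), each with a memory operand, whereas a three-register MIPS `add` names no memory at all.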

Next step, more registers
  Dedicated registers
    E.g. indices for array references in data transfer instructions, separate accumulators for multiply or divide instructions, top of stack pointer.
  Extended Accumulator
    Intel 8086: "extended accumulator", the processor for IBM PCs
    One operand may be in memory (like previous accumulators).
    Or, all the operands may be registers (like MIPS).

Next step, more registers
  General purpose registers
    Registers can be used for any purpose
    E.g. MIPS, ARM, x86
  Register-memory architectures
    One operand may be in memory (e.g. accumulators)
    E.g. x86 (i.e. 80386 processors)
  Register-register architectures (aka load-store)
    All operands must be in registers
    E.g. MIPS, ARM

The number of available registers greatly influenced the instruction set architecture (ISA)

Machine            Num General Purpose Registers   Architectural Style               Year
EDSAC              1                               Accumulator                       1949
IBM 701            1                               Accumulator                       1953
CDC 6600           8                               Load-Store                        1963
IBM 360            16                              Register-Memory                   1964
DEC PDP-8          1                               Accumulator                       1965
DEC PDP-11         8                               Register-Memory                   1970
Intel 8008         1                               Accumulator                       1972
Motorola 6800      2                               Accumulator                       1974
DEC VAX            16                              Register-Memory, Memory-Memory    1977
Intel 8086         1                               Extended Accumulator              1978
Motorola 68000     16                              Register-Memory                   1980
Intel 80386        8                               Register-Memory                   1985
ARM                16                              Load-Store                        1985
MIPS               32                              Load-Store                        1985
HP PA-RISC         32                              Load-Store                        1986
SPARC              32                              Load-Store                        1987
PowerPC            32                              Load-Store                        1992
DEC Alpha          32                              Load-Store                        1992
HP/Intel IA-64     128                             Load-Store                        2001
AMD64 (EMT64)      16                              Register-Memory                   2003

The number of available registers greatly influenced the instruction set architecture (ISA)

How to compute with limited resources? i.e. how do you design your ISA if you have limited resources?

People programmed in assembly and machine code!
  Needed as many addressing modes as possible
  Memory was (and still is) slow
CPUs had relatively few registers
  Registers were more expensive than external mem
  Large number of registers requires many bits to index
Memories were small
  Encouraged highly encoded microcodes as instructions
  Variable length instructions, load/store, conditions, etc

People programmed in assembly and machine code!
  E.g. x86
    > 1000 instructions! 1 to 15 bytes each
    E.g. dozens of add instructions
    Operands in dedicated registers, general purpose registers, memory, on stack; can be 1, 2, 4, 8 bytes, signed or unsigned
    10s of addressing modes, e.g. Mem[segment + reg + reg*scale + offset]
  E.g. VAX
    Like x86, arithmetic on memory or registers, but also on strings, polynomial evaluation, stacks/queues, ...

The number of available registers greatly influenced the instruction set architecture (ISA)
Complex Instruction Set Computers were very complex
  Complex instructions were necessary to reduce the number of instructions required to fit a program into memory.
  However, they also greatly increased the complexity of the ISA.

How do we reduce the complexity of the ISA while maintaining or increasing performance?

John Cocke
  IBM 801, 1980 (started in 1975)
  Name 801 came from the bldg that housed the project
  Idea: Possible to make a very small and very fast core
  Influences: Known as the father of RISC Architecture. Turing Award Recipient and National Medal of Science.

Dave Patterson
  RISC Project, 1982, UC Berkeley
  RISC I: ½ the transistors & 3x faster
  Influences: Sun SPARC, namesake of industry
John L. Hennessy
  MIPS, 1981, Stanford
  Simple pipelining, keep the pipeline full
  Influences: MIPS computer system, PlayStation, Nintendo


MIPS Design Principles
  Simplicity favors regularity
    32 bit instructions
  Smaller is faster
    Small register file
  Make the common case fast
    Include support for constants
  Good design demands good compromises
    Support for different types of interpretations/classes

MIPS = Reduced Instruction Set Computer (RISC)
  200 instructions, 32 bits each, 3 formats
  all operands in registers, almost all are 32 bits each
  1 addressing mode: Mem[reg + imm]
x86 = Complex Instruction Set Computer (CISC)
  > 1000 instructions, 1 to 15 bytes each
  operands in dedicated registers, general purpose registers, memory, on stack; can be 1, 2, 4, 8 bytes, signed or unsigned
  10s of addressing modes, e.g. Mem[segment + reg + reg*scale + offset]
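To make the contrast between the two addressing modes above concrete, here is a minimal C sketch of the effective-address arithmetic. The function names are hypothetical, and the x86-style version assumes a flat 32-bit address and a scale of 1, 2, 4, or 8; real x86 segmentation has more rules than this.

```c
#include <assert.h>
#include <stdint.h>

/* MIPS: the single addressing mode Mem[reg + imm].
 * The 16-bit immediate is sign-extended before the add. */
uint32_t mips_effective_address(uint32_t base_reg, int16_t imm) {
    return base_reg + (uint32_t)(int32_t)imm;
}

/* x86-style: Mem[segment + reg + reg*scale + offset].
 * Assumption for this sketch: scale is one of 1, 2, 4, 8. */
uint32_t x86_effective_address(uint32_t segment_base, uint32_t base_reg,
                               uint32_t index_reg, unsigned scale,
                               int32_t offset) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return segment_base + base_reg + index_reg * scale + (uint32_t)offset;
}
```

For example, indexing element 3 of an array of 4-byte words at `0x1000` with an extra 8-byte offset is one x86-style address computation, `x86_effective_address(0, 0x1000, 3, 4, 8)`, while MIPS code would first compute the scaled index with separate shift and add instructions.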

RISC Philosophy
  Regularity & simplicity
  Leaner means faster
  Optimize the common case
  Energy efficiency
  Embedded Systems
  Phones/Tablets
CISC Rebuttal
  Compilers can be smart
  Transistors are plentiful
  Legacy is important
  Code size counts
  Micro code!
  Desktops/Servers

Android OS on ARM processor Windows OS on Intel (x86) processor

The number of available registers greatly influenced the instruction set architecture (ISA)
Complex Instruction Set Computers were very complex
  Necessary to reduce the number of instructions required to fit a program into memory.
  However, also greatly increased the complexity of the ISA as well.
Back in the day CISC was necessary because everybody programmed in assembly and machine code!
Today, CISC ISAs are still dominant due to the prevalence of x86 ISA processors.
However, RISC ISAs today such as ARM have an ever increasing market share (of our everyday life!).
ARM borrows a bit from both RISC and CISC.

How do MIPS and ARM compare to each other?

All MIPS instructions are 32 bits long, with 3 formats

R type:  op      rs      rt      rd      shamt   func
         6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

I type:  op      rs      rt      immediate
         6 bits  5 bits  5 bits  16 bits

J type:  op      immediate (target address)
         6 bits  26 bits
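The three MIPS formats above map directly onto shifts and ORs. The following C sketch packs fields into a 32-bit word; the function names are hypothetical, while the field widths come from the format table and the opcode/func values used in the usage note (addi = 8, sll = 0) are the standard MIPS encodings.

```c
#include <assert.h>
#include <stdint.h>

/* R type: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | func(6) */
uint32_t encode_r_type(unsigned rs, unsigned rt, unsigned rd,
                       unsigned shamt, unsigned func) {
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && func < 64);
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | ((uint32_t)shamt << 6) | (uint32_t)func;
}

/* I type: op(6) | rs(5) | rt(5) | immediate(16) */
uint32_t encode_i_type(unsigned op, unsigned rs, unsigned rt, int16_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    return ((uint32_t)op << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16) | (uint32_t)(uint16_t)imm;
}

/* J type: op(6) | target address(26) */
uint32_t encode_j_type(unsigned op, uint32_t target) {
    assert(op < 64 && target < (1u << 26));
    return ((uint32_t)op << 26) | target;
}
```

With these, `encode_i_type(8, 0, 5, 10)` yields `0x2005000A`, which is the bit pattern `00100000000001010000000000001010` shown earlier for `addi r5, r0, 10`, and `encode_r_type(0, 5, 5, 1, 0)` yields `0x00052840`, the `sll r5, r5, 1` word.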

All ARMv7 instructions are 32 bits long, with 3 formats

R type:  opx     op      rs      rd      opx     rt
         4 bits  8 bits  4 bits  4 bits  8 bits  4 bits

I type:  opx     op      rs      rd      immediate
         4 bits  8 bits  4 bits  4 bits  12 bits

J type:  opx     op      immediate (target address)
         4 bits  4 bits  24 bits

In MIPS, performance will be slow if code has a lot of branches

  while (i != j) {
    if (i > j) i -= j;
    else       j -= i;
  }

  Loop: BEQ Ri, Rj, End     # if i == j, leave the loop
        SLT Rd, Rj, Ri      # Rd = 1 if (j < i), i.e. if (i > j)
        BEQ Rd, R0, Else    # if not (i > j), go to Else
        SUB Ri, Ri, Rj      # i = i - j
        J   Loop
  Else: SUB Rj, Rj, Ri      # j = j - i
        J   Loop
  End:

In ARM, can avoid delay due to branches with conditional instructions

  while (i != j) {
    if (i > j) i -= j;
    else       j -= i;
  }

  LOOP: CMP   Ri, Rj        // set condition "NE" if (i != j), "GT" if (i > j), or "LT" if (i < j)
        SUBGT Ri, Ri, Rj    // if "GT" (greater than), i = i - j
        SUBLT Rj, Rj, Ri    // if "LT" (less than), j = j - i
        BNE   LOOP          // if "NE" (not equal), then loop
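The difference between the branching MIPS loop and the conditionally executed ARM loop can be mirrored in C. This is a sketch under assumptions: the function names are hypothetical, and the "predicated" version only imitates conditional execution by multiplying with a 0/1 flag; whether a compiler actually emits branch-free code for it is not guaranteed.

```c
#include <assert.h>

/* Branching form: one data-dependent branch per iteration for the if/else,
 * as in the MIPS version. */
unsigned gcd_branching(unsigned i, unsigned j) {
    assert(i > 0 && j > 0);
    while (i != j) {
        if (i > j) i -= j;
        else       j -= i;
    }
    return i;
}

/* Predicated form: compare once, then guard each subtraction with a flag,
 * imitating CMP followed by SUBGT / SUBLT. */
unsigned gcd_predicated(unsigned i, unsigned j) {
    assert(i > 0 && j > 0);
    while (i != j) {
        unsigned gt = (i > j);  /* "GT" flag from the compare */
        unsigned lt = (i < j);  /* "LT" flag from the compare */
        i -= gt * j;            /* SUBGT: takes effect only when gt == 1 */
        j -= lt * i;            /* SUBLT: takes effect only when lt == 1 */
    }
    return i;
}
```

Both functions compute the greatest common divisor by repeated subtraction; only the loop-back test remains a branch in the second form, matching the single `BNE` in the ARM code.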

Shift one register (e.g. Rc) any amount
Add to another register (e.g. Rb)
Store result in a different register (e.g. Ra)

  ADD Ra, Rb, Rc, LSL #4
  Ra = Rb + (Rc << 4)
  Ra = Rb + Rc x 16

All ARMv7 instructions are 32 bits long, with 3 formats
Reduced Instruction Set Computer (RISC) properties
  Only Load/Store instructions access memory
  Instructions operate on operands in processor registers
  16 registers
Complex Instruction Set Computer (CISC) properties
  Autoincrement, autodecrement, PC relative addressing
  Conditional execution
  Multiple words can be accessed from memory with a single instruction (SIMD: single instr multiple data)

All ARMv8 (64 bit) instructions are still 32 bits long, with 3 formats; the registers and addresses are 64 bits
Reduced Instruction Set Computer (RISC) properties
  Only Load/Store instructions access memory
  Instructions operate on operands in processor registers
  31 general purpose registers plus a zero register that always reads as 0
NO MORE Complex Instruction Set Computer (CISC) properties
  NO Conditional execution
  NO Multiple words can be accessed from memory with a single instruction (SIMD: single instr multiple data)


How do we coordinate use of registers? Calling Conventions!
PA1 due next Tuesday


Prelim review:
  Time: We will start at 7:30pm sharp, so come early
  Location: See previous slide
  Closed Book: cannot use electronic devices or outside material
  Material covered: everything up to end of last week
    Everything up to and including data hazards
    Appendix B (logic, gates, FSMs, memory, ALUs)
    Chapter 4 (pipelined [and non] MIPS processor with hazards)
    Chapter 2 (Numbers / Arithmetic, simple MIPS instructions)
    Chapter 1 (Performance)
    HW1, Lab0, Lab1, Lab2

General Case: Mealy Machine
[Figure: registers hold the Current State; combinational logic takes the Input and the Current State and produces the Output and the Next State.]
Outputs and next state depend on both current state and input

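Special Case: Moore Machine
[Figure: registers hold the Current State; one block of combinational logic computes the Next State from the Input and the Current State, a second block computes the Output from the Current State alone.]
Outputs depend only on current state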

How long does it take to compute a result?
Speed of a circuit is affected by the number of gates in series (on the critical path or the deepest level of logic)
[Figure: four one-bit adders in a row, each with inputs A and B and output S, with the carry chained from C in to C out.]
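The chained adder in the figure can be simulated bit by bit to see why the carry path is the critical path. This is a sketch: the function names are hypothetical, and the depth estimate assumes, purely for illustration, two gate levels (AND then OR) per carry stage.

```c
#include <assert.h>

/* One full adder: returns the sum bit, writes the carry out. */
unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout) {
    *cout = (a & b) | (b & cin) | (a & cin);
    return a ^ b ^ cin;
}

/* n-bit ripple-carry add: stage k cannot finish until the carry
 * from stage k-1 has arrived, so the stages are in series. */
unsigned ripple_carry_add(unsigned a, unsigned b, unsigned nbits,
                          unsigned *carry_out) {
    assert(nbits >= 1 && nbits <= 16);
    unsigned sum = 0, carry = 0;
    for (unsigned k = 0; k < nbits; k++) {
        unsigned s = full_adder((a >> k) & 1u, (b >> k) & 1u, carry, &carry);
        sum |= s << k;
    }
    *carry_out = carry;
    return sum;
}

/* Assumed model: 2 gate delays per carry stage, so depth grows linearly. */
unsigned ripple_carry_depth(unsigned nbits) {
    return 2 * nbits;
}
```

Under that assumed model a 4-bit adder has a carry path of 8 gate delays and a 32-bit adder 64, which is the sense in which the number of gates in series sets the speed.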

[Figure: state register (D, Q) holding current state s; combinational logic with inputs a, b and current state s, producing output z and next state s'.]

$z = \bar{a}\,b\,\bar{s} + a\,\bar{b}\,\bar{s} + \bar{a}\,\bar{b}\,s + a\,b\,s$
$s' = ab + bs + as + abs$

Strategy:
  (1) Draw a state diagram (e.g. Mealy Machine)
  (2) Write output and next state tables
  (3) Encode states, inputs, and outputs as bits
  (4) Determine logic equations for next state and outputs
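As a worked instance of step (4), here is a C sketch of a Mealy machine with one state bit s, inputs a and b, and output z. It assumes the machine is a one-bit serial adder, where z is the XOR of a, b, and s, and s' is the majority of the three (the extra `abs` term in a sum-of-products form is already covered by `ab`). The type and function names are hypothetical.

```c
#include <assert.h>

/* One state bit: the carry remembered from the previous clock tick. */
typedef struct {
    unsigned s;
} SerialAdder;

/* One clock tick of the Mealy machine: the output z depends on both the
 * current state s and the inputs a, b; the state is then updated to s'. */
unsigned serial_adder_step(SerialAdder *fsm, unsigned a, unsigned b) {
    assert(a <= 1 && b <= 1);
    unsigned s = fsm->s;
    unsigned z = a ^ b ^ s;                 /* output equation */
    fsm->s = (a & b) | (b & s) | (a & s);   /* next-state equation s' */
    return z;
}

/* Feed two numbers in least significant bit first and collect the outputs. */
unsigned serial_add(unsigned x, unsigned y, unsigned nbits) {
    assert(nbits >= 1 && nbits <= 16);
    SerialAdder fsm = {0};
    unsigned result = 0;
    for (unsigned k = 0; k < nbits; k++) {
        result |= serial_adder_step(&fsm, (x >> k) & 1u, (y >> k) & 1u) << k;
    }
    return result;
}
```

Because z is computed from the inputs as well as the state, this is the general (Mealy) case; a Moore version would have to register the sum bit and emit it one tick later.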

Endianness: Ordering of bytes within a memory word

Little Endian = least significant part first (MIPS, x86)
  address:         1000   1001   1002   1003
  as 4 bytes:      0x78   0x56   0x34   0x12
  as 2 halfwords:  0x5678        0x1234
  as 1 word:       0x12345678

Big Endian = most significant part first (MIPS, networks)
  address:         1000   1001   1002   1003
  as 4 bytes:      0x12   0x34   0x56   0x78
  as 2 halfwords:  0x1234        0x5678
  as 1 word:       0x12345678

Examples (big/little endian):
  # r5 contains 5 (0x00000005)
  SB r5, 2(r0)
  LB r6, 2(r0)     # R[r6] = 0x00000005
  SW r5, 8(r0)
  LB r7, 8(r0)     # R[r7] = 0x00000000
  LB r8, 11(r0)    # R[r8] = 0x00000005

Memory contents afterwards (the values shown for r7 and r8 are the big endian result):
  0x00000002: 0x05
  0x00000008: 0x00
  0x00000009: 0x00
  0x0000000a: 0x00
  0x0000000b: 0x05
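The SW/LB example can be replayed in C to see how the byte order changes what a later byte load returns. This is a sketch with hypothetical helper names and a small byte array standing in for memory; `load_byte` sign-extends, as the MIPS LB instruction does.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 16  /* hypothetical memory size for this sketch */

/* Big endian: most significant byte at the lowest address. */
void store_word_big(uint8_t *mem, unsigned addr, uint32_t value) {
    assert(addr + 4 <= MEM_BYTES);
    mem[addr]     = (uint8_t)(value >> 24);
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;
}

/* Little endian: least significant byte at the lowest address. */
void store_word_little(uint8_t *mem, unsigned addr, uint32_t value) {
    assert(addr + 4 <= MEM_BYTES);
    mem[addr]     = (uint8_t)value;
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}

/* LB: load one byte and sign-extend it to 32 bits. */
int32_t load_byte(const uint8_t *mem, unsigned addr) {
    assert(addr < MEM_BYTES);
    return (int32_t)(mem[addr] ^ 0x80) - 0x80;
}
```

Storing 5 at address 8 with `store_word_big` makes `load_byte(mem, 8)` return 0 and `load_byte(mem, 11)` return 5, as in the example; with `store_word_little` the two results swap.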

[Figure: pipelined datapath with inst mem, register file outputs A and B, D, and data mem.]

  add r3, r1, r2     IF  ID  Ex  M   W
  sub r5, r3, r1         IF  ID  Ex  M   W
  or  r6, r3, r4             IF  ID  Ex  M   W
  add r6, r3, r8                 IF  ID  Ex  M   W

Load-use stall
[Figure: pipelined datapath with inst mem, register file outputs A and B, D, and data mem; a NOP is inserted behind lw r4, 20(r8).]

  lw r4, 20(r8)     IF  ID  Ex     M   W
  or r6, r3, r4         IF  Stall  ID  Ex  M   W

DELAY SLOT!

Find the data hazards in this sequence:

  add  r3, r1, r2
  nand r5, r3, r4
  add  r2, r6, r3
  lw   r6, 24(r3)
  sw   r6, 12(r2)

5 Hazards, resolved by:
  Forwarding from Ex/M -> ID/Ex (M -> Ex)
  Forwarding from M/W -> ID/Ex (W -> Ex)
  RegisterFile (RF) Bypass
  Forwarding from M/W -> ID/Ex (W -> Ex)
  Stall + Forwarding from M/W -> ID/Ex (W -> Ex)
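One way to check the hazard count is to classify each source register by how far back its producer is. The C sketch below is a simplified model with hypothetical names: a producer 1 instruction back needs M -> Ex forwarding (or a stall plus W -> Ex if it is a load), 2 back needs W -> Ex forwarding, 3 back is covered by the register file bypass. It uses register 0 to mean "no register" and deliberately ignores that a stall shifts the distances of later instructions.

```c
#include <assert.h>
#include <stddef.h>

/* dest/src fields hold register numbers; 0 means "none" (r0 is never a hazard). */
typedef struct {
    int dest, src1, src2, is_load;
} Instr;

typedef enum {
    NO_HAZARD, FWD_M_TO_EX, FWD_W_TO_EX, RF_BYPASS, STALL_THEN_FWD_W_TO_EX
} Resolution;

/* Find the nearest earlier instruction (at most 3 back) that writes reg. */
Resolution classify(const Instr *prog, size_t consumer, int reg) {
    assert(prog != NULL);
    if (reg == 0) return NO_HAZARD;
    for (size_t dist = 1; dist <= 3 && dist <= consumer; dist++) {
        const Instr *p = &prog[consumer - dist];
        if (p->dest == reg) {
            if (dist == 1)
                return p->is_load ? STALL_THEN_FWD_W_TO_EX : FWD_M_TO_EX;
            return dist == 2 ? FWD_W_TO_EX : RF_BYPASS;
        }
    }
    return NO_HAZARD;
}

/* Count every source operand that depends on a recent producer. */
size_t count_hazards(const Instr *prog, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (classify(prog, i, prog[i].src1) != NO_HAZARD) count++;
        if (prog[i].src2 != prog[i].src1 &&
            classify(prog, i, prog[i].src2) != NO_HAZARD) count++;
    }
    return count;
}
```

Encoding the five instructions as `{3,1,2,0}, {5,3,4,0}, {2,6,3,0}, {6,3,0,1}, {0,6,2,0}` and calling `count_hazards` gives 5 under this model, with the load-to-store dependence on r6 classified as stall plus W -> Ex forwarding.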