Lecture 7: Instruction Set Architectures - IV


Last Time
- Register organization
- Memory issues (endianness, alignment, etc.)

Today
- Exceptions
- General principles of ISA design
- Role of the compiler
- Computer arithmetic

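Control: Exceptions/Events
- Implied multi-way branch after every instruction
- External events (interrupts): e.g., completion of I/O operations
- Internal events (faults or exceptions): e.g., arithmetic overflow, page fault

What happens on an event?
- EPC ← PC of the instruction that caused the fault
- PC ← f(fault type): the new PC comes from a hardware table lookup
- Return: PC ← EPC + 4 (a fault that must be retried, such as a page fault, instead returns to EPC so the faulting instruction re-executes)

[Figure: an instruction stream (Inst 1, Inst 2, ...) interrupted by page fault, disk I/O, RTC, and overflow events]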
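The three steps above can be sketched as a tiny state machine. This is a minimal Python model, not MIPS hardware: the class `Cpu`, its methods, the handler addresses in `VECTOR_TABLE`, and the choice of which events restart the faulting instruction are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical vector table: fault type -> handler address (the "HW table lookup").
VECTOR_TABLE = {
    "overflow": 0x8000_0180,
    "page_fault": 0x8000_0200,
}

# Assumption: faults that must re-execute the faulting instruction on return.
RESTARTABLE = {"page_fault"}


@dataclass
class Cpu:
    pc: int = 0
    epc: int = 0
    cause: str = ""

    def raise_event(self, fault_type: str) -> None:
        """Save the faulting PC in EPC, then jump to the handler for this fault type."""
        self.epc = self.pc                     # EPC <- PC of faulting instruction
        self.cause = fault_type
        self.pc = VECTOR_TABLE[fault_type]     # PC <- f(fault type)

    def return_from_exception(self) -> None:
        """Resume after the handler has run."""
        if self.cause in RESTARTABLE:
            self.pc = self.epc                 # retry the instruction (e.g. page fault)
        else:
            self.pc = self.epc + 4             # continue with the next instruction
```

For example, an overflow at PC 0x400 saves 0x400 in EPC, vectors to 0x8000_0180, and resumes at 0x404; a page fault at 0x500 resumes at 0x500.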

Principles of Instruction Set Design
- Keep it simple (KISS)
  - complexity increases logic area
  - increases pipe stages
  - increases development time
  - evolution tends to make kludges
- Orthogonality (modularity)
  - simple rules, few exceptions
  - all ops on all registers
- Make the common case fast
  - some instructions (cases) are more important than others

[Figure: ISA design dimensions (Operations, Data Types, Addressing Modes, Registers, Formats); bar chart of instruction frequency (0-60%) for INT, LOAD, STORE, JMP, FLOAT]

Principles of Instruction Set Design (part 2)
- Generality
  - not all problems need the same features/instructions
- Principle of least surprise
  - performance should be easy to predict
- Locality and concurrency
  - design the ISA to permit efficient implementation today and 10 years from now

[Figure: instruction-frequency bar charts (0-60%) for INT, LOAD, STORE, JMP, FLOAT, CHAR, shown side by side for comparison; pipeline diagram of four overlapped F D R E W instruction sequences]

Good ISA Design: Review of ISA Principles
- KISS! Only implement necessities (encodings, address modes, etc.)
- FOG: Frequency, Orthogonality, Generality
- Instruction types: ALU ops, data movement, control
- Addressing modes: matched to program usage (local vars, globals, arrays)
- Program control
  - conditional/unconditional branches and jumps
  - where to store conditions
  - PC-relative and absolute

Role of the Optimizing Compiler (HW/SW complexity tradeoffs)
C source code
→ Front end (language specific) → IR
→ High-level optimizations: procedure inlining, loop transformations → IR
→ Global optimizations: common subexpression elimination, code motion → machine IR
→ Code generator (machine dependent): instruction scheduling, register allocation
→ Machine binary code
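Common subexpression elimination, one of the global optimizations listed above, can be illustrated on a single basic block. This is a minimal Python sketch of local CSE, assuming a hypothetical three-address IR of `(dest, op, src1, src2)` tuples; a real compiler applies the idea across blocks using dataflow analysis and may also treat commutative operands as equal, which this version ignores.

```python
def local_cse(block):
    """Replace recomputed expressions with copies of an earlier result.

    block: list of (dest, op, src1, src2) tuples in a hypothetical IR.
    """
    available = {}  # (op, src1, src2) -> variable that currently holds the value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            out.append((dest, "mov", available[key], None))  # reuse earlier result
        else:
            out.append((dest, op, a, b))
        # dest now has a new value: forget expressions that read it or lived in it
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in k[1:]}
        # remember this expression unless it reads its own destination (e.g. a = a + 1)
        if dest not in (a, b):
            available.setdefault(key, dest)
    return out
```

In the block `t1 = a + b; t2 = t1 * c; t3 = a + b; a = a + 1; t4 = a + b`, the third instruction becomes a copy of `t1`, but the fifth is kept because `a` changed in between.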

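Example: Loop Optimization
Source: sum=0; for(i=0;i<max;i++) sum+=x[i];

Original loop (7 instructions per iteration):
            LW   R1, X          ; R1 = pointer to x
            ADD  R2, R0, R0     ; i = 0
            ADD  R3, R0, R0     ; sum = 0
    LOOP:   SLT  R5, R2, #MAX   ; R5 = (i < MAX)
            BEQZ R5, CONT       ; leave the loop when i >= MAX
            LW   R4, 0(R1)      ; R4 = x[i]
            ADD  R3, R3, R4     ; sum += x[i]
            ADD  R1, R1, #4     ; advance pointer to the next word
            ADD  R2, R2, #1     ; i++
            J    LOOP
    CONT:

After loop reordering (test moved to the bottom; 6 instructions per iteration):
            LW   R1, X
            ADD  R2, R0, R0     ; i = 0
            ADD  R3, R0, R0     ; sum = 0
    LOOP:   LW   R4, 0(R1)      ; R4 = x[i]
            ADD  R3, R3, R4     ; sum += x[i]
            ADD  R1, R1, #4
            ADD  R2, R2, #1     ; i++
            SLT  R5, R2, #MAX
            BNEZ R5, LOOP       ; repeat while i < MAX
    CONT:

After induction variable analysis (i eliminated; 5 instructions per iteration):
            LW   R1, X          ; R1 = pointer to x
            ADD  R2, R0, #MAX   ; R2 = MAX
            SLLI R2, R2, #2     ; R2 = 4*MAX (bytes)
            ADD  R2, R1, R2     ; R2 = end address, &x[MAX]
            ADD  R3, R0, R0     ; sum = 0
    LOOP:   LW   R4, 0(R1)
            ADD  R3, R3, R4
            ADD  R1, R1, #4
            SLT  R5, R1, R2     ; pointer still below the end?
            BNEZ R5, LOOP
    CONT: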
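To see what the two transformations buy, the Python sketch below mirrors each listing and counts dynamically executed instructions (the functions are hypothetical stand-ins, not compiler output). For MAX = n iterations the counts work out to 7n + 5, 6n + 3, and 5n + 5. One caveat the listings leave implicit: the two bottom-tested versions run the body once before testing, so they match the C loop only when MAX >= 1; a compiler that rotates a loop this way typically adds a guard test in front of it.

```python
def original_loop(xs):
    """Top-tested loop: SLT, BEQZ, LW, ADD, ADD, ADD, J per iteration."""
    n, executed = len(xs), 3          # setup: LW R1; ADD R2; ADD R3
    total, i = 0, 0
    while True:
        executed += 2                 # SLT R5,R2,#MAX ; BEQZ R5,CONT
        if not i < n:
            break
        total += xs[i]
        i += 1
        executed += 5                 # LW R4; ADD R3; ADD R1; ADD R2; J LOOP
    return total, executed


def reordered_loop(xs):
    """Bottom-tested loop: 6 per iteration. Assumes len(xs) >= 1."""
    n, executed = len(xs), 3          # setup: LW R1; ADD R2; ADD R3
    total, i = 0, 0
    while True:
        total += xs[i]
        i += 1
        executed += 6                 # LW R4; ADD R3; ADD R1; ADD R2; SLT; BNEZ
        if not i < n:
            break
    return total, executed


def induction_loop(xs):
    """Pointer compared with a precomputed end address: 5 per iteration. Assumes len(xs) >= 1."""
    executed = 5                      # setup: LW R1; ADD R2; SLLI R2; ADD R2; ADD R3
    total, ptr, end = 0, 0, 4 * len(xs)   # byte offsets, 4 bytes per word
    while True:
        total += xs[ptr // 4]
        ptr += 4
        executed += 5                 # LW R4; ADD R3; ADD R1; SLT; BNEZ
        if not ptr < end:
            break
    return total, executed
```

All three compute the same sum; with a 10-element array they execute 75, 63, and 55 instructions respectively.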

Architect and Compiler Writer
- Simplify, simplify, simplify: if a feature is difficult to use, it won't be used. Less is more!
- Regularity: a common set of formats, few special cases
- Primitives, not solutions: e.g., CALLS vs. fast register moves
- Make performance tradeoffs simple
- Ultimately, the ISA will *not* be perfect

Compiler and Microarchitecture
- Instruction scheduling: instruction-level parallelism
- Resource allocation: registers (minimize spills/restores to and from memory)
- Memory optimizations: cache-conscious data organization, code layout
- Etc.

Building Blocks
- Arithmetic units: adders, multipliers, dividers, shifters, ...
- Single registers
- Register files
- Memory arrays
- Multiplexers
- Wires

Microarchitecture involves trading off the area and delay of alternative organizations.
- Units: area in tracks² (χ²); delay in fan-out-of-4 inverter delays (τ4) or gate delays
- Typically organized into datapaths, 10-20 tracks per bit slice
- The cost of control logic is hard to estimate

What Is Logic Design?
- The digital behavior of the chip
- Specifies:
  - actual states (ISA-visible and non-visible)
  - transitions between states
- Consists of:
  - combinational logic (e.g., ALUs)
  - sequential logic ("memory"): registers, state machines

[Figure: design levels Architecture/ISA → Microarchitecture → Logic Design → Circuit Design → Fab → Chip]

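Combinational Logic: The ALU
[Figure: 32-bit ALU datapath. Inputs A and B are 32 bits; a 16-bit immediate is sign-extended (S-EXT) to 32 bits. A decoder drives result selects sel_1 to sel_4 for an adder (with sub control), a shifter (sh_func), and a comparator (cmp) that uses bit 31 and the carry c to produce LT, GT, etc.; the compare result is formed as 31 zero bits followed by the compare bit.]
Instructions handled:
- ADD, ADDI
- SUB, SUBI
- AND, ANDI, OR, ORI, XOR, XORI
- SLL, SRL, SRA, SLLI, SRLI, SRAI
- SLT, SLTI, etc.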
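A behavioral model helps pin down what each select path of the ALU must compute. This is a minimal Python sketch, not the slide's gate-level design: the function names and string opcodes are hypothetical, immediate forms are assumed to pass the extended immediate as `b`, and SLT is computed by a direct signed comparison rather than from the adder's sign and carry bits.

```python
MASK = 0xFFFF_FFFF  # 32-bit datapath


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x8000_0000 else x


def sign_extend16(imm):
    """The S-EXT block: widen a 16-bit immediate to 32 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def alu(op, a, b):
    """32-bit result of op applied to a and b."""
    a, b = a & MASK, b & MASK
    shamt = b & 0x1F                                     # shifts use the low 5 bits
    results = {
        "ADD": lambda: (a + b) & MASK,
        "SUB": lambda: (a - b) & MASK,                   # hardware: a + ~b + 1
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "XOR": lambda: a ^ b,
        "SLL": lambda: (a << shamt) & MASK,
        "SRL": lambda: a >> shamt,                       # zero fill
        "SRA": lambda: (to_signed(a) >> shamt) & MASK,   # sign fill
        "SLT": lambda: int(to_signed(a) < to_signed(b)),
    }
    if op not in results:
        raise ValueError(f"unsupported op: {op}")
    return results[op]()
```

For example, `alu("ADD", 5, sign_extend16(0xFFFF))` models ADDI with immediate -1 and returns 4, and `alu("SRA", 0x8000_0000, 4)` returns `0xF800_0000`.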

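Bit Slice Approach to ALU Design
[Figure: a 1-bit ALU slice. Input B passes through an optional inverter (invert); A and B feed an AND gate, an OR gate, and a 1-bit full adder (add) with CarryIn and CarryOut; an S-select multiplexer picks the Result.]
- Set-less-than? Left as an exercise.
(Slide courtesy of D. Patterson)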
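The slice structure translates almost line for line into a behavioral sketch. In this Python version (function names and the `select` strings are hypothetical), each call to `alu_slice` models one 1-bit ALU, and `ripple_alu` chains 32 of them, passing each CarryOut to the next CarryIn; subtraction sets the invert control and feeds a 1 into CarryIn of slice 0, giving a + ~b + 1. Set-less-than is left out, as on the slide.

```python
def alu_slice(a, b, carry_in, b_invert, select):
    """One 1-bit ALU slice: optional B inversion, then AND / OR / full-adder sum via a mux."""
    b ^= b_invert                                           # the "invert" control
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)   # full-adder carry
    outputs = {"and": a & b, "or": a | b, "add": a ^ b ^ carry_in}
    return outputs[select], carry_out                       # S-select multiplexer


def ripple_alu(a, b, select, subtract=False, width=32):
    """Chain `width` slices, LSB first; subtraction computes a + ~b + 1."""
    carry = int(subtract)          # CarryIn of slice 0
    b_invert = int(subtract)
    result = 0
    for i in range(width):
        bit, carry = alu_slice((a >> i) & 1, (b >> i) & 1, carry, b_invert, select)
        result |= bit << i
    return result, carry           # final CarryOut of the MSB slice
```

For example, `ripple_alu(5, 3, "add", subtract=True)` returns `(2, 1)`: a carry out of 1 means no borrow occurred.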

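Bigger View of Bit Slicing
- The LSB and MSB slices need to do a little extra.
[Figure: 32 one-bit ALU slices (a0, b0 → s0 through a31, b31 → s31) chained carry-out to carry-in, forming 32-bit inputs A and B and output S; control logic (C/L) decodes a 4-bit mode M into select, comp, and c-in; the MSB slice also produces Ovflw.]
(Slide courtesy of D. Patterson)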

Overflow
  Decimal   Binary    Decimal   2's complement
     0       0000         0         0000
     1       0001        -1         1111
     2       0010        -2         1110
     3       0011        -3         1101
     4       0100        -4         1100
     5       0101        -5         1011
     6       0110        -6         1010
     7       0111        -7         1001
                         -8         1000

Examples (4-bit):
- 7 + 3 = 10, but 0111 + 0011 = 1010, which reads as -6
- -4 - 5 = -9, but 1100 + 1011 = (1)0111, which reads as 7 once the carry out is dropped
(Slide courtesy of D. Patterson)

Overflow Detection
- Overflow: the result is too large (or too small) to represent properly.
  - Example: -8 <= 4-bit two's complement number <= 7
- When adding operands with different signs, overflow cannot occur!
- Overflow occurs when adding:
  - 2 positive numbers and the sum is negative
  - 2 negative numbers and the sum is positive
- On your own: prove you can detect overflow by Carry into MSB XOR Carry out of MSB.
  - 0111 + 0011 = 1010 (7 + 3 reads as -6): carry into MSB = 1, carry out of MSB = 0
  - 1100 + 1011 = 0111 (-4 + -5 reads as 7): carry into MSB = 0, carry out of MSB = 1
(Slide courtesy of D. Patterson)
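The sign rule translates directly into code. A minimal Python sketch, assuming the 4-bit word of the examples (`width` is a parameter, and the function name is hypothetical):

```python
def add_with_overflow(a, b, width=4):
    """Add two width-bit two's complement numbers; detect overflow from the signs."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask          # accept negative Python ints, keep width bits
    s = (a + b) & mask                 # the bits an adder of this width produces

    def sign(x):
        return (x >> (width - 1)) & 1

    # Overflow only when both operands share a sign and the sum's sign differs.
    overflow = sign(a) == sign(b) and sign(s) != sign(a)
    value = s - (1 << width) if sign(s) else s   # read the result back as signed
    return value, overflow
```

Both slide examples overflow: `add_with_overflow(7, 3)` returns `(-6, True)` and `add_with_overflow(-4, -5)` returns `(7, True)`, while mixed-sign operands such as 7 and -5 never do.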

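What's the difference between these instruction pairs?
- add/addu, addi/addiu, sub/subu: the unsigned versions don't check for overflow; otherwise the arithmetic algorithm is the same and the 32-bit result is identical (addiu still sign-extends its immediate).
- div/divu: here the difference is signedness, not overflow checking. div treats its operands as signed and divu as unsigned, so they can produce different quotients; neither traps on overflow.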
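The pairs can be compared in a small Python model of 32-bit MIPS semantics. The function names mirror the mnemonics but are hypothetical helpers, `ArithmeticOverflow` stands in for the hardware overflow exception, and division by zero (whose result MIPS leaves undefined) is not modeled.

```python
MASK = 0xFFFF_FFFF


class ArithmeticOverflow(Exception):
    """Stand-in for the overflow exception that add, addi and sub raise."""


def to_signed(x):
    """Interpret a 32-bit pattern as two's complement."""
    x &= MASK
    return x - (1 << 32) if x & 0x8000_0000 else x


def addu(a, b):
    """32-bit sum, wrapping modulo 2**32; never traps."""
    return (a + b) & MASK


def add(a, b):
    """Same bits as addu, but traps when the signed result does not fit in 32 bits."""
    exact = to_signed(a) + to_signed(b)
    if not -(1 << 31) <= exact < (1 << 31):
        raise ArithmeticOverflow(f"{a:#x} + {b:#x}")
    return addu(a, b)


def divu(a, b):
    """Unsigned 32-bit quotient."""
    return (a & MASK) // (b & MASK)


def div(a, b):
    """Signed quotient truncated toward zero (Python's // rounds toward -inf)."""
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    return (q if (sa < 0) == (sb < 0) else -q) & MASK
```

For example, `addu(0x7FFF_FFFF, 1)` wraps to `0x8000_0000` while `add` raises; `divu(0xFFFF_FFFE, 2)` is `0x7FFF_FFFF`, but `div` reads the same bits as -2 and returns `0xFFFF_FFFF` (-1).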

Overflow Detection Logic
- Carry into MSB XOR Carry out of MSB
- For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]
[Figure: four 1-bit ALUs chained from CarryIn0 through CarryOut3, with inputs A0-A3, B0-B3 and outputs Result0-Result3; an XOR of the MSB slice's CarryIn3 and CarryOut3 produces Overflow.]

  X  Y | X XOR Y
  0  0 |    0
  0  1 |    1
  1  0 |    1
  1  1 |    0
(Slide courtesy of D. Patterson)
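An exhaustive check over small word sizes is not the proof the earlier slide asks for, but it is a quick way to gain confidence in the XOR rule. This Python sketch (all function names hypothetical) computes the carry into and out of the MSB with a ripple adder and compares their XOR against the definition of overflow for every operand pair.

```python
def msb_carries(a, b, width=4):
    """Ripple-add two width-bit patterns; return (carry into MSB, carry out of MSB)."""
    carry = 0
    carry_into_msb = 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry           # CarryIn[N-1]
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return carry_into_msb, carry             # carry is now CarryOut[N-1]


def overflow_by_carries(a, b, width=4):
    """Overflow = CarryIn[N-1] XOR CarryOut[N-1]."""
    cin, cout = msb_carries(a, b, width)
    return cin ^ cout


def overflow_by_range(a, b, width=4):
    """Reference definition: the true signed sum does not fit in width bits."""
    def signed(x):
        return x - (1 << width) if (x >> (width - 1)) & 1 else x
    exact = signed(a) + signed(b)
    return int(not -(1 << (width - 1)) <= exact < (1 << (width - 1)))


def rule_holds(width=4):
    """Compare the XOR rule with the definition for every pair of width-bit patterns."""
    n = 1 << width
    return all(overflow_by_carries(a, b, width) == overflow_by_range(a, b, width)
               for a in range(n) for b in range(n))
```

`rule_holds(4)` covers all 256 pairs of 4-bit operands, including both examples on the previous slide.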

More Revised Diagram
- The LSB and MSB slices need to do a little extra.
[Figure: the 32-bit bit-sliced ALU (inputs A and B, output S) with control logic (C/L) decoding the 4-bit mode M into select, comp, and c-in; the MSB slice's cin and co are XORed and gated with signed-arith to produce Ovflw.]
(Slide courtesy of D. Patterson)

But What About Performance?
- The critical path of an n-bit ripple-carry adder is n*CP, where CP is the carry delay of one 1-bit stage: each stage must wait for the carry from the stage below.
[Figure: four 1-bit ALUs chained CarryIn0 → CarryOut0/CarryIn1 → ... → CarryOut3, with inputs A0-A3, B0-B3 and outputs Result0-Result3.]
- Design trick: throw hardware at it.
(Slide courtesy of D. Patterson)

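Carry Lookahead (Design Trick: Peek)
Per-bit carry behavior:
  A  B | C-out
  0  0 | 0       kill
  0  1 | C-in    propagate
  1  0 | C-in    propagate
  1  1 | 1       generate

For each bit i: $G_i = A_i B_i$ (generate), $P_i = A_i \oplus B_i$ (propagate), and the sum bit is $S_i = P_i \oplus C_i$.
$C_1 = G_0 + C_0 P_0$
$C_2 = G_1 + G_0 P_1 + C_0 P_0 P_1$
$C_3 = G_2 + G_1 P_2 + G_0 P_1 P_2 + C_0 P_0 P_1 P_2$
$C_4 = \dots$
[Figure: four bit cells, each taking A and B and producing G, P, and S, feeding the lookahead carry logic.]
(Slide courtesy of D. Patterson)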
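The expanded equations follow a regular pattern, which the Python sketch below generates for any width (function names are hypothetical). Software still evaluates the products one after another, so the sketch only demonstrates that the two-level sum-of-products form yields the same carries as rippling; the speedup comes in hardware, where every $C_i$ can be computed in parallel from the $G$'s, $P$'s and $C_0$, at the cost of wider gates as $i$ grows.

```python
def bits(x, width):
    """Little-endian list of the low `width` bits of x."""
    return [(x >> i) & 1 for i in range(width)]


def lookahead_carries(a, b, c0=0, width=4):
    """Return [C0, C1, ..., C_width], each a sum of products of G, P and C0 only."""
    a_bits, b_bits = bits(a, width), bits(b, width)
    g = [x & y for x, y in zip(a_bits, b_bits)]    # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]    # propagate
    carries = [c0]
    for i in range(1, width + 1):
        # C_i = G_{i-1} + G_{i-2}P_{i-1} + ... + G_0 P_1...P_{i-1} + C_0 P_0...P_{i-1}
        c = c0
        for j in range(i):
            c &= p[j]
        for k in range(i):
            term = g[k]
            for j in range(k + 1, i):
                term &= p[j]
            c |= term
        carries.append(c)
    return carries


def ripple_carries(a, b, c0=0, width=4):
    """Reference: C_{i+1} = G_i + P_i C_i, computed one stage at a time."""
    carries = [c0]
    for x, y in zip(bits(a, width), bits(b, width)):
        carries.append((x & y) | ((x ^ y) & carries[-1]))
    return carries


def cla_add(a, b, c0=0, width=4):
    """Sum bits S_i = P_i XOR C_i, using the lookahead carries; returns (sum, C_width)."""
    carries = lookahead_carries(a, b, c0, width)
    total = 0
    for i, (x, y) in enumerate(zip(bits(a, width), bits(b, width))):
        total |= ((x ^ y) ^ carries[i]) << i
    return total, carries[width]
```

For example, `cla_add(7, 3)` returns `(10, 0)` and `cla_add(15, 1)` returns `(0, 1)`.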

Summary
- ISA principles
- Compiler/ISA interaction
- Computer arithmetic (add/shift)

Next Time
- RISC vs. CISC
- Computer arithmetic
- Memory
- Simple pipeline

Reading assignment: P&H 4.1-4.7