Background: Sequential processor design. Computer Architecture and Systems Programming , Herbstsemester 2013 Timothy Roscoe

Similar documents
CS:APP Chapter 4 Computer Architecture Sequential Implementation

Giving credit where credit is due

Computer Science 104:! Y86 & Single Cycle Processor Design!

Sequential Implementation

Systems I. Datapath Design II. Topics Control flow instructions Hardware for sequential machine (SEQ)

Processor Architecture II! Sequential! Implementation!

CS:APP Chapter 4 Computer Architecture Sequential Implementation

Computer Science 104:! Y86 & Single Cycle Processor Design!

Chapter 4! Processor Architecture!!

CS:APP Chapter 4! Computer Architecture! Sequential! Implementation!

CISC Fall 2009

Computer Science 104:! Y86 & Single Cycle Processor Design!

Stage Computation: Arith/Log. Ops

God created the integers, all else is the work of man Leopold Kronecker

Computer Science 104:! Y86 & Single Cycle Processor Design!

Y86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View

Instruction Set Architecture

Instruction Set Architecture

Giving credit where credit is due

CISC 360 Instruction Set Architecture

CS:APP Chapter 4 Computer Architecture Sequential Implementation

Instruction Set Architecture

Y86 Instruction Set. SEQ Hardware Structure. SEQ Stages. Sequential Implementation. Lecture 8 Computer Architecture III.

Systems I. Datapath Design II. Topics Control flow instructions Hardware for sequential machine (SEQ)

Instruction Set Architecture

Where Have We Been? Logic Design in HCL. Assembly Language Instruction Set Architecture (Y86) Finite State Machines

CISC: Stack-intensive procedure linkage. [Early] RISC: Register-intensive procedure linkage.

CS429: Computer Organization and Architecture

Chapter 4! Processor Architecture!

Processor Architecture

Sequential CPU Implementation.

The ISA. Fetch Logic

cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb D mrmovq D(rB), ra 5 0 ra rb D OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest

cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb D mrmovq D(rB), ra 5 0 ra rb D OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest

cmovxx ra, rb 2 fn ra rb irmovl V, rb rb V rmmovl ra, D(rB) 4 0 ra rb D mrmovl D(rB), ra 5 0 ra rb D OPl ra, rb 6 fn ra rb jxx Dest 7 fn Dest

Overview. CS429: Computer Organization and Architecture. Y86 Instruction Set. Building Blocks

Computer Organization Chapter 4. Prof. Qi Tian Fall 2013

CSC 252: Computer Organization Spring 2018: Lecture 11

CS429: Computer Organization and Architecture

Foundations of Computer Systems

Chapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary

For Tuesday. Finish Chapter 4. Also, Project 2 starts today

cs281: Introduction to Computer Systems CPUlab Datapath Assigned: Oct. 29, Due: Nov. 3

cs281: Computer Systems CPUlab ALU and Datapath Assigned: Oct. 30, Due: Nov. 8 at 11:59 pm

Reading assignment. Chapter 3.1, 3.2 Chapter 4.1, 4.3

Byte. cmovle 7 1. halt 0 0 addq 6 0 nop 1 0 subq 6 1 cmovxx ra, rb 2 fn ra rb andq 6 2 irmovq V, rb 3 0 F rb V xorq 6 3. cmovl 7 2.

CS:APP Chapter 4 Computer Architecture Pipelined Implementation

Systems I. Pipelining II. Topics Pipelining hardware: registers and feedback paths Difficulties with pipelines: hazards Method of mitigating hazards

Michela Taufer CS:APP

CPSC 121: Models of Computation. Module 10: A Working Computer

CPSC 313, Summer 2013 Final Exam Date: June 24, 2013; Instructor: Mike Feeley

Computer Architecture I: Outline and Instruction Set Architecture. CENG331 - Computer Organization. Murat Manguoglu

CS:APP Guide to Y86 Processor Simulators

CPSC 121: Models of Computation. Module 10: A Working Computer

Review. Computer Science 104 Machine Organization & Programming

CPSC 313, Winter 2016 Term 1 Sample Date: October 2016; Instructor: Mike Feeley

Second Part of the Course

Term-Level Verification of a Pipelined CISC Microprocessor

CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University

Assembly I: Basic Operations. Jo, Heeseung

Processor Architecture I. Alexandre David

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University

Computer Organization: A Programmer's Perspective

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

Process Layout and Function Calls

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

CS429: Computer Organization and Architecture

Intel x86-64 and Y86-64 Instruction Set Architecture

Topics of this Slideset. CS429: Computer Organization and Architecture. Instruction Set Architecture. Why Y86? Instruction Set Architecture

Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p

Sungkyunkwan University

Process Layout, Function Calls, and the Heap

CS241 Computer Organization Spring Addresses & Pointers

cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb mrmovq D(rB), ra 5 0 ra rb OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest

Datorarkitektur, 2009 Tentamen

Giving credit where credit is due

SEQ part 3 / HCLRS 1

Machine-Level Programming II: Control and Arithmetic

CPSC 313, Summer 2013 Final Exam Solution Date: June 24, 2013; Instructor: Mike Feeley

Credits to Randy Bryant & Dave O Hallaron

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College February 9, 2016

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College September 25, 2018

CS241 Computer Organization Spring 2015 IA

Machine-Level Programming I: Introduction Jan. 30, 2001

Midterm Exam CSC February 2009

Machine-Level Programming II: Arithmetic & Control /18-243: Introduction to Computer Systems 6th Lecture, 5 June 2012

Machine-Level Programming II: Arithmetic & Control. Complete Memory Addressing Modes

Instruc(on Set Architecture. Computer Architecture Instruc(on Set Architecture Y86. Y86-64 Processor State. Assembly Programmer s View

1 /* file cpuid2.s */ 4.asciz "The processor Vendor ID is %s \n" 5.section.bss. 6.lcomm buffer, section.text. 8.globl _start.

Questions about last homework? (Would more feedback be useful?) New reading assignment up: due next Monday

Chapter 3 Machine-Level Programming II Control Flow

CS61 Section Solutions 3

CISC 360. Machine-Level Programming I: Introduction Sept. 18, 2008

The Hardware/Software Interface CSE351 Spring 2013

ASSEMBLY II: CONTROL FLOW. Jo, Heeseung

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Compiler Construction D7011E

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

Transcription:

Background: Sequential processor design Computer Architecture and Systems Programming 252 0061 00, Herbstsemester 2013 Timothy Roscoe

Overview Y86 instruction set architecture Processor state Instruction encoding Compiling, assembling, and simulation Sequential processor design Fetch, decode, execute, memory, write back, PC update Performance implications

The architecture: Y86 Simpler than X86, but close to it Illustrate instruction encoding with concrete examples Basis for assembler / simulator exercises Simple enough to design Sequential processor (this time) Pipelined processor (next that)

Y86 Processor State Program registers %eax %ecx %edx %ebx %esi %edi %esp %ebp Condition codes OF ZF SF PC Memory Program registers Same 8 as with IA32. Each 32 bits Condition codes Single bit flags set by arithmetic or logical instructions OF: Overflow ZF: Zero SF:Negative Program counter Indicates address of instruction Memory Byte addressable storage array Words stored in little endian byte order

Y86 Instructions Format 1 6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state

Encoding Registers Each register has 4 bit ID %eax %ecx %edx %ebx 0 1 2 3 %esi %edi %esp %ebp 6 7 4 5 Same encoding as in IA32 Register ID 8 indicates no register Will use this in our hardware design in multiple places

Instruction Example Addition instruction Generic Form Encoded Representation addl ra, rb 6 0 ra rb Add value in register ra to that in register rb Store result in register rb Note that Y86 only allows addition to be applied to register data Set condition codes based on result e.g., addl %eax,%esi Encoding: 60 06 Two byte encoding First indicates instruction type Second gives source and destination registers

Arithmetic and Logical Operations Instruction code Add Subtract (ra from rb) And Exclusive Or Function code addl ra, rb 6 0 ra rb subl ra, rb 6 1 ra rb andl ra, rb 6 2 ra rb Refer to generically as OPl Encodings differ only by function code Low order 4 bytes in first instruction word Set condition codes as side effect xorl ra, rb 6 3 ra rb

Move Operations rrmovl ra, rb 2 0 ra rb Register Register irmovl V, rb 3 0 8 rb V Immediate Register rmmovl ra,d(rb) 4 0 ra rb D Register Memory mrmovl D(rB),rA 5 0 ra rb D Memory Register Like the IA32 movl instruction Simpler format for memory addresses Give different names to keep them distinct

Move Instruction Examples IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl 12(%ebp),%ecx movl %esi,0x41c(%esp) mrmovl 12(%ebp),%ecx rmmovl %esi,0x41c(%esp) 50 15 f4 ff ff ff 40 64 1c 04 00 00 movl $0xabcd, (%eax) movl %eax, 12(%eax,%edx) movl (%ebp,%eax,4),%ecx

Jump Instructions Jump Unconditionally jmp Dest 7 0 Dest Jump When Less Than or Equal jle Dest 7 1 Dest Jump When Less Than jl Dest 7 2 Dest Jump When Equal je Dest 7 3 Dest Jump When Not Equal jne Dest 7 4 Dest Jump When Greater Than or Equal jge Dest 7 5 Dest Jump When Greater Than Refer to generically as jxx Encodings differ only by function code Based on values of condition codes Same as IA32 counterparts Encode full destination address Unlike PC relative addressing seen in IA32 jg Dest 7 6 Dest

Y86 Program Stack Increasing Addresses Stack Bottom Stack Top %esp Region of memory holding program data Used in Y86 (and IA32) for supporting procedure calls Stack top indicated by %esp Address of top stack element Stack grows toward lower addresses Top element is at highest address in the stack When pushing, must first decrement stack pointer When popping, increment stack pointer

Stack Operations pushl ra a 0 ra 8 Decrement %esp by 4 Store word from ra to memory at %esp Like IA32 popl ra b 0 ra 8 Read word from memory at %esp Save in ra Increment %esp by 4 Like IA32

Subroutine Call and Return call Dest 8 0 Dest Push address of next instruction onto stack Start executing instructions at Dest Like IA32 ret 9 0 Pop value from stack Use as address for next instruction Like IA32

Miscellaneous Instructions nop 0 0 Don t do anything halt 1 0 Stop executing instructions IA32 has comparable instruction, but can t execute it in user mode We will use it to stop the simulator

Writing Y86 Code Try to use C compiler as much as possible Write code in C Compile for IA32 with gcc S Transliterate into Y86 Coding example Find number of elements in null terminated list int len1(int a[]); a 5043 6125 7395 0 3

Y86 code generation example First try Write typical array code /* Find number of elements in null terminated list */ int len1(int a[]) { int len; for (len = 0; a[len]; len++) ; return len; } Problem Hard to do array indexing on Y86 Since don t have scaled addressing modes L18: incl %eax cmpl $0,(%edx,%eax,4) jne L18 Compile with gcc O2 S

Y86 code generation example #2 Second try Write with pointer code /* Find number of elements in null terminated list */ int len2(int a[]) { int len = 0; while (*a++) len++; return len; } Result Don t need to do indexed addressing L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24 Compile with gcc O2 S

Y86 code generation example #3 IA32 code Setup len2: pushl %ebp xorl %ecx,%ecx movl %esp,%ebp movl 8(%ebp),%edx movl (%edx),%eax jmp L26 Y86 code Setup len2: pushl %ebp # Save %ebp xorl %ecx,%ecx # len = 0 rrmovl %esp,%ebp # Set frame mrmovl 8(%ebp),%edx # Get a mrmovl (%edx),%eax # Get *a jmp L26 # Goto entry

Y86 code generation example #4 IA32 code loop + finish L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24 movl %ebp,%esp movl %ecx,%eax popl %ebp ret Y86 code loop + finish L24: mrmovl (%edx),%eax # Get *a irmovl $1,%esi addl %esi,%ecx # len++ L26: # Entry: irmovl $4,%esi addl %esi,%edx # a++ andl %eax,%eax # *a == 0? jne L24 # No Loop rrmovl %ebp,%esp # Pop rrmovl %ecx,%eax # Rtn len popl %ebp ret

Y86 Program Structure irmovl Stack,%esp rrmovl %esp,%ebp irmovl List,%edx pushl %edx call len2 halt.align 4 List:.long 5043.long 6125.long 7395.long 0 # Function len2:... # Allocate space for stack.pos 0x100 Stack: # Set up stack # Set up frame # Push argument # Call Function # Halt # List of elements Program starts at address 0 Must set up stack Make sure don t overwrite code! Must initialize data Can use symbolic names

Assembling Y86 Program unix> yas eg.ys Generates object code file eg.yo Actually looks like disassembler output 0x000: 308400010000 irmovl Stack,%esp # Set up stack 0x006: 2045 rrmovl %esp,%ebp # Set up frame 0x008: 308218000000 irmovl List,%edx 0x00e: a028 pushl %edx # Push argument 0x010: 8028000000 call len2 # Call Function 0x015: 10 halt # Halt 0x018:.align 4 0x018: List: # List of elements 0x018: b3130000.long 5043 0x01c: ed170000.long 6125 0x020: e31c0000.long 7395 0x024: 00000000.long 0

Simulating Y86 Program unix> yis eg.yo Instruction set simulator Computes effect of each instruction on processor state Prints changes in state from original Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 Changes to registers: %eax: 0x00000000 0x00000003 %ecx: 0x00000000 0x00000003 %edx: 0x00000000 0x00000028 %esp: 0x00000000 0x000000fc %ebp: 0x00000000 0x00000100 %esi: 0x00000000 0x00000004 Changes to memory: 0x00f4: 0x00000000 0x00000100 0x00f8: 0x00000000 0x00000015 0x00fc: 0x00000000 0x00000018

Summary Y86 instruction set architecture (ISA) Similar state and instructions as IA32 Simpler encodings Somewhere between CISC and RISC How important is ISA design? Less now than before With enough hardware, can make almost anything go fast Added complexity of little used instructions is not many transistors Internally, IA32 is almost RISC anyway Decode unit can generate stream of RISC like instructions

Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0 halt 1 0 rrmovl V,rB 2 0 ra rb irmovl V,rB 3 0 8 rb D rmmovl ra,d(rb) 4 0 ra rb D mrmovl D(rB),rA 5 0 ra rb D OP1 ra,rb jxx Dest 6 fn ra rb 7 fn Dest call Dest 8 0 Dest ret 9 0 pushl ra A 0 ra 8 popl ra B 0 ra 8 addl 6 0 subl 6 1 andl 6 2 xorl 6 3 jmp 7 0 jle 7 1 jl 7 2 je 7 3 jne 7 4 jge 7 5 jg 7 6

Building Blocks fun Combinational logic Compute Boolean functions of inputs Continuously respond to input changes Operate on data and implement control A B A L U 0 MUX 1 = Storage elements Store bits Addressable memories Non addressable registers Loaded only as clock rises vala srca valb srcb A Register file B valw W dstw Clock Clock

Hardware Control Language Very simple hardware description language Can only express limited aspects of hardware operation Parts we want to explore and modify Data types bool: Boolean a, b, c, int: words A, B, C, Does not specify word size bytes, 32 bit words, Statements bool a = bool expr ; int A = int expr ;

HCL Operations Classify by type of value returned Boolean expressions Logic operations a && b, a b,!a Word comparisons A == B, A!= B, A < B, A <= B, A >= B, A > B Set membership A in { B, C, D } Same as A == B A == C A == D Word expressions Case expressions [ a : A; b : B; c : C ] Evaluate test expressions a, b, c, in sequence Return word expression A, B, C, for first successful test

SEQ Hardware State Structure Program counter register (PC) Condition code register (CC) Register File Memories Access same memory space Data: for reading/writing program data Instruction: for reading instructions Instruction Flow Read instruction at address specified by PC Process through stages Update program counter PC Write back Memory Execute Decode Fetch icode, ifun ra, rb valc Instruction memory newpc vale, valm Addr, Data Bch CC alua, alub srca, srcb dsta, dstb valm Data memory valp vale ALU vala, valb PC increment A B Register M file E PC

SEQ Stages PC Write back newpc vale, valm valm Fetch Read instruction from instruction memory Decode Read program registers Execute Compute value or address Memory Read or write data Write Back PC Write program registers Update program counter Memory Execute Decode Fetch icode, ifun ra, rb valc Instruction memory Addr, Data Bch CC alua, alub srca, srcb dsta, dstb Data memory valp vale ALU vala, valb PC increment A B Register M file E PC

Instruction Decoding Optional Optional 5 0 ra rb D icode ifun ra rb valc Instruction Format Instruction byte icode:ifun Optional register byte ra:rb Optional constant word valc

Executing Arith./Logical Operation OP1 ra, rb 6 fn ra rb Fetch Read 2 bytes Decode Read operand registers Execute Perform operation Set condition codes Memory Do nothing Write back Update register PC Update Increment PC by 2

Stage Computation: Arith/Log. Ops Fetch Decode Execute Memory Write back PC update OPl ra, rb icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[rA] valb R[rB] vale valb OP vala Set CC R[rB] vale PC valp Read instruction byte Read register byte Compute next PC Read operand A Read operand B Perform ALU operation Set condition code register Write back result Update PC Formulate instruction execution as sequence of simple steps Use same general form for all instructions

Executing rmmovl rmmovl ra,d(rb) 4 0 ra rb D Fetch Read 6 bytes Decode Read operand registers Execute Compute effective address Memory Write to memory Write back Do nothing PC Update Increment PC by 6

Stage Computation: rmmovl Fetch Decode Execute Memory Write back PC update rmmovl ra, D(rB) icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valc M 4 [PC+2] valp PC+6 vala R[rA] valb R[rB] vale valb + valc M 4 [vale] vala PC valp Read instruction byte Read register byte Read displacement D Compute next PC Read operand A Read operand B Compute effective address Write value to memory Update PC Use ALU for address computation

Executing popl popl ra b 0 ra 8 Fetch Read 2 bytes Decode Read stack pointer Execute Increment stack pointer by 4 Memory Read from old stack pointer Write back Update stack pointer Write result to register PC Update Increment PC by 2

Stage Computation: popl Fetch Decode Execute Memory Write back PC update popl ra icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[%esp] valb R [%esp] vale valb + 4 valm M 4 [vala] R[%esp] vale R[rA] valm PC valp Read instruction byte Read register byte Compute next PC Read stack pointer Read stack pointer Increment stack pointer Read from stack Update stack pointer Write back result Update PC Use ALU to increment stack pointer Must update two registers: Popped value, New SP

Executing Jumps jxx Dest Fall through: 7 fall fn thru: Dest xx xx Not taken target: Target: xx xx Taken Fetch Read 5 bytes Increment PC by 5 Decode Do nothing Execute Determine whether to take branch based on jump condition and condition codes Memory Do nothing Write,back Do nothing PC Update Set PC to Dest if branch taken or to incremented PC if not branch

Stage Computation: Jumps Fetch Decode jxx Dest icode:ifun M 1 [PC] valc M 4 [PC+1] valp PC+5 Read instruction byte Read destination address Fall through address Execute Memory Write back PC update Bch Cond(CC,ifun) PC Bch? valc : valp Take branch? Update PC Compute both addresses Choose based on setting of condition codes and branch condition

Executing call call Dest 8 fall 0 thru: Dest Return: xx xx Target: xx xx Fetch Read 5 bytes Increment PC by 5 Decode Read stack pointer Execute Decrement stack pointer by 4 Memory Write incremented PC to new value of stack pointer Write back Update stack pointer PC Update Set PC to Dest

Stage Computation: call Fetch Decode Execute Memory Write back PC update call Dest icode:ifun M 1 [PC] valc M 4 [PC+1] valp PC+5 valb R[%esp] vale valb + 4 M 4 [vale] valp R[%esp] vale PC valc Read instruction byte Read destination address Compute return point Read stack pointer Decrement stack pointer Write return value on stack Update stack pointer Set PC to destination Use ALU to decrement stack pointer Store incremented PC

Executing ret ret 9 0 Return: xx xx Fetch Read 1 byte Decode Read stack pointer Execute Increment stack pointer by 4 Memory Read return address from old stack pointer Write back Update stack pointer PC Update Set PC to return address

Stage Computation: ret ret icode:ifun M 1 [PC] Read instruction byte Fetch Decode Execute Memory Write back PC update vala R[%esp] valb R[%esp] vale valb + 4 valm M 4 [vala] R[%esp] vale PC valm Read operand stack pointer Read operand stack pointer Increment stack pointer Read return address Update stack pointer Set PC to return address Use ALU to increment stack pointer Read return address from memory

Computation Steps OPl ra, rb Fetch Decode Execute Memory Write back PC update icode,ifun ra,rb valc valp vala, srca valb, srcb vale Cond code valm dste dstm PC icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[rA] valb R[rB] vale valb OP vala Set CC R[rB] vale PC valp Read instruction byte Read register byte [Read constant word] Compute next PC Read operand A Read operand B Perform ALU operation Set condition code register [Memory read/write] Write back ALU result [Write back memory result] Update PC All instructions follow same general pattern Differ in what gets computed on each step

Computation steps: example call Dest icode,ifun icode:ifun M 1 [PC] Read instruction byte Fetch ra,rb valc valc M 4 [PC+1] [Read register byte] Read constant word valp valp PC+5 Compute next PC Decode vala, srca valb, srcb valb R[%esp] [Read operand A] Read operand B Execute vale Cond code vale valb + 4 Perform ALU operation [Set condition code reg.] Memory Write valm dste M 4 [vale] valp R[%esp] vale [Memory read/write] [Write back ALU result] back PC update dstm PC PC valc Write back memory result Update PC All instructions follow same general pattern Differ in what gets computed on each step

Computed Values Fetch icode Instruction code ifun Instruction function ra Instr. Register A rb Instr. Register B valc Instruction constant valp Incremented PC Decode srca Register ID A srcb Register ID B dste Destination Register E dstm Destination Register M vala Register value A valb Register value B Execute vale ALU result Bch Branch flag Memory valm Value from memory

SEQ Hardware PC newpc New PC Key Blue boxes: predesigned hardware blocks E.g., memories, ALU Gray boxes: control logic Describe in HCL White ovals: labels for signals Thick lines: 32 bit word values Thin lines: 4 8 bit values Memory Execute Decode Bch CC icode ifun ra Mem. control ALU A rb read write vale ALU valc Data memory Addr ALU B valm data out Data valp vala ALU fun. valb A B Register M file E dste dstm srca srcb dste dstm srca srcb Write back Dotted lines: 1 bit values Fetch Instruction memory PC increment PC

Fetch Logic icode ifun ra rb valc valp Instr valid Need valc Need regids PC increment Split Byte 0 Align Bytes 1-5 Instruction memory Predefined Blocks PC: Register containing PC Instruction memory: Read 6 bytes (PC to PC+5) Split: Divide instruction byte into icode and ifun Align: Get fields for ra, rb, and valc PC

Fetch Logic icode ifun ra rb valc valp Instr valid Need valc Need regids PC increment Split Byte 0 Align Bytes 1-5 Instruction memory Control Logic Instr. Valid: Is this instruction valid? Need regids: Does this instruction have a register bytes? Need valc: Does this instruction have a constant word? PC

Fetch control logic nop 0 0 halt 1 0 rrmovl V,rB 2 0 ra rb immovl V,rB 3 0 8 rb D rmmovl ra,d(rb) 4 0 ra rb D mrmovl D(rB),rA 5 0 ra rb D OP1 ra,rb jxx Dest 6 fn ra rb 7 fn Dest call Dest 8 0 Dest ret 9 0 pushl ra A 0 ra 8 popl ra B 0 ra 8 bool need_regids = icode in { IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL }; bool instr_valid = icode in { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

Decode Logic Register File Read ports A, B Write ports E, M Addresses are register IDs or 8 (no access) Control Logic vala valb A B M Register file E dste dstm srca srcb valm vale srca, srcb: read port addresses dsta, dstb: write port addresses icode dste dstm srca srcb ra rb

A Source Decode Decode Decode Decode Decode Decode OPl ra, rb vala R[rA] rmmovl ra, D(rB) vala R[rA] popl ra vala R[%esp] jxx Dest call Dest ret vala R[%esp] Read operand A Read operand A Read stack pointer No operand No operand Read stack pointer int srca = [ icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : ra; icode in { IPOPL, IRET } : RESP; 1 : RNONE; # Don't need register ];

E Destination Write back Write back Write back Write back Write back Write back OPl ra, rb R[rB] vale rmmovl ra, D(rB) popl ra R[%esp] vale jxx Dest call Dest R[%esp] vale ret R[%esp] vale Write back result None Update stack pointer None Update stack pointer Update stack pointer int dste = [ icode in { IRRMOVL, IIRMOVL, IOPL} : rb; icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP; 1 : RNONE; # Don't need register ];

Execute Logic Units ALU Implements 4 required functions Generates condition code values CC Register with 3 condition code bits bcond Computes branch flag Control Logic Set CC: Should condition code register be loaded? ALU A: Input A to ALU ALU B: Input B to ALU ALU fun: What function should ALU compute? Bch bcond CC Set CC ALU A vale ALU ALU B icode ifun valc vala valb ALU fun.

ALU A Input Execute Execute Execute Execute Execute OPl ra, rb vale valb OP vala rmmovl ra, D(rB) vale valb + valc popl ra vale valb + 4 jxx Dest call Dest vale valb + 4 Perform ALU operation Compute effective address Increment stack pointer No operation Decrement stack pointer ret Execute vale valb + 4 Increment stack pointer int alua = [ icode in { IRRMOVL, IOPL } : vala; icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valc; icode in { ICALL, IPUSHL } : 4; icode in { IRET, IPOPL } : 4; ]; # Other instructions don't need ALU

ALU Operation Execute Execute Execute Execute Execute OPl ra, rb vale valb OP vala rmmovl ra, D(rB) vale valb + valc popl ra vale valb + 4 jxx Dest call Dest vale valb + 4 Perform ALU operation Compute effective address Increment stack pointer No operation Decrement stack pointer ret Execute vale valb + 4 Increment stack pointer int alufun = [ icode == IOPL : ifun; 1 : ALUADD; ];

Memory Logic Memory Reads or writes memory word Control Logic Mem. read: should word be read? Mem. write: should word be written? Mem. addr.: Select address Mem. data.: Select data icode Mem. read Mem. write read write Data memory Mem addr vale valm data out Mem data data in vala valp

Memory Address Memory Memory Memory Memory Memory Memory OPl ra, rb rmmovl ra, D(rB) M 4 [vale] vala popl ra valm M 4 [vala] jxx Dest call Dest M 4 [vale] valp ret valm M 4 [vala] No operation Write value to memory Read from stack No operation Write return value on stack Read return address int mem_addr = [ icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : vale; icode in { IPOPL, IRET } : vala; # Other instructions don't need address ];

Memory Read Memory Memory Memory Memory Memory Memory OPl ra, rb rmmovl ra, D(rB) M 4 [vale] vala popl ra valm M 4 [vala] jxx Dest call Dest M 4 [vale] valp ret valm M 4 [vala] No operation Write value to memory Read from stack No operation Write return value on stack Read return address bool mem_read = icode in { IMRMOVL, IPOPL, IRET };

PC Update Logic New PC Select next value of PC PC New PC icode Bch valc valm valp

PC Update PC update PC update PC update PC update PC update PC update OPl ra, rb PC valp rmmovl ra, D(rB) PC valp popl ra PC valp jxx Dest PC Bch? valc : valp call Dest PC valc ret PC valm Update PC Update PC Update PC Update PC Set PC to destination Set PC to return address int new_pc = [ icode == ICALL : valc; icode == IJXX && Bch : valc; icode == IRET : valm; 1 : valp; ];

SEQ Operation State Combinational Logic CC Read Read Ports Data memory Write Write Ports PC register Cond. Code register Data memory Register file All updated as clock rises Combinational Logic PC 0x00c Register file ALU Control logic Memory reads Instruction memory Register file Data memory

SEQ Operation #2 Clock Cycle 1: Cycle 2: Cycle 3: Cycle 4: Cycle 1 Cycle 2 Cycle 3 Cycle 4 0x000: irmovl $0x100,% ebx # %ebx < 0x100 0x006: irmovl $0x200,% edx # %edx < 0x200 0x00c: addl %edx,%ebx # %ebx < 0x300 CC < 000 0x00e: je dest # Not taken Combinational Logic CC 100 Read Read Ports Data memory Register file %ebx = 0x100 Write Write Ports state set according to second irmovl instruction combinational logic starting to react to state changes PC 0x00c

SEQ Operation #3 Clock Cycle 1: Cycle 2: Cycle 3: Cycle 4: Cycle 1 Cycle 2 Cycle 3 Cycle 4 0x000: irmovl $0x100,% ebx # %ebx < 0x100 0x006: irmovl $0x200,% edx # %edx < 0x200 0x00c: addl %edx,%ebx # %ebx < 0x300 CC < 000 0x00e: je dest # Not taken Combinational Logic CC 100 000 Read Read Ports Data memory Register file %ebx = 0x100 Write Write Ports state set according to second irmovl instruction combinational logic generates results for addl instruction PC 0x00c 0x00e

SEQ Operation #4 Clock Cycle 1: Cycle 2: Cycle 3: Cycle 4: Cycle 1 Cycle 2 Cycle 3 Cycle 4 0x000: irmovl $0x100,% ebx # %ebx < 0x100 0x006: irmovl $0x200,% edx # %edx < 0x200 0x00c: addl %edx,%ebx # %ebx < 0x300 CC < 000 0x00e: je dest # Not taken Combinational Logic CC 000 Read Read Ports Data memory Register file %ebx = 0x300 Write Write Ports state set according to addl instruction combinational logic starting to react to state changes PC 0x00e

SEQ Operation #5 Clock Cycle 1: Cycle 2: Cycle 3: Cycle 4: Cycle 1 Cycle 2 Cycle 3 Cycle 4 0x000: irmovl $0x100,% ebx # %ebx < 0x100 0x006: irmovl $0x200,% edx # %edx < 0x200 0x00c: addl %edx,%ebx # %ebx < 0x300 CC < 000 0x00e: je dest # Not taken Combinational Logic CC 000 PC 0x00e 0x013 Read Write Data memory Read Write Ports Ports Register file %ebx = 0x300 state set according to addl instruction combinational logic generates results for je instruction

SEQ Summary Implementation Express every instruction as series of simple steps Follow same general flow for each instruction type Assemble registers, memories, predesigned combinational blocks Connect with control logic Limitations Too slow to be practical In one cycle, must propagate through instruction memory, register file, ALU, and data memory Would need to run clock very slowly Hardware units only active for fraction of clock cycle