Foundations of Computer Systems
|
|
- Sibyl Reynolds
- 5 years ago
- Views:
Transcription
1 Foundations of Computer Systems Lecture 7: Processor Architecture & Design John P. Shen & Gregory Kesden September 20, 2017 Lecture #7 Processor Architecture & Design Lecture #8 Pipelined Processor Design Lecture #9 Superscalar O3 Processor Design Required Reading Assignment: Chapter 4 of CS:APP (3 rd edition) by Randy Bryant & Dave O Hallaron. Recommended Reference: Chapters 1 and 2 of Shen and Lipasti (SnL). 9/20/2017 ( J.P. Shen) Lecture #7 1
2 From Lec #2 static dynamic Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Computer Systems (ISA & DSI) PROGRAM ARCHITECTURE Instruction Set Definition: Architecture State: Reg & Memory Op-code & Operand types Operand Addressing modes Control Flow instructions Instruction Set Architecture (ISA) MACHINE Program Execution: Load program into Memory Fetch instructions from Memory Execute Instructions in CPU Update Architecture State 9/20/2017 ( J.P. Shen) Lecture #7 2
3 From Lec #2 Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Java Programming & Java Virtual Machine Application programs Operating system Processor Memory I/O devices Application programs Virtual machine Operating system Processor Memory I/O devices Class Loader and JAR Reader Interpreter and Execution Stacks Threading System and Thread Scheduler Native Interface Internal Runtime Structures Verifier Compiler (optional) Memory System and Garbage Collector 9/20/2017 ( J.P. Shen) Lecture #7 3
4 Foundations of Computer Systems Lecture 7: Processor Architecture & Design 1. Processor Architecture a. Instruction Set Architecture (ISA) b. Y86-64 Instruction Set Architecture c. Logic Design Revisited/Simplified 2. Processor Implementation a. Instruction Set Processor (CPU) b. Processor Organization Design c. Y86-64 Sequential Processor Design 3. Motivation for Pipelining 9/20/2017 ( J.P. Shen) Lecture #7 4
5 Program Development Processor Design Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Computer Systems (SE & CE) Software Engineering Application Software Program development Program compilation SPECIFICATION Instruction Set Architecture (ISA) Computer Engineering Hardware Technology Program execution Computer performance IMPLEMENTATION 9/20/2017 ( J.P. Shen) Lecture #7 5
6 Instruction Set Architecture (x86-64) Assembly Language View Processor state (architecture state) Registers, Memory, Instructions (specify operations, operands) addq, pushq, ret, How instructions are encoded as bytes Layer of Abstraction Above: how to program machine Processor executes instructions in a sequence Below: what needs to be implemented Use variety of uarch techniques to make it run fast E.g., execute multiple instructions simultaneously Application Program Compiler OS ISA x86 machine programs CPU (x86) Design Circuit Design Chip Design 9/20/2017 ( J.P. Shen) Lecture #7 6
7 Y86-64 Processor State Program Registers 15 registers (omit %r15). Each 64 bits Condition Codes %rax %rcx %rdx %rbx Single-bit flags set by arithmetic or logical instructions ZF: Zero SF: Negative OF: Overflow Program Counter Indicates address of next instruction Program Status Indicates either normal operation or some error condition Memory Byte-addressable storage array Words stored in little-endian byte order %rsp %rbp %rsi %rdi RF: Program registers %r8 %r9 %r10 %r11 %r12 %r13 %r14 CC: Condition codes ZF SF OF PC Stat: Program status DMEM: Memory Y86-64 Instruction Set Architecture Similar state and instructions as x86-64 Simpler encodings Somewhere between CISC and RISC 9/20/2017 ( J.P. Shen) Lecture #7 7
8 Y86-64 Instruction Set 9/20/2017 ( J.P. Shen) Lecture #7 8
9 Y86-64 Instruction Formats Format 1 10 bytes of information read from memory Can determine instruction length from first byte Fewer instruction types and simpler encoding than x86-64 Each instruction accesses and modifies some part(s) of the program state 9/20/2017 ( J.P. Shen) Lecture #7 9
10 Y86-64 Instructions (moves) 9/20/2017 ( J.P. Shen) Lecture #7 10
11 Y86-64 Instructions (ALUs) 9/20/2017 ( J.P. Shen) Lecture #7 11
12 Y86-64 Instructions (branches) 9/20/2017 ( J.P. Shen) Lecture #7 12
13 Encoding Register Specifiers Each register has 4-bit ID %rax %rcx 0 1 %r8 %r9 8 9 %rdx 2 %r10 A %rbx 3 %r11 B %rsp 4 %r12 C %rbp 5 %r13 D %rsi 6 %r14 E %rdi 7 No Register F Same encoding as in x86-64 Register ID 15 (0xF) indicates no register Will use this in our hardware design in multiple places 9/20/2017 ( J.P. Shen) Lecture #7 13
14 Instruction Format Example Addition Instruction Generic Form Encoded Representation addq ra, rb 6 0 ra rb Add value in register ra to that in register rb Store result in register rb Note that Y86-64 only allows addition to be applied to register data Set condition codes based on result e.g., addq %rax,%rsi Encoding: Two-byte encoding First indicates instruction type Second gives source and destination registers 9/20/2017 ( J.P. Shen) Lecture #7 14
15 Arithmetic and Logical Operations Instruction Code Function Code Add addq ra, rb 6 0 ra rb Subtract (ra from rb) subq ra, rb 6 1 ra rb Refer to generically as OPq Encodings differ only by function code Low-order 4 bits in first instruction byte Set condition codes as side effect And andq ra, rb 6 2 ra rb Exclusive-Or xorq ra, rb 6 3 ra rb 9/20/2017 ( J.P. Shen) Lecture #7 15
16 Move Operations Register Register rrmovq ra, rb 2 0 ra rb Immediate Register irmovq V, rb 3 0 F rb V Register Memory rmmovq ra, D(rB) 4 0 ra rb D Memory Register mrmovq D(rB), ra 5 0 ra rb D Like the x86-64 movq instruction Simpler format for memory addresses Give different names to keep them distinct 9/20/2017 ( J.P. Shen) Lecture #7 16
17 Move Instruction Examples X86-64 Y86-64 movq $0xabcd, %rdx irmovq $0xabcd, %rdx Encoding: cd ab movq %rsp, %rbx rrmovq %rsp, %rbx Encoding: movq -12(%rbp),%rcx mrmovq -12(%rbp),%rcx movq %rsi,0x41c(%rsp) Encoding: Encoding: f4 ff ff ff ff ff ff ff rmmovq %rsi,0x41c(%rsp) c /20/2017 ( J.P. Shen) Lecture #7 17
18 Conditional Move Instructions Move Unconditionally rrmovq ra, rb Move When Less or Equal cmovle ra, rb Move When Less cmovl ra, rb Move When Equal cmove ra, rb Move When Not Equal cmovne ra, rb Move When Greater or Equal cmovge ra, rb Move When Greater cmovg ra, rb 2 0 ra rb 2 1 ra rb 2 2 ra rb 2 3 ra rb 2 4 ra rb 2 5 ra rb 2 6 ra rb Refer to generically as cmovxx Encodings differ only by function code Based on values of condition codes Variants of rrmovq instruction (Conditionally) copy value from source to destination register 9/20/2017 ( J.P. Shen) Lecture #7 18
19 Jump Instructions Jump (Conditionally) jxx Dest 7 fn Dest Refer to generically as jxx Encodings differ only by function code fn Based on values of condition codes Same as x86-64 counterparts Encode full destination address Unlike PC-relative addressing seen in x /20/2017 ( J.P. Shen) Lecture #7 19
20 Jump Instructions Jump Unconditionally jmp Dest 7 0 Dest Jump When Less or Equal jle Dest 7 1 Dest Jump When Less jl Dest 7 2 Dest Jump When Equal je Dest 7 3 Dest Jump When Not Equal jne Dest 7 4 Dest Jump When Greater or Equal jge Dest 7 5 Dest Jump When Greater jg Dest 7 6 Dest 9/20/2017 ( J.P. Shen) Lecture #7 20
21 Stack Operations pushq ra A 0 ra F Decrement %rsp by 8 Store word from ra to memory at %rsp Like x86-64 popq ra B 0 ra F Read word from memory at %rsp Save in ra Increment %rsp by 8 Like x86-64 Increasing Addresses Stack Bottom Stack Top %rsp Region of memory holding program data Used in Y86-64 (and x86-64) for supporting procedure calls Stack top indicated by %rsp Address of top stack element Stack grows toward lower addresses Bottom element is at highest address in the stack When pushing, must first decrement stack pointer After popping, increment stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 21
22 Subroutine Call and Return call Dest 8 0 Dest Push address of next instruction onto stack Start executing instructions at Dest Like x86-64 ret 9 0 Pop value from stack Use as address for next instruction Like x /20/2017 ( J.P. Shen) Lecture #7 22
23 Register Decode Logic Design Review HW Building Blocks A Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition B Combinational Logic Funct. ALU COMP Compute Boolean functions of inputs =? Continuously respond to input changes Operate on data and implement control Priority Encode Sel MUX Storage (Sequential) Elements Store bits Addressable memories Non-addressable registers Loaded only as clock rises Clock vala srca valb srcb A B Register file W valw dstw 9/20/2017 ( J.P. Shen) Lecture #7 23
24 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Arithmetic Logic Unit Y X A B A L U OF ZF CF X + Y Y X A B A L U OF ZF CF X - Y Y X A B A L U OF ZF CF X & Y Y X A B A L U OF ZF CF X ^ Y Combinational logic Continuously responding to inputs Control signal selects function computed Corresponding to 4 arithmetic/logical operations in Y86-64 Also computes values for condition codes 9/20/2017 ( J.P. Shen) Lecture #7 24
25 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Equality Comparator a b Bit equal eq b 63 Bit equal eq 63 b 62 Bit equal eq 62 a 62 Eq b 1 Bit equal eq 1 b 0 Bit equal eq 0 a 0 9/20/2017 ( J.P. Shen) Lecture #7 25
26 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Multiplexor/Demultiplexor 9/20/2017 ( J.P. Shen) Lecture #7 26
27 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Decoder/Encoder 9/20/2017 ( J.P. Shen) Lecture #7 27
28 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Transparent Latch D Latch D Data R Q+ Changing D C C Clock S Q D Q+ Time Latching d!d!d!d d Storing d!d 0!q q When in latching mode, combinational propagation from D to Q+ and Q 1 d d!d 0 0 q!q Value latched depends on value of D as C falls 9/20/2017 ( J.P. Shen) Lecture #7 28
29 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Edge-Triggered Latch D Data R Q+ C Clock T Trigger S Q C T D Q+ Time Only in latching mode for brief period Rising clock edge Value latched depends on data as clock rises Output remains stable at all other times 9/20/2017 ( J.P. Shen) Lecture #7 29
30 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Structure Registers i 7 D C Q+ o 7 i 6 D C Q+ o 6 State = x State = y i 5 D C Q+ o 5 Input = y x Output = x Rising clock y Output = y i 4 i 3 D C D C Q+ Q+ o 4 o 3 i 2 D C Q+ o 2 i 1 D C Q+ o 1 i 0 D C Q+ o 0 Stores data bits I Clock O For most of time acts as barrier between input and output As clock rises, loads input Clock Stores word of data Different from program registers seen in assembly code Collection of edge-triggered latches Loads input on rising edge of clock 9/20/2017 ( J.P. Shen) Lecture #7 30
31 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition State Machine Example Comb. Logic 0 Accumulator circuit A L U 0 MUX Out Load or accumulate on each cycle In 1 Load Clock Clock Load In Out x 0 x 1 x 2 x 3 x 4 x 5 x 0 x 0 +x 1 x 0 +x 1 +x 2 x 3 x 3 +x 4 x 3 +x 4 +x 5 9/20/2017 ( J.P. Shen) Lecture #7 31
32 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Random-Access Memory Stores multiple words of memory Address input specifies which word to read or write Register file Holds values of program registers %rax, %rsp, etc. Register identifier serves as address» ID 15 (0xF) implies no read or write performed Read ports vala srca valb srcb A B Register file W Clock valw dstw Write port Multiple Ports Can read and/or write multiple words in one cycle» Each has separate address and data input/output 9/20/2017 ( J.P. Shen) Lecture #7 32
33 Logic Design Review Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition Register File Timing x 2 vala srca valb srcb A B 2 x Register file Reading Writing Like combinational logic Output data generated based on input address After some delay Like register Update only as clock rises 2 x Register file W valw dstw y 2 Rising Register clock file 2 y W valw dstw Clock Clock 9/20/2017 ( J.P. Shen) Lecture #7 33
34 Logic Design Review decoder decoder Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition decoder decoder (Cache) Memory Implementation Options index key idx key index index tag data tag data Indexed Memory k-bit index 2 k blocks Associative Memory (CAM) no index unlimited blocks N-Way Set-Associative Memory k-bit index 2 k N blocks Indexed Memory (Multi-Ported) (2x) k-bit index (2x) 2 k blocks 9/20/2017 ( J.P. Shen) Lecture #7 34
35 Foundations of Computer Systems Lecture 7: Instruction-Set Processor Design 1. Processor Architecture a. Instruction Set Architecture (ISA) b. Y86-64 Instruction Set Architecture c. Logic Design Revisited/Simplified 2. Processor Implementation a. Instruction Set Processor (CPU) b. Processor Organization Design c. Y86-64 Sequential Processor Design 3. Motivation for Pipelining 9/20/2017 ( J.P. Shen) Lecture #7 35
36 Instruction Processing Steps Processor State Program counter register (PC) Condition code register (CC) Register File Memories Access same memory space Data: for reading/writing program data Instruction: for reading instructions Instruction Processing Flow Read instruction at address specified by PC Process through (four) typical steps Update program counter (Repeat) 1. Fetch Read instruction from instruction memory 2. Decode Determine Instruction type; Read program registers 3. Execute Compute value or address 4. Memory Read or write data in memory 5. Write Back Write program registers 6. PC Update Update program counter 9/20/2017 ( J.P. Shen) Lecture #7 36
37 Instruction Decoding Optional Optional 5 0 ra rb D icode ifun ra rb valc Instruction Format Instruction byte Optional register byte Optional constant word icode:ifun ra:rb valc 9/20/2017 ( J.P. Shen) Lecture #7 37
38 Executing Arithmetic/Logical Operation OPq ra, rb 6 fn ra rb Fetch Read 2 bytes Decode Read operand registers Execute Perform operation Set condition codes Memory Do nothing Write Back Update register PC Update Increment PC by 2 9/20/2017 ( J.P. Shen) Lecture #7 38
39 Stage Computation: Arith/Log. Ops OPq ra, rb Fetch Decode Execute Memory Write back PC update icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[rA] valb R[rB] vale valb OP vala Set CC R[rB] vale PC valp Read instruction byte Read register byte Compute next PC Read operand A Read operand B Perform ALU operation Set condition code register Write back result Update PC Formulate instruction execution as sequence of simple steps Use same general form for all instructions 9/20/2017 ( J.P. Shen) Lecture #7 39
40 Executing rmmovq rmmovq ra, D(rB) 4 0 ra rb D Fetch Read 10 bytes Decode Read operand registers Execute Compute effective address Memory Write to memory Write back Do nothing PC Update Increment PC by 10 9/20/2017 ( J.P. Shen) Lecture #7 40
41 Processing Steps: rmmovq rmmovq ra, D(rB) Fetch Decode Execute Memory Write back PC update icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valc M 8 [PC+2] valp PC+10 vala R[rA] valb R[rB] vale valb + valc M 8 [vale] vala PC valp Read instruction byte Read register byte Read displacement D Compute next PC Read operand A Read operand B Compute effective address Write value to memory Update PC Use ALU for address computation 9/20/2017 ( J.P. Shen) Lecture #7 41
42 Executing popq popq ra b 0 ra 8 Fetch Read 2 bytes Decode Read stack pointer Execute Increment stack pointer by 8 Memory Read from old stack pointer Write back Update stack pointer Write result to register PC Update Increment PC by 2 9/20/2017 ( J.P. Shen) Lecture #7 42
43 Processing Steps: popq popq ra Fetch Decode Execute Memory Write back PC update icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[%rsp] valb R[%rsp] vale valb + 8 valm M 8 [vala] R[%rsp] vale R[rA] valm PC valp Read instruction byte Read register byte Compute next PC Read stack pointer Read stack pointer Increment stack pointer Read from stack Update stack pointer Write back result Update PC Use ALU to increment stack pointer Must update two registers Popped value New stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 43
44 Executing Conditional Moves cmovxx ra, rb 2 fn ra rb Fetch Decode Execute Read 2 bytes Read operand registers If!cnd, then set destination register to 0xF Memory Do nothing Write back Update register (or not) PC Update Increment PC by 2 9/20/2017 ( J.P. Shen) Lecture #7 44
45 Processing Steps: Cond. Move cmovxx ra, rb Fetch Decode Execute Memory Write back PC update icode:ifun M 1 [PC] ra:rb M 1 [PC+1] valp PC+2 vala R[rA] valb 0 vale valb + vala If! Cond(CC,ifun) rb 0xF R[rB] vale PC valp Read instruction byte Read register byte Compute next PC Read operand A Pass vala through ALU (Disable register update) Write back result Update PC Read register ra and pass through ALU Cancel move by setting destination register to 0xF If condition codes & move condition indicate no move 9/20/2017 ( J.P. Shen) Lecture #7 45
46 Executing Jumps jxx Dest 7 fn Dest fall thru: XX XX Not taken target: XX XX Taken Fetch Decode Execute Read 9 bytes Increment PC by 9 Do nothing Determine whether to take branch based on jump condition and condition codes Memory Do nothing Write back Do nothing PC Update Set PC to Dest if branch taken or to incremented PC if not branch 9/20/2017 ( J.P. Shen) Lecture #7 46
47 Processing Steps: Jumps jxx Dest Fetch Decode icode:ifun M 1 [PC] valc M 8 [PC+1] valp PC+9 Read instruction byte Read destination address Fall through address Execute Memory Write back PC update Cnd Cond(CC,ifun) PC Cnd? valc : valp Take branch? Update PC Compute both addresses Choose based on setting of condition codes and branch condition 9/20/2017 ( J.P. Shen) Lecture #7 47
48 Executing call call Dest 8 0 Dest return: XX XX target: XX XX Fetch Read 9 bytes Increment PC by 9 Decode Read stack pointer Execute Decrement stack pointer by 8 Memory Write incremented PC to new value of stack pointer Write back Update stack pointer PC Update Set PC to Dest 9/20/2017 ( J.P. Shen) Lecture #7 48
49 Processing Steps: call call Dest Fetch Decode Execute Memory Write back PC update icode:ifun M 1 [PC] valc M 8 [PC+1] valp PC+9 valb R[%rsp] vale valb + 8 M 8 [vale] valp R[%rsp] vale PC valc Read instruction byte Read destination address Compute return point Read stack pointer Decrement stack pointer Write return value on stack Update stack pointer Set PC to destination Use ALU to decrement stack pointer Store incremented PC 9/20/2017 ( J.P. Shen) Lecture #7 49
50 Executing ret ret 9 0 return: XX XX Fetch Read 1 byte Decode Read stack pointer Execute Increment stack pointer by 8 Memory Read return address from old stack pointer Write back Update stack pointer PC Update Set PC to return address 9/20/2017 ( J.P. Shen) Lecture #7 50
51 Processing Steps: ret ret icode:ifun M 1 [PC] Read instruction byte Fetch Decode Execute Memory Write back PC update vala R[%rsp] valb R[%rsp] vale valb + 8 valm M 8 [vala] R[%rsp] vale PC valm Read operand stack pointer Read operand stack pointer Increment stack pointer Read return address Update stack pointer Set PC to return address Use ALU to increment stack pointer Read return address from memory 9/20/2017 ( J.P. Shen) Lecture #7 51
52 Instruction Processing Steps OPq ra, rb icode,ifun icode:ifun M 1 [PC] Read instruction byte Fetch ra,rb valc ra:rb M 1 [PC+1] Read register byte [Read constant word] valp valp PC+2 Compute next PC Decode vala, srca valb, srcb vala R[rA] valb R[rB] Read operand A Read operand B Execute vale Cond code vale valb OP vala Set CC Perform ALU operation Set/use cond. code reg Memory valm [Memory read/write] Write dste R[rB] vale Write back ALU result back dstm [Write back memory result] PC update PC PC valp Update PC All instructions follow same general pattern Differ in what gets computed in each step 9/20/2017 ( J.P. Shen) Lecture #7 52
53 Instruction Processing Steps call Dest icode,ifun icode:ifun M 1 [PC] Read instruction byte Fetch ra,rb valc valc M 8 [PC+1] [Read register byte] Read constant word valp valp PC+9 Compute next PC Decode vala, srca valb, srcb valb R[%rsp] [Read operand A] Read operand B Execute vale Cond code vale valb + 8 Perform ALU operation [Set /use cond. code reg] Memory Write valm dste M 8 [vale] valp R[%rsp] vale Memory read/write Write back ALU result back PC update dstm PC PC valc [Write back memory result] Update PC All instructions follow same general pattern Differ in what gets computed in each step 9/20/2017 ( J.P. Shen) Lecture #7 53
54 Computed Values Fetch icode ifun ra rb valc valp Decode srca srcb dste dstm vala valb Instruction code Instruction function Instr. Register A Instr. Register B Instruction constant Incremented PC Register ID A Register ID B Destination Register E Destination Register M Register value A Register value B Execute vale ALU result Cnd Branch/move flag Memory valm Value from memory 9/20/2017 ( J.P. Shen) Lecture #7 54
55 SEQ Hardware Key Blue boxes: predesigned hardware blocks E.g., memories, ALU Gray boxes: control logic Describe in HCL White ovals: labels for signals Thick lines: 64-bit word values Thin lines: 4-8 bit values Dotted lines: 1-bit values 9/20/2017 ( J.P. Shen) Lecture #7 55
56 Fetch Logic Predefined Blocks icode ifun ra rb valc valp PC: Register containing PC Instruction memory: Read 10 bytes (PC to PC+9) Signal invalid address Instr valid Need valc Need regids PC increment Split: Divide instruction byte into icode and ifun Align: Get fields for ra, rb, and valc icode Split ifun Byte 0 Align Bytes 1-9 Control Logic Instr. Valid: Is this instruction valid? imem_error Instruction memory icode, ifun: Generate no-op if invalid address Need regids: Does this instruction have a register byte? PC Need valc: Does this instruction have a constant word? 9/20/2017 ( J.P. Shen) Lecture #7 56
57 Decode Logic Register File Cnd vala valb valm vale Read ports A, B Write ports E, M Addresses are register IDs or 15 (0xF) (no access) A Register file B M E dste dstm srca srcb Control Logic srca, srcb: read port addresses dste, dstm: write port addresses dste dstm srca srcb Signals Cnd: Indicate whether or not to perform conditional move icode ra rb Computed in Execute stage 9/20/2017 ( J.P. Shen) Lecture #7 57
58 A Source Decode OPq ra, rb vala R[rA] Read operand A cmovxx ra, rb Decode vala R[rA] Read operand A Decode Decode Decode Decode rmmovq ra, D(rB) vala R[rA] popq ra vala R[%rsp] jxx Dest call Dest Read operand A Read stack pointer No operand No operand Decode ret vala R[%rsp] int srca = [ icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : ra; icode in { IPOPQ, IRET } : RRSP; 1 : RNONE; # Don't need register ]; Read stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 58
59 E Destination Write-back OPq ra, rb R[rB] vale Write back result Write-back Write-back Write-back Write-back Write-back cmovxx ra, rb R[rB] vale rmmovq ra, D(rB) popq ra R[%rsp] vale jxx Dest call Dest R[%rsp] vale Conditionally write back result None Update stack pointer None Update stack pointer Write-back R[%rsp] vale int dste = [ icode in { IRRMOVQ } && Cnd : rb; icode in { IIRMOVQ, IOPQ} : rb; icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP; 1 : RNONE; # Don't write any register ]; ret Update stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 59
60 Execute Logic Units Cnd vale ALU Implements 4 required functions Generates condition code values CC Register with 3 condition code bits cond Computes conditional jump/move flag cond CC ALU ALU fun. Control Logic Set CC: Should condition code register be loaded? ALU A: Input A to ALU ALU B: Input B to ALU ALU fun: What function should ALU compute? Set CC ALU A ALU B icode ifun valc vala valb 9/20/2017 ( J.P. Shen) Lecture #7 60
61 ALU A Input Execute Execute Execute Execute Execute Execute OPq ra, rb vale valb OP vala cmovxx ra, rb vale 0 + vala rmmovq ra, D(rB) vale valb + valc popq ra vale valb + 8 jxx Dest call Dest vale valb + 8 Perform ALU operation Pass vala through ALU Compute effective address Increment stack pointer No operation Decrement stack pointer int alua = [ icode in { IRRMOVQ, IOPQ } : vala; icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ } : valc; icode in { ICALL, IPUSHQ } : -8; icode in { IRET, IPOPQ } : 8; # Other instructions don't need ALU ]; ret Execute vale valb + 8 Increment stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 61
62 ALU Operation Execute Execute Execute Execute Execute Execute OPl ra, rb vale valb OP vala cmovxx ra, rb vale 0 + vala rmmovl ra, D(rB) vale valb + valc popq ra vale valb + 8 jxx Dest call Dest vale valb + 8 Perform ALU operation Pass vala through ALU Compute effective address Increment stack pointer No operation Decrement stack pointer int alufun = [ icode == IOPQ : ifun; 1 : ALUADD; ]; ret Execute vale valb + 8 Increment stack pointer 9/20/2017 ( J.P. Shen) Lecture #7 62
63 Memory Logic Stat Memory Reads or writes memory word stat dmem_error valm data out Control Logic stat: What is instruction status? Mem. read: should word be read? Mem. write: should word be written? instr_valid imem_error Mem. read Mem. write read write Data memory data in Mem. addr.: Select address Mem. data.: Select data Mem. addr Mem. data icode vale vala valp 9/20/2017 ( J.P. Shen) Lecture #7 63
64 Instruction Status Stat Control Logic stat: What is instruction status? instr_valid stat imem_error Mem. read Mem. write dmem_error read write valm data out Data memory data in Mem. addr Mem. data ## Determine instruction status int Stat = [ imem_error dmem_error : SADR;!instr_valid: SINS; icode == IHALT : SHLT; 1 : SAOK; ]; icode vale vala valp 9/20/2017 ( J.P. Shen) Lecture #7 64
65 Memory Address Memory Memory Memory Memory Memory Memory OPq ra, rb rmmovq ra, D(rB) M 8 [vale] vala popq ra valm M 8 [vala] jxx Dest call Dest M 8 [vale] valp ret valm M 8 [vala] No operation Write value to memory Read from stack No operation Write return value on stack Read return address int mem_addr = [ icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : vale; icode in { IPOPQ, IRET } : vala; # Other instructions don't need address ]; 9/20/2017 ( J.P. Shen) Lecture #7 65
66 Memory Read Memory Memory Memory Memory Memory Memory OPq ra, rb rmmovq ra, D(rB) M 8 [vale] vala popq ra valm M 8 [vala] jxx Dest call Dest M 8 [vale] valp ret valm M 8 [vala] No operation Write value to memory Read from stack No operation Write return value on stack Read return address bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET }; 9/20/2017 ( J.P. Shen) Lecture #7 66
67 PC Update PC update OPq ra, rb PC valp Update PC rmmovq ra, D(rB) int new_pc = [ icode == ICALL : valc; icode == IJXX && Cnd : valc; icode == IRET : valm; 1 : valp; ]; PC update PC update PC update PC valp popq ra PC valp jxx Dest PC Cnd? valc : valp Update PC Update PC Update PC PC PC update call Dest PC valc Set PC to destination New PC icode Cnd valc valm valp ret PC update PC valm New PC Select next value of PC Set PC to return address 9/20/2017 ( J.P. Shen) Lecture #7 67
68 SEQ Y86-64 Processor Combinational logic Read Data memory Write CC 100 Read ports Write ports Register file %rbx = 0x100 PC 0x014 9/20/2017 ( J.P. Shen) Lecture #7 68
69 Foundations of Computer Systems Lecture 7: Instruction-Set Processor Design 1. Processor Architecture a. Instruction Set Architecture (ISA) b. Y86-64 Instruction Set Architecture c. Logic Design Revisited/Simplified 2. Processor Implementation a. Instruction Set Processor (CPU) b. Processor Organization Design c. Y86-64 Sequential Processor Design 3. Motivation for Pipelining 9/20/2017 ( J.P. Shen) Lecture #7 69
70 Computational Example 300 ps 20 ps Combinational logic R e g Delay = 320 ps Throughput = 3.12 GIPS Clock System Computation requires total of 300 picoseconds Additional 20 picoseconds to save result in register Must have clock cycle of at least 320 ps 9/20/2017 ( J.P. Shen) Lecture #7 70
71 3-Way Pipelined Version 100 ps 20 ps 100 ps 20 ps 100 ps 20 ps Comb. logic A R e g Comb. logic B R e g Comb. logic C R e g Delay = 360 ps Throughput = 8.33 GIPS Clock System Divide combinational logic into 3 blocks of 100 ps each Can begin new operation as soon as previous one passes through stage A. Begin new operation every 120 ps Overall latency increases 360 ps from start to finish 9/20/2017 ( J.P. Shen) Lecture #7 71
72 Pipeline Diagrams Unpipelined OP1 OP2 OP3 Time Cannot start new operation until previous one completes 3-Way Pipelined OP1 OP2 OP3 A B C A B C A B C Time Up to 3 operations in process simultaneously 9/20/2017 ( J.P. Shen) Lecture #7 72
73 Operating a Pipeline Clock OP1 OP2 OP3 A B C A B C A B C Time 100 ps 20 ps 100 ps 20 ps 100 ps 20 ps Comb. logic A R e g Comb. logic B R e g Comb. logic C R e g Clock 9/20/2017 ( J.P. Shen) Lecture #7 73
74 Example: Integer Multiplier [Source: J. Hayes, Univ. of Michigan] 16x16 combinational multiplier ISCAS-85 C6288 standard benchmark Tools: Synopsys DC/LSI Logic 110nm gflxp ASIC 9/20/2017 ( J.P. Shen) Lecture #7 74
75 Example: Integer Multiplier Configuration Delay MPS Area (FF/wiring) Area Increase Combinational 3.52ns (--/1759) 2 Stages 1.87ns 534 (1.9x) 8725 (1078/1870) 16% 4 Stages 1.17ns 855 (3.0x) (3388/2112) 50% 8 Stages 0.80ns 1250 (4.4x) (8938/2612) 127% Pipeline efficiency 2-stage: nearly double throughput; marginal area cost 4-stage: 75% efficiency; area still reasonable 8-stage: 55% efficiency; area more than doubles Tools: Synopsys DC/LSI Logic 110nm gflxp ASIC 9/20/2017 ( J.P. Shen) Lecture #7 75
76 Foundations of Computer Systems Lecture 8: Pipelined Processor Design John P. Shen & Gregory Kesden September 25, 2017 Required Reading Assignment: Chapter 4 of CS:APP (3 rd edition) by Randy Bryant & Dave O Hallaron. Recommended Reference: Chapters 1 and 2 of Shen and Lipasti (SnL). 9/20/2017 ( J.P. Shen) Lecture #7 76
cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb D mrmovq D(rB), ra 5 0 ra rb D OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest
Computer Architecture: Y86-64 Sequential Implementation CSci 2021: Machine Architecture and Organization Lecture #19, March 4th, 2016 Your instructor: Stephen McCamant Based on slides originally by: Randy
More informationSequential Implementation
CS:APP Chapter 4 Computer Architecture Sequential Implementation Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects of
More informationcmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb D mrmovq D(rB), ra 5 0 ra rb D OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest
Computer Architecture: Y86-64 Sequential Implementation CSci 2021: Machine Architecture and Organization October 31st, 2018 Your instructor: Stephen McCamant Based on slides originally by: Randy Bryant
More informationCSC 252: Computer Organization Spring 2018: Lecture 11
CSC 252: Computer Organization Spring 2018: Lecture 11 Instructor: Yuhao Zhu Department of Computer Science University of Rochester Action Items: Assignment 3 is due March 2, midnight Announcement Programming
More informationCS:APP Chapter 4! Computer Architecture! Sequential! Implementation!
CS:APP Chapter 4! Computer Architecture! Sequential! Implementation! Randal E. Bryant! Carnegie Mellon University! http://csapp.cs.cmu.edu CS:APP2e! Y86 Instruction Set #1! Byte! 0 1 2 3 4 5 halt 0 0 nop
More informationGod created the integers, all else is the work of man Leopold Kronecker
Sequential Hardware God created the integers, all else is the work of man Leopold Kronecker (He believed in the reduction of all mathematics to arguments involving only the integers and a finite number
More informationCS:APP Chapter 4 Computer Architecture Sequential Implementation
CS:APP Chapter 4 Computer Architecture Sequential Implementation Randal E. Bryant Carnegie Mellon University CS:APP Y86 Instruction Set Byte ra rb ra rb V rb rb V ra D rb ra rb D D rb ra ra rb D ra rb
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Last updated: March 15, 2018 at 10:58 CS429 Slideset 13: 1 The ISA Byte 0 1 2 3
More informationCS:APP Chapter 4 Computer Architecture Sequential Implementation
CS:APP Chapter 4 Computer Architecture Sequential Implementation Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0 halt 1 0 rrmovl
More informationByte. cmovle 7 1. halt 0 0 addq 6 0 nop 1 0 subq 6 1 cmovxx ra, rb 2 fn ra rb andq 6 2 irmovq V, rb 3 0 F rb V xorq 6 3. cmovl 7 2.
Y86-6 Instruc&on Set Computer rchitecture Sequen&al Implementa&on CSCI 202: Machine rchitecture and Organiza&on Byte 0 2 3 5 6 7 8 9 2 fn r rb irmovq V, rb 3 0 F rb V 0 r rb Pen- Chung Yew epartment Computer
More informationComputer Science 104:! Y86 & Single Cycle Processor Design!
Computer Science 104: Y86 & Single Cycle Processor Design Alvin R. Lebeck Slides based on those from Randy Bryant 1 CS:APP Administrative Homework #4 My office hours today 11:30-12:30 Reading: text 4.3
More informationComputer Science 104:! Y86 & Single Cycle Processor Design!
Computer Science 104: Y86 & Single Cycle Processor Design Alvin R. Lebeck Slides based on those from Randy Bryant CS:APP Administrative HW #4 Due tomorrow tonight HW #5 up soon ing: 4.1-4.3 Today Review
More informationGiving credit where credit is due
CSCE 230J Computer Organization Processor Architecture III: Sequential Implementation Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due
More informationComputer Organization: A Programmer's Perspective
A Programmer's Perspective Instruction Set Architecture Gal A. Kaminka galk@cs.biu.ac.il Outline: CPU Design Background Instruction sets Logic design Sequential Implementation A simple, but not very fast
More informationChapter 4! Processor Architecture!!
Chapter 4! Processor Architecture!! Sequential Implementation! Instructor: Dr. Hyunyoung Lee! Texas A&M University! Based on slides provided by Randal E. Bryant, CMU Topics Covered! Hardware Control Language
More informationThe ISA. Fetch Logic
The ISA CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Science University of Texas at Austin Last updated: July 5, 2018 at 11:55 Byte 0 1 2 3 4 5 6 7 8 9 halt 0 0 nop
More informationSystems I. Datapath Design II. Topics Control flow instructions Hardware for sequential machine (SEQ)
Systems I Datapath Design II Topics Control flow instructions Hardware for sequential machine (SEQ) Executing Jumps jxx Dest 7 fn Dest fall thru: XX XX Not taken target: XX XX Taken Read 5 bytes Increment
More informationComputer Science 104:! Y86 & Single Cycle Processor Design!
Computer Science 104:! Y86 & Single Cycle Processor Design! Alvin R. Lebeck! Slides based on those from Randy Bryant 1! CS:APP! CS:APP! Administrative! 2! CS:APP! Instruction Set Architecture! Application!
More informationSystems I. Datapath Design II. Topics Control flow instructions Hardware for sequential machine (SEQ)
Systems I Datapath Design II Topics Control flow instructions Hardware for sequential machine (SEQ) Executing Jumps jxx Dest 7 fn Dest fall thru: XX XX Not taken target: XX XX Taken Fetch Decode Read 5
More informationProcessor Architecture II! Sequential! Implementation!
Processor Architecture II! Sequential! Implementation! Lecture 6, April 14 th 2011 Alexandre David Slides by Randal E. Bryant! Carnegie Mellon University! Y86 Instruction Set! Byte! 0 1 2 3 4 5 nop 0 0
More informationOverview. CS429: Computer Organization and Architecture. Y86 Instruction Set. Building Blocks
Overview CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: March 15, 2018 at 10:54 How do we build a digital computer?
More informationComputer Science 104:! Y86 & Single Cycle Processor Design!
Computer Science 104:! Y86 & Single Cycle Processor Design! Alvin R. Lebeck! Slides based on those from Randy Bryant CS:APP! Administrative! 2! CS:APP! Y86 Instruction Set! Byte! 0 1 2 3 4 5 nop 0 0 halt
More informationStage Computation: Arith/Log. Ops
Stage Computation: Arith/Log. Ops OPl ra, rb Fetch icode:ifun M 1 [PC] ra:rb M 1 [PC+1] Read instruction byte Read register byte back valp PC+2 vala R[rA] valb R[rB] vale valb OP vala Set CC R[rB] vale
More informationCS:APP Chapter 4 Computer Architecture Sequential Implementation
CS:APP Chapter 4 Computer Architecture Sequential Implementation Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0 halt 1 0 rrmovl
More informationBackground: Sequential processor design. Computer Architecture and Systems Programming , Herbstsemester 2013 Timothy Roscoe
Background: Sequential processor design Computer Architecture and Systems Programming 252 0061 00, Herbstsemester 2013 Timothy Roscoe Overview Y86 instruction set architecture Processor state Instruction
More informationComputer Architecture I: Outline and Instruction Set Architecture. CENG331 - Computer Organization. Murat Manguoglu
Computer Architecture I: Outline and Instruction Set Architecture CENG331 - Computer Organization Murat Manguoglu Adapted from slides of the textbook: http://csapp.cs.cmu.edu/ Outline Background Instruction
More informationCISC Fall 2009
Michela Taufer October 20, 2009 Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective, R. Bryant and D. O'Hallaron, Prentice Hall, 2003 Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0
More informationIntel x86-64 and Y86-64 Instruction Set Architecture
CSE 2421: Systems I Low-Level Programming and Computer Organization Intel x86-64 and Y86-64 Instruction Set Architecture Presentation J Read/Study: Bryant 3.1 3.5, 4.1 Gojko Babić 03-07-2018 Intel x86
More informationCISC: Stack-intensive procedure linkage. [Early] RISC: Register-intensive procedure linkage.
CISC: Stack-intensive procedure linkage. The stack is used for procedure arguments and return addresses. [Early] RISC: Register-intensive procedure linkage. Registers are used for procedure arguments and
More informationY86 Instruction Set. SEQ Hardware Structure. SEQ Stages. Sequential Implementation. Lecture 8 Computer Architecture III.
Building Blocks Combinational Logic Compute Boolean functions of inputs Continuously respond to input changes Operate on data and implement control Storage Elements Store bits Lecture 8 Computer rchitecture
More informationcmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb mrmovq D(rB), ra 5 0 ra rb OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest
Instruction Set Architecture Computer Architecture: Instruction Set Architecture CSci 2021: Machine Architecture and Organization Lecture #16, February 24th, 2016 Your instructor: Stephen McCamant Based
More informationGiving credit where credit is due
JDEP 284H Foundations of Computer Systems Processor rchitecture III: Sequential Implementation Dr. Steve Goddard goddard@cse.unl.edu Giving credit where credit is due Most of slides for this lecture are
More informationCS 3330: SEQ part September 2016
1 CS 3330: SEQ part 2 15 September 2016 Recall: Timing 2 compute new values between rising edges compute compute compute compute clock signal registers, memories change at rising edges next value register
More informationComputer Organization Chapter 4. Prof. Qi Tian Fall 2013
Computer Organization Chapter 4 Prof. Qi Tian Fall 2013 1 Topics Dec. 6 (Friday) Final Exam Review Record Check Dec. 4 (Wednesday) 5 variable Karnaugh Map Quiz 5 Dec. 2 (Monday) 3, 4 variables Karnaugh
More informationWhere Have We Been? Logic Design in HCL. Assembly Language Instruction Set Architecture (Y86) Finite State Machines
Where Have We Been? Assembly Language Instruction Set Architecture (Y86) Finite State Machines States and Transitions Events Where Are We Going? Tracing Instructions at the Register Level Build a CPU!
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: October 11, 2017 at 17:42 CS429 Slideset 6: 1 Topics of this Slideset
More informationFoundations of Computer Systems
18-600 Foundations of Computer Systems Lecture 8: Pipelined Processor Design John P. Shen & Gregory Kesden September 25, 2017 Lecture #7 Processor Architecture & Design Lecture #8 Pipelined Processor Design
More informationTopics of this Slideset. CS429: Computer Organization and Architecture. Instruction Set Architecture. Why Y86? Instruction Set Architecture
Topics of this Slideset CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Intro to Assembly language Programmer visible state Y86
More informationInstruc(on Set Architecture. Computer Architecture Instruc(on Set Architecture Y86. Y86-64 Processor State. Assembly Programmer s View
Instruc(on Set Architecture Computer Architecture Instruc(on Set Architecture Y86 CSCI 2021: Machine Architecture and Organiza(on Pen- Chung Yew Department Computer Science and Engineering University of
More informationSEQ part 3 / HCLRS 1
SEQ part 3 / HCLRS 1 Changelog 1 Changes made in this version not seen in first lecture: 21 September 2017: data memory value MUX input for call is PC + 10, not PC 21 September 2017: slide 23: add input
More informationbitwise (finish) / SEQ part 1
bitwise (finish) / SEQ part 1 1 Changelog 1 Changes made in this version not seen in first lecture: 14 September 2017: slide 16-17: the x86 arithmetic shift instruction is sar, not sra last time 2 bitwise
More informationCS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University
CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP2e Overview Wrap-Up of PIPE Design Exceptional conditions Performance analysis Fetch
More informationChangelog. Changes made in this version not seen in first lecture: 13 Feb 2018: add slide on constants and width
HCL 1 Changelog 1 Changes made in this version not seen in first lecture: 13 Feb 2018: add slide on constants and width simple ISA 4: mov-to-register 2 irmovq $constant, %ryy rrmovq %rxx, %ryy mrmovq 10(%rXX),
More informationSequential CPU Implementation.
Sequential CPU Implementation Y86 Instruction Set P259 Byte 0 1 2 3 4 5 nop 0 0 halt 1 0 rrmovl ra, rb 2 0 ra rb irmovl V, rb 3 0 8 rb V rmmovl ra, D(rB) 4 0 ra rb D mrmovl D(rB), ra 5 0 ra rb D OPl ra,
More informationChapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary
Chapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary 1 Outline Introduction to assembly programing Introduction to Y86 Y86 instructions,
More informationCS 261 Fall Mike Lam, Professor. CPU architecture
CS 261 Fall 2017 Mike Lam, Professor CPU architecture Topics CPU stages and design Pipelining Y86 semantics CPU overview A CPU consists of Combinational circuits for computation Sequential circuits for
More informationPipelining 3: Hazards/Forwarding/Prediction
Pipelining 3: Hazards/Forwarding/Prediction 1 pipeline stages 2 fetch instruction memory, most PC computation decode reading register file execute computation, condition code read/write memory memory read/write
More informationInstruction Set Architecture
CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects
More informationcs281: Computer Systems CPUlab ALU and Datapath Assigned: Oct. 30, Due: Nov. 8 at 11:59 pm
cs281: Computer Systems CPUlab ALU and Datapath Assigned: Oct. 30, Due: Nov. 8 at 11:59 pm The objective of this exercise is twofold to complete a combinational circuit for an ALU that we can use with
More informationCS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University
CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Overview Wrap-Up of PIPE Design Performance analysis Fetch stage design Exceptional
More informationcs281: Introduction to Computer Systems CPUlab Datapath Assigned: Oct. 29, Due: Nov. 3
cs281: Introduction to Computer Systems CPUlab Datapath Assigned: Oct. 29, Due: Nov. 3 The objective of this exercise is to familiarize you with the Datapath of the Y86 CPU and to introduce you to the
More informationlast time Exam Review on the homework on office hour locations hardware description language programming language that compiles to circuits
last time Exam Review hardware description language programming language that compiles to circuits stages as conceptual division not the order things happen easier to figure out wiring stage-by-stage?
More informationCS 3330 Exam 1 Fall 2017 Computing ID:
S 3330 Fall 2017 xam 1 Variant page 1 of 8 mail I: S 3330 xam 1 Fall 2017 Name: omputing I: Letters go in the boxes unless otherwise specified (e.g., for 8 write not 8 ). Write Letters clearly: if we are
More informationCS:APP3e Web Aside ARCH:HCL: HCL Descriptions of Y86-64 Processors
CS:APP3e Web Aside ARCH:HCL: HCL Descriptions of Y86-64 Processors Randal E. Bryant David R. O Hallaron December 29, 2014 Notice The material in this document is supplementary material to the book Computer
More informationPipelining 4 / Caches 1
Pipelining 4 / Caches 1 1 2 after forwarding/prediction where do we still need to stall? memory output needed in fetch ret followed by anything memory output needed in exceute mrmovq or popq + use (in
More informationFor Tuesday. Finish Chapter 4. Also, Project 2 starts today
For Tuesday Finish Chapter 4 Also, Project 2 starts today 1 Sequential Y86 Implementation 1. Fetch From icode (4 bits) & ifun (4 bits) [valc (4bytes)] Calc valp 2. Decode: get ra [rb] [%esp] 3. Execute
More informationInstruction Set Architecture
CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Instruction Set Architecture Assembly Language View Processor
More informationInstruction Set Architecture
CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Instruction Set Architecture Assembly Language View! Processor
More informationSEQ without stages. valc register file ALU. Stat ZF/SF. instr. length PC+9. R[srcA] R[srcB] srca srcb dstm dste. Data in Data out. Instr. Mem.
Exam Review 1 SEQ without stages 2 0xF 0xF %rsp valc register file PC+9 PC + Instr. Mem. instr. length ra rb %rsp 0xF 0xF %rsp srca srcb dstm dste R[srcA] R[srcB] next R[dstM] next R[dstE] 0 8 ALU alua
More informationCISC 360 Instruction Set Architecture
CISC 360 Instruction Set Architecture Michela Taufer October 9, 2008 Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective, R. Bryant and D. O'Hallaron, Prentice Hall, 2003 Chapter
More informationReading assignment. Chapter 3.1, 3.2 Chapter 4.1, 4.3
Reading assignment Chapter 3.1, 3.2 Chapter 4.1, 4.3 1 Outline Introduc5on to assembly programing Introduc5on to Y86 Y86 instruc5ons, encoding and execu5on 2 Assembly The CPU uses machine language to perform
More informationInstruction Set Architecture
CISC 360 Instruction Set Architecture Michela Taufer October 9, 2008 Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective, R. Bryant and D. O'Hallaron, Prentice Hall, 2003 Chapter
More informationProcessor Architecture I. Alexandre David
Processor Architecture I Alexandre David Overview n Introduction: from transistors to gates. n and from gates to circuits. n 4.1 n 4.2 n Micro+macro code. 12-04-2011 CART - Aalborg University 2 Evolution
More informationcmovxx ra, rb 2 fn ra rb irmovl V, rb rb V rmmovl ra, D(rB) 4 0 ra rb D mrmovl D(rB), ra 5 0 ra rb D OPl ra, rb 6 fn ra rb jxx Dest 7 fn Dest
Instruction Set Architecture Instruction Set Architecture CSci 2021: Machine Architecture and Organization Lecture #16, February 25th, 2015 Your instructor: Stephen McCamant Based on slides originally
More informationY86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View
Computer Architecture I Instruction Set Architecture Assembly Language View Processor state Registers, memory, Instructions addl, movl, andl, How instructions are encoded as bytes Layer of Abstraction
More informationChapter 4! Processor Architecture!
Chapter 4! Processor Architecture!! Y86 Instruction Set Architecture! Instructor: Dr. Hyunyoung Lee! Texas A&M University! Based on slides provided by Randal E. Bryant, CMU Why Learn Processor Design?!
More informationData Hazard vs. Control Hazard. CS429: Computer Organization and Architecture. How Do We Fix the Pipeline? Possibilities: How Do We Fix the Pipeline?
Data Hazard vs. Control Hazard CS429: Computer Organization and Architecture There are two types of hazards that interfere with flow through a pipeline. Dr. Bill Young Department of Computer Science University
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: April 4, 2018 at 07:36 CS429 Slideset 16: 1 Data Hazard vs. Control
More informationProcessor Architecture
Processor Architecture God created the integers, all else is the work of man Leopold Kronecker (He believed in the reduction of all mathematics to arguments involving only the integers and a finite number
More informationCPSC 121: Models of Computation. Module 10: A Working Computer
CPSC 121: Models of Computation Module 10: A Working Computer Module 10: A Working Computer???build a computer that is?able to 1. How can we?execute a user-defined program? We are finally able to answer
More informationLecture 3 CIS 341: COMPILERS
Lecture 3 CIS 341: COMPILERS HW01: Hellocaml! Announcements is due tomorrow tonight at 11:59:59pm. HW02: X86lite Will be available soon look for an announcement on Piazza Pair-programming project Simulator
More informationCISC 360 Midterm Exam - Review Alignment
CISC 360 Midterm Exam - Review Alignment Michela Taufer October 14, 2008 Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective, R. Bryant and D. O'Hallaron, Prentice Hall, CS:APP 2003
More informationCS 3330 Exam 2 Fall 2017 Computing ID:
S 3330 Fall 2017 Exam 2 Variant page 1 of 8 Email I: S 3330 Exam 2 Fall 2017 Name: omputing I: Letters go in the boxes unless otherwise specified (e.g., for 8 write not 8 ). Write Letters clearly: if we
More informationCS 3330 Exam 3 Fall 2017 Computing ID:
S 3330 Fall 2017 Exam 3 Variant E page 1 of 16 Email I: S 3330 Exam 3 Fall 2017 Name: omputing I: Letters go in the boxes unless otherwise specified (e.g., for 8 write not 8 ). Write Letters clearly: if
More informationCPSC 121: Models of Computation. Module 10: A Working Computer
CPSC 121: Models of Computation The 10th online quiz is due Tuesday, March 26 th at 17:00. Assigned reading for the quiz: Epp, 4th edition: 6.1, 7.1 Epp, 3rd edition: 5.1, 6.1 Rosen, 6th edition: 2.1,
More informationMachine Language CS 3330 Samira Khan
Machine Language CS 3330 Samira Khan University of Virginia Feb 2, 2017 AGENDA Logistics Review of Abstractions Machine Language 2 Logistics Feedback Not clear Hard to hear Use microphone Good feedback
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Warren Hunt, Jr. and Bill Young Department of Computer Sciences University of Texas at Austin Last updated: October 1, 2014 at 12:03 CS429 Slideset 6: 1 Topics
More informationRecitation #6. Architecture Lab
18-600 Recitation #6 Architecture Lab Overview Where we are Archlab Intro Optimizing a Y86 program Intro to HCL Optimizing Processor Design Where we Are Archlab Intro Part A: write and simulate Y86-64
More informationlayers of abstraction
Exam Review 1 layers of abstraction 2 x += y Higher-level language: C add %rbx, %rax 60 03 SIXTEEN Assembly: X86-64 Machine code: Y86 Hardware Design Language: HCLRS Gates / Transistors / Wires / Registers
More informationReview. Computer Science 104 Machine Organization & Programming
Review Computer Science 104 Machine Organization & Programming Administrative 104/16 Processor Due today Final Sunday Dec 18, 2-5pm in D106 Today Ø high-level review Ø Q&A Alex will review sometime, watch
More informationminor HCL update Pipe Hazards (finish) / Caching post-quiz Q1 B post-quiz Q1 D new version of hclrs.tar:
minor HCL update Pipe Hazards (finish) / Caching new version of hclrs.tar: removes poptest.yo from list of tests for seqhw, pipehw2 gives error if you try to assign to register output: register xy { foo
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: February 26, 2018 at 12:02 CS429 Slideset 8: 1 Controlling Program
More informationCPSC 313, Winter 2016 Term 1 Sample Date: October 2016; Instructor: Mike Feeley
CPSC 313, Winter 2016 Term 1 Sample Date: October 2016; Instructor: Mike Feeley NOTE: This sample contains all of the question from the two CPSC 313 midterms for Summer 2015. This term s midterm will be
More informationTerm-Level Verification of a Pipelined CISC Microprocessor
Term-Level Verification of a Pipelined CISC Microprocessor Randal E. Bryant December, 2005 CMU-CS-05-195 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 This work was supported
More informationMachine-Level Programming (2)
Machine-Level Programming (2) Yanqiao ZHU Introduction to Computer Systems Project Future (Fall 2017) Google Camp, Tongji University Outline Control Condition Codes Conditional Branches and Conditional
More informationCPSC 313, Summer 2013 Final Exam Date: June 24, 2013; Instructor: Mike Feeley
CPSC 313, Summer 2013 Final Exam Date: June 24, 2013; Instructor: Mike Feeley This is a closed-book exam. No outside notes. Electronic calculators are permitted. You may remove the last two pages of the
More informationCS 261 Fall Machine and Assembly Code. Data Movement and Arithmetic. Mike Lam, Professor
CS 261 Fall 2018 0000000100000f50 55 48 89 e5 48 83 ec 10 48 8d 3d 3b 00 00 00 c7 0000000100000f60 45 fc 00 00 00 00 b0 00 e8 0d 00 00 00 31 c9 89 0000000100000f70 45 f8 89 c8 48 83 c4 10 5d c3 Mike Lam,
More informationSystems I. Pipelining II. Topics Pipelining hardware: registers and feedback paths Difficulties with pipelines: hazards Method of mitigating hazards
Systems I Pipelining II Topics Pipelining hardware: registers and feedback paths ifficulties with pipelines: hazards Method of mitigating hazards Adding Pipeline Registers val, valm _icode, _valm rite
More informationF28HS Hardware-Software Interface: Systems Programming
F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has
More informationCS:APP Chapter 4 Computer Architecture Pipelined Implementation
CS:APP Chapter 4 Computer Architecture Pipelined Implementation Part II Randal. Bryant Carnegie ellon University http://csapp.cs.cmu.edu CS:APP Overview ata Hazards ake the pipelined processor work! Instruction
More informationCS 3330 Exam 2 Spring 2017 Name: EXAM KEY Computing ID: KEY
S 3330 Spring 2017 Exam 2 Variant E page 1 of 6 Email I: KEY S 3330 Exam 2 Spring 2017 Name: EXM KEY omputing I: KEY Letters go in the boxes unless otherwise specified (e.g., for 8 write not 8 ). Write
More informationMidterm Exam CSC February 2009
Midterm Exam CSC 252 26 February 2009 Directions; PLEASE READ This exam has 7 questions, all of which have subparts. Each question indicates its point value. The total is 90 points. Questions 3(d) and
More information12.1. CS356 Unit 12. Processor Hardware Organization Pipelining
12.1 CS356 Unit 12 Processor Hardware Organization Pipelining BASIC HW 12.2 Inputs Outputs 12.3 Logic Circuits Combinational logic Performs a specific function (mapping of 2 n input combinations to desired
More informationMachine Program: Procedure. Zhaoguo Wang
Machine Program: Procedure Zhaoguo Wang Requirements of procedure calls? P() { y = Q(x); y++; 1. Passing control int Q(int i) { int t, z; return z; Requirements of procedure calls? P() { y = Q(x); y++;
More informationx86-64 Programming II
x86-64 Programming II CSE 351 Winter 2018 Instructor: Mark Wyse Teaching Assistants: Kevin Bi Parker DeWilde Emily Furst Sarah House Waylon Huang Vinny Palaniappan http://xkcd.com/409/ Administrative Homework
More informationMichela Taufer CS:APP
ichela Taufer CS:APP Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective, R. Bryant and. O'Hallaron, Prentice Hall, 2003 Overview 2 CISC 360 Faʼ08 Pipeline Stages W_icode, W_val W
More informationMachine Level Programming: Control
Machine Level Programming: Control Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3 Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU) Mohamed
More informationAssembly Programming III
Assembly Programming III CSE 410 Winter 2017 Instructor: Justin Hsia Teaching Assistants: Kathryn Chan, Kevin Bi, Ryan Wong, Waylon Huang, Xinyu Sui Facebook Stories puts a Snapchat clone above the News
More informationMachine-Level Programming II: Control
Machine-Level Programming II: Control CSE 238/2038/2138: Systems Programming Instructor: Fatma CORUT ERGİN Slides adapted from Bryant & O Hallaron s slides 1 Today Control: Condition codes Conditional
More informationMachine-Level Programming II: Control
Mellon Machine-Level Programming II: Control CS140 Computer Organization and Assembly Slides Courtesy of: Randal E. Bryant and David R. O Hallaron 1 First https://www.youtube.com/watch?v=ivuu8jobb1q 2
More information