Processor Architecture I. Alexandre David

Similar documents
Computer Organization Chapter 4. Prof. Qi Tian Fall 2013

DNA Microcode and Processor Modes. Alexandre David

CISC: Stack-intensive procedure linkage. [Early] RISC: Register-intensive procedure linkage.

Instruction Set Architecture

Chapter 4 Processor Architecture: Y86 (Sections 4.1 & 4.3) with material from Dr. Bin Ren, College of William & Mary

Computer Science 104:! Y86 & Single Cycle Processor Design!

Reading assignment. Chapter 3.1, 3.2 Chapter 4.1, 4.3

Computer Science 104:! Y86 & Single Cycle Processor Design!

Sequential Implementation

CPSC 121: Models of Computation. Module 10: A Working Computer

cmovxx ra, rb 2 fn ra rb irmovl V, rb rb V rmmovl ra, D(rB) 4 0 ra rb D mrmovl D(rB), ra 5 0 ra rb D OPl ra, rb 6 fn ra rb jxx Dest 7 fn Dest

CPSC 121: Models of Computation. Module 10: A Working Computer

CS429: Computer Organization and Architecture

CPSC 313, Summer 2013 Final Exam Date: June 24, 2013; Instructor: Mike Feeley

Chapter 4! Processor Architecture!

Processor Architecture

Background: Sequential processor design. Computer Architecture and Systems Programming , Herbstsemester 2013 Timothy Roscoe

Instruction Set Architecture

Where Have We Been? Logic Design in HCL. Assembly Language Instruction Set Architecture (Y86) Finite State Machines

Instruction Set Architecture

Computer Organization: A Programmer's Perspective

CS:APP Chapter 4! Computer Architecture! Sequential! Implementation!

Y86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View

CISC 360 Instruction Set Architecture

Instruction Set Architecture

CPSC 313, Winter 2016 Term 1 Sample Date: October 2016; Instructor: Mike Feeley

CS:APP Chapter 4 Computer Architecture Sequential Implementation

Giving credit where credit is due

Processor Architecture II! Sequential! Implementation!

Intel x86-64 and Y86-64 Instruction Set Architecture

Computer Science 104:! Y86 & Single Cycle Processor Design!

Chapter 4! Processor Architecture!!

CS:APP Chapter 4 Computer Architecture Sequential Implementation

Computer Architecture I: Outline and Instruction Set Architecture. CENG331 - Computer Organization. Murat Manguoglu

Giving credit where credit is due

Y86 Instruction Set. SEQ Hardware Structure. SEQ Stages. Sequential Implementation. Lecture 8 Computer Architecture III.

Second Part of the Course

CSC 252: Computer Organization Spring 2018: Lecture 11

cmovxx ra, rb 2 fn ra rb irmovq V, rb 3 0 F rb V rmmovq ra, D(rB) 4 0 ra rb mrmovq D(rB), ra 5 0 ra rb OPq ra, rb 6 fn ra rb jxx Dest 7 fn Dest

CISC Fall 2009

Computer Science 104:! Y86 & Single Cycle Processor Design!

CPSC 313, Summer 2013 Final Exam Solution Date: June 24, 2013; Instructor: Mike Feeley

CSE2421 FINAL EXAM SPRING Name KEY. Instructions: Signature

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College February 9, 2016

CS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College September 25, 2018

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

Process Layout and Function Calls

Systems I. Datapath Design II. Topics Control flow instructions Hardware for sequential machine (SEQ)

Assembly Language: Function Calls

cs281: Computer Systems CPUlab ALU and Datapath Assigned: Oct. 30, Due: Nov. 8 at 11:59 pm

Assembly Language: Function Calls" Goals of this Lecture"

CSC 8400: Computer Systems. Machine-Level Representation of Programs

Credits to Randy Bryant & Dave O Hallaron

Instruc(on Set Architecture. Computer Architecture Instruc(on Set Architecture Y86. Y86-64 Processor State. Assembly Programmer s View

CS429: Computer Organization and Architecture

cs281: Introduction to Computer Systems CPUlab Datapath Assigned: Oct. 29, Due: Nov. 3

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

6.1 Combinational Circuits. George Boole ( ) Claude Shannon ( )

CSCI 2121 Computer Organization and Assembly Language PRACTICE QUESTION BANK

Topics of this Slideset. CS429: Computer Organization and Architecture. Instruction Set Architecture. Why Y86? Instruction Set Architecture

Machine-Level Programming II: Control and Arithmetic

CS 31: Intro to Systems ISAs and Assembly. Martin Gagné Swarthmore College February 7, 2017

God created the integers, all else is the work of man Leopold Kronecker

Towards the Hardware"

CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University

Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Compiler Construction D7011E

Process Layout, Function Calls, and the Heap

The Hardware/Software Interface CSE351 Spring 2013

CS:APP Chapter 4 Computer Architecture Logic Design Randal E. Bryant Carnegie Mellon University

CS429: Computer Organization and Architecture

Sample Exam I PAC II ANSWERS

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CS241 Computer Organization Spring Addresses & Pointers

Machine-Level Programming II: Arithmetic & Control /18-243: Introduction to Computer Systems 6th Lecture, 5 June 2012

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

CS:APP Chapter 4 Computer Architecture Wrap-Up Randal E. Bryant Carnegie Mellon University

CPS104 Recitation: Assembly Programming

ASSEMBLY II: CONTROL FLOW. Jo, Heeseung

CS61 Section Solutions 3

CS241 Computer Organization Spring 2015 IA

Foundations of Computer Systems

3 (5 marks) RISC instruction sets do not allow an ALU operation (e.g., addl) to read from memory. 2a Why is the memory stage after the execute stage?

Assembly Programmer s View Lecture 4A Machine-Level Programming I: Introduction

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

SOEN228, Winter Revision 1.2 Date: October 25,

Credits and Disclaimers

X86 Addressing Modes Chapter 3" Review: Instructions to Recognize"

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

Machine Language CS 3330 Samira Khan

Program Exploitation Intro

Machine-Level Programming II: Arithmetic & Control. Complete Memory Addressing Modes

Assembly level Programming. 198:211 Computer Architecture. (recall) Von Neumann Architecture. Simplified hardware view. Lecture 10 Fall 2012

CS429: Computer Organization and Architecture

How Software Executes

Sungkyunkwan University

Machine-level Representation of Programs. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Question 4.2 2: (Solution, p 5) Suppose that the HYMN CPU begins with the following in memory. addr data (translation) LOAD 11110

Transcription:

Processor Architecture I Alexandre David

Overview n Introduction: from transistors to gates. n and from gates to circuits. n 4.1 n 4.2 n Micro+macro code. 12-04-2011 CART - Aalborg University 2

Evolution of Computers n Early systems n CPU (central processing unit) controlled the entire system n Responsible for I/O, computations, n Modern computers n Decentralized architecture n Processors distributed (I/O) n CPU still controls other processors 12-04-2011 CART - Aalborg University 3

General Purpose CPU n Very complex because n designed for wide variety of tasks multiple roles n contains special purpose sub-units n ex: core i7 has 731M transistors n supports protection and privileges (OS/applic.) n supports priorities (I/O) n data size (32/64-bit registers) n high speed parallelism = replication 12-04-2011 CART - Aalborg University 4

CPU Visible State n Visible for ISA, used by compiler (& assembler) there may be other registers etc that depend on the CPU generation. n Registers (classify them), condition codes, status & memory. n Memory=array of bytes (abstraction). RF: Program registers" %eax %ecx %edx %ebx %esi %edi %esp %ebp CC: Condition codes" ZF SF OF PC" Stat: Program Status" DMEM: Memory" 12-04-2011 CART - Aalborg University 5

ISA n Classify: n load/store n r/i/m operands n arithmetics n jumps n (cmov) n call/return n stack n Encoding: n opcode+ operands Byte! 0 1 2 3 4 5 halt 0 0 nop 1 0 rrmovl ra, rb" 2 0 ra" rb" irmovl V, rb" 3 0 F rb" V" rmmovl ra,d(rb) 4 0 ra" rb" D" mrmovl D(rB),rA" 5 0 ra" rb" D" OPl ra, rb" 6 fn" ra" rb" jxx Dest" 7 fn" Dest" cmovxx ra, rb" 2 fn" ra" rb" call Dest" 8 0 Dest" ret 9 0 pushl ra" A 0 ra" F popl ra" B 0 ra" F 12-04-2011 CART - Aalborg University 6

ISA Notes n No memory memory transfer. n No imm memory transfer (x86 can do it). n Restricted operators (add, sub, and, xor). n Only register register operands. n Typical of load/store architectures (also RISC). n Conditional jumps depend on combinations of flags. n Similar for conditional move. n Call, ret, pop, and push implicitly modify the stack (& stack pointer). 12-04-2011 CART - Aalborg University 7

Encoding PP4.1 PP4.2 n 1 byte encoding = code + function. n Unique combination for every instruction. n Register operands have a unique identifier. n eax:0, ecx:1, none:f n none important for design Operations! Branches! Moves! addl 6 0 jmp 7 0 jne 7 4 rrmovl 2 0 cmovne 2 4 subl 6 1 jle 7 1 jge 7 5 cmovle 2 1 cmovge 2 5 andl 6 2 jl 7 2 jg 7 6 cmovl 2 2 cmovg 2 6 xorl 6 3 je 7 3 cmove 2 3 12-04-2011 CART - Aalborg University 8

Status Code n State of the processor n AOK normal n HLT halted n ADR invalid address n INS invalid instruction 12-04-2011 CART - Aalborg University 9

Y86 X86 n Y86 simplified model for X86. n Code is similar, except for n move instructions n restrictions n may need more instructions n not important, we abstract from that. n Reason on Y86 level, exercise with both (simulator Y86, gcc for X86). 12-04-2011 CART - Aalborg University 10

Y86 Assembly PP4.3 n Instructions as described with registers. n Assembly directives. n Where to put the code (.pos). n Align the code (.align). n Declare data (.long). n X86 has more. n Label declarations (used for jump offsets). n Assembled into bytes. n Y86 interpret the bytes. 12-04-2011 CART - Aalborg University 11

Logic Design n How to implement the hardware to recognize the instruction codes. n Logic that n reads bytes, n interprets bytes (switch), n performs operations, n updates state. n Transistors gates functions & blocks. n Processor design at block level. 12-04-2011 CART - Aalborg University 12

Background n Voltage: difference of potentials. n Vcc ground (=0). n Volts (V) n Current: flow of electrons. n Amperes (A) n Ohm s law: U = RI n Dissipated power: P = UI = U 2 /R 12-04-2011 CART - Aalborg University 13

Typical Chips n Operate on low voltage (5V, less for processors) see power dissipation. n Always 2 lines n ground (0V) n power (5V) n Diagrams usually omit ground and power. 12-04-2011 CART - Aalborg University 14

Transistor collector Basic building block of digital circuits base emitter Acts like a switch. 12-04-2011 CART - Aalborg University 15

How Are They Made? pure silicon with perfect crystalline structure photo-lithography slice into wafers cut sand doping materials (N/P) metal (copper/gold) connect & package 12-04-2011 CART - Aalborg University 16

Boolean Algebra n Mathematical basis for digital circuits. n From boolean functions to gates. n Basic functions: and, or, not. n In practice, cheaper to have nand & nor. 12-04-2011 CART - Aalborg University 17

Example: Not 12-04-2011 CART - Aalborg University 18

Gates n Primitive boolean functions. n Level of abstraction on integrated circuits. Symbols used in circuits. 12-04-2011 CART - Aalborg University 19

Logic Gate Technology n Transistor-transistor technology (TTL) n connect directly gates together to form boolean functions and function 12-04-2011 CART - Aalborg University 20

Design of Functions n Find a boolean expression that does what you need n and feed it to a tool that optimizes it to minimize the number of gates. n Come up with the truth table of your function n which is converted to a boolean function. 12-04-2011 CART - Aalborg University 21

Truth Table 12-04-2011 CART - Aalborg University 22

Combinatorial Circuits n Outputs = function(inputs) n change outputs only when inputs changes n need states to perform sequences of operations without sustained inputs n maintain states n use a clock 12-04-2011 CART - Aalborg University 23

Practical Concerns n Power n consumption: how to feed n dissipation P=CFV 2 (C: capacitance, F: frequency) how not to burn n Timing gates need time to settle. n Clock synchronization. n Update upon rise or fall of clock signal. 12-04-2011 CART - Aalborg University 24

Clock Skew Signals need time to propagate. Local clocks are used on larger systems need to synchronize them. The speed of light is too slow. 12-04-2011 CART - Aalborg University 25

Logic Design & HCL n Design logic with gates but not one by one and not manually! n Use an adapted language for that. Here HCL (hardware control language) for educational purposes. n C-like language to express boolean formulas. n Combinatorial circuits built out of these formulas. n Acyclic network of gates: signal propagates from inputs to outputs boolean functions. 12-04-2011 CART - Aalborg University 26

Example PP4.8 n bool eq = (a && b) (!a &&!b); a " Bit equal" eq" b " 12-04-2011 CART - Aalborg University 27

Multiplexor PP4.9 n Function: choose an input signal depending on a selection signal. bool out = (s && a) (!s && b); n Select results, functions, etc s" Bit MUX" b " out" a " 12-04-2011 CART - Aalborg University 28

Word-Level Combinatorial Circuits n Operations defined at word level (~integers). n Treat groups of bits together. n Define functions at word-level. 12-04-2011 CART - Aalborg University 29

Example: Equality Test n bool Eq = (A == B); A). Bit-level implementation! B). Word-level abstraction! b 31" a Bit equal" eq 31" b 30" Bit equal" eq 30" a 30" Eq " B" A" =" A = B" b 1" a Bit equal" eq 1" b 0" Bit equal" eq 0" a 0" 12-04-2011 CART - Aalborg University 30

Multiplexor At Word-Level s" A). Bit-level implementation! B). Word-level abstraction! b 31" a 31" out 31" b 30" s" a 30" out 30" B" A" MUX" Out" int Out = [ s : A; 1 : B; ]; b 0" a 0" out 0" 12-04-2011 CART - Aalborg University 31

Use: Select PP4.10 n int Out4 = [!s1 &&!s0 : A; # 00!s1 : B; # 01!s0 : C; # 10 1 : D; # 11 ] n Simplified select. s1" s0" D" C" B" A" MUX4" Out4" 12-04-2011 CART - Aalborg University 32

ALU n 2 operand inputs + 1 control input. n Operands X and Y. n Control selects operation. n Same principle as select, we abstract from the exact design. 0" 1" 2" 3" Y" X" A" B" A" L" U" X + Y" Y" X" A" B" A" L" U" X - Y" Y" X" A" A" L" U" B" X & Y" Y" X" A" A" L" U" B" X ^ Y" 12-04-2011 CART - Aalborg University 33

Memory & Clocking n Memory stores states. n Functions only propagates signals. n Memory implemented as flip-flop-like circuits. Have feedback loops to keep bits. n Registers (hardware or program). n Clocks synchronize when to update. n Between updates, signals propagate. n Clock signal rises registers are updated. 12-04-2011 CART - Aalborg University 34

Storing 1 Bit Bistable Element q Q+!q Q q = 0 or 1 V1 1 0.9 0.8 V 2 0.7 0.6 V1 V1 V2 0.5 0.4 V in V1 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 12-04-2011 CART - Aalborg University 35 Vin

Storing 1 Bit (cont.) Bistable Element q Q+!q Q q = 0 or 1 Stable 1 1 0.9 V in = V 2 V 2 0.8 0.7 0.6 V1 Vin V2 0.5 V in V1 Stable 0 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Vin Metastable 12-04-2011 CART - Aalborg University 36

. Physical Analogy Stable 1 1 1 0.9 0.9 0.8 0.8 0.7 0.6 V1Vin V2 V2 0.5 0.4 0.3 0.3 Metastable 0.2 0.2 Stable 0 0.1 0.1 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Vin Vin Metastable Stable left Stable right 12-04-2011 CART - Aalborg University 37

Storing and Accessing 1 Bit Bistable Element R-S Latch q Q+ R Q+!q Q S Q q = 0 or 1 Resetting Setting Storing R 1 1 0 Q+ R 0 0 1 Q+ R 0!q q Q+ S 0 0 1 Qœ S 1 1 0 Qœ S 0 q!q Qœ 12-04-2011 CART - Aalborg University 38

1-Bit Latch D Data D Latch R Q+ C Clock S Q Latching Storing d D!d!d!d d R Q+ D d!d 0 R!q q Q+ 1 C S d d!d Qœ 0 C 0 S q!q Qœ 12-04-2011 CART - Aalborg University 39

Registers Structure i 7 D C Q+ o 7 i 6 D C Q+ o 6 i 5 D C Q+ o 5 i 4 i 3 D C D C Q+ Q+ o 4 o 3 I O i 2 i 1 D C D C Q+ Q+ o 2 o 1 Clock i 0 D C Q+ o 0 n Stores word of data n Clock Different from program registers seen in assembly code n Collection of edge-triggered latches n Loads input on rising edge of clock 12-04-2011 CART - Aalborg University 40

Clock Synchronization State = x" State = y" Input = y" Output = x" _ Rising" clock" _ x" y" Output = y" n Register operations are synchronized. n Stores data bits. n For most of time acts as barrier between input and output. n As clock rises, loads input. 12-04-2011 CART - Aalborg University 41

Register File n Set of program registers. n Local and fast access storage. n Small. n Fixed size (machine word). 12-04-2011 CART - Aalborg University 42

Memory n Abstract & simplified model. n Simple array of byte, no hierarchy. n We ll see later the hierarchy & virtual memory system. 12-04-2011 CART - Aalborg University 43

Micro/Macro-code

Complement n We ll focus on gate/logic design. n = simplified model. n Reality is more complex. n Design is at a higher level. n We ll see hints of correspondence to micro-code. 12-04-2011 CART - Aalborg University 45

Microcode n How to implement complex CPU? n Program the complex instructions. n Visible machine language = macro instruction set. n Internal language = micro-code. n Microcontroller inside CPUs that decode and execute macro-instructions. n RISC n Processors are all RISCs in the end. n Key: Easier to write programs with micro-code than to build hardware from scratch. 12-04-2011 CART - Aalborg University 46

Microcode 12-04-2011 CART - Aalborg University 47

Data and Register Sizes n Size of visible register may differ from size of internal registers. n Ex: Could implement 32-bit instruction set on a 16-bit microcontroller. 12-04-2011 CART - Aalborg University 48

Advantages/Drawbacks n Advantages n Can change microcode and keep the same macro-instruction set! n Less prone to errors, can be updated more easily. n Drawback n Cost in performance overhead. 12-04-2011 CART - Aalborg University 49

Vertical Microcode n Simple view of microcontroller ~ standard processor. n Execution of micro-code like assembly. n One micro-instruction at a time. n Access to different units. n Decode each macro-instruction and execute micro-code. n Easy to read/write, bad performance. n Not the case in practice. 12-04-2011 CART - Aalborg University 50

Horizontal Microcode n Use implicit parallelism. n Utilize units in parallel when possible. n Control data movements and the different hardware units at the same time. n Very difficult to program. n Long instruction: exec op1 unit1 exec op2 unit2 transfer this register there 12-04-2011 CART - Aalborg University 51

Example Architecture 12-04-2011 CART - Aalborg University 52

12-04-2011 CART - Aalborg University 53

Horizontal Microcode n Not like conventional programs. n Each instruction takes one cycle n but not all operations take one cycle n special care for timing, wait for units that need more cycles continue 12-04-2011 CART - Aalborg University 54

Intelligent Microcontroller n Schedules instructions & units. n Handles operations in parallel. n Performs branch prediction. n May try 2 paths and discard the results of the wrong one later. n Important: Keep the sequential semantics. n Out-of-order execution n use scoreboard to keep track of results and dependencies 12-04-2011 CART - Aalborg University 55

Conclusion n Does it matter? n Yes! Understand your hardware and its technology. n Use it in a better way. Reduce branches, or make them easy to guess. Ex: for(i =0, j = n-1; i < j; ++i,--j) swap(&a[i],&a[j]) harder than for(i = 0; i < n/2; ++i) swap(&a[i],&a[n-1-i]) 12-04-2011 CART - Aalborg University 56