Computer Design Concept
Tadao Nakamura
nakamura@archi.is.tohoku.ac.jp
nakamura@umunhum.stanford.edu
Pascal's Calculator
Leibniz's Calculator
Babbage's Calculator
Von Neumann Computer
Flynn's Classification of Computer Architecture
Microprocessor Design Process
Figure: Design Stems of Mechanical Engineering. Stages: Requirement, Clarification of the Requirement, Specification, Conceptual Design, Concept. Feedback paths: Information to Adapt the Specification; Upgrade and Improve.
Figure: Moore's Law. Number of transistors (K, logarithmic scale from 1 to 1,000,000) versus time (1971 to 2001). Processors plotted: 4004, 8085, 8086, i386, Pentium, Pentium Pro, Pentium II, Pentium III. Annotation: 4.3 billion transistors in 2014.
Figure: Structure of Intel's Microprocessor 8085. Blocks: 8-bit internal data bus; Accumulator A; Status Register SR; ALU; registers B, C, D, E, H, L; Stack Pointer SP; Program Counter PC; Instruction Register IR; Clock Generator; Control Circuits (internal control lines, external control); Serial I/O Port; Data Register DR; Address Buffer. External lines: Address/Data AD0-AD7; Address AD8-AD15.
Figure 10: Simple Model of von Neumann Computers
Figure 11: Distribution of Processors in Cycles per Instruction. Axes: cycles per instruction (CPI, 0.1 to 20.0) versus clock frequency (5 to 1000 MHz). Regions: Scalar CISC, Scalar RISC, Superscalar RISC, VLIW, Superpipeline.
Figure 12: Pipeline Execution in a Superscalar Processor. Instructions 1 to 9 over clock cycles 1 to 6. Stages: Instruction Fetch, Decode & Operand Fetch (DE), Execution, Write Back.
Figure 13: Pipeline Execution in a VLIW Processor. Instructions 1 to 9 over clock cycles 1 to 6. Stages: Instruction Fetch, Decode & Operand Fetch (DE), Execution, Write Back.
Figure 14: Pipeline Execution in a Superpipeline Processor. Instructions 1 to 6 over clock cycles 1 to 6. Stages: Instruction Fetch, Decode & Operand Fetch (DE), Execution, Write Back.
Figure 15: Parallel Computers. (a) Multiprocessor System: several CPUs, a Switch, and one Main Memory. (b) Multicomputer System: several CPUs, a Switch, and one Main Memory per CPU.
Figure 16: Floating Point Arithmetic Processing.
(a) Floating Point Arithmetic Pipeline: operands X = x1E+x2 and Y = y1E-y2 pass through the stages Compare exponents, Shift, Add, Normalize to give Z = X + Y.
(b) Pipeline Processing: pairs (X1, Y1), (X2, Y2), (X3, Y3) give Z1, Z2, Z3 at 1 clock / 1 result (one processor).
(c) Array Processing: pairs (X1, Y1), (X2, Y2), (X3, Y3) give Z1, Z2, Z3 at 4 clocks / 3 results (3 processors).
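The four stages named in the floating point pipeline figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: operands are modeled as hypothetical (mantissa, exponent) pairs in radix 10, exact fractions stand in for hardware mantissa registers, and the function names are invented for this sketch. The helper `pipeline_clocks` uses the standard result that a k-stage pipeline needs k + n - 1 clocks for n results, which approaches the figure's "1 clock / 1 result" once the pipeline is full.

```python
from fractions import Fraction

BASE = 10  # assumed radix; real hardware would normally use 2


def compare_exponents(x, y):
    """Stage 1: put the operand with the larger exponent first."""
    if x[1] < y[1]:
        x, y = y, x
    return x, y, x[1] - y[1]


def shift(x, y, diff):
    """Stage 2: align the smaller operand by shifting its mantissa right."""
    return Fraction(x[0]), Fraction(y[0]) / Fraction(BASE) ** diff, x[1]


def add(mx, my, e):
    """Stage 3: add the aligned mantissas."""
    return mx + my, e


def normalize(m, e):
    """Stage 4: bring the mantissa back into the range 1 <= |m| < BASE."""
    if m == 0:
        return Fraction(0), 0
    while abs(m) >= BASE:
        m, e = m / BASE, e + 1
    while abs(m) < 1:
        m, e = m * BASE, e - 1
    return m, e


def fp_add(x, y):
    """Z = X + Y, passing the operands through the four stages in order."""
    return normalize(*add(*shift(*compare_exponents(x, y))))


def pipeline_clocks(n_results, stages=4):
    """Clocks needed by a pipeline with `stages` stages to deliver n results."""
    return stages + n_results - 1


# 9.5 x 10^2 + 7.5 x 10^1 = 1.025 x 10^3
z = fp_add((Fraction(19, 2), 2), (Fraction(15, 2), 1))
```

Here `z` is the pair (41/40, 3), that is, 1.025 x 10^3.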
Figure 17: Some Duality of SIMD and MISD. (a) Scheme of SIMD. (b) Scheme of MISD.
Figure 18: Comparison of Vector and Parallel Computers. (a) Vector Computer: CPU, Memory, Vector Register, Arithmetic Pipeline. (b) Parallel Computer: Memory, Switch, CPU, Cache, Local Memory, Register (File), ALU.
Figure 19: Scalar and Vector Processing in Applications
Figure: Paradigm for Application-Driven Parallel Processing. Elements: Novel Programming Language; Support Software; Parallel Applications and Algorithms; Parallel Architecture(s).
Relations among algorithm, computation model and architecture
More General Relations among algorithm, computation model and architecture
Figure: Design Flow of Microprocessors. Elements: Specification Domain; Architecture Domain; Conceptual Design; Design Concept = Computer Architecture; Software Design & Production Domain; Operating System; Compiler; High-Level Language; Assembler & Assembly Language; Machine Instructions; Semiconductor-Physical Design of Circuit with Devices; Chip Implementation; CHIP.
Hierarchy of Computer Architecture
Figure 1: Structure of a Simple CPU. Blocks: Main Memory, ALU, DR, AC, AR, PC, IR, Control Unit. Control signals: c0 (ADD), c1 (READ), c2 (WRITE), c3 to c10, all issued by the Control Unit.
Figure 2: Control Signals of the Simple CPU

Control signal   Microoperation
c0               AC ← AC + DR
c1               DR ← M(AR) (READ M)
c2               M(AR) ← DR (WRITE M)
c3               RIGHT-SHIFT AC
c4               DR ← AC
c5               AC ← DR
c6               AR ← DR(ADR)
c7               PC ← DR(ADR)
c8               IR ← DR(OP)
c9               PC ← PC + 1
c10              AR ← PC
Figure 3: Operation of a Three-Instruction CPU

Begin. CPU active? If no, End. If yes:
Fetch cycle: AR ← PC; READ M; PC ← PC + 1, IR ← DR(OP); Decode OP.
Execute cycle:
  LOAD: AR ← DR(ADR); READ M; AC ← DR
  ADD: AR ← DR(ADR); READ M; AC ← AC + DR
  JUMP: PC ← DR(ADR)

AC = Accumulator
AR = Memory address register
DR = Memory data register
DR(OP) = Opcode field of DR
DR(ADR) = Address field of DR
IR = Instruction register
M = Main memory
PC = Program counter
Figure: control memory organization. Blocks: External Address Source, Control Memory Address Register, Control Memory, 1-to-8 Decoder (inputs S, a2, a1, a0), External Condition. Control memory word: Address Field and Control Signals c0 to c8.
Figure 5: Examples of Microprograms

Microprogram 1  FETCH:  AR ← PC; READ M; PC ← PC + 1, IR ← DR(OP); go to IR;
Microprogram 2  LOAD:   AR ← DR(ADR); READ M; AC ← DR; go to FETCH;
Microprogram 3  ADD:    AR ← DR(ADR); READ M; AC ← AC + DR; go to FETCH;
Microprogram 4  JUMP:   PC ← DR(ADR); go to FETCH;
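The four microprograms can be traced with a small Python sketch. It is a minimal illustration, not the slides' implementation: the opcode encoding, the representation of an instruction word as an (opcode, address) tuple, and the `cycles` argument standing in for the "CPU active?" test are all assumptions made for this example. The comments name the control signal that each microoperation corresponds to in the control signal list.

```python
LOAD, ADD, JUMP = 0, 1, 2  # hypothetical opcode encoding


def read_m(s):
    s["DR"] = s["M"][s["AR"]]      # c1: DR <- M(AR)


def fetch(s):
    s["AR"] = s["PC"]              # c10: AR <- PC
    read_m(s)                      # READ M
    s["PC"] += 1                   # c9: PC <- PC + 1
    s["IR"] = s["DR"][0]           # c8: IR <- DR(OP)


def load(s):
    s["AR"] = s["DR"][1]           # c6: AR <- DR(ADR)
    read_m(s)                      # READ M
    s["AC"] = s["DR"]              # c5: AC <- DR


def add(s):
    s["AR"] = s["DR"][1]           # c6: AR <- DR(ADR)
    read_m(s)                      # READ M
    s["AC"] = s["AC"] + s["DR"]    # c0: AC <- AC + DR


def jump(s):
    s["PC"] = s["DR"][1]           # c7: PC <- DR(ADR)


MICROPROGRAMS = {LOAD: load, ADD: add, JUMP: jump}


def run(memory, cycles):
    """Run `cycles` fetch/execute cycles and return (AC, PC)."""
    s = {"M": memory, "PC": 0, "AC": 0, "AR": 0, "DR": 0, "IR": None}
    for _ in range(cycles):
        fetch(s)
        MICROPROGRAMS[s["IR"]](s)  # "go to IR", then back to FETCH
    return s["AC"], s["PC"]


# Instruction words are (opcode, address) tuples; data words are integers.
program = [(LOAD, 4), (ADD, 5), (ADD, 5), (JUMP, 1), 10, 3]
```

With this memory image, `run(program, 3)` leaves AC = 16 and PC = 3, and two more cycles (the JUMP back to address 1 and one further ADD) give AC = 19 and PC = 2.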
Figure 6: Microprogrammed Control Unit. Blocks: Multiplexer, External Conditions, External Address, PC, Control Memory CM, Microinstruction Register IR, Decoder, Control Signals.
Figure 7: Microinstruction Format. Fields: Condition Select, Branch Address, Control Fields for Control Signals.
Figure 8: Microprogram and Nanoprogram followed. Microprogram Control Unit (input from Instruction Register IR): Control Memory, Microinstruction Register. Nanoprogram Control Unit: npc, Control Memory ncm, Nanoinstruction Register nir, Control Signals.
Figure 9: CISC Instruction Pipeline. Stages: Instruction Fetch, Instruction Decode (ID), Operand Fetch (OF), Execution (Ex).
Figure 10: Datapath Timing. (a) Conventional Datapath: Clock, Register File, Mux B, Function Unit, Mux D. (b) Pipelined Datapath: the same blocks divided into clocked stages, including the OF Stage.
One Datapath Cycle (shown for Clock Cycle 1 and Clock Cycle 2, each divided into phases w, x, y, z)
w: The control signals are set up.
x: The registers are loaded onto the input buses.
y: The ALU operates.
z: The results go back to the registers through the output bus.
Figure 12: Pipeline Execution for Microoperation Sequence. Microoperations 1 to 7 over clock cycles 1 to 9; the OF stage is marked for each microoperation.
Figure 13: Control Unit and Datapath Domains in Pipelining. Instructions 1 to 6 over clock cycles 1 to 9. Stages: Instruction Fetch, Decode and Operand Fetch (DE), Execution, Write Back. Regions: Control Unit Domain, Datapath Domain.
Figure: RISC Instruction Pipeline. Stages: Instruction Fetch, Instruction Decode (ID), Execution (Ex), Memory Read / Write (MEM), Write Back.
Figure: pipelined processor with Control and Datapath sections. Blocks: PC, Instruction memory, IR, Instruction decoder, Register file, Zero fill, MUX, Function unit, Data memory (Data and Address inputs). Stage labels include the DOF stage.
Figure: Process of How a Pipeline Works. Repeated snapshots of a datapath (Reg, buses A, B, C, ALU) as instructions 1 to 4 advance through it.
Figure: Model of a Superscalar Processor
Figure 18: General Model of Superscalar Processors. Blocks: Program, Instruction Fetch & Branch Prediction, Instruction Dispatch, Instruction Issue, Instruction Execution, Instruction Reorder & Commit. Instruction Dispatch through Instruction Reorder & Commit form the Window of Execution.
Figure 19: Data Dependency

C = A + B        C = A + B
E = C + D        E = C + D
D = F + G        J = F + G
D = H + I        K = H + I
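The two statement sequences in the data dependency figure can be compared with a short Python sketch. It is an illustration only: the labels RAW, WAR, and WAW are the usual textbook names for true, anti, and output dependencies and do not appear on the slide, the (destination, sources) encoding is hypothetical, and the pairwise check is a simplification that does not account for intervening writes.

```python
def dependencies(stmts):
    """Return (kind, i, j) for every dependency of statement j on an earlier i.

    Each statement is a (destination, sources) pair.
    """
    found = []
    for j, (dest_j, srcs_j) in enumerate(stmts):
        for i in range(j):
            dest_i, srcs_i = stmts[i]
            if dest_i in srcs_j:
                found.append(("RAW", i, j))   # j reads what i wrote
            if dest_j in srcs_i:
                found.append(("WAR", i, j))   # j overwrites what i read
            if dest_i == dest_j:
                found.append(("WAW", i, j))   # j overwrites what i wrote
    return found


# First sequence in the figure: D is read once and written twice.
first = [("C", ("A", "B")), ("E", ("C", "D")),
         ("D", ("F", "G")), ("D", ("H", "I"))]

# Second sequence: the last two results use the new names J and K.
second = [("C", ("A", "B")), ("E", ("C", "D")),
          ("J", ("F", "G")), ("K", ("H", "I"))]
```

Under these assumptions, `dependencies(first)` reports one RAW, two WAR, and one WAW dependency, while `dependencies(second)` reports only the RAW dependency of E = C + D on C = A + B.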
Figure: A Data Hazard Problem. Each instruction is drawn with its pipeline stages.

MOV R1, R5        R1 <= R5
ADD R2, R1, R6    R2 <= R1 + R6
ADD R3, R1, R2    R3 <= R1 + R2
Figure: Program-Based and Hardware Solutions

MOV R1, R5
[NOP]
ADD R2, R1, R6
[NOP]
ADD R3, R1, R2

[NOP]: NOPs in case of software solution; bubbles in case of hardware solution.
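The software solution can be sketched as a small pass that pads a program with NOPs. This is a hedged illustration rather than the method on the slide: the (opcode, destination, sources) encoding and the `distance` parameter are assumptions, with `distance=2` chosen because the figure places exactly one NOP between a producer and its consumer.

```python
NOP = ("NOP", None, ())


def insert_nops(program, distance=2):
    """Pad with NOPs so every reader sits at least `distance` slots after
    the latest writer of each register it reads.

    Each instruction is a hypothetical (opcode, destination, sources) tuple.
    """
    out = []
    last_write = {}  # register name -> slot of its most recent writer
    for op, dest, srcs in program:
        # Earliest slot at which all source registers are available.
        earliest = max(
            (last_write[r] + distance for r in srcs if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append(NOP)
        out.append((op, dest, srcs))
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("MOV", "R1", ("R5",)),       # R1 <= R5
    ("ADD", "R2", ("R1", "R6")),  # R2 <= R1 + R6
    ("ADD", "R3", ("R1", "R2")),  # R3 <= R1 + R2
]
padded = [instr[0] for instr in insert_nops(program)]
```

For this program `padded` is MOV, NOP, ADD, NOP, ADD, the same order as in the figure. A hardware solution would insert bubbles at the same positions instead of changing the program.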
Figure: Control Hazard and Its Solution

1  BZ R1, 18
2  MOV R2, R3    [NOP]
3  MOV R1, R2    [NOP]
4  MOV R5, R6

[NOP]: NOPs in case of software solution; bubbles in case of hardware solution.
Figure: In-Order Issue with In-Order Completion

Cycle   Decode    Execute         Write
1       I1  I2
2       I3  I4    I1  I2
3       I3  I4    I1
4           I4            I3      I1  I2
5       I5  I6            I4
6           I6        I5          I3  I4
7                     I6
8                                 I5  I6
Figure: In-Order Issue with Out-of-Order Completion

Cycle   Decode    Execute         Write
1       I1  I2
2       I3  I4    I1  I2
3           I4    I1      I3      I2
4       I5  I6            I4      I1  I3
5           I6        I5          I4
6                     I6          I5
7                                 I6
Figure: In-Order Issue, Out-of-Order Completion Structure. Instructions I1 to I4 pass through ID and the Instruction Issue Network to units 1, 2, 3, ..., n.
Figure: Out-of-Order Issue with Out-of-Order Completion

Cycle   Decode    Window        Execute         Write
1       I1  I2
2       I3  I4    I1, I2        I1  I2
3       I5  I6    I3, I4        I1      I3      I2
4                 I4, I5, I6        I6  I4      I1  I3
5                 I5                I5          I4  I6
6                                               I5