Computer Design Concept, Tadao Nakamura


nakamura@archi.is.tohoku.ac.jp, nakamura@umunhum.stanford.edu

Pascal's Calculator

Leibniz's Calculator

Babbage's Calculator

Von Neumann Computer

Flynn's Classification of Computer Architecture

Microprocessor Design Process

Design steps, following the design steps of mechanical engineering: clarification of the requirement specification, conceptual design, and design concept. Information is fed back to adapt the specification requirement and to upgrade and improve the design.

Moore's Law: number of transistors (in thousands, log scale from 1 to 1,000,000) versus year (1971 to 2001) for the 4004, 8085, 8086, i386, Pentium, Pentium Pro, Pentium II, and Pentium III. A single chip reached 4.3 billion transistors in 2014.

Structure of Intel's Microprocessor 8085: an 8-bit internal data bus connects the accumulator A, the status register SR, the ALU, the general registers B, C, D, E, H, and L, the stack pointer SP, the program counter PC, the instruction register IR, the data register DR, and the address buffer. A clock generator and control circuits drive the internal and external control lines, and a serial I/O port handles serial I/O. Externally, AD0-AD7 carry the multiplexed low address byte and data, and A8-A15 carry the high address byte.

Simple Model of von Neumann Computers

Distribution of Processors in Cycles per Instruction: cycles per instruction (CPI, log scale from 0.1 to 20) plotted against clock frequency (5 to 1000 MHz), with regions for scalar CISC, scalar RISC, superscalar RISC, VLIW, and superpipelined processors.
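The two axes combine into the instruction rate, MIPS = f (MHz) / CPI, and into execution time = instruction count × CPI / f. The sketch below is minimal. Its three design points are hypothetical and are not read off the figure.

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Instruction rate in millions of instructions per second: f / CPI."""
    if clock_mhz <= 0 or cpi <= 0:
        raise ValueError("clock rate and CPI must be positive")
    return clock_mhz / cpi


def exec_time_us(instructions: int, cpi: float, clock_mhz: float) -> float:
    """Execution time in microseconds: instruction count x CPI / f."""
    return instructions * cpi / clock_mhz


# Hypothetical design points, chosen only to illustrate the trade-off.
designs = [
    ("scalar CISC", 50.0, 5.0),
    ("scalar RISC", 100.0, 1.25),
    ("superscalar RISC", 100.0, 0.5),
]
for name, f, cpi in designs:
    print(f"{name:17s} {mips(f, cpi):6.1f} MIPS")
```

Lowering CPI at a fixed clock raises the rate by the same factor as raising the clock at a fixed CPI. Superscalar and VLIW designs push CPI below 1 by completing several instructions per cycle.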

Pipeline Execution in a Superscalar Processor: nine instructions pass through four stages, three at a time, and complete in six clock cycles. The stages are IF (instruction fetch), DE (decode and operand fetch), EX (execution), and WB (write back).
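With no hazards, the cycle count in the figure follows from a simple formula: a k-stage pipeline that issues m instructions per cycle finishes n instructions in k + ceil(n/m) - 1 cycles. The sketch below implements that idealized model only; it is not a cycle-accurate simulator. The degree of 3 is inferred from the figure's nine instructions in six cycles.

```python
import math


def pipeline_cycles(n_instructions: int, stages: int, degree: int = 1) -> int:
    """Clock cycles for an ideal k-stage pipeline that issues `degree`
    instructions per cycle (degree 1 = scalar pipeline).

    The first group needs `stages` cycles; every later group completes one
    cycle after the group ahead of it. Hazards and resource conflicts are
    ignored.
    """
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / degree)
    return stages + groups - 1


print(pipeline_cycles(9, stages=4, degree=3))  # superscalar of degree 3: 6
print(pipeline_cycles(9, stages=4))            # scalar pipeline: 12
```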

Pipeline Execution in a VLIW Processor: each very long instruction word is fetched (IF) and decoded (DE) once. Each word carries three operations that execute (EX) and write back (WB) in parallel, so nine operations complete in six clock cycles.
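In a VLIW machine the compiler, not the hardware, finds the operations that can share one long instruction word. The sketch below packs them greedily. It assumes three slots per word and operations whose destinations are already unique, so only true dependencies constrain placement. The operation names are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, Tuple[str, ...]]  # (destination, source operands)


def pack_vliw(ops: List[Op], width: int = 3) -> List[List[Op]]:
    """Greedily pack operations, in program order, into long instruction words.

    Each operation goes into the earliest word that (a) follows every word
    holding a producer of one of its sources and (b) still has a free slot.
    Assumes every destination is written only once (no WAR/WAW hazards).
    """
    words: List[List[Op]] = []
    ready: Dict[str, int] = {}  # value name -> first word that may read it
    for dest, srcs in ops:
        slot = max((ready.get(s, 0) for s in srcs), default=0)
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1
        if slot == len(words):
            words.append([])
        words[slot].append((dest, srcs))
        ready[dest] = slot + 1
    return words


# Three independent chains a, b, c of three operations each (nine operations).
ops = [
    ("a1", ("x",)), ("b1", ("y",)), ("c1", ("z",)),
    ("a2", ("a1",)), ("b2", ("b1",)), ("c2", ("c1",)),
    ("a3", ("a2",)), ("b3", ("b2",)), ("c3", ("c2",)),
]
for i, word in enumerate(pack_vliw(ops)):
    print(i, [dest for dest, _ in word])
```

If the operations formed a single dependent chain, each word would carry one operation and two empty slots. That wasted slot space is the code-density cost of VLIW.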

Pipeline Execution in a Superpipelined Processor: six instructions pass through IF, DE, EX, and WB. A new instruction is issued every fraction of a clock cycle; clock cycles 1 to 6 are shown.
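In a superpipeline of degree n, each base clock cycle is split into n minor cycles, and a new instruction is issued every minor cycle. In the usual idealized model, N instructions on a k-stage pipeline then take k + (N - 1)/n base cycles. The figure does not state the degree, so the degrees below are assumptions.

```python
from fractions import Fraction


def superpipeline_time(n_instructions: int, stages: int, degree: int) -> Fraction:
    """Base clock cycles for an ideal superpipeline of the given degree.

    The first instruction still needs `stages` base cycles; each later one
    completes one minor cycle (1/degree of a base cycle) after its predecessor.
    """
    if n_instructions <= 0:
        return Fraction(0)
    return stages + Fraction(n_instructions - 1, degree)


for degree in (1, 2, 3):
    t = superpipeline_time(6, stages=4, degree=degree)
    print(f"degree {degree}: {t} base cycles ({float(t):.2f})")
```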

Parallel Computers: (a) a multiprocessor system, in which several CPUs share one main memory through a switch; (b) a multicomputer system, in which each CPU has its own main memory and the nodes are connected through a switch.

Floating Point Arithmetic Processing:
(a) Floating-point arithmetic pipeline for Z = X + Y, where X and Y are each given as a mantissa (x1, y1) and an exponent (x2, y2): compare exponents, shift, add, normalize.
(b) Pipeline processing: one processor delivers one result (Z1, Z2, Z3, ...) per clock once the pipeline is full.
(c) Array processing: three processors deliver three results every four clocks.
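The four stages of panel (a) can be sketched as four functions applied in sequence. In the pipeline each function is one stage, so while one operand pair is being normalized, the next pair is being added, and so on. For readability the sketch assumes decimal (mantissa, exponent) pairs with a 4-digit integer mantissa and truncation instead of rounding. Real floating-point units work in binary, with guard bits and rounding modes.

```python
from typing import Tuple

DIGITS = 4                               # assumed mantissa width (decimal digits)
LOW, HIGH = 10 ** (DIGITS - 1), 10 ** DIGITS - 1

Num = Tuple[int, int]                    # (mantissa m, exponent e): value = m * 10**e


def _sign(m: int) -> int:
    return -1 if m < 0 else 1


def compare_exponents(x: Num, y: Num):
    """Stage 1: find the larger exponent, which the result will use."""
    (mx, ex), (my, ey) = x, y
    return mx, ex, my, ey, max(ex, ey)


def shift(s1):
    """Stage 2: shift the mantissa with the smaller exponent right (truncating)."""
    mx, ex, my, ey, e = s1
    ax = _sign(mx) * (abs(mx) // 10 ** (e - ex))
    ay = _sign(my) * (abs(my) // 10 ** (e - ey))
    return ax, ay, e


def add(s2):
    """Stage 3: add the aligned mantissas."""
    ax, ay, e = s2
    return ax + ay, e


def normalize(s3) -> Num:
    """Stage 4: rescale so the mantissa again has exactly DIGITS digits."""
    m, e = s3
    if m == 0:
        return 0, 0
    sign, m = _sign(m), abs(m)
    while m > HIGH:          # carry out of the top digit: shift right
        m //= 10
        e += 1
    while m < LOW:           # leading zeros after cancellation: shift left
        m *= 10
        e -= 1
    return sign * m, e


def fp_add(x: Num, y: Num) -> Num:
    """Z = X + Y through the four stages in order."""
    return normalize(add(shift(compare_exponents(x, y))))


print(fp_add((9504, -1), (8200, -2)))   # 950.4 + 82.00 -> (1032, 0), i.e. 1032
```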

Some Duality of SIMD and MISD: (a) scheme of SIMD; (b) scheme of MISD.

Comparison of Vector and Parallel Computers: (a) a vector computer, with a CPU containing vector registers and an arithmetic pipeline connected to memory; (b) a parallel computer, whose processors each have a cache, local memory, a register file, and an ALU, connected through a switch.

Scalar and Vector Processing in Applications

Paradigm for Application-Driven Parallel Processing: parallel applications and algorithms, novel programming languages, support software, and parallel architecture(s).

Relations among algorithm, computation model and architecture

More General Relations among algorithm, computation model and architecture

Design Flow of Microprocessors: the specification domain leads to the architecture domain, where conceptual design yields the design concept (design concept = computer architecture). The flow also covers the software design and production domain (high-level language, compiler, operating system, assembler and assembly language, machine instructions) and semiconductor-physical design (design of circuits with devices, chip implementation, CHIP).

Hierarchy of Computer Architecture

Structure of a Simple CPU: the control unit issues control signals c0 to c10. These drive the ALU (c0, ADD), main memory (c1, READ; c2, WRITE), and the registers DR, AC, AR, PC, and IR.

Control Signals of the Simple CPU (control signal: microoperation):
c0: AC ← AC + DR
c1: DR ← M(AR) (READ M)
c2: M(AR) ← DR (WRITE M)
c3: RIGHT-SHIFT AC
c4: DR ← AC
c5: AC ← DR
c6: AR ← DR(ADR)
c7: PC ← DR(ADR)
c8: IR ← DR(OP)
c9: PC ← PC + 1
c10: AR ← PC

Operation of a Three-Instruction CPU (flowchart): Begin. While the CPU is active, repeat a fetch cycle followed by an execute cycle; otherwise End.
Fetch cycle: AR ← PC; READ M; PC ← PC + 1, IR ← DR(OP); decode OP.
Execute cycle:
LOAD: AR ← DR(ADR); READ M; AC ← DR.
ADD: AR ← DR(ADR); READ M; AC ← AC + DR.
JUMP: PC ← DR(ADR).
AC = accumulator, AR = memory address register, DR = memory data register, DR(OP) = opcode field of DR, DR(ADR) = address field of DR, IR = instruction register, M = main memory, PC = program counter.
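The flowchart can be executed directly. The sketch below is minimal and rests on several assumptions:
- a hypothetical word layout, with the opcode in bits 8 and up and the address in bits 0 to 7;
- hypothetical opcode values;
- unbounded Python integers for AC;
- a fixed instruction budget standing in for the "CPU active?" test.

```python
LOAD, ADD, JUMP = 0, 1, 2      # hypothetical opcode values


def run(memory, pc=0, max_instructions=10):
    """Run the three-instruction CPU; return the final (AC, PC).

    A word holds DR(OP) in bits 8 and up and DR(ADR) in bits 0-7 (assumed).
    """
    ac = 0
    for _ in range(max_instructions):        # "CPU active?"
        # Fetch cycle
        ar = pc                              # AR <- PC
        dr = memory[ar]                      # READ M: DR <- M(AR)
        pc = pc + 1                          # PC <- PC + 1
        ir = dr >> 8                         # IR <- DR(OP)
        adr = dr & 0xFF                      # DR(ADR) of the fetched word
        # Execute cycle, chosen by decoding OP
        if ir == LOAD:
            ar = adr                         # AR <- DR(ADR)
            dr = memory[ar]                  # READ M
            ac = dr                          # AC <- DR
        elif ir == ADD:
            ar = adr                         # AR <- DR(ADR)
            dr = memory[ar]                  # READ M
            ac = ac + dr                     # AC <- AC + DR
        elif ir == JUMP:
            pc = adr                         # PC <- DR(ADR)
        else:
            raise ValueError(f"undefined opcode {ir}")
    return ac, pc


# Hypothetical program: AC <- M(10), then loop forever adding M(11).
memory = [0] * 16
memory[0] = (LOAD << 8) | 10
memory[1] = (ADD << 8) | 11
memory[2] = (JUMP << 8) | 1
memory[10], memory[11] = 5, 3
print(run(memory, max_instructions=5))       # LOAD, ADD, JUMP, ADD, JUMP -> (11, 1)
```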

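Control memory with a decoder: a control memory address register is loaded either from an external address source or from the address field of the current word, depending on an external condition. It supplies address bits a2 a1 a0 to a 1-to-8 decoder that selects one word of the control memory. Each word provides control signals c0 to c8 and an address field.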

Examples of Microprograms:
Microprogram 1, FETCH: AR ← PC; READ M; PC ← PC + 1, IR ← DR(OP); go to IR;
Microprogram 2, LOAD: AR ← DR(ADR); READ M; AC ← DR; go to FETCH;
Microprogram 3, ADD: AR ← DR(ADR); READ M; AC ← AC + DR; go to FETCH;
Microprogram 4, JUMP: PC ← DR(ADR); go to FETCH;
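The four microprograms can be stored in a control memory and stepped by a microprogram counter. The minimal sketch below reuses the control-signal table: each control-memory word lists the signals it asserts plus a next-address rule. The control-memory addresses and the opcode encoding are assumptions. So is attaching each "go to" to the preceding microoperation. Signals c2 to c4 are omitted because these four microprograms never assert them.

```python
LOAD, ADD, JUMP = 0, 1, 2      # hypothetical opcodes; word = OP << 8 | ADR

# Microoperation for each control signal used here. Each reads the OLD register
# values and returns the registers it changes, so all signals asserted in one
# microinstruction take effect together, as in a register transfer.
MICROOPS = {
    "c0":  lambda s: {"AC": s["AC"] + s["DR"]},   # AC <- AC + DR
    "c1":  lambda s: {"DR": s["M"][s["AR"]]},     # DR <- M(AR) (READ M)
    "c5":  lambda s: {"AC": s["DR"]},             # AC <- DR
    "c6":  lambda s: {"AR": s["DR"] & 0xFF},      # AR <- DR(ADR)
    "c7":  lambda s: {"PC": s["DR"] & 0xFF},      # PC <- DR(ADR)
    "c8":  lambda s: {"IR": s["DR"] >> 8},        # IR <- DR(OP)
    "c9":  lambda s: {"PC": s["PC"] + 1},         # PC <- PC + 1
    "c10": lambda s: {"AR": s["PC"]},             # AR <- PC
}

FETCH = 0
# (asserted signals, next): None = next word, "IR" = go to IR, int = address.
CONTROL_MEMORY = [
    ({"c10"}, None),        # 0 FETCH: AR <- PC
    ({"c1"}, None),         # 1        READ M
    ({"c9", "c8"}, "IR"),   # 2        PC <- PC + 1, IR <- DR(OP); go to IR
    ({"c6"}, None),         # 3 LOAD:  AR <- DR(ADR)
    ({"c1"}, None),         # 4        READ M
    ({"c5"}, FETCH),        # 5        AC <- DR; go to FETCH
    ({"c6"}, None),         # 6 ADD:   AR <- DR(ADR)
    ({"c1"}, None),         # 7        READ M
    ({"c0"}, FETCH),        # 8        AC <- AC + DR; go to FETCH
    ({"c7"}, FETCH),        # 9 JUMP:  PC <- DR(ADR); go to FETCH
]
ROUTINE = {LOAD: 3, ADD: 6, JUMP: 9}   # "go to IR": opcode -> routine address


def run_micro(memory, steps):
    """Execute `steps` microinstructions starting at FETCH; return the registers."""
    s = {"AC": 0, "AR": 0, "DR": 0, "IR": 0, "PC": 0, "M": memory}
    upc = FETCH                                  # microprogram counter
    for _ in range(steps):
        signals, nxt = CONTROL_MEMORY[upc]
        updates = {}
        for sig in signals:
            updates.update(MICROOPS[sig](s))     # every signal sees the old state
        s.update(updates)
        if nxt is None:
            upc += 1                             # next word in sequence
        elif nxt == "IR":
            upc = ROUTINE[s["IR"]]               # go to IR: opcode dispatch
        else:
            upc = nxt                            # go to FETCH
    return s


memory = [0] * 16
memory[0], memory[1], memory[2] = (LOAD << 8) | 10, (ADD << 8) | 11, (JUMP << 8) | 1
memory[10], memory[11] = 5, 3
print(run_micro(memory, 22)["AC"])   # LOAD (6 steps), ADD (6), JUMP (4), ADD (6)
```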

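Microprogrammed Control Unit: under the external conditions, a multiplexer selects the next control memory address from either an external address or the microprogram counter (µPC). The addressed word of the control memory (CM) is loaded into the microinstruction register (µIR), and a decoder turns its fields into control signals.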

Microinstruction Format: Condition Select | Branch Address | Control Fields (for control signals)
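The three fields can be packed into and unpacked from one control-memory word with shifts and masks. The field widths below are hypothetical: 2 condition-select bits, a 4-bit branch address, and 11 control bits, one per signal c0 to c10 (a horizontal format).

```python
from typing import List, NamedTuple

# Hypothetical field widths; the format above does not give them.
COND_BITS, ADDR_BITS, CTRL_BITS = 2, 4, 11   # 11 control bits: one per c0..c10


class MicroInstruction(NamedTuple):
    cond_select: int   # selects the condition that decides the branch
    branch_addr: int   # control-memory address used when the branch is taken
    control: int       # bit i set means control signal c_i is asserted


def encode(mi: MicroInstruction) -> int:
    """Pack the three fields, condition select in the most significant bits."""
    for value, bits in ((mi.cond_select, COND_BITS),
                        (mi.branch_addr, ADDR_BITS),
                        (mi.control, CTRL_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return ((mi.cond_select << (ADDR_BITS + CTRL_BITS))
            | (mi.branch_addr << CTRL_BITS)
            | mi.control)


def decode(word: int) -> MicroInstruction:
    """Split a microinstruction word back into its fields."""
    control = word & ((1 << CTRL_BITS) - 1)
    branch_addr = (word >> CTRL_BITS) & ((1 << ADDR_BITS) - 1)
    cond_select = word >> (ADDR_BITS + CTRL_BITS)
    return MicroInstruction(cond_select, branch_addr, control)


def asserted(control: int) -> List[str]:
    """Names of the control signals whose bits are set."""
    return [f"c{i}" for i in range(CTRL_BITS) if (control >> i) & 1]


# "PC <- PC + 1, IR <- DR(OP)" asserts c9 and c8; branch fields left at zero.
word = encode(MicroInstruction(0, 0, (1 << 8) | (1 << 9)))
print(word, decode(word), asserted(decode(word).control))
```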

Microprogram and Nanoprogram: the instruction register IR drives the microprogram control unit, whose control memory loads the microinstruction register. That register supplies the address (nPC) for the nanoprogram control unit. Its control memory (nCM) loads the nanoinstruction register (nIR), which produces the control signals.
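The usual motivation for the second level is storage: if many microinstructions share the same wide control word, the microprogram can store a short index and the nanostore can keep each distinct word once. The sketch below compares the two sizes. All sizes are hypothetical, and sequencing fields are ignored.

```python
def control_store_bits(n_micro: int, width: int, n_distinct: int):
    """Return (one_level_bits, two_level_bits).

    One level: each of the n_micro microinstructions stores the full
    width-bit control word.
    Two level: each microinstruction stores only an index into a nanostore
    that keeps each of the n_distinct control words once.
    """
    index_bits = max(1, (n_distinct - 1).bit_length())
    one_level = n_micro * width
    two_level = n_micro * index_bits + n_distinct * width
    return one_level, two_level


# Hypothetical sizes: 2048 microinstructions, 200-bit control words,
# only 256 of which are distinct.
print(control_store_bits(2048, 200, 256))   # (409600, 67584)
```

The saving costs a second memory access in every step, which lengthens the control cycle.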

CISC Instruction Pipeline: IF (instruction fetch), ID (instruction decode), OF (operand fetch), EX (execution).

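Datapath Timing: (a) in the conventional datapath, one clock period covers the register file, operand multiplexer B, the function unit, and result multiplexer D; (b) in the pipelined datapath, clocked pipeline registers split the same path into an operand fetch (OF) stage, an execute (EX) stage, and a write-back (WB) stage.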
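Pipelining the datapath shortens the clock period to roughly the slowest stage plus the delay of the added pipeline registers, while each microoperation's latency grows. The sketch below uses hypothetical delays, since the figure gives none. For simplicity it charges register overhead only to the pipelined version.

```python
def datapath_timing(stage_delays_ps, register_overhead_ps, n_ops):
    """Compare a conventional and a pipelined datapath (times in picoseconds).

    Conventional: one clock period spans every stage in series.
    Pipelined: the period is the slowest stage plus the pipeline register
    overhead, and n ops take (k + n - 1) periods for k stages.
    """
    k = len(stage_delays_ps)
    conventional_period = sum(stage_delays_ps)
    pipelined_period = max(stage_delays_ps) + register_overhead_ps
    return {
        "conventional_period": conventional_period,
        "pipelined_period": pipelined_period,
        "conventional_total": n_ops * conventional_period,
        "pipelined_total": (k + n_ops - 1) * pipelined_period,
    }


# Hypothetical delays for the OF, EX, and WB stages, and 1 ns register overhead.
t = datapath_timing([5000, 4000, 3000], register_overhead_ps=1000, n_ops=100)
print(t)
```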

One Datapath Cycle, shown across clock cycles 1 and 2, consists of four phases:
w: the control signals are set up.
x: the registers are loaded onto the input buses.
y: the ALU operates.
z: the results are returned to the registers through the output bus.

Pipeline Execution for Microoperation Sequence: seven microoperations each enter operand fetch (OF) one clock cycle after the previous one and complete in nine clock cycles.
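A space-time diagram like this one can be generated mechanically. Only the OF label survives in the figure, so the sketch assumes the three stages OF, EX, and WB of the pipelined datapath. With 3 stages and 7 microoperations, it gives 3 + 7 - 1 = 9 cycles, matching the figure.

```python
def space_time(n_items, stages):
    """Return (total_cycles, rows) for an ideal scalar pipeline.

    Item i (0-based) enters the first stage in cycle i + 1 and moves one
    stage per cycle; rows[i][c] names the stage item i occupies in cycle c + 1.
    """
    total = len(stages) + n_items - 1
    rows = []
    for i in range(n_items):
        row = [""] * total
        for j, stage in enumerate(stages):
            row[i + j] = stage
        rows.append(row)
    return total, rows


total, rows = space_time(7, ["OF", "EX", "WB"])
print("cycle " + " ".join(f"{c:>3}" for c in range(1, total + 1)))
for i, row in enumerate(rows, start=1):
    print(f"uop {i:<2}" + " ".join(f"{cell:>3}" for cell in row))
```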

Control Unit and Datapath Domains in Pipelining: six instructions pass through IF (instruction fetch), DE (decode and operand fetch), EX (execution), and WB (write back) over nine clock cycles. The stages are grouped into a control unit domain and a datapath domain.

RISC Instruction Pipeline: IF (instruction fetch), ID (instruction decode), EX (execution), MEM (memory read/write), WB (write back).
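For the five-stage RISC pipeline, the ideal speedup over an unpipelined unit is n·k / (k + n - 1). This approaches k = 5 only as the instruction count n grows. The sketch below assumes equal stage delays, no pipeline-register overhead, and no hazards.

```python
from fractions import Fraction


def pipeline_speedup(stages: int, n_instructions: int) -> Fraction:
    """Ideal speedup of a k-stage pipeline over an unpipelined unit.

    Unpipelined: n * k stage times. Pipelined: k + n - 1 stage times.
    Assumes equal stage delays, no register overhead, and no hazards.
    """
    return Fraction(n_instructions * stages, stages + n_instructions - 1)


for n in (1, 5, 100, 10_000):
    print(n, float(pipeline_speedup(5, n)))
```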

Pipelined datapath, in four stages:
- IF stage: PC, instruction memory, IR.
- DOF stage: instruction decoder, register file, zero fill, MUX.
- EX stage: function unit, data memory address and data.
- WB stage: MUX, register file.
Control and data are passed between the stages.

Process of How a Pipeline Works: instructions 1 to 4 advance one step per clock through the same units (register file Reg, operand registers A, B, and C, and the ALU), so different instructions occupy different units at the same time.

Model of a Superscalar Processor

General Model of Superscalar Processors: program → instruction fetch and branch prediction → instruction dispatch → window of execution → instruction issue → instruction execution → instruction reorder and commit.

Data Dependency (left: original code; right: after renaming):
C = A + B    C = A + B
E = C + D    E = C + D
D = F + G    J = F + G
D = H + I    K = H + I
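The renaming can be checked mechanically. In the left column, E = C + D truly depends on C = A + B (read after write, RAW). D = F + G overwrites the D that E = C + D reads (antidependency, write after read, WAR), and D = H + I writes D again (output dependency, write after write, WAW). Renaming removes the last two and keeps the first. The sketch below does a conservative pairwise check plus a simple renaming pass. The tuple representation is an assumption, and the fresh names J and K come from the right column.

```python
from typing import Iterable, List, Tuple

Stmt = Tuple[str, Tuple[str, ...]]  # (destination, sources), e.g. ("C", ("A", "B"))


def dependencies(stmts: List[Stmt]):
    """Pairwise RAW/WAR/WAW checks between statements (1-based numbers)."""
    found = []
    for i, (di, si) in enumerate(stmts):
        for j in range(i + 1, len(stmts)):
            dj, sj = stmts[j]
            if di in sj:
                found.append((i + 1, j + 1, "RAW", di))   # true dependency
            if dj in si:
                found.append((i + 1, j + 1, "WAR", dj))   # antidependency
            if di == dj:
                found.append((i + 1, j + 1, "WAW", di))   # output dependency
    return found


def rename(stmts: List[Stmt], fresh: Iterable[str]) -> List[Stmt]:
    """Give a fresh name to any destination that was already read or written.

    Later reads are redirected to the new name. `fresh` must supply names
    that do not occur in the program.
    """
    fresh = iter(fresh)
    current = {}        # original name -> name now holding its latest value
    seen = set()        # names referenced so far
    out = []
    for dest, srcs in stmts:
        new_srcs = tuple(current.get(s, s) for s in srcs)
        seen.update(srcs)
        new_dest = next(fresh) if dest in seen else dest
        seen.add(dest)
        current[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out


code = [("C", ("A", "B")), ("E", ("C", "D")), ("D", ("F", "G")), ("D", ("H", "I"))]
print(dependencies(code))
renamed = rename(code, ["J", "K"])
print(renamed)
print(dependencies(renamed))
```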

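A Data Hazard Problem:
MOV R1, R5 (R1 ← R5)
ADD R2, R1, R6 (R2 ← R1 + R6)
ADD R3, R1, R2 (R3 ← R1 + R2)
Each ADD reads a register (R1, and for the second ADD also R2) that an earlier instruction still in the pipeline has not yet written back.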

Program-Based and Hardware Solutions:
MOV R1, R5
NOP
ADD R2, R1, R6
NOP
ADD R3, R1, R2
[NOP]: NOPs in the case of the software solution; bubbles in the case of the hardware solution.
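The software solution amounts to a simple padding pass. The required issue distance between a producer and its consumer depends on the pipeline, in particular its depth and whether results are forwarded. The distance of 2 below is an assumption, chosen because it reproduces the schedule above. A hardware interlock would instead stall and insert bubbles in the same slots.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]  # (text, destination, sources)


def insert_nops(program: List[Instr], distance: int = 2) -> List[str]:
    """Software fix for RAW hazards: pad with NOPs so that each instruction
    issues at least `distance` slots after the latest writer of each of its
    source registers."""
    out: List[str] = []
    written_at = {}                      # register -> slot of its latest write
    for text, dest, srcs in program:
        earliest = max((written_at[r] + distance for r in srcs if r in written_at),
                       default=0)
        while len(out) < earliest:
            out.append("NOP")
        out.append(text)
        if dest is not None:
            written_at[dest] = len(out) - 1
    return out


program = [
    ("MOV R1, R5", "R1", ("R5",)),
    ("ADD R2, R1, R6", "R2", ("R1", "R6")),
    ("ADD R3, R1, R2", "R3", ("R1", "R2")),
]
for line in insert_nops(program):
    print(line)
```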

Control Hazard and Its Solution:
1 BZ R1, 18
2 MOV R2, R3 [NOP]
3 MOV R1, R2 [NOP]
4 MOV R5, R6
[NOP]: NOPs in the case of the software solution; bubbles in the case of the hardware solution.
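Whether the lost slots hold NOPs or bubbles, their cost shows up as a higher average CPI. The sketch below takes the two marked slots as the per-branch penalty. The 20% branch frequency is a hypothetical workload figure, and every branch is assumed to pay the full penalty (no prediction, and no useful instructions moved into the slots).

```python
from fractions import Fraction


def effective_cpi(base_cpi, branch_fraction, penalty_slots):
    """Average CPI when every branch costs `penalty_slots` extra slots,
    whether those slots hold NOPs (software) or bubbles (hardware)."""
    return base_cpi + branch_fraction * penalty_slots


# Two lost slots per branch, as marked above; 1 in 5 instructions a branch
# (hypothetical).
cpi = effective_cpi(1, Fraction(1, 5), 2)
print(cpi, float(cpi))   # 7/5 1.4
```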

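In-Order Issue, In-Order Completion. In this example, I1 needs two cycles to execute, I3 and I4 compete for the same functional unit, I5 depends on the value produced by I4, and I5 and I6 compete for the same functional unit:
Cycle  Decode   Execute   Write
1      I1 I2
2      I3 I4    I1 I2
3      I3 I4    I1
4      I4       I3        I1 I2
5      I5 I6    I4
6      I6       I5        I3 I4
7               I6
8                         I5 I6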

In-Order Issue, Out-of-Order Completion:
Cycle  Decode   Execute   Write
1      I1 I2
2      I3 I4    I1 I2
3      I4       I1 I3     I2
4      I5 I6    I4        I1 I3
5      I6       I5        I4
6               I6        I5
7                         I6

In-Order Issue, Out-of-Order Completion Structure: the instruction decoder (ID) passes instructions (I1 to I4) through an instruction issue network to functional units 1 to n.

Out-of-Order Issue, Out-of-Order Completion:
Cycle  Decode   Window     Execute   Write
1      I1 I2
2      I3 I4    I1, I2     I1 I2
3      I5 I6    I3, I4     I1 I3     I2
4               I4, I5, I6 I6 I4     I1 I3
5               I5         I5        I4 I6
6                                    I5