
1 ECE 172 Digital Systems, Chapter 4.2: Architecture. Herbert G. Mayer, PSU. Status 6/10/2018

2 Syllabus
- Introduction
- Uniprocessor
- Multiprocessor
- Instruction Set Architecture
- Iron Law
- Amdahl's Law
- VLIW
- Systolic Array
- Bibliography

3 Introduction
- In Digital Systems we focus on digital HW architecture: modules that enable fast operations, primarily computations
- Architecture includes registers, memory, caches, processor, bus, peripherals, etc.
- Ideal outcome for you: understand, and learn to design, a complete digital computer system
- Complete means: fully functional, fast, cheap to build, consuming little power, requiring a small volume, in line with actual priorities
- Priorities include: function, schedule, cost, number of developers, evolving technologies, environment, etc.

4 Introduction (copy of p. 4, section 4.1)
- Key modules of any computer architecture:
  1. Central Processing Unit (CPU): includes the ALU for integers and other numeric types, the register file, pc, ir, flags, and internal registers that are not API-visible
  2. Memory (AKA main memory), including stack and heap
  3. Caches: L1 and sometimes L2, integrated on the same silicon die; physically, but not logically, part of the CPU
  4. Data, address, and control buses connecting CPU, peripherals, and memory; AKA the system bus
  5. Peripherals, connected via bus
  6. IO devices and controllers, connected to the system bus
  7. Branch prediction unit; invisible to the API
- Vast speed differences between CPU and memory: memory may be a few times to two decimal orders of magnitude slower than the CPU
- Speed disparity between CPU, memory, and peripherals!

5 Introduction: Uniprocessors
- Single Accumulator Architecture (earliest systems, 1940s), e.g. John von Neumann's computer, or the earlier John Vincent Atanasoff computer
  - Were the basis for ENIAC
  - Commercial computers actually built and sold
- General-Purpose Register (GPR) Architectures
  - 2-Address Architecture (GPR with one operand implied), e.g. IBM 360
  - 3-Address Architecture (GPR with all operands of an arithmetic operation explicit), e.g. VAX 11/70
- Stack Machines (e.g. B5000 see [2], B6000, HP3000 see [3])

6 Introduction: Multiprocessors
- Vector Architecture, e.g. Amdahl 470/6, competing with IBM's 360 in the 1970s; blurs the differentiation with multiprocessors
  - Yet vector architecture is still a pure uniprocessor architecture
- Shared Memory Architecture
- Distributed Memory Architecture
- Systolic Array Architecture; see Intel iWarp and CMU's Warp architecture
- Data Flow Machine; see Jack Dennis' work at MIT
- BSP, the Burroughs Scientific Processor of the 1970s

7 Introduction: Hybrid Processors
- Superscalar Architecture; see Intel 80860, AKA i860
- VLIW Architecture; see the Multiflow computer
- Pipelined Architecture; debatable whether UP or hybrid; we postulate: UP
- EPIC Architecture; see the Intel Itanium architecture
- Multi-core processors as crafted today by AMD, HP, IBM, and Intel Corp.

8 Common Architecture Attributes
- Main memory (main store); separate from the CPU
- Program instructions stored in main memory
- Data also stored in main memory; known as the von Neumann architecture
- Data available in, and distributed over, main memory, stack, heap, reserved OS space, free space, and IO space
- Instruction pointer ip (AKA instruction counter ic, program counter pc), plus other special registers
- Von Neumann memory bottleneck: everything travels on the same bus

9 Common Architecture Attributes
- Accumulator (1 register, or many) holds the result of an arithmetic/logical operation
- The memory controller handles memory access requests from the processor to memory; AKA chipset
- The current trend is to move all or part of the memory controller onto the CPU chip; this does not mean the controller IS part of the CPU!
- Processor units include: FP units, integer unit, control unit, register file, pathways

10 Data-Stream, Instruction-Stream
- Data-stream/instruction-stream classification, defined by Michael J. Flynn in 1966!
- Single-Instruction, Single-Data stream (SISD) architecture, e.g. the PDP-11
- Single-Instruction, Multiple-Data stream (SIMD) architecture, e.g. array processors: Solomon, Illiac IV, BSP, TMC
- Multiple-Instruction, Single-Data stream (MISD) architecture, e.g. possibly: superscalar, pipelined, VLIW, and EPIC machines
- Multiple-Instruction, Multiple-Data stream (MIMD) architecture; perhaps a true multiprocessor is yet to be built; yes, debatable! (Ignoring marketing hype)

11 Generic Computer Architecture Model (figure)

12 Instruction Set Architecture (ISA)
- The ISA is the boundary between software (SW) and hardware (HW)
- Specifies the logical machine that is visible to the programmer and compiler
- Is the functional specification for processor designers
- The boundary between CPU hardware and system firmware is sometimes a very low-level piece of system software that handles exceptions, interrupts, and HW-specific services
- That level could fall into the domain of the OS

13 Instruction Set Architecture (ISA)
- Specified by the ISA:
  - Operations: what to perform, and in which order
  - Temporary operand storage in the CPU: registers, accumulator, stack (cache, being a duplicate of memory portions)
  - Note that the stack can be word-sized, even bit-sized (design of the successor for NCR's Century architecture of the 1970s)
  - Number of operands per instruction
  - Operand location: where and how to specify/locate the operands
  - Type and size of operands
  - Instruction encoding in binary

14 Instruction Set Architecture (ISA): the ISA as Dynamic-Static Interface (DSI) (figure)

15 Iron Law of Processor Performance
- Clock rate doesn't count, bus width doesn't count, the number of registers and operations executed in parallel doesn't count!
- What counts is: how long it takes for the computational task to complete. That time is of the essence of computing!
- If a MIPS-based solution runs at 1 GHz and completes a program X in 2 minutes, while an Intel Pentium 4 based solution runs at 3 GHz and completes that same program X in 2.5 minutes, programmers and users are more interested in the former solution

16 Iron Law of Processor Performance
- If a solution on an Intel CPU can be expressed in an object program of size Y bytes, but on an IBM architecture needs size 1.1 Y bytes, the Intel solution is generally more attractive
  - Assuming the same execution performance
- Meaning of this:
  - Wall-clock time (Time) is the time I have to wait for completion
  - Program size, perhaps measured in bytes of code, bytes of static data space, or size of stack and heap used, is an indicator of the overall complexity of the computational task and the physical parameters of the data

17 Iron Law of Processor Performance (figure; see the identity below)
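The figure on this slide is not reproduced in the transcription; presumably it shows the classic iron-law identity that the surrounding slides paraphrase (a reconstruction, not the original slide):

    Time / Program = (Instructions / Program) x (Cycles / Instruction) x (Time / Cycle)

Wall-clock time is the product of instruction count, CPI, and cycle time; a higher clock rate (the last factor) pays off only if it does not inflate the first two.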

18 Amdahl's Law
- Articulated by Gene Amdahl during the 1967 AFIPS conference
- States that the maximum speedup of a program P is dominated by its sequential portion S
- I.e. if some part of program P can be perfectly accelerated by arbitrarily many parallel processors, but some part S of P is inherently sequential, then the resulting performance is dominated by S
- See the Wikipedia sample on the next page!

19 Amdahl's Law (Source: Wikipedia)
- The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program. For example, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20x, as shown in the diagram, regardless of the number of available processors
- n = number of processors, n a natural number
- B = strictly sequential fraction of the program, 0 <= B <= 1
- T(n) = time to execute with n processors: T(n) = T(1) * ( B + (1 - B) / n )
- S(n) = speedup = T(1) / T(n) = 1 / ( B + (1 - B) / n )
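As a check on the 20x limit above, here is a minimal C sketch of the speedup formula; the function name and the chosen processor counts are illustrative, not from the slides:

    #include <stdio.h>

    /* Amdahl's law: S(n) = 1 / (B + (1 - B) / n), B = sequential fraction. */
    double speedup(double B, double n) {
        return 1.0 / (B + (1.0 - B) / n);
    }

    int main(void) {
        /* Slide example: 95% parallelizable, i.e. B = 0.05. */
        printf("S(16)   = %.2f\n", speedup(0.05, 16.0));    /* ~ 9.14 */
        printf("S(4096) = %.2f\n", speedup(0.05, 4096.0));  /* ~19.91 */
        printf("S(inf) -> %.2f\n", 1.0 / 0.05);             /* limit: 20 */
        return 0;
    }

No matter how large n grows, S(n) never exceeds 1/B, which is the point of the slide.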

20 Amdahl's Law (Source: Wikipedia) (figure: speedup curves)

21 Uniprocessor (UP) Architectures
- Ancient! Not used today for general computing:
- Single Accumulator Architecture (SAA), e.g. von Neumann's machine, in the 1940s
  - Single register to hold operation results, conventionally called the accumulator
  - Accumulator used as the destination of arithmetic operations, and as (one) source
  - Has a central processing unit, a memory unit, and a connecting memory bus
  - pc points to the next instruction (in memory) to be executed
  - Commercial sample: ENIAC

22 Uniprocessor (UP) Architectures (figure: accumulator, main memory, pc)

23 General-Purpose Register (GPR) Architecture
- Accumulates ALU results in n registers; n was typically 4, 8, 16, or 64
- Allows register-to-register operations: fast!
- GPR is essentially a multi-register extension of the SA architecture
- A two-address architecture specifies one source operand explicitly; the other source is implied and also serves as the destination
- A three-address architecture specifies two source operands explicitly, plus an explicit destination
- Variations allow additional index registers, base registers, multiple index registers, etc.

24 General-Purpose Register (GPR) Architecture (figure)

25 Stack Machine Architecture (SMA)
- AKA zero-address architecture, since arithmetic operations require no explicit operands, hence no operand addresses
- All operands are implied, except for push and pop
- What is the equivalent of push/pop on a GPR machine?
- A pure stack machine (SMA) has no registers
  - Hence performance would be poor, as all operations involve memory!
- However, one can design an SMA that implements the n top-of-stack elements as registers: a stack cache
- Sample architectures: Burroughs B5000, HP 3000

26 Stack Machine Architecture (SMA)
- Implement impure stack operations that bypass tos operand addressing
- Sample code sequence to compute on an SMA (operand sizes are implied; L stands for a literal operand whose value did not survive transcription):

    res := a * ( L + b )

    push a       -- destination implied: stack
    pushlit L    -- push literal; destination also implied
    push b       -- ditto
    add          -- 2 sources and destination implied
    mult         -- 2 sources and destination implied
    pop res      -- source implied: stack

27 Stack Machine Architecture (SMA) (figure)

28 Pipelined Architecture (PA)
- The Arithmetic Logic Unit (ALU) is split into separate, sequentially connected units in a PA
- Each unit is referred to as a "stage"; more precisely, the time slot in which the action is done is the stage
- Each of these stages/units can be initiated once per cycle
- Yet each subunit is implemented in HW just once
- Multiple subunits operate in parallel on different sub-ops, each executing a different stage; each stage is part of an instruction's execution

29 Pipelined Architecture (PA)
- Non-unit time, i.e. a differing number of cycles per operation, causes operations to terminate at different times
- Operations can abort in an intermediate stage if a later instruction changes the flow of control
  - E.g. due to a branch, exception, return, conditional branch, or call
- An operation must stall in case of operand dependence: a stall caused by an interlock, AKA a data or control dependency

30 Pipelined Architecture (PA) (figure)

31 Pipelined Architecture (PA)
- Ideally, each instruction can be partitioned into the same number of stages, i.e. sub-operations
- Operations to be pipelined can sometimes be evenly partitioned into equal-length sub-operations
- That equal-length time quantum might as well be a single sub-clock
- In practice this is hard for the architect to achieve; compare for example integer add and floating-point divide: vastly different time needs!

32 Pipelined Architecture (PA)
- Ideally, all operations have independent operands
  - I.e. an operand being computed is not needed as a source by the following few operations
- If it is needed, and often it is, this causes a dependence, which causes a stall (see the sketch after this list):
  1. read after write (RAW)
  2. write after read (WAR)
  3. write after write, with a use in between (WAW)
- Also, ideally, all instructions just happen to be arranged sequentially one after another in memory
- In reality there are branches, conditional branches, calls, returns, exceptions, etc.
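A minimal C sketch of this hazard classification, assuming a toy three-register instruction form; the struct and function names are invented for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int dst; int src1; int src2; } Insn;  /* register numbers */

    /* Classify how a 'later' instruction depends on an 'earlier' one. */
    bool raw(Insn e, Insn l) { return l.src1 == e.dst || l.src2 == e.dst; }
    bool war(Insn e, Insn l) { return l.dst == e.src1 || l.dst == e.src2; }
    bool waw(Insn e, Insn l) { return l.dst == e.dst; }

    int main(void) {
        Insn i1 = { 1, 2, 3 };   /* r1 = r2 op r3 */
        Insn i2 = { 4, 1, 5 };   /* r4 = r1 op r5: RAW on r1, so i2 must stall or forward */
        printf("RAW=%d WAR=%d WAW=%d\n", raw(i1, i2), war(i1, i2), waw(i1, i2));
        return 0;
    }

A real interlock unit performs exactly this comparison between the registers of in-flight instructions, in hardware and per stage.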

33 Pipelined Architecture (PA): Idealized Pipeline Resource Diagram (figure)
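Since the diagram is not reproduced, the following small C program prints an idealized resource diagram for a 4-stage pipeline, borrowing the IF/DE/EX/WB stage names introduced on slide 47; a sketch, not the original figure:

    #include <stdio.h>

    /* Print an idealized 4-stage pipeline diagram: one new instruction
       issues per cycle, no stalls, no hazards. */
    int main(void) {
        const char *stage[] = { "IF", "DE", "EX", "WB" };
        enum { STAGES = 4, INSNS = 5 };
        for (int i = 0; i < INSNS; i++) {
            printf("i%d: ", i + 1);
            for (int c = 0; c < i; c++) printf("    ");   /* cycles before issue */
            for (int s = 0; s < STAGES; s++) printf("%-4s", stage[s]);
            printf("\n");
        }
        return 0;
    }

Each row is one instruction, each column one cycle; after the pipeline fills, one instruction completes per cycle even though each instruction still takes four cycles.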

34 Multiprocessor (MP) Architectures
- Shared Memory Architecture (SMA)
- Equal access to memory for all n processors, p0 to pn-1
- If there are multiple, simultaneous accesses to shared memory, only one will succeed
- Simultaneous access must be resolved deterministically; this needs a policy, or an arbiter, that is deterministic (see the sketch below)
- The von Neumann bottleneck is even tighter than for a conventional UP system
- Typically there are ~twice as many loads as stores
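A hedged illustration in C of serialized access to one shared word, using POSIX threads and C11 atomics to stand in for the hardware arbiter; the thread count of 4 mirrors the typical n = 4 from the next slide (compile with -pthread):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    atomic_int shared_word;   /* one word in shared memory */

    /* Each "processor" increments the shared word; the atomic
       read-modify-write serializes simultaneous accesses, so exactly
       one wins at a time and no update is lost. */
    static void *processor(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            atomic_fetch_add(&shared_word, 1);
        return NULL;
    }

    int main(void) {
        pthread_t p[4];                                   /* n = 4 processors */
        for (int i = 0; i < 4; i++) pthread_create(&p[i], NULL, processor, NULL);
        for (int i = 0; i < 4; i++) pthread_join(p[i], NULL);
        printf("%d\n", atomic_load(&shared_word));        /* always 400000 */
        return 0;
    }

Without the atomic arbitration, simultaneous increments could overlap and the final count would be unpredictable; that is the determinism the slide demands.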

35 Multiprocessor (MP) Architectures
- Generally, some processors are idle due to memory or other conflicts
- Typical number of processors: n = 4, but n = 8 and greater is possible, with a large 2nd-level cache, and an even larger 3rd-level cache
- Early MP architectures had only limited commercial success and acceptance, due to the programming burden, frequently loaded onto the programmer
- Morphing in the 2000s into multi-core and hyperthreaded architectures, where the programming burden is on the multi-threading OS; i.e. the OS identifies and exploits the threads!

36 Multiprocessor (MP) Architectures (figure: yes, 3 CPUs, just to make the point of shared memory)

37 Distributed Memory Architecture (DMA)
- Processors have private, AKA local, memories
- Yet the programmer has to see a single, logical memory space, regardless of local distribution
- Hence each processor pi always has access to its own memory Memi
- And the collection of all memories Memi, i = 0..n-1, is the program's logical data space
- Thus, processors must access others' memories
- Done via message passing or virtual shared memory
- Messages must be routed, and the route determined
- A route may require multiple, intermediate nodes

38 Distributed Memory Architecture (DMA)
- Blocking when: a message is expected but hasn't arrived yet
- Blocking when: a message is to be sent, but the destination cannot receive
- Growing the message buffer size increases the illusion of asynchronicity of the sending and receiving operations
- Key parameters: the time for 1 hop, and the package overhead to send an empty message
- A message may also be delayed because of network congestion
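A sketch of blocking message passing between two nodes' private memories, written against the standard MPI C API; MPI is an assumption here, since the slides do not name a particular message-passing library:

    #include <mpi.h>
    #include <stdio.h>

    /* Run as: mpirun -np 2 ./a.out
       Node 0 sends one word out of its private memory; node 1 blocks
       until that word has arrived in its own private memory. */
    int main(int argc, char **argv) {
        int rank, x = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            x = 42;
            MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* may block if unbuffered */
        } else if (rank == 1) {
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                      /* blocks until arrival */
            printf("node 1 received %d\n", x);
        }
        MPI_Finalize();
        return 0;
    }

The two blocking cases on the slide map directly to MPI_Recv with no message pending, and MPI_Send with no buffer space at the destination.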

39 Distributed Memory Architecture (DMA) (figure)

40 Systolic Array (SA) Architecture
- Very few were designed: by CMU and Intel for (then) ARPA
- Each processor has private memory
- The network is pre-defined by the systolic pathway (SP)
- Each node is pre-connected via the SP to some subset of the other processors
- Node connectivity is determined by the implemented/selected network topology
- The systolic pathway is a high-performance network; sending and receiving may be synchronized (blocking) or asynchronous (received data are buffered)
- Typical network topologies: line, ring, torus, hex grid, mesh, etc.

41 Systolic Array (SA) Architecture
- The sample SA below is actually a ring: the wrap-around along the x and y directions is not fully shown
- A processor can write to an x or y gate; this sends the word off on the x or y SP
- A processor can read from an x or y gate; this consumes the word from the x or y SP
- A buffered SA can write to a gate even if the receiver cannot read
- An attempt to read from a gate when no message is available will cause blocking!
- Automatic code generation for a non-buffered SA is hard; the compiler must keep track of interprocessor synchronization
- One can view the SP as an extension of memory with infinite capacity, but with sequential access

42 Systolic Array (SA) Architecture (figure)

43 Systolic Array (SA) Architecture
- Note that each pathway, x or y, may be bi-directional
- An SA may have any number of pathways; there is nothing magic about the 2, x and y
- It is possible to have I/O capability at each node
- Typical application: evaluating large polynomials of the form (see the sketch below):
  y = k0 + k1*x + k2*x^2 + ... + k(n-1)*x^(n-1) = Σ ki*x^i
- The next example shows a torus without displaying the wrap-around pathways across both dimensions
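The per-cell computation for this polynomial application is typically Horner's rule; the C sketch below shows the recurrence, with one loop iteration standing in for one systolic cell (a sequential model, not the Warp code):

    #include <stdio.h>

    /* Horner's rule: y = k0 + x*(k1 + x*(k2 + ...)), evaluated from the
       highest coefficient down; each step is what one systolic cell
       applies to the value passing through it. */
    double horner(const double k[], int n, double x) {
        double y = 0.0;
        for (int i = n - 1; i >= 0; i--)   /* one step per cell */
            y = y * x + k[i];
        return y;
    }

    int main(void) {
        double k[] = { 1.0, 2.0, 3.0 };        /* y = 1 + 2x + 3x^2 */
        printf("%f\n", horner(k, 3, 2.0));     /* prints 17.000000 */
        return 0;
    }

In the array version, the coefficients are fixed in the cells and the partial result pulses along the systolic pathway, so n cells deliver one polynomial value per beat.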

44 Systolic Array (SA) Architecture (figure)

45 Hybrid Architectures
- Superscalar (SSA) Architecture
- Replicates (duplicates) some operation units in HW
- Looks like a scalar architecture w.r.t. the object code
- Offers a (limited type of) parallel execution, as it has multiple copies of some hardware units
- Is not an MP architecture: the multiple units do not have concurrent, independent memory access
- Has multiple ALUs, possibly multiple FP add (FPA) units, FP multiply (FPM) units, and/or integer units
- Arithmetic operations proceed simultaneously with load and store operations; note data dependences!

46 Hybrid Architectures
- Instruction fetch in a superscalar architecture is speculative, since the number of parallel operations is unknown; rule: fetch too much! But one cannot fetch more than the longest possible superscalar pattern
- The code sequence looks like a sequence of instructions for a scalar processor
- Example: code executed on Pentium processors
- A more famous and successful example: the processor discussed below
- Object code can be custom-tailored by the compiler; i.e. the compiler can have a superscalar target processor in mind and bias code emission, knowing that some code sequences are better suited for superscalar execution

47 Hybrid Architectures
- Fetch enough instruction bytes on a superscalar target to support the widest (most parallel) possible object sequence
- Decoding is the bottleneck for CISC; it is easier for RISC's 32-bit or 64-bit units
- Sample superscalar: the i80860 has separate FPA and FPM units, 2 integer ops, and load/store with pre- and post-address-increment and -decrement
- A superscalar, pipelined architecture with a maximum of 3 instructions per cycle
- In the abstract picture on the next page, the pipeline stages are IF, DE, EX, and WB, for instruction fetch, decode, execute, and write-back of results

48 Hybrid Architectures (figure: N = 3, i.e. 3 IPC)

49 VLIW Architecture (VLIW)
- Very Long Instruction Word, typically 128 bits or more
- Object code is no longer purely scalar but explicitly parallel, though the parallelism cannot always be exploited
- Just like the limitation in superscalars, this is not a general MP architecture: the subinstructions do not have concurrent memory access; dependences have to be resolved before code emission
- But VLIW opcodes are designed to support some parallel execution
- The compiler/programmer explicitly packs parallelizable operations into a VLIW instruction

50 VLIW Architecture (VLIW)
- Just like horizontal microcode compaction
- Other opcodes are still scalar and can coexist with VLIW instructions
- Partially parallel, even scalar, operation is possible by placing no-ops into some of the VLIW fields
- Sample: the Compute instruction of the CMU Warp and Intel iWarp
- There could be a 1-bit (or few-bit) opcode for the Compute instruction, plus sub-opcodes for the subinstructions
- Data dependence example: the result of the FPA cannot be used as an operand for the FPM in the same VLIW instruction

51 VLIW Architecture (VLIW)
- The result of int1 cannot be used as an operand for int2, etc.
- Thus, the need to software-pipeline
- Below: this is one VLIW instruction (figure; see the sketch that follows)
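A hedged C sketch of one VLIW instruction as a fixed set of sub-instruction slots; the slot and opcode names are invented for illustration and do not reflect the actual Warp/iWarp encoding:

    #include <stdio.h>

    /* Illustrative sub-opcodes; NOP fills unused slots for partially
       parallel, or even purely scalar, operation. */
    typedef enum { NOP, FPA_ADD, FPM_MUL, INT1_ADD, INT2_SUB, LOAD, STORE } SubOp;

    /* One very long instruction word: every slot issues in the same
       cycle, so no slot may consume a result that another slot of the
       same word produces. */
    typedef struct {
        SubOp fpa;    /* floating-point adder slot */
        SubOp fpm;    /* floating-point multiplier slot */
        SubOp int1;   /* integer unit slot */
        SubOp mem;    /* load/store slot */
    } VliwWord;

    int main(void) {
        /* Only two slots do useful work here; the rest are no-ops. */
        VliwWord w = { FPA_ADD, NOP, NOP, LOAD };
        printf("slots: %d %d %d %d\n", w.fpa, w.fpm, w.int1, w.mem);
        return 0;
    }

Packing these slots so that no same-cycle dependences remain is exactly the job the slides assign to the compiler or programmer.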

52 EPIC Architecture
- Groups instructions into bundles
- Straightens out branches by associating a predicate with instructions
- Executes instructions in parallel, say the else clause and the then clause of an if statement (see the C analogy below)
- Decides at run time which of the predicates is true, and keeps just that path of the multiple choices
- Uses speculation to straighten the branch tree
- Uses a large, rotating register file
- Has many registers, not just 64 GPRs
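A C analogy for this predication (if-conversion): both arms are computed and a predicate selects one result, eliminating the branch; this is only a software analogy for what EPIC does with predicate registers in hardware:

    #include <stdio.h>

    /* Branchy version: the pipeline must predict or stall at the if. */
    int abs_branchy(int x) { if (x < 0) return -x; return x; }

    /* Predicated version: then-value and else-value are both computed,
       and the predicate p selects one; no taken branch is needed. */
    int abs_predicated(int x) {
        int p = (x < 0);          /* predicate "register" */
        int t = -x;               /* then-clause result */
        int e = x;                /* else-clause result */
        return p ? t : e;         /* selection, not control flow */
    }

    int main(void) {
        printf("%d %d\n", abs_branchy(-5), abs_predicated(-5));   /* 5 5 */
        return 0;
    }

The wasted arm costs one execution slot, which EPIC has in abundance; what it buys is a straight-line instruction stream with no misprediction penalty.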

53 Summary: Computer Architecture
- Computers are never fast enough; just like people: never rich enough
- Speed improvements are accomplished through parallelism, multi-processing, pipelining, and resource replication
- Some modes of parallelism were dead ends, e.g. systolic arrays (controversial)
- Others offer solid improvement, e.g. pipelining, multi-processing, an adequate number of registers, multi-cores, etc.

54 Bibliography
lect11.pdf
8. VLIW Architecture: acrobat_download2/other/vliw-wp.pdf
9. ACM reference to the Multiflow computer architecture: id=110622&coll=portal&dl=acm
