Efficient Loop Handling for DSP algorithms on CISC, RISC and DSP processors


Slide 1 -- Efficient Loop Handling for DSP algorithms on CISC, RISC and DSP processors
M. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada (smithmr@ucalgary.ca)

Slide 2 -- Key elements in DSP algorithms
- Instruction fetches must be efficient
- Data fetches / stores (often multiple) must be efficient
- Multiplication must be efficient and accurate, and remain precise
- Addition must be efficient and accurate, and remain precise
- Decision logic to control all the operations above must be efficient
- Program flow is the key control operation

Slide 3 -- To be tackled today
- Performing operations on an array
- Loop overhead can steal many cycles
- Loop overhead depends on implementation:
  - Standard loop with test at the start -- while ()
  - Initial test with additional test at end -- do-while ()
  - Down-counting loops
- Special efficiencies: CISC -- hardware; RISC -- intelligent compilers; DSP -- hardware

Slide 4 -- Background to Audio Channel Modelling
(Diagram: audio channels with DELAY 0, DELAY_LEFT, DELAY_RIGHT)
- No relative delay modelled into the audio channel -- the sound is perceived in the centre of the head
- Modelling a relative delay into the right-ear audio channel -- the sound arrival will shift the sound to the left, as the sound seems to reach the left ear first
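
To make the relative-delay idea on slide 4 concrete, here is a minimal C sketch (not taken from the course code) in which the left channel passes straight through while the right channel is read DELAY_RIGHT samples late, so the sound appears to reach the left ear first. All names here (apply_relative_delay, DELAY_RIGHT, BLOCK, the buffers) are illustrative assumptions.

    /* Sketch only: delay the right channel by DELAY_RIGHT samples relative
     * to the left channel.  Assumes DELAY_RIGHT <= n <= BLOCK.            */
    #include <stddef.h>

    #define DELAY_RIGHT 32          /* relative delay in samples (assumed) */
    #define BLOCK       256

    static float right_history[DELAY_RIGHT + BLOCK];  /* old + new right samples */

    void apply_relative_delay(const float *left_in, const float *right_in,
                              float *left_out, float *right_out, size_t n)
    {
        /* append the new right-channel samples after the saved history */
        for (size_t i = 0; i < n; i++)
            right_history[DELAY_RIGHT + i] = right_in[i];

        for (size_t i = 0; i < n; i++) {
            left_out[i]  = left_in[i];          /* left channel: no delay */
            right_out[i] = right_history[i];    /* right channel: delayed */
        }

        /* keep the most recent DELAY_RIGHT right samples for the next block */
        for (size_t i = 0; i < DELAY_RIGHT; i++)
            right_history[i] = right_history[n + i];
    }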

Slide 5 -- FIFO buffer update via Memory Move Example
Implement a delay line. Try zero delay on the left channel and a large delay on the right channel, then try reading in from a microphone.

    void MemoryMove_Delay_CPP(void) {
        int count;

        // Insert new value into the back of the FIFO delay line
        left_delayline[0 + LEFT_DELAY_VALUE] = channel1_in;

        // Grab delayed value from the front of the FIFO delay line
        channel1_out = left_delayline[0];

        // Update the FIFO delay line using inefficient
        // memory-to-memory moves
        for (count = 0; count < LEFT_DELAY_VALUE; count++)
            left_delayline[count] = left_delayline[count + 1];
    }

Slide 6 -- Pointer FIFO Delay Line

    float FIFO[N];
    float *pt_in  = &FIFO[DELAY];
    float *pt_out = &FIFO[0];

    void PointerFIFO_CPP(void) {
        // Insert new value into the back of the FIFO delay line
        *pt_in++ = channel1_in;     // read pt_in value, use it, store new pt_in value

        // Grab delayed value from the front of the FIFO delay line
        channel1_out = *pt_out++;

        // Wrap the pointers back to the start of the buffer
        if (pt_in > &FIFO[DELAY])
            pt_in = pt_in - DELAY;
        if (pt_out > &FIFO[DELAY])
            pt_out = pt_out - DELAY;
    }

- Requires additional reads and stores of the static memory locations where the pointers are stored
- Requires compares and jumps -- pipeline issues on jumps

Slide 7 -- Labs delay line -- Concept
(Diagram: FILTER 30 LEFT, FILTER 330 LEFT, FILTER 330 RIGHT, FILTER 30 RIGHT)
- Get ambience by taking into account constructive and destructive interference around the face
- This implies knowing the characteristics of the audio channel and modelling them using an FIR filter -- 2 FIR filters per speaker -- the processing requirement is increasing

Slide 8 -- Real-time FIR Filter

    float fir_30[], fir_330[];

    void FIRFilter(void) {
        // Insert new value into FIFO delay line
        left_delayline[0 + N]  = (float) channelleft_in;
        right_delayline[0 + N] = (float) channelright_in;

        channel_one_30 = channel_one_330 = 0;

        // Need the equivalent of the following loop for EACH sound source
        for (count = 0; count < FIRlength - 1; count++) {
            channel_one_30  = channel_one_30  + left_delayline[count]  * fir_30[count];
            channel_one_330 = channel_one_330 + right_delayline[count] * fir_330[count];
        }

        channelleft_out = (int) (channel_one_30 + scale_factor * channel_one_330);
        // ditto for channel 2 (right output)

        // Update Left Channel delay line; update Right Channel delay line
    }
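
The memory-move update on slide 5 and the pointer wrap tests on slide 6 both add per-sample overhead. One common alternative, shown below as a sketch with assumed names (it is not the lab's actual code), is to leave the samples where they are and walk a circular index through a fixed buffer, which is also the access pattern the FIR loop on slide 8 needs.

    #define N_TAPS 300

    static float state[N_TAPS];   /* circular delay line                        */
    static int   write_idx = 0;   /* index of the most recently written sample  */

    float fir_sample(float new_sample, const float *coeffs /* N_TAPS long */)
    {
        /* overwrite the oldest sample with the newest one */
        state[write_idx] = new_sample;

        /* accumulate coeffs[k] * x[n-k], walking back from the newest
           sample and wrapping at the start of the buffer               */
        float acc = 0.0f;
        int   idx = write_idx;
        for (int k = 0; k < N_TAPS; k++) {
            acc += coeffs[k] * state[idx];
            idx = (idx == 0) ? N_TAPS - 1 : idx - 1;   /* circular wrap */
        }

        /* advance the write index for the next call */
        write_idx = (write_idx + 1 == N_TAPS) ? 0 : write_idx + 1;
        return acc;
    }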

Slide 9 -- Real-time FIR Hard-coded loop

    channel_one_30 = channel_one_30 + arrayleft[0] * fir_30[0];
    channel_one_30 = channel_one_30 + arrayleft[1] * fir_30[1];
    channel_one_30 = channel_one_30 + arrayleft[2] * fir_30[2];
    channel_one_30 = channel_one_30 + arrayleft[3] * fir_30[3];
    channel_one_30 = channel_one_30 + arrayleft[4] * fir_30[4];
    channel_one_30 = channel_one_30 + arrayleft[5] * fir_30[5];
    channel_one_30 = channel_one_30 + arrayleft[6] * fir_30[6];
    channel_one_30 = channel_one_30 + arrayleft[7] * fir_30[7];

- No loop overhead, but a heavy memory penalty -- the FIR filters are 300 taps * 4 filters
- Using pt++ style memory operations rather than direct memory access with an offset is faster on SOME processors!!

Slide 10 -- Timing required to handle DSP loops
- for k = 0 to (N-1) -- could require many lines of code
- Body of Code -- BofC cycles -- could be 1 line
- Endfor -- could require many lines of code, jumps and counter updates
- Important feature -- how much overhead time is used in handling the loop construct itself?
- Three components: set-up time, body of code time (BofC cycles), handling the loop itself

Slide 11 -- Basic loop body
- Set up loop -- loop overhead -- done once
- Check conditions -- loop overhead -- done many times
- Do code body -- done many times -- useful
- Loop back + counter increment -- loop overhead -- done many times
- Define Loop Efficiency = useful cycles / total cycles
    = N * Tcodebody / (Tsetup + N * (Tcodebody + Tconditions + Tloopback))
- Different efficiencies depending on the size of the loop
- Need to learn good approximation techniques and recognize the two extremes

Slide 12 -- 3 different basic loop constructs
- While loop: main compare test at the top of the loop
- Modified do-while loop with initial test: initial compare test at the top, main compare test at the bottom of the loop
- Down-counting do-while loop with initial test: no compare operations in the test; relies on the condition code flags set while adjusting the loop counter; can increase overhead in some algorithms
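
The loop-efficiency definition on slide 11 is easy to play with numerically. The short C program below simply transcribes the formula; the cycle counts passed in are made-up placeholders, not measurements for any of the processors discussed later.

    /* efficiency = N * Tcodebody /
     *              (Tsetup + N * (Tcodebody + Tconditions + Tloopback))  */
    #include <stdio.h>

    static double loop_efficiency(int n, double t_setup, double t_body,
                                  double t_cond, double t_loopback)
    {
        return (n * t_body) /
               (t_setup + n * (t_body + t_cond + t_loopback));
    }

    int main(void)
    {
        /* illustrative numbers only: small body vs large body, small N vs large N */
        printf("small body (4 cycles),   N = 1000: %.2f\n",
               loop_efficiency(1000, 12.0, 4.0, 28.0, 32.0));
        printf("large body (400 cycles), N = 1000: %.2f\n",
               loop_efficiency(1000, 12.0, 400.0, 28.0, 32.0));
        printf("large body (400 cycles), N = 4:    %.2f\n",
               loop_efficiency(4, 12.0, 400.0, 28.0, 32.0));
        return 0;
    }

With a small body, the per-iteration overhead dominates and the efficiency collapses; with a large body, the set-up term barely matters even for small N, which is the pair of extremes the slide asks you to recognize.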

Slide 13 -- Memory read timing
Reference: Clements, Microprocessor Systems Design, PWS Publishing.
(Timing diagram: data from the memory appears near the end of the read cycle.)

Slide 14 -- Review -- CISC processor instruction phases
- Fetch -- obtain op-code: PC value out on the Address Bus; instruction op-code at Memory[PC] on the Data Bus and then into the Instruction Register
- Decode -- bringing the required values (internal or external) to the ALU input:
  - Immediate -- additional memory access for the value -- Memory[PC]
  - Absolute -- additional memory access for the address value and then a further access for the value -- Memory[Memory[PC]]
  - Indirect -- additional memory access to obtain the value at Memory[AddressReg]
- Execute -- ALU operation
- Writeback -- ALU value to internal/external storage; may require additional memory accesses to obtain the address used during storage; may require additional memory operations to perform the storage

Slide 15 -- Basic 68K CISC loop -- Test at start

    MOVE.L  #0, count        ; set up -- count in register
                             ; fetch instr. (FI 4) + fetch 32-bit constant (FC 2*4) + operation (OP 0)
    LOOP:
    CMP.L   #N, count        ; (FI 4, FC 8, OP -- 32-bit subtract)
    BGE     ENDLOOP          ; actually ADD.L #(ENDLOOP - 4), PC
                             ; (add of 16-bit displacement to PC -- FI 4, FC 4, OP 0 or 4)
    Body cycles              ; doing FIR perhaps
    ADD.L   #1, count
    JMP     LOOP
    ENDLOOP:                 ; this is actually a numerical value (an address)

Total cycles = 12 + N * (28 + BodyCycles + 32)
Since (28 + 32) >> 12 (5 times), ignore the start-up cycles even if N is small.

Slide 16 -- Check at end -- 68K CISC loop

    MOVE.L  #0, count        ; (FI 4, FC 8, OP 0)
    JMP     LOOPTEST         ; NOTE the JUMP down to the test
    LOOP:
    Body cycles              ; doing FIR perhaps
    ADD.L   #1, count
    LOOPTEST:
    CMP.L   #N, count
    BLT     LOOP

Total cycles = N * BodyCycles + 44 * (N + 1) plus start-up
Since 44 > 26 (only 1.8 times), you can't ignore the start-up cycles when N is small and BodyCycles is small -- a small loop means an inefficient loop.
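
For reference, these are the C shapes that the two 68K fragments above correspond to (a sketch only; process() stands in for the "Body cycles" and is not an identifier from the slides). Compilers commonly rotate a plain for loop into the second, jump-to-the-test form.

    void process(int i);                 /* stands in for "Body cycles" (assumed) */

    /* Test at the start (slide 15): compare and conditional branch every
     * iteration, plus an unconditional jump back to the top.              */
    void test_at_start(int n)
    {
        for (int count = 0; count < n; count++)
            process(count);
    }

    /* Check at the end (slide 16): one initial jump down to the test, then
     * each trip around pays only the body, increment, compare and branch.  */
    void check_at_end(int n)
    {
        int count = 0;
        if (count < n) {                 /* the initial test reached by the jump */
            do {
                process(count);
                count = count + 1;
            } while (count < n);
        }
    }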

Slide 17 -- Down Count -- 68K CISC loop

    MOVEQ.L #0, array_index        ; (FI 4, FC 0, OP 0)
    MOVE.L  #N, count              ; (FI 4, FC 0, OP 0)
    LOOP:
    Body cycles                    ; using instructions of the form OPERATION (Addreg, Index)
    ADDQ.L  #1, array_index        ; (FI 4, FC 0, OP 0?)
    SUBQ.L  #1, count              ; (FI 4, FC 0, OP 0?)
    LOOPTEST:
    BGT     LOOP

Total cycles = 24 + N * BodyCycles + 20 * (N + 1)   (was 44 * (N + 1))
Since 20 < 24, you can't ignore the start-up cycles if N is small and BodyCycles is small.

Slide 18 -- Down Count -- Possible sometimes

    MOVEA.L #array_start, Addreg   ; (FI 4, FC 0, OP 0)
    MOVE.L  #N, count              ; (FI 4, FC 0, OP 0)
    LOOP:
    Body cycles                    ; using auto-increment mode: OPCODE (Addreg)+
    SUBQ.L  #1, count
    LOOPTEST:
    BGT     LOOP                   ; (FI 4, FC 0, OP 0?)

Total cycles = 24 + N * BodyCycles + 16 * (N + 1)   (was 20 * (N + 1))
Since 16 < 24, you can't ignore the start-up cycles if N is small and BodyCycles is small.
NOTE -- the number of cycles needed in the body of the loop also decreases in this case.

Slide 19 -- Loop efficiency on a CISC processor
- Efficiency depends on how the loop is constructed:
  - Standard while-loop
  - Check at end -- modified do-while
  - Down counting -- with/without auto-incrementing addressing modes
- Compiler versus hand-coded efficiency -- see Embedded System Design magazine, Sept./Oct. 2000; a local copy is available on the ENCM515 web pages
- What happens with different processor architectures?

Slide 20 -- Check at end -- 29K RISC loop

    CONST   count, 0
    JUMP    LOOPTEST
    NOP                            ; delay slot
    LOOP:
    Body cycles                    ; auto-incrementing mode -- NOT AN OPTION ON 29K
    ADDU    count, count, 1
    LOOPTEST:
    CPLE    TruthReg, count, N     ; (1 cycle; should be 2 -- register forwarding)
                                   ; (Boolean truth flag in TruthReg -- which could be any register)
    JMPT    TruthReg, LOOP
    NOP                            ; delay slot

Total cycles = 3 + N * BodyCycles + 4 * (N + 1)
Since 4 is comparable to 3, you can't ignore the start-up cycles if N is small and BodyCycles is small.
Since we are dealing with single-cycle operations, the body cycle count is smaller than on the CISC. This means the loop overhead becomes more problematic the more efficient the processor is.
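
Here is a C rendering of the down-counting, auto-incrementing idea on slides 17 and 18 (a sketch; array, n and process_element() are assumed names, not course identifiers). Counting down lets the flags set by the decrement drive the branch, so no separate compare against N is needed.

    void process_element(float x);        /* stands in for "Body cycles" (assumed) */

    void down_count(float *array, int n)  /* array plays the role of Addreg        */
    {
        float *p = array;                 /* MOVEA.L #array_start, Addreg          */
        int count = n;                    /* MOVE.L  #N, count                     */
        while (count > 0) {
            process_element(*p++);        /* body uses (Addreg)+ auto-increment    */
            count = count - 1;            /* SUBQ.L #1, count sets the flags       */
        }                                 /* BGT LOOP reuses those flags           */
    }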

Slide 21 -- Down Count -- 29K RISC loop

    CONST   index, 0               ; 1 cycle
    JUMP    LOOPTEST               ; 1 cycle
    CONST   count, N               ; in delay slot
    LOOP:
    Body cycles
    SUBU    count, count, 1        ; 1 cycle
    LOOPTEST:
    CPGT    TruthReg, count, 0     ; 1 cycle
    JMPT    TruthReg, LOOP         ; 1 cycle
    ADDS    index, index, 1        ; in delay slot

Total cycles = 3 + N * BodyCycles + 4 * (N + 1)

Slide 22 -- Efficiency on RISC processors
- Not much difference between the test-at-end and down-count loop formats
- HOWEVER the body-cycle count has decreased
- The processor is highly pipelined -- loop jumps cause the pipeline to stall
- Need to take advantage of the delay slots
- Efficiency depends on the DSP algorithm being implemented?
- What about DSP processors? Their architecture is designed for efficiency in this area.

Slide 23 -- Check at end -- ADSP-21K loop

    count = 0;
    number = N;
    JUMP LOOPTEST (DB);            // jump to loop end (delayed branch)
    NOP;
    NOP;
    LOOP:
    Body cycles
    count = count + 1;
    LOOPTEST:
    Comp(count, number);
    IF LT JUMP LOOP (DB);
    NOP;
    NOP;

Total cycles = N * BodyCycles + 5 * (N + 1) plus start-up

Slide 24 -- Speed improvement -- Possible?

    count = 1;
    number = N;
    JUMP LOOPTEST (DB);
    count = count - 1;             // ADJUST
    number = number - 1;
    LOOP:
    Body cycles
    count = count + 1;
    LOOPTEST:
    Comp(count, number);
    IF LT JUMP LOOP (DB);
    count = count + 1;
    NOP;

Total cycles = N * BodyCycles + 4 * (N + 1)

Slide 25 -- Down Count -- ADSP-21K loop

    number = 0;
    JUMP (PC, LOOPTEST) (DB);
    index = 0;
    count = N;
    LOOP:
    Body cycles
    count = count - 1;
    LOOPTEST:
    Comp(count, number);
    IF GT JUMP (PC, LOOP) (DB);
    index = index + 1;
    NOP;

Total cycles = 4 + N * BodyCycles + 5 * (N + 1)

Slide 26 -- Improved Down Count -- ADSP-21K loop
Is the code valid -- or one off in the number of times around the loop? (A quick check of this question follows slide 28 below.)

    number = -1;                   // bias the loop counter (1 cycle)
    JUMP (PC, LOOPTEST) (DB);
    index = 0;
    count = (N - 1);
    LOOP:
    Body cycles
    LOOPTEST:
    Comp(count, number);
    IF GT JUMP (PC, LOOP);
    index = index + 1;
    count = count - 1;

Total cycles = 4 + N * BodyCycles + 4 * (N + 1)

Slide 27 -- Faster loops
Need to go to special features:
- CISC -- a special Test, Conditional Jump and Decrement in one instruction
- RISC -- change the algorithm format
- DSP -- special hardware for loops
  - Maximum of six nested loops (or just 2 on some processors)
  - Can be a hidden trap when mixing C and assembly code

Slide 28 -- Recap -- 68K CISC loop, down count

    MOVEQ.L #0, index              ; (FI 4, FC 0, OP 0)
    MOVE.L  #N, count              ; (FI 4, FC 0, OP 0)
    LOOP:
    Body cycles
    ADDQ.L  #1, index              ; (FI 4, FC 0, OP 0?)
    SUBQ.L  #1, count              ; (FI 4, FC 0, OP 0?)
    LOOPTEST:
    BGT     LOOP

Total cycles = 24 + N * BodyCycles + 20 * (N + 1)
Since 24 is comparable to 20, you can't ignore the start-up cycles if N is small and BodyCycles is small.
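
Slide 26 asks whether biasing the counter (number = -1, count = N-1) leaves the loop one iteration off. One quick host-side way to convince yourself is the C harness below (a sketch, not SHARC code; trips() is an invented name). It mirrors the compare-then-branch structure and assumes the two instructions after the conditional jump execute each time around, i.e. delayed-branch behaviour; under that assumption the body runs exactly N times.

    #include <stdio.h>

    static int trips(int n)
    {
        int number = -1;       /* biased loop "limit"   */
        int count  = n - 1;    /* biased loop counter   */
        int index  = 0;
        int body_runs = 0;

        goto looptest;                    /* JUMP (PC, LOOPTEST)           */
    loop:
        body_runs++;                      /* "Body cycles"                 */
    looptest:
        if (count > number) {             /* Comp + IF GT JUMP             */
            index = index + 1;            /* executed on the way around    */
            count = count - 1;
            goto loop;
        }
        (void)index;                      /* index kept only to mirror the slide */
        return body_runs;
    }

    int main(void)
    {
        for (int n = 0; n <= 4; n++)
            printf("N = %d -> body executed %d times\n", n, trips(n));
        return 0;
    }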

Slide 29 -- Hardware 68K CISC loop

    MOVEQ.L #0, index              ; (FI 4, FC 0, OP 0)
    MOVE.L  #(N-1), count          ; (FI 4, FC 0, OP 0)
    LOOP:
    Body cycles
    ADDQ.L  #1, index              ; (FI 4, FC 0, OP 0?)
    DBCC    count, LOOP

Total cycles = N * BodyCycles + 16 * (N + 1) plus start-up
There is a possibility that the efficiency is almost 100% if the body instructions are small enough to fit into the cache.

Slide 30 -- Custom loop hardware on RISC
- For long loops -- the loop overhead is small -- no need to be concerned about it (unless there is a loop within the loop)
- For small loops -- unroll the loop, so that 20 instructions are hard-coded rather than 1 instruction looped 20 times (see the sketch after slide 32 below)
- For medium loops -- the advantage over CISC is normally that the instructions are more efficient -- 1 cycle compared to many cycles
- For medium loops -- the advantage over DSP is normally that the instructions are more efficient -- 1 RISC cycle compared to 2 DSP cycles (not on the 21K, since it is 1 to 1)
- For more information, see the Micro 1992 articles and the CCI articles

Slide 31 -- 21K processor architecture
(Block diagram of the ADSP-21K processor architecture)

Slide 32 -- Recap -- Improved Down Count -- 21K DSP loop

    number_r1 = -1;
    JUMP (PC, LOOPTEST) (DB);
    index_m4 = 0;
    count_r2 = (N - 1);
    LOOP:
    Body cycles
    LOOPTEST:
    Comp(count_r2, number_r1);     // (1 cycle)
    IF GT JUMP (PC, LOOP);
    index_m4 = index_m4 + 1;
    count_r2 = count_r2 - 1;

Total cycles = N * BodyCycles + 4 * (N + 1) plus start-up
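
As a concrete illustration of the "unroll small loops" advice on slide 30, here is a hedged C sketch of an FIR sum unrolled by four: the branch and counter overhead is paid once per four taps instead of once per tap. The function name and parameters are invented, and the sketch assumes n_taps is a multiple of 4.

    float fir_unrolled_by_4(const float *x, const float *coeff, int n_taps)
    {
        float acc = 0.0f;
        for (int k = 0; k < n_taps; k += 4) {   /* one test/branch per 4 taps */
            acc += x[k]     * coeff[k];
            acc += x[k + 1] * coeff[k + 1];
            acc += x[k + 2] * coeff[k + 2];
            acc += x[k + 3] * coeff[k + 3];
        }
        return acc;
    }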

Slide 33 -- Hardware Loop -- 21K DSP loop

    count_r0 = N;
    count_r0 = PASS count_r0;      // sets the condition codes without doing any math
                                   // (allows a parallel operation)
    IF LE JUMP (PC, PASTLOOP) (DB);
    index = 0;
    nop;
    LCNTR = count_r0, do (PC, PASTLOOP-1) until LCE;   // hardware loop -- 1 cycle -- parallel instruction
    Body cycles
    PASTLOOP:                      // the last cycle of the loop is at location PASTLOOP - 1
    Rest of the program code

Total cycles = N * BodyCycles plus a small start-up, so the efficiency
    N * BodyCycles / (start-up + N * BodyCycles)
approaches 100%.

Slide 34 -- High Speed Loops -- Hardware and Software (section title)

Slide 35 -- DSP hardware loop
Efficiency comes from a number of areas:
- Hardware counter
- No overhead for the decrement
- No overhead for the compare
- Pipelining is efficient -- the processor knows to fetch instructions from the start of the loop, not from past the loop
- Has some problems if the loop size is too small -- the loop timing is longer than expected, as the processor needs to flush the pipeline and restart it

Slide 36 -- Tackled today
- Performing accesses to memory in a loop
- Loop overhead can steal many cycles
- Loop overhead -- depends on implementation:
  - Standard loop with test at the start -- while ()
  - Initial test with test at end -- do-while ()
  - Down-counting loops
- Special efficiencies: CISC -- hardware; RISC -- intelligent compilers; DSP -- hardware
