Survey of the Counterflow Pipeline Processor Architectures

Pic Balaji, Esther Ososanya, Wagdy Mahmoud, Karthik Thangarajan
Department of Electrical Engineering, University of the District of Columbia, Van Ness Campus, Washington D.C., USA

Keywords - Counterflow pipeline, synchronous and asynchronous systems, RISC processors, MIPS architecture.

Abstract - The Counterflow Pipeline Processor (CFPP) Architecture is a RISC-based pipeline processor [1]. It was proposed in 1994 as an asynchronous processor architecture. Recently, researchers have implemented it as a synchronous processor architecture and have improved its design in terms of speed and performance by reducing the average execution latency of instructions and minimizing pipeline stalling. In this paper, we survey the architecture and its key design issues, such as synchronous versus asynchronous implementation, and discuss the advantages and disadvantages of these implementations. We also discuss our research on evaluating the performance of the counterflow pipeline processor architecture against that of the traditional MIPS processor architecture [4].

I. INTRODUCTION

The Counterflow Pipeline Processor (CFPP) Architecture was introduced by Sproull et al. [1] of Sun Microsystems Laboratories as a simple and regular pipeline processor structure. It belongs to the family of RISC processor architectures. The architecture has two pipelines in which the data structures representing instructions and results flow in opposite directions. This allows each instruction and each counterflowing result to interact at every stage. Though the initial architecture was intended for asynchronous implementation, recent research has proposed synchronous implementations of the CFPP architecture. Other variations and modifications to the original design have also been proposed. In this paper we present the original structure of the CFPP architecture, the pipeline rules that govern it, and the way it handles branches and traps.
We also discuss the different implementations of the architecture and their advantages and disadvantages. Finally, we discuss our research to evaluate the performance of the architecture.

II. THE ORIGINAL CFPP

This section covers the original CFPP properties, structure, functional units and sidings, pipeline rules, conditional branches and traps, and advantages and disadvantages.

A. CFPP Properties

The CFPP is a simple and regular structure with the following properties:

Local Control: Only local information decides whether an instruction in the CFPP should advance.

Regularity: The CFPP architecture seeks geometric regularity in the processor chip layout.

Communication: Every stage communicates primarily with its nearest neighbors. This allows for short and fast communication paths.

Modularity: Stages may differ in their computational logic. However, all stages adopt the same communication protocol.

B. CFPP Structure

The original structure of the Counterflow Pipeline Processor (CFPP) Architecture is shown in Figure 1 [2][5]. The CFPP has an instruction fetch unit at one end of the pipeline stages and a register file at the other end. Between these two ends, instructions and results flow in opposite directions. The pipeline through which instructions flow is called the instruction pipeline, and the pipeline through which results flow is called the result pipeline. Pipeline stages operate concurrently, and each stage has an independent function to complete. Alongside the series of pipeline stages, side units perform various arithmetic, logical, and memory operations. These side units are called sidings.
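The two counterflowing packet streams can be pictured with a minimal synchronous sketch in Python (our own illustration, not the paper's design; stage arbitration and packet interaction are omitted):

```python
def step(instr_pipe, result_pipe):
    """Advance both pipelines one stage. Instructions move from the decode
    end (index 0) toward the register file (last index); results move the
    opposite way. Each slot holds a packet name or None."""
    instr_pipe = [None] + instr_pipe[:-1]   # instructions shift toward register file
    result_pipe = result_pipe[1:] + [None]  # results shift toward decode unit
    return instr_pipe, result_pipe

# One instruction enters at the decode end, one result at the register file end.
ip, rp = ["i0", None, None, None], [None, None, None, "r0"]
for _ in range(2):
    ip, rp = step(ip, rp)
assert ip == [None, None, "i0", None]   # instruction has moved two stages up
assert rp == [None, "r0", None, None]   # result has moved two stages down
```

In the real architecture each stage must arbitrate so that every instruction meets every counterflowing result; this sketch only shows the opposing directions of flow.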

Instructions and results flow through the pipeline as packets. Instruction packets fetched from the instruction memory are decoded by the Decode and Fetch unit before being sent to the pipeline stage next to that unit. The register file is one of the sources of the result packets that are sent into the pipeline. The instruction packets in the instruction pipeline and the result packets in the result pipeline interact with each other at every stage. Within each stage, instruction and result packets consist of smaller records called bindings. A binding contains a register address, the register contents, and a 1-bit flag indicating whether or not the content of the register is valid. A typical binding is shown in Figure 2 [1]. Each stage also contains hardware for comparing the addresses of the different bindings to determine whether any information can be exchanged between the instruction and result packets. An instruction packet consists of the instruction operation code (opcode), three bindings, and the program counter value. The first two bindings contain the instruction operands and the third binding contains the result. A result packet consists of two bindings. The interaction of the instruction and result pipelines is one of the key features of the architecture: an instruction packet and a result packet that occupy the same stage at the same time exchange information.

Fig. 2: Instruction / Result Binding [1]

During their interaction, the register addresses of the instruction's operands are compared with the register addresses of the result and, in case of a match, the result value is copied into the register content field of the instruction. This is called a garner operation. Similarly, the result bindings are updated with the instruction's valid destination register values.
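The binding structure and the garner/update interactions just described can be modeled with a short Python sketch (a simplified illustration of our own, not the paper's hardware; the real stages compare all bindings in parallel):

```python
from dataclasses import dataclass

@dataclass
class Binding:
    """One slot of a packet: register address, contents, and a 1-bit valid flag."""
    reg: int
    value: int = 0
    valid: bool = False

def garner(operand: Binding, result: Binding) -> None:
    """Garner: copy a matching valid result value into an invalid operand."""
    if operand.reg == result.reg and result.valid and not operand.valid:
        operand.value, operand.valid = result.value, True

def update(dest: Binding, result: Binding) -> None:
    """Update: copy a matching instruction's valid destination value into the
    result binding, making it visible to subsequent instructions."""
    if dest.reg == result.reg and dest.valid:
        result.value, result.valid = dest.value, True

# An instruction waiting on r1 meets a result packet carrying r1 = 5.
op = Binding(reg=1)                          # invalid operand for r1
res = Binding(reg=1, value=5, valid=True)
garner(op, res)
assert op.valid and op.value == 5
```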
This update operation allows newly computed result values to be available to subsequent instructions even before they are stored in the register file. When an instruction reaches the end of the pipeline, the data value stored in its destination binding is written into the corresponding location in the register file. Until this happens, instructions are considered speculative and may be cancelled in case of a trap or branch.

C. Functional Units and Sidings

Functional units called sidings can be connected to the pipeline. These functional units perform memory, logic, and arithmetic operations. Sidings are connected to the pipeline through launch and return stages, and the sidings themselves are pipelined. A stage of the pipeline that launches an instruction into a siding is a launch stage; the launch action is called a launch sequence. The results from a siding are returned to the processor a few stages later, at the return stage; this is a return sequence. While a siding is performing an operation on the instructions launched into it, the processor may perform another operation simultaneously. Thus sidings allow several operations to progress concurrently. However, sidings need not be part of the architecture; instructions may also be executed in the pipeline stages without using any siding. Typically, instructions with long computation delays (long-latency instructions) are executed in sidings.

D. Pipeline Rules

The pipeline follows a set of execution and matching rules [5][1].

Execution Rules: Four execution rules direct the flow of information between the various stages of the pipeline.

Fig. 1: Counterflow Pipeline Processor Architecture [2][5]

E1. No Overtaking: Instructions cannot deviate from program order in the instruction pipeline, i.e., instructions cannot pass each other.

E2. Execution: An instruction can be executed only if all of its source bindings are valid and it occupies a stage with suitable computing logic. At the end of the instruction's execution, its destination binding flag is marked valid and its destination binding value is filled with the result.

E3. Insert Result: On completing the execution of an instruction, the destination binding is marked valid and one or more copies of it are made for later instructions awaiting that particular value.

E4. Stalling for Operands: No instruction can retire into the register file without being executed. An unexecuted instruction must wait at the last stage with suitable computing logic until it can be executed.

Matching Rules: These rules govern the exchange of bindings between the instruction and result packets occupying the same stage at the same time.

M1. Garner Instruction Operands: When a valid result binding matches an invalid instruction operand binding, replace the operand value with the result value and mark the operand binding valid.

M2. Kill Result: When an invalid destination binding matches a valid result binding, mark the result binding invalid.

M3. Update Results: When a valid destination binding matches a result binding, copy the destination value into the result value and mark it valid.

E. Conditional Branches and Traps

One of the features of the CFPP architecture is the way it handles traps and branch instructions. A single-bit identifier in the instruction and result bindings of the pipelines helps in handling branches and traps efficiently. The CFPP usually predicts that a conditional branch will not be taken. In the case of a trap or a wrongly-predicted branch, a specially-marked result (a poison pill) travels down the result pipeline invalidating all instructions in the pipeline after the trapping or wrongly-predicted instruction (the kill instruction).
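This invalidation sweep can be sketched as follows (a simplified model of our own; in the real pipeline the pill is a specially-marked result binding, not a separate flush step):

```python
def flush_with_poison_pill(pipeline, kill_pc):
    """Model the poison pill sweeping the pipeline: every instruction fetched
    after the trapping/mispredicted one (kill_pc) is cancelled.
    `pipeline` is ordered oldest-first; each entry is (pc, opcode)."""
    survivors = [(pc, op) for pc, op in pipeline if pc <= kill_pc]
    cancelled = [(pc, op) for pc, op in pipeline if pc > kill_pc]
    return survivors, cancelled

# A branch at pc = 4 was predicted not-taken but is actually taken.
keep, dead = flush_with_poison_pill(
    [(0, "add"), (4, "beq"), (8, "lw"), (12, "sub")], kill_pc=4)
assert keep == [(0, "add"), (4, "beq")]
assert dead == [(8, "lw"), (12, "sub")]
```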
When the poison pill reaches the end of the result pipeline, it is intercepted by the stage responsible for program counter control (the decode unit). In the case of a trap, the address of the trap handler is loaded into the program counter. Similarly, in the case of a wrongly-predicted branch, the branch target address is loaded into the program counter. Thus the architecture can recover from erroneous branch predictions and supports precise interrupts.

F. Advantages and Disadvantages

The CFPP has several advantages and disadvantages.

CFPP Design Advantages

1. Speculative Execution: This is one of the most important advantages of the CFPP. Branches are predicted at the beginning of the pipeline, and a single-bit identifier helps in handling branch predictions and traps.

2. Out-of-order Execution: The CFPP can execute instructions out of order in different stages of the pipeline at the same time.

3. Asynchronous vs. Synchronous: The CFPP was primarily designed as an asynchronous processor, but it can also be implemented as a synchronous processor. In a synchronous implementation [7], (a) instructions can move only one stage, and only at the clock signal; (b) the speed of the clock is determined by the slowest stage in the pipeline; and (c) power consumption is usually higher than for an asynchronous implementation. In an asynchronous implementation [8], instruction execution is event-triggered, so an instruction can move to the next stage as soon as it is able, minimizing pipeline stalling.

4. Others: The CFPP design exhibits instruction-level parallelism [6] features such as super-pipelining, thereby improving execution speed.

CFPP Design Disadvantages

Enforcing the pipeline matching rules may be expensive. As the design involves two pipelines along with sidings, it may use more chip area. Average execution latency increases as the number of stages in the pipeline increases.
The design may also introduce delays, such as the time between an instruction's issue and the acquisition of all its operands. Though the CFPP provides register renaming, data forwarding, and a simple, efficient implementation of interrupt and branch handling, its performance may suffer because pipeline stalls are more likely. For example, consider a dependent memory instruction following an add instruction. In a conventional pipeline, the add operation would be performed and its result stored in a register; the memory operation could then use the result for its execution. In the CFPP, if the memory instruction reaches its launch stage before the add instruction has produced its result, it is stalled until the add operation has been performed, and it must then pass through the remaining pipeline stages before it can be executed. Due to the instruction advancement rules, the maximum throughput of the pipeline is achieved when it is half full. As pipeline stalls are more probable in the CFPP than in conventional processors, a very efficient compiler that can resolve some of the data dependencies is highly recommended.
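The stalling behavior in the example above follows from rules E1 and E4. A toy synchronous sketch (our own simplification; stage and siding details omitted) shows how one waiting instruction blocks everything behind it:

```python
def advance(stages, executed, last_exec):
    """One step of the instruction pipeline under rules E1 and E4.
    stages: list of instruction names (or None), index 0 = decode end.
    executed: set of instructions whose result is already computed.
    last_exec: index of the last stage with suitable computing logic;
    an unexecuted instruction must wait there (rule E4)."""
    for i in range(len(stages) - 2, -1, -1):   # move front-most instructions first
        instr = stages[i]
        if instr is None or stages[i + 1] is not None:
            continue                            # E1: no overtaking
        if i == last_exec and instr not in executed:
            continue                            # E4: stall until it can execute
        stages[i + 1], stages[i] = instr, None
    return stages

# "lw" has not executed yet, so it stalls at the last execute stage (index 2)
# and, by rule E1, "sub" behind it stalls too.
pipe = advance([None, "sub", "lw", None, None], executed={"sub"}, last_exec=2)
assert pipe == [None, "sub", "lw", None, None]   # nothing moved
pipe = advance(pipe, executed={"sub", "lw"}, last_exec=2)
assert pipe == [None, None, "sub", "lw", None]   # both advance once lw executes
```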

III. ADVANCES IN SYNCHRONOUS IMPLEMENTATION

Researchers at Oregon State University [2][3] explored possibilities for a synchronous implementation of the counterflow pipeline processor. Janik et al. [2] first attempted to design a general synchronous pipeline structure called the Virtual Register Processor (VRP) [3]. Miller et al. [9] identified three conditions under which pipeline stalls occur in counterflow processors. First, an instruction that requires registers that have not been used before must travel up half the pipeline before it can obtain the operand values from the register file. Second, since an instruction stays in the pipeline until it reaches the register file, a stall in any intermediate instruction can stall all subsequent instructions. Third, dependencies between the instructions being issued must be resolved in order to issue more than one instruction per cycle. The first problem was resolved by arranging the register file on the same side of the pipeline as the decode unit. To overcome the second problem, the VRP was further modified into the VRP+ processor [9] by adding a reorder buffer (ROB). The basic architecture of the VRP+ is shown in Fig. 3. By wrapping the instruction pipeline back onto itself, pipeline stalling was minimized: as there are always sidings capable of executing further down the pipeline, instructions that have not executed do not stall. Also, as there is no last siding, the need to check dependencies between concurrently issued instructions is eliminated.

IV. APPLICATION-SPECIFIC COUNTERFLOW PROCESSORS

Features of the CFPP such as its simple and regular structure, modularity, local control, and inherent handling of complex mechanisms such as register renaming and speculative execution led Childers et al. [10] to target this architecture for Application-Specific Instruction-set Processors (ASIPs).
ASIPs use a minimal instruction set and minimal micro-architecture elements, give good performance at low cost, and hence are widely used in embedded systems. Childers designed a counterflow pipeline [11] customized to a kernel loop. He modified the original counterflow pipeline into a very long instruction word (VLIW) architecture called the wide counterflow pipeline (WCFP) [13][14][15] to exploit ILP in kernel loops, and demonstrated that CFPPs are appropriate for constructing application-specific processors. The WCFPs had low design complexity while achieving performance comparable to general-purpose architectures. The custom WCFPs had an instruction width of 4, with memory, multiplier, and divider sidings. The simulations showed that the custom pipeline could achieve speed-ups with an average speed-up of 3.7 for several benchmarks (fir, k1, k5, k7, k12, gsm, dither, dot, dct, and mexp). Through these works [10][11][12] he showed that the CFPP is a flexible target for high-level synthesis of application-specific processors. Childers et al. [6] determined that the speedup (Fig. 4) of an asynchronous CFPP could reach up to 6 times that of a synchronous general-purpose pipeline processor. He attributed the improved speedup of the asynchronous CFPP to the following reasons:

- Custom CFPPs eliminate resource contention, as they are tailored to the resource requirements of the graphs.
- The stages are arranged to minimize the latency of conveying source operands.
- They achieve average-case execution time.

Fig. 3: Basic VRP+ Architecture [9]

Fig. 4: Speedup of custom asynchronous CFPPs over general-purpose synchronous CFPPs [6]

V. EVALUATION OF THE COUNTERFLOW PROCESSOR

Though these research efforts have brought out the salient features of the counterflow processor, we could not find any design that evaluates CFPP performance against that of a conventional pipeline processor using standard metrics. We are currently working on a synchronous implementation and an evaluation of the performance of the counterflow pipeline processor against a traditional MIPS pipeline processor. As the MIPS processor [4] was one of the first RISC processors proposed, we chose it as the baseline for evaluating the performance of the CFPP architecture. The research uses Xilinx Foundation CAD tools, and the designs are coded in VHDL. Both processors are implemented on the same FPGA device (VIRTEX), and performance parameters such as speed, average CPI (clocks per instruction), number of logic cells and system gates, and performance-to-cost ratio are to be evaluated. Both processors are implemented to execute the same chosen instruction set. Our research also includes a comparison and evaluation of the performance of a branch prediction technique (BTB-HIPT) for the traditional MIPS pipeline processor architecture and the CFPP architecture.

VI. CONCLUSION

This paper presents a novel survey of the different works on the counterflow pipeline architecture. Since the architecture was proposed in 1994, there have been several research efforts and simulations on it. The simulations claim that the counterflow pipeline processor would prove better than conventional processors. It can be implemented both synchronously and asynchronously. Dr. Childers chose the CFPP over other architectures for ASIPs and embedded systems. However, there has not yet been any asynchronous implementation of the processor. It remains a statement that, if implemented successfully, the Counterflow Pipeline Processor will be the first asynchronous processor [5].
We hence conclude with the expectation that our research on a synchronous implementation will identify the advantages and disadvantages of the counterflow pipeline processor (CFPP) architecture over the traditional pipeline processor architecture.

VII. ACKNOWLEDGEMENT

The authors gratefully acknowledge Dr. Ken Currie, Center for Manufacturing Research, Tennessee Technological University, Cookeville, TN, for supporting their research.

VIII. REFERENCES

[1] R.F. Sproull, I.E. Sutherland, and C.E. Molnar, "Counterflow Pipeline Processor Architecture," IEEE Design and Test of Computers, Fall 1994, Vol. 11, No. 3.
[2] K.J. Janik and S. Lu, "Synchronous Implementation of a Counterflow Pipeline Processor," IEEE International Symposium on Circuits and Systems, 1996, Vol. 4.
[3] K.J. Janik, S. Lu, and M.F. Miller, "Advances of the Counterflow Pipeline Microarchitecture," Third Symposium on High Performance Computer Architecture, Feb. 1997.
[4] J.L. Hennessy and D.A. Patterson, Computer Architecture - Hardware Software Co-design, Morgan Kaufmann Publications.
[5] M.D. Jones, "A New Approach to Microprocessors," Department of Computer Science, Brigham Young University, Utah.
[6] B. Childers and J. Davidson, "Application-Specific Pipelines for Exploiting Instruction-Level Parallelism," University of Virginia, Technical Report No. CS-98-14, May 1.
[7] C.H. Van Berkel, M.B. Josephs, and S.M. Nowick, "Scanning the Technology: Applications of Asynchronous Circuits," Proceedings of the IEEE, Feb. 1999, Vol. 87, No. 2.
[8] A. Davis and S.M. Nowick, "An Introduction to Asynchronous Circuit Design," Technical Report UUCS, University of Utah.
[9] M.F. Miller, K.J. Janik, and S. Lu, "Non-Stalling Counterflow Architecture," Fourth Symposium on High Performance Computer Architecture, Las Vegas, NV, Feb. 1998.
[10] B. Childers, "Custom Embedded Counterflow Pipelines," Ph.D. Thesis, University of Virginia, Charlottesville, Virginia, Jan.
[11] B. Childers and J. Davidson, "Automatic Counterflow Pipeline Synthesis," University of Virginia, Technical Report No. CS-98-01, January.
[12] B. Childers and J. Davidson, "A Design Environment for Counterflow Pipeline Synthesis," University of Virginia, Technical Report No. CS-98-05, March.
[13] B. Childers and J. Davidson, "An Infrastructure for Designing Custom Embedded Counterflow Pipelines," Hawaii International Conference on System Sciences, Maui, Hawaii, January 3-7, 2000.
[14] B. Childers and J. Davidson, "Architectural Considerations for Application-Specific Counterflow Pipelines," Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI, Atlanta, Georgia, March 1999.
[15] B. Childers and J. Davidson, "Automatic Design of Custom Wide-Issue Counterflow Pipelines," University of Virginia, CS Technical Report, January.


More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

CIS 371 Spring 2010 Thu. 4 March 2010

CIS 371 Spring 2010 Thu. 4 March 2010 1 Computer Organization and Design Midterm Exam Solutions CIS 371 Spring 2010 Thu. 4 March 2010 This exam is closed book and note. You may use one double-sided sheet of notes, but no magnifying glasses!

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

Non-Stalling CounterFlow Architecture

Non-Stalling CounterFlow Architecture Non-Stalling CounterFlow Architecture Michael F. Miller, Kennneth J. Janik, and Shih-Lien Lu: mikem@ichips.intel.com, kjanik@icihps.intel.com, sllu@ece.orst.edu Dept of Electrical and Computer Engineering,

More information

Chapter 9. Pipelining Design Techniques

Chapter 9. Pipelining Design Techniques Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

A Study for Branch Predictors to Alleviate the Aliasing Problem

A Study for Branch Predictors to Alleviate the Aliasing Problem A Study for Branch Predictors to Alleviate the Aliasing Problem Tieling Xie, Robert Evans, and Yul Chu Electrical and Computer Engineering Department Mississippi State University chu@ece.msstate.edu Abstract

More information

The counterow pipeline processor architecture (cfpp) is a proposal for a family of microarchitectures

The counterow pipeline processor architecture (cfpp) is a proposal for a family of microarchitectures Counterow Pipeline Processor Architecture Robert F. Sproull Ivan E. Sutherland Sun Microsystems Laboratories, Inc. Charles E. Molnar Institute for Biomedical Computing Washington University SMLI TR-94-25

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 05

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

Instructional Level Parallelism

Instructional Level Parallelism ECE 585/SID: 999-28-7104/Taposh Dutta Roy 1 Instructional Level Parallelism Taposh Dutta Roy, Student Member, IEEE Abstract This paper is a review of the developments in Instruction level parallelism.

More information

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang

More information

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS Advanced Computer Architecture- 06CS81 Hardware Based Speculation Tomasulu algorithm and Reorder Buffer Tomasulu idea: 1. Have reservation stations where register renaming is possible 2. Results are directly

More information

A Mechanism for Verifying Data Speculation

A Mechanism for Verifying Data Speculation A Mechanism for Verifying Data Speculation Enric Morancho, José María Llabería, and Àngel Olivé Computer Architecture Department, Universitat Politècnica de Catalunya (Spain), {enricm, llaberia, angel}@ac.upc.es

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

CS Mid-Term Examination - Fall Solutions. Section A.

CS Mid-Term Examination - Fall Solutions. Section A. CS 211 - Mid-Term Examination - Fall 2008. Solutions Section A. Ques.1: 10 points For each of the questions, underline or circle the most suitable answer(s). The performance of a pipeline processor is

More information

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM Mansi Jhamb, Sugam Kapoor USIT, GGSIPU Sector 16-C, Dwarka, New Delhi-110078, India Abstract This paper demonstrates an asynchronous

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Design of Out-Of-Order Superscalar Processor with Speculative Thread Level Parallelism

Design of Out-Of-Order Superscalar Processor with Speculative Thread Level Parallelism ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination 1 Student name: Date: June 26, 2008 General requirements for the exam: 1. This is CLOSED BOOK examination; 2. No questions allowed within the examination period; 3. If something is not clear in question

More information

Design and Implementation of a FPGA-based Pipelined Microcontroller

Design and Implementation of a FPGA-based Pipelined Microcontroller Design and Implementation of a FPGA-based Pipelined Microcontroller Rainer Bermbach, Martin Kupfer University of Applied Sciences Braunschweig / Wolfenbüttel Germany Embedded World 2009, Nürnberg, 03.03.09

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

Introduction to CPU Design

Introduction to CPU Design ١ Introduction to CPU Design Computer Organization & Assembly Language Programming Dr Adnan Gutub aagutub at uqu.edu.sa [Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers]

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations? Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined

More information

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1 Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]

More information

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Enhancement of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture Sushmita Bilani Department of Electronics and Communication (Embedded System & VLSI Design),

More information

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..

More information

Techniques for Efficient Processing in Runahead Execution Engines

Techniques for Efficient Processing in Runahead Execution Engines Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt Depment of Electrical and Computer Engineering University of Texas at Austin {onur,hyesoon,patt}@ece.utexas.edu

More information

PIPELINE AND VECTOR PROCESSING

PIPELINE AND VECTOR PROCESSING PIPELINE AND VECTOR PROCESSING PIPELINING: Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014

Computer Architecture Lecture 15: Load/Store Handling and Data Flow. Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 18-447 Computer Architecture Lecture 15: Load/Store Handling and Data Flow Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 2/21/2014 Lab 4 Heads Up Lab 4a out Branch handling and branch predictors

More information

Control Hazards. Prediction

Control Hazards. Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies VLSI IMPLEMENTATION OF HIGH PERFORMANCE DISTRIBUTED ARITHMETIC (DA) BASED ADAPTIVE FILTER WITH FAST CONVERGENCE FACTOR G. PARTHIBAN 1, P.SATHIYA 2 PG Student, VLSI Design, Department of ECE, Surya Group

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

Department of Computer Science and Engineering

Department of Computer Science and Engineering Department of Computer Science and Engineering UNIT-III PROCESSOR AND CONTROL UNIT PART A 1. Define MIPS. MIPS:One alternative to time as the metric is MIPS(Million Instruction Per Second) MIPS=Instruction

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Electronics Engineering, DBACER, Nagpur, Maharashtra, India 5. Electronics Engineering, RGCER, Nagpur, Maharashtra, India.

Electronics Engineering, DBACER, Nagpur, Maharashtra, India 5. Electronics Engineering, RGCER, Nagpur, Maharashtra, India. Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Design and Implementation

More information