ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation


Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang

Abstract

Binary translation is an important component of translating virtual machines. ABI virtual machines such as FX!32 [1], SUN Wabi [2], and SHADE [3] use a binary translator to translate application binaries whose ISA differs from that of the hardware platform, so that they can be executed on that platform. Some systems also use a binary translator as a component of dynamic optimization. In this paper, we study the effectiveness of binary translation applied to two ISAs that have different register file configurations. We chose the MIPS instruction set, which has a flat register file, as the source ISA for its simplicity and generality, and the CRAY-2 instruction set, which has a hierarchical register file structure, as the target ISA.

1. Introduction

In VM architecture design, emulation is a very important method for enabling a (sub)system to present the same interface and characteristics as another system. By executing and tracing programs, an emulator can help build better computer hardware and software. As one way of implementing an emulator, binary translation is not only efficient for repeated instruction executions but can also solve the software compatibility problem, which is becoming more and more serious. For example, one of the major impediments to adopting a VLIW (or any new ILP machine architecture) has been its inability to run existing binaries of established architectures.

Code scheduling plays an important role in increasing the ILP available in a program. However, aggressive scheduling techniques have high register requirements. In addition, aggressive processor configurations tend to increase the number of registers required by software-pipelined loops. The flat register file organization traditionally used in microprocessor design does not scale well when the register capacity and the number of ports required to access it are high. The authors of [4] present an alternative register file design for future aggressive VLIW processors, a 2-level hierarchical register file, which combines high capacity and a high number of access ports with low access time.

Higher capacity reduces spill code and allows the software to perform aggressive code scheduling and optimization. Our goal is to develop a binary translator that maps a flat register file configuration to a hierarchical register file configuration, and to obtain the characteristics and some initial requirements for translating between the two platforms.

This paper is organized in the following manner. The next section discusses our implementation of the major modules in the project. Section 3 presents our experimental setup and results. Section 4 discusses related work, and finally, we present our conclusions and future work in Section 5.

2. Methodology

2.1 Overview

Figure 1 shows an overview of the components that we developed for this project.

- The compiler that compiles the Java or C benchmarks to MIPS assembly code is already available. In this project we use a simple Java compiler which can compile limited Java programs to MIPS assembly code. The language features supported by this compiler include integer operations, logic operations, function calls (which may be recursive), variable comparison, string variables, result output (printf), variable assignment, branch statements (if... else...), and while loops. Although it only supports a subset of the Java or C language, we think it is sufficient because our focus is on binary translation.
- We developed simple in-order simulators for MIPS and CRAY-2 in order to obtain profile information, as well as statistics for comparison between the two platforms. These simulators also serve as correctness verification.
- The parser assembles the MIPS assembly code into binary format and writes the resulting binary to a file (a small encoding sketch follows this list).
- The translator translates the input MIPS binary into a CRAY-2 binary file, and performs optimizations during the translation. The translator is the main emphasis of this project and will be discussed in detail.
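As a concrete illustration of the parser's job, the sketch below encodes one R-type and one I-type MIPS instruction into 32-bit words. The field layout follows the standard MIPS encoding; the helper functions and register table are illustrative, not our parser's actual interface.

    # Minimal sketch: assembling two MIPS instructions into 32-bit words.
    # Opcode/funct values follow the standard MIPS32 encoding; the helpers
    # and register table here are illustrative, not our parser's API.

    REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10}

    def encode_r(funct, rd, rs, rt):
        # R-type: opcode=0 | rs | rt | rd | shamt=0 | funct
        return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | funct

    def encode_i(opcode, rt, rs, imm):
        # I-type: opcode | rs | rt | 16-bit immediate
        return (opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)

    # add  $t0, $t1, $t2  ->  R-type, funct 0x20
    # addi $t0, $t1, 4    ->  I-type, opcode 0x8
    words = [encode_r(0x20, "$t0", "$t1", "$t2"), encode_i(0x8, "$t0", "$t1", 4)]
    with open("benchmark.bin", "wb") as f:
        for w in words:
            f.write(w.to_bytes(4, "big"))  # each MIPS instruction is 4 bytes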

Compiler -> Parser -> Translator -> MIPS Simulator / CRAY-2 Simulator -> Correctness verification and performance comparison

Figure 1: Overview of the components in the MIPS to CRAY-2 binary translator

2.2 Instruction set

MIPS instructions have a fixed length of 4 bytes. The architecture has 34 registers: besides the 32 general-purpose registers, LO and HI are special registers that hold the 64-bit results of multiplication and division instructions. The CRAY-2 instruction set has variable-length instructions and uses word addressing. It has 8 first-level registers and 64 second-level registers. Due to time constraints, we implement only a subset of the MIPS and CRAY-2 instruction sets, and we chose only user-level instructions. Appendix A gives detailed information about our source and target instruction sets.

2.3 Binary translator

Our translator is a static translator. We have 3 levels of translator. Level 1 is a baseline translator with no optimization. The level 2 translator enhances the baseline translator by constructing basic blocks and adding a register allocation technique to reduce spills to the second-level registers. The level 3 translator enhances level 2 by constructing superblocks and applying various superblock optimization techniques. Appendix B gives detailed information on the mapping from MIPS instructions to CRAY-2 instructions.
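Before detailing the three levels, the two register organizations the translator mediates between can be summarized as follows. This is a minimal sketch of how our simulators model them; the class and method names are ours, not the simulators' actual source.

    # Minimal sketch of the two register models described in Section 2.2.
    # Names are illustrative; the simulators' internal structures may differ.

    class MipsRegs:
        # Flat file: 32 GPRs plus the special LO/HI pair.
        def __init__(self):
            self.gpr = [0] * 32
            self.lo = 0   # low word of mul/div results
            self.hi = 0   # high word of mul/div results

    class Cray2Regs:
        # Two-level hierarchy: 8 fast first-level and 64 second-level
        # registers; operations work on level 1, level 2 is reached by moves.
        def __init__(self):
            self.l1 = [0] * 8
            self.l2 = [0] * 64

        def move_up(self, l1_idx, l2_idx):    # inter-level move: L2 -> L1
            self.l1[l1_idx] = self.l2[l2_idx]

        def move_down(self, l2_idx, l1_idx):  # inter-level move: L1 -> L2
            self.l2[l2_idx] = self.l1[l1_idx]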

2.3.1 Level 1 translator

Since the CRAY-2 has more level-2 registers than MIPS has registers, all MIPS registers are identically mapped to CRAY-2 level-2 registers. A level-2 register is loaded into a level-1 register whenever it is used, and the result is written back to a level-2 register as soon as it is generated. The MIPS instructions are scanned and converted into CRAY-2 instructions; the resulting CRAY-2 instructions are then scanned again to fix all the links of control-transfer instructions. This scheme allows us to do the translation statically, without any runtime information, but it is not very efficient.

2.3.2 Level 2 translator

We construct control flow graphs of basic blocks, where each basic block contains several MIPS instructions. A new basic block is started whenever we encounter a branch or direct jump whose target basic block has not yet been constructed. A new graph is started whenever the translator encounters a Jump And Link instruction (i.e., a call). Graph construction terminates at an indirect jump (i.e., a return), at a direct jump or branch whose target has already been constructed, and at the end of the program. After all graphs are constructed, a graph-coloring algorithm is applied to allocate the MIPS registers to the 8 level-1 registers with minimal register spills. The translator then walks through the graphs, converts MIPS instructions to CRAY-2 instructions, and adjusts all the links of control-transfer instructions. Because we translate the binary code statically and assume all indirect jumps occur only when functions return, we do not need to handle them specially.
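A minimal sketch of that allocation step: build an interference graph over the MIPS registers and greedily color it with the 8 level-1 registers, spilling to level 2 when no color is free. This is a simplification of classic graph coloring; our implementation's details (liveness analysis, spill heuristics) are not shown.

    # Greedy graph-coloring sketch: map MIPS registers to 8 level-1 registers.
    # Assumes an interference graph already built from liveness information;
    # registers that cannot be colored are spilled to level-2 registers.

    NUM_L1 = 8

    def color_registers(interference):
        # interference: dict mapping reg -> set of regs live at the same time
        colors, spills = {}, []
        # Color the most constrained (highest-degree) registers first.
        for reg in sorted(interference, key=lambda r: -len(interference[r])):
            taken = {colors[n] for n in interference[reg] if n in colors}
            free = [c for c in range(NUM_L1) if c not in taken]
            if free:
                colors[reg] = free[0]
            else:
                spills.append(reg)  # keep in a level-2 register; load on use
        return colors, spills

    # Toy example: $t0 interferes with $t1 and $t2.
    g = {"$t0": {"$t1", "$t2"}, "$t1": {"$t0"}, "$t2": {"$t0"}}
    print(color_registers(g))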

2.3.3 Level 3 translator

After transforming the MIPS instructions into graphs of basic blocks and applying the graph-coloring algorithm, superblocks [5] are generated based on the flow graph of basic blocks and the edge profile information obtained by the simulator. We generate the superblocks conservatively, i.e., we do not duplicate a basic block when the downstream block is the target of more than one block. Beyond this, the criterion for enlarging a superblock is that the ratio of execution frequencies between its two paths exceeds 2.0. If both the conservative condition and the ratio requirement are met, we enlarge the superblock along the path with the higher execution frequency. An unconditional branch is removed and its target block directly merged if the conservative condition is met. After the superblocks are generated, now with a bigger scope, the translator walks through all the superblock graphs and converts MIPS instructions to CRAY-2 instructions. Many code optimization techniques can then be applied to the CRAY-2 superblocks. As our translator is static, a couple of optimizations such as code sinking, peephole optimization, and dead code elimination are used at this level; a sketch of the superblock-growing heuristic appears at the end of this subsection.

Figure 2 summarizes the relationships between the three translation levels:

Figure 2: Translator diagram. The level 1 (natural) translator emits the binary CRAY-2 sequence directly; level 2 adds basic blocks and register coloring; level 3 adds superblocks plus code sinking, dead code elimination, and peephole optimization.

As compared in Section 3, each level offers different performance and tradeoffs.
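The sketch below grows a superblock along hot edges, assuming edge counts from the simulator's profile; function and field names are illustrative, not our translator's actual code.

    # Sketch of conservative superblock growth (Section 2.3.3 heuristic).
    # succ:  dict block -> list of successor blocks
    # count: dict (src, dst) -> execution count (edge profile from simulator)
    # preds: dict block -> number of predecessor blocks

    RATIO = 2.0

    def grow_superblock(seed, succ, count, preds):
        trace = [seed]
        block = seed
        while True:
            nexts = succ.get(block, [])
            if not nexts:
                break
            best = max(nexts, key=lambda b: count.get((block, b), 0))
            hot = count.get((block, best), 0)
            cold = sum(count.get((block, b), 0) for b in nexts if b != best)
            if hot == 0 or best in trace:
                break  # never-taken edge, or a cycle: stop growing
            if preds.get(best, 0) > 1:
                break  # conservative condition: join point, no duplication
            if len(nexts) > 1 and hot < RATIO * cold:
                break  # frequency ratio between the two paths must exceed 2.0
            trace.append(best)
            block = best
        return trace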

2.4 Simulator

For a better and more precise comparison between the code before and after our translation, we constructed two simple simulators, one for MIPS and the other for CRAY-2. Each simulator supports only in-order execution with a single functional unit. The simulators can generate edge profile information along with other statistics, such as the total number of instructions simulated, IPC, the total number of instruction bytes simulated, the total number of load and store instructions, and the total number of inter- and intra-level register moves. These simulators also serve as correctness verification.

3. Experimental setup and results

Since we only translate a subset of the MIPS instruction set, we use the benchmark generated by our compiler. This test benchmark contains approximately 1850 MIPS assembly instructions and performs different tasks, e.g., a recursive Fibonacci algorithm, a dot product, a tight mathematical loop, and a nested loop. The benchmark runs for more than one million cycles before it finishes, which is sufficient to evaluate our translator at its different levels. The register configuration used for both simulators is as follows.

                                           MIPS simulator    CRAY-2 simulator
    Number of first-level registers              32                  8
    Number of second-level registers              0                 64
    Memory and register latency (cycles)          1                  1

Table 1: Register configuration for both simulators

We generate the MIPS binary of the benchmark, translate it with each translator (each applying a different level of optimization), run the results on the simulators, and compare the standard output to verify correctness.
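The statistics reported in the figures below come from simple counters in the simulators (Section 2.4). A minimal sketch of that bookkeeping, with an illustrative instruction record rather than the simulators' real types:

    # Minimal sketch of the statistics the in-order simulators collect.
    from collections import Counter, namedtuple

    Inst = namedtuple("Inst", "kind size next")  # next: next pc, None at exit

    class Stats:
        def __init__(self):
            self.instructions = 0
            self.bytes = 0
            self.cycles = 0
            self.loads = self.stores = 0
            self.inter_level_moves = 0    # L1 <-> L2 register move count
            self.edges = Counter()        # (pc, target) -> count, for profiles

    def run(program, stats):
        pc = 0
        while pc is not None:
            inst = program[pc]
            stats.instructions += 1
            stats.bytes += inst.size      # 4 for MIPS; 2 or 4 for CRAY-2
            stats.cycles += 1             # single in-order functional unit
            if inst.kind == "load":
                stats.loads += 1
            elif inst.kind == "store":
                stats.stores += 1
            elif inst.kind == "move12":
                stats.inter_level_moves += 1
            if inst.kind == "branch":
                stats.edges[(pc, inst.next)] += 1  # edge profile
            pc = inst.next

    s = Stats()
    run([Inst("load", 4, 1), Inst("branch", 4, None)], s)
    print(s.instructions, s.cycles, s.loads, dict(s.edges))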

The following figures summarize the results of our experiments.

Figure 4 shows the code sizes of the binary files. The code size increases after translation, for two reasons. First, even though most CRAY-2 instructions consume only 2 bytes while every MIPS instruction consumes 4 bytes, the most frequently used MIPS instructions, such as ADDI, ORI, LUI, LW, and SW, require more than one CRAY-2 instruction to emulate. An important drawback of the CRAY-2 instruction set is that its arithmetic and memory operations can only take register operands and cannot operate on immediate values directly. So to translate these instructions we need at least one extra instruction to load the immediate value into a temporary register, which is a 4-byte instruction and thus increases the code size. Also, some MIPS instructions have no directly corresponding CRAY-2 instruction, so we need more than one instruction to achieve the same result. Second, the small number of level-1 registers in the CRAY-2 architecture introduces overhead because extra inter-level register move instructions are required, each consuming 4 bytes. Therefore, the code and byte-code expansion are quite high (an illustrative translation appears at the end of this discussion).

Figure 4: Comparison of code size in bytes for the MIPS binary and the binaries generated by the level 1, level 2, and level 3 translators.

From Figure 4, the code generated by the natural (level 1) translator is about 4 times the source code size, mainly because of the huge number of data transfers between the two levels of registers. The code size of the natural translator can be reduced sharply by graph-coloring register allocation that takes advantage of the first-level registers: in our benchmark the code size shrinks by a factor of about 2.2 when level 2 optimization is applied. Level 3 optimization gives a slightly smaller code size than level 2 due to the removal of some redundant code.
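To make the expansion concrete, here is a hypothetical level-1 translation of a single MIPS instruction, written as a Python table for illustration. The CRAY-2-style mnemonics and register names are ours, not the exact Appendix B encoding.

    # Hypothetical level-1 translation of: addi $t0, $t1, 4
    # MIPS regs map identically onto level-2 registers (say $t1 -> S9,
    # $t0 -> S8); arithmetic happens in level-1 registers (A0, A1), and
    # immediates must first be materialized. Mnemonics are illustrative.

    translation = [
        ("move",  "A0", "S9"),   # load level-2 copy of $t1 into level 1
        ("loadi", "A1", 4),      # materialize the immediate (4-byte form)
        ("add",   "A0", "A1"),   # A0 <- A0 + A1
        ("move",  "S8", "A0"),   # write result back to level-2 copy of $t0
    ]

    # One 4-byte MIPS instruction became four CRAY-2 instructions: this is
    # the inter-level move and immediate overhead measured in Figure 4.
    for op in translation:
        print(*op)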

Figure 5 shows the simulation cycle results. Because the simulation cycle count is proportional to execution time at the same clock frequency, it is an ideal metric for performance measurement. From Figure 5 we can see that the performance of the code generated by the natural translator degrades by approximately 4 times compared with the MIPS code, due to translation overhead. When level 2 optimization is applied, the performance improves substantially due to the reduction in inter-level register move instructions, as shown in Figures 6 and 7.

Figure 5: Comparison of simulation cycles for the MIPS binary and the binaries generated by the level 1, level 2, and level 3 translators.

The optimizations used at level 3 do not yield a very big improvement in code size or execution time, for two reasons: 1) The superblocks we form are too conservative, which limits their size and the scope for optimization. This is also why the code size does not increase after superblock formation. 2) Optimization schemes such as peephole optimization, dead code elimination, and code sinking are not powerful enough to reduce both size and execution time. Because peephole optimization has already been performed on the MIPS code, translation introduces few new opportunities for it. Code sinking does not reduce code size, but it can reduce execution time. In fact, we believe inlining would help performance considerably, but due to time limitations we did not implement and evaluate it.
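For reference, a minimal sketch of a liveness-based dead code elimination over one superblock, as a generic backward pass. It assumes each instruction lists the registers it defines and uses; our actual pass may differ in details.

    # Generic backward dead-code elimination over a straight-line superblock.
    # Each instruction is (defs, uses, side_effect); stores, branches, and
    # calls are marked side-effecting so they are never removed.

    def eliminate_dead_code(block, live_out):
        # block: list of (defs, uses, side_effect); live_out: regs live at exit
        live = set(live_out)
        kept = []
        for defs, uses, side_effect in reversed(block):
            if side_effect or any(d in live for d in defs):
                kept.append((defs, uses, side_effect))
                live -= set(defs)
                live |= set(uses)
            # else: result never read, no side effect -> instruction is dead
        kept.reverse()
        return kept

    # Toy example: the second instruction's result (A2) is never used.
    block = [
        ({"A1"}, {"A0"}, False),   # A1 <- f(A0)
        ({"A2"}, {"A0"}, False),   # A2 <- g(A0)   (dead)
        ({"A0"}, {"A1"}, False),   # A0 <- h(A1)
    ]
    print(eliminate_dead_code(block, live_out={"A0"}))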

Figures 6 and 7 show the experimental results for register reads and writes. Because MIPS has only one level of registers, we compare accesses to the CRAY-2 first-level registers with the register accesses in MIPS. In Figure 6 we see a trend for register read and write operations similar to that in Figures 4 and 5: the graph-coloring register allocation in the level 2 translation effectively reduces register reads and writes. However, as mentioned before, a fair number of CRAY-2 instructions cannot operate on immediate values while the corresponding MIPS instructions can, so an increase in register accesses is inevitable.

Figure 6: Comparison of the number of level-1 register reads and writes for the MIPS binary and the binaries generated by the level 1, level 2, and level 3 translators.

Figure 7 shows the accesses to the second-level registers for the three levels of translation. It is quite reasonable to see the drop in accesses once the graph-coloring method effectively allocates the first-level registers.

Figure 8 shows the results for memory loads and stores. The numbers do not change much, except that at level 3 we see some reductions, which come from eliminating some redundant load and store instructions and from moving some loads and stores to off-trace blocks. However, the improvement is minor. Note that the number of load and store operations does not increase after translation: we assume the same bus bandwidth, so one CRAY-2 memory operation (load or store) achieves the same effect as one MIPS memory operation. On the other hand, because we assume memory loads and stores have only one-cycle latency, the reduction in memory load and store instructions does not affect the total execution time much.

Figure 7: Comparison of the number of level-2 register reads and writes for the binaries generated by the level 1, level 2, and level 3 translators.

Figure 8: Comparison of the number of memory loads and stores for the MIPS binary and the binaries generated by the level 1, level 2, and level 3 translators.

4. Related work

Over the last decades, several emulators have performed dynamic binary translation very successfully, including SHADE [3], DAISY [10], FX!32 [1], and SUN Wabi [2]. As a fast instruction set simulator for user-level code, SHADE runs on SPARC systems and simulates the SPARC (Version 8 and 9) and MIPS I instruction sets. It emulates a target system by dynamically cross-compiling the target machine code to run on the host. The dynamically compiled code can optionally include profiling code that traces the execution of the application running on the virtual target machine; the profiling is extensible and may be dynamically controlled by the user. Another interesting point is that SHADE has a translation cache and a translation TLB, so translations are cached for later reuse to amortize compilation cost. DAISY, from IBM Watson Research Center, focuses on full-system simulation, i.e., unlike SHADE it covers both user and OS level. At run time, DAISY dynamically translates code for a PowerPC processor into code for an underlying VLIW processor. Its translated VLIW code is saved and cached too, so that when the same PowerPC code is later encountered, its VLIW translation can execute immediately without retranslation. FX!32 and SUN Wabi are both ABI VMs whose host ISAs differ from the Intel x86 ISA. FX!32 translates any 32-bit x86 application that runs under Windows NT 4.0 on an Intel x86 microprocessor to run on an Alpha microprocessor also running Windows NT 4.0,

while SUN Wabi translates some commonly used x86 applications to run on Unix. The translation in SUN Wabi is thus more involved, and this is reflected in its primary goal: it supports only some commonly used Windows applications and emphasizes correctness. FX!32 does not do any translation while the application is executing. Rather, FX!32 captures an execution profile on the first run and stores it on disk. The translation is done offline, using the run-time profile data to translate and optimize, and the result is placed in the translation cache. Later executions use the translated code in the translation cache and continue profiling. SUN Wabi uses dynamic translation, exploiting its advantages, e.g., adaptive optimization can exceed static code quality and handle dynamically generated application code. It retranslates each time the ABI is initiated, so there is no persistence between executions. SUN Wabi translates per interval and stores the result in a code cache with FIFO management.

5. Conclusion and future work

We can draw several conclusions from this experiment. First, it is desirable to have a large number of first-level registers. In our experiment, the small number of first-level registers is the bottleneck, and we do not fully utilize the second-level registers. Because we implement neither register renaming nor speculative execution, we only need as many second-level registers as MIPS has registers; however, the small number of first-level registers introduces extra work such as spills. Even if we implemented register renaming and speculative execution, we do not think there would be much improvement, because the increase in inter-level register moves would erode the benefit brought by renaming and speculation. Moreover, in our experiment we have only one execution unit, so instructions cannot execute in parallel; this structure is not well suited to hierarchical register files. In modern hierarchical register designs, there is more than one functional unit, and each functional unit has its own first-level register file. However, even in such multi-unit designs, the translation must have an effective way to distribute register usage evenly; otherwise the small number of first-level registers will still become the bottleneck.

The graph-coloring register allocation algorithm reduces the inter-level register move overhead substantially, but it is time-consuming and not practical for dynamic translation. Second, the conservative superblock formation does not bring much benefit: although it limits code size expansion, it prevents us from discovering more optimization opportunities. Third, the limitation that a fair number of CRAY-2 instructions cannot operate on immediate values directly brings much overhead; it increases code size and register accesses, and consequently increases execution time.

For future work, integrating the simulator with the translator and constructing a translation cache would allow dynamic optimization to be performed and could improve the quality of the translator even further than what we see in this project. It is possible that the hierarchical register file can help improve the performance of a VLIW architecture, as introduced in [4]. Therefore, it would be interesting to study the effect of organizing the CRAY-2 instructions in bundles and running them in VLIW fashion; this requires extending the simulator into a VLIW simulator.
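As a sketch of the translation-cache idea in that future-work direction (a generic structure, not part of our current translator): the cache maps a source-code entry address to already-translated target code, so repeated regions are translated once and reused thereafter.

    # Generic sketch of a translation cache for a dynamic translator.
    # translate() stands in for the MIPS -> CRAY-2 translation of one region;
    # the names and caching granularity are illustrative.

    class TranslationCache:
        def __init__(self, translate):
            self.translate = translate   # function: source entry pc -> code
            self.cache = {}              # source pc -> translated target code

        def lookup(self, pc):
            # Return translated code for pc, translating on first encounter.
            code = self.cache.get(pc)
            if code is None:
                code = self.translate(pc)  # pay translation cost once
                self.cache[pc] = code      # amortize over later executions
            return code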
