Advances and Future Challenges in Binary Translation and Optimization


1 Advances and Future Challenges in Binary Translation and Optimization
Erik R. Altman, Kemal Ebcioğlu, Michael Gschwind (Senior Member, IEEE), and Sumedh Sathaye
Presented by Holly Ferguson

2 Introduction: Can you define these terms?
Dynamic binary translation = ?
Dynamic optimization = ?
Figure labels: Various Current Architectural Solutions; Regular Hardware Architecture Solution.
Keywords: binary translation, compilers, dynamic optimization, ILP, Java, JIT, VLIW

3 Introduction
Dynamic binary translation is just-in-time (JIT) compilation from the binary code of one architecture to another.
Dynamic optimization is run-time improvement of code (see the figure).

4 Purpose
Figure labels: Running Program; Input Code Stream; Translate and Optimize; Output Code; Regular Hardware Architecture Solution.

5 Purpose
Figure labels: Running Program; Input Code Stream; Translate and Optimize. Instead rules: a variety of regular hardware architecture solutions.

6 Why is BT a negative? (Purpose)
Binary translation addresses:
- Allows the architecture to become a layer of software
- As software, fixes the problems of running legacy software directly
- Enables optimizations outside existing hardware boundaries
- Commercial and research interest
- BT is done automatically at run time, without programmer intervention
- Saves power, since memory uses less power than logic (for non-superscalar designs)
Paper analyzes questions with projects including DAISY, Crusoe, Dynamo, and LaTTe.

7 Why else is BT a negative? (Negatives)
BT and the VMM have additional drawbacks:
- IT apps, commonality, virtual IT shops, etc. may mean disruptive behavior
- VMM debugging is difficult: the target is several times removed from the source, and behavior is nondeterministic
- Takes memory and resources from the source-architecture machine
- Takes cycles from source-architecture programs
- New design territory (2001), so calibration is difficult
- The start of program execution is slow, since all code is interpreted and translated to target-architecture code
- Overtaking hardware is a large concern as well
Figure labels: Running Program; Input Code Stream; Translate and Optimize; Output Code; Regular Hardware Architecture Solution; Virtual Machine Monitor (VMM); CMS (Code Morphing Software) for VMM; Translation Cache (TCache).

8 Why else is BT a negative? (Negatives)
Difficult issues with managing BT, a large focus for DAISY, Crusoe, Dynamo, and LaTTe:
- Self-modifying code
- Precise exceptions
- Address translation
- Self-referential code
- Management of translations
- Real-time behavior
- Boot and BIOS code
- Reliability and correctness
- Code reuse
The projects are talked about in terms of (IEEE, Nov. 2001):
- Interpreting versus translating/optimizing
- Emulating a virtual versus a real machine
- Full system versus user mode only
- OS independent versus OS dependent
- Translating to a different versus the same architecture
- Emulating a single versus multiple source architectures
(Referred to as source (legacy) architecture to target architecture.)

9 Why is BT a positive? (Positives)
Attractions to the study of binary translation:
- In pursuit of architecturally independent computing, using the run-anywhere object code idea
- Variety of customers: farm = variety of architectures
Paper analyzes questions with projects including DAISY, Crusoe, Dynamo, and LaTTe.

10 A farm with a variety of architectures:
- Gives a static total
- Determines a static breakdown

11 A static total and a static breakdown:
- Mean limited utilization
- Thus increase cost

12 BT is a solution for better utilization if it is a layer of software, and thus allows dynamic configuration of the many machines of a farm.

13 Why else is BT a positive? (Positives)
- BT permits optimizations under the covers when the user runs the program.
- BT is not limited and can cross boundaries such as indirect calls, function returns, shared libraries, and system calls.
- With BT, intelligence is in software, not hardware. This means smaller chips with higher yield.
- Only a software patch is needed to install a better algorithm and update the VMM.
- A software patch is sufficient to fix a bug in the VMM. Bugs such as nonworking opcodes may be worked around by changing the VMM software.
Figure labels: Running Program; Input Code Stream; Translate and Optimize; Output Code; Regular Hardware Architecture Solution.

14 Why else is BT a positive? (Positives, continued)
- Legacy binaries for which source code is unavailable can be optimized.
- Translated basic blocks can be laid out contiguously in a natural order. This improves instruction cache performance.
- Future architectural improvements are transparent to the user.
- Compatibility of VLIWs of different sizes and generations.

15 What is the best solution? (CVM)
With software convergence, BT and JIT optimizations admit the possibility of a convergence virtual machine (CVM).
JVM (Java Virtual Machine):
- Write-once, run-anywhere model
- Existing C/C++ apps and OSs do not run on the JVM, nor does Linux
- For security and safety guarantees, gives up the universal ability to handle other hardware problems
CVM:
- Similar to the JVM (same goal), with different tradeoffs
- Research works to allow the same OS and app object code to run on different platforms with a CVM
- Is universal through JIT compilation and virtual device emulation, which means less protection than a modern RISC processor
"The Internet has recently been changing the software landscape [radically] and has been implicitly encouraging write-once, run-anywhere software and interoperability of different hardware platforms, as exemplified by the recent popularity of technologies such as XML, Simple Object Access Protocol (SOAP) [17], and Java." (Altman et al.)

16 What are future implications?
Figure: two software stacks, listed top to bottom.
- Linux and apps under CVM: Linux apps compiled for CVM; Linux compiled for CVM; CVM for PowerPC or x86; PowerPC or x86 hardware
- Linux and apps under PowerPC/x86: Linux apps compiled for PowerPC or x86; Linux compiled for PowerPC (different for x86); PowerPC or x86 hardware

17 What are future implications? (continued)
Using CVM and object code in XML format, an OS can be booted from the web.

18 DAISY, Crusoe, Dynamo, LaTTe DAISY

19 IBM DAISY (source architecture: PowerPC)
Figure labels: DAISY VLIW; L3 cache; 6xx bus; memory controller; memory; PCI bus; DAISY flash ROM; PowerPC flash ROM; disk; video; network; keyboard.
Most DAISY work uses PowerPC as the source architecture. By allowing operations from multiple paths in its groups, DAISY is not dependent on good branch prediction, and indeed makes forward progress along all possible paths.

20 IBM DAISY
Figure: DAISY translation flowchart. Recoverable labels:
- Interpret; add to group X
- Stopping point? If yes: Y = X; X = new group
- Previously translated entry point? If yes: DAISY executes the group's VLIW translation
- Interpreted 30x / frequently executed code? If yes: translate and schedule group Y to VLIW instructions
- Stopping-point tests: > 24 ops; ILP > 3; ILP > 10; > 180 Cps; at good stop
- Good stop point: such as a loopback
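The flow on this slide (interpret first, translate once a group proves hot, then reuse the translation) can be sketched in Python. This is a toy illustration, not DAISY's implementation: the `ToyTranslator` class, the two-operation instruction set, and reading the "30x" label as an interpretation-count threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: interpret a block this many times before translating it.
HOT_THRESHOLD = 30


class ToyTranslator:
    """Hypothetical interpret-then-translate loop with a translation cache."""

    def __init__(self, program, threshold=HOT_THRESHOLD):
        self.program = program          # entry address -> [(op, operand), ...]
        self.threshold = threshold
        self.counts = defaultdict(int)  # times each entry has been interpreted
        self.tcache = {}                # entry address -> translated function
        self.interpreted_runs = 0
        self.translated_runs = 0

    def interpret(self, entry, acc):
        # Slow path: decode and execute one operation at a time.
        for op, operand in self.program[entry]:
            if op == "add":
                acc += operand
            elif op == "mul":
                acc *= operand
            else:
                raise ValueError(f"unknown op: {op}")
        return acc

    def translate(self, entry):
        # "Translation" here means emitting Python source for the whole
        # block and compiling it once, so later runs skip the decode loop.
        symbols = {"add": "+", "mul": "*"}
        lines = ["def translated(acc):"]
        for op, operand in self.program[entry]:
            lines.append(f"    acc = acc {symbols[op]} {operand!r}")
        lines.append("    return acc")
        namespace = {}
        exec("\n".join(lines), namespace)
        return namespace["translated"]

    def run(self, entry, acc):
        # Previously translated entry point: execute the cached translation.
        if entry in self.tcache:
            self.translated_runs += 1
            return self.tcache[entry](acc)
        # Otherwise interpret, and translate once the block is hot.
        self.interpreted_runs += 1
        self.counts[entry] += 1
        result = self.interpret(entry, acc)
        if self.counts[entry] >= self.threshold:
            self.tcache[entry] = self.translate(entry)
        return result
```

With a low threshold such as `ToyTranslator(program, threshold=3)`, the first three calls to `run` go through the interpreter and later calls hit the translation cache.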

21 IBM DAISY
- Renaming: scratchpad registers r36 to r63
- Load-store telescoping (allows dependence chains to be significantly shortened)

22 Transmeta Crusoe (source architecture: x86)
Software stack (figure, top to bottom): x86 applications; Windows, Linux, BIOS; Code Morphing SW; Crusoe hardware.
When a translated group completes, the contents of all working x86 registers are copied at once to the shadow register set. This shadow copy allows Crusoe to efficiently recover from exceptions.
Different intended use (DAISY vs. Crusoe):
- Type: DAISY, big machine; Crusoe, small/mobile
- IPC: DAISY = 3 to 4 PowerPC instructions per cycle; Transmeta claims the 667-MHz Crusoe TM5400 = 500-MHz Pentium III
- Total memory: DAISY, gigabytes; Crusoe, megabytes
- Memory for itself and translations: DAISY, 100 MB+; Crusoe, 16 MB
Molecule (figure): a 128-bit molecule holds up to 4 atoms (ops), for example FADD to the floating-point unit, ADD to the integer ALU, LD to the load/store unit, and BRCC to the branch unit. In-order pipeline.
Registers (from the Crusoe processor block diagram): 64 registers; working and shadow copies of the x86 architectural state; Crusoe working registers.
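The working/shadow register idea can be sketched as a commit-or-rollback pattern. The class and function names below are hypothetical, and modeling a hardware exception as a Python `ArithmeticError` is an assumption made only to keep the example small; this is not Transmeta's design.

```python
class ShadowedRegisters:
    """Toy register file with a working copy and a committed shadow copy."""

    def __init__(self, names):
        self.working = {name: 0 for name in names}
        self.shadow = dict(self.working)

    def commit(self):
        # A translated group completed: copy all working registers at once.
        self.shadow = dict(self.working)

    def rollback(self):
        # An exception occurred mid-group: restore the last committed state.
        self.working = dict(self.shadow)


def run_group(regs, group):
    """Run every step of a group; commit on success, roll back on a fault."""
    try:
        for step in group:
            step(regs.working)
    except ArithmeticError:
        regs.rollback()
        return False
    regs.commit()
    return True


def set_eax_to_5(r):
    r["eax"] = 5


def faulting_step(r):
    r["eax"] = 7               # speculative update that must be undone
    r["ebx"] = r["eax"] // 0   # raises ZeroDivisionError


if __name__ == "__main__":
    regs = ShadowedRegisters(["eax", "ebx"])
    run_group(regs, [set_eax_to_5])    # commits eax = 5
    run_group(regs, [faulting_step])   # faults, rolls back to eax = 5
    print(regs.working)
```

After a rollback, a real system would still have to re-run the failing group more conservatively to find the precise faulting instruction; the sketch stops at restoring state.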

23 Transmeta Crusoe
- 8 KB of on-chip local memory for data and 8 KB for instructions, so CMS can remain on chip for quick access without disturbing x86 code and data.
- 16-way associativity of the L1 DCache minimizes conflicts between x86 data and data used by CMS.
- The memory controller is integrated into the TM5400 because it is part of the standard PC architecture.
- Unlike DAISY, has optimizations like strength reduction and aggressive dead code elimination.
- The TM5400 has fewer transistors (7 million) than AMD and Intel microprocessors, showing that BT allows a reduction in hardware complexity.
Alias hardware (figure; labels include Alias Butler (AB) and TM5400 Crusoe chip): ldp (X) is a speculative load that records an address and size; stam (Y) is a store under alias mask. Hardware aliasing? No: continue execution. Yes: raise an exception to the Crusoe VMM.
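The alias check in the figure can be sketched as follows. The `AliasHardware` class, the byte-range overlap test, and the `commit` method are assumptions for illustration; only the names `ldp` and `stam` and the "continue or raise to the VMM" outcome come from the slide.

```python
class AliasFault(Exception):
    """Raised to the (hypothetical) VMM when a store hits a protected range."""


class AliasHardware:
    """Toy model of load-and-protect (ldp) and store-under-alias-mask (stam)."""

    def __init__(self, memory):
        self.memory = memory   # a bytearray standing in for RAM
        self.protected = []    # (address, size) ranges recorded by ldp

    def ldp(self, addr, size):
        # Speculative load: remember the range so later stores can be checked.
        self.protected.append((addr, size))
        return bytes(self.memory[addr:addr + size])

    def stam(self, addr, data):
        # Store that first checks for overlap with any protected range.
        end = addr + len(data)
        for p_addr, p_size in self.protected:
            if addr < p_addr + p_size and p_addr < end:
                raise AliasFault(f"store at {addr} overlaps load at {p_addr}")
        self.memory[addr:end] = data

    def commit(self):
        # End of the translated group: protections are no longer needed.
        self.protected.clear()
```

A store that does not overlap a protected range continues normally, while an overlapping store raises `AliasFault` and leaves memory untouched. This lets a translator hoist loads above stores and fall back only when they really alias.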

24 HP Dynamo
Software stack (figure, top to bottom): HPUX applications; Dynamo software; HPUX; PA-RISC.
- Dynamo's translations last only one program invocation = ?
- Runs in the virtual address space of a single HPUX process = ?
- Problematic where it exposes Dynamo to the application = ?

25 HP Dynamo
- Potential plus: Dynamo's translations last only one program invocation, whereas DAISY and Crusoe translations last over many invocations of a program.
- Potential plus: Dynamo runs in the virtual address space of a single HPUX process, so there is no need to deal with translation of addresses or with groups spanning multiple pages. This reduces the number of synchronous exceptions that must be handled correctly and allows more aggressive code optimizations.
- Problematic where it exposes Dynamo to the application: less transparent.

26 HP Dynamo vs. DAISY
Some of Dynamo's optimizations: copy propagation, constant propagation, strength reduction, loop-invariant code motion, loop unrolling.
DAISY translates groups as paths or trees.
Dynamo:
- If the overhead cost is greater than what optimization is gaining, it bails out and executes the original code directly, without translation.
- If the overhead cost is less than what optimization is gaining, it continues with the VMM.
- Groups are paths, never trees or other forms. This limits code explosion, at the cost of losses in exploiting parallelism.
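Two of the optimizations listed for Dynamo, constant propagation and strength reduction, can be sketched on a toy straight-line trace. The tuple-based IR and the function names are hypothetical and are not Dynamo's internal representation.

```python
def optimize_trace(trace):
    """Constant propagation/folding plus mul-to-shift strength reduction.

    Each instruction is (dest, op, a, b); operands are ints or variable names.
    """
    consts = {}   # variable name -> known constant value
    out = []
    for dest, op, a, b in trace:
        # Constant propagation: replace operands whose value is known.
        if isinstance(a, str):
            a = consts.get(a, a)
        if isinstance(b, str):
            b = consts.get(b, b)
        if op == "const":
            consts[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            # Constant folding: both operands known, compute at translate time.
            value = {"add": a + b, "mul": a * b, "shl": a << b}[op]
            consts[dest] = value
            out.append((dest, "const", value, None))
        else:
            consts.pop(dest, None)   # dest is no longer a known constant
            is_pow2 = isinstance(b, int) and b > 0 and (b & (b - 1)) == 0
            if op == "mul" and is_pow2:
                # Strength reduction: multiply by 2**k becomes a shift by k.
                out.append((dest, "shl", a, b.bit_length() - 1))
            else:
                out.append((dest, op, a, b))
    return out


def run_trace(trace, env):
    """Reference evaluator used to check that optimization preserves results."""
    env = dict(env)

    def val(v):
        return env[v] if isinstance(v, str) else v

    for dest, op, a, b in trace:
        if op == "const":
            env[dest] = a
        elif op == "add":
            env[dest] = val(a) + val(b)
        elif op == "mul":
            env[dest] = val(a) * val(b)
        elif op == "shl":
            env[dest] = val(a) << val(b)
    return env
```

Because a Dynamo group is a single path, a pass like this needs no control-flow merging, which is one reason path-shaped groups keep the optimizer simple.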

27 HP Dynamo (figure only)

28 IBM LaTTe (target architecture: SPARC)
Java JIT compilers, such as LaTTe, use dynamic translation and optimization to move from Java Virtual Machine code to RISC code. [3]
Pipeline (figure): bytecode -> CFG of pseudo SPARC code -> CFG of real SPARC code -> native SPARC code
- Bytecode translation: the Java stack is mapped to symbolic registers.
- Register allocation and optimizations: symbolic registers are allocated to machine registers.
- Code emission: a binary image is generated from the CFG, and the locations of basic blocks are determined.
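The first step, mapping the Java operand stack to symbolic registers, can be sketched by simulating the stack at translation time. The mini bytecode, the `t0`/`local0` naming, and the function below are assumptions for illustration and do not reproduce LaTTe's actual translator.

```python
def stack_to_registers(bytecode):
    """Turn stack-based bytecode into three-address code on symbolic registers.

    Each output instruction is (dest, op, left, right). The operand stack is
    simulated at translation time, so it does not exist in the output code.
    Limitation of this sketch: a local that is pushed and then overwritten
    before its pushed value is consumed is not copied to a temporary first.
    """
    stack = []       # names of symbolic registers currently "on the stack"
    code = []
    next_temp = 0
    for instr in bytecode:
        op = instr[0]
        if op == "iload":        # push a local variable
            stack.append(f"local{instr[1]}")
        elif op == "iconst":     # push an integer constant
            stack.append(str(instr[1]))
        elif op in ("iadd", "imul"):
            right = stack.pop()
            left = stack.pop()
            dest = f"t{next_temp}"
            next_temp += 1
            code.append((dest, op[1:], left, right))
            stack.append(dest)
        elif op == "istore":     # pop into a local variable
            code.append((f"local{instr[1]}", "mov", stack.pop(), None))
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return code


if __name__ == "__main__":
    # local2 = (local0 + local1) * 3
    program = [("iload", 0), ("iload", 1), ("iadd",),
               ("iconst", 3), ("imul",), ("istore", 2)]
    for line in stack_to_registers(program):
        print(line)
```

A later register-allocation pass would then assign the symbolic names to machine registers, which is the second step listed on the slide.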

29 IBM LaTTe
LaTTe and Java performance advantages over BT for traditional architectures:
- Lightweight monitor optimized for single-threaded programs.
- Efficient exception handling: instead of inserting checking code, it uses hardware-generated signals to detect exceptions such as out-of-bounds memory accesses.
- Efficient garbage collection, memory management, and sophisticated JIT compilation techniques.
- Converts virtual method calls to direct method calls, or inlines them, by including a specific (conditional branch) check for the most frequently occurring method invoked from a particular call site.
- Register allocation converts the JVM's stack-based model, with push and pop operations, to the register-based model used in RISC machines such as SPARC.
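The guarded conversion of a virtual call can be sketched in Python, where dynamic dispatch plays the role of the virtual call. The `Shape` classes and `total_area_guarded` are hypothetical, and the choice of `Square` as the most frequent receiver is an assumption that a real JIT would derive from profiling.

```python
import math


class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


def total_area_guarded(shapes):
    """Call site specialized for the (assumed) most frequent receiver class."""
    total = 0.0
    for shape in shapes:
        # Conditional-branch check for the most frequently seen class.
        if type(shape) is Square:
            total += shape.side * shape.side   # inlined body of Square.area
        else:
            total += shape.area()              # fall back to virtual dispatch
    return total
```

The guard keeps the result identical to plain dispatch for every receiver type, so the specialization is safe even when the profile turns out to be wrong.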

30 IBM LaTTe
Difference: LaTTe uses BT to improve VM performance and, in a way, to mitigate the gap between the source architecture (the JVM) and the underlying target architecture.
Figure, with nodes A, B, C, D: (a) original control flow graph; (b) CFG of DAISY-transformed code; (c) CFG of LaTTe-transformed code. Tree regions are LaTTe's optimization unit.
LaTTe seeks tree groups in the input code, instead of the output, and uses two-pass register allocation: first a backward sweep to convey upward the information for all symbolic registers live at the group exits, then a forward sweep for actual register allocation based on hints from the backward sweep.

31 Afterthought: additional questions asked
1. Can a binary translation machine have generally better performance than a well-designed superscalar?
2. Can all real-time problems be avoided?
3. What memory management schemes are best over a wide range of TCache sizes?
4. In full-system BT, how can the VMM memory amount change after startup?
5. If the VMM gives memory back to the system and later needs it because of an unusually large working set for the TCache, it has no way to transparently steal the memory from the OS running above it.
6. Should the target architecture ever be exposed for users to access directly, bypassing the source architecture layer and translation by the VMM?
Important considerations:
- Operation semantics
- Data formats (condition codes, FP, etc.)
- Address translation
- Special-purpose registers
- More registers than the source architecture (for VMM scratchpad)

32 Recent work: FPGA / warp processing (2008)
Using desktop, server, and scientific-computing applications, results gave similar speedups: e.g., compared to a four-processor 400-MHz ARM11 system, warp processing obtained average speedups of 169X.

33 Recent work: MTCrossBit (2011, from CrossBit)
A dynamic binary translation system based on multithreaded optimization.
- Existing DBT techniques are for a single-threaded execution environment (they increase the complexity of the hardware or the runtime overhead).
- MTCrossBit is a multithreaded DBT framework with no associated hardware; it uses a helper thread for building hot traces, which reduces overhead.
- The main and helper threads use different cores to use multi-core resources efficiently.
- Two methods: (1) dual-special-parallel translation caches and (2) a new lock-free thread communication mechanism, assembly language communication (ASLC).
- Supported guest platforms include SimpleScalar, IA32, MIPS, and SPARC; the IA32 host platform is fully supported; PowerPC, etc.

34 Recent work: MTCrossBit (2011, framework)
MTCrossBit builds a hot trace concurrently in the helper thread to boost performance, as opposed to sequential execution in a single-threaded execution environment. However, a multithreaded framework has unavoidable problems such as mutual exclusion, access to translated basic blocks, and communication between threads.

35 Recent work: CPU framework into GPUs
- Overhead has always been an issue with DBT.
- A GPU (a multi-core processor used as a co-processor) can execute the hot spots of binary code in parallel, which can reduce the overhead of DBT.
- One solution is to construct a virtual execution environment to accelerate DBT on CPU/GPU-based architectures.
- Using the hot spots of binary code and their related information, the framework converts the sequential code into PTX form and executes it on GPUs.
- There is no need to rewrite the source code, and binary compatibility issues between different GPUs are also resolved. This usually gives a 10X speedup compared to x86 native platforms, and better with larger input.

36 Recent work: CPU framework into GPUs
Figures: workflow of GXBit, first and second execution phases (extracts hot spots and converts them to GPU form); the translation framework.

37 Bibliography: sources consulted
[1] Altman et al., "Advances and Future Challenges in Binary Translation and Optimization," Proceedings of the IEEE, vol. 89, no. 11, November
[2] K. Diefendorff, "Power4 focuses on memory bandwidth," Microprocessor Rep., vol. 13, October
[3] "Dynamic Binary Translation and Optimization," December 2000: < micro33/tutorial/tutorial.html>, <
[4] F. Vahid, G. Stitt, R. Lysecky et al., "Warp Processing: Dynamic Translation of Binaries to FPGA Circuits," Computer, vol. 41, issue 7,
[5] Guan HaiBing et al., "MTCrossBit: A dynamic binary translation system based on multithreaded optimization," Science China, vol. 54, no. 10, October
[6] Erzhou Zhu, Haibing Guan, Guoxing Dong, Yindong Yang, Hongbo Yang, "A Translation Framework for Executing the Sequential Binary Code on CPU/GPU Based Architectures," Journal of Software, vol. 6, no. 12, December

38 Questions/Comments?
Outline: Introduction; Purpose; Negatives; Positives; CVM; DAISY; Crusoe; Dynamo; LaTTe; Afterthought; Recent Work; Bibliography

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor Crusoe Reference Thinking Outside the Box The Transmeta Crusoe Processor 55:132/22C:160 High Performance Computer Architecture The Technology Behind Crusoe Processors--Low-power -Compatible Processors

More information

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

Virtual Machines and Dynamic Translation: Implementing ISAs in Software Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2007 Lecture 14: Virtual Machines 563 L14.1 Fall 2009 Outline Types of Virtual Machine User-level (or Process VMs) System-level Techniques for implementing all

More information

Execution-based Scheduling for VLIW Architectures. Kemal Ebcioglu Erik R. Altman (Presenter) Sumedh Sathaye Michael Gschwind

Execution-based Scheduling for VLIW Architectures. Kemal Ebcioglu Erik R. Altman (Presenter) Sumedh Sathaye Michael Gschwind Execution-based Scheduling for VLIW Architectures Kemal Ebcioglu Erik R. Altman (Presenter) Sumedh Sathaye Michael Gschwind September 2, 1999 Outline Overview What's new? Results Conclusions Overview Based

More information

Introduction to Virtual Machines. Michael Jantz

Introduction to Virtual Machines. Michael Jantz Introduction to Virtual Machines Michael Jantz Acknowledgements Slides adapted from Chapter 1 in Virtual Machines: Versatile Platforms for Systems and Processes by James E. Smith and Ravi Nair Credit to

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Inherently Lower Complexity Architectures using Dynamic Optimization. Michael Gschwind Erik Altman

Inherently Lower Complexity Architectures using Dynamic Optimization. Michael Gschwind Erik Altman Inherently Lower Complexity Architectures using Dynamic Optimization Michael Gschwind Erik Altman ÿþýüûúùúüø öõôóüòñõñ ðïîüíñóöñð What is the Problem? Out of order superscalars achieve high performance....butatthecostofhighhigh

More information

CS 252 Graduate Computer Architecture. Lecture 15: Virtual Machines

CS 252 Graduate Computer Architecture. Lecture 15: Virtual Machines CS 252 Graduate Computer Architecture Lecture 15: Virtual Machines Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252

More information

A Survey on Virtualization Technologies

A Survey on Virtualization Technologies A Survey on Virtualization Technologies Virtualization is HOT Microsoft acquires Connectix Corp. EMC acquires VMware Veritas acquires Ejascent IBM, already a pioneer Sun working hard on it HP picking up

More information

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor Lecture 12 Architectures for Low Power: Transmeta s Crusoe Processor Motivation Exponential performance increase at a low cost However, for some application areas low power consumption is more important

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

CS 152 Computer Architecture and Engineering. Lecture 22: Virtual Machines

CS 152 Computer Architecture and Engineering. Lecture 22: Virtual Machines CS 152 Computer Architecture and Engineering Lecture 22: Virtual Machines Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 18: Virtual Machines

CS252 Spring 2017 Graduate Computer Architecture. Lecture 18: Virtual Machines CS252 Spring 2017 Graduate Computer Architecture Lecture 18: Virtual Machines Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Midterm Topics ISA -- e.g. RISC vs. CISC

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Introduction. CS 2210 Compiler Design Wonsun Ahn

Introduction. CS 2210 Compiler Design Wonsun Ahn Introduction CS 2210 Compiler Design Wonsun Ahn What is a Compiler? Compiler: A program that translates source code written in one language to a target code written in another language Source code: Input

More information

RISC Architecture Ch 12

RISC Architecture Ch 12 RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)

More information

IA-64, P4 HT and Crusoe Architectures Ch 15

IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64 General Organization Predication, Speculation Software Pipelining Example: Itanium Pentium 4 HT Crusoe General Architecture Emulated Precise Exceptions

More information

Introduction to Virtual Machines

Introduction to Virtual Machines Introduction to Virtual Machines abstraction and interfaces virtualization Vs. abstraction computer system architecture process virtual machines system virtual machines Abstraction Abstraction is a mechanism

More information

C 1. Last time. CSE 490/590 Computer Architecture. Virtual Machines I. Types of Virtual Machine (VM) Outline. User Virtual Machine = ISA + Environment

C 1. Last time. CSE 490/590 Computer Architecture. Virtual Machines I. Types of Virtual Machine (VM) Outline. User Virtual Machine = ISA + Environment CSE 490/590 Computer Architecture Last time Directory-based coherence protocol 4 cache states: C-invalid, C-shared, C-modified, and C-transient 4 memory states: R(dir), W(id), TR(dir), TW(id) Virtual Machines

More information

Instruction Set Principles and Examples. Appendix B

Instruction Set Principles and Examples. Appendix B Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of

More information
