The ILOC Virtual Machine (Lab 1 Background Material) Comp 412
COMP 412, FALL 20

The ILOC Virtual Machine (Lab 1 Background Material)

source code -> Front End -> IR -> Optimizer -> IR -> Back End -> target code

Copyright 20, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
What is the execution model for an ILOC program?

ILOC is the assembly language of a simple, idealized RISC processor (RISC: Reduced Instruction Set Processor). The ILOC virtual machine has separate code memory and data memory, sometimes called a Harvard architecture. The sizes of the data memory and the register set are configurable; code memory is large enough to hold your program. The machine uses a simple, in-order execution model. In the ILOC instruction set, arithmetic operations work on values held in registers; load and store move values between registers and memory.

To debug the output of your labs, you will use an ILOC simulator, a program that mimics the operation of the ILOC virtual machine; that is, it is an interpreter for ILOC code.
The ILOC Subset

See also the Lab 1 handout and Appendix A in EaC2e. Pay attention to the meanings of the ILOC operations.

Syntax                    Meaning                      Latency
load   r1     => r2       r2 <- MEM(r1)                3
store  r1     => r2       MEM(r2) <- r1                3
loadi  c      => r2       r2 <- c                      1
add    r1, r2 => r3       r3 <- r1 + r2                1
sub    r1, r2 => r3       r3 <- r1 - r2                1
mult   r1, r2 => r3       r3 <- r1 * r2                1
lshift r1, r2 => r3       r3 <- r1 << r2               1
rshift r1, r2 => r3       r3 <- r1 >> r2               1
output c                  prints MEM(c) to stdout      1
nop                       idles for one cycle          1

ILOC is an abstract assembly language. Each operation, except nop, uses (or reads) one or more values. Each operation, except output and nop, defines a value. loadi reads its value from the instruction stream. load reads both a register and a memory location. store reads two registers and writes a memory location. add, sub, mult, lshift, and rshift read two registers and write one register.
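The meanings in the table can be made concrete as a tiny evaluator. This is a hypothetical sketch in Python, not the course simulator; the operation encoding (dicts with 'opcode', 'src', 'dst', 'const' fields) is invented here purely for illustration.

```python
def execute(op, R, MEM, out):
    """Apply the meaning of one decoded ILOC operation to register
    file R and data memory MEM (both plain dicts). `op` is a dict
    such as {'opcode': 'add', 'src': ['r1', 'r2'], 'dst': 'r3'}.
    Illustrative encoding only, not the course simulator's."""
    oc = op['opcode']
    if oc == 'loadi':                       # r_dst <- constant
        R[op['dst']] = op['const']
    elif oc == 'load':                      # r_dst <- MEM(r_src)
        R[op['dst']] = MEM[R[op['src'][0]]]
    elif oc == 'store':                     # MEM(r_dst) <- r_src
        MEM[R[op['dst']]] = R[op['src'][0]]
    elif oc == 'output':                    # print MEM(constant)
        out.append(MEM[op['const']])
    elif oc == 'nop':                       # idle for one cycle
        pass
    else:                                   # arithmetic: read two, write one
        a, b = (R[r] for r in op['src'])
        R[op['dst']] = {'add': a + b, 'sub': a - b, 'mult': a * b,
                        'lshift': a << b, 'rshift': a >> b}[oc]
```

Note how the table's read/write summary falls out of the code: only store and output touch memory on the "use" side, and only load, loadi, and the arithmetic operations define a register.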
ILOC Execution: A Simple ILOC Program

% cat ex1.iloc
// add two numbers
add r0,r1 => r2
% 412alloc ex1.iloc >ex1a.iloc
% sim ex1a.iloc -i
Executed 7 instructions and 7 operations in 11 cycles.

The -i option initializes memory, starting at location 0, with the values 1 and 18.
ILOC Execution: A Simple ILOC Program

% cat ex1.iloc
// add two numbers
add r0,r1 => r2
% 412alloc ex1.iloc >ex1a.iloc
% sim ex1a.iloc -i
Executed 7 instructions and 7 operations in 11 cycles.

The toolchain: ex1.iloc -> 412alloc -> ex1a.iloc -> sim -> results on stdout.
Before Execution of the ILOC Program Starts

The simulator is invoked with the command line:

% sim -i 0 1 <ex1.iloc

Code is loaded into instruction memory starting at word 0.
The virtual machine runs through the code, in order

The basic unit of execution is a cycle. A cycle consists of a fetch phase and an execute phase, so execution looks like (fetch, execute) (fetch, execute) ...

Fetch retrieves the next operation from code memory, advancing sequentially through the straight-line code. Execute performs the specified operation. It performs one step on each active operation; multi-cycle operations (e.g., load and store in Lab 1) are divided into multiple steps. Execution (on the processor's functional unit) uses a pipeline of operation steps; load and store proceed through three stages, or steps, in the pipeline. The illustrated example should make this more clear.
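The (fetch, execute) alternation described above can be sketched as a loop. This is a toy model, not the course simulator: latencies are supplied as a parameter, and dependence-induced stalls are deliberately omitted so that the shape of the cycle loop stays visible.

```python
def run(program, latency):
    """program: list of opcode strings, in order.
    latency: opcode -> number of execute steps (default 1).
    Returns (total cycles, opcodes in completion order).
    Toy sketch: no stalls, unbounded pipeline slots."""
    pipeline = []                 # (opcode, steps remaining), newest first
    pc, cycle, retired = 0, 0, []
    while pc < len(program) or pipeline:
        # fetch phase: advance the PC and bring in the next operation
        if pc < len(program):
            pipeline.insert(0, (program[pc], latency.get(program[pc], 1)))
            pc += 1
        # execute phase: every active operation performs one step;
        # a finished operation rolls out of the bottom of the pipeline
        pipeline = [(op, left - 1) for op, left in pipeline]
        retired += [op for op, left in pipeline if left == 0]
        pipeline = [(op, left) for op, left in pipeline if left > 0]
        cycle += 1
    return cycle, retired
```

With the Lab 1 latencies, a lone load occupies the pipeline for three cycles before it retires, exactly as in the walkthrough that follows.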
Cycle 0: Fetch Phase

First, the processor fetches and decodes the operation at the current value of the program counter.

Cycle 0: Execute Phase

Next, it executes the operation. In this case, that places the value 1 into register r0.

Trace output: 0: [loadi 1 => r0 (1)]
Cycle 1: Fetch Phase

The processor advances the PC and the pipeline. (Since loadi is a 1-cycle operation, it discards that operation.) It fetches the next operation.

Cycle 1: Execute Phase

Next, it executes the loadi, which places 0 in r1.

Trace output: 1: [loadi 0 => r1 (0)]
Cycle 2: Fetch Phase

The processor advances the PC and the pipeline. (Since loadi is a 1-cycle operation, it discards that operation.) It fetches the next operation.

Cycle 2: Execute Phase

The load begins operation.

Trace output: 2: [load r1 (addr: 0) => r1 (1)]
Cycle 3: Fetch Phase (pipelined functional unit)

The processor advances the PC and the pipeline. The load moves to slot 2 and the add fills slot 1.

Cycle 3: Execute Phase

The load continues to execute. The add needs the result of the load, so the processor stalls it.

Trace output: 3: [ stall ]     ("stall" means to hold the op for another cycle)
Cycle 4: Fetch Phase

The processor advances the pipeline. Since the add is stalled, it remains in the first pipeline slot.

Cycle 4: Execute Phase

The load completes and the value 1 is written into r1. The add continues to stall, waiting on r1.

Trace output: 4: [ stall ] *2
Cycle 5: Fetch Phase

The processor advances the pipeline. The load rolls out of the bottom. The add remains in slot 1.

Cycle 5: Execute Phase

The add executes and writes the value 28 into r2.

Trace output: 5: [add r0 (1), r1 (1) => r2 (28)]
Cycle 6: Fetch Phase

The processor advances the pipeline and fetches the next operation.

Cycle 6: Execute Phase

The processor executes the loadi operation, which writes 0 into r0.

Trace output: 6: [loadi 0 => r0 (0)]
Cycle 7: Fetch Phase

The processor advances the pipeline and fetches the next operation.

Cycle 7: Execute Phase

The processor begins execution of the 3-cycle store operation.

Trace output: 7: [store r2 (28) => r0 (addr: 0)]
Cycle 8: Fetch Phase

The processor advances the pipeline (moving the store to slot 2) and fetches the next operation.

Cycle 8: Execute Phase

The store continues to execute. The output stalls, since it reads from data memory and the in-progress store writes to data memory.

Trace output: 8: [ stall ]
Cycle 9: Fetch Phase

The processor advances the pipeline. The store moves to slot 3. The stalled output operation remains in slot 1, waiting for the store to finish.

Cycle 9: Execute Phase

The store writes 28 into memory location 0 at the end of the cycle. The output remains stalled.

Trace output: 9: [ stall ] *7
Cycle 10: Fetch Phase

The processor advances the pipeline. The store falls out of the bottom of the pipeline. The output stays in slot 1.

Cycle 10: Execute Phase

The output operation writes the contents of memory location 0 to stdout.

Trace output: 10: [output 0 (28)]
output generates => 28
Cycle 11: Fetch Phase

The processor advances the pipeline and fetches the next operation. Since the next slot in the instruction memory is invalid, the processor halts.
ILOC Execution

This execution is captured in the trace provided by the simulator. Compare the simulator's trace output against the preceding slides.

% cat ex1.iloc
// add two numbers
add r0,r1 => r2
% sim -t ex1.iloc -i 0 1

ILOC Simulator, Version
Interlock settings: memory, registers, branches
0: [loadi 1 => r0 (1)]
1: [loadi 0 => r1 (0)]
2: [load r1 (addr: 0) => r1 (1)]
3: [ stall ]
4: [ stall ] *2
5: [add r0 (1), r1 (1) => r2 (28)]
6: [loadi 0 => r0 (0)]
7: [store r2 (28) => r0 (addr: 0)]
8: [ stall ]
9: [ stall ] *7
10: [output 0 (28)]
output generates => 28
Executed 7 instructions and 7 operations in 11 cycles.
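The stall behavior in the trace can be reproduced with a small timing model: one operation enters the pipeline per cycle, and an operation holds in its issue slot until every value it reads is ready. This is a sketch under the Lab 1 assumptions (memory operations take 3 cycles, everything else 1), not the course simulator; the store-to-output dependence through data memory is modeled here with an invented pseudo-register named 'mem'.

```python
def count_cycles(ops):
    """ops: list of (opcode, reads, writes), where reads and writes
    are lists of register names ('mem' stands for data memory).
    Returns total cycles under an in-order, stall-on-read model."""
    ready = {}                     # register -> cycle its value is ready
    cycle = 0                      # cycle in which the next op may issue
    for opcode, reads, writes in ops:
        latency = 3 if opcode in ('load', 'store') else 1
        # stall in the issue slot until every operand is available
        issue = max([cycle] + [ready.get(r, 0) for r in reads])
        done = issue + latency
        for r in writes:
            ready[r] = done
        cycle = issue + 1          # the next op enters the pipeline
    return max([cycle] + list(ready.values()))
```

Applied to a plausible seven-operation version of the example (two loadi, a load, the add, a loadi of the store address, the store, and the output), this model reproduces the 11-cycle total reported in the trace.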
The Memory Model in the ILOC Virtual Machine

Code memory is simply "big enough" to hold the program. Data memory is divided in two: locations 0 to 32,767 are reserved for storage from the input program (its variables, arrays, objects; the programmer needs space), and locations 32,768 and beyond are reserved for the allocator to use for spilled values.
Does Real Hardware Work This Way?

In fact, the ILOC model is fairly close to reality. Real processors have a fetch, decode, execute cycle: fetch brings operations into a buffer in the decode unit; decode deciphers the bits and sends control signals to the functional unit(s); execute clocks the functional unit through one pipeline cycle. Fetch, decode, execute is construed as a single cycle; in reality, the units run concurrently. The fetch unit works to deliver enough operations to the decode unit, where "enough" is defined, roughly, as one op per functional unit per cycle. The decode unit is, essentially, combinatorial logic (and, therefore, fast). The execute unit performs complex operations: multiply and divide are algorithmically complex operations, and pipelined units break long operations into smaller subtasks.
A More Realistic Drawing: Separate Fetch-Decode-Execute

(Figure: the fetch unit feeds the decode unit, which drives the functional units through control lines.)
What about processors like the Core i7 or ARM?

Modern processors typically have unified instruction and data memory, in a Modified Harvard Architecture: separate pathways for code and data, but one store. They operate on a fetch-decode-execute cycle and have complex, cache-based memory hierarchies, multiple pipelined functional units, and multiple cores. (The figure shows one processing core.)
What about processors like the Core i7 or ARM?

Modern processors often have multiple functional units. For Lab 1, the ILOC simulator has one functional unit. In Lab 3, the simulator will have two functional units: some operations run on unit 0, some run on unit 1, and some run on either unit 0 or unit 1. The basic model is the same: fetch, then execute. The number of operations executed in a single cycle depends on the order in which they are encountered and the dependences between operations. (Figure: Functional Unit 0 and Functional Unit 1 share one register set.) The Lab documentation addresses these issues for ILOC, and the Lab simulator trace shows action in both units.
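The dependence on encounter order can be made concrete with a minimal sketch of the two-unit issue idea: two adjacent operations can share a cycle only if neither reads a register the other writes and they do not write the same register. This greedy, adjacent-pair scheme and its (reads, writes) encoding are invented for illustration; it is not the Lab scheduler.

```python
def pack_two_wide(ops):
    """ops: list of (reads, writes) frozensets of register names.
    Greedily packs adjacent independent operations into two-issue
    cycles; returns a list of cycles, each a list of op indices."""
    cycles = []
    i = 0
    while i < len(ops):
        r1, w1 = ops[i]
        if i + 1 < len(ops):
            r2, w2 = ops[i + 1]
            # issue together only if there is no read-after-write,
            # write-after-read, or write-after-write conflict
            independent = not (w1 & r2 or w2 & r1 or w1 & w2)
            if independent:
                cycles.append([i, i + 1])
                i += 2
                continue
        cycles.append([i])
        i += 1
    return cycles
```

Because only adjacent pairs are considered, reordering independent operations so that they sit next to each other changes how many cycles the same code needs, which is exactly the order sensitivity noted above.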
What about processors like the Core i7 or ARM?

What happens to the execution model with multiple functional units? One operation executes on each functional unit. The complication arises in the processor's fetch and decode units: the fetch unit must retrieve several operations, and fetch and decode must collaborate to decide where each executes. A fixed, position-based scheme leads to a VLIW system (VLIW: Very Long Instruction Word computer); a dynamic scheme leads to superscalar systems. A more complex decode unit costs more transistors and more power. Processors with multiple functional units need code with multiple independent (unrelated) operations in each cycle: instruction-level parallelism, or ILP. See Lab 3 in COMP 412.
What about processors like the Core i7 or ARM?

When the number of functional units gets large, at some point the network that connects register sets to functional units gets too deep: transmission time through the multiplexor can come to dominate processor cycle time, so adding functional units would slow the processor's fundamental clock speed. Architects have adopted partitioned register designs that have multiple register sets with limited bandwidth between them. This adds a new problem to code generation: the placement of operands. The compiler must place each operation on a functional unit that can access its data, or insert code to transfer the data (and ensure that a register is available for it in the new register set). And the fetch and decode units get even more complex...
What's Next After Multiple Functional Units?

As processor complexity grows, the yield in performance for a given expenditure of chip real estate (or power) shrinks. A core with eight functional units might be bigger than four cores with two functional units each, and the interconnects between fetch, decode, register sets, (caches,) and functional units become even more complex. At some point, it is easier to put more cores on a chip than to build bigger cores: stamp out more, simpler cores rather than fewer, complex cores. That is an easier design problem, it lowers power consumption, and it gives a better ratio of performance to chip area (and power). It is a great idea, if the programmer, language, and compiler can find enough thread-level parallelism to keep all the cores busy, and enough instruction-level parallelism (within each thread) to keep the functional units busy.
What About Multiple Cores?

(Figure: two cores, each with its own fetch unit, decode unit, and functional units.) Modern multicore processors have anywhere from 2 to many cores. They require lots of parallelism for best performance. The major limitation is memory bandwidth: does each core see 1/(# cores) of it? Bandwidth may impose some practical limits on the use of all those cores.
What's Next After Multiple Functional Units?

What happens to the execution model in a multicore processor? Execution within a thread follows the single-core model: fetch, decode, and execute, with (possibly) multiple functional units. Single threads have simple behavior, and individual threads operate independently. The language (and processor) usually provide synchronization between threads; synchronization is needed to share data and to communicate control. See COMP 22 and COMP 1.
CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per
More informationComputing Inside The Parser Syntax-Directed Translation, II. Comp 412
COMP 412 FALL 2018 Computing Inside The Parser Syntax-Directed Translation, II Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights
More informationMicro-programmed Control Ch 15
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of
More informationInstruction Selection, II Tree-pattern matching
Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 4 at Rice University have explicit permission
More informationMachine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Machine Instructions vs. Micro-instructions Memory execution unit CPU control memory
More informationMicro-programmed Control Ch 15
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of
More informationParallel Processing SIMD, Vector and GPU s cont.
Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP
More informationCode Shape II Expressions & Assignment
Code Shape II Expressions & Assignment Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make
More informationUNIT V: CENTRAL PROCESSING UNIT
UNIT V: CENTRAL PROCESSING UNIT Agenda Basic Instruc1on Cycle & Sets Addressing Instruc1on Format Processor Organiza1on Register Organiza1on Pipeline Processors Instruc1on Pipelining Co-Processors RISC
More informationCS Computer Architecture
CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010 An Example Implementation In principle, we could describe the control store in binary, 36 bits per word. We will use a simple symbolic
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationA Key Theme of CIS 371: Parallelism. CIS 371 Computer Organization and Design. Readings. This Unit: (In-Order) Superscalar Pipelines
A Key Theme of CIS 371: arallelism CIS 371 Computer Organization and Design Unit 10: Superscalar ipelines reviously: pipeline-level parallelism Work on execute of one instruction in parallel with decode
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationCHAPTER 8: CPU and Memory Design, Enhancement, and Implementation
CHAPTER 8: CPU and Memory Design, Enhancement, and Implementation The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 5th Edition, Irv Englander John
More informationComputing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.
COMP 412 FALL 20167 Computing Inside The Parser Syntax-Directed Translation, II Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More informationECE 154B Spring Project 4. Dual-Issue Superscalar MIPS Processor. Project Checkoff: Friday, June 1 nd, Report Due: Monday, June 4 th, 2018
Project 4 Dual-Issue Superscalar MIPS Processor Project Checkoff: Friday, June 1 nd, 2018 Report Due: Monday, June 4 th, 2018 Overview: Some machines go beyond pipelining and execute more than one instruction
More informationInstruction Selection and Scheduling
Instruction Selection and Scheduling The Problem Writing a compiler is a lot of work Would like to reuse components whenever possible Would like to automate construction of components Front End Middle
More informationSuperscalar Processors Ch 13. Superscalar Processing (5) Computer Organization II 10/10/2001. New dependency for superscalar case? (8) Name dependency
Superscalar Processors Ch 13 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction 1 New dependency for superscalar case? (8) Name dependency (nimiriippuvuus) two use the same
More informationECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines
ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 14 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html
More informationVon Neumann architecture. The first computers used a single fixed program (like a numeric calculator).
Microprocessors Von Neumann architecture The first computers used a single fixed program (like a numeric calculator). To change the program, one has to re-wire, re-structure, or re-design the computer.
More informationInstructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/fa10. 10/4/10 Fall Lecture #16. Agenda
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/fa10 1 Agenda Cache Sizing/Hits and Misses Administrivia
More informationCS415 Compilers. Lexical Analysis
CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework
More informationCompiler Architecture
Code Generation 1 Compiler Architecture Source language Scanner (lexical analysis) Tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer
More informationNew Advances in Micro-Processors and computer architectures
New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,
More informationGenerating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.
COMP 412 FALL 2017 Generating Code for Assignment Statements back to work Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights
More information