Compiler: Control Flow Optimization


1 Compiler: Control Flow Optimization
Virendra Singh
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering, Indian Institute of Technology Bombay
Advanced Topics in MNIT, Lecture 4 (02 Oct 2015)

2 Compiler Backend Structure
- Improve code quality (machine-independent optimization)
  - Control flow analysis, control flow optimization: branching structure
  - Dataflow analysis, dataflow optimization: computation instructions
- Virtual-to-physical mapping and machine-dependent optimization
  - Instruction selection: bind instructions to physical realizations
  - Instruction scheduling: bind instructions to physical resources
  - Register allocation: bind virtual registers to physical registers
  - Machine code emission/optimization

3 Control Flow
- Control transfer = branch (taken or fall-through)
- Control flow
  - Branching behavior of an application
  - What sequences of instructions can be executed
- Execution → dynamic control flow
  - Direction of a particular instance of a branch
  - Predict, speculate, squash, etc.
- Compiler → static control flow
  - Not executing the program
  - Input not known, so what could happen, worst case
02 Oct 2015 virendra@mnit

4 Regions
- Region: a collection of operations that are treated as a single unit by the compiler
- Examples
  - Basic block
  - Procedure
  - Body of a loop
- Properties
  - Connected subgraph of operations
  - Control flow is the key parameter that defines regions
  - Hierarchically organized
- Problem
  - Basic blocks are too small (3-5 operations), so it is hard to extract sufficient parallelism
  - Procedure control flow is too complex for many compiler transforms
  - Plus only parts of a procedure are important (90/ rule)

5 Regions
- Want
  - Intermediate-sized regions with simple control flow
  - Bigger basic blocks would be ideal!
  - Separate important code from less important code
  - Optimize frequently executed code at the expense of the rest
- Solution
  - Define new region types that consist of multiple BBs
  - Profile information used in the identification
  - Sequential control flow
  - Pretend the regions are basic blocks

6 Region Type 1 - Trace
- Trace: linear collection of basic blocks that tend to execute in sequence
  - Likely control flow path
  - Acyclic (outer backedge ok)
- Side entrance: branch into the middle of a trace
- Side exit: branch out of the middle of a trace
- Compilation strategy
  - Compile assuming the path occurs 100% of the time
  - Patch up side entrances and exits afterwards
- Motivated by scheduling (i.e., trace scheduling)

7 Linearizing a Trace
[Figure: a trace laid out linearly, annotated with its entry count, entry/exit counts, side exits, side entrances, and exit count.]

8 Intelligent Trace Layout for Better I-cache Performance
- Intra-procedural code placement
- Procedure positioning
- Procedure splitting
[Figure: trace view (trace 1, trace 2, trace 3, the rest) next to the procedure view.]

9 Issues With Selecting Traces
- Acyclic
  - Cannot go past a backedge
- Trace length
  - Longer = better? Not always!
- On-trace
  - Maximize on-trace
  - Compile assuming on-trace is 100% (i.e., a single BB)
- Tradeoff (heuristic)
  - Length
  - Likelihood of remaining within the trace

10 Trace Selection Algorithm
    i = 0; mark all BBs unvisited
    while (there are unvisited nodes) do
        seed = unvisited BB with largest execution freq
        trace[i] += seed
        mark seed visited
        current = seed
        /* Grow trace forward */
        while (1) do
            next = best_successor_of(current)
            if (next == 0) then break
            trace[i] += next
            mark next visited
            current = next
        endwhile
        /* Grow trace backward analogously */
        i++
    endwhile

11 Best Successor/Predecessor
- Node weight vs edge weight: edge weight is more accurate
- THRESHOLD
  - Controls trace probability
  - 60-70% found best
- Notes on this algorithm
  - A BB is only allowed in 1 trace
  - Cumulative probability is ignored
  - Min weight for a seed to be chosen (ie executed 0 times)

    best_successor_of(BB)
        e = control flow edge with highest probability leaving BB
        if (e is a backedge) then
            return 0
        endif
        if (probability(e) <= THRESHOLD) then
            return 0
        endif
        d = destination of e
        if (d is visited) then
            return 0
        endif
        return d
    end procedure
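The two procedures on slides 10 and 11 can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the `Block` class, `build_cfg`, the four-block example graph, and the threshold of 0.6 are all assumptions made for the example. Backward growth is left as "analogous" on the slide; here it is assumed to follow the most frequent incoming edge, measured as a fraction of all incoming flow.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.6  # assumed; the slides report 60-70% working best


@dataclass
class Block:
    weight: int                                  # profiled execution count
    succs: dict = field(default_factory=dict)    # successor name -> edge count
    preds: dict = field(default_factory=dict)    # predecessor name -> edge count


def build_cfg(weights, edges):
    """weights: {name: count}; edges: {(src, dst): count}. Hypothetical helper."""
    cfg = {name: Block(w) for name, w in weights.items()}
    for (src, dst), count in edges.items():
        cfg[src].succs[dst] = count
        cfg[dst].preds[src] = count
    return cfg


def best_neighbor(cfg, bb, visited, backedges, forward=True):
    """Follow the most probable edge out of (or into) bb, or return None."""
    edges = cfg[bb].succs if forward else cfg[bb].preds
    total = sum(edges.values())
    if total == 0:
        return None
    other, count = max(edges.items(), key=lambda kv: kv[1])
    edge = (bb, other) if forward else (other, bb)
    if edge in backedges or count / total <= THRESHOLD or other in visited:
        return None
    return other


def select_traces(cfg, backedges=frozenset()):
    visited, traces = set(), []
    while len(visited) < len(cfg):
        # Seed: the unvisited block with the largest execution frequency.
        seed = max((b for b in cfg if b not in visited),
                   key=lambda b: cfg[b].weight)
        visited.add(seed)
        trace = [seed]
        for forward in (True, False):        # grow forward, then backward
            current = seed
            while True:
                nxt = best_neighbor(cfg, current, visited, backedges, forward)
                if nxt is None:
                    break
                visited.add(nxt)
                if forward:
                    trace.append(nxt)
                else:
                    trace.insert(0, nxt)
                current = nxt
        traces.append(trace)
    return traces


# Hypothetical diamond-shaped CFG: BB1 branches to BB2 (hot) or BB3 (cold).
cfg = build_cfg(
    {"BB1": 100, "BB2": 90, "BB3": 10, "BB4": 100},
    {("BB1", "BB2"): 90, ("BB1", "BB3"): 10,
     ("BB2", "BB4"): 90, ("BB3", "BB4"): 10},
)
print(select_traces(cfg))  # [['BB1', 'BB2', 'BB4'], ['BB3']]
```

Because a block may belong to only one trace, the cold block BB3 ends up as a single-block trace even though its only successor edge has probability 1.0: its destination BB4 is already visited.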

12 Trace Selection Example
Find the traces. Assume a threshold probability of 60%.
[Figure: example CFG annotated with execution counts.]

13 Traces are Nice, But
- Treat trace as a big BB
  - Transform trace ignoring side entrances/exits
  - Insert fixup code, aka bookkeeping
  - Side entrance fixup is more painful
  - Sometimes not possible, so the transform is not allowed
- Solution
  - Eliminate side entrances
  - The superblock is born

14 Region Type 2 - Superblock
- Superblock: linear collection of basic blocks that tend to execute in sequence, in which control flow may only enter at the first BB
  - Likely control flow path
  - Acyclic (outer backedge ok)
  - Trace with no side entrances
  - Side exits still exist
- Superblock formation
  - Trace selection
  - Eliminate side entrances

15 Tail Duplication
- To eliminate all side entrances, replicate the tail portion of the trace
  - Identify the first side entrance
  - Replicate all BBs from the target to the bottom
  - Redirect all side entrances to the duplicated BBs
  - Copy each BB only once
  - Max code expansion = 2x-1, where x is the number of BBs in the trace
  - Adjust profile information
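The steps above can be sketched on a CFG stored as a successor map. This is an illustrative sketch only: the function name `tail_duplicate`, the convention of naming a copy by appending `'`, and the six-block example graph are assumptions, and the profile adjustment step is left out.

```python
def tail_duplicate(succs, trace):
    """Return (new successor map, {original: copy}) with no side entrances
    into `trace`. `succs` maps a block name to its list of successors."""
    pos = {bb: i for i, bb in enumerate(trace)}

    def is_side_entrance(src, dst):
        # An edge into a non-head trace block that does not come from the
        # block immediately before it in the trace.
        return dst in pos and pos[dst] > 0 and src != trace[pos[dst] - 1]

    targets = [pos[d] for s, ds in succs.items() for d in ds
               if is_side_entrance(s, d)]
    if not targets:
        return {s: list(ds) for s, ds in succs.items()}, {}

    # Replicate everything from the first side-entrance target to the bottom,
    # copying each block exactly once.
    first = min(targets)
    copy = {bb: bb + "'" for bb in trace[first:]}

    # Redirect every side entrance to the duplicated block.
    new = {s: [copy[d] if is_side_entrance(s, d) else d for d in ds]
           for s, ds in succs.items()}
    # Inside the duplicated tail, control stays in the duplicate.
    for bb, dup in copy.items():
        new[dup] = [copy.get(d, d) for d in succs[bb]]
    return new, copy


# Hypothetical CFG; the selected trace is BB1 -> BB2 -> BB4 -> BB6.
succs = {
    "BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB3": ["BB4"],
    "BB4": ["BB5", "BB6"], "BB5": ["BB6"], "BB6": [],
}
trace = ["BB1", "BB2", "BB4", "BB6"]
new_succs, copies = tail_duplicate(succs, trace)
print(new_succs["BB3"], new_succs["BB5"], new_succs["BB4'"])
# ["BB4'"] ["BB6'"] ['BB5', "BB6'"]
```

In this example two of the four trace blocks are copied, so the trace plus its duplicates occupies 6 blocks, within the 2x-1 = 7 bound stated on the slide.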

16 Superblock Formation
[Figure: the example CFG before and after tail duplication, annotated with execution counts.]

17 Issues with Superblocks
- Central tradeoff
  - Side entrance elimination
    - Compiler complexity
    - Compiler effectiveness
  - Code size increase
- Apply intelligently
  - Most frequently executed BBs are converted to SBs
  - Set upper limit on code expansion
  - x are typical code expansion ratios from SB formation

18 Predicated Execution
- Hardware mechanism that allows operations to be conditionally executed
- Add an additional boolean source operand (predicate)
    ADD r1, r2, r3 if p1
  - if (p1 is True), r1 = r2 + r3
  - else if (p1 is False), do nothing (ADD treated like a NOP)
  - p1 is referred to as the guarding predicate
  - Predicated on True means always executed
  - An omitted predicate also means always executed
- Provides the compiler with an alternative to using branches to selectively execute operations
  - If statements in the source
  - Realize with branches in the assembly code
  - Could also realize with conditional instructions
  - Or use a combination of both

19 Predicated Execution Example
Source code:
    a = b + c
    if (a > 0)
        e = f + g
    else
        e = f / g
    h = i - j

Traditional branching code:
        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: add e, f, g
    L2: sub h, i, j

Predicated code:
    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    add e, f, g if p2
    sub h, i, j if T

[Figure: CFG of the basic blocks, with predicates p2 and p3 assigned to the two arms of the if.]
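To check that the two code sequences compute the same values, the example can be simulated in Python. This is only a sketch: the register file is modelled as a dictionary, and the helper `op` (a hypothetical name) stands in for a guarded instruction that is nullified when its predicate is False.

```python
def branching(b, c, f, g, i, j):
    """The traditional version: one arm is selected by a branch."""
    a = b + c
    if a > 0:
        e = f + g
    else:
        e = f / g
    h = i - j
    return a, e, h


def predicated(b, c, f, g, i, j):
    """The predicated version: every operation is issued, guarded ones
    only take effect when their predicate is True."""
    r = {}

    def op(dest, compute, guard=True):
        # A guarded operation: treated like a NOP when guard is False.
        if guard:
            r[dest] = compute()

    op("a", lambda: b + c)                   # add a, b, c if T
    op("p2", lambda: r["a"] > 0)             # p2 = a > 0  if T
    op("p3", lambda: r["a"] <= 0)            # p3 = a <= 0 if T
    op("e", lambda: f / g, guard=r["p3"])    # div e, f, g if p3
    op("e", lambda: f + g, guard=r["p2"])    # add e, f, g if p2
    op("h", lambda: i - j)                   # sub h, i, j if T
    return r["a"], r["e"], r["h"]


for args in [(1, 2, 3, 4, 5, 6), (-5, 2, 8, 4, 5, 6), (1, 2, 3, 0, 5, 6)]:
    assert branching(*args) == predicated(*args)
```

The last input has g = 0 on the path where a > 0. The division is nullified because p3 is False, so no divide-by-zero occurs, which mirrors the requirement that a nullified operation has no effect.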

20 What About Nested If-then-elses?
Source code:
    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j

Traditional branching code:
        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: bgt a, 25, L3
        mpy e, f, g
        jump L2
    L3: add e, f, g
    L2: sub h, i, j

21 Nested If-then-elses: No Problem
Source code:
    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j

Predicated code:
    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    p5 = a > 25 if p2
    p6 = a <= 25 if p2
    mpy e, f, g if p6
    add e, f, g if p5
    sub h, i, j if T

- What do we assume to make this work?
  - If p2 is False, both p5 and p6 are False
  - So, the predicate-setting instruction should set its result to False if the guarding predicate is False!
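The assumption stated on this slide can be checked with a small simulation. The sketch below is illustrative only: `define_pred` is a hypothetical helper that models a predicate-setting instruction which writes False whenever its guard is False, and `executed_ops` lists which of the three assignments to e actually takes effect.

```python
def define_pred(guard, cond):
    """Predicate define that writes False whenever its guard is False."""
    return bool(guard) and bool(cond)


def predicates(a):
    p2 = define_pred(True, a > 0)     # p2 = a > 0   if T
    p3 = define_pred(True, a <= 0)    # p3 = a <= 0  if T
    p5 = define_pred(p2, a > 25)      # p5 = a > 25  if p2
    p6 = define_pred(p2, a <= 25)     # p6 = a <= 25 if p2
    # Map each guarded assignment to e onto its guarding predicate.
    return {"div": p3, "mpy": p6, "add": p5}


def executed_ops(a):
    return [name for name, p in predicates(a).items() if p]


print(executed_ops(-3), executed_ops(10), executed_ops(30))
# ['div'] ['mpy'] ['add']

# Exactly one assignment to e takes effect for every value of a.
assert all(len(executed_ops(a)) == 1 for a in range(-50, 51))
```

If the define left its destination at the raw compare result instead of clearing it, a = -3 would give p6 = (a <= 25) = True, so both the div and the mpy would write e.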

22 Benefits/Costs of Predicated Execution
- Benefits
  - Remove branches (both conditional and unconditional)
  - Remove branch mispredictions
  - Overlap execution of if-then-else statements
    - Branches tend to sequentialize operations
    - Predicates can be computed/used in parallel
- Costs
  - Useless instructions executed
  - Code size (extra operand, can't fit into 32 bits)
  - Possibly longer schedule lengths
- The real story
  - Must be applied selectively, or you get worse performance than not using it at all

23 Benefits/Costs of Predicated Execution
[Figure: the CFG of basic blocks before and after if-conversion into a single predicated block.]
- Benefits
  - No branches, no mispredicts
  - Can freely reorder independent operations in the predicated block
  - Overlap BB2 with BB5 and …
- Costs (execute all paths)
  - Worst-case schedule length
  - Worst-case resources required

24 Compare-to-Predicate Operations (CMPPs)
- How do we compute predicates?
  - Compare registers/literals like a branch would do
  - Efficiency, code size, nested conditionals, etc.
- 2 targets for computing taken/fall-through conditions with 1 operation
    p1, p2 = CMPP.cond.D1a.D2a (r1, r2) if p3
  - p1 = first destination predicate
  - p2 = second destination predicate
  - cond = compare condition (e.g., EQ, LT, GE, ...)
  - D1a = action specifier for first destination
  - D2a = action specifier for second destination
  - (r1, r2) = data inputs to be compared (e.g., r1 < r2)
  - p3 = guarding predicate

25 CMPP Action Specifiers
[Table: destination predicate value as a function of guarding predicate and compare result, with one column for each of UN, UC, ON, OC, AN, AC.]
- UN/UC = unconditional normal/complement
  - This is what we used in the earlier examples
  - guard = 0: both outputs are 0
  - guard = 1: UN = compare result, UC = opposite
- ON/OC = OR-type normal/complement
- AN/AC = AND-type normal/complement
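The action specifiers can be modelled as a small function of the guard, the compare result, and the destination's previous value. Only the UN/UC behavior is spelled out on the slide. The OR-type and AND-type rules in this sketch are an assumption based on the usual HPL-PD (PlayDoh) convention: OR-type destinations are only ever set to 1, AND-type destinations are only ever cleared to 0, and both are otherwise left unchanged. The function names `cmpp_dest` and `cmpp` are hypothetical.

```python
import operator


def cmpp_dest(action, guard, result, old):
    """New value of one CMPP destination predicate.

    action: one of "UN", "UC", "ON", "OC", "AN", "AC"
    guard:  value of the guarding predicate
    result: outcome of the comparison
    old:    previous value of the destination predicate
    """
    kind, sense = action                       # e.g. "UN" -> ("U", "N")
    r = result if sense == "N" else not result
    if kind == "U":                            # unconditional: always written
        return guard and r
    if kind == "O":                            # OR-type (assumed): may only set
        return True if guard and r else old
    if kind == "A":                            # AND-type (assumed): may only clear
        return False if guard and not r else old
    raise ValueError(f"unknown action specifier: {action}")


def cmpp(cond, d1a, d2a, r1, r2, guard=True, p1=False, p2=False):
    """p1, p2 = CMPP.cond.D1a.D2a (r1, r2) if guard"""
    result = cond(r1, r2)
    return (cmpp_dest(d1a, guard, result, p1),
            cmpp_dest(d2a, guard, result, p2))


# Taken / fall-through predicates from one compare, as in the earlier slides.
print(cmpp(operator.gt, "UN", "UC", 5, 0))               # (True, False)
print(cmpp(operator.gt, "UN", "UC", 5, 0, guard=False))  # (False, False)

# OR-type: accumulate "any x > 0" into p without chaining branches.
p = False
for x in (-1, 7, -2):
    p, _ = cmpp(operator.gt, "ON", "OC", x, 0, p1=p)
print(p)                                                 # True
```

Under these assumed rules, several OR-type compares can target the same predicate in any order once it has been initialized to False, which is what makes them attractive for compound conditions.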

26 Thank You


More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

Agenda. Review 2/10/11. CS 61C: Great Ideas in Computer Architecture (Machine Structures)

Agenda. Review 2/10/11. CS 61C: Great Ideas in Computer Architecture (Machine Structures) CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/sp11 1 2 New- School Machine Structures (It s a bit more

More information

Instruction Scheduling. Software Pipelining - 3

Instruction Scheduling. Software Pipelining - 3 Instruction Scheduling and Software Pipelining - 3 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Instruction

More information

Processor Architecture

Processor Architecture ECPE 170 Jeff Shafer University of the Pacific Processor Architecture 2 Lab Schedule Ac=vi=es Assignments Due Today Wednesday Apr 24 th Processor Architecture Lab 12 due by 11:59pm Wednesday Network Programming

More information

Computer Programming-I. Developed by: Strawberry

Computer Programming-I. Developed by: Strawberry Computer Programming-I Objec=ve of CP-I The course will enable the students to understand the basic concepts of structured programming. What is programming? Wri=ng a set of instruc=ons that computer use

More information

CS 61C: Great Ideas in Computer Architecture Compilers and Floa-ng Point. Today s. Lecture

CS 61C: Great Ideas in Computer Architecture Compilers and Floa-ng Point. Today s. Lecture CS 61C: Great Ideas in Computer Architecture s and Floa-ng Point Instructors: Krste Asanovic, Randy H. Katz hdp://inst.eecs.berkeley.edu/~cs61c/fa12 Fall 2012 - - Lecture #13 1 New- School Machine Structures

More information

Dynamic Languages. CSE 501 Spring 15. With materials adopted from John Mitchell

Dynamic Languages. CSE 501 Spring 15. With materials adopted from John Mitchell Dynamic Languages CSE 501 Spring 15 With materials adopted from John Mitchell Dynamic Programming Languages Languages where program behavior, broadly construed, cannot be determined during compila@on Types

More information

hnp://

hnp:// The bots face off in a tournament against one another and about an equal number of humans, with each player trying to score points by elimina&ng its opponents. Each player also has a "judging gun" in addi&on

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Document Databases: MongoDB

Document Databases: MongoDB NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/~svoboda/courses/171-ndbi040/ Lecture 9 Document Databases: MongoDB Marn Svoboda svoboda@ksi.mff.cuni.cz 28. 11. 2017 Charles University

More information

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013

A Bad Name. CS 2210: Optimization. Register Allocation. Optimization. Reaching Definitions. Dataflow Analyses 4/10/2013 A Bad Name Optimization is the process by which we turn a program into a better one, for some definition of better. CS 2210: Optimization This is impossible in the general case. For instance, a fully optimizing

More information

Evaluating Compiler Support for Complexity Effective Network Processing

Evaluating Compiler Support for Complexity Effective Network Processing Evaluating Compiler Support for Complexity Effective Network Processing Pradeep Rao and S.K. Nandy Computer Aided Design Laboratory. SERC, Indian Institute of Science. pradeep,nandy@cadl.iisc.ernet.in

More information

Complex Pipelines and Branch Prediction

Complex Pipelines and Branch Prediction Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle

More information

COSC 111: Computer Programming I. Dr. Bowen Hui University of Bri>sh Columbia Okanagan

COSC 111: Computer Programming I. Dr. Bowen Hui University of Bri>sh Columbia Okanagan COSC 111: Computer Programming I Dr. Bowen Hui University of Bri>sh Columbia Okanagan 1 First half of course SoEware examples From English to Java Template for building small programs Exposure to Java

More information

Multiple Instruction Issue and Hardware Based Speculation

Multiple Instruction Issue and Hardware Based Speculation Multiple Instruction Issue and Hardware Based Speculation Soner Önder Michigan Technological University, Houghton MI www.cs.mtu.edu/~soner Hardware Based Speculation Exploiting more ILP requires that we

More information

Algorithms Lecture 11. UC Davis, ECS20, Winter Discrete Mathematics for Computer Science

Algorithms Lecture 11. UC Davis, ECS20, Winter Discrete Mathematics for Computer Science UC Davis, ECS20, Winter 2017 Discrete Mathematics for Computer Science Prof. Raissa D Souza (slides adopted from Michael Frank and Haluk Bingöl) Lecture 11 Algorithms 3.1-3.2 Algorithms Member of the House

More information

Design and Debug: Essen.al Concepts Numerical Conversions CS 16: Solving Problems with Computers Lecture #7

Design and Debug: Essen.al Concepts Numerical Conversions CS 16: Solving Problems with Computers Lecture #7 Design and Debug: Essen.al Concepts Numerical Conversions CS 16: Solving Problems with Computers Lecture #7 Ziad Matni Dept. of Computer Science, UCSB Announcements We are grading your midterms this week!

More information

ECSE 425 Lecture 25: Mul1- threading

ECSE 425 Lecture 25: Mul1- threading ECSE 425 Lecture 25: Mul1- threading H&P Chapter 3 Last Time Theore1cal and prac1cal limits of ILP Instruc1on window Branch predic1on Register renaming 2 Today Mul1- threading Chapter 3.5 Summary of ILP:

More information

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce

CS 465 Final Review. Fall 2017 Prof. Daniel Menasce CS 465 Final Review Fall 2017 Prof. Daniel Menasce Ques@ons What are the types of hazards in a datapath and how each of them can be mi@gated? State and explain some of the methods used to deal with branch

More information

Performance Op>miza>on

Performance Op>miza>on ECPE 170 Jeff Shafer University of the Pacific Performance Op>miza>on 2 Lab Schedule This Week Ac>vi>es Background discussion Lab 5 Performance Measurement Lab 6 Performance Op;miza;on Lab 5 Assignments

More information

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis

More information

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)

More information

VHDL: Concurrent Coding vs. Sequen7al Coding. 1

VHDL: Concurrent Coding vs. Sequen7al Coding. 1 VHDL: Concurrent Coding vs. Sequen7al Coding talarico@gonzaga.edu 1 Concurrent Coding Concurrent = parallel VHDL code is inherently concurrent Concurrent statements are adequate only to code at a very

More information

Searching and Sorting (Savitch, Chapter 7.4)

Searching and Sorting (Savitch, Chapter 7.4) Searching and Sorting (Savitch, Chapter 7.4) TOPICS Algorithms Complexity Binary Search Bubble Sort Insertion Sort Selection Sort What is an algorithm? A finite set of precise instruc6ons for performing

More information

What is an algorithm?

What is an algorithm? /0/ What is an algorithm? Searching and Sorting (Savitch, Chapter 7.) TOPICS Algorithms Complexity Binary Search Bubble Sort Insertion Sort Selection Sort A finite set of precise instrucons for performing

More information

VLIW/EPIC: Statically Scheduled ILP

VLIW/EPIC: Statically Scheduled ILP 6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind

More information

Speculative Trace Scheduling in VLIW Processors

Speculative Trace Scheduling in VLIW Processors Speculative Trace Scheduling in VLIW Processors Manvi Agarwal and S.K. Nandy CADL, SERC, Indian Institute of Science, Bangalore, INDIA {manvi@rishi.,nandy@}serc.iisc.ernet.in J.v.Eijndhoven and S. Balakrishnan

More information

Component diagrams. Components Components are model elements that represent independent, interchangeable parts of a system.

Component diagrams. Components Components are model elements that represent independent, interchangeable parts of a system. Component diagrams Components Components are model elements that represent independent, interchangeable parts of a system. Components are more abstract than classes and can be considered to be stand- alone

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information