Compiler: Control Flow Optimization
- Bartholomew Evans
- 5 years ago
Transcription
1 Compiler: Control Flow Optimization
Virendra Singh, Computer Architecture and Dependable Systems Lab, Department of Electrical Engineering, Indian Institute of Technology Bombay
Advanced Topics in MNIT, Lecture 4 (02 Oct 2015)
2 Compiler Backend Structure
- Improve code quality (machine-independent optimization)
  - Control flow analysis, control flow optimization: branching structure
  - Dataflow analysis, dataflow optimization: computation instructions
- Virtual-to-physical mapping and machine-dependent optimization
  - Instruction selection: bind instrs to physical realizations
  - Instruction scheduling: bind instrs to physical resources
  - Register allocation: bind virtual regs to physical regs
  - Machine code emission/optimization
3 Control Flow
- Control transfer = branch (taken or fall-through)
- Control flow
  - Branching behavior of an application
  - What sequences of instructions can be executed
- Execution → dynamic control flow
  - Direction of a particular instance of a branch
  - Predict, speculate, squash, etc.
- Compiler → static control flow
  - Not executing the program
  - Input not known, so what could happen, worst case
02 Oct 2015 virendra@mnit 3
4 Regions
- Region: a collection of operations that are treated as a single unit by the compiler
- Examples
  - Basic block
  - Procedure
  - Body of a loop
- Properties
  - Connected subgraph of operations
  - Control flow is the key parameter that defines regions
  - Hierarchically organized
- Problem
  - Basic blocks are too small (3-5 operations), so it is hard to extract sufficient parallelism
  - Procedure control flow is too complex for many compiler transforms
  - Plus only parts of a procedure are important (90/10 rule)
5 Regions
- Want
  - Intermediate-sized regions with simple control flow
  - Bigger basic blocks would be ideal!
  - Separate important code from less important code
  - Optimize frequently executed code at the expense of the rest
- Solution
  - Define new region types that consist of multiple BBs
  - Profile information used in the identification
  - Sequential control flow
  - Pretend the regions are basic blocks
6 Region Type 1 - Trace
- Trace: a linear collection of basic blocks that tend to execute in sequence
  - Likely control flow path
  - Acyclic (outer backedge OK)
- Side entrance: branch into the middle of a trace
- Side exit: branch out of the middle of a trace
- Compilation strategy
  - Compile assuming the path occurs 100% of the time
  - Patch up side entrances and exits afterwards
- Motivated by scheduling (i.e., trace scheduling)
7 Linearizing a Trace
[Figure: a linearized trace annotated with its entry count, per-block entry/exit counts, side exits, side entrances, and exit count]
8 Intelligent Trace Layout for Better I-cache Performance
- Intra-procedural code placement
- Procedure positioning
- Procedure splitting
[Figure: trace view (trace 1, trace 2, trace 3, the rest) alongside the procedure view]
9 Issues With Selecting Traces
- Acyclic
  - Cannot go past a backedge
- Trace length
  - Longer = better?
  - Not always!
- On-trace
  - Maximize on-trace
  - Compile assuming on-trace is 100% (i.e., a single BB)
- Tradeoff (heuristic)
  - Length
  - Likelihood of remaining within the trace
10 Trace Selection Algorithm
    i = 0
    mark all BBs unvisited
    while (there are unvisited nodes) do
        seed = unvisited BB with largest execution freq
        trace[i] += seed
        mark seed visited
        current = seed
        /* Grow trace forward */
        while (1) do
            next = best_successor_of(current)
            if (next == 0) then break
            trace[i] += next
            mark next visited
            current = next
        endwhile
        /* Grow trace backward analogously */
        i++
    endwhile
11 Best Successor/Predecessor
- Node weight vs. edge weight: edge weight is more accurate
- THRESHOLD
  - Controls trace probability
  - 60-70% found best
- Notes on this algorithm
  - A BB is only allowed in 1 trace
  - Cumulative probability is ignored
  - Minimum weight required for a seed to be chosen (i.e., a minimum execution count)
    best_successor_of(bb)
        e = control flow edge with highest probability leaving bb
        if (e is a backedge) then return 0 endif
        if (probability(e) <= THRESHOLD) then return 0 endif
        d = destination of e
        if (d is visited) then return 0 endif
        return d
    end procedure
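The trace selection pseudocode on slides 10 and 11 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: the names `select_traces`, `FREQ`, and `PROB` and the profile numbers are invented for illustration, `None` stands in for the pseudocode's `0`, and backward growth (which the slide only describes as "analogous") is assumed to weight each incoming edge by `freq[src] * prob`.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def select_traces(
    freq: Dict[str, int],
    prob: Dict[Edge, float],
    backedges: Set[Edge],
    threshold: float = 0.6,
) -> List[List[str]]:
    """Partition basic blocks into traces; each block lands in exactly one trace."""
    visited: Set[str] = set()

    def best_successor(bb: str) -> Optional[str]:
        out = [(p, dst) for (src, dst), p in prob.items() if src == bb]
        if not out:
            return None
        p, dst = max(out)  # most likely edge leaving bb
        if (bb, dst) in backedges or p <= threshold or dst in visited:
            return None
        return dst

    def best_predecessor(bb: str) -> Optional[str]:
        # Assumed mirror image of best_successor: compare incoming edge weights.
        inc = [(freq[src] * p, src) for (src, dst), p in prob.items() if dst == bb]
        total = sum(w for w, _ in inc)
        if total == 0:
            return None
        w, src = max(inc)
        if (src, bb) in backedges or w / total <= threshold or src in visited:
            return None
        return src

    traces: List[List[str]] = []
    while len(visited) < len(freq):
        # Seed = unvisited block with the largest execution frequency.
        seed = max(sorted(b for b in freq if b not in visited), key=freq.get)
        trace = [seed]
        visited.add(seed)
        current = seed
        while True:  # grow forward
            nxt = best_successor(current)
            if nxt is None:
                break
            trace.append(nxt)
            visited.add(nxt)
            current = nxt
        current = seed
        while True:  # grow backward
            prev = best_predecessor(current)
            if prev is None:
                break
            trace.insert(0, prev)
            visited.add(prev)
            current = prev
        traces.append(trace)
    return traces


# Hypothetical profile data (not taken from the slides).
FREQ = {"BB1": 100, "BB2": 90, "BB3": 10, "BB4": 100, "BB5": 75, "BB6": 25, "BB7": 100}
PROB = {
    ("BB1", "BB2"): 0.9, ("BB1", "BB3"): 0.1,
    ("BB2", "BB4"): 1.0, ("BB3", "BB4"): 1.0,
    ("BB4", "BB5"): 0.75, ("BB4", "BB6"): 0.25,
    ("BB5", "BB7"): 1.0, ("BB6", "BB7"): 1.0,
}

print(select_traces(FREQ, PROB, backedges=set()))
# [['BB1', 'BB2', 'BB4', 'BB5', 'BB7'], ['BB6'], ['BB3']]
```

Raising `threshold` to 0.8 in this example cuts the first trace short at BB4, because the BB4 to BB5 edge (75%) no longer qualifies. That is the length versus on-trace likelihood tradeoff described on slide 9.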
12 Trace Selection Example
- Find the traces. Assume a threshold probability of 60%.
[Figure: control flow graph annotated with execution counts]
13 Traces are Nice, But ...
- Treat the trace as a big BB
  - Transform the trace ignoring side entrances/exits
  - Insert fixup code (aka bookkeeping)
- Side entrance fixup is more painful
  - Sometimes not possible, so the transform is not allowed
- Solution
  - Eliminate side entrances
  - The superblock is born
14 Region Type 2 - Superblock
- Superblock: a linear collection of basic blocks that tend to execute in sequence, in which control flow may only enter at the first BB
  - Likely control flow path
  - Acyclic (outer backedge OK)
- Trace with no side entrances
- Side exits still exist
- Superblock formation
  - Trace selection
  - Eliminate side entrances
15 Tail Duplication
- To eliminate all side entrances, replicate the tail portion of the trace
  - Identify the first side entrance
  - Replicate all BBs from the target to the bottom
  - Redirect all side entrances to the duplicated BBs
  - Copy each BB only once
- Max code expansion = 2x - 1, where x is the number of BBs in the trace
- Adjust profile information
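The tail duplication steps above can be sketched as follows. This is an illustrative sketch only: the CFG is hypothetical (the slide's figure did not survive transcription), the helper names `side_entrances` and `tail_duplicate` are invented, copies are named by appending `'`, and profile adjustment is omitted.

```python
from typing import Dict, List, Tuple

CFG = Dict[str, List[str]]


def side_entrances(succs: CFG, trace: List[str]) -> List[Tuple[str, str]]:
    """Edges that enter the trace anywhere except at its first block."""
    pos = {b: i for i, b in enumerate(trace)}
    return [
        (src, dst)
        for src, dsts in succs.items()
        for dst in dsts
        if pos.get(dst, 0) > 0 and src != trace[pos[dst] - 1]
    ]


def tail_duplicate(succs: CFG, trace: List[str]) -> Tuple[CFG, Dict[str, str]]:
    """Return a new CFG in which the trace has no side entrances."""
    entrances = set(side_entrances(succs, trace))
    if not entrances:
        return {b: list(d) for b, d in succs.items()}, {}
    # Identify the first side entrance and copy every block from there down.
    first = min(trace.index(dst) for _, dst in entrances)
    copy_of = {b: b + "'" for b in trace[first:]}
    # Redirect each side entrance to the copy of its target.
    new_succs = {
        src: [copy_of[d] if (src, d) in entrances else d for d in dsts]
        for src, dsts in succs.items()
    }
    # Copies keep the original successors, but stay inside the copied tail.
    for block, dup in copy_of.items():
        new_succs[dup] = [copy_of.get(d, d) for d in succs.get(block, [])]
    return new_succs, copy_of


# Hypothetical CFG: trace BB1-BB2-BB4-BB6 with side entrances from BB3 and BB5.
SUCCS = {
    "BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB3": ["BB4"],
    "BB4": ["BB6", "BB5"], "BB5": ["BB6"], "BB6": [],
}
TRACE = ["BB1", "BB2", "BB4", "BB6"]

NEW_SUCCS, COPIES = tail_duplicate(SUCCS, TRACE)
print(NEW_SUCCS["BB3"], NEW_SUCCS["BB5"], NEW_SUCCS["BB4'"])
# ["BB4'"] ["BB6'"] ["BB6'", 'BB5']
```

For this 4-block trace at most 3 blocks can be copied, so the trace and its copies occupy at most 2(4) - 1 = 7 blocks, matching the 2x - 1 bound. Here only BB4 and BB6 are copied.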
16 Superblock Formation
[Figure: control flow graph before and after superblock formation, annotated with execution counts]
17 Issues with Superblocks
- Central tradeoff
  - Side entrance elimination
    - Compiler complexity
    - Compiler effectiveness
  - Code size increase
- Apply intelligently
  - Most frequently executed BBs are converted to SBs
  - Set an upper limit on code expansion
  - [...]x are typical code expansion ratios from SB formation
18 Predicated Execution
- Hardware mechanism that allows operations to be conditionally executed
- Add an additional boolean source operand (predicate)
  - ADD r1, r2, r3 if p1
    - if (p1 is True), r1 = r2 + r3
    - else if (p1 is False), do nothing (the ADD is treated like a NOP)
    - p1 is referred to as the guarding predicate
    - Predicated on True means always executed
    - An omitted predicate also means always executed
- Provides the compiler with an alternative to using branches to selectively execute operations
  - If statements in the source
  - Realize with branches in the assembly code
  - Could also realize with conditional instructions
  - Or use a combination of both
19 Predicated Execution Example
Source code:
    a = b + c
    if (a > 0)
        e = f + g
    else
        e = f / g
    h = i - j
Traditional branching code:
        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: add e, f, g
    L2: sub h, i, j
Predicated code:
    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    add e, f, g if p2
    sub h, i, j if T
[Figure: control flow graph of the branching code, with edges labeled p2 and p3, collapsed into a single predicated block]
20 What About Nested If-then-elses?
Source code:
    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j
Traditional branching code:
        add a, b, c
        bgt a, 0, L1
        div e, f, g
        jump L2
    L1: bgt a, 25, L3
        mpy e, f, g
        jump L2
    L3: add e, f, g
    L2: sub h, i, j
21 Nested If-then-elses: No Problem
Source code:
    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j
Predicated code:
    add a, b, c if T
    p2 = a > 0 if T
    p3 = a <= 0 if T
    div e, f, g if p3
    p5 = a > 25 if p2
    p6 = a <= 25 if p2
    mpy e, f, g if p6
    add e, f, g if p5
    sub h, i, j if T
- What do we assume to make this work?
  - If p2 is False, both p5 and p6 are False
  - So a predicate-setting instruction should set its result to False if the guarding predicate is False!
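To see why that assumption matters, the predicated code above can be run on a toy interpreter and compared against ordinary branching code. This is a rough sketch under stated assumptions: the tuple encoding, the opcode names `cmp_gt` and `cmp_le`, and the `ignore_guard_on_compare` flag are invented for illustration, and integer floor division stands in for `div`.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "mpy": operator.mul, "div": operator.floordiv}
CMPS = {"cmp_gt": operator.gt, "cmp_le": operator.le}

# (opcode, destination, source 1, source 2, guarding predicate)
PROGRAM = [
    ("add", "a", "b", "c", "T"),
    ("cmp_gt", "p2", "a", 0, "T"),
    ("cmp_le", "p3", "a", 0, "T"),
    ("div", "e", "f", "g", "p3"),
    ("cmp_gt", "p5", "a", 25, "p2"),
    ("cmp_le", "p6", "a", 25, "p2"),
    ("mpy", "e", "f", "g", "p6"),
    ("add", "e", "f", "g", "p5"),
    ("sub", "h", "i", "j", "T"),
]


def run_predicated(program, inputs, ignore_guard_on_compare=False):
    regs = dict(inputs)
    preds = {"T": True}

    def val(x):
        return regs[x] if isinstance(x, str) else x

    for op, dest, s1, s2, guard in program:
        g = preds[guard]
        if op in CMPS:
            result = CMPS[op](val(s1), val(s2))
            # A compare always writes its destination: False when the guard is False.
            preds[dest] = result if ignore_guard_on_compare else (g and result)
        elif g:
            regs[dest] = OPS[op](val(s1), val(s2))
        # Otherwise the operation is nullified, like a NOP.
    return regs["e"], regs["h"]


def run_branching(b, c, f, g, i, j):
    a = b + c
    if a > 0:
        e = f + g if a > 25 else f * g
    else:
        e = f // g
    return e, i - j


for b in (20, 5, -10):  # a = 30, 10, -5 covers all three paths
    inputs = {"b": b, "c": 10 if b == 20 else 5, "f": 6, "g": 3, "i": 9, "j": 4}
    assert run_predicated(PROGRAM, inputs) == run_branching(**inputs)
```

With `ignore_guard_on_compare=True` and a = -5, `p6` becomes True because -5 <= 25, so the `mpy` overwrites the result of the `div` and the two versions disagree.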
22 Benefits/Costs of Predicated Execution
- Benefits
  - Remove branches (both conditional and unconditional)
  - Remove branch mispredictions
  - Overlap execution of if-then-else statements
    - Branches tend to sequentialize operations
    - Predicates can be computed/used in parallel
- Costs
  - Useless instructions executed
  - Code size (extra operand, can't fit into 32 bits)
  - Possibly longer schedule lengths
- The real story
  - Must be applied selectively, or you get worse performance than not using it at all
23 Benefits/Costs of Predicated Execution
[Figure: two control flow graph views containing BB2, BB5, and BB7]
- Benefits
  - No branches, no mispredicts
  - Can freely reorder independent operations in the predicated block
  - Overlap BB2 with BB5 and ...
- Costs (execute all paths)
  - Worst-case schedule length
  - Worst-case resources required
24 Compare-to-Predicate Operations (CMPPs)
- How do we compute predicates?
  - Compare registers/literals like a branch would do
  - Efficiency, code size, nested conditionals, etc.
- 2 targets for computing taken/fall-through conditions with 1 operation
    p1, p2 = CMPP.cond.D1a.D2a (r1, r2) if p3
  - p1 = first destination predicate
  - p2 = second destination predicate
  - cond = compare condition (e.g., EQ, LT, GE, ...)
  - D1a = action specifier for the first destination
  - D2a = action specifier for the second destination
  - (r1, r2) = data inputs to be compared (e.g., r1 < r2)
  - p3 = guarding predicate
25 CMPP Action Specifiers
[Table: result written for each action specifier (UN, UC, ON, OC, AN, AC) as a function of the guarding predicate and the compare result]
- UN/UC = Unconditional normal/complement
  - This is what we used in the earlier examples
  - guard = 0: both outputs are 0
  - guard = 1: UN = compare result, UC = opposite
- ON/OC = OR-type normal/complement
- AN/AC = AND-type normal/complement
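A small Python model of the six action specifiers may help. The UN/UC rows follow the slide directly. The OR-type and AND-type rows are not spelled out in this transcript, so they are filled in from the usual HPL-PD-style convention and should be read as an assumption: OR-type destinations are only ever set to 1, AND-type destinations are only ever cleared to 0, and both are otherwise left unchanged. The function names `cmpp_dest` and `cmpp` are illustrative.

```python
def cmpp_dest(action: str, guard: bool, cond: bool, old: bool) -> bool:
    """New value of one destination predicate for a given action specifier."""
    if action == "UN":  # unconditional normal: always written
        return guard and cond
    if action == "UC":  # unconditional complement: always written
        return guard and not cond
    if action == "ON":  # OR-type normal (assumed): may only set to True
        return True if (guard and cond) else old
    if action == "OC":  # OR-type complement (assumed)
        return True if (guard and not cond) else old
    if action == "AN":  # AND-type normal (assumed): may only clear to False
        return False if (guard and not cond) else old
    if action == "AC":  # AND-type complement (assumed)
        return False if (guard and cond) else old
    raise ValueError(f"unknown action specifier: {action}")


def cmpp(cond: bool, guard: bool, d1a: str, d2a: str,
         old1: bool = False, old2: bool = False):
    """Model of: p1, p2 = CMPP.cond.D1a.D2a (r1, r2) if guard."""
    return cmpp_dest(d1a, guard, cond, old1), cmpp_dest(d2a, guard, cond, old2)


# The earlier examples: p2 = (a > 0), p3 = (a <= 0) from a single compare.
a = 30
p2, p3 = cmpp(a > 0, True, "UN", "UC")
print(p2, p3)  # True False

# OR-type accumulation: p = (a < 0) or (a > 25), starting from False.
p = False
p, _ = cmpp(a < 0, True, "ON", "ON", old1=p)
p, _ = cmpp(a > 25, True, "ON", "ON", old1=p)
print(p)  # True
```

Under this reading, OR-type and AND-type destinations accumulate across several compares, so the terms of a compound condition can be evaluated in any order.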
26 Thank You
More informationECSE 425 Lecture 25: Mul1- threading
ECSE 425 Lecture 25: Mul1- threading H&P Chapter 3 Last Time Theore1cal and prac1cal limits of ILP Instruc1on window Branch predic1on Register renaming 2 Today Mul1- threading Chapter 3.5 Summary of ILP:
More informationCS 465 Final Review. Fall 2017 Prof. Daniel Menasce
CS 465 Final Review Fall 2017 Prof. Daniel Menasce Ques@ons What are the types of hazards in a datapath and how each of them can be mi@gated? State and explain some of the methods used to deal with branch
More informationPerformance Op>miza>on
ECPE 170 Jeff Shafer University of the Pacific Performance Op>miza>on 2 Lab Schedule This Week Ac>vi>es Background discussion Lab 5 Performance Measurement Lab 6 Performance Op;miza;on Lab 5 Assignments
More informationHardware-Software Codesign. 9. Worst Case Execution Time Analysis
Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis
More informationLecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ
Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)
More informationVHDL: Concurrent Coding vs. Sequen7al Coding. 1
VHDL: Concurrent Coding vs. Sequen7al Coding talarico@gonzaga.edu 1 Concurrent Coding Concurrent = parallel VHDL code is inherently concurrent Concurrent statements are adequate only to code at a very
More informationSearching and Sorting (Savitch, Chapter 7.4)
Searching and Sorting (Savitch, Chapter 7.4) TOPICS Algorithms Complexity Binary Search Bubble Sort Insertion Sort Selection Sort What is an algorithm? A finite set of precise instruc6ons for performing
More informationBackground. IBM sold expensive mainframes to large organiza<ons. Monitor sits between one or more OSes and HW
Virtual Machines Background IBM sold expensive mainframes to large organiza
More informationWhat is an algorithm?
/0/ What is an algorithm? Searching and Sorting (Savitch, Chapter 7.) TOPICS Algorithms Complexity Binary Search Bubble Sort Insertion Sort Selection Sort A finite set of precise instrucons for performing
More informationVLIW/EPIC: Statically Scheduled ILP
6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind
More informationSpeculative Trace Scheduling in VLIW Processors
Speculative Trace Scheduling in VLIW Processors Manvi Agarwal and S.K. Nandy CADL, SERC, Indian Institute of Science, Bangalore, INDIA {manvi@rishi.,nandy@}serc.iisc.ernet.in J.v.Eijndhoven and S. Balakrishnan
More informationComponent diagrams. Components Components are model elements that represent independent, interchangeable parts of a system.
Component diagrams Components Components are model elements that represent independent, interchangeable parts of a system. Components are more abstract than classes and can be considered to be stand- alone
More informationCS 406/534 Compiler Construction Putting It All Together
CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy
More information