Dependence Analysis. Hwansoo Han
|
|
- MargaretMargaret Watson
- 5 years ago
- Views:
Transcription
1 Dependence Analysis Hwansoo Han
2 Dependence analysis Dependence Control dependence Data dependence Dependence graph Usage The way to represent dependences Dependence types, latencies Instruction scheduling Parallelization to exploit Instruction level parallelism Loop level parallelism 2
3 Dependence Control dependence Relation defined on control flow Different from execution order Data dependence Relation defined on data flow Flow / anti- / output / input dependence S δ T : T is dependent on S Dependences constrain execution order 3
4 Control Dependence S1 δ c S2: S2 is control dependent on S1 if S1 determines whether S2 is executed S1 S2 S3 br.eq a, b L1 a b + c L1: a a + e S1 δ c S2? ( ) S1 δ c S3? ( ) S2 δ c S3? ( ) 4
5 Control Dep. around Branches & Joins Control dependences come from branch or join points in CFG S1 L1 S3 S2 S1 δ c S2? ( ) S1 δ c L1? ( ) L1 δ c S3? ( ) S2 δ c L1? ( ) S2 δ c S3? ( ) S1 δ c S3? ( ) 5
6 Data Dependence S1 δ d S2: S2 is data dependent on S1 If S1 and S2 accesses the same variable AND S1 precedes S2 in program order S1 a b + c S1 δ f S2 : a S2 d a + e S2 δ a S3 : e S3 e f + g S2 δ o S4 : d S4 d g + h S3 δ i S4 : g 6
7 Types of Data Dependences Flow dependence (δ f ) true dependence Can cause : RAW hazard Anti-dependence (δ a ) register rename removes dependence Output dependence (δ o ) register rename removes dependence Input dependence (δ i ) does not constrain execution order : WAR hazard : WAW hazard : RAR (not hazard) 7
8 Example S 1 read *, x,i,n S 2 y = x + 10 S 3 if (x.gt.0) then S 4 z = y * 2 S 5 x = z 1 S 6 else S 7 a[i] = y + 1 S 8 i = i + 1 S 9 endif S 10 a[1] = a[i-1] S 11 do j = 2, 10, 1 S 12 S 13 S 14 a[j] = a[j-1] * y enddo print *, a[n],x Dependences on x S 1 f S 2 S 1 f S 3 S 1 o S 5 S 1 f S 14 S 2 a S 5 S 3 a S 5 S 5 f S 14 8
9 Dependence Graph DAG used to represent dependence relations. Allows the compiler to capitalize on graph algorithms to analyze dependences. Each edge may be annotated with Ex) dependence types execution latency S 1 S2, S 1 S 3, S 1 o S 5, S 1 S 14, S 2 a S 5, S 3 a S 5, S 5 S 14 S 1 2 S 2 4 output S 3 anti 1 anti S 5 1 S 14 9
10 Instruction Scheduling Dependence imposes execution order : If S 1 S 2, S 1 must execute earlier than S 2 Scheduling constraint for control dependence: cannot move instructions above/below branches cannot move instructions above/below join points b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; } a = 1; original code yes y = 0 z = 5 b = 0 x < 6 a = 1 no y = 4 z = 5 b = 0; if (x < 6) { z = 5; y = 0; } else { z = 5; y = 4; } a = 1; OK b = 0; if (x < 6) { z = 5; } else { y = 4; z = 5; } y = 0; a = 1; error b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; a = 1; } error 10
11 Tail merging hoisting Bending the Rule For some optimizations, control dependence may not be applied Code motion / hoisting / sinking Moves instructions up/down the control flow as far as other dependences and resource constraints allow d = a * 2 c = b + d b = a * 2 e = b - 5 b = a * 2 d = b c = b + d b = b e = b - 5 Tail merging (Cross jumping) Moves instructions outside the basic blocks to their common successor if the instructions appear at the end of all these blocks A special case of code sinking b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; } a = 1; b = 0; if (x < 6) y = 0; else y = 4; z = 5; a = 1; original code OK 11
12 hoistable? VLIW Scheduling What does the dependence property imply to compiler techniques? Some compiler techniques need to move instructions. (e.g., code motion) Before they move instructions, they should identify dependences. Suppose that a compiler schedules this code on a 4-issue VLIW machine. VLIW instructions S 1 S 2 S 3 S 4 VLIW instructions (improved) S 1 S 2 S 3 S 58 S 10 S 4 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 ld [%i5],%i0 mov 2,%i3 cmp %i3,%i0 bge.else add %i2,%i4,%i2 st %i2,[%i5] ba.endif.else: add %i2,%i4,%i2 mov %i2,%i0.endif: S 10 mov 1,%i3 S 11 add %i5,%i3,%i5 S 8 S S 5 S 6 S 7 S 9 S 10 S 6 S 7 S 11 S 9 data dependence control dependence Finding dependences helps the compiler to know where it may or may not move instructions.
13 Data Parallel Applications Scientific/multimedia applications Focus on structured code (loops) Array analysis (memory access pattern) Issues Finding parallel loops Transforming codes for better cache performance 13
14 Data Dependence Analysis Parallel loops? for (i=0; i<100; i++) a[ i ] = (b[ i-1 ] + b[ i ] + b[ i+1 ]) / 3; Data dependence between loop iterations Data dependence analysis 14
15 Dependences in loops (1/3) Control dependence forces execution order among loop iterations (instances) [1 st iteration] L1: [2 nd iteration] L1: br L1 br L1 15
16 Dependences in loops (2/3) Loop-independent dependence dependence within the same loop iteration (loop code) i = 1 L1: a[i]=a[i-1]+1 b[i]=b[i+1]-1 i = i + 1 br.ne i,10,l1 loop independent anti-dependence 16
17 Dependences in loops (3/3) Loop-carried dependence dependence across loop iterations parallel loop : no loop-carried dependence (loop code) i = 1 L1: a[i]=a[i-1]+1 b[i]=b[i+1]-1 i = i + 1 br.ne i,10,l1 (1 st iteration) flow L1: a[1]=a[0]+1 b[1]=b[2]-1 i = i + 1 br.ne i,10,l1 (2 nd iteration) L1: a[2]=a[1]+1 b[2]=b[3]-1 i = i + 1 br.ne i,10,l1 anti 17
18 Iteration space Lexicographical order Distance vector : d = i 2 i 1, if i 1 δ i 2 Direction vector for i 0, N do for j 0, N do a[i, j] = a[i-1, j-1] + d; b[i, j] = b[i-1, j+1] + e; c[i, j] = c[i+1, j] + f i distance direction a[] : < 1, 1 > < +, + > b[] : < 1, -1 > < +, - > c[] : < 1, 0> < +, = > 0 0 j 18
19 Data Dependence Test (1) Any two references access the same memory location? E i 1, i 2, s.t. 3*i 1-5 = 2*i 2 +1, 1 i 1, i 2 4? Finding integer solution for given integer coefficient inequality equations integer programming (NP-complete) Many tests exist GCD test, Fourier-Motzkin elimination, Banerjee s inequality for i 1, 4 do b[ i ] = a[3*i-5] + 2; a[2*i+1] = 1 - i; (solution) i 1 = 4, i 2 = 3 flow dependence 19
20 GCD Test GCD test a 0 + a 1 i a n i n = b 0 + b 1 j b n j n a 1 i a n i n - b 1 j b n j n = b 0 - a 0 If GCD (a 1, a n, b 1, b n ) does not divides b 0 - a 0, no integer solution exists Weakness Does not consider index ranges Only deals with affine functions of loop indexes 20 for i 1 1, N 1 for i 2 1, N 2 for i n 1, N n A [, a 0 + a 1 i 1 + a n i n, ] = A[, b 0 + b 1 i 1 + b n i n, ]
21 Fourier-Motzkin Elimination (1) Decide satisfiability of conjunction of linear constraints over reals Earliest method for solving linear inequalities Discovered in 1826 by Fourier, re-discovered in 1936 by Motzkin Basic idea Pick one variable and eliminate it Continue until all variables but one are eliminated 21
22 Fourier-Motzkin Elimination (2) Assume eliminating x 1 Find B lo x 1 B up We eliminate x 1 Add B lo B up 22
23 Parallel Loop Loop L i is parallel, if all D=<d 1,, d n >, <d 1,,d i-1 > is lexicographically forward (i.e. LC dep. in outer loops) or <d 1,,d i > = 0 (i.e. no LC dependence in L i ) for i 0, N do for j 0, N do a[i, j] = a[i, j-1] + c; for i 0, N do for j 0, N do a[i, j] = a[i+2, j] + c; for i 0, N do for j 0, N do a[i, j] = a[i-1, j+1] + c; distance vector : <0,1> i (outer loop) : parallel j (inner loop) : not parallel distance vector : <2,0> i (outer loop) : not parallel j (inner loop) : parallel distance vector : <1,-1> i (outer loop) : not parallel j (inner loop) : parallel 23
24 Summary Instruction scheduling/reordering is a compiler technique that optimizes execution time or resource utilization by moving instructions based on the facts : If no dependence exists among instructions, they can be executed in any order. Instructions that are independent from each other can be executed in parallel. Parallelization also uses dependence information to achieve better performance by scheduling instructions for concurrent execution (e.g., VLIW compiler). 24
Data Dependences and Parallelization
Data Dependences and Parallelization 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11, 20 a[i]
More informationLecture 7. Instruction Scheduling. I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code)
Lecture 7 Instruction Scheduling I. Basic Block Scheduling II. Global Scheduling (for Non-Numeric Code) Reading: Chapter 10.3 10.4 CS243: Instruction Scheduling 1 Scheduling Constraints Data dependences
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence
More informationAutotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT
Autotuning John Cavazos University of Delaware What is Autotuning? Searching for the best code parameters, code transformations, system configuration settings, etc. Search can be Quasi-intelligent: genetic
More informationLecture 9 Basic Parallelization
Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning
More informationLecture 9 Basic Parallelization
Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning
More informationCompiling for Advanced Architectures
Compiling for Advanced Architectures In this lecture, we will concentrate on compilation issues for compiling scientific codes Typically, scientific codes Use arrays as their main data structures Have
More informationModule 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space.
The Lecture Contains: Iteration Space Iteration Vector Normalized Iteration Vector Dependence Distance Direction Vector Loop Carried Dependence Relations Dependence Level Iteration Vector - Triangular
More informationCMSC411 Fall 2013 Midterm 2 Solutions
CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has
More informationChapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST
Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism
More informationCS 293S Parallelism and Dependence Theory
CS 293S Parallelism and Dependence Theory Yufei Ding Reference Book: Optimizing Compilers for Modern Architecture by Allen & Kennedy Slides adapted from Louis-Noël Pouche, Mary Hall End of Moore's Law
More informationChapter 06: Instruction Pipelining and Parallel Processing
Chapter 06: Instruction Pipelining and Parallel Processing Lesson 09: Superscalar Processors and Parallel Computer Systems Objective To understand parallel pipelines and multiple execution units Instruction
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2018 Static Instruction Scheduling 1 Techniques to reduce stalls CPI = Ideal CPI + Structural stalls per instruction + RAW stalls per instruction + WAR stalls per
More informationThe Polyhedral Model (Dependences and Transformations)
The Polyhedral Model (Dependences and Transformations) Announcements HW3 is due Wednesday February 15 th, tomorrow Project proposal is due NEXT Friday, extension so that example with tool is possible Today
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationFloating Point/Multicycle Pipelining in DLX
Floating Point/Multicycle Pipelining in DLX Completion of DLX EX stage floating point arithmetic operations in one or two cycles is impractical since it requires: A much longer CPU clock cycle, and/or
More informationModule 17: Loops Lecture 34: Symbolic Analysis. The Lecture Contains: Symbolic Analysis. Example: Triangular Lower Limits. Multiple Loop Limits
The Lecture Contains: Symbolic Analysis Example: Triangular Lower Limits Multiple Loop Limits Exit in The Middle of a Loop Dependence System Solvers Single Equation Simple Test GCD Test Extreme Value Test
More informationExploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationCS553 Lecture Profile-Guided Optimizations 3
Profile-Guided Optimizations Last time Instruction scheduling Register renaming alanced Load Scheduling Loop unrolling Software pipelining Today More instruction scheduling Profiling Trace scheduling CS553
More informationTopic 14: Scheduling COS 320. Compiling Techniques. Princeton University Spring Lennart Beringer
Topic 14: Scheduling COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 The Back End Well, let s see Motivating example Starting point Motivating example Starting point Multiplication
More informationComputer Science 246 Computer Architecture
Computer Architecture Spring 2009 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Compiler ILP Static ILP Overview Have discussed methods to extract ILP from hardware Why can t some of these
More informationLecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )
Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures
More informationReview. Loop Fusion Example
Review Distance vectors Concisely represent dependences in loops (i.e., in iteration spaces) Dictate what transformations are legal e.g., Permutation and parallelization Legality A dependence vector is
More informationClass Information INFORMATION and REMINDERS Homework 8 has been posted. Due Wednesday, December 13 at 11:59pm. Third programming has been posted. Due Friday, December 15, 11:59pm. Midterm sample solutions
More informationCourse on Advanced Computer Architectures
Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1
More informationSoftware Pipelining by Modulo Scheduling. Philip Sweany University of North Texas
Software Pipelining by Modulo Scheduling Philip Sweany University of North Texas Overview Instruction-Level Parallelism Instruction Scheduling Opportunities for Loop Optimization Software Pipelining Modulo
More information6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT
6.189 IAP 2007 Lecture 11 Parallelizing Compilers 1 6.189 IAP 2007 MIT Outline Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Generation of Parallel
More informationHW 2 is out! Due 9/25!
HW 2 is out! Due 9/25! CS 6290 Static Exploitation of ILP Data-Dependence Stalls w/o OOO Single-Issue Pipeline When no bypassing exists Load-to-use Long-latency instructions Multiple-Issue (Superscalar),
More informationInstruction-Level Parallelism (ILP)
Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit
More informationLecture 6: Static ILP
Lecture 6: Static ILP Topics: loop analysis, SW pipelining, predication, speculation (Section 2.2, Appendix G) Assignment 2 posted; due in a week 1 Loop Dependences If a loop only has dependences within
More informationLoop Transformations, Dependences, and Parallelization
Loop Transformations, Dependences, and Parallelization Announcements HW3 is due Wednesday February 15th Today HW3 intro Unimodular framework rehash with edits Skewing Smith-Waterman (the fix is in!), composing
More informationData Dependence Analysis
CSc 553 Principles of Compilation 33 : Loop Dependence Data Dependence Analysis Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2011 Christian Collberg Data Dependence
More informationControl flow graphs and loop optimizations. Thursday, October 24, 13
Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange
More informationProgrammation Concurrente (SE205)
Programmation Concurrente (SE205) CM1 - Introduction to Parallelism Florian Brandner & Laurent Pautet LTCI, Télécom ParisTech, Université Paris-Saclay x Outline Course Outline CM1: Introduction Forms of
More informationPipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences
Dynamic Scheduling Pipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences Increased compiler complexity, especially when attempting
More informationHY425 Lecture 09: Software to exploit ILP
HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 ILP techniques Hardware Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit
More informationUNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 10 Compiler Techniques / VLIW Israel Koren ECE568/Koren Part.10.1 FP Loop Example Add a scalar
More informationHY425 Lecture 09: Software to exploit ILP
HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit ILP 1 / 44 ILP techniques
More informationHiroaki Kobayashi 12/21/2004
Hiroaki Kobayashi 12/21/2004 1 Loop Unrolling Static Branch Prediction Static Multiple Issue: The VLIW Approach Software Pipelining Global Code Scheduling Trace Scheduling Superblock Scheduling Conditional
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationLoop Transformations! Part II!
Lecture 9! Loop Transformations! Part II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Loop Unswitching Hoist invariant control-flow
More informationCS 351 Final Exam Solutions
CS 351 Final Exam Solutions Notes: You must explain your answers to receive partial credit. You will lose points for incorrect extraneous information, even if the answer is otherwise correct. Question
More informationChapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)
Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise
More informationArray Dependence Analysis as Integer Constraints. Array Dependence Analysis Example. Array Dependence Analysis as Integer Constraints, cont
Theory of Integers CS389L: Automated Logical Reasoning Omega Test Işıl Dillig Earlier, we talked aout the theory of integers T Z Signature of T Z : Σ Z : {..., 2, 1, 0, 1, 2,..., 3, 2, 2, 3,..., +,, =,
More informationThis test is not formatted for your answers. Submit your answers via to:
Page 1 of 7 Computer Science 320: Final Examination May 17, 2017 You have as much time as you like before the Monday May 22 nd 3:00PM ET deadline to answer the following questions. For partial credit,
More informationPage # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationWhat is ILP? Instruction Level Parallelism. Where do we find ILP? How do we expose ILP?
What is ILP? Instruction Level Parallelism or Declaration of Independence The characteristic of a program that certain instructions are, and can potentially be. Any mechanism that creates, identifies,
More informationG.1 Introduction: Exploiting Instruction-Level Parallelism Statically G-2 G.2 Detecting and Enhancing Loop-Level Parallelism G-2 G.
G.1 Introduction: Exploiting Instruction-Level Parallelism Statically G-2 G.2 Detecting and Enhancing Loop-Level Parallelism G-2 G.3 Scheduling and Structuring Code for Parallelism G-12 G.4 Hardware Support
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle
More informationAdvanced Pipelining and Instruction- Level Parallelism 4
4 Advanced Pipelining and Instruction- Level Parallelism 4 Who s first? America. Who s second? Sir, there is no second. Dialog between two observers of the sailing race later named The America s Cup and
More informationLecture 8: Compiling for ILP and Branch Prediction. Advanced pipelining and instruction level parallelism
Lecture 8: Compiling for ILP and Branch Prediction Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 Advanced pipelining and instruction level parallelism
More informationECE 505 Computer Architecture
ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems
More informationLanguages and Compiler Design II IR Code Optimization
Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization
More informationRegister Allocation (wrapup) & Code Scheduling. Constructing and Representing the Interference Graph. Adjacency List CS2210
Register Allocation (wrapup) & Code Scheduling CS2210 Lecture 22 Constructing and Representing the Interference Graph Construction alternatives: as side effect of live variables analysis (when variables
More informationInstruction Level Parallelism (ILP)
1 / 26 Instruction Level Parallelism (ILP) ILP: The simultaneous execution of multiple instructions from a program. While pipelining is a form of ILP, the general application of ILP goes much further into
More informationSlide Set 8. for ENCM 501 in Winter Steve Norman, PhD, PEng
Slide Set 8 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 8 slide
More informationLecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S
Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching
More informationMultiple Instruction Issue and Hardware Based Speculation
Multiple Instruction Issue and Hardware Based Speculation Soner Önder Michigan Technological University, Houghton MI www.cs.mtu.edu/~soner Hardware Based Speculation Exploiting more ILP requires that we
More informationDynamic Scheduling. CSE471 Susan Eggers 1
Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip
More informationLecture Compiler Middle-End
Lecture 16-18 18 Compiler Middle-End Jianwen Zhu Electrical and Computer Engineering University of Toronto Jianwen Zhu 2009 - P. 1 What We Have Done A lot! Compiler Frontend Defining language Generating
More informationDYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD
DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,
More informationEECC551 Exam Review 4 questions out of 6 questions
EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving
More informationUniprocessors. HPC Fall 2012 Prof. Robert van Engelen
Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures
More informationAC64/AT64 DESIGN & ANALYSIS OF ALGORITHMS DEC 2014
AC64/AT64 DESIGN & ANALYSIS OF ALGORITHMS DEC 214 Q.2 a. Design an algorithm for computing gcd (m,n) using Euclid s algorithm. Apply the algorithm to find gcd (31415, 14142). ALGORITHM Euclid(m, n) //Computes
More informationLecture Compiler Backend
Lecture 19-23 Compiler Backend Jianwen Zhu Electrical and Computer Engineering University of Toronto Jianwen Zhu 2009 - P. 1 Backend Tasks Instruction selection Map virtual instructions To machine instructions
More informationCSC D70: Compiler Optimization Prefetching
CSC D70: Compiler Optimization Prefetching Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry and Phillip Gibbons DRAM Improvement
More informationLecture: Pipeline Wrap-Up and Static ILP
Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Multicycle
More informationComputer Architecture and Engineering. CS152 Quiz #5. April 23rd, Professor Krste Asanovic. Name: Answer Key
Computer Architecture and Engineering CS152 Quiz #5 April 23rd, 2009 Professor Krste Asanovic Name: Answer Key Notes: This is a closed book, closed notes exam. 80 Minutes 8 Pages Not all questions are
More informationDynamic Control Hazard Avoidance
Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>
More informationSystems of Inequalities
Systems of Inequalities 1 Goals: Given system of inequalities of the form Ax b determine if system has an integer solution enumerate all integer solutions 2 Running example: Upper bounds for x: (2)and
More informationCS671 Parallel Programming in the Many-Core Era
1 CS671 Parallel Programming in the Many-Core Era Polyhedral Framework for Compilation: Polyhedral Model Representation, Data Dependence Analysis, Scheduling and Data Locality Optimizations December 3,
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Pipelining 11142011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review I/O Chapter 5 Overview Pipelining Pipelining
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationInstruction-Level Parallelism. Instruction Level Parallelism (ILP)
Instruction-Level Parallelism CS448 1 Pipelining Instruction Level Parallelism (ILP) Limited form of ILP Overlapping instructions, these instructions can be evaluated in parallel (to some degree) Pipeline
More informationAdvanced Compiler Construction Theory And Practice
Advanced Compiler Construction Theory And Practice Introduction to loop dependence and Optimizations 7/7/2014 DragonStar 2014 - Qing Yi 1 A little about myself Qing Yi Ph.D. Rice University, USA. Associate
More informationUNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.
UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationfast code (preserve flow of data)
Instruction scheduling: The engineer s view The problem Given a code fragment for some target machine and the latencies for each individual instruction, reorder the instructions to minimize execution time
More informationELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST *
ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * SAMPLE 1 Section: Simple pipeline for integer operations For all following questions we assume that: a) Pipeline contains 5 stages: IF, ID, EX,
More informationPipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.
Calcolatori Elettronici e Sistemi Operativi Pipeline issues Hazards Pipeline issues Data hazard Control hazard Structural hazard Pipeline hazard: RaW Pipeline hazard: RaW 5 6 7 8 9 5 6 7 8 9 : add R,R,R
More informationHardware-based Speculation
Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions
More informationInstruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationVLIW/EPIC: Statically Scheduled ILP
6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind
More informationPage # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?
Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the
More informationCompiler techniques for leveraging ILP
Compiler techniques for leveraging ILP Purshottam and Sajith October 12, 2011 Purshottam and Sajith (IU) Compiler techniques for leveraging ILP October 12, 2011 1 / 56 Parallelism in your pocket LINPACK
More informationRECAP. B649 Parallel Architectures and Programming
RECAP B649 Parallel Architectures and Programming RECAP 2 Recap ILP Exploiting ILP Dynamic scheduling Thread-level Parallelism Memory Hierarchy Other topics through student presentations Virtual Machines
More informationWhat Compilers Can and Cannot Do. Saman Amarasinghe Fall 2009
What Compilers Can and Cannot Do Saman Amarasinghe Fall 009 Optimization Continuum Many examples across the compilation pipeline Static Dynamic Program Compiler Linker Loader Runtime System Optimization
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationControl Flow Analysis & Def-Use. Hwansoo Han
Control Flow Analysis & Def-Use Hwansoo Han Control Flow Graph What is CFG? Represents program structure for internal use of compilers Used in various program analyses Generated from AST or a sequential
More informationLecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine
More information計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches
4.1 Basic Compiler Techniques for Exposing ILP 計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 吳俊興高雄大學資訊工程學系 To avoid a pipeline stall, a dependent instruction must be
More informationTheory of Integers. CS389L: Automated Logical Reasoning. Lecture 13: The Omega Test. Overview of Techniques. Geometric Description
Theory of ntegers This lecture: Decision procedure for qff theory of integers CS389L: Automated Logical Reasoning Lecture 13: The Omega Test şıl Dillig As in previous two lectures, we ll consider T Z formulas
More informationIn embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency.
Lesson 1 Course Notes Review of Computer Architecture Embedded Systems ideal: low power, low cost, high performance Overview of VLIW and ILP What is ILP? It can be seen in: Superscalar In Order Processors
More informationIntermediate Representations. Reading & Topics. Intermediate Representations CS2210
Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control
More informationHW1 Solutions. Type Old Mix New Mix Cost CPI
HW1 Solutions Problem 1 TABLE 1 1. Given the parameters of Problem 6 (note that int =35% and shift=5% to fix typo in book problem), consider a strength-reducing optimization that converts multiplies by
More informationComputer Science 146. Computer Architecture
Computer rchitecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 11: Software Pipelining and Global Scheduling Lecture Outline Review of Loop Unrolling Software Pipelining
More information