Dependence Analysis. Hwansoo Han

Size: px

Start display at page:

Download "Dependence Analysis. Hwansoo Han"

MargaretMargaret Watson
5 years ago
Views:

1 Dependence Analysis Hwansoo Han

2 Dependence analysis Dependence Control dependence Data dependence Dependence graph Usage The way to represent dependences Dependence types, latencies Instruction scheduling Parallelization to exploit Instruction level parallelism Loop level parallelism 2

3 Dependence Control dependence Relation defined on control flow Different from execution order Data dependence Relation defined on data flow Flow / anti- / output / input dependence S δ T : T is dependent on S Dependences constrain execution order 3

4 Control Dependence S1 δ c S2: S2 is control dependent on S1 if S1 determines whether S2 is executed S1 S2 S3 br.eq a, b L1 a b + c L1: a a + e S1 δ c S2? ( ) S1 δ c S3? ( ) S2 δ c S3? ( ) 4

5 Control Dep. around Branches & Joins Control dependences come from branch or join points in CFG S1 L1 S3 S2 S1 δ c S2? ( ) S1 δ c L1? ( ) L1 δ c S3? ( ) S2 δ c L1? ( ) S2 δ c S3? ( ) S1 δ c S3? ( ) 5

6 Data Dependence S1 δ d S2: S2 is data dependent on S1 If S1 and S2 accesses the same variable AND S1 precedes S2 in program order S1 a b + c S1 δ f S2 : a S2 d a + e S2 δ a S3 : e S3 e f + g S2 δ o S4 : d S4 d g + h S3 δ i S4 : g 6

7 Types of Data Dependences Flow dependence (δ f ) true dependence Can cause : RAW hazard Anti-dependence (δ a ) register rename removes dependence Output dependence (δ o ) register rename removes dependence Input dependence (δ i ) does not constrain execution order : WAR hazard : WAW hazard : RAR (not hazard) 7

8 Example S 1 read *, x,i,n S 2 y = x + 10 S 3 if (x.gt.0) then S 4 z = y * 2 S 5 x = z 1 S 6 else S 7 a[i] = y + 1 S 8 i = i + 1 S 9 endif S 10 a[1] = a[i-1] S 11 do j = 2, 10, 1 S 12 S 13 S 14 a[j] = a[j-1] * y enddo print *, a[n],x Dependences on x S 1 f S 2 S 1 f S 3 S 1 o S 5 S 1 f S 14 S 2 a S 5 S 3 a S 5 S 5 f S 14 8

9 Dependence Graph DAG used to represent dependence relations. Allows the compiler to capitalize on graph algorithms to analyze dependences. Each edge may be annotated with Ex) dependence types execution latency S 1 S2, S 1 S 3, S 1 o S 5, S 1 S 14, S 2 a S 5, S 3 a S 5, S 5 S 14 S 1 2 S 2 4 output S 3 anti 1 anti S 5 1 S 14 9

Instruction Scheduling Dependence imposes execution order : If S 1 S 2, S

dependence: cannot move instructions above/below branches cannot move

else { y = 4; z = 5; } a = 1; original code yes y = 0 z = 5 b = 0 x < 6 a

4; } a = 1; OK b = 0; if (x < 6) { z = 5; } else { y = 4; z = 5; } y = 0;

10 Instruction Scheduling Dependence imposes execution order : If S 1 S 2, S 1 must execute earlier than S 2 Scheduling constraint for control dependence: cannot move instructions above/below branches cannot move instructions above/below join points b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; } a = 1; original code yes y = 0 z = 5 b = 0 x < 6 a = 1 no y = 4 z = 5 b = 0; if (x < 6) { z = 5; y = 0; } else { z = 5; y = 4; } a = 1; OK b = 0; if (x < 6) { z = 5; } else { y = 4; z = 5; } y = 0; a = 1; error b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; a = 1; } error 10

Tail merging hoisting Bending the Rule For some optimizations, control dependence may not be applied Code motion / hoisting / sinking Moves instructions up/down the control flow as far as other

11 Tail merging hoisting Bending the Rule For some optimizations, control dependence may not be applied Code motion / hoisting / sinking Moves instructions up/down the control flow as far as other dependences and resource constraints allow d = a * 2 c = b + d b = a * 2 e = b - 5 b = a * 2 d = b c = b + d b = b e = b - 5 Tail merging (Cross jumping) Moves instructions outside the basic blocks to their common successor if the instructions appear at the end of all these blocks A special case of code sinking b = 0; if (x < 6) { y = 0; z = 5; } else { y = 4; z = 5; } a = 1; b = 0; if (x < 6) y = 0; else y = 4; z = 5; a = 1; original code OK 11

hoistable? VLIW Scheduling What does the dependence property imply to compiler techniques? Some compiler techniques need to move instructions. (e.g., code motion) Before they move instructions, they should identify dependences.

VLIW instructions S 1 S 2 S 3 S 4 VLIW instructions (improved) S 1 S 2 S 3 S 58 S 10 S 4 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 ld [%i5],%i0 mov 2,%i3 cmp %i3,%i0 bge.

12 hoistable? VLIW Scheduling What does the dependence property imply to compiler techniques? Some compiler techniques need to move instructions. (e.g., code motion) Before they move instructions, they should identify dependences. Suppose that a compiler schedules this code on a 4-issue VLIW machine. VLIW instructions S 1 S 2 S 3 S 4 VLIW instructions (improved) S 1 S 2 S 3 S 58 S 10 S 4 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 ld [%i5],%i0 mov 2,%i3 cmp %i3,%i0 bge.else add %i2,%i4,%i2 st %i2,[%i5] ba.endif.else: add %i2,%i4,%i2 mov %i2,%i0.endif: S 10 mov 1,%i3 S 11 add %i5,%i3,%i5 S 8 S S 5 S 6 S 7 S 9 S 10 S 6 S 7 S 11 S 9 data dependence control dependence Finding dependences helps the compiler to know where it may or may not move instructions.

13 Data Parallel Applications Scientific/multimedia applications Focus on structured code (loops) Array analysis (memory access pattern) Issues Finding parallel loops Transforming codes for better cache performance 13

14 Data Dependence Analysis Parallel loops? for (i=0; i<100; i++) a[ i ] = (b[ i-1 ] + b[ i ] + b[ i+1 ]) / 3; Data dependence between loop iterations Data dependence analysis 14

15 Dependences in loops (1/3) Control dependence forces execution order among loop iterations (instances) [1 st iteration] L1: [2 nd iteration] L1: br L1 br L1 15

16 Dependences in loops (2/3) Loop-independent dependence dependence within the same loop iteration (loop code) i = 1 L1: a[i]=a[i-1]+1 b[i]=b[i+1]-1 i = i + 1 br.ne i,10,l1 loop independent anti-dependence 16

17 Dependences in loops (3/3) Loop-carried dependence dependence across loop iterations parallel loop : no loop-carried dependence (loop code) i = 1 L1: a[i]=a[i-1]+1 b[i]=b[i+1]-1 i = i + 1 br.ne i,10,l1 (1 st iteration) flow L1: a[1]=a[0]+1 b[1]=b[2]-1 i = i + 1 br.ne i,10,l1 (2 nd iteration) L1: a[2]=a[1]+1 b[2]=b[3]-1 i = i + 1 br.ne i,10,l1 anti 17

18 Iteration space Lexicographical order Distance vector : d = i 2 i 1, if i 1 δ i 2 Direction vector for i 0, N do for j 0, N do a[i, j] = a[i-1, j-1] + d; b[i, j] = b[i-1, j+1] + e; c[i, j] = c[i+1, j] + f i distance direction a[] : < 1, 1 > < +, + > b[] : < 1, -1 > < +, - > c[] : < 1, 0> < +, = > 0 0 j 18

19 Data Dependence Test (1) Any two references access the same memory location? E i 1, i 2, s.t. 3*i 1-5 = 2*i 2 +1, 1 i 1, i 2 4? Finding integer solution for given integer coefficient inequality equations integer programming (NP-complete) Many tests exist GCD test, Fourier-Motzkin elimination, Banerjee s inequality for i 1, 4 do b[ i ] = a[3*i-5] + 2; a[2*i+1] = 1 - i; (solution) i 1 = 4, i 2 = 3 flow dependence 19

20 GCD Test GCD test a 0 + a 1 i a n i n = b 0 + b 1 j b n j n a 1 i a n i n - b 1 j b n j n = b 0 - a 0 If GCD (a 1, a n, b 1, b n ) does not divides b 0 - a 0, no integer solution exists Weakness Does not consider index ranges Only deals with affine functions of loop indexes 20 for i 1 1, N 1 for i 2 1, N 2 for i n 1, N n A [, a 0 + a 1 i 1 + a n i n, ] = A[, b 0 + b 1 i 1 + b n i n, ]

21 Fourier-Motzkin Elimination (1) Decide satisfiability of conjunction of linear constraints over reals Earliest method for solving linear inequalities Discovered in 1826 by Fourier, re-discovered in 1936 by Motzkin Basic idea Pick one variable and eliminate it Continue until all variables but one are eliminated 21

22 Fourier-Motzkin Elimination (2) Assume eliminating x 1 Find B lo x 1 B up We eliminate x 1 Add B lo B up 22

Parallel Loop Loop L i is parallel, if all D=<d 1,, d n >, <d 1,,d i-1 > is lexicographically

23 Parallel Loop Loop L i is parallel, if all D=<d 1,, d n >, <d 1,,d i-1 > is lexicographically forward (i.e. LC dep. in outer loops) or <d 1,,d i > = 0 (i.e. no LC dependence in L i ) for i 0, N do for j 0, N do a[i, j] = a[i, j-1] + c; for i 0, N do for j 0, N do a[i, j] = a[i+2, j] + c; for i 0, N do for j 0, N do a[i, j] = a[i-1, j+1] + c; distance vector : <0,1> i (outer loop) : parallel j (inner loop) : not parallel distance vector : <2,0> i (outer loop) : not parallel j (inner loop) : parallel distance vector : <1,-1> i (outer loop) : not parallel j (inner loop) : parallel 23

24 Summary Instruction scheduling/reordering is a compiler technique that optimizes execution time or resource utilization by moving instructions based on the facts : If no dependence exists among instructions, they can be executed in any order. Instructions that are independent from each other can be executed in parallel. Parallelization also uses dependence information to achieve better performance by scheduling instructions for concurrent execution (e.g., VLIW compiler). 24

Data Dependences and Parallelization

Data Dependences and Parallelization 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11, 20 a[i]