Punctual Coalescing. Fernando Magno Quintão Pereira


1 Punctual Coalescing Fernando Magno Quintão Pereira


2 Register Coalescing Register coalescing is an optimization on top of register allocation. The objective is to map both variables used in a copy instruction to the same register.
    Source program: int a = 0; int b = a; print(b);
    Without coalescing (a in R1, b in R2): R1 = 0; R2 = R1; print(R2);
    With coalescing (a in R1, b in R1): R1 = 0; R1 = R1; print(R1);
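As a minimal sketch of the example on this slide, the Python below lowers the three-instruction program under both register assignments. The tuple encoding of instructions and the helper name `lower` are assumptions made for illustration, not part of the talk. Unlike the slide, which keeps `R1 = R1` visible, the sketch drops any copy whose source and destination share a register, since that is the saving coalescing is after.

```python
def lower(program, regs):
    """Rewrite a tiny program with physical registers.

    program: list of tuples ("const", var, value), ("copy", dst, src),
             or ("print", var). This encoding is hypothetical.
    regs:    dict mapping each variable to a register name.
    """
    out = []
    for op, *args in program:
        if op == "const":
            var, value = args
            out.append(f"{regs[var]} = {value}")
        elif op == "copy":
            dst, src = args
            # A copy between variables in the same register is a no-op.
            if regs[dst] != regs[src]:
                out.append(f"{regs[dst]} = {regs[src]}")
        elif op == "print":
            (var,) = args
            out.append(f"print({regs[var]})")
    return out


program = [("const", "a", 0), ("copy", "b", "a"), ("print", "b")]
uncoalesced = lower(program, {"a": "R1", "b": "R2"})
coalesced = lower(program, {"a": "R1", "b": "R1"})
```

With `a` and `b` in different registers the copy survives; with both in R1 it disappears and the program is one instruction shorter.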

3 The Importance of Register Coalescing Size reduction. Execution speedup. Energy saving.

4 Punctual Coalescing Register assignment technique based on puzzle solving. It deals with aliased registers, e.g., x86. (No worries: I will define these concepts.) It is O(K*I), where K is the number of registers and I is the number of instructions. Strength: very fast! Weakness: it is a local solution.

5 Register Aliasing Two registers alias when an assignment to one may change the value of the other. Ex. x86's GP regs: AX, BX, CX, DX, each overlapping its high and low halves (AH/AL, BH/BL, CH/CL, DH/DL). Ex. PowerPC's doubles: D0 D15 F10 F1 F30 F31. And also ARM, UltraSparc, ST240, etc.
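The definition of aliasing above can be sketched for the x86 example by modelling each register as the set of 8-bit halves it covers. This is an illustrative model only; the `SLOTS` table and the `aliases` helper are hypothetical names, not something the talk defines.

```python
# Hypothetical model: a 16-bit register covers two 8-bit halves,
# and an 8-bit register covers exactly one of them.
SLOTS = {}
for base in "ABCD":
    SLOTS[base + "X"] = {base + "L", base + "H"}
    SLOTS[base + "L"] = {base + "L"}
    SLOTS[base + "H"] = {base + "H"}


def aliases(r1, r2):
    """Two distinct registers alias when they share at least one half."""
    return r1 != r2 and bool(SLOTS[r1] & SLOTS[r2])
```

Under this model `aliases("AX", "AL")` holds because writing AL changes AX, while AL and AH can each hold an independent 8-bit value.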

6 Register Aliasing Register aliasing allows the register allocator to keep more variables in registers, e.g., two floats in the same register. But finding a register assignment that minimizes the number of registers taken is NP-complete. Lee et al.: Aliased register allocation for straight-line programs is NP-complete, TCS, 2008.

7 Register allocation: fit the bars on the boxes. Aliased register allocation: bars may have different widths.
    [Figure: program, live ranges, and registers XL, YL, ZL, XH, YH, ZH. The program defines a, B, and c, then executes d = B and E = c, and finally uses a, d, and E.]

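8 Complexity result: optimal aliased register allocation is NP-complete.
    [Figure: program, live ranges, and registers XL, YL, ZL, XH, YH, ZH. The program after register assignment: definitions of XH (a), Y (B), and XL (c), then YH = Y (d), Z = XL (E), and a final use of XH, YH, Z.]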
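The "bars on boxes" picture can be checked mechanically. The sketch below is a hedged illustration, not the algorithm from the talk: it assumes live ranges given as (definition point, last use point) pairs read off the six-instruction example, treats a range that ends where another begins as non-overlapping, and takes the assignment a in XH, B in Y, c in XL, d in YH, E in Z as read from the figure on this slide. The helper names `slots` and `is_valid` are hypothetical.

```python
from itertools import combinations


def slots(reg):
    """Half-register slots covered by a name such as 'X', 'XL', or 'XH'."""
    if len(reg) == 1:
        return {reg + "L", reg + "H"}
    return {reg}


def is_valid(live_ranges, assignment):
    """True if no two simultaneously live variables share a half register."""
    for u, v in combinations(live_ranges, 2):
        (s1, e1), (s2, e2) = live_ranges[u], live_ranges[v]
        overlap = s1 < e2 and s2 < e1
        if overlap and slots(assignment[u]) & slots(assignment[v]):
            return False
    return True


# (definition point, last use point) for each variable of the example.
live_ranges = {"a": (1, 6), "B": (2, 4), "c": (3, 5), "d": (4, 6), "E": (5, 6)}
good = {"a": "XH", "B": "Y", "c": "XL", "d": "YH", "E": "Z"}
bad = dict(good, c="YL")  # c would collide with B, which occupies all of Y
```

Checking one assignment is cheap; the hardness result on the slide concerns finding an assignment that uses the fewest registers.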

9 Elementary form: split variables between each pair of consecutive instructions.
    Elementary program (live ranges are drawn against registers XL, YL, ZL, XH, YH, ZH):
    a1 =
    (a2) = (a1)
    B2 =
    (a3, B3) = (a2, B2)
    c3 =
    (a4, B4, c4) = (a3, B3, c3)
    d4 = B4
    (a5, c5, d5) = (a4, c4, d4)
    E5 = c5
    (a6, d6, E6) = (a5, d5, E5)
    = a6, d6, E6
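The transformation on this slide can be sketched for a straight-line trace as follows. This is a minimal illustration under stated assumptions: each instruction is a hypothetical (defs, uses) pair, variables are renamed by appending the instruction index, and a parallel copy over the live variables is placed between consecutive instructions. The function name `to_elementary` and the "instr"/"pcopy" tags are invented for the example.

```python
def to_elementary(program):
    """program: list of (defs, uses) pairs for a straight-line trace."""
    # Backward pass: live_out[i] holds the variables read after instruction i.
    live_out = [set() for _ in program]
    live = set()
    for i in range(len(program) - 1, -1, -1):
        defs, uses = program[i]
        live_out[i] = set(live)
        live = (live - set(defs)) | set(uses)

    result = []
    for i, (defs, uses) in enumerate(program, start=1):
        # Inside instruction i, every name carries index i.
        result.append(("instr", [f"{v}{i}" for v in defs],
                       [f"{v}{i}" for v in uses]))
        if i < len(program):
            # Parallel copy renaming every live variable from i to i + 1.
            names = sorted(live_out[i - 1], key=str.lower)
            result.append(("pcopy", [f"{v}{i + 1}" for v in names],
                           [f"{v}{i}" for v in names]))
    return result


example = [(["a"], []), (["B"], []), (["c"], []),
           (["d"], ["B"]), (["E"], ["c"]), ([], ["a", "d", "E"])]
elementary = to_elementary(example)
```

On the six-instruction example the result has eleven entries, and the last parallel copy carries a, d, and E, the three variables read by the final instruction.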

10 Complexity result: aliased register allocation has a polynomial-time solution for elementary-form programs.
    [Figure: elementary program, live ranges, and registers XL, YL, ZL, XH, YH, ZH. The program after register assignment: definitions of XH (a), Y (B), and XL (c); YH = Y (d); a copy YL = XH; X = XL (E); and a final use of YL, YH, X.]

11 Elementary Form = Slow RA? If a program has O(V) variables and O(I) instructions, then the elementary-form program has O(V*I) variables. If we assume that each instruction defines a variable, we have O(V*I) = O(V^2). Many graph-coloring-based register allocators are O(V^2), so running one on O(V^2) variables costs O(V^4), which is just too slow. We have an O(K*I) approach, where K is the number of registers!

12 Growth in number of variables
    [Plot: number of variables in the elementary-form trace versus number of variables in the original trace.] Data extracted from 4054 program traces taken from SPEC CPU.

13 Our solution: traverse the program from top to bottom, using the solution of previous puzzles to guide the solution of the next one to be solved.
    a1 =
    (a2) = (a1)
    B2 =
    (a3, B3) = (a2, B2)
    c3 =
    (a4, B4, c4) = (a3, B3, c3)
    d4 = B4
    (a5, c5, d5) = (a4, c4, d4)
    E5 = c5
    (a6, d6, E6) = (a5, d5, E5)
    = a6, d6, E6
    [Figure: live ranges against registers XL, YL, ZL, XH, YH, ZH, with d and E still to be placed.] What is the best allocation at this point of the program?

14 Punctual Coalescing Instance: two consecutive puzzles, p1 and p2, such that p1 is already solved. Problem: find a solution for p2 that minimizes the number of copies inserted between p1 and p2. Puzzle p1 is called the guider, and puzzle p2 is called the follower.

15 Example [Figure: a guider board and two candidate solutions for the follower.] One follower solution has only three matches with the guider. The other solution is better: it has four matches.
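The objective of punctual coalescing can be sketched by counting matches between a guider and a follower. The boards in the figure are not recoverable here, so the variables and registers below are hypothetical and only mirror the "three matches versus four matches" comparison. The sketch also ignores register widths and aliasing: each solution is simply a map from variable to register, and the helper names are invented for the example.

```python
def count_matches(guider, follower):
    """Variables that stay in the same register from guider to follower."""
    return sum(1 for var, reg in follower.items() if guider.get(var) == reg)


def copies_needed(guider, follower):
    """Variables live in both puzzles whose register changes."""
    shared = guider.keys() & follower.keys()
    return sum(1 for var in shared if guider[var] != follower[var])


# Hypothetical solved guider and two candidate follower solutions.
guider = {"a": "R1", "b": "R2", "c": "R3", "d": "R4", "e": "R5"}
follower_1 = {"a": "R1", "b": "R2", "c": "R3", "d": "R5", "e": "R4"}
follower_2 = {"a": "R1", "b": "R2", "c": "R3", "d": "R4", "e": "R6"}
```

Here `follower_1` has three matches and needs two copies, while `follower_2` has four matches and needs only one, so it is the better follower under this measure.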

16 Polynomial-Time Punctual Coalescing Punctual coalescing has a polynomial-time solution, as long as the follower starts with an empty register board. These are 89% of all the puzzles found in SPEC CPU 2000, when compiled to x86 using LLVM. Our solution is O(K), where K is the number of registers. For the whole program, O(I*K). For the solution, see our paper: Punctual Coalescing.

17 Punctual vs. Global
    [Figure: a sequence of optimal punctual coalescings compared with the optimal global coalescing on a four-puzzle example. The program defines a, b, c, d; uses b, d; defines E; and uses E, a, c.]
    Global coalescing: find the register assignment that minimizes the number of copies in the whole program. This problem is NP-complete.

18 Experiments We have compared four different coalescers:
    - A coalescing-oblivious register allocator based on the coloring of chordal graphs.
    - The punctual coalescing heuristics used in register allocation by puzzle solving.
    - The optimal punctual coalescer.
    - An optimal solution for global coalescing, based on integer linear programming.
    Data: program traces from SPEC CPU 2000.

19 Number of copies inserted (series: Chordal, Heuristics, Punctual, ILP)
    Benchmark   Chordal (x 1000)
    Gzip        13,4
    Vpr         47,6
    Gcc         329,8
    Mcf         4,75
    Crafty      46,2
    Parser      27,0
    Gap         174,6
    Vortex      199,3
    Bzip2       10,5
    Twolf       101,8

20 Conclusion Punctual coalescing is a viable alternative for JIT compilers. It has been implemented in LLVM with very competitive compilation times. Results are very good for program traces. We need a better algorithm for whole-function optimization, one that trades slower compilation time for better code quality.
