Verified Validation of Lazy Code Motion

To cite this version: Jean-Baptiste Tristan, Xavier Leroy. Verified Validation of Lazy Code Motion. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Jun 2009, Dublin, Ireland. ACM, pp. 316-326, 2009, <10.1145/1542476.1542512>. <inria-00415865>

HAL Id: inria-00415865
https://hal.inria.fr/inria-00415865
Submitted on 11 Sep 2009

Jean-Baptiste Tristan
INRIA Paris-Rocquencourt
jean-baptiste.tristan@inria.fr

Xavier Leroy
INRIA Paris-Rocquencourt
xavier.leroy@inria.fr

Abstract

Translation validation establishes a posteriori the correctness of a run of a compilation pass or other program transformation. In this paper, we develop an efficient translation validation algorithm for the Lazy Code Motion (LCM) optimization. LCM is an interesting challenge for validation because it is a global optimization that moves code across loops. Consequently, care must be taken not to move computations that may fail before loops that may not terminate. Our validator includes a specific check for anticipability to rule out such incorrect moves. We present a mechanically-checked proof of correctness of the validation algorithm, using the Coq proof assistant. Combining our validator with an unverified implementation of LCM, we obtain an LCM pass that is provably semantics-preserving and was integrated in the CompCert formally verified compiler.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification - Correctness proofs; D.3.4 [Programming Languages]: Processors - Optimization; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs - Mechanical verification; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages - Operational semantics

General Terms Languages, Verification, Algorithms

Keywords Translation validation, lazy code motion, redundancy elimination, verified compilers, the Coq proof assistant

1. Introduction

Advanced compiler optimizations perform subtle transformations over the programs being compiled, exploiting the results of delicate static analyses. Consequently, compiler optimizations are sometimes incorrect, causing the compiler either to crash at compile-time, or to silently generate bad code from a correct source program. The latter case is especially troublesome since such compiler-introduced bugs are very difficult to track down.
Incorrect optimizations often stem from bugs in the implementation of a correct optimization algorithm, but sometimes the algorithm itself is faulty, or the conditions under which it can be applied are not well understood. The standard approach to weeding out incorrect optimizations is heavy testing of the compiler. Translation validation, as introduced by Pnueli et al. (1998b), provides a more systematic way to detect (at compile-time) semantic discrepancies between the input and the output of an optimization. At every compilation run, the input code and the generated code are fed to a validator (a piece of software distinct from the compiler itself), which tries to establish a posteriori that the generated code behaves as prescribed by the input code. If, however, the validator detects a discrepancy, or is unable to establish the desired semantic equivalence, compilation is aborted; some validators also produce an explanation of the error.

PLDI'09, June 15-20, 2009, Dublin, Ireland. Copyright © 2009 ACM 978-1-60558-392-1/09/06... $5.00

Algorithms for translation validation roughly fall in two classes. (See section 9 for more discussion.) General-purpose validators such as those of Pnueli et al. (1998b), Necula (2000), Barrett et al. (2005), Rinard and Marinov (1999) and Rival (2004) rely on generic techniques such as symbolic execution, model-checking and theorem proving, and can therefore be applied to a wide range of program transformations. Since checking semantic equivalence between two code fragments is undecidable in general, these validators can generate false alarms and have high complexity.
If we are interested only in a particular optimization or family of related optimizations, special-purpose validators can be developed, taking advantage of our knowledge of the limited range of code transformations that these optimizations can perform. Examples of special-purpose validators include that of Huang et al. (2006) for register allocation and that of Tristan and Leroy (2008) for list and trace instruction scheduling. These validators are based on efficient static analyses and are believed to be correct and complete.

This paper presents a translation validator specialized to the Lazy Code Motion (LCM) optimization of Knoop et al. (1992, 1994). LCM is an advanced optimization that removes redundant computations; it includes common subexpression elimination and loop-invariant code motion as special cases, and can also eliminate partially redundant computations (i.e. computations that are redundant on some but not all program paths). Since LCM can move computations across basic blocks and even across loops, its validation appears more challenging than that of register allocation or trace scheduling, which preserve the structure of basic blocks and extended basic blocks, respectively.

As we show in this work, the validation of LCM turns out to be relatively simple (at any rate, much simpler than the LCM algorithm itself): it exploits the results of a standard available expression analysis. A delicate issue with LCM is that it can anticipate (insert earlier computations of) instructions that can fail at run-time, such as memory loads from a potentially invalid pointer; if done carelessly, this transformation can turn code that diverges into code that crashes. To address this issue, we complement the available expression analysis with a so-called anticipability checker, which ensures that the transformed code is at least as defined as the original code.

Translation validation provides much additional confidence in the correctness of a program transformation, but does not completely rule out the possibility of compiler-introduced bugs: what if the validator itself is buggy?
This is a concern for the development of critical software, where systematic testing does not suffice to reach the desired level of assurance and must be complemented by formal verification of the source. Any bug in the compiler can potentially invalidate the guarantees obtained by this use of formal methods. One way to address this issue is to formally verify the compiler itself, proving that every pass preserves the semantics of the program being compiled. Several ambitious compiler verification efforts are currently under way, such as the Jinja project of Klein and Nipkow (2006), the Verisoft project of Leinenbach et al. (2005), and the CompCert project of Leroy et al. (2004-2009).

Translation validation can provide semantic preservation guarantees as strong as those obtained by formal verification of a compiler pass: it suffices to prove that the validator is correct, i.e. returns true only when the two programs it receives as inputs are semantically equivalent. The compiler pass itself does not need to be proved correct. As illustrated by Tristan and Leroy (2008), the proof of a validator can be significantly simpler and more reusable than that of the corresponding optimizations. The translation validator for LCM presented in this paper was mechanically verified using the Coq proof assistant (Coq development team 1989-2009; Bertot and Castéran 2004). We give a detailed overview of this proof in sections 5 to 7. Combining the verified validator with an unverified implementation of LCM written in Caml, we obtain a provably correct LCM optimization that integrates smoothly within the CompCert verified compiler (Leroy et al. 2004-2009).

The remainder of this paper is as follows. After a short presentation of Lazy Code Motion (section 3) and of the RTL intermediate language over which it is performed (section 2), section 4 develops our validation algorithm for LCM. The next three sections outline the correctness proof of this algorithm: section 5 gives the dynamic semantics of RTL, section 6 presents the general shape of the proof of semantic preservation using a simulation argument, and section 7 details the LCM-specific aspects of the proof.
Section 8 discusses other aspects of the validator and its proof, including completeness, complexity, performance and reusability. Related work is discussed in section 9, followed by conclusions in section 10.

2. The RTL intermediate language

The LCM optimization and its validation are performed on the RTL intermediate language of the CompCert compiler. This is a standard Register Transfer Language where control is represented by a control flow graph (CFG). Nodes of the CFG carry abstract instructions, corresponding roughly to machine instructions but operating over pseudo-registers (also called temporaries). Every function has an unlimited supply of pseudo-registers, and their values are preserved across function calls.

An RTL program is a collection of functions plus some global variables. As shown in figure 1, functions come in two flavors: external functions ef are merely declared and model input-output operations and similar system calls; internal functions f are defined within the language and consist of a type signature sig, a parameter list r⃗, the size n of their activation record, an entry point l, and a CFG g representing the code of the function. The CFG is implemented as a finite map from node labels l (positive integers) to instructions. The set of instructions includes arithmetic operations, memory loads and stores, conditional branches, and function calls, tail calls and returns. Each instruction carries the list of its successors in the CFG. When the successor l is irrelevant or clear from the context, we use the following more readable notations for register-to-register moves, arithmetic operations, and memory loads:

    r := r′                      for op(move, r′, r, l)
    r := op(op, r⃗)               for op(op, r⃗, r, l)
    r := load(chunk, mode, r⃗)    for load(chunk, mode, r⃗, r, l)

A more detailed description of RTL can be found in (Leroy 2008).
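The CFG representation described above can be sketched in a few lines of Python. This is only an illustrative encoding, not the CompCert definition: the tagged-tuple layout, the dictionary fields and the helper names `successors` and `show` are assumptions of this sketch.

```python
# Tagged-tuple encoding of RTL instructions (the layout is an assumption of this sketch):
#   ("nop", l)
#   ("op", op, args, dst, l)
#   ("load", chunk, mode, args, dst, l)
#   ("store", chunk, mode, args, src, l)
#   ("call", sig, fn, args, dst, l)
#   ("tailcall", sig, fn, args)
#   ("cond", cond, args, l_true, l_false)
#   ("return", r)                      # r may be None


def successors(instr):
    """Return the successor labels carried by an instruction, in order."""
    kind = instr[0]
    if kind in ("nop", "op", "load", "store", "call"):
        return [instr[-1]]
    if kind == "cond":
        return [instr[3], instr[4]]
    if kind in ("tailcall", "return"):
        return []
    raise ValueError(f"unknown instruction kind: {kind!r}")


def show(instr):
    """Render moves, operations and loads in the abbreviated 'r := rhs' notation."""
    kind = instr[0]
    if kind == "op":
        _, op, args, dst, _ = instr
        if op == "move":
            return f"{dst} := {args[0]}"
        return f"{dst} := op({op}, {', '.join(args)})"
    if kind == "load":
        _, chunk, mode, args, dst, _ = instr
        return f"{dst} := load({chunk}, {mode}, {', '.join(args)})"
    return repr(instr)


# A hypothetical internal function: the graph is a finite map from
# positive integer labels to instructions.
example = {
    "sig": "int -> int",
    "params": ("a", "b"),
    "stack": 0,
    "start": 1,
    "graph": {
        1: ("cond", "lt", ("a", "b"), 2, 3),
        2: ("op", "add", ("a", "b"), "x", 4),
        3: ("op", "move", ("a",), "x", 4),
        4: ("return", "x"),
    },
}
```

For instance, `show(example["graph"][3])` renders the move at node 3 in the abbreviated notation, while `successors` recovers the CFG edges from the instructions themselves.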
RTL instructions:
    i ::= nop(l)                              no operation
        | op(op, r⃗, r, l)                     arithmetic operation
        | load(chunk, mode, r⃗, r, l)          memory load
        | store(chunk, mode, r⃗, r, l)         memory store
        | call(sig, (r | id), r⃗, r, l)        function call
        | tailcall(sig, (r | id), r⃗)          function tail call
        | cond(cond, r⃗, l_true, l_false)      conditional branch
        | return | return(r)                  function return

Control-flow graphs:
    g ::= l ↦ i                               finite map

RTL functions:
    fd ::= f | ef
    f  ::= id { sig sig; params r⃗; stack n; start l; graph g }     internal function
    ef ::= id { sig sig }                                           external function

Figure 1. RTL syntax

3. Lazy Code Motion

Lazy code motion (LCM) (Knoop et al. 1992, 1994) is a dataflow-based algorithm for the placement of computations within control flow graphs. It suppresses unnecessary recomputations of values by moving their first computations earlier in the execution flow (if necessary), and later reusing the results of these first computations. Thus, LCM performs elimination of common subexpressions (both within and across basic blocks), as well as loop invariant code motion. In addition, it can also factor out partially redundant computations: computations that occur multiple times on some execution paths, but once or not at all on other paths. LCM is used in production compilers, for example in GCC version 4.

Figure 2 presents an example of lazy code motion. The original program in part (a) presents several interesting cases of redundancies for the computation of t1 + t2: loop invariance (node 4), simple straight-line redundancy (nodes 6 and 5), and partial redundancy (node 5). In the transformed program (part (b)), these redundant computations of t1 + t2 have all been eliminated: the expression is computed at most once on every possible execution path. Two instructions (nodes n1 and n2) have been added to the graph, both of which compute t1 + t2 and save its result into a fresh temporary h0. The three occurrences of t1 + t2 in the original code have been rewritten into move instructions (nodes 4′, 5′ and 6′), copying the fresh h0 register to the original destinations of the instructions.
The reader might wonder why two instructions h0 := t1 + t2 were added in the two branches of the conditional, instead of a single instruction before node 1. The latter is what the partial redundancy elimination optimization of Morel and Renvoise (1979) would do. However, this would create a long-lived temporary h0, therefore increasing register pressure in the transformed code. The lazy aspect of LCM is that computations are placed as late as possible while avoiding repeated computations.

The LCM algorithm exploits the results of 4 dataflow analyses: up-safety (also called availability), down-safety (also called anticipability), delayability and isolation. These analyses can be implemented efficiently using bit vectors. Their results are then cleverly combined to determine an optimal placement for each computation performed by the initial program.

Knoop et al. (1994) presents a correctness proof for LCM. However, mechanizing this proof appears difficult. Unlike the program transformations that have already been mechanically verified in the CompCert project, LCM is a highly non-local transformation: instructions are moved across basic blocks and even across loops. Moreover, the transformation generates fresh temporaries, which adds significant bureaucratic overhead to mechanized proofs. It appears easier to follow the verified validator approach. An additional benefit of this approach is that the LCM implementation can use efficient imperative data structures, since we do not need to formally verify them. Moreover, it makes it easier to experiment with other variants of LCM.

[Figure 2. An example of lazy code motion transformation. (a) Original code, in which nodes 4, 5 and 6 compute t4 := t1 + t2, t5 := t1 + t2 and t6 := t1 + t2. (b) Code after lazy code motion, in which the fresh nodes n1 and n2 compute h0 := t1 + t2 and nodes 4′, 5′ and 6′ become t4 := h0, t5 := h0 and t6 := h0.]

To design and prove correct a translation validator for LCM, it is not important to know all the details of the analyses that indicate where new computations should be placed and which instructions should be rewritten. However, it is important to know what kind of transformations happen. Two kinds of rewritings of the graph can occur:

- The nodes that exist in the original code (like node 4 in figure 2) still exist in the transformed code. The instruction they carry is either unchanged or can be rewritten as a move if they are arithmetic operations or loads (but not calls, tail calls, stores, returns nor conditions).
- Some fresh nodes are added (like node n1) to the transformed graph. Their left-hand side is a fresh register; their right-hand side is the right-hand side of some instruction in the original code.

There exists an injective function from nodes of the original code to nodes of the transformed code. We call this mapping ϕ. It connects each node of the source code to its (possibly rewritten) counterpart in the transformed code. In the example of figure 2, ϕ maps nodes 1...6 to their primed versions 1′...6′. We assume the unverified implementation of LCM is instrumented to produce this function. (In our implementation, we arrange that ϕ is always the identity function.) Nodes that are not in the image of ϕ are the fresh nodes introduced by LCM.

4. A translation validator for Lazy Code Motion

In this section, we detail a translation validator for LCM.

4.1 General structure

Since LCM is an intraprocedural optimization, the validator proceeds function per function: each internal function f of the original program is matched against the identically-named function f′ of the transformed program. Moreover, LCM does not change the type signature, parameter list and stack size of functions, and can be assumed not to change the entry point (by inserting nops at the graph entrance if needed). Checking these invariants is easy; hence, we can focus on the validation of function graphs. Therefore, the validation algorithm is of the following shape:

    validate(f, f′, ϕ) =
      let AE = analyze(f′) in
      f′.sig = f.sig and f′.params = f.params and
      f′.stack = f.stack and f′.start = f.start and
      for each node n of f, V(f, f′, n, ϕ, AE) = true

As discussed in section 3, the ϕ parameter is the mapping from nodes of the input graph to nodes of the transformed graph provided by the implementation of LCM. The analyze function is a static analysis computing available expressions, described below in section 4.2.1. The V function validates pairs of matching nodes and is composed of two checks: unify, described in section 4.2.2, and path, described in section 4.3.2. Writing n′ for ϕ(n):

    V(f, f′, n, ϕ, AE) =
      unify(AE(n′), f.graph(n), f′.graph(n′)) and
      for all successor s of n and matching successor s′ of n′,
        path(f.graph, f′.graph, s′, ϕ(s))

As outlined above, our implementation of a validator for LCM is carefully structured in two parts: a generic, rather bureaucratic framework parameterized over the analyze and V functions; and the LCM-specific, more subtle functions analyze and V. As we will see in section 7, this structure facilitates the correctness proof of the validator. It also makes it possible to reuse the generic framework and its proof in other contexts, as illustrated in section 8.

We now focus on the construction of V, the node-level validator, and the static analysis it exploits.
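The two-part structure just described (a generic framework parameterized over analyze and V) can be sketched as follows. This is a hedged illustration, not the Coq development: functions are plain dictionaries, ϕ is a dictionary from nodes to nodes, instructions are tagged tuples whose successor labels come last, and the names `make_V` and `successors` are assumptions of this sketch. The `unify` and `path` checks are passed in as parameters.

```python
def successors(instr):
    """Successor labels of a tagged-tuple instruction (assumed layout: labels last)."""
    kind = instr[0]
    if kind in ("nop", "op", "load", "store", "call"):
        return [instr[-1]]
    if kind == "cond":
        return [instr[-2], instr[-1]]
    return []  # tailcall, return


def validate(f, f2, phi, analyze, V):
    """Generic part: check the function-level invariants, then every node of f."""
    ae = analyze(f2)  # the analysis runs on the transformed function
    return (
        f2["sig"] == f["sig"]
        and f2["params"] == f["params"]
        and f2["stack"] == f["stack"]
        and f2["start"] == f["start"]
        and all(V(f, f2, n, phi, ae) for n in f["graph"])
    )


def make_V(unify, path):
    """Build the node-level validator V from the two checks it is composed of."""

    def V(f, f2, n, phi, ae):
        n2 = phi.get(n)
        if n2 not in f2["graph"]:
            return False
        i, i2 = f["graph"][n], f2["graph"][n2]
        if not unify(ae.get(n2, {}), i, i2):
            return False
        succs, succs2 = successors(i), successors(i2)
        if len(succs) != len(succs2):
            return False
        # The k-th successor of n must be matched by a path that starts at
        # the k-th successor of phi(n) and ends at phi of that successor.
        return all(
            path(f["graph"], f2["graph"], s2, phi.get(s))
            for s, s2 in zip(succs, succs2)
        )

    return V
```

Keeping `analyze`, `unify` and `path` as parameters mirrors the design choice discussed above: the bookkeeping in `validate` does not depend on which optimization is being validated.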
4.2 Verification of the equivalence of single instructions

Consider an instruction i at node n in the original code and the corresponding instruction i′ at node ϕ(n) in the code after LCM (for example, nodes 4 and 4′ in figure 2). We wish to check that these two instructions are semantically equivalent. If the transformation was a correct LCM, two cases are possible:

- i = i′: both instructions will obviously lead to equivalent run-time states, if executed in equivalent initial states.
- i′ is of the form r := h for some register r and fresh register h, and i is of the form r := rhs for some right-hand side rhs, which can be either an arithmetic operation op(...) or a memory read load(...).

In the latter case, we need to verify that rhs and h produce the same value. More precisely, we need to verify that the value contained in h in the transformed code is equal to the value produced by evaluating rhs in the original code. LCM being a purely syntactical redundancy elimination transformation, it must be the case that the instruction h := rhs exists on every path leading to ϕ(n) in the transformed code; moreover, the values of h and rhs are preserved along these paths. This property can be checked by performing an available expression analysis on the transformed code.

4.2.1 Available expressions

The available expression analysis produces, for each program point of the transformed code, a set of equations r = rhs between registers and right-hand sides. (For efficiency, we encode these sets as finite maps from registers to right-hand sides, represented as Patricia trees.) Available expressions is a standard forward dataflow analysis:

\[ AE(s) = \bigcap \{\, T(f'.graph(l), AE(l)) \mid s \text{ is a successor of } l \,\} \]

The join operation is set intersection; the top element of the lattice is the empty set, and the bottom element is a symbolic constant U denoting the universe of all equations. The transfer function T is standard; full details can be found in the Coq development. For instance, if the instruction i is the operation r := t1 + t2, and R is the set of equations before i, the set T(i, R) of equations after i is obtained by adding the equality r = t1 + t2 to R, then removing every equality in this set that uses register r (including the one just added if t1 or t2 equals r). We also track equalities between registers and load instructions. Those equalities are erased whenever a store instruction is encountered because we do not maintain aliasing information.

To solve the dataflow equations, we reuse the generic implementation of Kildall's algorithm provided by the CompCert compiler. Leveraging the correctness proof of this solver and the definition of the transfer function, we obtain that the equations inferred by the analysis hold in any concrete execution of the transformed code.
For example, if the set of equations at point l includes the equality r = t1 + t2, it must be the case that R(r) = R(t1) + R(t2) for every possible execution of the program that reaches point l with a register state R.

4.2.2 Instruction unification

Armed with the results of the available expression analysis, the unify check between pairs of matching instructions can be easily expressed:

    unify(D, i, i′) =
      if i = i′ then true else
      case (i, i′) of
      | (r := op(op, r⃗), r := h) → (h = op(op, r⃗)) ∈ D
      | (r := load(chunk, mode, r⃗), r := h) → (h = load(chunk, mode, r⃗)) ∈ D
      | otherwise → false

Here, D = AE(n′) is the set of available expressions at the point n′ where the transformed instruction i′ occurs. Either the original instruction i and the transformed instruction i′ are equal, or the former is r := rhs and the latter is r := h, in which case instruction unification succeeds if and only if the equation h = rhs is known to hold according to the results of the available expression analysis.

4.3 Verifying the flow of control

Unifying pairs of instructions is not enough to guarantee semantic preservation: we also need to check that the control flow is preserved. For example, in the code shown in figure 2, after checking that the conditional tests at nodes 1 and 1′ are identical, we must make sure that whenever the original code transitions from node 1 to node 6, the transformed code can transition from node 1′ to 6′, executing the anticipated computation at n2 on its way.

[Figure 3. Effect of the transformation on the structure of the code: an edge from n to m in the original code corresponds, through ϕ, to a path from ϕ(n) to ϕ(m) that may traverse inserted instructions h1 := rhs1, h2 := rhs2.]

More generally, if the k-th successor of n in the original CFG is m, there must exist a path in the transformed CFG from ϕ(n) to ϕ(m) that goes through the k-th successor of ϕ(n). (See figure 3.) Since instructions can be added to the transformed graph during lazy code motion, ϕ(m) is not necessarily the k-th successor of ϕ(n): one or several anticipated computations of the shape h := rhs may need to be executed.
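The transfer function of section 4.2.1 and the unify check of section 4.2.2 can be illustrated by the following sketch. It is not the Coq definition: equations are a Python dictionary from registers to right-hand sides, instructions are tagged tuples written without successor labels (as in the abbreviated notation), and the treatment of calls (erase load equalities and equalities mentioning the destination) is an assumption of this sketch. The bottom element U and the Kildall solver are not modelled.

```python
# Instructions, without successor labels (layout assumed for this sketch):
#   ("op", op, args, dst)   ("load", chunk, mode, args, dst)
#   ("store", chunk, mode, args, src)   ("call", sig, fn, args, dst)
# Right-hand sides: ("op", op, args) or ("load", chunk, mode, args).
# args must be tuples so that right-hand sides compare by value.


def kill(eqs, reg):
    """Drop every equation that defines reg or uses it as an operand."""
    return {r: rhs for r, rhs in eqs.items() if r != reg and reg not in rhs[-1]}


def transfer(instr, eqs):
    """Equations holding after instr, given those holding before it."""
    kind = instr[0]
    if kind in ("op", "load"):
        *head, args, dst = instr
        out = kill(eqs, dst)
        if dst not in args:  # e.g. r := r + 1 yields no new equation
            out[dst] = (*head, args)
        return out
    if kind == "store":  # no aliasing information: forget all loads
        return {r: rhs for r, rhs in eqs.items() if rhs[0] != "load"}
    if kind == "call":  # assumption: calls may write memory and set dst
        kept = kill(eqs, instr[-1])
        return {r: rhs for r, rhs in kept.items() if rhs[0] != "load"}
    return dict(eqs)


def join(a, b):
    """Set intersection of two equation maps."""
    return {r: rhs for r, rhs in a.items() if b.get(r) == rhs}


def unify(D, i, i2):
    """Accept equal instructions, or 'r := rhs' rewritten into 'r := h' with (h = rhs) in D."""
    if i == i2:
        return True
    if not (i2[0] == "op" and i2[1] == "move" and len(i2[2]) == 1):
        return False
    h, dst2 = i2[2][0], i2[3]
    if i[0] not in ("op", "load"):
        return False
    rhs, dst = i[:-1], i[-1]
    return dst == dst2 and D.get(h) == rhs
```

On the example of figure 2, after `h0 := t1 + t2` the map contains `h0 = op(add, (t1, t2))`, so the rewrite of `t4 := t1 + t2` into `t4 := h0` unifies; without that equation it is rejected.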
Here comes a delicate aspect of our validator: not only must there exist a path from ϕ(n) to ϕ(m), but moreover the anticipated computations h := rhs found on this path must be semantically well-defined: they should not go wrong at run-time. This is required to ensure that whenever an execution of the original code transitions in one step from n to m, the transformed code can transition (possibly in several steps) from ϕ(n) to ϕ(m) without going wrong.

Figure 4 shows three examples of code motion where this property may not hold. In all three cases, we consider anticipating the computation a/b (an integer division that can go wrong if b = 0) at the program points marked by a double arrow. In the leftmost example, it is obviously unsafe to compute a/b before the conditional test: quite possibly, the test in the original code checks that b ≠ 0 before computing a/b. The middle example is more subtle: it could be the case that the loop preceding the computation of a/b does not terminate whenever b = 0. In this case, the original code never crashes on a division by zero, but anticipating the division before the loop could cause the transformed program to do so. The rightmost example is similar to the middle one, with the loop being replaced by a function call. The situation is similar because the function call may not terminate when b = 0.

[Figure 4. Three examples of incorrect code motion. Placing a computation of a/b at the program points marked by ⇒ can potentially transform a well-defined execution into an erroneous one. Left: x := a/b after a conditional test; middle: x := a/b after a loop; right: x := a/b after a call f(y).]

How, then, can we check that the instructions that have been added to the graph are semantically well-defined? Because we distinguish erroneous executions and diverging executions, we cannot rely on a standard anticipability analysis. Our approach is the following: whenever we encounter an instruction h := rhs that was inserted by the LCM transformation on the path from ϕ(n) to ϕ(m), we check that the computation of rhs is inevitable in the original code starting at node m. In other words, all execution paths starting from m in the original code must, in a finite number of steps, compute rhs. Since the semantic preservation result that we wish to establish takes as an assumption that the execution of the original code does not go wrong, we know that the computation of rhs cannot go wrong, and therefore it is legal to anticipate it in the transformed code. We now define precisely an algorithm, called the anticipability checker, that performs this check.

     1  function ant_checker_rec (g, rhs, pc, S) =
     2
     3    case S(pc) of
     4    | Found → (S, true)
     5    | NotFound → (S, false)
     6    | Visited → (S, false)
     7    | Dunno →
     8
     9      case g(pc) of
    10      | return _ → (S{pc ← NotFound}, false)
    11      | tailcall (_, _, _) → (S{pc ← NotFound}, false)
    12      | cond (_, _, l_true, l_false) →
    13          let (S′, b1) = ant_checker_rec (g, rhs, l_true, S{pc ← Visited}) in
    14          let (S″, b2) = ant_checker_rec (g, rhs, l_false, S′) in
    15          if b1 && b2 then (S″{pc ← Found}, true) else (S″{pc ← NotFound}, false)
    16      | nop l →
    17          let (S′, b) = ant_checker_rec (g, rhs, l, S{pc ← Visited}) in
    18          if b then (S′{pc ← Found}, true) else (S′{pc ← NotFound}, false)
    19      | call (_, _, _, _, l) → (S{pc ← NotFound}, false)
    20      | store (_, _, _, _, l) →
    21          if rhs reads memory then (S{pc ← NotFound}, false) else
    22          let (S′, b) = ant_checker_rec (g, rhs, l, S{pc ← Visited}) in
    23          if b then (S′{pc ← Found}, true) else (S′{pc ← NotFound}, false)
    24      | op (op, args, r, l) →
    25          if r is an operand of rhs then (S{pc ← NotFound}, false) else
    26          if rhs = (op op args) then (S{pc ← Found}, true) else
    27          let (S′, b) = ant_checker_rec (g, rhs, l, S{pc ← Visited}) in
    28          if b then (S′{pc ← Found}, true) else (S′{pc ← NotFound}, false)
    29      | load (chk, addr, args, r, l) →
    30          if r is an operand of rhs then (S{pc ← NotFound}, false) else
    31          if rhs = (load chk addr args) then (S{pc ← Found}, true) else
    32          let (S′, b) = ant_checker_rec (g, rhs, l, S{pc ← Visited}) in
    33          if b then (S′{pc ← Found}, true) else (S′{pc ← NotFound}, false)
    34
    35
    36  function ant_checker (g, rhs, pc) = let (S, b) = ant_checker_rec (g, rhs, pc, (l ↦ Dunno)) in b

Figure 5. Anticipability checker
4.3.1 Anticipability checking

Our algorithm is described in figure 5. It takes four arguments: a graph g, an instruction right-hand side rhs to search for, a program point l where the search begins and a map S that associates to every node a marker. Its goal is to verify that on every path starting at l in the graph g, execution reaches an instruction with right-hand side rhs such that none of the operands of rhs have been redefined on the path. Basically it is a depth-first search that covers all the paths starting at l. Note that if there is a path starting at l that contains a loop so that rhs is neither between l and the loop nor in the loop itself, then there exists a path on which rhs is not reachable and that corresponds to an infinite execution. To obtain an efficient algorithm, we need to ensure that we do not go through loops several times. To this end, if the search reaches a join point not for the first time and where rhs was not found before, we must stop searching immediately. This is achieved through the use of four different markers over nodes:

- Found means that rhs is computed on every path from the current node.
- NotFound means that there exists a path from the current node in which rhs is not computed.
- Dunno is the initial state of every node before it has been visited.
- Visited is the state when a node is visited and we do not know yet whether rhs is computed on all paths or not. It is used to detect loops.

Let us detail a few cases. When the search reaches a node that is marked Visited (line 6), it means that the search went through a loop and rhs was not found. This could lead to a semantics discrepancy (recall the middle example in figure 4) and the search fails. For similar reasons, it also fails when a call is reached (line 19). When the search reaches an operation (line 24), we first verify (line 25) that r, the destination register of the instruction, is not one of the operands of rhs. Then (line 26), if the right-hand side of the instruction we reached corresponds to rhs, we have found rhs and we mark the node accordingly. Otherwise, the search continues (line 27) and we mark the node based on whether the recursive search found rhs or not (line 28).
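The search of figure 5 can be transcribed into executable form as follows. This is a sketch under assumptions, not the verified Coq function: instructions are tagged tuples with the layout given in the comments, the marker map is a mutable dictionary instead of a functional map threaded through the recursion, and plain Python recursion is used (so very deep graphs would hit the interpreter's recursion limit).

```python
# Assumed instruction layout:
#   ("nop", l)  ("op", op, args, dst, l)  ("load", chunk, mode, args, dst, l)
#   ("store", chunk, mode, args, src, l)  ("call", sig, fn, args, dst, l)
#   ("tailcall", sig, fn, args)  ("cond", cond, args, l_true, l_false)  ("return", r)
# Right-hand sides: ("op", op, args) or ("load", chunk, mode, args), args being tuples.

FOUND, NOT_FOUND, VISITED, DUNNO = "Found", "NotFound", "Visited", "Dunno"


def ant_checker(g, rhs, start):
    """True iff every path from start in g computes rhs before redefining its operands."""
    marks = {}  # node -> marker; nodes absent from the map are Dunno

    def search(pc):
        mark = marks.get(pc, DUNNO)
        if mark == FOUND:
            return True
        if mark in (NOT_FOUND, VISITED):  # Visited: we came back through a loop
            return False
        marks[pc] = VISITED
        found = step(g[pc])
        marks[pc] = FOUND if found else NOT_FOUND
        return found

    def step(instr):
        kind = instr[0]
        if kind in ("return", "tailcall", "call"):
            return False  # the path ends, or a call that may not terminate
        if kind == "cond":
            b1 = search(instr[3])
            b2 = search(instr[4])
            return b1 and b2
        if kind == "nop":
            return search(instr[1])
        if kind == "store":
            # A store may change what a load-based rhs would read.
            return False if rhs[0] == "load" else search(instr[-1])
        if kind in ("op", "load"):
            *head, args, dst, succ = instr
            if dst in rhs[-1]:
                return False  # an operand of rhs is redefined
            if rhs == (*head, args):
                return True  # rhs is computed here
            return search(succ)
        raise ValueError(f"unknown instruction kind: {kind!r}")

    return search(start)
```

On a graph where a loop precedes the division (the middle example of figure 4), the search started before the loop returns to a node marked Visited and therefore fails, whereas the search started at the division itself succeeds.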

The ant_checker function, when it returns Found, should imply that the right-hand side expression is well defined. We prove that this is the case in section 7.3 below.

4.3.2 Verifying the existence of semantics paths

Once we can decide the well-definedness of instructions, checking for the existence of a path between two nodes of the transformed graph is simple. The function path(g, g′, n, m) checks that there exists a path in CFG g′ from node n to node m, composed of zero, one or several single-successor instructions of the form h := rhs. The destination register h must be fresh (unused in g) so as to preserve the abstract semantics equivalence invariant. Moreover, the right-hand side rhs must be safely anticipable: it must be the case that ant_checker(g, rhs, ϕ⁻¹(m)) = Found, so that rhs can be computed before reaching m without getting stuck.

5. Dynamic semantics of RTL

In preparation for a proof of correctness of the validator, we now outline the dynamic semantics of the RTL language. More details can be found in (Leroy 2008). The semantics manipulates values, written v, comprising 32-bit integers, 64-bit floats, and pointers. Several environments are involved in the semantics. Memory states M map pointers and memory chunks to values, in a way that accounts for byte addressing and possible overlap between chunks (Leroy and Blazy 2008). Register files R map registers to values. Global environments G associate pointers to names of global variables and functions, and function definitions to function pointers.

The semantics of RTL programs is given in small-step style, as a transition relation between execution states. Three kinds of states are used:

- Regular states: S(Σ, f, σ, l, R, M). This state corresponds to an execution point within the internal function f, at node l in the CFG of f. R and M are the current register file and memory state. Σ represents the call stack, and σ points to the activation record for the current invocation of f.
- Call states: C(Σ, fd, v⃗, M). This is an intermediate state representing an invocation of function fd with parameters v⃗.
- Return states: R(Σ, v, M).
Symmerically, his inermediae sae represens a funcion reurn, wih reurn value v being passed back o he caller. Call sacks Σ are liss of frames F(r, f, σ, l, R), where r is he desinaion regiser where he value compued by he callee is o be sored on reurn, f is he caller funcion, and σ, l and R is local sae a he ime of he funcion call. The semanics is defined by he one-sep ransiion relaion G S S, where G is he global environmen (invarian during execuion), S and S he saes before and afer he ransiion, and a race of he exernal funcion call possibly performed during he ransiion. Traces record he names of exernal funcions invoked, along wih he argumen values provided by he program and he reurn value provided by he exernal world. To give a flavor of he semanics and show he level of deail of he formalizaion, figure 6 shows a subse of he rules defining he one-sep ransiion relaion. For example, he firs rule saes ha if he program couner l poins o an insrucion ha is an operaion of he form op(op, r, r d, l ), and if evaluaing he operaor op on he values conained in he regisers r of he regiser file R reurns he value v, hen we ransiion o a new regular sae where he regiser r d of R is updaed o hold he value v, and he program couner moves o he successor l of he operaion. The only rule ha produces a non-empy race is he one for exernal funcion invocaions (las rule in figure 6); all oher rules produce he empy race ε. f.graph(l) = op(op, r, r d, l ) v = eval op(g, op, R( r)) G S(Σ, f, σ, l, R, M) ε S(Σ, f, σ, l, R{r d v}, M) f.graph(l) = call(sig, r f, r, r d, l ) G(R(r f )) = fd fd.sig = sig Σ = F(r d, f, σ, l, R).Σ G S(Σ, f, σ, l, R, M) ε C(Σ,fd, v, M) f.graph(l) = reurn(r) v = R(r) G S(Σ, f, σ, l, R, M) ε R(Σ, v, M) Σ = F(r d, f, σ, l, R).Σ G R(Σ, v, M) ε S(Σ, f, σ, l, R{r d v}, M) alloc(m, 0, f.sacksize) = (σ, M ) l = f.sar R = [f.params v] G C(Σ, f, v, M) ε S(Σ, f, σ, l, R, M) = (ef.name, v, v) G C(Σ,ef, v, M) R(Σ, v, M) Figure 6. 
Seleced rules from he dynamic semanics of RTL Sequences of ransiions are capured by he following closures of he one-sep ransiion relaion: G S S zero, one or several ransiions G S + S one or several ransiions G S T infiniely many ransiions The finie race and he finie or infinie race T record he exernal funcion invocaions performed during hese sequences of ransiions. The observable behavior of a program P, hen, is defined in erms of he races corresponding o ransiion sequences from an iniial sae o a final sae. We wrie P B o say ha program P has behavior B, where B is eiher erminaion wih a finie race, or divergence wih a possibly infinie race T. Noe ha compuaions ha go wrong, such as an ineger division by zero, are modeled by he absence of a ransiion. Therefore, if P goes wrong, hen P B does no hold for any B. 6. Semanics preservaion for LCM Le P i be an inpu program and P o be he oupu program produced by he unrused implemenaion of LCM. We wish o prove ha if he validaor succeeds on all pairs of maching funcions from P i and P o, hen P i B P o B. In oher words, if P i does no go wrong and execues wih observable behavior B, hen so does P o. 6.1 Simulaing execuions The way we build a semanics preservaion proof is o consruc a relaion beween execuion saes of he inpu and oupu programs, wrien S i S o, and show ha i is a simulaion: Iniial saes: if S i and S o are wo iniial saes, hen S i S o. Final saes: if S i S o and S i is a final sae, hen S o mus be a final sae. Simulaion propery: if S i S o, any ransiion from sae S i wih race is simulaed by one or several ransiions saring in sae S o, producing he same race, and preserving he simulaion relaion. The hypohesis ha he inpu program P i does no go wrong plays a crucial role in our semanic preservaion proof, in paricular o show he correcness of he anicipabiliy crierion. Therefore,

we reflect this hypothesis in the precise statement of the simulation property above, as follows. ($G_i$, $G_o$ are the global environments corresponding to programs $P_i$ and $P_o$, respectively.)

DEFINITION 1 (Simulation property). Let $I_i$ be the initial state of program $P_i$ and $I_o$ that of program $P_o$. Assume that

$S_i \sim S_o$ (current states are related)
$G_i \vdash S_i \xrightarrow{t} S_i'$ (the input program makes a transition)
$G_i \vdash I_i \xrightarrow{t'}{}^{*} S_i$ and $G_o \vdash I_o \xrightarrow{t'}{}^{*} S_o$ (current states are reachable from initial states)
$G_i \vdash S_i' \Downarrow B$ for some behavior $B$ (the input program does not go wrong after the transition).

Then, there exists $S_o'$ such that $G_o \vdash S_o \xrightarrow{t}{}^{+} S_o'$ and $S_i' \sim S_o'$.

The commuting diagram corresponding to this definition is depicted below. Solid lines represent hypotheses; dashed lines represent conclusions.

[Diagram. Input program: $I_i \xrightarrow{t'}{}^{*} S_i \xrightarrow{t} S_i'$, where $S_i'$ does not go wrong. Output program: $I_o \xrightarrow{t'}{}^{*} S_o \xrightarrow{t}{}^{+} S_o'$. Solid: $S_i \sim S_o$. Dashed: $S_i' \sim S_o'$ and the transitions from $S_o$ to $S_o'$.]

It is easy to show that the simulation property implies semantic preservation:

THEOREM 1. Under the hypotheses between initial states and final states and the simulation property, $P_i \Downarrow B$ implies $P_o \Downarrow B$.

6.2 The invariant of semantic preservation

We now construct the relation $\sim$ between execution states before and after LCM that acts as the invariant in our proof of semantic preservation. We first define a relation between register files.

DEFINITION 2 (Equivalence of register files). $f \vdash R \sim R'$ if and only if $R(r) = R'(r)$ for every register $r$ that appears in an instruction of $f$'s code.

This definition allows the register file $R'$ of the transformed function to bind additional registers not present in the original function, especially the temporary registers introduced during LCM optimization. Equivalence between execution states is then defined by the three rules below.

DEFINITION 3 (Equivalence of execution states).

$$\frac{\mathrm{validate}(f, f', \varphi) = \mathtt{true} \qquad f \vdash R \sim R' \qquad G, G' \vdash \Sigma \sim_F \Sigma'}{G, G' \vdash \mathcal{S}(\Sigma, f, \sigma, l, R, M) \sim \mathcal{S}(\Sigma', f', \sigma, \varphi(l), R', M)}$$

$$\frac{T_V(fd) = fd' \qquad G, G' \vdash \Sigma \sim_F \Sigma'}{G, G' \vdash \mathcal{C}(\Sigma, fd, \vec{v}, M) \sim \mathcal{C}(\Sigma', fd', \vec{v}, M)}$$

$$\frac{G, G' \vdash \Sigma \sim_F \Sigma'}{G, G' \vdash \mathcal{R}(\Sigma, v, M) \sim \mathcal{R}(\Sigma', v, M)}$$

Generally speaking, equivalent states must have exactly the same memory states and the same value components (stack pointer $\sigma$, arguments and results of function calls).
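To make Definition 2 concrete, here is a minimal Python sketch of the register-file equivalence check. The encoding is an assumption made for illustration only: instructions are reduced to a destination register and a tuple of operand registers, register files are dictionaries, and the names `registers_of` and `regfiles_equivalent` are hypothetical (they do not come from the Coq development).

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding: an instruction is (destination register, operand registers).
Instr = Tuple[str, Tuple[str, ...]]
RegFile = Dict[str, int]


def registers_of(code: Iterable[Instr]) -> Set[str]:
    """Collect every register mentioned by some instruction of the function."""
    regs: Set[str] = set()
    for dest, args in code:
        regs.add(dest)
        regs.update(args)
    return regs


def regfiles_equivalent(code: Iterable[Instr], r1: RegFile, r2: RegFile) -> bool:
    """f |- R ~ R': both register files agree on all registers used in f's code.

    Registers missing from a dictionary are treated as unbound (None) here,
    which is a simplification of this sketch.
    """
    return all(r1.get(r) == r2.get(r) for r in registers_of(code))


# Code of the ORIGINAL function f: x = a op b; y = x op a.
original_code = [("x", ("a", "b")), ("y", ("x", "a"))]
R = {"a": 1, "b": 2, "x": 3, "y": 4}
# The transformed function may bind an extra temporary t1 without breaking equivalence.
R_transformed = {"a": 1, "b": 2, "x": 3, "y": 4, "t1": 3}
```

The check quantifies only over registers of the original function, which is why the temporary `t1` is ignored.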
As mentioned before, the register files $R$, $R'$ of regular states may differ on temporary registers but must be related by the $f \vdash R \sim R'$ relation. The function parts $f$, $f'$ must be related by a successful run of validation. The program points $l$, $l'$ must be related by $l' = \varphi(l)$.

The most delicate part of the definition is the equivalence between call stacks $G, G' \vdash \Sigma \sim_F \Sigma'$. The frames of the two stacks $\Sigma$ and $\Sigma'$ must be related pairwise by the following predicate.

DEFINITION 4 (Equivalence of stack frames).

$$\frac{\begin{array}{c} \mathrm{validate}(f, f', \varphi) = \mathtt{true} \qquad f \vdash R \sim R' \\ \forall v, M, B,\; G \vdash \mathcal{S}(\Sigma, f, \sigma, l, R\{r \leftarrow v\}, M) \Downarrow B \;\Longrightarrow\; \exists R'',\; f \vdash R\{r \leftarrow v\} \sim R'' \;\wedge\; G' \vdash \mathcal{S}(\Sigma', f', \sigma, l', R'\{r \leftarrow v\}, M) \xrightarrow{\varepsilon}{}^{+} \mathcal{S}(\Sigma', f', \sigma, \varphi(l), R'', M) \end{array}}{G, G' \vdash \mathcal{F}(r, f, \sigma, l, R) \sim_F \mathcal{F}(r, f', \sigma, l', R')}$$

The scary-looking third premise of the definition above captures the following condition: if we suppose that the execution of the initial program is well-defined once control returns to node $l$ of the caller, then it should be possible to perform an execution in the transformed graph from $l'$ down to $\varphi(l)$. This requirement is a consequence of the anticipability problem. As explained earlier, we need to make sure that execution is well defined from $l'$ to $\varphi(l)$. But when the instruction is a function call, we have to store this information in the equivalence of frames, universally quantified on the not-yet-known return value $v$ and memory state $M$ at return time. At the time we store the property we do not know yet if the execution will be semantically correct from $l$, so we suppose it until we get the information (that is, when execution reaches $l$).

Having stated semantics preservation as a simulation diagram and defined the invariant of the simulation, we now turn to the proof itself.

7. Sketch of the formal proof

This section gives a high-level overview of the correctness proof for our validator. It can be used as an introduction to the Coq development, which gives full details. Besides giving an idea of how we prove the validation kernel (this proof differs from earlier paper proofs mainly on the handling of semantic well-definedness), we try to show that the burden of the proof can be reduced by adequate design.
7.1 Design: getting rid of bureaucracy

Recall that the validator is composed of two parts: first, a generic validator that requires an implementation of $V$ and of analyze; second, an implementation of $V$ and analyze specialized for LCM. The proof follows this structure: on one hand, we prove that if $V$ satisfies the simulation property, then the generic validator implies semantics preservation; on the other hand, we prove that the node-level validation specialized for LCM satisfies the simulation property.

This decomposition of the proof improves re-usability and, above all, greatly improves abstraction for the proof that $V$ satisfies the simulation property (which is the kernel of the proof on which we want to focus) and hence reduces the proof burden of the formalization. Indeed, many details of the formalization can be hidden in the proof of the framework. This includes, among other things, function invocation, function return, global variables, and stack management.

Besides, this allows us to prove that $V$ only satisfies a weaker version of the simulation property that we call the validation property, and whose equivalence predicate is a simplification of the equivalence presented in section 6.2. In the simplified equivalence predicate, there is no mention of stack equivalence, function transformation, stack pointers or results of the validation.

DEFINITION 5 (Abstract equivalence of states).

$$\frac{f \vdash R \sim R' \qquad l' = \varphi(l)}{G, G' \vdash \mathcal{S}(\Sigma, f, \sigma, l, R, M) \approx_S \mathcal{S}(\Sigma', f', \sigma', l', R', M)}$$

$$G, G' \vdash \mathcal{C}(\Sigma, fd, \vec{v}, M) \approx_C \mathcal{C}(\Sigma', fd', \vec{v}, M)$$

$$G, G' \vdash \mathcal{R}(\Sigma, v, M) \approx_R \mathcal{R}(\Sigma', v, M)$$

The validation property is stated in three versions, one for regular states, one for calls and one for returns. We present only the property for regular states. If $S = \mathcal{S}(\Sigma, f, \sigma, l, R, M)$ is a regular state, we write $S.f$ for the $f$ component of the state and $S.l$ for the $l$ component.

DEFINITION 6 (Validation property). Let $I_i$ be the initial state of program $P_i$ and $I_o$ that of program $P_o$. Assume that

$S_i \approx_S S_o$
$G_i \vdash S_i \xrightarrow{t} S_i'$
$G_i \vdash I_i \xrightarrow{t'}{}^{*} S_i$ and $G_o \vdash I_o \xrightarrow{t'}{}^{*} S_o$
$S_i' \Downarrow B$ for some behavior $B$
$V(S_i.f, S_o.f, S_i.l, \varphi, \mathrm{analyze}(S_o.f)) = \mathtt{true}$

Then, there exists $S_o'$ such that $S_o \xrightarrow{t}{}^{+} S_o'$ and $S_i' \approx S_o'$.

We then prove that if $V$ satisfies the validation property, and if the two programs $P_i$, $P_o$ successfully pass validation, then the simulation property (definition 1) is satisfied, and therefore (theorem 1) semantic preservation holds. This proof is not particularly interesting but represents a large part of the Coq development and requires a fair knowledge of CompCert internals. We now outline the formal proof of the fact that $V$ satisfies the validation property, which is the more interesting part of the proof.

7.2 Verification of the equivalence of single instructions

We first need to prove the correctness of the available expression analysis. The predicate $S \models E$ states that a set of equalities $E$ inferred by the analysis is satisfied in execution state $S$. The predicate is always true on call states and on return states.

DEFINITION 7 (Correctness of a set of equalities). $\mathcal{S}(\Sigma, f, \sigma, l, R, M) \models RD(l)$ if and only if

$(r = \mathtt{op}(op, \vec{r})) \in RD(l)$ implies $R(r) = \mathrm{eval\_op}(op, R(\vec{r}))$;
$(r = \mathtt{load}(chunk, mode, \vec{r})) \in RD(l)$ implies $\mathrm{eval\_addressing}(mode, \vec{r}) = v$ and $R(r) = \mathrm{load}(chunk, v)$ for some pointer value $v$.

The correctness of the analysis can now be stated:

LEMMA 2 (Correctness of available expression analysis). Let $S_0$ be the initial state of the program. For all regular states $S$ such that $S_0 \xrightarrow{t}{}^{*} S$, we have $S \models \mathrm{analyze}(S.f)$.

Then, it is easy to prove the correctness of the unification check.
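The first clause of Definition 7 can be illustrated with a small Python sketch that checks whether a register file satisfies a set of operator equalities. This is only an illustration under stated assumptions: the operator table `OPS`, the tuple encoding of equalities, and the function names are hypothetical, and the clause for memory loads is deliberately left out.

```python
import operator
from typing import Callable, Dict, Iterable, Sequence, Tuple

# Hypothetical operator table standing in for eval_op.
OPS: Dict[str, Callable[..., int]] = {
    "add": operator.add,
    "mul": operator.mul,
    "neg": operator.neg,
}

# (r, op, args) encodes the equality r = op(op, args).
Equality = Tuple[str, str, Tuple[str, ...]]


def eval_op(op: str, values: Sequence[int]) -> int:
    """Evaluate operator `op` on already-fetched argument values."""
    return OPS[op](*values)


def satisfies(regfile: Dict[str, int], equalities: Iterable[Equality]) -> bool:
    """S |= E, restricted to operator equalities: each r = op(args) holds in regfile."""
    for r, op, args in equalities:
        if regfile[r] != eval_op(op, [regfile[a] for a in args]):
            return False
    return True


regs = {"a": 2, "b": 5, "t": 7, "u": 10}
inferred = [("t", "add", ("a", "b")), ("u", "mul", ("a", "b"))]
```

With `regs` as above, both equalities hold; overwriting `a` without recomputing `t` would invalidate the first one, which is exactly the situation the analysis must rule out before unification may rely on an equality.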
The predicate $\approx_S^W$ is a weaker version of $\approx_S$, where we remove the requirement that $l' = \varphi(l)$, therefore enabling the program counter of the transformed code to temporarily get out of synchronization with that of the original code.

LEMMA 3. Assume

$S_i \approx_S S_o$
$S_i \xrightarrow{t} S_i'$
$\mathrm{unify}(\mathrm{analyze}(S_o.f), S_i.f.\mathrm{graph}, S_o.f.\mathrm{graph}, S_i.l, S_o.l) = \mathtt{true}$
$I_o \xrightarrow{t'}{}^{*} S_o$

Then, there exists a state $S_o'$ such that $S_o \xrightarrow{t} S_o'$ and $S_i' \approx_S^W S_o'$.

Indeed, from the hypothesis $I_o \xrightarrow{t'}{}^{*} S_o$ and the correctness of the analysis, we deduce that $S_o \models \mathrm{analyze}(S_o.f)$, which implies that the equality used during the unification, if any, holds at run-time. This illustrates the use of hypotheses on the past of the execution of the transformed program. By doing so, we avoid having to maintain the correctness of the analysis in the predicate of equivalence. It remains to step through the transformed CFG, as performed by path checking, in order to go from the weak abstract equivalence $\approx_S^W$ to the full abstract equivalence $\approx_S$.

7.3 Anticipability checking

Before proving the properties of path checking, we need to prove the correctness of the anticipability check: if the check succeeds and the semantics of the input program is well defined, then the right-hand side expression given to the anticipability check is well defined.

LEMMA 4. Assume $\mathrm{ant\_checker}(f.\mathrm{graph}, rhs, l) = \mathtt{true}$ and $\mathcal{S}(\Sigma, f, \sigma, l, R, M) \Downarrow B$ for some $B$. Then, there exists a value $v$ such that $rhs$ evaluates to $v$ (without run-time errors) in the state $R$, $M$.

Then, the semantic property guaranteed by path checking is that there exists a sequence of reductions from $\mathrm{successor}(\varphi(n))$ to $\varphi(\mathrm{successor}(n))$ such that the abstract invariant of semantic equivalence is reinstated at the end of the sequence.

LEMMA 5. Assume

$S_i' \approx_S^W S_o'$
$\mathrm{path}(S_i.f.\mathrm{graph}, S_o.f.\mathrm{graph}, S_o'.l, \varphi(S_i'.l)) = \mathtt{true}$
$S_i' \Downarrow B$ for some $B$

Then, there exists a state $S_o''$ such that $S_o' \xrightarrow{\varepsilon}{}^{*} S_o''$ and $S_i' \approx_S S_o''$.

This illustrates the use of the hypothesis on the future of the execution of the initial program. All the proofs are rather straightforward once we know that we need to reason on the future of the execution of the initial program.
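The intuition behind Lemma 4 can be sketched as a graph traversal: a right-hand side is anticipable at a node if every path from that node computes it before redefining one of its operands, before leaving the function, and before going around a loop. The Python below is a simplified sketch of that idea, not the Coq `ant_checker`; the CFG encoding is hypothetical, and memory loads (which stores would also invalidate) are not handled.

```python
from typing import Dict, List, Optional, Tuple

Rhs = Tuple[str, Tuple[str, ...]]                      # (operator, operand registers)
Node = Tuple[Optional[str], Optional[Rhs], List[int]]  # (dest register, rhs, successors)


def ant_checker(graph: Dict[int, Node], rhs: Rhs, start: int) -> bool:
    """Return True if `rhs` is computed on every path from `start`,
    before any operand is overwritten and before any cycle is closed."""
    IN_PROGRESS, OK = 0, 1
    status: Dict[int, int] = {}

    def visit(n: int) -> bool:
        if status.get(n) == OK:
            return True
        if status.get(n) == IN_PROGRESS:
            return False  # a loop was entered before rhs was computed
        dest, node_rhs, succs = graph[n]
        if node_rhs == rhs:
            status[n] = OK
            return True   # rhs is evaluated here with its operands unchanged
        if dest in rhs[1] or not succs:
            return False  # operand redefined, or function exit reached
        status[n] = IN_PROGRESS
        ok = all(visit(s) for s in succs)
        if ok:
            status[n] = OK
        return ok

    return visit(start)


div = ("div", ("a", "b"))
straight = {1: (None, None, [2]), 2: ("t", div, [3]), 3: (None, None, [])}
looping = {1: (None, None, [1, 2]), 2: ("t", div, [])}
killed = {1: ("a", ("add", ("c", "d")), [2]), 2: ("t", div, [])}
```

In `straight` the division is anticipable at node 1. In `looping` it is not, because node 1 may spin forever without ever evaluating the division, which is the non-termination case that motivates the check. In `killed`, operand `a` is redefined first.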
By combining lemmas 3 and 5 we prove the validation property for regular states, according to the following diagram.

[Diagram. $S_i \approx_S S_o$ with $I_o \xrightarrow{t'}{}^{*} S_o$; $S_i \xrightarrow{t} S_i'$ and $S_o \xrightarrow{t} S_o'$ with $S_i' \approx_S^W S_o'$ (lemma 3); $S_o' \xrightarrow{\varepsilon}{}^{*} S_o''$ with $S_i' \approx_S S_o''$ (lemma 5); $S_i'$ does not go wrong.]

The proofs of the validation property for call and return states are similar.

8. Discussion

Implementation. The LCM validator and its proof of correctness were implemented in the Coq proof assistant. The Coq development is approximately 5000 lines long. 800 lines correspond to the specification of the LCM validator, in pure functional style, from which executable Caml code is automatically generated by Coq's extraction facility. The remaining 4200 lines correspond to the correctness proof. In addition, a lazy code motion optimization was implemented in OCaml, in roughly 800 lines of code. The following table shows the relative sizes of the various parts of the Coq development.

Part                            Size
General framework               37%
Anticipability check            16%
Path verification                7%
Reaching definition analysis    18%
Instruction unification          6%
Validation function             16%

As discussed below, large parts of this development are not specific to LCM and can be reused: the general framework of section 7.1,

anticipability checking, available expressions, etc. Assuming these parts are available as part of a toolkit, building and proving correct the LCM validator would require only 1100 lines of code and proofs.

Completeness. We proved the correctness of the validator. This is an important property, but not sufficient in practice: a validator that rejects every possible transformation is definitely correct but also quite useless. We need evidence that the validator is relatively complete with respect to reasonable implementations of LCM. Formally specifying and proving such a relative completeness result is difficult, so we resorted to experimentation. We ran LCM and its validator on the CompCert benchmark suite (17 small to medium-size C programs) and on a number of examples handcrafted to exercise the LCM optimization. No false alarms were reported by the validator.

More generally, there are two main sources of possible incompleteness in our validator. First, the external implementation of LCM could take advantage of equalities between right-hand sides of computations that our available expression analysis is unable to capture, causing instruction unification to fail. We believe this never happens as long as the available expression analysis used by the validator is identical to (or at least no coarser than) the up-safety analysis used in the implementation of LCM, which is the case in our implementation.

The second potential source of false alarms is the anticipability check. Recall that the validator prohibits anticipating a computation that can fail at run-time before a loop or function call. The CompCert semantics for the RTL language errs on the side of caution and treats all undefined behaviors as run-time failures: not just behaviors such as integer division by zero or memory loads from incorrect pointers, which can actually cause the program to crash when run on a real processor, but also behaviors such as adding two pointers or shifting an integer by more than 32 bits, which are not specified in RTL but would not crash the program during actual execution.
(However, arithmetic overflows and underflows are correctly modeled as not causing run-time errors, because the RTL language uses modulo integer arithmetic and IEEE float arithmetic.) Because the RTL semantics treats all undefined behaviors as potential run-time errors, our validator restricts the points where e.g. an addition or a shift can be anticipated, while the external implementation of LCM could (rightly) consider that such a computation is safe and can be placed anywhere. This situation happened once in our tests.

One way to address this issue is to increase the number of operations that cannot fail in the RTL semantics. We could exploit the results of a simple static analysis that keeps track of the shape of values (integers, pointers or floats), such as the trivial int or float type system for RTL used in (Leroy 2008). Additionally, we could refine the semantics of RTL to distinguish between undefined operations that can crash the program (such as loads from invalid addresses) and undefined operations that cannot (such as adding two pointers); the latter would be modeled as succeeding, but returning an unspecified result. In both approaches, we increase the number of arithmetic instructions that can be anticipated freely.

Complexity and performance. Let $N$ be the number of nodes in the initial CFG $g$. The number of nodes in the transformed graph $g'$ is in $O(N)$. We first perform an available expression analysis on the transformed graph, which takes time $O(N^3)$. Then, for each node of the initial graph we perform a unification and a path check. Unification is done in constant time, and path checking tries to find a non-cyclic path in the transformed graph, performing an anticipability check in time $O(N)$ for instructions that may be ill-defined. Hence path checking is in $O(N^2)$, but this is a rough, pessimistic approximation. In conclusion, our validator runs in time $O(N^3)$. Since lazy code motion itself performs four data-flow analyses that run in time $O(N^3)$, running the validator does not change the complexity of the lazy code motion compiler pass.
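The bound above can be spelled out term by term. The decomposition below only restates the costs given in the text; the name $T(N)$ for the total validation time is introduced here for convenience. Path checking for one node follows a non-cyclic path of at most $O(N)$ nodes and may run an $O(N)$ anticipability check at each of them, hence $O(N) \cdot O(N) = O(N^2)$ per node.

$$T(N) \;=\; \underbrace{O(N^3)}_{\text{available expressions}} \;+\; N \cdot \Big( \underbrace{O(1)}_{\text{unification}} + \underbrace{O(N^2)}_{\text{path checking}} \Big) \;=\; O(N^3) + O(N^3) \;=\; O(N^3)$$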
In practice, on our benchmark suite, the time needed to validate a function is on average 22.5% of the time it takes to perform LCM.

Reusing the development. One advantage of translation validation is the re-usability of the approach. It makes it easy to experiment with variants of a transformation, for example by using a different set of data-flow analyses in lazy code motion. It also happens that, in one compiler, two different versions of a transformation coexist. This is the case with GCC: depending on whether one optimizes for space or for time, the compiler performs partial redundancy elimination (Morel and Renvoise 1979) or lazy code motion. We believe, without any formal proof, that the validator presented here works equally well for partial redundancy elimination. In such a configuration, the formalization burden is greatly reduced by using translation validation instead of compiler proof.

Classical redundancy elimination algorithms make the safe restriction that a computation $e$ cannot be placed on some control flow path that does not compute $e$ in the original program. As a consequence, code motion can be blocked by preventing regions (Bodík et al. 1998), resulting in less redundancy elimination than expected, especially in loops. A solution to this problem is safe speculative code motion (Bodík et al. 1998), where we lift the restriction for some computation $e$ as long as $e$ cannot cause run-time errors. Our validator can easily handle this case: the anticipability check is not needed if the new instruction is safe, as can easily be checked by examination of this instruction. Another solution is to perform control flow restructuring (Steffen 1996; Bodík et al. 1998) to separate paths depending on whether they contain the computation $e$ or not. This control flow transformation is not allowed by our validator and constitutes an interesting direction for future work.

To show that re-usability can go one step further, we have modified the unification rules of our lazy code motion validator to build a certified compiler pass of constant propagation with strength reduction.
For this transformation, the available expression analysis needs to be performed not on the transformed code but on the initial one. Thankfully, the framework is designed to allow analyses on both programs. The modification mainly consists of replacing the unification rules for operations and loads, which represent about 3% of the complete development of LCM. (Note however that unification rules in the case of constant propagation are much bigger because of the multiple possible strength reductions.) It took two weeks to complete this experiment. The proof of semantics preservation uses the same invariant as for lazy code motion, and the proof remains unchanged apart from unification of operations and loads. Using the same invariant, although effective, is questionable: it is also possible to use a simpler invariant crafted especially for constant propagation with strength reduction.

One interesting possibility is to try to abstract the invariant in the development. Instead of positing a particular invariant and then developing the framework upon it, with maybe other transformations that will luckily fit the invariant, the framework is developed with an unknown invariant on which we suppose some properties. (See Zuck et al. (2001) for more explanations.) We may hope that the resulting tool/theory would be general enough for a wider class of transformations, with the possibility that the analyses have to be adapted. For example, by replacing the available expression analysis by the global value numbering of Gulwani and Necula (2004), it is possible that the resulting validator would apply to a large class of redundancy elimination transformations.