DeltaPath: Precise and Scalable Calling Context Encoding

Qiang Zeng, Junghwan Rhee, Hui Zhang, Nipun Arora, Guofei Jiang, Peng Liu
Penn State University, NEC Laboratories America

ABSTRACT
Calling context provides important information for a large range of applications, such as event logging, profiling, debugging, anomaly detection, and performance optimization. While some techniques have been proposed to track calling context efficiently, they lack a reliable and precise decoding capability, or they work only under restricted conditions, that is, small programs without object-oriented programming or dynamic component loading. These shortcomings have limited the application of calling context tracking in practice. We propose an encoding technique without those limitations: it provides precise and reliable decoding, supports large-size programs, both procedural and object-oriented ones, and can cope with dynamic class/library loading. The technique thus enables calling context tracking in a wide variety of scenarios. The evaluation on SPECjvm shows that its efficiency is comparable with that of the state-of-the-art approach while our technique provides precise decoding and demonstrates scalability and flexibility.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging-Monitors, Testing tools

General Terms
Performance, Measurement, Reliability

Keywords
Calling context encoding, object-oriented programming

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. CGO '14, February 15-19, 2014, Orlando, FL, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2670-4/14/02...$15.00

1. INTRODUCTION
A calling context is the sequence of active function/method invocations that lead to a program location. It provides critical information about dynamic program behavior. Its usage has been demonstrated in a wide range of applications, such as debugging [9, 5, 8, 9,,, 3, 7, 8, 3, 3, 39, 37], event logging and error reporting [45, 5, 35], testing [, 6, 4, 38], anomaly detection [, 6, 44], performance optimization [9], and profiling [, 33, 36, 34, 3, 43]. For example, system call event logging is critical for the analysis and diagnosis of program execution in many production systems. Simply logging the system call events fails to record how program components interact when a system call is issued, while recording calling contexts would be very informative. Take profiling as another example: context-sensitive profiling is powerful as it associates data such as execution frequencies, overhead, and object lifetime with calling contexts, and thus provides precise information for program understanding and optimization [8].

It is straightforward to obtain calling contexts through stack walking, which, however, is expensive [6, 3]. A few encoding techniques, which represent a calling context using one or more integers, have been proposed to track calling contexts continuously with low overhead. Bond and McKinley [4] proposed a technique, probabilistic calling context (PCC), that computes a probabilistically unique integer ID, essentially a hash value, for each calling context. Although PCC encoding is efficient and compact, it does not provide decoding, which is essential to applications that require inspecting and understanding contexts, such as debugging, error reporting, and event logging [4]. In order to enable the decoding capability, Breadcrumbs was built on PCC [3]. It collects additional dynamic information to assist decoding.
Specifically, it records encoding values at relatively cold call sites. Depending on the threshold defining the hot code, the technique either incurs large overhead or sacrifices decoding accuracy and reliability. Besides, the decoding has to be offline because it involves expensive computation (their evaluation used the limit of 5 seconds) for recovering one context.

Computing calling contexts with low overhead and a precise and reliable decoding capability is challenging. Recent progress was made by Sumner et al. [4], who proposed precise calling context encoding (PCCE), evolved from path profiling []. This technique enumerates every possible calling context through static analysis and represents each using a unique encoding at runtime, so the context can be recovered precisely from the encoding. However, as pointed out in previous research [3], PCCE would not work in the presence of virtual methods and dynamic class loading. Besides, handling large-scale software is a challenge to this technique, as it lacks a scalable solution to the encoding for a large number of calling contexts. More discussion about PCCE is in Section 2. These challenges limit the application of the encoding technique in a large range of scenarios, as a lot of software nowadays is written in object-oriented languages with dynamically loaded components and a large number of calling contexts.

This paper presents DeltaPath, an efficient calling context encoding technique with a precise decoding capability and support for both procedural and object-oriented programs. Similar to PCCE, the technique leverages the Ball-Larus path profiling algorithm []. It obtains a unique encoding for each context at runtime by summing up the addition values of the call sites forming the context. Addition values are computed by our encoding algorithm based on static analysis of the target program; then an addition value and the addition operation are assigned to each call site through instrumentation. Unlike PCCE, which assumes small or medium-scale software without object-oriented programming or dynamically loaded components, DeltaPath does not have those limitations. It supports both procedural and object-oriented programming, allows dynamic class loading, and works with large-scale software.

DeltaPath resolves the limitations based on the following insights and ideas. The addition value of a call site is related to the number of calling contexts ending at it, while a big program usually contains a very large number of calling contexts, such that addition values may go beyond, i.e., overflow, the encoding integer. Representing and operating on addition values using some class (e.g., BigInteger in Java) to avoid integer overflows is very inefficient. The insight is that a long calling context can be divided into several shorter pieces, each of which can be encoded using an integer. The DeltaPath encoding algorithm automatically finds a small number of functions acting as such dividers for the whole program, avoiding integer overflows systematically with low overhead.

In addition, a virtual function call can be dispatched to many possible functions, while PCCE computes an addition value for each target. It is infeasible to insert a bulky switch statement at each virtual function call site and execute it to choose the addition value, for virtual function call sites are ubiquitous and frequently invoked in object-oriented (OO) programs.
Our encoding algorithm computes a single addition value for one call site to minimize the code cache pressure and execution slowdown. Moreover, dynamically loaded classes introduce calling contexts that are not considered during static analysis. A call path tracking technique is designed to detect unexpected calling contexts and keep the encoding correct.

We implemented DeltaPath and performed experiments with a variety of Java programs. Our evaluation results show that its performance is comparable with that of PCC [4], which is a highly efficient, state-of-the-art calling context encoding technique capable of working in the presence of object-oriented programming and dynamic class loading. Compared to PCC, DeltaPath introduces precise and reliable decoding.

We made the following contributions.
- We propose a new precise calling context encoding technique that supports both procedural and OO programs.
- Encoding space pressure is addressed systematically, which allows the encoding technique to be applied to large-scale software.
- Dynamic class loading is handled properly using the call path tracking technique.
- We implemented DeltaPath and evaluated it on a variety of Java programs. The efficiency of DeltaPath is comparable with that of PCC, while it provides precise and reliable decoding.

The rest of this paper is organized as follows. Section 2 presents the encoding background. Section 3 presents DeltaPath. Section 4 discusses several practical issues. The implementation details and evaluation results are presented in Section 5 and Section 6, respectively. The related work is discussed in Section 7, and the future work in Section 8. The paper is concluded in Section 9.

[Figure 1: Example for PCCE encoding. The figure shows a call graph over the nodes A to G and a table of IDs for the contexts AB, AC, ABD, ACD, ABDE, ACDE, ABD'E, ACD'E, ABDF, ACDF, ACF, ... Edge annotations are addition values; an addition value +c means ID+=c is executed before the invocation and ID-=c is executed after; the superscript on D (D') disambiguates two call sites in D both invoking E.]
2. BACKGROUND
Ball and Larus proposed an efficient algorithm (referred to as the BL algorithm) to encode intra-procedural control flow paths []. Each of the paths leading from the entry of a function to its end obtains a unique encoding. The algorithm has become canonical in control flow encoding and path profiling.

PCCE [4, 4] leverages the BL algorithm and adapts it to encoding calling contexts, which are essentially inter-procedural paths in a call graph. PCCE encodes the calling context using a small number of integer identifiers (IDs), ideally one. It instruments function calls with additions to an integer ID such that the calling context ending at any program point can be uniquely represented by the ID along with the program counter of the point.

The algorithm calculating addition values consists of two steps of analysis of the target program's call graph. First, it computes the number of calling contexts (NC) of each node in the call graph by summing the NC of each of the node's predecessors (the NC of main is 1). Second, with respect to each node, the addition value for the first edge is 0, and for each of the remaining edges the addition value is the sum of the NCs of the predecessors that appeared in the previously processed edges.

Consider the call graph in Figure 1. The annotation of each node indicates its NC. D's NC=2, for example, is the sum of the NCs of B and C, denoting there are two possible contexts when D is invoked. The number along each edge is the addition value. Some edges do not have such numbers, meaning the addition values are 0. CG's addition value is calculated after EG and FG, thus it is the sum (7) of the NC of E (4) and that of F (3). Take the context ACFG as an example for encoding: the ID increases by 2 and 4 along CF and FG, respectively, so the result is 6. The table in Figure 1 shows the encodings of other calling contexts. Some contexts have the same ID values, for example, AB and AC. This is fine because an encoding is represented by both the ID and the ending node.
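The two-step analysis and the encodings quoted above can be reproduced in a few lines. The following Python sketch is only an illustration under stated assumptions: the function names (`pcce_analyze`, `encode`, `decode`) and the call-site labels are invented here and are not taken from the PCCE implementation. `decode` inverts the sum greedily: at each node it takes the incoming edge with the greatest addition value that does not exceed the ID.

```python
from collections import defaultdict

# Figure 1 call graph as (caller, callee, call_site) triples, listed in the
# order in which each node's incoming edges are processed. The call-site
# labels are made up for this sketch; "D'" is the second call site in D
# that invokes E.
EDGES = [
    ("A", "B", "A1"), ("A", "C", "A2"),
    ("B", "D", "B1"), ("C", "D", "C1"),
    ("D", "E", "D1"), ("D", "E", "D'"),
    ("D", "F", "D2"), ("C", "F", "C2"),
    ("E", "G", "E1"), ("F", "G", "F1"), ("C", "G", "C3"),
]
TOPO = ["A", "B", "C", "D", "E", "F", "G"]  # a topological order, root first


def pcce_analyze(edges, topo):
    """Return (NC per node, addition value per edge, incoming edges per node)."""
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge[1]].append(edge)
    nc = {topo[0]: 1}            # the NC of the root (main) is 1
    add = {}
    for node in topo[1:]:
        seen = 0                 # sum of NCs of predecessors processed so far
        for edge in incoming[node]:
            add[edge] = seen     # the first edge gets 0
            seen += nc[edge[0]]
        nc[node] = seen          # NC is the sum over all incoming edges
    return nc, add, incoming


def encode(path, add):
    """ID of a context given as a list of edges starting at the root."""
    return sum(add[edge] for edge in path)


def decode(node, ident, add, incoming, root):
    """Recover the edge list for (ending node, ID) by walking bottom-up."""
    path = []
    while node != root:
        # Take the incoming edge with the greatest addition value <= ID.
        edge = max((e for e in incoming[node] if add[e] <= ident),
                   key=lambda e: add[e])
        ident -= add[edge]
        node = edge[0]
        path.append(edge)
    return path[::-1]


nc, add, incoming = pcce_analyze(EDGES, TOPO)
acfg = [("A", "C", "A2"), ("C", "F", "C2"), ("F", "G", "F1")]
print(nc["G"], add[("C", "G", "C3")], encode(acfg, add))  # 8 7 6
print(decode("G", 6, add, incoming, "A") == acfg)         # True
```

The greedy choice in `decode` is unambiguous here because the addition values of the edges entering one node are strictly increasing: every processed predecessor contributes an NC of at least 1.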
Given an encoding, we can precisely recover the calling context from bottom to top. Consider the ID 6 obtained at node G, for example: the edge whose addition value is the greatest but not greater than the ID value is taken, that is, FG. We then jump to node F and decrease the ID by the addition value, 4, for FG. The decoding continues with F and ID value 2, from which we can recover edges CF and AC the same way.

If cycles exist in the call graph, which implies there are recursions in the program, a recursive call path is divided into acyclic sub-paths, each of which is encoded separately, such that a stack of IDs is used to represent a call path with recursions. We refer the readers to [4] for more details. Since recursions are handled in the same way by our technique, we omit its discussion and assume acyclic call graphs in the rest of the paper.

As a first step towards calling context encoding that allows precise decoding, PCCE works under restrictive conditions: programs in procedural languages without dynamically loaded components or high encoding space pressure. This has limited the application of PCCE in a variety of scenarios. Our goal is to deliver a calling context encoding technique without those limitations.

3. DELTAPATH ENCODING
This section presents DeltaPath. Section 3.1 covers encoding in the presence of virtual function calls, while Section 3.2 considers encoding space pressure along with object-oriented programming and describes a systematic solution.

[Figure 2: Intuition of our algorithm. A node n has incoming edges from predecessors p_1, ..., p_m with addition values V[p_1], ..., V[p_m], where V[p_i] ≥ ICC[p_{i-1}] + V[p_{i-1}] for i = 2...m and ICC[n] ≥ ICC[p_m] + V[p_m].]

3.1 Encoding for Object-oriented Programs
With object-oriented programming a function call can be dispatched to multiple targets. Such polymorphism complicates encoding, as a call site may have conflicting addition values due to the multiple dispatch targets. For example, assume edges D'E and DF in Figure 1 are due to the same virtual function call site. According to the PCCE algorithm, the addition values for edges D'E and DF are 2 and 0, respectively.
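To make the conflict concrete, the sketch below recomputes the PCCE per-edge addition values for the Figure 1 graph and groups them by call site. It is an illustration only: the call-site labels are invented, and the label "Dv" encodes the assumption stated above that D'E and DF come from one virtual call site. It also shows what happens if the value 2 is forced onto that whole site: two different contexts ending at F receive the same ID.

```python
from collections import defaultdict

# Figure 1 edges as (caller, callee, call_site). Assumption for this sketch:
# D'E and DF share the label "Dv" because they stem from the same virtual
# call; all other labels are made up.
EDGES = [
    ("A", "B", "A1"), ("A", "C", "A2"),
    ("B", "D", "B1"), ("C", "D", "C1"),
    ("D", "E", "D1"), ("D", "E", "Dv"),
    ("D", "F", "Dv"), ("C", "F", "C2"),
    ("E", "G", "E1"), ("F", "G", "F1"), ("C", "G", "C3"),
]
TOPO = ["A", "B", "C", "D", "E", "F", "G"]


def pcce_values(edges, topo):
    """PCCE addition value per edge (one value per dispatch target)."""
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge[1]].append(edge)
    nc, add = {topo[0]: 1}, {}
    for node in topo[1:]:
        seen = 0
        for edge in incoming[node]:
            add[edge] = seen
            seen += nc[edge[0]]
        nc[node] = seen
    return add


add = pcce_values(EDGES, TOPO)

# Group the per-edge values by call site; a site with more than one value
# would need a switch on the dispatch result.
per_site = defaultdict(set)
for (caller, callee, site), value in add.items():
    per_site[(caller, site)].add(value)
conflicts = {cs: vals for cs, vals in per_site.items() if len(vals) > 1}
print(sorted(conflicts[("D", "Dv")]))  # [0, 2]

# Force the single value 2 onto the virtual call site and encode two contexts.
forced = {e: (2 if e[2] == "Dv" else v) for e, v in add.items()}
abdf = forced[("A", "B", "A1")] + forced[("B", "D", "B1")] + forced[("D", "F", "Dv")]
acf = forced[("A", "C", "A2")] + forced[("C", "F", "C2")]
print(abdf, acf)  # 2 2
```

Both contexts end at F with ID 2, so the pair (ID, ending node) no longer identifies a context.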
It is straightforward and tempting to choose the addition value based on the dynamic dispatch result using a switch statement, which, however, would significantly increase the code to be inserted and slow down the program execution, as virtual function call sites are massive in number and frequently invoked in OO programs. In order to minimize the encoding overhead, our goal is to calculate a single addition value for each call site. However, addition values generated by the PCCE algorithm do not work for the purpose. For example, if 2 is used as the single addition value in the case above, the calling contexts ABDF and ACF will both be encoded to 2, which violates the principle of a unique encoding for each different context. The computation of the addition values thus requires a new encoding algorithm.

[Figure 3: Intuition of encoding with dynamic dispatch. A virtual function call in p can be dispatched to multiple methods n_1, n_2, ..., n_k. The candidate addition value of a node n_i, denoted by CV[n_i], is initially 0, and it is updated each time the addition value of an edge leading to n_i is calculated. We process the call in two steps. First, the addition value a = max{CV[n_1], ..., CV[n_k]} is obtained; second, CV[n_i] is updated as ICC[p] + a, for i = 1, ..., k. The updated CV[n_i] is then used to calculate the addition value for the next incoming edge of n_i.]

The upper bound of the encoding space representing the calling contexts ending at a node is called its inflated calling-context count (ICC). That is, the calling contexts of a node n are encoded using the integers in [0, ICC[n]). The basic idea of our algorithm is to ensure the invariant that for any given node, its encoding space is divided into disjoint sub-ranges, with each sub-range encoding calling contexts along one incoming edge of the node. Figure 2 illustrates the intuition behind the idea. Assume a given node n has m incoming edges in total. The addition value along p_i n is denoted by V[p_i].
The invariant is kept if the following conditions are satisfied: (1) V[p_1] ≥ 0 and V[p_i] ≥ ICC[p_{i-1}] + V[p_{i-1}] for i = 2, ..., m, and (2) ICC[main] = 1 and ICC[n] ≥ ICC[p_m] + V[p_m] if n ≠ main. It can be proved using induction. The encoding ID is initialized as 0 at the main function, and ICC[main] = 1; the encoding ID is in [0, ICC[main]), thus the invariant is satisfied at main. Assume all the predecessors of n satisfy the invariant. The calling contexts ending at n are encoded into m disjoint sub-ranges: [V[p_1], ICC[p_1] + V[p_1]) encodes the contexts along edge p_1 n, and generally, [V[p_i], ICC[p_i] + V[p_i]), for i = 2, ..., m, encodes the contexts along p_i n, where V[p_i] ≥ ICC[p_{i-1}] + V[p_{i-1}], that is, V[p_i] ≥ the upper bound of the sub-range encoding the contexts along p_{i-1} n. Plus, according to condition (2), all the sub-ranges fall in the range [0, ICC[n]).

Figure 3 illustrates how ICCs and addition values are computed satisfying the conditions above. The virtual function call inside p can be dispatched to n_1, n_2, ..., n_k. Each node n_i is associated with a variable CV[n_i], denoting the candidate addition value that should be considered when computing the addition value for the next incoming edge of n_i. CV[n_i] is initially 0 and keeps updating along with the calculation of the addition values. In this case, after the addition value for the call site is assigned as a = max{CV[n_1], ..., CV[n_k]}, the candidate addition values are updated as CV[n_i] = ICC[p] + a. Later the updated CV[n_i] is used to calculate the addition value for the next incoming edge of n_i. By computing the addition value of a call site as the maximum value among the candidate addition values of the nodes that the call site can be dispatched to and updating the CVs using the chosen addition value, condition (1) described above is satisfied. After the addition value for the last incoming edge of a node n is calculated and CV[n] is updated for the last time, ICC[n] is assigned as CV[n], satisfying condition (2).

(Footnote: An addition value is associated with a call site, so more precisely it should be denoted as V[p_i n]; we use V[p_i] instead for the sake of conciseness.)

Algorithm 1 Encoding with dynamic dispatch
 1: function ENCODING(N, E)
 2:     ICC[main] ← 1
 3:     for n ∈ N do
 4:         CV[n] ← 0
 5:     for n ∈ N in topological order do
 6:         for e = ⟨p, n, l⟩ of the incoming edges of n do
 7:             cs = ⟨p, l⟩
 8:             if cs ∈ processedSites then
 9:                 continue
10:             processedSites ← processedSites ∪ {cs}
11:             V[cs] ← CalculateIncrement(cs)
12:         if n ≠ main then
13:             ICC[n] ← CV[n]
14: function CALCULATEINCREMENT(cs = ⟨p, l⟩)
15:     a ← 0
16:     for each e = ⟨p, n, l⟩ dispatched from cs do
17:         if CV[n] > a then
18:             a ← CV[n]
19:     for each e = ⟨p, n, l⟩ dispatched from cs do
20:         CV[n] ← ICC[p] + a
21:     return a

Algorithm 1 shows the encoding algorithm. The input of the algorithm is the call graph of the target program, CG = ⟨N, E⟩, where N is a set of nodes with each representing a function and E a set of directed edges. Each call edge e ∈ E is a triple ⟨n, m, l⟩, where n, m ∈ N are the caller and callee, respectively, and ⟨n, l⟩ is a call site potentially invoking m. In Java, for example, l is the bytecode index of the call site in n. A call edge in our algorithm is modeled as a triple instead of a caller and callee pair in order to distinguish multiple call sites in the caller that may invoke the same callee. In the encoding examples below, though, we omit l and simply use nm to denote an edge for the sake of simplicity.

In the beginning, ICC[main] is set to 1 (Line 2), and the CV of each node is initialized to 0 (Lines 3-4). Then the nodes are visited in a topological order; a node is visited after all its predecessors have been visited. For each non-main node n, ICC[n] is assigned (Line 13) after all its incoming edges are processed (Lines 6-11). For each incoming edge of the node being visited, the algorithm identifies its call site (Line 7), and determines whether it has already been processed by searching for the call site in processedSites, which stores all the call sites that have been processed; as multiple edges can be due to one call site, this ensures that the algorithm processes each call site only once.
Function CALCULATEINCREMENT calculates the addition value for the call site and then updates the CV of each of the nodes that the call site can be dispatched to. Due to the topological traversal order, when node n is processed, the ICC values of its predecessor nodes are already assigned, so the update of CV[n] can refer to ICC[p] without problems (Line 20).

[Figure 4: Example for encoding with dynamic dispatch. The figure shows the call graph over the nodes A to G and a table of IDs for the contexts AB, AC, ABD, ACD, ABDE, ACDE, ABD'E, ACD'E, ABDF, ACDF, ACF, ... The superscript on D (D') disambiguates two call sites in D both invoking E. D'E and DF are due to a virtual function call in D, and CF and CG are due to a virtual function call in C. Node annotations are ICC values.]

Consider the example in Figure 4. ICC[A] is set to 1 (Line 2), and all the CVs are initialized as 0 (Lines 3-4). Upon the visit of B, the addition value for AB is CV[B] = 0 (Line 18), then CV[B] = ICC[A] + 0 = 1 (Line 20) and ICC[B] = CV[B] = 1 (Line 13). Node C is processed similarly. Then the topological traversal reaches D. BD is first processed, so CV[D] = 0 is used as the addition value. Next, CV[D] is updated as 1 (= ICC[B] + 0), which is then used as the addition value for the next incoming edge CD, and CV[D] is updated as 2 (= ICC[C] + 1) (Line 20). As CD is the last incoming edge of D, ICC[D] is assigned as 2 (= CV[D]) (Line 13). The traversal then visits node E. After DE is processed, CV[E] = ICC[D] + 0 = 2 (Line 20). Next, the last incoming edge of E, D'E, is processed. Note that the two edges D'E and DF are due to the same call site, and CV[F] = 0. The addition value for this call site is hence calculated as max{CV[E], CV[F]} = 2 (Lines 16-18). Then CV[E] and CV[F] are updated as 4 (= ICC[D] + 2) (Lines 19-20). After that, ICC[E] is updated as CV[E] = 4 (Line 13). Other nodes are similarly processed following the algorithm. In the end, each call site in Figure 4 obtains a single addition value, while each calling context can be encoded uniquely.
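The walk-through above can be checked mechanically. The following Python sketch is one possible transcription of Algorithm 1, not the authors' implementation: the function names, the call-site labels ("Dv" for the virtual call in D, "Cv" for the one in C), and the helper `context_ids` that enumerates every context are assumptions made for illustration. The dictionary `value` plays the role of both V and processedSites.

```python
from collections import defaultdict

# Call graph of the worked example as (caller, callee, call_site) triples.
# Site labels are invented: "Dv" is the virtual call site in D (edges D'E
# and DF) and "Cv" the virtual call site in C (edges CF and CG).
EDGES = [
    ("A", "B", "A1"), ("A", "C", "A2"),
    ("B", "D", "B1"), ("C", "D", "C1"),
    ("D", "E", "D1"), ("D", "E", "Dv"), ("D", "F", "Dv"),
    ("C", "F", "Cv"), ("C", "G", "Cv"),
    ("E", "G", "E1"), ("F", "G", "F1"),
]
TOPO = ["A", "B", "C", "D", "E", "F", "G"]  # topological order, main first


def encode_with_dispatch(edges, topo):
    """Return (ICC per node, one addition value per call site)."""
    incoming = defaultdict(list)   # callee -> incoming edges, in input order
    targets = defaultdict(list)    # call site -> possible dispatch targets
    for p, n, l in edges:
        incoming[n].append((p, n, l))
        targets[(p, l)].append(n)
    icc = {topo[0]: 1}             # ICC[main] = 1
    cv = {n: 0 for n in topo}      # candidate addition values
    value = {}                     # V[cs]; its keys are the processed sites
    for n in topo:
        for p, _, l in incoming[n]:
            cs = (p, l)
            if cs in value:        # each call site is processed only once
                continue
            a = max(cv[t] for t in targets[cs])
            for t in targets[cs]:
                cv[t] = icc[p] + a
            value[cs] = a
        if n != topo[0]:
            icc[n] = cv[n]
    return icc, value


def context_ids(edges, value, root):
    """Enumerate all contexts and collect the ID obtained at each ending node."""
    outgoing = defaultdict(list)
    for p, n, l in edges:
        outgoing[p].append((n, l))
    ids, stack = defaultdict(list), [(root, 0)]
    while stack:
        node, ident = stack.pop()
        ids[node].append(ident)
        for n, l in outgoing[node]:
            stack.append((n, ident + value[(node, l)]))
    return ids


icc, value = encode_with_dispatch(EDGES, TOPO)
ids = context_ids(EDGES, value, "A")
print(icc["F"], icc["G"], value[("D", "Dv")])  # 5 14 2
print(sorted(ids["G"]))                        # [4, 5, 6, 7, 8, 11, 12, 13]
```

The eight contexts ending at G receive eight distinct IDs inside [0, ICC[G]), although only eight of the fourteen values are used; that slack is the price of a single addition value per virtual call site.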
In order to accommodate virtual function calls, the algorithm inflates the number of calling contexts (NC) of a node and uses it as the upper bound of its encoding space, which is thus termed the inflated calling-context count (ICC). For example, NC[F] = 3, while ICC[F] = 5; the gap between the two enables a uniform addition value for the virtual call site leading to edges D'E and DF. It is interesting to note that when there is no virtual function in a program, ICC[n] = NC[n] for any given node n, which is the case in the PCCE encoding for procedural programs. The decoding remains the same (Section 2), so it is omitted.

3.2 Encoding for Large-scale Object-oriented Programs
In both PCCE and Algorithm 1, the addition value at a call site reflects the number of calling contexts ending at the call site, while the number of calling contexts grows exponentially with the size of a call graph. Thus the computation of addition values during static analysis can incur integer overflows. Moreover, runtime integer overflows may also occur when computing the encoding ID by summing up addition values.

The runtime integer overflows can be easily resolved despite some cost: before each addition operation, the encoding checks whether an integer overflow can occur; if so, the current ID value is pushed onto a stack, and the ID is reset to 0 before the encoding continues. The encoding result is then represented as a stack of IDs along with the current ID value.

The integer overflow problem during static analysis is more challenging. An implementation using a big integer class, for example, the BigInteger class that the Java library provides for representing huge integers, is straightforward but would be very inefficient, as addition values are represented as objects and at runtime each addition operation becomes a function call. Our goal is to completely avoid the runtime integer overflow and resolve the integer overflow during static analysis without using big integer classes. While PCCE achieves this goal by pruning edges during static analysis to ensure that the resultant call graph can be encoded by a single integer, the technique is not scalable for encoding large-size programs, as massive numbers of edges in the deep portion of the call graph would be pruned and the pruned edges are handled at a relatively high runtime cost, the same way a runtime integer overflow is processed as described above.

We aim at a scalable and efficient solution to the integer overflow problem. The observation is that so far all calling contexts begin at the main function, such that the encoding space keeps growing when encoding those calling contexts ending at the deep portion of a call graph. So instead of encoding a calling context as a whole, we divide it into pieces, each of which can be encoded using an integer without overflows; it implies that each addition value can be represented by an integer, since the encoding value of each piece is the sum of the addition values within the piece.

We thus design an advanced version of the encoding algorithm, which chooses a set of anchor nodes dividing all long calling contexts in a program into shorter pieces. Each piece begins at an anchor node and is encoded relative to it. During runtime the encoding maintains a stack. When the program invokes an anchor node p, the current encoding ID value along with the anchor node's identifier is pushed onto the stack, then the ID is reset to 0 and the encoding continues. When the invocation of p returns, the ID is recovered with the popped ID value. In this way the previously global encoding space pressure is distributed along anchor nodes, and each calling context piece can be encoded separately and locally. The anchor nodes virtually act as barriers that keep the encoding space pressure from flowing downstream along the call graph.

There are two challenges to be resolved when designing the algorithm.
First, it needs to find the anchor nodes. Second, given a call site, there exist multiple calling context pieces all reaching the call site while starting from different anchor nodes; hence, multiple addition values for the call site may be obtained due to the encoding relative to different anchor nodes.

We revise the static analysis in Algorithm 1 to resolve the two challenges, as shown in Algorithm 2. It automatically picks anchor nodes. Initially only the main node is in the anchor node set AN (Line 2). Whenever an integer overflow occurs while processing an edge ⟨p, n, l⟩, p is added into the set of anchor nodes AN (Line 15) and the static analysis is rerun (Line 16).

Algorithm 2 Encoding resolving encoding space explosion
 1: function ENCODING(N, E)
 2:     AN ← {main}
 3:     again: IdentifyTerritories(N, E, AN)
 4:     for n ∈ N do
 5:         for r ∈ nanchors[n] do
 6:             CV[n][r] ← 0
 7:     for n ∈ N in topological order do
 8:         for e = ⟨p, n, l⟩ of the incoming edges of n do
 9:             cs = ⟨p, l⟩
10:             if cs ∈ processedSites then
11:                 continue
12:             processedSites ← processedSites ∪ {cs}
13:             V[cs] ← CalculateIncrement(cs)
14:             if V[cs] = -1 then // Overflow detected.
15:                 AN ← AN ∪ {p}
16:                 goto again // Restart the encoding.
17:         if n ∉ AN then
18:             for r ∈ nanchors[n] do
19:                 ICC[n][r] ← CV[n][r]
20:         else
21:             ICC[n][n] ← 1
22: function IDENTIFYTERRITORIES(N, E, AN)
23:     for r ∈ AN do
24:         visitedN, visitedE ← BoundedDFS(r)
25:         for n ∈ visitedN do
26:             nanchors[n] ← nanchors[n] ∪ {r}
27:         for e ∈ visitedE do
28:             eanchors[e] ← eanchors[e] ∪ {r}
29: function CALCULATEINCREMENT(cs = ⟨p, l⟩)
30:     a ← 0
31:     for each e = ⟨p, n, l⟩ dispatched from cs do
32:         Assume eanchors[e] = {r_1, ..., r_k}
33:         a' ← max{CV[n][r_1], ..., CV[n][r_k]}
34:         if a' > a then
35:             a ← a'
36:     for each e = ⟨p, n, l⟩ dispatched from cs do
37:         for r ∈ eanchors[e] do
38:             CV[n][r] ← ICC[p][r] + a
39:             if CV[n][r] incurs an integer overflow then
40:                 return -1
41:     return a

In order to cut calling contexts correctly, function IDENTIFYTERRITORIES first walks the territory of each anchor node. Specifically, for each anchor it finds the nodes and edges that can be reached through a bounded depth-first search, which starts the traversal from the anchor node and retreats at other anchor nodes; those anchor nodes form the boundary of the territory. From the nodes and edges in each territory we derive nanchors[n] and eanchors[e], which represent the anchor nodes that can reach node n and edge e, respectively.

The territories of anchor nodes overlap, which explains the reason behind the second challenge. In order to resolve it, the candidate addition values (CV) and the inflated calling-context counts (ICC) of a node are extended to two-dimensional arrays with the first index still representing the node and the second an anchor node, taking into account the multiple anchor nodes that can reach the node. For example, CV[n][r] denotes the candidate addition value that should be considered when processing the next incoming edge e of n if and only if r ∈ eanchors[e].

The addition value for a given call site is determined by the maximum of the candidate addition values of all its possible dispatch target nodes to address the dynamic dispatch (Lines 31-35). All the anchor nodes that can reach each of the possible dispatch edges are taken into account, so that the territory conflict is resolved. After a node's incoming edges are processed (Lines 8-16), the ICC values are updated. For a non-anchor node n, the ICC relative to each anchor in nanchors[n] is updated (Lines 18-19), while the ICC of each anchor node is set to 1 (Line 21), which is equivalent to ICC[main] in Algorithm 1.

The encoding algorithm is correct because the invariant described in Section 3.1 is satisfied. Specifically, the encoded ID obtained at n is in the range [0, ICC[n][top]), where top is the top anchor node saved on the stack. It is notable that runtime overflow checks are not needed, as the algorithm ensures that there is no integer overflow for ICC values (Line 39), which are the upper bounds of the encoding spaces.
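The anchor-based analysis can be sketched along the same lines. The Python code below is an illustrative reading of Algorithm 2 under several assumptions that are not from the paper: integer overflow is simulated by a configurable bound `max_id`, the restart is a loop instead of a goto, a flag replaces the -1 return value, anchors may be preselected through the hypothetical `preset` parameter, and the boundary anchors of a territory are counted among its visited nodes.

```python
from collections import defaultdict

# Same example graph as (caller, callee, call_site); "Dv" and "Cv" are
# invented labels for the virtual call sites in D and C.
EDGES = [
    ("A", "B", "A1"), ("A", "C", "A2"),
    ("B", "D", "B1"), ("C", "D", "C1"),
    ("D", "E", "D1"), ("D", "E", "Dv"), ("D", "F", "Dv"),
    ("C", "F", "Cv"), ("C", "G", "Cv"),
    ("E", "G", "E1"), ("F", "G", "F1"),
]
TOPO = ["A", "B", "C", "D", "E", "F", "G"]


def bounded_dfs(root, outgoing, anchors):
    """Territory of root: walk from root, retreating at other anchor nodes."""
    nodes, edges, stack = {root}, set(), [root]
    while stack:
        for e in outgoing[stack.pop()]:
            edges.add(e)
            if e[1] not in nodes:
                nodes.add(e[1])
                if e[1] not in anchors:
                    stack.append(e[1])
    return nodes, edges


def encode_with_anchors(edges, topo, max_id, preset=()):
    """Return (anchor set, ICC[(node, anchor)], addition value per call site)."""
    anchors = {topo[0], *preset}
    incoming, outgoing, dispatched = (defaultdict(list) for _ in range(3))
    for e in edges:
        incoming[e[1]].append(e)
        outgoing[e[0]].append(e)
        dispatched[(e[0], e[2])].append(e)
    while True:                                    # the "again" label
        nanchors, eanchors = defaultdict(set), defaultdict(set)
        for r in anchors:
            nodes, tedges = bounded_dfs(r, outgoing, anchors)
            for n in nodes:
                nanchors[n].add(r)
            for e in tedges:
                eanchors[e].add(r)
        cv = {(n, r): 0 for n in topo for r in nanchors[n]}
        icc, value, overflow_at = {}, {}, None
        for n in topo:
            for p, _, l in incoming[n]:
                cs = (p, l)
                if cs in value:
                    continue
                # (target node, anchor) pairs affected by this call site
                pairs = [(e[1], r) for e in dispatched[cs] for r in eanchors[e]]
                a = max((cv[t] for t in pairs), default=0)
                for t in pairs:
                    cv[t] = icc[(p, t[1])] + a
                    if cv[t] > max_id:             # simulated integer overflow
                        overflow_at = p
                value[cs] = a
                if overflow_at is not None:
                    break
            if overflow_at is not None:
                break
            if n in anchors:
                icc[(n, n)] = 1
            else:
                for r in nanchors[n]:
                    icc[(n, r)] = cv[(n, r)]
        if overflow_at is None:
            return anchors, icc, value
        if overflow_at in anchors:
            raise OverflowError("max_id is too small for this sketch")
        anchors.add(overflow_at)                   # new anchor, then restart


# With C and D preselected as anchors (A is main):
_, icc, value = encode_with_anchors(EDGES, TOPO, 2**31 - 1, preset={"C", "D"})
print(icc[("G", "C")], icc[("G", "D")], value[("F", "F1")])  # 3 4 2

# With a deliberately tiny bound, anchors are picked automatically:
auto_anchors, _, _ = encode_with_anchors(EDGES, TOPO, max_id=5)
print(sorted(auto_anchors))                                  # ['A', 'E', 'F']
```

In the second run the bound of 5 is exceeded first while processing EG and then while processing FG, so E and F become anchors in two successive restarts.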
Figure 5 shows an example of encoding involving two anchor nodes C and D. Consider the encoding of the calling context CFG. Edge CF is first processed. Edges CF and CG are due to the same virtual call site. As CV[F][C] and CV[G][C] are initially both 0 (Lines 4-6), the addition value for the call site is thus calculated as max{CV[F][C], CV[G][C]} = 0 (Line 33). Then CV[F][C] and CV[G][C] are updated as ICC[C][C] + a = 1 (Line 38). The addition value for the virtual function call in D is calculated similarly. After DE, D'E and DF are processed, now focus on node G. CG is already processed and the related call site is included in processedSites (Line 10) due to the dynamic dispatch at C. EG is processed next with addition value 0. CV[G][D] = ICC[E][D] + 0 = 2 (Line 38), and CV[G][C] is still 1, so max{CV[G][D], CV[G][C]} = 2 is used as the addition value for FG (Line 33).

[Figure 5: Example for encoding in large programs. The figure shows the call graph with node annotations ICC[C][C] = 1, ICC[D][D] = 1, ICC[E][D] = 2, ICC[F][C] = 1, ICC[F][D] = 2, ICC[G][C] = 3, ICC[G][D] = 4, and a table of stack contents and IDs for the contexts DE, D'E, DF, CF, DEG, D'EG, DFG, CFG, CG. C and D are anchor nodes. The superscript on D (D') disambiguates two call sites in D both invoking E. D'E and DF are due to a virtual function call in D, and CF and CG are due to a virtual function call in C. The annotation of a node denotes the ICC values of the node relative to anchor nodes; for example, ICC[E][D] = 2 means the ICC of E relative to anchor D is 2. A stack is used to store the anchor node identifiers and the ID values when invoking them. The encoding of a call path CFG, for example, is represented by c on the stack, where c contains anchor C's identifier and the encoding ID value when C is invoked, along with the encoding ID value, which is the sum of the addition values of CF and FG.]

For decoding, we first recover the deepest piece of the calling context according to the current ID and the anchor node on the stack top. Then we pop the anchor node and the ID to continue decoding the next piece of the calling context. The process repeats until the stack is empty. Given the ID 2 obtained at G and the anchor node C on the stack top, we recover a call path CFG.
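The runtime side of this example, together with the piecewise decoding, might look as follows. This Python sketch rests on assumptions: node A stands for main, only the edges needed for the context ACFG are modeled, the addition values are the ones derived in the worked example (0 for the virtual call site in C, 2 for FG), and the class and table names (`Tracker`, `ADD`, `TERRITORY_IN`) are invented for illustration.

```python
# Values from the worked example; A stands for main (an assumption of this
# sketch). Only the edges needed for the context ACFG are listed.
ANCHORS = {"A", "C", "D"}
ADD = {("A", "C"): 0, ("C", "F"): 0, ("C", "G"): 0, ("F", "G"): 2}
# (node, anchor) -> incoming edges of node inside the anchor's territory
TERRITORY_IN = {
    ("G", "C"): [("C", "G"), ("F", "G")],
    ("F", "C"): [("C", "F")],
    ("C", "A"): [("A", "C")],
}


class Tracker:
    """Runtime state: the current ID plus a stack of (anchor, saved ID)."""

    def __init__(self):
        self.id = 0
        self.stack = []

    def call(self, caller, callee):
        self.id += ADD[(caller, callee)]   # ID += c before the invocation
        if callee in ANCHORS:              # entering an anchor: push, reset
            self.stack.append((callee, self.id))
            self.id = 0

    def ret(self, caller, callee):
        if callee in ANCHORS:              # leaving an anchor: restore the ID
            _, self.id = self.stack.pop()
        self.id -= ADD[(caller, callee)]   # ID -= c after the invocation


def decode(node, ident, stack, main="A"):
    """Recover the node sequence piece by piece, deepest piece first."""
    stack, path = list(stack), [node]
    while node != main:
        if stack and node == stack[-1][0] and ident == 0:
            _, ident = stack.pop()         # piece finished: resume saved ID
        anchor = stack[-1][0] if stack else main
        edge = max((e for e in TERRITORY_IN[(node, anchor)] if ADD[e] <= ident),
                   key=lambda e: ADD[e])
        ident -= ADD[edge]
        node = edge[0]
        path.append(node)
    return path[::-1]


t = Tracker()
for caller, callee in [("A", "C"), ("C", "F"), ("F", "G")]:
    t.call(caller, callee)
print(t.id, t.stack)                       # 2 [('C', 0)]
print(decode("G", t.id, t.stack))          # ['A', 'C', 'F', 'G']
```

Because the ID restarts at 0 at every anchor, each piece is decoded with the same greedy walk as before, restricted to the edges in the territory of the anchor on the stack top.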
Anchor node C and the saved ID are then popped from the stack to continue decoding the next piece of the calling context.

4. PRACTICAL ISSUES
4.1 Dynamic Class Loading
So far we have assumed a complete call graph for static analysis. However, in reality the dynamic characteristics of a program can render this assumption invalid. Dynamic class loading, which is common in Java for example, makes the generation of a complete call graph ahead of runtime infeasible. As a result, relative to a call graph generated for static analysis, dynamically loaded classes introduce unexpected call paths (UCPs). Figure 6 shows such an example, where node X and its incoming and outgoing edges are missing during the static analysis stage due to dynamic class loading. The context ABXE contains a UCP B → X → E; its encoding ID value is 0. However, if we decode it, an incorrect calling context ACE is obtained. This kind of UCP is hazardous, as it leads to incorrect encoding and decoding. Consider another context ABXD, which contains a UCP B → X → D, and BD and BX are due to the same virtual call site. Its encoding ID can be decoded to ABD. Although the decoded result does not contain the dynamically loaded node X, it includes all the other nodes in the right order. We thus consider the encoding correct and this kind of UCP benign.

[Figure 6: Incomplete call graph over the nodes A, B, C, D, E and X, with X marked as a missing node and its incident edges as missing edges. BD and BX are due to the same virtual function call, where X is from a dynamically loaded class unexpected by static analysis.]

In order to keep the encoding correct in the presence of dynamic class loading, we need to detect hazardous UCPs and respond accordingly. Inspired by the control flow integrity (CFI) technique [7], we propose a call path tracking technique that checks call transfers, and apply it to detect hazardous UCPs. The technique consists of static analysis and runtime enforcement. In the beginning of the static analysis each node in the call graph is in a separate set. The analysis then traverses the call graph.
For each call site, it finds the dispatch target nodes and merges the sets that contain those nodes. In the end, each of the remaining sets is assigned a unique set identifier (SID), and nodes in the same set share the SID. At runtime, before a call is issued, the expected callee node's SID is saved along with the call site and the current encoding ID value. At the entry point of each statically loaded function, the expected SID is compared against the function's SID. If they are not equal, a hazardous UCP is detected. Consider the UCP B → X → E for example. E is able to detect it as hazardous since the expected SID set by B is not equal to E's SID.

Once a hazardous UCP is detected, the encoding responds as follows. First, the expected SID, the call site, the encoding ID, and the current function's identifier (E, in this case) are pushed onto the stack. The encoding ID is reset to zero, and the encoding continues. At the exit point of E, the pushed information is popped to balance the stack and recover the encoding variables. It is easy to see that the technique allows benign UCPs; for example, in the case of B → X → D, the expected SID set by B is equal to D's SID.

During decoding, whenever the ID is 0 and the current function's identifier is equal to the one on the stack top, the decoder pops the information from the stack to continue the decoding. Recall that the information pushed on the stack can be due to an invocation of an anchor node, a hazardous UCP or a recursion. An extra integer can be used to indicate the information type in each stack element.

Alternative solutions exist. For example, we can maintain a variable representing the depth of invocations of dynamically loaded functions by incrementing and decrementing the variable at each dynamically loaded function's entry and exit points, respectively, such that each statically loaded function detects a UCP if the depth is not zero. Resetting and recovering the depth variable using a stack is required when statically and dynamically loaded function calls interleave.
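The SID assignment and the entry-point check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the call graph, the call-site labels such as "B.1", and the names `DisjointSets`, `assign_sids` and `CallPathTracker` are all hypothetical, and the sketch models only the hazardous-UCP push and pop, not anchor nodes or recursion.

```python
class DisjointSets:
    """Minimal union-find over call graph nodes."""

    def __init__(self, nodes):
        self.parent = {n: n for n in nodes}

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def union(self, a, b):
        self.parent[self.find(b)] = self.find(a)


def assign_sids(nodes, call_sites):
    """Static step: merge the dispatch targets of each call site, then number the sets."""
    sets = DisjointSets(nodes)
    for targets in call_sites.values():
        for other in targets[1:]:
            sets.union(targets[0], other)
    root_to_sid = {}
    return {n: root_to_sid.setdefault(sets.find(n), len(root_to_sid)) for n in nodes}


class CallPathTracker:
    """Runtime step: compare the expected SID with the SID of the entered function."""

    def __init__(self, sids):
        self.sids = sids
        self.encoding_id = 0
        self.pending = None  # (expected SID, call site, encoding ID) saved before a call
        self.stack = []

    def before_call(self, call_site, static_target):
        self.pending = (self.sids[static_target], call_site, self.encoding_id)

    def on_entry(self, func):
        """Return True if a hazardous UCP is detected at the entry of func."""
        expected_sid, call_site, saved_id = self.pending
        if expected_sid == self.sids[func]:
            return False  # expected or benign transfer
        self.stack.append((expected_sid, call_site, saved_id, func))
        self.encoding_id = 0  # restart the encoding from this function
        return True

    def on_exit_after_hazard(self):
        """Pop the pushed record and restore the encoding ID."""
        _, _, saved_id, _ = self.stack.pop()
        self.encoding_id = saved_id


# Hypothetical statically known call graph; X is loaded dynamically and is absent.
NODES = ["A", "B", "C", "D", "E", "F"]
CALL_SITES = {"A.1": ["B"], "A.2": ["C"], "B.1": ["D"], "C.1": ["E", "F"]}
SIDS = assign_sids(NODES, CALL_SITES)
```

In this sketch E and F share an SID because one (hypothetical) virtual call site in C can dispatch to either. If B announces a call to D and control arrives at E through the uninstrumented X, `on_entry("E")` reports a hazardous UCP, while arriving at D through X passes the check and is treated as benign.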
The main engineering advantage of the call path tracking solution is that instrumentation of dynamically loaded classes is completely avoided, while in practice it is sometimes difficult or infeasible to instrument them. For example, if they are loaded by custom class loaders, the instrumentation component usually needs to modify the custom loaders; and the security policy of some third-party components does not allow instrumentation.

Regarding the information type stored in each stack element, our implementation borrows two bits from the method identifier integer to identify the type of a push, so it does not need an extra integer.

Figure 7: JDK library classes are excluded from encoding.

4.2 Flexible Encoding

It is desirable to perform selective encoding for the components of interest in order to reduce encoding overhead. For example, the JVM and library methods are usually of less interest compared to application ones, as the JVM implementation and Java libraries are often considered black boxes [3]. When recovering calling contexts, users may want to obtain all application methods, while JVM and library methods can be ignored. PCC, for example, redefines a calling context as consisting of application functions only and hence encodes application functions solely [4, 3]. Leveraging the call path tracking technique, we can skip encoding components of no interest in the same way we handle dynamically loaded classes, to achieve a more efficient and flexible encoding. As illustrated in Figure 7, JDK methods and the associated edges (denoted by dashed circles and lines) are of no interest and are excluded from the call graph when running the algorithm, and the encoding is performed only on application methods. During runtime we rely on call path tracking to detect UCPs and encode correctly. Consider the calling context ABDFG for example. Only edge AB is encoded, while edges BD, DF and FG are skipped and hence incur no overhead. G detects the hazardous UCP at its entry point, so it responds as discussed in Section 4.1 and continues the encoding. Finally, ABG, which consists of application methods only, can be recovered from the encoding result.
This illustrates another advantage of the call path tracking solution over the depth tracking one: no encoding or UCP detection code is executed inside the excluded components, so the more components are excluded from encoding, the less overhead the encoding incurs.

5. IMPLEMENTATION

The implementation of DeltaPath consists of static analysis and a runtime component. The input of the static analysis is the bytecode of the target program. In addition, the user can optionally specify classes of interest for selective encoding. We use WALA [5] to generate the call graph based on a context-insensitive control flow analysis, 0-CFA [4]. The algorithm is then run to compute addition values. The runtime component is implemented as a Java agent []. During runtime it hooks the loading of each class and instruments the call sites based on the static analysis results using Javassist [7]. Our implementation does not require the source code. It is not dependent on a specific Java Virtual Machine (JVM); it can work with any JVM compatible with JDK 5.0.

6. EVALUATION

In this section we present the effectiveness and efficiency of our encoding technique. We use the SPECjvm2008 benchmark suite [4], which contains a variety of Java programs including compilers (compiler.*), cryptography applications (crypto.*), scientific computation (scimark.*), and XML applications (xml.*). All experiments were performed on a machine with an Intel Core i7 CPU and 8GB RAM. We use Ubuntu.4 and Sun JDK.6..4 on this machine.

6.1 Static Program Characteristics

In SPECjvm a common dispatcher is used in all benchmarks to configure the workload and collect results. It is excluded from the encoding so that the statistics properly represent the characteristics of individual benchmark programs. The static analysis is performed in two encoding settings. One encodes functions of both JDK and application classes (encoding-all) and the other encodes application functions only (encoding-application).
The encoding-application setting corresponds to the scenario in which the user is only interested in application functions in a calling context. Based on the call path tracking technique, DeltaPath provides the flexibility of encoding only the components of interest.

Table 1 presents the static characteristics of the benchmark programs with the two settings: encoding-all and encoding-application. For each program, the table details the program size in bytes (size) and, for each encoding setting, the number of nodes (nodes) and edges (edges) in the call graph, the number of call sites to be instrumented (CS), the number of virtual function call sites (VCS), and the static maximum encoding ID value (max. ID), which represents the encoding space needed. With the encoding-all setting, the majority of the benchmarks (3 out of 5) need an encoding space larger than a million. Two benchmarks (sunflow and xml.validation) in particular require huge encoding spaces, with numbers shown in bold, such that even a 64-bit integer is insufficient for encoding. The algorithm resolves the integer overflow problem automatically by adding 6 and 7 anchor nodes for sunflow and xml.validation, respectively. With the encoding-application setting, all programs need much smaller encoding spaces. The xml.transform benchmark needs a 64-bit integer for encoding, while the encoding space of each of the other benchmarks fits into a 32-bit integer. In addition, we observe that with the encoding-application setting the call graph and the number of instrumented call sites of each program are much smaller than with the encoding-all setting, which implies an efficiency benefit due to the encoding flexibility.

6.2 Performance Comparison

We compare our work with the state-of-the-art encoding technique, Probabilistic Calling Context (PCC) encoding [4], which is a purely runtime mechanism representing each calling context as an integer hash value. Similar to DeltaPath, PCC works with object-oriented programs.
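To make the idea of representing a calling context as an integer hash value concrete, the following is a minimal sketch of a hash-based context encoding. The multiply-and-add update, the function name `hash_context`, and the small call-site identifiers are assumptions of this sketch; the text above only states that PCC computes an integer hash at runtime. The example deliberately picks identifiers that collide, to show why such a value cannot be decoded reliably.

```python
def hash_context(call_site_ids, bits=32):
    """Fold the call-site identifiers on the current call path into one integer.

    Assumed update rule: value = 3 * value + call_site_id, truncated to `bits` bits.
    """
    mask = (1 << bits) - 1
    value = 0
    for cs in call_site_ids:
        value = (3 * value + cs) & mask
    return value


# Two different (hypothetical) call paths, given as sequences of call-site identifiers.
context_a = [1, 0]
context_b = [0, 3]

code_a = hash_context(context_a)
code_b = hash_context(context_b)
```

Here `code_a` and `code_b` are equal although the two paths differ, so the integer alone does not identify the calling context. With well-distributed call-site identifiers such collisions would be much rarer than in this contrived case, but they cannot be ruled out.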
Table 1: Static program characteristics (SPECjvm2008). The maximum encoding ID values of sunflow and xml.validation, shown in bold, are larger than the maximum value of a 64-bit integer (around 1.8e19), so the two benchmarks need anchor nodes.

program | size (bytes) | encoding-all: nodes, edges, CS, VCS, max. ID | encoding-application: nodes, edges, CS, VCS, max. ID
compiler.compiler 4K 38 739 73 839 7.8e7 77 93 3
compiler.sunflow 85K 846 485 55 49 9.6e7 7 83 4 43
compress 59K 98 675 339 394 4e5 98 65 93 57 3
crypto.aes 33K 656 8 8369 3487.5e9 99 69 9 4 5
crypto.rsa 33K 656 84 8386 35 3.6e8 99 76 96 4 6
crypto.signverify 35K 694 89 8548 3576.5e9 96 68 8 47 37
mpegaudio 6K 33 9734 9579 46 3.3e4 5 84 497 37 3
scimark.fft.large 57K 79 636 33 347 4e5 78 37 4 9 5
scimark.lu.large 57K 73 66 334 33.e6 76 34 4 4
scimark.monte_carlo 56K 6 59 36 3.4e6 6 4 4
scimark.sor.large 57K 69 64 333 339.4e6 73 8 3 4
scimark.sparse.large 57K 65 65 39 33.e6 69 6 3 9 4
sunflow 458K 777 5485 735 3348 4.4e 69 93 995 485.e6
xml.transform 75K 9766 38 4466 4969.e7 98 4389 635 6.e
xml.validation 478K 673 39 8333 5493 4.6e9 75 97 38 7

Figure 8: Execution speeds applying PCC, DeltaPath without call path tracking (wo/cpt), and DeltaPath with call path tracking (w/cpt), respectively. The speed means the throughput (operations per minute) that reflects the rate at which the system was able to complete invocations of the workload of that benchmark, and it is normalized against the native runtime.

The original PCC was implemented as a module inside Jikes RVM, which is a research JVM allowing internal optimization options. For example, Jikes RVM inlining information enables PCC to optimize the instrumentation by combining multiple encoding computations within inlined functions into a single one. Its performance overhead is very low on Jikes RVM, 3% on average and 9% at most. In order to have a head-to-head comparison on our platform, we implemented PCC as a Java agent as well. Therefore, such low-level optimizations are not performed. Note that this is not a limitation of our algorithm but rather due to the implementation choice. If our scheme is implemented in Jikes RVM as the original PCC was, both techniques can equally benefit from such low-level optimizations.
In fact, the composition of our encoding computations (simply adding up multiple additions into one) is as easy as in PCC, and the call path tracking for inlined functions can also be done at compile time.

The main focus of our experiments is to find out whether our technique is comparable with PCC in terms of efficiency and effectiveness using the same implementation technology. As the original PCC work encodes application functions only, we adopt the encoding-application setting for DeltaPath to instrument the same set of functions. Figure 8 illustrates the normalized execution speed when our encoding technique and PCC are applied, respectively. We use the geometric mean to calculate the average slowdown. Overall, DeltaPath (without call path tracking) incurs 3.5% slowdown on average, with call path tracking introducing an extra 6.79% slowdown, while PCC incurs.5% higher overhead than DeltaPath (without call path tracking). For the majority of the benchmarks ( out of 5 benchmarks), DeltaPath with call path tracking incurs less than 6.5% overhead. In addition, both incur high overhead in a few benchmarks (compress, mpegaudio, scimark.monte_carlo and sunflow), as they contain a few small hot functions; the overhead can be largely reduced if the optimization of combining instrumentations is performed for inlined functions. The results show that DeltaPath is comparable with PCC in all benchmarks, which reflects the similarity of their runtime implementation: both instrument the same set of call sites with simple arithmetic operations. At a low cost, DeltaPath provides a reliable and precise decoding capability, which is the most critical difference between DeltaPath and PCC. In contrast, Breadcrumbs [3], which is built on PCC with the purpose of providing decoding capability, either incurs almost % overhead for a very accurate decoding, or sacrifices reliability and accuracy significantly even at a moderate cost (extra % overhead over PCC).
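As a small worked illustration of averaging with the geometric mean, the sketch below aggregates normalized execution speeds (instrumented throughput divided by native throughput). The speed values are hypothetical, and defining the average slowdown as one minus the geometric mean of the normalized speeds is an assumption of this sketch rather than a formula stated in the text.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def average_slowdown(normalized_speeds):
    """Assumed definition: 1 minus the geometric mean of the normalized speeds."""
    return 1.0 - geometric_mean(normalized_speeds)


# Hypothetical normalized speeds for two benchmarks (1.0 means native speed).
speeds = [0.25, 1.0]
mean_speed = geometric_mean(speeds)
slowdown = average_slowdown(speeds)
```

For these two hypothetical values the geometric mean is 0.5, whereas the arithmetic mean would be 0.625. The geometric mean is the usual choice for ratios such as normalized throughput because it treats a halving and a doubling symmetrically.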
6.3 Dynamic Program Characteristics

In order to evaluate the effectiveness of DeltaPath and PCC, we collect the encoded calling contexts at the entry of the instrumented application functions. Table 2 presents the statistics. It details the total number of calling contexts collected (total contexts), the maximum/average number of functions in a calling context (max. depth and avg. depth), and the number of unique calling context encodings (unique contexts) collected by both techniques. For DeltaPath we present additional information about the encoding stack: the maximum/average depth of the stack (max. depth and avg. depth), the maximum/average number of hazardous UCPs detected per calling context (max. UCP and avg. UCP), and the maximum dynamic encoding ID (max. ID).

Table 2: Dynamic program characteristics (SPECjvm2008).

program | collected calling contexts: total contexts, max. depth, avg. depth | PCC: unique contexts | DeltaPath: unique contexts, max. depth, avg. depth, max. UCP, avg. UCP, max. ID
compiler.compiler 9634 5 5. 4 65.3 3.8 4
compiler.sunflow 6375 5.4 56 85 8.3.6 4
compress 34364985. 3 4.. 3
crypto.aes 443 9 5.6 94 7.6. 5
crypto.rsa 53865 9 6. 56 79.. 9
crypto.signverify 5468 9 6. 8 4.. 3
mpegaudio 4897943 7 3.4 389 47 3.. 66
scimark.fft.large 5663736. 65 3.. 4
scimark.lu.large 8883839. 53 54..
scimark.monte_carlo 5336776. 34 35..
scimark.sor.large 9363875. 48 67 3..
scimark.sparse.large 549. 46 47..
sunflow 84779 39.8 966 45 6 4.4 3. 847
xml.transform 933346 55 5.5 44 4556 5 3. 3. 664
xml.validation 977 9. 7 4.. 5

DeltaPath represents each calling context using a stack. Ideally, the stack only contains one element recording the entry node; when the calling context contains recursions, anchor nodes or hazardous UCPs, the depth increases. The maximum/average number of hazardous UCPs is not high, showing that hazardous UCPs are detected in all benchmarks although they are infrequent. The average stack depth is 4.4 varying among the 5 benchmarks, compared to the original calling context depth (5..8). PCC uses only one integer to represent a calling context, which is more concise than DeltaPath. However, PCC collects fewer unique calling context encodings due to hash collisions, implying that some calling contexts obtain identical encoding results. Therefore, for applications that do not need decoding and allow occasional encoding collisions, such as test coverage and profiling, PCC is a better choice; however, for applications that require precise encoding or decoding, DeltaPath is generally superior to PCC and Breadcrumbs.

7. RELATED WORK

Stack Walking. Stack walking is commonly used to obtain calling contexts in debugging [] and error reporting [6, 3].
However, for applications that need to continuously capture calling contexts, it usually incurs excessive overhead.

Dynamic Calling Context Tree. A dynamic calling context tree (CCT) [8, 46] is a summary of calling contexts represented as a tree data structure. Maintaining a complete CCT incurs large space and time overhead, while a CCT obtained using sampling may miss contexts of interest. In contrast, calling context encoding approaches [4, 4, 4, 3], including our work, provide a concise representation of all calling contexts.

Path Profiling. Ball and Larus developed an algorithm to encode intraprocedural control flow paths []. Melski and Reps extended the work by capturing both inter- and intraprocedural control flow [36]. Their approach encodes the whole control flow transfer history leading to a program point, but it does not scale, because there exist too many possible paths for nontrivial programs, and the inserted code is complex. Calling context encoding targets a succinct representation of active methods on the call stack with high efficiency.

Precise Calling Context Encoding. Sumner et al. proposed a precise calling context encoding technique that allows decoding [4, 4]. However, it does not work well with object-oriented programming, large-scale software, and dynamic class loading. These challenges are resolved in our work.

Probabilistic Calling Context. Probabilistic calling context [4] by Bond et al. provides an encoding technique that works with OO programs. The encoding result is essentially a hash value of the identifiers of the functions in the calling context. Its advantages over DeltaPath are that the encoding result is consistently only one integer and it does not need static analysis. However, PCC does not provide decoding. Their later work addresses this shortcoming by combining call graph analysis and recording of encoding results [3]. Due to the hash-based nature of the encoding, it remains a probabilistic approach in essence, thus leaving the decoding stage inaccurate, unreliable and/or expensive.
In addition, its decoding has to be offline. In production systems where precise and timely response is needed, deterministic and instant decoding, which is supported by DeltaPath, is a highly desired property.

8. FUTURE WORK

Pruned and Relative Encoding. In applications such as event logging and profiling where the functions of interest are known and the user only needs to capture the calling contexts of those target functions, we can perform a pruned encoding by skipping the encoding of functions that do not invoke target functions directly or indirectly. For example, in Figure 4, assume the user's interest is the calling contexts of functions D and F; with a simple static analysis we can find that E and G do not lead to the target functions, thus we can skip the encoding operations in E and G. This should boost the encoding efficiency for those applications. Moreover, we can exploit the relative positions of the target functions for encoding. For example, after the encoding result of BD is stored, to encode BDF, we simply represent the result as a reference to the previous encoding result and an encoding relative to D, that is, DF. This may reduce the storage space of encoding results.

Hybrid Encoding. PCC has a compact representation of the encoding result but lacks a precise decoding capability, while DeltaPath complements PCC. So a hybrid encoding approach combining PCC and DeltaPath can be interesting and useful. For example, we can perform profiling to establish the mapping between a set of calling contexts that are most frequently generated in a pro-