Draft: Debugging of Optimized Code through Comparison Checking
Clara Jaramillo, Rajiv Gupta and Mary Lou Soffa


Abstract

We present a new approach to the debugging of optimized code through comparison checking. In this scheme, both the unoptimized and optimized versions of an application execute, and the values they compute are compared to ensure that the behaviors of the two versions are the same. To determine which values should be compared and where the comparisons must take place, statement instances in the unoptimized code are mapped to statement instances in the optimized code. The mappings are derived automatically as optimizations are performed. Annotations for both versions of the code are developed from the mappings. Using the annotations, a driver checks, while the programs are executing, that both programs are producing the same values. If values are different, the user determines if there is a bug in the unoptimized code. If so, a conventional debugger is used to debug the code. If the bug is in the optimized code, the user is told where in the code the problem occurred and which optimizations are involved in producing the error. The user can then turn off those offending optimizations and leave the other optimizations in place. This information is also helpful to the optimizer writer in debugging the optimizer. We implemented our checker, COP, and ran experiments which indicate that the approach is practical.

Keywords - code optimization, program transformation, comparison checking, debugging.

Supported in part by a grant from Hewlett Packard Labs to the University of Pittsburgh.

1 Introduction

Although code transformations are important in improving the performance of programs, an application programmer typically compiles a program during the development phases with the optimizer turned off. One reason for not using the optimizer is that since a program under development is expected to have bugs, and therefore undergo changes and recompilation, the time spent on optimization is often wasted. An equally important reason is the lack of effective tools for debugging optimized programs. If an error is detected while debugging an optimized program, the user is uncertain as to whether the error was present in the original program or was introduced by the optimizer. Determining the cause of the error is hampered by the limitations of current techniques for debugging optimized code. For example, if the user wishes to observe the value of a variable at some program point while debugging the optimized program, the debugger may not be able to report this value because the value of the variable requested by the user may not have been computed yet or it may have been overwritten. Techniques that are able to recover the values for reporting purposes work in limited situations and for a limited set of optimizations [15, 12, 6, 19, 10, 14, 16, 5, 18, 4, 9, 21, 3]. While it may be acceptable to turn off optimizations during the development of the application software, the optimizations should be turned on when the application is in production in order to gain the performance benefits provided by optimizations. However, when the application, apparently free of bugs, is optimized, its behavior may not be the same as that of the unoptimized program. In this situation the programmer is likely to assume that errors in the optimizer are responsible for the change in behavior, and thus the optimizer is turned off. However, information that reveals the cause of the differing behaviors of unoptimized and optimized code would prove useful.
It is possible that the program contained an error which was not previously observed, and changes in the program due to optimization have unmasked the error. For example, optimizations change the data layout of a program, which may cause an uninitialized variable to be assigned different values in the unoptimized and optimized programs, causing them to behave differently [7]. Clearly in this case the application program must be further debugged. If it is found that an error was introduced by the optimizer, it would be beneficial to know the statements and optimizations that were involved in the error. Using this information, the application programmer could turn off offending optimizations in the affected parts of the program. Thus, the error would be removed without sacrificing the benefit of correctly applied optimizations. Moreover, the same information could be used by the optimizer writer to debug the optimizer. In this paper we present comparison checking, a new approach to the debugging of optimized code. In comparison checking, both optimized and unoptimized versions of a program are executed, and any deviations between the behaviors of the two programs are detected. If the outputs of both the optimized and unoptimized programs are the same and correct, the optimized program can be run with confidence. If the outputs are different, and if the user determines that the output of the unoptimized program is incorrect, the user can debug the unoptimized program using conventional debuggers. On the other hand, if the output of the unoptimized program is correct but the behavior of the optimized program differs from that of the unoptimized program, then the application programmer is provided with the information necessary to turn off selected optimizations in parts of the program. In this manner the application can benefit from correctly applied optimizations without the application programmer ever having to directly debug the optimized code.
The optimizer writer is also provided with valuable information that can be used to debug the optimizer. Our comparison checking system, COP, is shown in Figure 1. In this system, the behaviors of the optimized and unoptimized programs are compared by checking that corresponding assignments of values to source level variables and the results of corresponding branch predicates are the same throughout the execution of the optimized and unoptimized versions of the program. In addition, when assignments are made through arrays and pointers, we ensure that the addresses to which the values are assigned correspond to each other. All assignments to source level variables are compared, with the exception of values that are dead and hence never computed in the optimized code. The compiler generates annotations for the optimized and unoptimized programs that enable the comparisons of corresponding values and addresses to be made. The advantage of such detailed comparison of the two program versions is that when a comparison fails, we can report the statement where the failure occurred and

[Figure 1: The Comparison Checking System.]

the optimizations that involved the statement. This information is valuable in determining the cause of differing behaviors and hence locating the bugs in the application program or the optimizer. The merits of a comparison checking system include:

- The user has the benefit of the full capabilities of a conventional debugger and does not need to deal with the optimized code.
- The optimized code is not changed, and thus no recompilation is required.
- The user can confidently use optimizers without concern for the correct semantics of the program or the correctness of the optimizations.
- Information about where an optimized program differs from the unoptimized version benefits both the user and the optimizer writer.
- A wide range of optimizations, including code reordering transformations, loop transformations, register allocation, and inlining, can be handled.

In this paper, we present the design of the comparison checking system, COP. We also implemented COP and present experimental results that demonstrate the practicality of the system.

2 Comparison Checking

The comparison checker executes the unoptimized and optimized programs and compares the values computed by the two programs to ensure that their behaviors are the same. To accomplish this task, three questions must be answered:

How should the programs be executed?
One approach to comparison checking is to execute the optimized and unoptimized versions one at a time, save the values computed in a trace, and then compare the traces. To avoid the problem of generating long traces, we present a strategy that simultaneously executes the optimized and unoptimized programs and orchestrates their relative progress so that values can be checked on-the-fly as they are computed.

Which values must be compared?

To correctly perform the checks, it is necessary to determine the correspondences between instances of statements in the unoptimized and optimized code. Code transformations can result in the addition, deletion, and reordering of statement instances. Using the knowledge of the changes made by the optimizations, a mapping between statement instances in the unoptimized and optimized code is established.

When should the comparisons be made?

By analyzing the mappings between statement instances, the optimized and unoptimized programs are annotated with directives that guide the checking of values during execution. These annotations indicate which values must be compared and when the comparisons must be made. Since the values to be compared are not computed in the same order by the unoptimized and optimized code, some values may need to be temporarily saved until they can be compared. The values are saved in a memory pool

and annotations direct the checker as to when the values are to be saved in the pool and when they can be safely discarded.

While the basic steps of the above strategy are generally applicable to a wide range of optimizations, from simple code reordering transformations to complex loop transformations, the complexity of the steps depends upon the nature of the optimizations. In the remainder of this section, we focus on a class of transformations that satisfy the following constraints:

Control flow structure constraint: The branching structure of the program is not altered by the optimizations. While statements may be inserted along edges of the control flow graph or existing statements may be removed by the optimizations, no branches can be added or deleted by the optimizations.

Instance reordering constraint: The execution of instances of a statement in the unoptimized code cannot be arbitrarily reordered by the optimizations. If a statement lies within the same loop nest before and after optimization, then the order in which its instances are executed by the unoptimized and optimized code is the same. If a statement is inside a loop nest in one program version and outside the loop nest in the other, then either all instances within the loop nest correspond to the single instance outside the loop nest (e.g., loop invariant code motion) or a specific single instance within the loop nest corresponds to the single instance outside the loop nest (e.g., PDE). Finally, if the statement is in different loop nests in the two programs, then all values computed by all instances in the two programs must be the same.

Extensions to handle transformations that do not satisfy these restrictions are discussed in a later section. We assume that code improving transformations are done at the intermediate code level, and thus our checker is language independent. We also assume a flow graph representation of the program.
In the remainder of this section we discuss details of the execution strategy, mappings, and annotations.

2.1 Execution Strategy

The execution of the unoptimized code drives the execution of the optimized code in COP. A statement in the unoptimized code is executed, and using the annotations, the checker determines whether the value computed can be checked at this point. If so, the optimized program executes until the corresponding value is computed, at which time the check is performed on the two values. While the optimized program executes, any values that are computed "early" (i.e., the corresponding value in the unoptimized code has not been computed yet) are saved in the memory pool, as directed by the annotations. If annotations indicate that the checking of the value computed by the unoptimized program cannot be performed at this point, then the value is saved for future checking. The system continues to alternate between executions of the unoptimized and optimized programs. Annotations also indicate when values that were saved for future checking can finally be checked and when the values can be removed from the memory pool. Any statement instances that are eliminated in the optimized code are not checked.

Consider the example program segment in Figure 2 and assume that all the statements shown are source level statements. The unoptimized code is given in Figure 2(a) and the optimized code is given in Figure 2(b). The mappings between statements of the unoptimized and optimized representations are shown by dotted lines, and annotations are displayed in dotted boxes. In the example, the following optimizations have been applied:

- constant propagation: the constant 1 in S1 is propagated, as shown in S2' and S9'.
- loop invariant code motion: S3 is moved out of the loop.
- (partial) redundancy elimination (PRE): S7 is partially redundant with S6.
- copy propagation: the copy M in S4 is propagated, as shown by S5' and S8'.
- dead code elimination: S1 and S4 are dead after constant and copy propagation.
- (partial) dead code elimination (PDE): S8 is partially dead along all paths in the loop except the last iteration.
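The alternation between the two executions can be sketched as a small driver loop. The following Python sketch is an illustrative simplification, not COP's actual implementation; the event encoding and the name run_checker are our assumptions:

```python
def run_checker(unopt_trace, opt_events):
    """Compare values on the fly.

    unopt_trace: [(stmt, value), ...] in unoptimized execution order.
    opt_events:  [(kind, stmt, value), ...] for the optimized run, where
                 kind is 'compute', 'save', 'delete', or 'check'.
    Returns the list of statements whose values disagree.
    """
    pool = {}              # memory pool: unoptimized stmt -> saved value
    mismatches = []
    events = iter(opt_events)
    last_value = None      # most recently computed optimized value
    for stmt, value in unopt_trace:
        # advance the optimized program until it reaches Check <stmt>
        for kind, target, oval in events:
            if kind == "compute":
                last_value = oval
            elif kind == "save":        # value computed "early": pool it
                pool[target] = last_value
            elif kind == "delete":      # all checks done: discard
                pool.pop(target, None)
            elif kind == "check" and target == stmt:
                candidate = pool.get(stmt, last_value)
                if candidate != value:
                    mismatches.append(stmt)
                break
    return mismatches
```

In Figure 2, for instance, Save S3' pools the hoisted loop-invariant value so that the per-iteration Check S3 comparisons can be made against it.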

[Figure 2: Mappings and Annotations for Unoptimized and Optimized Code.]

Given the mapping and annotations, we now describe the operation of the checker on the example in Figure 2. Techniques to determine the mappings and annotations are discussed in subsequent sections. The unoptimized program starts to execute with S1. Since S1 is eliminated from the optimized program, the unoptimized program continues to execute. After S2 executes, the checker determines that the value computed can be checked at this point, and so the optimized program executes until Check S2 is encountered, which occurs at S2'. The values computed by S2 and S2' are compared. The unoptimized program resumes execution and the loop iteration at S3 begins. S3 executes and again the optimized program executes until the value computed by S3 can be compared, indicated by the annotation Check S3. However, when S3' executes, the annotation Save S3' is encountered and consequently, the value computed by S3' is stored in the memory pool. This is done because a number of comparisons have to be performed using this value. The next check is the control predicate found at S5. Assume that the T branch is taken. S6 executes and its value is checked against the value computed by S6'. The value of S6' is also saved in the memory pool because there is another comparison that will need this value. S7 in the unoptimized code executes and is compared with the value saved from S6'.
The value computed by S6' (or S7' if the F branch was taken) is now deleted from the pool, as directed by the annotation Delete S6',S7'. S8 executes and the checker finds the Delay S8 annotation, indicating that the check on S8 cannot be performed at this point, and so the value computed by S8 is stored in the memory pool. S9 and S10 are executed and compared. Assume that only one iteration of the loop is performed. As the unoptimized program continues to execute, the checker finds the Checkable S8 annotation. The checker knows that S8 can now be checked and the optimized code resumes. The checker immediately finds the Delete S3' annotation, so the value computed by S3' can now be deleted from the pool, as it will never be needed again. S8' of the optimized code executes and the values of S8 and S8' are compared. Now, back in the unoptimized code, the value computed by S8 can be deleted as directed by the Delete S8 annotation. S11 is treated similarly.

2.2 Mappings

The mappings capture the correspondences between statement instances in the unoptimized program and those in the optimized program, and are created as program transformations are applied. Transformations can be applied in any order and as many times as desired. A mapping must indicate which statements and, in particular, which instances of the statements must produce the same value in the unoptimized and optimized programs. Thus, a mapping has two components: an association of a statement in the unoptimized code with a statement in the optimized code, and an association of instances of the statements.

For the class of transformations that we are considering in this section, we refer to a statement's instances as follows. If a statement is not enclosed in a loop, then we refer to any one instance of the statement. If the statement is enclosed in a loop nest (which could be just one loop), we refer either to any one instance of the statement in the loop nest, to all instances of the statement in the nest, or to the last instance of the statement in the loop nest. A mapping is defined from statement instances in the unoptimized code to statement instances in the optimized code, e.g., one → one, all → one, etc.

Consider the case when the corresponding statements from the unoptimized and optimized code are inside the same loop nest, and their instances are referred to as one. The number of times the two statements are executed is the same, and the values computed by corresponding instances must be the same. If the statement on the optimized side is moved inside a loop, then its instances are referred to as all or last. The values computed by all instances or the last instance in the optimized code should be equal to the value computed by the one instance in the unoptimized code.

Let us assume that a statement from the unoptimized code inside a loop nest is referred to as all. In this case all values computed by the statement in the unoptimized code during a single complete execution of the loop nest must be equal. This value must equal one or more values computed by the corresponding statement in the optimized code. If the corresponding statement instance in the optimized code is immediately outside the loop nest and therefore referred to as one, then this is the value that must be compared. If the corresponding statement is inside a different loop nest, then the values of all instances or the last instance in a single execution of this loop nest must be compared to the corresponding value from the unoptimized code.
Finally, let us assume that the statement in the unoptimized code is inside a loop nest and its last instance is of interest. If the corresponding statement instance in the optimized code is immediately outside the loop nest and therefore referred to as one, then the last value from the execution of the loop nest in the unoptimized code should equal the corresponding one value computed in the optimized code. If instead the statement is inside a different loop nest in the optimized code, then the last or all values computed by the statement in a single execution of this loop nest must be compared with the last value from the unoptimized code.

We determine the mappings for individual transformations by using the semantics of those transformations. From the mappings of individual transformations, the mapping for any series of transformations can be easily determined. The optimized program initially starts as an identical copy of the unoptimized program, with a one → one mapping between statements in the two programs. As optimizations are applied, the mappings are updated according to the mappings of the individual transformations. As code reordering transformations are applied, the mappings are changed to reflect the effects of the transformation on a statement and its instances. Code transformations that do not involve code motion into or out of loops either do not change the mapping, delete the mapping, or require copies of the mapping to be placed on inserted statements. Moving statements into and out of loops causes the instance associations to change. When moving a statement out of a loop to above the loop, a one association changes to all; it changes to last when moving a statement out of a loop but below the loop. If the instance association already was all or last, it remains as it was. When moving into loops, the association changes from one to all. When moving a statement with an all into another loop, the instance association is not changed.
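These update rules can be made concrete with a small sketch. The record layout and function names below are our own illustrative assumptions, not COP's internal representation:

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    unopt_stmt: str                # statement in the unoptimized code
    opt_stmt: str                  # corresponding optimized statement
    unopt_instances: str = "one"   # 'one' | 'all' | 'last'
    opt_instances: str = "one"     # 'one' | 'all' | 'last'

def hoist_out_of_loop(m):
    """Optimized copy moved out of a loop, above it (e.g. loop invariant
    code motion): one becomes all on the unoptimized side."""
    if m.unopt_instances == "one":
        m.unopt_instances = "all"
    return m

def sink_out_of_loop(m):
    """Optimized copy moved out of a loop, below it (e.g. PDE):
    one becomes last on the unoptimized side."""
    if m.unopt_instances == "one":
        m.unopt_instances = "last"
    return m

def move_into_loop(m):
    """Optimized copy moved into a loop: one becomes all on the
    optimized side."""
    if m.opt_instances == "one":
        m.opt_instances = "all"
    return m

# Sinking a statement out of two nested loops (PDE twice) and then
# scheduling it into a later loop, as in Figure 3:
m = Mapping("x = a + b", "x = a + b")
sink_out_of_loop(m)   # one -> one  becomes  last -> one
sink_out_of_loop(m)   # already last: unchanged
move_into_loop(m)     # last -> one  becomes  last -> all
```

The final association last → all matches the Figure 3 example: the last value computed in the original loop nest must equal the values computed by all instances in the destination loop.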
Consider the example in Figure 3, which shows a series of transformations and the way the instance associations change. In the first graph, assume the statement x = a + b is partially dead and is moved out of the inner loop, as shown in the second graph. The original association of one → one is changed to last → one. Assuming that the statement x = a + b is still partially dead, it is moved out of the outer loop, as shown in the third graph, and the mapping remains last → one. Now assume that the statement x = a + b is moved into the lower loop due to code scheduling. The last → one mapping changes to a last → all mapping. Figure 3(b) shows the final mapping of the statement after all three transformations have been applied. The instance association has changed from the initial one → one to the final association last → all. This is semantically correct, as the last value computed by the inner loop must match the values computed by all instances of the bottom loop.

2.3 Annotations

Code annotations are derived from the mappings after all of the code transformations have been applied. Code annotations guide the comparison checking of values computed by corresponding statement instances from the optimized and unoptimized code. These annotations (1) identify program points where comparison checks can

[Figure 3: Combinations of Transformations. (a) Applying PDE and Code Scheduling. (b) Final Mapping.]

be made, (2) indicate if values should be saved in the memory pool so that they will be available when checks are performed, and (3) indicate when a value currently residing in the memory pool can be discarded. The annotations are associated with either a statement or a program point in the optimized code or the unoptimized code. In all cases, once the statement or program point is reached, the actions associated with the annotation are executed by the checker. Four different types of annotations are needed to implement our comparison checking strategy.

Check S_uopt annotation: This annotation is associated with a statement or program point in the optimized code to indicate that a check of a value from the unoptimized program is to be performed. The corresponding value that it has to be compared with is either the result of the most recently executed statement in the optimized code or a value in the memory pool. The positions of checks are determined as follows. Given the position of the unoptimized statement in the flow graph, if the corresponding statement in the optimized code is at the same or a later point in the flow graph, then the check annotation is associated with the statement in the optimized code. On the other hand, if the statement is executed by the optimized code at an earlier point, then the check annotation is associated with the program point in the optimized code which represents the original position of the statement in the unoptimized code. In this case the value computed by the statement in the optimized code is in the memory pool. In Figure 2, since the corresponding predicates S5 and S5' are at the same position in the flow graph, the annotation Check S5 is associated with S5'.
Since S7 has been moved to an earlier point, S7', the annotation Check S7 is associated with its original position in the optimized code. Finally, since the check for statement S8 must be delayed until the point after the loop, the annotation Check S8 is introduced at this point in the optimized code.

Save S_opt annotation: If a value computed by a statement S_opt in the optimized code cannot be immediately compared with the corresponding value computed by the unoptimized code, then the value of S_opt must be saved in the memory pool. In some situations a value computed by S_opt is to be compared with multiple values computed by the unoptimized code, and therefore it must be saved until all those values have been computed and compared. The annotation Save S_opt is associated with S_opt in the optimized code to ensure that the value is saved. In Figure 2 the statement S3 in the unoptimized code, which is moved out of the loop by invariant code motion, corresponds to statement S3' in the optimized code. The value computed by S3' cannot be immediately compared with the corresponding values computed by S3 in the unoptimized code since S3' is executed prior to the execution of S3. Thus, the annotation Save S3' is associated with S3'.

Delay S_uopt and Checkable S_uopt annotations: If the value computed by the execution of a statement S_uopt in the unoptimized code cannot be immediately compared with the corresponding value in the optimized code because the correspondence between the values cannot be immediately established, then the value of S_uopt must be saved in the memory pool. The annotation Delay S_uopt is associated with S_uopt to indicate that the checking of the value computed by S_uopt should be delayed until the correspondence can be established, and that the value must be saved in the memory pool until then. The point in the unoptimized code at which checking can finally be performed is marked using the annotation Checkable S_uopt.

In some situations the correspondence between the statement instances cannot be established unless the execution of the unoptimized code is further advanced. Thus, the checking of the value computed by the unoptimized code must be delayed. In Figure 2, statement S8 inside the loop in the unoptimized code is moved after the loop in the optimized code by the PDE optimization. In this situation only the value computed by statement S8 during the last iteration of the loop is to be compared with the value computed by S8' in the optimized code. However, we can only determine that an execution of S8 corresponds to the last loop iteration when the unoptimized code exits the loop. Therefore the checking of S8's value is delayed. There is another situation in which a check is delayed, for reasons of efficiency. Consider the example in Figure 4(a) in which the computation of x's value is moved from before the loop to after the loop. In this case, after x has been computed by the unoptimized code, the execution of the optimized code is advanced to the point after the loop and the value of x is checked. However, all values of y that are computed inside the loop would have to be saved, resulting in a potentially large memory pool. In order to avoid the creation of a large pool, we can delay the checking of the value of x until after the loop, as shown in Figure 4(b).

[Figure 4: Types of Annotations. (a) Using save annotation. (b) Using delay annotation for efficiency.]

Delete S annotation: This annotation is associated with a statement or program point in the optimized/unoptimized code to indicate that the value computed by statement S in the optimized/unoptimized code can now be discarded. A value computed by the unoptimized/optimized code that is placed in the pool is removed by a delete annotation in the unoptimized/optimized code.
Since a value may be involved in multiple checks, a delete annotation must be introduced at a point where we are certain that all relevant checks have been performed and therefore it is safe to discard the value. In Figure 2 the annotation Delete S3' is introduced after the loop in the optimized code because at that point we are certain that all values computed by statement S3 in the unoptimized code have been compared with the corresponding value computed by S3' in the optimized code.

The algorithms for introducing annotations require control flow analysis to locate the positions at which the annotations must be introduced. We omit the details of these algorithms from this abstract due to space limitations. The algorithms will be included in the completed paper.

3 Experimental Results

We implemented COP to test our algorithms for instruction mapping, annotation placement, and checking, and performed experiments to assess the practicality of COP. Lcc [11] was used as the compiler for the application program and was extended to include a set of optimizations, namely loop invariant code motion, dead code elimination, PRE, copy propagation, and constant propagation and folding. As a program is optimized, mappings are updated. Besides generating target code, lcc was extended to generate a file containing breakpoint information and annotations that are derived from the optimization mappings. Thus, compilation and optimization of the application program produces the target code for both the unoptimized and optimized programs and auxiliary files containing breakpointing information and annotations for both the unoptimized and optimized programs. These auxiliary files are used by the checker. Breakpoints are generated whenever the value of a source level assignment or a predicate is computed, and whenever array and pointer addresses are computed. Breakpoints are also generated to save base addresses for dynamically allocated storage of structures (e.g. malloc,

[Table 1: Test Programs — Optimization Statistics: counts of loop invariants, copies propagated, constants propagated, dead statements, and PRE expressions for yacc, wc, 8q.c, sort.c, 124.m88ksim, and 130.li.]

[Table 2: Execution Times (minutes:seconds): source length (lines) and cpu times of the unoptimized and optimized versions, each with and without annotations, for the same programs.]

free, etc.). We compare array addresses and pointer addresses by comparing their offsets from the closest base addresses collected by the checker. Breakpointing is implemented using fast breakpoints [17]. Two versions of COP were implemented. In the first version, traces of the unoptimized and optimized programs were collected and the comparison checks were performed on the traces. Since traces can be arbitrarily large, our second version instead performs on-the-fly checking, which performs comparisons during the execution of both programs. The unoptimized program, optimized program, and checker can execute on the same machine or on different machines, and messages are sent between the programs and the checker. A buffer is used to reduce the number of messages that are sent between the executing programs and the checker. In our experiments we ran COP on an HP 712/100 and the unoptimized and optimized programs on separate SPARC 5 workstations. Messages were passed through sockets. We ran some of the integer Spec95 benchmarks as well as some smaller test programs. (Note: we are continuing to run more programs and will add the results to the final paper.) Although the benchmarks did not include floating point numbers, these can be handled by our system by allowing for inexact equality; that is, by allowing two floating point numbers to differ by a certain small delta [2]. Table 1 shows the total number of times various optimizations were applied by our optimizer.
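The inexact-equality test for floating point values can be sketched as follows; the tolerance value and the mixed relative/absolute form are our assumptions, since the text only calls for a small delta:

```python
def values_equal(u, o, delta=1e-9):
    """Compare a value from the unoptimized run (u) against the optimized
    run (o). Floats may differ by a small delta, scaled by the magnitude
    of the operands; everything else must match exactly."""
    if isinstance(u, float) or isinstance(o, float):
        return abs(u - o) <= delta * max(1.0, abs(u), abs(o))
    return u == o
```

Scaling the delta by the operand magnitudes keeps the test meaningful for both very small and very large values, which a fixed absolute delta would not.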
Table 2 shows the cpu execution times of the unoptimized and optimized programs with and without annotations. On average the annotations slowed down the execution of the unoptimized programs by a factor of 10 and that of the optimized programs by a factor of 15. The optimized program experiences greater overhead than the unoptimized program because more annotations are added to the optimized program. Although the optimizations were frequently applied, no significant reduction in execution time was observed. This is because we have not yet incorporated a global register allocator into our compiler; thus, at present, the introduction of temporaries during PRE actually slows down program execution. The response time of the checker depends greatly upon the lengths of the execution runs of the programs. For small programs, comparison checking took from a few seconds (7 seconds for sort) to a few minutes (12 minutes for wc). Both value and address comparisons were performed in this experiment. For the two Spec95 benchmarks the comparison checker took several hours to execute (3 hours for 124.m88ksim and 6 hours for 130.li). These times are clearly acceptable if comparison checking is performed off-line. We found that the response time of the checker is bounded by the speed of the network, which was 10 Mbits per second in our experiments; a faster network would considerably lower these response times. We also measured the memory pool size during

our experiments and found it to be quite small. A maximum pool size of 90 was observed during the execution of the 124.m88ksim benchmark.

4 Extensions

Although in the previous sections we described mappings, annotations, and an implementation that considered a certain class of code transformations, our basic approach is general and can be extended to more complex transformations. In this section we describe the extensions needed to handle inlining, loop transformations, and checking in the presence of register allocation.

Inlining: Function inlining replaces calls to a function in the unoptimized code by bodies of the function in the optimized code, and each of the inlined bodies may be optimized differently. Therefore, for each call site, a separate mapping is maintained between the statements of the function in the unoptimized code and its inlined copy in the optimized code. By analyzing the mappings corresponding to each call site, a set of annotations is computed. At run time, when the function is executed, the checker must select and follow the appropriate set of annotations using its knowledge of the call site encountered during program execution.

Loop Transformations: Our approach can also be extended to allow loop transformations such as loop reversal, distribution, fusion, interchange, etc. For these transformations, statement instances must be identified more precisely by the mappings. It is no longer sufficient to refer to the instances as one, all, or last; instead we must refer to instances by the iteration numbers (or formulas) during which they are executed. For example, consider a loop whose control variable i takes the values 1 through 10 and which contains an assignment to A[i]. After loop reversal is applied, the mapping (1, 10) -> (10, 1) expresses the relationship between the instances of the assignment to A[i] in the unoptimized and optimized code. The annotations and checks can then be performed correctly according to this mapping.
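The reversal mapping above can be sketched concretely. In this sketch the value stored into A[i] is simulated as i * i, an arbitrary computation chosen purely for illustration:

```python
def map_instance(p, lo=1, hi=10):
    # Loop-reversal mapping (lo, hi) -> (hi, lo): the instance at unoptimized
    # execution position p corresponds to optimized execution position
    # hi - (p - lo), i.e. the first unoptimized instance matches the last
    # optimized one and vice versa.
    return hi - (p - lo)

# Simulated traces of the values assigned to A[i], in execution order.
unopt_trace = [i * i for i in range(1, 11)]      # loop runs i = 1 .. 10
opt_trace   = [i * i for i in range(10, 0, -1)]  # reversed: i = 10 .. 1

# The checker pairs instances according to the mapping (positions are 1-based):
# each pair writes the same array element, so the values must agree.
for p in range(1, 11):
    assert unopt_trace[p - 1] == opt_trace[map_instance(p) - 1]
```

Note that the mapping relates execution positions, not values: both versions assign i * i to A[i], but the instance that executes first in one version executes last in the other, and the checker must buffer values accordingly.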
If a loop nest is involved in the transformation (e.g., loop interchange), then the mapping will be multidimensional.

Register allocation: We can handle code in which values are stored in registers. In terms of the machine code, the mappings would exist between instructions whose resulting values are to be compared. By examining an instruction we can determine whether its result is stored in a register or a memory location, and the value can be retrieved appropriately. If a value cannot be compared yet, it is saved in the pool.

5 Related Work

The problem of debugging optimized code has long been recognized, with most of the previous work focusing on the development of debugging tools for optimized code [15, 12, 22, 6, 19, 10, 14, 16, 5, 18, 4, 8, 9, 21, 3]. In these approaches, limitations have been placed on the debugging system by restricting the type or placement of optimizations, modifying the optimized program, or inhibiting debugging capabilities. These approaches have also had varying degrees of success in handling the code location and data value problems introduced by code optimizations, and none can handle all of the problems. Unfortunately, these techniques have not found their way into production environments, and the debugging of optimized code remains a problem. The goal of our system is not to build a debugger for optimized code but a comparison checker of optimized code. In our approach, the user can still use the conventional debuggers for unoptimized code that are currently in use. The work most closely related to our approach is Guard, a relative debugger, although it was not designed to debug optimized programs [20, 2, 1]. Using Guard, users can compare the execution of one program, the reference program, with the execution of another program, the development version.
Guard requires the user to formulate assertions about the key data structures in both versions, which specify the locations at which the data structures should be identical. The relative debugger is then responsible for managing the execution of the two programs and reporting any differences in values. The technique does not require any modifications to user programs and can perform comparisons on-the-fly. The important difference between Guard and COP is that in Guard the user essentially has to supply all of the mappings and annotations by hand, whereas in COP this is done automatically. Thus, using COP, the optimized program is transparent to the user. We are also able to check the entire program, which would be difficult in Guard since it would require the user to supply all of the mappings. In COP,

we can easily restrict checking to certain regions or statements, as Guard does. We can also report the particular optimizations that were involved in producing erroneous behavior.

The concept of a bisection debugging model, and a high level approach, was recently presented that also has as its goal the identification of semantic differences between two versions of the same program, one of which is assumed to be correct [13]. The bisection debugger attempts to identify the earliest point where the two versions diverge. However, in order to handle the debugging of optimized code, all data value problems have to be solved at all breakpoints.

References

[1] Abramson, D. A., Foster, I., Michalakes, J., and Sosic, R. Relative Debugging and its Application to the Development of Large Numerical Models. In Proceedings of IEEE Supercomputing 1995, December.
[2] Abramson, D., Foster, I., Michalakes, J., and Sosic, R. A New Methodology for Debugging Scientific Applications. Communications of the ACM, 39(11):69-77, November.
[3] Adl-Tabatabai, A. Source-Level Debugging of Globally Optimized Code. PhD dissertation, Carnegie Mellon University, Technical Report CMU-CS.
[4] Adl-Tabatabai, A., and Gross, T. Evicted Variables and the Interaction of Global Register Allocation and Symbolic Debugging. In Proceedings 20th POPL Conference, pages 371-383, January.
[5] Brooks, G., Hansen, G. J., and Simmons, S. A New Approach to Debugging Optimized Code. In Proceedings ACM SIGPLAN '92 Conf. on Programming Language Design and Implementation, pages 1-11, June.
[6] Chase, B., and Hood, R. Selective Interpretation as a Technique for Debugging Computationally Intensive Programs. In ACM SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques, pages 113-124, June.
[7] Copperman, M. Debugging Optimized Code Without Being Misled. Technical Report 92-01, Board of Studies in Computer and Information Sciences, University of California at Santa Cruz, May.
[8] Copperman, M., and McDowell, C. E.
Detecting Unexpected Data Values in Optimized Code. Technical Report 90-56, Board of Studies in Computer and Information Sciences, University of California at Santa Cruz, October.
[9] Copperman, M. Debugging Optimized Code Without Being Misled. ACM Transactions on Programming Languages and Systems, 16(3):387-427.
[10] Coutant, D. S., Meloy, S., and Ruscetta, M. A Practical Approach to Source-Level Debugging of Globally Optimized Code. In Proceedings ACM SIGPLAN '88 Conf. on Programming Language Design and Implementation, pages 125-134, June.
[11] Fraser, C., and Hanson, D. A Retargetable C Compiler: Design and Implementation. Benjamin/Cummings.
[12] Fritzson, P. A Systematic Approach to Advanced Debugging through Incremental Compilation. In Proceedings ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, pages 130-139.
[13] Gross, T. Bisection Debugging. In Proceedings of the AADEBUG '97 Workshop, pages 185-191, May.
[14] Gupta, R. Debugging Code Reorganized by a Trace Scheduling Compiler. Structured Programming, 11:141-150.
[15] Hennessy, J. Symbolic Debugging of Optimized Code. ACM Transactions on Programming Languages and Systems, 4(3):323-344, July.
[16] Holzle, U., Chambers, C., and Ungar, D. Debugging Optimized Code with Dynamic Deoptimization. In Proceedings ACM SIGPLAN '92 Conf. on Programming Language Design and Implementation, pages 32-43, June.
[17] Kessler, P. Fast Breakpoints: Design and Implementation. In Proceedings ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 78-84.
[18] Pineo, P. P., and Soffa, M. L. A Practical Approach to the Symbolic Debugging of Code. Proceedings of International Conference on Compiler Construction, 26(12):357-373, April.
[19] Pollock, L. L., and Soffa, M. L. High-Level Debugging with the Aid of an Incremental Optimizer. In 21st Annual Hawaii International Conference on System Sciences, volume 2, pages 524-531, January.
[20] Sosic, R., and Abramson, D. A.
Guard: A Relative Debugger. Software Practice and Experience, February.

[21] Wismueller, R. Debugging of Globally Optimized Programs Using Data Flow Analysis. In Proceedings ACM SIGPLAN '94 Conf. on Programming Language Design and Implementation, pages 278-289, June.
[22] Zellweger, P. T. An Interactive High-Level Debugger for Control-Flow Optimized Programs. In Proceedings ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, pages 159-171.

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T.

Document Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T. Document Image Restoration Using Binary Morphological Filters Jisheng Liang, Robert M. Haralick University of Washington, Department of Electrical Engineering Seattle, Washington 98195 Ihsin T. Phillips

More information

CS354 gdb Tutorial Written by Chris Feilbach

CS354 gdb Tutorial Written by Chris Feilbach CS354 gdb Tutorial Written by Chris Feilbach Purpose This tutorial aims to show you the basics of using gdb to debug C programs. gdb is the GNU debugger, and is provided on systems that

More information

An Object Oriented Runtime Complexity Metric based on Iterative Decision Points

An Object Oriented Runtime Complexity Metric based on Iterative Decision Points An Object Oriented Runtime Complexity Metric based on Iterative Amr F. Desouky 1, Letha H. Etzkorn 2 1 Computer Science Department, University of Alabama in Huntsville, Huntsville, AL, USA 2 Computer Science

More information

iii ACKNOWLEDGEMENTS I would like to thank my advisor, Professor Wen-Mei Hwu, for providing me with the resources, support, and guidance necessary to

iii ACKNOWLEDGEMENTS I would like to thank my advisor, Professor Wen-Mei Hwu, for providing me with the resources, support, and guidance necessary to DYNAMIC CONTROL OF COMPILE TIME USING VERTICAL REGION-BASED COMPILATION BY JAYMIE LYNN BRAUN B.S., University of Iowa, 1995 THESIS Submitted in partial fulllment of the requirements for the degree of Master

More information

Optimization on array bound check and Redundancy elimination

Optimization on array bound check and Redundancy elimination Optimization on array bound check and Redundancy elimination Dr V. Vijay Kumar Prof. K.V.N.Sunitha CSE Department CSE Department JNTU, JNTU, School of Information Technology, G.N.I.T.S, Kukatpally, Shaikpet,

More information

Accelerated Library Framework for Hybrid-x86

Accelerated Library Framework for Hybrid-x86 Software Development Kit for Multicore Acceleration Version 3.0 Accelerated Library Framework for Hybrid-x86 Programmer s Guide and API Reference Version 1.0 DRAFT SC33-8406-00 Software Development Kit

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Cost Effective Dynamic Program Slicing

Cost Effective Dynamic Program Slicing Cost Effective Dynamic Program Slicing Xiangyu Zhang Rajiv Gupta Department of Computer Science The University of Arizona Tucson, Arizona 87 {xyzhang,gupta}@cs.arizona.edu ABSTRACT Although dynamic program

More information

School of Computer Science. Scheme Flow Analysis Note 3 5/1/90. Super-: Copy, Constant, and Lambda Propagation in Scheme.

School of Computer Science. Scheme Flow Analysis Note 3 5/1/90. Super-: Copy, Constant, and Lambda Propagation in Scheme. Carnegie Mellon School of Computer Science Scheme Flow Analysis Note 3 5/1/90 Super-: Copy, Constant, and Lambda Propagation in Scheme Olin Shivers shivers@cs.cmu.edu This is an informal note intended

More information

Interaction of JVM with x86, Sparc and MIPS

Interaction of JVM with x86, Sparc and MIPS Interaction of JVM with x86, Sparc and MIPS Sasikanth Avancha, Dipanjan Chakraborty, Dhiral Gada, Tapan Kamdar {savanc1, dchakr1, dgada1, kamdar}@cs.umbc.edu Department of Computer Science and Electrical

More information

Two Problems - Two Solutions: One System - ECLiPSe. Mark Wallace and Andre Veron. April 1993

Two Problems - Two Solutions: One System - ECLiPSe. Mark Wallace and Andre Veron. April 1993 Two Problems - Two Solutions: One System - ECLiPSe Mark Wallace and Andre Veron April 1993 1 Introduction The constraint logic programming system ECL i PS e [4] is the successor to the CHIP system [1].

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a Asynchronous Checkpointing for PVM Requires Message-Logging Kevin Skadron 18 April 1994 Abstract Distributed computing using networked workstations oers cost-ecient parallel computing, but the higher rate

More information

Center for Supercomputing Research and Development. recognizing more general forms of these patterns, notably

Center for Supercomputing Research and Development. recognizing more general forms of these patterns, notably Idiom Recognition in the Polaris Parallelizing Compiler Bill Pottenger and Rudolf Eigenmann potteng@csrd.uiuc.edu, eigenman@csrd.uiuc.edu Center for Supercomputing Research and Development University of

More information

PT = 4l - 3w 1 PD = 2. w 2

PT = 4l - 3w 1 PD = 2. w 2 University of Maryland Systems & Computer Architecture Group technical report UMD-SCA-TR-2000-01. Multi-Chain Prefetching: Exploiting Natural Memory Parallelism in Pointer-Chasing Codes Nicholas Kohout,

More information

Compiler Support for Software-Based Cache Partitioning. Frank Mueller. Humboldt-Universitat zu Berlin. Institut fur Informatik. Unter den Linden 6

Compiler Support for Software-Based Cache Partitioning. Frank Mueller. Humboldt-Universitat zu Berlin. Institut fur Informatik. Unter den Linden 6 ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June 1995. Compiler Support for Software-Based Cache Partitioning Frank Mueller Humboldt-Universitat

More information

Static WCET Analysis: Methods and Tools

Static WCET Analysis: Methods and Tools Static WCET Analysis: Methods and Tools Timo Lilja April 28, 2011 Timo Lilja () Static WCET Analysis: Methods and Tools April 28, 2011 1 / 23 1 Methods 2 Tools 3 Summary 4 References Timo Lilja () Static

More information

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems On Object Orientation as a Paradigm for General Purpose Distributed Operating Systems Vinny Cahill, Sean Baker, Brendan Tangney, Chris Horn and Neville Harris Distributed Systems Group, Dept. of Computer

More information

Plaintext (P) + F. Ciphertext (T)

Plaintext (P) + F. Ciphertext (T) Applying Dierential Cryptanalysis to DES Reduced to 5 Rounds Terence Tay 18 October 1997 Abstract Dierential cryptanalysis is a powerful attack developed by Eli Biham and Adi Shamir. It has been successfully

More information

Heap Management. Heap Allocation

Heap Management. Heap Allocation Heap Management Heap Allocation A very flexible storage allocation mechanism is heap allocation. Any number of data objects can be allocated and freed in a memory pool, called a heap. Heap allocation is

More information

Using Cache Line Coloring to Perform Aggressive Procedure Inlining

Using Cache Line Coloring to Perform Aggressive Procedure Inlining Using Cache Line Coloring to Perform Aggressive Procedure Inlining Hakan Aydın David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA, 02115 {haydin,kaeli}@ece.neu.edu

More information

Solve the Data Flow Problem

Solve the Data Flow Problem Gaining Condence in Distributed Systems Gleb Naumovich, Lori A. Clarke, and Leon J. Osterweil University of Massachusetts, Amherst Computer Science Department University of Massachusetts Amherst, Massachusetts

More information

Code Placement, Code Motion

Code Placement, Code Motion Code Placement, Code Motion Compiler Construction Course Winter Term 2009/2010 saarland university computer science 2 Why? Loop-invariant code motion Global value numbering destroys block membership Remove

More information

2 Related Work Often, animation is dealt with in an ad-hoc manner, such as keeping track of line-numbers. Below, we discuss some generic approaches. T

2 Related Work Often, animation is dealt with in an ad-hoc manner, such as keeping track of line-numbers. Below, we discuss some generic approaches. T Animators for Generated Programming Environments Frank Tip? CWI, P.O. Box 4079, 1009 AB Amsterdam, The Netherlands tip@cwi.nl Abstract. Animation of execution is a necessary feature of source-level debuggers.

More information

Unication of Register Allocation and Instruction Scheduling. in Compilers for Fine-Grain Parallel Architectures. David A. Berson

Unication of Register Allocation and Instruction Scheduling. in Compilers for Fine-Grain Parallel Architectures. David A. Berson Unication of Register Allocation and Instruction Scheduling in Compilers for Fine-Grain Parallel Architectures by David A. Berson B.A., Coe College, 1984 M.S., University of DePaul, 1988 Submitted to the

More information

Modify Compiler. Compiler Rewrite Assembly Code. Modify Linker. Modify Objects. Libraries. Modify Libraries

Modify Compiler. Compiler Rewrite Assembly Code. Modify Linker. Modify Objects. Libraries. Modify Libraries Appears in: Software Practice & Experience Rewriting Executable Files to Measure Program Behavior James R. Larus and Thomas Ball larus@cs.wisc.edu Computer Sciences Department University of Wisconsin{Madison

More information

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load

Type T1: force false. Type T2: force true. Type T3: complement. Type T4: load Testability Insertion in Behavioral Descriptions Frank F. Hsu Elizabeth M. Rudnick Janak H. Patel Center for Reliable & High-Performance Computing University of Illinois, Urbana, IL Abstract A new synthesis-for-testability

More information

Technische Universitat Munchen. Institut fur Informatik. D Munchen.

Technische Universitat Munchen. Institut fur Informatik. D Munchen. Developing Applications for Multicomputer Systems on Workstation Clusters Georg Stellner, Arndt Bode, Stefan Lamberts and Thomas Ludwig? Technische Universitat Munchen Institut fur Informatik Lehrstuhl

More information

Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1

Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1 Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1 Yuan-Shin Hwang Department of Computer Science National Taiwan Ocean University Keelung 20224 Taiwan shin@cs.ntou.edu.tw

More information

Lecture 5. Data Flow Analysis

Lecture 5. Data Flow Analysis Lecture 5. Data Flow Analysis Wei Le 2014.10 Abstraction-based Analysis dataflow analysis: combines model checking s fix point engine with abstract interpretation of data values abstract interpretation:

More information

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the

More information

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Technical Report ANU-TR-CS-92- November 7, 992 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer

More information

Destination-Driven Code Generation R. Kent Dybvig, Robert Hieb, Tom Butler Computer Science Department Indiana University Bloomington, IN Februa

Destination-Driven Code Generation R. Kent Dybvig, Robert Hieb, Tom Butler Computer Science Department Indiana University Bloomington, IN Februa Destination-Driven Code Generation R. Kent Dybvig, Robert Hieb, Tom Butler dyb@cs.indiana.edu Indiana University Computer Science Department Technical Report #302 February 1990 Destination-Driven Code

More information

A 100 B F

A 100 B F Appears in Adv. in Lang. and Comp. for Par. Proc., Banerjee, Gelernter, Nicolau, and Padua (ed) 1 Using Prole Information to Assist Advanced Compiler Optimization and Scheduling William Y. Chen, Scott

More information

PPS : A Pipeline Path-based Scheduler. 46, Avenue Felix Viallet, Grenoble Cedex, France.

PPS : A Pipeline Path-based Scheduler. 46, Avenue Felix Viallet, Grenoble Cedex, France. : A Pipeline Path-based Scheduler Maher Rahmouni Ahmed A. Jerraya Laboratoire TIMA/lNPG,, Avenue Felix Viallet, 80 Grenoble Cedex, France Email:rahmouni@verdon.imag.fr Abstract This paper presents a scheduling

More information

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages Harvard School of Engineering and Applied Sciences CS 152: Programming Languages Lecture 24 Thursday, April 19, 2018 1 Error-propagating semantics For the last few weeks, we have been studying type systems.

More information

Improving the Static Analysis of Loops by Dynamic Partitioning Techniques

Improving the Static Analysis of Loops by Dynamic Partitioning Techniques Improving the Static Analysis of Loops by Dynamic Partitioning echniques Matthieu Martel CEA - Recherche echnologique LIS-DSI-SLA CEA F91191 Gif-Sur-Yvette Cedex, France Matthieu.Martel@cea.fr Abstract

More information

Distributed Algorithms for Detecting Conjunctive Predicates. The University of Texas at Austin, September 30, Abstract

Distributed Algorithms for Detecting Conjunctive Predicates. The University of Texas at Austin, September 30, Abstract Distributed Algorithms for Detecting Conjunctive Predicates Parallel and Distributed Systems Laboratory email: pdslab@ece.utexas.edu Electrical and Computer Engineering Department The University of Texas

More information

Introduction. 1 Measuring time. How large is the TLB? 1.1 process or wall time. 1.2 the test rig. Johan Montelius. September 20, 2018

Introduction. 1 Measuring time. How large is the TLB? 1.1 process or wall time. 1.2 the test rig. Johan Montelius. September 20, 2018 How large is the TLB? Johan Montelius September 20, 2018 Introduction The Translation Lookaside Buer, TLB, is a cache of page table entries that are used in virtual to physical address translation. Since

More information

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t Data Reduction - an Adaptation Technique for Mobile Environments A. Heuer, A. Lubinski Computer Science Dept., University of Rostock, Germany Keywords. Reduction. Mobile Database Systems, Data Abstract.

More information

Outline. Computer Science 331. Information Hiding. What This Lecture is About. Data Structures, Abstract Data Types, and Their Implementations

Outline. Computer Science 331. Information Hiding. What This Lecture is About. Data Structures, Abstract Data Types, and Their Implementations Outline Computer Science 331 Data Structures, Abstract Data Types, and Their Implementations Mike Jacobson 1 Overview 2 ADTs as Interfaces Department of Computer Science University of Calgary Lecture #8

More information

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli Eect of fan-out on the Performance of a Single-message cancellation scheme Atul Prakash (Contact Author) Gwo-baw Wu Seema Jetli Department of Electrical Engineering and Computer Science University of Michigan,

More information