A Compiler Framework for Recovery Code Generation in General Speculative Optimizations


Jin Lin, Wei-Chung Hsu, Pen-Chung Yew
Department of Computer Science and Engineering
University of Minnesota

Roy Dz-Ching Ju, Tin-Fook Ngai
Microprocessor Research Lab
Intel Corporation
Santa Clara, CA, U.S.A.

Abstract

A general framework that integrates both control and data speculation using alias profiling and/or compiler heuristic rules has been shown to improve SPEC2000 performance on Itanium systems. However, speculative optimizations require check instructions and recovery code to ensure correct execution when speculation fails at runtime. How to generate check instructions and their associated recovery code efficiently and effectively is an issue yet to be well studied. It is also very important that the recovery code generated in the earlier phases integrate gracefully with the later optimization phases; at the very least, it should not hinder later optimizations, thus ensuring overall performance improvement. This paper proposes a framework that uses an if-block structure to facilitate check instruction and recovery code generation for general speculative optimizations. It allows speculative instructions and their recovery code generated in the early compiler optimization phases to be integrated effectively with the subsequent optimization phases. It also allows multi-level speculation for multi-level pointers and multi-level expression trees to be handled with no additional complexity. The proposed recovery code generation framework has been implemented in the Open Research Compiler (ORC).

1. Introduction

Control and data speculation [2,9,16,17,19] has been used effectively to improve program performance. Ju et al. [9] proposed a unified framework to exploit both control and data speculation, targeted specifically at memory latency hiding. In their scheme, speculation is exploited by hoisting load instructions across potentially aliasing store instructions, thus more effectively hiding memory latency. In [16], a more general framework is proposed to support general speculative optimizations, such as speculative register promotion and speculative partial redundancy elimination. However, the crucial issue of check instruction and recovery code generation has so far been mostly overlooked.

The recovery code generation scheme used by Ju, Nomura, Mahadevan and Wu in [9] (referred to as the JNMW algorithm in this paper) during code scheduling is quite straightforward. When a load instruction is speculatively moved up to an earlier program point, a special check instruction, chk, is generated at its original location. Some of the instructions that are flow dependent on the load instruction could also be moved up with the speculative load instruction. To facilitate recovery code generation, the compiler introduces additional dependence edges for the new chk instruction. This simple scheme works fine because the generated recovery code rarely interacts with other compiler optimizations (instruction scheduling is usually performed at the very end of the compiler optimization phases). In [16], a check instruction is explicitly modeled as an assignment statement and the corresponding recovery code is generated separately.
These additional assignment statements could hinder subsequent optimizations and impact the performance of the speculative optimizations unless we mark them explicitly and force later analyses and optimizations to deal with them specifically, using ad hoc approaches. That makes the later analyses and optimizations more complex and less effective. In this paper, we propose a unified framework for recovery code generation that supports general speculative optimizations, including speculative PRE

(usually performed in an early phase of the compiler optimizations) and instruction scheduling (usually performed late). It uses an explicit control flow structure to model check instructions and recovery code. Using this model, the check instruction and its recovery code can be treated as a highly biased if-block in the later analysis and optimization phases. The check instructions and their recovery blocks can thus be integrated gracefully into the later phases, where they are analyzed and optimized like any other instructions, without having to be treated separately and explicitly as in other ad hoc approaches [9][16]. This is most obvious for multi-level speculation, such as cascaded speculation for multi-level pointers [9], which can be handled easily in the later optimizations without additional complexity. As far as we know, this is the first general compiler framework for recovery code generation. Our experimental results show that the proposed framework can effectively support general speculative optimizations.

The rest of the paper is organized as follows. Section 2 describes our scheme to model check instructions and their recovery code generation using explicit if-block structures. It also discusses how to generate the recovery block in both speculative partial redundancy elimination (SPRE) and instruction scheduling. In Section 3, we show how our new framework generates recovery code for both single-level and multi-level speculation in a unified way. In Section 4, we discuss how to eliminate redundant check instructions. Section 5 presents some experimental results. In Section 6, we review some of the previous recovery code generation algorithms and explain why their check and recovery code generation approaches are inadequate for general speculative optimizations. We give our conclusions in Section 7.

2. A Framework for Check Instructions and Recovery Code Generation

A proper representation of a check instruction and its corresponding recovery block in a compiler internal representation (IR) is important because they can interact with later optimizations in very complicated ways. (A recovery block is the recovery code for a specific check instruction; in the rest of the paper, we use recovery block and recovery code interchangeably.) In this section, we present our model for check instructions and recovery code generation.

2.1. Representation of Check Instructions and Recovery Blocks

We explicitly represent the semantics of the check instruction and its corresponding recovery block as a conditional if-block, as shown in Figure 1(c). The if-block checks whether the speculation is successful; if not, the recovery block is executed. Initially, only the original load instruction associated with the speculative load is included in the recovery block, shown as s5 in Figure 1(c). A φ-function is inserted after the if-block for r in the SSA form (in statement s7). Also, since mis-speculation has to be rare to make data speculation worthwhile, the if-condition is set to be highly unlikely to be true. Based on such an explicit representation, later analyses and optimizations can treat it as any other highly biased if-block. There is no need to distinguish speculative code from non-speculative code during the analyses and optimizations, as will be discussed in Section 2.3.

2.2. Recovery Code Generation for Speculative PRE

We now show how check instructions and their recovery code are generated with our proposed model in speculative SSAPRE form [16].
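
To make the representation concrete, here is a minimal C sketch of the if-block form of Figure 1(c). It is illustrative only: the "r is invalid" test is really a hardware ALAT check, so it is modeled here by an explicit alias test between p and q.

    /* Source: the second load of *p is speculatively redundant
       across the store through q. */
    int source(int *p, int *q) {
        int a = *p;      /* s1: first load of *p */
        *q = 42;         /* s2: weak update, may alias *p */
        return a + *p;   /* s3: redundant load of *p */
    }

    /* If-block representation: *p is promoted to r; the conditional
       re-load models the check instruction and its recovery block. */
    int if_block_form(int *p, int *q) {
        int r = *p;      /* speculative load (ld.a) */
        int a = r;
        *q = 42;         /* potential weak update */
        if (p == q)      /* models "r is invalid"; rarely true */
            r = *p;      /* recovery block: re-load *p */
        return a + r;
    }
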
Our strategy is to generate the check instruction at the point of the last weak update but before the next use. Starting from a speculatively redundant candidate, we walk back to the latest weak update. There, we insert an if-then statement to model the check instruction and its corresponding recovery block, which initially contains just one load instruction (i.e., the original load instruction) in the then part of the if-then statement. Figure 2(a) gives an example to show the effect of the proposed algorithm. We assume that **q in s2 could potentially update the loads *p and **p, but with a very low probability. The second occurrence of the expression *p in s3 has been determined to be speculatively redundant to the first one in s1, as shown in Figure 2(b). The compiler inserts an if-then statement after the weak update **q in s2 for the speculatively redundant expression *p in s3, as shown in Figure 2(c). The chk flag is used to mark those if-blocks that correspond to check instructions. Those if-blocks are converted into a chk.a instruction, or an ld.c instruction (if there is only one load instruction in the recovery block), during code generation.
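
The conversion rule at code generation can be sketched as a small predicate. The names here are hypothetical, but the rule is the one just stated: an if-block whose recovery part is a single re-load lowers to an ld.c, and anything larger lowers to a chk.a with a branch to a recovery block.

    typedef struct RecoveryBlock {
        int num_insts;    /* instructions currently in the recovery block */
        int only_reload;  /* nonzero if the only instruction is the re-load */
    } RecoveryBlock;

    const char *lower_check(const RecoveryBlock *rb) {
        if (rb->num_insts == 1 && rb->only_reload)
            return "ld.c";   /* check load: re-load inline, no branch */
        return "chk.a";      /* advanced-load check branching to recovery */
    }
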

s1: ... = *p
s2: *q = ...        (may update *p)
s3: ... = *p + 10

(a) The source program

s1:  r1 = *p        /* ld.a flag */
s1': ... = r1
s2:  *q = ...
s3': r2 = *p        /* chk flag */
s3:  ... = r2 + 10

(b) The check instruction is modeled as an assignment statement

s1:  r1 = *p        /* ld.a flag */
s1': ... = r1
s2:  *q = ...
s4:  if (r1 is invalid) {
s5:    r2 = *p
s6:  }
s7:  r3 = φ(r1, r2)
s3:  ... = r3 + 10

(c) The check instruction is modeled as an explicit control flow structure

Figure 1. Different representations used to model a check instruction and its recovery block.

s1: ... = **p
s2: **q = ...       (may update **p and *p)
s3: ... = **p + 10

(a) Source program

s1: ... = **p [h1]
s2: **q = ...
s3: ... = **p + 10 [h1 <speculative>]

(b) Output of the renaming step of speculative PRE for the expression *p, where h is the hypothetical temporary for the load of *p

s1:  r1 = *p        /* ld.a flag */
s2:  ... = *r1
s3:  **q = ...
s4:  if (r1 is invalid) {
s5:    r2 = *p
s6:  }
s7:  r3 = φ(r1, r2)
s8:  ... = *r3 + 10

(c) Output of the code motion step of speculative PRE for the expression *p

Figure 2. Example of recovery code generation in speculative PRE.

2.3. Interaction of the Early Introduced Recovery Code with Later Optimizations

The recovery blocks, represented by highly biased if-then statements, can be easily maintained in later optimizations such as instruction scheduling. In this section, we use the partially ready code motion algorithm [1] in the instruction scheduling phase as an example. Consider the example in Figure 3(a). If we assume that the right branch is very rarely taken, the instruction b=y in basic block D can be identified as a P-ready (i.e., partially ready) candidate [1]. It is then scheduled along the most likely path A-B-D instead of the unlikely path A-C-D. Figure 3(b) shows the result of such a code motion: the instruction b=y is hoisted into basic block A and a compensation copy is placed in basic block C.

[Figure 3. Example of partially ready code motion for latency hiding: (a) instruction b=y is partially ready at basic block D; (b) partially ready code motion hoists b=y into block A and duplicates it in block C.]

The recovery code, represented by an explicit and highly biased if-then statement, is a good candidate for P-ready code motion. If there exists a dependent instruction that can be hoisted before the recovery block, that instruction is duplicated as a compensation instruction in the recovery block. At the end of instruction scheduling, the recovery block has been updated accordingly and is already well formed. There is no need to trace the flow dependence chain again, as in the JNMW scheme, to generate the recovery block in a separate pass. After instruction scheduling, we can easily convert these if-then blocks into check instructions and their recovery blocks.
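
As an illustration, here is a small C sketch of the transformation in Figure 3; the branch condition and constants are made up for the example.

    /* Before: b = y is partially ready at block D. */
    int before(int x, int y, int rarely) {
        int a = x;              /* block A */
        if (rarely) y = 2;      /* block C, rarely taken */
        return a + y;           /* block D: b = y, then use */
    }

    /* After: b = y is hoisted into block A along the likely path,
       with a compensation copy in block C. */
    int after(int x, int y, int rarely) {
        int a = x;              /* block A */
        int b = y;              /* hoisted b = y */
        if (rarely) { y = 2; b = y; }   /* compensation copy in C */
        return a + b;           /* block D */
    }
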

Using the example in Figure 4(a), the output of speculative PRE (shown in Figure 4(b)) is further optimized by instruction scheduling. As can be seen in Figure 4(c), if the compiler determines that the instruction s = r+10 can be speculatively hoisted across the store *q, it duplicates the instruction into the recovery block. The generated check and recovery code is shown in Figure 4(d). Similar to its use in speculative PRE, the proposed scheme can thus also be used in speculative instruction scheduling.

... = *p
*q = ...            (may update *p)
... = *p + 10

(a) Source program

r = *p              /* ld.a flag */
... = r
*q = ...
if (r is invalid) {
  r = *p
}
s = r + 10

(b) Output after speculative PRE

r = *p              /* ld.a flag */
... = r
s = r + 10
*q = ...
if (r is invalid) {
  r = *p
  s = r + 10
}

(c) Output after instruction scheduling

ld.a  r = [p]
add   s = r, 10
st    [q] = ...
chk.a r, recovery
label:
      ...
recovery:
ld    r = [p]
add   s = r, 10
br    label

(d) Final output

Figure 4. Recovery code generation for speculative PRE followed by instruction scheduling.

3. Recovery Code Generation in Multi-Level Speculation

3.1. Check Instructions and Recovery Code Representation for Multi-Level Speculation

Multi-level speculation refers to data speculation that occurs in a multi-level expression tree, as in Figure 5(a) and Figure 5(d). In Figure 5(d), the expression tree *p + *q in s3 could be speculatively redundant to *p + *q in s1 if *a in s2 is a potential weak update to *p and/or *q. In this case, we have to generate two speculative loads and two check instructions, one for *p and one for *q, as shown in Figure 5(e). Each check instruction has a different recovery block. Another typical example of multi-level speculation is shown in Figure 5(a); it is also referred to as cascaded speculation in [9]. In Figure 5(a), **q could be a potential weak update to *p and/or **p. We need to generate two check instructions, for both *p and **p, as shown in Figure 5(b), each with a different recovery block. If **q is a potential weak update only to *p, the check instruction and recovery block are as shown in Figure 5(c).

s1: ... = **p
s2: **q = ...
s3: ... = **p

(a) Source program

ld.a  r = [p]
ld.a  f = [r]
st    ...
chk.a r, recovery_1
label_1:
chk.a f, recovery_2
label_2:
      ...
recovery_1:
ld r = [p]
ld f = [r]
br label_1
recovery_2:
ld f = [r]
br label_2

(b) *p and **p could potentially be updated by the intervening store **q

ld.a  r = [p]
ld    f = [r]
st    ...
chk.a r, recovery_1
label_1:
      ...
recovery_1:
ld r = [p]
ld f = [r]
br label_1

(c) Only *p could potentially be updated by the intervening store **q

s1: ... = *p + *q
s2: *a = ...        (may update *p or *q)
s3: ... = *p + *q

(d) Source program

ld.a  r = [p]
ld.a  f = [q]
add   s = r, f
st    [a] = ...
chk.a r, recovery_1
label_1:
chk.a f, recovery_2
label_2:
      ...
recovery_1:
ld  r = [p]
add s = r, f
br  label_1
recovery_2:
ld  f = [q]
add s = r, f
br  label_2

(e) Speculation for both the variables and the value in a computation expression

Figure 5. Two examples of multi-level speculation.
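
A hedged C sketch of the cascaded checks in Figure 5(b) follows. The two "invalid" tests are hardware ALAT checks (chk.a) with no direct C equivalent, so they are modeled here by explicit flags.

    int cascaded(int **p, int invalid_r, int invalid_f) {
        int *r = *p;        /* ld.a r = [p]  */
        int  f = *r;        /* ld.a f = [r]  */
        /* ... intervening store **q = ... may update *p and/or **p ... */
        if (invalid_r) {    /* chk.a r, recovery_1 */
            r = *p;         /* recovery_1: re-load both levels */
            f = *r;
        }
        if (invalid_f) {    /* chk.a f, recovery_2 */
            f = *r;         /* recovery_2: re-load the value only */
        }
        return f;
    }
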

According to our previous study [3], there are many multi-level indirect references (e.g., multi-level field accesses) in the SPEC2000 C programs. Those references present opportunities for multi-level speculative PRE.

s1: ... = **p
s2: **q = ...
s3: ... = **p

(a) Source program

r1 = *p             /* ld.a flag */
f1 = *r1            /* ld.a flag */
... = f1
**q = ...
if (r1 is invalid) {
  r2 = *p
  f2 = *r2
}
r3 = φ(r1, r2)
f3 = φ(f1, f2)
if (f3 is invalid) {
  f4 = *r3
}
f5 = φ(f3, f4)
... = f5

(b) *p and **p could potentially be updated by the intervening store **q

r1 = *p             /* ld.a flag */
f1 = *r1
... = f1
**q = ...
if (r1 is invalid) {
  r2 = *p
  f2 = *r2
}
r3 = φ(r1, r2)
f3 = φ(f1, f2)
... = f3

(c) Only *p could potentially be updated by the intervening store **q

s1: ... = *p + *q
s2: *a = ...        (may update *p or *q)
s3: ... = *p + *q

(d) Source program

r1 = *p             /* ld.a flag */
f1 = *q             /* ld.a flag */
s1 = r1 + f1
*a = ...
if (r1 is invalid) {
  r2 = *p
  s2 = r2 + f1
}
r3 = φ(r1, r2)
s3 = φ(s1, s2)
if (f1 is invalid) {
  f2 = *q
  s4 = r3 + f2
}
f3 = φ(f1, f2)
s5 = φ(s3, s4)
... = s5

(e) Speculation for both the variables and the value in a computation expression

Figure 6. Examples of check instructions and their recovery blocks in multi-level speculation.

The recovery code representation discussed in Section 2 can be applied directly to multi-level speculation without any change. In Figure 6(b), both the address expression *p and the value expression **p are candidates for speculative register promotion; hence, two if-blocks are generated. In Figure 6(c), only the address *p may be modified by the potential weak update. It is obvious that the representation in Figure 6 matches very well with the final assembly code shown in Figure 5.

3.2. Recovery Code Generation for Multi-Level Speculation

Using an if-block to represent a check instruction and its recovery block, all previous discussions on recovery code generation for speculative SSAPRE apply directly to multi-level speculation without any change. In our speculative SSAPRE framework, an expression tree is processed in bottom-up order. For example, given an expression ***p, we begin by processing the sub-expression *p for the first-level speculation, then the sub-expression **p for the second-level speculation, and lastly ***p for the third-level speculation. This processing order also guarantees that the placement of check instructions and the recovery code generation are optimized level by level. Therefore, when a sub-expression is processed for the n-th level speculation, every sub-expression at the (n-1)-th level has already been processed, and its corresponding recovery block (i.e., if-block) has been generated. Here, we assume an if-block will be generated at the last weak update of the sub-expression. A re-load instruction for the value of the sub-expression is included in the if-block initially.

Figure 7(b) shows the result after the first-level speculation on *p for the program in Figure 7(a). We assume **q is aliased with both *p and **p with a very low probability. Hence, **q in s2 is a speculative weak update for both *p and **p. During the first-level speculation, the sub-expression *p is speculatively promoted to the register r. The subscript of r represents the version number of r. As can be seen in Figure 7(b), the value of *p in s1' becomes unavailable to the use in s3 along the true path of the if-block because of the re-load instruction in s5. (The value of an expression is unavailable to a later use if it is redefined before the use.)
This is reflected in the version number of r, which changes from r1 to r3 due to the φ(r1, r2) in s6.
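
The bottom-up processing order described above can be sketched as a small recursive walk; the types and the helper function are hypothetical.

    typedef enum { LEAF, DEREF } ExprKind;
    typedef struct Expr { ExprKind kind; struct Expr *operand; } Expr;

    /* Emits the if-block for one level of speculation after the last
       weak update of the sub-expression (placement as in Section 2.2). */
    void place_check_after_last_weak_update(Expr *e);

    void speculate_levels(Expr *e) {
        if (e->kind != DEREF)
            return;                             /* leaves need no check */
        speculate_levels(e->operand);           /* (n-1)-th level first */
        place_check_after_last_weak_update(e);  /* then the n-th level  */
    }
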

s1: ... = **p
s2: **q = ...       (may update **p and *p)
s3: ... = **p + 10

(a) Source program

s1:  r1 = *p        /* ld.a flag */
s1': ... = *r1
s2:  **q = ...
s4:  if (r1 is invalid) {
s5:    r2 = *p
     }
s6:  r3 = φ(r1, r2)
s3:  ... = *r3 + 10

(b) The recovery code for the first-level speculation

s1:  r1 = *p        /* ld.a flag */
s1': ... = *r1 [h]
s2:  **q = ...
s4:  if (r1 is invalid) {
s5:    r2 = *p
     }
s6:  r3 = φ(r1, r2)
s6': h = φ(h, h)
s3:  ... = *r3 [h] + 10

(c) Output of the φ-insertion step for the second-level speculation; h is the hypothetical temporary for the load of **p

s1:  r1 = *p        /* ld.a flag */
s1': ... = *r1 [h1]
s2:  **q = ...
s4:  if (r1 is invalid) {
s5:    r2 = *p
     }
s6:  r3 = φ(r1, r2)
s6': h2 = φ(h1 <speculative>, ⊥)
s3:  ... = *r3 [h2] + 10

(d) Output of the rename step for the second-level speculation

s1:   r1 = *p       /* ld.a flag */
s1':  s1 = *r1      /* ld.a flag */
s1'': ... = s1
s2:   **q = ...
s4:   if (r1 is invalid) {
s5:     r2 = *p
s5':    s2 = *r2
      }
s6:   r3 = φ(r1, r2)
s7:   s3 = φ(s1, s2)
s8:   if (s1 is invalid) {
s9:     s4 = *r3
      }
s10:  s5 = φ(s3, s4)
s3:   ... = s5 + 10

(e) Output of the recovery code for the second-level speculation

s1:  r1 = *p        /* ld.a flag */
s1': ... = *r1
s2:  **q = ...
s4:  r2 = *p        /* chk flag */
s3:  ... = *r2 + 10

(f) Output of the recovery code for the first-level speculation using the assignment statement representation

Figure 7. Examples of recovery code generation in speculative PRE.

In the second-level speculation for **p, we can also speculatively promote **p to register s. A hypothetical variable h is created for *r (i.e., **p) in s1'. Figure 7(c) shows the result after the φ-insertion step in SSAPRE [5]. A φ-function φ(h, h) in s6' is inserted because of the if-block in s4. After the rename step in SSAPRE [5], the first operand of φ(h, h), which corresponds to the false path of the if-block in s4, has the same version number as that in s1' because of the speculative weak update in s2. However, the second operand of φ(h, h), which corresponds to the true path, is replaced by ⊥ because the value of h (i.e., *r) becomes unavailable due to the re-load (i.e., a redefinition) of r in s5. The result is shown in Figure 7(d). The if-block in s8 is then inserted after the speculative weak update in s2, with a re-load of *r in s9 (Figure 7(e)). Due to the second operand of φ(h1, ⊥) in s6', the algorithm also inserts a re-load of *r (in s5') along the true path of the if-block (in s4) to make the load of *r3 in s3 fully redundant. That is, the existing SSAPRE algorithm automatically updates the recovery block from the first level (i.e., *p) without additional effort.

The advantage of the if-block representation for the recovery block can be better appreciated if we look at the difficulties we would have encountered had we represented the check instruction as an assignment statement, as mentioned in Section 2.1. The expressions *p and **p are speculatively redundant candidates. Using an assignment statement to represent a check instruction, the recovery block generated for the expression *p is shown in Figure 7(f). Since the value r is redefined at s4, the compiler cannot detect the expression *r (i.e., **p) in s3 as speculatively redundant (because r has been redefined). In order to support multi-level speculation, we would have to significantly modify the SSAPRE representation and substantially increase the complexity of the algorithm.

4. Optimizing the Placement of Check Instructions and Recovery Blocks in Speculative PRE

The check instructions and recovery blocks newly generated by speculative PRE may introduce additional redundancies. For example, the algorithm described in Section 3.2, which inserts a check instruction after the last weak update, may introduce further redundancy if the check instructions cross control flow merge points on the way to their uses. Such extraneous check instructions may impact later instruction scheduling and degrade performance. In this section, we examine how to eliminate such redundant check instructions.

There are two ways to place a check instruction during speculative PRE: one is at the use location of the speculatively redundant expression; the other is after the last weak update but before the next use, as already mentioned. Both schemes may introduce redundant check instructions. Consider the example in Figure 8(a). The occurrence of the expression *p in basic block D is assumed to be speculatively redundant to *p in basic block C, and non-speculatively redundant to *p in basic block B. In this case, it is better to generate the check instruction after the weak update *q in basic block C than at the use location in basic block D, because the latter would introduce a redundant check instruction whenever the program executes along the path A-B-D.

[Figure 8. Examples of effective placement of check instructions in speculative PRE: (a) a check instruction is generated after the weak update; (b) a check instruction is generated at the use location.]

The example in Figure 8(b) shows the opposite situation. There it is better to generate the check instruction at the use location, because a check instruction generated after the weak update *q would introduce a redundant check instruction if the path A-B is executed. It is difficult to determine which placement scheme is superior in general, because it depends on the structure of the application program.
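
A small C sketch of the Figure 8(a) trade-off follows; the condition and constants are made up. Placing the check at the use in D would execute it even along A-B-D, where no weak update occurred, so it is placed right after the weak update in C instead.

    int placement(int *p, int *q, int via_c) {
        int r;
        if (via_c) {            /* path A-C-D */
            r = *p;             /* load of *p in block C */
            *q = 3;             /* weak update: may modify *p */
            if (q == p)         /* check placed after the weak update */
                r = *p;         /* recovery: re-load */
        } else {                /* path A-B-D: no check executed */
            r = *p;             /* load of *p in block B */
        }
        return r + 1;           /* block D: redundant use of *p */
    }
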
5. Performance Measurements

We implemented our check instruction and recovery code generation algorithm for general speculative PRE optimizations in the ORC compiler version 2.0, and tested it on an HP zx2000 workstation equipped with a 900MHz Itanium 2 processor and 2GB of memory. Eight SPEC2000 programs are used in this measurement. The base versions used for comparison are compiled at optimization level O3. Our experiments contain two parts. In the first part, we study the impact of representing the check instruction and the recovery code as an if-block on the code generated by the compiler and on its runtime performance. In the second part, we evaluate the performance impact of recovery code generation with multi-level speculation support. The performance data is collected using the pfmon tool [21].

[Figure 9. Number of basic blocks and code size increase with the if-block based recovery code generation scheme.]

5.1. Impact of Recovery Code Generation

Our recovery code generation algorithm is quite effective in practice. Figure 9 shows the increase in the number of basic blocks and in the code size of the eight benchmarks from the speculative PRE optimization. We collected the number of basic blocks at compile time, and the code size is measured as the size of the text segment. The inserted ld.c and/or chk.a instructions and their recovery blocks contribute only slightly to the static code size. Note that since our general speculative optimizations eliminate speculatively redundant expressions, this offsets the code increase from the check instructions and their recovery blocks. As can be seen in Figure 9, the impact on code size is marginal. We also collected the

instruction miss stalls at runtime, as shown in Table 1. We observe that instruction cache misses increase only slightly in our recovery code generation version compared to the base version. For benchmark gap, the instruction cache miss rate is high because of frequent indirect procedure calls in the program.

        ammp    art     equake  bzip2   gap     mcf     parser  twolf
  *1            0.01%   0.05%   0.04%   8.62%   0.02%   0.43%   9.46%
  *2            0.02%   0.06%   0.05%   8.95%   0.03%   0.46%   10.57%

Table 1. Percentage of instruction miss stalls with the if-block based recovery code scheme.
*1: percentage of instruction miss stalls in total CPU cycles in the base version.
*2: percentage of instruction miss stalls in total CPU cycles after recovery code generation.

Figure 10 shows the performance comparison of recovery code generation in speculative PRE using the assignment representation, the improved assignment representation, and the if-block representation. The main difference between the assignment based scheme and the improved assignment based scheme is that in the latter we spent considerable effort modifying later analysis and optimization phases to minimize the negative impact of the assignment statements generated for the check instructions. Overall, Figure 10 shows that the if-block based scheme outperforms the assignment based schemes. For some benchmarks, such as parser and bzip2, the performance difference is rather small because the opportunity for data speculation in these two benchmarks is low. Also, despite our numerous efforts to minimize the negative impact of the assignment statements introduced by check instructions, the improvement with the improved assignment based scheme is still less than that of the if-block based scheme. The performance aspect, however, is not the main point of the if-block based representation. Its key contribution is that it avoids the tedious tuning work required in later optimization phases to overcome the ill effects of the assignment based representation of check instructions.

[Figure 10. Overall performance comparison (CPU cycles, data access stalls, and loads retired) in speculative PRE using the assignment based, improved assignment based, and if-block based schemes.]

5.2. Performance Comparison with Single-Level and Multi-Level Data Speculation

Figure 11 shows the performance difference in speculative PRE with single-level and multi-level speculation support. We can observe some performance difference for benchmarks ammp, equake, and gap when multi-level data speculation is enabled. Overall, the impact on performance is not substantial. Consider equake, for example: many intervening stores exist between the redundant multi-level memory loads. Since the multi-level memory references here are of pointer type, and type-based alias analysis assumes that aliases can occur only among memory references of the same type, the possible aliasing caused by the intervening stores can be ignored. However, multi-level speculation opportunities still remain even when type-based alias analysis is enabled.
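
The type-based rule just mentioned can be sketched as a one-line predicate; the struct and field names are hypothetical.

    typedef struct Ref { int type_id; } Ref;   /* access type of a load/store */

    /* Type-based alias analysis: two references may alias
       only if their access types match. */
    int may_alias(const Ref *load, const Ref *store) {
        return load->type_id == store->type_id;
    }
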
In the code segment shown in Figure 13(a), the types of the expressions w[col][0], A[Anext], and A[Anext][1] are all different, so the compiler can assume that no aliases exist among them. Thus the expressions A[Anext] and A[Anext][1] can be promoted to registers using type-based alias analysis alone. Nevertheless, data speculation can still be applied to pointer references of the same type. For example, the expression A[Anext][1][1] cannot be promoted to a register without speculation because it has the same type as the store expression w[col][0]. In Figure 12, we can see that both single-level speculation and multi-level

speculation increase performance for some benchmarks even when type-based alias analysis is enabled.

[Figure 11. Performance comparison (CPU cycles, data access stalls, and loads retired) between single-level and multi-level data speculation support with if-block based recovery code generation; the base version is compiled with O3.]

[Figure 12. Performance comparison (CPU cycles, data access stalls, and loads retired) between single-level and multi-level data speculation support with if-block based recovery code generation; the base version is compiled with O3 and type-based alias analysis.]

for (i = 0; i < nodes; i++) {
  ...
  while (Anext < Alast) {
    col = Acol[Anext];
    sum0 += A[Anext][0][0] * ...
    sum1 += A[Anext][1][1] * ...
    sum2 += A[Anext][2][2] * ...
    w[col][0] += A[Anext][0][0]*v[i][0] + ...
    w[col][1] += A[Anext][1][1]*v[i][1] + ...
    w[col][2] += A[Anext][2][2]*v[i][2] + ...
    Anext++;
  }
}

(a) Multi-level speculation for multi-level memory loads

*u = (cos(src.rake) * sin(src.strike) -
      sin(src.rake) * cos(src.strike) * cos(src.dip));
*v = (cos(src.rake) * cos(src.strike) +
      sin(src.rake) * sin(src.strike) * cos(src.dip));
*w = sin(src.rake) * sin(src.dip);

(b) Multi-level speculation for computation expressions including intrinsic procedure calls such as sin and cos

Figure 13. Two examples of multi-level speculation opportunities in benchmark equake.
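
As a concrete reading of Figure 13(b), here is a hedged C sketch of what multi-level speculative promotion achieves when the speculation succeeds; the checks and recovery are omitted, and the struct layout is assumed from the code above.

    #include <math.h>

    typedef struct { double rake, strike, dip; } Src;

    /* With src.rake/strike/dip speculatively promoted, the repeated
       sin/cos values can be promoted too, removing redundant calls. */
    void promoted(const Src *src, double *u, double *v, double *w) {
        double cr = cos(src->rake),   sr = sin(src->rake);
        double cs = cos(src->strike), ss = sin(src->strike);
        double cd = cos(src->dip);
        *u = cr * ss - sr * cs * cd;
        *v = cr * cs + sr * ss * cd;
        *w = sr * sin(src->dip);
    }
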

Type-based alias analysis does not help when the intervening stores have the same type as the multi-level memory loads or computation expressions. Consider the code in Figure 13(b). The expressions src.rake, src.strike, and src.dip have the same type as the expressions *u and *v. With single-level data speculation, our compiler can therefore speculatively promote the expressions src.rake, src.strike, and src.dip into registers. With multi-level speculative PRE, our compiler can further speculatively promote the values of the expressions sin(src.rake), sin(src.strike), cos(src.rake), cos(src.strike), and cos(src.dip) to registers. This effectively eliminates six expensive function calls. Such performance opportunities may exist in some important applications. Finally, it should be noted that type-based alias analysis is not always safe and can only be used for languages that are type-safe. In the near future, we will further investigate the impact of multi-level speculation on recovery code generation using more benchmark programs.

6. Related Work

Most of the recovery code generation schemes proposed so far deal only with specific optimizations. In the JNMW algorithm [9], if *p and *q are potential aliases with a very low probability, a load of *p can be speculatively moved across the store through q. A chk.a instruction is inserted at the original location of the load to check for possible mis-speculation at runtime; it jumps to a recovery block if a mis-speculation occurs. This algorithm works very well for code scheduling, but not for more general speculative optimizations such as speculative partial redundancy elimination.

Consider the example in Figure 15(a), and assume *q and *p are unlikely to be aliases. The compiler can speculatively replace the expression *p+b (s4) by the value of a+b (s1), as shown in Figure 15(c). Using the JNMW algorithm, the later recovery code generation phase cannot find this instruction on the dependence chain between the ld.a and the chk.a, because it has been speculatively eliminated. This is shown in the output in Figure 15(b), where the instruction that recomputes *p+b (i.e., add r3 = r4, r2 in Figure 15(c)) is missing from the recovery block. Keeping such eliminated instructions on the dependence chains in the IR would significantly complicate later optimizations, because they must be handled differently from regular instructions.

Another difficulty with the JNMW algorithm is that it implicitly assumes that there exists only one speculative load (ld.a) instruction for each chk.a. This assumption holds only for instruction scheduling. For speculative PRE, one chk.a instruction may correspond to multiple ld.a instructions, so multiple flow dependence chains may exist for a single chk.a instruction. This is shown in the example in Figure 15(d). We can speculatively eliminate *p (s6) and use the value loaded in s1 or s3. An ld.a instruction is generated for s1 and s3, respectively (as shown in Figure 15(e)). The chk.a instruction in s6 thus corresponds to multiple ld.a instructions (at s1 and s3). If the compiler selects the flow dependence chain based on the ld.a instruction in s3, then according to the JNMW algorithm the statement s4, i.e., add r2 = r1, 1, would be included in the recovery block, as in Figure 15(f). However, this instruction is non-speculative and should not be included in the recovery code. The correct recovery block is shown in Figure 15(g). This example shows the need to distinguish non-speculative instructions from speculative instructions in the IR for correct recovery code generation.
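
A small C sketch of the multiple-ld.a situation in Figure 15(d)-(g) follows; the ALAT check is again modeled by a flag, since it has no C equivalent.

    int after_pre(int *p, int *q, int cond, int invalid) {
        int r1, r2;
        if (cond) { r1 = *p; r2 = r1 + 2; }  /* s1-s2: first ld.a chain  */
        else      { r1 = *p; r2 = r1 + 1; }  /* s3-s4: second ld.a chain */
        *q = r2;                             /* s5: intervening store    */
        if (invalid)                         /* s6: chk.a r1             */
            r1 = *p;                         /* correct recovery: only the
                                                re-load, not the add     */
        return r1;                           /* s6: use of *p            */
    }
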
All later optimization phases must be aware of the existence of possible recovery blocks and check instructions.

In [16], a check instruction is represented implicitly as an assignment statement, and its recovery block as the associated flow dependence chain. Consider the simple example in Figure 1(a). The second occurrence of the expression *p in s3 can be regarded as speculatively redundant to the first *p in s1. Therefore, we can use an ld.a for the first load of *p and a check instruction for the second *p. In Figure 1(b), an assignment statement, shown as r2 = *p with a check flag, is used to represent the chk.a or ld.c. The recovery block is implicitly assumed to contain the original load instruction for *p in s3. In later code scheduling, if some instructions can be speculatively moved up, as in the example of Figure 4, the flow dependence chain between the ld.a and the check instruction implicitly represents the remaining code inside the recovery block. However, as shown in Figure 1(b), the live range of the register r has been changed, and the first definition of r in s1 can no longer reach the last use of r in s3: the version number of r in s3 changes from r1 to r2 because the check instruction is represented as an assignment statement. This in some sense violates the semantics of the check instruction, which requires that both the speculative load and the corresponding check instruction use the same register; such a representation does not allow that to happen. In addition, this implicit representation can inhibit later optimizations, such as code scheduling, of instructions

related to the speculative load. For example, it no longer allows the instruction r+10 in s3 to be speculatively hoisted across the store instruction *q in s2. Existing check and recovery code generation approaches are thus inadequate for general speculative optimizations.

s1: ... = a + b
s2: *p = a
s3: *q = ...
s4: ... = *p + b

(a) Original program

s1: ld    r1 = [&a]
s2: ld    r2 = [&b]
s3: add   r3 = r1, r2
s4: st    [p] = r1
s5: ld.a  r4 = [p]
s6: st    [q] = ...
s7: chk.a r4, recovery
label:
    ...
recovery:            /* incorrect */
ld r4 = [p]
br label

(b) Incorrect recovery code for global value numbering based speculative redundancy elimination

s1: ld    r1 = [&a]
s2: ld    r2 = [&b]
s3: add   r3 = r1, r2
s4: st    [p] = r1
s5: ld.a  r4 = [p]
s6: st    [q] = ...
s7: chk.a r4, recovery
label:
    ...
recovery:            /* correct */
ld  r4 = [p]
add r3 = r4, r2      /* missing in the speculative chain */
br  label

(c) Correct recovery code for global value numbering based speculative redundancy elimination

s1: t = *p + 2
s3: t = *p + 1
s5: *q = ...
s6: ... = *p

(d) Original program

s1: ld.a  r1 = [p]
s2: add   r2 = r1, 2
s3: ld.a  r1 = [p]
s4: add   r2 = r1, 1
s5: st    [q] = ...
s6: chk.a r1, recovery
label:
    ...

(e) Output after speculative redundancy elimination

recovery:            /* incorrect */
ld.a r1 = [p]
add  r2 = r1, 1
br   label

(f) Incorrect recovery code for speculative redundancy elimination

recovery:            /* correct */
ld.a r1 = [p]
br   label

(g) Correct recovery code for speculative redundancy elimination

Figure 15. Examples showing that the scheme proposed in [9] is insufficient to handle general speculative optimizations: (a), (b), and (c) show a situation where instructions could be missing from the recovery block; (d), (e), (f), and (g) show a situation where instructions could be mistakenly added to the recovery block.

7. Conclusions

In this paper, we propose a compiler framework to generate check instructions and recovery code for general speculative optimizations. The contributions of this paper are as follows. First, we propose a simple if-block representation for check instructions and their corresponding recovery blocks. It allows speculative recovery code introduced early during program optimization to be seamlessly integrated with subsequent optimizations. We use this framework to generate recovery code for speculative PRE based optimizations, which include partial redundancy elimination, register promotion, and strength reduction. We then show that the recovery code generated for speculative PRE can be integrated seamlessly with later optimizations such as instruction scheduling. Furthermore, this recovery code representation can be applied to other speculative optimizations such as speculative instruction scheduling. Finally, we show that our proposed framework can efficiently support the compiler in generating recovery code for both single-level and multi-level speculation.

We have implemented the proposed recovery code generation framework in Intel's ORC compiler. In the past, we had used the assignment based representation for check

instructions. We ended up spending numerous hours tuning later optimizations to minimize the negative impact of assignment based check instructions. Using the if-block based representation, the later analysis and optimization phases remain the same, saving substantial development effort. As future work, we would like to support more speculative optimizations, such as value speculation and thread-level speculation, under the same framework to ensure its generality and to exploit additional performance opportunities.

Acknowledgements

The authors wish to thank Sun Chan (Intel), Peng Tu (Intel), and Raymond Lo for their valuable suggestions and comments. This work was supported in part by the U.S. National Science Foundation under grants EIA, CCR, CCR, and EIA, and by grants from Intel.

References

[1] J. Bharadwaj, K. Menezes, and C. McKinsey. Wavefront scheduling: Path based data representation and scheduling of subgraphs. MICRO 32, December 1999.
[2] R. A. Bringmann, S. A. Mahlke, R. E. Hank, J. C. Gyllenhaal, and W. W. Hwu. Speculative Execution Exception Recovery Using Write-Back Suppression. MICRO 26, December 1993.
[3] T. Chen, J. Lin, W.-C. Hsu, and P.-C. Yew. An Empirical Study on the Granularity of Pointer Analysis in C Programs. In 15th Workshop on Languages and Compilers for Parallel Computing, Maryland, July 2002.
[4] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in SSA form. In Proc. of the Sixth Int'l Conf. on Compiler Construction, April 1996.
[5] F. Chow, S. Chan, R. Kennedy, S. Liu, R. Lo, and P. Tu. A new algorithm for partial redundancy elimination based on SSA form. PLDI '97, Las Vegas, Nevada, May 1997.
[6] R. Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451-490, October 1991.
[7] C. Dulong. The IA-64 Architecture at Work. IEEE Computer, Vol. 31, No. 7, pages 24-32, July 1998.
[8] K. Ishizaki, T. Inagaki, H. Komatsu, and T. Nakatani. Eliminating Exception Constraints of Java Programs for IA-64. PACT 2002, September 2002.
[9] R. D.-C. Ju, K. Nomura, U. Mahadevan, and L.-C. Wu. A Unified Compiler Framework for Control and Data Speculation. PACT 2000, October 2000.
[10] R. D.-C. Ju, S. Chan, and C. Wu. Open Research Compiler (ORC) for the Itanium Processor Family. Tutorial presented at MICRO 34, 2001.
[11] R. D.-C. Ju, S. Chan, F. Chow, and X. Feng. Open Research Compiler (ORC): Beyond Version 1.0. Tutorial presented at PACT 2002.
[12] M. Kawahito, H. Komatsu, and T. Nakatani. Effective Null Pointer Check Elimination Utilizing Hardware Trap. ASPLOS-IX, November 2000.
[13] R. Kennedy, F. Chow, P. Dahl, S.-M. Liu, R. Lo, and M. Streich. Strength reduction via SSAPRE. In Proc. of the Seventh Int'l Conf. on Compiler Construction, Lisbon, Portugal, April 1998.
[14] R. Kennedy, S. Chan, S. Liu, R. Lo, P. Tu, and F. Chow. Partial Redundancy Elimination in SSA Form. ACM Transactions on Programming Languages and Systems, 21(3), May 1999.
[15] J. Knoop, O. Rüthing, and B. Steffen. Lazy code motion. PLDI '92, San Francisco, California, June 1992.
[16] J. Lin, T. Chen, W.-C. Hsu, P.-C. Yew, R. D.-C. Ju, T.-F. Ngai, and S. Chan. A Compiler Framework for Speculative Analysis and Optimizations. PLDI '03, June 2003.
[17] J. Lin, T. Chen, W.-C. Hsu, and P.-C. Yew. Speculative Register Promotion Using Advanced Load Address Table (ALAT). CGO '03, San Francisco, California, March 2003.
[18] R. Lo, F. Chow, R. Kennedy, S. Liu, and P. Tu. Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores. PLDI '98, pages 26-37, Montreal, 1998.
[19] S. A. Mahlke. Exploiting Instruction Level Parallelism in the Presence of Conditional Branches. Ph.D. Thesis, University of Illinois, Urbana, IL.
[20] U. Mahadevan, K. Nomura, R. D.-C. Ju, and R. Hank. Applying Data Speculation in Modulo Scheduled Loops. PACT 2000, October 2000.
[21] pfmon: ftp://ftp.hpl.hp.com/pub/linux-ia64/pfmon ia64.rpm
[22] Y. Wu and J. R. Larus. Static branch frequency and program profile analysis. MICRO 27, pages 1-11, November 1994.


More information

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Wann-Yun Shieh Department of Computer Science and Information Engineering Chang Gung University Tao-Yuan, Taiwan Hsin-Dar Chen

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 4.1 Basic Compiler Techniques for Exposing ILP 計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 吳俊興高雄大學資訊工程學系 To avoid a pipeline stall, a dependent instruction must be

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Automatic Counterflow Pipeline Synthesis

Automatic Counterflow Pipeline Synthesis Automatic Counterflow Pipeline Synthesis Bruce R. Childers, Jack W. Davidson Computer Science Department University of Virginia Charlottesville, Virginia 22901 {brc2m, jwd}@cs.virginia.edu Abstract The

More information

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function Chen-Ting Chang, Yu-Sheng Chen, I-Wei Wu, and Jyh-Jiun Shann Dept. of Computer Science, National Chiao

More information

Automatically Locating software Errors using Interesting Value Mapping Pair (IVMP)

Automatically Locating software Errors using Interesting Value Mapping Pair (IVMP) 71 Automatically Locating software Errors using Interesting Value Mapping Pair (IVMP) Ajai Kumar 1, Anil Kumar 2, Deepti Tak 3, Sonam Pal 4, 1,2 Sr. Lecturer, Krishna Institute of Management & Technology,

More information

Chapter 11. Instruction Sets: Addressing Modes and Formats. Yonsei University

Chapter 11. Instruction Sets: Addressing Modes and Formats. Yonsei University Chapter 11 Instruction Sets: Addressing Modes and Formats Contents Addressing Pentium and PowerPC Addressing Modes Instruction Formats Pentium and PowerPC Instruction Formats 11-2 Common Addressing Techniques

More information

Computer Architecture and Organization. Instruction Sets: Addressing Modes and Formats

Computer Architecture and Organization. Instruction Sets: Addressing Modes and Formats Computer Architecture and Organization Instruction Sets: Addressing Modes and Formats Addressing Modes Immediate Direct Indirect Register Register Indirect Displacement (Indexed) Stack Immediate Addressing

More information

An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors

An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt High Performance Systems Group

More information

Lecture 16 Pointer Analysis

Lecture 16 Pointer Analysis Pros and Cons of Pointers Lecture 16 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Many procedural languages have pointers

More information

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Pros and Cons of Pointers Lecture 27 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis Many procedural languages have pointers

More information

Lecture 20 Pointer Analysis

Lecture 20 Pointer Analysis Lecture 20 Pointer Analysis Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis (Slide content courtesy of Greg Steffan, U. of Toronto) 15-745:

More information

An Effective Automated Approach to Specialization of Code

An Effective Automated Approach to Specialization of Code An Effective Automated Approach to Specialization of Code Minhaj Ahmad Khan, H.-P. Charles, and D. Barthou University of Versailles-Saint-Quentin-en-Yvelines, France. Abstract. Application performance

More information

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 8: Issues in Out-of-order Execution Prof. Onur Mutlu Carnegie Mellon University Readings General introduction and basic concepts Smith and Sohi, The Microarchitecture

More information

Code Placement, Code Motion

Code Placement, Code Motion Code Placement, Code Motion Compiler Construction Course Winter Term 2009/2010 saarland university computer science 2 Why? Loop-invariant code motion Global value numbering destroys block membership Remove

More information

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,

More information

Cost Effective Dynamic Program Slicing

Cost Effective Dynamic Program Slicing Cost Effective Dynamic Program Slicing Xiangyu Zhang Rajiv Gupta Department of Computer Science The University of Arizona Tucson, Arizona 87 {xyzhang,gupta}@cs.arizona.edu ABSTRACT Although dynamic program

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

Superscalar Processor Design

Superscalar Processor Design Superscalar Processor Design Superscalar Organization Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 26 SE-273: Processor Design Super-scalar Organization Fetch Instruction

More information

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of

More information

An Efficient Indirect Branch Predictor

An Efficient Indirect Branch Predictor An Efficient Indirect ranch Predictor Yul Chu and M. R. Ito 2 Electrical and Computer Engineering Department, Mississippi State University, ox 957, Mississippi State, MS 39762, USA chu@ece.msstate.edu

More information

Eliminating Exception Constraints of Java Programs for IA-64

Eliminating Exception Constraints of Java Programs for IA-64 Eliminating Exception Constraints of Java Programs for IA-64 Kazuaki Ishizaki, Tatsushi Inagaki, Hideaki Komatsu,Toshio Nakatani IBM Research, Tokyo Research

More information

Programmazione Avanzata

Programmazione Avanzata Programmazione Avanzata Vittorio Ruggiero (v.ruggiero@cineca.it) Roma, Marzo 2017 Pipeline Outline CPU: internal parallelism? CPU are entirely parallel pipelining superscalar execution units SIMD MMX,

More information

Techniques for Efficient Processing in Runahead Execution Engines

Techniques for Efficient Processing in Runahead Execution Engines Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt Depment of Electrical and Computer Engineering University of Texas at Austin {onur,hyesoon,patt}@ece.utexas.edu

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Lecture 9 Dynamic Compilation

Lecture 9 Dynamic Compilation Lecture 9 Dynamic Compilation I. Motivation & Background II. Overview III. Compilation Policy IV. Partial Method Compilation V. Partial Dead Code Elimination VI. Escape Analysis VII. Results Partial Method

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Woosung Lee, Keewon Cho, Jooyoung Kim, and Sungho Kang Department of Electrical & Electronic Engineering, Yonsei

More information

Understanding Uncertainty in Static Pointer Analysis

Understanding Uncertainty in Static Pointer Analysis Understanding Uncertainty in Static Pointer Analysis Constantino Goncalves Ribeiro E H U N I V E R S I T Y T O H F R G E D I N B U Master of Philosophy Institute of Computing Systems Architecture School

More information

INTERACTION COST: FOR WHEN EVENT COUNTS JUST DON T ADD UP

INTERACTION COST: FOR WHEN EVENT COUNTS JUST DON T ADD UP INTERACTION COST: FOR WHEN EVENT COUNTS JUST DON T ADD UP INTERACTION COST HELPS IMPROVE PROCESSOR PERFORMANCE AND DECREASE POWER CONSUMPTION BY IDENTIFYING WHEN DESIGNERS CAN CHOOSE AMONG A SET OF OPTIMIZATIONS

More information

An Assembler for the MSSP Distiller Eric Zimmerman University of Illinois, Urbana Champaign

An Assembler for the MSSP Distiller Eric Zimmerman University of Illinois, Urbana Champaign An Assembler for the MSSP Distiller Eric Zimmerman University of Illinois, Urbana Champaign Abstract It is important to have a means of manually testing a potential optimization before laboring to fully

More information

Instruction Based Memory Distance Analysis and its Application to Optimization

Instruction Based Memory Distance Analysis and its Application to Optimization Instruction Based Memory Distance Analysis and its Application to Optimization Changpeng Fang cfang@mtu.edu Steve Carr carr@mtu.edu Soner Önder soner@mtu.edu Department of Computer Science Michigan Technological

More information

Efficient Sequential Consistency Using Conditional Fences

Efficient Sequential Consistency Using Conditional Fences Efficient Sequential Consistency Using Conditional Fences Changhui Lin CSE Department University of California, Riverside CA 92521 linc@cs.ucr.edu Vijay Nagarajan School of Informatics University of Edinburgh

More information

Efficient Architecture Support for Thread-Level Speculation

Efficient Architecture Support for Thread-Level Speculation Efficient Architecture Support for Thread-Level Speculation A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Venkatesan Packirisamy IN PARTIAL FULFILLMENT OF THE

More information

Code Reordering and Speculation Support for Dynamic Optimization Systems

Code Reordering and Speculation Support for Dynamic Optimization Systems Code Reordering and Speculation Support for Dynamic Optimization Systems Erik M. Nystrom, Ronald D. Barnes, Matthew C. Merten, Wen-mei W. Hwu Center for Reliable and High-Performance Computing University

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Jong Wook Kwak 1, Seong Tae Jhang 2, and Chu Shik Jhon 1 1 Department of Electrical Engineering and Computer Science, Seoul

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

CS553 Lecture Dynamic Optimizations 2

CS553 Lecture Dynamic Optimizations 2 Dynamic Optimizations Last time Predication and speculation Today Dynamic compilation CS553 Lecture Dynamic Optimizations 2 Motivation Limitations of static analysis Programs can have values and invariants

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

A Sparse Algorithm for Predicated Global Value Numbering

A Sparse Algorithm for Predicated Global Value Numbering Sparse Predicated Global Value Numbering A Sparse Algorithm for Predicated Global Value Numbering Karthik Gargi Hewlett-Packard India Software Operation PLDI 02 Monday 17 June 2002 1. Introduction 2. Brute

More information

Software Pipelining of Loops with Early Exits for the Itanium Architecture

Software Pipelining of Loops with Early Exits for the Itanium Architecture Software Pipelining of Loops with Early Exits for the Itanium Architecture Kalyan Muthukumar Dong-Yuan Chen ξ Youfeng Wu ξ Daniel M. Lavery Intel Technology India Pvt Ltd ξ Intel Microprocessor Research

More information

CS229 Project: TLS, using Learning to Speculate

CS229 Project: TLS, using Learning to Speculate CS229 Project: TLS, using Learning to Speculate Jiwon Seo Dept. of Electrical Engineering jiwon@stanford.edu Sang Kyun Kim Dept. of Electrical Engineering skkim38@stanford.edu ABSTRACT We apply machine

More information

2 TEST: A Tracer for Extracting Speculative Threads

2 TEST: A Tracer for Extracting Speculative Threads EE392C: Advanced Topics in Computer Architecture Lecture #11 Polymorphic Processors Stanford University Handout Date??? On-line Profiling Techniques Lecture #11: Tuesday, 6 May 2003 Lecturer: Shivnath

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

Binary-to-Binary Translation Literature Survey. University of Texas at Austin Department of Electrical and Computer Engineering

Binary-to-Binary Translation Literature Survey. University of Texas at Austin Department of Electrical and Computer Engineering Binary-to-Binary Translation Literature Survey University of Texas at Austin Department of Electrical and Computer Engineering Juan Rubio Wade Schwartzkopf March 16, 1998 I. INTRODUCTION...4 II. HISTORY...4

More information

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient Digital Filter Synthesis Considering Multiple Graphs for a Coefficient Jeong-Ho Han, and In-Cheol Park School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea jhhan.kr@gmail.com,

More information

VLIW/EPIC: Statically Scheduled ILP

VLIW/EPIC: Statically Scheduled ILP 6.823, L21-1 VLIW/EPIC: Statically Scheduled ILP Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind

More information

SEVERAL studies have proposed methods to exploit more

SEVERAL studies have proposed methods to exploit more IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 16, NO. 4, APRIL 2005 1 The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture Resit Sendag, Member, IEEE, Ying

More information

Memory Hierarchy Management for Iterative Graph Structures

Memory Hierarchy Management for Iterative Graph Structures Memory Hierarchy Management for Iterative Graph Structures Ibraheem Al-Furaih y Syracuse University Sanjay Ranka University of Florida Abstract The increasing gap in processor and memory speeds has forced

More information

The Impact of Instruction Compression on I-cache Performance

The Impact of Instruction Compression on I-cache Performance Technical Report CSE-TR--97, University of Michigan The Impact of Instruction Compression on I-cache Performance I-Cheng K. Chen Peter L. Bird Trevor Mudge EECS Department University of Michigan {icheng,pbird,tnm}@eecs.umich.edu

More information

Hardware-Supported Pointer Detection for common Garbage Collections

Hardware-Supported Pointer Detection for common Garbage Collections 2013 First International Symposium on Computing and Networking Hardware-Supported Pointer Detection for common Garbage Collections Kei IDEUE, Yuki SATOMI, Tomoaki TSUMURA and Hiroshi MATSUO Nagoya Institute

More information