A Framework for Analyzing Architecture-Level Fault Tolerance Behavior in Applications


A Framework for Analyzing Architecture-Level Fault Tolerance Behavior in Applications

by Harshad Sane

B.S., University of Pune, India, 2004

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science, Department of Electrical and Computer Engineering, 2008

This thesis entitled: A Framework for Analyzing Architecture-Level Fault Tolerance Behavior in Applications, written by Harshad Sane, has been approved for the Department of Electrical and Computer Engineering.

Professor Daniel A. Connors
Professor Manish Vachharajani
Professor Li Shang

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Sane, Harshad (M.S., Computer Engineering)

A Framework for Analyzing Architecture-Level Fault Tolerance Behavior in Applications

Thesis directed by Professor Daniel A. Connors, Ph.D.

Radiation-induced transient faults, also known as single-event upsets, are a major concern in the design and operation of modern computer systems. Transient errors first impact the circuit and logic levels of a processor, and may propagate to the microarchitecture and architecture state of a processor. When undetected, transient errors in architecture state can lead to incorrect and undefined application behavior. Detailed simulation is a vital component of the design process of modern processors and of the exploration of new design concepts. However, high-level architectural simulators typically run orders of magnitude slower than native execution, making detailed simulation of most programs prohibitively long. Due to these deficiencies in simulation technology, current architecture-level transient fault studies are primarily based on sampling techniques. Current program fault behavior analysis uses architecture-level injection of random bits selected over a time line. Such injection methods allow only a limited number of injections per unit time, a high percentage of which may not expose the true fault susceptibility of a program. This thesis introduces an accurate and fast fault-injection framework for studying inherent code execution properties of a program that correlate with different levels of fault susceptibility. The framework utilizes a three-step approach, consisting of trace logging, fault injection, and replay execution, to emulate the effect of a transient fault in the architectural registers of a running application. Finally, by correlating the injection analysis results to dependence graph patterns across sets of applications, an analysis methodology is constructed to accurately predict the fault tolerance of an application without performing any error injections.

Dedication

I dedicate this thesis to my family for their unconditional support.

Acknowledgements

Firstly, I would like to thank my adviser, Professor Dan Connors, who guided me through my academic endeavor at CU. I would like to thank him for all the knowledge and encouragement he bestowed upon me. This work has been possible because of his guidance and prompt feedback. I would like to thank all the members of the DRACO research group for their insight and ideas. I would like to thank my friends, who made sure that I enjoyed life along with my career. Most importantly, I would like to thank my family, who have supported me unconditionally throughout my life.

Contents

Chapter 1 Introduction
  1.1 Contributions
Chapter 2 Background
  2.1 Single Event Phenomenon
  2.2 Architecture-level Fault Analysis
Chapter 3 Motivation
  3.1 Overview of current injection methodologies
  3.2 Fault penetration and point of injection
  3.3 Natural fault resilience and ineffective injections
Chapter 4 Fault Emulation Framework and Results
  4.1 Overview and Goals
  4.2 The TEFIS Framework
    4.2.1 Execution Tracing
    4.2.2 Fault Generation
    4.2.3 Fault Emulation
  4.3 Trace Emulation Experimental Results and Analysis
    4.3.1 Fault tolerance categories
    4.3.2 Accuracy of emulations
    4.3.3 Average Execution Time of Fault Emulations Against Full Injections
Chapter 5 Analysis of Fault Tolerance Program Behavior
  Source Code Analysis
  Algorithm Level Effects
  Dynamic Source Code Behavior
  Dynamic Program Trace Behavior
  Inter-Procedural Fault Tolerance
  Dynamic Dependence Graph Representation
  Estimating Program Fault Tolerance
Chapter 6 Future Work
Chapter 7 Conclusion
Bibliography

Tables

3.1 References
3.2 Fault tolerance characteristics of logical operations

Figures

2.1 A neutron strike
2.2 Feature size vs. soft error rate
3.1 Current random injection methodologies
3.2 Fault penetration
3.3 Fault injection distribution in time
3.4 A highly fault-tolerant case from 186.crafty with 80.6% correct results of emulation
4.1 Fractional emulation captures the effect of fractional execution with traces against full injections in the entire application; the figure shows the exclusion of build-up and monitoring time
4.2 Context dump
4.3 Framework overview
4.4 Fault emulation
4.5 Result categories
4.6 Emulation accuracy
4.7 A comparison of execution time
5.1 Fault tolerance analysis of sorting routines: (a) heapsort, (b) quicksort
5.2 A trace from 164.gzip with 91.4% incorrect results of emulation
5.3 A trace from 300.twolf with 100% segmentation fault results of emulation
5.4 Fault tolerance correlation with program counter similarity
5.5 Fault susceptibility correlation calculated using similar program code points
5.6 Inter-procedural fault tolerance
5.7 Example: Dependency graph
5.8 Dependence graph similarity correlation
5.9 Dependence graph similarity correlation - All benchmarks
5.10 Fault tolerance prediction

Chapter 1

Introduction

Scaling trends in technology that lead to faster and smaller transistors, lower voltage levels, and reduced noise margins also increase the susceptibility of a circuit to transient faults. Shielding hardware systems from radiation, cosmic rays, and crosstalk is difficult from a high-speed design perspective and is costly in terms of active power consumption. Transient errors first impact the circuit and logic levels of a processor, and can propagate to the micro-architecture and architecture state. Architecture state errors lead to invalid and unpredictable software behavior. Because many program phases are tolerant of some single-bit architecture errors, software shielding becomes an attractive solution due to its low cost and flexibility. Many software-based fault tolerance techniques [24, 23, 10] have been proposed to balance performance with error detection and recovery. Current studies involving the observation of the effects of transient faults utilize a limited number of random fault injections in which micro-architecture (pipeline registers, intermediate logic) or architecture (register, memory) state is modified during simulated execution. As these error injection campaigns involve emulating machine execution, each injection run includes substantial experimental time leading up to the point of injection, plus the remaining execution of the application to determine program correctness. Collectively, the excessive time for simulated injections limits the points of

the program execution that can be studied, thus reducing the significance of understanding the fault tolerance behavior in applications. There are a number of ways to improve the accuracy and execution time of fault analysis over traditional fault injection systems. First, as applications are characterized by repeating phases [5], there are opportunities to reduce the number of fault injections by studying representative phases of execution. Furthermore, random injections often do not expose relevant program behavior, as significant portions of program execution involve dead code [6] and value locality [16]. There is substantial potential to model the fault behavior of code sequences by correlating the results of fault injections to dynamic code regions. In this way, a fault analysis modeling framework can be constructed to anticipate the fault susceptibility of an application based on the execution profile of code regions, without requiring fault injections.

1.1 Contributions

This thesis presents an experimental study of current architecture-level injection techniques for evaluating fault tolerance. Based on this study, the first half of the thesis motivates the need for and design of a new fault injection infrastructure. The framework is based on trace logging and performing all possible architecture faults of a trace through an injection emulation system (TEFIS). An analysis of the approach, in order to improve the accuracy and timeliness of the system, is examined. The second half focuses on correlating the properties of a program with inherent fault tolerance characteristics. Based on this correlation, an experimental model is constructed to predict the fault tolerance of a program without any injections. This model provides a baseline for making fault tolerance predictions of applications. These points are encompassed by the following contributions of this thesis:

(1) Development of a fast and accurate fault modeling framework: A new methodology for evaluating the transient fault tolerance of program regions is presented. The framework deploys replay execution of selected program traces that provides the same accurate results as full-scale fault injection in a fraction of the experimental evaluation time.

(2) Demonstration of correlation between fault tolerance and program structure: The fundamental code property of the dynamic dependence graph of architecture state is analyzed to expose patterns exhibiting various levels of fault susceptibility. Case studies are examined to reveal relations between source code structures and fault tolerance behavior.

(3) Fault tolerance prediction based on studied program behavior: Estimation of the fault tolerance of an application by constructing an analytical model that accurately assigns fault tolerance to code regions.

The following sections elaborate on each of these contributions. The thesis concludes by proposing possible enhancements to the framework and the scope for future work with this system.

Chapter 2

Background

2.1 Single Event Phenomenon

Radiation effects on processors are a major concern for architects as transistor features shrink. Among these effects, bit flips resulting from ionization by neutron strikes from cosmic rays and by alpha particles are considered a critical source of upsets owing to their random occurrence. These effects, called Single Event Upsets (SEUs), constitute a serious threat to the reliability of digital equipment built on advanced circuits. Single event phenomena can be classified into three effects:

I Single event upset (soft error)
II Single event latch-up (soft or hard error)
III Single event burnout (hard failure)

A single event upset is defined by NASA as a radiation-induced error in microelectronic circuits caused when charged particles lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs [2]. These electron-hole pairs generate charge as they recombine, and if this charge is greater than the critical charge of the device, it results in a change of state. Transient faults fall under this category of SEUs. Figure 2.1 is an example of a neutron strike on a transistor.
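The critical-charge criterion above can be stated in a line of code. A minimal sketch follows; the charge values are illustrative placeholders, not data from [2]:

```python
# An upset occurs only when the charge generated by the particle strike
# exceeds the device's critical charge Qcrit (both in arbitrary charge units).
def causes_upset(q_generated, q_crit):
    return q_generated > q_crit

# Smaller feature sizes lower Qcrit, so the same strike flips more devices.
assert causes_upset(q_generated=30.0, q_crit=15.0)      # state changes
assert not causes_upset(q_generated=30.0, q_crit=45.0)  # charge absorbed safely
```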

Figure 2.1: A neutron strike.

Transient faults are emerging as a major concern for architects designing reliable computer systems [3, 17]. Trends in silicon process technology paint a bleak picture of future fault susceptibility. While the future error rate of a single transistor is expected to stay relatively constant [11, 13], the number of transistors per chip continues to increase at an exponential rate. As a result, the overall error rate for processors is expected to increase dramatically, making fault tolerance as important a design characteristic as performance, power consumption, and heat dissipation. Figure 2.2 shows the effect of the reduction in feature size in the years to come. The degradation rate is about 8% per bit per generation [4] and follows the curve shown in the figure. Hardware designs can be customized for fault-tolerant execution with redundant resources such as latches or extended pipelines. Providing fault tolerance may require the addition of hundreds of thousands of delay latches and 20-30% more logic in an existing processor [28]. Other, more specialized approaches create even more sophisticated systems requiring both hardware and software integration [1, 31]. While these approaches work well in their specific scientific computing domain, the general-purpose design field must adapt to the need for fault tolerance in fundamentally different ways. As design

cycle time is critical, many chip designers propose implementing redundancy-based fault tolerance using existing multi-core and multi-threaded processor extensions [9, 18]. The driving motivation is to extend the engineering decision toward multi-context processors to provide fault tolerance.

Figure 2.2: Feature size vs. soft error rate.

2.2 Architecture-level Fault Analysis

Most recent architecture research is focused on using performance models to provide Architecture Vulnerability Factor (AVF) estimates of processor reliability rather than deploying detailed fault injection into hardware RTL models. AVF is defined as the probability that a fault in a particular structure will result in an error in the final output of a program [19]. A structure's error rate is the product of its raw error rate, as determined by process and circuit technology, and its AVF. Processor designers can use AVF analysis to determine the processor structures probabilistically in need of protection (e.g., structures with high AVF are likely to be protected). Some structures, such as the branch predictor, have no effect on whether an error will propagate to the output of the program. In contrast, other structures are on the opposite end of the spectrum, such

as the instruction issue window, load-store queue, and re-order buffer. The majority of hardware structures fall between the two extremes. While AVF analysis provides support for investigating new fault-tolerant architecture techniques, program execution characteristics are largely missing from the determination of periods of software error susceptibility. A software-centric view makes this key insight: although faults occur at the hardware level, the only faults that matter are the faults that affect software correctness. By changing the boundary of output comparison to software, a software-centric model shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, only faults that affect correctness are detected; benign faults are safely ignored. A software-centric system with both detection and recovery will not need to invoke the recovery mechanism for faults that do not affect correctness. The primary problem with AVF is that software periods of vulnerability differ substantially from micro-architecture periods of vulnerability. As research trends dictate finding ways to selectively enable transient fault tolerance mechanisms, run-time and off-line experimental techniques must be guided equally by program behavior and hardware. As such, it is important to determine and predict when program susceptibility and hardware susceptibility differ.
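The AVF relation above can be sketched in a few lines. This is a minimal illustration of the definition (effective rate = raw rate x AVF); the FIT numbers and AVF values are illustrative placeholders, not measurements from [19]:

```python
# A structure's effective error rate is its raw, circuit-determined rate
# scaled by its architectural vulnerability factor (AVF).
def effective_fit(raw_fit, avf):
    """Effective error rate (FIT) = raw rate x AVF."""
    return raw_fit * avf

structures = {
    "branch_predictor": (100.0, 0.0),  # faults here never reach program output
    "issue_window":     (100.0, 0.4),
    "reorder_buffer":   (100.0, 0.6),
}
# Structures with high AVF are the probabilistic candidates for protection.
chip_fit = sum(effective_fit(raw, avf) for raw, avf in structures.values())
```

Under this toy model, the high-AVF re-order buffer dominates the chip-level rate while the branch predictor contributes nothing, matching the spectrum described above.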

Chapter 3

Motivation

3.1 Overview of current injection methodologies

A major problem in the development of fault-tolerant systems is the accurate determination of the dependability properties of the system. Unlike performance, which can be evaluated through the use of benchmark programs, the degree of fault tolerance and reliability of a system cannot be evaluated in such a manner, because we rarely have the luxury of allowing systems to run for a very long time to observe their behavior under fault effects. The generally preferred solution to this problem is to inject the effects of faults into a simulation model or a prototype implementation, and to observe the behavior of the system under the injected faults. Fault injection in a simulation is very flexible but far too time consuming. On the other hand, it is much more difficult to inject accurate (i.e., realistic) faults into a prototype, though their effect is readily observable. Figure 3.1 shows current methods of fault injection, where a random injection procedure has been adopted; the graph shows the number of injections per benchmark used in recent papers that adopted a random fault injection methodology. Table 3.1 lists the references for each of the published articles numbered in Figure 3.1.

Figure 3.1: Current random injection methodologies.

Number  Short Description                                                Reference
(1)     Soft-error detection through software TFT techniques (1999)      [22]
(2)     Y-Branches (2003)                                                [29]
(3)     Characterizing TF effects on processor pipeline (2004)           [30]
(4)     Configurable TF detection via dynamic binary translation (2006)  [24]
(5)     SymPLFIED (2008)                                                 [20]
(6)     Using PLR to exploit multi-cores for TFT (2007)                  [27]

Table 3.1: References.

A recent paper focusing on instruction-level error derating adopted an interval injection methodology [7]. The injection campaign includes 100 uniformly distributed points of injection in trace lengths of 100 instructions, in 32-bit as well as 64-bit registers, resulting in up to 224 injections per instruction. This method does show a representative set of experiments without having to simulate the entire benchmark.

3.2 Fault penetration and point of injection

Fault injections can be performed at geometric-layout, circuit, gate, or block level models. The block level model is a functional view defining the data and control paths of the application. Logic gates can go through several levels of masking, such as electrical,

latching window, and logical masking, before they can affect the behavior of an application [26]. From a user's point of view, it only matters whether a transient fault causes undesirable effects in the application. Hence our study can utilize fault injections at the block level as long as they emulate the same effect as the propagation of a hardware fault. The process of injecting faults into the architectural registers captures this notion, as shown in Figure 3.2, although logical masking still persists between the architectural and application layers.

Figure 3.2: Fault penetration.

Current injection techniques corrupt a single bit at 1000 random execution points [25, 27]. As discussed previously, such tests do not regard program behavior and have substantial variation. Figure 3.3 demonstrates the cumulative time for 1000 random fault injections. The injection campaign time is sorted from the longest-running injections to the shortest. Some of the runs take only a matter of seconds, while others take several minutes to complete before it can be determined whether the program's behavior was changed. Based on Figure 3.3, performing statistically significant fault injection using random or interval-based schemes would require substantial computational effort. Clearly, fault injections into program state must be strategically guided to gather fault outcomes for only certain regions of interest.

Figure 3.3: Fault injection distribution in time.

3.3 Natural fault resilience and ineffective injections

As stated above, though electrical and latching-window masking is accounted for by injecting faults at the architectural level, logical masking of faults is overlooked by random injection methods. Examples of logical masking effects include logical operations, conditional operations, overwriting faulted values before use, binary return values, dynamically dead instructions [6], and silent stores [15]. This is why a large number of random injections do not expose the fault susceptibility of a program. For example, Table 3.2 shows the probable levels of tolerance for logical operations at the bit level. The table compares a logical operation of a register either with itself or with another operand; the entries follow from the truth tables of the operations. From the table it can be seen that an XOR operation of a register on itself has 100% fault tolerance, since it simply clears the register. Hence any bit perturbation before this operation would have no effect whatsoever.

Operation            Other Operand   Itself
AND, OR, NAND, NOR   50%             0%
XOR, XNOR            0%              100%
NOT                  0%              -

Table 3.2: Fault tolerance characteristics of logical operations.

Figure 3.4 is an example of logical masking obtained from a trace of 186.crafty. The figure shows the source code of a function from the benchmark along with its dynamic block execution. It can be seen from the source code that all this function does is return a binary value (true or false) based on the movement of a white or black piece in a game of chess. The analysis shows that with such a dependence flow, the effect of flipping bits would hardly impact the return values.

Random methods of fault injection impose a limitation on the number of injections per unit time, a high percentage of which may expose little susceptibility, as in the above example. These limitations motivate the development of a system that is not only orders of magnitude faster but also as accurate as full injections, and that can capture the properties of the program that affect its fault susceptibility. TEFIS (Trace Emulation Fault Injecting System) is a technique for emulating hardware faults using software. It uses PIN, a binary instrumentation tool, for tracing and emulation, which form the two major portions of the framework. Traces are captured along with their state information from a running executable based on various user-defined parameters. This is followed by a rigorous fault injection procedure and replay execution of the faulted trace, but only up to the length of the trace. This approach provides the desired flexibility and, at the same time, allows many experimental runs in a relatively short period of time.
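The self-operand entries of Table 3.2 can be checked mechanically. The sketch below is a minimal illustration in Python; the 32-bit register model is an assumption for the illustration, not part of any injection tooling:

```python
# Flip one bit of a 32-bit value to emulate a single-event upset.
def flip(value, bit):
    return (value ^ (1 << bit)) & 0xFFFFFFFF

r = 0xDEADBEEF
for bit in range(32):
    f = flip(r, bit)
    # XOR of a register with itself clears it, so the fault is fully masked:
    assert f ^ f == r ^ r == 0
    # AND of a register with itself preserves it, so the fault always shows:
    assert f & f != r & r

# With another operand, AND masks a flipped bit whenever the other operand's
# bit is 0 -- half of all uniformly random operand bits (the 50% table entry).
other = 0b1010
assert (flip(0, 0) & other) == (0 & other)  # bit 0 of other is 0: masked
assert (flip(0, 1) & other) != (0 & other)  # bit 1 of other is 1: exposed
```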

(a) Source code    (b) Dynamic control flow

Figure 3.4: A highly fault-tolerant case from 186.crafty with 80.6% correct results of emulation.

Chapter 4

Fault Emulation Framework and Results

4.1 Overview and Goals

Software-implemented fault injection methodologies can be broadly classified into compile-time and run-time injections. A run-time fault injection system truly emulates the effect of a real transient fault occurrence. Current methods of run-time software fault injection are either time based or interrupt driven. These are program-level fault injection techniques that are generally unguided: faults are injected randomly, at intervals, or within phases of the application. Phase-based injection exploits some of the properties of an application. In any case, these methods suffer from the following drawbacks:

Experimental time for a single injection: This time consists of two parts, the build-up time and the monitoring time. The build-up time is the time required for full execution up until the fault point, while the monitoring time is what follows the injection until an outcome is reached.

Multiplied impact of the single-injection overhead: Study of the complete application is limited by the single-injection overhead multiplied over the number of injections applied to the application. For evaluating

the fault tolerance of the application, a statistically significant number of injections is necessary, which consumes a lot of time and limits the exposure of the application to a small number of faults.

Inaccuracy in determining the dependability properties of a system: Due to the combined effect of the above two points, the limited number of injections provides little certainty in understanding the dependability behavior of the system under all kinds of faults.

Keeping these limitations in mind, there is a need for a system that overcomes the requirement of running a full execution of the application for each injection, and that can explore the dependability properties of a system that affect its fault susceptibility. The trace emulation fault injecting system (TEFIS) has been developed with the following goals in mind:

(1) Accuracy in fault tolerance with fractional execution: The framework employs fractional traces for execution within the binary. Traces of any length can be obtained at any time point within the application. Accuracy is determined by comparing the results of this fractionally injected execution to those of full injections. Figure 4.1 shows a comparison between full injection and fractional emulation on trace lengths of 100 and 200 instructions, where the dark regions show the fractional length of execution.

(2) Very high speed injections and summary generation: The length of a single run extends from the point of injection to the end of the trace. This eliminates the build-up time before the point of injection and the monitoring time after the end of the trace, limiting the execution time to only the number of instructions per trace. This method not only speeds up result collection but also provides room for a large number of emulations.

Figure 4.1: Fractional emulation captures the effect of fractional execution with traces against full injections in the entire application. The figure shows the exclusion of build-up and monitoring time.

(3) Expose deeper understanding of program behavior: Program properties have been studied based on their possible impact on the fault susceptibility of the application. Two methods of correlation have been described and used in this analysis: code region similarity and data dependence similarity.

4.2 The TEFIS Framework

The framework has been designed to have functionally independent units, presented here in the order in which they are executed within the framework. The three units of this procedure are execution tracing, fault generation, and fault emulation. The tracing and emulation system uses PIN [21], a binary instrumentation tool developed at Intel. PIN is used in this framework for generating and loading context information at the instruction level.

4.2.1 Execution Tracing

Execution tracing is a one-time procedure involving the generation of traces from the binary, with a number of controllable parameters of operation. The framework provides operational flexibility on tracing methods with the following options:

Length - Traces of any length can be generated
Time - Any point in time
Interval - Uniform intervals with uniform lengths
Phase - Integration with SIMPOINTS [5]
Function - Function-specific tracing

The system uses a tool called ExecutionTracer for this purpose. The ExecutionTracer is a PIN [21] tool capable of dumping snippets of the binary at run time, with predefined knobs for the user to control the length and starting point of the trace. Once these parameters are passed in, every instruction is instrumented before its execution. This instrumentation gathers the following information:

Trace context: Disassembly with the instruction pointer, which gives control flow information
Register context: The value stored within each of the 32-bit registers
Memory context: Any values read from or written to memory, along with their addresses
Dependence graphs: Data flow dependence information generated in graphical format with the help of dot [12]

Edge context: Edge information at each basic block (not currently used)

An example format for context dumps within a trace is shown in Figure 4.2. The figure shows a register context, which is generated for each instruction; a memory context, if one exists for that instruction; and the disassembly for that instruction, in the form of a trace context with its instruction pointer information. The convention for storing each context is the context name followed by the trace number. For trace generation these files form static storage, but for fault emulations they are created and destroyed on the fly as required.

Figure 4.2: Context dump.

4.2.2 Fault Generation

Fault generation follows the one-time tracing procedure. It is a highly iterative process that repeats a number of times per trace. The bit flips required for fault emulation are generated in this step. The tool used for bit flips unfolds through a number of stages, listed sequentially as follows:

Trace - A trace generated from the binary forms the top-level entry
Instruction - Each instruction within a trace forms the next level of entry

Source registers - Every 32-bit source register of an instruction, if present, forms the next level of entry
Bits - Every bit in the 32-bit source register is flipped, each flip forming a single emulation

Figure 4.3 gives an overview of the tracing and fault generation procedure. The right half of the figure depicts the one-time tracing system, while the left half shows the fault generation and emulation steps. The box in the middle shows a possible future step of filtering less susceptible instructions. The right half has three columns representing the binary, a trace, and the context. The binary is shown split into traces, with each trace having a number of instructions and each instruction having a context associated with it. The fault generation portion is shown for a single trace. The figure shows the number of emulations for each instruction in the trace; within each instruction, 32 flips per source register are shown for each emulation. This structure repeats over all the traces within the executable. Faulted contexts are saved for each emulation in a flat directory structure, with extensions to each file name representing the trace and emulation it belongs to. This step prepares the faulted contexts for their actual runs during fault emulation.

4.2.3 Fault Emulation

This step runs the binary loaded with the faulted context. Each run consists of a context with a single bit flipped within a source register of an instruction within the trace. Though there are a large number of runs of the executable, their execution length is limited to the end of the trace to which the emulation belongs. Faulted register contexts from the fault generation stage, along with the memory context dumped during tracing, are loaded at run time using instrumentation via PIN [21].
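The fault-generation hierarchy and the bounded replay described above can be sketched together as a toy model. The mini-ISA, field names, and single-register machine below are illustrative assumptions; TEFIS itself drives a real binary through PIN:

```python
# Fault generation: every bit of every 32-bit source register of every
# instruction in the trace yields one candidate emulation.
def enumerate_faults(trace):
    for i, insn in enumerate(trace):
        for reg in insn["src_regs"]:
            for bit in range(32):
                yield (i, reg, bit)

# Fault emulation: load the dumped register context at the injection point,
# flip one bit, and replay only to the end of the trace.
def emulate(trace, step, inject_at, reg, bit):
    regs = dict(trace[inject_at]["registers"])
    regs[reg] ^= (1 << bit)            # the injected single-bit fault
    for insn in trace[inject_at:]:
        regs = step(insn, regs)
    return regs

# Toy single-register ISA used only for this sketch.
def step(insn, regs):
    if insn["op"] == "inc":
        regs = dict(regs, eax=(regs["eax"] + 1) & 0xFFFFFFFF)
    elif insn["op"] == "xor_self":     # clears eax: logically masks prior faults
        regs = dict(regs, eax=0)
    return regs

trace = [
    {"op": "inc", "src_regs": ["eax"], "registers": {"eax": 7}},
    {"op": "inc", "src_regs": ["eax"], "registers": {"eax": 8}},
]
assert len(list(enumerate_faults(trace))) == 64   # 2 insns * 1 reg * 32 bits
# The unfaulted replay from instruction 0 ends with eax == 9; a bit-0 flip at
# the same point propagates to a different final context (data-flow deviation).
assert emulate(trace, step, 0, "eax", 0) == {"eax": 8}
```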

Figure 4.3: Framework overview.

Since the instruction pointer is known from the trace, the exact dumped contexts for an instruction are loaded at the exact same point, but with one of the source register bits flipped. After this initial loading, the binary is run from this state for the same number of instructions that follow it in the original trace. This process is repeated over all emulations up to half of the trace length, i.e., the first 50% of the instructions within the trace. The framework lets each trace represent a folder, with all the emulations for the trace residing in that folder.

Figure 4.4 shows the fault emulation system, with the reference trace on the left and the emulation trace on the right. The reference trace is shown for the purpose of result comparison. The figure shows a point of injection, which consists of loading the memory context and the faulted register context based on the context gathered for the same instruction in the reference trace. When the binary is run with the state set in such a manner, there are five different possibilities in execution:

Data flow deviation: This results in a context mismatch, with or without a control flow deviation.

Control flow deviation: Instruction pointers show a mismatch in the traces,

generally accompanied by a data flow deviation, as expected, since a totally new set of instructions is encountered.

Signal fault: Trace execution aborts due to an inappropriate memory access, signaled as an error by the operating system.

Timeout: Cases where the trace gets stuck in a loop or hangs, in which case execution is forced to halt by a timer. These cases are generally very rare due to the design of the system.

None: The emulation exactly matches the reference trace in all contexts.

Figure 4.4: Fault emulation.

The results are based on comparing the contexts of the reference trace with those of the emulated trace at the end of execution of the last instruction. The results are placed in bins indicating mismatches, signal faults, or correct execution.

4.3 Trace Emulation Experimental Results and Analysis

Twelve sets of integer benchmarks from the SPEC2000 suite were used as candidate applications for the fault emulation experiments. Full injections were also performed on

the same set of traces that were used for emulation. A set of traces was generated per benchmark for the experiment. Trace lengths of 100 and 200 instructions were selected, and emulations were run across half the length of each trace. Since the 100- and 200-instruction traces share the same starting point, each 200-instruction trace covers the characteristics of the corresponding shorter trace and more.

Fault Tolerance Categories

Each emulation was compared with its reference trace on the fly, and results were generated. The grouping of emulation results can be summarized as follows:

Mismatch: The register contexts, the memory contexts, or the control flow mismatch.

Signal Fault: The emulation tool is equipped with handlers to catch any signal faults raised by the operating system.

None: All contexts and the control flow match at the end of the trace.

Figure 4.5 shows the fault tolerance characteristics of each benchmark, based on the overall result categories above. The groupings in the left figure are made using cut-off percentages: "Mostly incorrect" signifies over 50% incorrect results with fewer than 10% correct entries in that trace; the segmentation fault case is analogous; and the "Correct" bin accumulates all traces with over 10% correct entries. The figure displays the overall fault tolerance behavior of each benchmark; it can be seen that 186.crafty and 256.perlbmk are highly fault tolerant.

Figure 4.5: Result categories.

Accuracy of Emulations

As described previously, the goals of the framework are to demonstrate two contributions: the accuracy of the framework and its speed of execution, both measured against full injections. For this purpose, the same traces used for emulation were taken, and full injections were performed on each bit of every source register in every instruction of the trace. Since full injections run the complete binary and base their results on the entire execution, the emulation results are compared against them to determine accuracy.

Figure 4.6 shows the accuracy of emulations with two trace lengths against full injections. The figure shows three vertical bars per benchmark: the first two are emulations with trace lengths of 100 and 200 instructions respectively, while the last bar shows full injections. The categories are in the same order as mentioned before, and the Y axis shows the percentage contribution per category. The accuracy of emulations is determined by how closely the correct entries for the 100- and 200-instruction emulations track the full injections. The figure shows that accuracy is fairly good for both trace lengths: though application dependent, the real injection data lies fairly close to the emulation results observed with the above

trace lengths. An enhancement to this work would examine trace lengths increasing beyond 200 instructions. An experiment was conducted to monitor the position of the control flow deviation caused by fault injections; it was observed that a deviation generally occurs within 100 to 200 basic blocks, which gives an approximate length of 1000 instructions assuming an average of 5 instructions per basic block. A saturating curve for correctness should therefore be observed as trace lengths approach this value. A follow-up analysis would check whether the trend of correct entries approaches the full-injection results as trace length grows.

Figure 4.6: Emulation accuracy.

Average Execution Time of Fault Emulations Against Full Injections

Figure 4.7 compares the execution times of 1000 emulations against the same number of full injections. Notice that the Y axis is on a logarithmic scale. It can be seen that the execution times for emulations are much smaller than those of the full

injections. Moreover, the emulation execution time remains roughly constant across all benchmarks, since the number of instructions executed remains more or less the same.

Figure 4.7: A comparison of execution time.

These two results are the most important, as they establish both the accuracy of the emulation system and its ability to achieve execution times orders of magnitude lower than those of full injections. The system also has the flexibility to add an extra stage for ineffective-instruction filtering, which would further improve its ability to capture the fault susceptibility of an application while reducing execution times.

Chapter 5

Analysis of Fault Tolerance Program Behavior

The previous chapter demonstrated the accuracy and reduced execution time of the emulation framework. This chapter correlates the fault tolerance observed with the framework to program behavior, which is necessary for understanding the relation between the properties of a running program and its fault susceptibility. The criterion for selecting such properties is the assumption that similar code execution should exhibit similar fault susceptibility.

5.1 Source Code Analysis

Finding the properties of a program that affect its fault tolerance requires a deeper understanding of what happens within the source. An application's fault susceptibility may be analyzed either at the algorithm level or through its dynamic assembly. This section examines algorithmic effects on fault tolerance and performs a source code analysis to understand program behavior from a reliability point of view.

Algorithm Level Effects

To observe an algorithm-level effect on fault tolerance, fault injections were applied to sorting routines that perform the same function on the same inputs

but with different algorithms. These applications were also compiled under different optimization settings to observe the effect of optimizations on each. A portion of the results is displayed in Figure 5.1 to show the effect of the algorithm on the fault tolerance of an application.

Figure 5.1: Fault tolerance analysis of sorting routines: (a) heapsort, (b) quicksort.

Dynamic Source Code Behavior

A PIN [21] tool was written to track the dynamic flow of instructions through basic blocks along with their disassembly. The tool takes a function name and dumps a static control flow graph annotated with markings for its dynamic execution. The function names were obtained from the functions occurring in the traces, which were dumped by a script built on another simple PIN [21] tool. This gives an idea of what the trace is actually executing and a complete view of how a fault propagates. Sections of these traces were processed through this tool to analyze which instructions a fault encounters as it flows dynamically through the program.

A fault tolerant trace has already been illustrated in the motivation section. This section presents further cases: traces with low fault tolerance and traces with high segmentation fault rates.

Figure 5.2 shows a case of low fault tolerance, about 91.4% incorrect behavior, in 164.gzip. The trace covers a function called updcrc(), which performs a CRC check and is hence compute intensive:

    /* Source function */
    ulg updcrc(s, n)
        uch *s;       /* pointer to bytes to pump through */
        unsigned n;   /* number of bytes in s[] */
    {
        register ulg c;                      /* temporary variable */
        static ulg crc = (ulg)0xffffffffL;   /* shift register contents */

        if (s == NULL) {
            c = 0xffffffffL;
        } else {
            c = crc;
            if (n) do {
                c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
            } while (--n);
        }
        crc = c;
        return c ^ 0xffffffffL;  /* (instead of ~c for 64-bit machines) */
    }

Figure 5.2: A trace from 164.gzip with 91.4% incorrect results of emulation: (a) source code, (b) dynamic control flow.

Compared to the highly fault tolerant case, this layout has no binary decisions or masking logic to squash corrupted values. Instead, the operation just before the function returns XORs the result with an all-ones constant, which necessarily flips every bit; from the context point of view, this value will never match after a fault injection.

Figure 5.3 shows a case with a high segmentation fault rate, here a function from 300.twolf. The source code shows that this function traverses a linked list and assigns values within the traversal. Since most of the assignments go through pointers, the function is a very likely candidate for invalid or misaligned memory accesses.

The study above calls for properties closer to the source, selected by their impact on fault tolerance. Two such properties and their effects on the fault susceptibility of a program have been chosen: the following sections analyze code region similarity and the data dependencies of a program as tools for correlating its fault tolerance.

5.2 Dynamic Program Trace Behavior

This section analyzes program behavior based on the similarity of code regions and its effect on the fault tolerance of the application. The analysis compares the existing traces with each other, looking for matches in program counter values. A match indicates the program executing in the same code region, though probably in a different dynamic state.

Figure 5.3: A trace from 300.twolf with 100% segmentation fault results of emulation: (a) source code, (b) dynamic control flow.

The correlation with fault tolerance is based on generating a similarity score for each pair of traces, indicating how similar the two traces are. The score generation procedure takes two traces, finds the percentage of matching program counters from one trace to the other and vice versa, and averages the two values into a similarity score. This procedure is iterated over all pairs of traces. Figure 5.4 shows the overall correlation of fault tolerance to PC similarity across all benchmarks. The different bins represent the delta

in fault tolerance for trace pairs with similarity scores above 65%.

Figure 5.4: Fault tolerance correlation with program counter similarity.

To observe the correlation, the difference in fault tolerance between pairs of traces with similarity scores of 65% and above is plotted in Figure 5.5. The figure plots the program counter similarity score against a function of the fault tolerance delta between each pair of traces, with a third-degree polynomial fitting curve depicting the trend of fault susceptibility against code region similarity. Two trends are observable: 175.vpr and 181.mcf show a very promising decrease in the fault tolerance delta with an increasing similarity score, while the compression benchmarks 164.gzip and 256.bzip2 show very irregular behavior. This irregularity could be due to large variations in dynamic state: similar instructions are executed, but at different times. These applications also tend to have more inter-procedural calls than the others.

Figure 5.5: Fault susceptibility correlation calculated using similar program code points: (a) 175.vpr, (b) 256.bzip2, (c) 181.mcf, (d) 164.gzip.

Inter-Procedural Fault Tolerance

The existing trace information was used to correlate the number of procedures in a trace with its fault susceptibility. As expected, fault tolerance decreases with an increasing number of inter-procedural calls within a trace. Figure 5.6 shows the average fault tolerance across all traces as the number of procedures per trace increases. The figure also shows how often each procedure count occurs among a set of 1000 traces, scaled by a factor of 0.2 for visibility in the same region. Traces containing a single function dominate the rest, while the fewer traces with higher procedure counts show the highest fault susceptibility.

This study indicates that though certain applications show a favorable trend of fault tolerance with code similarity, others depend on the dynamic state of the system. This is taken into account in the following section, where data

dependencies, the primary propagators of a fault, are analyzed and compared among traces.

Figure 5.6: Inter-procedural fault tolerance.

5.3 Dynamic Dependence Graph Representation

Program counter correlation captures similarity in terms of functional execution, but fails to capture the dynamic data flow that propagates or masks the effect of a transient fault. The propagation of a fault depends on the data dependencies of a program, and this information needs to be captured and used to find similar trends that correlate with the fault susceptibility of an application. Figure 5.7 shows an example of a data dependency graph in dot format; the reliability of the system is inherent in the connectivity of the graph.

For this purpose, data dependency graphs are generated from the traces by a graph clustering tool developed by Dennis Shasha at NYU [8]. The tool takes a data set of directed graphs generated by a PIN [21] tool which instruments the program and dumps data dependency information. The graph clustering tool uses SUBDUE [14] to find common sub-structures in a given trace. The tool then iterates

over all traces to find similar structures and clusters them together. The tightness measure of a cluster, which defines the precision of the similarity, can also be specified.

Figure 5.7: Example dependency graph.

This graph clustering procedure was iterated over all the generated traces, and a similarity score was calculated for each pair of traces based on the closeness of the dependence graphs they contain. This analysis directly correlates fault tolerance with the dynamic dependencies of the program, which are the active ingredient in fault propagation. For a larger view, similarity scores of 35% and above have been plotted against their respective fault tolerances; in each case the graph on the left gives the actual raw numbers while that on the right is a polynomial fitting of the graph on the left. The cases in Figure 5.8 are the same as those shown for program counter similarity, yet they show a decreasing trend in fault tolerance with an increasing dependence similarity score. A similar trend was observed among all the other benchmarks, with very few outliers.

Figure 5.8: Dependence graph similarity correlation: (a) 164.gzip, (b) 164.gzip - trend, (c) 175.vpr, (d) 175.vpr - trend.

Figure 5.9 shows the dependence similarity correlation over all the benchmarks. From the figure, it can be observed that the trend remains similar when averaged over all the benchmarks: the delta in fault tolerance decreases with increasing similarity, although a slight rise in the curve can be observed at the end.

Figure 5.9: Dependence graph similarity correlation - all benchmarks.

5.4 Estimating Program Fault Tolerance

Based on the two correlation techniques of the previous sections, it is evident that dependency graph similarity correlates much more accurately with fault tolerance. This technique is therefore put to use in predicting fault tolerance from the consistent trends of fault susceptibility seen against graph similarity scores. If the prediction accuracy falls within an acceptable range, one could look only at the dependence structure of a program and predict its fault tolerance without performing any injections.

The existing results for the traces from the emulation framework, along with their dependency graphs, were used to build prediction models. The prediction model uses an increasing number of the graphs most similar to the reference graph in order to predict its fault tolerance, averaging the fault tolerance values whenever more than one graph is used. Ten models have been selected: the first model predicts the fault tolerance of a trace from the fault tolerance of the single trace with the most similar dependency graph, while the higher-numbered models select the designated number of most similar graphs and average their fault tolerance scores for the prediction. The accuracy of the prediction model is observed as the number of similarity graphs in use increases.

Figure 5.10 gives the accuracy of each prediction model averaged over all the traces of the 12 benchmarks. The X axis represents the model by the number of similarity graphs used for its prediction, and the Y axis represents the average accuracy of prediction. Since the Y axis is the difference between the actual and predicted fault tolerance averaged over all traces, a value closer to the X axis represents higher prediction accuracy. Based on the figure, it is evident that the accuracy in prediction of fault tolerance


More information

INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS

INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS A Dissertation Presented by Vilas Keshav Sridharan to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements

More information

Deviceless respiratory motion correction in PET imaging exploring the potential of novel data driven strategies

Deviceless respiratory motion correction in PET imaging exploring the potential of novel data driven strategies g Deviceless respiratory motion correction in PET imaging exploring the potential of novel data driven strategies Presented by Adam Kesner, Ph.D., DABR Assistant Professor, Division of Radiological Sciences,

More information

Q.1 Explain Computer s Basic Elements

Q.1 Explain Computer s Basic Elements Q.1 Explain Computer s Basic Elements Ans. At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some

More information

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Subash Chandar G (g-chandar1@ti.com), Vaideeswaran S (vaidee@ti.com) DSP Design, Texas Instruments India

More information

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory Semiconductor Memory The basic element of a semiconductor memory is the memory cell. Although a variety of

More information

Chapter I INTRODUCTION. and potential, previous deployments and engineering issues that concern them, and the security

Chapter I INTRODUCTION. and potential, previous deployments and engineering issues that concern them, and the security Chapter I INTRODUCTION This thesis provides an introduction to wireless sensor network [47-51], their history and potential, previous deployments and engineering issues that concern them, and the security

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

Dynamic Branch Prediction

Dynamic Branch Prediction #1 lec # 6 Fall 2002 9-25-2002 Dynamic Branch Prediction Dynamic branch prediction schemes are different from static mechanisms because they use the run-time behavior of branches to make predictions. Usually

More information

Branch statistics. 66% forward (i.e., slightly over 50% of total branches). Most often Not Taken 33% backward. Almost all Taken

Branch statistics. 66% forward (i.e., slightly over 50% of total branches). Most often Not Taken 33% backward. Almost all Taken Branch statistics Branches occur every 4-7 instructions on average in integer programs, commercial and desktop applications; somewhat less frequently in scientific ones Unconditional branches : 20% (of

More information

Bits, Words, and Integers

Bits, Words, and Integers Computer Science 52 Bits, Words, and Integers Spring Semester, 2017 In this document, we look at how bits are organized into meaningful data. In particular, we will see the details of how integers are

More information

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control, UNIT - 7 Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed Control Page 178 UNIT - 7 BASIC PROCESSING

More information

hot plug RAID memory technology for fault tolerance and scalability

hot plug RAID memory technology for fault tolerance and scalability hp industry standard servers april 2003 technology brief TC030412TB hot plug RAID memory technology for fault tolerance and scalability table of contents abstract... 2 introduction... 2 memory reliability...

More information

11 Data Structures Foundations of Computer Science Cengage Learning

11 Data Structures Foundations of Computer Science Cengage Learning 11 Data Structures 11.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define a data structure. Define an array as a data structure

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

11. SEU Mitigation in Stratix IV Devices

11. SEU Mitigation in Stratix IV Devices 11. SEU Mitigation in Stratix IV Devices February 2011 SIV51011-3.2 SIV51011-3.2 This chapter describes how to use the error detection cyclical redundancy check (CRC) feature when a Stratix IV device is

More information

Satisfactory Peening Intensity Curves

Satisfactory Peening Intensity Curves academic study Prof. Dr. David Kirk Coventry University, U.K. Satisfactory Peening Intensity Curves INTRODUCTION Obtaining satisfactory peening intensity curves is a basic priority. Such curves will: 1

More information

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again

More information

c 2004 by Ritu Gupta. All rights reserved.

c 2004 by Ritu Gupta. All rights reserved. c by Ritu Gupta. All rights reserved. JOINT PROCESSOR-MEMORY ADAPTATION FOR ENERGY FOR GENERAL-PURPOSE APPLICATIONS BY RITU GUPTA B.Tech, Indian Institute of Technology, Bombay, THESIS Submitted in partial

More information

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015 Advanced Parallel Architecture Lesson 3 Annalisa Massini - 2014/2015 Von Neumann Architecture 2 Summary of the traditional computer architecture: Von Neumann architecture http://williamstallings.com/coa/coa7e.html

More information

Accurate Analysis of Single Event Upsets in a Pipelined Microprocessor

Accurate Analysis of Single Event Upsets in a Pipelined Microprocessor Accurate Analysis of Single Event Upsets in a Pipelined Microprocessor M. Rebaudengo, M. Sonza Reorda, M. Violante Politecnico di Torino Dipartimento di Automatica e Informatica Torino, Italy www.cad.polito.it

More information

Execution-based Prediction Using Speculative Slices

Execution-based Prediction Using Speculative Slices Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers

More information

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Jong Wook Kwak 1, Seong Tae Jhang 2, and Chu Shik Jhon 1 1 Department of Electrical Engineering and Computer Science, Seoul

More information

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What

More information

Evaluation of Embedded Operating System by a Software Method *

Evaluation of Embedded Operating System by a Software Method * Jan. 2006, Volume 3, No.1 (Serial No.14) Journal of Communication and Computer, ISSN1548-7709, USA * Junjie Peng 1, Jun Ma 2, Bingrong Hong 3 (1,3 School of Computer Science & Engineering, Harbin Institute

More information

Intelligent Networks For Fault Tolerance in Real-Time Distributed Systems. Jayakrishnan Nair

Intelligent Networks For Fault Tolerance in Real-Time Distributed Systems. Jayakrishnan Nair Intelligent Networks For Fault Tolerance in Real-Time Distributed Systems Jayakrishnan Nair Real Time Distributed Systems A Distributed System may follow a traditional Master-Slave Approach for Task Allocation

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects

More information

Chapter 7 The Potential of Special-Purpose Hardware

Chapter 7 The Potential of Special-Purpose Hardware Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture

More information

RTL Power Estimation and Optimization

RTL Power Estimation and Optimization Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL

More information

Milind Kulkarni Research Statement

Milind Kulkarni Research Statement Milind Kulkarni Research Statement With the increasing ubiquity of multicore processors, interest in parallel programming is again on the upswing. Over the past three decades, languages and compilers researchers

More information

Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing

Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing A. Rao +, A.P. Jayasumana * and Y.K. Malaiya* *Colorado State University, Fort Collins, CO 8523 + PalmChip Corporation,

More information

A Fault Tolerant Superscalar Processor

A Fault Tolerant Superscalar Processor A Fault Tolerant Superscalar Processor 1 [Based on Coverage of a Microarchitecture-level Fault Check Regimen in a Superscalar Processor by V. Reddy and E. Rotenberg (2008)] P R E S E N T E D B Y NAN Z

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

DATABASE SCALABILITY AND CLUSTERING

DATABASE SCALABILITY AND CLUSTERING WHITE PAPER DATABASE SCALABILITY AND CLUSTERING As application architectures become increasingly dependent on distributed communication and processing, it is extremely important to understand where the

More information

High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs

High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs Vol. 3, Issue. 5, Sep - Oct. 2013 pp-2894-2900 ISSN: 2249-6645 High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs M. Reddy Sekhar Reddy, R.Sudheer Babu

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS)

High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS) The University of Akron IdeaExchange@UAkron Mechanical Engineering Faculty Research Mechanical Engineering Department 2008 High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS) Ajay

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Operating Systems: Internals and Design Principles You re gonna need a bigger boat. Steven

More information

Salford Systems Predictive Modeler Unsupervised Learning. Salford Systems

Salford Systems Predictive Modeler Unsupervised Learning. Salford Systems Salford Systems Predictive Modeler Unsupervised Learning Salford Systems http://www.salford-systems.com Unsupervised Learning In mainstream statistics this is typically known as cluster analysis The term

More information

Control Hazards. Branch Prediction

Control Hazards. Branch Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

CNC Milling Machines Advanced Cutting Strategies for Forging Die Manufacturing

CNC Milling Machines Advanced Cutting Strategies for Forging Die Manufacturing CNC Milling Machines Advanced Cutting Strategies for Forging Die Manufacturing Bansuwada Prashanth Reddy (AMS ) Department of Mechanical Engineering, Malla Reddy Engineering College-Autonomous, Maisammaguda,

More information

Improving Achievable ILP through Value Prediction and Program Profiling

Improving Achievable ILP through Value Prediction and Program Profiling Improving Achievable ILP through Value Prediction and Program Profiling Freddy Gabbay Department of Electrical Engineering Technion - Israel Institute of Technology, Haifa 32000, Israel. fredg@psl.technion.ac.il

More information

Design of CPU Simulation Software for ARMv7 Instruction Set Architecture

Design of CPU Simulation Software for ARMv7 Instruction Set Architecture Design of CPU Simulation Software for ARMv7 Instruction Set Architecture Author: Dillon Tellier Advisor: Dr. Christopher Lupo Date: June 2014 1 INTRODUCTION Simulations have long been a part of the engineering

More information

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015 Advanced Parallel Architecture Lesson 3 Annalisa Massini - Von Neumann Architecture 2 Two lessons Summary of the traditional computer architecture Von Neumann architecture http://williamstallings.com/coa/coa7e.html

More information

Exploiting Value Prediction for Fault Tolerance

Exploiting Value Prediction for Fault Tolerance Appears in Proceedings of the 3rd Workshop on Dependable Architectures, Lake Como, Italy. Nov. 28. Exploiting Value Prediction for Fault Tolerance Xuanhua Li and Donald Yeung Department of Electrical and

More information

The Need for Speed: Understanding design factors that make multicore parallel simulations efficient

The Need for Speed: Understanding design factors that make multicore parallel simulations efficient The Need for Speed: Understanding design factors that make multicore parallel simulations efficient Shobana Sudhakar Design & Verification Technology Mentor Graphics Wilsonville, OR shobana_sudhakar@mentor.com

More information

METAL OXIDE VARISTORS

METAL OXIDE VARISTORS POWERCET CORPORATION METAL OXIDE VARISTORS PROTECTIVE LEVELS, CURRENT AND ENERGY RATINGS OF PARALLEL VARISTORS PREPARED FOR EFI ELECTRONICS CORPORATION SALT LAKE CITY, UTAH METAL OXIDE VARISTORS PROTECTIVE

More information

Duke University Department of Electrical and Computer Engineering

Duke University Department of Electrical and Computer Engineering Duke University Department of Electrical and Computer Engineering Senior Honors Thesis Spring 2008 Proving the Completeness of Error Detection Mechanisms in Simple Core Chip Multiprocessors Michael Edward

More information