Design of Fault Tolerant Software
Andrea Bondavalli
CNUCE/CNR, via S.Maria 36, Pisa, Italy.

Abstract

In this paper we deal with structured software fault tolerance. Structured software fault tolerance comprises those techniques in which redundancy (for both detection and correction) is applied to individual blocks of software with the goal of masking or revealing errors internal to the block. Each technique has its own way of structuring the interactions among redundant parts and of managing the added complexity. We discuss some of the open problems of software fault tolerance structures and the issues related to their effectiveness. In particular we address generality and flexibility, which can be improved at the price of adding complexity to the design, and discuss the need for proper trade-offs between generality and flexibility on the one hand and complexity on the other. The SCOP (Self-Configuring Optimal Programming) scheme [7, 24] constitutes an interesting example of such a trade-off. SCOP is a fault tolerant system structuring method, originally intended to improve run-time efficiency, that is also very flexible and general. Finally, together with the mechanisms used by SCOP for obtaining flexibility and generality, some recent research aimed at reducing its complexity is described.

1 Introduction

As computer systems are used in modern society for many critical applications, it is commonly recognised that it is necessary to improve their reliability and, more generally, their dependability. Since the early seventies it has also become apparent that obtaining software dependability (i.e., coping with design faults) constitutes a major problem. The development of dependable computing consists in the combined utilisation of a large number of techniques that can be classified into fault tolerance and fault prevention. Fault prevention techniques aim at a product that is, as far as possible, free and likely to remain free from internal defects (faults).
Fault tolerance techniques aim at "tolerating", by means of redundancy, the effects of faults; that is, they cope with the effects of faults and avert the occurrence of failures, or at least warn a user that errors have been introduced into the state of the system. All the techniques known as 'software fault tolerance' are based on the concept of diversity of design [4] or data [1] (the principle of "double checking one's results" already found in Babbage's work) and are characterised by an emphasis on structuring and systematicity to make these concepts applicable in practice. Design diversity is the approach in which two or more components (variants) of a system are produced through independent designs to deliver the same service. The major advantage of design diversity is that it does not require the complete absence of design faults, but only that they should not produce similar errors in the variants. Classical techniques for tolerating software faults are recovery blocks [18] and N-version programming [3], which can be seen as extreme organisations following the design diversity approach, together with other intermediate or combined techniques [13, 16, 19, 23].

Recovery blocks (RB) were the first scheme designed to provide software fault tolerance. In this approach, variants are named alternates and the main part of the adjudicator is an acceptance test that is applied sequentially to the results produced by the variants. The variants are usually executed serially on a single processor. The execution time of a recovery block is normally that of the first variant, the acceptance test, and the operations required to establish and discard a checkpoint. This does not impose a high run-time overhead unless an error is detected and backward recovery is required. In this regard, RB is highly efficient. Limitations of the RB method are mainly related to its acceptance test.
This test is usually derived from the semantics of a given application, and close dependency between the test and the variants may impair the dependability of the whole system. Moreover, the development of simple, effective acceptance tests is a difficult task.

The N-version programming (NVP) approach avoids the use of an acceptance test by taking advantage of parallel execution of multiple versions and comparison of their results (although sequential execution is conceptually possible, just as parallel execution of RB alternates is possible). NVP is a direct application of the hardware N-modular redundancy approach (NMR) to software. Many adjudication mechanisms, usually based on result comparison and in most cases independent of the semantics of the application, are available and can be selected to determine a single adjudication result from a set or a subset of all the results of the variants. Here the probability of common mode failure between the adjudicator and the variants is relatively low. When variants are executed in parallel, NVP may have a fixed response time, thereby guaranteeing timely responses in the presence of faults. However, it utilises redundancy in a static manner and always executes all the versions regardless of the normal or abnormal state of the system. The purpose is to tolerate the maximum number of faults that may be present in the system; but, since such a worst case rarely happens, the amount of resources consumed is often higher than necessary.

2 Open Problems in Designing Software Fault Tolerance

Academic research and practical applications of software fault tolerance have made much progress in clarifying the possibilities of such methodologies and the problems related to their application, such as complexity, which should be kept under control. At the same time, the advantages obtainable from individual software fault tolerance schemes are not clearly measurable. Design diversity still has some difficulty in ensuring a routine improvement in software dependability; [9] discusses this in detail. The work on using simple retry of programs to mask the effects of the faults that cause transient errors [10, 11] seems to fit practical experience, but it is less complete and its effectiveness may be a matter of luck. Using data diversity to tolerate design faults in software systems [1] might provide a more cost-effective alternative, though such a technique is not generally applicable either.

In order to improve the effectiveness of software fault tolerance some problems need to be addressed. Among them are the high costs (both run-time overhead and design cost), the ability to evaluate the impact of software fault tolerance structures, and the usually very limited flexibility of software fault tolerance designs with their consequent inability to adapt to changing run-time conditions. Some of these problems are typical of software fault tolerance, while others are also common to software implemented fault tolerance, i.e. the tolerance of hardware faults performed by software.

The cost of developing the variants and the adjudicator may be many times that of a single variant [15]. Some research activity is being undertaken with the objective of reducing the development cost of fault-tolerant software. The object-oriented programming paradigm has shown some possibilities through inheritance and polymorphism mechanisms [25]. At run time, all the fault tolerance approaches require some extra space or extra time, or both. Note that efficient use of the available resources generally requires dynamic management and conditional execution of the software variants.
This should come with a dynamic trade-off between fully parallel execution and totally sequential execution of the variants. Dynamic redundancy for the purpose of space-time trade-off is a classical idea, e.g. Duplicated Configuration with a Spare and NMR with Spares used in hardware [12]. However, the majority of software fault tolerance schemes do not provide such a dynamic space-time trade-off. In order to use redundancy in a dynamic or conditional manner, a scheme has to decide, at appropriate intermediate points of its execution, which of the following three execution states has been reached:
i) End-state E: a result exists that meets the required condition for delivery and can thus be delivered;
ii) Non end-state N: there is no result that meets the condition, but it is still possible to obtain such a result if further redundancy is employed; or
iii) Failure state F: there is no further possibility of producing a result that meets the condition.

Research to evaluate the impact of software fault tolerance structures has followed two main directions: experimental measurement and analytical estimation. Some of the experiments developed a software project, with procedures as close as possible to those that would be used if the proposed methodology were chosen for producing "real-world" software, and then tested it extensively. They have mainly provided useful insight into the problems of implementing the methodology, since very little statistical value can be given to the data collected. Other experiments were "statistically oriented": a number of variants of the same software were developed and then tested to obtain statistical data. Of course, the scope of applicability of such results is still limited. A common outcome was that good specifications are of paramount importance. The fundamental problem of coincident failures has been studied in some detail for NVP.
The experiment described in [14] has disproved (for one particular sample of variants, of course) the independence hypothesis, while PODS [5] has suggested that it may hold for small software modules. In PODS coincident failures in two variants were not normally due to similar programming errors (faults) but rather to a fault-masking effect in Boolean decision logic, a well-known phenomenon in the study of combinatorial circuits. Different faults (bugs) appeared to produce failures with independent probabilities.

Analytical estimations of fault-tolerant software have been published in a number of papers, most recently [2, 6, 8, 15, 17, 20-22]. They differ in the models and analytical tools used and in the assumptions made. The main problem in using these models is the difficulty of estimating the values of the parameters, in particular the probabilities of errors common to redundant components. This information must be obtained experimentally, but we are still far from being able to determine it with acceptable confidence.

All the structuring methods for software fault tolerance are designed such that a single condition for delivering a result is usually embedded, explicitly or implicitly, in the adjudicator. This rigid design choice limits flexibility and prevents adaptation to variations in the run-time environment or in the application requirements. Note that different conditions for delivering a result in principle have different fault coverage, though some conditions seem to be very similar. Take as an example a mission of a critical system in which two phases have been identified: normal and critical. In the critical phase an application could use a more severe delivery condition that ensures very strict checks against the delivery of erroneous output, thereby providing support for strong detection of abnormal conditions and help in triggering the safety mechanisms.
Obviously the ability to adapt to variations in the run-time environment or in the application requirements calls for dynamic decisions to be taken. The price to be paid for this ability is clearly additional complexity, which must be controlled and limited as it may itself be a source of errors and so defeat the method.

3 SCOP

Among the existing problems in the area of software fault tolerance, we focus our attention here on generality and flexibility. These features can of course be improved, but only by designing more complex structures. Here we discuss the case of SCOP [7, 24]. It was originally proposed with the primary aim of improving run-time efficiency, and to this purpose it applies conditional redundancy even when comparison-based adjudicators are used. On closer examination, however, SCOP can be recognised as a general dynamic scheme that addresses both flexibility and run-time efficiency for tolerating either software or hardware faults. The question that remains to be answered is whether the increased complexity is justified and whether the trade-off represented by SCOP is satisfactory. In this respect the critical features of SCOP, namely the inherent complexity of the control algorithm and the need for mechanisms to support its implementation, are discussed and directions towards reducing complexity are identified.

3.1 Basic Description

The SCOP scheme consists of a set of software components, $V = \{v_1, v_2, \dots, v_n\}$, an adjudication mechanism, a set of delivery conditions, one of which is dynamically chosen at run time, and a controller that coordinates the dynamic actions of the architecture. At run time an instance of SCOP accepts as additional parameters the selected delivery condition and, possibly, a deadline for the whole execution. First the controller decides how many phases can be performed (in order to provide a timely result), then it selects the (minimum) set of components that (if successful) could satisfy the selected delivery condition. After execution of the set of components the adjudicator verifies whether the chosen delivery condition has been met (using a syndrome that may grow as more phases are performed). This behaviour is repeated until a result can be delivered or the software components are exhausted. The behaviour of SCOP can be described more precisely by the following control algorithm with comments on the right side.
begin
  i := 0;                                     {index of the current phase, set to 0}
  State_mark := N;                            {set current state as non end-state}
  S_0 := {};                                  {set syndrome as empty}
  C := one of {delivery conditions};          {set required delivery condition}
  decide(max_phase);                          {based on time constraints}
  while State_mark = N and i < max_phase do   {while current state is non end-state and current phase < maximum allowed}
  begin
    i := i + 1;                               {start new phase}
    configure(C, S_{i-1}, i, V_i);            {set new Currently Active Set}
    execute(V_i, S_i);                        {execute and obtain new syndrome}
    adjudicate(C, S_i, State_mark, res);      {set new state mark and select result}
  end;
  if State_mark = E then deliver(res)         {current state is end-state or failure state?}
  else signal(failure);
end

The decide procedure determines the maximum number max_phase of phases permitted by the specified timing constraints. Procedure configure constructs the currently active set (CAS) $V_1$ in phase one according to the selected delivery condition and the given application environment, and establishes the CAS $V_i$ ($i > 1$) based on the syndrome $S_{i-1}$ collected in the $(i-1)$th phase and the information on phases. The execution of a CAS may lead to a successful state E. Note that the software components in $V_i$ are selected from the software components that have not been used in any of the previous phases, i.e. $V_i \subseteq V - (V_1 \cup V_2 \cup \dots \cup V_{i-1})$. If the $i$th phase is the last, $V_i$ would contain all the remaining spare software components. The execute procedure manages the execution of the software components in the CAS and generates the syndrome $S_i$, where $S_0$ is an empty set and $S_{i-1}$ is a subset of $S_i$. Procedure adjudicate implements the adjudication function using the selected condition C. It receives the syndrome $S_i$, sets the new State_mark and selects the result res, if one exists. The deliver procedure delivers the selected result and the signal procedure produces a failure notification.
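The control algorithm above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used by the authors: the names `scop_run`, `one_per_phase` and `acceptance_test_condition` are hypothetical, components are plain functions executed sequentially, `configure` returns only the size of the currently active set, and `max_phase` is passed in directly instead of being derived from a deadline by decide. The example instance pairs diverse components with an acceptance test, which yields recovery-block-like behaviour.

```python
import math
from typing import Any, Callable, List, Tuple

E, N, F = "E", "N", "F"  # end-state, non end-state, failure state


def scop_run(components: List[Callable[[Any], Any]], x: Any,
             configure: Callable[[List[Any], int, bool], int],
             adjudicate: Callable[[List[Any], bool], Tuple[str, Any]],
             max_phase: int) -> Any:
    """Run the phases of one SCOP execution and return the delivered result."""
    remaining = list(components)  # spares not yet used in any phase
    syndrome: List[Any] = []      # S_0 is empty; it only grows
    state, res, i = N, None, 0
    while state == N and i < max_phase:
        i += 1                                    # start new phase
        last_phase = (i == max_phase)
        k = configure(syndrome, len(remaining), last_phase)
        cas, remaining = remaining[:k], remaining[k:]   # V_i and the spares left
        syndrome.extend(v(x) for v in cas)        # execute V_i, obtain S_i
        can_continue = bool(remaining) and not last_phase
        state, res = adjudicate(syndrome, can_continue)
    if state == E:
        return res                                # deliver(res)
    raise RuntimeError("SCOP failure: delivery condition not met")


def one_per_phase(syndrome: List[Any], n_remaining: int, last_phase: bool) -> int:
    """Hypothetical configure: one component per phase, all spares in the last."""
    return n_remaining if last_phase else min(1, n_remaining)


def acceptance_test_condition(accept: Callable[[Any], bool]):
    """Hypothetical delivery condition: some result passes the acceptance test."""
    def adjudicate(syndrome: List[Any], can_continue: bool) -> Tuple[str, Any]:
        for r in syndrome:
            if accept(r):
                return E, r
        return (N, None) if can_continue else (F, None)
    return adjudicate


# Illustrative instance: integer square root of x = 10.
x = 10
primary = lambda n: n // 2    # deliberately faulty variant
alternate = math.isqrt        # correct variant
accept = lambda r: r * r <= x < (r + 1) ** 2
result = scop_run([primary, alternate], x, one_per_phase,
                  acceptance_test_condition(accept), max_phase=2)
print(result)  # 3: the primary's result 5 is rejected in phase one
```

Passing `configure` and `adjudicate` as parameters mirrors the separation in the text between the general controller and the instance-specific delivery condition.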
3.2 SCOP characteristics

Given the above-mentioned difficulties in demonstrating the general effectiveness of design diversity for attaining software dependability, an instance of SCOP may, in order to tolerate software faults (and some hardware-related faults), employ software components according to an application-specific approach for masking the effect of faults. One such approach is obviously multiple versions of software, but diversity in data space, simple retry of programs or multiple replicas could also be used, depending on specific application requirements and considerations of cost-effectiveness.

An instance of SCOP can be designed to obey multiple different delivery conditions. One of them is dynamically chosen at run time, and the selected condition may change for different executions, according to the degradation of the system or to the actual execution phase, as discussed previously. In addition, if SCOP is used to provide a service used by many different applications, different delivery conditions may be dynamically chosen by the different applications, according to their degrees of criticality. Since the different conditions will usually have different fault coverage, SCOP is therefore able to provide different levels of dependability.

SCOP is very efficient and makes dynamic use of redundancy; i.e., it always tries to execute the minimum number of software components strictly necessary for providing a result that meets the stated delivery condition. To do this it organises the execution of components in phases, dynamically configuring a currently active set (CAS) $V_i$, a subset of $V$, at the beginning of the $i$th phase. An adjudication is made after the execution of $V_i$ in order to check whether the conditions for the release of a result are satisfied. Once these conditions are met, the result is output immediately and any further phases and actions are abandoned.
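To make the idea of executing only the minimum number of components concrete, the following Python sketch computes the size of the next currently active set for one illustrative delivery condition. The condition itself ("at least k results agree") and the name `min_cas_size` are assumptions made for this example; the text above does not fix a particular condition. Results are assumed to be hashable so that they can be counted.

```python
from collections import Counter
from typing import Hashable, List, Optional


def min_cas_size(syndrome: List[Hashable], k: int, n_remaining: int) -> Optional[int]:
    """Fewest additional components that could still yield k agreeing results.

    Returns None when even all remaining spares could not reach k agreeing
    results, which corresponds to the failure state.
    """
    # Largest group of identical results collected so far (0 if no results yet).
    best = max(Counter(syndrome).values(), default=0)
    need = max(k - best, 0)
    return need if need <= n_remaining else None


# Condition "2 agreeing results" with 3 components available:
print(min_cas_size([], 2, 3))         # 2: phase one runs two components
print(min_cas_size([7, 8], 2, 1))     # 1: they disagreed, run one more
print(min_cas_size([7, 7], 2, 1))     # 0: condition already met
print(min_cas_size([7, 8, 9], 3, 1))  # None: three agreeing results are out of reach
```

Under this assumed condition, a fault-free execution uses two components instead of three, and the third is invoked only when the first two disagree.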
The initial CAS $V_1$ in phase one can be determined and flexibly changed with respect to different delivery conditions. Whenever it is recognised as necessary according to the selected delivery condition, the syndrome (a set of information used by an adjudicator to perform its judgement as to the correctness of a result) is accumulated in SCOP as the number of phases increases. All the results produced and the additional information collected so far are employed to support the selection of a correct result.

The architecture is very general, allowing several approaches for masking the effect of faults to be combined with different delivery conditions. For example, combining the design diversity approach with an acceptance test yields the recovery block behaviour, while pure replication and a majority voter (with the selection of one phase only) can be used to design an instance of NMR. In this way the best alternative for the specific application can be specified and designed. The mechanisms for providing flexibility (the concept of delivery condition) and efficiency (adaptive redundancy management) are basically independent of the form of redundancy used for masking software faults, but must rely on the highly dynamic and complex control algorithm described.

3.3 Complexity

The complexity of the dynamic control algorithm and of the adjudication mechanism for SCOP could itself be a source of errors and thus defeat the proposed scheme. These components are very general and can be re-used for all the instances of the scheme. Thus it appears feasible to verify them formally and prove their correctness. There is then the need to develop a methodology for designing SCOP components, ideally supported by automatic tools. Given the application requirements to be fulfilled, the design of a specific SCOP instance implies the proper selection of the fault masking approach, of the degree of redundancy and of the delivery condition(s). The appropriateness of the synthesised SCOP instance can then be verified and evaluated.
The information specifying the desired behaviour, relative to this instance, of the control algorithm and the adjudication mechanism can be made read-only and recorded on stable storage if necessary. In this way, the complexity of on-line control is greatly reduced: the adjudication mechanism and control algorithm of SCOP can take run-time actions simply by monitoring the execution and reading the proper information, without performing complex computation.

3.4 Supporting Mechanism

To support the SCOP methodology several OS-implemented mechanisms are necessary, which are basically the same as those used to implement NVP and RB. Control of the components may be provided by a controller (similar to the driver program used in N-version programming). The controller is responsible for:
1) a synchronisation mechanism;
2) a mechanism for ensuring an identical set of input values or the proper data representation, as the case may be, to each component;
3) a mechanism for dynamically invoking an appropriate subset of variants;
4) support for the specialisation of possible types of adjudication (based on both application semantics and syntax);
5) support for logical and/or physical reconfiguration, if needed.
This set of OS-implemented mechanisms may be seen as constituting a generic SCOP run-time support. Our preliminary analysis suggests that the implementation of this support should not introduce serious technical difficulties in comparison with classical approaches such as NVP and RB. We are currently implementing a prototype of the SCOP scheme in an experimental C++ testbed (a local network environment that consists of a number of Sun-3 and Sun-4 workstations). Preliminary experimental data are promising and have given us additional confidence for adopting SCOP in practical systems.

4 Conclusions

In this paper we have dealt with some of the open problems that need to be addressed to improve the effectiveness of software fault tolerance structures.
Besides pointing out the high costs (both run-time overhead and design cost) and the difficulty of evaluating the impact of software fault tolerance structures, we focused on the lack of generality and the usually very limited flexibility of software fault tolerance designs, with their consequent inability to adapt to changing run-time conditions. These characteristics can be improved at the price of adding complexity to the design, as exemplified by SCOP; thus proper trade-offs are required.

SCOP is a fault tolerant system structuring method based on conditional usage of the available redundant components. It allows several approaches for masking the effect of faults to be combined with different, even multiple, delivery conditions. It is straightforward to design specific instances of SCOP that behave as a recovery block, as NVP, or as software implemented techniques for tolerating hardware faults. This generality makes it possible to specify the best alternative for a specific application and to reduce the supporting mechanisms that the kernel/OS must provide to a common set.

The possibility in SCOP of defining multiple delivery conditions to be used by the same instance represents a significant novelty in software fault tolerance, in that it allows a great degree of flexibility and generality. Different applications with different criticality may thus use the same basic service equipped with multiple delivery conditions. In this way SCOP supports different integrity levels. Another case that is easily managed is that of a single application with a variable degree of criticality in different phases of a mission. A SCOP instance, which usually provides services with a given degree of fault tolerance, may dynamically be used to provide support for safety.
It is enough to switch to a delivery condition enforcing very strict checks against the delivery of erroneous outputs: the behaviour obtained consists in very strong detection of abnormal conditions, which helps to trigger the safety mechanisms. In both cases, this flexibility allows the scheme to easily manage executions that must occur in a degraded system, where fewer resources than usual are available because of faults and before a repair or replacement action can take place.

All these desirable characteristics are obtained at the price of adding complexity, which must be controlled and limited since it may itself be a source of errors. To understand whether the increased complexity is justified and whether the trade-off represented by SCOP is satisfactory, the different sources of additional complexity in SCOP compared with the other schemes have been pointed out and directions towards reducing complexity have been identified.

References

[1] P. E. Ammann and J. C. Knight, "Data diversity: an approach to software fault tolerance," IEEE Trans. Comput., Vol. 37, pp.
[2] J. Arlat, K. Kanoun and J. C. Laprie, "Dependability modelling and evaluation of software fault tolerant systems," IEEE Trans. Comput., Vol. 39, pp.
[3] A. Avizienis and L. Chen, "On the implementation of N-version-programming for software fault-tolerance during execution," in Proc. Int. Conf. Comput. Soft. and Appli., New York, 1977, pp.
[4] A. Avizienis and J. C. Laprie, "Dependable Computing: from Concepts to Design Diversity," Proc. of the IEEE, Vol. 74, pp.
[5] P. G. Bishop and F. D. Pullen, "PODS Revisited - A Study of Software Failure Behaviour," in Proc. 18th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-18), Tokyo, Japan, 1988, pp.
[6] A. Bondavalli, S. Chiaradonna, F. Di Giandomenico and L. Strigini, "Dependability Analysis of Iterative Fault Tolerant Software Considering Correlation," in "Predictably Dependable Computing Systems", B. Randell, J. C. Laprie, H. Kopetz and B. Littlewood Eds., Springer-Verlag, 1995, pp.
[7] A. Bondavalli, F. Di Giandomenico and J. Xu, "A Cost-Effective and Flexible Scheme for Software Fault Tolerance," Journal of Computer Systems Science and Engineering, Vol. 8, pp.
[8] S. Chiaradonna, A. Bondavalli and L. Strigini, "On Performability Modeling and Evaluation of Software Fault Tolerance Structures," in Proc. 1st European Dependable Computing Conference (EDCC-1), Berlin, Germany, 1994, pp.
[9] D. E. Eckhardt, A. K. Caglayan, J. C. Knight, L. D. Lee, D. F. McAllister, M. A. Vouk and J. P. J. Kelly, "An Experimental Evaluation of Software Redundancy as a Strategy for Improving Reliability," IEEE Trans. Soft. Eng., Vol. 17, pp.
[10] J. Gray and A. Reuter, "Transaction Processing: Concepts and Techniques," Morgan Kaufmann.
[11] Y. Huang and C. M. R. Kintala, "Software implemented fault tolerance: Technologies and experience," in Proc. 23rd Int. Symp. Fault Tolerant Comput. (FTCS-23), Toulouse, 1993, pp.
[12] B. W. Johnson, "Design and Analysis of Fault Tolerant Digital Systems," Addison-Wesley Pub. Co.
[13] K. H. Kim, "Distributed execution of recovery blocks: an approach to uniform treatment of hardware and software faults," in Proc. 4th Int. Conf. Distributed Comput. Sys., 1984, pp.
[14] J. C. Knight and N. G. Leveson, "An Experimental Evaluation of the Assumption of Independence in Multiversion Programming," IEEE Trans. Soft. Eng., Vol. SE-12, pp.
[15] J. C. Laprie, J. Arlat, C. Beounes and K. Kanoun, "Definition and Analysis of Hardware and Software Fault-Tolerant Architecture," IEEE Computer, Vol. 23, pp.
[16] J. C. Laprie, J. Arlat, C. Beounes, K. Kanoun and C. Hourtolle, "Hardware and Software Fault Tolerance: Definition and Analysis of Architectural Solutions," in Proc. 17th Int. Symp. Fault-Tolerant Comput., Pittsburgh, 1987, pp.
[17] M. R. Lyu and Y. He, "Improving the N-Version Programming Process Through the Evolution of a Design Paradigm," IEEE Transactions on Reliability, Special Issue on Fault-Tolerant Software, Vol. R-42, pp.
[18] B. Randell, "System Structure for Software Fault Tolerance," IEEE Trans. Soft. Eng., Vol. SE-1, pp.
[19] R. K. Scott, J. W. Gault and D. F. McAllister, "Fault tolerant software reliability modeling," IEEE Trans. Soft. Eng., Vol. SE-13, pp.
[20] A. Tai, A. Avizienis and J. Meyer, "Evaluation of fault-tolerant software: a performability modeling approach," in "Dependable Computing for Critical Applications 3", C. E. Landwehr, B. Randell and L. Simoncini Eds., Springer-Verlag, 1993, pp.
[21] A. T. Tai, "Performability-Driven Adaptive Fault Tolerance," in Proc. 24th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-24), Austin, Texas, 1994, pp.
[22] A. T. Tai, A. Avizienis and J. F. Meyer, "Performability Enhancement of Fault-Tolerant Software," IEEE Transactions on Reliability, Special Issue on Fault-Tolerant Software, Vol. R-42, pp.
[23] J. Xu, "The t/(n-1)-diagnosability and Its Applications to Fault Tolerance," in Proc. 21st Int. Symp. Fault-Tolerant Comput., Montreal, 1991, pp.
[24] J. Xu, A. Bondavalli and F. Di Giandomenico, "Dynamic Adjustment of Dependability and Efficiency in Fault-Tolerant Software," in "Predictably Dependable Computing Systems", B. Randell, J. C. Laprie, H. Kopetz and B. Littlewood Eds., Springer-Verlag, 1995, pp.
[25] J. Xu, B. Randell, C. M. F. Rubira-Calsavara and R. J. Stroud, "Toward an Object-Oriented Approach to Software Fault Tolerance," in "Fault-Tolerant Parallel and Distributed Systems", D. R. Avresky Ed., IEEE Computer Society Press, 1994, pp.
The Reliable Hybrid Pattern A Generalized Software Fault Tolerant Design Pattern
1 The Reliable Pattern A Generalized Software Fault Tolerant Design Pattern Fonda Daniels Department of Electrical & Computer Engineering, Box 7911 North Carolina State University Raleigh, NC 27695 email:
More informationImplementing Software-Fault Tolerance in C++ and Open C++: An Object-Oriented and Reflective Approach
Implementing Software-Fault Tolerance in C++ and Open C++: An Object-Oriented and Reflective Approach Jie Xu, Brian Randell and Avelino F. Zorzo Department of Computing Science University of Newcastle
More informationHardware and Software Fault Tolerance: Adaptive Architectures in Distributed Computing Environments
Hardware and Software Fault Tolerance: Adaptive Architectures in Distributed Computing Environments F. Di Giandomenico 1, A. Bondavalli 2 and J. Xu 3 1 IEI/CNR, Pisa, Italy; 2 CNUCE/CNR, Pisa, Italy 3
More informationResponsive Roll-Forward Recovery in Embedded Real-Time Systems