B.H. Far

1 SENG 637: Dependability, Reliability & Testing of Software Systems. Defining Necessary Reliability (Chapter 4). Department of Electrical & Computer Engineering, University of Calgary. B.H. Far (far@ucalgary.ca)

2 Contents
- Steps in defining necessary reliability
- Failure severity class (FSC)
- Failure intensity objective (FIO)
- Strategies to meet FIO
- Software fault tolerance

3 SRE: Process /1
The SRE process has 5 steps:
1. Define necessary reliability
2. Develop operational profiles
3. Prepare for test
4. Execute test
5. Apply failure data to guide decisions
[Figure: SRE process diagram showing the five steps, with a "Fault Tolerance Computing" box]

4 Chapter 4 Section 1: How to Define Necessary Reliability?

5 Reliability and Risk
- Necessary reliability depends on risk: higher-risk software requires higher reliability.
- Necessary reliability also depends on profitability, budget, manpower, etc.
Q. "What are you going to test?" A. "The most important things." Q. "And how do you know what the most important things are?"
Reference: Marnie L. Hutcheson, Software Testing Fundamentals: Methods and Metrics, John Wiley & Sons, 2003 (408 pages), ISBN: X, Chapter

6 Necessary Reliability: How to
1) Define failure with failure severity classes (FSC) for the product.
2) Set a failure intensity objective (FIO) for each system to be tested.
3) Choose a common scale for all associated systems.
4) Find the developed software failure intensity objective.
5) Engineer strategies to meet the software failure intensity objective.

7 1. Failure Severity Classes
- Failures usually differ in their impact on the system.
- A failure severity class (FSC) is a set of failures that have the same per-failure impact on users, according to a failure classification criterion.
- Common classification criteria: cost, system capability, human life, environment.
- Note: there are other rankings, such as MIT's ranks.
- The severity of a failure is different from its complexity.
- Severity can change with the time of failure and can be subjective.

8 FSC: Common Classification
Common classification criterion: Cost
- What does this failure cost in terms of operational cost, repair cost, loss of business, disruption, etc.?
- Severity classes based on cost may be scaled by a factor of 10. Usually 4 ranges are enough.

Severity class   Definition ($)
1                > 100,000
2                10,000 - 100,000
3                1,000 - 10,000
4                < 1,000

9 FSC: Common Classification
Common classification criterion: System capability (services)
- May include factors such as loss of data, downtime, recoverability, etc.

Severity class   Definition
1                Basic service interruption
2                Basic service degradation
3                Inconvenience, correction not deferrable
4                Minor tolerable effects, correction deferrable

10 FSC: Common Classification
Common classification criterion: Environment
- May include factors such as harm to the environment, loss of wildlife, etc.
- Applicable to the nuclear industry, chemical industry, etc.

Severity class   Definition
1                Severe and unrecoverable damage to environment and/or wildlife
2                Severe but partially recoverable damage to environment
3                Minor damage to environment or wildlife
4                Minor but recoverable deficiencies

11 FSC: Common Classification
Common classification criterion: Human life
- May include factors such as harm to humans or the environment, loss of human life, etc.
- Applicable to aeronautical, automotive, nuclear, health care, and military systems, etc.

Severity class   Definition
1                Possible loss of human life
2                Severe damage to human immune system or environment
3                Minor damage to human immune system or environment
4                Minor but recoverable deficiencies

12 How to Define FSC?
- Experience based: ask users, stakeholders, and developers; compare to similar products; use FTA and/or FMEA techniques.
- List all factors that may be considered as failure severity for the project.
- Narrow the list down to the most critical and/or measurable ones.
- Some factors may be hard to measure, such as impact on company reputation.

13 FSC: Conflicting Concerns
- Conflicting viewpoints (concerns) between the software developer and the customer regarding the failure severity class (FSC) should be resolved before proceeding to set the target failure intensity objective.
- Comparing the FSC for the software with that of a similar product is usually useful.

14 Documenting FSC
Template, filled in per user profile (type or concern). Failures are given as an ordered list, starting with the most severe ones.

Classification criteria        Class 1   Class 2   Class 3   Class 4
Cost
System capability (services)
Human life
Environment
Other (specify)

Define classes for each criterion separately.

15 2. Failure Intensity Objective (FIO)
- The failure intensity objective (FIO) reflects an estimate of the defects allowed to remain in the product at release time.
- FIO is an alternative way of expressing system reliability.

16 Failure Intensity Objective
- Failure intensity is usually given as the number of failures per unit of time (or some other defined unit), e.g., 3 alarms per 100 hours of operation, or 5 failures per 1000 print jobs.
- The failure intensity of a system is the sum of the failure intensities of all components of the system (assuming no system redundancy and the exponential model).

17 How to Set FIO /1
- Mainly experience based and dependent on the project.
- Depends on the trade-offs among quality characteristics (development time and development cost), functionality, and technology.
- Rule of thumb: estimate the project's total cost (C), e.g., using COCOMO's Early Design Model, and set the FIO to 1/C, i.e., one failure per C units of operation (assuming that the cost of the highest-impact failure is roughly equal to the total development cost).

18 How to Set FIO /2
Typical FIO for various projects

Failure impact                   Typical FIO (λ)             Time between failures (MTTF)
More than $1,000,000,000 cost    1 per 1,000,000,000 hours   114,000 years
More than $1,000,000 cost        1 per 1,000,000 hours       114 years
Around $1,000 cost               1 per 1,000 hours           6 weeks
Around $100 cost                 1 per 100 hours             100 h
Around $10 cost                  1 per 10 hours              10 h
Around $1 cost                   1 per hour                  1 h

19 How to Set FIO: Reliability
Setting the FIO in terms of reliability:
$\lambda = -\ln R / t$, or $\lambda \approx (1 - R)/t$ for $R \ge 0.95$
where λ is failure intensity, R is reliability, and t is the number of natural units (time, etc.).
Example: a reliability target stated for 8 hours of operation is converted to λ by substituting R and t = 8 hours.
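The reliability relation on this slide can be sketched in a few lines of Python. This is a minimal sketch under the exponential model; the function names and the sample target (R = 0.99 over 8 hours) are assumptions chosen for illustration, not values from the slide.

```python
import math

def fio_from_reliability(reliability: float, units: float) -> float:
    """Exact exponential-model relation: lambda = -ln(R) / t."""
    if not 0.0 < reliability <= 1.0 or units <= 0:
        raise ValueError("need 0 < R <= 1 and t > 0")
    return -math.log(reliability) / units

def fio_approx(reliability: float, units: float) -> float:
    """Linear approximation (1 - R) / t, reasonable only for R close to 1."""
    return (1.0 - reliability) / units

# Hypothetical target: reliability 0.99 over an 8-hour period.
exact = fio_from_reliability(0.99, 8.0)
approx = fio_approx(0.99, 8.0)
print(f"exact  lambda = {exact:.6f} failures/hour")
print(f"approx lambda = {approx:.6f} failures/hour")
```

The approximation always gives a slightly smaller failure intensity than the exact formula, and the gap widens as R moves away from 1.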

20 Reliability & Failure Intensity
[Table: reliability for a 1-hour mission time versus failure intensity, with failure intensity expressed in failures/hour, failures/day, failures/week, failures/month, failures/year, and failures/1000 hours]
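A table of this kind can be regenerated from the exponential model, R = exp(-λt). The sketch below is illustrative only: the chosen intensities (one failure per hour, day, week, month, year) and the 720-hour month are assumptions, not the entries of the original table.

```python
import math

def reliability(failures_per_hour: float, mission_hours: float = 1.0) -> float:
    """Exponential model: probability of no failure during the mission."""
    return math.exp(-failures_per_hour * mission_hours)

# Assumed period lengths in hours (a month is taken as 30 days).
HOURS_PER_PERIOD = {"hour": 1, "day": 24, "week": 168, "month": 720, "year": 8760}

# Reliability for a 1-hour mission when the system fails once per period.
table = {period: reliability(1.0 / hours) for period, hours in HOURS_PER_PERIOD.items()}

for period, r in table.items():
    print(f"1 failure / {period:<5} -> R(1 h) = {r:.5f}")
```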

21 How to Set FIO: Availability
Setting the FIO in terms of system availability (A) for the exponential model:
$A = \frac{1}{1 + \lambda t_m}$, or $\lambda = \frac{1 - A}{A\, t_m}$
where λ is failure intensity and $t_m$ is downtime per failure.
E.g., if a product must be available 99% of the time and downtime is 6 min per failure, then the FIO is about 1 failure per 10 hours.

22 Example
Suppose we want 99 percent availability of a human-machine team. Assume that a service interruption requires an average recovery time of 14 minutes for the person involved, since he/she must refresh his/her memory before restarting. Assume the average machine downtime at each failure is 1 minute. The total downtime is 15 minutes (0.25 hr).
λ = (1 - 0.99) / (0.99 x 0.25) = 0.01 / 0.2475 ≈ 0.04 failures/hr, or approximately 4 failures per 100 hr.
Example from Musa's book.
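Both availability examples (99% availability with 6 minutes of downtime, and the human-machine team with 15 minutes of downtime) can be checked with a short Python sketch. The function names are assumptions made for illustration; the formula is the exponential-model relation λ = (1 - A) / (A · t_m).

```python
def fio_from_availability(availability: float, downtime_hours: float) -> float:
    """lambda = (1 - A) / (A * t_m), with t_m the downtime per failure in hours."""
    if not 0.0 < availability < 1.0 or downtime_hours <= 0:
        raise ValueError("need 0 < A < 1 and t_m > 0")
    return (1.0 - availability) / (availability * downtime_hours)

def availability_from_fio(fio: float, downtime_hours: float) -> float:
    """Inverse relation: A = 1 / (1 + lambda * t_m)."""
    return 1.0 / (1.0 + fio * downtime_hours)

six_minute_case = fio_from_availability(0.99, 6 / 60)    # 6 min downtime
team_case = fio_from_availability(0.99, 15 / 60)         # 14 min person + 1 min machine

print(f"6 min downtime : {six_minute_case:.4f} failures/hr")
print(f"15 min downtime: {team_case:.4f} failures/hr")
```

The first case gives roughly 0.1 failures per hour (about 1 per 10 hours) and the second roughly 0.04 failures per hour (about 4 per 100 hours), in line with the figures quoted on the slides.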

23 How to Set FIO: MTTF
Using MTTF:
$A = \frac{MTTF}{MTTF + MTTR} = \frac{MTTF}{MTBF}$
where $\lambda = 1/MTTF$ is failure intensity, MTTR is mean time to repair, MTTF is mean time to failure, and MTBF = MTTF + MTTR.
Another definition of availability:
$A = \frac{MTTF}{MTTF + MTTR} = \frac{1}{1 + MTTR/MTTF}$

24 How to Set FIO: Hazard Rate
- Hazard rate z(t): the probability that the component will fail in a given time interval, given that it has not failed prior to the interval.
- A hazard rate of 0.05 means that there is a 5% chance that the first failure will occur in the specified time interval and not before.
- For the exponential distribution, z(t) is constant and equal to the failure intensity λ.

25 How to Set FIO: Profitability
- Analyze experience with previous or similar systems: compare field measurements of major quality characteristics, and the degree of user satisfaction with them, against the same measurements for a previous release or a similar product.
- Compare trade-off trends between profitability and failure intensity.

26 Example
Tip: select a range that leads to the highest profit margin.
[Figure: example from Musa's book]

27 Reliability vs. Availability
- Why specify reliability when availability is better understood and has more intuitive appeal?
- Availability has a subjective appeal to the user, and there are usually workarounds that make the system available without increasing its intrinsic reliability.
- Example: using a replica server in case the domain server goes down increases the availability of the system, but it does not necessarily increase the reliability of the server software.

28 Developed Software Product
The developed software product is usually only a part of the whole system.
Example: stand-alone system
[Figure: system layers: interface to other systems, acquired components, developed components, OS and system software, hardware]

29 3. Choose a Common Scale
There may be various scales for expressing the FIO of various parts of the project. Example:
- System failure intensity objective = 30 failures/1,000,000 transactions
- MTTF for the OS is 3,000 hours for 10 million transactions
- Hardware failure intensity is 1 per 30 hours of operation
One must define a single common scale for all FIOs.

30 FIO for Developed Product
How to compute the failure intensity objective for the developed software?
1. Set the FIO for the whole system.
2. Set a common measurement unit for failure intensity for the whole system.
3. Subtract the expected failure intensity of acquired components from the FIO.
4. Subtract the expected failure intensity of the environment (OS, interface systems) that the developed software will run on.
5. The remainder is the failure intensity objective for the developed software components.

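31 Computing Developed FIO
Example 1:
- System failure intensity objective = 100 failures/1,000,000 transactions
- Failure intensity for hardware = 0.1 failure/hour
- OS failure intensity for a load of 100,000 transactions/hour = 0.4 failure/hour
- At that load, 1,000,000 transactions take 10 hours, so hardware contributes 1 failure and the OS contributes 4 failures per 1,000,000 transactions.
- Therefore, developed software FIO = 100 - 1 - 4 = 95 failures/1,000,000 transactions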

32 Computing Developed FIO
Example 2: Database system running on Win 2K
- System failure intensity objective = 30 failures/1,000,000 transactions
- MTTF for Win 2K is around 3,000 hours for 10 million transactions
- Average hardware failure intensity is 1 per 30 hours
- Failure intensity for other systems is 9 per one million transactions
- What is the FIO for the developed software?

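33 Computing Developed FIO
$\lambda_{os} = 1/MTTF = 1/3000$ per hour, i.e., 1 failure for 10,000,000 transactions
$\lambda_{hardware} = 1/30 = 100/3000$ per hour, i.e., 100 failures for 10,000,000 transactions
$\lambda_{other}$ = 90 failures for 10,000,000 transactions
$\lambda_{os} + \lambda_{hardware} + \lambda_{other}$ = 191 failures for 10,000,000 transactions
$\lambda_{total}$ (system FIO) = 300 failures for 10,000,000 transactions
Therefore $\lambda_{developed\_software}$ = 300 - 191 = 109 failures for 10,000,000 transactions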
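The subtraction procedure used in Examples 1 and 2 can be sketched in Python as follows. The helper names are hypothetical, and the load of 100,000 transactions per hour in Example 1 is an assumption inferred from the arithmetic of that example.

```python
def per_transactions(failures_per_hour: float, transactions_per_hour: float,
                     scale: float) -> float:
    """Convert failures/hour into failures per `scale` transactions."""
    return failures_per_hour / transactions_per_hour * scale

def developed_fio(system_fio: float, other_intensities: list) -> float:
    """Subtract acquired/environment intensities (same unit) from the system FIO."""
    remaining = system_fio - sum(other_intensities)
    if remaining <= 0:
        raise ValueError("system FIO is already consumed by the other components")
    return remaining

# Example 1: unit = failures per 1,000,000 transactions, assumed load 100,000 tx/hour.
hw1 = per_transactions(0.1, 100_000, 1_000_000)
os1 = per_transactions(0.4, 100_000, 1_000_000)
example1 = developed_fio(100.0, [hw1, os1])

# Example 2: unit = failures per 10,000,000 transactions, which take 3,000 hours.
rate = 10_000_000 / 3000                       # transactions per hour
os2 = per_transactions(1 / 3000, rate, 10_000_000)
hw2 = per_transactions(1 / 30, rate, 10_000_000)
other2 = 9 * 10                                # 9 per 1,000,000 -> 90 per 10,000,000
example2 = developed_fio(30 * 10, [os2, hw2, other2])

print(round(example1, 6), round(example2, 6))
```

Converting every contribution to the common unit first, and only then subtracting, is the point of the "choose a common scale" step.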

34 4. Strategies to Meet FIO
Engineer strategies to meet the failure intensity objective for the developed software. 4 main strategies:
- Fault prevention
- Fault removal
- Fault tolerance
- Fault/failure forecasting

35 Fault Prevention
To avoid fault occurrences by construction.
Activities:
- Requirement review
- Design review
- Clear code
- Establishing standards (ISO, etc.)
- Using CASE tools with built-in check mechanisms
Effectiveness factor: proportion of the faults remaining after prevention activities.

36 Fault Removal
To detect, by verification and validation, the existence of faults and eliminate them.
Activities:
- Reviewing code (inspection)
- Testing
Effectiveness factor: reduction of failure intensity due to code review; ratio of failure intensity after test to failure intensity before test.

37 Testing vs. Inspection
Inspections are strict and close examinations conducted on specifications, design, code, test, and other artifacts.

Inspections                                                  Testing
Allow for defect detection, prevention, and isolation        Allows for defect detection
Start early in the life cycle                                Starts later in the life cycle

- Inspections are up to 20 times more efficient than testing.
- Code reading detects twice as many defects/hour as testing.
- 80% of development errors are usually found by inspections.
- Inspections resulted in a 10x reduction in the cost of finding errors.
SENG635 (Winter 2007)

38 Inspections or Testing?
Q. Can inspection replace testing?
No. Inspections cannot replace testing, because not all the information revealed through testing can be obtained through inspection, for example:
- Complex interactions in large systems (deadlocks, emergent behavior, etc.)
- Software reliability indicators
- Nonfunctional requirements: performance, usability, etc.

39 Fault Tolerance
To provide, by redundancy, service complying with the specification in spite of fault occurrences.
Activities:
- Designing and implementing redundancy
Effectiveness factor: reduction of failure intensity as a result of redundant design.

40 Fault / Failure Forecasting
To estimate, by evaluation, the presence of faults and the occurrence of failures.
Activities:
- Establishing a reliability model
- Collecting failure data
- Analysis and interpretation of results
Effectiveness factor: reduction of failure intensity as a result of applying reliability engineering.

42 Chapter 4 Section 2: Fault Tolerant Software Systems

43 Fault Tolerance Terminology
[Figure: taxonomy of fault tolerance. Recovery: backward, forward. Redundancy: hardware redundancy, software redundancy, data redundancy, temporal redundancy. Further labels: architectural, functional, serial, parallel, sequential.]

44 Definition & Goal /1
- A fault-tolerant computing system must be capable of providing specified services in the presence of a bounded number of failures.
- It uses techniques that enable continued delivery of service during system operation.
- Based on the principle: act during operation, define during specification and design.

45 Definition & Goal /2
- Failures can occur because faults are present either in the components of the system or in the system's design.
- Building large computing systems is a complex task; fault-tolerance requirements can make the task even more difficult unless appropriate system structuring concepts are used.
- Reliability growth (modeling, computation, and interpretation) for a system featuring fault tolerance differs from that of a system without it.

46 Problems
- The traditional approaches to fault tolerance in hardware systems are based on coping with the effects of well-understood failure modes of physical components.
- Conventional hardware fault tolerance methods (e.g., redundancy) are rarely powerful enough to cope with design deficiencies, e.g., designing a square wheel!
- Consequently, most hardware fault tolerance techniques cannot be applied directly to software fault tolerance, where almost all faults are design faults.
- Two copies of a component that computes "2+2=5" both give the wrong answer: redundancy of an incorrectly designed component doesn't help!

47 History
- Defensive programming: relatively ad hoc methods used to minimize the damage that could arise from the presence of residual bugs.
- Dual software technique: implementing two distinct versions of the same software and executing both. Any discrepancy in the outputs of the two versions may trigger an alarm.
- Etc.

48 Fault Tolerance Process
1. Detection: identify faults and their causes (errors).
2. Assessment: assess the extent to which the system state has been damaged or corrupted.
3. Recovery: remain operational or regain operational status.
4. Fault treatment and continued service: locate and repair the fault to prevent another occurrence.

49 Definitions
- Recovery: actions to restore the system state to a correct state. Recovery requires consistency checking.
- Redundancy: designing the system with multiple components with the same functionality.

50 Consistency Check
- A program-specific error detection mechanism that checks the results of program execution.
- Usually evaluates to either true or false.
  ensure <acceptance test> by P0 else-by P1 else fail

51 Example: Consistency Check
- Checksums for program parts or split packages
- Internal check points: ABS[(SQRT(x)*SQRT(x)) - x] < E
- Exception signal when dividing by zero
- Integer overflow signal
- Interrupt signal for a program loop
- Floating-point numerical failure check
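Two of the checks listed above, the internal check point on a square root and the divide-by-zero signal, might look as follows in Python. This is a minimal sketch: the function names are hypothetical, and the tolerance is scaled by the magnitude of x (an assumption; the slide states only an absolute bound E).

```python
import math

def checked_sqrt(x: float, eps: float = 1e-9) -> float:
    """Square root guarded by an internal check point: |sqrt(x)*sqrt(x) - x| < E."""
    if x < 0:
        raise ValueError("negative input")
    root = math.sqrt(x)
    # Consistency check on the result; tolerance scaled by the size of x (assumption).
    if abs(root * root - x) > eps * max(1.0, x):
        raise ArithmeticError("consistency check failed for sqrt")
    return root

def safe_divide(a: float, b: float):
    """Turn the divide-by-zero exception signal into an explicit 'no result'."""
    try:
        return a / b
    except ZeroDivisionError:
        return None   # the caller decides how to recover

print(checked_sqrt(2.0))
print(safe_divide(1.0, 0.0))
```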

52 Example: Consistency Check
Compute the inner product $x \cdot y = \sum_{i=1}^{6} x_i y_i$ of two six-component vectors whose entries range from small values (such as 3, 2, 1223, and 2111) to very large powers of ten.
An ordinary floating-point implementation of this sum returns zero instead of the correct answer, because of rounding and the large differences in the order of magnitude of the summands.
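The effect described on this slide can be reproduced with a short Python sketch. The vectors below are a well-known textbook instance of this kind of problem; using them here is an assumption, since the slide's own values did not survive intact. Python integers are exact, so they serve as the reference against which the floating-point result is checked.

```python
# Assumed example vectors (not recovered from the slide).
x = [10**20, 1223, 10**24, 10**18, 3, -10**21]
y = [10**30, 2, -10**26, 10**22, 2111, 10**19]

# Reference value using exact integer arithmetic.
exact = sum(a * b for a, b in zip(x, y))

# Ordinary double-precision accumulation: the small products are absorbed
# by the huge ones before the huge ones cancel.
naive = 0.0
for a, b in zip(x, y):
    naive += float(a) * float(b)

def consistent(value: float, reference: int, tol: float = 1e-9) -> bool:
    """Consistency check: does the computed value agree with the reference?"""
    return abs(value - reference) <= tol * max(1.0, abs(reference))

print("exact:", exact)
print("naive:", naive)
print("check passes:", consistent(naive, exact))
```

In a real system an exact reference is usually unavailable, so the check would instead compare two differently ordered or higher-precision summations.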

53 Backward Recovery
Roll back the system to a previously saved correct state.
[Figure: rollback to the saved state when the consistency check fails. Source: Laura L. Pullum: Software Fault Tolerance Techniques and Implementation, Artech House, 2001]

54 Domino Effect
Why is backward recovery not always possible?
Domino effect: successive rollback of communicating processes when a failure is detected in any one of the processes.
[Figure source: Laura L. Pullum: Software Fault Tolerance Techniques and Implementation, Artech House, 2001]

55 Forward Recovery
Use redundancy to recover from a failure.
[Figure source: Laura L. Pullum: Software Fault Tolerance Techniques and Implementation, Artech House, 2001]

56 Forward Recovery: Pros & Cons
Advantages:
- Forward recovery is fairly efficient in terms of the overhead (time and memory) it requires. This can be crucial in real-time applications, where the time overhead of backward recovery can exceed stringent time constraints.
- If the fault is an anticipated one, such as the potential loss of data, then redundancy and forward recovery can be a useful and timely approach.
- Faults involving missed deadlines may be better handled by forward recovery than by introducing the additional delay of rolling back and recovering.
Disadvantages:
- Application-specific, that is, it must be tailored to each situation or program.
- Can only remove predictable errors from the system state.
- Requires knowledge of the error.
- Cannot aid in recovery if the state is damaged beyond recoverability.
- Depends on the ability to accurately detect the occurrence of a fault (thus initiating the recovery actions).
Source: Laura L. Pullum: Software Fault Tolerance Techniques and Implementation, Artech House, 2001

57 Redundancy
- Redundancy: designing the system with multiple components with the same functionality.
- Redundancy techniques: implementing two (or more) distinct versions of the same software and executing them for the same set of inputs. Any discrepancy in the outputs of the versions may trigger an alarm.
- The efficiency of redundancy techniques depends on coincident and correlated faults.

58 Types of Redundancy
- Hardware redundancy: replicated and supplementary hardware added to the system to support fault tolerance.
- Software redundancy: also called program, modular, or functional redundancy; includes programs, modules, and functions used to support fault tolerance.
- Data redundancy: using additional forms of data to assist in fault tolerance.
- Temporal redundancy: using additional time to perform tasks related to fault tolerance, i.e., repeating an execution using the same software and hardware resources involved in the initial, failed execution.

59 1. Coincident Faults
- Coincident faults: when two or more functionally equivalent software components fail on the same input.
- When two or more software versions give the same incorrect response, an identical-and-wrong (IAW) answer is obtained.

60 2. Correlated Faults
- Correlated faults: two faults are correlated when the measured probability of coincident failures is significantly higher than what would be expected from the individual failure probabilities.
- If $P(i\ \mathrm{fails} \mid j\ \mathrm{fails}) > P(i\ \mathrm{fails})$, there is no failure independence.

61 Failure Scenario
What if the software components produce doublet or triplet identical-and-wrong (IAW) responses?
[Figure: input space for each procedure P1, P2, P3, feeding an adjudication algorithm; overlapping regions mark doublet and triplet IAW faults]

62 Adjudication by Voting
A voter compares results from two or more functionally equivalent software components and decides which of the answers provided by those components is correct.
Various versions of the voting algorithm:
- Majority voting
- Consensus voting
- 2-of-N voting

63 Majority Voting
- Several identical components are structured in parallel and all are active.
- If the component outputs are not identical, the minority components are ignored (i.e., disabled or switched off).
- Majority voting with N systems (N > 1) and agreement number m: $m = \lceil (N+1)/2 \rceil$.
- System reliability ($R_{system}$) for majority voting, assuming components with identical reliability $R_c$: $R_{system} = 1 - (1 - R_c)^m$, where $m = \lceil (N+1)/2 \rceil$.
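A majority voter with the agreement number m = ⌈(N+1)/2⌉ can be sketched in Python as below. The function name is hypothetical, outputs are assumed to be hashable and comparable for exact equality, and `None` is assumed never to be a valid component output, since it is used here to signal "no majority".

```python
import math
from collections import Counter

def majority_vote(outputs):
    """Return the output agreed on by at least m = ceil((N+1)/2) components, else None."""
    n = len(outputs)
    if n < 2:
        raise ValueError("majority voting needs N > 1 components")
    m = math.ceil((n + 1) / 2)                 # agreement number
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= m else None

print(majority_vote([4, 4, 5]))      # two of three agree
print(majority_vote([1, 2, 3]))      # no agreement
print(majority_vote([4, 4, 5, 5]))   # N = 4 needs m = 3
```

Real voters for floating-point outputs typically compare within a tolerance rather than for exact equality.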

64 Consensus Voting
- If majority agreement is achieved, select this answer.
- If a unique maximum agreement is achieved but $m < \lceil (N+1)/2 \rceil$, select the unique maximum (m is the ceiling value).
- If there is a tie in the maximum agreement number, select randomly among the tied answers.
- System reliability ($R_{system}$) for consensus voting, assuming components with identical reliability $R_c$: $R_{system} = 1 - (1 - R_c)^m$, where m is the number of unique maximum components.
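The three consensus-voting rules can be sketched in Python as follows. The function name and the injectable `rng` parameter are assumptions for illustration; outputs are assumed hashable and compared for exact equality.

```python
import random
from collections import Counter

def consensus_vote(outputs, rng=random):
    """Pick the answer with the largest agreement; break ties randomly."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    counts = Counter(outputs)
    top = max(counts.values())
    candidates = [value for value, count in counts.items() if count == top]
    if len(candidates) == 1:
        # Covers both majority agreement and a unique maximum below the majority.
        return candidates[0]
    return rng.choice(candidates)              # tie in maximum agreement

print(consensus_vote([1, 1, 1, 2]))            # majority
print(consensus_vote([7, 7, 8, 9, 10]))        # unique maximum, no majority
print(consensus_vote([1, 1, 2, 2]))            # tie, random pick
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible in tests.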

65 2-of-N Voting
- The agreement number m can be set to 2 if the output space is large and statistical independence of variant failures can be assumed.
- System reliability ($R_{system}$) for 2-of-N voting, assuming components with identical reliability $R_c$: $R_{system} = 1 - (1 - R_c)^2$.

66 Design Techniques
1) Robust software systems
2) Recovery blocks
3) N-version programming
4) Consensus recovery block
5) Acceptance voting
6) N-self-checking programming

67 1) Robust Software Systems /1
Robust software systems (Anderson and Lee 1981, etc.). Construction of a robust module requires:
- Exception handlers for coping with exceptions propagated from lower levels; and
- Boolean expressions for detecting exceptions arising in the module itself, and their exception handlers.
It is often possible (and desirable for the sake of simplicity) to map several exceptions onto a single handler.

68 2) Recovery Blocks (RB)
- Uses multiple versions of a software module and an acceptance test.
- The output of the 1st module is tested for acceptability; if it fails, the 2nd module is executed after backward state recovery.
- The system fails only if all modules fail their acceptance tests.
[Figure from Reliability Engineering Handbook]
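The recovery block scheme can be sketched in Python as below. This is a minimal sketch, not the handbook's implementation: the function names and the sorting example are hypothetical, and backward state recovery is modeled by deep-copying a checkpoint before each alternate runs.

```python
import copy

def recovery_block(state, alternates, acceptance_test):
    """Try alternates in order; restore the checkpoint before each attempt."""
    checkpoint = copy.deepcopy(state)            # establish the recovery point
    for alternate in alternates:
        working = copy.deepcopy(checkpoint)      # backward recovery to the saved state
        try:
            result = alternate(working)
        except Exception:
            continue                             # a crash counts as a failed alternate
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

def faulty_sort(items):
    items.pop()                                  # hypothetical bug: loses an element
    return sorted(items)

def backup_sort(items):
    return sorted(items)

data = [3, 1, 2]

def accept(result):
    """Partial acceptance test: same length as the input and in nondecreasing order."""
    return len(result) == len(data) and all(a <= b for a, b in zip(result, result[1:]))

output = recovery_block(data, [faulty_sort, backup_sort], accept)
print(output, data)
```

The acceptance test here checks only length and ordering, so it illustrates a run-time check that covers critical aspects of the specification rather than full correctness.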

69 3) N-Version Programming (NVP)
- Parallel execution of N independently developed, functionally equivalent modules.
- Adjudication is via voting: the voter accepts all N outputs and selects the correct one among them, i.e., the one that meets the specification.
- Advantage of NVP: no service interruption.
[Figure from Reliability Engineering Handbook]

70 4) Consensus Recovery Block
- Combination of N-version programming (NVP) and recovery blocks (RB).
- If NVP fails, the system reverts to RB using the same blocks.
- Advantage: highest possible system reliability.
[Figure: the input goes to NVP; on success it yields the correct output; on failure control passes to RB, which either yields the correct output or ends in system failure]

71 5) Acceptance Voting
- As in N-version programming (NVP), all versions are executed in parallel.
- The output of each module goes to an acceptance test.
- If the acceptance test is successful, the output goes to a voter.
[Figure from Reliability Engineering Handbook]
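Acceptance voting can be sketched in Python as follows. Several points are assumptions for illustration: the function names, the absolute-value example, sequential rather than parallel execution of the versions, and the choice of a most-common-value voter over the accepted outputs (ties go to the first accepted output).

```python
from collections import Counter

def acceptance_voting(versions, acceptance_test, value):
    """Run every version, keep outputs that pass the acceptance test, then vote."""
    accepted = []
    for version in versions:                    # executed sequentially in this sketch
        try:
            output = version(value)
        except Exception:
            continue                            # a crashed version contributes nothing
        if acceptance_test(value, output):
            accepted.append(output)
    if not accepted:
        raise RuntimeError("no version passed the acceptance test")
    return Counter(accepted).most_common(1)[0][0]

# Three hypothetical versions of an absolute-value routine; the third is faulty.
versions = [abs, lambda v: v if v >= 0 else -v, lambda v: v]

def non_negative(_value, output):
    return output >= 0

result = acceptance_voting(versions, non_negative, -5)
print(result)
```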

72 6) N-Self-Check Programming
In N-Self-Check Programming (NSCP), N modules are executed in pairs. The pairs' outputs can be compared or assessed for correctness.
[Figure from Reliability Engineering Handbook]

73 Discussion
- The capability of tolerating design faults rests largely on the coverage of run-time checks (i.e., acceptance tests) for detecting errors.
- Often it is not possible to check completely, within a procedure, that the results produced conform to the specification (e.g., for a sort algorithm, checking that the output has been sorted correctly would be as complex as the sort algorithm itself).
- Hence run-time checks are often limited to checking certain critical aspects of the specification.
- This means that the possibility of undetected failures cannot be ruled out entirely.

74 Fault Tolerance: Adjudication by voting


Distributed Database Management System UNIT-2. Concurrency Control. Transaction ACID rules. MCA 325, Distributed DBMS And Object Oriented Databases Distributed Database Management System UNIT-2 Bharati Vidyapeeth s Institute of Computer Applications and Management, New Delhi-63,By Shivendra Goel. U2.1 Concurrency Control Concurrency control is a method

More information

Fault-Tolerant Embedded System

Fault-Tolerant Embedded System Fault-Tolerant Embedded System EE8205: Embedded Computer Systems http://www.ee.ryerson.ca/~courses/ee8205/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale

Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale Saurabh Hukerikar Christian Engelmann Computer Science Research Group Computer Science & Mathematics Division Oak Ridge

More information

Fault-Tolerant Embedded System

Fault-Tolerant Embedded System Fault-Tolerant Embedded System COE718: Embedded Systems Design http://www.ee.ryerson.ca/~courses/coe718/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Dep. Systems Requirements

Dep. Systems Requirements Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Chapter 39: Concepts of Time-Triggered Communication. Wenbo Qiao

Chapter 39: Concepts of Time-Triggered Communication. Wenbo Qiao Chapter 39: Concepts of Time-Triggered Communication Wenbo Qiao Outline Time and Event Triggered Communication Fundamental Services of a Time-Triggered Communication Protocol Clock Synchronization Periodic

More information

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability CDA 5140 Software Fault-tolerance - so far have looked at reliability as hardware reliability - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

More information

INTRODUCTION TO SOFTWARE ENGINEERING

INTRODUCTION TO SOFTWARE ENGINEERING INTRODUCTION TO SOFTWARE ENGINEERING Introduction to Software Testing d_sinnig@cs.concordia.ca Department for Computer Science and Software Engineering What is software testing? Software testing consists

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Network Survivability

Network Survivability Network Survivability Bernard Cousin Outline Introduction to Network Survivability Types of Network Failures Reliability Requirements and Schemes Principles of Network Recovery Performance of Recovery

More information

Software Quality. Chapter What is Quality?

Software Quality. Chapter What is Quality? Chapter 1 Software Quality 1.1 What is Quality? The purpose of software quality analysis, or software quality engineering, is to produce acceptable products at acceptable cost, where cost includes calendar

More information

Aerospace Software Engineering

Aerospace Software Engineering 16.35 Aerospace Software Engineering Verification & Validation Prof. Kristina Lundqvist Dept. of Aero/Astro, MIT Would You...... trust a completely-automated nuclear power plant?... trust a completely-automated

More information

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN Course Code : CS0444 Course Title : Software Reliability Semester : VIII

More information

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification and Validation Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification vs validation Verification: "Are we building the product right?. The software should

More information

Homework 2 COP The total number of paths required to reach the global state is 20 edges.

Homework 2 COP The total number of paths required to reach the global state is 20 edges. Homework 2 COP 5611 Problem 1: 1.a Global state lattice 1. The total number of paths required to reach the global state is 20 edges. 2. In the global lattice each and every edge (downwards) leads to a

More information

A Better Way to a Redundant DNS.

A Better Way to a Redundant DNS. WHITEPAPE R A Better Way to a Redundant DNS. +1.855.GET.NSONE (6766) NS1.COM 2019.02.12 Executive Summary DNS is a mission critical application for every online business. In the words of Gartner If external

More information

Alexandre Esper, Geoffrey Nelissen, Vincent Nélis, Eduardo Tovar

Alexandre Esper, Geoffrey Nelissen, Vincent Nélis, Eduardo Tovar Alexandre Esper, Geoffrey Nelissen, Vincent Nélis, Eduardo Tovar Current status MC model gradually gaining in sophistication Current status MC model gradually gaining in sophistication Issue Safety-related

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica Examination Architecture of Distributed Systems (2IMN10), on Thursday, November 8, 2018, from 9.00 to 12.00 hours. Before you start,

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in

More information

B.H. Far

B.H. Far SENG 521 Software Reliability & Quality Software Reliability Tools (Chapter 12) Department t of Electrical l & Computer Engineering, i University it of Calgary B.H. Far (far@ucalgary.ca) http://www.enel.ucalgary.ca/people/far/lectures/seng521

More information

hot plug RAID memory technology for fault tolerance and scalability

hot plug RAID memory technology for fault tolerance and scalability hp industry standard servers april 2003 technology brief TC030412TB hot plug RAID memory technology for fault tolerance and scalability table of contents abstract... 2 introduction... 2 memory reliability...

More information

Module 8 - Fault Tolerance

Module 8 - Fault Tolerance Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced

More information

Fault Tolerance. The Three universe model

Fault Tolerance. The Three universe model Fault Tolerance High performance systems must be fault-tolerant: they must be able to continue operating despite the failure of a limited subset of their hardware or software. They must also allow graceful

More information

Software dependability. Critical systems development. Objectives. Dependability achievement. Diversity and redundancy.

Software dependability. Critical systems development. Objectives. Dependability achievement. Diversity and redundancy. Software dependability Critical systems development In general, software customers expect all software to be dependable. However, for non-critical applications, they may be willing to accept some system

More information

Sample Exam ISTQB Advanced Test Analyst Answer Rationale. Prepared By

Sample Exam ISTQB Advanced Test Analyst Answer Rationale. Prepared By Sample Exam ISTQB Advanced Test Analyst Answer Rationale Prepared By Released March 2016 TTA-1.3.1 (K2) Summarize the generic risk factors that the Technical Test Analyst typically needs to consider #1

More information

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 Redundancy in fault tolerant computing D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 1 Redundancy Fault tolerance computing is based on redundancy HARDWARE REDUNDANCY Physical

More information

Distributed Transaction Management. Distributed Database System

Distributed Transaction Management. Distributed Database System Distributed Transaction Management Advanced Topics in Database Management (INFSCI 2711) Some materials are from Database Management Systems, Ramakrishnan and Gehrke and Database System Concepts, Siberschatz,

More information

What are Embedded Systems? Lecture 1 Introduction to Embedded Systems & Software

What are Embedded Systems? Lecture 1 Introduction to Embedded Systems & Software What are Embedded Systems? 1 Lecture 1 Introduction to Embedded Systems & Software Roopa Rangaswami October 9, 2002 Embedded systems are computer systems that monitor, respond to, or control an external

More information

DATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY

DATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY WHITEPAPER DATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY A Detailed Review ABSTRACT No single mechanism is sufficient to ensure data integrity in a storage system.

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CS SOFTWARE ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CS SOFTWARE ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CS 6403 - SOFTWARE ENGINEERING QUESTION BANK TWO MARKS UNIT I SOFTWARE PROCESS AND PROJECT MANAGEMENT 1. What is software engineering? Software engineering

More information

OL Connect Backup licenses

OL Connect Backup licenses OL Connect Backup licenses Contents 2 Introduction 3 What you need to know about application downtime 5 What are my options? 5 Reinstall, reactivate, and rebuild 5 Create a Virtual Machine 5 Run two servers

More information

Harmonization of usability measurements in ISO9126 software engineering standards

Harmonization of usability measurements in ISO9126 software engineering standards Harmonization of usability measurements in ISO9126 software engineering standards Laila Cheikhi, Alain Abran and Witold Suryn École de Technologie Supérieure, 1100 Notre-Dame Ouest, Montréal, Canada laila.cheikhi.1@ens.etsmtl.ca,

More information

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification and Validation 1 Objectives To introduce software verification and validation and to discuss the distinction between them To describe the program inspection process and its role in V & V To

More information

Deriving safety requirements according to ISO for complex systems: How to avoid getting lost?

Deriving safety requirements according to ISO for complex systems: How to avoid getting lost? Deriving safety requirements according to ISO 26262 for complex systems: How to avoid getting lost? Thomas Frese, Ford-Werke GmbH, Köln; Denis Hatebur, ITESYS GmbH, Dortmund; Hans-Jörg Aryus, SystemA GmbH,

More information

WHY BUILDING SECURITY SYSTEMS NEED CONTINUOUS AVAILABILITY

WHY BUILDING SECURITY SYSTEMS NEED CONTINUOUS AVAILABILITY WHY BUILDING SECURITY SYSTEMS NEED CONTINUOUS AVAILABILITY White Paper 2 Why Building Security Systems Need Continuous Availability Always On Is the Only Option. If All Systems Go Down, How Can You React

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

Financial CISM. Certified Information Security Manager (CISM) Download Full Version :

Financial CISM. Certified Information Security Manager (CISM) Download Full Version : Financial CISM Certified Information Security Manager (CISM) Download Full Version : http://killexams.com/pass4sure/exam-detail/cism required based on preliminary forensic investigation, but doing so as

More information

Concurrent & Distributed 7Systems Safety & Liveness. Uwe R. Zimmer - The Australian National University

Concurrent & Distributed 7Systems Safety & Liveness. Uwe R. Zimmer - The Australian National University Concurrent & Distributed 7Systems 2017 Safety & Liveness Uwe R. Zimmer - The Australian National University References for this chapter [ Ben2006 ] Ben-Ari, M Principles of Concurrent and Distributed Programming

More information

Anomaly Detection Fault Tolerance Anticipation

Anomaly Detection Fault Tolerance Anticipation Anomaly Detection Fault Tolerance Anticipation Patterns John Allspaw SVP, Tech Ops Qcon London 2012 Four Cornerstones Erik Hollnagel (Anticipation) (Response) Knowing Knowing Knowing Knowing What What

More information

Service Recovery & Availability. Robert Dickerson June 2010

Service Recovery & Availability. Robert Dickerson June 2010 Service Recovery & Availability Robert Dickerson June 2010 Started in 1971 with $3,000, 40 clients and 1 employee. 2009: over $2B revenue, 500,000+ clients, 13,000 employees. Payroll / Tax Services / 401(k)

More information

Lecture 19 Engineering Design Resolution: Generating and Evaluating Architectures

Lecture 19 Engineering Design Resolution: Generating and Evaluating Architectures Lecture 19 Engineering Design Resolution: Generating and Evaluating Architectures Software Engineering ITCS 3155 Fall 2008 Dr. Jamie Payton Department of Computer Science University of North Carolina at

More information

Approaches to Software Based Fault Tolerance A Review

Approaches to Software Based Fault Tolerance A Review Computer Science Journal of Moldova, vol.13, no.3(39), 2005 Approaches to Software Based Fault Tolerance A Review Goutam Kumar Saha Abstract This paper presents a review work on various approaches to software

More information

Chapter 8 Software Testing. Chapter 8 Software testing

Chapter 8 Software Testing. Chapter 8 Software testing Chapter 8 Software Testing 1 Topics covered Introduction to testing Stages for testing software system are: Development testing Release testing User testing Test-driven development as interleave approach.

More information

Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX

Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX Achieving Rapid Data Recovery for IBM AIX Environments An Executive Overview of EchoStream for AIX Introduction Planning for recovery is a requirement in businesses of all sizes. In implementing an operational

More information

PowerVault MD3 Storage Array Enterprise % Availability

PowerVault MD3 Storage Array Enterprise % Availability PowerVault MD3 Storage Array Enterprise 99.999% Availability Dell Engineering June 2015 A Dell Technical White Paper THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS

More information

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner Real-Time Component Software slide credits: H. Kopetz, P. Puschner Overview OS services Task Structure Task Interaction Input/Output Error Detection 2 Operating System and Middleware Application Software

More information

QA Best Practices: A training that cultivates skills for delivering quality systems

QA Best Practices: A training that cultivates skills for delivering quality systems QA Best Practices: A training that cultivates skills for delivering quality systems Dixie Neilson QA Supervisor Lynn Worm QA Supervisor Maheen Imam QA Analyst Information Technology for Minnesota Government

More information

Lecture 20: SW Testing Presented by: Mohammad El-Ramly, PhD

Lecture 20: SW Testing Presented by: Mohammad El-Ramly, PhD Cairo University Faculty of Computers and Information CS251 Software Engineering Lecture 20: SW Testing Presented by: Mohammad El-Ramly, PhD http://www.acadox.com/join/75udwt Outline Definition of Software

More information

Multilevel Fault-tolerance for Designing Dependable Wireless Networks

Multilevel Fault-tolerance for Designing Dependable Wireless Networks Multilevel Fault-tolerance for Designing Dependable Wireless Networks Upkar Varshney Department of Computer Information Systems Georgia State University Atlanta, Georgia 30302-4015 E-mail: uvarshney@gsu.edu

More information

BUSINESS CONTINUITY: THE PROFIT SCENARIO

BUSINESS CONTINUITY: THE PROFIT SCENARIO WHITE PAPER BUSINESS CONTINUITY: THE PROFIT SCENARIO THE BENEFITS OF A COMPREHENSIVE BUSINESS CONTINUITY STRATEGY FOR INCREASED OPPORTUNITY Organizational data is the DNA of a business it makes your operation

More information

10. Software Testing Fundamental Concepts

10. Software Testing Fundamental Concepts 10. Software Testing Fundamental Concepts Department of Computer Science and Engineering Hanyang University ERICA Campus 1 st Semester 2016 Testing in Object-Oriented Point of View Error Correction Cost

More information

Top-Down Network Design

Top-Down Network Design Top-Down Network Design Chapter Two Analyzing Technical Goals and Tradeoffs Copyright 2010 Cisco Press & Priscilla Oppenheimer 1 Technical Goals Scalability Availability Performance Security Manageability

More information

The Microsoft Large Mailbox Vision

The Microsoft Large Mailbox Vision WHITE PAPER The Microsoft Large Mailbox Vision Giving users large mailboxes without breaking your budget Introduction Giving your users the ability to store more email has many advantages. Large mailboxes

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

Part 5. Verification and Validation

Part 5. Verification and Validation Software Engineering Part 5. Verification and Validation - Verification and Validation - Software Testing Ver. 1.7 This lecture note is based on materials from Ian Sommerville 2006. Anyone can use this

More information

Why the Threat of Downtime Should Be Keeping You Up at Night

Why the Threat of Downtime Should Be Keeping You Up at Night Why the Threat of Downtime Should Be Keeping You Up at Night White Paper 2 Your Plan B Just Isn t Good Enough. Learn Why and What to Do About It. Server downtime is an issue that many organizations struggle

More information

Software Quality Engineering Tackles Security Issues

Software Quality Engineering Tackles Security Issues Software Quality Engineering Tackles Security Issues Taz Daughtrey Senior Scientist Quanterion Solutions, Inc. Software Quality Group of New England 12 June 2013 Software Quality Engineering Tackles Security

More information

B.H. Far

B.H. Far SENG 521 Software Reliability & Software Quality Chapter 11: Preparing & Executing Test Department t of Electrical l & Computer Engineering, i University it of Calgary B.H. Far (far@ucalgary.ca) http://www.enel.ucalgary.ca/people/far/lectures/seng521

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Distributed Systems (ICE 601) Fault Tolerance

Distributed Systems (ICE 601) Fault Tolerance Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability

More information

OPERATING SYSTEMS. Prescribed Text Book. Operating System Principles, Seventh Edition. Abraham Silberschatz, Peter Baer Galvin and Greg Gagne

OPERATING SYSTEMS. Prescribed Text Book. Operating System Principles, Seventh Edition. Abraham Silberschatz, Peter Baer Galvin and Greg Gagne OPERATING SYSTEMS Prescribed Text Book Operating System Principles, Seventh Edition By Abraham Silberschatz, Peter Baer Galvin and Greg Gagne 1 DEADLOCKS In a multi programming environment, several processes

More information

Types of Software Testing: Different Testing Types with Details

Types of Software Testing: Different Testing Types with Details Types of Software Testing: Different Testing Types with Details What are the different Types of Software Testing? We, as testers are aware of the various types of Software Testing such as Functional Testing,

More information

Safety & Liveness Towards synchronization. Safety & Liveness. where X Q means that Q does always hold. Revisiting

Safety & Liveness Towards synchronization. Safety & Liveness. where X Q means that Q does always hold. Revisiting 459 Concurrent & Distributed 7 Systems 2017 Uwe R. Zimmer - The Australian National University 462 Repetition Correctness concepts in concurrent systems Liveness properties: ( P ( I )/ Processes ( I, S

More information

White Paper. Incorporating Usability Experts with Your Software Development Lifecycle: Benefits and ROI Situated Research All Rights Reserved

White Paper. Incorporating Usability Experts with Your Software Development Lifecycle: Benefits and ROI Situated Research All Rights Reserved White Paper Incorporating Usability Experts with Your Software Development Lifecycle: Benefits and ROI 2018 Situated Research All Rights Reserved Learnability, efficiency, safety, effectiveness, memorability

More information

Module 4 STORAGE NETWORK BACKUP & RECOVERY

Module 4 STORAGE NETWORK BACKUP & RECOVERY Module 4 STORAGE NETWORK BACKUP & RECOVERY BC Terminology, BC Planning Lifecycle General Conditions for Backup, Recovery Considerations Network Backup, Services Performance Bottlenecks of Network Backup,

More information

DATA ITEM DESCRIPTION

DATA ITEM DESCRIPTION DATA ITEM DESCRIPTION Title: RELIABILITY AND MAINTAINABILITY (R&M) BLOCK DIAGRAMS AND MATHEMATICAL MODELS REPORT Number: DI-SESS-81496A Approval Date: 20141219 AMSC Number: 9508 Limitation: No DTIC Applicable:

More information

Notes on Photoshop s Defect in Simulation of Global Motion-Blurring

Notes on Photoshop s Defect in Simulation of Global Motion-Blurring Notes on Photoshop s Defect in Simulation of Global Motion-Blurring Li-Dong Cai Department of Computer Science, Jinan University Guangzhou 510632, CHINA ldcai@21cn.com ABSTRACT In restoration of global

More information

MIS Systems & Infrastructure Lifecycle Management 1. Week 12 April 7, 2016

MIS Systems & Infrastructure Lifecycle Management 1. Week 12 April 7, 2016 MIS 5203 Lifecycle Management 1 Week 12 April 7, 2016 Study Objectives Systems Implementation Data Migration Change Over 2 Phase 1 Feasibility Phase 2 Requirements Which ones of these activities are part

More information

Hardware Safety Integrity. Hardware Safety Design Life-Cycle

Hardware Safety Integrity. Hardware Safety Design Life-Cycle Hardware Safety Integrity Architecture esign and Safety Assessment of Safety Instrumented Systems Budapest University of Technology and Economics epartment of Measurement and Information Systems Hardware

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

INFORMATION SECURITY- DISASTER RECOVERY

INFORMATION SECURITY- DISASTER RECOVERY Information Technology Services Administrative Regulation ITS-AR-1505 INFORMATION SECURITY- DISASTER RECOVERY 1.0 Purpose and Scope The objective of this Administrative Regulation is to outline the strategy

More information

Distributed Systems

Distributed Systems 15-440 Distributed Systems 11 - Fault Tolerance, Logging and Recovery Tuesday, Oct 2 nd, 2018 Logistics Updates P1 Part A checkpoint Part A due: Saturday 10/6 (6-week drop deadline 10/8) *Please WORK hard

More information

Integration Testing. Conrad Hughes School of Informatics. Slides thanks to Stuart Anderson

Integration Testing. Conrad Hughes School of Informatics. Slides thanks to Stuart Anderson Integration Testing Conrad Hughes School of Informatics Slides thanks to Stuart Anderson 19 February 2010 Software Testing: Lecture 10 1 Unit Test vs Integration Testing 1 The ideal in unit testing is

More information