Automatic Generation of Availability Models in RAScad

Dong Tang, Ji Zhu, and Roy Andrada
Sun Microsystems, Inc., Network Circle, Santa Clara, CA
{dong.tang, ji.zhu,

Abstract
RAScad is a Sun-internal, web-based reliability, availability, and serviceability (RAS) architecture modeling and analysis tool for use in the computer system design and development phase. RAScad has two major goals: (1) to make availability modeling possible for design engineers without a background in mathematical modeling, and (2) to make availability modeling efficient for RAS engineers who understand the underlying mathematical models. To achieve these goals, RAScad integrates two modules: the Model Generator (MG), which provides automatic model generation specific to Sun product RAS characteristics, and the Graphical Model Builder (GMB), which provides general graphical Markov, semi-Markov, and reliability block diagram modeling capabilities. An MG model is a hierarchical specification, expressed in an engineering language (MTBF, MTTR, redundancy, etc.), of the constituent components of the modeled system and their associated parameters; the user does not have to understand the underlying mathematical models that RAScad generates.

1. Introduction
As reliability, availability, and serviceability (RAS) become increasingly important for networked computer servers and storage systems running critical applications, RAS has become one of the major considerations in designing such products. Manufacturers of these products have recognized that system-level RAS metrics need to be quantified early in the design phase. How to derive system-level availability and reliability measures from component-level RAS parameters (for both hardware and software), and how to relate models to field data, have long been addressed in previous studies [1, 2, 3, 4, 5, 6, 10]. Many in-house and commercial software tools have also been developed to automate the evaluation procedure.
Representative commercial dependability modeling tools include SHARPE [7], UltraSAN [8], and MEADEP [9]. Although these tools incorporate advanced modeling and evaluation techniques, they all require users to have a mathematical modeling background to build models. Model construction is time-consuming and error-prone even for experienced modelers. In addition, these commercial tools are all stand-alone (not web-based) applications. They lack support for connecting to existing enterprise databases of RAS metrics (e.g., component-level MTBF and MTTR) and for file sharing across networks, which is desirable when the modeling effort is coordinated by a group of engineers at different sites. It would be very beneficial to have a domain-specific tool that understands not only the mathematical language but also the engineering language, generates mathematical models from the engineering language automatically, and addresses all of the above issues. RAScad is being developed to fill this void.

The two major goals of RAScad are (1) to make availability modeling possible for design engineers without a background in mathematical modeling and (2) to make availability modeling efficient for RAS engineers who understand the underlying mathematical models. To achieve these goals, RAScad integrates two modules: the Model Generator (MG) and the Graphical Model Builder (GMB). MG provides automatic model generation specific to Sun product RAS characteristics for use by system designers. GMB provides general graphical Markov, semi-Markov, and reliability block diagram (RBD) modeling capabilities for use by RAS experts. RAScad also allows MG models and GMB models to be used in combination.

MG is used to develop a diagram/block model, which is a specification, in terms of an engineering language (MTBF, MTTR, redundancy, etc.), of the constituent components of the modeled system and their associated parameters.
In the model solution procedure, RAScad translates the diagram/block model into RBDs and Markov chains, which are then solved numerically. The MG user does not have to understand these underlying mathematical models. What is needed from the user is knowledge of the RAS architecture of the modeled system and a basic understanding of the MG diagram/block model structure.

GMB is used to develop graphical RBD models and Markov/semi-Markov chains by drawing blocks, states, transitions, and other objects and by specifying the related parameters hierarchically. To develop models with GMB, the user needs knowledge of RBD and Markov modeling and an understanding of how to map the system RAS architecture to these models. In return, GMB offers more powerful modeling capabilities than MG: for experienced users, it provides a flexible and user-friendly environment for modeling system behavior in great detail.

RAScad is implemented using Java™ technology and incorporates a rich set of features, including:
- Automatic model generation
- Graphical Markov, semi-Markov, and RBD modeling with a hierarchical approach
- A library of models for existing Sun products and integration with the component MTBF database
- Graphical output and parametric analysis capability
- File sharing across networks and documentation generation

In the following sections, we discuss only MG, because it is the module that performs automatic model generation, the topic of this paper.

2. RAS Characteristics Modeled
MG is intended for analytically assessing and comparing the RAS measures achievable by computer architectures under design. It is not intended for predicting the actual field availability of a system. In particular, it is applicable to architectures with the RAS characteristics discussed in this section. MG models a system down to the level of the Field Replaceable Unit (FRU), such as a CPU module or a power supply unit. Based on our investigation of the RAS architectures of Sun server products, the following RAS characteristics were identified as important for the generation of availability models:
- Redundancy
- Fault type (permanent/transient)
- Fault detection
- Fault recovery
- Logistic event
- Repair of the faulty component
- Reintegration of the repaired component

Redundancy is determined by the quantity and the minimum required quantity of the modeled component. In the current implementation of model generation, all redundant components of the same type are assumed to be functionally equivalent (symmetric) and to have the same failure rate.
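To make the role of the quantity ($N$) and the minimum required quantity ($K$) concrete, the following Python sketch computes the steady-state availability of a deliberately simplified symmetric K-out-of-N block. It is not the chain MG generates, which also models automatic recovery, latent and transient faults, the SPF state, and service errors. It assumes independent, exponentially distributed failures at a common rate `lam` per working component, a single repair facility with rate `mu`, and failures that continue while the block is down. The function name `k_of_n_availability` is hypothetical.

```python
def k_of_n_availability(n: int, k: int, lam: float, mu: float) -> float:
    """Steady-state availability of a simplified symmetric K-out-of-N block.

    Hypothetical sketch (not the MG-generated chain): state j is the number
    of failed components (0..n); each of the n - j working components fails
    independently at rate lam; one repair proceeds at a time at rate mu.
    The block is up while at least k components work, i.e. j <= n - k.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    # Birth-death chain, so the steady state has product form:
    # pi_j = pi_{j-1} * (failure rate out of j-1) / (repair rate out of j)
    weights = [1.0]
    for j in range(1, n + 1):
        weights.append(weights[-1] * (n - j + 1) * lam / mu)
    total = sum(weights)
    # Up states are j = 0..n-k, i.e. the first n - k + 1 entries.
    return sum(weights[: n - k + 1]) / total


if __name__ == "__main__":
    # Illustrative rates only: one required unit vs. one unit plus a spare.
    print(k_of_n_availability(1, 1, 0.1, 1.0))
    print(k_of_n_availability(2, 1, 0.1, 1.0))
```

With these arbitrary rates, the single required unit gives an availability of about 0.909 and the N = 2, K = 1 configuration about 0.984. The block tolerates $N - K$ failures before going down, which is why the number of states, and hence the model size, grows with $N - K$.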
Model generation for primary-standby and primary-secondary (e.g., cluster) architectures is work in progress.

A permanent fault is a hard failure of a component that requires a physical repair action. A transient fault is an erroneous state in the system induced by cosmic rays, power surges, software defects, or environmental factors. In most cases, the erroneous state can be corrected by restarting the system.

The detailed fault detection process is not modeled in MG, but its effect is: a fault is either detected or undetected (latent). A recovery event occurs after a fault is detected. Depending on the redundancy and automatic recovery (AR) capability implemented in the architecture and operating system, the recovery event can be transparent or nontransparent to user applications. For example, if N+1 power supply units share the power load of the system, the failure of one unit has no effect on the applications, and the recovery is transparent to the user. In a server with multiple CPUs where AR for a CPU failure is implemented by a system reboot, a CPU failure triggers a reboot to deconfigure the failed CPU; this recovery is not transparent to the user. For both transparent and nontransparent recovery events, imperfect recovery needs to be modeled.

The logistic event follows the recovery event. Its duration depends on the redundancy of the faulty component and on the maintenance strategy. If the faulty component is required and non-redundant, the system ceases operation when the component fails. A service call is placed immediately, and the logistic time is just the service response time, the time for service personnel to arrive on site. If the faulty component is redundant, the system is still operational after recovering from the failure.
The repair of the faulty component can then be scheduled for a later time (e.g., off-peak hours), and the time before the service call is placed is referred to as the service restriction time. The logistic event duration is thus the sum of the service restriction time and the service response time.

Like the recovery event, the repair event and the subsequent reintegration event can be transparent or nontransparent. If the faulty component is hot-pluggable (it can be plugged in or out while the system is running) and the system supports dynamic reconfiguration for the component (i.e., the new component can be reintegrated online without service interruption), the repair event, including reintegration, is transparent to the user. If the faulty component is hot-pluggable but the system does not support dynamic reconfiguration for it, the repair event is not transparent, because the system has to be restarted to reintegrate the new component (incurring a short downtime). If the faulty component is not hot-pluggable, the repair event is not transparent either, because the system has to be powered off to replace the component (incurring a longer downtime). For both transparent and nontransparent repair events, imperfect repair (due to incorrect diagnosis or incorrect corrective action) needs to be modeled.

3. Model Generator GUI
The MG Graphical User Interface (GUI) is used to build the diagram/block model that serves as the input to automatic model generation. A diagram/block model consists of MG diagrams and MG blocks. An MG diagram represents a system or subsystem and contains a number of MG blocks. Each MG block represents a component of the system modeled by the diagram and has an associated parameter list. An MG block can have a subdiagram that models the subcomponents of the component the block represents. The root diagram is numbered level 1, all subdiagrams of the root diagram are numbered level 2, and so on. The overall diagram/block model is thus a tree of MG diagrams and MG blocks. Figures 1 and 2 show a diagram/block model.

Figure 1. Diagram/block model, level 1
Figure 2. Diagram/block model, level 2

The first diagram (Data Center System) has four blocks: "Server Box", "Boot Drives, RAID1", "Storage 1, RAID5", and "Storage 2, RAID5". These four blocks are shaded dark, which indicates that each of them has a subdiagram. The second diagram is the subdiagram of the Server Box block in the first diagram. It consists of 19 blocks (System Board, CPU Module, etc.). Each block has an associated parameter list; the parameters are explained as follows:
Name: name of this component
Part Number: part number of this component
Description: user's description of this component
Quantity: quantity of this component
Minimum Quantity Required: minimum quantity of this component required by the system
MTBF: mean time between failures caused by permanent faults on this component (hours)
Transient Failure Rate: failure rate due to transient faults on this component (FIT, i.e., failures per 10^9 hours)
MTTR Part 1 (Diagnosis Time): time to identify the failed component (min.)
MTTR Part 2 (Corrective Action Time): time to replace the failed component (min.)
MTTR Part 3 (Verification Time): time to verify the new component's function or to restore lost data (min.)
Service Response Time (Tresp): time to wait for service (hours)
Probability of Correct Diagnosis (Pcd): probability of correctly identifying and replacing the faulty component (used to model imperfect repair)

The following parameters are relevant only if Quantity is greater than Minimum Quantity Required (i.e., the block is a redundant component):
Probability of Latent Fault (Plf): probability that a fault is not detected, i.e., remains latent
MTTDLF: mean time to detect a latent fault (hours)
Automatic Recovery (AR) Scenario: Transparent (no downtime is associated with AR) or Nontransparent (downtime is associated with AR)
AR/Failover Time: user-defined downtime associated with AR (min.)
Probability of SPF during AR (Pspf): probability of a single point of failure (SPF) during AR
SPF State Recovery Time (Tspf): recovery time in the SPF state (min.)
Repair Scenario: Transparent (no downtime in repair/reintegration) or Nontransparent (downtime is associated with repair/reintegration)
Reintegration Time: user-defined downtime associated with reintegration

While the above component parameters are local to a specific block, a few global parameters apply to every block in the model, as shown on the Global Parameter Bar (below the Menu Bar) in Figures 1 and 2:
Reboot Time (Tboot): time to reboot the system
MTTM: mean time to maintenance, or service restriction time, i.e., the average waiting time before the service call
MTTRFID: mean time to repair from incorrect diagnosis
Mission Time: time point used to calculate interval availability and reliability

4. Models Generated
This section describes how a diagram/block model is translated into the underlying mathematical models: reliability block diagrams and Markov chains. The modeling approach used in MG assumes that failures and repairs of different component types are independent. However, the possibility that a component failure causes a system failure is taken into account by the SPF state in the component model discussed below. Because component failures are independent, the probability of repairing multiple faulty components in one service action is very low. Repairs of multiple components in one service action seen in the field are most likely due to imperfect diagnosis/replacement, which is also modeled in the component model (by the Service Error state).

Given an MG diagram/block model as discussed in the previous section, each MG diagram is modeled by a serial RBD consisting of all the MG blocks in the diagram. Each block is then modeled by a Markov chain, which may have a sub-RBD, depending on whether the corresponding block has a subdiagram. The overall model is a hierarchy of RBDs and Markov chains. The system availability of an MG diagram containing $n$ blocks is the product of the individual block availabilities $A_i$ ($i = 1, 2, \ldots, n$):

$$A_{\text{sys}} = \prod_{i=1}^{n} A_i$$

How $A_i$ is evaluated depends on the parameters associated with Block $i$. Let $N$ represent Quantity and $K$ represent Minimum Quantity Required. If there is no redundancy, i.e., $N = K$, $A_i$ is evaluated from the Markov chain called Markov Model Type 0 (Figure 3). The states and parameters of the model are all explained in the figure. These parameters come from the block and global parameters discussed in the previous section.
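The two evaluation steps described above, solving each block's Markov chain and multiplying the block availabilities of a serial RBD, can be sketched in Python as follows. This is a hypothetical illustration, not RAScad's implementation. `steady_state` solves $\pi Q = 0$ with $\sum_s \pi_s = 1$ by Gaussian elimination, and availability is the reward-weighted sum $\sum_s r_s \pi_s$ with reward 1 for up states and 0 for down states. `simple_block_generator` builds a three-state toy chain (Ok, down after a permanent fault, down after a transient fault) rather than the Type 0 chain of Figure 3. Its assumptions are labeled in the comments: exponential holding times, a repair delay of Tresp plus the sum of the three MTTR parts, and recovery from a transient fault by one reboot lasting Tboot (taken here in hours, since the unit is not specified).

```python
from math import prod


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a small irreducible CTMC.

    Q is the generator matrix as a list of rows (each row sums to 0).
    """
    n = len(Q)
    # Balance equations: column s of Q gives sum_j pi_j * Q[j][s] = 0.
    # Build the transposed system A x = b, then replace the last
    # (redundant) equation with the normalization sum(pi) = 1.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def reward_availability(Q, rewards):
    """Steady-state availability: reward-weighted sum of state probabilities."""
    return sum(p * r for p, r in zip(steady_state(Q), rewards))


def simple_block_generator(mtbf_h, transient_fit, mttr_parts_min, tresp_h, tboot_h):
    """Hypothetical three-state chain for one non-redundant block.

    States: 0 = Ok (up), 1 = down after a permanent fault,
    2 = down after a transient fault. Assumptions (not from RAScad):
    repair after a permanent fault takes Tresp plus the sum of the MTTR
    parts; a transient fault is cleared by one reboot lasting Tboot.
    """
    lam_p = 1.0 / mtbf_h                     # permanent-fault rate (per hour)
    lam_t = transient_fit * 1e-9             # FIT = failures per 10^9 hours
    repair_h = tresp_h + sum(mttr_parts_min) / 60.0
    mu_p = 1.0 / repair_h
    mu_t = 1.0 / tboot_h
    Q = [
        [-(lam_p + lam_t), lam_p, lam_t],
        [mu_p, -mu_p, 0.0],
        [mu_t, 0.0, -mu_t],
    ]
    return Q, [1, 0, 0]


def series_availability(block_availabilities):
    """Serial RBD: the diagram is up only if every block is up."""
    return prod(block_availabilities)


def yearly_downtime_min(availability):
    """Expected downtime per year (8760 hours) in minutes."""
    return (1.0 - availability) * 8760 * 60


if __name__ == "__main__":
    Q, rewards = simple_block_generator(100_000, 1_000, (10, 15, 5), 4.0, 0.25)
    a_block = reward_availability(Q, rewards)
    a_sys = series_availability([a_block, 0.9999])
    print(a_block, a_sys, yearly_downtime_min(a_sys))
```

Because every down state of this toy chain returns directly to Ok, its availability also has the closed form $1/(1 + \lambda_p R + \lambda_t T_{boot})$, where $R$ is the total repair time; this gives a convenient check on the general solver.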
Each state in these models is marked with either 1 or 0, which is the reward rate assigned to the state. A reward rate of 1 means the state is an operational (up) state; a reward rate of 0 means it is a failure (down) state. The system availability is calculated from these reward rate assignments [1, 6, 10].

If there is redundancy, i.e., $N > K$, $A_i$ is evaluated from one of the four types of Markov chain discussed below. To simplify the discussion, we assume $N = 2$ and $K = 1$; that is, there are two components, and at least one of them is required for the system to function. For larger $N$ and $K$ values, more states are needed, and these states are all generated automatically by RAScad. The four Markov model types are determined by the four combinations of the parameters Automatic Recovery Scenario and Repair Scenario:
1. Transparent recovery, transparent repair
2. Transparent recovery, nontransparent repair
3. Nontransparent recovery, transparent repair
4. Nontransparent recovery, nontransparent repair

The Markov chain generated for case $i$ ($i = 1, 2, 3, 4$) above is referred to as Markov Model Type $i$. Model complexity increases from Type 1 to Type 4. For illustration, Markov Model Type 3 is shown in Figure 4 and discussed here. The parameters in the model are either derived from or taken directly from the block and global parameters discussed in the previous section.

Figure 3. Markov Model Type 0
Figure 4. Markov Model Type 3

Markov Model Type 3 models nontransparent recovery and transparent repair. A detected permanent fault triggers an AR process (Ok → AR1). If the AR succeeds, the system goes into a degraded mode (AR1 → PF1). Otherwise, it goes into the single point of failure state (AR1 → SPF), where it stays for a user-defined period of time (Tspf). An undetected permanent fault (latent fault) moves the system to another degraded mode, the latent fault state (Ok → Latent1). When the latent fault is detected after a mean delay of MTTDLF, the system has to go through the AR process (Latent1 → AR1). In the PF1 state, a repair action takes place after a logistic event delay (MTTM + Tresp). If the repair (diagnosis and corrective action) is successful, the system returns to the normal state (PF1 → Ok). Otherwise, it has to go through the service error state (PF1 → ServiceError), which represents a longer downtime (MTTRFID). If a second fault occurs while the system is in a degraded mode (PF1 or Latent1), the system goes to state PF2 if the fault is permanent or to TF2 if it is transient. In PF2, a service call is placed immediately to initiate a repair action. For a transient fault, whether it is the first fault (Ok → TF1) or a second fault on top of a permanent fault (PF1 → TF2), the system clears the fault by an AR process. If the AR process does not work (e.g., due to data corruption), the system has to go through the SPF state.

As indicated in the figure, the number of states in the model is determined by $N$ and $K$. For example, if $N - K > 1$, states TF1, AR1, PF1, and Latent1 are repeated in the model. Because the model size varies, the implementation generates an internal matrix representation of the Markov models rather than a graphical one.

The system measures generated by RAScad include:
- Steady-state availability, failure rate, and recovery rate
- Interval availability, failure rate, and recovery rate for (0, T), where T is the Mission Time defined in Section 3
- For the reliability model: MTTF, reliability at T, interval failure rate for (0, T), and hazard rate for the time increment in a loop

5. Conclusions
In this paper, we discussed automatic model generation in RAScad, a RAS modeling tool that generates mathematical models from an engineering specification and provides tool access and model sharing across the Internet. Although the model generation method discussed here was developed for Sun server architectures, we believe it is applicable to other server architectures on the market, because their recovery and repair processes are likewise either transparent or nontransparent, and these properties are the key elements determining the structure of the Markov models in our automatic model generation.

RAScad has been validated by comparing its results with those generated by SHARPE [7] and MEADEP [9] for selected example models and for field data collected from two large operational E10000 servers over 15 months. The availability and reliability results from the GMB models match those from the above-mentioned commercial tools very well, and for the MG models, the relative errors in yearly downtime are all less than 0.2%. RAScad has been used to develop availability models for a variety of Sun system products during the development phase and is being used to design the availability architecture of the next generation of Sun products.

Acknowledgments
The authors would like to thank Vijay Radhakrishnan for his good GUI programming work. Special thanks go to Helen Cunningham, William Bryson, Emrys Williams, Steve Kendall, Swami Sankaran, Robert White, David Wonnacott, and Stefan Myslicki for their valuable comments on RAScad.

Sun, Sun Microsystems, and Java are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

References
[1] A. Goyal, S. S. Lavenberg, and K. S. Trivedi, "Probabilistic Modeling of Computer System Availability," Annals of Operations Research, No. 8, March 1987, pp.
[2] M. C. Hsueh, R. K. Iyer, and K. S. Trivedi, "Performability Modeling Based on Real Data: A Case Study," IEEE Transactions on Computers, April 1988, pp.
[3] J. C. Laprie, "Dependability Evaluation of Software Systems in Operation," IEEE Transactions on Software Engineering, Nov. 1984, pp.
[4] J. F. Meyer, "On Evaluating the Performability of Degradable Computing Systems," IEEE Transactions on Computers, Aug. 1980, pp.
[5] D. K. Pradhan (Ed.), Fault-Tolerant Computer System Design, Prentice Hall PTR, Upper Saddle River, NJ,
[6] A. Reibman, R. Smith, and K. Trivedi, "Markov and Markov Reward Model Transient Analysis: An Overview of Numerical Approaches," European Journal of Operational Research, Vol. 40, 1989, pp.
[7] R. A. Sahner and K. S. Trivedi, "Reliability Modeling Using SHARPE," IEEE Transactions on Reliability, Feb. 1987, pp.
[8] W. H. Sanders, W. D. Obal II, M. A. Qureshi, and F. K. Widjanarko, "The UltraSAN Modeling Environment," Performance Evaluation, Oct./Nov. 1995, pp.
[9] D. Tang, M. Hecht, J. Miller, and J. Handal, "MEADEP: A Dependability Evaluation Tool for Engineers," IEEE Transactions on Reliability, Dec. 1998, pp.
[10] K. S. Trivedi, Probability & Statistics with Reliability, Queuing and Computer Science Applications, Prentice Hall, Englewood Cliffs, NJ, 1982.


More information

Dependability tree 1

Dependability tree 1 Dependability tree 1 Means for achieving dependability A combined use of methods can be applied as means for achieving dependability. These means can be classified into: 1. Fault Prevention techniques

More information

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer Proceedings of the 2000 Winter Simulation Conference J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds. OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS John R. Clymer Applied Research Center for

More information

Stochastic Petri nets

Stochastic Petri nets Stochastic Petri nets 1 Stochastic Petri nets Markov Chain grows very fast with the dimension of the system Petri nets: High-level specification formalism Markovian Stochastic Petri nets adding temporal

More information

Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems

Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Quantitative Dependability Analysis with Stochastic Activity Networks: the Möbius Tool April 2016 Andrea Domenici

More information

PRIMEQUEST 400 Series & SQL Server 2005 Technical Whitepaper (November, 2005)

PRIMEQUEST 400 Series & SQL Server 2005 Technical Whitepaper (November, 2005) PRIMEQUEST 400 Series & SQL Server 2005 Technical Whitepaper (November, 2005) Fujitsu Limited PRIMEQUEST 400 Series & SQL Server 2005 Technical White Paper PRIMEQUEST 400 Series Server & SQL Server 2005

More information

White paper PRIMEQUEST 1000 series high availability realized by Fujitsu s quality assurance

White paper PRIMEQUEST 1000 series high availability realized by Fujitsu s quality assurance White paper PRIMEQUEST 1000 series high availability realized by Fujitsu s quality assurance PRIMEQUEST is an open enterprise server platform that fully maximizes uptime. This whitepaper explains how Fujitsu

More information

Module 8 Fault Tolerance CS655! 8-1!

Module 8 Fault Tolerance CS655! 8-1! Module 8 Fault Tolerance CS655! 8-1! Module 8 - Fault Tolerance CS655! 8-2! Dependability Reliability! A measure of success with which a system conforms to some authoritative specification of its behavior.!

More information

FMEDA-Based Fault Injection and Data Analysis in Compliance with ISO SPEAKER. Dept. of Electrical Engineering, National Taipei University

FMEDA-Based Fault Injection and Data Analysis in Compliance with ISO SPEAKER. Dept. of Electrical Engineering, National Taipei University FMEDA-Based Fault Injection and Data Analysis in Compliance with ISO-26262 Kuen-Long Lu 1, 2,Yung-Yuan Chen 1, and Li-Ren Huang 2 SPEAKER 1 Dept. of Electrical Engineering, National Taipei University 2

More information

SHARPE Interface User's Manual Version 1.01

SHARPE Interface User's Manual Version 1.01 SHARPE Interface User's Manual Version 1.01 Contact information: Professor Kishor S. Trivedi Center for Advanced Computing and Communication (CACC) Department of Electrical and Computer Engineering Duke

More information

Module 4: Stochastic Activity Networks

Module 4: Stochastic Activity Networks Module 4: Stochastic Activity Networks Module 4, Slide 1 Stochastic Petri nets Session Outline Places, tokens, input / output arcs, transitions Readers / Writers example Stochastic activity networks Input

More information

COMPASS: FORMAL METHODS FOR SYSTEM-SOFTWARE CO-ENGINEERING

COMPASS: FORMAL METHODS FOR SYSTEM-SOFTWARE CO-ENGINEERING COMPASS: FORMAL METHODS FOR SYSTEM-SOFTWARE CO-ENGINEERING Viet Yen Nguyen Lehrstuhl für Informatik 2, RWTH Aachen University nguyen@cs.rwth-aachen.de Technology Innovation Days, ESA/ESTEC, 2011 ABOUT

More information

How Does Failover Affect Your SLA? How Does Failover Affect Your SLA?

How Does Failover Affect Your SLA? How Does Failover Affect Your SLA? How Does Failover Affect Your SLA? How Does Failover Affect Your SLA? Dr. Bill Highleyman Dr. Managing Bill Highleyman Editor, Availability Digest Managing HP NonStop Editor, Technical Availability Boot

More information

Building a High Availability System on Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 Servers (Overview)

Building a High Availability System on Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 Servers (Overview) Building a High Availability System on Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 Servers (Overview) May. 2017 Rev 2.1 Fujitsu LIMITED Contents Introduction 1. Model Case of HA System 2. Example of HA

More information

Dual-System Warm Standby of Remote Sensing Satellite Control System Technology

Dual-System Warm Standby of Remote Sensing Satellite Control System Technology 2016 3 rd International Conference on Materials Science and Mechanical Engineering (ICMSME 2016) ISBN: 978-1-60595-391-5 Dual-System Warm Standby of Remote Sensing Satellite Control System Technology Fei

More information

Fault Tolerant Computing CS 530

Fault Tolerant Computing CS 530 Fault Tolerant Computing CS 530 Lecture Notes 1 Introduction to the class Yashwant K. Malaiya Colorado State University 1 Instructor, TA Instructor: Yashwant K. Malaiya, Professor malaiya @ cs.colostate.edu

More information

Maximum Availability Architecture: Overview. An Oracle White Paper July 2002

Maximum Availability Architecture: Overview. An Oracle White Paper July 2002 Maximum Availability Architecture: Overview An Oracle White Paper July 2002 Maximum Availability Architecture: Overview Abstract...3 Introduction...3 Architecture Overview...4 Application Tier...5 Network

More information

Service Recovery & Availability. Robert Dickerson June 2010

Service Recovery & Availability. Robert Dickerson June 2010 Service Recovery & Availability Robert Dickerson June 2010 Started in 1971 with $3,000, 40 clients and 1 employee. 2009: over $2B revenue, 500,000+ clients, 13,000 employees. Payroll / Tax Services / 401(k)

More information

Diagnosis in the Time-Triggered Architecture

Diagnosis in the Time-Triggered Architecture TU Wien 1 Diagnosis in the Time-Triggered Architecture H. Kopetz June 2010 Embedded Systems 2 An Embedded System is a Cyber-Physical System (CPS) that consists of two subsystems: A physical subsystem the

More information

OL Connect Backup licenses

OL Connect Backup licenses OL Connect Backup licenses Contents 2 Introduction 3 What you need to know about application downtime 5 What are my options? 5 Reinstall, reactivate, and rebuild 5 Create a Virtual Machine 5 Run two servers

More information

IBM System Storage DS5020 Express

IBM System Storage DS5020 Express IBM DS5020 Express Manage growth, complexity, and risk with scalable, high-performance storage Highlights Mixed host interfaces support (FC/iSCSI) enables SAN tiering Balanced performance well-suited for

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

Security+ Guide to Network Security Fundamentals, Third Edition. Chapter 13 Business Continuity

Security+ Guide to Network Security Fundamentals, Third Edition. Chapter 13 Business Continuity Security+ Guide to Network Security Fundamentals, Third Edition Chapter 13 Business Continuity Objectives Define business continuity Describe the components of redundancy planning List disaster recovery

More information

A Robust Bloom Filter

A Robust Bloom Filter A Robust Bloom Filter Yoon-Hwa Choi Department of Computer Engineering, Hongik University, Seoul, Korea. Orcid: 0000-0003-4585-2875 Abstract A Bloom filter is a space-efficient randomized data structure

More information

VERITAS Storage Foundation 4.0 TM for Databases

VERITAS Storage Foundation 4.0 TM for Databases VERITAS Storage Foundation 4.0 TM for Databases Powerful Manageability, High Availability and Superior Performance for Oracle, DB2 and Sybase Databases Enterprises today are experiencing tremendous growth

More information

WHITE PAPER Using Marathon everrun MX 6.1 with XenDesktop 5 Service Pack 1

WHITE PAPER Using Marathon everrun MX 6.1 with XenDesktop 5 Service Pack 1 WHITE PAPER Using Marathon everrun MX 6.1 with XenDesktop 5 Service Pack 1 www.citrix.com Contents Introduction... 2 Executive Overview... 2 Marathon everrun MX 6.1 (description by Marathon Technologies)...

More information

Issues in Programming Language Design for Embedded RT Systems

Issues in Programming Language Design for Embedded RT Systems CSE 237B Fall 2009 Issues in Programming Language Design for Embedded RT Systems Reliability and Fault Tolerance Exceptions and Exception Handling Rajesh Gupta University of California, San Diego ES Characteristics

More information

TANDBERG Management Suite - Redundancy Configuration and Overview

TANDBERG Management Suite - Redundancy Configuration and Overview Management Suite - Redundancy Configuration and Overview TMS Software version 11.7 TANDBERG D50396 Rev 2.1.1 This document is not to be reproduced in whole or in part without the permission in writing

More information

Functional Safety and Safety Standards: Challenges and Comparison of Solutions AA309

Functional Safety and Safety Standards: Challenges and Comparison of Solutions AA309 June 25th, 2007 Functional Safety and Safety Standards: Challenges and Comparison of Solutions AA309 Christopher Temple Automotive Systems Technology Manager Overview Functional Safety Basics Functional

More information

Page 1. Magnetic Disk Purpose Long term, nonvolatile storage Lowest level in the memory hierarchy. Typical Disk Access Time

Page 1. Magnetic Disk Purpose Long term, nonvolatile storage Lowest level in the memory hierarchy. Typical Disk Access Time Review: Major Components of a Computer Processor Control Datapath Cache Memory Main Memory Secondary Memory (Disk) Devices Output Input Magnetic Disk Purpose Long term, nonvolatile storage Lowest level

More information

Using Virtualization to Reduce Cost and Improve Manageability of J2EE Application Servers

Using Virtualization to Reduce Cost and Improve Manageability of J2EE Application Servers WHITEPAPER JANUARY 2006 Using Virtualization to Reduce Cost and Improve Manageability of J2EE Application Servers J2EE represents the state of the art for developing component-based multi-tier enterprise

More information

Grid Computing with Voyager

Grid Computing with Voyager Grid Computing with Voyager By Saikumar Dubugunta Recursion Software, Inc. September 28, 2005 TABLE OF CONTENTS Introduction... 1 Using Voyager for Grid Computing... 2 Voyager Core Components... 3 Code

More information

Siewiorek, Daniel P.; Swarz, Robert S.: Reliable Computer Systems. third. Wellesley, MA : A. K. Peters, Ltd., 1998., X

Siewiorek, Daniel P.; Swarz, Robert S.: Reliable Computer Systems. third. Wellesley, MA : A. K. Peters, Ltd., 1998., X Dependable Systems Hardware Dependability - Diagnosis Dr. Peter Tröger Sources: Siewiorek, Daniel P.; Swarz, Robert S.: Reliable Computer Systems. third. Wellesley, MA : A. K. Peters, Ltd., 1998., 156881092X

More information

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme On Checkpoint Latency Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: vaidya@cs.tamu.edu Web: http://www.cs.tamu.edu/faculty/vaidya/ Abstract

More information

Erlang s B and C-Formulae: Another Method to Estimate the Number of Routes

Erlang s B and C-Formulae: Another Method to Estimate the Number of Routes Erlang s B and C-Formulae: Another Method to Estimate the Number of Routes James K. Tamgno, Mamadou Alpha Barry, Simplice E. Gnang ESMT-DTE-LTI, ESMT, Dakar Senegal ESMT-DTE, Dakar Senegal james.tamgno@esmt.sn,

More information

Distributed and Cloud Computing

Distributed and Cloud Computing Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra Chapter 2: Computer Clusters for Scalable parallel Computing Adapted from Kai Hwang University of Southern California March 30, 2012 Copyright

More information

After the Attack. Business Continuity. Planning and Testing Steps. Disaster Recovery. Business Impact Analysis (BIA) Succession Planning

After the Attack. Business Continuity. Planning and Testing Steps. Disaster Recovery. Business Impact Analysis (BIA) Succession Planning After the Attack Business Continuity Week 6 Part 2 Staying in Business Disaster Recovery Planning and Testing Steps Business continuity is a organization s ability to maintain operations after a disruptive

More information

To Cluster or Not Cluster Tom Scanlon NEC Solutions America

To Cluster or Not Cluster Tom Scanlon NEC Solutions America To Cluster or Not Cluster Tom Scanlon NEC Solutions America June 25, 2003 NEC Solutions America Agenda The PDC Case Study Availability Defined The Dilemma (to cluster or not) Cluster Application Availability

More information

FP7-4: Introduction to Reliability and Fault Tolerance. FP7-4: Introduction to Reliability and Fault Tolerance. The NASA Mars Space Mission

FP7-4: Introduction to Reliability and Fault Tolerance. FP7-4: Introduction to Reliability and Fault Tolerance. The NASA Mars Space Mission FP7-4: Introduction to Reliability and Fault Tolerance Youmin Zhang Phone: 7912 7741 Office Location: FUV 0.22 Email: ymzhang@cs.aaue.dk http://www.cs.aaue.dk/~ymzhang/courses/reliability/index.html FP7-4:

More information

POWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX

POWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Systems: Design for Reliability Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Microprocessor 2-way SMP system on a chip > 1 GHz processor frequency >1GHz Core Shared L2 >1GHz Core

More information

System of Systems Architecture Generation and Evaluation using Evolutionary Algorithms

System of Systems Architecture Generation and Evaluation using Evolutionary Algorithms SysCon 2008 IEEE International Systems Conference Montreal, Canada, April 7 10, 2008 System of Systems Architecture Generation and Evaluation using Evolutionary Algorithms Joseph J. Simpson 1, Dr. Cihan

More information

An Integrated ECC and BISR Scheme for Error Correction in Memory

An Integrated ECC and BISR Scheme for Error Correction in Memory An Integrated ECC and BISR Scheme for Error Correction in Memory Shabana P B 1, Anu C Kunjachan 2, Swetha Krishnan 3 1 PG Student [VLSI], Dept. of ECE, Viswajyothy College Of Engineering & Technology,

More information

Agenda. Agenda 11/12/12. Review - 6 Great Ideas in Computer Architecture

Agenda. Agenda 11/12/12. Review - 6 Great Ideas in Computer Architecture /3/2 Review - 6 Great Ideas in Computer Architecture CS 6C: Great Ideas in Computer Architecture (Machine Structures) Dependability and RAID Instructors: Krste Asanovic, Randy H. Katz hfp://inst.eecs.berkeley.edu/~cs6c/fa2.

More information

Defect Tolerance in VLSI Circuits

Defect Tolerance in VLSI Circuits Defect Tolerance in VLSI Circuits Prof. Naga Kandasamy We will consider the following redundancy techniques to tolerate defects in VLSI circuits. Duplication with complementary logic (physical redundancy).

More information

UTC3100 and 3170 POS RAID Information

UTC3100 and 3170 POS RAID Information UTC3100 and 3170 POS RAID Information Introduction The UTC3100 and 3170 POS systems may be purchased in a RAID configuration. RAID is defined by Intel as: Redundant Array of Independent Drives: allows

More information

Module 4 STORAGE NETWORK BACKUP & RECOVERY

Module 4 STORAGE NETWORK BACKUP & RECOVERY Module 4 STORAGE NETWORK BACKUP & RECOVERY BC Terminology, BC Planning Lifecycle General Conditions for Backup, Recovery Considerations Network Backup, Services Performance Bottlenecks of Network Backup,

More information

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON. DISTRIBUTED SYSTEMS 121r itac itple TAYAdiets Second Edition Andrew S. Tanenbaum Maarten Van Steen Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON Prentice Hall Upper Saddle River, NJ 07458 CONTENTS

More information

10 Having Hot Spare disks available in the system is strongly recommended. There are two types of Hot Spare disks:

10 Having Hot Spare disks available in the system is strongly recommended. There are two types of Hot Spare disks: 0 This Web Based Training module provides support and maintenance related information for the ETERNUS DX S2 family. This module provides an introduction to a number of ETERNUS Web GUI functions, for a

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

PowerVault MD3 Storage Array Enterprise % Availability

PowerVault MD3 Storage Array Enterprise % Availability PowerVault MD3 Storage Array Enterprise 99.999% Availability Dell Engineering June 2015 A Dell Technical White Paper THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS

More information

Copyright 1998, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr 1. IPS _05_2001_c1

Copyright 1998, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr 1. IPS _05_2001_c1 2001, Cisco Systems, Inc. All rights reserved. 1 Presentation_ID.scr 1 Introduction to High Availability Networking Session 2001, Cisco Systems, Inc. All rights reserved. 3 Agenda Introduction Building

More information

WHITE PAPER THE HIGHEST AVAILABILITY FEATURES FOR PRIMEQUEST

WHITE PAPER THE HIGHEST AVAILABILITY FEATURES FOR PRIMEQUEST WHITE PAPER THE HIGHEST AVAILABILITY FEATURES FOR PRIMEQUEST WHITE PAPER THE HIGHEST AVAILABILITY FEATURES FOR PRIMEQUEST Business continuity and cost-efficiency have become essential demands on IT platforms.

More information

CIT 668: System Architecture

CIT 668: System Architecture CIT 668: System Architecture Availability Topics 1. What is availability? 2. Measuring Availability 3. Failover 4. Failover Configurations 5. Linux HA Availability Availability is the ratio of the time

More information

RAID Controller Installation Guide

RAID Controller Installation Guide RAID Controller Installation Guide Document Number 60001075 Second Edition March 2003 The RAID Controller Installation Guide explains how to install and configure a RAID Controller in an Omvia Media Server.

More information

Dependable and Secure Systems Dependability

Dependable and Secure Systems Dependability Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Quantitative Dependability Analysis with Stochastic Activity Networks: the Möbius Tool Andrea Domenici DII, Università

More information

ActiveScale Erasure Coding and Self Protecting Technologies

ActiveScale Erasure Coding and Self Protecting Technologies WHITE PAPER AUGUST 2018 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage

More information

DATA ITEM DESCRIPTION

DATA ITEM DESCRIPTION DATA ITEM DESCRIPTION Title: RELIABILITY AND MAINTAINABILITY (R&M) BLOCK DIAGRAMS AND MATHEMATICAL MODELS REPORT Number: DI-SESS-81496A Approval Date: 20141219 AMSC Number: 9508 Limitation: No DTIC Applicable:

More information

Protecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery

Protecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery White Paper Business Continuity Protecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery Table of Contents Executive Summary... 1 Key Facts About

More information

Reliability Availability Serviceability

Reliability Availability Serviceability NEC Enterprise Server Express5800/1000 Series Express5800/1000 Series Guide (1) Powered by Intel Itanium 2 Processor Express5800/1000 Series 1320Xe/1160Xe/1080Xe RAS Technology Three keys to success in

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

CSCI 402: Computer Architectures. Performance of Multilevel Cache

CSCI 402: Computer Architectures. Performance of Multilevel Cache CSCI 402: Computer Architectures Memory Hierarchy (5) Fengguang Song Department of Computer & Information Science IUPUI Performance of Multilevel Cache Main Memory CPU L1 cache L2 cache Given CPU base

More information

1 of 6 4/8/2011 4:08 PM Electronic Hardware Information, Guides and Tools search newsletter subscribe Home Utilities Downloads Links Info Ads by Google Raid Hard Drives Raid Raid Data Recovery SSD in Raid

More information

PRINCIPAL COMPONENT ANALYSIS IMAGE DENOISING USING LOCAL PIXEL GROUPING

PRINCIPAL COMPONENT ANALYSIS IMAGE DENOISING USING LOCAL PIXEL GROUPING PRINCIPAL COMPONENT ANALYSIS IMAGE DENOISING USING LOCAL PIXEL GROUPING Divesh Kumar 1 and Dheeraj Kalra 2 1 Department of Electronics & Communication Engineering, IET, GLA University, Mathura 2 Department

More information

Slide 0 Welcome to the Support and Maintenance chapter of the ETERNUS DX90 S2 web based training.

Slide 0 Welcome to the Support and Maintenance chapter of the ETERNUS DX90 S2 web based training. Slide 0 Welcome to the Support and Maintenance chapter of the ETERNUS DX90 S2 web based training. 1 This module introduces support and maintenance related operations and procedures for the ETERNUS DX60

More information

Sun Fire V880 System Architecture. Sun Microsystems Product & Technology Group SE

Sun Fire V880 System Architecture. Sun Microsystems Product & Technology Group SE Sun Fire V880 System Architecture Sun Microsystems Product & Technology Group SE jjson@sun.com Sun Fire V880 Enterprise RAS Below PC Pricing NEW " Enterprise Class Application and Database Server " Scalable

More information