Error recovery through programming


by ALAN N. HIGGINS
International Business Machines Corporation
Kingston, New York

INTRODUCTION

The requirement for error recovery procedures has existed as long as computers themselves. Since the earliest computers, one of the goals of design has been to increase the reliability and availability of the computer to the user. While great strides have been made in this direction, the need for error recovery is as present today as ever, and it is in fact more pressing than ever before. With advanced programming techniques such as multiprogramming and multiprocessing, the cost of an error has increased dramatically: no longer are the consequences of an error limited "merely" to the loss of a job and the need for a subsequent rerun. An error today can:

  Cause the termination of concurrently executing tasks.
  Cause an environmental control system to go down.
  Cause the loss of teleprocessing messages.
  Cause the generation of a report to be delayed.

No longer can rerunning the job be accepted as a prime means of "error recovery." The situation existing when running under an Operating System, and executing a number of jobs in the computer at the same time, makes improved error recovery procedures mandatory.

It is recognized that the Engineering Community is diligently striving to improve the hardware itself. For a complete solution, however, it is necessary to look at the other half of the question of error recovery: what can be done to improve reliability, to improve availability, to improve error recovery through programming? In order to do this, we first consider in a general way error recovery procedures, or Recovery Management Support. The next step is to look specifically at some of the work which has been done in Operating System/360 with the Recovery Management Support for the Model 65.

System incidents

An examination of system incidents reveals that such incidents are due to a number of sources. Among these are Hard Core errors (including errors in the CPU, memory and channels), errors from Input/Output devices and control units, and procedural and operational errors. Each of these is made up of a number of different errors, but from a gross point of view it seems reasonable to state that there are three general types of system interruptions:

  Hardware malfunctions.
  Design errors (both hardware and software).
  Operator or user injected errors.

Systems planning must therefore be influenced by the facts that machines will malfunction, that neither hardware nor software is perfect, and that operators are still likely to make as many mistakes as they have in the past.

Recovery management

The primary objective in any error recovery procedure or Recovery Management Support should be to alleviate the burden of system interruptions to the user. In order to accomplish this we must:

1. Reduce the number of interruptions to which the user is exposed, and
2. Minimize the impact of these interruptions when they do occur.

Recovery Management therefore should provide the user with a higher degree of system availability (more time for more jobs) by minimizing the impact of system malfunctions upon his operations. With this objective as the target, error recovery takes on a broader meaning and scope than has been applied to the concept in the past.

In an environment of multiprogramming, the system becomes all important and it is most necessary that, no matter what happens, the system continue to function. It often becomes a situation of sacrificing a part so that the "whole" may survive. In order to accomplish this, Recovery Management facilities may follow a pattern in which the support attempts to reduce the number of system interruptions by retrying the operation which was interrupted by the malfunction, or it may terminate the task affected and continue system operation. If this is not possible, then the second step toward accomplishing the primary objective of error recovery becomes of paramount importance: to minimize the impact of the interruption. This is done by preparing the system for a simple restart, or by indicating that repair by maintenance personnel is required.

Fall Joint Computer Conference, 1968

Instruction retry

This pattern, which has just been outlined, suggests a number of functions which can be performed to achieve the objectives of Recovery Management. The first of these functions is instruction retry. The concept of instruction retry is not really new. It is something which IBM has been doing for years, particularly in the I/O area. Instruction retry has been standard procedure whenever an error was encountered in reading or writing a tape. But it is possible to extend this retry capability and to employ it when a CPU or memory malfunction occurs. A relatively large number of malfunctions are intermittent in nature rather than solid failures, and therefore there is a high probability of successful execution and recovery if an instruction retry can be attempted.

The first thing which must be determined, then, is whether instruction retry is feasible, and then, if feasible, to execute the retry. The determination of instruction retry feasibility is usually quite dependent upon the characteristics of the particular machine. Ordinarily, for feasibility to exist, the "environment" of the computer must be valid, or free from error.
Depending upon the specific machine, this may include the data contained in general purpose registers, floating point registers, machine log-out areas, permanent storage areas, etc. Arbitrarily, the criterion of validity can be keyed on parity. If the parity of the data is good, the environment is assumed valid and therefore retry is feasible. If parity is bad, then no further retry action can be taken.

Having ascertained that instruction retry is feasible, it is necessary to continue the analysis and determine whether a specific instruction is retryable. To do this, it is first necessary to locate the failing instruction. The procedure involved here is again dependent upon the particular machine, upon what type of fetch or pre-fetch logic is employed, and upon whether or not the instruction counter is accurate. In one case, a comparison of the internal registers in the machine log-out can provide the clue as to whether the instruction counter is accurate; in another, it may be a function of when the machine check occurred and what updating cycles the instruction counter was executing at the time. It is obvious, therefore, that it is not always easy or possible to locate the failing instruction. But if the instruction counter is accurate and it is possible to locate the failing instruction, an analysis can be performed to ascertain whether the retry threshold of the interrupted instruction has been exceeded. (The retry threshold is that point in the instruction cycle after which retry cannot be attempted; it is usually indicated by a bit set by the hardware.) The retry threshold has been exceeded when, during the normal instruction cycle, one or more of the original operands has been changed. If the threshold has not been exceeded, it is possible to make another attempt at executing the failing instruction.
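
The checks described so far (valid parity, an accurate instruction counter, and the hardware threshold bit) can be read as a short decision sequence. The following Python sketch is only an illustration of that ordering; the names LogOutSummary, RetryDecision and decide_retry are hypothetical and do not correspond to any actual machine log-out format or system routine.

```python
from dataclasses import dataclass
from enum import Enum


class RetryDecision(Enum):
    RETRY = "retry the instruction"
    NOT_FEASIBLE = "retry not feasible"
    THRESHOLD_EXCEEDED = "threshold exceeded, further analysis needed"


@dataclass(frozen=True)
class LogOutSummary:
    """Hypothetical digest of a machine-check log-out (illustrative fields only)."""
    parity_ok: bool                     # registers and storage areas show good parity
    instruction_counter_accurate: bool  # the failing instruction can be located
    threshold_bit_set: bool             # hardware bit: an original operand was changed


def decide_retry(summary: LogOutSummary) -> RetryDecision:
    # Feasibility: the environment is assumed valid only if parity is good.
    if not summary.parity_ok:
        return RetryDecision.NOT_FEASIBLE
    # The failing instruction must be locatable before it can be retried.
    if not summary.instruction_counter_accurate:
        return RetryDecision.NOT_FEASIBLE
    # Past the retry threshold a simple re-execution is no longer safe.
    if summary.threshold_bit_set:
        return RetryDecision.THRESHOLD_EXCEEDED
    return RetryDecision.RETRY
```

For instance, decide_retry(LogOutSummary(True, True, False)) yields RetryDecision.RETRY, while bad parity yields NOT_FEASIBLE regardless of the other two fields.
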
If, however, the threshold is exceeded, it may be possible to extend the threshold by examining the instruction type to determine whether a copy of the original operand might still be intact in some internal register and, if it is, by restoring it. This is accomplished by rebuilding (in a special execution area) the instruction from the contents of the log-out, the internal registers or main storage. Therefore, from an analysis, it is possible to determine that an instruction is either:

1. Retryable, that is, the retry threshold has not been exceeded or, if it has been exceeded, the damaged operand can be restored and therefore instruction retry can be attempted, or
2. Non-retryable, that is, instruction retry is not possible because the threshold has been exceeded and the damaged operand cannot be restored, because an invalid environment exists owing to incorrect parity, or because the value of the instruction counter is indeterminate.

If the second condition is the case, then it is necessary to look for another way to handle the error recovery.

Refresh main storage

The occurrence of a parity error in main storage precludes instruction retry; therefore, one function which could be of value would be the ability to "refresh" main storage. By this is meant repairing the damage which either caused or was caused by a malfunction by loading a new copy of the affected module into main storage. (A module is a program unit that is discrete and identifiable with respect to compiling, combining with other units, and loading.) The use of refreshable code requires a good deal of foresight in coding, since in order to be refreshable a module must not modify itself or be modified by another module; for example, it must not set switches, contain dynamic storage areas, or store registers or address pointers within the body of its code. The foresight is well rewarded, however, when it is possible to load this refreshable code and then continue execution without changing either the sequence or the results of the processing.

The attribute "refreshable" is similar to "reentrant". Most reentrant modules meet the requirements specified above; in addition, a reentrant module is one that may be utilized by more than one task at a time. (Some modules classified as reentrant deviate from these requirements by operating in a pseudo-disabled manner, thus actually allowing modifications during a short period of time.) The difference between the two is that "reentrant" is based on the operational characteristics of the module within the system, while "refreshable" is based only on the fact that the code is not modified in any manner.

Selective termination

The functions of instruction retry and refreshable code are most desirable since they render the error recovery procedure transparent to the user and require no intervention on his part. Unfortunately, it is not always possible to attain this level of recovery. When this is the case, it is necessary to accept some degradation in order to keep the system operational. One way to accomplish this is to implement a function of Selective Termination. Such a function would enable the system to examine the failing environment, determine what problem program was executing, and then proceed to terminate this program while continuing all other jobs which were executing at the time of the malfunction. This is really a type of job-abort which frees the resources of the system allocated to the job and makes them available for future use. If a problem program was utilizing system code when the malfunction occurred, selective termination could be effective if the system code was transient rather than resident in nature.
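
As a rough illustration of selective termination, the Python sketch below aborts only the task that was executing at the time of the malfunction, returns its resources to a free pool, and leaves every other task running. The Task fields and the function name are hypothetical. The rule that termination is refused when the task was inside resident system code is an assumption drawn from the remark above about transient versus resident code, not a documented algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative only."""
    name: str
    resources: Set[str] = field(default_factory=set)
    in_resident_system_code: bool = False  # assumption: damage here affects all tasks
    active: bool = True


def selectively_terminate(tasks: List[Task], failing: str, free_pool: Set[str]) -> bool:
    """Abort only the failing task; return True if the system can keep running."""
    victim = next((t for t in tasks if t.name == failing and t.active), None)
    if victim is None or victim.in_resident_system_code:
        # Nothing to isolate, or the damage is in code shared by every task.
        return False
    victim.active = False
    free_pool.update(victim.resources)  # resources become available for future use
    victim.resources = set()
    return True
```

All other tasks in the list are untouched, which is the point of the technique: one job is lost so that the remainder of the system survives.
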
This process results in the loss of a specific job, but it does enable the system to continue without interruption.

Another function which would aid in the error recovery process when a memory malfunction occurs is the ability to logically carve out, or remove, that portion of the memory in which the malfunction occurred. Since this type of error recovery would result in job termination and might not return resources (storage, I/O devices, etc.) to the system, such a procedure would obviously introduce undesirable side effects, such as loss of availability of I/O devices, loss of part of core, and loss of the terminated job; but it would preserve the system, and operation would continue until an orderly correction could be made.

I/O Recovery

The functions which have been discussed so far have been directed mainly to errors which occur in the CPU or memory. From an examination of system incidents, it is evident that a significant portion of errors occur in the I/O area. Is there anything which can be done to improve error recovery procedures for I/O?

In the first place, there is I/O retry, which is available through the ERPs (Error Recovery Procedures) for the different I/O devices. As indicated earlier, it has been standard procedure to retry I/O instructions when errors occur. A number of errors (unit check, unit exception, wrong-length indication, protection check and some chaining checks) can be corrected by this means. An I/O Supervisor performs an analysis and selects, according to device, the proper ERP to attempt recovery. After retry is attempted, the ERP regains control to determine whether or not the retry has been successful. If it was successful, the I/O retry is transparent to the user.

There is another group of I/O errors, the channel checks (channel control check, channel data check and interface control check), which need not be disastrous and from which, after analysis of the conditions causing the error, it may be possible to recover.
Such an analysis would determine the type of operation that failed, the type of device affected, the sequences which occurred across the I/O interface following the error, and whether a retry can be attempted.

The I/O device or medium can malfunction and, if a retry is not successful, there may be other ways to continue the execution of the job. One such way would be to have the ability to switch data sets (devices), that is, to change a tape or disk pack from one drive to another and then to retry the operation with the new drive. Another possibility (if the malfunction was really related to the Channel or Control Unit) would be to try another route to the same device, that is, to address it through a different channel or control unit.

Other system incidents

Another group of system incidents is due to procedural and operator errors. Several things can be done to decrease this group, and it certainly deserves concentrated attention. The first is, of course, better trained personnel, but from a programming point of view several possibilities exist. It is most desirable to require a minimum of user intervention and interaction in order to accomplish execution. Control information should be minimal. When interaction is required, messages should be clear and concise, to the point of outlining possible choices. A conversation mode could be optional which would permit correction or confirmation of operator action. All these points are generally grouped under a concept of Operator Awareness and have a very definite place in the planning of any error recovery support.

All of these functions are aimed at continuing the operation of the system, but unfortunately this is not always possible to accomplish. Therefore, the next best thing is to minimize the effect of the malfunction. This can be done by attempting to preserve information concerning the malfunction and to make it available to assist knowledgeable personnel in determining what caused the error and what can be done to correct it. This will have the most desirable effect of shortening the Duration of the Unexpected Interrupt and getting the system back in operation as quickly as possible.

RMS/65

The Recovery Management Support for the System/360 Model 65 (RMS/65) has provided a number of these functions in the operating system. These functions are contained in two programs which make up RMS/65: the Machine Check Handler (MCH), which is directed at CPU and memory malfunctions, and the Channel Check Handler (CCH), which is oriented to I/O problems. RMS/65 provides a hierarchy of recovery which involves four levels:

I. Functional Recovery
II. System Recovery
III. System-Supported Restart
IV. System Repair

Functional Recovery is the successful retry of an interrupted instruction. MCH handles the operation for the CPU and main storage through its Machine Analysis and Instruction Retry (MAIR) facilities. The MAIR facilities perform an analysis of the machine environment at the time of the machine check interruption to determine the feasibility of retrying the interrupted instruction. MAIR then retries the interrupted instruction when retry is feasible. The CCH performs the analysis function for the channel checks discussed earlier.
This is accomplished by intercepting I/O interruptions before the I/O Supervisor receives them and performing an analysis of the existing conditions. If feasible, the status bits are manipulated to make the channel check look like a failure for which an ERP exists, and then control is transferred to the appropriate ERP for action. Functional recovery is of course the desired goal, because in this case the malfunction is transparent to the user.

System Recovery is the second level of recovery and is required when functional recovery is either not feasible or fails. The objective is to preserve the system and to continue processing all unaffected jobs. This is done by means of a Program Damage Assessment and Repair feature which attempts to analyze the malfunction environment, to isolate and repair the program damage if possible, and to report permanent failures to the program and operator. This feature also incorporates the mechanism for selective termination of a task.

The function of System-Supported Restart is called on when both Functional and System Recovery have failed but a stop for repair is not required. The operator is informed that such a condition exists and that it is necessary to restart the system.

The fourth level of recovery support provided by RMS/65 is System Repair. In a way, this is perhaps one of its most important functions, since the detailed error analysis information which is provided can be of great assistance in determining the cause of failure and in suggesting the proper correction for the problem. Once the repair is completed, initialization is required to restart the system.

Figure 1 shows the relationship of these levels of recovery to one another and to the main objective of Recovery Management Support, which is to keep the system in operation.

FIGURE 1

Each level of recovery performs the important function of recording information concerning what happened, the status of the computer at the time of the incident, what action was taken, and the results of that action. This information, which is recorded on a special data set, SYS1.LOGREC, is then available through execution of the Environment Record Editing and Printing utility (EREP), which runs under the control of Operating System/360. This program edits and prints the records generated by MCH and CCH (as well as by several other recording functions) and provides the information for interpretation by the experienced Customer Engineer (CE). A Standard Operating Procedure in a Computer Center using MCH and/or CCH should be to execute EREP on a regular basis, so that the information is available to the CE as an aid or indicator to anticipate serious trouble. For example, if a particular pattern appears indicating possible degradation, preventive maintenance can be performed before the occurrence of a serious incident.

CONCLUSION

RMS/65 is a step in the direction which error recovery must take if the requirements of computer technology are to be met in this area. More and more, the question of error recovery cannot be relegated to hardware or programming alone; rather, these two must form an effective partnership and attack the problem together in order to provide a satisfactory solution. Every sign indicates that this is being accomplished, and it appears that some meaningful steps such as RMS/65 are being taken toward the goal of reducing the number of interruptions to which a user is exposed and minimizing the impact of these interruptions when they do occur.
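
The four-level hierarchy described under RMS/65, together with the rule that every level records what it did, can be sketched as a simple escalation loop. The Python below is a minimal illustration only: the function name, the callables passed in, and the in-memory list standing in for the SYS1.LOGREC data set are all hypothetical, and the real MCH and CCH logic is far more involved.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[str, str]  # (recovery level, outcome)


def run_recovery(
    functional: Callable[[], bool],
    system: Callable[[], bool],
    restart_possible: Callable[[], bool],
    log: List[LogEntry],
) -> str:
    """Try each level in order, record every outcome, and return the level reached."""
    levels = [
        ("Functional Recovery", functional),             # retry the interrupted instruction
        ("System Recovery", system),                     # repair damage or terminate one task
        ("System-Supported Restart", restart_possible),  # restart without a stop for repair
    ]
    for name, attempt in levels:
        succeeded = attempt()
        log.append((name, "succeeded" if succeeded else "failed"))
        if succeeded:
            return name
    # Nothing above worked: maintenance personnel must repair the machine.
    log.append(("System Repair", "required"))
    return "System Repair"
```

Calling run_recovery with a failing first callable and a succeeding second one returns "System Recovery" and leaves two entries in the log, one per level attempted; such a log is the kind of record a utility like EREP would later edit and print.
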

6

System/370 integrated emulation under OS and DOS

System/370 integrated emulation under OS and DOS System/370 integrated emulation under OS and DOS by GARY R. ALLRED International Business Machines Corporation Kingston, N ew York INTRODUCTION The purpose of this paper is to discuss the design and development

More information

Application generators: a case study

Application generators: a case study Application generators: a case study by JAMES H. WALDROP Hamilton Brothers Oil Company Denver, Colorado ABSTRACT Hamilton Brothers Oil Company recently implemented a complex accounting and finance system.

More information

Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment.

Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment. SOFTWARE ENGINEERING SOFTWARE RELIABILITY Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment. LEARNING OBJECTIVES

More information

Computer support for an experimental PICTUREPHONE /computer system at Bell Telephone Laboratories, Incorporated

Computer support for an experimental PICTUREPHONE /computer system at Bell Telephone Laboratories, Incorporated Computer support for an experimental PICTUREPHONE /computer system at Bell Telephone Laboratories, Incorporated by ERNESTO J. RODRIGUEZ Bell Telephone Laboratories, Incorporated Holmdel, New Jersey INTRODUCTION

More information

Unit 2 : Computer and Operating System Structure

Unit 2 : Computer and Operating System Structure Unit 2 : Computer and Operating System Structure Lesson 1 : Interrupts and I/O Structure 1.1. Learning Objectives On completion of this lesson you will know : what interrupt is the causes of occurring

More information

Enhanced Debugging with Traces

Enhanced Debugging with Traces Enhanced Debugging with Traces An essential technique used in emulator development is a useful addition to any programmer s toolbox. Peter Phillips Creating an emulator to run old programs is a difficult

More information

OPERATING SYSTEM. Functions of Operating System:

OPERATING SYSTEM. Functions of Operating System: OPERATING SYSTEM Introduction: An operating system (commonly abbreviated to either OS or O/S) is an interface between hardware and user. OS is responsible for the management and coordination of activities

More information

ASSIST Assembler Replacement User s Guide

ASSIST Assembler Replacement User s Guide ASSIST Assembler Replacement User s Guide Program&Documentation: John R. Mashey Pro ject Supervision : Graham Campbell PSU Computer Science Department Preface This manual is the key reference source for

More information

Chapter 8. Achmad Benny Mutiara

Chapter 8. Achmad Benny Mutiara Chapter 8 SOFTWARE-TESTING STRATEGIES Achmad Benny Mutiara amutiara@staff.gunadarma.ac.id 8.1 STATIC-TESTING STRATEGIES Static testing is the systematic examination of a program structure for the purpose

More information

Critical Systems. Objectives. Topics covered. Critical Systems. System dependability. Importance of dependability

Critical Systems. Objectives. Topics covered. Critical Systems. System dependability. Importance of dependability Objectives Critical Systems To explain what is meant by a critical system where system failure can have severe human or economic consequence. To explain four dimensions of dependability - availability,

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications

Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications Computer-System Architecture (cont.) Symmetrically Constructed Clusters (cont.) Advantages: 1. Greater computational power by running applications concurrently on all computers in the cluster. Disadvantages:

More information

LESSON 13: LANGUAGE TRANSLATION

LESSON 13: LANGUAGE TRANSLATION LESSON 13: LANGUAGE TRANSLATION Objective Interpreters and Compilers. Language Translation Phases. Interpreters and Compilers A COMPILER is a program that translates a complete source program into machine

More information

Chapter 16. Burroughs' B6500/B7500 Stack Mechanism 1. E. A. Hauck / B. A. Dent. Introduction

Chapter 16. Burroughs' B6500/B7500 Stack Mechanism 1. E. A. Hauck / B. A. Dent. Introduction Chapter 16 Burroughs' B6500/B7500 Stack Mechanism 1 E. A. Hauck / B. A. Dent Introduction Burroughs' B6500/B7500 system structure and philosophy are an extention of the concepts employed in the development

More information

CPS221 Lecture: Operating System Protection

CPS221 Lecture: Operating System Protection Objectives CPS221 Lecture: Operating System Protection last revised 9/5/12 1. To explain the use of two CPU modes as the basis for protecting privileged instructions and memory 2. To introduce basic protection

More information

THE LOGICAL STRUCTURE OF THE RC 4000 COMPUTER

THE LOGICAL STRUCTURE OF THE RC 4000 COMPUTER THE LOGICAL STRUCTURE OF THE RC 4000 COMPUTER PER BRINCH HANSEN (1967) This paper describes the logical structure of the RC 4000, a 24-bit, binary computer designed for multiprogramming operation. The

More information

Hashing. Hashing Procedures

Hashing. Hashing Procedures Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements

More information

IBM System/370 Principles of Operation. Systems

IBM System/370 Principles of Operation. Systems Systems IBM System/370 Principles of Operation The IBM System/370 is a data processing system that is based on the IBM System/360 but that extends the capabilities of that system. This manual describes

More information

Operating Systems. Lecture 09: Input/Output Management. Elvis C. Foster

Operating Systems. Lecture 09: Input/Output Management. Elvis C. Foster Operating Systems 141 Lecture 09: Input/Output Management Despite all the considerations that have discussed so far, the work of an operating system can be summarized in two main activities input/output

More information

Chapter 9. Software Testing

Chapter 9. Software Testing Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of

More information

Announcement. Exercise #2 will be out today. Due date is next Monday

Announcement. Exercise #2 will be out today. Due date is next Monday Announcement Exercise #2 will be out today Due date is next Monday Major OS Developments 2 Evolution of Operating Systems Generations include: Serial Processing Simple Batch Systems Multiprogrammed Batch

More information

System development, design & implementation

System development, design & implementation System development, design & implementation Design of software The following are the principle for any software design : Modularity and partitioning : Top down methods are used through out the analysis

More information

Developing Real-Time Systems

Developing Real-Time Systems Developing Real-Time Systems by George R. Dimble, Jr. Introduction George R. Trimble, Jr., obtained a B.A. from St. John's College in 1948 and an M.A. in mathematics from the University of Delaware in

More information

Process Management. Deadlock. Process Synchronization. Management Management. Starvation

Process Management. Deadlock. Process Synchronization. Management Management. Starvation Process Management Deadlock 7 Cases of Deadlock Conditions for Deadlock Modeling Deadlocks Strategies for Handling Deadlocks Avoidance Detection Recovery Starvation Process Synchronization Deadlock Starvation

More information

Operating Systems Overview. Chapter 2

Operating Systems Overview. Chapter 2 1 Operating Systems Overview 2 Chapter 2 3 An operating System: The interface between hardware and the user From the user s perspective: OS is a program that controls the execution of application programs

More information

Multiprocessor and Real-Time Scheduling. Chapter 10

Multiprocessor and Real-Time Scheduling. Chapter 10 Multiprocessor and Real-Time Scheduling Chapter 10 1 Roadmap Multiprocessor Scheduling Real-Time Scheduling Linux Scheduling Unix SVR4 Scheduling Windows Scheduling Classifications of Multiprocessor Systems

More information

Chapter 8 Memory Management

Chapter 8 Memory Management 1 Chapter 8 Memory Management The technique we will describe are: 1. Single continuous memory management 2. Partitioned memory management 3. Relocatable partitioned memory management 4. Paged memory management

More information

Lf1w1'eLC bliotfitl ~NS C.. /00/CAL VSTEMS. Maintenance Utility

Lf1w1'eLC bliotfitl ~NS C.. /00/CAL VSTEMS. Maintenance Utility Lf1w1'eLC bliotfitl Maintenance Utility /00/CAL VSTEMS ~NS C.. ç. TABLE OF CONTENTS LBMAINT - File Maintenance Utility... 1 Start up procedure... 2 Using LBMAINT... 3 The LBMAINT Scan menu... 5 Viewoption...

More information

SECTION 8 EXCEPTION PROCESSING

SECTION 8 EXCEPTION PROCESSING SECTION 8 EXCEPTION PROCESSING Exception processing is defined as the activities performed by the processor in preparing to execute a handler routine for any condition that causes an exception. In particular,

More information

IBM 3850-Mass storage system

IBM 3850-Mass storage system BM 385-Mass storage system by CLAYTON JOHNSON BM Corporation Boulder, Colorado SUMMARY BM's 385, a hierarchical storage system, provides random access to stored data with capacity ranging from 35 X 1()9

More information

Course: Advanced Software Engineering. academic year: Lecture 14: Software Dependability

Course: Advanced Software Engineering. academic year: Lecture 14: Software Dependability Course: Advanced Software Engineering academic year: 2011-2012 Lecture 14: Software Dependability Lecturer: Vittorio Cortellessa Computer Science Department University of L'Aquila - Italy vittorio.cortellessa@di.univaq.it

More information

Introduction. CS3026 Operating Systems Lecture 01

Introduction. CS3026 Operating Systems Lecture 01 Introduction CS3026 Operating Systems Lecture 01 One or more CPUs Device controllers (I/O modules) Memory Bus Operating system? Computer System What is an Operating System An Operating System is a program

More information

Introduction to Operating Systems. Chapter Chapter

Introduction to Operating Systems. Chapter Chapter Introduction to Operating Systems Chapter 1 1.3 Chapter 1.5 1.9 Learning Outcomes High-level understand what is an operating system and the role it plays A high-level understanding of the structure of

More information

CAD-CARE TROUBLESHOOTING GUIDE

CAD-CARE TROUBLESHOOTING GUIDE CAD-CARE TROUBLESHOOTING GUIDE CAD-Care is a stable and error free system. The biggest problem encountered with CAD-Care is when something stops CAD-Care during a system sort. Windows Screen Savers have

More information
