Error recovery through programming
by ALAN N. HIGGINS
International Business Machines Corporation
Kingston, New York

Fall Joint Computer Conference, 1968

INTRODUCTION

The requirement for error recovery procedures has existed as long as computers themselves. Since the earliest computers, one of the goals of design has been to increase the reliability and availability of the computer to the user. While great strides have been made in this direction, the need for error recovery is as present today as ever; indeed, it is now more pressing than ever before. With advanced programming techniques such as multiprogramming and multiprocessing, the cost of an error has increased dramatically, so that the consequences of an error are no longer limited "merely" to the loss of a job and the need for a subsequent rerun. An error today can:

Cause the termination of concurrently executing tasks.
Cause an environmental control system to go down.
Cause the loss of teleprocessing messages.
Cause the generation of a report to be delayed.

No longer can rerunning the job be accepted as a prime means of "error recovery." The situation existing when running under an operating system, with a number of jobs executing in the computer at the same time, makes improved error recovery procedures mandatory.

It is recognized that the engineering community is diligently striving to improve the hardware itself; for a complete solution, therefore, it is necessary to look at the other half of the question of error recovery: what can be done to improve reliability, to improve availability, to improve error recovery through programming? In order to do this, we first have to consider in a general way error recovery procedures, or Recovery Management Support. The next step is to look specifically at some of the work which has been done in Operating System/360 with the Recovery Management Support for the Model 65.

System incidents

An examination of system incidents reveals that such incidents are due to a number of sources. Among these are Hard Core errors (including errors in the CPU, memory and channels), errors from Input/Output devices and control units, and procedural and operational errors. Each of these is made up of a number of different errors, but from a gross point of view it seems reasonable to state that there are three general types of system interruptions:

Hardware malfunctions.
Design errors (both hardware and software).
Operator or user injected errors.

Systems planning must therefore be influenced by the facts that machines will malfunction, that neither hardware nor software is perfect, and that operators are still likely to make as many mistakes as they have in the past.

Recovery management

The primary objective in any error recovery procedure or Recovery Management Support should be to alleviate the burden of system interruptions to the user. In order to accomplish this we must:

1. Reduce the number of interruptions to which the user is exposed, and
2. Minimize the impact of these interruptions when they do occur.

Recovery Management therefore should provide the user with a higher degree of system availability (more time for more jobs) by minimizing the impact of system malfunctions upon his operations. With this objective as the target, error recovery takes on a broader meaning and scope than has been applied to the concept in the past. In an environment of multiprogramming, the system becomes all important, and it is most necessary that, no matter what happens, the system continue to function. It often becomes a situation of sacrificing a part so that the "whole" may survive.

In order to accomplish this, Recovery Management facilities may follow a pattern in which the support attempts to reduce the number of system interruptions by retrying the operation which was interrupted by the malfunction, or terminates the task affected and continues system operation. If this is not possible, then the second step toward accomplishing the primary objective of error recovery becomes of paramount importance: to minimize the impact of the interruption. This is done by preparing the system for a simple restart, or by indicating that repair by maintenance personnel is required.

Instruction retry

This pattern, which has just been outlined, suggests a number of functions which can be performed to achieve the objectives of Recovery Management. The first of these functions is instruction retry. The concept of instruction retry is not really new. It is something which IBM has been doing for years, particularly in the I/O area. Instruction retry has been standard procedure whenever an error was encountered in reading or writing a tape. But it is possible to extend this retry capability and to employ it when a CPU or memory malfunction occurs. A relatively large number of malfunctions are intermittent in nature rather than solid failures, and therefore there is a high probability of successful execution and recovery if an instruction retry can be attempted.

The first thing which must be determined, then, is whether instruction retry is feasible and, if it is feasible, to execute the retry. The determination of instruction retry feasibility is usually quite dependent upon the characteristics of the particular machine. Ordinarily, for feasibility to exist, the "environment" of the computer must be valid, or free from error.
Depending upon the specific machine, this environment may include the data contained in general purpose registers, floating point registers, machine log-out areas, permanent storage areas, etc. Arbitrarily, the criterion of validity can be keyed on parity. If the parity of the data is good, the environment is assumed valid and therefore retry is feasible. If parity is bad, then no further retry action can be taken.

Having ascertained that instruction retry is feasible, it is necessary to continue the analysis and determine whether a specific instruction is retryable. To do this, it is first necessary to locate the failing instruction. The procedure involved here is again dependent upon the particular machine, upon what type of fetch or pre-fetch logic is employed, and upon whether or not the instruction counter is accurate. In one case, a comparison of the internal registers in the machine log-out can provide the clue as to whether the instruction counter is accurate; in another, it may be a function of when the machine check occurred and what updating cycles the instruction counter was executing at the time.

It is obvious, therefore, that it is not always easy or possible to locate the failing instruction, but if the instruction counter is accurate and it is possible to locate the failing instruction, an analysis can be performed to ascertain whether the retry threshold of the interrupted instruction has been exceeded. (The retry threshold is that point in the instruction cycle after which retry cannot be attempted; it is usually indicated by a bit set by the hardware.) The retry threshold has been exceeded when, during the normal instruction cycle, one or more of the original operands has been changed. If the threshold has not been exceeded, it is possible to cause another attempt at executing the failing instruction.
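The order of checks described above (environment validity keyed on parity, then an accurate instruction counter, then the retry threshold) can be illustrated with a minimal sketch. It is written in Python purely for readability and is not the actual machine-level logic; the `MachineLogout` record, its field names and the `RetryDecision` values are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class RetryDecision(Enum):
    RETRY = "retry the failing instruction"
    NOT_FEASIBLE = "environment invalid (bad parity)"
    CANNOT_LOCATE = "instruction counter not accurate"
    THRESHOLD_EXCEEDED = "operand already changed; further analysis needed"


@dataclass
class MachineLogout:
    """Hypothetical snapshot taken at the machine check interruption."""
    parity_ok: bool                     # parity of registers, log-out and storage areas
    instruction_counter_accurate: bool  # can the failing instruction be located?
    threshold_bit: bool                 # set by hardware once an original operand changed


def analyze_for_retry(logout: MachineLogout) -> RetryDecision:
    # Step 1: feasibility, keyed on the parity of the environment.
    if not logout.parity_ok:
        return RetryDecision.NOT_FEASIBLE
    # Step 2: the failing instruction must be locatable.
    if not logout.instruction_counter_accurate:
        return RetryDecision.CANNOT_LOCATE
    # Step 3: retry is only safe while the original operands are unchanged.
    if logout.threshold_bit:
        return RetryDecision.THRESHOLD_EXCEEDED
    return RetryDecision.RETRY
```

The sketch keeps "threshold exceeded" as a separate outcome rather than a final refusal, since the text only states that a plain retry cannot be attempted past that point.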
If the retry threshold has been exceeded, however, it may still be possible to extend the threshold by examining the instruction type to determine whether a copy of the original operand might still be intact in some internal register and, if it is, by restoring it. This is accomplished by rebuilding (in a special execution area) the instruction from the contents of the log-out, the internal registers or main storage. Therefore, from an analysis, it is possible to determine that an instruction is either:

1. Retryable, that is, the retry threshold has not been exceeded or, if it has been exceeded, the damaged operand can be restored, and therefore instruction retry can be attempted; or
2. Non-retryable, that is, instruction retry is not possible because the threshold has been exceeded and the damaged operand cannot be restored, because an invalid environment exists owing to incorrect parity, or because the value of the instruction counter is indeterminate.

If the second condition is the case, then it is necessary to look for another way to handle the error recovery.

Refresh main storage

The occurrence of a parity error in main storage rules out instruction retry; therefore, one function which could be of value would be the ability to "refresh" main storage. By this is meant repairing the damage which either caused or was caused by a malfunction by loading a new copy of the affected module into main storage. (A module is a program unit that is discrete and identifiable with respect to compiling, combining with other units, and loading.) The use of refreshable code requires a good deal of foresight in coding since, in order to be refreshable, a module must not modify itself or be modified by another module; for example, it must not set switches, contain dynamic storage areas, or store registers or address pointers within the body of its code. The foresight is well rewarded, however, when it is possible to load this refreshable code and then continue execution without changing either the sequence or the results of the processing.

The attribute "refreshable" is similar to "reentrant". Most reentrant modules meet the requirements specified above and, in addition, a reentrant module is one that may be utilized by more than one task at a time (some modules classified as reentrant deviate from these requirements by operating in a pseudo-disabled manner, thus actually allowing modifications during a short period of time). The difference between the two is that "reentrant" is based on the operational characteristics of the module within the system, while "refreshable" is based only on the fact that the code is not modified in any manner.

Selective termination

The functions of instruction retry and refreshable code are most desirable since they render the error recovery procedure transparent to the user and require no intervention on his part. Unfortunately, it is not always possible to attain this level of recovery. When this is the case, it is necessary to accept some degradation in order to keep the system operational. One way to accomplish this is to implement a function of Selective Termination. Such a function would enable the system to examine the failing environment, determine what problem program was executing, and then proceed to terminate this program while continuing all other jobs which were executing at the time of the malfunction. This is really a type of job-abort which frees the resources of the system allocated to the job and makes them available for future use. If a problem program was utilizing system code when the malfunction occurred, selective termination could be effective if the system code was transient rather than resident in nature.
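The idea of selective termination described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Job` and `System` classes, their fields and the resource names are hypothetical, and a real operating system would of course have far more state to unwind.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    resources: set = field(default_factory=set)  # e.g. storage regions, I/O devices
    active: bool = True


class System:
    def __init__(self, jobs):
        self.jobs = {job.name: job for job in jobs}
        self.free_resources = set()

    def selectively_terminate(self, failing_job_name: str) -> list:
        """Abort only the job that was executing when the malfunction occurred."""
        job = self.jobs[failing_job_name]
        job.active = False
        # Free the resources allocated to the job so they are available for future use.
        self.free_resources |= job.resources
        job.resources = set()
        # All other jobs continue; return their names.
        return sorted(name for name, j in self.jobs.items() if j.active)
```

For example, terminating job "B" out of jobs "A", "B" and "C" leaves "A" and "C" active and returns the resources of "B" to the free pool.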
Selective termination results in the loss of a specific job, but it does enable the system to continue without interruption.

Another function which would aid in the error recovery process when a memory malfunction occurs is the ability to logically carve out, or remove, that portion of the memory in which the malfunction occurred. Since this type of error recovery would result in job termination and might not return resources (storage, I/O devices, etc.) to the system, such a procedure would obviously introduce undesirable side effects, such as loss of availability of I/O devices, loss of part of core, and loss of the terminated job; but it would preserve the system, and operation would continue until an orderly correction could be made.

I/O recovery

The functions which have been discussed so far have been directed mainly to errors which occur in the CPU or memory. From an examination of system incidents, it is evident that a significant portion of errors occur in the I/O area. Is there anything which can be done to improve error recovery procedures for I/O?

In the first place, there is I/O retry, which is available through the ERPs (Error Recovery Procedures) for the different I/O devices. As indicated earlier, it has been standard procedure to retry I/O instructions when errors occur. A number of errors (unit check, unit exception, wrong-length indication, protection check and some chaining checks) can be corrected by this means. An I/O Supervisor performs an analysis and selects, according to device, the proper ERP to attempt recovery. After retry is attempted, the ERP regains control to determine whether or not the retry has been successful. If it was successful, the I/O retry is transparent to the user.

There is another group of I/O errors, the channel checks (channel control check, channel data check and interface control check), which need not be disastrous and from which, after analysis of the conditions causing the error, it may be possible to recover.
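The division of labor between the I/O Supervisor and the device ERPs described above can be sketched as follows. This is a hedged Python illustration, not the actual OS/360 interface: the function names, the device table and the retry counts are all invented for the example, and only the list of correctable error names comes from the text.

```python
from typing import Callable, Dict

# Error classes the text names as correctable by retry. The text says "some"
# chaining checks; they are treated as one class here for simplicity.
RETRYABLE_ERRORS = {
    "unit check", "unit exception", "wrong-length indication",
    "protection check", "chaining check",
}


def make_erp(max_retries: int) -> Callable[[Callable[[], bool]], bool]:
    """Build a hypothetical ERP that reissues an operation up to max_retries times."""
    def erp(operation: Callable[[], bool]) -> bool:
        for _ in range(max_retries):
            # After each retry the ERP regains control and checks the outcome.
            if operation():
                return True
        return False
    return erp


# Hypothetical device table; the retry counts are illustrative only.
ERPS_BY_DEVICE: Dict[str, Callable[[Callable[[], bool]], bool]] = {
    "tape": make_erp(10),
    "disk": make_erp(3),
}


def io_supervisor_recover(device_type: str, error: str,
                          operation: Callable[[], bool]) -> bool:
    """Select the ERP for the device and let it attempt recovery by retry."""
    if error not in RETRYABLE_ERRORS:
        return False  # e.g. a channel check: not handled by plain I/O retry
    return ERPS_BY_DEVICE[device_type](operation)
```

When the function returns True, the caller can carry on as if the error had never happened, which is the sense in which a successful I/O retry is transparent to the user.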
An analysis of this kind would determine the type of operation that failed, the type of device affected, the sequences which occurred across the I/O interface following the error, and whether a retry can be attempted.

The I/O device or medium can malfunction, and if a retry is not successful, there may be other ways to continue the execution of the job. One such way would be to have the ability to switch data sets (devices), that is, to change a tape or disk pack from one drive to another and then to retry the operation with the new drive. Another possibility (if the malfunction was really related to the channel or control unit) would be to try another route to the same device, that is, to address it through a different channel or control unit.

Other system incidents

Another group of system incidents is due to procedural and operator errors. Several things can be done to decrease their number, and the area certainly deserves concentrated attention. The first is, of course, better trained personnel, but from a programming point of view several possibilities exist. It is most desirable to require a minimum of user intervention and interaction in order to accomplish execution. Control information should be minimal. When interaction is required, messages should be clear and concise, to the point of outlining possible choices. A conversation mode could be optional which would permit correction or confirmation of operator action. All these points are generally grouped under a concept of Operator Awareness and have a very definite place in the planning of any error recovery support.

All of these functions are aimed at continuing the operation of the system, but unfortunately this is not always possible to accomplish. Therefore, the next best thing is to minimize the effect of the malfunction. This can be done by attempting to preserve information concerning the malfunction and to make it available to assist knowledgeable personnel in determining what caused the error and what can be done to correct it. This will have the most desirable effect of shortening the Duration of the Unexpected Interrupt and getting the system back in operation as quickly as possible.

RMS/65

The Recovery Management Support for the System/360 Model 65 (RMS/65) has provided a number of these functions in the operating system. These functions are contained in two programs which make up RMS/65: the Machine Check Handler (MCH), which is directed at CPU and memory malfunctions, and the Channel Check Handler (CCH), which is oriented to I/O problems. RMS/65 provides a hierarchy of recovery which involves four levels:

I. Functional Recovery
II. System Recovery
III. System-Supported Restart
IV. System Repair

Functional Recovery is the successful retry of an interrupted instruction. MCH handles the operation for the CPU and main storage through its Machine Analysis and Instruction Retry (MAIR) facilities. The MAIR facilities perform an analysis of the machine environment at the time of the machine check interruption to determine the feasibility of retrying the interrupted instruction. MAIR then retries the interrupted instruction when retry is feasible. The CCH performs the analysis function for the channel checks discussed earlier.
The CCH accomplishes this by intercepting I/O interruptions before the I/O Supervisor receives them and performing an analysis of the existing conditions. If feasible, the status bits are manipulated to make the channel check look like a failure for which an ERP exists, and then control is transferred to the appropriate ERP for action. Functional recovery is of course the desired goal because in this case the malfunction is transparent to the user.

System Recovery is the second level of recovery and is required when functional recovery is either not feasible or fails. The objective is to preserve the system and to continue processing all unaffected jobs. This is done by means of a Program Damage Assessment and Repair feature which attempts to analyze the malfunction environment, to isolate and repair the program damage if possible, and to report permanent failures to the program and operator. This feature also incorporates the mechanism for selective termination of a task.

The function of System-Supported Restart is called on when both Functional and System Recovery have failed but a stop for repair is not required. The operator is informed that such a condition exists and that it is necessary to restart the system.

The fourth level of recovery support provided by RMS/65 is System Repair. In a way, this is perhaps one of its most important functions, since the detailed error analysis information which is provided can be of great assistance in determining the cause of failure and in suggesting the proper correction for the problem. Once the repair is completed, initialization is required to restart the system.

Figure 1 shows the relationship of these levels of recovery to one another and to the main objective of Recovery Management Support, which is to keep the system in operation.

FIGURE 1

Each level of recovery performs the important function of recording information concerning what happened, the status of the computer at the time of the incident, what action was taken and the results of such an action. This information, which is recorded on a special data set, SYS1.LOGREC, is then available through execution of the Environment Record Editing and Printing utility (EREP), which runs under the control of Operating System/360. This program edits and prints the records generated by MCH and CCH (as well as by several other recording functions) and provides the information for interpretation by the experienced Customer Engineer. A Standard Operating Procedure in a computer center using MCH and/or CCH should be to execute EREP on a regular basis, so that the information is available to the CE as an aid or indicator to anticipate serious trouble. For example, if a particular pattern appears indicating possible degradation, preventive maintenance can be performed before the occurrence of a serious incident.

CONCLUSION

RMS/65 is a step in the direction which error recovery must take if the requirements of computer technology are to be met in this area. More and more, the question of error recovery cannot be relegated to hardware or programming alone; rather, these two must form an effective partnership and attack the problem together in order to provide a satisfactory solution. Every sign indicates that this is being accomplished, and it appears that some meaningful steps such as RMS/65 are being taken toward the goal of reducing the number of interruptions to which a user is exposed and minimizing the impact of these interruptions when they do occur.
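As a closing illustration, the four-level hierarchy described in the RMS/65 section, together with the recording performed at each level, can be sketched as a simple escalation chain. This is a minimal Python sketch under stated assumptions: the `run_recovery` function, the callables standing in for each level and the in-memory `log` list (a stand-in for the SYS1.LOGREC data set) are hypothetical and do not reflect the real MCH or CCH interfaces.

```python
from typing import Callable, List, Tuple

LEVELS = [
    "Functional Recovery",
    "System Recovery",
    "System-Supported Restart",
    "System Repair",
]


def run_recovery(attempts: List[Callable[[], bool]],
                 log: List[Tuple[str, str]]) -> str:
    """Try levels I to III in order; fall through to System Repair.

    `attempts` holds one callable per level I to III, each returning True when
    that level is sufficient. Every level records its outcome in `log`.
    """
    # zip stops after the three supplied attempts, so level IV is never "tried".
    for level, attempt in zip(LEVELS, attempts):
        succeeded = attempt()
        log.append((level, "succeeded" if succeeded else "failed"))
        if succeeded:
            return level
    # Nothing short of repair was sufficient: a stop for repair is required.
    log.append((LEVELS[-1], "required"))
    return LEVELS[-1]
```

A failed retry followed by a successful selective termination would return "System Recovery" and leave two entries in the log, one per level attempted.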
More informationArm Assembly Language programming. 2. Inside the ARM
2. Inside the ARM In the previous chapter, we started by considering instructions executed by a mythical processor with mnemonics like ON and OFF. Then we went on to describe some of the features of an
More information10 Things to expect from a DB2 Cloning Tool
10 Things to expect from a DB2 Cloning Tool This document gives a brief overview of functionalities that can be expected from a modern DB2 cloning tool. The requirement to copy DB2 data becomes more and
More informationChapter 8 Virtual Memory
Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page
More informationMaterials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming
CS311 Lecture: CPU Control: Hardwired control and Microprogrammed Control Last revised October 18, 2007 Objectives: 1. To explain the concept of a control word 2. To show how control words can be generated
More information2 Introduction to Processes
2 Introduction to Processes Required readings: Silberschatz/Galvin: Chapter 4 With many things happening at once in a system, need some clean way of separating them all out cleanly. sequential process,
More informationOPERATING SYSTEM SUPPORT (Part 1)
Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture OPERATING SYSTEM SUPPORT (Part 1) Introduction The operating system (OS) is the software
More informationOperating Systems Overview. Chapter 2
Operating Systems Overview Chapter 2 Operating System A program that controls the execution of application programs An interface between the user and hardware Masks the details of the hardware Layers and
More informationComputer-System Organization (cont.)
Computer-System Organization (cont.) Interrupt time line for a single process doing output. Interrupts are an important part of a computer architecture. Each computer design has its own interrupt mechanism,
More informationPart I Overview Chapter 1: Introduction
Part I Overview Chapter 1: Introduction Fall 2010 1 What is an Operating System? A computer system can be roughly divided into the hardware, the operating system, the application i programs, and dthe users.
More informationFunction. Description
Function Check In Get / Checkout Description Checking in a file uploads the file from the user s hard drive into the vault and creates a new file version with any changes to the file that have been saved.
More informationQ.1 Explain Computer s Basic Elements
Q.1 Explain Computer s Basic Elements Ans. At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some
More informationModule 1. Introduction:
Module 1 Introduction: Operating system is the most fundamental of all the system programs. It is a layer of software on top of the hardware which constitutes the system and manages all parts of the system.
More informationChapter 8 & Chapter 9 Main Memory & Virtual Memory
Chapter 8 & Chapter 9 Main Memory & Virtual Memory 1. Various ways of organizing memory hardware. 2. Memory-management techniques: 1. Paging 2. Segmentation. Introduction Memory consists of a large array
More informationIssues in Programming Language Design for Embedded RT Systems
CSE 237B Fall 2009 Issues in Programming Language Design for Embedded RT Systems Reliability and Fault Tolerance Exceptions and Exception Handling Rajesh Gupta University of California, San Diego ES Characteristics
More informationOperating system Dr. Shroouq J.
2.2.2 DMA Structure In a simple terminal-input driver, when a line is to be read from the terminal, the first character typed is sent to the computer. When that character is received, the asynchronous-communication
More informationINFORMATION SECURITY- DISASTER RECOVERY
Information Technology Services Administrative Regulation ITS-AR-1505 INFORMATION SECURITY- DISASTER RECOVERY 1.0 Purpose and Scope The objective of this Administrative Regulation is to outline the strategy
More informationDISASTER RECOVERY PRIMER
DISASTER RECOVERY PRIMER 1 Site Faliure Occurs Power Faliure / Virus Outbreak / ISP / Ransomware / Multiple Servers Sample Disaster Recovery Process Site Faliure Data Centre 1: Primary Data Centre Data
More informationFault tolerance and Reliability
Fault tolerance and Reliability Reliability measures Fault tolerance in a switching system Modeling of fault tolerance and reliability Rka -k2002 Telecommunication Switching Technology 14-1 Summary of
More information1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals.
1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals. A typical communication link between the processor and
More informationData Protection Using Premium Features
Data Protection Using Premium Features A Dell Technical White Paper PowerVault MD3200 and MD3200i Series Storage Arrays www.dell.com/md3200 www.dell.com/md3200i THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES
More informationGeneral Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to:
F2007/Unit5/1 UNIT 5 OBJECTIVES General Objectives: To understand the process management in operating system Specific Objectives: At the end of the unit you should be able to: define program, process and
More informationCPS352 Lecture - The Transaction Concept
Objectives: CPS352 Lecture - The Transaction Concept Last Revised March 3, 2017 1. To introduce the notion of a transaction and the ACID properties of a transaction 2. To introduce the notion of the state
More informationProcess size is independent of the main memory present in the system.
Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.
More informationDatabases and Database Systems
Page 1 of 6 Databases and Database Systems 9.1 INTRODUCTION: A database can be summarily described as a repository for data. This makes clear that building databases is really a continuation of a human
More informationUNIT:2. Process Management
1 UNIT:2 Process Management SYLLABUS 2.1 Process and Process management i. Process model overview ii. Programmers view of process iii. Process states 2.2 Process and Processor Scheduling i Scheduling Criteria
More informationFollowing are a few basic questions that cover the essentials of OS:
Operating Systems Following are a few basic questions that cover the essentials of OS: 1. Explain the concept of Reentrancy. It is a useful, memory-saving technique for multiprogrammed timesharing systems.
More informationSoftware Quality. Chapter What is Quality?
Chapter 1 Software Quality 1.1 What is Quality? The purpose of software quality analysis, or software quality engineering, is to produce acceptable products at acceptable cost, where cost includes calendar
More informationHot Topics in IT Disaster Recovery
Risk Masters International LLC Hot Topics in IT Disaster Recovery Steven J. Ross Executive Principal A Presentation for the Middle Tennessee Chapter of ISACA The Popular View of IT Disaster Recovery Today
More informationComputer System Overview
Computer System Overview Introduction A computer system consists of hardware system programs application programs 2 Operating System Provides a set of services to system users (collection of service programs)
More informationChapter 8 Fault Tolerance
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to
More informationUniversity Information Systems. Administrative Computing Services. Contingency Plan. Overview
University Information Systems Administrative Computing Services Contingency Plan Overview Last updated 01/11/2005 University Information Systems Administrative Computing Services Contingency Plan Overview
More informationLecture 1 Introduction (Chapter 1 of Textbook)
Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 1 Introduction (Chapter 1 of Textbook) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The slides
More informationP2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems
P2P Alex S. 1 Introduction The systems we will examine are known as Peer-To-Peer, or P2P systems, meaning that in the network, the primary mode of communication is between equally capable peers. Basically
More informationSoftware Testing and Maintenance
Software Testing and Maintenance Testing Strategies Black Box Testing, also known as Behavioral Testing, is a software testing method in which the internal structure/ design/ implementation of the item
More informationManaging PC Recovery Settings and Functions
Managing PC Recovery Settings and Functions The programs and files supplied by Microsoft to make your computer operate are referred to as the system. The files which are created by you during your use
More informationDatabase Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.
Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 18 Transaction Processing and Database Manager In the previous
More informationPANASAS TIERED PARITY ARCHITECTURE
PANASAS TIERED PARITY ARCHITECTURE Larry Jones, Matt Reid, Marc Unangst, Garth Gibson, and Brent Welch White Paper May 2010 Abstract Disk drives are approximately 250 times denser today than a decade ago.
More informationECE519 Advanced Operating Systems
IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (6 th Week) (Advanced) Operating Systems 6. Concurrency: Deadlock and Starvation 6. Outline Principles of Deadlock
More informationOPERATING SYSTEMS. P. PRAVEEN Asst.Prof, CSE
OPERATING SYSTEMS By P. PRAVEEN Asst.Prof, CSE P. Praveen Asst Prof, Department of Computer Science and Engineering Page 1 P. Praveen Asst Prof, Department of Computer Science and Engineering Page 2 1
More informationCAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon
CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon The data warehouse environment - like all other computer environments - requires hardware resources. Given the volume of data and the type of processing
More informationSample Exam ISTQB Advanced Test Analyst Answer Rationale. Prepared By
Sample Exam ISTQB Advanced Test Analyst Answer Rationale Prepared By Released March 2016 TTA-1.3.1 (K2) Summarize the generic risk factors that the Technical Test Analyst typically needs to consider #1
More informationSoftware Quality. Richard Harris
Software Quality Richard Harris Part 1 Software Quality 143.465 Software Quality 2 Presentation Outline Defining Software Quality Improving source code quality More on reliability Software testing Software
More informationFile Organization Sheet
File Organization Sheet 1. What is a File? A collection of data is placed under permanent or non-volatile storage Examples: anything that you can store in a disk, hard drive, tape, optical media, and any
More informationIntroduction to Deadlocks
Unit 5 Introduction to Deadlocks Structure 5.1 Introduction Objectives 5.2 System Model 5.3 Deadlock Characterization Necessary Conditions for Deadlock Resource-Allocation Graph. 5.4 Deadlock Handling
More informationINTERNATIONAL TELECOMMUNICATION UNION STUDY GROUP 12 DELAYED CONTRIBUTION 95. Behaviour of data modems in lossy, packet-based transport systems
INTERNATIONAL TELECOMMUNICATION UNION TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2005-2008 English only Original: English Questions: 11, 13/12 Geneva, 17-21 October 2005 Source: Title: STUDY
More informationProtecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery
White Paper Business Continuity Protecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery Table of Contents Executive Summary... 1 Key Facts About
More informationUNIT - IV. What is virtual memory?
UNIT - IV Virtual Memory Demand Paging Process creation Page Replacement Allocation of frames Thrashing- File Concept - Access Methods Directory Structure File System Mounting File Sharing Protection.
More informationSpring It takes a really bad school to ruin a good student and a really fantastic school to rescue a bad student. Dennis J.
Operating Systems * *Throughout the course we will use overheads that were adapted from those distributed from the textbook website. Slides are from the book authors, modified and selected by Jean Mayo,
More informationLecture Notes on Memory Layout
Lecture Notes on Memory Layout 15-122: Principles of Imperative Computation Frank Pfenning André Platzer Lecture 11 1 Introduction In order to understand how programs work, we can consider the functions,
More information