Soft Error Fault Tolerant Systems: CS456 Survey
Alok Garg

Abstract

Programming errors are currently regarded as the foremost cause of system failures, but recent studies suggest that soft errors are increasingly responsible for system downtime. Computer systems are becoming more complex and are optimized for price and performance rather than for availability, which makes soft errors an even more common case. The move towards denser, smaller, and lower-voltage transistors has the potential to increase these transient errors. Until now, most system software architectures have placed complete faith in the underlying hardware, and software makes no provision for dealing with hardware faults. In this survey paper, we investigate the influence of soft errors on the system as a whole and current research into proposed recovery mechanisms.

1 Introduction

Soft errors are unintended transitions of logic state in a circuit, typically caused by an external source of ionizing radiation. The ionization creates excess free carriers, which recombine with the stored charge, thereby corrupting the state of the transistor. Device scaling, that is, reduced feature size and voltage levels, together with higher transistor density, has increased the risk of hardware faults due to soft errors. Research by Shivakumar et al. [12] predicts that the soft error rate (SER) per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements.

Because of the demand for high-performance, low-cost computers, availability has received less attention. It is a common belief that software errors are, and will continue to be, the most probable cause of loss of availability. But as processors, caches, and memories become larger, faster, and denser, and are increasingly used in adverse environments, soft errors are also becoming more probable. Ziegler et al.
[15,16], through extensive field studies, predicted and verified the soft error rate (SER), expressed in FIT (1 FIT equals 1 failure in 10^9 hours), of a 16 Mbit DRAM chip. A system with 100 such chips will have a failure rate of about one per week. They also claim that a typical processor's silicon can have a soft-error rate of 4000 FIT, of which 50% will affect processor logic and 50% the large on-chip cache.

Until now, techniques such as Error Correction Codes (ECC) have been used to correct errors in main memory and system interconnects. Unfortunately, such techniques only reduce the visible error rate for semiconductor elements that can be covered by such codes. For example, a 1 Gbit memory system based on 64 Mbit DRAMs still has a combined visible error rate of 3435 FIT when using Single Error Correct-Double Error Detect (SEC-DED) ECC [3]. This is equivalent to around 300 errors per year across a fleet of such machines. Analysis by Xu et al. [13] shows that soft errors may lead to serious security vulnerabilities, if not to a system crash.

Due to price sensitivity and the demand for higher performance, it is not cost-effective for hardware to provide full support for masking or containing these soft errors. Therefore, the burden falls on system software to handle these errors in order to achieve the highest availability. Current system software places complete faith in the underlying hardware and makes no provision for software-based mechanisms to recover from hardware faults. Messer et al. [6] and Lu and Reed [2] have analyzed the effect of soft errors on system software. Software fault-tolerance techniques have also been proposed by Rebaudengo et al. [10] and Milojicic et al. [7].

2 Existing Hardware Support for Error Handling

Availability in computer systems is determined by hardware and software reliability.
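
Since availability follows from failure rates, the FIT figures quoted in the Introduction can be turned into expected failure counts. The sketch below is illustrative arithmetic only: the function names are ours, and the fleet size it derives is inferred from the quoted 3435 FIT and 300 errors per year, not taken from the cited studies.

```python
HOURS_PER_FIT_WINDOW = 1e9   # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365


def expected_failures(fit, hours, units=1):
    """Expected number of failures for `units` devices rated at `fit` over `hours`."""
    return fit * hours * units / HOURS_PER_FIT_WINDOW


def mtbf_hours(fit):
    """Mean time between failures, in hours, for a given FIT rate."""
    return HOURS_PER_FIT_WINDOW / fit


# Processor silicon at 4000 FIT: mean time between soft errors, in years.
cpu_mtbf_years = mtbf_hours(4000) / HOURS_PER_YEAR

# SEC-DED protected memory system at 3435 FIT: visible errors per machine-year.
mem_errors_per_machine_year = expected_failures(3435, HOURS_PER_YEAR)

# Number of machines over which about 300 errors per year would be observed.
# This is an inference from the two quoted numbers, not a quoted figure.
implied_fleet = 300 / mem_errors_per_machine_year

print(f"CPU MTBF: {cpu_mtbf_years:.1f} years")
print(f"Memory errors per machine-year: {mem_errors_per_machine_year:.4f}")
print(f"Implied fleet size: {implied_fleet:.0f} machines")
```

A single machine at these rates fails rarely; the concern arises at the scale of a large installed base, where the same rates produce errors routinely.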
Hardware reliability features have traditionally existed only in proprietary servers, with specialized, redundantly configured hardware and critical software components. But a few reliability features also exist in commercial, price-sensitive processors.

2.1 Support for Memory and Communication Errors

Depending on memory size, the technology's sensitivity to soft errors, and price pressure, PC systems usually support at least parity detection on main memory and system buses. Error Correction Codes (ECC) are also supported for large caches. A parity check is able to detect and report a 1-bit error, while ECC is normally capable of correcting a 1-bit error and detecting a 2-bit error (SEC-DED). Once an error is detected, the processor tries to correct it if possible. Otherwise, the processor may report the error to firmware. Firmware with advanced software support may then handle the reported error.

Itanium Processor

The IA-64 architecture extends support for soft errors in two ways [9]. First, additional hardware detection is supported in the processor implementation, such as parity or ECC protection for the system bus and the three on-chip caches. These provide good coverage of transient errors in processor cache memory and system buses. Second, recoverability is handled through the machine-check exception, which is supported by several types of well-defined error scenarios. Error logging provides information for potential software containment of the errors. Itanium processor reliability and availability features are presented in Figure 1.

Memory RAID

ECC still offers excellent protection for many servers. As memory capacity grows, however, the level of effectiveness ECC provides actually decreases. HP developed Hot Plug RAID Memory [4] to extend the effectiveness of ECC. Hot Plug RAID Memory provides redundancy and hot-plug capabilities for dual inline memory modules (DIMMs) to deliver higher levels of availability, scalability, and fault tolerance.
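
The difference between parity and SEC-DED described in Section 2.1 can be illustrated with a small software model. The sketch below assumes a 4-bit data word protected by an extended Hamming (8,4) code; real memory systems use wider codes implemented in hardware, and the function names are ours.

```python
def parity(bits):
    """Even-parity bit of a sequence of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def secded_encode(data):
    """Encode 4 data bits as an 8-bit extended Hamming word.

    Index 1..7 is a Hamming(7,4) codeword (parity at 1, 2, 4);
    index 0 holds an overall parity bit over the other seven.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = parity(word[1:])
    return word


def secded_decode(word):
    """Return (data, status); data is None when a double error is detected."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos      # XOR of set positions is 0 for a valid word
    overall_ok = parity(w) == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        w[syndrome] ^= 1         # odd number of flips: treat as single error
        status = "corrected"
    else:
        return None, "double-error detected"
    return [w[3], w[5], w[6], w[7]], status


data = [1, 0, 1, 1]
codeword = secded_encode(data)

one_flip = list(codeword)
one_flip[5] ^= 1                 # single-bit upset
two_flips = list(codeword)
two_flips[1] ^= 1
two_flips[6] ^= 1                # two-bit upset

print(secded_decode(codeword))
print(secded_decode(one_flip))
print(secded_decode(two_flips))

# Plain parity, by contrast, misses an even number of flips.
stored_parity = parity(data)
print(parity([0, 1, 1, 1]) == stored_parity)   # data with two bits flipped
```

The last line shows why parity alone only detects: a double flip leaves the parity unchanged, and even a detected single flip cannot be located without the extra check bits.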
2.2 Support for Logical Errors

Although considerable commercial support is available for error detection and correction in memories and system buses, protection from soft errors in logic circuits is almost entirely missing. According to the predictions, soft errors are going to be as probable in logic circuits as in system memories or caches, and would become a reliability concern. A lot of research is currently going on at the circuit and architecture level to detect and correct errors in processor logic, but this research is still rudimentary and impractical for complex, price-sensitive processors.

Figure 1. Itanium processor reliability and availability features. [9]

3 Influence of Soft Errors on System Software

Designing software error recovery mechanisms requires a clear understanding of the influence of soft errors on system software. This is also needed to determine whether transient errors are an issue for system software at all and, if so, how severe that issue is. Although no field study exists, simulations using software fault injection have provided some insight into the problem. The impact of soft errors on a commodity OS is analyzed by Messer et al. [6]. Lu and Reed [2] simulated single-bit memory errors, register file upsets, and MPI message payload corruption, and measured the behavioral response of a suite of MPI applications.

The potential sources of error in the processor help in understanding simulated fault injection techniques. The components that directly affect software when corrupted by a soft error are:

Processor Register: Regular integer registers are the most vulnerable to transient errors. Because these general-purpose
registers contain live data at any given time, single-bit upsets in these registers are very likely to affect application behavior. Transient errors in the processor logic may also propagate corrupt results to the registers. For example, a transient error during an arithmetic addition may write a wrong result to the destination register. These kinds of errors are very difficult to detect.

Processor Cache: The processor cache (SRAM), including the memory caches and the TLB, is at least protected by parity checks and is less vulnerable than system memory (DRAM).

System Memory: An error in software code or data may change the behavior of the program if it remains undetected. System memory is usually protected by SEC-DED ECC.

System Bus: Bus logic is also prone to soft errors and normally uses parity checks for error detection.

IO Buffers: An error in an IO buffer may corrupt information coming from the disk or the network. Distributed applications, such as those based on MPI, are more sensitive to soft errors in IO buffers.

To understand how errors affect system software, soft errors can be characterized as follows:

Overwritten: Errors detected in memory or in a register during a write may be ignored, since the content is overwritten.

User Signalable: The error is detected while reading from memory or from a processor register. An error on a memory read is considered user signalable if the location read is in user data/code space, whether the processor is executing user or kernel code. A register read is considered user signalable if the processor is executing user code. Recovery from these errors is possible by signaling the user application, either for termination or for potential application-level recovery.

Kernel Fatal: The error is detected during a read of memory located in kernel data/code space, or during a register read while executing kernel code. These errors may corrupt the kernel state and hence are called kernel fatal.
Silent Data Corruption: An error that remains undetected by all of these mechanisms may still corrupt the output. This is the most dangerous of all the possible errors, because there is little sign during execution that can alert the user.

3.1 Error Injections

Given the importance of soft errors for system software, fault injection techniques are used to study software responses to transient faults. Fault injection can be either hardware-based or software-based [5]. Hardware fault injection consists of subjecting chips to heavy-ion radiation to simulate the effect of alpha particles. In contrast, software-implemented fault injection does not require expensive equipment and can target specific software components, such as the operating system, software libraries, or applications.

Messer et al. [6] performed investigations on an IA-32 platform using watch points to simulate memory errors. A kernel /proc virtual file system interface, /proc/mfi, is used to set up a watch point. The watch point facility does not allow more than one virtual address to be monitored simultaneously. A user program randomly selects the physical address for error injection. The kernel then searches for the virtual address (kernel or user) that maps to the physical address provided by the user program. A reverse page table entry (PTE) lookup is performed when a task is first scheduled after the error was injected. A timeout-based mechanism is used for time-bound simulations.

Lu and Reed [2] used memory fault injection to target both registers and application memory regions. The fault injector employs different techniques for injecting faults into different regions of the address space. Techniques for injecting faults into IO buffers are also used.

3.2 Soft Error Analysis

Based on the simulated experiments conducted in previous research [2,6], the following insights give a better understanding of the impact on system software:
Registers and IO buffers are particularly vulnerable to single-bit-flip faults, at an average of 34.7% of all activated faults.

When IO buffer faults are activated, the chance of producing a wrong output can be quite high, ranging from 28% to 71%.

90% of memory errors need not be fatal to the operating system's execution and may require only minor support for partial recovery.

A large number of activated memory errors are overwritten. This stems from the write-before-read use of most memory locations.
Kernel-fatal memory accesses account for only a small number of all memory errors.

For user applications, memory errors in the object heap have a higher activation and susceptibility rate than those in the static data area. A large portion of heap error activations is caused by the garbage collector, and these cause fewer application errors than other sources of activation.

The above analysis clearly shows that software-based fault tolerance efforts must target processor registers along with memory, while only a few of the memory faults are actually damaging.

4 Software Based Recovery Approach

Various mechanisms have been proposed [1,7,10] through which system software tries to provide fault tolerance and higher availability guarantees. These methods depend on the level of processor support for error handling that is provided to the firmware. Recovery techniques also depend on context: fault-tolerance schemes for distributed systems could be quite different from schemes for a single system.

A generalized scheme for fault containment and recovery is presented in Figure 2. An error is typically detected at the hardware level; it is then interpreted and logged and, if needed, the next level (firmware) is notified. The interpret/log/notify phases are repeated at different levels until the error is either recovered or determined to be non-recoverable. This order of events is presented in Figure 3, where the firmware level is split into processor-specific and platform-specific parts, and the OS level into Machine Check Abort (MCA, a serious error exception) and OS-specific parts.

4.1 Error Detection

Hardware typically detects errors through parity checks or ECC. Hardware may not provide support for certain error scenarios, such as soft errors in processor logic. Even if hardware does not detect some errors, software may be able to detect the resulting inconsistencies, which typically appear as invalid pointers or incorrect checksums.
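
Checksum-based software detection can be sketched as follows. This is a minimal illustration under our own assumptions: the class and function names are hypothetical, CRC-32 stands in for whatever checksum a system would actually use, and the single bit flip is injected in software in the spirit of Section 3.1.

```python
import random
import zlib


def flip_random_bit(buf, rng):
    """Flip one randomly chosen bit of a bytearray in place; return its index."""
    bit = rng.randrange(len(buf) * 8)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bit


class GuardedRecord:
    """Payload stored together with a CRC-32 computed at write time."""

    def __init__(self, payload):
        self.payload = bytearray(payload)
        self.crc = zlib.crc32(self.payload)

    def read(self):
        # Recompute the checksum on every read and compare with the stored one.
        if zlib.crc32(self.payload) != self.crc:
            raise ValueError("checksum mismatch: payload corrupted")
        return bytes(self.payload)


record = GuardedRecord(b"message payload held in an IO buffer")
clean_copy = record.read()            # passes the check

flip_random_bit(record.payload, random.Random(0))   # simulated soft error
try:
    record.read()
    detected = False
except ValueError:
    detected = True
print("corruption detected:", detected)
```

The check turns what would otherwise be silent data corruption into a detected error that can be signaled to the application, at the cost of recomputing the checksum on each read.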
We have already discussed hardware support for error detection in Section 2. We discuss some of the well-known software-based transient error detection schemes in the following sections.

Assertions [11]

Assertions are logic statements inserted at different points in the program that reflect invariant relationships between the variables of the program. Their use can lead to various problems, since assertions are not transparent to the programmer and their effectiveness largely depends on the nature of the application and on the programmer's ability.

Control Flow Checking [14]

The basic idea of control flow checking is to partition the application program into basic blocks, i.e., branch-free parts of code. For each block a deterministic signature is computed, and faults can be detected by comparing the run-time signature with a precomputed one. In most control-flow checking techniques, one of the main problems is tuning the test granularity that should be used.

Procedure Duplication [8]

With procedure duplication, the programmer decides to duplicate the most critical procedures and to compare the obtained results. This approach requires the programmer to define a set of procedures to be duplicated and to introduce the proper checks on the results. These code modifications can only be made manually and may introduce errors.

Data and Code Redundancy [10]

Data and code redundancy is proposed to detect errors affecting both data and code. The redundancy is introduced according to a set of transformations to be performed on the high-level source code. Errors in data are detected by duplicating each variable and adding consistency checks after every read operation. Other transformations focus on errors affecting the code; they consist, on the one hand, of duplicating the code implementing each operation and, on the other hand, of adding checks that verify the consistency of the executed operations. The main advantage of the method lies in the fact that it can be applied automatically to high-level source code, thus freeing the programmer from the burden of guaranteeing its robustness against errors (e.g., by selecting what to duplicate and where to put the checks). The method is completely independent of the underlying hardware and addresses any kind of fault affecting either the code or the data.

Figure 2. Errors are detected, then the error state is logged, interpreted and recovery attempted. If unsuccessful, the next level may be notified. [7]

Figure 3. Memory failure recovery scenario. A memory error is typically detected by HW. If the error cannot be contained, FW is notified. FW gathers information and attempts recovery. Recovery is performed at the processor and at the platform level. If recovery is possible, the state is prepared for the OS and the OS is notified. The OS attempts recovery at the MCA and at the OS level. In case of successful OS recovery, the application is notified with the relevant state. The application analyzes the state and attempts to recover. All but the first arrows are optional. [7]

Directions for Improvements in Fault Detection Techniques

All the methods we have discussed for transient fault detection assume very little hardware support and are generic techniques. Because of this generic nature, the software overhead of error detection is very high, and it is highest for the scheme based on data and code redundancy. These overheads may turn out to be very costly for commonly used systems. Alternative, hardware-aware software techniques could be a more feasible way to improve error detection for the system as a whole. Some of these low-level techniques may go into firmware and others may be part of the hardware-dependent OS layer, depending on the specific hardware and OS kernel they target.
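
The data and code redundancy transformation, and the reason for its overhead, can be sketched in a few lines. This is a simplified illustration in the spirit of the scheme in [10], not its actual rule set: the class and function names are ours, and in an interpreted language the two copies are not guaranteed to occupy separate storage, so the sketch shows only the shape of the transformed code.

```python
class FaultDetected(Exception):
    """Raised when the two copies of a duplicated variable disagree."""


class Duplicated:
    """A variable kept in two copies, with a consistency check on every read."""

    def __init__(self, value):
        self.copy0 = value
        self.copy1 = value

    def write(self, value):
        # Every write is performed on both copies.
        self.copy0 = value
        self.copy1 = value

    def read(self):
        # Every read is followed by a consistency check.
        if self.copy0 != self.copy1:
            raise FaultDetected("copies differ")
        return self.copy0


def checked_add(x, y):
    """Perform the addition once per copy, then check the operands read."""
    result = Duplicated(0)
    result.copy0 = x.copy0 + y.copy0
    result.copy1 = x.copy1 + y.copy1
    x.read()
    y.read()
    return result


a = Duplicated(2)
b = Duplicated(3)
total = checked_add(a, b).read()

a.copy1 ^= 4                      # simulated single-bit upset in one copy
try:
    checked_add(a, b)
    upset_detected = False
except FaultDetected:
    upset_detected = True
print(total, upset_detected)
```

Each original addition becomes two additions plus two comparisons, and each variable takes twice the storage, which is consistent with the observation that this scheme carries the highest software overhead.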
4.2 Fault Recovery Mechanisms

Whenever a fault is detected in hardware, the processor tries to correct it. If the fault is uncorrectable, the processor tries to contain the error by giving firmware an opportunity for error recovery. We have already discussed hardware-based error recovery and containment mechanisms in Section 2. In this section we investigate how software (firmware, OS, or application) can react to errors, given that the system is capable of detecting transient faults missed by the hardware, and return to the consistent state that existed before the failure. If the error cannot be reported in an exact and restartable manner, the software needs to offer greater support for recovery. For the software to be able to restart transactions, sufficient state must be saved. Hence, for the same level of fault tolerance, software complexity increases as hardware support decreases.

Based on the classification of faults by severity in Section 3, various recovery mechanisms can be implemented in software according to the level of availability expected from the system as a whole. A few of the mechanisms for OS recovery are discussed next:

User Signalable: In the case of user-signalable errors, the state of a particular user program has become corrupt, but the processor may allow the kernel to continue operating. As a result, the kernel can signal the user task and proceed with another one, or interrupt the system call. The user program may then handle recovery at the application level, according to the application's availability requirements.

Kernel Fatal: Error recovery is possible through analysis of the kernel. For example, an error in a duplicated memory region may be recovered by re-fetching the data from the correct copy. Corruption within logs or statistical counters should not bring the system down. More complicated checkpoint-based rollback recovery mechanisms may also be implemented.

The recovery mechanism differs at, and depends on, each individual level (Hardware, Firmware, OS, and Application). For example, a distributed application may still provide availability when one node fails. Hence the outcome of recovery can be full recovery, partial recovery, or system failure, depending on the detected error. Refer to Table 1 for details. Full recovery effectively masks the errors from higher levels; the error may be logged for statistical purposes. If recovery is not possible at a particular level, the system is halted to prevent corrupt data from propagating to the network or disk.

Table 1. Failure Recovery Outcome [7]

Level        | Full Recovery       | Partial Recovery                         | System Failure
Hardware     | mask errors         | halt/downgrade performance/functionality | halt/reboot
Firmware     | mask error          | notify OS                                | reboot (notify OS)
OS           | continue to execute | notify app, kill user thread             | reboot OS
Application  | continue to execute | notify user                              | terminate application

5 Conclusion

Because of the common belief that soft errors will come to dominate all kinds of errors, increased support for soft error signaling will be required in the future, not only in hardware but also in software, to increase the availability of the system as a whole. We have discussed many error detection and recovery aspects of the system, at both the hardware and software levels.

6 Road Map

The road map for improving the availability of the system through hardware and software cooperation is summarized as follows:

A hierarchical approach to recovery from soft errors at different levels may provide an elegant solution for improved system availability. But a naive implementation of fault detection and recovery techniques at each level may be costly in terms of system performance and complexity.

The implementation of fault detection techniques needs to be balanced across levels (Hardware, Firmware, OS, and Application) and optimized for performance and complexity.
Fault recovery mechanisms at each level require a better understanding of the sensitivity of these levels to soft errors, so that recovery mechanisms can be optimized for cost and performance.

Each level may also distinguish critical data structures from a reliability point of view and indicate tolerable latencies for improved reliability.

References

[1] N. S. Bowen and D. K. Pradhan. Processor and Memory-Based Checkpoint and Rollback Recovery. IEEE Computer, 26(2):22-31, Feb.
[2] C. da Lu and D. A. Reed. Assessing Fault Sensitivity in MPI Applications. In Supercomputing, page 37, Pittsburgh, Pennsylvania, Nov.
[3] T. J. Dell. A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory. IBM Microelectronics Division, Nov.
[4] HP. Tech brief: Hot Plug RAID Memory technology for fault tolerance and scalability, Sept.
[5] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer. Fault Injection Techniques and Tools. IEEE Computer, 30(4):75-82, Apr.
[6] A. Messer, P. Bernadat, G. Fu, D. Chen, Z. Dimitrijevic, D. Lie, D. D. Mannaru, A. Riska, and D. Milojicic. Susceptibility of Commodity Systems and Software to Memory Soft Errors. IEEE Transactions on Computers, 53(12), Dec.
[7] D. Milojicic, A. Messer, J. Shau, G. Fu, P. Alto, and A. Munoz. Increasing relevance of memory hardware errors: a case for recoverable programming models. In ACM SIGOPS European Workshop, Kolding, Denmark, Sept.
[8] D. K. Pradhan. Fault-Tolerant Computer System Design. Prentice Hall PTR.
[9] N. Quach. High Availability and Reliability in the Itanium Processor. IEEE Micro, 20(5):61-69, Sept.-Oct.
[10] M. Rebaudengo, M. S. Reorda, M. Torchiano, and M. Violante. Soft-error Detection through Software Fault-Tolerance Techniques. In IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, Albuquerque, New Mexico, Nov.
[11] M. Z. Rela, H. Madeira, and J. G. Silva. Experimental Evaluation of the Fail Silent Behavior in Programs with Consistency Checks. In International Symposium on Fault-Tolerant Computing, Sendai, Japan, June.
[12] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi. Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic. In International Conference on Dependable Systems and Networks, Bethesda, Maryland, June.
[13] J. Xu, S. Chen, Z. Kalbarczyk, and R. K. Iyer. An Experimental Study of Security Vulnerabilities Caused by Errors. In International Conference on Dependable Systems and Networks, Goteborg, Sweden.
[14] S. Yau and F. Chen. An Approach to Concurrent Control Flow Checking. IEEE Transactions on Software Engineering, 6(2), Mar.
[15] J. F. Ziegler, H. W. Curtis, H. P. Muhlfeld, C. J. Montrose, B. Chin, M. Nicewicz, C. A. Russell, W. Y. Wang, L. B. Freeman, P. Hosier, L. E. LaFave, J. L. Walsh, J. M. Orro, G. J. Unger, J. M. Ross, T. J. O'Gorman, B. Messina, T. D. Sullivan, A. J. Sykes, H. Yourke, T. A. Enger, V. Tolat, T. S. Scott, A. H. Taber, R. J. Sussman, W. A. Klein, and C. W. Wahaus. IBM experiments in soft fails in computer electronics. IBM Journal of Research and Development, 40(1):3-18, Jan.
[16] J. F. Ziegler, H. P. Muhlfeld, C. J. Montrose, H. W. Curtis, T. J. O'Gorman, and J. M. Ross. Accelerated testing for cosmic soft-error rate. IBM Journal of Research and Development, 40(1):51-72, Jan.
Accurate Analysis of Single Event Upsets in a Pipelined Microprocessor M. Rebaudengo, M. Sonza Reorda, M. Violante Politecnico di Torino Dipartimento di Automatica e Informatica Torino, Italy www.cad.polito.it
More informationDATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY
WHITEPAPER DATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY A Detailed Review ABSTRACT No single mechanism is sufficient to ensure data integrity in a storage system.
More informationA Low-Power ECC Check Bit Generator Implementation in DRAMs
252 SANG-UHN CHA et al : A LOW-POWER ECC CHECK BIT GENERATOR IMPLEMENTATION IN DRAMS A Low-Power ECC Check Bit Generator Implementation in DRAMs Sang-Uhn Cha *, Yun-Sang Lee **, and Hongil Yoon * Abstract
More informationFPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes
FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes E. Jebamalar Leavline Assistant Professor, Department of ECE, Anna University, BIT Campus, Tiruchirappalli, India Email: jebilee@gmail.com
More informationECC Protection in Software
Center for RC eliable omputing ECC Protection in Software by Philip P Shirvani RATS June 8, 1999 Outline l Motivation l Requirements l Coding Schemes l Multiple Error Handling l Implementation in ARGOS
More informationError Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL
Error Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL Ch.Srujana M.Tech [EDT] srujanaxc@gmail.com SR Engineering College, Warangal. M.Sampath Reddy Assoc. Professor, Department
More informationCS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University
CS 370: SYSTEM ARCHITECTURE & SOFTWARE [MASS STORAGE] Frequently asked questions from the previous class survey Shrideep Pallickara Computer Science Colorado State University L29.1 L29.2 Topics covered
More informationI/O Hardwares. Some typical device, network, and data base rates
Input/Output 1 I/O Hardwares Some typical device, network, and data base rates 2 Device Controllers I/O devices have components: mechanical component electronic component The electronic component is the
More informationDESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE
DESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE DivyaRani 1 1pg scholar, ECE Department, SNS college of technology, Tamil Nadu, India -----------------------------------------------------------------------------------------------------------------------------------------------
More informationVery Large Scale Integration (VLSI)
Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects
More informationStorage systems. Computer Systems Architecture CMSC 411 Unit 6 Storage Systems. (Hard) Disks. Disk and Tape Technologies. Disks (cont.
Computer Systems Architecture CMSC 4 Unit 6 Storage Systems Alan Sussman November 23, 2004 Storage systems We already know about four levels of storage: registers cache memory disk but we've been a little
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity