Improving the Fault Tolerance of a Computer System with Space-Time Triple Modular Redundancy
Wei Chen, Rui Gong, Fang Liu, Kui Dai, Zhiying Wang
School of Computer, National University of Defense Technology, Changsha, Hunan, China
{chenwei, gongrui, liufang, daikui}@chiplight.com.cn; zywang@nudt.edu.cn

Abstract - Triple Modular Redundancy (TMR) is widely used in dependable system design to ensure high reliability against soft errors. Conventional TMR is effective in protecting sequential circuits but cannot mask soft errors in combinational circuits. This paper presents a new redundancy technique, Space-Time Triple Modular Redundancy (ST-TMR), which improves the soft error tolerance of combinational circuits. The usefulness of ST-TMR is demonstrated in a case study, and its delay overhead and fault tolerance are compared with those of conventional TMR. The results show that ST-TMR is more effective than conventional TMR.

Keywords: soft error, fault tolerance, reliability, space-time triple modular redundancy, sequential circuit, combinational circuit.

1 Introduction
Integrated circuits (ICs) used in computer systems and other electronic systems operating under radiation are susceptible to a phenomenon known as Single Event Upset (SEU), or soft error. A soft error is a transient effect induced by a single charged particle passing through the silicon. Because transistor dimensions keep shrinking, particles that were once considered negligible are now significant enough to cause upsets [1] that can perturb the operation of an integrated circuit.
Computer systems and other electronic systems are widely used in radiation environments such as space vehicles, satellites and some military systems. The fault tolerance and reliability of ICs must therefore be improved to keep these systems working correctly in harsh environments. Several techniques have been proposed to make designs reliable in the presence of soft errors. Triple modular redundancy (TMR) [2] is a technique commonly used to provide design hardening. It is used to protect sequential circuits, or storage elements, and conventional TMR has been proved effective for them. However, it cannot mask soft errors in combinational circuits. This paper proposes a new TMR technique called Space-Time Triple Modular Redundancy (ST-TMR) and shows that it effectively improves the fault tolerance of combinational circuits. Both conventional TMR and ST-TMR are applied to a target application, a special counter system. Random faults are injected into the counter, and the fault tolerance of conventional TMR and ST-TMR is analyzed by examining the value of the counter.

This paper is organized as follows. Section 2 introduces soft errors in sequential and combinational circuits. Section 3 reviews the conventional TMR technique. Section 4 describes the architecture and principle of ST-TMR in detail. Section 5 presents a case study of a special counter protected under both conventional TMR and ST-TMR, and Section 6 concludes the paper.

2 Soft Errors in Sequential Circuits and Combinational Circuits
The circuits of a modern processor or other electronic system fall into two basic classes: sequential circuits and combinational circuits. Soft errors have a different impact in each class, so different approaches are required to protect sequential and combinational circuits.
2.1 Soft Errors in Sequential Circuits
In current microprocessors, the main contribution to the soft error rate (SER) comes from sequential circuits. Sequential circuits here refer to storage elements in general, such as registers, memories and flip-flops. A soft error in these circuits may flip a bit of the stored state, which may lead to incorrect execution. Storage elements take up a large part of the chip area in modern microprocessors. As a result, most modern microprocessors already incorporate mechanisms for detecting or masking soft errors, such as the triple modular redundancy technique.

2.2 Soft Errors in Combinational Circuits
A particle that strikes a p-n junction within a combinational circuit may alter the value produced by the
circuit. However, a transient change in a combinational circuit does not affect the result of a computation unless it is captured by a sequential circuit, as shown in Fig. 1(a). Transient changes on the clock or reset signal, by contrast, will cause the circuit to execute incorrectly, as shown in Fig. 1(b).

Fig. 1. (a) Transient fault in the combinational circuit; (b) transient fault on the clock signal

3 Triple Modular Redundancy Technique
Triple Modular Redundancy [2, 6, 7] has been widely used to improve fault tolerance by protecting storage elements. All memory elements are triplicated and their outputs are connected to a voter, as shown in Fig. 2. The voter selects the output of the majority of the components, so if one component fails, the error is not reflected in the voter output. The voter is implemented with a few logic gates per bit, as shown in Fig. 3.

Fig. 2. Storage cell protected by TMR

Past research has shown that combinational logic is much less susceptible to soft errors than memory elements [3, 4]. It has also shown that the probability of a glitch from a combinational circuit being captured by a sequential circuit is very small. As a result, the soft error mechanisms already incorporated in most modern microprocessors typically focus on protecting sequential elements, particularly storage cells. However, feature sizes and supply and threshold voltages keep shrinking, and these trends degrade the soft error tolerance of combinational logic more than that of memory elements. In addition, higher clock frequencies increase the chance that a glitch is captured by a sequential element. The SER of combinational circuits is currently smaller than that of sequential elements. Even so, it is expected to rise by 9 orders of magnitude from 1992 to 2011, when it will equal the SER of unprotected memory elements [5].
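The majority voter described at the start of this section can be sketched as a per-bit sum of products over the three replica outputs. Fig. 3 is not reproduced here, so the AND-OR form below is an assumed gate structure, and the function name `tmr_vote` is hypothetical. The sketch shows that a single upset replica is outvoted, while two replicas with the same flipped bit defeat the voter.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three equal-width register words.

    Each output bit is 1 exactly when at least two of the corresponding input
    bits are 1: (a AND b) OR (b AND c) OR (a AND c). This gate form is an
    assumption; the voter in Fig. 3 may be drawn differently.
    """
    return (a & b) | (b & c) | (a & c)


# Hypothetical 8-bit register held in three replicas.
stored = 0b1011_0010
upset = stored ^ 0b0000_1000  # the same word with one bit flipped

# One faulty replica: the flip is outvoted.
print(bin(tmr_vote(stored, upset, stored)))  # 0b10110010
# Two replicas with the same flip: the voter follows the wrong majority.
print(bin(tmr_vote(stored, upset, upset)))   # 0b10111010
```

Because each output bit depends only on the same bit position of the three inputs, the voter is simply this 3-input majority gate replicated once per bit.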
For processors whose sequential elements are already protected, combinational logic will quickly become the dominant source of soft errors. Further research is therefore required into methods for protecting combinational logic from soft errors.

Fig. 3. Voter architecture

TMR has been proved effective in protecting memory elements, or sequential circuits. However, the conventional TMR described above cannot mask glitches from combinational circuits. As shown in Fig. 4, the redundant registers of conventional TMR are controlled by the same clock. When a glitch from the combinational circuit reaches the sequential circuit at the rising edge of the clock, all three registers capture the glitch. Similarly, when a soft error occurs on the clock or reset signal, all the redundant storage cells operate incorrectly.

4 Space-Time Triple Modular Redundancy Technique
A simple way to improve the soft error tolerance of combinational circuits is to reduce the chance that a glitch is captured by the sequential circuit. This paper proposes a new type of TMR that adds time redundancy to the space redundancy of conventional TMR (S-TMR). As shown in Fig. 5, the Space-Time Triple
Modular Redundancy (ST-TMR) also triplicates the clock, so that each of the three redundant storage elements is driven by its own clock. Skewing these clocks by a delay δ improves the fault tolerance of the combinational circuit. As long as the glitch width is smaller than the clock skew, a glitch captured at the rising edge of one clock is not captured by the other two sequential elements.

Fig. 4. Architecture of the conventional TMR in detail (reset signal is omitted)

Fig. 5. Architecture of space-time triple modular redundancy (reset signal is omitted)

ST-TMR is also effective in masking soft errors on the clock and reset signals, because these signals are triplicated. Because there is skew between the clocks, the voter of ST-TMR is modified to vote on the majority value only after all three clock signals are stable.

5 Case Study: A Counter Protected under S-TMR and ST-TMR
Conventional TMR (S-TMR) and ST-TMR have similar architectures, but they differ in delay cost and fault tolerance capability. In terms of delay, ST-TMR is slightly worse than S-TMR. As shown in Fig. 4, the delay of the S-TMR circuit is

$$t_{ff} + \delta_{com} + \delta_{voter} \qquad (1)$$

and, as shown in Fig. 5, the delay of ST-TMR is

$$t_{ff} + \delta_{com} + \delta_{voter} + 2\delta \qquad (2)$$

where $t_{ff}$ is the flip-flop delay, $\delta_{com}$ the delay of the combinational logic, $\delta_{voter}$ the voter delay and $\delta$ the clock skew. The extra $2\delta$ term reflects the skew between the first and the last of the three clocks, since the voter waits until all three clocks are stable. This increase in delay can be considered negligible compared with the improvement in fault tolerance.

To compare the two types of TMR, we target our experiment on a special counter, shown in Fig. 6. The counter is cleared when the reset signal is active. It increments by 1 at every rising edge of the clock signal if sig_full is inactive. Otherwise, if sig_full is active, it is set at the rising edge of the clock. The register in the counter can be treated as a sequential circuit, and the sig_full signal can be treated as the output of a combinational circuit. Any soft error in the combinational circuit can therefore be simulated as a glitch on sig_full. The counter is hardened using both S-TMR and ST-TMR, and soft errors are injected into it to compare the fault tolerance of conventional TMR and ST-TMR. The counter is described in VHDL and synthesized for the Xilinx XCV300 FPGA [8].

Fig. 6. The architecture of a counter

5.1 Fault Tolerance of Sequential Circuits
To investigate the fault tolerance of the sequential circuit protected under S-TMR and ST-TMR, we injected 1000 faults into the running counter within 1 ms. The sig_full signal, the reset signal, the clock signal and the voter are assumed to be fault-free. Faults are injected randomly: they can occur at any time during the 1 ms and in any of the three redundant registers. As shown in Fig. 7, both S-TMR and ST-TMR are effective in protecting the sequential circuit. ST-TMR is slightly more effective than S-TMR because its voter only works when the three clocks are stable, which reduces the chance of voting on an incorrect value.

Some soft errors still cannot be masked by either S-TMR or ST-TMR: those in which two or more soft errors occur in different redundant registers during the same clock cycle. The sequential circuit only updates at the rising edge of the clock. In such a case the voter outputs the incorrect value, and the sequential circuit loads that value at the next rising edge. However, the probability of such an event is very small. Furthermore, fault tolerance improves as the clock frequency increases, because a shorter clock period makes it less likely that two or more soft errors hit different redundant registers within the same cycle.
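The argument above can be made concrete with a small analytic model. This is a sketch under assumptions that are not stated in the paper. Upsets arrive as a Poisson process, and each upset strikes one of the three replicas uniformly at random. Every rising edge reloads all replicas from the voted value, so an upset matters only within its own clock cycle. The function names are hypothetical.

Under these assumptions, the expected number of corrupted cycles over a campaign of duration $D$ at total upset rate $\lambda$ is approximately $D\lambda^2 T/3$ for clock period $T$. It grows linearly with the period, which is consistent with the observation that fault tolerance improves at higher clock frequencies.

```python
import math


def p_cycle_corrupted(rate_per_ns: float, period_ns: float) -> float:
    """Probability that the voted value is wrong at the end of one clock cycle.

    Sketch assumptions: Poisson upsets at the given total rate, each landing in
    one of the three replicas uniformly at random, and all replicas reloaded
    with the voted value at every rising edge.
    """
    mean_per_replica = rate_per_ns * period_ns / 3.0
    p_hit = 1.0 - math.exp(-mean_per_replica)  # replica receives >= 1 upset
    # The voter fails if exactly two replicas are hit, or all three are.
    return 3.0 * p_hit**2 * (1.0 - p_hit) + p_hit**3


def expected_corrupted_cycles(faults: int, duration_ns: float, period_ns: float) -> float:
    """Expected number of corrupted cycles over one injection campaign."""
    rate_per_ns = faults / duration_ns
    cycles = duration_ns / period_ns
    return cycles * p_cycle_corrupted(rate_per_ns, period_ns)


# Campaign size from Section 5.1: 1000 faults in 1 ms (1e6 ns).
for f_mhz in (100, 50):
    period_ns = 1e3 / f_mhz
    n = expected_corrupted_cycles(1000, 1e6, period_ns)
    print(f"{f_mhz} MHz: about {n:.2f} corrupted cycles")
```

With the Section 5.1 campaign size, this toy model predicts roughly twice as many corrupted cycles at 50 MHz as at 100 MHz. Like Section 5.1, it treats the voter, clock and reset as fault-free.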
Fig. 7. Fault tolerance of a counter protected under S-TMR and ST-TMR: (a) the clock frequency is 100 MHz; (b) the clock frequency is 50 MHz. Fault tolerance on the Y-axis is the ratio of correct executions to total executions, obtained from fault injection experiments.

5.2 Fault Tolerance of Combinational Circuits
As mentioned above, sig_full can be treated as the output of a combinational circuit, so glitches can be injected on this signal to simulate soft errors in the combinational circuit. Assuming that the redundant registers, the reset signal, the clock signal and the voter are fault-free, 1000 glitches are randomly injected on sig_full within 1 ms while the counter is running. Results are shown in Table 1. All of these results would be much better at a realistic fault rate, since 1000 faults in 1 ms is an extremely high injection frequency.

Table 1. Fault tolerance of combinational circuits protected under S-TMR and ST-TMR with different clock skew, glitch width and clock frequency. δ is the clock skew.
(a) Clock frequency = 100 MHz. Columns: ST-TMR (δ = 2 ns), ST-TMR (δ = 4 ns). Values: 7% 99% 7% 4% 31% 92% 3% 17% 37%
(b) Clock frequency = 50 MHz. Columns: ST-TMR (δ = 2 ns), ST-TMR (δ = 4 ns). Values: 92% 89% 9% 49%

The fault tolerance of the combinational circuit protected by S-TMR is far lower than that of the sequential circuit. Clock skew and glitch width clearly influence the fault tolerance of the combinational circuit, whereas clock frequency does not have the same effect. There are two reasons why some soft errors still cannot be masked by ST-TMR. First, the soft errors in this experiment are injected so frequently that two or more glitches can occur in succession at more than one of the skewed rising clock edges. Second, some glitches are so wide that they cover the clock skew.

5.3 Fault Tolerance of the Clock (Reset) Signal
The clock and reset signals are global signals of the IC, and any glitch on them may cause incorrect operation.
In this experiment, 1000 glitches are randomly injected on the clock signal. The redundant registers, the sig_full signal, the reset signal and the voter are assumed to be fault-free. Results are shown in Table 2.

Table 2. Fault tolerance of the clock signal of the circuit protected under S-TMR and ST-TMR with different clock skew, glitch width and clock frequency. δ is the clock skew.
(a) Clock frequency = 100 MHz. Columns: ST-TMR (δ = 0.5 ns), ST-TMR (δ = 1 ns), ST-TMR (δ = 2 ns). Values: 9
(b) Clock frequency = 50 MHz. Columns: ST-TMR (δ = 0.5 ns), ST-TMR (δ = 1 ns), ST-TMR (δ = 2 ns). Values: 79% 83% 79% 83% 76% 77% % 85%

As expected, conventional TMR cannot mask glitches on the clock signal, whereas ST-TMR is much more effective. Experiments on the reset signal give similar results. For the same reasons given in Section 5.2, soft errors injected too frequently cannot be masked by ST-TMR.

5.4 Fault Tolerance of the Whole Counter
The sections above investigated the fault tolerance of the combinational circuit, the sequential circuit and the clock signal separately. In this section, soft errors are injected into the whole counter. Every part of the counter can be a source of soft errors: faults are injected randomly into the register, the sig_full signal, the clock signal and the reset signal. Results are shown in Fig. 8. They show again that ST-TMR is more effective in protecting integrated circuits against soft errors.
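The masking behaviour reported in Sections 5.2 to 5.4 follows from the timing condition given in Section 4. The sketch below models that condition under simplifying assumptions that are not taken from the paper:

- registers are ideal and sample instantaneously, with setup and hold times ignored;
- a single glitch appears on the shared combinational input;
- ST-TMR uses three clock edges at t, t + δ and t + 2δ, while conventional TMR uses one shared edge.

The class and function names are hypothetical. A 2-out-of-3 voter is fooled only when at least two replicas latch the glitch. With skewed clocks, that requires the glitch to be wider than δ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glitch:
    start_ns: float  # time at which the transient pulse begins
    width_ns: float  # duration of the pulse


def edge_times(edge_ns: float, skew_ns: float) -> list[float]:
    """Rising-edge sampling instants of the three redundant registers.

    skew_ns = 0 models conventional TMR with one shared clock; skew_ns = delta
    models ST-TMR with clocks at t, t + delta and t + 2 * delta (an assumption).
    """
    return [edge_ns + i * skew_ns for i in range(3)]


def captures(glitch: Glitch, t_ns: float) -> bool:
    """Ideal register: latches the glitch iff it samples inside the pulse."""
    return glitch.start_ns <= t_ns < glitch.start_ns + glitch.width_ns


def voted_output_corrupted(glitch: Glitch, edge_ns: float, skew_ns: float) -> bool:
    """The 2-out-of-3 voter is fooled only if two or more replicas latch the glitch."""
    hits = sum(captures(glitch, t) for t in edge_times(edge_ns, skew_ns))
    return hits >= 2


narrow = Glitch(start_ns=9.5, width_ns=1.0)  # covers the edge at 10 ns
wide = Glitch(start_ns=9.5, width_ns=3.0)    # wider than a 2 ns skew

print(voted_output_corrupted(narrow, 10.0, 0.0))  # True: shared clock, all three latch it
print(voted_output_corrupted(narrow, 10.0, 2.0))  # False: only the first replica latches it
print(voted_output_corrupted(wide, 10.0, 2.0))    # True: edges at 10 ns and 12 ns both latch it
```

Setting `skew_ns` to 0 reproduces the common-clock failure of conventional TMR described in Section 3, where all three registers latch the same glitch.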
Fig. 8. Fault tolerance of a counter protected under S-TMR and ST-TMR: (a) the clock frequency is 100 MHz; (b) the clock frequency is 50 MHz.

6 Conclusion
Current technology trends (increased clock frequencies, reduced feature sizes, and reduced supply and threshold voltages) have a negative effect on the soft error tolerance of circuits. They will make the soft error rate of combinational circuits rise much faster than that of sequential circuits. Computer systems and other electronic systems are increasingly used in harsh environments where soft errors occur frequently. Research is therefore required on methods for protecting combinational circuits in order to improve the fault tolerance of the whole system.

In this paper, a new TMR technique based on both space redundancy and time redundancy is proposed. ST-TMR not only protects the sequential circuit but also masks faults from the combinational circuit and the clock (reset) signal. A case study demonstrates that ST-TMR is much more effective in improving the fault tolerance and reliability of computer systems and other electronic systems, although it introduces a small delay penalty. In future work, the relationship among clock skew, clock frequency, glitch width and fault injection frequency will be studied in detail. This will help in finding the appropriate clock skew to achieve better fault tolerance for a given clock frequency and glitch width.

References
[1] A. Johnston, "Scaling and Technology Issues for Soft Error Rates," 4th Annual Research Conference on Reliability, Stanford University, Oct
[2] D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems: Design and Evaluation, Digital Press,
[3] J. Gaisler, "Evaluation of a 32-bit microprocessor with built-in concurrent error-detection," in Twenty-Seventh Annual International Symposium on Fault-Tolerant Computing, pp , 1997.
[4] P. Liden, P. Dahlgren, R. Johansson, and J. Karlsson, "On Latching Probability of Particle Induced Transients in Combinational Networks," in Proceedings of the 24th Symposium on Fault-Tolerant Computing (FTCS-24), pp ,
[5] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the effect of technology trends on the soft error rate of combinational logic," in Proceedings International Conference on Dependable Systems and Networks, pp , June
[6] C. Carmichael, "Triple Module Redundancy Design Techniques for the Virtex™ Series," Xilinx Application Note XAPP197,
[7] R. Hentschke, F. Marques, F. Lima, L. Carro, A. Susin, and R. Reis, "Analyzing area and performance penalty of protecting different digital modules with Hamming code and triple modular redundancy," in Integrated Circuits and Systems Design, 15th Symposium, pp , Sept
[8] Xilinx, Inc., Virtex 2.5 V Field Programmable Gate Arrays, Xilinx Datasheet DS003, v2.4, Oct
William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory Semiconductor Memory The basic element of a semiconductor memory is the memory cell. Although a variety of
More informationSelf-checking combination and sequential networks design
Self-checking combination and sequential networks design Tatjana Nikolić Faculty of Electronic Engineering Nis, Serbia Outline Introduction Reliable systems Concurrent error detection Self-checking logic
More informationLEON- PCI- UMC Development Board
LEON- PCI- UMC Development Board Test Report GAISLER RESEARCH / PENDER ELECTRONIC DESIGN Rev. 1.2, 2004-04- 02 LEON- PCI- UMC Development Board Test Report 2 Gaisler Resarch LEON- PCI- UMC Development
More informationMitigation of SCU and MCU effects in SRAM-based FPGAs: placement and routing solutions
Mitigation of SCU and MCU effects in SRAM-based FPGAs: placement and routing solutions Niccolò Battezzati Filomena Decuzzi Luca Sterpone Massimo Violante 1 Goal To provide solutions for increasing the
More informationSpace: The Final Frontier FPGAs for Space and Harsh Environments
Space: The Final Frontier FPGAs for Space and Harsh Environments Introduction FPGAs offer several benefits to the system designer Flexibility of Design performance, upgrades Reduction in NRE and Cost.
More informationENHANCED DYNAMIC RECONFIGURABLE PROCESSING MODULE FOR FUTURE SPACE APPLICATIONS
Enhanced Dynamic Reconfigurable Processing Module for Future Space Applications ENHANCED DYNAMIC RECONFIGURABLE PROCESSING MODULE FOR FUTURE SPACE APPLICATIONS Session: SpaceWire Missions and Applications
More informationA Hybrid Fault-Tolerant Architecture for Highly Reliable Processing Cores
J Electron Test (2016) 32:147 161 DOI 10.1007/s10836-016-5578-0 A Hybrid Fault-Tolerant Architecture for Highly Reliable Processing Cores I. Wali 1 Arnaud Virazel 1 A. Bosio 1 P. Girard 1 S. Pravossoudovitch
More informationBuilding High Reliability into Microsemi Designs with Synplify FPGA Tools
Power Matters. TM Building High Reliability into Microsemi Designs with Synplify FPGA Tools Microsemi Space Forum 2015, Synopsys 1 Agenda FPGA SEU mitigation industry trends and best practices Market trends
More informationConfigurable Fault Tolerant Processor (CFTP) for Space Based Applications
Error Interrupt Status & I/O Memory Control Clock Control Interface/switching logic Configuration Control Interface/switching (GLUE) logic EDAC TMR PRLOS PRLOS Command and Status Registers Bus Transceivers
More informationFine-Grain Redundancy Techniques for High- Reliable SRAM FPGA`S in Space Environment: A Brief Survey
Fine-Grain Redundancy Techniques for High- Reliable SRAM FPGA`S in Space Environment: A Brief Survey T.Srinivas Reddy 1, J.Santosh 2, J.Prabhakar 3 Assistant Professor, Department of ECE, MREC, Hyderabad,
More informationFPGAs operating in a radiation environment: lessons learned from FPGAs in space
Journal of Instrumentation OPEN ACCESS FPGAs operating in a radiation environment: lessons learned from FPGAs in space To cite this article: M J Wirthlin View the article online for updates and enhancements.
More informationRTG4 PLL SEE Test Results July 10, 2017 Revised March 29, 2018 Revised July 31, 2018
RTG4 PLL SEE Test Results July 10, 2017 Revised March 29, 2018 Revised July 31, 2018 Radiation Group 1 I. Introduction This document disseminates recently acquired single-event-effects (SEE) data on the
More informationData Partitioning Techniques for Partially Protected Caches to Reduce Soft Error Induced Failures
Data Partitioning Techniques for Partially Protected Caches to Reduce Soft Error Induced Failures Kyoungwoo Lee, Aviral Shrivastava, Nikil Dutt, and Nalini Venkatasubramanian Abstract Exponentially increasing
More informationA Portable and Fault-Tolerant Microprocessor Based on the SPARC V8 Architecture
A Portable and Fault-Tolerant Microprocessor Based on the SPARC V8 Architecture Jiri Gaisler Gaisler Research, 411 08 Göteborg, Sweden jiri@gaisler.com Abstract The architecture and implementation of the
More informationRedundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992
Redundancy in fault tolerant computing D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 1 Redundancy Fault tolerance computing is based on redundancy HARDWARE REDUNDANCY Physical
More informationMinimizing Single Event Upset Effects Using Synplicity
v3.0 9-2-98 Minimizing Single Event Upset Effects Using Synplicity Application Note This application note gives an overview of some single event upset (SEU) resistant design techniques and describes how
More informationARCHITECTURE DESIGN FOR SOFT ERRORS
ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan
More informationExploiting Error Detection Latency for Parity-based Soft Error Detection
Exploiting Error Detection Latency for Parity-based Soft Error Detection Gökçe Aydos University of Bremen Bremen, Germany goekce@cs.uni-bremen.de Goerschwin Fey German Aerospace Center Bremen, Germany
More informationFAULT-TOLERANCE TECHNIQUES FOR SRAM-BASED FPGAS
FAULT-TOLERANCE TECHNIQUES FOR SRAM-BASED FPGAS . FRONTIERS IN ELECTRONIC TESTING Consulting Editor Vishwani D. Agrawal Books in the series: Data Mining and Diagnosing IC Fails Huisman, L.M., Vol. 31 ISBN:
More informationValidation of the Proposed Hardness Analysis Technique for FPGA Designs to Improve Reliability and Fault-Tolerance
Validation of the Proposed Hardness Analysis Technique for FPGA Designs to Improve Reliability and Fault-Tolerance Abdul Rafay Khatri 1, Ali Hayek 2, Josef Börcsök 3 Department of Computer Architecture
More informationSequential Circuit Design: Principle
Sequential Circuit Design: Principle Chapter 8 1 Outline 1. Overview on sequential circuits 2. Synchronous circuits 3. Danger of synthesizing async circuit 4. Inference of basic memory elements 5. Simple
More informationCMP annual meeting, January 23 rd, 2014
J.P.Nozières, G.Prenat, B.Dieny and G.Di Pendina Spintec, UMR-8191, CEA-INAC/CNRS/UJF-Grenoble1/Grenoble-INP, Grenoble, France CMP annual meeting, January 23 rd, 2014 ReRAM V wr0 ~-0.9V V wr1 V ~0.9V@5ns
More informationImproved Fault Tolerant Sparse KOGGE Stone ADDER
Improved Fault Tolerant Sparse KOGGE Stone ADDER Mangesh B Kondalkar 1 Arunkumar P Chavan 2 P Narashimaraja 3 1, 2, 3 Department of Electronics and Communication, R V college of Engineering, Bangalore
More informationReliable Architectures
6.823, L24-1 Reliable Architectures Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 6.823, L24-2 Strike Changes State of a Single Bit 10 6.823, L24-3 Impact
More informationFault-tolerant system design using novel majority voters of 5-modular redundancy configuration
Fault-tolerant system design using novel majority voters of 5-modular redundancy configuration V.Elamaran, G.Rajkumar, N.Raju, K.Narasimhan, Har Narayan Upadhyay School of EEE, Department of ECE, SASTRA
More informationLatches SEU en techno IBM 130nm pour SLHC/ATLAS. CPPM, Université de la méditerranée, CNRS/IN2P3, Marseille, France
Latches SEU en techno IBM 130nm pour SLHC/ATLAS CPPM, Université de la méditerranée, CNRS/IN2P3, Marseille, France Outline Introduction Description of the DICE latch Different implemented layouts for the
More informationAn Energy Efficient Circuit Level Technique to protect Register File from MBUs and SETs in Embedded Processors
An Energy Efficient Circuit Level Technique to protect Register File from MBUs and SETs in Embedded Processors M. Fazeli, A. Namazi, S.G. Miremadi Department of Computer Engineering, Sharif University
More informationMultiChipSat: an Innovative Spacecraft Bus Architecture. Alvar Saenz-Otero
MultiChipSat: an Innovative Spacecraft Bus Architecture Alvar Saenz-Otero 29-11-6 Motivation Objectives Architecture Overview Other architectures Hardware architecture Software architecture Challenges
More informationMaximizing Logic Utilization in ex, SX, and SX-A FPGA Devices Using CC Macros
Application Note AC201 Maximizing Logic Utilization in ex, SX, and SX-A FPGA Devices Using CC Macros Table of Contents Introduction................................................ 1 SX and Related Architectures.......................................
More informationECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Availability. Copyright 2010 Daniel J. Sorin Duke University
Advanced Computer Architecture II (Parallel Computer Architecture) Availability Copyright 2010 Daniel J. Sorin Duke University Definition and Motivation Outline General Principles of Available System Design
More informationOutline. Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication. Outline
Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication Khanh N. Dang and Xuan-Tu Tran Email: khanh.n.dang@vnu.edu.vn VNU Key Laboratory for Smart Integrated Systems
More informationoutline Reliable State Machines MER Mission example
outline Reliable State Machines Dr. Gary R Burke California Institute of Technology Jet Propulsion Laboratory Background JPL MER example JPL FPGA/ASIC Process Procedure Guidelines State machines Traditional
More informationFAULT TOLERANT SYSTEMS
FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 3 - Resilient Structures Chapter 2 HW Fault Tolerance Part.3.1 M-of-N Systems An M-of-N system consists of N identical
More informationA Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup
A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington
More informationVery Large Scale Integration (VLSI)
Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects
More informationRadiation Hardened System Design with Mitigation and Detection in FPGA
Master of Science Thesis in Electrical Engineering Department of Electrical Engineering, Linköping University, 2016 Radiation Hardened System Design with Mitigation and Detection in FPGA Hampus Sandberg
More informationReliability of Programmable Input/Output Pins in the Presence of Configuration Upsets
Brigham Young University BYU ScholarsArchive All Faculty Publications 2002-01-01 Reliability of Programmable Input/Output Pins in the Presence of Configuration Upsets Paul S. Graham Nathaniel Rollins See
More informationCprE 458/558: Real-Time Systems. Lecture 17 Fault-tolerant design techniques
: Real-Time Systems Lecture 17 Fault-tolerant design techniques Fault Tolerant Strategies Fault tolerance in computer system is achieved through redundancy in hardware, software, information, and/or computations.
More informationSoft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study
Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Bradley F. Dutton, Graduate Student Member, IEEE, and Charles E. Stroud, Fellow, IEEE Dept. of Electrical and Computer Engineering
More informationReliability of Memory Storage System Using Decimal Matrix Code and Meta-Cure
Reliability of Memory Storage System Using Decimal Matrix Code and Meta-Cure Iswarya Gopal, Rajasekar.T, PG Scholar, Sri Shakthi Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India Assistant
More informationVariation-Aware Core-Level Redundancy Scheme for Reliable DSP Computation in Multi-Core Systems
Variation-Aware Core-Level Redundancy Scheme for Reliable DSP Computation in Multi-Core Systems Wei-Ching Chu, Huai-Ting Li, Ching-Yao Chou, An-Yeu (Andy) Wu Graduate Institute of Electronics Engineering,
More informationTHE scaling of device geometries improves the performance
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 62, NO. 5, MAY 2015 471 An FPGA-Based Transient Error Simulator for Resilient Circuit and System Design and Evaluation Chia-Hsiang Chen,
More informationDesigning Safe Verilog State Machines with Synplify
Designing Safe Verilog State Machines with Synplify Introduction One of the strengths of Synplify is the Finite State Machine compiler. This is a powerful feature that not only has the ability to automatically
More informationQPro XQR17V16 Radiation Hardened 16Mbit QML Configuration PROM
R DS126 (v1.0) December 18, 2003 0 8 Product Specification 0 QPro XQR17V16 Radiation Hardened 16Mbit QML Configuration PROM Features Latch-Up Immune to LET >120 MeV/cm 2 /mg Guaranteed TID of 50 krad(si)
More information