ECC Protection in Software
Center for Reliable Computing
ECC Protection in Software, by Philip P. Shirvani
RATS, June 8, 1999

Outline
- Motivation
- Requirements
- Coding Schemes
- Multiple Error Handling
- Implementation in ARGOS
- Summary

Motivation
- COTS Board in ARGOS
  * Transient errors (SEUs) in memory
  * No hardware ECC
  * Application corruption
    - Long repair time
    - Lost experiment time
  * FT in applications
    - Added complexity due to redundancy
    - Single point of failure
  * Operating system corruption

Software Implemented ECC
- Code Segments
  * Fixed content after link stage
  * Generate ECC bits on the board
  * Scrub periodically
- Data Segments
  * Read-only data
  * Stored results
  * Random reads and writes
    - Intercept store instructions
    - Inefficient in software

Previous Work
- Encoding/Decoding for Comm Systems
- Software Impl of Error Detection Codes
  * Performance comparison
  * Multiple checksum schemes [1]
  * Computation of CRC via table look-up [2]
  * Quasi-cyclic vs convolution code [3]
  * Byte-wide SEC-DED (55, 5) RS code [4]
- RAM disks of satellite memories

Requirements
- Data Bits Preserved in Codewords
  * Systematic code
- Small Overhead for ECC Bits
- Fast and Small Program
  * Encoding/decoding and correction
- Handling Multiple Errors
- Background Task
- Self-Repair
Different Codes
- Block Code
  * Communication
  * Storage
- Horizontal Code
  * e.g., over a 64-bit word
  * Memory ECC in hardware
- Vertical Code
  * Bit-sliced
- Flexibility in Software

Horizontal vs Vertical Code
- Horizontal
  * Wider memory architecture
- Vertical
  * Bit-wise logical instructions
- Multiple Bit Error
  * In a word
  * In a bit-slice
[Figure labels: 64 data bits; 4 ECC bits; 8 ECC bits]

Coding Schemes
- Detection and Correction Capability
  * Single errors
  * Multiple errors
  * Errors in words, bit-slices
- Ease of Software Implementation
  * Code size and speed
- Overhead of Check Bits

Scheme 1 - Hamming
- (72, 64) Hamming Code
  $C = [\,d_0\ d_1\ \cdots\ d_{63}\ c_0\ c_1\ \cdots\ c_7\,] = D \cdot G$
  $D = [\,d_0\ d_1\ \cdots\ d_{63}\,]$, $G = [\,I \mid P_{64 \times 8}\,]$
  $c_j = d_{i_1} \oplus d_{i_2} \oplus \cdots$ (each check bit is the XOR of a fixed subset of the data bits)
- SEC-DED
  * Independently for each bit-slice

Scheme 2 - Cyclic
- (72, 64) Cyclic Code: $P(X) = X^8 + X^7 + X^2 + 1$
- Polynomial Division
  * LFSR
- Software Implementation
  * 32 parallel LFSRs
- SEC-DED
  * Independently for each bit-slice

Scheme 3 - Parity
- Vertical + Diagonal Parity
  [Figure: data words D0, D1, D2, D3 and check words C0, C1]
- Single Bit Error Correction
  * One in whole block of 33 bits
- Double Bit Detection
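The bit-sliced (vertical) Hamming scheme above can be illustrated with a small sketch. It assumes a block of 64 data words of 32 bits each plus 8 check words, so that every bit position across the 72 words forms one (72, 64) SEC-DED codeword. The position numbering, the names `DATA_POS`, `encode`, and `scrub`, and the choice of Python are illustrative assumptions, not the ARGOS implementation.

```python
MASK = 0xFFFFFFFF  # 32-bit words, so 32 bit-slices are handled at once

# Classic Hamming numbering (an assumption of this sketch): positions 1..71,
# powers of two hold check words 0..6, the other 64 positions hold data words.
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]


def encode(data):
    """Return the 8 check words for 64 data words using only XOR."""
    check = [0] * 8
    for pos, word in zip(DATA_POS, data):
        for k in range(7):
            if (pos >> k) & 1:
                check[k] ^= word
        check[7] ^= word              # overall parity over the data words ...
    for k in range(7):
        check[7] ^= check[k]          # ... and over check words 0..6
    return check


def scrub(data, check):
    """Correct one error per bit-slice in place.

    Returns (corrected_slices, uncorrectable_slices).
    """
    fresh = encode(data)
    syn = [fresh[k] ^ check[k] for k in range(7)]   # syndrome words
    par = 0
    for word in data + check:                       # parity of all 72 words
        par ^= word
    fixed = bad = 0
    for b in range(32):                             # decide slice by slice
        s = sum(((syn[k] >> b) & 1) << k for k in range(7))
        p = (par >> b) & 1
        if s == 0 and p == 0:
            continue                                # slice is clean
        if p == 1 and s == 0:
            check[7] ^= 1 << b                      # error in the parity word
            fixed += 1
        elif p == 1 and s <= 71:
            if (s & (s - 1)) == 0:
                check[s.bit_length() - 1] ^= 1 << b  # error in a check word
            else:
                data[DATA_POS.index(s)] ^= 1 << b    # error in a data word
            fixed += 1
        else:
            bad += 1                                # double (or worse) error
    return fixed, bad
```

Encoding and syndrome generation work on whole words, which is where the bit-wise logical instructions pay off; only the final decision is made per slice.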
Scheme 4 - RS
- RS Code over $GF(2^{32})$
- $d = n - k + 1$
  * d = 3: SbEC
  * d = 4: SbEC-DbED
- Check words: $c_0 = \sum_i d_i$, $c_1 = \sum_i d_i \alpha^{i}$, $c_2 = \sum_i d_i \alpha^{2i}$
- P(X): [field polynomial; terms not recoverable from the transcription]
- Up to $2^{32} - 1$ words; 2 (d = 3) or 3 (d = 4) check words

Comparison

  Scheme      Prog Size (bytes)   Check-bit Overhead   Performance (Dec MB/s)
  Hamming     ...                 ... %                ...
  Cyclic      ...                 ... %                94
  Parity      ...                 ... %                3468
  RS (d=3)    ...                 ... % *              441

  * Block = 64 data + check words

Multiple Error Handling
- Single Particle Hit
  * Multiple errors [5-7]
- ECC Software
  * Logical view of memory
    - Programmer's view of bytes and addresses
- Mapping of Physical to Logical Bits
  * Important in code design
  * System design dependent
  * Memory structure dependent

System Level Dependency
- Memory Chip Data Width
- Example:
  * 32-bit data bus
  * 2 MB memory
    - 512K × 8-bit chips: 4
    - 512K × 1-bit chips: 32
- Errors in Different Chips
  * Independent

Memory Structure Dependency
- Physical Layout of Memory Cells
  * One or multiple arrays
  * Connection of address bits to the decoders
  * Mux layouts
  * Variable among designs

Memory Structure 1
- 512K × 8 bit
[Figure: single 512K × 8 array; A0 - A10 drive the row decoder, A11 and higher address bits drive the column decoder]
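The system-level example (a 32-bit data bus and 2 MB of memory built from 512K-word chips) can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function names are hypothetical, and the wiring rule in `chip_of_bit` (consecutive data-bus bits go to the same chip) is an assumption, since the real mapping depends on the board design.

```python
def chips_needed(bus_bits, memory_bytes, chip_words, chip_width):
    """Number of chips needed to build the memory from chip_words x chip_width parts."""
    chips_per_bank = bus_bits // chip_width     # chips side by side to fill the bus
    bank_bytes = chip_words * bus_bits // 8     # capacity of one such bank
    return chips_per_bank * (memory_bytes // bank_bytes)


def chip_of_bit(bit, chip_width):
    """Chip holding a given data-bus bit (assumed wiring: consecutive bits per chip)."""
    return bit // chip_width


def bits_sharing_chip(bit, bus_bits, chip_width):
    """Bits of the same word stored in the same chip as `bit`."""
    chip = chip_of_bit(bit, chip_width)
    return [b for b in range(bus_bits) if chip_of_bit(b, chip_width) == chip]


MB = 2 ** 20
K = 2 ** 10
print(chips_needed(32, 2 * MB, 512 * K, 8))   # 8-bit-wide chips
print(chips_needed(32, 2 * MB, 512 * K, 1))   # 1-bit-wide chips
print(bits_sharing_chip(0, 32, 8))            # word bits exposed to one chip
```

Under this assumed wiring, a hit confined to one 8-bit-wide chip can disturb up to eight bits of the same word, while with 1-bit-wide chips each bit of a word sits in a different chip, which is why the chip data width matters when the code is designed.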
Memory Structure 2
- Four arrays
[Figure: A0 - A16 address the arrays; A17, A18 drive a 1-of-4 decoder; data lines D0 - D7]

Memory Structure 3
- Two arrays
[Figure: A0 - A17 address the arrays; A18 drives a 1-of-2 decoder; data lines start at D0]

Interleaving
- Multiple Errors in Adjacent Addresses
  * e.g., 4-way interleaving
[Figure: memory blocks of 64 data + 8 ECC each; logical block of 4*64 data + 4*8 ECC]

Implementation in ARGOS
[Figure: software modules: ECC, Profiler, Collector, OS, Diagnostic, Main Control, Telemetry, Watchdog, Computations, Ground Program]
- Periodic Scrubbing

Design Framework
- Multitasking
  * Separate, high-priority task for ECC
- Memory Access Right to Code Segments
  * No protection in VxWorks
- Synchronization
  * Timer for periodic scrubbing
  * Message-passing for wake-up calls
- Automatic Addition of New Protected Blocks
  * Initialization code of each module
  * Block info sent to ECC by message

Cache Memory
- Cache Coherency in a Split Cache
  * Instructions in D-cache
  * Flush D-cache, invalidate I-cache
- Software ECC for Cache?
  * Supervisor mode; OS level
  * Direct access to data and tag
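The design framework (a separate ECC task, a timer for periodic scrubbing, message-passing for wake-up calls, and modules registering their blocks by message) can be pictured with a minimal sketch. Everything here is hypothetical: the `EccTask` class, the message names `add_block` and `scrub`, and the use of Python's `queue` module stand in for the real-time OS primitives the slides refer to.

```python
import queue


class EccTask:
    """Hypothetical ECC task that owns the list of protected blocks."""

    def __init__(self, scrub_block, period_s):
        self.scrub_block = scrub_block   # callback that checks/corrects one block
        self.period_s = period_s         # scrub period, plays the role of the timer
        self.inbox = queue.Queue()       # message queue used by the other modules
        self.blocks = []                 # protected blocks registered so far

    def step(self):
        """Handle one message, or scrub if the period elapses with no message."""
        try:
            kind, payload = self.inbox.get(timeout=self.period_s)
        except queue.Empty:
            kind, payload = "scrub", None        # timer expired
        if kind == "add_block":
            self.blocks.append(payload)          # sent by a module's init code
        elif kind == "scrub":
            for block in self.blocks:            # wake-up call or timer expiry
                self.scrub_block(block)
        return kind


# Usage: a module registers its code segment, then recovery forces a scrub.
scrubbed = []
task = EccTask(scrubbed.append, period_s=0.01)
task.inbox.put(("add_block", "module_A.text"))
task.inbox.put(("scrub", None))
task.step()   # registers the block
task.step()   # forced scrub
task.step()   # no message pending: periodic scrub after the timeout
```

In a real system `step` would run in a loop inside the high-priority task; keeping it as a single step here makes the message handling easy to follow.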
Error Recovery
- Error Detection
  * SSM, SAI, WD timer
- First Retry
  * Computation errors
- Second Retry
  * Code corrupted
  * Force ECC scrub
    - Message-passing
- Reload Module

Summary
- Hardware ECC
  * Recommended when possible
- ECC in Software
  * Provide protection for code segments
  * Coding schemes compared
- ARGOS Project
  * Continuous error collection (ECC and others)
  * Automatic recovery

Future Work
- Self-Checking and Correction for ECC Module
  * Different schemes
  * Reliability analysis
- More Efficient Schemes
- Observe Error Recoveries
  * Percentage of successful ones
  * Improvements

References
[1] Feldmeier, David C., "Fast Software Implementation of Error Detection Codes," IEEE/ACM Trans. Networking, Vol. 3, No. 6, Dec. 1995.
[2] Sarwate, Dilip V., "Computation of Cyclic Redundancy Checks via Table Look-up," Communications of the ACM, Vol. 31, No. 8, Aug. 1988.
[3] Whelan, James W., "Error Correction with a Microprocessor," Proc. IEEE National Aerospace and Electronics Conf., 1977.
[4] Hodgart, M.S., "Efficient Coding and Error Monitoring for Spacecraft Digital Memory," Int'l J. Electronics, Vol. 73, No. 1, pp. 1-36, 1992.
[5] Ziegler, J.F., et al., "IBM Experiments in Soft Fails in Computer Electronics," IBM J. Res. Develop., Vol. 40, No. 1, pp. 3-17, Jan. 1996.
[6] Liu, J., et al., "Heavy Ion Induced Single Event Effects in Semiconductor Device," Proc. Int'l Conf. on Atomic Collisions in Solids.
[7] Reed, R., et al., "Heavy Ion and Proton-Induced Single Event Multiple Upset," IEEE Trans. on Nuclear Science, Vol. 44, No. 6, pp. 4-9, July 1997.
[8] Rao, T.R.N., and E. Fujiwara, Error-Control Coding for Computer Systems, Prentice Hall.