

FPGA Architectures and Operation for Tolerating SEUs
Chuck Stroud
Electrical and Computer Engineering
Auburn University

Outline of Presentation
- Field Programmable Gate Arrays (FPGAs)
  - How Programmable Logic Works
  - Configuration Memory
- Single Event Upset (SEU) Problem in FPGAs
  - Configuration Memory
  - System Function Memory Elements
- Architectural Solutions
  - Hamming Code for Memory
  - SEU Controller for Configuration Memory
  - Triple Modular Redundancy and Guard Bands
- Operational Solutions
  - Plan for AubieSat-2
- Summary & Conclusions
/3/7 VLSI Design & Test Seminar

Basic FPGA Operation
- Writing configuration memory defines system function
  - Input/Output (I/O) cells
  - Logic in logic blocks
  - Connections between logic blocks & I/O cells
- Changing configuration memory data changes system function
  - Can change at any time
  - Partial reconfiguration
- SEUs can change configuration memory data to another function

FPGA Resources
- Almost everything in an FPGA either
  - Uses memory elements, or
  - Is controlled by configuration memory
- Table rows (columns: FPGA Resource, Small FPGA, Large FPGA)
  - Logic: PLBs per FPGA; LUTs and flip-flops per PLB
  - Routing: wire segments per PLB; PIPs per PLB
  - Specialized cores: bits per memory core; memory cores per FPGA; DSP cores
  - Other: input/output cells; configuration memory bits
[Table values as extracted: 256 39 28 42,4 / ,2 79,74,832 / 45 6 62 / 25,92 8 46 3,462 36,864 576 52]

PLB Architecture
- Look-up table (LUT) implements truth table for combinational logic functions
- Carry & control logic implements fast adders/subtractors
- Memory elements susceptible to SEUs:
  - Flip-flop/latch
  - LUTs are memory elements storing the truth table
  - In some FPGAs, LUTs can function as small RAMs
[Figure: PLB block diagram. Inputs and control signals feed the LUT/RAM; carry in and carry out pass through the carry & control logic; a flip-flop/latch with clock, enable, and set/reset drives the Q output alongside the combinational output.]

Combinational Logic Functions
- Any digital logic function can be represented by a truth table
- Multiplexer example
  - If S = 0, Z = A
  - If S = 1, Z = B
- Heavily used in FPGAs
  - S input controlled by configuration memory bit to allow selection of signal flow

Truth table
S A B | Z
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1
[Figure: multiplexer logic symbol with data inputs A and B, select input S, and output Z.]

Look-up Tables
- Configuration memory holds outputs for truth table
- Internal signals connect to control signals of multiplexers to select value of truth table for any given input value
[Figure: look-up table drawn as a multiplexer tree over the stored truth-table bits, selected by S, A, and B to produce Z, shown beside the multiplexer symbol and its truth table.]
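
The look-up table idea can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the names lut_read, MUX_CONFIG, and upset_config are hypothetical, and the bit ordering (S as the most significant address bit) is an assumption. It shows that the stored configuration bits are the function, so flipping one bit changes the logic.

```python
def lut_read(config_bits, inputs):
    """Select one stored truth-table bit; the inputs act as multiplexer select lines."""
    address = 0
    for bit in inputs:  # first input is the most significant address bit
        address = (address << 1) | bit
    return config_bits[address]

# Truth-table outputs for the 2:1 multiplexer, rows ordered (S, A, B) = 000 .. 111.
# S = 0 passes A to Z, S = 1 passes B to Z.
MUX_CONFIG = [0, 0, 1, 1, 0, 1, 0, 1]

# Model an SEU as one flipped configuration bit (row S=0, A=1, B=0).
upset_config = list(MUX_CONFIG)
upset_config[0b010] ^= 1

print(lut_read(MUX_CONFIG, (0, 1, 0)))    # intended function: Z = A
print(lut_read(upset_config, (0, 1, 0)))  # same inputs after the upset
```

After the flip, the LUT no longer implements a multiplexer for that input combination, which is the sense in which an SEU changes configuration memory data "to another function".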

Look-up Table Based RAMs
- Normal LUT mode performs read operations
- Address decoder with write enable generates load signals to latches for write operations
- Small RAMs, but can be combined for larger RAMs
[Figure: LUT-based RAM. Address inputs In0 to In2 drive both the read multiplexer (output Z) and an address decoder; with Write Enable asserted, the decoder generates load signals ld0 to ld7 that latch Data In.]
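
A behavioral sketch of the LUT-as-RAM idea follows. The class name LutRam and its methods are hypothetical, chosen only for illustration; the 8 x 1-bit size matches the three address inputs and eight load signals in the figure.

```python
class LutRam:
    """8 x 1-bit RAM modeled on the latches of a 3-input LUT (illustrative only)."""

    def __init__(self):
        self.latches = [0] * 8

    def read(self, address):
        # Normal LUT mode: the address inputs select one latch output.
        return self.latches[address]

    def write(self, address, data_in, write_enable):
        # Address decoder: at most one load signal is active, and only
        # while write enable is asserted.
        loads = [bool(write_enable) and index == address for index in range(8)]
        for index, load in enumerate(loads):
            if load:
                self.latches[index] = data_in
        return loads

ram = LutRam()
ram.write(5, 1, write_enable=1)  # stored
ram.write(2, 1, write_enable=0)  # ignored: no load signal is generated
print(ram.read(5), ram.read(2))
```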

Xilinx Virtex-4 FPGAs
- Configuration memory: 4.7M to 5.8M bits of RAM
- Logic blocks: 1,536 to 22,272
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 48 to 552
  - 18K-bit dual-port RAMs
  - Also operate as FIFOs
- DSP cores: 32 to 512, each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2

It's Getting Worse All The Time
- Smaller design rules & lower supply voltages
- M. Ohlsson, P. Dyreklev, K. Johansson, & P. Alfke, "Neutron Single Event Upsets in SRAM-Based FPGAs," Proc. 1998 IEEE Nuclear & Space Radiation Effects Conf.
  - Used radiation chamber to calculate SEU frequency at altitude of km at 6 N (Sweden)

FPGA      | Process | Vcc  | SEU every
XC4010E   | .6µm    | 5V   | .3x 6 hrs
XC4010XL  | .35µm   | 3.3V | 2.8x 5 hrs

- Increase by a factor of 2.5
- 400 slices in 4010 vs. 89,088 in Virtex-4
- Projecting this for 3 design rule shrinks & 2 voltage reductions, we get an SEU every 28.2 hrs

Hardware Solutions
- FPGA manufacturers are including some mechanisms for
  - Detecting/correcting SEUs
    - Hamming code
      - Configuration memory (SEU controller soft core)
      - RAM cores
  - Tolerating SEUs
    - Tools for Triple Modular Redundancy (TMR)
    - TMR would be used for FPGA memory elements not covered by Hamming code
    - Allows limited number of SEUs to be tolerated
- Need more & better techniques

Calculating Hamming Code
- H = # Hamming bits, D = # data bits
  - D + H + 1 ≤ 2^H
  - Hamming, BSTJ 1950
- D = 8 example
  - H1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
  - H2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7
  - H3 = D2 ⊕ D3 ⊕ D4 ⊕ D8
  - H4 = D5 ⊕ D6 ⊕ D7 ⊕ D8
- Hamming distance d = 3 = E + C + 1
  - Single bit error detection & correction (SEC): E = 1, C = 1
- Additional parity bit, d = 4 = E + C + 1
  - Parity over data & Hamming bits
  - Double error detection (DED) & single error correction (SEC): E = 2, C = 1
- E = # bit errors to detect, C = # bit errors to correct

Position  |  1  2  3  4  5  6  7  8  9 10 11 12
Bit       | H1 H2 D1 H3 D2 D3 D4 H4 D5 D6 D7 D8
Parity H1 |  x     x     x     x     x     x
Parity H2 |     x  x        x  x        x  x
Parity H3 |           x  x  x  x              x
Parity H4 |                       x  x  x  x  x

Error type              | Condition
No bit error            | Hamming match, no parity error
1-bit correctable error | Hamming mismatch, parity error
2-bit error detection   | Hamming mismatch, no parity error
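
The sizing rule and the D = 8 parity equations can be checked with a short Python sketch. The function names min_hamming_bits and encode8 and the example data word are assumptions made for illustration; the XOR terms follow the four equations on the slide, and the overall parity bit is taken as even parity over data and Hamming bits.

```python
def min_hamming_bits(data_bits):
    """Smallest H satisfying D + H + 1 <= 2**H."""
    h = 1
    while data_bits + h + 1 > 2 ** h:
        h += 1
    return h

def encode8(data):
    """Return ([H1, H2, H3, H4], overall_parity) for data = [D1, ..., D8]."""
    d1, d2, d3, d4, d5, d6, d7, d8 = data
    hamming = [
        d1 ^ d2 ^ d4 ^ d5 ^ d7,  # H1
        d1 ^ d3 ^ d4 ^ d6 ^ d7,  # H2
        d2 ^ d3 ^ d4 ^ d8,       # H3
        d5 ^ d6 ^ d7 ^ d8,       # H4
    ]
    parity = 0
    for bit in list(data) + hamming:  # parity over data and Hamming bits
        parity ^= bit
    return hamming, parity

example = [1, 0, 1, 1, 0, 0, 1, 0]  # arbitrary data word D1..D8
print(min_hamming_bits(8), min_hamming_bits(64))
print(encode8(example))
```

For D = 8 the rule gives H = 4 (8 + 4 + 1 = 13 ≤ 16), and for D = 64 it gives H = 7 (64 + 7 + 1 = 72 ≤ 128), which matches the 64-bit data, 7-bit Hamming arrangement described later for the ECC RAMs.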

Hamming Code Operation
- Example: RAM or configuration memory
- Input (generate circuit):
  - Generate Hamming code for data
  - Store data and Hamming bits
- Output (detect/correct circuit):
  - Regenerate Hamming code for data
  - Bit-wise XOR with stored Hamming bits
    - Syndrome = H stored ⊕ H regenerated
  - Non-zero syndrome indicates error detection and bit position of error bit
    - Flip that bit to correct
- Extra parity bit determines non-correctable double bit error
  - Indication can disable correction circuit to avoid further corruption
[Figure: generate circuit producing H from the input data; detect/correct circuit regenerating H from the stored data, XORing it with the stored H, and feeding the syndrome to a syndrome decoder.]

Error Detection and Correction

Position |  1  2  3  4  5  6  7  8  9 10 11 12
Bit      | H1 H2 D1 H3 D2 D3 D4 H4 D5 D6 D7 D8

- Single bit error examples (syndrome written as H4 H3 H2 H1)
  - D3 is erroneous
    - Changes H3 and H2
    - Syndrome = 0110 = bit 6
  - D6 is erroneous
    - Changes H4 and H2
    - Syndrome = 1010 = bit 10
  - Odd number of bits change
    - Overall parity bit error: SEC
- Double bit error example
  - D3 and D6 are erroneous
    - Changes H3 and H4 (but not H2)
    - Syndrome = 1100 = bit 12
    - Indicates error in D8
  - Even number of bits change
    - No overall parity error: DED
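
The three cases above can be reproduced with a small SEC-DED sketch in Python. It is a software model under stated assumptions, not the FPGA circuit: the names encode and check, the list layout (index 0 for the overall parity bit, indices 1 to 12 for the code positions), and the example data word are all hypothetical. The syndrome is computed as the XOR of the positions of all set bits, which is equivalent to XORing stored and regenerated Hamming bits.

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # code positions of D1..D8
CHECK_POS = [1, 2, 4, 8]                # code positions of H1..H4

def encode(data):
    """Build a 13-entry word: index 0 = overall parity, 1..12 = code positions."""
    word = [0] * 13
    for pos, bit in zip(DATA_POS, data):
        word[pos] = bit
    for h in CHECK_POS:            # each Hamming bit covers positions sharing its bit
        for pos in DATA_POS:
            if pos & h:
                word[h] ^= word[pos]
    for pos in range(1, 13):       # even parity over data and Hamming bits
        word[0] ^= word[pos]
    return word

def check(word):
    """Return (status, syndrome, word_after_any_correction)."""
    syndrome = 0
    for pos in range(1, 13):
        if word[pos]:
            syndrome ^= pos
    parity_error = 0
    for bit in word:
        parity_error ^= bit
    if syndrome == 0 and not parity_error:
        return "no error", 0, list(word)
    if parity_error and syndrome <= 12:
        fixed = list(word)
        fixed[syndrome] ^= 1       # syndrome 0 means the parity bit itself flipped
        return "single error corrected", syndrome, fixed
    return "uncorrectable error detected", syndrome, list(word)

good = encode([1, 0, 1, 1, 0, 0, 1, 0])

single = list(good)
single[6] ^= 1                     # upset D3 (position 6)
print(check(single)[:2])

double = list(good)
double[6] ^= 1                     # upset D3 and D6 (positions 6 and 10)
double[10] ^= 1
print(check(double)[:2])
```

In the double-error case the syndrome points at position 12 (D8), which is not actually in error; the absence of an overall parity error is what prevents a wrong "correction".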

Virtex-4 Hamming Codes
- Hamming bits stored in each frame of configuration memory
- Frame ECC circuit checks Hamming code as each frame is read & indicates
  - Single correctable errors
    - Need additional circuit to fix erroneous bit
  - Multiple non-correctable errors
    - Need to reload configuration memory
- Block RAMs
  - Contents not covered by configuration memory Hamming bits
  - RAMs have ECC mode with Hamming bits
    - Detection and correction circuitry
    - Correction only on output data
    - Need to write corrected data back in RAM

Xilinx Virtex-4 Frame ECC Circuit
- Hamming code stored in configuration memory
  - 1,312-bit frame includes
    - Up to 1,300 bits of configuration data
    - Hamming bits + overall parity bit
- Hamming code generated by configuration bit generation program and downloaded with configuration data
- Hamming code check performed on each read operation
- No bit error correction
  - Must be performed by user logic and written back to configuration memory
- Output status indications:
  - No error
  - DED
  - SEC w/ syndrome
[Figure: Frame ECC circuit. Configuration memory words, selected by the Frame Address Register, feed a Hamming code generator and a parity bit generator; the Hamming check and parity check drive the FRAME ECC indicators: error, syndrome, syndrome valid, SEC, DED.]

Xilinx's SEU Controller
- Soft core synthesized with user's design
- Sequences through frames one at a time
- Uses Frame ECC circuit and Internal Configuration Access Port (ICAP) to detect
  - Single bit detectable errors
    - PicoBlaze microcontroller corrects bit and writes frame back into configuration memory
  - Double bit non-correctable errors
- Requires 4 PLBs & 2 Block RAMs
  - 3 PLBs for PicoBlaze and RAM for program memory
  - PLBs for SEC circuit and ICAP interface
  - Plus RAM for storing and correcting frame data
- SEU controller operation (full chip @ MHz)
  - Error detection time: .2 to 4.6 msec (smallest to largest Virtex-4)
  - Error correction time: 24 to 278 msec

Complicating the Problem
- Block RAM contents not covered by configuration memory Hamming bits
  - Current program memory for PicoBlaze not SEU tolerant
- Changing data in memory elements
  - FFs & LUT-RAMs
  - Do not change Hamming bits
- Restore operation
  - Loads config memory data into FFs, LUT-RAMs, and BRAMs
- Capture operation
  - Loads FF, LUT-RAM, and BRAM contents to config memory for read
  - Destroys Hamming information
  - Cannot use Capture with SEU controller
  - Operational restrictions on FPGA for SEU tolerance
- SEU controller not SEU-tolerant
  - Need TMR SEU controller design
  - Need TMR PicoBlaze design w/ ECC RAM for program memory
  - Need to write corrected single bit errors back into program memory

Virtex-4 Block RAMs
- Contain 48 to 552 18K-bit dual-port RAMs
  - Program from 16Kx1-bit RAM to 512x36-bit RAM
  - No SEU protection in these modes of operation
- Can operate as 24 to 276 36K-bit RAMs with ECC
  - 512x72-bit RAMs
    - 64-bit data
    - 7-bit Hamming: single error correction
    - 1-bit overall parity: double error detection
- Can also operate as FIFOs
  - With or without ECC mode
[Figure: Virtex-4 floorplan with legend for DSPs, PLBs, PowerPC (PPC) cores, Block RAMs/FIFOs, and I/O buffers.]

Xilinx Virtex-4 ECC RAM
- Separate Hamming code generators
  - Separate write & read ports
- Only RAM output data corrected by ECC
  - Contents of RAM still erroneous
  - Extra circuitry to write corrected data back into RAM
  - Virtex-5 has internal correct mode
- Code word layout: Hamming bits H1 to H7 sit at positions 1, 2, 4, 8, 16, 32, and 64; data bits D1 to D64 fill the remaining positions in order (H1 H2 D1 H3 D2 D3 D4 H4 D5 ... D11 H5 D12 ... D26 H6 D27 ... D57 H7 D58 ... D64)
[Figure: ECC RAM datapath. Generate side: input data (D = 64) feeds a Hamming code generator (H = 7) and a parity bit generator; RAM core of 512 words, 64+7+1 bits/word, with separate write and read addresses. Detect/correct side: a second Hamming code generator and parity bit generator perform the Hamming check and parity check; the syndrome drives the bit error correction circuit that produces the output data; error indicators: no error, SEC, DED.]

Triple Modular Redundancy (TMR)
- Replicate modules and add majority voter(s)
  - Protects against single faults in replicated modules
- TMR SEU susceptibility problem in FPGAs
  - Single faults can cause multiple modules to fail
  - Primarily bi-directional PIPs
- TMR fault isolation with guard band regions
  - Guard bands isolate module components and routing
  - An SEU can cause errors in only one module
  - Deactivated switches isolate wire segments
[Figure: left, three modules feeding a majority voter with no isolation; right, Module 1, Module 2, and Module 3 separated by guard bands and feeding a majority voter.]
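
A bitwise majority voter can be sketched as below. This is a generic textbook voter written in Python for illustration (the function name majority and the example values are assumptions), not the voter generated by any particular TMR tool. The second example shows why isolation matters: if one fault corrupts two modules in the same way, the voter outputs the wrong value.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three module outputs (integers)."""
    return (a & b) | (b & c) | (a & c)

correct = 0b1011

# One module upset: the other two outvote it.
print(bin(majority(correct, correct, correct ^ 0b1000)))

# One fault affecting two modules identically (no guard bands): the vote is wrong.
print(bin(majority(correct ^ 0b1000, correct ^ 0b1000, correct)))
```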

Programmable Interconnect Points
- Break-point PIP
  - Connect or isolate 2 wire segments
- Cross-point PIP
  - 2 nets straight through
  - 1 net turns corner and/or fans out
- Compound cross-point PIP
  - Collection of 6 break-point PIPs
  - Can route to two isolated signal nets
- These bi-directional PIPs were a significant portion of routing resources in early FPGAs
  - Now less than .4% of routing resources
- Multiplexer PIP
  - Directional and buffered
  - Main routing resource in recent FPGAs
  - Select 1-of-n inputs for output
  - Buffer prevents some SEU effects
    - But not all; currently studying effects

Guard Bands
- Guard bands (GBs) reduce interaction of signals between modules
  - 6 CLB wide GBs
    - Good isolation but big area overhead
  - 1 CLB wide GBs
    - Some isolation
    - Turn off stub trimming to see used wire segment interaction
- Still have problems
  - Long lines
    - Long lines use bidirectional PIPs
    - PACE controls logic but not routing
- CLB isolation for fault monitoring circuits

Fault Monitoring Circuit
- Located in guard band regions
- Compares outputs of adjacent working regions
  - Can be used to compare internal nodes
  - Earlier SEU detection than output alone
- Any mismatch implies SEU occurred
- Count errors and/or take action
  - Scrub configuration memory
  - Activate SEU controller to locate/correct single bit errors
  - Failure indications point to frames to scan for errors
- PLBs for SR latch
  - Fault isolation
[Figure: guard band with fault monitor circuit comparing the output from region #1 with the output from region #2 and raising an interrupt to the SEU controller; Module 1, Module 2, and Module 3 separated by guard bands.]

Majority Voter for SEU Controller
- Adding XORs to majority voting circuit gives circular comparison of module outputs
  - Better diagnostic resolution for faulty modules to scan for SEU controller
  - Lower latency for locating/correcting SEUs
[Figure: majority voter with inputs Out1, Out2, Out3, voted output Out, and XOR comparison outputs Out1X, Out2X, Out3X.]
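
One way to read "circular comparison" is that each XOR compares a module output with the next one around the ring. The sketch below assumes that pairing (Out1 with Out2, Out2 with Out3, Out3 with Out1); the slide does not state the exact wiring, and the function name vote_with_diagnosis and the lookup table are hypothetical. With this pairing, a single faulty module shows up as the two comparisons that involve it.

```python
def vote_with_diagnosis(out1, out2, out3):
    """Majority vote on single-bit outputs plus circular XOR mismatch flags."""
    voted = (out1 & out2) | (out2 & out3) | (out1 & out3)
    # Assumed pairing: each flag compares a module with its ring neighbor.
    flags = (out1 ^ out2, out2 ^ out3, out3 ^ out1)
    # A single faulty module disagrees in both comparisons it takes part in.
    suspect = {(1, 0, 1): 1, (1, 1, 0): 2, (0, 1, 1): 3}.get(flags)
    return voted, flags, suspect

print(vote_with_diagnosis(1, 1, 1))  # all agree: no suspect
print(vote_with_diagnosis(1, 1, 0))  # module 3 disagrees
print(vote_with_diagnosis(0, 1, 1))  # module 1 disagrees
```

The suspect number is the kind of indication that could tell an SEU controller which module's frames to scan first.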

Our Plan for Virtex-4 FPGAs
- Configuration memory: 4.7M to 5.8M bits of RAM [ECC, SEU]
- PLBs: 1,536 to 22,272 [SEU, TMR]
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 24 to 276 [ECC]
  - 32K-bit ECC RAMs (ECC only)
  - Also operate as FIFOs
- DSP cores: 32 to 512 [TMR], each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2
  - Can't TMR PowerPCs!
  - Use TMR Micro- or Pico-Blaze

AUBIeSaT Plan
- Count, correct, and classify SEUs in an actual FPGA in space
  - Compare with sensor measurements
  - Determine if SEUs impact system function or not
    - Single bit correctable: with and without impact on system function
    - Double bit non-correctable: with and without impact on system function
  - Record and transmit SEU counts and types
- Tolerate/correct SEUs using various mechanisms
  - Use ECC functionality to count & correct SEUs
    - Configuration memory (w/ SEU controller circuit)
    - Block RAMs in ECC mode
    - Monitor and count failure indications

AUBIeSaT Plan
- Use TMR with guard bands for all other logic
  - Include fault monitoring circuits to detect/count SEUs
    - SEUs can occur in configuration memory & be counted twice
    - But only configuration memory ECC can correct SEUs
    - SEUs in system TMR flip-flops may be flushed out in time
  - Fault monitor failures indicate area for SEU controller scan
    - Reduces latency for detection & correction of SEU
- Include ability to download original configuration
  - To scrub memories in case of multiple non-correctable errors in configuration memory
  - Use rad-hard ROM to store configuration
  - May also periodically re-download to scrub memory

Summary
- Single Event Upsets (SEUs) in FPGAs
  - Serious problem
  - Everything controlled by configuration memory bits
- New architectural features provide indication of SEUs with ability to correct
  - SEU controller scan to detect and correct single bit errors
  - ECC Block RAM mode
- TMR with guard band regions in FPGAs
  - Isolate multiple working regions that contain functionally equivalent system functions
- Fault monitoring circuits within guard bands
  - Compare working regions
  - Detect SEUs that could impact system operation
  - Take action when mismatch occurs