
Design and Implementation of a Hardened Reconfiguration Controller for Self-Healing Systems on SRAM-Based FPGAs

A Master Thesis Presented to Politecnico di Milano

By: Naser Derakhshan
Examiner: Axel Jantsch
Supervisor: Cristiana Bolchini

Spring 2013

IN THE NAME OF GOD

Abstract

As digital systems become larger and more complex, their dependability becomes more important, particularly in mission-critical and safety-critical applications. Among the platforms available for implementing a digital system, SRAM-based Field Programmable Gate Arrays (FPGAs) are increasingly adopted in embedded systems because of their flexibility in meeting multiple requirements such as low cost, high performance, and fast turnaround time compared to fixed Application Specific Integrated Circuits (ASICs). The most attractive feature of SRAM-based FPGAs is the ability to re-program¹ the device in a few clock cycles. This feature is further enhanced by the introduction of Partial Dynamic Reconfiguration (PDR), which allows the device to be reconfigured partially and on the fly, while it is operating. Nevertheless, SRAM-based FPGAs are more susceptible to faults than other types of FPGAs and ASICs. One of these faults, which mostly occurs at higher altitudes², is a bit flip in the configuration memory caused by ionizing radiation. If this bit flip alters critical bits³ in the configuration memory, the function of the design can be corrupted. Thus, appropriate hardening techniques should be used to increase device dependability. In general, fault-tolerant techniques are mostly based on spatial redundancy. However, these techniques can be combined with the FPGA's reconfiguration capability for recovery. Since the complexity of systems is increasing and hardening techniques demand more resources, a single FPGA may not suffice to contain the whole system. In this case, multi-FPGA platforms are taken into account.

In this thesis, a hardened generic reconfiguration controller that manages the occurrence of soft errors in self-healing systems implemented on SRAM-based FPGAs is demonstrated and analyzed. The controller is able to correct SEUs in the configuration memory, in both static and partially reconfigurable regions, by means of the Xilinx PDR capability. Moreover, the controller itself is hardened with fault-tolerant techniques and is able to detect and mask its own errors. The developed controller is compared with similar approaches based on a microcontroller inside the FPGA. Finally, the presented structure is shown to be fully functional on the XUPV5-LX110T evaluation board.

¹ Re-configuring
² [...] feet and above
³ Critical bits are those bits that cause functional failure if they change state

Preface

This report is submitted as a master thesis to fulfill the requirements for the master's degree in the System-on-Chip program at the ICT School of the Royal Institute of Technology (KTH). The thesis was carried out in spring 2012 at Politecnico di Milano during an exchange study.

I would like to take this opportunity to express my sincere appreciation to Prof. Cristiana Bolchini, my supervisor at Politecnico di Milano, for her constant support, motivation and guidance during this project. Further, I would like to thank Dr. Antonio Miele, Dr. Chiara Sandionigi and Matteo Carminati for their practical advice, and all MicroLAB students for their kind support during this thesis. I would also like to express my sincere gratitude to all the KTH and Politecnico staff whose names I may not remember but who helped me a great deal in finishing my master thesis.

Last but not least, I wish to thank my parents, Akbar Derakhshan and Tooran Hamedmoghadam, whose dedication, spiritual support and encouragement throughout my life are beyond comparison. Moreover, I wish to thank my lovely wife, Zeinab Hassani, who interrupted her studies in Iran to accompany me during my studies abroad. I could not have finished my master's studies without her support.

Table of Contents

1 Introduction
2 Background and Related Work
   2.1 Motivation
   2.2 Working scenario
   2.3 Adopted Fault Model
   2.4 Self Healing System Architecture
   2.5 SEU Mitigation Schemes
   2.6 Summary
3 Proposed Controller Architecture
   3.1 Implemented design in the Master side
   3.2 Implemented design in the slave side
   3.3 Summary
4 Design Hardening
   4.1 State Machine Encoding
   4.2 Internal Signal Hardening
   4.3 Interface Hardening
   4.4 Bitstream Memory Protection
5 Test Results
6 Conclusion and Future Works
Glossary
Works Cited
Appendices
   Appendix A: Bitstream Scrubbing and Readback
   Appendix B: Redundancy
   Appendix C: Xilinx Virtex 5 overview
   Appendix D: Configuration modes in Virtex
      Configuration Modes and Pins in Virtex 5 [31]
      Serial Configuration Interface [31]

LIST OF FIGURES

FIGURE 1 BASIC PREMISE OF PARTIAL RECONFIGURATION
FIGURE 2 FT SYSTEM ON MULTI FPGA PLATFORM. DISTRIBUTED SOLUTION (LEFT); CENTRALIZED SOLUTION (RIGHT)
FIGURE 3 A CONFIGURATION CONTROLLER BLOCK DIAGRAM BASED ON MICROBLAZE
FIGURE 4 RECONFIGURATION CONTROLLER BLOCK DIAGRAM
FIGURE 5 SLAVE FPGA (LEFT) AND MASTER FPGA (RIGHT)
FIGURE 6 CONFIGURATION CONTROLLER BLOCK DIAGRAM
FIGURE 7 BLOCK DIAGRAM OF THE MASTER SIDE AND THE TOP MODULE SIGNALS
FIGURE 8 PR CONTROLLER INTERFACE
FIGURE 9 MODULES INSIDE THE TOP (MASTER SIDE)
FIGURE 10 FAULT CLASSIFIER INTERFACE
FIGURE 11 FAULT CLASSIFIER FINITE STATE MACHINE DIAGRAM
FIGURE 12 PR CONTROLLER INTERFACE
FIGURE 13 PR CONTROLLER FINITE STATE MACHINE DIAGRAM
FIGURE 14 COMPLETE BLOCK DIAGRAM
FIGURE 15 FULL CONFIGURATION CONTROLLER INTERFACE
FIGURE 16 FULL CONFIGURATION CONTROLLER FINITE STATE MACHINE
FIGURE 17 THE IMPLEMENTED DESIGN WITH AN EXTERNAL MEMORY FOR STORING PARTIAL BIT STREAM FILES
FIGURE 18 IMPLEMENTED DESIGN SLAVE SIDE
FIGURE 19 DIFFERENTIAL INPUT BUFFER PRIMITIVE (IBUFDS)
FIGURE 20 THE CONNECTION BETWEEN TWO EVALUATION BOARDS
FIGURE 21 GENERATED PR REGIONS ON THE FPGA FABRIC
FIGURE 22 A SCHEMATIC FPGA STRUCTURE. TAKEN FROM [8]
FIGURE 23 TMR BASIC PRINCIPLE
FIGURE 24 TMR DEVICE LEVEL
FIGURE 25 XILINX VIRTEX 5 XC5VLX110T DEVICE. TAKEN FROM [44]
FIGURE 26 XILINX XUPV5 LX110T EVALUATION PLATFORM. TAKEN FROM [46]
FIGURE 27 VIRTEX 5 FPGA SERIAL CONFIGURATION INTERFACE. TAKEN FROM [31]
FIGURE 28 SERIAL CONFIGURATION CLOCKING SEQUENCE. TAKEN FROM [31]
FIGURE 29 MASTER SERIAL MODE CONFIGURATION. TAKEN FROM [31]

LIST OF TABLES

TABLE 1 FPGA VS. ASIC DESIGN ADVANTAGES. TAKEN FROM [10]
TABLE 2 TOP MODULE (MASTER SIDE) INTERFACE PINS
TABLE 3 FAULT CLASSIFIER INTERFACE PINS
TABLE 4 PR CONTROLLER PIN DESCRIPTION
TABLE 5 FULL CONFIGURATION CONTROLLER. PIN DESCRIPTION
TABLE 6 BIT ORDERING FOR ICAP 8 BIT MODE
TABLE 7 BIT ORDERING
TABLE 8 DEVICE UTILIZATION SUMMARY FOR CONFIGURATION CONTROLLER (EXCLUDING BITSTREAM MODULE)
TABLE 9 CONFIGURATION TIMES FOR DIFFERENT PARTIAL BITSTREAMS
TABLE 10 RESOURCE UTILIZATION OF ICAP CONTROLLER
TABLE 11 VIRTEX 5 DEVICE FRAME COUNT, FRAME LENGTH, OVERHEAD, AND BITSTREAM SIZE [31]
TABLE 12 PERFORMANCE OVERVIEW OF MITIGATION SCHEMES. PART OF THE TABLE IS TAKEN FROM [12]
TABLE 13 VIRTEX 5 (LX110T) DEVICE SPECIFICATION TAKEN FROM [43]
TABLE 14 VIRTEX 5 CONFIGURATION MODES
TABLE 15 VIRTEX 5 FPGA SERIAL CONFIGURATION INTERFACE PINS

1 Introduction

As digital systems become larger and more complex, their dependability becomes more important, particularly in mission-critical and safety-critical applications. Among the platforms available for implementing a digital system, SRAM-based Field Programmable Gate Arrays (FPGAs) are increasingly adopted in embedded systems because of their flexibility in meeting multiple requirements such as low cost, high performance, and fast turnaround time compared to fixed Application Specific Integrated Circuits (ASICs). The most attractive feature of SRAM-based FPGAs is the ability to re-program⁴ the device in a few clock cycles, which allows the system implemented on the FPGA to be updated during the design lifetime. This is one of the reasons why SRAM-based FPGAs are considered for mission-critical applications where direct maintenance is difficult. The feature is further enhanced by the introduction of Partial Dynamic Reconfiguration (PDR), which allows the device to be reconfigured partially and on the fly while it is operating. Some advantages of using SRAM-based FPGAs in space applications are discussed in [1], [2].

Nevertheless, SRAM-based FPGAs are more susceptible to faults than other types of FPGAs and ASICs. One of these faults, which mostly occurs at higher altitudes⁵, is a bit flip in the configuration memory caused by ionizing radiation [3], [4], [5]. Ionizing radiation (such as neutrons, or alpha particles emitted by natural radioactive isotopes present in device packaging) is able to induce undesired single event effects (SEEs) in most silicon devices. SEEs that result in temporary damage to the device are called soft errors. Soft errors in FPGAs often show up as bit flips in user flip-flops, internal block memory and configuration memory. Bit flips within the configuration memory are especially challenging. If these bit flips alter the critical bits (those that cause functional failure if they change state) in the configuration memory, the function of the design can be corrupted. This is clearly unacceptable for mission-critical or safety-critical applications. Thus, appropriate hardening techniques should be applied before such devices can be deployed. In general, fault-tolerant techniques are mostly based on spatial redundancy. However, these techniques can be combined with the FPGA's reconfiguration capability for recovery. Since the complexity of modern systems is increasing and hardening techniques demand more resources, a single FPGA may not suffice to contain the whole system. In this case, multi-FPGA platforms are taken into account.

In this thesis, a generic dynamic partial reconfiguration controller for a fault-tolerant design based on a multi-FPGA platform is proposed. The final goal is to have a dependable controller that is able to recover all recoverable faults⁶ by exploiting the reconfiguration capability of the FPGAs. This controller is able to correct SEUs in the configuration memory of the neighbor FPGA by means of the Xilinx PDR⁷ capability. It can correct and classify soft errors in the configuration memory, in both static and partially reconfigurable regions. Moreover, the controller itself is hardened and is able to detect and mask its own errors.

Modern fault-tolerant architectures using PDR often rely on microprocessors such as PowerPC or MicroBlaze embedded into the FPGA as the main processing unit of the configuration controller, like the ones presented in [6], [7]. The innovative contribution of this thesis is the implementation of all necessary units and components of the FT⁸ configuration controller generically on the FPGA fabric. Moreover, this thesis focuses on multi-FPGA platforms, which are less discussed in the literature. We propose a distributed solution in which each FPGA on the multi-FPGA platform is responsible for monitoring the neighbor FPGA on the platform and, in case of faults, recovering it. This method, which is discussed in [8], increases the overall reliability compared to a centralized solution. In addition, the solution proposed in this work is able to correct single or multiple faults inside the FPGA (assuming the faults are detected).

The rest of this thesis is organized as follows. Chapter 2 introduces the preliminary aspects of the problem and the background elements needed to understand the rest of the thesis; it also discusses other SEU mitigation schemes and introduces the self-healing system architecture on which our controller is based. Chapter 3 describes the proposed controller architecture. Chapter 4 presents the design hardening of the implemented controller. Chapter 5 presents the test results. Finally, Chapter 6 draws some conclusions and gives possible future research directions.

⁴ Re-configuring
⁵ [...] feet and above
⁶ Recoverable faults are faults that do not cause permanent damage to the FPGA fabric
⁷ Partial Dynamic Reconfiguration
⁸ Fault Tolerance

2 Background and Related Work

In this thesis, we propose a dependable reconfiguration controller for embedded systems on multi-FPGA platforms. Our aim is to increase the overall reliability of the system by means of the PDR capability. The chapter is structured as follows. Section 2.1 presents the motivation for the proposed work and introduces the background elements needed to understand the rest of the thesis. Section 2.2 discusses the working scenario of this thesis and its characteristics. Section 2.3 explains the adopted fault model. Section 2.4 presents the self-healing system architecture, which we follow in the rest of the thesis. Other mitigation schemes are discussed in Section 2.5. Finally, Section 2.6 summarizes the chapter.

2.1 Motivation

Occasionally, electronic devices show erroneous behavior for no explicit reason. Through experiments and statistical analysis, scientists and engineers found that background radiation is the cause. These failures are generally rare and can be ignored for common applications. However, for many applications, such as mission-critical and safety-critical ones, it is important to consider the role of radiation in system reliability. Reliability problems due to radiation most commonly fall into the category termed single event effects (SEEs) and show up as a type of soft error called a single event upset (SEU) [9].

Among the platforms available for implementing a digital system, SRAM-based Field Programmable Gate Arrays (FPGAs) are increasingly adopted in embedded systems because of their flexibility in meeting multiple requirements such as low cost, high performance, and fast turnaround time compared to fixed Application Specific Integrated Circuits (ASICs). Table 1 compares FPGAs with ASICs in various aspects.

Table 1 FPGA vs. ASIC Design Advantages. Taken from [10]

FPGA design advantage                      Benefit
Faster time to market                      No layout, masks or other manufacturing steps are needed
No upfront non-recurring expenses (NRE)    Costs typically associated with an ASIC design
Simpler design cycle                       Due to software that handles much of the routing, placement, and timing
More predictable project cycle             Due to elimination of potential re-spins, wafer capacities, etc.
Field reprogrammability                    A new bitstream can be uploaded remotely

ASIC design advantage                      Benefit
Full custom capability                     For design, since the device is manufactured to design specs
Lower unit costs                           For very high volume designs
Smaller form factor                        Since the device is manufactured to design specs

FPGA designs offer faster time to market and lower non-recurring expenses (NRE). They also have a simpler design cycle than ASICs. However, in general, FPGA designs perform worse than ASICs in terms of logic density, circuit speed, and power consumption. In [11] the authors presented empirical measurements quantifying the gap between 90 nm CMOS FPGAs and 90 nm CMOS standard cell ASICs. They observed that for circuits implemented entirely using LUTs and flip-flops (logic-only), an FPGA is on average 40 times larger and 3.2 times slower than a standard cell implementation. An FPGA also consumes on average 12 times more dynamic power than an equivalent ASIC. Although in the past FPGAs used to be selected for lower-speed, lower-complexity, lower-volume designs, today's FPGAs easily push the 500 MHz⁹ performance barrier. With unprecedented logic density increases and a host of other features, such as embedded processors, DSP blocks, clocking, and high-speed serial at ever lower price points, FPGAs are a compelling proposition for almost any type of design [10].

The most attractive feature of SRAM-based FPGAs is the ability to re-program¹⁰ the device in a few clock cycles, which allows the system implemented on the FPGA to be updated during the design lifetime. This is one of the reasons why SRAM-based FPGAs are considered for mission-critical applications where direct maintenance is difficult. The feature is further enhanced by the introduction of Partial Dynamic Reconfiguration (PDR), which allows the device to be reconfigured partially and on the fly while it is operating.

In this thesis, we focus on SRAM-based FPGAs in multi-FPGA platforms. In an SRAM-based FPGA, the combinational and sequential logic is implemented in configurable logic blocks (CLBs), which are customized by loading configuration data (the bitstream) into the SRAM cells of the program memory [12]. Since the functionality of an SRAM-based FPGA is determined by the configuration memory, any bit flip that alters the critical bits¹¹ in the configuration memory corrupts the function of the design. Thus, to obtain a dependable system, specifically in a harsh environment, the system on the chip should be hardened using suitable FT techniques.

⁹ Xilinx Zynq-7000 technology has already passed 800 MHz
¹⁰ Re-configuring
¹¹ Critical bits are those bits that cause functional failure if they change state

2.2 Working scenario

The working scenario of this thesis is space applications where SEUs are caused by secondary particles. According to [9], secondary particles are liberated by the collision of a neutron with a silicon atom or by a contaminant emitting an alpha particle in an electronic device. The neutrons are generated when cosmic rays and protons from space interact with the atmosphere. The cosmic rays come from both inside (the sun) and outside (novas and supernovas) the solar system. The neutrons range in energy from below 1 million electron volts (MeV) to more than 1,000 MeV. Although it is possible to protect electronic equipment against these high-energy neutrons by means of shielding, this is not practical for most applications because the amount of material required for the shield is prohibitive (e.g., as much as 30 meters of water for high-energy neutrons) [9]. In addition to neutron effects, an SEU can be caused by alpha particles emitted by natural radioactive isotopes present in device material and packaging [9].

2.3 Adopted Fault Model

We can organize the effects of ionizing radiation into three main categories: transient current pulses, changes in memory values (such as bit flips or SEUs), and latch-up. The first two categories result in recoverable (or soft) faults, while latch-up, which can result in severe overheating, melting, or vaporization, can damage the FPGA fabric and results in non-recoverable (or hard) faults. Because maintenance is difficult in mission-critical applications, aging effects must be added to the above-mentioned categories. Aging effects can also lead to non-recoverable faults. Since the primary concern for FPGAs is soft faults, we expand on the first two categories in this section:

1 Transient current pulses may change the values of internal signals, or they may strike the clock line. They may have a transient effect and vanish after a short time, or they may propagate to flip-flop inputs and get registered. In both cases, they can cause an erroneous value that leads to an incorrect result at the output. Suitable error detection and masking techniques are necessary to prevent an incorrect result from propagating to other modules. Such an approach is discussed in [13], [14]. The fault can then be recovered by performing a reset.

2 The second type of recoverable fault is a change in memory values. SRAM-based FPGAs have two types of memory: the user registers and block RAMs, which store user data, and the static configuration memory, which stores the configuration bitstream. Any change in the configuration memory modifies the functionality of the system implemented inside the FPGA.
The only method to recover the configuration memory is to rewrite the corrupted portion of the configuration memory with the correct portion of the bitstream.

In this work, we concentrate on hardening the design implemented inside the FPGA against upsets in the configuration memory. The proposed controller is able to correct single or multiple bit upsets (MBUs) in the configuration memory by partially reconfiguring the corrupted portion of the memory or, in the worst case, by reconfiguring the whole FPGA.

2.4 Self Healing System Architecture

We applied a hybrid fault-tolerant technique to our multi-FPGA architecture. In this architecture, each FPGA hosts a portion of the design. The portion on each FPGA is hardened with hardware redundancy techniques and distributed among the available partially reconfigurable regions (PRR 1 to PRR n). How the system is partitioned into portions and then into n PR regions is not discussed here, since the proposed architecture does not depend on it. The hardware redundancy techniques implemented in this scenario are able to detect, locate and mask faults and to report the faults to the reconfiguration controller of the neighbor FPGA for recovery¹².

Partial Reconfiguration is the modification of an operating FPGA design by loading a partial configuration file [15]. According to the Xilinx Partial Reconfiguration User Guide, "FPGA technology provides the flexibility of on-site programming and re-programming without going through re-fabrication with a modified design. Partial Reconfiguration (PR) takes this flexibility one step further, allowing the modification of an operating FPGA design by loading a partial configuration file, usually a partial bit file. After a full bit file configures the FPGA, partial BIT files can be downloaded to modify reconfigurable regions in the FPGA without compromising the integrity of the applications running on those parts of the device that are not being reconfigured." [15] The basic block diagram of Partial Reconfiguration is illustrated in Figure 1.

Figure 1 Basic Premise of Partial Reconfiguration

In this scenario, the FPGAs are structured into two kinds of regions: a static region and several partially reconfigurable (PR) regions. The portion of the system that is implemented in the PR regions can be modified by means of the partial reconfiguration controller. The reconfigurable logic is replaced by the contents of the partial bit file. The static logic keeps functioning and is completely unaffected by the loading of a partial bit file. The static region contains the other parts of the design, which cannot (or should not) be reconfigured. The partial BIT files (PR_Bit_x.bit) should be generated offline in advance; however, they may be updated later during the design lifetime. As shown in Figure 1, each PR module can be modified by downloading one of several available partial bit files, PR_Bit_A.bit to PR_Bit_D.bit. These bit files can be stored in an external memory.
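
The split between a static region and swappable PR regions can be illustrated with a small behavioral model. This is only a sketch in Python, not part of the thesis design: the class `FpgaModel`, the file name `full.bit` and the region labels are hypothetical, and bit files are represented by their names rather than by real configuration data.

```python
from dataclasses import dataclass, field


@dataclass
class FpgaModel:
    """Toy model of an FPGA with one static region and named PR regions."""
    static_bit_file: str                             # loaded once by full configuration
    pr_regions: dict = field(default_factory=dict)   # region name -> loaded partial bit file

    def load_partial(self, region: str, bit_file: str) -> None:
        """Replace the logic of one PR region; all other regions are untouched."""
        if region not in self.pr_regions:
            raise KeyError(f"unknown PR region: {region}")
        self.pr_regions[region] = bit_file


# Hypothetical names: "full.bit" and the region labels are illustrative only.
fpga = FpgaModel("full.bit", {"PRR1": "PR_Bit_A.bit", "PRR2": "PR_Bit_B.bit"})
fpga.load_partial("PRR1", "PR_Bit_C.bit")   # swap the module hosted in PRR1
print(fpga)
```

Loading a partial bit file changes only the addressed region; the static region and the other PR regions keep their contents, which is the property that recovery by partial reconfiguration relies on.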
If the partial bit files are stored in a protected memory¹³, partial reconfiguration can improve FPGA fault tolerance by reconfiguring the faulty portion of the FPGA with a correct partial bitstream while the rest of the logic keeps functioning and is completely unaffected. Partial reconfiguration can be performed via JTAG, SelectMAP, Master Serial, or ICAP. ICAP is a comprehensive solution for PR designs because it supports readback and verification of the design after reconfiguration. The ICAP has status registers that indicate an error if the partial reconfiguration of a block has not succeeded. Furthermore, it is possible to implement a CRC checker in the PR controller to check the CRC of the received file before forwarding it to the ICAP. By using these two techniques (monitoring the ICAP registers and CRC checking), we can verify that the target FPGA has been partially reconfigured correctly.

¹² A brief introduction to hardware redundancy is available in Appendix B.
¹³ Protected against radiation

The PR approach has advantages and disadvantages. These include:

Advantages:
- Partial BIT files are calculated offline and stored in the FPGA in advance. Therefore, the controller needed for partial reconfiguration can be smaller than with other methods.
- BIT files can be updated later during the design lifetime.
- The PR flow is straightforward and can be carried out from beginning to end in the Xilinx PlanAhead software.
- The function of each partially reconfigurable region can be changed completely by using a different BIT file (the ability to time-multiplex hardware dynamically).
- Many interfaces exist for performing partial reconfiguration from outside.
- There is no need to know the memory address of the PR modules.

Disadvantages:
- Extra memory is needed to store both the full configuration and the partial reconfiguration BIT files.
- Not all implementation options are available to the PR flow (e.g., techniques that perform optimization across the entire design) [15].
- A PR design affects performance. In general, one should expect a 10% degradation in clock frequency and should not expect to exceed 80% slice packing density [15].
- Routing challenges may occur if the reconfigurable region is too small or is constructed of non-rectangular shapes [15].

We consider a distributed solution for this multi-FPGA design in which each FPGA is responsible for monitoring its neighbor FPGA and, in case of a fault, recovering the neighbor FPGA to a correct state¹⁴. An alternative is a centralized solution in which a rad-hard FPGA monitors all other FPGAs in the design. The main advantage of the distributed solution over the centralized one is that the controller does not need to reside in a separate device. It can be implemented alongside the main system on the same FPGAs [8]. Moreover, the distributed solution is independent of the number of FPGAs, whereas in the centralized solution the number of FPGAs must be defined prior to the design. In both scenarios, the original configuration bitstreams should be protected against SEUs. We will discuss this issue in Section 4.4. Figure 2 illustrates the basic principle of the distributed and centralized solutions.

¹⁴ By means of a reconfiguration controller
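
The idea of checking the CRC of a received partial bit file before forwarding it to the ICAP, mentioned earlier in this section, can be sketched in software as follows. This is a minimal illustration under stated assumptions and not the thesis implementation: it uses the standard CRC-32 from Python's `zlib` module rather than the CRC scheme of the actual Xilinx bitstream format, and `forward_if_valid`, `golden` and the `written` list standing in for the ICAP port are hypothetical names.

```python
import zlib


def forward_if_valid(bitstream: bytes, expected_crc: int, write_to_icap) -> bool:
    """Forward the partial bitstream only if its CRC-32 matches the stored value."""
    if zlib.crc32(bitstream) != expected_crc:
        return False              # corrupted in storage or in transit: do not configure
    write_to_icap(bitstream)
    return True


# Hypothetical golden partial bitstream and its CRC, computed offline.
golden = bytes(range(64))
golden_crc = zlib.crc32(golden)

written = []                      # stand-in for the ICAP write port (assumption)
ok = forward_if_valid(golden, golden_crc, written.append)

corrupted = bytearray(golden)
corrupted[10] ^= 0x04             # emulate a single bit upset in the stored file
rejected = forward_if_valid(bytes(corrupted), golden_crc, written.append)
```

In this sketch the intact file is forwarded once and the corrupted copy is rejected, so a damaged bit file never reaches the configuration port. In hardware, the same gate would sit between the bitstream memory and the ICAP.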


Figure 2 FT system on multi-FPGA platform. Distributed solution (left); centralized solution (right)

As in any other digital design, there is a tradeoff between software-based and hardware-based implementation of the above-mentioned architecture. In a software-based implementation, a soft or hard processor (such as MicroBlaze, PowerPC or ARM) is embedded into the design. The processor, as shown in Figure 3, can then manage the reconfiguration process by setting the required registers for read/write operations of the Xilinx XPS HWICAP core [16]. Although the software-based solution gives the user more flexibility, the processor itself is a point of failure and should be hardened. The only method for hardening a soft processor involves triplication, which can be very costly in terms of resource utilization.

Figure 3 A configuration controller block diagram based on MicroBlaze

In our proposed hardware-based solution, the controller is implemented purely in hardware without any processor. Implementing it this way lets the designer apply any available FT technique to the controller. In addition, a hardware implementation can be optimized for area and speed.

2.5 SEU Mitigation Schemes

Any time the FPGA is powered up, all its configuration contents are refreshed. Therefore, the simplest way to recover the FPGA to a correct condition is to power cycle it. However, this method is not applicable to many applications because it causes the FPGA to stop functioning for several seconds, a duration that many applications cannot tolerate. Moreover, the state of the FPGA is lost, and a synchronization technique is needed to resynchronize the FPGA with the other processing elements in the design. In these applications, other mitigation techniques should be deployed.

Another mitigation scheme is "bitstream scrubbing and readback" (or simply scrubbing), which means reading back the configuration bitstream stored in the configuration memory, comparing it with an original one and correcting any affected configuration bits. The process is performed continuously, independently of the occurrence of a soft error. Such an approach is discussed in [17], [18]. Since this approach is blind, it introduces latency in detecting a fault, and it may cause much more overhead than other approaches because of the continuous readback and checking¹⁵. Some work has been carried out recently to make scrubbing faster and on demand. In [19] the author proposed a constraint-driven re-placement method to reduce the number of sensitive configuration frames and consequently the scrubbing time.

The faster, on-demand solution is the modification of an operating FPGA design by loading a partial configuration bitstream. Partial reconfiguration is only a recovery technique, which means that soft errors must be detected (and located) before they can be repaired. Detection and masking can be performed by well-known hardware redundancy techniques, either triple modular redundancy (TMR) [20], [21], [22], [23] or duplication with comparison (DWC) combined with concurrent error detection (CED) [24].

A first implementation of this kind of reconfiguration controller was presented in [25]. The authors of that paper propose a distributed mesh topology in which each FPGA monitors the neighbor FPGA in a multi-FPGA platform and triggers the reconfiguration of the faulty portion of the neighbor FPGA.
However, the solution proposed in [25] for hardening the reconfiguration controller is based on blind readback and checking, which may introduce delay in recovery. Another work is presented in [16], where the author compares different software-based solutions for the reconfiguration controller to achieve the minimum reconfiguration time. However, since the reconfiguration controller is implemented in the embedded processor, hardening the controller is very difficult. The latest study of this kind is presented in [26], where the author implemented a hardware-based ICAP controller for partial reconfiguration. We will compare these approaches with our proposed controller in terms of speed and resource utilization in the upcoming discussion.

¹⁵ For more information regarding scrubbing and the Xilinx SEM controller, please refer to Appendix A.

2.6 Summary

In this chapter, we presented the requirements for a multi-FPGA system in a mission-critical application. We discussed the importance of SRAM-based FPGAs and introduced their limitations in different environments. We also included a brief comparison of performance, consumption, cost, and flexibility between SRAM-based FPGAs and similar embedded processing units. We then showed that, although SRAM-based FPGAs are attractive not only in commercial markets but also in mission-critical and safety-critical applications, special hardening techniques must be used in a harsh environment. Moreover, we described our working scenario and its characteristics. Next, we described the main types of faults that threaten electronic devices in this environment, and we discussed radiation and its effects on electronic devices in general and on SRAM-based FPGAs in particular. Furthermore, we introduced the self-healing system architecture on which our controller is based. We also discussed other possible approaches for increasing the reliability of SRAM-based FPGA designs and performed a brief literature analysis of similar approaches. In the next chapter, we discuss our proposed solution for this scenario.

3 Proposed Controller Architecture

The main problems in a fault-tolerant system are, first, to detect an error during system operation; then, to locate the error as fast as possible; next, to recover the system to a normal condition; and last, to bring the system back to the correct state. Error detection and localization can be done by means of online checkers like the one presented in [27], in which the author presents an on-line testing technique for TMR. Another approach is to combine two-rail logic and self-checking to obtain a concurrent error detection technique like the one presented in [24]. In this thesis, we focus only on fault recovery by means of the PDR capability. Our proposed solution is based on the design methodology presented in [8].

As shown in Figure 2, each FPGA (FPGA i) in our architecture hosts a reconfiguration controller. The main responsibilities of these controllers are as follows:

1. The controller has to monitor the error signals of the PR regions, the static region, and the reconfiguration controller of the next FPGA (FPGA i+1) in the proposed mesh topology.
2. In case of any error in FPGA i+1, the controller should take the appropriate action to recover FPGA i+1 to a correct condition by means of reconfiguration.
3. The controller itself should be hardened in such a way that, if a fault occurs in the controller, it detects, locates, and masks the fault and informs the reconfiguration controller in FPGA i-1 so that it can perform the recovery.

Considering these responsibilities, the controller can be organized into four main parts: the Fault Classifier, the Partial Reconfiguration (PR) Engine, the Full Reconfiguration Engine, and the Bitstream Module. The main block diagram of the controller is illustrated in Figure 4.


Figure 4 Reconfiguration Controller block diagram

The Fault Classifier has to monitor the error signals, which are encoded with the two-rail coding (TRC) technique [28], [29]. If an error is detected in a PR region, the Fault Classifier initiates the PR Engine with the address of the relevant partial bitstream. Then, it monitors the error signals again to see whether the error has been corrected. If the error is detected inside the static region or the PR controller of FPGA i+1, the Fault Classifier initiates the Full Reconfiguration Engine. The Full Reconfiguration Engine may also be initiated if the PR Engine cannot fix an error in a PR region after a specific number of tries.

The original bitstreams in our design are stored in a rad-hard external memory. The Bitstream Module is responsible for providing the protocol needed to communicate with this memory at the maximum possible speed. To achieve the maximum reconfiguration speed, partial reconfiguration is done via the Internal Configuration Access Port (ICAP) at 3.2 Gbps. However, the ICAP cannot be used for full reconfiguration; for this reason, full reconfiguration is done via the master serial configuration mode at 10 Mbps.

The controller in this thesis is implemented and tested on a two-FPGA platform (Figure 5); however, it can be extended to any number of FPGAs, since the implemented solution is independent of the number of FPGAs. In the implemented solution, the configuration controller resides in one FPGA (which we call the master), and the system that should be hardened by means of PDR resides in another FPGA (which we call the slave). In this section, we describe the implemented controller (Figure 6) in detail. We start by explaining the components on the master side and then the components on the slave side.

Figure 5 Slave FPGA (left) and master FPGA (right)

Figure 6 Configuration Controller block diagram. Slave side (Virtex 5 evaluation board, FPGA 2, target): reconfigurable modules RM1 to RM3 with their multiplexers, ICAP, static parts, LED1 to LED3, and CPLD 2, which routes the full configuration interface to the dedicated FPGA 2 configuration pins. Master side (Virtex 5 evaluation board, FPGA 1, master): PR Controller, Fault Classifier, Full Configuration Controller, BRAM holding Bitstreams 1 to 3, LED1 to LED3, CPLD 1, which routes the Platform Flash to FPGA 1, and the Platform Flash. The two boards are linked by the partial reconfiguration interface, the error signals, and the full configuration path of FPGA 2 via master serial.

3.1.1 Implemented design in the Master side

The configuration controller has to be able to reconfigure the neighbor FPGA (the slave), fully or partially. Moreover, it has to decide, based on a set of rules, whether to perform full configuration or partial reconfiguration. The block diagram of the master side is shown in Figure 7. The Fault Classifier module receives the error¹⁶ signals from the slave FPGA and sends a request to the PR Controller. Then, the PR Controller initializes the ICAP interface and sends the selected bitstream to the slave FPGA. If the error is in the static region, or the number of errors in the PR regions exceeds a specific amount (three in our case), the Fault Classifier classifies these errors as non-recoverable by PDR and sends a request to the Full Configuration Controller to download the full bitstream to the slave FPGA. The partial bitstream files are stored in the on-chip memory¹⁷, and the full bitstream is stored in the Platform Flash. We will explain later why we route the Platform Flash to the FPGA via an onboard CPLD.

Figure 7 Block diagram of the Master side and the top module signals

¹⁶ These errors can be in the PR regions or in the static region. The detection of these errors is the responsibility of the user and can be done by means of FT techniques. In this thesis, we assume that there is an error detection mechanism (such as two-rail logic combined with self-checking) on the slave side.
¹⁷ These bitstreams will be moved to an off-chip memory later.
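The decision rule described above can be illustrated with a small behavioural model in Python. This is only a sketch under stated assumptions, not the HDL of the thesis: the class and method names are hypothetical, the static-region error is passed in as a plain boolean because the text does not say how it is signalled, and a two-rail pair is treated as erroneous when both rails are equal (11 or 00), as described for the err_1 signals in the Fault Classifier section.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


def two_rail_error(pair):
    """A two-rail pair is valid when its rails differ (01 or 10)."""
    rail_a, rail_b = pair
    return rail_a == rail_b  # 00 or 11 signals an error


class FaultClassifier:
    """Behavioural sketch only; names and structure are hypothetical."""

    def __init__(self, max_pr_errors=3):
        self.max_pr_errors = max_pr_errors
        self.error_count = 0  # PR errors since the last full configuration

    def classify(self, err_pairs, static_error=False):
        """Return (action, index of the faulty PR module or None)."""
        if static_error:
            self.error_count = 0  # a full configuration restarts the count
            return Action.FULL, None
        for index, pair in enumerate(err_pairs):
            if two_rail_error(pair):
                self.error_count += 1
                if self.error_count > self.max_pr_errors:
                    self.error_count = 0
                    return Action.FULL, None
                return Action.PARTIAL, index
        return Action.NONE, None


if __name__ == "__main__":
    classifier = FaultClassifier()
    faulty = [(0, 1), (1, 1), (0, 1)]  # second PR module reports an error
    for attempt in range(4):
        print(attempt, classifier.classify(faulty))
```

With the threshold of three used in the text, the first three errors in a PR region lead to partial reconfiguration requests, and the next one is escalated to a full configuration.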

The interface of the PR controller is shown in Figure 8. Table 2 describes the top module interface.

Figure 8 PR controller interface

Table 2 Top module (Master side) interface pins

| Pin Name | Type | Description |
|---|---|---|
| err_1(1:0) | Input | Two-rail error signal from PR module one |
| err_2(1:0) | Input | Two-rail error signal from PR module two |
| err_3(1:0) | Input | Two-rail error signal from PR module three |
| CLK | Input | Main 100 MHz clock |
| RST | Input | Main reset |
| RAM_DIN | Input | Serial configuration data input, synchronous to the rising RAM_CCLK edge. This pin is connected to D0 of the Platform Flash. |
| SLAVE_CCLK | Input | Configuration clock source for all configuration modes except JTAG. This signal is connected to the CCLK of the slave FPGA. |
| SLAVE_DONE | Input | Active High signal indicating that full configuration is complete: 0 = slave FPGA not configured, 1 = slave FPGA configured. |
| SLAVE_INIT_B | Input | Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay the full configuration of the slave FPGA. After the Mode pins are sampled, INIT_B is an open-drain, active Low output indicating whether a CRC error occurred during full or partial reconfiguration: 0 = CRC error, 1 = no CRC error. |
| ICAP_INPUT_N(15:0) | Differential Output | ICAP read data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. |
| ICAP_INPUT_P(15:0) | Differential Output | ICAP read data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. |
| number_of_err(2:0) | Output | These signals, which are connected to the LEDs, indicate the number of errors that have occurred since the most recent full configuration. |
| ICAP_CE | Output | Active Low ICAP interface select. Equivalent to CS_B in the SelectMAP interface. |
| ICAP_CLK | Output | ICAP interface clock. The data are sampled on the rising edge of this clock. |
| ICAP_WRITE | Output | ICAP data flow direction. 0 = WRITE, 1 = READ. Equivalent to the RDWR_B signal in the SelectMAP interface. |
| RAM_CCLK | Output | Synchronous clock for the Platform Flash. Data is put on RAM_DIN on the rising edge of this clock. |
| RAM_CE_B | Output | Chip Enable Output. When CE is High, the Platform Flash is put into low power standby mode, the address counter is reset, and the DATA pins are put in a high impedance state. |
| RAM_INIT_B | Output | Corresponds to OE/RESET_B of the Platform Flash. When Low, this pin holds the address counter reset and the DATA output is in a high impedance state. |
| Slave_D0 | Output | Configuration DATA input pin for the slave FPGA. |
| SLAVE_PROG_B | Output | Active Low asynchronous full chip reset. This pin is connected to the PROGRAM_B of the slave FPGA. |

As can be seen in Figure 9, there are five main components inside TOP. In the following, we discuss each component in detail.

Figure 9 Modules inside the TOP (Master side)

3.1.1.1 Fault Classifier Module

The purpose of the Fault Classifier is to analyze the input error signals and decide what kind of configuration is needed to bring the faulty module on the slave side back to its initial state.

In this design, the master FPGA detects which part of the target FPGA is faulty by means of the error signals. These error signals indicate whether the fault is in the reconfigurable modules or in the static region of the target FPGA. Then, the Fault Classifier on the master side decides whether to perform partial reconfiguration or full configuration. The Fault Classifier could also reside on the slave side, eliminating the need for the error signals. In this case, the Fault Classifier would only signal to the master side which bitstream needs to be sent to the target FPGA. However, the Fault Classifier would then become a point of failure itself, and it would have to be monitored periodically by the master.

Figure 10 illustrates the Fault Classifier interface, and Table 3 describes the Fault Classifier pins.

Figure 10 Fault Classifier interface

Table 3 Fault Classifier interface pins

| Pin Name | Type | Description |
|---|---|---|
| CLK | Input | Main clock for the Fault Classifier. It can be up to 100 MHz; we connect it to the 6.25 MHz DCM clock. |
| RST | Input | Main reset. This pin is connected to the main system reset push button. |
| START_WRITE | Input | This signal is connected to a debounced push button and is used to pause at the beginning of the state machine. It will be removed in the final design. |
| SLAVE_DONE | Input | Active High signal indicating that full configuration is complete: 0 = slave FPGA not configured, 1 = slave FPGA configured. |
| PR_DONE | Input | Indicates that the partial reconfiguration has finished. |
| Err_1(1:0) | Input | Two-rail error signal from PR module one |
| Err_2(1:0) | Input | Two-rail error signal from PR module two |
| Err_3(1:0) | Input | Two-rail error signal from PR module three |
| Start_PR_out | Output | A request to the PR Controller to start partial reconfiguration. |
| Start_full | Output | A request to the Full Configuration Controller to start full reconfiguration. |
| Number_of_err(2:0) | Output | These signals, which are connected to the LEDs, indicate the number of errors that have occurred since the most recent full configuration. |
| PR_Select_out(1:0) | Output | Two-bit signal that indicates which portion of the slave is faulty and which bitstream should be sent to the slave side. It is sampled on the rising edge of Start_PR_out. |
| Current_state(4:0) | Output | This output buffer is needed for one-hot state encoding. For more details, please refer to [30]. |
| Err_in_classifier(1:0) | Output | This two-rail signal indicates the presence of an error in the classifier state machine. |


The error signals inform the Fault Classifier that there is an error in the PR regions or the static region of the slave FPGA. Then, depending on the type of error, the Fault Classifier takes the proper action. The FSM diagram is shown in Figure 11. For instance, if the error is in IRA_1, we will get 11 or 00 on the err_1 signals, and the Fault Classifier will go to the Init_PR state, which sends the PR request to the PR Controller.

Figure 11 Fault Classifier finite state machine diagram

3.1.1.2 PR Controller

Partial reconfiguration can be done via JTAG, SelectMAP, Master Serial, or ICAP. In the proposed design, partial reconfiguration is done via the Internal Configuration Access Port (ICAP). ICAP is a comprehensive solution for PR designs because it can perform readback and verify the design after reconfiguration. There are status registers in ICAP that indicate an error if the partial reconfiguration of a block has not succeeded. Furthermore, we can add a CRC checker at the slave side. The CRC checker in the PR controller checks the CRC of the received file before forwarding it to the ICAP. By using these two techniques (monitoring the ICAP registers and CRC checking), we can be sure that the target FPGA is partially reconfigured correctly. In the current design, none of the above-mentioned methods has been implemented yet; for now, we focus only on getting the overall system to work correctly. These features can be considered future work.

Figure 12 illustrates the PR Controller interface, and Table 4 describes the PR Controller pins.


Figure 12 PR controller interface

Table 4 PR Controller pin description

| Pin Name | Type | Description |
|---|---|---|
| CLK | Input | Main clock for the PR Controller. It can be up to 100 MHz; we connect it to the 6.25 MHz DCM clock. |
| START_WRITE | Input | This signal is connected to Start_PR_out of the Fault Classifier module. |
| PR_Select(1:0) | Input | Two-bit signal that indicates which portion of the slave is faulty and which bitstream should be sent to the slave side. It is sampled on the rising edge of Start_PR_out of the Fault Classifier module. |
| PR_DONE | Output | Indicates that the partial reconfiguration has finished. |
| ICAP_CE | Output | Active Low ICAP interface select. Equivalent to CS_B in the SelectMAP interface. |
| ICAP_write | Output | ICAP data flow direction. 0 = WRITE, 1 = READ. Equivalent to the RDWR_B signal in the SelectMAP interface. |
| ICAP_CLK | Output | ICAP interface clock. The data are sampled on the rising edge of this clock. |
| Current_state(4:0) | Output | This is the output buffer, which is needed for one-hot state encoding. For more details, please refer to [30]. |
| Err_in_PR(1:0) | Output | This two-rail signal indicates the presence of an error in the PR controller state machine. |
| ICAP_INPUT_P(15:0) | Output | ICAP read data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. |
| ICAP_INPUT_N(15:0) | Output | ICAP read data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. |

The FSM diagram of the PR controller is shown in Figure 13. The controller enters the first state after a rising edge on start_pr is received. To enable ICAP, we first assert ICAP_write and then, one clock cycle later, ICAP_CE. In the third state, ICAP_CLK is enabled and the process of sending the bitstream to the slave FPGA starts. At each rising clock edge, the RAM address counter is incremented and one 16-bit word is sent to the slave FPGA. After one partial bitstream (12174 words in our case) has been sent completely, the FSM enters its final state and deactivates the ICAP primitive. Moreover, PR_DONE is asserted in the final state to inform the Fault Classifier that the partial reconfiguration is complete. The ICAP clock works correctly at frequencies up to 100 MHz. The controller has been tested and verified at clock speeds from 6.25 MHz to 100 MHz.
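As a rough check of the figures above, the transfer time of one partial bitstream can be worked out from the word count and the clock frequency. The Python sketch below assumes exactly one 16-bit word per clock cycle and ignores the few setup cycles of the FSM; the function names are illustrative and not part of the design.

```python
WORDS_PER_BITSTREAM = 12174  # partial bitstream length quoted in the text
WORD_BITS = 16               # one 16-bit word is sent per clock cycle


def transfer_time_us(clock_hz, words=WORDS_PER_BITSTREAM):
    """Microseconds needed to stream `words` words at one word per clock."""
    return words * 1_000_000 / clock_hz


def throughput_mbps(clock_hz, word_bits=WORD_BITS):
    """Raw data rate in Mbit/s for one word of `word_bits` bits per clock."""
    return clock_hz * word_bits / 1_000_000


if __name__ == "__main__":
    for clock_hz in (6_250_000, 100_000_000):
        print(
            f"{clock_hz / 1e6:g} MHz: "
            f"{transfer_time_us(clock_hz):.2f} us per partial bitstream, "
            f"{throughput_mbps(clock_hz):g} Mbit/s"
        )
```

Under these assumptions, one partial bitstream takes about 1.95 ms at 6.25 MHz and about 122 µs at 100 MHz. A 16-bit bus at 100 MHz carries 1.6 Gbit/s, so the 3.2 Gbps figure quoted for ICAP would correspond to a 32-bit word per clock at the same frequency.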


Figure 13 PR controller finite state machine diagram

3.1.1.3 Full Configuration Controller

3.1.1.3.1 Implemented circuit for full configuration controller

In this design, a master serial configuration¹⁸ interface is implemented. The slave FPGA¹⁹ is responsible for the configuration clock. The dedicated configuration pins of the slave FPGA are routed via the CPLD on the slave side, the master FPGA, and, eventually, the CPLD on the master side to the Platform Flash on the master side. We use the CPLDs to access the hardwired dedicated configuration pins for full configuration. The master FPGA can initiate a full configuration by driving the dedicated PROGRAM_B pin of the slave FPGA Low. After PROGRAM_B is released, the slave FPGA enters its configuration mode and starts clocking the Platform Flash and receiving the bitstream data, one bit per clock. After configuration is finished, the slave FPGA signals DONE to the Full Configuration Controller to indicate completion of the configuration procedure. A block diagram of the full configuration system is shown in Figure 14.

Figure 14 Complete block diagram. Slave side (Virtex 5 evaluation board): FPGA 2 (target) and CPLD 2, which routes the full configuration interface to the dedicated FPGA 2 configuration pins. Master side (Virtex 5 evaluation board): FPGA 1 (master) with the Full Configuration Controller, CPLD 1, which routes the Platform Flash to FPGA 1, and the Platform Flash. FPGA 2 is fully configured via master serial.

¹⁸ Information about the Virtex 5 configuration modes is available in Appendix D.
¹⁹ Here, the slave FPGA means the FPGA on the slave side.
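The handshake described above (drive PROGRAM_B Low, release it, wait for DONE) can be sketched as a cycle-level model in Python. This is an illustration only: the state names are hypothetical and do not come from the controller's FSM diagram, and the 250 ns minimum Low time is the value the thesis quotes for PROGRAM_B in the description of the Full Configuration Controller FSM.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    PULSE_PROG_B = 1
    WAIT_DONE = 2
    DONE = 3


def min_pulse_cycles(pulse_ns, clk_hz):
    """Smallest whole number of clock cycles covering pulse_ns nanoseconds."""
    return (pulse_ns * clk_hz + 999_999_999) // 1_000_000_000


class FullConfigController:
    """Hypothetical cycle-level model of the sequence described in the text."""

    def __init__(self, clk_hz, min_pulse_ns=250):
        self.state = State.IDLE
        self.pulse_cycles = min_pulse_cycles(min_pulse_ns, clk_hz)
        self.remaining = 0
        self.prog_b = 1  # PROGRAM_B is active Low, so it idles High

    def tick(self, start_full=False, slave_done=False):
        """Advance one clock cycle and return the PROGRAM_B level driven."""
        if self.state is State.IDLE and start_full:
            self.state = State.PULSE_PROG_B
            self.remaining = self.pulse_cycles
        if self.state is State.PULSE_PROG_B:
            if self.remaining > 0:
                self.prog_b = 0
                self.remaining -= 1
            else:
                self.prog_b = 1  # release; the slave starts clocking the flash
                self.state = State.WAIT_DONE
        elif self.state is State.WAIT_DONE and slave_done:
            self.state = State.DONE
        return self.prog_b
```

With the 6.25 MHz controller clock (160 ns period), two cycles are needed to cover 250 ns; with a 100 MHz clock, 25 cycles are needed.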


3.1.1.3.2 The Full Configuration Controller architecture

Figure 15 shows the Full Configuration Controller interface, and Table 5 shows the pin description.

Figure 15 Full Configuration Controller interface

Table 5 Full configuration controller pin description

| Pin Name | Type | Description |
|---|---|---|
| CLK | Input | Main clock for the Full Configuration Controller. It can be up to 100 MHz; we connect it to the 6.25 MHz DCM clock. This clock drives the internal state machine and is different from the configuration clock (CCLK). |
| RST | Input | Main reset, which is connected to the main reset push button. |
| START_config | Input | This signal is connected to Start_full of the Fault Classifier module. A rising edge on this signal indicates the start of full configuration. |
| RAM_DIN | Input | Serial configuration data input, synchronous to the rising RAM_CCLK edge. This pin is connected to D0 of the Platform Flash. |
| SLAVE_CCLK | Input | Configuration clock source for all configuration modes except JTAG. This signal is connected to the CCLK of the slave FPGA. |
| SLAVE_DONE | Input | Active High signal indicating that full configuration is complete: 0 = slave FPGA not configured, 1 = slave FPGA configured. |
| SLAVE_INIT_B | Input | Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay the full configuration of the slave FPGA. After the Mode pins are sampled, INIT_B is an open-drain, active Low output indicating whether a CRC error occurred during full or partial reconfiguration: 0 = CRC error, 1 = no CRC error. |
| RAM_CE_B | Output | Chip Enable Output. When CE is High, the Platform Flash is put into low power standby mode, the address counter is reset, and the DATA pins are put in a high impedance state. |
| RAM_INIT_B | Output | Corresponds to OE/RESET_B of the Platform Flash. When Low, this pin holds the address counter reset and the DATA output is in a high impedance state. |
| RAM_CCLK | Output | Synchronous clock for the Platform Flash. Data is put on RAM_DIN on the rising edge of this clock. |
| SLAVE_D0 | Output | Configuration DATA input pin for the slave FPGA. |
| SLAVE_PROG_B | Output | Active Low asynchronous full chip reset. This pin is connected to the PROGRAM_B of the slave FPGA. |
| Current_state(4:0) | Output | This is the output buffer, which is needed for one-hot state encoding. For more details, please refer to [30]. |
| Err_in_full(1:0) | Output | This two-rail signal indicates the presence of an error in the full configuration controller state machine. |

The FSM diagram of the Full Configuration Controller is shown in Figure 16. In this controller, we control the PROGRAM_B pin of the slave FPGA. The other configuration pins on the slave side are connected to the interface of the Platform Flash directly²⁰. When the Start_full command is received from the Fault Classifier, the Full Configuration Controller drives PROGRAM_B Low for at least 250 ns²¹. Then it releases PROGRAM_B and monitors the SLAVE_DONE signal; when it is received, the controller finishes its job by entering the done state.

Figure 16 Full Configuration Controller finite state machine

Digital Clock Manager (DCM)

The Digital Clock Manager (DCM) is a primitive in Xilinx FPGAs that can be used to implement a delay-locked loop, a digital frequency synthesizer, a digital phase shifter, or a digital spread spectrum. In this design, the DCM is used to reduce the 100 MHz clock frequency. In fact, the implemented design does not need a DCM, because it can work with the 100 MHz clock directly; however, if slave serial configuration is used instead of master serial configuration, the maximum clock speed must be less than 20 MHz²², and in this case a DCM is needed. In practice, the maximum clock speed should not exceed 16 MHz²³ when using slave serial mode.

Debounce module

A simple debounce module is implemented to debounce the input push buttons for the reset and start write signals. The debounce module for start write is not shown in Figure 9.

Bit Stream Module

Up to now, the partial bitstream files have been stored in on-chip BRAMs. These files should be protected against any SEU; therefore, in the final product, they should be moved to a radiation-hardened memory.
Since we do not have access to any rad-hard memory in this project, we have used a simple I2C memory to test our design. The new block diagram of the whole system, with an external memory for storing the partial bitstream files, is shown in Figure 17.

Figure 17 The implemented design with an external memory for storing partial bitstream files

In this design, the responsibility of the Bit Stream Module is to refresh the content of the BRAM every n minutes. This refresh interval can be changed based on the application and the environment in which the system is deployed. Another solution for the Bit Stream Module is to send the data from the external memory to the PR controller directly. This solution is suitable when the bitstream files are too large to be stored in on-chip memory all at the same time. However, in this case, the interface speed of the external memory would limit the partial reconfiguration speed, and we could no longer benefit from the 400 MB/s²⁴ configuration speed. Since the bit files are small enough in our project, we kept the main idea of using BRAM and added a Bit Stream Module to refresh the content of the BRAM periodically.

Implemented design in the slave side

Figure 18 shows the block diagram of the design on the slave board. As previously mentioned, we assume that the required system is implemented on the slave side. This part consists of partial reconfiguration regions and a static part.

²⁰ Indirectly, via two CPLDs and one FPGA.
²¹ This is the minimum required time for PROGRAM_B to remain asserted.
²² This is the maximum clock frequency of the Platform Flash.
²³ This is the maximum clock frequency that we have reached for the XCF32P.
²⁴ This is the maximum reachable ICAP speed.

Figure 18 Implemented design, slave side. Virtex 5 evaluation board (slave side), FPGA 2 (target): reconfigurable modules RM1 to RM3 with their multiplexers, ICAP, static parts, the partial reconfiguration interface, the error signals, and CPLD 2, which routes the full configuration interface to the dedicated FPGA 2 configuration pins.

Static Region

The static region contains the parts that cannot or should not be reconfigured. These items could be ICAP_VIRTEX5, I/O buffers, or DCMs.

ICAP_VIRTEX5 [31]

The ICAP_VIRTEX5 primitive works the same way as the SelectMAP configuration interface, except that it is on the fabric side and has separate read and write buses, as opposed to the bidirectional bus in SelectMAP. The general SelectMAP timing diagrams and the SelectMAP bitstream ordering information described in the SelectMAP Configuration Interface section of the user guide [31] are also applicable to ICAP. ICAP allows the user to access configuration registers, read back configuration data, or partially reconfigure the FPGA after configuration is done. ICAP has three data width selections through the ICAP_WIDTH parameter: x8, x16, and x32. The two ICAP ports cannot be operated simultaneously. The design must start from the top ICAP and then switch back and forth between the two.

| Pin Name | Type | Description |
|---|---|---|
| CLK | Input | ICAP interface clock |
| CE | Input | Active Low ICAP interface select. Equivalent to CS_B in the SelectMAP interface. |
| WRITE | Input | 0 = WRITE, 1 = READ. Equivalent to the RDWR_B signal in the SelectMAP interface. |
| I[31:0] | Input | ICAP write data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. See ICAP Data Ordering in [31]. |
| O[31:0] | Output | Unregistered ICAP read data bus. The bus width depends on the ICAP_WIDTH parameter. The bit ordering is identical to the SelectMAP interface. |
| BUSY | Output | Active High busy status. Only used in read operations. BUSY remains Low during writes. |

ICAP Data Ordering [31]

In many cases, ICAP configuration is driven by a user application residing on a microprocessor, a CPLD, or, in some cases, another FPGA. In these applications, it is important to understand how the data ordering in the configuration data file corresponds to the data ordering expected by the FPGA. In ICAP x8 mode, configuration data is loaded at one byte per CCLK, with the MSB of each byte presented to the D0 pin. This convention (D0 = MSB, D7 = LSB) differs from many other devices and can be a source of confusion when designing custom configuration solutions. Table 6 shows how to load the hexadecimal value 0xABCD into the ICAP data bus.

Table 6 Bit Ordering for ICAP 8 Bit Mode²⁵

| CCLK Cycle | HEX Equivalent | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0xAB | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | 0xCD | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |

Some applications can accommodate the non-conventional data ordering without difficulty. For other applications, it can be more convenient for the source configuration data file to be bit swapped, meaning that the bits in each byte of the data stream are reversed. For these applications, the Xilinx PROM file generation software can generate bit-swapped PROM files. Table 7 shows the bit ordering for x8, x16, and x32 modes.

Table 7 Bit Ordering (x32, x16, and x8 modes, per pin)

IBUFDS: differential input buffer primitive

To be able to use the maximum ICAP speed, we need differential pairs to connect the ICAP data bus between the two evaluation boards. To use these pairs, we need differential I/O buffers. The usage and rules for the differential primitives are similar to those for the single-ended SelectIO primitives. Differential SelectIO primitives have two pins to and from the device pads to represent the P and N channel pins of a differential pair. N channel pins have a B suffix [32]. Figure 19 shows the differential input buffer primitive.

Figure 19 Differential Buffer Primitive (IBUFDS)

²⁵ D[0:7] represent the ICAP DATA pins.
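Returning to the data ordering discussed under ICAP Data Ordering, the following Python sketch illustrates what bit swapping does to a byte stream and how a byte maps onto the data pins under the D0 = MSB convention. The helper names are hypothetical; this is not the Xilinx PROM generation tool, only a model of the ordering rule stated in the text.

```python
def bit_swap_byte(value):
    """Reverse the bit order within one byte (bit 7 <-> bit 0, and so on)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_swap_stream(data):
    """Bit-swap every byte of a configuration data stream."""
    return bytes(bit_swap_byte(byte) for byte in data)


def pins_d0_to_d7(byte):
    """Bit values presented on D0..D7 when D0 carries the MSB."""
    return [(byte >> (7 - pin)) & 1 for pin in range(8)]


if __name__ == "__main__":
    stream = bytes([0xAB, 0xCD])
    print([pins_d0_to_d7(byte) for byte in stream])
    print(bit_swap_stream(stream).hex())
```

Applying the swap twice returns the original stream, so the same helper converts in both directions.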

Partial Reconfiguration Regions

A system implemented on an FPGA should be divided into partial reconfiguration regions (PRRs). Partial reconfigurable modules (PRMs) are the parts of the design that can be placed in the PR regions. The user may have any number of PRMs for a PRR; however, only one PRM can operate in a PRR at a given time. The minimal size of a PRM is theoretically one CLB²⁶; however, due to the structure of the configuration memory, the configuration of a CLB is contained in several frames, and each frame contains the configuration bits of 20 CLBs. Since the frame is the smallest part of the FPGA that can be configured, every reconfiguration changes at least 20 CLBs [12]. The size of the PRMs is important for optimal performance. The author in [8] proposed a reliability-aware solution for selecting an optimal area for PRMs.

It is necessary to insert specific interfaces at the borders of the PRRs. These interfaces are called proxy logic in the ISE design tools. The user can place proxy logic manually, or it can be placed automatically by the design tool. In recent ISE design tools, proxy logic is also supported by the timing analysis. Therefore, it is possible to analyze the critical path between the static region and the PR regions.

The design flow can be carried out in ISE and PlanAhead. First, ISE synthesizes the VHDL or Verilog code and generates the necessary netlist files. Next, these files are imported into PlanAhead. Last, after floorplanning in PlanAhead, a partial bitstream file (*.bit) is generated for each PRM. These bitstream files have a header that contains the address of a PRM. To reconfigure a PRM, the relevant partial bitstream file should be forwarded to the configuration engine by means of one of the available interfaces²⁷.

Summary

In this chapter, we described our proposed configuration controller in detail. The implemented configuration controller and its characteristics were presented. In the next chapter, we will discuss design hardening techniques for our proposed controller.

²⁶ Configurable Logic Block
²⁷ JTAG, SelectMAP, Master Serial, or ICAP

4 Design Hardening

Up to now, all components of the system implemented in our multi FPGA platform have been hardened by a combination of hardware redundancy techniques and partial reconfiguration capability for fault detection, masking and recovery. In our work, each component is triplicated and each replica is placed in one PR region. By comparing the output of each replica with the others, the voter can detect and mask an error in a PR region and inform the reconfiguration controller for recovery. However, three more elements still need protection against SEUs. In this chapter, we discuss the strategy for hardening the configuration controller, the interfaces and the bitstream. As previously discussed, the most robust mitigation strategy is to use redundancy techniques coupled with partial reconfiguration. In this design, three different approaches have been used to increase the overall reliability.

4.1 State Machine Encoding

Because the circuits implemented for PR_Controller, full_config_controller, fault_classifier and bit_stream_module are based on finite state machines (FSMs), the first step to increase the robustness of the design is to encode the FSMs. Much work has been carried out on optimal state encoding [30], [33], and many tools and techniques are available. Common to most of them is the goal of minimizing the number of bits required for state encoding. A poor choice of encoding technique results in a state machine that is very costly in terms of resource utilization, very slow, or both. Moreover, the encoding must be applied in the hardware description language to ensure the reliability of the protected FSM.

In this project, an optimized one hot state encoding has been embedded into the hardware description of the state machines. In one hot state encoding, only one bit of the state vector is set to one for any given state and all other state bits remain zero. Thus, if there are n states, then n state flip flops are required. State decoding is simplified, since the state bits themselves can be used directly to indicate whether the machine is in a particular state. No additional logic is required [30]. We have used one hot state encoding because it has the following advantages:

- It maps easily onto the register rich Xilinx FPGA architecture and is easy to apply to a state machine. Schematics can be captured and HDL code can be written directly from the state diagram without coding a state table. [30]
- One hot state encoding is typically faster than other state encoding techniques. Moreover, speed is independent of the number of states and depends only on the number of transitions into a particular state. [30]
- It is very easy to modify the design without manipulating the rest of the machine.


- It can be easily synthesized from VHDL or Verilog and it is possible to find the critical path using static timing analysis. [30]

The Xilinx tools can apply one hot state encoding when synthesizing the circuit; however, we cannot rely on this feature, since error detection is not possible in that case. Therefore, we have applied the one hot state encoding directly in the VHDL code of the FSMs. Error detection is quite easy in this scenario. It is only necessary to check the state bits concurrently. If more than one bit is asserted in a given state vector, there is an error.

4.2 Internal Signal Hardening

In addition to the state machines, there are some internal signals between components that should be hardened. An undesired value on these signals may start a component unexpectedly. Examples of such signals are start_PR and start_full_config, which are used by fault_classifier to tell the partial reconfiguration controller and the full configuration controller, respectively, to start their functions. If there is an undesired upset on these signals, or if they are stuck at zero or one, the functionality of the whole design may be corrupted. To prevent this situation, we have used 2 rail logic to encode them. In this simple technique, each signal is represented by two bits: 10 represents 1 and 01 represents 0, while 00 and 11 are not valid values. An error signal is generated when 00 or 11 occurs. Moreover, error detection is very simple and can be implemented with an XOR gate: the XOR of the two rails is 1 for a valid pair and 0 for an invalid one.

4.3 Interface Hardening

The next step is to harden the connection signals between the two evaluation boards (Figure 20) or between two FPGAs in a multi FPGA platform.

Figure 20 The connection between two evaluation boards
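The checks described in Sections 4.1 and 4.2 can be modelled in a few lines of software. The sketch below is only a behavioural illustration, not the VHDL used in the design; the function names are hypothetical. It also flags an all zero state vector as an error, which is an assumption that goes slightly beyond the "more than one bit asserted" rule stated above.

```python
def one_hot_error(state_bits: int, num_states: int) -> bool:
    """Return True when the state vector is not a legal one hot code word.

    Assumption: a vector with no bit set is treated as an error too.
    """
    mask = (1 << num_states) - 1
    value = state_bits & mask
    # A legal one hot vector has exactly one asserted bit.
    return bin(value).count("1") != 1


def two_rail_decode(rail_a: int, rail_b: int):
    """Decode a 2 rail pair: 10 -> 1, 01 -> 0, 00 and 11 -> error.

    Returns (value, error); value is None when the pair is invalid.
    """
    rail_a &= 1
    rail_b &= 1
    valid = rail_a ^ rail_b  # XOR is 1 only for 10 and 01
    if not valid:
        return None, True
    return rail_a, False


if __name__ == "__main__":
    print(one_hot_error(0b0100, 4), one_hot_error(0b0110, 4))
    print(two_rail_decode(1, 0), two_rail_decode(1, 1))
```

In hardware the same checks reduce to a population count comparison on the state register and a single XOR per protected signal.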

These signals are susceptible to faults caused mainly by radiation or electromagnetic interference. Since the ICAP data bus is 16 or 32 bits wide and works at a 100 MHz clock frequency, cross talk is also possible. We have used differential pairs to prevent such phenomena. The principle of differential pairs is much the same as that of 2 rail logic. In a differential pair, 10 represents 1 and 01 represents 0, while 00 and 11 are not valid values.

4.4 Bitstream Memory Protection

In this study, the bitstreams needed for reconfiguring the PR regions and the whole FPGA are generated with the tool chain and stored in an external non volatile flash memory. These flash memories are susceptible to SEUs [34]. Since the reliability of the whole system depends on the correctness of these original bitstreams, we need to protect them against radiation. In this work, we envisioned a solution that protects the original bitstreams by using a radiation hardened memory. However, this is not the only solution. Another possibility is to use error control codes [35], [36], [37]. In particular, the author in [35] has presented encoders and decoders of error control codes for semiconductor memory systems used in the space radiation environment. In that work, widely used error control codes, such as Hamming and Reed Solomon (RS) codes, are compared with new classes of byte error control codes suitable for semiconductor memory systems, called spotty byte error control codes. The author concluded that the spotty byte error control codes show better performance in terms of gate counts and maximum clock frequencies. With the help of this technique, we can benefit from regular non volatile memories without worrying about corruption of the original bitstreams.
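To make the error control code option concrete, the following sketch shows a textbook Hamming(7,4) encoder and single error correcting decoder. It is only an illustration of the Hamming family named above; it is not the spotty byte code of [35] and it is not part of this design, which envisions a radiation hardened memory instead. The function names and the bit layout (parity bits at positions 1, 2 and 4) are choices made for this example.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-indexed positions that hold data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-indexed positions that hold parity bits


def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into a 7 bit Hamming code word (list of 0/1)."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("expected a 4 bit value")
    code = [0] * 8  # index 0 is unused so indices match positions 1..7
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 8):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity  # makes the group covered by p have even parity
    return code[1:]


def hamming74_decode(bits) -> tuple:
    """Correct at most one flipped bit; return (nibble, syndrome)."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the error
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= code[pos] << i
    return nibble, syndrome


if __name__ == "__main__":
    word = hamming74_encode(0b1011)
    word[4] ^= 1  # emulate a single upset in the stored word
    print(hamming74_decode(word))
```

A code this small corrects one bit per 7 bit word and cannot handle double errors; a real memory system would use a wider code with a better rate, such as the ones surveyed in [35].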


5 Test Results

The design has been tested on two identical XUPV5 LX110T evaluation boards (Figure 6). As previously mentioned, the controller is able to reconfigure the faulty part of the slave FPGA based on the received error signals. Error detection in the slave FPGA is the responsibility of the user; however, regardless of the error detection method used on the slave side, we assume that the error signals follow the 2 rail checking rules (00 and 11 on the error signals indicate an error in the corresponding component).

To verify the correct functionality of the configuration controller, three PR regions have been created on the slave side. Each PR region contains two PR modules. One of them generates a string of 10 and 01, which indicates that the corresponding PR region works correctly; the other module generates a string of 11 and 00, which indicates that there is an error in the corresponding PR region. Chapter 4 in [15] describes the design steps involved when using the PlanAhead software for partial reconfiguration designs. Figure 21 shows the location of these PR regions after floor planning in PlanAhead.

Figure 21 Generated PR regions on the FPGA fabric

Each PR module is implemented in an area of 8x8 CLBs. For each PR module, one test bitstream file has been generated. These bitstream files have a size of [value missing] bytes each. In each PR region, the bitstream file that corresponds to correct behavior is stored in the on chip BRAMs for the partial reconfiguration process.

The other three bitstreams, which are not stored in the BRAM, are used to simulate faults in the PR regions. These partial bitstreams are downloaded to the FPGA via JTAG using the iMPACT tool. After downloading, the corresponding PR regions start sending error signals to the configuration controller implemented on the master side. The master side then responds to the error signals by reconfiguring the corresponding PR region with a correct PR module. This process has been repeated 100 times for each PR region, with different numbers of CLBs. The configuration controller was able to correct all of the simulated faults by performing partial reconfiguration or full configuration.

Table 8 shows the device utilization summary for the configuration controller on the master side. The resource utilization of the bitstream module is not included in this summary.

Table 8 Device utilization summary for the configuration controller (excluding the bitstream module); values lost from this copy are shown as ?
Slice Logic Utilization | Used | Available | Utilization
Number of Slice Registers | ? | 69,120 | 1%
Number used as Flip Flops | 164 | |
Number of Slice LUTs | ? | 69,120 | 1%
Number used as logic | ? | 69,120 | 1%
Number using O6 output only | 114 | |
Number using O5 output only | 56 | |
Number using O5 and O6 | 66 | |
Number used as exclusive route thru | 3 | |
Number of route thrus | 59 | |
Number using O6 output only | 59 | |
Number of occupied Slices | ? | 17,280 | 1%
Number of LUT Flip Flop pairs used | 259 | |
Number with an unused Flip Flop | ? | ? | ?
Number with an unused LUT | ? | ? | ?
Number of fully used LUT FF pairs | ? | ? | ?
Number of unique control sets | 18 | |
Number of slice register sites lost to control set restrictions | 20 | 69,120 | 1%
Number of bonded IOBs | ? | ? | ?
Number of LOCed IOBs | ? | ? | ?
IOB Master Pads | 16 | |
IOB Slave Pads | 16 | |
Number of Block RAM/FIFO | ? | ? | ?
Number using Block RAM only | 18 | |
Number of 36k Block RAM used | 16 | |
Number of 18k Block RAM used | 3 | |
Total Memory used (KB) | 630 | 5,328 | 11%
Number of BUFG/BUFGCTRLs | ? | ? | ?
Number used as BUFGs | 3 | |
Number of DCM_ADVs | ? | ? | ?
Average Fan out of Non Clock Nets | 3.72 | |

Our generic controller shows better performance in terms of speed than other generic controllers and software based controllers. Table 9 compares our design with another generic reconfiguration controller, proposed by Ali Ebrahim in [26], and with a software based controller built on the Xilinx XPS_HWICAP engine, presented in [7].

Table 9 Configuration times (us) for different partial bitstream sizes (KB). Columns: XPS_HWICAP (x32) [7], BRAM HWICAP (x32) [7], ICAP controller (x32) [26], Our ICAP Controller (x16), Our ICAP Controller (x32) [table body not recoverable]

In addition, our proposed configuration controller shows better results in terms of resource utilization than the design proposed in [26]. Ali Ebrahim [26] implemented his controller on 609 FPGA slices, whereas our design uses only 239 FPGA slices (Table 10), a reduction of about 61%. The external memory interfaces are excluded from the resource utilization of both designs.

Table 10 Resource utilization of ICAP controllers, in LUTs (total). Columns: XPS_HWICAP (x32) [7], BRAM HWICAP (x32) [7], ICAP controller (x32) [26], Our ICAP Controller (x32) [table body not recoverable]
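As a point of reference for the configuration times discussed above, the ideal transfer time of a partial bitstream through the ICAP can be estimated from the bus width and the clock frequency. The sketch below assumes one bus word per clock cycle with no command or memory access overhead, and uses the 100 MHz figure mentioned in Section 4.3; the 1 KB bitstream size in the example is hypothetical and is not taken from Table 9. Measured times are necessarily higher than this bound.

```python
def ideal_icap_time_us(size_bytes: int, bus_width_bits: int, clock_hz: float) -> float:
    """Lower bound on the transfer time, in microseconds.

    Assumption: exactly one bus word is written per clock cycle.
    """
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("ICAP bus width must be 8, 16 or 32 bits")
    bytes_per_cycle = bus_width_bits // 8
    cycles = -(-size_bytes // bytes_per_cycle)  # ceiling division
    return cycles / clock_hz * 1e6


if __name__ == "__main__":
    for width in (16, 32):
        t = ideal_icap_time_us(1024, width, 100e6)
        print(f"x{width}: {t:.2f} us for a 1 KB bitstream")
```

Doubling the bus width halves the bound, which is why the x32 configuration is expected to outperform the x16 one for the same bitstream.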

6 Conclusion and Future Works

The research presented in this thesis has proposed a dependable reconfiguration controller with the aim of recovering the faulty portion of an FPGA in a multi FPGA platform. Our working scenario was a harsh environment, as found in mission critical and safety critical applications, where electronic devices are susceptible to SEUs caused by ionizing radiation. In this thesis, different types of fault tolerant techniques for FPGA based systems were discussed. It was concluded that the best fault tolerant technique for an FPGA based design is to use redundancy for fault detection and fault containment, and then to use a recovery technique based on Xilinx partial reconfiguration to mitigate the fault.

The main innovative contributions of this thesis are summarized as follows:

- The controller is implemented purely in hardware. This generic implementation not only improves performance, in terms of higher speed and lower resource utilization, but also allows the designer to apply any available FT technique to increase the reliability of the controller.
- The configuration interfaces for full configuration and partial reconfiguration are completely separated from each other. Full configuration is done via a serial interface, whereas partial reconfiguration is done via the parallel ICAP interface. This increases the overall reliability because, if partial reconfiguration stops functioning for any reason, there is still a way to recover the FPGA, albeit at lower speed.

Directions for future work are summarized in the following:

1 A comprehensive testing solution: The testing method used in this thesis is based on pre built bitstream files, which simulate an SEU in the corresponding module. This method is not thorough enough. There are two other recognized testing methods that should be considered as future work: radiation testing and fault injection campaigns. Although radiation testing remains one of the most widely recognized and complete methods for SEU analysis, radiation may permanently damage the device under test (DUT), and it increases the testing cost, both for developing the radiation setup and for the beam time. Moreover, the beam cannot be steered to hit a specific location, so an SEU may occur in an undesired bit. Another solution is to inject faults during the programming phase to emulate SEUs in the FPGA; however, this method requires a huge amount of time to provide consistent results. A better solution could be to calculate the critical bits and then perform fault injection on these bits only. One example of such an approach is presented in [38].

2 Problem with synchronization of PRMs: The synchronization of a newly reconfigured module with the other modules in the FPGA, or with other FPGAs in a multi FPGA platform, is another issue that has not been addressed in this thesis. It is the next step after fault recovery: the newly reconfigured module must start operating from a correct state. One solution to this problem is presented in [12].
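The fault injection idea in item 1 can be sketched in software as follows. This is only an illustration under stated assumptions: the bitstream is modelled as a plain byte string, the list of critical bit positions is assumed to be supplied by some external analysis, and both function names are hypothetical. Real fault injection writes the corrupted frames into the device, which is not shown here.

```python
def flip_bit(bitstream: bytes, bit_index: int) -> bytes:
    """Return a copy of the bitstream with one bit inverted (MSB first numbering)."""
    if not 0 <= bit_index < len(bitstream) * 8:
        raise IndexError("bit index outside the bitstream")
    corrupted = bytearray(bitstream)
    byte_index, bit_in_byte = divmod(bit_index, 8)
    corrupted[byte_index] ^= 0x80 >> bit_in_byte
    return bytes(corrupted)


def injection_campaign(bitstream: bytes, critical_bits):
    """Yield (bit_index, faulty_bitstream) for each critical bit.

    Restricting the campaign to critical bits keeps the number of
    experiments far below one run per configuration bit.
    """
    for bit_index in critical_bits:
        yield bit_index, flip_bit(bitstream, bit_index)


if __name__ == "__main__":
    golden = bytes(4)  # placeholder data, not a real bitstream
    for index, faulty in injection_campaign(golden, [0, 9, 31]):
        print(index, faulty.hex())
```

Each faulty bitstream emulates a single upset; after loading it, the detection and recovery path of the controller can be observed in the same way as with the pre built test bitstreams.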

7 Glossary

Some common terms used throughout this document are defined in this section. These definitions are taken from [39].

Device: A single integrated circuit.

Failure: An unrecoverable error.

Functional Error: A logic error in the user function.

Functional Interrupt: A disruption in device operation requiring system level intervention to regain normal functionality. Typically causes the loss of user or system data.

Multiple Bit Upset (MBU): An SEU that results in more than one adjacent bit flipping due to an oblique angle strike. MBU probability steadily increases as geometries shrink. The maximum MBU distance observed is useful to determine the block RAM interleaving required so that even MBUs can be corrected by the ECC.

Single Bit Upset (SBU): Same as SEU.

Scrubbing: The process of correcting any configuration cell upsets through FPGA partial reconfiguration. Scrubbing does not interrupt user design function.

Single Event Effect (SEE): The resulting electrical disturbances caused by the direct ionization of a silicon lattice by an energetic charged subatomic particle.

Single Event Functional Interrupt (SEFI): An SEE that results in the interference of the normal operation of a complex digital circuit. SEFI is typically used to indicate a failure in a support circuit, such as loss of configuration capability, power on reset, JTAG functionality, a region of configuration memory, or the entire configuration.

Single Event Transient (SET): A signal transition caused by an SEE. Often observed as a glitch.

Single Event Upset (SEU): A state change (or flip) of a single data bit storage or memory cell caused by an SEE. An SEU can affect the configuration memory cell states, the block RAM contents, a CLB DFF, a LUTRAM, or an SRL16 memory cell (which are also configuration memory cells, directly accessible to the user).

System: An integration of multiple devices and circuit boards or modular sub systems.

User Function: User specified operational functions defined by the data stored in device configuration memory.

8 Works Cited

[1] M. Caffrey, "A Space Based Reconfigurable Radio," Military and Aerospace Applications of Programmable Logic Devices (MAPLD), Laurel MD, USA,
[2] A. Dawood, S. Visser and J. Williams, "Reconfigurable FPGAS for real time image processing in space," in 14th International Conference on Digital Signal Processing, DSP2002, Santorini, Greece,
[3] D. M. Hiemstra, G. Battiston and P. Gill, "Single Event Upset Characterization of the Virtex 5 Field Programmable Gate Array Using Proton Irradiation," in IEEE Radiation Effects Data Workshop (REDW), Denver, CO,
[4] M. Ceschia, M. Menichelli, A. Papi, J. Wyss and A. Paccagnella, "Ion beam testing of SRAM based FPGA's," in Radiation and Its Effects on Components and Systems, th European Conference on,
[5] E. Fuller, P. Blain, M. Caffrey and C. Carmichael, "Radiation Test Results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing," Xilinx Inc., Los Alamos National Laboratory,
[6] L. Sterpone, M. Aguirre, J. Tombs and H. Guzmán Miranda, "On the design of tunable fault tolerant circuits on SRAM based FPGAs for safety critical applications," in Design automation and test in Europe, Torino, Sevilla,
[7] M. Liu, W. Kuehn, Z. Lu and A. Jantsch, "Run Time Partial Reconfiguration Speed Investigation and Architectural Design Space Exploration," in FPL, Giessen, Germany,
[8] C. Bolchini, A. Miele and C. Sandionigi, "A Novel Design Methodology for Implementing Reliability Aware Systems on SRAM Based FPGAs," IEEE Transactions on Computers, vol. 60, no. 12, pp ,
[9] J. Hussein and G. Swift, "Mitigating Single Event Upsets," Xilinx,
[10] "FPGA vs. ASIC," Xilinx Inc., [Online]. Available:
[11] I. Kuon and J. Rose, "Measuring the Gap between FPGAs and ASICs," in FPGA 06, Toronto,
[12] M. Straka, J. Kastil and Z. Kotasek, "Fault Tolerant Structure for SRAM based FPGA via Partial Dynamic Reconfiguration," Digital System Design: Architecture, Methods and Tools, pp ,
[13] F. Lima, C. Carmichael, J. J. Fabula and R. Padovani, "A fault injection analysis of Virtex FPGA TMR design methodology," in Radiation and Its Effects on Components and Systems, th European Conference on,

[14] K. S. Morgan, D. L. McMurtrey, B. H. Pratt and M. J. Wirthlin, "A Comparison of TMR With Alternative Fault Tolerant Design Techniques for FPGAs," Nuclear Science, IEEE Transactions on, vol. 54, no. 6, pp ,
[15] "Partial Reconfiguration User Guide," Xilinx Inc.,
[16] L. Ming, W. Kuehn, L. Zhonghai and A. Jantsch, "Run Time Partial Reconfiguration Speed Investigation and Architectural Design Space Exploration," in International Conference on Field Programmable Logic and Applications, FPL 2009, Prague,
[17] M. Berg, C. Poivey, D. Petrick, D. Espinosa, A. Lesea, K. LaBel, M. Friendlich, H. Kim and A. Phan, "Effectiveness of Internal Versus External SEU Scrubbing Mitigation Strategies in a Xilinx FPGA: Design, Test, and Analysis," IEEE Transactions on Nuclear Science, vol. 55, no. 4, pp ,
[18] K. Chapman, "SEU Strategies for Virtex 5 Devices (XAPP864)," Xilinx Inc.,
[19] A. Sari and M. Psarakis, "Scrubbing based SEU Mitigation Approach for Systems on Programmable Chips," in International Conference on Field Programmable Technology (FPT), New Delhi,
[20] M. Niknahad, O. Sander and J. Becker, "Fine grain fault tolerance A key to high reliability for FPGAs in space," in IEEE Aerospace Conference, Big Sky, MT,
[21] K. Kyriakoulakos and D. Pnevmatikatos, "A novel SRAM based FPGA architecture for efficient TMR fault tolerance support," in International Conference on Field Programmable Logic and Applications, FPL, Prague,
[22] C. Carmichael, "Triple Module Redundancy Design Techniques for Virtex FPGAs (XAPP197)," Xilinx Inc.,
[23] "TMRTool," Xilinx Inc., [Online]. Available: [Accessed 2013].
[24] F. de Lima Kastensmidt, "Designing fault tolerant techniques for SRAM based FPGAs," vol. 21, no. 6, pp ,
[25] C. Bolchini, L. Fossati, D. Codinachs, A. Miele and C. Sandionigi, "A reliable reconfiguration controller for fault tolerant embedded systems on multi FPGA platform," in IEEE 25th International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), Kyoto,
[26] A. Ebrahim, K. Benkrid, X. Iturbe and C. Hong, "A Novel High Performance Fault Tolerant ICAP Controller," Edinburgh.
[27] Y. shu Yi and E. J. McCluskey, "On line Testing and Recovery in TMR Systems for Real Time Applications," in ITC International Test Conference, Stanford University, Stanford, California,

[28] D. Nikolos, "Self Testing Embedded Two Rail Checkers," Journal of Electronic Testing: Theory and Applications Special issue on On line testing, vol. 12, no. 1 2, pp ,
[29] M. Omana, D. Rossi and C. Metra, "High Speed and Highly Testable Parallel Two Rail Code Checker," in Design, Automation and Test in Europe Conference and Exhibition,
[30] S. Golson, "One hot state machine design for FPGAs," in 3rd PLD Design Conference, Santa Clara CA,
[31] "Virtex 5 FPGA Configuration User Guide," Xilinx Inc.,
[32] "Virtex 5 FPGA User Guide (UG190)," Xilinx Inc.,
[33] M. Cassel and F. Lima, "Evaluating one hot encoding finite state machines for SEU reliability in SRAM based FPGAs," in 12th IEEE International On Line Testing Symposium, IOLTS, Lake Como,
[34] D. Nguyen, S. Guertin and J. Patterson, "Radiation Tests on 2Gb NAND Flash Memories," in IEEE Radiation Effects Data Workshop, Ponte Vedra, FL,
[35] H. Kaneko, "Error Control Coding for Semiconductor Memory Systems in the Space Radiation Environment," in 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT,
[36] G. Umanesan and E. Fujiwara, "A class of random multiple bits in a byte error correcting (Stb/EC) codes for semiconductor memory systems," in Pacific Rim International Symposium on Dependable Computing, Proceedings,
[37] G. Umanesan and E. Fujiwara, "A class of systematic t/b error correcting codes for semiconductor memory systems," in Information Theory Workshop, IEEE Proceedings., Cairns, Qld.,
[38] L. Sterpone, F. Margaglia, M. Koester, J. Hagemeyer and M. Porrmann, "Analysis of SEU Effects in Partially Reconfigurable SoPCs," in Adaptive Hardware and Systems (AHS), 2011 NASA/ESA, San Diego, CA,
[39] B. Bridgford, C. Carmichael and W. Tseng, "Single Event Upset Mitigation Selection Guide," Xilinx,
[40] "Early Access Partial Reconfiguration User Guide,"
[41] "LogiCORE IP Soft Error Mitigation Controller v2.1," Xilinx Inc.,
[42] E. Dubrova, Fault Tolerant Design: An Introduction, Stockholm: Kluwer Academic Publishers, 2008, p
[43] "Virtex 5 Family Overview," Xilinx,
[44] "hkinventory," Xilinx, [Online]. Available:

[45] S. Suhail Zain and C. Hu, "NSEU Mitigation in Avionics Applications,"
[46] "Xilinx University Program XUPV5 LX110T Development System," Xilinx Inc., [Online]. Available: lx110t.htm.
[47] "Soft Error Mitigation Controller," Xilinx Inc.,
[48] "Partial Reconfiguration User Guide UG702,"

9 Appendices

9.1 Appendix A: Bitstream Scrubbing and Readback

Upsets in Xilinx FPGAs can be removed by scrubbing. Scrubbing means reading back the configuration bitstream that is stored in configuration memory, comparing it with an original copy and correcting any affected configuration bits. The configuration management system, which detects and corrects upsets in configuration memory by means of scrubbing, can be hosted in a radiation hard FPGA, an ASIC, a microcontroller or the FPGA itself. Internal scrubbing in the Virtex 5 FPGA uses the ICAP to read back the frames, in conjunction with the Frame Error Correction Code (ECC) logic to detect single or double bit errors in configuration frame data. Configuration management can only detect and correct errors caused by SEUs. It cannot mitigate the SEU's effects. Therefore, configuration management is often combined with redundancy based mitigation schemes to mask the SEU's effects in the system.

Virtex 5 FPGA configuration memory is arranged in frames that are tiled about the device. These frames are the smallest addressable segments of the Virtex 5 configuration memory space, and all operations must therefore act upon whole configuration frames. [31] A frame (Figure 22) is the smallest part of the FPGA that can be reconfigured and has a size of 1312 bits (41 words of 32 bits) in the Virtex 5. [40] The Virtex 5 (LX110T) frame counts and configuration sizes are shown in Table 11.

Table 11 Virtex 5 Device Frame Count, Frame Length, Overhead, and Bitstream Size [31]. Columns: Device, Non Configuration Frames, Configuration Frames, Total Device Frames, Frame Length in Words, Configuration Array Size in Words, Bitstream Overhead in Words [values for the LX110T row not recoverable]

Configuration overhead consists of commands in the bitstream that are needed to perform configuration but do not themselves program any memory cells. Configuration overhead contributes to the overall bitstream size.
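The readback, compare and correct cycle described above can be summarized with a small behavioural model. This is a sketch only: configuration memory is modelled as a Python list of frames, the golden copy as a second list, and the function name and frame contents are hypothetical. On a real device the readback and rewrite steps go through a configuration interface such as the ICAP.

```python
def scrub(config_frames: list, golden_frames: list) -> list:
    """Compare every frame with its golden copy and rewrite mismatches.

    Returns the indices of the frames that had to be repaired.
    The whole frame is rewritten because the frame is the smallest
    unit that can be reconfigured.
    """
    if len(config_frames) != len(golden_frames):
        raise ValueError("frame count mismatch")
    repaired = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:   # readback and compare
            config_frames[index] = golden    # correct by rewriting the frame
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    golden = [bytes(4) for _ in range(3)]    # placeholder frames
    memory = list(golden)
    memory[1] = b"\x00\x10\x00\x00"          # emulate one upset bit
    print(scrub(memory, golden), memory == golden)
```

The loop visits every frame regardless of where the upset is, which is the reason the recovery time of scrubbing depends on the fault location.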

Figure 22 A schematic FPGA structure. Taken from [8]

Bitstream scrubbing has a number of advantages and disadvantages.

Advantages:
- There is no need to partition the device. Therefore, the performance remains unaffected.
- It can reconfigure a component at a finer granularity than the PR approach.
- The state of a component can be preserved.
- It needs less memory than the PR approach. Only one bitstream file is needed.
- All implementation options are available (e.g. techniques that perform optimization across the entire design).

Disadvantages:
- This method is blind: it reads back the configuration bitstream frame by frame and fixes an upset when it finds one. It cannot locate an error until it reads back the frame that contains it, so the recovery process may take time, depending on the fault location.
- It cannot mitigate the SEU's effect. Therefore, scrubbing cannot guarantee that the design works correctly after recovery.
- It is not possible to detect SEUs in the BRAMs and the registers, since their values change over time and cannot be tested by scrubbing. Errors in these parts of the system should be detected by other techniques. [12]
- This method can only be used to reconfigure with the same bitstream. It cannot change the functionality of the components.
- Time multiplexing is not possible.
- It is difficult to monitor whether an error has occurred during the reconfiguration of a frame.
- It is not possible to detect more than two SEUs or to correct more than one SEU in a frame.

Xilinx has recently introduced the Soft Error Mitigation (SEM) controller, which is based on bitstream scrubbing. The SEM Controller implements five main functions: initialization, error injection, error detection, error correction, and error classification. All functions, except initialization and detection, are optional; desired functions are selected during the IP core configuration and generation process in the Xilinx CORE Generator. [41]

The controller initializes by bringing the integrated soft error detection capability of the FPGA into a known state after the FPGA enters user mode. After this initialization, the controller loops endlessly, observing the integrated soft error detection status. When an ECC or CRC error is detected, the controller evaluates the situation to identify the Configuration Memory location involved. [41]

Once this is complete, the controller may optionally correct the soft error by repairing it or by replacing the affected bits. The repair method uses active partial reconfiguration to perform a localized correction of Configuration Memory with a read modify write scheme, and relies on algorithms to identify the error in need of correction. The replace method also uses active partial reconfiguration with the same goal, but it uses a write only scheme to replace Configuration Memory with original data. This data is provided by the implementation tools and stored outside the controller. [41]

The controller may optionally classify the soft error as essential or non essential using a lookup table. The lookup table is stored outside the controller and is fetched as required during error classification. This data is also provided by the implementation tools. [41]

When the controller is idle, there is an option to accept input from the user to inject errors into Configuration Memory. This function is useful for testing the integration of the controller into a larger system design. Using the error injection capability, system verification and validation engineers may construct test cases to ensure the complete system responds to soft error events as expected. [41]

The SEM controller uses the ICAP for readback and for accessing the configuration memory. The ICAP Interface is a point to point connection between the SEM Controller and the ICAP primitive. The ICAP primitive enables read and write access to the registers inside the FPGA configuration system. For error detection, the SEM controller uses the FRAME_ECC Interface. The FRAME_ECC primitive is an output only primitive that provides a window into the soft error detection function of the FPGA configuration system. The Virtex 5 Frame error correction code (ECC) logic is designed to detect single or double bit errors in configuration frame data. [41] [31]

The SEM controller has a number of advantages and disadvantages.

Advantages:
- It supports different error detection and correction techniques. [41]
- It is completely flexible and can be used in many applications. [41]
- It can perform error detection, error containment, error classification, and error correction.
- It provides various status and monitor registers.

Disadvantages:
- It is only available for specific Xilinx series (Spartan 6, Virtex 6, Virtex 7, Kintex 7).
- Error detection is not optimal (it uses the FRAME_ECC primitive, which reads configuration memory frame by frame periodically).

9.2 Appendix B: Redundancy

One method to provide fault tolerance in embedded systems is through redundancy. For our purposes, redundancy is the provision of functional capabilities that would be unnecessary in a fault free environment. This can be a replicated hardware component, an additional check bit attached to a string of digital data, or a few lines of program code verifying the correctness of the program's results. [42]

Two kinds of redundancy are possible: space redundancy and time redundancy. Space redundancy provides additional components, functions, or data items that are unnecessary for fault free operation. Space redundancy is further classified into hardware, software and information redundancy, depending on the type of redundant resources added to the system. In time redundancy, the computation or data transmission is repeated and the result is compared to a stored copy of the previous result. [42]

The term redundancy in the literature mostly refers to space redundancy. The most common form of space redundancy is Triple Modular Redundancy (TMR). Figure 23 shows the basic principle of TMR. In TMR, the components are triplicated and their outputs are compared with each other. If there is an error in one module, the voter masks the error. TMR can be applied at different levels of granularity, from logic level to system level.

Figure 23 TMR basic principle

In addition to TMR, there are many other hardware redundancy techniques available (such as N modular redundancy, duplication with comparison, standby sparing, self purging redundancy and triplex duplex redundancy [42]). Xilinx has introduced the XTMR 30 software tool to simplify the task of design triplication. According to [39], TMRTool can partially or fully triplicate a design, insert voters, synchronize feedback path loops, and allow customized user triplicated module insertion. A triplicated design mitigates SEU impact on the user design. However, XTMR is very costly in terms of resource utilization and, as a result, leads to lower operating frequency and higher power consumption.

30 Xilinx Triple Modular Redundancy
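The masking behaviour of the TMR voter can be sketched in a few lines. This is an illustrative software model under the assumption that each module output is an integer word; the function names are hypothetical and are not taken from the thesis or from the XTMR tool.

```python
from typing import List


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of the three module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> List[int]:
    """Indices (0, 1, 2) of the modules whose output differs from the vote."""
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]
```

For example, `majority_vote(0b1010, 0b1010, 0b0010)` returns `0b1010`, so the upset in the third module is masked, and `disagreeing_modules` reports index 2. The voter only masks the fault; repairing the faulty module still requires a recovery mechanism such as reconfiguration.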

Redundancy can also be applied at the device level. For instance, in Figure 24 the FPGA is triplicated, that is, two identical copies are added. However, in this design the voter is itself a point of failure and must be implemented on a radiation hard device. This design can also be very expensive.

Figure 24 TMR Device Level

To summarize, redundancy is a common technique in almost all approaches to a fault tolerant (FT) design; however, it cannot be considered a mitigation scheme for an FPGA design on its own, because redundancy can only detect and mask a fault and cannot recover the modules from faults. Therefore, the best mitigation schemes combine redundancy with a reconfiguration controller for detecting, masking and correcting the faults. Table 12 summarizes the performance of the mitigation schemes.

Table 12 Performance overview of mitigation schemes. Part of the table is taken from [12]

| Mitigation Scheme | Mitigation Strength | Board Layout Complexity | Ease in Meeting Timing Constraints | Power Consumption | Component Cost | Average Recovery Speed |
|---|---|---|---|---|---|---|
| Power cycling | Weak | Low | Normal | Typical | Low | Lowest |
| XTMR | Medium | High | Reduced | ~3X typical | Low | N/A |
| Bitstream Scrubbing | Medium | Low | Normal | Typical | Medium | Low |
| PDR | Medium | Low | Reduced | Typical | Low | High |
| XTMR + Bitstream Scrubbing | Strong | High | Reduced | ~3X typical | Medium | Low |
| XTMR + PDR | Strong | High | Reduced | ~3X typical | Low | High |
| Redundant devices + Bitstream Scrubbing | Strongest | Medium | Normal | 2~4X typical | High | Low |
| Redundant devices + PDR | Strong | Medium | Reduced | 2~4X typical | High | Highest |

As previously mentioned, a combination of redundancy and a reconfiguration controller will be the strongest mitigation scheme. For instance, redundant devices (which could be a combination of redundancy at component and device level in multi FPGA platforms) plus bitstream scrubbing or PDR will lead to the best mitigation result.
One important difference between scrubbing and PDR is that scrubbing may recover upsets more effectively, especially when an SEU occurs in the routing bits; however, its recovery speed can be much lower than that of PDR.


9.3 Appendix C: Xilinx Virtex 5 overview

Xilinx Virtex 5 FPGAs are one of the Virtex families, introduced by Xilinx in 2009, with a highest performance of 550 MHz. This family is divided into four different categories:

1 Virtex 5 LX: High performance general logic applications
2 Virtex 5 LXT: High performance logic with advanced serial connectivity
3 Virtex 5 SXT: Signal processing applications with advanced serial connectivity
4 Virtex 5 FXT: Embedded systems with advanced serial connectivity

Table 13 Virtex 5 (LX110T) device specification, taken from [43]

| Parameter | XC5VLX110T |
|---|---|
| Configuration Logic Blocks (CLBs): Array (Row x Col) | 160 x 54 |
| CLBs: Slices (1) | 17,280 |
| CLBs: Max Distributed RAM (kb) | 1,120 |
| DSP48E slices (2) | |
| Block RAM blocks (3): 18 kb | |
| Block RAM blocks (3): 36 kb | |
| Block RAM blocks (3): Max (kb) | |
| CMT (4) | |
| PowerPC processor blocks | N/A |
| Ethernet MACs | 4 |
| Max RocketIO Transceivers: GTP (5) | 16 |
| Max RocketIO Transceivers: GTX | N/A |
| Max user I/O (6) | 680 |

1 Virtex 5 FPGA slices are organized differently from previous generations. Each Virtex 5 FPGA slice contains four 6 input LUTs and four flip flops (previously it was two LUTs and two flip flops).
2 Each DSP48E slice contains a 25 x 18 multiplier, an adder, and an accumulator.
3 Block RAMs are fundamentally 36 Kbits in size. Each block can also be used as two independent 18 Kbit blocks.
4 Each Clock Management Tile (CMT) contains two DCMs and one PLL.
5 RocketIO GTP transceivers are designed to run from 100 Mb/s to 3.75 Gb/s. RocketIO GTX transceivers are designed to run from 150 Mb/s to 6.5 Gb/s.
6 This number does not include RocketIO transceivers.

Figure 25 Xilinx Virtex 5 XC5VLX110T device. Taken from [44]

The proposed controller in this thesis is implemented on a Xilinx Virtex 5 XC5VLX110T FPGA (Figure 25). Table 13 shows the device specification. Like all other Xilinx FPGA series, Virtex 5 families store their configuration bitstream in SRAM type internal latches. Virtex 5 devices are configured by loading application specific configuration data into internal memory via the configuration interface. This data contains bits that set the configuration for each LUT and flip flop as well as all routing connections. Moreover, the bitstream contains all necessary data for configuring the embedded elements like PowerPC, ICAP, and the initial data for BRAMs [45]. Because Xilinx configuration memory is volatile, it must be reconfigured each time the device is powered up. The Virtex 5 FPGA can be configured via several configuration interfaces. These interfaces are listed in Table 14.

Table 14 Virtex 5 Configuration Modes

| No. | Configuration Mode | Type of interface | Bus Width (bit) |
|---|---|---|---|
| 1 | Master serial configuration mode | Serial | 1 |
| 2 | Slave serial configuration mode | Serial | 1 |
| 3 | Master SelectMAP configuration mode | Parallel | 8 or 16 |
| 4 | Slave SelectMAP configuration mode | Parallel | 8 or 16 or 32 |
| 5 | JTAG/Boundary Scan configuration mode | Serial | 1 |
| 6 | Master Serial Peripheral Interface (SPI) Flash configuration mode | Serial | 1 |
| 7 | Master Byte Peripheral Interface Up (BPI Up) Flash configuration mode | Parallel | 8 or 16 |
| 8 | Master Byte Peripheral Interface Down (BPI Down) Flash configuration mode | Parallel | 8 or 16 |

Among these interfaces, we have used Master serial configuration for full FPGA configuration, and the Internal Configuration Access Port (ICAP), which is based on the SelectMAP protocol, for partial reconfiguration.

The XUPV5 LX110T is a feature rich general purpose evaluation and development platform with onboard memory and industry standard connectivity interfaces. It features the Virtex 5 XC5VLX110T device [46].
The evaluation platform (Figure 26) has the following features:
- Xilinx Virtex 5 XC5VLX110T FPGA
- Two Xilinx XCF32P Platform Flash PROMs (32 Mb each) for storing large device configurations
- Xilinx System ACE Compact Flash configuration controller
- 64 bit wide 256 Mbyte DDR2 small outline DIMM (SODIMM) module compatible with EDK supported IP and software drivers
- On board 32 bit ZBT synchronous SRAM and Intel P30 StrataFlash
- 10/100/1000 tri speed Ethernet PHY supporting MII, GMII, RGMII, and SGMII interfaces
- USB host and peripheral controllers
- Programmable system clock generator
- Stereo AC97 codec with line in, line out, headphone, microphone, and SPDIF digital audio jacks
- RS 232 port, 16x2 character LCD, and many other I/O devices and ports

Figure 26 Xilinx XUPV5 LX110T Evaluation Platform. Taken from [46]

In this thesis, two identical XUPV5 LX110T evaluation platforms have been used. One plays the role of a master that monitors the second one, which plays the role of a slave, and recovers it in case of failure. We have used the two Xilinx XCF32P Platform Flash PROMs for storing the full configuration bitstreams. The partial bitstream files were stored in on chip BRAMs and in an off board Atmel I2C memory; however, on board memories (such as Compact Flash, ZBT synchronous SRAM or SPI flash) can be used as an alternative for storing partial or full bitstream files. In addition, we have used the on board DIP switches, LEDs and keys for testing the system. Moreover, the expansion IOs have been used for connecting the two identical evaluation boards to each other to form a master slave system.

9.4 Appendix D: Configuration modes in Virtex

Configuration Modes and Pins in Virtex 5 [31]

Virtex 5 devices are configured by loading application specific configuration data (the bitstream) into internal memory. Because Xilinx FPGA configuration memory is volatile, it must be configured each time it is powered up. The bitstream is loaded into the device through special configuration pins. These configuration pins serve as the interface for a number of different configuration modes:
- Master serial configuration mode
- Slave serial configuration mode
- Master SelectMAP (parallel) configuration mode (x8 and x16 only)
- Slave SelectMAP (parallel) configuration mode (x8, x16, and x32)
- JTAG/Boundary Scan configuration mode
- Master Serial Peripheral Interface (SPI) Flash configuration mode
- Master Byte Peripheral Interface Up (BPI Up) Flash configuration mode (x8 and x16 only)
- Master Byte Peripheral Interface Down (BPI Down) Flash configuration mode (x8 and x16 only)

Serial Configuration Interface [31]

In serial configuration modes, the FPGA is configured by loading one configuration bit per CCLK cycle:
- In Master Serial mode, CCLK is an output.
- In Slave Serial mode, CCLK is an input.

Figure 27 shows the basic Virtex 5 serial configuration interface. There are four methods of configuring an FPGA in serial mode:
- Master serial configuration
- Slave serial configuration
- Serial daisy chain configuration
- Ganged serial configuration

Figure 27 Virtex 5 FPGA Serial Configuration Interface. Taken from [31]


Table 15 Virtex 5 FPGA Serial Configuration Interface Pins

| Pin name | Type | Dedicated or Dual Purpose | Description |
|---|---|---|---|
| M[2:0] | Input | Dedicated | Mode pins determine the configuration mode. They can be set via special DIP switches on the evaluation board. |
| CCLK | Input or Output | Dedicated | Configuration clock source for all configuration modes except JTAG. For Master serial it is an output. |
| D_IN | Input | Dedicated | Serial configuration data input, synchronous to the rising CCLK edge. |
| DOUT_BUSY | Output | Dedicated | Serial data output for downstream daisy chained devices. It is left unconnected in our design. |
| DONE | Bidirectional, Open Drain or Active | Dedicated | Active High signal indicating configuration is complete: 0 = FPGA not configured, 1 = FPGA configured. Refer to the BitGen section of the Development System Reference Guide for software settings. |
| INIT_B | Input or Output, Open Drain | Dedicated | Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay configuration. After the Mode pins are sampled, INIT_B is an open drain active Low output indicating whether a CRC error occurred during configuration: 0 = CRC error, 1 = No CRC error. |
| PROGRAM_B | Input | Dedicated | Active Low asynchronous full chip reset. |

Figure 28 shows how configuration data is clocked into Virtex 5 devices in Master Serial mode.

Figure 28 Serial Configuration Clocking Sequence. Taken from [31]

Notes relevant to Figure 28:
1. Bit 0 represents the MSB of the first byte. For example, if the first byte is 0xAA (1010_1010), bit 0 = 1, bit 1 = 0, bit 2 = 1, etc.
2. For Master configuration mode, CCLK does not transition until after the Mode pins are sampled, as indicated by the arrow.


3. CCLK can be free running in Slave serial mode.

The Master Serial mode is designed so that the FPGA can be configured from a Xilinx configuration PROM, as shown in Figure 29.

Figure 29 Master Serial Mode Configuration. Taken from [31]

Notes relevant to Figure 29:
1. The DONE pin is by default an open drain output requiring an external pull up resistor. The DONE pin has a programmable active driver. To enable it, enable the Drive DONE option in BitGen.
2. The INIT_B pin is a bidirectional, open drain pin. An external pull up resistor is required.
3. The BitGen startup clock setting must be set for CCLK for serial configuration.
4. The PROM in this diagram represents one or more Xilinx PROMs. Multiple Xilinx PROMs can be cascaded to increase the overall configuration storage capacity.
5. The BIT file must be reformatted into a PROM file before it can be stored on the Xilinx PROM.
6. On some Xilinx PROMs, the reset polarity is programmable. RESET should be configured as active Low when using this setup.
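Note 1 for Figure 28 states that bit 0 of the serial stream is the MSB of the first byte. The following minimal sketch models that ordering in software; the function name is hypothetical and the code only illustrates the bit order, one bit per CCLK cycle, not the electrical interface.

```python
from typing import List


def serialize_msb_first(data: bytes) -> List[int]:
    """Return the bits of `data` in the order they would be shifted into D_IN."""
    bits = []
    for byte in data:
        # Walk from bit 7 (MSB) down to bit 0 (LSB) of each byte.
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits
```

For the byte 0xAA this gives `[1, 0, 1, 0, 1, 0, 1, 0]`, which matches bit 0 = 1, bit 1 = 0, bit 2 = 1 in the note.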
