SEVA: A Soft-Error- and Variation-Aware Cache Architecture

Luong D. Hung, Masahiro Goshima, Shuichi Sakai
Graduate School of Information Science and Technology, The University of Tokyo
Hongo 7-3-1, Bunkyo, Tokyo, Japan
{hung,goshima,sakai}@mtl.t.u-tokyo.ac.jp

Abstract

As SRAM devices are scaled down, the number of variation-induced defective memory cells increases rapidly. Combining ECC, particularly SECDED, with a redundancy technique can effectively tolerate a high number of defects. However, while SECDED can repair a defective cell in a block, the block then becomes vulnerable to soft errors. This paper proposes SEVA, an original soft-error- and variation-aware cache architecture. SEVA exploits SECDED to tolerate variation-induced defects while preserving high resilience against soft errors. Information about defectiveness and data dirtiness is maintained for each SECDED block. SEVA allows only clean data to be stored in defective (but still usable) blocks of a cache. An error occurring in a defective block can be detected, and the correct data can be obtained from the lower level of the memory hierarchy. SEVA improves yield and reliability with low overheads.

1 Introduction

SRAM designs are confronted with two serious problems: soft errors and variation-induced defects.

Soft errors are radiation-induced transient errors [4]. The soft error rate (SER) per bit of SRAM is expected to stay steady over process generations [1]. However, device scaling allows the number of bits integrated on a chip to increase exponentially, so the total SER rises rapidly. Tolerance against soft errors is therefore essential. Error Correcting Code (ECC), particularly the Single Error Correction Double Error Detection (SECDED) Hamming code, is widely employed to detect and correct soft errors in SRAMs.

Process variation causes spreads in the electrical characteristics of scaled devices. The effect is pronounced in SRAMs, where minimum-geometry transistors are used.
The number of variation-induced defective memory cells becomes high with device scaling [6][2], which makes conventional redundancy techniques (e.g., redundant rows/columns) impractical. Combining a redundancy technique with ECC can tolerate a high degree of random defects [13][14]. A block containing a single defective cell can be repaired by ECC. With much lower probability, a block may contain multiple defective cells that exceed the error detection and/or correction capability of ECC. Only such a block is replaced by a redundancy element. With this combined approach, a small number of redundancy elements can be sufficient.

It would be cost-effective if the same ECC resource could be used to tolerate both defects and soft errors. However, while a defective cell in a block can be tolerated by SECDED, the block becomes vulnerable to soft errors: soft errors occurring in the block could be left undetected and/or uncorrected. The error detection and correction capability of ECC can be enhanced by using codes that are more powerful than SECDED, but these codes incur significant overheads and are impractical to implement in high-speed SRAMs. Previous work therefore improves defect tolerance at the expense of degraded soft error tolerance [13][14].

This paper proposes SEVA, a Soft Error and Variation Aware cache architecture. SEVA exploits SECDED to tolerate variation-induced defects while preserving high resilience against soft errors. Information about defectiveness and data dirtiness is maintained for each SECDED block. SEVA allows only clean data to be stored in defective (but still usable) blocks. Soft errors cannot cause data integrity problems in these blocks because SECDED is still able to detect the errors. When soft errors are detected, the correct data can be obtained from the lower levels of the memory hierarchy. SEVA improves yield and reliability at modest cost.

The rest of this paper is organized as follows. Section 2 discusses the existing techniques for tolerating soft errors and defects in SRAMs, as well as their limitations. Section 3 describes the SEVA architecture. Section 4 presents defect and yield analysis for a SEVA cache. Section 5 presents the performance and reliability evaluations. Finally, Section 6 concludes the paper.

2 Preliminary

This section covers soft errors and variation-induced defects, as well as existing techniques to mitigate them. In this paper, a defect refers to a permanent hardware fault, while an error refers to a soft error. A block refers to either an ECC dataword (including information bits and check bits) or the storage portion holding such data.

2.1 Soft Error Tolerance in SRAMs

A collision of a high-energy particle (e.g., an alpha particle or a neutron) with silicon atoms induces charges. If enough charge is collected at the diffusion nodes of a gate, the gate output can be altered. Particle strikes in SRAMs can therefore result in bit upsets. While the soft error rate (SER) per bit is expected to stay steady over process generations [1], device scaling allows the number of bits integrated on a chip to increase exponentially, so the total SER rises rapidly. Tolerance against soft errors in SRAMs is therefore essential.

Thanks to the high regularity of memory arrays, soft errors in SRAMs can be effectively tolerated by using coding techniques. Parity can be used where error detection is sufficient (e.g., instruction caches, write-through data caches). ECC, particularly SECDED, is widely used to detect and correct soft errors in caches holding dirty data (e.g., write-back caches). SECDED can correct single-bit errors and detect double-bit errors. The encoding and decoding circuitry of SECDED can be constructed as simple XOR trees and is quite fast.

While a single-bit error (SBE) in a block can be detected and corrected successfully by SECDED, a multi-bit error (MBE) may also occur in a block. An MBE can be caused by 1) accumulation of multiple single-bit errors over time, or 2) a particle strike corrupting multiple bits at once. We call the former case a temporal MBE and the latter case a spatial MBE.
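To make the SECDED behaviour described above concrete, the following sketch implements a toy extended Hamming (8,4) code. This is an illustration only: cache blocks in this paper use 64 information bits, and the function names and bit layout here are assumptions, not taken from the paper. The sketch shows a single-bit error being corrected and a double-bit error being detected but not corrected, and it computes the number of SECDED check bits from the standard Hamming bound.

```python
def secded_check_bits(info_bits):
    """Check bits of a SECDED Hamming code for `info_bits` information bits.

    Uses the Hamming bound (smallest r with 2**r >= k + r + 1) plus one
    overall parity bit for double-error detection.
    """
    r = 1
    while 2 ** r < info_bits + r + 1:
        r += 1
    return r + 1


def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions (1, 2, 4 are parity, 3, 5, 6, 7 are data).
    """
    d1, d2, d3, d4 = data
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    code[2] = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    code[4] = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, data) where status is 'ok', 'corrected',
    or 'double_error_detected'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(code) % 2          # 0 for a valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1          # syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        status = "double_error_detected"
    return status, [code[3], code[5], code[6], code[7]]


def flip(code, *positions):
    """Return a copy of `code` with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    cw = encode(word)
    print(decode(flip(cw, 5)))       # one flipped bit
    print(decode(flip(cw, 2, 5)))    # two flipped bits
    print(secded_check_bits(64))     # check bits for 64 information bits
```

In this toy model, a permanently defective cell behaves like a bit that is always flipped, which is why a block with one defect has no correction capability left for a later soft error.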
Since soft errors are infrequent events and SRAMs are frequently accessed, an error caused by one strike is very likely to be detected and corrected before the next strike occurs. The probability of a temporal MBE is therefore very low. The probability of a particle strike resulting in a spatial MBE increases as device dimensions and supply voltages shrink [10]. Since the error bits of a spatial MBE are located close to each other, combining ECC with an interleaving technique can tolerate spatial MBEs [11]. By interleaving different blocks on the same row, the error bits are dispersed over multiple blocks. Each error bit then belongs to a different block and can be detected and corrected successfully by SECDED.

Table 1. Variation of $V_{th}$ over process generations (columns: Technology (nm), $V_{th}$ (V), $\sigma_{V_{th}}$ (mV))

2.2 Variation-induced Defects and Defect Tolerance in SRAMs

2.2.1 Variation-induced Defects

As processes scale down, precise control of device parameters such as gate width, gate length, and the number and locations of dopant atoms becomes increasingly difficult in manufacturing. This results in considerable variations in the electrical characteristics of devices. Table 1 shows the variation of threshold voltage ($V_{th}$) over process generations, as predicted by ITRS [1]. In particular, fluctuations in the number and locations of dopant atoms in the channel region of a transistor are known to be the dominant source of $V_{th}$ variation in scaled devices [8][15].

Variation is pronounced in SRAMs, where minimum-geometry transistors are used. Figure 1 shows the schematic of an SRAM cell. The cell consists of two cross-coupled CMOS inverters and two N-type access transistors. Large mismatches in the strengths of the transistors in an SRAM cell can make the cell fail to function. Three types of failures can occur in an SRAM cell: read, write, and access time failures.

[Figure 1. Schematic of an SRAM cell]

Read Failure.
Read failure is defined as the flipping of data while reading an SRAM cell. The voltage $V_B$ of node B (the node storing 0 in Figure 1) is raised from zero to $V_{read}$ due to a voltage divider between BL (precharged at $V_{DD}$) and GND through T1 and T3. If $V_{read}$ is larger than the trip voltage $V_{trip}$ of the {T4, T6} inverter, the cell flips, resulting in a read failure. Variations in $V_{th}$ of T1 and T3 (or T6 and T4) lead to large variations in $V_{read}$ (or $V_{trip}$).

Write Failure. Write failure is defined as the inability to successfully write to a cell. When writing 0 to node B, which originally stores 1, $V_B$ develops to $V_{write}$ due to a voltage divider between BL at GND and $V_{DD}$ through T2 and T6. If $V_{write}$ is larger than $V_{trip}$ of the {T3, T5} inverter, the write fails. Since T2 and T6 (also T1 and T5) are typically the smallest transistors in the cell, $V_{th}$ variations in these transistors cause large variations in $V_{write}$, resulting in a high probability of write failure [6].

Access Time Failure. The access time ($T_{access}$) is defined as the time required to develop a predefined voltage difference between the two bitlines (BL and its complement). When node B stores 0, BL discharges through T1 and T3 in a read operation. The discharge speed depends on the strengths of T1 and T3. $V_{th}$ variations in these transistors cause a spread in $T_{access}$. An access fails if $T_{access}$ is larger than the maximum tolerable limit $T_{limit}$.

Table 2 shows the failure probability of a cell in a 45nm process for different amounts of $V_{th}$ variation [2]. $P_{RF}$, $P_{WF}$, and $P_{AF}$ are respectively the probabilities of read, write, and access time failures. $P_{Fault}$ is the total failure probability. The failure probability is quite high and highly sensitive to $V_{th}$ variation.

Table 2. Probability of cell failure versus $\sigma_{V_{th}}$ in a 45nm process (columns: $\sigma_{V_{th}}$ (mV), $P_{RF}$, $P_{WF}$, $P_{AF}$, $P_{Fault}$)

Table 3. Number of check bits required for different information-bit lengths (columns: Information-bit length, SECDED, BCH DEC)

2.2.2 Defect Tolerance Techniques

Redundancy techniques have been used to tolerate manufacturing defects in SRAMs and improve yield. SRAMs go through a test process after fabrication.
A test algorithm applies input data in particular patterns to the SRAM and examines the output data to locate defects. Common defects in SRAMs are malfunctioning cells, stuck-at faults, and bridging faults. Once the defects have been identified, a repair algorithm is executed to replace defective locations with redundancy elements. Traditionally, external equipment has been used extensively to test the chips, to localize defects, and to drive a laser beam or blow fuses or anti-fuses to perform the repair. However, external test and repair is costly. Modern SRAMs typically include Built-In Self Test (BIST) and Built-In Self Repair (BISR) to keep the test and repair cost reasonable [17].

Conventional redundancy techniques usually use rows and/or columns as the redundancy elements. However, the high defect densities in scaled processes make such coarse-grain redundancy techniques impractical. In [2], a row contains several cache lines, and an individual cache line, rather than an entire row, can be replaced through modifications to the column multiplexers. Maintaining redundancy at the word level (e.g., 32-bit) allows more efficient utilization of redundant resources [5][12]. Nevertheless, such fine-grain redundancy techniques are still inadequate for high defect densities. For instance, assuming a defect probability per cell of 0.001, a 256-KB SRAM will have roughly 2K defective words (about 3% of its 64K words, if words are 32 bits). The BISR and decoder circuitry needed to support the replacement of such a high number of defective words is complex. In particular, the Content-Addressable Memory (CAM), which is usually used in BISR to store the addresses of the defective words, becomes excessively large and degrades the access latency.

There have been proposals that combine a redundancy technique with ECC to tolerate a high number of random defects [13][14]. ECC, usually SECDED, is maintained for each data block. When a single defective cell is present in the block, the defective cell can be corrected by SECDED.
Only when more than one defective cell is present in the same block is the block replaced by a redundancy element. The probability of the latter case is much lower than that of the former, so the number of redundancy elements can be kept low.

2.3 Limitation of Existing Tolerance Techniques

Defect tolerance and soft-error tolerance have usually been treated as separate topics. ECC has proved effective for tolerating either high defect densities or soft errors. It would be cost-effective if the same ECC resource could be used for both purposes. However, while a defective cell present in a block can be tolerated by SECDED, the block becomes vulnerable to soft errors. A single-bit error in the block is detectable but uncorrectable. A multi-bit error in the block can be undetectable, or detectable but incorrectly corrected.

Error detection and correction capability can be improved by using codes that are more powerful than SECDED. For instance, if a Double Error Correction (DEC) code is used, a defective cell present in a block can be tolerated while a single-bit error occurring in the block can still be detected and corrected. Some conventional DEC codes described in the literature are Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon codes [16]. However, their overheads are considerably larger than those of SECDED. Table 3 shows the number of check bits required to implement SECDED and BCH DEC for various information-bit lengths. DEC increases the number of check bits significantly. Moreover, the encoding and decoding circuitry of DEC usually employs multi-bit Linear Feedback Shift Registers (LFSRs), which introduce inordinate access delay and make DEC unsuitable for SRAMs requiring fast access.

Therefore, it is important to use SECDED as the ECC of choice. Existing work [13][14] that uses SECDED has the limitation that it improves defect tolerance at the expense of degraded tolerance against soft errors. SEVA overcomes this limitation: it utilizes SECDED to tolerate variation-induced defects while preserving high resilience against soft errors.

3 SEVA Architecture

A SEVA cache consists of several subarrays. A subarray has several rows, and each row comprises several blocks. Each subarray has its own decoders, BIST, BISR, and some redundancy rows. BIST detects the defective cells in the subarray. Blocks are classified based on the number of defective cells they contain. A block that does not have any defective cell is called a good block (g-block). A block that has exactly one defective cell is called a tolerable block (t-block). A block that has more than one defective cell is called a bad block (b-block). A row containing at least one b-block is a bad row. BISR replaces bad rows with redundancy rows.

SEVA associates each block with a g-bit. The g-bit of a block is set (or reset) if the block is a g-block (or t-block). Defect analysis and setting of g-bits are performed by BIST every time the system is booted.
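The block classification above can be sketched in a few lines. This is a minimal illustration that assumes BIST has already produced a per-block count of defective cells; the function names and the list-of-lists representation are hypothetical and are not part of the paper.

```python
def classify_block(defective_cells):
    """Classify a block by the number of defective cells it contains."""
    if defective_cells == 0:
        return "g"   # good block: SECDED fully available for soft errors
    if defective_cells == 1:
        return "t"   # tolerable block: SECDED repairs the defect
    return "b"       # bad block: beyond SECDED, its row must be replaced


def analyze_subarray(defect_counts):
    """Derive g-bits and bad rows from BIST results.

    defect_counts[row][blk] is the number of defective cells found in
    that block (an assumed representation of the BIST output).
    Returns (g_bits, bad_rows). g_bits[row][blk] is True for a g-block
    and False otherwise; g-bits of a bad row are meaningless because
    BISR replaces the whole row with a redundancy row.
    """
    g_bits = []
    bad_rows = []
    for row_idx, row in enumerate(defect_counts):
        kinds = [classify_block(n) for n in row]
        if "b" in kinds:
            bad_rows.append(row_idx)
        g_bits.append([kind == "g" for kind in kinds])
    return g_bits, bad_rows


if __name__ == "__main__":
    g_bits, bad_rows = analyze_subarray([[0, 1, 0], [0, 2, 0], [0, 0, 0]])
    print(g_bits)
    print(bad_rows)
```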
Alternatively, the boot-time overhead of this BIST pass can be reduced by storing the g-bit values in non-volatile storage and reloading them at reboot.

SEVA interleaves multiple blocks in the same row to disperse the error bits of a spatial MBE. We therefore assume that a block contains at most one error bit and focus on how to deal with a single-bit error in a block. If the block is a g-block, SECDED can be used exclusively for soft error tolerance, so the error is detectable and correctable. However, if the block is a t-block, part of the SECDED capability is used for repairing the defective cell, leaving reduced capability for soft error tolerance. The error in this case is detectable but uncorrectable. If the corrupted data is clean, error detection is sufficient, since the correct data can be obtained from the lower levels of the memory hierarchy. A data integrity problem occurs if the corrupted data is dirty and has no backup elsewhere.

SEVA prevents such a data integrity problem by storing only clean data in t-blocks. When a data update arrives from an upper-level cache (or the processor) at a t-block of a cache, the data is also written to the next level of the memory hierarchy. Such an update is referred to as an assurance update. An assurance update also occurs if a modification goes to a cache line whose tag is held in a t-block. Assurance updates increase the number of accesses to the next level of the memory hierarchy and affect performance and power consumption.

In conventional caches, the dirtiness of data is maintained at the cache line level. When a cache line is written back, all of its constituent blocks are written back, whether or not they have been modified. By maintaining the dirtiness of data at the block level, unnecessary writebacks of unmodified blocks can be eliminated. In SEVA, each block is associated with a d-bit.
The d-bit of a block is set (or reset) if the block stores dirty (or clean) data. When a cache line is written back, only those blocks whose d-bits are set are sent to the next level of the memory hierarchy. The d-bits are then reset to indicate that the data in the blocks is now clean. Reducing the number of blocks written back also reduces the probability of assurance updates being triggered at the next level of the memory hierarchy.

Figure 2 shows a simplified architecture of a 256KB, four-way set-associative SEVA cache. The cache line size is 64B. The cache is constructed from a tag subarray and eight 32-KB data subarrays. Each data subarray has 512 rows, and each row comprises eight 72b blocks. The tag subarray has 256 rows, and each row comprises sixteen 45b blocks. A tag entry has a tag address, some state bits (for LRU replacement and cache coherency), and the g-bits and d-bits of all blocks belonging to the corresponding cache line.

The tag access usually precedes the data access in a typical cache. By storing the g-bits and d-bits of the data-array blocks in the corresponding entry of the tag array, two decisions can be made at the end of the tag access: 1) whether an assurance update is needed upon a cache write, and 2) which blocks are dirty and need to be written back upon a line eviction or an assurance update. Accesses to unmodified blocks in a cache line can be skipped, thereby saving power. Interleaving the tags of multiple sets in the same row of the tag subarray tolerates MBEs while allowing the tags of the same set to be read concurrently on an access.
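The write and eviction rules of this section can be summarized in a small behavioural model. The sketch below is not the authors' hardware design: the class name, the list used as a stand-in for the next memory level, and the decision to treat a write to a line whose tag sits in a t-block exactly like a write to a t-block are all assumptions made for illustration.

```python
class SevaLine:
    """Behavioural model of one cache line made of SECDED blocks.

    g_bits[i] is True if data block i is a g-block, False for a t-block.
    tag_g_bit is True if the block holding this line's tag is a g-block.
    """

    def __init__(self, g_bits, tag_g_bit=True):
        self.g_bits = list(g_bits)
        self.tag_g_bit = tag_g_bit
        self.d_bits = [False] * len(self.g_bits)
        self.data = [0] * len(self.g_bits)

    def write(self, blk, value, next_level):
        """Write `value` to block `blk`; `next_level` is a list that
        records traffic sent to the next level of the hierarchy."""
        self.data[blk] = value
        if self.g_bits[blk] and self.tag_g_bit:
            # g-block: dirty data is safe, SECDED can still correct an SBE.
            self.d_bits[blk] = True
        else:
            # t-block (data or tag): assurance update keeps the block clean.
            next_level.append(("assurance", blk, value))
            self.d_bits[blk] = False

    def evict(self, next_level):
        """Write back only the blocks whose d-bits are set, then reset them."""
        for blk, dirty in enumerate(self.d_bits):
            if dirty:
                next_level.append(("writeback", blk, self.data[blk]))
                self.d_bits[blk] = False


if __name__ == "__main__":
    traffic = []
    line = SevaLine([True, False, True])
    line.write(0, 11, traffic)   # g-block: becomes dirty, no traffic
    line.write(1, 22, traffic)   # t-block: assurance update
    line.evict(traffic)          # only block 0 is written back
    print(traffic)
```

Keeping the dirty state per block is what lets the eviction skip block 2 in this example, which was never written.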

[Figure 2. Architecture of the 256KB SEVA cache. A tag entry holds a 14-bit tag, 6 state bits, 9 g-bits, and 9 d-bits (38 information bits) plus 7 check bits, forming a 45b block. A data SECDED block holds 64 information bits plus 8 check bits (72b). The tag subarray has 256 rows of 720b, each row holding the tags of the same set for several interleaved sets. Each of the eight data subarrays (#0 to #7) has 512 rows of 576b. Every subarray has its own decoder, BIST, BISR, and spare rows.]

4 Defect and Yield Analysis

In this section, we perform defect and yield analysis of a SEVA cache. Our analysis focuses on variation-induced defects, which are the dominant type of defects in scaled devices (redundancy rows can also be used to deal with other types of defects, such as stuck-at faults at wordlines and decoders). We assume that defects are randomly distributed and that $\lambda$ is the probability that a cell is defective.

Since a SEVA cache consists of multiple subarrays, we start with the defect and yield analysis of a generalized subarray. The subarray has $N_{row}$ rows. Each row contains $N_{blk}$ blocks. Each block has $BS$ bits, including information bits and check bits. The subarray has $N_{rdrow}$ redundancy rows. The probabilities that a block is a g-block, t-block, or b-block are respectively $P_{g\_blk}$, $P_{t\_blk}$, and $P_{b\_blk}$, and are given by

$$P_{g\_blk} = (1 - \lambda)^{BS} \tag{1}$$

$$P_{t\_blk} = BS \, (1 - \lambda)^{BS-1} \, \lambda \tag{2}$$

$$P_{b\_blk} = 1 - P_{g\_blk} - P_{t\_blk} \tag{3}$$

The probability that a row is a bad row, $P_{b\_row}$, is given by

$$P_{b\_row} = 1 - (1 - P_{b\_blk})^{N_{blk}} \tag{4}$$

The array is passable if the number of bad rows, $N_{b\_row}$, is at most the number of redundancy rows that are not bad rows, $N_{nb\_rdrow}$. The yield of the array can be expressed as

$$Yield_{array} = \mathrm{Prob}(N_{b\_row} \le N_{nb\_rdrow}) \tag{5}$$

We can simulate the yield of the subarray by assuming that $N_{b\_row}$ and $N_{nb\_rdrow}$ follow Poisson distributions whose means are given by

$$N_{b\_row} = N_{row} \, P_{b\_row} \tag{6}$$

$$N_{nb\_rdrow} = N_{rdrow} \, (1 - P_{b\_row}) \tag{7}$$

The yield of the cache is the product of the yields of the subarrays:

$$Yield_{cache} = \prod_{\text{all subarrays}} Yield_{subarray} \tag{8}$$

4.1 Results

We perform yield simulation for the SEVA cache shown in Figure 2. The yields of the tag and data subarrays are calculated separately. The configuration parameters {$N_{row}$, $N_{blk}$, $BS$} of the tag and data subarrays are respectively {256, 16, 45} and {512, 8, 72}. The defect probability is varied over a range of values. We also vary the number of redundancy rows to investigate its impact on the yield of the subarray. The yields of the tag and data subarrays are respectively shown in the left and right halves of Figure 3. When the defect probability is low, the subarrays can achieve high yield with very few redundancy rows. As the defect probability becomes high (for example, 0.001), the number of redundancy rows required to achieve adequate yield increases appreciably.

Table 4. Yield and overhead of the subarrays and the cache (columns: $\lambda$, Tag subarray (yield / $N_{rdrow}$), Data subarray (yield / $N_{rdrow}$), Cache yield, Overhead (%))

Table 4 lists the yield and number of redundancy rows in the tag and data subarrays needed to obtain roughly 95% cache yield. The hardware overhead of SEVA is also shown in the table. The overhead is the total number of SECDED check bits, g-bits, d-bits, and bits consumed by redundancy rows, divided by the number of memory bits in the original cache. The overhead of SECDED is constant, while the overhead of redundancy rows grows with the defect probability. Even when the defect probability is as high as 0.001, SEVA can achieve 95% yield with 2.8% overhead.

5 Performance and Reliability Evaluation

This section considers the application of SEVA caches to a processor's memory hierarchy. The performance overhead and reliability improvement are evaluated.

5.1 Evaluation Methodology

The simulated system is an out-of-order superscalar processor whose configuration parameters are shown in Table 5. The processor has 16-KB instruction and data caches and a 256-KB unified L2 cache. The data cache and L2 cache are writeback caches, each equipped with a two-entry writeback buffer. Upon a line eviction or an assurance update, the data is temporarily stored in the writeback buffer. The buffer writes back the data when the bus to the next level of the memory hierarchy is free. SECDED is maintained for every 64 information bits in the L1 caches and the L2 cache. We assume a single defect probability $\lambda$ for all SRAM cells. We also assume that all bad rows in the caches are replaced by redundancy rows. The probability that a block is a t-block is calculated using Equations (1) and (2).
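The analysis of Section 4 (Equations 1 to 8) can be evaluated numerically, as in the sketch below. It is not the authors' simulator: it assumes that the two Poisson-distributed counts are independent, truncates the Poisson sums at a hypothetical `k_max`, and uses function names chosen for illustration. The spare-row counts in the usage example are arbitrary.

```python
import math


def block_probs(lam, bs):
    """Equations (1)-(3): probabilities of a g-block, t-block, b-block."""
    p_g = (1.0 - lam) ** bs
    p_t = bs * (1.0 - lam) ** (bs - 1) * lam
    p_b = max(0.0, 1.0 - p_g - p_t)   # clamp tiny negative rounding error
    return p_g, p_t, p_b


def bad_row_prob(lam, bs, n_blk):
    """Equation (4): a row is bad if any of its blocks is a b-block."""
    p_b = block_probs(lam, bs)[2]
    return 1.0 - (1.0 - p_b) ** n_blk


def poisson_pmf(k, mean):
    """Poisson probability mass, computed in log space for stability."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def subarray_yield(lam, n_row, n_blk, bs, n_rdrow, k_max=400):
    """Equations (5)-(7): Prob(N_b_row <= N_nb_rdrow).

    Assumes the two counts are independent Poisson variables and
    truncates both sums at k_max (an assumption of this sketch).
    """
    p_b_row = bad_row_prob(lam, bs, n_blk)
    mean_bad = n_row * p_b_row                 # Equation (6)
    mean_spare = n_rdrow * (1.0 - p_b_row)     # Equation (7)
    total = 0.0
    cdf_bad = 0.0
    for k in range(k_max + 1):
        cdf_bad += poisson_pmf(k, mean_bad)    # P(N_b_row <= k)
        total += poisson_pmf(k, mean_spare) * cdf_bad
    return min(total, 1.0)


def cache_yield(lam, subarrays):
    """Equation (8): product of subarray yields.

    `subarrays` is a list of (n_row, n_blk, bs, n_rdrow) tuples.
    """
    y = 1.0
    for n_row, n_blk, bs, n_rdrow in subarrays:
        y *= subarray_yield(lam, n_row, n_blk, bs, n_rdrow)
    return y


if __name__ == "__main__":
    lam = 0.001
    print(block_probs(lam, 72))
    # One tag subarray and eight data subarrays; spare-row counts are
    # hypothetical, chosen only to exercise the functions.
    config = [(256, 16, 45, 16)] + [(512, 8, 72, 32)] * 8
    print(cache_yield(lam, config))
```

The t-block probability returned by `block_probs` is the quantity used to initialize the g-bits in the evaluation that follows.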
The g-bits are randomly initialized at the beginning of the simulation based on this probability. We consider only defects in the SRAM caches and assume the memory to be fault-free in the evaluation. The following two systems are evaluated:

Baseline system: The L1 data cache and L2 cache are normal caches, which do not perform assurance updates and maintain dirtiness per cache line.

SEVA system: The L1 data cache and L2 cache are SEVA caches, which perform assurance updates and maintain dirtiness per block.

Table 5. Parameters of Simulated Architecture

Processor Parameters
  Frequency          GHz
  Functional Units   4 integer ALUs, 4 FP ALUs, integer multiplier/divider, FP multiplier/divider
  LSQ size           8 instructions
  RUU size           6 instructions
  Issue Width        4 instructions/cycle

Memory Hierarchy Parameters
  L1 i-cache         16 KB, direct-map, 32B line, cycle latency, 72b SECDED block
  L1 d-cache         16 KB, 4-way, 32B line, cycle latency, writeback, 72b SECDED block, 2-entry writeback buffer
  L2 cache           256KB, unified, 4-way, 64B line, 6 cycle latency, 72b SECDED block, 2-entry writeback buffer
  Memory             100 cycle latency

Cycle-accurate processor simulation is performed using the SimpleScalar toolset [7]. SPEC2000 benchmarks are used in the simulation. For each benchmark, we skip the first one billion instructions and simulate the next four billion instructions. We assume that the soft error rate of an unprotected SRAM is .6 KFIT per megabit [3] and that soft errors follow a uniform distribution. Accesses to the cache lines (in the baseline system) or blocks (in the SEVA system) of the caches are recorded. This information allows us to calculate the error rates of the caches.

5.2 Evaluation Results

Figure 4 shows the relative performance of the baseline system and the SEVA system. The assurance updates of a SEVA cache consume access bandwidth to the next level of the memory hierarchy and affect performance. However, the performance degradation is very small (less than 0.%).
This is thanks to the writeback buffer, which effectively removes an assurance update from the critical path of a cache access. Interestingly, the SEVA system improves performance for the mcf benchmark. The improvement in this case can be attributed to the early writeback effect [9]. If the writeback buffer is full and a dirty line must be evicted on a cache miss, an entry in the buffer must be written back first, and this writeback adds to the latency of the cache access. Writing back dirty lines early through assurance updates can reduce such overheads.

Footnote: One FIT (Failure In Time) corresponds to one failure per $10^9$ hours.

[Figure 3. The yields of the tag subarray (left half) and data subarray (right half)]

[Figure 4. Performance degradation of the SEVA system compared to the baseline system]

Figure 5 shows the average number of accesses to the L2 cache and memory per one thousand instructions. On average, the SEVA system increases the numbers of accesses to the L2 cache and memory by .3 and 6.4%, respectively. The breakdowns of the accesses are also shown in the figure. Assurance updates and writebacks account for 25.5% (or 27.2%) of accesses to the L2 cache (or memory). By allowing the L1 data cache (or L2 cache) to send only the dirty blocks to the L2 cache (or memory), the number of blocks sent can be reduced by 44% (or 60%), compared with the case where all the blocks in the cache lines are sent back to the L2 cache (or memory). The results confirm the effectiveness of maintaining data dirtiness per block in SEVA caches for eliminating unnecessary block updates and improving power efficiency.

Table 6 shows the uncorrectable error rate of the L2 caches. For the baseline cache, an uncorrectable error occurs if a strike occurs on a t-block of a dirty cache line and the cache line is read later. For the SEVA cache, an uncorrectable error occurs if two strikes occur on the same t-block between two consecutive accesses to the block. The error rate of the SEVA cache is many orders of magnitude lower than that of the baseline L2 cache.

Table 6. Uncorrectable error rate of 256KB L2 caches

  Cache      Uncorrectable Error Rate (FIT)
  Baseline   63
  SEVA       3.84e-4

6 Conclusions

Tolerating soft errors for high reliability and tolerating variation-induced defects for yield improvement are both essential in advanced SRAM designs. The proposed SEVA cache architecture satisfies both requirements.
By combining SECDED with a redundancy technique, SEVA can effectively tolerate high defect densities. The assurance update mechanism, aided by the newly added g-bits and d-bits, efficiently eliminates uncorrectable soft errors in defective (but still usable) blocks. The evaluation results verify the effectiveness of SEVA caches. Even with a high defect probability (as high as 0.001), SEVA caches can achieve high yield with low area overheads. The performance overheads caused by assurance updates in SEVA caches are very low.

[Figure 5. Accesses to L2 cache and memory per 1,000 instructions]

Acknowledgement

This research is partially supported by Grant-in-Aid for Fundamental Scientific Research B(2) # , B(2) # from the Ministry of Education, Culture, Sports, Science and Technology Japan, a CREST project of the Japan Science and Technology Corporation, and by a 21st century COE project of the Japan Society for the Promotion of Science.

References

[1] International Technology Roadmap for Semiconductors.
[2] A. Agarwal, B. C. Paul, S. Mukhopadhyay, and K. Roy. Process Variation in Embedded Memories: Failure Analysis and Variation Aware Architecture. IEEE Journal on Solid State Circuits, 40(9):1804-1814,
[3] R. Baumann. Soft Errors in Advanced Computer Systems. IEEE Design and Test of Computers, 22(3): ,
[4] R. C. Baumann. Soft Errors in Advanced Semiconductor Devices Part I: Three Radiation Sources. IEEE Transactions on Device and Materials Reliability, 1(1):17-22, 2001.
[5] A. Benso, S. Chiusano, G. D. Natale, P. Prinetto, and M. L. Bodoni. A family of self-repair SRAM cores. In Proc. IEEE International On-Line Testing Workshop, pages 24 28,
[6] A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, and K. Chin. Fluctuation Limits & Scaling Opportunities for CMOS SRAM Cells. In Proc. IEEE International Electron Devices Meeting, pages ,
[7] D. Burger and T. Austin. The SimpleScalar Tool Set. Technical Report CS-TR , University of Wisconsin-Madison, 1997.
[8] D. J. Frank, Y. Taur, M. Ieong, and Hon. Monte Carlo Modelling of Threshold Variation due to Dopant Fluctuations. In Proc. IEEE International Electron Devices Meeting, pages 93 94, 1999.
[9] H.-H. S. Lee, G. S. Tyson, and M. K. Farrens. Eager writeback - a technique for improving bandwidth utilization. In Proc. IEEE/ACM International Symposium on Microarchitecture, pages 2,
[10] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong. Characterization of Multi-bit Soft Error events in advanced SRAMs. In Proc. IEEE International Electron Devices Meeting, pages ,
[11] K. Osada, Y. Saitoh, and K. Ishibashi. 6.7-fA/Cell Tunnel-Leakage-Suppressed 6-Mb SRAM for Handling Cosmic-Ray-Induced Multierrors. IEEE Journal on Solid State Circuits, 38(): ,
[12] V. Schober, S. Paul, and O. Picot. Memory Built-in Self-Repair using Redundant words. In Proc. IEEE International Test Conference, pages , 2001.
[13] C. H. Stapper and H.-S. Lee. Synergistic Fault-Tolerance for Memory Chips. IEEE Transactions on Computers, 4(9): , 1992.
[14] C.-L. Su and Y.-T. Yeh. An Integrated ECC and Redundancy Repair Scheme for Memory Reliability Enhancement. In Proc. IEEE International Test Conference, pages 8 89,
[15] X. Tang, V. K. De, and J. D. Meindl. Intrinsic MOSFET Parameter Fluctuations Due to Random Dopant Placement. In Proc. IEEE International Electron Devices Meeting, pages , 1997.
[16] T. R. N. Rao and E. Fujiwara. Error-Control Coding for Computer Systems. Prentice Hall, Inc, 1989.
[17] Y. Zorian. Embedded infrastructure IP for SOC yield improvement. In Proc. IEEE/ACM Design Automation Conference, pages ,
