Adaptive ECC for Tailored Protection of Nanoscale Memory

Dongyeob Shin, Jongsun Park
Korea University, Seoul, Korea

Jangwon Park
Samsung Electronics, Suwon, Korea

Somnath Paul
Intel Corporation, Hillsboro, OR, USA

Swarup Bhunia
University of Florida, Gainesville, FL, USA

Abstract: Increasing run-time failure in nanoscale memory, specifically at low supply voltages, has emerged as a major challenge in current VLSI design. This paper presents a novel reconfigurable Error Correction Code (ECC) for robust nanoscale memory, which can dynamically adapt, in space and time, to varying reliability of memory blocks, thus providing the right amount of protection to a memory block at a given time. Our analysis shows that the proposed ECC scheme can efficiently tolerate high run-time failure rates with modest performance and area penalty. It can significantly enhance nanoscale memory reliability at iso-overhead compared to existing uniform ECC scheme.

Keywords: Memory Failures, Error Correction Code (ECC), Variable ECC, Run-time Protection, Robust Nanoscale Memory

I. INTRODUCTION

Post-silicon calibration and healing techniques have emerged as effective solutions for recovering from manufacturing defects or process variation induced failures in digital, analog and RF circuits/systems [] []. In case of nanoscale memories, aggressive area optimization in the quest of higher integration density has made them highly vulnerable to manufacturing defects as well as run-time failures. Built-in redundancy (e.g. in row/column) has been a well-adopted healing approach for memory to adapt to hard defects []. However, tolerance to runtime failures in memory remains a serious challenge for system-on-chip designs [] particularly in the sub-5nm technology regime. Increasing process variation in these process nodes largely aggravates run-time failure rate. Such failures can affect random or contiguous bit positions in a memory codeword.
They can be primarily caused by: 1) supply voltage or thermal noises, and 2) temporal device degradation due to aging effects [4]. In order to address multiple-bit runtime failures in on-chip memories, error correcting codes (ECC) such as single error correction and double error detection (SECDED) have been used together with bit-interleaving [5]. Bit-interleaving distributes contiguous errors into different words and facilitates error correction using SECDED. However, it typically incurs significant energy overhead and half-select issues due to pseudo-read operations.

It is worth noting that one of the primary drawbacks of the conventional uniform ECC approach is that the ECC protection is applied equally to all memory blocks. It fails to account for the distribution of vulnerability to run-time failures across memory blocks. The conventional, overly pessimistic uniform ECC allocation approach, where the error correction capability is based on the worst-case memory block vulnerability, generally wastes significant silicon area and leads to greater power consumption. With increasing spatial as well as temporal shift in the intrinsic reliability of memory blocks, such uniform protection approaches are unattractive in terms of overhead or level of protection.

This paper presents a novel reconfigurable ECC scheme for robust nanoscale memory, which can dynamically adapt in space and time to varying reliability of memory blocks. This is achieved by incorporating a reconfigurable ECC encoder and decoder with multiple protection capabilities during the design, and selecting them on demand during actual operation. In order to enhance the effectiveness of a multi-bit error tolerance scheme, we use a Bose-Chaudhuri-Hocquenghem (BCH) cyclic code, which is effective for random multi-bit correction at low hardware overhead. Our approach can provide the right amount of error correction capability to the individual memory blocks depending on their relative vulnerabilities to runtime failures without incurring large hardware and power overhead.

As a case study, the proposed time-varying ECC approach is applied on a low-power, supply-voltage-scalable MB L cache. In order to reduce the increasing number of errors in L cache due to voltage scaling, we propose a gradual voltage scaling scheme together with adaptive time duration control. We show that the proposed adaptive ECC approach provides a high level of reliability for the cache while maintaining its low-power advantage.

D. Shin and J. Park are with the School of Electrical Engineering, Korea University, 7B Innovation Hall, Seoul -7, Korea (phone/fax: /+89544, shindy99@korea.ac.kr and jongsun@korea.ac.kr). J. Park is with Samsung Electronics, Suwon 44-74, Gyeonggi, Korea (jw849.park@samsung.com). S. Paul is with Intel Corporation, N.E. 5th Ave, MS JF-55, Hillsboro, OR 974, USA (phone: +5754, somnath.paul@intel.com). S. Bhunia is with University of Florida, A Larsen Hall, Gainesville, FL, USA (phone/fax: / swarup@ece.ufl.edu).

Fig. : Overall scheme for the proposed variable error correction in nanoscale memory. The correction capability changes over space and time. T and W indicate the number of bits to be corrected and the codeword width, respectively. Two types of configurability for dynamic error correction in nanoscale memory array.

II. SPACE-TIME VARYING ECC

In this section, we first describe prior art on variable ECC and then present the basic concept of the adaptive ECC approach.

A. Related Work on Variable ECC

1) Reliability-Driven ECC Allocations for Adaptive Error Protection: A reliability-driven ECC allocation scheme, where the relative vulnerability of a memory block (determined using post-fabrication characterization) is matched with appropriate ECC protection, has been proposed in []. In this approach, post-fabrication variable ECC allocation to different memory blocks is achieved by storing the check bits in the ways of an associative cache. This work also presents efficient circuit/architecture-level optimizations of the ECC encoding/decoding logic to minimize the impact on area, performance, and energy.

2) Bit-width Reconfigurable ECC: Based on the fact that the most significant bits (MSBs) are significantly more important than the least significant bits (LSBs) in digital signal processing (DSP) applications with respect to output data quality, a bit-width reconfigurable ECC [7] is designed with extra control units for dynamically changing the input data length. When the number of memory failures in a codeword exceeds the maximum correctable number of bits during low voltage operation, the input data length of the ECC is reduced to focus on the more important MSB parts. As a result, the correction of failures on MSBs can be ensured even at low supply voltages, and the overall system quality degradation caused by SRAM failures can be minimized since uncorrected LSB failures have a much less prominent effect on the system output.

B. Space-Time Varying ECC

With inter- and intra-die process variations, different sections of a memory array move to different process corners, and some of the memory cells may become marginally functional during the manufacturing test. Those weak cells can undergo runtime failures due to voltage/thermal noise or aging effects. In order to improve the reliability, the memory cells that suffer larger process variations should be protected using stronger ECC with higher error correction capabilities. However, due to the unpredictable random process variations, the conventional uniform ECC protection fails to account for the distribution of vulnerability across memory blocks. The proposed space-time varying ECC scheme addresses this shortcoming and allocates detection and correction capabilities proportional to the vulnerability of the blocks. Fig. and illustrate the overall scheme for the proposed variable ECC in memory. As shown in Fig., depending on the severities of static, spatial and temporal variations, the reconfigurable ECC can adaptively change error correction capability (T) and codeword width (W) over space and time. In the following sections, as an example of the variable ECC approach, this paper presents a temporally varying ECC scheme, where the ECC architecture can dynamically change the error correction capabilities depending on the number of failures in the embedded memory.

Fig. : VC-ECC decoder architecture. The unified parity check matrix of VC-ECC. The complete BCH decoding process. (c) Syndrome generator. (d) Peterson algorithm [9] based key equation solver implementation. (e) Chien search. (f) Dynamic adaptation scheme applied to syndrome generator using turning-off gate []. The enable signal Φ is generated from control module using the mode_selection signal.

III. TEMPORALLY VARYING ECC

The proposed Variable error Correction capability ECC (VC-ECC) scheme offers three different error correction capabilities (bit / bit / 4bit), and the correction capability can be automatically adapted at run time to the number of failures in memory using a dynamic syndrome monitoring approach. When a smaller error correction option (- bit correction) is selected, the unused modules can be easily turned off to save computation energy.

A. The VC-ECC Architecture

1) VC-ECC Encoder/Decoder Architecture: The VC-ECC encoder [8] is mainly composed of Galois field adders and dividers, and three different division parts are used. The area overhead of the reconfiguration (different division parts) is small since the area of the encoder is much smaller (around 5 %) than that of the decoder. The VC-ECC decoder [8] is composed of syndrome generator (SG), key equation solver (KES), and Chien search (CS) modules as shown in Fig.. The overall VC-ECC decoder is similar to a 4-bit correction BCH decoder, and the architecture is scalable such that simple control logic can easily turn off the unused modules when the correction capability is or bits.

The unified parity check matrix of VC-ECC is presented in Fig.. The dimension of the parity check matrix is (, ) meaning that the input is the codeword of bit and the outputs are four odd syndromes of 8 bit width. Each Galois field element of the parity check matrix is an (8, ) vector. As shown in Fig. (c) and (f), only 5% or 5% of the SG is utilized for -bit or -bit correction BCH, respectively. The scalable syndrome calculation is also shown in the unified parity check matrix.

For the KES module, the inversion-less Peterson algorithm [9] is adopted to reduce the critical path delay. The inversion-less KES for the 4-bit correction BCH decoder is designed with PE and PE, and PE can be turned off when -bit correction mode is used. The scalable CS modules are also presented in Fig. (e). Fig. (f) illustrates the power-gating scheme [8] to turn off the unused parts in the BCH decoder.
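The scalable syndrome calculation described above can be illustrated in software. The following is a minimal Python model, not the authors' hardware: it computes the odd syndromes S1, S3, S5, S7 of a received word over GF(2^8) and evaluates only the first t of them, which mirrors how the unused portion of the SG is switched off in the lower correction modes. The primitive polynomial 0x11D, the bit ordering, and the names `EXP` and `odd_syndromes` are assumptions made for illustration; the text does not specify them.

```python
# Sketch only: GF(2^8) built from an ASSUMED primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D). The paper does not state which one it uses.
PRIM_POLY = 0x11D


def build_exp_table():
    """Return EXP with EXP[i] = alpha^i for i = 0..254."""
    exp = []
    x = 1
    for _ in range(255):
        exp.append(x)
        x <<= 1
        if x & 0x100:          # reduce modulo the primitive polynomial
            x ^= PRIM_POLY
    return exp


EXP = build_exp_table()


def odd_syndromes(received_bits, t):
    """Compute [S1, S3, ..., S_(2t-1)] for r(x) = sum_i r_i x^i.

    S_j = r(alpha^j). Only t odd syndromes are evaluated, so t = 1 or 2
    touches a quarter or half of the work needed for t = 4.
    """
    result = []
    for j in range(1, 2 * t, 2):
        s = 0
        for i, bit in enumerate(received_bits):
            if bit:
                s ^= EXP[(i * j) % 255]   # addition in GF(2^m) is XOR
        result.append(s)
    return result


if __name__ == "__main__":
    word = [0] * 40            # the all-zero word is a codeword of any linear code
    word[10] = 1               # inject a single-bit error at position 10
    print(odd_syndromes(word, 4))   # any non-zero entry flags a memory failure
```

For a single error at position p, S1 equals alpha^p and S3 equals alpha^(3p), so the lowest mode already locates the error from S1 alone; the higher modes simply enable more of the same computation.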
Simple pull-up and pull-down transistors with correct dimensions are used to turn off unused sections of the SG based on whether the -bit, -bit or 4-bit correction scheme is being exercised. The pull-down NMOS transistor is required to ensure that SG modules provide zero output when unused in order to have correct ECC functionality. The additional area for power-gating is accounted for in the results presented in Fig..

2) Dynamic Adaptation of VC-ECC: The proposed VC-ECC has three choices of error correction capabilities, and the correction mode can be controlled using the -bit mode_selection signal as shown in Fig.. For the protection of on-chip cache memory using VC-ECC, the -bit mode_selection is stored per cache block to indicate the encoding type, and the number of ways to store ECC bits is dynamically adjusted during runtime, similar to spatially varying ECC []. The two-bit overhead for the mode_selection storage is negligible considering a typical cache block size (e.g. 5bits). At runtime, this mode_selection information is updated on a regular basis by monitoring the frequency of memory failures. This is obtained from the output of the syndrome generator since any non-zero syndrome indicates memory failure occurrence. When the frequency of memory failures increases or decreases, the VC-ECC scheme can change the mode_selection signal to offer proper error correction capabilities. As presented in Fig. (f), we do not need an extra stage to identify/change the ECC mode since the mode_selection signal can be directly applied to the VC-ECC decoder.

B. Experimental Results

Hardware Implementation Results: The proposed VC-ECC decoders are implemented using a 5-nm standard-cell CMOS library, and Fig. shows the implementation results. SECDED, single-error-correction double-adjacent-error-correction (SECDAEC) ECC, -bit (Hamming), -bit, and 4-bit correction BCH decoders are also implemented for comparison. From the results, it is evident that the power and performance for single bit error correction with the proposed VC-ECC hardware are comparable with those of stand-alone SECDED, SECDAEC and Hamming hardware. The area requirement is understandably larger since the VC-ECC hardware provides greater flexibility in error correction. The parity bits of VC-ECC (8// bit) are identical to those of each //4 bit BCH. In Fig., storage bits indicate the parity bits plus the selection bits for configurability, which are called the mode_selection bits. The power consumption results in Fig. are obtained using the clock cycle of ns, i.e. a frequency of 7MHz, for all ECC schemes at.v with circuit-level simulations in Spice using input data.

Fig. : Implementation results of VC-ECC scheme. Hardware implementation results (area, power, performance), tabulating total area (μm²), maximum frequency (MHz), number of cycles, power (mW) and storage (bit) for SECDED, SECDAEC, the stand-alone BCH decoders, and VC-ECC. L miss rate ratio when VC-ECC is applied to L cache. (c) CPI ratio when VC-ECC is applied to L cache.

Fig. 4: Simulation results with 49.mcf benchmark: Increasing cache block replacement with time (number of replacements in L cache versus clock cycles, for normal data writing and for data writing plus L cache miss). Error frequency variations with the conventional one-step voltage scaling. (c) Error frequency variation with the proposed step-by-step voltage scaling. (ΔVdd = mV and ΔT = million cycles)

Fig. and (c) show the simulation results on cache miss rate and clock cycles per instruction (CPI) when the proposed temporally varying ECC scheme is applied to L cache. The performance of the proposed VC-ECC scheme is measured using the SPEC benchmark suite compiled for a 4-bit single-core out-of-order processor with 4 issue width. It is simulated by the general execution-driven multiprocessor simulator (GEMS) on Simics using the MOESI coherence protocol. Each of the SPEC benchmarks was simulated for million instructions. Our memory system includes a KB, 8-way set-associative L instruction and data cache, and a MB, -way set-associative unified L cache. All caches in the system are configured to have 4 byte lines. The L cache access latency is assumed to be clock cycles, and L cache is assumed to be protected by SECDED as a baseline.
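The storage bits reported with the implementation results (parity bits plus mode_selection bits) follow from a standard property of binary BCH codes: over GF(2^m), a t-error-correcting code needs at most m·t check bits. The short Python sketch below works through that arithmetic. It assumes m = 8, taken from the 8-bit syndrome width of the unified parity check matrix, correction capabilities t = 1, 2 and 4 (an assumption for illustration), and the two mode_selection bits mentioned in the text. The function names are hypothetical and not from the paper.

```python
def bch_check_bits(m: int, t: int) -> int:
    """Upper bound on check bits of a t-error-correcting binary BCH code
    over GF(2^m): each of the t odd syndromes contributes at most m bits."""
    return m * t


def storage_bits(m: int, t: int, selection_bits: int = 2) -> int:
    """Check bits plus the per-block mode_selection bits."""
    return bch_check_bits(m, t) + selection_bits


if __name__ == "__main__":
    # ASSUMED modes: t = 1, 2, 4 over GF(2^8).
    for t in (1, 2, 4):
        print(f"t={t}: parity <= {bch_check_bits(8, t)} bits, "
              f"storage <= {storage_bits(8, t)} bits")
```

Because the bound grows linearly in t, a block that only needs the weakest mode stores a quarter of the parity of the strongest mode, which is the source of the capacity saving that a per-block mode tag makes possible.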
From the simulation results shown in Fig and (c), we can observe the following. When the proposed VC-ECC is used as a -bit or 4-bit correctable scheme, the performance degradation induced by the reduction in cache capacity due to parity storage bits and by the additional latency of the ECC decoder logic is negligibly small for most of the benchmarks compared to the baseline SECDED scheme. In the following section, we present a case study of the VC-ECC application. For very slowly changing temporal variations like aging, since the error rate increases only with time, the adaptation method can be relatively regular and simple. We will consider the more complex example of L cache with supply voltage scaling.

IV. APPLICATION TO VOLTAGE-SCALABLE L CACHE

When VC-ECC is used in L cache, the following two issues need to be considered:

1) Since supply voltage scaling increases the bit error rate (BER), VC-ECC should change the ECC mode to provide stronger protection to the L cache memory. However, data that has already been encoded by the ECC encoder and stored in the cache needs to be decoded in the same way as it was encoded. For example, consider that an L cache word has been stored with ECC mode (-bit correction mode) before the supply voltage scaling, and the ECC mode changes from mode to mode (-bit correction mode) after scaling down the voltage. Then the cache words already encoded with ECC mode need to be decoded by mode after the mode change although the current ECC mode is mode.

2) A simple and effective way to cope with the ECC mode change is to read out all the cache data and re-encode the data following the new mode_selection signal. However, re-encoding all the data in the memory would incur large latency and power overheads. In practice, normal cache read/write operations naturally replace the cache data with the new ECC mode as time goes on, which is shown in Fig. 4. In normal cache operation, since the data in the cache is decoded with the previous ECC mode while reading, and the data is encoded with the updated ECC mode when writing to memory, the cache data is naturally updated even without re-encoding.

However, when the supply voltage is scaled down, the cache error rate abruptly increases, which may cause considerable performance loss. As an example, Fig. 4 shows the increasing error frequency when the supply voltage is abruptly scaled down from 7 mV to mV. Here, the error frequency is defined as the number of codewords encountering errors in the total number of decoded codewords during a given time duration. In this work, considering the L cache access rate of 5.45 million per sec (. million per 5 ms) with the worst case of the benchmark simulations, the time duration of 5 ms is used to consider at least million L cache accesses. The total number of decoded codewords during the time duration is around.

To obtain the results shown in Fig. 4, first, the BER of SRAM is obtained using Monte Carlo simulations with the 45nm predictive technology model (PTM) for various supply voltages. Cache access data is calculated using the GEMS simulator on Simics using the SPEC benchmark suite compiled for a GHz 4-bit single-core out-of-order processor. The SPEC benchmark has been simulated for 5 million instructions. The details of the cache configurations can be found in Section III. As presented in Fig. 4, when the supply voltage is scaled down directly from 7 mV to mV in one step, the error frequency starts to increase rapidly. Due to the error rate difference between 7 mV and mV, memory blocks encoded with a lower protection level show a high decoding error frequency. A large error frequency, observed especially at the initial moment of the voltage scaling as shown in Fig. 4, can lead to significant performance loss due to the latency overhead of reading data from the next level of memory, i.e. the main memory (DRAM).

Gradual voltage scaling scheme with VC-ECC: As shown in Fig. 4, the cache blocks are gradually replaced with time. We leverage this observation of gradual cache block replacement to scale the supply voltage down in multiple small steps (ΔVdd) instead of one-step scaling, in order to reduce the memory error rate incurred by voltage scaling. In this way, we give enough time (ΔT) for the existing blocks with increased errors to be gradually replaced with new blocks with the updated ECC mode. The results for the step-by-step supply voltage scaling, which incorporates a relatively gradual change in the supply voltage, are presented in Fig. 4 (c). As shown in the figure, dividing the voltage shift into small steps can prevent the error rate from abruptly increasing, and it becomes much lower than that in Fig. 4. In addition to the cache replacement due to normal data writing (a dotted line) as shown in Fig. 4, regular cache misses also accelerate the cache replacement (a solid line) for the L cache application. Since a cache miss also needs to replace the cache block with the data from the L cache or main memory block, the proposed step-by-step scheme can be applied more effectively to the L cache.

The numerical results shown in Fig. 4 (c) are obtained with a fixed voltage step (ΔVdd) of mV and a time duration (ΔT) of million cycles ( ms). Here, the voltage step (ΔVdd) is a fixed parameter, which is limited by the voltage regulator performance []. However, the time duration (ΔT) can be controlled by monitoring the error frequencies to provide enough time for cache replacement, since blindly reducing the supply voltage with a fixed ΔT can increase the memory errors.

Adaptive Time Duration (ΔT) Control: In this approach, we decide whether to scale down the supply voltage or not by comparing the error frequency with an error rate threshold value. When the monitored error frequency is still larger than the threshold value, the supply voltage is maintained to give more time until the error rate decreases with cache replacement.
Otherwise, the supply voltage can be scaled down one step. The cumulative moving average (CMA) [] of the error frequency is used as the threshold value in our approach. The CMA is widely used to determine the moving average since it provides a distortion tolerance for applications with equally important input data []. As presented in Fig. 4 (c), the n-th CMA from the initial ΔVdd drop can be expressed as:

$$\mathrm{CMA}_n = \frac{e_n + (n-1)\,\mathrm{CMA}_{n-1}}{n} = \frac{e_1 + e_2 + \cdots + e_n}{n}$$

where $e_n$ is the n-th error frequency from the initial Vdd drop and $\mathrm{CMA}_0 = 0$.

CMA_n is the average of all the error frequency values from the initial e, and it can be easily calculated with simple hardware (adder, multiplier, divider and flip-flop) based on the current error frequency e_n and the former CMA_{n-1}. When the error frequency e_n is larger than CMA_{n-1}, the supply voltage is maintained (ΔT increases), which means that the cache has not been replaced enough. If e_n is smaller than CMA_{n-1}, the supply voltage can be scaled by a step ΔVdd. In this approach, since each e_n used for computing CMA_n is equally weighted, a single error frequency does not have a large effect on the CMA value, which helps us make a more reliable decision as n grows. Since CMA values can be unstable with small n near the initial ΔVdd drop, the error frequency comparison with CMA starts when n is larger than.

Fig. 5: The proposed voltage scaling system with VC-ECC: Block diagram of step-by-step voltage scaling process. Comparison of error frequency changes between the adaptive and fixed-time duration approaches simulated with 49.mcf benchmark. (ΔVdd = mV, fixed ΔT = million cycles). (c) Operation flow chart of adaptive time duration control approach. (d) Cumulative sum of errors when various fixed-time duration and adaptive time duration approaches are used (simulation with 49.mcf benchmark).

For the proposed adaptive ΔT control approach, the block diagram of the step-by-step voltage scaling process is presented in Fig. 5. The process is initiated with the start signal generated when the dynamic voltage scaling (DVS) circuitry begins scaling down the supply voltage with a ΔVdd step. VC-ECC also changes the ECC mode by updating the mode_selection signal. After changing the ECC mode, the syndrome monitoring module of VC-ECC sends non-zero syndrome detection information to the monitoring circuit to update the decoding errors. Then, the monitoring circuit and scaling decision module in Fig. 5 begin the adaptive ΔT control approach, and the major steps of this process are presented in Fig. 5(c). In the figure, the monitoring circuit calculates the error frequency based on the non-zero syndrome detection. The scaling decision module computes the CMA using the error frequency throughout the step-by-step voltage scaling process. The scaling decision module also makes a decision on scaling Vdd down by comparing the CMA with the error frequency delivered from the monitoring circuit. For a reliable voltage scaling decision, the comparison between the error frequency and the computed CMA starts after the initial average period ends, as shown in Fig. 5.
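The adaptive ΔT decision described above can be sketched in a few lines of Python. This is a behavioural model under stated assumptions, not the authors' circuit: the class name `AdaptiveStepController`, the `warmup` parameter (standing in for the initial average period, whose length is not reproduced here), the choice to hold the voltage when e_n equals the previous CMA, and all numeric values in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveStepController:
    vdd_mv: float      # current supply voltage
    target_mv: float   # voltage at which scaling ends
    step_mv: float     # fixed voltage step (delta Vdd)
    warmup: int        # ASSUMED length of the initial average period
    n: int = 0         # number of monitoring periods seen so far
    cma: float = 0.0   # CMA_0 = 0

    def update(self, error_freq: float) -> bool:
        """Consume e_n for one monitoring period.

        Returns True if Vdd was scaled down by one step, False if it was held.
        """
        prev_cma = self.cma                      # CMA_{n-1}, the threshold
        self.n += 1
        # CMA_n = (e_n + (n - 1) * CMA_{n-1}) / n
        self.cma = (error_freq + (self.n - 1) * prev_cma) / self.n

        if self.n <= self.warmup:                # CMA not yet stable
            return False
        if self.vdd_mv <= self.target_mv:        # scaling already finished
            return False
        if error_freq < prev_cma:                # cache replaced enough
            self.vdd_mv = max(self.target_mv, self.vdd_mv - self.step_mv)
            return True
        return False                             # hold Vdd, i.e. extend delta T


def error_frequency(codewords_with_errors: int, decoded_codewords: int) -> float:
    """Fraction of decoded codewords with a non-zero syndrome in one period."""
    return codewords_with_errors / decoded_codewords


if __name__ == "__main__":
    ctrl = AdaptiveStepController(vdd_mv=1000, target_mv=800, step_mv=50, warmup=2)
    for e in [0.4, 0.2, 0.1, 0.5, 0.1]:          # made-up error frequencies
        stepped = ctrl.update(e)
        print(f"e_n={e:.2f} CMA={ctrl.cma:.3f} Vdd={ctrl.vdd_mv} stepped={stepped}")
```

Because every e_n carries equal weight, a single noisy period moves the threshold less and less as n grows, which is the property the text relies on for a stable scaling decision.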
The comparison results between the adaptive time duration (ΔT) and the fixed ΔT ( million cycles) approaches are shown in Fig. 5. As shown in Fig. 5, the average error frequency of the adaptive time duration control approach is reduced to .75, which shows 5.4% and 5.7% error frequency reductions compared to the conventional one-step voltage scaling scheme and the fixed ΔT approach, respectively. We can observe from the results shown in Fig. 5 that a larger time duration at the first step results in reduced error frequency. Fig. 5(d) also shows the cumulative sum of errors when the adaptive time duration approach and the various fixed time duration schemes are used. The total number of errors of the adaptive time duration scheme is 75, which is the same as that of the fixed 5 million cycles case. However, with the fixed time duration of 5 million cycles, the total time taken to scale down the supply voltage from 7 mV to mV with 5 steps is sec (.55 sec for the adaptive time duration case), which incurs relatively large (approximately 9% more) power consumption compared to the proposed adaptive ΔT approach.

Consideration with dirty cache data: In the application of VC-ECC to L cache, when an error occurs in a clean cache line and the VC-ECC scheme is not able to correct it, re-fetching the line from the next level of memory is a viable solution. However, if the errors occur in dirty cache lines, there is no option to recover. These errors can eventually lead to incorrect program execution. To prevent this case, when VC-ECC encodes the cache data, it can selectively provide stronger protection for dirty blocks (e.g., -bit correction for clean data and -bit correction for dirty data). This is similar to the scheme proposed in [], where different protection levels for dirty and clean cache blocks are used. That work also shows that the average portion of dirty data is considerably small compared to the whole cache size. As presented in Fig. and (c), the increase in L cache miss ratios and CPI ratios with -bit correction or 4-bit correction modes is negligible. Since the dirty data portion in the whole cache is not large, the overhead of applying stronger ECC for only dirty cache data is expected to be modest.

V. SUMMARY & FUTURE DIRECTIONS

We have presented an adaptive protection scheme for nanoscale memory arrays that provides the right amount of protection to each memory block under spatially and temporally varying reliability. Area, performance, and energy overheads for the proposed scheme are minimized by appropriate choice of ECC and joint circuit/architecture level optimizations of the encoding/decoding hardware. In contrast to existing multiple bit error tolerance schemes, the proposed ECC approach can tolerate both higher random and contiguous errors, and is amenable to efficient dynamic adaptation of the operating point, e.g. voltage, which makes it attractive for low-power memory. Future investigations will include application of reliability-aware address mapping and combination with bit-interleaving to further reduce ECC overhead and enhance error protection.

ACKNOWLEDGMENTS

The work is supported in part by Semiconductor Research Corporation (SRC) grant 5.. This work is also supported by National Research Foundation of Korea (#5MDA745 and #RAB459), and the Information Technology Research and Development Program of Korea Evaluation Institute of Industrial Technology (KEIT) [57, Design technology development of ultralow voltage operating circuit and IP for smart sensor SoC].

REFERENCES

[1] S. Narasimhan, K. Kunaparaju, and S. Bhunia, "Healing of DSP Circuits Under Power Bound Using Post-Silicon Operand Bitwidth Truncation," IEEE Trans. Circuits Syst. I, vol. 59, no. 9, pp. 9-94,.
[2] J. Tschanz, K. Bowman, and V. De, "Variation-Tolerant Circuits: Circuit Solutions and Techniques," Proc. Design Automation Conf., pp. 7-7, 5.
[3] S. S. Mukherjee, J. Emer, T. Fossum, and S. K. Reinhardt, "Cache Scrubbing in Microprocessors: Myth or Necessity?" IEEE Int. Symp. Dependable Computing, pp. 7-4, 4.
[4] E. H. Cannon, A. Kleinosowski, R. Kanj, D. Reinhardt, and R. V. Joshi, "The Impact of Aging Effects and Manufacturing Variation on SRAM Soft-Error Rate," IEEE Trans. Dev. Mat. Rel., vol. 8, no., 8, pp
[5] N. Quach, "High Availability and Reliability in the Itanium Processor," IEEE Micro, vol., no. 5, pp. -9,.
[6] S. Paul, F. Cai, X. Zhang, and S. Bhunia, "Reliability-Driven ECC Allocation for Multiple Bit Error Resilience in Processor Cache," IEEE Trans. Comput., vol., no., pp. -4,.
[7] J. Park, J. Park, and S. Bhunia, "VL-ECC: Variable Data-Length Error Correction Code for Embedded Memory in DSP Applications," IEEE Trans. Circuits Syst. II, vol., no., pp. -4, 4.
[8] A. Basak, S. Paul, J. Park, J. Park, and S. Bhunia, "Reconfigurable ECC for adaptive protection of memory," IEEE 5th International Midwest Symposium on Circuits and Systems,, pp
[9] S. Lin and D. Costello, Error Control Coding, nd Edition, Prentice Hall, 4.
[10] S. Park et al., "Accurate Modeling of the Delay and Energy Overhead of Dynamic Voltage and Frequency Scaling in Modern Microprocessors," IEEE Trans. Comput. Aided Design, vol., no. 5, pp ,.
[11] T. Chen and M. Ikeda, "Design and Implementation of Low-Power Hardware Architecture with Single-Cycle Divider for On-Line Clustering Algorithm," IEEE Trans. Circuits Syst. I, vol., no. 8, pp. 5-7,.
[12] L. Li et al., "Soft error and energy consumption interactions: a data cache perspective," in Proc. of International Symposium on Low Power Electronics and Design, pp. -7, 4.

Dongyeob Shin is currently working toward the integrated Master and Ph.D. degree in the VLSI Signal Processing Research Lab, Korea University, Seoul, Korea. His research interests include low-power, energy-efficient VLSI design, and error correction code design. Shin has a BS in electrical engineering from Korea University, Seoul, Korea.

Jangwon Park currently works as a Senior Engineer for Samsung Electronics, Suwon, Korea. His research interests include low-power error correction code design for embedded memory. Park has BS and MS degrees in electrical engineering from Korea University, Seoul, Korea.

Jongsun Park is currently an Associate Professor of the School of Electrical Engineering, Korea University, Seoul, Korea. His research interests focus on variation-tolerant, low-power, high-performance VLSI architectures and circuit designs. Park has a PhD in electrical and computer engineering from Purdue University. He is a senior member of IEEE.

Somnath Paul is currently a research scientist at Intel Labs, Intel Corporation. His primary research interest is hardware-software co-design for energy-efficiency, yield and reliability in nanoscale technologies. Paul received his Ph.D. degree in Computer Engineering from Case Western Reserve University, Cleveland, OH.

Swarup Bhunia is a professor of electrical and computer engineering at the University of Florida. His research interests include hardware and system security, implantable systems, and energy-efficient electronics. He received his PhD in computer engineering from Purdue University. He is a Senior Member of IEEE and a member of ACM.
