A Low-Power ECC Check Bit Generator Implementation in DRAMs


Sang-Uhn Cha*, Yun-Sang Lee**, and Hongil Yoon*

Abstract

A low-power ECC check bit generator is presented that can be implemented in DRAMs with minimal speed loss, area overhead and power consumption. The ECC used in the proposed scheme is a variant form of the minimum weight column code. The spatial and temporal correlations of input data are analyzed, and the input paths of the check bit generator are ordered on-line, giving adaptable power savings of up to 24.4% in the benchmarked cases. The chip size overhead is estimated to be under 0.3% for an 80nm 1Gb DRAM implementation.

Index Terms: Error correction code, single error correction, low power, input ordering, on-line, memory, DRAM

I. INTRODUCTION

As the market for mobile devices has rapidly grown, low-cost DRAMs are finding wide application in portable devices. As such, it has become imperative to reduce the power consumption of mobile DRAMs [1]. Meanwhile, as rapid advances in semiconductor processing technology enable low-voltage operation, device reliability is jeopardized because vulnerability to alpha particles and cosmic radiation grows. These soft errors are becoming one of the most serious threats to high-performance memory systems. Highly reliable memory systems must therefore tolerate the hazards caused by soft errors gracefully.

On-chip error correction code (ECC) techniques are used to enhance the reliability of memory systems by protecting user systems against random soft errors. As one of the remedies for enhancing DRAM reliability, ECC processing circuits can be naturally adopted in DRAMs. However, the ECC processing circuit has often proven impractical because of its unacceptable power consumption.
Therefore, recent research on ECC for memory has to focus on reducing power consumption. An ECC processing circuit mainly consists of four units: 1) check bit generator, 2) syndrome generator, 3) decoder and 4) corrector. The check bit generator produces the check bits, which represent the characteristics of the data bits. The syndrome generator compares the read check bits with the written check bits, and the decoder uses the result to determine the location of the erroneous bits. Finally, the corrector makes the necessary modifications to the faulty data. Among these units, the check bit generator is the most complex and consumes the largest amount of power [2]. In this paper, we propose an efficient input data ordering scheme for implementing a low-power ECC check bit generator.

Manuscript received Sep. 9, 2006; revised Nov. 5, 2006.
* Department of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Korea
** Memory Division, Samsung Electronics, Hwasung, Korea
E-mail: rustle88@yonsei.ac.kr
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.6, NO.4, DECEMBER, 2006

II. PREVIOUS WORK

1. Minimum Odd Weight Column Code

The minimum odd weight column code is the most commonly used ECC for single error correction (SEC) and double error detection (DED). It can be expressed with H-matrices of various shapes, and the features of the H-matrix determine the speed, power and area overhead of the check bit generator. Hence, by optimizing the H-matrix, a check bit generator with the desired attributes can be implemented. The H-matrix of the minimum odd weight column code has the following constraints for SEC and DED [3]:

1) The number of 1s in the H-matrix should be minimized.
2) The number of 1s in a row of the H-matrix should be equal or close to the average number of 1s among all rows.
3) Every column should have an odd number of 1s.

2. Off-line Input Ordering Technique to Reduce Power Consumption

The power consumption of the check bit generator can be reduced by changing appropriate columns of the H-matrix. The shape of the H-matrix determines the order in which data are input to the check bit generator. Based on this fact, an off-line input ordering technique that minimizes the power consumption has been proposed previously [4]. The technique consists of two steps. In the first step, the spatial and temporal correlations of input data are analyzed using the genetic algorithm (GA) and simulated annealing (SA). In the second step, the optimal H-matrix that minimizes the power consumption is selected using the information from the first step. The whole process is conducted off-line because the execution time is fairly long. Therefore, the off-line input ordering technique is not an efficient adaptation scheme.

III. PROPOSED SCHEME

In this section, a new scheme for the low-power check bit generator implementation is proposed. The proposed scheme is conducted on-line so that the check bit generator can efficiently find the best condition with the least power budget.

1. Minimum Weight Column Code

Many codes that correct single errors and detect double errors have been suggested so far. In practice, however, multiple error detection without correction is not worth its cost: it does not really improve the real-time reliability of the memory device, yet it requires area overhead beyond that of an SEC-only implementation. We propose a simple code, called the minimum weight column code, which conducts only SEC. The H-matrix of the minimum weight column code follows these constraints:

1) The number of 1s in the H-matrix should be minimized.
2) The number of 1s in a row of the H-matrix should be equal or close to the average number of 1s among all rows.
3) Every column should have a minimum number of 1s.

The first constraint keeps the power consumption and area overhead of the check bit generator at a minimum. The second constraint guarantees the shortest delay in the check bit generator. Finally, the third constraint enforces that the H-matrix implements only SEC. Consequently, the column weights as well as the total weight are minimized, reducing both the area and the power of the ECC processing circuit. Fig. 1 shows an example of the H-matrix of the minimum weight column code for 8-bit data width.

Fig. 1. Example of the H-matrix of the minimum weight column code for 8-bit data width.

Table 1 shows the implementation attributes of the proposed minimum weight column code and some conventional codes for 64-bit data width.

Table 1. Comparison of the proposed code with some conventional codes for 64-bit data width.

Type      Code                       Check bits   Total weights   Maximum data bits for one     Level of 3-input XOR tree
                                                  of H-matrix     check bit generation (M)      (⌈log3 M⌉)
SEC       Proposed code              7            179             26                            3
SEC       Hamming code               7            205             32                            4
SEC-DED   Modified Hamming code      8            269             64                            4
SEC-DED   Odd weight column code     8            208             26                            3

The minimum weight column code requires only seven check bits, while the SEC-DED codes require eight, thereby decreasing the area overhead of the check bit memory core by 12.5%. Reducing the total weight of the H-matrix in practice means reducing the total number of inputs to the inner elements of the check bit generator. As a consequence, the area overhead is decreased by about 14%, as is the power consumption, in comparison with the implementation using the odd weight column code. By balancing the row weights close to an equal number, the number of levels of the XOR trees in the check bit generator is minimized. In the proposed code, the total delay can be decreased roughly by 25%.

2. On-line Input Ordering Scheme to Reduce Power Consumption

The input data correlation of DRAMs varies with the specific system and its particular applications. If the input data ordering is conducted off-line, it is not guaranteed that the check bit generator will always consume the minimum power. In this sense, on-line input ordering is the only available means of adjusting the H-matrix to varying input data effectively.

Fig. 2 illustrates the block diagram of the proposed scheme. Bidirectional data I/Os are assumed, as in DRAMs. The scheme includes the basic building blocks for ECC processing: 1) check bit generator, 2) syndrome generator, 3) decoder and 4) corrector. In addition, to implement the on-line input data ordering technique, three components are devised: 1) correlation detector, 2) path selector and 3) path de-selector. The correlation detector analyzes the spatial and temporal correlations of input data using the number of input data transitions. The path selector orders the input data paths according to the analysis from the correlation detector. The path de-selector reorders the input and returns the order to the original state.

Fig. 3. Block diagram of the correlation detector.

Fig. 3 shows the block diagram of the correlation detector.
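As an illustration of the column-weight bookkeeping in Section III.1, the following Python sketch builds one possible set of 64 data columns for seven check bits and checks its total weight and row balance. The choice of the eight weight-4 columns (seven cyclic shifts plus one extra column) is an assumption made here for illustration, since the paper does not list which eight are used. All function names are hypothetical, and only the data columns are counted in the total weight.

```python
from itertools import combinations

R = 7  # number of check bits for a 64-bit data word (SEC only)


def build_columns():
    """Return 64 data columns, each a frozenset of the check-bit rows it feeds."""
    cols = [frozenset(c) for c in combinations(range(R), 2)]   # all 21 weight-2
    cols += [frozenset(c) for c in combinations(range(R), 3)]  # all 35 weight-3
    # Assumption: eight weight-4 columns chosen for row balance, namely the
    # seven cyclic shifts of rows {0,1,2,3} plus one extra column {0,2,4,6}.
    cols += [frozenset((s + k) % R for k in range(4)) for s in range(R)]
    cols.append(frozenset({0, 2, 4, 6}))
    return cols


def row_weights(cols):
    """Number of data bits XORed into each check bit."""
    return [sum(1 for c in cols if r in c) for r in range(R)]


def xor_tree_levels(m, fan_in=3):
    """Levels of a fan_in-input XOR tree needed to combine m inputs."""
    levels, reach = 0, 1
    while reach < m:
        reach *= fan_in
        levels += 1
    return levels


def check_bits(data, cols):
    """XOR each set data bit into the check bits selected by its column."""
    bits = [0] * R
    for d, col in zip(data, cols):
        if d:
            for r in col:
                bits[r] ^= 1
    return bits


def locate_single_error(stored, received_data, cols):
    """Return the index of a single flipped data bit, or None if none is found."""
    recomputed = check_bits(received_data, cols)
    syndrome = frozenset(r for r in range(R) if stored[r] != recomputed[r])
    return cols.index(syndrome) if syndrome in cols else None


if __name__ == "__main__":
    cols = build_columns()
    print(sum(len(c) for c in cols), max(row_weights(cols)))
```

Under this assumed column choice the total weight and the largest row weight agree with the values listed for the proposed code in Table 1. Because every data column is distinct and has weight of at least two, a single flipped data bit yields a syndrome that matches exactly one column, which is what makes SEC possible.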
The correlation detector counts the number of transitions of each input data bit and sends the correlation signal to the path selector. It consists of a pulse generator, which detects the timing of each input data transition, and a counter, which counts the number of data transitions. The path selector is illustrated in Fig. 4. Taking into account the correlation signals from the correlation detector, the path selector orders the input data paths of the check bit generator to minimize the power consumption.

Fig. 2. Block diagram of the proposed scheme.

Fig. 4. Block diagram of the path selector.

For the minimum weight column code of 64-bit data width, all $\binom{7}{2} = 21$ weight-2 columns and all $\binom{7}{3} = 35$ weight-3 columns are used, but only eight of the $\binom{7}{4} = 35$ possible weight-4 columns are used. The eight weight-4 columns should be selected from the thirty-five candidates so that the row weights of the H-matrix are similar, which minimizes the delay. The check bit generator therefore has three groups of data paths, for column weight 2, column weight 3 and column weight 4, respectively. Based on the spatial correlation, each group of input data is linked to the input path group with a similar number of transitions. To reduce the power consumption, data with more transitions are connected to the input path group with the smaller column weight, and data with fewer transitions are connected to the input path group with the larger column weight [5].

Fig. 5 shows the functional timing diagram of the correlation detector and the path selector. The input waveforms in region (A) are analyzed in the correlation detector, and then the correlation signals shown in region (B) are generated. Finally, the input data are ordered based on the correlation signals, as illustrated in region (C). The ordered data are applied to the check bit generator. The functional steps of the path de-selector are in the reverse order of those of the path selector; it can be viewed simply as the path selector with the direction of inputs and outputs exchanged.

Fig. 5. Timing diagram of the correlation detector and the path selector.

Once the check bit generator is modified on-line, the check bits that have already been generated should be updated to correspond to the new check bit generator. The updating operation, however, is conducted only when the check bit generator is modified, and it takes only a minimal time. The reconfiguration power consumed is very small in comparison with the total ECC operating power, which is considerably reduced by the proposed scheme. Moreover, if the modification interval can be kept long enough, the advantage will grow.

IV. EXPERIMENTAL RESULTS

We measured the power benefits of the proposed scheme using input data abstracted from the gcc and go benchmarks from SPEC 95 and SPEC 2000 [6]. First, logic simulation was conducted to examine the input path ordering. Afterwards, the power consumed by the proposed check bit generator was evaluated using HSPICE. We used TSMC's 0.25µm process, and the experiments were conducted with Vcc values of 1.8V, 2.5V and 3.3V. Table 2 shows the experimental results, demonstrating power savings in the range from 17.0% to 24.4% for the proposed scheme.

Table 2. Power consumption measured with varying Vcc.

Benchmarked power (mW)    gcc    go
Vcc=1.8V                  1.5    1.5
                          1.2    1.3
Vcc=2.5V                  6.0    6.0
                          4.7    4.8
Vcc=3.5V                  8.5    8.3
                          6.4    6.6

V. CONCLUSIONS

In this paper, a low-power ECC check bit generator is presented that employs the proposed minimum weight column code together with on-line input data ordering. By providing only SEC capability, the area overhead and power consumption are reduced. Based on the proposed code, the input data correlation of the check bit generator is analyzed and the input data paths are ordered. The check bit generator implemented with the proposed scheme reduces the power by about 17.0% to 24.4% for the benchmarked cases over the tested voltage range. The additional hardware for the correlation detector, path selector and path de-selector accounts for only 0.3% of the total die size of an 80nm 1Gb DRAM.
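The ordering rule of Section III.2 (inputs with more transitions go to paths with smaller column weight) can be sketched in Python as follows. This is a software illustration under stated assumptions, not the hardware of Figs. 3 and 4: it sorts individual inputs rather than assigning them to the three weight groups, the function names are hypothetical, and the cost function is an assumed proxy in which each input transition toggles one XOR input per 1 in its column.

```python
def count_transitions(words, width):
    """Count 0->1 and 1->0 transitions on each input line over a word stream."""
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur  # bits that changed between consecutive words
        for i in range(width):
            counts[i] += (diff >> i) & 1
    return counts


def select_paths(counts, col_weights):
    """Map busy inputs to light columns: result[j] is the input wired to column j."""
    inputs = sorted(range(len(counts)), key=lambda i: -counts[i])
    columns = sorted(range(len(col_weights)), key=lambda j: col_weights[j])
    paths = [0] * len(col_weights)
    for i, j in zip(inputs, columns):
        paths[j] = i
    return paths


def deselect_paths(paths):
    """Inverse mapping for the read side: result[i] is the column of input i."""
    inverse = [0] * len(paths)
    for j, i in enumerate(paths):
        inverse[i] = j
    return inverse


def switching_proxy(counts, col_weights, paths):
    """Assumed cost model: transitions on an input times the weight of its column."""
    return sum(counts[paths[j]] * w for j, w in enumerate(col_weights))


if __name__ == "__main__":
    words = [0b0000, 0b0001, 0b0000, 0b0011, 0b0010]
    counts = count_transitions(words, 4)
    weights = [4, 3, 2, 2]
    paths = select_paths(counts, weights)
    print(counts, paths, switching_proxy(counts, weights, paths))
```

Pairing the largest transition counts with the smallest column weights minimizes this proxy by the rearrangement inequality. The inverse mapping plays the role of the path de-selector, restoring the original bit order.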

REFERENCES

[1] T. Kuroda and M. Hamada, "Low power CMOS digital design for multimedia processors," IEEE Journal of Solid-State Circuits, vol. 25, pp. 652-655, 2000.
[2] W. Gao and S. Simmons, "A study on the VLSI implementation of ECC for embedded DRAM," IEEE Canadian Conference on Electrical and Computer Engineering, pp. 203-206, May 2003.
[3] M. Y. Hsiao, "A class of optimal minimum odd-weight-column SEC-DED codes," IBM J. of Res. and Develop., vol. 14, pp. 395-401, July 1970.
[4] S. Ghosh, S. Basu and N. A. Touba, "Reducing power consumption in memory ECC checkers," International Test Conference, pp. 1322-1331, 2004.
[5] K. Mohanram and N. A. Touba, "Lowering power consumption in concurrent checkers via input ordering," IEEE Transactions on Very Large Scale Integration Systems, vol. 12, pp. 1234-1243, Nov. 2004.
[6] J. J. Yi and D. J. Lilja, "An analysis of the amount of global level redundant computation in the SPEC 95 and SPEC 2000 benchmarks," 2001 IEEE International Workshop on Workload Characterization, pp. 74-81, Dec. 2001.

Sang-Uhn Cha received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2005. He is currently working toward the M.S. degree with the School of Electrical and Electronic Engineering, Yonsei University. His main research interests include low-voltage memory circuit and technology, and evolvable hardware design and test.

Yun-Sang Lee was born in Seoul, Korea, in 1967. He received the B.S. degree in electronic engineering from Sogang University, Seoul, Korea, in 1991. He joined Samsung Electronics, Korea, in 1991, where he has been engaged in the research and development of high speed and low latency DRAM and SDR/DDR SDRAM. He is currently engaged in the development of future main memories.

Hongil Yoon received the B.S. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1991 and the M.S. and Ph.D. degrees in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 1993 and 1996, respectively. From 1996 to 2002, he was with Samsung Electronics, Kiheung, Korea, where he was involved in the design of dynamic random access memory. Since 2002, he has been with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. His research interests include low-voltage memory circuit and technology, high-frequency RF circuits and devices, and evolvable hardware design and test.