On the Use of Hard-Decision LDPC Decoders on MLC NAND Flash Memory

Similar documents
Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

k (check node degree) and j (variable node degree)

Joint Message-Passing Symbol-Decoding of LDPC Coded Signals over Partial-Response Channels

BASED ON ITERATIVE ERROR-CORRECTION

Fully Parallel Window Decoder Architecture for Spatially-Coupled LDPC Codes

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Elementary Educational Computer

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

GPUMP: a Multiple-Precision Integer Library for GPUs

3D Model Retrieval Method Based on Sample Prediction

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network

Appendix D. Controller Implementation

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

A REDUCED-COMPLEXITY LDPC DECODING ALGORITHM WITH CHEBYSHEV POLYNOMIAL FITTING

CS2410 Computer Architecture. Flynn s Taxonomy

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

UNIVERSITY OF MORATUWA

CS 683: Advanced Design and Analysis of Algorithms

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu

Ones Assignment Method for Solving Traveling Salesman Problem

6.854J / J Advanced Algorithms Fall 2008

Probability of collisions in Soft Input Decryption

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 )

Improving Template Based Spike Detection

FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS

Pattern Recognition Systems Lab 1 Least Mean Squares

Mobile terminal 3D image reconstruction program development based on Android Lin Qinhua

Lecture 5. Counting Sort / Radix Sort

Interference Aware Channel Assignment Scheme in Multichannel Wireless Mesh Networks

Image Segmentation EEE 508

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

New Results on Energy of Graphs of Small Order

Alpha Individual Solutions MAΘ National Convention 2013

Fuzzy Minimal Solution of Dual Fully Fuzzy Matrix Equations

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

A Study on the Performance of Cholesky-Factorization using MPI

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING

A Note on Least-norm Solution of Global WireWarping

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

Low Complexity H.265/HEVC Coding Unit Size Decision for a Videoconferencing System

Random Network Coding in Wireless Sensor Networks: Energy Efficiency via Cross-Layer Approach

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Cache-Optimal Methods for Bit-Reversals

Extending The Sleuth Kit and its Underlying Model for Pooled Storage File System Forensic Analysis

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Lecture 1: Introduction and Strassen s Algorithm

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

Big-O Analysis. Asymptotics

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Algorithms for Disk Covering Problems with the Most Points

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

Fast Fourier Transform (FFT) Algorithms

An Efficient Implementation Method of Fractal Image Compression on Dynamically Reconfigurable Architecture

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING: SPECIAL ISSUE ON EMERGING TECHNOLOGIES IN COMPUTER DESIGN 1

Wavelet Transform. CSE 490 G Introduction to Data Compression Winter Wavelet Transformed Barbara (Enhanced) Wavelet Transformed Barbara (Actual)

Isn t It Time You Got Faster, Quicker?

Investigating methods for improving Bagged k-nn classifiers

Operating System Concepts. Operating System Concepts

Security of Bluetooth: An overview of Bluetooth Security

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

Efficient Hardware Design for Implementation of Matrix Multiplication by using PPI-SO

Using the Keyboard. Using the Wireless Keyboard. > Using the Keyboard

Dynamic Programming and Curve Fitting Based Road Boundary Detection

MAC Throughput Improvement Using Adaptive Contention Window

1 Enterprise Modeler

The Simeck Family of Lightweight Block Ciphers

On the Accuracy of Vector Metrics for Quality Assessment in Image Filtering

. Written in factored form it is easy to see that the roots are 2, 2, i,

Efficient Hough transform on the FPGA using DSP slices and block RAMs

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Our Learning Problem, Again

Automatic Generation of Polynomial-Basis Multipliers in GF (2 n ) using Recursive VHDL

Descriptive Statistics Summary Lists

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis

Introduction. Nature-Inspired Computing. Terminology. Problem Types. Constraint Satisfaction Problems - CSP. Free Optimization Problem - FOP

A Comparative Study on the Algorithms for a Generalized Josephus Problem

An Efficient Implementation of the Gradient-based Hough Transform using DSP slices and block RAMs on the FPGA

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work

Fuzzy Membership Function Optimization for System Identification Using an Extended Kalman Filter

Project 2.5 Improved Euler Implementation

The Counterchanged Crossed Cube Interconnection Network and Its Topology Properties

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

A Modified Multiband U Shaped and Microcontroller Shaped Fractal Antenna

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

MOST of the advanced signal processing algorithms are

Lecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns

An Estimation of Distribution Algorithm for solving the Knapsack problem

Transcription:

O the Use of Hard-Decisio LDPC Decoders o MLC NAND Flash Memory Khoa Le ad Fakhreddie Ghaffari ETIS, UMR-8051, Uiversité Paris Seie, Uiversité de Cergy-Potoise, ENSEA, CNRS, Frace. {khoa.letrug, fakhreddie.ghaffari}@esea.fr Abstract The soft-decisio Low-Desity Parity-Check (LDPC) decoders are widely used i the multi-level per cell (MLC) NAND flash memory to esure the correctess of stored data. However, to obtai the good error correctio capability, these soft-decisio decoders require the soft-iformatio, which results a log o-chip memory sesig ad dramatically exteds the memory read latecy. I this paper, we explore the possibility of usig some recet itroduced Bit Flippig (BF) hard decisio decoders, i replacig the soft-decisio decoders, to correct the retetio errors appeared i the MLC NAND flash memory. The applied BF decoders require oly the hard iformatio read from that memory to decode ad thus, by avoidig soft-iformatio sesig, the memory read latecy is improved. Furthermore, by usig the characteristic of the retetio error where some iformatio bits ca be determied as error-free, we propose the adaptatio of the BF decoders to be better adapted to MLC NAND flash memory. We show that, the adapted BF decoders ca correct more erroeous bits thaks to these error-free values. The simulatio results show that, the adapted BF decoders ruig o the flash memory provide a good error correctio performace, beig close to that of the soft-decisio decoders, with oly the hard iformatio of the memory. I additio, it is show that, the BF decoders provide a higher decodig speed compared to the soft-decisio decoder. I. INTRODUCTION NAND flash memory have bee used i practical applicatios thaks to its favorable features such as the high data throughput, low-power cosumptio, compactess [1]. I order to icrease the storage capacity of NAND flash memory, multi-level cell techology have bee developed to store more tha 1 bit per cell, e.g. 2 bits per cell i [2], 3 bits per cell i [3] or 4 bits per cell i [4]. This icreasig storage desity results to the fact that, the stored data is more proe to error due to various uderlyig physical factors [5], appearig at the type of retetio error, cellto-cell iterferece (CCI) error etc. This MLC techology sigificatly icreases the bit error rate ad a powerful error correctio code with soft-iformatio has to be employed to yield a good error correctio performace. There are several works o the use of the soft-decisio LDPC codes for error correctio o MLC NAND flash memory [2][6][7]. There are also several works o efficiet hardware implemetatio of LDPC decodig algorithm o that type of memory [8]. All of these researches have demostrated that, the soft-decisio LDPC decoders ca achieve a oticeable error correctio gai over the covetioal bose-chaudhuri-hocqueghem (BCH) code. However, sice the MLC NAND flash memory read latecy is proportioal to the memory sesig precisio, the use of soft-iformatio memory sesig causes sigificatly icrease of memory read latecy. Low-Desity Parity-Check (LDPC) are the well-kow error correctio code (ECC) applied i several commuicatio stadards such as IEEE 802.11, 802.3a, 802.16a etc. [9][10][11] ad storage applicatios as i [12][13][14]. LDPC ca be effectively decoded by the iterative decodig algorithms which are the iterative process where messages are iteratively passed betwee the computig uits: Variable Nodes (VNs) ad Check Nodes (CNs). A LDPC decoder is either the soft-iformatio message passig decoder, i.e. Mi Sum (MS), Sum Product (SP) [15], or hard-decisio decoder, i.e. Bit Flippig (BF), Gallager-A,B [16]. The soft-decisio decoders require the soft-iformatio from chael ad durig the decodig process, the soft-iformatio are passed betwee the computig uits. These soft-decisio decoders provide the outstadig error correctio, approachig the capacity. However, the decoder complexity is dramatically icreased while the decodig speed is relatively low, due to the itesive computatio. O the cotrary, the hard decisio decoders have bee recetly attracted much attetio due to the high decodig throughput, low-complexity implemetatio ad the progressive improvemet i error correctio, approachig or eve surpassig the soft-decisio decoders [17]. Recetly, several BF decoders such as Gradiet Descet Bit Flippig (GDBF) [18], Probabilistic Gradiet Descet Bit Flippig (PGDBF) [19], Probabilistic Parallel Bit Flippig (PPBF) [20] decoders, have bee itroduced, i which the error correctio performace is sigificatly improved, beig very close or eve comparable to the performace of softdecisio decoders. These decoders are show to be very lowcomplexity, which comes from the fact that, oly biary messages are iteratively passed betwee the computig uits simplifyig the coectio etworks. Furthermore, the computig uits which process these biary messages are very simple ad therefore, ehace the decodig throughput. More iterestigly, these decoders require oly the hard iformatio from the chael to start the decodig process. I this paper, we ivestigate o the use of BF decoders, i replacig the soft-decisio decoders, to correct the retetio errors i the MLC NAND flash memory. We focus o the GDBF ad PGDBF o MLC NAND flash memory i terms of error correctio as well as the hardware efficiecy such as the decodig throughput ad decoder complexity, by compariso

with the soft-decisio decoders. We take beefit from the fact that, the BF decoders require oly hard iformatio from chael to avoid the log memory sesig, ad from that, the memory read latecy is obviously reduced. Moreover, by usig the characterizatio of the retetio errors i MLC NAND flash memory, we ca determie certai bits which are reliable. We the adapt the BF decoders such that the reliable VNs are kept uchaged i the VN processig uits (VNUs) while decodig oly o other ureliable VNs. We show that, with the additioal iformatio i determiig the reliable VNs i MLC NAND flash memory, the modified BF decoders ca correct more retetio errors tha the covetioal couterparts. The rest of the paper is orgaized as followig. I Sectio II, we recall the characteristic of the retetio oise i MLC NAND flash memory ad characterize the priciple to determie the reliable bits whe the retetio errors occur. The LDPC decodig priciple ad the deployed BF decoders (the GDBF ad PGDBF decoders) are also itroduced. I Sectio III, we preset the adapted BF decoders (deoted as A-GDBF ad A-PGDBF) dedicated for MLC NAND flash memory. I Sectio IV, we show for the test case that, the BF decoders provide a comparable error correctio to that of the soft-decisio decoders (the MS decoder) while the average umber of decodig clock cycles is oly 1/4. Although the advatages of avoidig soft-iformatio sesig is ot quatified, the fact that the adapted BF decoders use oly the hard-decisio iformatio reduces obviously the read latecy of MLC NAND flash memory. Sectio V cocludes the paper. II. THE MLC NAND FLASH MEMORY RETENTION ERROR CHARACTERIZATION AND BF LDPC DECODERS RECALL A. The characterizatio of retetio errors i MLC NAND flash memory MLC NAND flash memory allows storig more iformatio i a sigle cell which is realized by the umber of charges trapped i the floatig gate ad the umerous voltage thresholds to idetify the storage states [2]. I each MLC NAND flash memory cell, the voltage levels betwee two adjacet storage states become closer, makig it beig more proe to errors whe beig read. Fig. 1 shows the relatio betwee the stored data ad its correspodig threshold voltage distributio for a 2-bits per cell NAND flash memory. There are 4 states of the stored data which are usually represeted by the Gray code to reduce the raw bit error probability. While the data is stored i the memory cell, the programmed cell gradually loses its charge i the floatig gate (Fig. 1b). This leads evetually to a chage i logic states, causig the error kow as the retetio error. The appearace of retetio error depeds o several factors e.g. the umber of Program/Erase (P/E), the storig time. There are also other types of errors i MLC NAND flash memory such as the cell-to-cell iterferece (CCI) or programmig error (which appears whe the data is beig writte). However, several works i literature showed that, the retetio error is the mai source of error appearig i MLC NAND flash memory [5]. We assume i this work that oly Retetio directio Number of charges 0 01 00 10 11 Logic values v th v ref3 v ref2 v ref1 Desired distributio Distributio after losig charges a. b. Figure 1. The retetio behavior i MLC NAND flash memory. retetio errors appear i the read data from MLC NAND flash memory. Also, we limit the illustratios i this work to oly 2-bits per cell MLC NAND flash memory. Whe readig data from a MLC NAND flash memory cell, the retur data is i the form as (most-sigificat-bit, least-sigificat-bit) or (MSB,LSB). Further iformatio o the MLC NAND flash memory such as the orgaizatio, program/read process ca be foud i [2][6][7]. The charge trapped i the floatig gate is gradually dimiished formig the fact that, the retetio error is always i the type of oe-directio state trasitios. Ideed, it ca be see i Fig. 1 that, the logic state 01 is realized at the highest charge trapped. The loss of charge chages oly that logic state to the lower charge level states such as states 00 or 10 ad there is o logic state that ca form the state 01 whe the retetio errors happe. This implies the fact that, the 01 read from MLC NAND flash memory cell is the error-free (reliable) iformatio. This observatio have also itroduced i [21]. The work of [21] also showed more cases where we ca determie the reliability of the read data. Ideed, the MSB whe readig 00 from the cell is reliable because if the retetio error occurs, it chages from the state 01 to 00 ad oly LSB is chaged. With the same priciple, the LSB of 10 ad MSB of 11 are also the reliable data. Note that, we assume i this work that, oly sigle step retetio error (where the logic state is oly chaged to its closest adjacet state) appears i MLC NAND flash memory. It is a acceptable assumptio sice the multiple state jump is extremely rare thaks to the observatio i [5][22]. By usig this characterizatio of retetio error, we propose to have a pre-processig o the read data from the MLC NAND flash memory to determie the reliable iformatio before lauchig the LDPC decodig process. This extra iformatio will help the BF correct more v th

erroeous bits i the decodig process. B. GDBF ad PGDBF decoders A LDPC code is defied by a sparse parity-check matrix H (M, N), N > M. Each row represets a parity check fuctio, computed by a CN, o the VNs represeted by the colums. The VN v (1 N) is checked by the CN c m (1 m M) if the etry H(m, ) = 1 ad they are called eighbors. LDPC code ca also be represeted by a bipartite graph called Taer graph composig two groups of odes, the CNs c m, (1 m M) ad the VNs v, (1 N). The VN v (1 N) coects to the CN c m (0 m M) by a edge i the Taer graph if the etry H(m, ) = 1. We deote the set of CNs coected to the VN v as N (v ), (1 N), ad N (v ) is called VN degree. Similarly, N (c m ), (1 m M) deotes all VNs coected CN m ad N (c m ) is the CN degree. This paper works o the regular LDPC code, i.e. N (v ) = d v ad N (c m ) = d c, m. Whe LDPC ECC is applied i the MLC NAND flash memory, the user data is ecoded ad stored i form of a codeword. A vector x = {x } = {0, 1} N, (1 N) is called a codeword if ad oly if Hx T = 0. We deote y = {y }, (1 N) as the read versio of x from the memory. The retetio error i this work is modeled as a crossover behavior such that each bit of x is flipped with a probability α, called raw Bit Error Rate (BER), i.e., Pr(y = x ) = 1 α ad Pr(y = 1 x ) = α, 1 N. We use the superscript (k) alog with the ode (VN ad CN) otatios to idicate that ode value at the k iteratio. The decodig process of BF decoders is a iterative process where the biary messages are iteratively passed betwee the VNs ad CNs. The CN operatio is the parity check operatio which is formulated i Equ. 1 where is the Exclusive-OR (XOR) operatio. The decodig process will be termiated if all the CN are satisfied i.e. c (k) m = 0, m, otherwise the decodig process cotiues with the VN operatios. E (k) c (k) m = v (k) (1) v N (c m) = v (k) y + c m N (v ) c (k) m (2) I each VN operatio, first of all, a so-called eergy value is computed usig Equ. 2 basig o the all eighbor CN values, the VN curret value (v (k) ) ad its value received from the chael (y ). The maximum eergy value, E max, (k) is computed by a Maximum Fider (MF). I GDBF decoder, the value of a VN will be flipped if ad oly if its eergy value is equal to the maximum eergy value. I PGDBF, a biary radom geerator is implemeted geeratig for each VN a radom sigal R (k) where Pr(R (k) = 1) = p ad Pr(R (k) = 0) = 1 p ad p is a pre-defied sigal. The value of each VN i PGDBF is flipped if ad oly if its eergy equals to the maximum eergy ad R (k) = 1. The GDBF ad PGDBF provide the good error correctio performace while requirig oly the hard-iformatio as show i [19][23]. We preset i the ext MLC NAND flash memory x ^ Raw data HD read ad pre-processig I BF LDPC Decoders y Decoded codeword Figure 2. The utilizatio of BF decoders o the MLC NAND flash memory system. sectio the adaptatio of these decoders to work o the MLC NAND flash memory. III. ADAPTED BF (A-BF) DECODERS FOR MLC NAND FLASH MEMORY By uderstadig the retetio error characteristic, we preset the adaptatio of the BF decoders to MLC NAND flash memory system. The MLC NAND flash memory system is described i the Fig. 2. The adaptatio icludes the additioal pre-processig uit which provides the reliability sigal for each data bit read from memory. The secod part of the adaptio is the modificatio i the decodig process basig o the reliability iformatio from the pre-processig module. A. The pre-processig module The target of desigig the pre-processig block is to idetify the reliability of each bit i the data read from MLC NAND flash memory ad to use this iformatio improvig the performace of the BF decoders. As preseted above, 2 bits (MSB ad LSB) are stored i the same cell ad basig o their values, the reliability ca be determied. We deote the reliability of the MSB as I MSB ad that of the LSB as I LSB. The value of these variables represets the reliability of the bits (reliable bit is deoted by 1, otherwise, it is marked by 0). We show the values of I MSB ad I LSB as the fuctios of MSB ad LSB i table I. It ca be see that, whe the read data is 01, both of MSB ad LSB are reliable. I the other cases, oly oe bit (either MSB or LSB) is reliable. The pre-processig module is desiged to idetify the reliability of a stored codeword ad to produce a biary vector I = (I 1, I 2,..., I N ). Each elemet of I represets the reliability of the correspodig bits (or VNs) i the read data. I the MLC NAND flash memory layout, two bits (MSB ad LSB) i a cell may belog to the same codeword [22] or they are i two differet codewords. I the latter case, pre-processig module eeds the read of two cosecutive codewords (i N cells) i order to compute I for a codeword, while i the former case, it eeds the iformatio i N/2 cells. A computig block is desiged to produce the values I MSB

MSB LSB XOR LSB I MSB I Figure 3. The computig uit to idetify the reliability of the read data from oe MLC NAND flash memory. ad I LSB o the value read (MSB ad LSB). The detailed circuit of this computatio block is described i Fig. 3. I geeral, the pre-processig module may be implemeted with oly 1 computig block ad used serially or may blocks are implemeted operatig i parallel for acceleratig the readig speed. It ca be see that, the pre-processig uit is a simple combiatory circuit ad it would require a egligible hardware overhead. Table I THE RELIABILITY DETERMINATION BASED ON THE DATA READ FROM A MLC NAND FLASH CELL. MSB LSB I MSB I LSB 0 1 1 1 0 0 1 0 1 0 0 1 1 1 1 0 B. The adapted BF decoders for MLC NAND flash memory The pre-processig uit provides the reliability iformatio of each bit read from the MLC NAND flash memory. We adapt the BF decoders to be used i this type of memory. We have applied our strategy for the GDBF ad PGDBF which are amed as A-GDBF ad A-PGDBF, respectively. The A- PGDBF decodig algorithm is described i Algorithm 1. It ca be see that, the A-PGDBF follows similarly the decodig step of the origial PGDBF ad the modificatio is o the VN operatio where a verificatio step is added. I the VN processig of A-PGDBF, the flippig step is proceeded oly if that VN value is idicated as ureliable (I = 0). This priciple is also applied for the A-GDBF. It ca be see that, the reliability iformatio of each VN is used i the adapted decoders, chagig the behaviors of VNs, compared to the origial decoders. Furthermore, this iformatio helps the adapted BF decoders correct more errors i the read data tha their origial couterparts. We illustrate a example where the origial decoder (GDBF) fails to correct the errors while the adapted decoder (the A-GDBF) ca correct i Fig. 4. It is show that, the GDBF fails with oly 3 error VNs located o a trappig set (5,3) (i Fig. 4a) [23] (the white circles (squares) represet the correct VNs (satisfied CNs) while the black circles (squares) represet the erroeous VNs (usatisfied CNs), with the assumptio that all zero codeword is set). GDBF fails to fid the correct codeword because whe flippig the erroeous VNs ( ad ), it flips also the correct VNs ( ad ) ad oscillates betwee the two states as i Fig. 4a ad Fig. 4b. O the cotrary, i A-GDBF decoder, the erroeous VNs will be certaily flipped while the correct VNs will be flipped oly whe they are ureliable. A-GDBF ca correct all the errors i Fig. 4a after 2 iteratios (the error cofiguratio as i Fig. 4a Fig. 4c Fig. 4d) if the correct VNs ad are reliable. It ca also correct all error VNs i 3 iteratios by goig Fig. 4a Fig. 4e Fig. 4c Fig. 4d if is reliable while is ureliable. A-GDBF fails to correct the errors whe both ad are ureliable. Therefore, while GDBF always fails, A-GDBF ca correct 75% whe this error patter appears. The advatage i error correctio of the adapted BF decoders is show further i the simulatio results i the ext sectio. Algorithm 1 The Adapted-PGDBF decodig algorithm for MLC NAND Flash memory 1: Iitializatio k = 0, v (0) = y 2: while k It max do 3: for 1 m M do CN processig 4: Compute c (k) m, usig Equ. (1). 5: ed for 6: if c (k) = 0 the Check sydrome 7: exit while 8: ed if 9: for 1 N do VN processig 10: Compute E (k), usig Equ. (2). 11: ed for 12: Sort E max (k) = max (E (k) ) 13: for 1 N do 14: if I = 0 the Added verificatio 15: Geerate R (k) 16: if E (k) from B(p). = 1 the = v (k) 1 = E max (k) ad R (k) 17: v (k+1) 18: ed if 19: ed if 20: ed for 21: k = k + 1 22: ed while 23: Output: v (k) IV. ERROR CORRECTION PERFORMANCE AND ANALYSIS I this sectio, we preset the simulated error correctio performace of adapted BF decoders o the MLC NAND flash memory with the retetio errors. Although the practical LDPC code i the MLC NAND flash memory is usually with very high code rate ad the log codeword [8], we simulated our BF decoders o a moderate code rate (R = 0.75), ad relative short (N = 1296) code to save the simulatio time. We assume that, the MSB ad LSB i a MLC NAND flash memory cell belog to 2 differet codewords ad we decode oly the codeword formed by the LSB bits. As show above,

10 4 10 5 A-GDBF baselie GDBF A-PGDBF baselie PGDBF MS a. b. Bit Error Rate (BER) 10 6 10 7 10 8 10 9 c. d. 10 10 0.01 Raw BER, α 0.001 Figure 5. The decodig performace of the BF decoders o MLC NAND flash memory. e. Figure 4. The error patter with which the GDBF fails to correct the error while the A-GDBF ca. the LSB codeword has higher probability to be erroeous tha the MSB codeword. We produce, therefore, the worst-case decodig performace of our BF decoders. We also show the decodig performace of the quatized MS decoder with 4 bits for LLR ad 6 bits for the AP-LLR iformatio [24]. The MS decoder is implemeted with the layered schedulig i which a decodig iteratio is composed by 4 layers. It ca be see i the Fig. 5 i geeral that, the BF decoders provide the good error protectio ability for MLC NAND flash memory. Ideed, the A-PGDBF provides a BER at 8.10 10 ad A-GDBF is with 4.10 8 whe the raw BER at 2.10 3. The A-PGDBF is much better tha the A-GDBF decoder i term of error correctio at the cost of hardware overhead whe beig implemeted, as observed i may works [23][25]. The reliability iformatio of each bit helps the BF decoders correct more retetio errors. I fact, we itroduce also i the Fig. 5 the decodig performace of the BF decoders whe the bit reliability iformatio are ot used ad deote them as baselie. It is show that, the A-GDBF loses aroud 5dB i BER while A-PGDBF loses 2dB at raw BER 2.10 3. Also, the A-GDBF seems to be affected o all values of α while the A-PGDBF is affected o the error floor regio oly. I geeral, the MS is better tha all BF decoders i error correctio at the cost of hardware complexity [23] ad the decodig throughput. We show i Fig. 6 the compariso o the average umber of clock cycles eeded to decode a codeword, as a fuctio of α. Due to the lack of space, we do ot show the detail of the BF ad MS hardware architectures (which ca be foud i [23] ad [24]). The MS decoders requires 2 clock cycles to decode 1 layer (it does 4 layers i a iteratio) while Average umber of clock cycles 16 14 12 10 8 6 4 2 0.01 Raw BER, α A-GDBF baselie GDBF A-PGDBF baselie PGDBF MS 0.001 Figure 6. The average clock cycles required to decode 1 codeword. the BF decoders requires 1 clock cycle to fiish 1 iteratio. The differece i the average umber of clock cycles ca be iterpreted to the differece i decodig throughput if the decoders operate at the same frequecy. It ca be see i the Fig. 6 that, the BF decoders provide a much higher decoder speed tha that of the MS decoder. At raw BER 2.10 3, the BF decoders is 4 times faster tha the MS decoder. Note further that, the BF decoders require oly the hard iformatio from MLC NAND flash memory while the soft-decisio decoders eed the soft-iformatio. Although we do ot quatify i detail this advatage, it will obviously improve the readig latecy whe the BF decoders are used i MLC NAND flash memory. V. CONCLUSION We propose i this paper a ivestigatio i usig the recet itroduced BF hard-decisio LDPC decoders (the GDBF ad PGDBF decoders), i replacig the soft-decisio, to correct the retetio error i MLC NAND flash memory. The motivatio

comes from the fact that, BF decoders require oly the hardiformatio from the chael to proceed the decodig process, ad from that, the log soft-iformatio sesig ca be omitted. Furthermore, by usig the characteristic of retetio error, we have adapted these BF decoders i such a way that they ca correct more errors i MLC NAND flash memory. The adapted BF decoders show a admirable error correctio improvemet, beig very close or eve comparable to soft-decisio decoders. The advatage of usig BF decoders are also cofirmed by the very small umber of clock cycles eeded for decodig a codeword which is iterpreted to the high decodig throughput. The promisig results i terms of error correctio capability, decodig throughput ad complexity lead the BF decoders to become the good ECC cadidates i the ew geeratio MLC NAND flash memory. ACKNOWLEDGMENT This work was fuded by the frech ANR uder grat umber ANR-15-CE25-0006-01 (NAND project). REFERENCES [1] R. S. Liu, M. Y. Chuag, C. L. Yag, C. H. Li, K. C. Ho, ad H. P. Li, Improvig read performace of ad flash ssds by exploitig error locality, IEEE Trasactios o Computers, vol. 65, o. 4, pp. 1090 1102, April 2016. [2] H. Su, W. Zhao, M. Lv, G. Dog, N. Zheg, ad T. Zhag, Exploitig itracell bit-error characteristics to improve mi-sum ldpc decodig for mlc ad flash-based storage i mobile device, IEEE Trasactios o Very Large Scale Itegratio (VLSI) Systems, vol. 24, o. 8, pp. 2654 2664, Aug 2016. [3] Y. L. et al., A 16 gb 3-bit per cell (x3) ad flash memory o 56 m techology with 8 mb/s write rate, IEEE Joural of Solid-State Circuits, vol. 44, o. 1, pp. 195 207, Ja 2009. [4] N. S. et al., A 70 m 16 gb 16-level-cell ad flash memory, IEEE Joural of Solid-State Circuits, vol. 43, o. 4, pp. 929 937, April 2008. [5] Y. Cai, E. F. Haratsch, O. Mutlu, ad K. Mai, Error patters i mlc ad flash memory: Measuremet, characterizatio, ad aalysis, i 2012 Desig, Automatio Test i Europe Coferece Exhibitio (DATE), March 2012, pp. 521 526. [6] G. Dog, N. Xie, ad T. Zhag, Eablig ad flash memory use softdecisio error correctio codes at miimal read latecy overhead, IEEE Trasactios o Circuits ad Systems I: Regular Papers, vol. 60, o. 9, pp. 2412 2421, Sept 2013. [7] D. h. Lee ad W. Sug, Estimatio of ad flash memory threshold voltage distributio for optimum soft-decisio error correctio, IEEE Trasactios o Sigal Processig, vol. 61, o. 2, pp. 440 449, Ja 2013. [8] J. Kim ad W. Sug, Rate-0.96 ldpc decodig vlsi for soft-decisio error correctio of ad flash memory, IEEE Trasactios o Very Large Scale Itegratio (VLSI) Systems, vol. 22, o. 5, pp. 1004 1015, May 2014. [9] Wireless la medium access cotrol ad physical layer specificatios: Ehacemets for higher throughput, p802.11/d3.07, IEEE-802.11, Mar. 2008. [10] Local ad metropolita area etworks - part 16, ieee std 802.16a-2003, IEEE-802.16. [11] Ieee stadard for iformatio techology - corrigedum 2: Ieee std 802.3a-2006 10gbase-t correctio, IEEE Std 802.3-2005/Cor 2-2007 (Corrigedum to IEEE Std 802.3-2005), Aug. 2007. [12] S. Jeo ad B. V. K. V. Kumar, Performace ad complexity of 32 k-bit biary ldpc codes for magetic recordig chaels, IEEE Trasactios o Magetics, vol. 46, o. 6, pp. 2244 2247, Jue 2010. [13] K. C. Ho, C. L. Che, Y. C. Liao, H. C. Chag, ad C. Y. Lee, A 3.46 gb/s (9141,8224) ldpc-based ecc scheme ad o-lie chael estimatio for solid-state drive applicatios, i 2015 IEEE Iteratioal Symposium o Circuits ad Systems (ISCAS), May 2015, pp. 1450 1453. [14] G. Dog, N. Xie, ad T. Zhag, O the use of soft-decisio errorcorrectio codes i ad flash memory, IEEE Trasactios o Circuits ad Systems I: Regular Papers, vol. 58, o. 2, pp. 429 439, Feb 2011. [15] D. Declercq, M. Fossorier, ad E. Biglieri, Chael codig: Theory, algorithms, ad applicatios, Academic Press Library i Mobile ad Wireless Commuicatios, Elsevier, ISBN: 978-0-12-396499-1, 2014. [16] R. G. Gallager, Low desity parity check codes, MIT Press, Cambridge, 1963, Research Moograph series, 1963. [17] G. Sudararaja, C. Wistead, ad E. Boutillo, Noisy gradiet descet bit-flip decodig for ldpc codes, IEEE Trasactios o Commuicatios, vol. 62, o. 10, pp. 3385 3400, Oct 2014. [18] T. Wadayama, K. Nakamura, M. Yagita, Y. Fuahashi, S. Usami, ad I. Takumi, Gradiet descet bit flippig algorithms for decodig ldpc codes, IEEE Trasactios o Commuicatios, vol. 58, o. 6, pp. 1610 1614, Jue 2010. [19] O. A. Rasheed, P. Ivais, ad B. Vasić, Fault-tolerat probabilistic gradiet-descet bit flippig decoder, IEEE Commuicatios Letters, vol. 18, o. 9, pp. 1487 1490, Sept 2014. [20] K. Le, F. Ghaffari, D. Declercq, B. Vasic, ad C. Wistead, A ovel high-throughput, low-complexity bit-flippig decoder for ldpc codes, i 2017 Iteratioal Coferece o Advaced Techologies for Commuicatios (ATC), Oct 2017, pp. 126 131. [21] L. Qiao, H. Wu, D. Wei, ad S. Wag, A joit decodig strategy of o-biary ldpc codes based o retetio error characteristics for mlc ad flash memories, i 2016 Sixth Iteratioal Coferece o Istrumetatio Measuremet, Computer, Commuicatio ad Cotrol (IMCCC), July 2016, pp. 183 188. [22] M. Zhag, F. Wu, X. He, P. Huag, S. Wag, ad C. Xie, Real: A retetio error aware ldpc decodig scheme to improve ad flash read performace, i 2016 32d Symposium o Mass Storage Systems ad Techologies (MSST), May 2016, pp. 1 13. [23] K. Le, F. Ghaffari, D. Declercq, ad B. Vasić, Efficiet hardware implemetatio of probabilistic gradiet descet bit-flippig, IEEE Trasactios o Circuits ad Systems I: Regular Papers, vol. 64, o. 4, pp. 906 917, April 2017. [24] T. Nguye-Ly, K. Le, F. Ghaffari, A. Amaricai, O. Bocalo, V. Savi, ad D. Declercq, Fpga desig of high throughput ldpc decoder based o imprecise offset mi-sum decodig, i 2015 IEEE 13th Iteratioal New Circuits ad Systems Coferece (NEWCAS), Jue 2015, pp. 1 4. [25] K. Le, D. Declercq, F. Ghaffari, L. Kessal, O. Bocalo, ad V. Savi, Variable-ode-shift based architecture for probabilistic gradiet descet bit flippig o qc-ldpc codes, IEEE Trasactios o Circuits ad Systems I: Regular Papers, vol. PP, o. 99, pp. 1 13, 2017.