Memory Built-In Self-Repair. Dept. of Electrical Engineering, National Central University, Jungli, Taiwan
Outline. Introduction; Redundancy Organizations; Built-In Redundancy Analysis; Built-In Self-Repair; Infrastructure IP for Memory Yield Improvement; References
Introduction. Difficulties in system-on-chip (SOC) designs: timing closure; verification and testing; yield improvement. Memories usually dominate the chip area, and they were projected to cover 90% of an SOC die area by 2011. The memory yield therefore heavily impacts the SOC yield, and increasing the memory yield can significantly increase the SOC yield.
Example: Notebook Graphics Controller. 1. 0.24 µm CMOS; 2. 24-Mbit eDRAM; 3. 500K logic gates, 83 MHz. Source: courtesy of NeoMagic.
Yield of an SOC. SOC yield: $Y_S = Y_M Y_L$, where $Y_M$ and $Y_L$ are the memory and logic yields. Improving the yields of memories can drastically increase the yields of SOCs, as in the UltraSPARC chip yield example [figure not reproduced]. Source: R. Rajsuman, IEEE D&T, 2001.
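To make the product form concrete, the minimal Python sketch below evaluates $Y_S = Y_M Y_L$. It assumes that memory and logic defects are independent. The yield values in the usage loop are hypothetical and are not the UltraSPARC data.

```python
def soc_yield(memory_yield: float, logic_yield: float) -> float:
    """SOC yield Y_S = Y_M * Y_L, assuming independent memory and logic defects."""
    for y in (memory_yield, logic_yield):
        if not 0.0 <= y <= 1.0:
            raise ValueError("yields must be in [0, 1]")
    return memory_yield * logic_yield


if __name__ == "__main__":
    logic = 0.90  # hypothetical logic yield, held fixed
    for mem in (0.50, 0.70, 0.90):  # hypothetical memory yields, e.g. before/after repair
        print(f"Y_M = {mem:.2f} -> Y_S = {soc_yield(mem, logic):.3f}")
```

Because the memory and logic yields multiply, raising only $Y_M$ scales $Y_S$ proportionally. This is why memory repair pays off at the SOC level.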
Conventional Memory Repair Flow. Test → error logging (test bitmaps) → redundancy analysis → laser repair → test. Requirements: 1. memory tester; 2. laser repair equipment. Disadvantages: 1. time-consuming; 2. expensive.
The Conventional Approach for SOCs. Memory repair in SOCs proceeds in four steps. A memory tester first provides a large capture memory and redundancy analysis. Laser repair then swaps out the defective cells. A memory tester next tests the repaired memories, and finally a logic tester tests the remaining non-memory components. Problems: cost (test time and equipment); accessibility of the embedded memories; laser repair becomes more difficult.
Memory BISR is Indispensable. BISR for SOCs comprises built-in self-test (BIST) for testing, built-in self-diagnosis (BISD) for diagnostics, a built-in redundancy analyzer (BIRA) for redundancy allocation, and redundancy reconfiguration to swap out the defective cells. ITRS 2001 projected that within 5 years (~2006), 100% of high-capacity memories would be equipped with BISR.
Typical Memory BIST Architecture [figure: test controller, test pattern generator, and comparator connected to the RAM through a test collar, which also carries the normal I/Os].
NTHU-FTC BISD Architecture [figure: a BISD block with a test pattern generator (TPG) wrapped around an SRAM. Signal labels include ERR, EOP, CONT, DONE, ENA, CLK, and Test_se. The ADDR, DI, DO, WEB, CS, and OE ports each appear in _T and _S versions].
BISD in Diagnosis Mode. In diagnosis mode, the BISD can run a user-specified March algorithm for test/diagnosis. EOP format: address, session, syndrome. [Figure: sample timing diagram of CLK, ERR, EOP (bit stream 1001...10), BEF, BSO, and CONT.]
Typical Memory BISR Architecture [figure: BIST, BIRA, and a reconfiguration mechanism surrounding the test collar, RAM, and redundancy, with the normal I/Os].
Redundancy Organizations. A memory array with local redundancies [figure: Bank 1 and Bank 2, each with its own local spare rows and local spare columns].
Redundancy Organizations. A memory array with hybrid redundancies [figure: Bank 1 and Bank 2, each with local spare rows, sharing global (linked) spare columns].
Redundancy Organizations. A memory array with hybrid redundancies [figure: Bank 1 and Bank 2, each with local spare columns, sharing global (linked) spare rows].
Reconfiguration Techniques. There are three kinds of reconfiguration techniques. Soft (programmed): flip-flops (FFs) are programmed to store the reconfiguration information that connects spare rows or columns. Disadvantages: the repair process must be performed whenever the power is turned on, and some potential faults cannot be repaired. Soft/firm (programmed): non-volatile memories are programmed to blow soft fuses internally and connect spare rows or columns. Disadvantage: high-voltage programming circuitry is usually needed.
Reconfiguration Techniques. Hard (permanent): laser-blown or electrically blown polysilicon or diffusion fuses. Disadvantages: 1) these techniques are not part of standard CMOS technology, so incorporating them into the process technology and adding repair steps after the chip is made increases the cost; 2) they allow only one-time programming, so field-related error correction and fault tolerance are difficult to achieve. These techniques can coexist in memory BISRs.
Repair with Flash EEPROM Switches [figure: multiplexers 0 to 15 between data lines D0 to D15 and outputs Q0 to Q15, controlled by a controller with 20 flash cells]. Size: 768 kb/12; rows: 512; columns: 128. [R. J. McPartland, et al., IEEE CICC, 2000]
Redundancy Analysis. The redundancy analysis problem is to choose the minimum number of spare rows and columns that cover all the faulty cells. Complexity: the 2-D redundancy analysis problem is NP-complete, so the time required to determine a repair solution is a crucial factor.
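As a reference point for this NP-complete problem, here is a minimal exhaustive-search sketch in Python. It does not reproduce any specific published analyzer. It assumes faults are given as (row, column) pairs. It tries every subset of faulty rows up to the spare-row budget and assigns the remaining faulty columns to spare columns. Its cost grows exponentially with the number of faulty rows, which is why analysis time matters.

```python
from itertools import combinations


def min_repair(faults, spare_rows, spare_cols):
    """Exhaustive 2-D redundancy analysis.

    faults: iterable of (row, col) coordinates of faulty cells.
    Returns (rows, cols) using the fewest spares in total, or None if the
    faults cannot be covered with the given spare rows and columns.
    """
    faults = set(faults)
    faulty_rows = sorted({r for r, _ in faults})  # replacing a fault-free row never helps
    best = None
    for k in range(min(spare_rows, len(faulty_rows)) + 1):
        for rows in combinations(faulty_rows, k):
            chosen = set(rows)
            # Every fault not covered by a chosen row forces its column onto a spare column.
            cols = sorted({c for r, c in faults if r not in chosen})
            if len(cols) <= spare_cols:
                if best is None or k + len(cols) < len(best[0]) + len(best[1]):
                    best = (list(rows), cols)
    return best


if __name__ == "__main__":
    print(min_repair([(0, 0), (0, 1), (0, 2), (3, 5), (5, 5)], spare_rows=1, spare_cols=2))
```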
Example 1: Redundancy Analysis [figure: a fault pattern in an 8 × 8 array with rows and columns numbered 0 to 7].
Example 2: Redundancy Analysis [figure: a second fault pattern in an 8 × 8 array with rows and columns numbered 0 to 7].
2D Redundancy Analysis Algorithm. Typical redundancy analysis algorithms for RAMs with 2D redundancy (spare rows and columns) use a two-phase redundancy allocation procedure: a must-repair phase and a final-repair phase.

Must-repair phase: a row-must repair (column-must repair) is a repair solution forced by a failure pattern with more than $S_C$ (more than $S_R$) defective cells in a single row (column). Here $S_C$ and $S_R$ denote the numbers of available spare columns and spare rows. Such a row (column) cannot be covered by the spare columns (rows) alone.

Final-repair phase: heuristic algorithms are usually used, e.g., the repair-most rule, which assigns a spare to the row or column containing the most remaining faulty cells.
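The two-phase procedure can be sketched in Python as follows. This illustrative implementation assumes that all faults are available as a set of (row, column) pairs, for example from a fault map. The helper names are hypothetical. Ties in the repair-most phase are broken in favor of a spare row.

```python
from collections import Counter


def _replace_row(faults, r):
    return {f for f in faults if f[0] != r}


def _replace_col(faults, c):
    return {f for f in faults if f[1] != c}


def two_phase_allocation(faults, spare_rows, spare_cols):
    """Must-repair phase, then repair-most. Returns (rows, cols) or None on failure."""
    remaining = set(faults)
    rows, cols = [], []
    sr, sc = spare_rows, spare_cols

    # Must-repair phase: a row with more than sc faults can only be fixed by a spare
    # row (symmetrically for columns). Repeat, since each repair lowers a budget.
    forced = True
    while forced:
        forced = False
        for r, n in Counter(r for r, _ in remaining).items():
            if n > sc:
                if sr == 0:
                    return None
                rows.append(r)
                sr -= 1
                remaining = _replace_row(remaining, r)
                forced = True
                break
        if forced:
            continue
        for c, n in Counter(c for _, c in remaining).items():
            if n > sr:
                if sc == 0:
                    return None
                cols.append(c)
                sc -= 1
                remaining = _replace_col(remaining, c)
                forced = True
                break

    # Final-repair phase: repair-most rule, replace the line with the most faults.
    while remaining:
        best_r, nr = Counter(r for r, _ in remaining).most_common(1)[0]
        best_c, nc = Counter(c for _, c in remaining).most_common(1)[0]
        if sr > 0 and (nr >= nc or sc == 0):
            rows.append(best_r)
            sr -= 1
            remaining = _replace_row(remaining, best_r)
        elif sc > 0:
            cols.append(best_c)
            sc -= 1
            remaining = _replace_col(remaining, best_c)
        else:
            return None  # faults remain but no spares are left
    return rows, cols
```

Because the final phase is greedy, this heuristic can report failure on fault patterns that an exhaustive search would still repair.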
Redundancy Analysis Using ATE. Create a fault map whose size is the same as the memory under test. Add a counter for each row and each column that records its number of faults [figure: fault map with row and column counters]. Then execute software-based redundancy analysis on the computer in the ATE.
Redundancy Analysis Using ATE. Hardware needed to execute the redundancy analysis:
- a device image memory (or fault memory) of the same size as the memory under test;
- counters that indicate the number of faults in each row and each column.

Conventional software-based redundancy analysis algorithms are therefore not suited to being realized in hardware and embedded into SOCs, because the hardware overhead is too large. Efficient built-in redundancy-analysis (BIRA) algorithms need to be developed.
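The ATE data structures described above can be sketched as follows; the function names are hypothetical. `storage_bits` is a rough estimate that assumes one bit per cell in the fault map and binary counters just wide enough to count a full row or column.

```python
def build_fault_map(n_rows, n_cols, fail_log):
    """Build the ATE-style fault bitmap plus per-row and per-column fault counters.

    fail_log: iterable of (row, col) addresses reported by the memory test.
    """
    bitmap = [[0] * n_cols for _ in range(n_rows)]
    for r, c in fail_log:
        bitmap[r][c] = 1  # repeated fails of the same cell are counted once
    row_counters = [sum(row) for row in bitmap]
    col_counters = [sum(bitmap[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return bitmap, row_counters, col_counters


def storage_bits(n_rows, n_cols):
    """Rough storage estimate: the full bitmap plus one counter per row and per column."""
    row_counter_width = n_cols.bit_length()  # enough bits to count 0..n_cols
    col_counter_width = n_rows.bit_length()  # enough bits to count 0..n_rows
    return n_rows * n_cols + n_rows * row_counter_width + n_cols * col_counter_width


if __name__ == "__main__":
    _, rows, cols = build_fault_map(4, 4, [(0, 1), (2, 1), (2, 3)])
    print(rows, cols, storage_bits(8, 8))
```

For an 8 × 8 array the estimate is already 128 bits, and it grows with the full array size. This illustrates why embedding a copy of the fault map in an SOC is impractical.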
Types of BISR: off-line BISR and on-line BISR. BISR strategies:
- off-line BISR without BIRA ability (BIST + reconfiguration mechanism);
- off-line BISR with BIRA ability (BIST + BIRA + reconfiguration mechanism);
- on-line BISR.
Examples of BISR Design. NEC BISR design without BIRA (JSSC92) [figure: a 64-Mb memory array with an I/O buffer; a BIST block with ROM, TPG, and comparator; and a BISR block with a 16w×32b spare memory and a 16w×21b fail CAM. Bus widths of 21 and 32 bits are marked].
Examples of BISR Design. A BISR design (ITC98) [figure blocks: data input bus, main memory, spare memory, redundancy analysis algorithm information, reconfiguration control unit, column decoder].
BIRA Algorithms: CRESTA. Main idea: exhaustive search with multiple parallel hardware implementations. For example, consider a memory with 2 spare rows (Rs) and 2 spare columns (Cs). All possible repair solutions are then R-R-C-C (Solution 1), R-C-R-C (Solution 2), R-C-C-R (Solution 3), C-R-R-C (Solution 4), C-R-C-R (Solution 5), and C-C-R-R (Solution 6).
BIRA Algorithms: CRESTA (Comprehensive Real-time Exhaustive Search Test and Analysis). Parallel algorithms [figure: example of Solution 1 (R-R-C-C)].
BIRA Algorithms: CRESTA. Analysis flow chart [figure: Start → Test → Fail? If yes, sub-analyzers S1 to S6 process the fault. Then Finish? If no, testing continues; if yes, the result is output → End].
Review: Basic Idea of CRESTA. Assume that a memory has m spare rows and n spare columns. A CRESTA repair analyzer then contains C(m+n, m) sub-analyzers; e.g., with 2 spare rows and 2 spare columns, CRESTA needs C(4, 2) = 6 sub-analyzers. The sub-analyzers analyze the incoming row/column addresses of faulty memory cells in parallel. Each uses a different repair strategy, that is, a different order of allocating spare rows and columns. Because CRESTA tries all possible repair strategies for the spare resources, it is guaranteed to find a solution for a repairable memory.
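The sub-analyzer behavior can be modeled in software as below. This is a behavioral sketch, not the CRESTA hardware. Each `SubAnalyzer` (a hypothetical name) allocates the next spare in its strategy string only when an incoming faulty cell is not already covered. `itertools.combinations` enumerates the C(m+n, m) strategies in the same order as Solutions 1 to 6 above.

```python
from itertools import combinations


def repair_strategies(m, n):
    """All C(m+n, m) orderings of m spare rows ('R') and n spare columns ('C')."""
    return ["".join("R" if i in rows else "C" for i in range(m + n))
            for rows in combinations(range(m + n), m)]


class SubAnalyzer:
    """One sub-analyzer: allocates spares in the order given by its strategy."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.used = 0                 # number of spares allocated so far
        self.rows, self.cols = set(), set()
        self.failed = False

    def feed(self, row, col):
        if self.failed or row in self.rows or col in self.cols:
            return                    # already covered by an allocated spare
        if self.used == len(self.strategy):
            self.failed = True        # needs more spares than are available
            return
        if self.strategy[self.used] == "R":
            self.rows.add(row)
        else:
            self.cols.add(col)
        self.used += 1


def cresta(fault_stream, m, n):
    """Feed each faulty (row, col) to every sub-analyzer; return the successful ones."""
    analyzers = [SubAnalyzer(s) for s in repair_strategies(m, n)]
    for row, col in fault_stream:
        for a in analyzers:           # in hardware these run in parallel, at test speed
            a.feed(row, col)
    return [(a.strategy, sorted(a.rows), sorted(a.cols))
            for a in analyzers if not a.failed]


if __name__ == "__main__":
    stream = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 3), (3, 3)]
    for result in cresta(stream, 2, 2):
        print(result)
```

Consider a fault stream that fills row 0 and then column 3, with 2 spare rows and 2 spare columns. Five of the six strategies repair it. C-C-R-R spends both spare columns on faults in row 0 and runs out of spares.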
Limitation of CRESTA. CRESTA needs the row address and column address of each faulty memory cell to check whether that cell can be repaired by previously allocated spare resources. It therefore cannot handle at-speed multiple-bit failures in a word-oriented memory, because the number of spare columns needed for all failing bits in a word cannot be determined in one cycle. To solve this at-speed problem, a column repair vector (CRV) is used to store column failure information. Analyzing the CRV at the end of the memory BIST allows a post-BIST analysis to determine whether a given repair strategy can repair the memory.
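A minimal sketch of the CRV idea for one repair strategy follows. It rests on three assumptions. Each data bit of a word is served by its own spare column (no column multiplexing). A faulty word is reported as (row, fail_bits), with fail_bits = expected XOR read data. Reserving a spare column ORs all failing bits of the word into the CRV. Each step then needs only a set lookup and bitwise AND/OR, and the ones in the CRV are counted once after BIST. The function name is hypothetical.

```python
def analyze_with_crv(faulty_words, strategy, n_spare_cols):
    """Evaluate one repair strategy using a column repair vector (CRV).

    faulty_words: sequence of (row, fail_bits); bit i set means data bit i failed.
    strategy: string of 'R'/'C' giving the order in which spares are reserved.
    Returns (repairable, spare_rows_used, crv).
    """
    rows, crv, used = set(), 0, 0
    for row, fail_bits in faulty_words:
        # Per-cycle work: a set lookup and bitwise operations only, no bit counting.
        if row in rows or (fail_bits & ~crv) == 0:
            continue  # covered by a spare row or by already-reserved columns
        if used == len(strategy):
            return False, sorted(rows), crv
        if strategy[used] == "R":
            rows.add(row)
        else:
            crv |= fail_bits  # reserve columns for every failing bit of this word
        used += 1
    # Post-BIST analysis: count the ones in the CRV.
    return bin(crv).count("1") <= n_spare_cols, sorted(rows), crv


if __name__ == "__main__":
    words = [(0, 0b0011), (1, 0b0001), (2, 0b1000)]
    print(analyze_with_crv(words, "CRRC", n_spare_cols=2))
```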
At-Speed BISR. Example of redundancy allocation [figure: the same fault pattern (rows and columns 0 to 4) allocated with strategy CCRR, which is unrepairable, and with strategy CRRC].
At-Speed BISR Implementation [figure: the BIST controller tests the memory under test and passes fail/success information to the BISRA controller. The BISRA controller sends a restart signal back to the BIST controller and outputs the repair data and a repairable flag]. The BISRA can implement either all C(m+n, m) analysis engines or just one engine. In the one-engine scheme, if the current repair strategy fails, the engine updates the repair strategy, re-runs the BIST, and tries the next strategy.
At-Speed BISR Implementation [figure: (a) a BISRA controller with C(m+n, m) engines, each with its own SRA and CAR blocks, fed by the fail map and address and sharing an arbiter; (b) a BISRA controller with one engine, whose SRA and CAR blocks are paired with an RSR block that issues the restart signal].
Basic Components of BISRA.
- Spare resource allocation (SRA): allocates either a spare row or a spare column according to its repair strategy.
- Control and report (CAR): checks whether this repair strategy fails. If it does not, CAR reports the repair data, such as faulty row addresses and the CRV, to the BISRA controller once the arbiter grants it access.
- Repair strategy reconfiguration (RSR) block: updates the repair strategy and sends a restart signal to the BIST controller.
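The one-engine scheme can be sketched as a loop. RSR picks the next strategy and restarts BIST, and a combined SRA/CAR step evaluates that strategy. The sketch assumes one spare column per data bit and faulty words reported as (row, fail_bits). `run_bist` is a hypothetical callback standing in for one BIST pass, and all function names are illustrative.

```python
from itertools import combinations


def repair_strategies(m, n):
    """All C(m+n, m) orderings of m spare rows ('R') and n spare columns ('C')."""
    return ["".join("R" if i in rows else "C" for i in range(m + n))
            for rows in combinations(range(m + n), m)]


def sra_and_car(faulty_words, strategy, n_spare_cols):
    """SRA reserves spares in strategy order; CAR checks the result (CRV-based)."""
    rows, crv, used = set(), 0, 0
    for row, fail_bits in faulty_words:
        if row in rows or (fail_bits & ~crv) == 0:
            continue                  # already covered
        if used == len(strategy):
            return None               # CAR: this strategy fails
        if strategy[used] == "R":
            rows.add(row)
        else:
            crv |= fail_bits
        used += 1
    if bin(crv).count("1") > n_spare_cols:
        return None
    return sorted(rows), crv          # CAR: repair data to report


def one_engine_bisra(run_bist, m, n):
    """RSR loop: try strategies one at a time, restarting BIST for each.

    Returns (strategy, repair_data, bist_runs), or (None, None, bist_runs).
    """
    runs = 0
    for strategy in repair_strategies(m, n):
        runs += 1                     # RSR sends a restart signal to the BIST controller
        result = sra_and_car(run_bist(), strategy, n)
        if result is not None:
            return strategy, result, runs
    return None, None, runs


if __name__ == "__main__":
    print(one_engine_bisra(lambda: [(0, 0b01), (1, 0b01), (2, 0b10)], 1, 1))
```

Each failed strategy costs a full BIST pass. That is the price the one-engine scheme pays for needing only one SRA/CAR engine instead of C(m+n, m).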
Early Abort Method. E.g., let a memory have 2 spare rows and 2 spare columns, and let the current repair strategy be CCRR. Suppose that after reserving the first spare column, the number of ones in the CRV exceeds 2. Then no repair strategy beginning with a spare column can repair the memory, so those strategies need not be tried [figure: tree of repair strategies branching on R and C].
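Early abort follows from two observations. An engine's reservations depend only on the fault stream and on the part of the strategy consumed so far. The CRV also only grows. So once the CRV holds more ones than there are spare columns, every pending strategy with the same prefix can be skipped. The sketch below assumes one spare column per data bit and faulty words given as (row, fail_bits). The caller may choose the order in which strategies are tried, and all names are hypothetical.

```python
from itertools import combinations


def repair_strategies(m, n):
    """All C(m+n, m) orderings of m spare rows ('R') and n spare columns ('C')."""
    return ["".join("R" if i in rows else "C" for i in range(m + n))
            for rows in combinations(range(m + n), m)]


def run_until_abort(faulty_words, strategy, n_spare_cols):
    """Return (True, k) on success, or (False, k) if the first k spares already doom it."""
    rows, crv, used = set(), 0, 0
    for row, fail_bits in faulty_words:
        if row in rows or (fail_bits & ~crv) == 0:
            continue
        if used == len(strategy):
            return False, used                  # out of spares
        if strategy[used] == "R":
            rows.add(row)
        else:
            crv |= fail_bits
            if bin(crv).count("1") > n_spare_cols:
                return False, used + 1          # early abort right after this column
        used += 1
    return True, used


def search_with_early_abort(faulty_words, m, n, order=None):
    """Try strategies in order, pruning every strategy that shares a failed prefix."""
    pending = list(order) if order is not None else repair_strategies(m, n)
    tried = []
    while pending:
        strategy = pending.pop(0)
        tried.append(strategy)
        ok, k = run_until_abort(faulty_words, strategy, n)
        if ok:
            return strategy, tried
        prefix = strategy[:k]
        pending = [s for s in pending if not s.startswith(prefix)]
    return None, tried


if __name__ == "__main__":
    order = ["CCRR", "CRCR", "CRRC", "RCCR", "RCRC", "RRCC"]
    print(search_with_early_abort([(0, 0b0111), (1, 0b1000)], 2, 2, order))
```

The usage order puts C-C-R-R first to mirror the example. The first fault word has three failing bits, so the abort after the first spare column removes every strategy that begins with C.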
RAM BISR Using Redundant Words. The BISR methodology contains memory BIST logic, wrapper logic to replace defective words, and fuse boxes to store the failing addresses. Only spare words are used to replace defective cells, which avoids redundancy calculation. The BISR adds no delay in the data path of the memory and no penalty on total test time, and the redundancies can be activated immediately, within one cycle.
The RAM BISR Architecture [figure blocks: address, data input, and control; BIST; input mux; fuse box; redundancy logic; RAM; output mux]. Source: V. Schober, et al., ITC01.
Redundancy Wrapper Logic. The redundancy logic consists of two basic components: spare memory words and logic to program the address decoding. The address comparison is done in the redundancy logic: the incoming address is compared with the addresses stored in the redundancy word lines. An overflow bit indicates that there are more failing addresses than available repair cells. The faulty addresses are programmed either during the memory BIST or from the fuse box during memory setup.
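The address-comparison behavior can be modeled as below. This behavioral sketch uses hypothetical names. Each redundancy word line holds one failing address and one data word, and an address hit redirects the access to the spare word. `stuck_at` emulates defective RAM words for the demonstration.

```python
class RedundancyWrapper:
    """Behavioral sketch of word-redundancy wrapper logic."""

    def __init__(self, n_words, n_spares, stuck_at=None):
        self.ram = [0] * n_words
        self.stuck_at = dict(stuck_at or {})   # defective address -> value it always reads
        self.fail_addr = [None] * n_spares     # address stored in each redundancy word line
        self.spare_data = [0] * n_spares
        self.overflow = False                  # more failing addresses than spare words

    def program(self, address):
        """Store a failing address (during BIST, or from the fuse box at setup)."""
        if address in self.fail_addr:
            return
        if None in self.fail_addr:
            self.fail_addr[self.fail_addr.index(None)] = address
        else:
            self.overflow = True

    def _hit(self, address):
        # Hardware compares the address with all stored addresses in parallel.
        return self.fail_addr.index(address) if address in self.fail_addr else None

    def write(self, address, data):
        i = self._hit(address)
        if i is None:
            self.ram[address] = data
        else:
            self.spare_data[i] = data

    def read(self, address):
        i = self._hit(address)
        if i is not None:
            return self.spare_data[i]
        return self.stuck_at.get(address, self.ram[address])


if __name__ == "__main__":
    w = RedundancyWrapper(16, 2, stuck_at={5: 0})
    w.write(5, 0xAB)
    print(hex(w.read(5)))   # defective word, read value is wrong
    w.program(5)
    w.write(5, 0xAB)
    print(hex(w.read(5)))   # now served by a redundancy word line
```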
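An Array of Redundant Word Lines [figure: MBIST signals (address, write data, expected data, fail, fail address) and the normal address, data input, and control feed the RAM and an array of redundancy word lines. Each word line stores FA, address, and data fields, and together they form a scan chain from TDI to TDO. The figure also shows an overflow flag (labeled FO) and the data-out path]. Source: V. Schober, et al., ITC01.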
Applications of Redundancy Logic. Faulty addresses can be streamed out after test completion, and the fuse box is then blown accordingly in the last step of the test. This is called hard repair here, and it is normally done at wafer-level test. Alternatively, the application can be started immediately after the memory BIST passes, using the addresses programmed into the redundancy logic during the test. This is called soft repair here.
Redundancy Word Line [figure: a word line holding FA, address, and data fields on the scan path from TDI to TDO. A comparator and AND gates combine Fail, Fail_address, A, R, W, DI, Expected_data, and Read to produce the data output DO]. Source: V. Schober, et al., ITC01.
Issues about Fuse Boxes. From a testing point of view, three problems arise:
1. The logic of the fuse box has to be tested.
2. An easy way to set fuse values from an external source without blowing the fuses is helpful. It allows a pre-fuse test and verification of the identified faulty memory locations for reliability tests, yield improvement, and diagnosis.
3. A way to read the fuse values directly after the fuse-blowing process is needed to enhance the observability of the fuse process.
One-Bit Fuse Box. A one-bit fuse box contains a fuse bit and a scan flip-flop for controlling and observing the fuse data. Test_Update = 0: the chain of inverters is closed (the value is latched). Test_Update = 1: the internal node can be set from the scan flip-flop output (TDO). The ports TDI and TDO are activated in scan mode. [Figure: fuse bit (FB) with FRest, FRead, and FGND connections and output F out; a 1/0 mux controlled by Test_Update; the scan FF between TDI and TDO. The timing shows the reset cycle used to read out the fuse information.] Source: V. Schober, et al., ITC01.
Fuse Boxes. The fuse box can be connected to a scan register to stream data in and out during test and redundancy configuration [figure: fuse bits (FB) for Fail and A[0] to A[N-1], each paired with a scan FF, chained from TDI to TDO, with Update and Reset controls]. Source: V. Schober, et al., ITC01.
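The scan-register access to the fuse box can be sketched behaviorally. The model assumes that each fuse bit drives a latch seen by the redundancy logic, and it uses four hypothetical methods:
- `update` copies the scan flip-flops into the latches (like Test_Update = 1), so fuse values can be set without blowing fuses;
- `reset_read` reloads the latches from the physical fuses (the reset cycle);
- `capture` copies the latches into the scan chain for read-out;
- `shift` clocks bits in at TDI and out at TDO.

```python
class FuseBoxChain:
    """Behavioral sketch: N fuse bits, each paired with a scan FF in one scan chain."""

    def __init__(self, blown):
        self.fuse = list(blown)        # physical fuse bits (1 = blown); never changed here
        self.latch = list(blown)       # value driven to the redundancy logic
        self.scan = [0] * len(blown)   # scan FFs; index 0 is next to TDI

    def shift(self, tdi_bits):
        """Shift bits in at TDI; return the bits that leave at TDO, in order."""
        tdo_bits = []
        for bit in tdi_bits:
            tdo_bits.append(self.scan[-1])
            self.scan = [bit] + self.scan[:-1]
        return tdo_bits

    def update(self):
        """Override the latched fuse values from the scan FFs (pre-fuse test)."""
        self.latch = list(self.scan)

    def capture(self):
        """Load the latched fuse values into the scan FFs for observation."""
        self.scan = list(self.latch)

    def reset_read(self):
        """Reset cycle: re-read the physical fuses into the latches."""
        self.latch = list(self.fuse)


if __name__ == "__main__":
    fb = FuseBoxChain([0, 1, 1])
    fb.shift([1, 0, 1])
    fb.update()                 # redundancy logic now sees 1, 0, 1 without blowing fuses
    fb.reset_read()
    fb.capture()
    print(fb.shift([0, 0, 0]))  # fuse values, bit nearest TDO first
```

Because the bit nearest TDO leaves first, the read-out order is the reverse of the chain order.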
Parallel Access of the Fuse Information [figure: the BIST supplies the fuse activation and the address to be fused. Each fuse address (FA) in the fuse box is transferred in parallel to its FA address register in the redundancy logic]. Source: V. Schober, et al., ITC01.
Serial Access of the Fuse Information [figure: the same blocks as in the parallel case, but the fuse addresses (FA) are shifted serially from TDI through the FA address registers of the redundancy logic to TDO]. Source: V. Schober, et al., ITC01.
Test Flow to Activate the Redundancy [flow chart]. Initialize the BIST and load the faulty addresses, then access the memory at each address in turn. On a fail, if a free register is available, write the address, the expected data, and the fail flag into it; otherwise the memory is unrepairable. Increment the address until the test is finished. If fuses are to be blown, stream out the faulty addresses (hard repair); otherwise the programmed registers are used directly (soft repair). Source: V. Schober, et al., ITC01.
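The flow chart can be sketched as one pass of a test loop, under four assumptions:
- `read_word` is a hypothetical callback that returns the data read at an address after the BIST has written the expected data;
- `expected` holds the expected data word for each address;
- `preloaded` holds faulty addresses loaded from the fuse box at initialization;
- a redundancy register is modeled as a dictionary entry mapping the failing address to its expected data, and the entry's presence stands for the fail flag.

```python
def activate_redundancy(read_word, expected, n_registers, preloaded=(), blow_fuses=False):
    """Return (outcome, registers), where outcome is 'soft repair', 'hard repair',
    or 'unrepairable', and registers maps failing address -> spare data."""
    registers = {addr: expected[addr] for addr in preloaded}  # load faulty addresses
    if len(registers) > n_registers:
        return "unrepairable", registers
    for addr, exp in enumerate(expected):       # increment address until finished
        if addr in registers:
            continue  # access is redirected to a redundancy word holding correct data
        if read_word(addr) != exp:              # fail?
            if len(registers) == n_registers:   # no free register
                return "unrepairable", registers
            registers[addr] = exp               # write address, expected data, fail flag
    if blow_fuses:
        return "hard repair", registers         # stream out faulty addresses to the fuses
    return "soft repair", registers


if __name__ == "__main__":
    ram = [0, 0, 7, 0, 3]                       # hypothetical read-back with two bad words
    print(activate_redundancy(lambda a: ram[a], [0] * 5, n_registers=2))
```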
NTHU/ADMtek BISR Scheme. Redundancy organization [figure: segments SEG0 and SEG1 with spare rows SR0 and SR1 and spare column groups SCG0 and SCG1]. SR: spare row; SCG: spare column group; SEG: segment.
NTHU/ADMtek BISR Scheme. Redundancy organization [figure: BIST and BIRA blocks, started by POR, connected through a wrapper to the main memory and the spare memory, with ports Q, D, A, and MAO]. MAO: mask address output; POR: power-on reset.
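NTHU/ADMtek BISR Scheme. Power-on BISR procedure [flow chart]. At power-on, the BIST tests the spare rows and columns and passes the error information to the BIRA. The BIST then tests the main memory, continuing to pass error information to the BIRA. The BIRA produces masked addresses, and address remapping maps incoming addresses into the reduced address space.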
NTHU/ADMtek BISR Scheme. Down-graded operation mode: if the spare rows are exhausted, the memory is operated in down-graded mode with a reduced size. For example, assume that a memory with multiple blocks is used for buffering and the blocks are chained by pointers. If a block is faulty and must be masked, the pointers are updated to invalidate that block. The system still works if a smaller buffer is acceptable.
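Masking a block in such a pointer-chained buffer can be sketched as unlinking it from a linked list. The next-pointer table and the function names are hypothetical illustrations, not the ADMtek implementation.

```python
def mask_block(head, next_ptr, bad):
    """Unlink block `bad` from the chain and invalidate its pointer; return the new head."""
    if head == bad:
        new_head = next_ptr[bad]
    else:
        new_head = head
        cur = head
        while cur is not None and next_ptr[cur] != bad:
            cur = next_ptr[cur]
        if cur is not None:
            next_ptr[cur] = next_ptr[bad]  # predecessor now skips the faulty block
    next_ptr[bad] = None                   # masked block no longer links anywhere
    return new_head


def walk(head, next_ptr):
    """List the blocks reachable from head, i.e. the usable (smaller) buffer."""
    order, cur = [], head
    while cur is not None:
        order.append(cur)
        cur = next_ptr[cur]
    return order


if __name__ == "__main__":
    nxt = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: None}
    head = mask_block(0, nxt, 2)  # block 2 found faulty
    print(walk(head, nxt))        # buffer shrinks from 6 to 5 blocks
```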
NTHU/ADMtek BISR Scheme. Definition of a subword: a subword is a group of consecutive bits of a word, and its length equals the group size. Example: in a 32×16 RAM with a 3-bit row address and a 2-bit column address, each 16-bit word has 4 subwords of 4 bits each.
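The subword example can be checked with a few lines of Python. The bit ordering is an assumption: the row address is taken as the three high address bits, the column address as the two low bits, and subword 0 as the four least-significant data bits.

```python
ROW_BITS, COL_BITS = 3, 2      # 32 words = 8 rows x 4 words per row
WORD_BITS, GROUP_SIZE = 16, 4  # 16-bit words split into 4-bit subwords


def split_address(addr):
    """Split a 5-bit word address into (row, column), row bits being the high bits."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)


def subwords(word):
    """Return the subwords of a word, subword 0 being the least-significant bits."""
    mask = (1 << GROUP_SIZE) - 1
    return [(word >> (i * GROUP_SIZE)) & mask for i in range(WORD_BITS // GROUP_SIZE)]


def faulty_subwords(expected, actual):
    """Indices of the subwords that contain at least one failing bit."""
    return [i for i, s in enumerate(subwords(expected ^ actual)) if s]


if __name__ == "__main__":
    print(split_address(0b10110), [hex(s) for s in subwords(0xABCD)])
    print(faulty_subwords(0x0000, 0x0100))
```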
NTHU/ADMtek BISR Scheme. Row-repair rules: to reduce the complexity, two row-repair rules are used:
1. a row has multiple faulty subwords;
2. multiple faulty subwords have the same column address but different row addresses.

[Figure: example subword patterns for each rule.]
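NTHU/ADMtek BISR Scheme. BIRA procedure [flow chart: run BIST; when a fault is detected, check the row-repair rules (branches "met" and "not met") and the repair-most rules, then check for available spare rows. The faulty row address is exported when no spare row is available, and the flow ends at "Done" or "Stop"].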
NTHU/ADMtek BISR Scheme. Repair rate analysis. The repair rate is the ratio of the number of repaired memories to the number of defective memories. A simulator has been implemented to estimate the repair rate of the proposed BISR scheme [Huang, et al., MTDT, 2002]. Industrial case: SRAM size 8K×64; number of injected random faults: 1 to 10; number of memory samples: 534; RA algorithms: the proposed algorithm and exhaustive search.
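A repair-rate simulator can be sketched as random fault injection followed by a repairability check. This sketch assumes uniformly distributed single-cell faults, uses exhaustive search as the RA algorithm, and suits only small arrays. It does not reproduce the 8K×64 industrial case or the proposed RA algorithm, and the parameter values below are hypothetical.

```python
import random
from itertools import combinations


def repairable(faults, spare_rows, spare_cols):
    """Exhaustive check: do at most spare_rows rows plus spare_cols columns cover all faults?"""
    faulty_rows = sorted({r for r, _ in faults})
    for k in range(min(spare_rows, len(faulty_rows)) + 1):
        for rows in combinations(faulty_rows, k):
            cols = {c for r, c in faults if r not in rows}
            if len(cols) <= spare_cols:
                return True
    return False


def repair_rate(n_rows, n_cols, spare_rows, spare_cols, max_faults, samples, seed=0):
    """Repair rate = repaired memories / defective memories, estimated by fault injection."""
    rng = random.Random(seed)
    repaired = 0
    for _ in range(samples):
        n_faults = rng.randint(1, max_faults)  # at least one fault: every sample is defective
        faults = {(rng.randrange(n_rows), rng.randrange(n_cols)) for _ in range(n_faults)}
        repaired += repairable(faults, spare_rows, spare_cols)
    return repaired / samples


if __name__ == "__main__":
    for sr, sc in ((1, 0), (2, 2), (4, 4)):
        print(sr, sc, repair_rate(64, 64, sr, sc, max_faults=10, samples=500))
```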
NTHU/ADMtek BISR Scheme. Simulation results [table not recoverable from the extracted text. Columns: $N_{SR}$, $N_{SC}$, $N_{SCG}$, RR, 1MA, 2MA, 3MA, 4MA, 5MA, >5MA, and RR (Best). Rows cover $N_{SR}$ = 1 to 5, each combined with ($N_{SC}$, $N_{SCG}$) = (0, 0), (4, 1), (8, 2), and (12, 3). For example, with one spare row and no spare columns, RR = 18.37% and RR (Best) = 18.54%; several of the larger configurations reach 100%.]
NTHU/ADMtek BISR Scheme. Layout view of the repairable SRAM [figure].
- Technology: 0.25 µm
- SRAM area: 6.5 mm²; BISR area: 0.3 mm²; spare area: 0.3 mm²
- Hardware overhead of the spares: 4.6%; hardware overhead of the BISR: 4.6% (0.3/6.5 each)
- Repair rate: 100% if the number of random faults is no more than 10
- Redundancy: 4 spare rows and 2 spare column groups; group size: 4
Infrastructure IP. What is infrastructure IP? Advanced yield-optimization solutions require embedding a special type of IP block in a chip, and such IP blocks are called infrastructure IP. Unlike the functional IP cores used in SOCs, infrastructure IP cores do not add to the main functionality of the chip. Rather, they are intended to ensure the manufacturability of the SOC and to achieve lifetime reliability. Examples of infrastructure IP: process-monitoring IP, test & repair IP, diagnosis IP, timing-measurement IP, and fault-tolerance IP.
Infrastructure IP. Composite IP: STAR [figure: several memories (Mem.), each with an intelligent wrapper (IW), connected to STAR Processor 1, STAR Processor 2, and a fuse box, with IEEE 1149.1 and P1500 interfaces].
Infrastructure IP: STAR. The infrastructure IP comprises a number of hardware components, including a STAR Processor, a Fuse Box, and Intelligent Wrappers (IW). The STAR Processor performs all test and repair coordination of a STAR memory; it is programmed by a set of instructions that control the operation of the internal modules. The Fuse Box may be made of laser fuses to allow one-time repair, or built of non-volatile memory to allow multi-time repair.
Infrastructure IP: STAR. The Intelligent Wrapper (IW) associated with each memory works with the STAR Processor to perform test and repair of the memory. It also allows normal memory operation in the system. The IW contains address counters, registers, data comparators, and multiplexers. The partitioning of functions between the IW and the STAR Processor depends on the infrastructure IP bandwidth requirements.
References
[1] Y. Zorian, Embedded memory test & repair: infrastructure IP for SOC yield, Proc. Int. Test Conf. (ITC), Oct. 2002, pp. 340-349.
[2] D. K. Bhavsar, An algorithm for row-column self-repair of RAMs and its implementation in the Alpha 21264, Proc. Int. Test Conf. (ITC), Oct. 1999, pp. 311-318.
[3] R. Rajsuman, Design and test of large embedded memories: an overview, IEEE Design & Test, May/June 2001, pp. 16-27.
[4] T. Kawagoe, et al., A built-in self-repair analyzer (CRESTA) for embedded DRAMs, Proc. Int. Test Conf. (ITC), Oct. 1999.
[5] I. Kim, Y. Zorian, G. Komoriya, H. Pham, F. P. Higgins, and J. L. Lewandowski, Built in self repair for embedded high density SRAM, Proc. Int. Test Conf. (ITC), Oct. 1998, pp. 1112-1119.
[6] J.-F. Li, J.-C. Yeh, R.-F. Huang, and C.-W. Wu, A built-in self-repair scheme for semiconductor memories with 2-D redundancy, Proc. Int. Test Conf. (ITC), Sep. 2003.
[7] V. Schober, S. Paul, and O. Picot, Memory built-in self-repair using redundant words, Proc. Int. Test Conf. (ITC), 2001.