844 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 25, NO. 3, MARCH 2017

Hardware-Efficient Built-In Redundancy Analysis for Memory With Various Spares

Jooyoung Kim, Woosung Lee, Keewon Cho, and Sungho Kang, Senior Member, IEEE

Abstract: Memory capacity continues to increase, and many semiconductor manufacturing companies are trying to stack memory dice for larger memory capacities. Therefore, built-in redundancy analysis (BIRA) is of utmost importance because the probability of fault occurrence increases with a larger memory capacity. A traditional spare structure that consists of simple rows and columns is somewhat inadequate for BIRA of multiple memory blocks because the hardware overhead and spare allocation efficiency are degraded. The proposed BIRA uses various types of spares and can achieve a higher yield than a simple row and column spare structure. Herein, we propose a BIRA that can achieve an optimal repair rate using various spare types. The proposed analyzer can exhaustively search not only row and column spare types but also global and local spare types. In addition, this paper proposes a fault-storing content-addressable memory (CAM) structure. The proposed CAM is small and collects faults efficiently. The experimental results show a high repair rate with a small hardware overhead and a short analysis time.

Index Terms: Built-in redundancy analysis (BIRA), built-in self-repair, spare structures, yield improvement.

I. INTRODUCTION

As the capacity and density of memory have steadily increased in response to consumer demand, 2-D memory has run into technical limitations in analysis time and capacity [1]. Therefore, 3-D semiconductor devices have become important for overcoming these limitations [2]-[6], and the demand for 3-D memory is expected to increase in the near future.
Manuscript received March 24, 2016; revised July 20, 2016; accepted August 30, 2016. Date of publication September 21, 2016; date of current version February 22, 2017. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government, Ministry of Science, ICT and Future Planning (No. 2015R1A2A1A13001751). (Corresponding author: Sungho Kang.) The authors are with the Computer Systems Reliable SOC Laboratory, Department of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, South Korea (e-mail: kimjy9850@soc.yonsei.ac.kr; uoos@soc.yonsei.ac.kr; ckw1510@soc.yonsei.ac.kr; shkang@yonsei.ac.kr). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2016.2606499

Further development of diagnosis and repair techniques is required under these circumstances to maximize the revenue of semiconductor companies. This requires built-in self-test (BIST) and built-in redundancy analysis (BIRA). Since a 3-D semiconductor usually has unused space on the base layer, this space can be used to accommodate additional functions such as BIRA and BIST. To reduce the time to market, it is necessary to determine why a semiconductor is defective; it is also important to maximize the yield, minimize the analysis time, and minimize the hardware overhead. Therefore, repair and diagnosis of semiconductors are important. This is particularly true for 3-D semiconductors because if even one die among the stacked dice is irreparable in a postbond process, none of the stacked dice can work properly. However, 3-D semiconductors can be repaired by a postbond BIRA process in the base layer. In 3-D memories, each stacked die needs a test port to test itself. When automatic test equipment (ATE) is used for semiconductor testing, the number of I/O ports between the base layer and the ATE increases.
Increasing the number of I/O ports not only raises the risk of production failures in through-silicon vias (TSVs) but also increases the production cost [7].

For the abovementioned reasons, BIRA has been under research for a long time; however, allocating row and column spares is an NP-complete problem [8]. Research on BIRA has attempted to minimize the redundancy analysis (RA) time and the hardware overhead and to maximize the repair rate, because these features influence the cost and productivity of semiconductors. In general, there are tradeoffs among these features.

As memory repair has become more important, it has been studied extensively. 2-D spare structures [9]-[17] consist of a simple row and column spare structure. However, these are limited in achieving a higher yield. In general, a memory consists of several memory blocks or memory banks. In practice, 2-D spares are located in each memory block. Direct application of this research to practical memories is difficult because most practical memories adopt divided bitline or divided wordline structures to reduce the operational voltage and increase the analysis time by minimizing the operational memory area [18], [19]. As the operation of the memory has changed, the spare structure has also changed in various ways. Some BIRA studies have covered the various spares [20]-[22]; however, these studies use heuristic RA and consider a different spare structure alongside a traditional row and column spare structure, which decreases the overall yield of the tested memory. To maximize the yield, BIRA for a spare structure with various spares should repair every reparable faulty memory using the given redundancies. To obtain an optimal repair rate, an exhaustive search is unavoidable because allocating row and column spares is an NP-complete problem [10].

The novelties of the proposed approach are as follows.
To obtain a higher repair rate, the proposed BIRA uses various spares such as common spares, local spares, and global spares, as shown in Fig. 1. Using these spares, the proposed BIRA can achieve a 100% normalized repair rate and a higher repair rate than the conventional spare structure. The features of each spare are explained later. In a 3-D memory block, there can be a vertical relationship between blocks through spare sharing between dice; there can also be a horizontal relationship within the same die [4], as shown in Fig. 1. To reduce the hardware overhead, a storage structure for the fault collection unit is devised, and an analyzer is developed to exhaustively search every repair case. Although the number of cases required for the exhaustive combinations of the various spares is large, the resulting analysis time can be kept manageable by simplifying the search. Therefore, the proposed BIRA can rapidly repair a tested memory to the full extent possible with the given spares for any spare structure.

Fig. 1. Cost-effective spare structure with many types of spare cells.

1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. BACKGROUND

A. Repair Rate

Repair rate and normalized repair rate are defined as

\[ \text{Repair rate} = \frac{\#\text{ of repaired memories}}{\#\text{ of tested memories}} \tag{1} \]

\[ \text{Normalized repair rate} = \frac{\#\text{ of repaired memories}}{\#\text{ of reparable memories}}. \tag{2} \]

The repair rate is defined as the number of repaired memories divided by the number of tested memories, as the repair process is performed for all the tested memories. Using the normalized repair rate, the repair rates of the various BIRAs can be easily compared. A BIRA that achieves a normalized repair rate of 100% is said to obtain an optimal repair rate. The optimal repair rate can be defined as follows. In every faulty case, there is a finite number of ways to cover the faults with the given spares. If there are repairable cases among them, the faulty case can be repaired by the given spares. Therefore, a BIRA method that includes an exhaustive search can find repair solutions among the finite number of covering cases for the given spares. Also, in experiments, the repair rates of algorithms such as Comprehensive Realtime exhaustive Search Test and Analysis (CRESTA) [9] and BRANCH [10] and that of the proposed algorithm are identical.

B. Classification of Faults

1) Single Fault: A single fault is a fault that does not share a row or column address with other faults [23]. Because it is isolated from the others, it can be covered by either a row or a column spare. At least one spare should cover it.

2) Pivot Fault: A pivot fault is defined as a fault whose row and column addresses are both independent of previously found faults [12]. Which faults are pivots therefore depends on the order in which the faults are found. Using the concept of the pivot fault, the hardware overhead of BIRA can be decreased, and the method of allocating spares for BIRA can be simplified.

3) Must-Repair Line: A must-repair line is defined as a line that has more faults than the number of different available spare types for that line [10]. When such a case occurs, the faults on the line cannot be covered by the different spare types because there are too few of them [24]. Because most BIRAs define a must-repair line, they can omit the faulty information on such lines. This keeps the hardware overhead of BIRA small.

C. Algorithms

It is more important than ever to research how to improve yields for 3-D stacked semiconductors. Toward that end, there have been many studies regarding BIRA. Initially, the BIRA algorithms considered for a single block included CRESTA [9], BRANCH [10], Selected Fail Count Comparison (SFCC) [11], Fault-Driven [25], Essential Spare Pivoting (ESP), and Local Repair Most (LRM) [12]. These algorithms can be divided into exhaustive searching and heuristic BIRAs.

1) Exhaustive Searching BIRA: These are tree-based search BIRAs, of which CRESTA and Fault-Driven are typical examples.
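As a small numerical illustration of the definitions in (1) and (2) from Section II-A, the following sketch computes both rates. The function names and the counts are hypothetical and are not taken from the paper's experiments.

```python
def repair_rate(repaired: int, tested: int) -> float:
    """Equation (1): repaired memories divided by tested memories."""
    return repaired / tested


def normalized_repair_rate(repaired: int, reparable: int) -> float:
    """Equation (2): repaired memories divided by reparable memories."""
    return repaired / reparable


# Hypothetical counts: 1000 memories tested, 800 of them reparable with the
# given spares, and the analyzer repairs all 800.
rate = repair_rate(800, 1000)
normalized = normalized_repair_rate(800, 800)
```

With these made-up counts the repair rate is 0.8 while the normalized repair rate is 1.0, which is the situation the text calls an optimal repair rate.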
CRESTA is a parallel BIRA; it collects faults and repairs them concurrently, while also achieving an optimal repair rate. The hardware for CRESTA increases exponentially with the number of spares, because CRESTA stores every repair case. CRESTA can achieve an optimal repair rate without extra RA time because it considers every repair case. Despite the hardware overhead, CRESTA has many positive aspects in terms of analysis time and repair rate. To reduce the hardware of CRESTA, R-CRESTA [26], which removes fault address overlap, has been proposed; however, its hardware remains too large.

SFCC and BRANCH were proposed to reduce the hardware overhead of exhaustive searching BIRA. SFCC and BRANCH achieve an optimal repair rate with a smaller hardware overhead than R-CRESTA and CRESTA. SFCC and BRANCH have a content-addressable memory (CAM) [27], [28] structure for fault collection. They collect all faulty information except for fault addresses that lie on a must-repair line and share addresses with a pivot or nonpivot fault. This reduces the hardware overhead. SFCC defines the must-repair line by counting repeated fault addresses. The addresses of the must-repair line are stored in an independently designed must-repair address CAM. On the other hand, BRANCH detects the must-repair line using the properties of CAM. Because BRANCH does not use the independently designed must-repair address CAM, the hardware of BRANCH is slightly smaller than that of SFCC. In terms of RA, SFCC is based on simple exhaustive searching, whereas BRANCH is based on pivot-based exhaustive allocation. The pivot-based exhaustive allocation is faster than the simple exhaustive searching of SFCC.

2) Heuristic BIRA: While exhaustive searching-based BIRA algorithms can achieve an optimal repair rate, the heuristic BIRA algorithms ESP and LRM cannot. This is because, as mentioned above, spare allocation for a 2-D spare structure is an NP-complete problem. ESP is a representative heuristic BIRA algorithm. Its hardware overhead is quite small, and its RA method is very simple. ESP was the first to use the concept of the pivot fault; it also uses CAM. When a newly found fault shares no address with a previously stored fault, it is stored in CAM; otherwise, the address it shares with the previously stored fault is covered by a spare. Unlike ESP, LRM has a 2-D faulty map whose size is defined by the user. The 2-D faulty map requires more storage than the 1-D fault-saving CAM of ESP. Therefore, the hardware of LRM is quite large, and the repair rate varies with the faulty map size defined by the user. Although heuristic approaches cannot achieve an optimal repair rate, heuristic BIRA offers a simple allocation method and a small hardware overhead. However, the repair rate has an influence on the manufacturing yield.
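To make the idea of exhaustive searching concrete, the following is a minimal software sketch in the spirit of the ordering enumeration commonly attributed to CRESTA: every distinct ordering of row and column spares is tried, and each fault that is not yet covered consumes the next spare in the ordering. It assumes a single block with plain row and column spares, uses a hypothetical function name, and is not the analyzer proposed in this paper or the hardware of any cited scheme.

```python
from itertools import permutations


def exhaustive_repair(faults, n_row, n_col):
    """Return (rows, cols) to replace, or None if no ordering repairs the block.

    faults is a sequence of (row, col) addresses in detection order.
    """
    # Every distinct ordering of the available row ('R') and column ('C') spares.
    orderings = sorted(set(permutations("R" * n_row + "C" * n_col)))
    for ordering in orderings:
        rows, cols = set(), set()
        spares = iter(ordering)
        repaired = True
        for r, c in faults:
            if r in rows or c in cols:
                continue  # already covered by an allocated spare
            kind = next(spares, None)
            if kind is None:
                repaired = False  # spares exhausted under this ordering
                break
            if kind == "R":
                rows.add(r)
            else:
                cols.add(c)
        if repaired:
            return rows, cols
    return None


# Hypothetical fault pattern: a faulty row 0 crossing a faulty column 0.
cross = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
solution = exhaustive_repair(cross, n_row=1, n_col=1)
```

The number of orderings grows quickly with the number of spares, which mirrors the hardware growth described above for schemes that keep every repair case.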
In 3-D stacked memories, a small change in the yield of a die can be a serious problem for the postbond yield because when one stacked die is irreparable, the entire stack of dice cannot be used. Therefore, the yield of each die matters more in 3-D memory than in 2-D memory.

III. PROPOSED BIRA

The proposed BIRA has an expandable spare structure and achieves a 100% normalized repair rate by searching all the spare allocation cases. Therefore, it is a practical solution for repairing memories with an expandable spare cell structure. A BIRA algorithm for multiple-block memory is proposed, which improves the repair rate of multiple single-die memory. Memories are usually composed of multiple memory blocks or memory banks. Since each memory block has only local spare cells, most BIRA algorithms repair the whole memory by repairing each memory block sequentially with its own spare cells. To repair the memory efficiently, the proposed BIRA repairs multiple blocks with the help of global spare cells and common spare cells. In addition, many approaches for modifying spares [29], [30], as well as BIRAs that are reconfigurable for many different types of embedded memory, are required for systems-on-chip [31]. Therefore, the proposed BIRA is applicable to any spare structure and achieves an optimal repair rate.

Fig. 2. Definition of the number of each spare.

A. Features of the Cost-Efficient Spare Structure

The cost-efficient spare structure of the memory has various types of spares, as shown in Fig. 1. These types of spares are distinguished by their length, the range of their application, and their direction, as shown in Fig. 2. The cost-efficient spare structure looks quite complex. However, the memory is composed of many blocks. The cost-efficient spare structure uses the concepts of spare sharing and spare length. It can make the design of fuses and routing gates somewhat complex.
It may also degrade memory access time and operating performance. However, improving yield is a more important issue, and the complexities of the proposed spare structure are acceptable for future memories. As mentioned above, exhaustive searching BIRA can find every repairable case among the finite allocation cases in the traditional spare structure; likewise, the proposed BIRA, as well as CRESTA and BRANCH modified for the tested memory, can find every repairable case among the finite allocation cases in the proposed cost-efficient spare structure.

Fig. 2 shows the feature options. The length option refers to the length of the spare. A single spare can only cover a single block. On the other hand, a double spare can also cover an adjacent block's row or column in the same direction as the spare. The single spare is more efficient than the double spare in terms of the yield per spare area; however, with single spares BIRA needs more CAM cells to store more faulty information, because single spares change the must-repair condition of the blocks, and this change increases the number of must-storing faults needed to achieve an optimal repair rate. As the number of usable spares in a block increases, the number of must-storing faults also increases. A specific calculation of the number of must-storing faults needed to achieve the optimal repair rate is given with the proposed CAM structure using an equation. Consequently, the overall hardware of BIRA increases, while the yield per hardware overhead of BIRA decreases. Therefore, the structure of the spares should be chosen on the basis of the efficiency of the hardware overhead of both BIRA and the spares.

Next, the coverable range option refers to the coverable scope of a spare. The local spare is only for a fixed block. The common spare is for the blocks adjacent to the spare, while the global spare is for all blocks. The reason for this is that each spare has different advantages.
The advantage of the local and common spares is the access time. As the spares are close to the target block, the access speed is fast. However, because of their limited coverable scope, the yield per spare area decreases. On the other hand, the advantage of the global spare is the scope of the coverable blocks. As the global spare can cover all blocks, the yield per spare area increases. However, the global spare has disadvantages, such as a slower access speed and an increased number of must-storing faults, as is the case for the single spare. Finally, the direction option refers to the direction of a spare. As these options show, there are complex tradeoffs between spare types. To obtain an efficient yield at low cost, the construction of the spare structure is important in a multiblock memory with various spares. To define an efficient spare structure, simulation results based on real faulty information are required.

B. Overview of the Proposed Idea

Symbolic representations of the number of each type of spare are defined in Fig. 2. For example, according to the definition, S_sgr denotes the number of single global row spare lines. R_s and C_s denote the total numbers of row and column spares, respectively. For convenience of explanation, this paper uses a spare structure in which every block has the same number of usable row and column spares. Therefore, when S_slr is 2, every block has two single local row spares. The rest of the spares are applied in the same manner. The numbers of allocation combinations are

\[ \#\text{ of allocation combinations for traditional spare} = \frac{(\#\text{ total spares})!}{(\#\text{ row spares})!\,(\#\text{ column spares})!} \tag{3} \]

\[ \#\text{ of allocation combinations for target spare} = \frac{(R_s + C_s)!}{S_{slr}!\,S_{slc}!\,S_{scc}!\,S_{scr}!\,S_{sgr}!\,S_{sgc}!\,S_{dlr}!\,S_{dlc}!\,S_{dgr}!\,S_{dgc}!}. \tag{4} \]

To extend the traditional row and column spare structure, two variables are added to distinguish the types of spares, namely, the length of the spare and the coverable range of the spare. These two variables have been defined in the previous section. They can make RA complicated, as in CRESTA.
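As a worked illustration of (3) and (4), the sketch below evaluates both counts for a small configuration. The function names and the spare counts are hypothetical, and the sketch assumes that R_s + C_s equals the sum of all per-type spare counts.

```python
from math import factorial, prod


def traditional_combinations(row_spares: int, col_spares: int) -> int:
    """Equation (3): orderings of indistinguishable row and column spares."""
    total = row_spares + col_spares
    return factorial(total) // (factorial(row_spares) * factorial(col_spares))


def target_combinations(spare_counts: dict) -> int:
    """Equation (4): orderings when every spare type is distinguished.

    spare_counts maps a type label such as "slr" or "sgc" to its count.
    """
    total = sum(spare_counts.values())  # assumed equal to R_s + C_s
    return factorial(total) // prod(factorial(n) for n in spare_counts.values())


# Hypothetical case with the same four spare lines in both structures:
# two row and two column spares, versus one single local and one single
# global spare in each direction.
traditional = traditional_combinations(2, 2)
target = target_combinations({"slr": 1, "sgr": 1, "slc": 1, "sgc": 1})
```

For this made-up configuration the traditional structure gives 6 orderings and the distinguished structure gives 24, because splitting the spares into types shrinks the factorials in the denominator.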
When CRESTA is applied to the cost-efficient spare structure, the number of spare allocation combinations increases tremendously because the additional spare types must be distinguished in the allocation sequence. The combination equations (3) and (4) show the difference between the traditional spare structure and the cost-efficient spare structure. The quantity in (4) is much larger than that in (3) for the same spare area. The reason is that (4) not only has a larger total number of spares than (3) after the spares are divided but also has a smaller denominator because the factorials are taken over the divided counts. To overcome these problems, the proposed BIRA has a reduced CAM structure and a simple analyzer that considers these variables and exhaustively searches for the solution at a faster speed based on the collected pivot faults. To minimize the amount of fault storage, a new step for checking must-repair conditions is devised. If must-repair conditions occur in the tested memory, the fault addresses on the must-repair line need not be saved. Therefore, it is not necessary to store all the faults in the tested memory.

Fig. 3. Pivot CAM.

C. Proposed CAM Structure

CAM can use the stored content as an address. Therefore, BIRA can store the faulty information in the correct place in a single cycle using the incoming fault; as CAM compares and stores the faulty information in one cycle, BIRA does not interrupt BIST. Therefore, the storage cell of the proposed BIRA is CAM. CAM is connected with BIST, as shown in Fig. 10. BIST provides CAM with fault and test-finish information so that faults can be stored and so that BIRA can determine whether to collect faults or to analyze the collected faults to find a repair solution.

The proposed CAM structure can be divided into two parts. One of the two parts is the pivot CAM (PCAM). PCAM stores the faults that have independent row and column addresses. As mentioned above, such faults are defined as pivot faults.
The other part is the nonpivot CAM (NPCAM). NPCAM stores the faults that have the same row or column address as a pivot fault previously stored in PCAM. The proposed CAM separates pivot and nonpivot faults so that the proposed analyzer can exploit the characteristics of the pivot fault.

Fig. 3 shows the composition of PCAM. An enable flag shows whether the line of PCAM is occupied by a fault. Next to the enable flag, there are row and column address CAMs for storing the addresses of pivot faults. When the row and column lengths of a block are N and M, respectively, CAM should be able to express the full row and column addresses, which require log_2 N and log_2 M bits, respectively. The block address CAM represents the block address of the stored pivot fault. To show the must-repair states of the tested memory, there is must-repair information, which includes the block row must-flag, block column must-flag, adjacent block row must-flag, and adjacent block column must-flag. For multiple memory blocks, the must-repair condition is changed.
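The PCAM line layout described above can be summarized as a bit-width calculation. The sketch below is only an illustration: the function names are hypothetical, the address widths are rounded up to whole bits, and the block address width is assumed to be the base-2 logarithm of the number of blocks, which the text does not state explicitly.

```python
def address_bits(length: int) -> int:
    """Bits needed to address `length` lines, i.e. ceil(log2(length))."""
    return (length - 1).bit_length()


def pcam_line_bits(n_rows: int, n_cols: int, n_blocks: int) -> int:
    """Width of one PCAM line under the assumptions stated above."""
    enable_flag = 1
    row_address = address_bits(n_rows)      # log_2 N
    col_address = address_bits(n_cols)      # log_2 M
    block_address = address_bits(n_blocks)  # assumed width, not from the paper
    must_flags = 4  # block row/column and adjacent block row/column must-flags
    return enable_flag + row_address + col_address + block_address + must_flags


# Hypothetical memory: four blocks of 1024 x 1024 cells.
width = pcam_line_bits(1024, 1024, 4)
```

For this made-up memory each PCAM line would need 1 + 10 + 10 + 2 + 4 = 27 bits.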

Fig. 4. Nonpivot CAM.

Fig. 5. Example of a two-block memory with faulty pattern.

In a single-block memory, the must-repair condition is defined simply by the number of spares and the number of faults on the faulty line. However, the must-repair conditions in a tested memory with multiple memory blocks are rather complicated. The must-repair line should be determined separately for each block based on its number of usable spares. Therefore, the must-repair information has four flags. Block must-flags are for the block of the pivot fault stored in the corresponding PCAM line, whereas adjacent block must-flags are for the blocks adjacent to that block. The changed must-repair conditions are as follows:

\[ \#\text{ of faults on the row faulty line in a block} > \#\text{ of usable column spares in a block} \tag{5} \]

\[ \#\text{ of faults on the column faulty line in a block} > \#\text{ of usable row spares in a block}. \tag{6} \]

The number of lines of PCAM equals the total number of spares. Consequently, as each pivot fault should be covered by at least one spare, the faulty pattern is not reparable when the number of pivot faults is larger than the total number of spares. In this case, the early abort signal of Fig. 10 is set to stop the BIST operation.

Fig. 4 shows the composition of NPCAM. As in PCAM, an enable flag shows whether the line of NPCAM is occupied by a fault. If the row or column address that a nonpivot fault shares with a pivot fault were stored again, the address would be duplicated, and storage cells would be wasted. Therefore, the proposed NPCAM uses a PCAM pointer and an r/c-type descriptor to express the position of the sharing address. The PCAM pointer represents the position of the sharing PCAM line, and the r/c-type descriptor represents the type of address of NPCAM (i.e., row or column).
When the r/c-type descriptor is equal to 1, the type of sharing address is a column. Otherwise, the type of sharing address is a row. The row or column address field should be able to express the maximum of the lengths N and M. The block address of the nonpivot fault should be stored because it is not always the same as that of the pivot fault, namely when the nonpivot fault is on the adjacent block.

The total number of lines of NPCAM is defined by the must-repair conditions. The proposed BIRA stores all faulty information, except for the faults on the must-repair line. The amount of faulty information other than that on the must-repair line is given by

\[ (S_{slc} + S_{scc} + S_{sgc} + 2S_{dgc} + 2S_{dlc}) \times (\#\text{ of usable row spares for a block}) + (S_{slr} + S_{scr} + S_{sgr} + 2S_{dgr} + 2S_{dlr}) \times (\#\text{ of usable column spares for a block}). \tag{7} \]

Equation (7) is calculated according to the must-repair condition. The proposed CAM structure should be able to store the number of faults derived in (7). The total number of lines of NPCAM is not given directly by (7) because the pivot faults are stored in PCAM. Therefore, to estimate this number, one must consider the case with the minimum number of pivots and the maximum number of essentially stored faults in (7), and exclude that minimal number of pivots. In a single-block memory, all the lines of CAM in BRANCH are occupied by faults under the maximal faulty pattern. However, this is not the case for all the lines of PCAM. The difference between the single-block memory and the tested memory arises from the maximum number of usable spares for a pivot fault. In the tested memory, this number is four because the two additional faults on the block adjacent to the pivot fault can be repaired by single spares. On the other hand, the maximum number of usable spares for a pivot is two, one row spare and one column spare, in the single-block memory.
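Equation (7) can be evaluated directly once the spare counts are known. The sketch below is an illustration with hypothetical names and numbers; because the text does not express the number of usable row and column spares per block in terms of the S symbols, those two values are passed in as explicit parameters.

```python
def essential_fault_count(spares: dict, usable_rows: int, usable_cols: int) -> int:
    """Evaluate (7): faults that must be stored outside must-repair lines.

    spares maps type labels ("slc", "scc", "sgc", "dgc", "dlc", "slr",
    "scr", "sgr", "dgr", "dlr") to counts; missing labels count as zero.
    """
    s = lambda key: spares.get(key, 0)
    # Column spare lines, with double spares counted twice as in (7).
    col_lines = s("slc") + s("scc") + s("sgc") + 2 * s("dgc") + 2 * s("dlc")
    # Row spare lines, with double spares counted twice as in (7).
    row_lines = s("slr") + s("scr") + s("sgr") + 2 * s("dgr") + 2 * s("dlr")
    return col_lines * usable_rows + row_lines * usable_cols


# Hypothetical structure: one single local and one single global spare in
# each direction, with two usable row and two usable column spares per block.
count = essential_fault_count({"slr": 1, "slc": 1, "sgr": 1, "sgc": 1}, 2, 2)
```

For these made-up counts, (7) gives (1 + 1) x 2 + (1 + 1) x 2 = 8 faults that the CAM must be able to hold.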
Therefore, the number of lines of NPCAM should be calculated according to the minimum number of pivot fault cases with the maximum number of essentially stored faults. This minimum number is the sum of the number of usable spares of block1 and the number of local spares of block4, because every fault on block2 and block3 can be included in the line of the pivots of block1 and block4. As the faults share a row or column address with the pivots on block1 and block4, the faults on block2 and block3 can
be covered by local and common spares without the pivots on block2 and block3. In conclusion, the number of lines of NPCAM is obtained by subtracting the number of minimal pivot fault cases from the maximal number of essentially stored faults given by (7).

Fig. 6. Example of fault collection using Fig. 5(a). (a) Fault collections #1–#4 result in two pivot faults and two nonpivot faults. (b) #5 fault collection results in a nonpivot fault. (c) #6 fault collection results in a must-repair determination. (d) Fault collections #7–#9 result in a pivot fault and two nonpivot faults. (e) #10 fault collection results in a nonpivot fault. (f) #11 fault collection results in a nonpivot fault.

Fig. 5 is an example of a two-block memory, and Fig. 6 shows an example of fault collection for Fig. 5. The two-block memory does not have an adjacent block in the column direction. Thus, there is no adjacent block column must-flag.

Fig. 6(a) shows the collection result for faults #1–#4. Fault #1 is stored in the first line of PCAM. Fault #2 (1, 3, 0) shares a row address with the fault stored in the first line of PCAM. Therefore, fault #2 is stored in the first line of NPCAM. The column address of fault #2 is held in the address part of NPCAM, and its row address is expressed by the PCAM pointer and the R/C descriptor. The 0 of the PCAM pointer refers to the first line of PCAM, and the C of the R/C descriptor means that the address stored in NPCAM is a column address. Using the stored R/C descriptor and PCAM pointer, the other address of fault #2 can be restored from the first line of PCAM. Fault #3 shares neither its row address nor its column address with the faults stored on PCAM. Thus, fault #3 is classified as a pivot fault and stored in the second line of PCAM. Fault #4 shares its row address with the fault in the first line of PCAM.
Thus, the fault is stored in the second line of NPCAM, and the values of the R/C descriptor and PCAM pointer are set in the same manner.

Fig. 6(b) and (c) show the additional collection results for faults #5 and #6. When fault #6 arrives, row line 1 of block2 becomes a must-repair line. However, the block of the pivot fault and the block of the must-repair line differ. In this case, the must-flag for the adjacent block row is set on the first line of PCAM. The faults on the must-repair line may or may not be removed, depending on whether there is enough room in storage. In the example, these faults are not removed.

Fig. 6(d)–(f) show the collection results for faults #7–#11. Fault #7 shares neither its row address nor its column address with the faults stored on PCAM, so it is stored in the third line of PCAM. Faults #8–#11 are stored in the same way as the previously collected faults.

As the faults are collected in this way, the proposed BIRA can achieve an optimal repair rate with reduced faulty information. In addition, the maximally designed CAM can collect faults on the blocks of another die. Using the collected faulty information, the proposed analyzer performs an exhaustive search to find a repair solution.

D. Proposed Analyzer for the Tested Memory

As mentioned above, 2-D spare allocation is an NP-complete problem. Thus, exhaustive searching is essential for finding a repair solution using limited spares. In addition, the cost-efficient spare structure has more variables than the traditional row and column spare structure. The added variables are the length of the spare and the coverable range of the spare. Thus, the analyzer may have to be more complex than that of a traditional BIRA to consider the added variables. Although the number of allocation combinations is large, the average time for finding a repair solution is not too long. For most tested memories, the memory repair process finishes early, since finding the first solution is enough
for repair even though there are many allocation combinations. In addition, to meet the given timing specifications and the BIST clock, many BIRAs, including the proposed BIRA, use a pipeline architecture for the fault collection and fault analysis blocks. Since the fault collection block operates during the BIST operations, it must be synchronized with the faulty address output of the BIST. In contrast, since the analyzer block operates after the BIST operations, its timing is more relaxed. However, the pipeline structure that lets the analyzer block meet the BIST clock requires some hardware overhead and introduces slight latency. Timing is not a critical issue for BIRAs, since they are implemented to be synchronized with the BIST clock.

Fig. 7. Proposed analyzer for the target spare structure.

Fig. 7 shows the proposed analyzer. It is designed to consider the direction and length of spares. The addresses and the block addresses of the faults saved on PCAM are connected to each pivot row and pivot column of Fig. 7, according to the sequence of the PCAM lines. The proposed analyzer uses three signals for exhaustive searching.

First, the proposed analyzer uses the direction-of-spares-selection signal (DSSS) to search every case of the direction selection for each pivot fault. It relies on the characteristic that every pivot fault should be covered by at least one spare, regardless of the direction of the spare. Thus, every bit of DSSS refers to a row or column according to a value of 1 or 0. The length of DSSS is the same as the total number of spares, that is, the number of lines in PCAM, and each bit selects the row or column address of a stored pivot fault. Thus, DSSS is connected to each multiplexer (MUX) and chooses every pattern of cases among the pivot faults on the basis of the number of row and column spares.
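To make the DSSS search space concrete, the following sketch enumerates DSSS patterns for a given number of row and column spares. It assumes that a bit value of 1 selects the row address and 0 selects the column address of the corresponding pivot fault, and that each pattern selects exactly as many rows as there are row spares. The function name `dsss_patterns` is hypothetical, and the sketch models only the pattern set, not the hardware signal generator.

```python
from itertools import combinations
from math import comb


def dsss_patterns(num_row_spares: int, num_col_spares: int):
    """Yield every DSSS pattern as a tuple of bits, one bit per PCAM line.

    Assumption: bit 1 = repair the pivot by its row, bit 0 = by its column.
    """
    n = num_row_spares + num_col_spares  # DSSS length = total number of spares
    for row_positions in combinations(range(n), num_row_spares):
        chosen = set(row_positions)
        yield tuple(1 if i in chosen else 0 for i in range(n))


patterns = list(dsss_patterns(3, 3))
print(len(patterns), comb(6, 3))  # 20 20
print(patterns[0])                # (1, 1, 1, 0, 0, 0)
```

With three row and three column spares the enumeration yields 20 patterns, which matches the binomial coefficient for choosing 3 positions out of 6.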
The number of patterns of the DSSS is given as

$$\#\ \text{of the patterns of the DSSS} = \frac{(R_s + C_s)!}{R_s!\,C_s!} \tag{8}$$

MUX outputs the chosen row or column addresses of the pivot faults according to a pattern of DSSS. Using the output of MUX, the nonpivot faults are compared to check whether they are covered by the chosen addresses of the pivot faults. The part that performs the comparison is shown by MUX of Fig. 7. NPr_x and NPc_x in Fig. 7 represent the row and column addresses of the fault saved on the xth line of NPCAM, respectively. Every NPr_x or NPc_x value is compared with the output of MUX, which is a candidate for a repair case. However, in the cost-efficient spare structure, not only the row or column address but also the block address should be considered, because a nonpivot fault on a block adjacent to the pivot fault can be repaired when the allocated spare is global. Thus, the comparator should be able to check whether nonpivot faults are repaired according to the type of the allocated spare and the relation between the block of the candidate pivot fault and the block of the connected nonpivot fault.

For each NPr_x or NPc_x in Fig. 7, there are as many row address comparators (RAC) and column address comparators (CAC) as the total numbers of row and column spares, respectively. This is because a nonpivot fault is checked by comparing an NPr_x or NPc_x value with all repair candidate addresses. The comparators are shown in Figs. 8 and 9. They are designed to consider the length of spares for each spare direction. For a nonpivot fault to be repaired by a repair candidate address under single-spare allocation, the address and the block address of both faults must be the same. Under double-spare allocation, on the other hand, a nonpivot fault that has the same address and lies on the block adjacent to the repair candidate address can also be repaired.
The designed RAC and CAC can judge whether a nonpivot fault can be repaired according to the repair candidate address and the length of the allocated spare. Three NXOR gates are shown in Figs. 8 and 9. One of the NXOR gates compares a row or column address of a nonpivot fault (NPr_y addr, NPc_y addr) with a row or column repair candidate address (RR_x addr, RC_x addr). The center NXOR gate checks whether the connected nonpivot fault is on the same block as the repair candidate fault or on a block adjacent to it. Thus, RAC compares the high block addresses of both faults (RR_x block1, NPr_y block1) to check whether they are on the same block or adjacent in the column direction. On the other hand, CAC compares the low block addresses of both faults (RC_x block0, NPc_y block0), because CAC, unlike RAC, should check whether both faults are adjacent in the row direction. The comparison result is not the same when the
faults are not adjacent to each other. The comparison results of these two NXOR gates must both indicate a match for the nonpivot fault to be repaired. However, the final repair result is defined by the length of the allocated spare and the relationship between the blocks. The last NXOR gate checks whether both faults are on the same block or on adjacent blocks when the results of the first two NXOR gates match. The output of the last NXOR gate is connected to the following two-input OR gate. The other input of the OR gate is connected to a bit of the row/column length-selection signal (RLSS/CLSS), which gives the length of the spare. When the length of the spare is double, this bit is 1; if the row or column address and the block address are the same as those of the repair candidate fault, the nonpivot fault can be covered regardless of the output of the last NXOR gate. When the length of the spare is single, this bit is 0; if the row or column address and the block address are the same as those of the repair candidate fault, the nonpivot fault can be covered only when the output of the last NXOR gate is 1, that is, when the faults are on the same block.

Fig. 8. Row address comparator.

Fig. 9. Column address comparator.

Fig. 10. Block diagram of the proposed RA scheme.

To use CAC and RAC for exhaustive searching, RLSS and CLSS are provided in Fig. 7. These signals are designed to exhaustively search the cases by allocating every length of spare to all addresses chosen by DSSS. The length of RLSS is the total number of row spares, and the length of CLSS is the total number of column spares. Every bit of RLSS is connected to every RAC (RR_x) in order. In the same manner, every bit of CLSS is connected to every CAC (RC_x) in order. Thus, the proposed BIRA can exhaustively search every pattern of the length of the spares using RLSS and CLSS.
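The gate-level description of RAC and CAC can be mirrored in a few lines of software. The sketch below is one possible reading of the text, not the circuit of Figs. 8 and 9: it assumes a 2-bit block address whose high bit is `block1` and low bit is `block0`, that the second NXOR gate compares `block1` in RAC and `block0` in CAC, that the last NXOR gate compares the remaining block bit, and that a length bit of 1 means a double-length spare. All function names are hypothetical.

```python
def nxor(a: int, b: int) -> int:
    """Return 1 when the two values are equal, 0 otherwise."""
    return int(a == b)


def _bit(value: int, index: int) -> int:
    """Extract one bit of a block address."""
    return (value >> index) & 1


def _covers(cand_addr, cand_block, np_addr, np_block, length_bit,
            second_gate_bit, last_gate_bit) -> int:
    addr_match = nxor(cand_addr, np_addr)                      # first NXOR
    line_match = nxor(_bit(cand_block, second_gate_bit),       # center NXOR
                      _bit(np_block, second_gate_bit))
    same_block = nxor(_bit(cand_block, last_gate_bit),         # last NXOR
                      _bit(np_block, last_gate_bit))
    # OR gate: a double-length spare (length_bit = 1) ignores the last NXOR.
    return addr_match & line_match & (same_block | length_bit)


def rac_covers(cand_row, cand_block, np_row, np_block, rlss_bit) -> int:
    """Row address comparator: 1 if the row candidate covers the nonpivot fault."""
    return _covers(cand_row, cand_block, np_row, np_block, rlss_bit, 1, 0)


def cac_covers(cand_col, cand_block, np_col, np_block, clss_bit) -> int:
    """Column address comparator: 1 if the column candidate covers the nonpivot fault."""
    return _covers(cand_col, cand_block, np_col, np_block, clss_bit, 0, 1)


# Same row address, blocks 0b00 and 0b01 (they differ only in block0).
print(rac_covers(5, 0b00, 5, 0b01, rlss_bit=0))  # 0: single spare, other block
print(rac_covers(5, 0b00, 5, 0b01, rlss_bit=1))  # 1: double spare spans both
print(rac_covers(5, 0b00, 5, 0b10, rlss_bit=1))  # 0: block1 differs
```

In this reading, the length bit only relaxes the same-block requirement; an address mismatch or a mismatch on the bit checked by the center gate always leaves the nonpivot fault uncovered.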
The numbers of RLSS and CLSS patterns are given by (9) and (10), respectively, according to the combination equations

$$\#\ \text{of the RLSS patterns} = \frac{R_s!}{S_{sr}!\,S_{dr}!} \tag{9}$$

$$\#\ \text{of the CLSS patterns} = \frac{C_s!}{S_{sc}!\,S_{dc}!} \tag{10}$$

In (9), S_sr indicates the total number of single-row spares and S_dr indicates that of double-row spares. Likewise, S_sc and S_dc in (10) indicate the numbers of single-column spares and double-column spares, respectively.

The comparison results of RAC and CAC are connected to each of the two-input OR gates at the bottom of Fig. 7. Thus, the output of the OR gate indicates whether the nonpivot fault is covered. When the output of the OR gate is 0, the nonpivot fault is uncovered. When the output of the OR gate is 1, the corresponding nonpivot fault is covered. Finally, the proposed analyzer outputs uncover_nonpivot_addr, the uncovered nonpivot addresses, and nonpivot_cover_result, the nonpivot cover result, to the redundant analyzer in Fig. 10 to determine the final repair.

The proposed analyzer considers only the length and direction of the spares. Consideration of the coverable range of
spares is required so that the allocation selected by DSSS, RLSS, and CLSS from the signal generator in Fig. 10 does not exceed the number of usable spares in each block. This is handled by the signal validity checker module in Fig. 10. Even when all the nonpivot faults and must-repair lines are covered by a combination of selection signals, the combination of the three signals is invalid when it allocates too many spares to a block, that is, more spares than are usable in that block. On the other hand, when the signals satisfy the number of usable spares for each block, the signal validity checker module outputs the unused spares and the valid signal (signal valid), as shown in Fig. 10.

Fig. 11. Example of the operation of the proposed RA. (a) Faulty pattern and spare structure. (b) Fault collection result according to the faulty pattern. (c) Analysis result.

In addition, the signal validity checker module checks whether the candidate allocations cover the must-repair lines. The overly collected faults, including those on the must-repair lines, were not stored during the fault-collection phase. When the must-repair lines are not covered by the signals, the module checks whether there are enough spares to cover them, considering the unused spares and the combination of signals. The module outputs the valid signal when the must-repair lines can be covered and the combination does not allocate too many spares to a block.

As the pivot faults are always repaired, uncovered nonpivot addresses remain for some combinations of signals. An uncovered nonpivot address can be covered by an unused spare. The number of unused spares is defined by the number of empty lines of PCAM and each combination of signals. The redundant analyzer in Fig. 10 operates on the basis of the uncovered nonpivot faults and unused spares.
When the uncovered nonpivot faults can be covered by unused spares, the faulty pattern can be repaired and the final output of the module is 1. The redundant analyzer in Fig. 10 also gives the final cover addresses to the fuse controller.

Fig. 11 shows an example of the operation of the proposed RA for the previous fault collection example shown in Fig. 6. As the memory structure of the example has two blocks, the adjacent block column must-flag and CLSS are not needed. Moreover, the fault collection result [Fig. 11(b)] shows one must-repair line and one unoccupied line of PCAM. An allocation to the unoccupied line of PCAM is an invalid allocation. Thus, the spare allocated to that line is considered to be unused. When a combination of the signals does not allocate a row spare to the adjacent block row must-repair line, the signal validity checker module judges the combination as invalid if there is no proper unused row spare to allocate to the must-repair line.

The first DSSS and RLSS patterns, #1-1 in Fig. 11, satisfy the must-repair condition by using the unused row spare. However, the pattern can repair only the fifth fault of NPCAM. Although the unused spare is allocated to the must-repair line, the fourth and the sixth faults of NPCAM cannot be covered because the spares are insufficient. In addition, the #1-2 case can allocate a row spare to the must-repair line; however, faults in NPCAM remain uncovered. The #2-1 and #2-2 cases cannot cover the must-repair line. Therefore, the patterns are invalid. The #3-1 and #4-1 cases allocate a single global row spare to the row of the first line of NPCAM. However, the row spare cannot cover the adjacent row and the must-repair line, so the patterns are invalid. Although #3-2 can cover the must-repair line using the double global row spare, the pattern cannot cover two nonpivot faults.
One of the uncovered nonpivot faults can be covered by an unused spare; however, the other fault cannot be covered. The last case, #4-2, can cover all the nonpivot faults as well as the must-repair line.

The proposed CAM structure is designed to collect the reduced faulty information needed for the optimal repair rate. In addition, the proposed RA scheme can try every repair case, according to combinations of the direction and width of the spares. Through the proposed CAM structure and RA scheme, the proposed BIRA can achieve an optimal repair rate. For 3-D memory, the analysis can be performed by adapting CLSS, RLSS, and DSSS to the redundant spare state, including sharable spares and TSVs.

IV. EXPERIMENTAL RESULTS

As mentioned above, there are three major features of BIRA: the hardware overhead, repair rate, and analysis time. The hardware overhead is calculated on the basis of the required storage cells of each BIRA, because a direct comparison of the hardware overheads of all BIRAs is very difficult. The number of required storage cells is not exactly the same as the hardware overhead of each BIRA; however, the area of the required storage dominates the area of a BIRA.

To estimate the repair rate, a C language-based simulator is used with randomly generated fault patterns. The simulator is applied to the tested memory structures 1 to 12 in Fig. 12. Every simulated tested memory in Fig. 12 has the same spare area, equivalent to three row and three column spares. The experiment proceeds by varying the number of faults from 0 to 24. The tested memories were simulated 10 000 times for each number of faults. However, the simulation result is shown only for numbers of faults between 7 and 19, as the repair rate for 0–6 faulty cells is always 100%, and the repair rate for 20–24 faulty cells decreases drastically.

Fig. 12. Simulated tested spare memories.

Fig. 13. Comparison in terms of repair rate for the same spare (traditional three-row and three-column spares).

The analysis time is also estimated by calculation. The RA schemes of the proposed BIRA and BRANCH are composed of combinational gate logic. Thus, the analysis time can be calculated on the basis of the number of signal combinations.

To obtain comparative results, not only the proposed BIRA but also BRANCH and CRESTA adapted to the tested memory structure are used. Multiblock BRANCH and multiblock CRESTA are designed to adapt the traditional BIRAs, BRANCH and CRESTA, to the tested memory. Multiblock BRANCH repairs each block first. The repair results of each block are then analyzed to see whether they satisfy the given spare resources. If they do not, multiblock BRANCH has to find another combination of repair results to find a solution. Therefore, it takes a lot of time on average to find the solution. In addition, multiblock CRESTA collects every repair case like the existing CRESTA. However, as the number of spare types in the tested memory increases, the number of combinations of repair cases increases exponentially.

First, the hardware overhead can be easily calculated for each BIRA. Equations (11)–(13) give the numbers of required storage cells of each BIRA for the tested memory

$$A_{\text{CRESTA}} = \frac{(R_s + C_s)!}{S_{sr}!\,S_{sc}!\,S_{wr}!\,S_{wc}!} \tag{11}$$

$$A_{\text{BRANCH}} = 4\,[(\#\ \text{of usable row spares in a block} + \#\ \text{of usable column spares in a block}) \times (\log_2 N + \log_2 M + 3) + (2 \times \#\ \text{of usable row spares in a block} \times \#\ \text{of usable column spares in a block} - \#\ \text{of usable row spares in a block} - \#\ \text{of usable column spares in a block}) \times (\max(\log_2 N, \log_2 M) + 2 + \log_2(\#\ \text{usable spares in a block}))] \tag{12}$$

$$A_{\text{proposed}} = \{(S_{slc} + S_{scc} + S_{sgc} + 2S_{wgc} + 2S_{wlc}) \times R_s + (S_{slr} + S_{scr} + S_{sgr} + 2S_{wgr} + 2S_{wlr}) \times C_s - \#\ \text{usable spares in block 1} - \#\ \text{local spares in block 4}\} \times (\max(\log_2 N, \log_2 M) + 4 + \log_2(R_s + C_s)) \tag{13}$$

Fig. 13 and Table I show a comparison of the simulated repair rate and the calculated hardware overhead among the proposed BIRA, multiblock CRESTA, and multiblock BRANCH. The simulated spare types are defined in Fig. 12.

$$T_{\text{BRANCH}} = \left\{ \frac{(\#\ \text{usable spares for a block})!}{(\#\ \text{row spares for a block})!\,(\#\ \text{column spares for a block})!} \right\}^{4} \tag{14}$$

$$T_{\text{proposed}} = \frac{(R_s + C_s)!}{R_s!\,C_s!} \times \frac{R_s!}{S_{wr}!\,S_{sr}!} \times \frac{C_s!}{S_{wc}!\,S_{sc}!} \tag{15}$$
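The pattern counts in (14) and (15) are products of binomial coefficients, so they are easy to evaluate for a given spare configuration. The sketch below is only an illustration: the function names are hypothetical, it assumes that R_s = S_sr + S_wr and C_s = S_sc + S_wc, that the number of usable spares for a block in (14) is the sum of its row and column spares, and the spare counts in the example are invented rather than taken from Table II.

```python
from math import comb


def t_proposed(s_sr: int, s_wr: int, s_sc: int, s_wc: int) -> int:
    """Number of signal combinations of the proposed analyzer, as in (15).

    Assumes R_s = s_sr + s_wr and C_s = s_sc + s_wc.
    """
    r_s = s_sr + s_wr
    c_s = s_sc + s_wc
    dsss = comb(r_s + c_s, r_s)  # (R_s + C_s)! / (R_s! C_s!)
    rlss = comb(r_s, s_wr)       # R_s! / (S_wr! S_sr!)
    clss = comb(c_s, s_wc)       # C_s! / (S_wc! S_sc!)
    return dsss * rlss * clss


def t_branch(rows_per_block: int, cols_per_block: int, num_blocks: int = 4) -> int:
    """Number of combinations for multiblock BRANCH, as in (14).

    Assumes usable spares for a block = rows_per_block + cols_per_block.
    """
    per_block = comb(rows_per_block + cols_per_block, rows_per_block)
    return per_block ** num_blocks


# Invented example: two single and one wide spare in each direction.
print(t_proposed(s_sr=2, s_wr=1, s_sc=2, s_wc=1))  # 180
# Invented example: every block may use three row and three column spares.
print(t_branch(3, 3))                              # 160000
```

The per-block count in `t_branch` is raised to the number of blocks, so its growth with the block count is much faster than that of the single product in `t_proposed`.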

TABLE I. COMPARISON OF EACH ALGORITHM IN TERMS OF HARDWARE OVERHEAD FOR THE SAME SPARE AREA (TRADITIONAL THREE-ROW AND THREE-COLUMN SPARES)

TABLE II. COMPARISON OF EACH BIRA IN TERMS OF ANALYSIS TIME

The hardware overhead of the proposed BIRA is the smallest among the three options for every spare structure in Fig. 12. Although the simulated repair rates of the three BIRAs are the same, the hardware overhead of multiblock CRESTA is far larger than that of the others. The proposed BIRA requires only approximately 28%–55% of the hardware overhead of multiblock BRANCH, depending on the spare structure.

The experimental result for structure 1 in Fig. 13 shows that the repair rate of the traditional spare structure is lower than that of the others. This means that the cost-efficient spare structure has greater flexibility for fault covering than the traditional one. In general, spare structures having more segmented and globally usable spares can achieve higher repair rates.

Table I shows the hardware overhead of each spare structure in Fig. 12. The spare structure in Fig. 12 is arranged in the descending order of each repair rate. However, the hardware overhead does not scale in proportion to the repair rate. This means that a spare structure whose BIRA has a larger hardware overhead does not always have a better repair rate than one whose BIRA has a smaller hardware overhead. Therefore, the memory designer should consider the real faulty patterns. Using this property, the semiconductor manufacturer can optimize yield according to the faulty patterns with minimal hardware overhead of BIRA. For memories with high defect probabilities, spare structures such as 8–12 may be considered. However, the BIRAs for structures 10 and 12 require more hardware overhead than those for other spare structures with similar repair rates.
Thus, structures 8, 9, and 11 are more efficient than 10 and 12 for a memory with a high defect probability. On the other hand, spare structures 6–8 may be considered for a memory with low defect probability in a mature semiconductor manufacturing process. The hardware overhead of multiblock CRESTA is too large for practical use.

The analysis time of multiblock BRANCH is given by (14). As mentioned above, the equation is derived from the number of combinations of control signals of each block. Therefore, the equation is biquadratic for four-block memories, and the exponent increases for memories with more than four blocks. The analysis time of the proposed method is given by (15) as the product of the numbers of patterns of each signal, namely RLSS, CLSS, and DSSS. It can be easily calculated for a memory of more than four blocks.

Table II shows the maximum time calculation results of the above spare structure cases for multiblock BRANCH and the proposed BIRA. The analysis time of the proposed BIRA is smaller than that of multiblock BRANCH for every spare structure case. Multiblock BRANCH has to combine the solutions of each block. Thus, the analysis time of multiblock BRANCH is typically longer than that of the proposed BIRA. In the various spare cell architectures, the number of allocation combinations is large. However, it is very rare that a repair solution is found only after searching all the combinations. Therefore, even if the number of allocation combinations is very large, the average time for finding a repair solution is not too long. In addition, the repair time is usually much less than the test time of the BIRA.

The proposed BIRA has a better analysis time and smaller hardware overhead than the comparison targets in every spare structure. In addition, the proposed cost-efficient spare structure offers a way to efficiently improve the yield.

V. CONCLUSION

As the density of memory increases, the spare cell structure in the tested memories becomes more complex. Therefore, a new RA method is required to test and repair the tested memories that have a cost-efficient spare cell structure. Traditional BIRAs are hard to apply to a cost-efficient spare cell structure because it needs complex conditions to allocate spare cells
efficiently. However, using a cost-efficient spare cell structure, the proposed BIRA can achieve a 100% normalized repair rate and a higher repair rate than the conventional spare structure. Multiblock CRESTA and multiblock BRANCH were used to compare the performance with the proposed BIRA in the experimental result. Although these provide a 100% normalized repair rate, their hardware overheads are larger than that of the proposed method. However, the proposed BIRA can achieve a 100% normalized repair rate with a small hardware overhead and short analysis time due to a cost-efficient spare cell structure.

REFERENCES

[1] International Technology Roadmap for Semiconductors (ITRS), Semicond. Ind. Assoc., San Jose, CA, USA, 2011.
[2] V. F. Pavlidis and E. G. Friedman, "Interconnect-based design methodologies for three-dimensional integrated circuits," Proc. IEEE, vol. 97, no. 1, pp. 123–140, Jan. 2009.
[3] W. R. Davis et al., "Demystifying 3D ICs: The pros and cons of going vertical," IEEE Des. Test Comput., vol. 22, no. 6, pp. 498–510, Nov./Dec. 2005.
[4] W. Kang, C. Lee, H. Lim, and S. Kang, "A 3 dimensional built-in self-repair scheme for yield improvement of 3 dimensional memories," IEEE Trans. Rel., vol. 64, no. 2, pp. 586–595, Jun. 2015.
[5] C. Lee, W. Kang, D. Cho, and S. Kang, "A new fuse architecture and a new post-share redundancy scheme for yield enhancement in 3-D-stacked memories," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 33, no. 5, pp. 786–797, May 2014.
[6] B.-Y. Lin et al., "Redundancy architectures for channel-based 3D DRAM yield improvement," in Proc. IEEE Int. Test Conf., Oct. 2014, pp. 1–7.
[7] J. H. Lau, "TSV manufacturing yield and hidden costs for 3D IC integration," in Proc. 60th Electron. Compon. Technol. Conf., Jun. 2010, pp. 1031–1042.
[8] S.-Y. Kuo and W. K. Fuchs, "Efficient spare allocation for reconfigurable arrays," IEEE Des. Test Comput., vol. 4, no. 1, pp. 24–31, Feb. 1987.
[9] T. Kawagoe, J. Ohtani, M. Niiro, T. Ooishi, M. Hamada, and H. Hidaka, "A built-in self-repair analyzer (CRESTA) for embedded DRAMs," in Proc. Int. Test Conf., Oct. 2000, pp. 567–574.
[10] W. Jeong, J. Lee, T. Han, K. Lee, and S. Kang, "An advanced BIRA for memories with an optimal repair rate and fast analysis speed by using a branch analyzer," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 29, no. 12, pp. 2014–2026, Dec. 2010.
[11] W. Jeong, I. Kang, K. Jin, and S. Kang, "A fast built-in redundancy analysis for memories with optimal repair rate using a line-based search tree," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 12, pp. 1665–1678, Dec. 2009.
[12] C.-T. Huang, C.-F. Wu, J.-F. Li, and C.-W. Wu, "Built-in redundancy analysis for memory yield improvement," IEEE Trans. Rel., vol. 52, no. 4, pp. 386–399, Dec. 2003.
[13] I. Kim, Y. Zorian, G. Komoriya, H. Pham, F. P. Higgins, and J. L. Lewandowski, "Built in self repair for embedded high density SRAM," in Proc. Int. Test Conf., Oct. 1998, pp. 1112–1119.
[14] D. K. Bhavsar, "An algorithm for row-column self-repair of RAMs and its implementation in the Alpha 21264," in Proc. Int. Test Conf., Sep. 1999, pp. 311–318.
[15] C. H. Stapper, A. N. McLaren, and M. Dreckmann, "Yield model for productivity optimization of VLSI memory chips with redundancy and partially good product," IBM J. Res. Develop., vol. 24, no. 3, pp. 398–409, May 1980.
[16] H.-Y. Lin, F.-M. Yeh, and S.-Y. Kuo, "An efficient algorithm for spare allocation problems," IEEE Trans. Rel., vol. 55, no. 2, pp. 369–378, Jun. 2006.
[17] W. Kang, H. Cho, J. Lee, and S. Kang, "A BIRA for memories with an optimal repair rate using spare memories for area reduction," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 11, pp. 2336–2349, Nov. 2014.
[18] M. Yoshimoto et al., "A divided word-line structure in the static RAM and its application to a 64K full CMOS RAM," IEEE J. Solid-State Circuits, vol. SC-18, no. 5, pp. 479–485, Oct. 1983.
[19] A. Karandikar and K. K. Parhi, "Low power SRAM design using hierarchical divided bit-line approach," in Proc. Int. Conf. Comput. Design, Oct. 1998, pp. 82–88.
[20] S.-K. Lu, C.-L. Yang, Y.-C. Hsiao, and C.-W. Wu, "Efficient BISR techniques for embedded memories considering cluster faults," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 2, pp. 184–193, Feb. 2010.
[21] S.-K. Lu, Z.-Y. Wang, Y.-M. Tsai, and J.-L. Chen, "Efficient built-in self-repair techniques for multiple repairable embedded RAMs," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 31, no. 4, pp. 620–629, Apr. 2012.
[22] S.-K. Lu, C.-H. Hsu, Y.-C. Tsai, K.-H. Wang, and C.-W. Wu, "Efficient built-in redundancy analysis for embedded memories with 2-D redundancy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 1, pp. 34–42, Jan. 2006.
[23] W. K. Huang, Y.-N. Shen, and F. Lombardi, "New approaches for the repairs of memories with redundancy by row/column deletion for yield enhancement," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 9, no. 3, pp. 323–328, Mar. 1990.
[24] M. Tarr, D. Boundreau, and R. Murphy, "Defect analysis system speeds test and repair of redundant memories," Electronics, vol. 57, no. 1, pp. 175–179, Jan. 1984.
[25] J. R. Day, "A fault-driven, comprehensive redundancy algorithm," IEEE Des. Test Comput., vol. 2, no. 3, pp. 35–44, Jun. 1985.
[26] H. Cho, W. Kang, and S. Kang, "A built-in redundancy analysis with a minimized binary search tree," ETRI J., vol. 32, no. 4, pp. 638–641, Aug. 2010.
[27] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[28] Y.-S. Kang, J.-C. Lee, and S. Kang, "Parallel BIST architecture for CAMs," Electron. Lett., vol. 33, no. 1, pp. 30–31, Jan. 1997.
[29] C.-S. Hou and J.-F. Li, "High repair-efficiency BISR scheme for RAMs by reusing bitmap for bit redundancy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1720–1728, Sep. 2015.
[30] M. Lee, L.-M. Denq, and C.-W. Wu, "A memory built-in self-repair scheme based on configurable spares," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 30, no. 6, pp. 919–929, Jun. 2011.
[31] T.-W. Tseng, J.-F. Li, and C.-C. Hsu, "ReBISR: A reconfigurable built-in self-repair scheme for random access memories in SOCs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 6, pp. 921–932, Jun. 2010.

Jooyoung Kim received the B.S. degree in electrical and electronic engineering from the Department of Electrical and Electronics Engineering, Yonsei University, Seoul, South Korea, in 2015, where he is currently pursuing the combined M.S. degree. His current research interests include built-in self-repair, built-in self-testing, built-in redundancy analysis, redundancy analysis algorithms, reliability, and VLSI design.

Woosung Lee received the B.S. degree in electrical and electronic engineering from the Department of Electrical and Electronics Engineering, Yonsei University, Seoul, South Korea, in 2014, where he is currently pursuing the combined M.S. degree. His current research interests include built-in self-repair, built-in self-testing, built-in redundancy analysis, redundancy analysis algorithms, reliability, and VLSI design.

Keewon Cho received the B.S. degree in electrical and electronic engineering from the Department of Electrical and Electronics Engineering, Yonsei University, Seoul, South Korea, in 2013, where he is currently pursuing the combined Ph.D. degree. His current research interests include built-in self-repair, built-in self-testing, built-in redundancy analysis, redundancy analysis algorithms, reliability, and VLSI design.

Sungho Kang (M'89–SM'15) received the B.S. degree from Seoul National University, Seoul, South Korea, in 1986, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1988 and 1992, respectively. He was a Research Scientist with the Schlumberger Laboratory for Computer Science, Schlumberger, Inc., Houston, TX, USA, and a Senior Staff Engineer with Semiconductor Systems Design Technology, Motorola, Inc., Schaumburg, IL, USA. Since 1994, he has been a Professor with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul. His current research interests include VLSI/system-on-chip/3-D IC design and testing, design-for-testability, built-in self-test, defect diagnosis, and design-for-manufacturability.