Automating IEEE 1500 Core Test: An EDA Perspective


IEEE Std 1500 and Its Usage

Krishna Chakravadhanula and Vivek Chickermane
Cadence Design Systems

Editor's note: Standardized design and test practices enable automation. This article describes a methodology and corresponding tool set that combines automated support for IEEE Std 1500 and test data compression in one.
Erik Jan Marinissen, IMEC

0740-7475/09/$25.00 © 2009 IEEE. Copublished by the IEEE CS and the IEEE CASS. IEEE Design & Test of Computers.

THE CURRENT TREND of SoC design has made conventional test methodologies increasingly difficult. Performing brute-force automatic test pattern generation (ATPG) on the entire SoC is often infeasible, because the design can exceed the test pattern generator's capabilities. At other times, some black-box third-party cores within the SoC might have their own test patterns generated at the core boundary. IEEE Std 1500 has been developed primarily to address such complex scenarios encountered while testing SoC designs [1]. IEEE 1500 describes how the cores within an SoC can be wrapped with IEEE-1500-compliant logic (called a wrapper) such that the overall task of testing the SoC is made much simpler.

Researchers have published extensively on building IEEE-1500-compliant wrappers [2-6] and on verifying them for compliance [7]. Additionally, research has been conducted on building test access mechanisms (TAMs) at the SoC level that efficiently harness cores, wrapped in accordance with IEEE 1500, that are embedded in the SoC [8,9].

This article discusses CAD support for automated IEEE 1500 wrapper generation, verification, and test generation in a production environment. In particular, we show how to combine IEEE 1500 wrapper synthesis with test data compression to reduce the test data volume and test application time of wrapped cores. Test data compression is not a new concept, and has been described previously [10-12]. One of the significant contributions of this article is to incorporate the test compression structures into the wrapper shell in a manner that retains IEEE 1500 compliance while also providing the benefits of compression. In this article, we also provide some solutions to the problem of migrating core test patterns to the SoC.

IEEE 1500 wrapper automation

Figure 1 shows a block diagram of a typical IEEE 1500 wrapper. A wrapper boundary cell (WBC) is placed on each functional pin of the bare core; together, these cells form the wrapper boundary register (WBR). A wrapper instruction register (WIR) sets the wrapper in a particular mode of operation, whether inward facing, outward facing, or bypass. IEEE 1500 defines two forms of access mechanisms for the wrapper: the mandatory wrapper serial port (WSP) and the optional wrapper parallel port (WPP). The WSP consists of the wrapper serial input (WSI), the wrapper serial output (WSO), and a set of control signals, the wrapper serial control (WSC), to enable data transfer via the WSI and WSO pins. To reduce the test time when shifting large amounts of test data via the single-bit WSI and WSO pins, the standard also allows for a wrapper parallel port that consists of one or more wrapper parallel input (WPI) terminals, an equal number of wrapper parallel output (WPO) terminals, and some wrapper parallel control (WPC) terminals.

The two key challenges in designing a fully IEEE-1500-compliant wrapper while addressing all the complex embedded DFT access issues are related to the WPP and WBR architecture. The WPP is very briefly described in IEEE 1500, with just two rules and two permissions. That brevity allows a lot of flexibility in how users can architect the WPP, but at the same time the WPP has to be scalable in supporting the large number of parallel instructions that users might want to implement. The WBR might need to participate in several of the parallel and serial instructions, so optimal WBR segmentation must be carefully addressed [6].
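As a reading aid, the port structure just described can be written down as a small data model. This is a minimal Python sketch, not part of the article's tool flow: the class name `WrapperPorts` and its field names are hypothetical, and the WSC signal names in the usage lines follow common IEEE 1500 naming rather than anything stated in this article. The only rule it enforces is the one given in the text, namely that a WPP has equal numbers of WPI and WPO terminals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WrapperPorts:
    """Hypothetical description of the terminals on a wrapped core."""
    # Mandatory wrapper serial port (WSP).
    wsi: str = "WSI"
    wso: str = "WSO"
    wsc: List[str] = field(default_factory=list)
    # Optional wrapper parallel port (WPP); empty lists mean "not implemented".
    wpi: List[str] = field(default_factory=list)
    wpo: List[str] = field(default_factory=list)
    wpc: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The text requires one WPO terminal per WPI terminal.
        if len(self.wpi) != len(self.wpo):
            raise ValueError("WPP needs equal numbers of WPI and WPO terminals")

    @property
    def has_parallel_port(self) -> bool:
        return bool(self.wpi)


# Usage: a wrapper with a three-bit WPP (signal names are illustrative).
ports = WrapperPorts(
    wsc=["WRCK", "WRSTN", "SelectWIR", "ShiftWR", "CaptureWR", "UpdateWR"],
    wpi=["WPI[0]", "WPI[1]", "WPI[2]"],
    wpo=["WPO[0]", "WPO[1]", "WPO[2]"],
    wpc=["WPC_ShiftWBR", "WPC_CaptureWBR", "WPC_UpdateWBR"],
)
```

A serial-only wrapper is simply `WrapperPorts(wsc=[...])` with the three WPP lists left empty.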

Architecture of synthesized WPP

After careful analysis of core test and integration requirements based on input from many users, the following instructions were found to be most useful:

- WP_EXTEST
- WP_INTEST
- WP_INTEST_COMPRESSION

The rules and permissions to support the first two are well defined in IEEE 1500. The user-defined WP_INTEST_COMPRESSION instruction is not defined in the standard, because the standard predates many of the test compression features offered by commercial DFT tools; many IP designers, however, considered it essential to take advantage of this feature.

The WPC signals control the operation of the WBR and the core internal scan chains during parallel instructions. The WPC signals consist of WPC_ShiftWBR, WPC_CaptureWBR, and WPC_UpdateWBR to control the WBR cells during parallel instructions. The WPC also includes any ports required to operate the core's test signals during parallel INTEST. For example, if the core has a pin that is a scan enable during at least one parallel INTEST instruction, this pin will have a corresponding port on the wrapper. If only WP_EXTEST is implemented, there will not be any dedicated ports on the wrapper that correspond to the core's test signals.

Architecture of WBR

Depending on the current instruction in the WIR, the WBR is either configured as a single serial shift register or split up into multiple scan chains. During serial instructions, the WBR acts as a single scan chain consisting of all the WBR cells. During the WS_INTEST_SCAN instruction, the core internal scan chains are daisy-chained along with the WBR cells to form a single long scan chain. During the optional parallel instructions, the WBR is split into one or more scan segments, and the wrapper parallel input and output (WPI/WPO) terminals access these segments. The WPC controls the WBR and how the core operates during parallel instructions.
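The two WBR configurations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's implementation: the function names are hypothetical, the parallel split is assumed to be as balanced as possible, and the order in which core chains are daisy-chained with the WBR cells during WS_INTEST_SCAN is chosen arbitrarily here.

```python
def split_wbr(cells, n_segments):
    """Split the WBR cell list into n_segments contiguous, near-equal segments."""
    quotient, remainder = divmod(len(cells), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `remainder` segments take one extra cell.
        size = quotient + (1 if i < remainder else 0)
        segments.append(cells[start:start + size])
        start += size
    return segments


def serial_chain(wbr_cells, core_chains, instruction):
    """Return the single WSI-to-WSO shift path for a serial instruction."""
    if instruction == "WS_INTEST_SCAN":
        # Assumption: WBR cells first, then every core chain in turn.
        return wbr_cells + [ff for chain in core_chains for ff in chain]
    # Other serial instructions: the WBR alone forms the chain.
    return list(wbr_cells)


# Usage with made-up sizes: 10 wrapper cells, two core chains of 4 flip-flops.
wbr = [f"WC{i}" for i in range(10)]
core = [[f"SC{c}_ff{i}" for i in range(4)] for c in range(2)]
segments = split_wbr(wbr, 3)
long_chain = serial_chain(wbr, core, "WS_INTEST_SCAN")
```

Keeping the segments contiguous means a parallel configuration only needs a mux at each segment boundary, which is the situation the SCANMUX discussion addresses.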
The key challenge for all these objectives is to achieve optimal WBR segmentation. To configure the WBR for either serial or parallel instructions, we propose a novel method in which multiplexers (muxes) are inserted between the WBR cells to allow either the WSI or the WPI terminals to access the WBR. These muxes are placed in a separate hierarchy called a SCANMUX macro. This makes the wrapper module less cluttered, while allowing the option of ungrouping the SCANMUX hierarchy if required. The control logic for these muxes is generated by the WIR on the basis of the instruction currently loaded and the WBR segmentation required for that instruction. Figures 2 and 3 show the wrapper configuration during the WP_EXTEST and WP_INTEST instructions.

Figure 1. IEEE 1500 wrapper (from IEEE Std 1500 [1]).

Figure 2. Configuration during WP_EXTEST. WPI 1..N drive WBR segments 1..N (segment size = WBR_length/N), which drive WPO 1..N; core scan chains 1..N inside the wrapper shell are bypassed.

(IEEE Design & Test of Computers, May/June 2009)

Segmentation of WBR for parallel instructions

For each WBR cell and core scan chain, the user provides information as to which wrapper parallel input directly or indirectly drives it during a particular parallel instruction. Hence, given a set $C = \{WC_1, WC_2, \ldots, WC_x, SC_1, SC_2, \ldots, SC_y\}$ of wrapper cells (WC) and core internal scan chains (SC), and a set $I = \{I_1, I_2, \ldots, I_z\}$ of wrapper instructions, the WBR segment information can be captured in a 2D table. Table 1 is an example with four wrapper cells, two scan chains, and three WPI ports. The entry at position $T(m, n)$ of the table names the WSI or WPI terminal that drives the WBR segment on which element $C_n$ lies during instruction $I_m$. For example, Table 1 shows that during instruction WP_EXTEST, cell $WC_2$ must get data originating from parallel port WPI[0], and during instruction WS_EXTEST it must get data originating from WSI. An entry of -1 indicates that element $C_n$ is bypassed during that instruction.

The objective is to use the table's information to determine where to add muxes to configure the different WBR segments, and to derive the control logic for these muxes. Although a detailed description of the algorithm is beyond the scope of this article, the key is to analyze each column of the table in turn, starting from the left-hand side. For example, when analyzing column $WC_3$, during the WP_INTEST ($I_2$) instruction $WC_3$ must be connected to the output of $WC_2$, because $WC_2$ is the closest element to its left that is also driven by WPI[0]. During the WS_INTEST_SCAN ($I_4$) instruction, $WC_3$ must be connected to the scan output of chain $SC_2$, since that is the closest element to its left that is driven by WSI. During WP_EXTEST ($I_1$), because there is no other element to the left of $WC_3$ driven by WPI[1], $WC_3$ is the first element in the segment driven by WPI[1]. Note that reordering the columns in the table could change the position and number of muxes that must be inserted.
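The column-by-column analysis described above can be sketched as follows. This is an illustrative Python reading of the prose, not the authors' algorithm (which the article leaves unspecified): the function names are hypothetical, and the table literal is an assumption that encodes Table 1 as implied by the text and by the mux equations in the article, with `None` standing for a bypassed element.

```python
# Columns in left-to-right order (WC = wrapper cell, SC = core scan chain).
ELEMENTS = ["WC1", "WC2", "SC1", "SC2", "WC3", "WC4"]

# TABLE[instruction][n] = terminal driving ELEMENTS[n], or None if bypassed.
TABLE = {
    "I1": ["WPI[0]", "WPI[0]", None, None, "WPI[1]", "WPI[2]"],          # WP_EXTEST
    "I2": ["WPI[0]", "WPI[0]", "WPI[1]", "WPI[2]", "WPI[0]", "WPI[0]"],  # WP_INTEST
    "I3": ["WSI", "WSI", None, None, "WSI", "WSI"],                      # WS_INTEST_RING
    "I4": ["WSI", "WSI", "WSI", "WSI", "WSI", "WSI"],                    # WS_INTEST_SCAN
    "I5": ["WSI", "WSI", None, None, "WSI", "WSI"],                      # WS_EXTEST
}


def derive_mux_sources(elements, table):
    """For each element, map every scan-input source to the instructions selecting it."""
    sources = {elem: {} for elem in elements}
    for instr, row in table.items():
        for n, port in enumerate(row):
            if port is None:
                continue  # element is bypassed during this instruction
            # Default: the element is first in its segment, fed by the terminal.
            source = port
            # Otherwise it is fed by the closest element to its left on the same terminal.
            for k in range(n - 1, -1, -1):
                if row[k] == port:
                    source = elements[k]
                    break
            sources[elements[n]].setdefault(source, []).append(instr)
    return sources


def needs_mux(element_sources):
    """An element needs selection logic only if it has more than one source."""
    return len(element_sources) > 1


mux_sources = derive_mux_sources(ELEMENTS, TABLE)
for elem in ELEMENTS:
    print(elem, mux_sources[elem], "mux" if needs_mux(mux_sources[elem]) else "direct")
```

Each inner dictionary is one sum-of-products term list: the key is the data source and the value is the set of instructions whose decoded WIR outputs select it.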
After analyzing Table 1, we arrive at the following equations for each element $C_n$, where $+$ denotes logical OR and juxtaposition denotes logical AND:

\[
\begin{aligned}
WC_1 &= (I_1 + I_2)\,\mathrm{WPI}[0] + (I_3 + I_4 + I_5)\,\mathrm{WSI} \\
WC_2 &= (I_1 + I_2 + I_3 + I_4 + I_5)\,WC_1 \\
SC_1 &= I_2\,\mathrm{WPI}[1] + I_4\,WC_2 \\
SC_2 &= I_2\,\mathrm{WPI}[2] + I_4\,SC_1 \\
WC_3 &= I_1\,\mathrm{WPI}[1] + (I_2 + I_3 + I_5)\,WC_2 + I_4\,SC_2 \\
WC_4 &= I_1\,\mathrm{WPI}[2] + (I_2 + I_3 + I_4 + I_5)\,WC_3
\end{aligned}
\]

Other than $WC_2$, which can be directly connected to the output of $WC_1$, all the other elements require logic in front of them to select from the multiple sources during the different instructions.

Figure 3. Configuration during WP_INTEST. WPI 1..N drive core scan chains 1..N inside the wrapper shell and WBR segments 1..N (segment size = WBR_length/N), which drive WPO 1..N.

A similar analysis must be made for each WPO port by looping through each row in the table to find the last entry that contains this WPO port name. The element $C_n$ corresponding to this entry is the last bit in the segment that drives this WPO port. As we repeat this process for every row in the table, the outputs of all such elements will be the inputs to the mux that drives this WPO.
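The output-side analysis can be sketched the same way. This is a hedged Python illustration rather than the article's code: the function name is hypothetical, the table literal is an assumed encoding of Table 1 (`None` means bypassed), and the pairing of each WPI[k] with WPO[k], and of WSI with WSO, is an assumption made so that table entries can be matched to output ports.

```python
ELEMENTS = ["WC1", "WC2", "SC1", "SC2", "WC3", "WC4"]

# Assumed encoding of Table 1: driving terminal per element, None if bypassed.
TABLE = {
    "I1": ["WPI[0]", "WPI[0]", None, None, "WPI[1]", "WPI[2]"],          # WP_EXTEST
    "I2": ["WPI[0]", "WPI[0]", "WPI[1]", "WPI[2]", "WPI[0]", "WPI[0]"],  # WP_INTEST
    "I3": ["WSI", "WSI", None, None, "WSI", "WSI"],                      # WS_INTEST_RING
    "I4": ["WSI", "WSI", "WSI", "WSI", "WSI", "WSI"],                    # WS_INTEST_SCAN
    "I5": ["WSI", "WSI", None, None, "WSI", "WSI"],                      # WS_EXTEST
}


def output_mux_inputs(elements, table):
    """Map each output port to {instruction: last element of the segment feeding it}."""
    muxes = {}
    for instr, row in table.items():
        for n, port in enumerate(row):
            if port is None:
                continue
            # Assumed pairing of input and output terminals.
            out_port = "WSO" if port == "WSI" else port.replace("WPI", "WPO")
            # Later columns overwrite earlier ones, so the last match in the row wins.
            muxes.setdefault(out_port, {})[instr] = elements[n]
    return muxes


out_muxes = output_mux_inputs(ELEMENTS, TABLE)
for port, per_instruction in out_muxes.items():
    print(port, per_instruction)
```

Under these assumptions, WPO[0] is fed by WC2 during WP_EXTEST and by WC4 during WP_INTEST, so its mux needs two data inputs.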

Table 1. Entries showing the WSI and WPI terminals driving each wrapper cell and scan chain.

| Instruction (wrapper instruction) | WC1 | WC2 | SC1 | SC2 | WC3 | WC4 |
|---|---|---|---|---|---|---|
| WP_EXTEST (I1) | WPI[0] | WPI[0] | -1 | -1 | WPI[1] | WPI[2] |
| WP_INTEST (I2) | WPI[0] | WPI[0] | WPI[1] | WPI[2] | WPI[0] | WPI[0] |
| WS_INTEST_RING (I3) | WSI | WSI | -1 | -1 | WSI | WSI |
| WS_INTEST_SCAN (I4) | WSI | WSI | WSI | WSI | WSI | WSI |
| WS_EXTEST (I5) | WSI | WSI | -1 | -1 | WSI | WSI |

Automated wrapper synthesis

Figure 4 shows the use model for this methodology. Optional memory BIST (MBIST) insertion followed by scan synthesis can be performed on the bare stand-alone core without deviating from the conventional test synthesis process. Note that the core might contain logic that enables compression of the core test data. This is transparent to the wrapper generation step, which only needs information about the functionality of the pins on the core interface. This information is provided in a tabular format (SpecList in Figure 4) that describes the functional pins, the static and dynamic test signals, and the configuration of the required wrapper.

Once the wrapper has been generated, verification of the wrapped core ensures the integrity of the added test logic. Each wrapper instruction maps to a test mode within which ATPG can be performed. For each instruction the wrapper supports, all ATPG rule checks, including a shift register integrity test, are performed. These checks ensure that this particular mode of the wrapper results in high test coverage during ATPG. The shift register integrity test is run to ensure that the correct registers are concatenated together in the shift path between the WSP scan pins (WSI and WSO) or between the n WPP scan pins (WPI[0:n] and WPO[0:n]). For example, during WS_INTEST_SCAN, the scan shift path between WSI and WSO must include all the wrapper boundary cells and all the scan flip-flops within the core itself. To set up the test mode for test structure verification and ATPG, a mode initialization sequence is necessary to load the WIR with the instruction corresponding to the test mode.
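The shift register integrity check described above can be pictured as a flush test: a marker pattern shifted in at WSI must appear at WSO after exactly the expected number of shift cycles. The following Python sketch is a simplified behavioral model under stated assumptions, not the verification tool's actual check; the function names are hypothetical, and the sizes in the usage lines reuse the core A figures quoted in the article (22,000 scan flip-flops and 358 WBR cells) purely as an example.

```python
from collections import deque


def expected_shift_length(n_wbr_cells, n_core_scan_flops, instruction):
    """Number of bits between WSI and WSO for the serial instructions modeled here."""
    if instruction == "WS_INTEST_SCAN":
        # All wrapper boundary cells plus all scan flip-flops inside the core.
        return n_wbr_cells + n_core_scan_flops
    # Other serial instructions: only the WBR is on the shift path.
    return n_wbr_cells


def flush_test(actual_length, expected_length, marker=(1, 1, 0, 1)):
    """Shift a marker through a chain of actual_length bits and check its arrival time."""
    chain = deque([0] * actual_length)
    observed = []
    for bit in list(marker) + [0] * expected_length:
        observed.append(chain.pop())   # bit leaving the chain at WSO
        chain.appendleft(bit)          # bit entering the chain at WSI
    window = observed[expected_length:expected_length + len(marker)]
    return tuple(window) == tuple(marker)


# Usage with the core A sizes as illustrative inputs.
n_bits = expected_shift_length(358, 22000, "WS_INTEST_SCAN")
passes = flush_test(actual_length=n_bits, expected_length=n_bits)
```

A chain that is one flip-flop too short or too long delivers the marker one cycle early or late, so the comparison fails and flags a miswired segment.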
The wrapper generation process generates the mode initialization sequence files for each instruction, as well as all other files necessary to perform the verification of each wrapper instruction.

Pattern generation of the wrapped core can occur at either the SoC or the IP level. During test generation at the SoC level, all wrapped cores can be placed in the INTEST mode to test the cores' internals, and can be placed in EXTEST mode to test the SoC glue logic between wrapped cores. Alternatively, test pattern generation can be performed stand-alone on the wrapped core, and the core test patterns can be shipped to the SoC integrator, who will then migrate the core patterns to the SoC boundary. The ability to migrate test patterns enables reuse of the wrapped core IP, and also allows for core test pattern reuse.

Figure 4. Automated wrapper synthesis methodology. Flow: MBIST insertion, scan synthesis, scan-inserted core, build 1500 wrapper (from the SpecList wrapper description), wrapped core, test structure verification (with output files for verification), test generation (patterns), insert wrapped core into SoC, test generation/migration.

Wrapper synthesis with compression

Combining wrapper synthesis with test data compression is beneficial for large cores because the uncompressed test data of one core can negatively affect the test data volume and test application time of the SoC as a whole. Another advantage of adding test compression structures to cores within an SoC is that it helps simplify the design of the TAM. By using compression, all cores can be designed to have the same TAM width, while internally having different compression ratios.

Compression of core test data is achieved by having far more scan chains during the compression mode than during the uncompressed mode, also called full-scan (or diagnostic) mode. Having more chains decreases the length of each one, thereby reducing the test application time. Having only a few top-level scan pins drive this large number of internal scan chains reduces the test data volume. During full-scan mode, the number of chains is the same as the number of top-level scan pins. Some additional logic is required to configure the scan chains during these two modes.

Figure 5 shows the type of decompression-compression structure currently supported by the described methodology. On the input side, the decompressor can be either a simple fan-out or a linear combinational network. The compactor on the output is a linear combinational network with the option of having mask logic to block unknowns (X values) in the response data from being scanned out. In the rest of this article, we use the term compressor to cover both the input-side decompressor and the output compactor.

Figure 5. Supported test compression structure. Input side: the compressed input stream feeds a space expander (spreader network, either fan-out only or an XOR spreader). Output side: optional masking followed by a space compactor (XOR trees) produces the compressed output stream.

The described methodology automatically inserts the compression structures within the wrapper shell, and not within the core. This allows the use of both soft- and hard-IP cores because no logic is added inside the core module.
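To make the structure in Figure 5 concrete, here is a small bit-level Python sketch of a combinational spreader and a masked XOR compactor for one shift cycle. It is a toy model under stated assumptions, not the macro the tool generates: the function names, the tap and group wiring, and the choice to model an unknown (X) value as `None` are all hypothetical.

```python
from functools import reduce
from operator import xor


def spread(scan_in_bits, taps):
    """Input side: each internal chain gets the XOR of the scan-in pins listed in its taps.

    A single pin per chain models the fan-out-only option; several pins
    per chain model a linear (XOR) spreader.
    """
    return [reduce(xor, (scan_in_bits[p] for p in pins), 0) for pins in taps]


def compact(chain_out_bits, groups, mask):
    """Output side: XOR the chain outputs of each group into one scan-out bit.

    chain_out_bits may contain None for an unknown (X) value. A chain whose
    mask bit is True contributes 0 instead, so its X cannot reach the output.
    """
    outputs = []
    for chains in groups:
        bits = [0 if mask[c] else chain_out_bits[c] for c in chains]
        if any(b is None for b in bits):
            outputs.append(None)  # an unmasked X corrupts this output bit
        else:
            outputs.append(reduce(xor, bits, 0))
    return outputs


# Usage: 2 scan-in pins feed 4 internal chains; 4 chains compact to 2 scan-out pins.
taps = [(0,), (1,), (0, 1), (0,)]
groups = [(0, 1), (2, 3)]
chain_inputs = spread([1, 0], taps)
masked = compact([1, None, 1, 0], groups, mask=[False, True, False, False])
unmasked = compact([1, None, 1, 0], groups, mask=[False, False, False, False])
```

The masked call shows why the optional mask logic matters: without it, one X in chain 1 makes the whole first output bit unknown.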
In the case of hard-IP cores, because the number of scan chains in the core is already frozen, the achievable compression ratio is limited. Having the compression macro outside the core makes it possible to tweak the compression ratio to match the TAM constraints without having to resynthesize the scan chains within the core. Because the compression logic is within the wrapper, the wrapper flip-flops can be tested using compressed data, and this keeps ATPG simple. If the compression logic were inside the core, the test data from the WPI pins would contain both the compressed data for the core and the uncompressed data for the wrapper. The core can also be a black-box IP module as long as the scan synthesis constraints are met while generating the IP.

For the wrapper to include compression, the scan synthesis of the core must consider the number of chains during the compression mode. For example, if the number of chains during the compression mode is 320 but there are only 32 scan-in pins, the core must have 320 short scan chains in it, not 32. During full-scan mode, the short scan chains are concatenated to form 32 long scan chains driven by the top-level scan pins. The logic to perform the concatenation is automatically added during the wrapper synthesis process. Information must be provided to the wrapper generation process to indicate that compression logic needs to be inserted. The additional wrapper-level pins required to operate the compression logic must also be specified.

The parallel instruction WP_INTEST_COMPRESSION enables compression mode. During compression mode (see Figure 6), the wrapper boundary register is divided into balanced segments and appended to the end of the core scan chains. The number of WBR segments must match the number of core scan chains (320, from the previous example). The WPI and WPO terminals are connected to the pins on the compression macro, and look similar to regular scan I/O pins (32, from the previous example). During the WP_EXTEST mode, the wrapper boundary register is divided into balanced segments and connected to the WPI and WPO terminals; the compression macro and the core are bypassed. The number of WBR segments matches the number of WPI and WPO terminals (32, from the above example). The compression macro is bypassed in every mode (instruction) except the one that uses it, that is, WP_INTEST_COMPRESSION.

Figure 6. Configuration during compression mode (WP_INTEST_COMPRESSION). WPI 1..N and WPO 1..N connect to the compression/decompression macro, which has its own control pins; inside the wrapper shell the macro drives scan chains 1..M followed by WBR segments 1..M (segment size = WBR_length/M), with M > N.

SoC integration of cores

A full discussion of SoC-level IEEE 1500 core integration requires an article by itself. Therefore, we only briefly describe the process for integrating cores (wrapped or unwrapped) into an SoC environment, along with a mechanism for migrating test patterns for the wrapped (or bare) core to the SoC boundary.

Core preparation for SoC integration

In our methodology, the IP provider has the ability to provide a full, but protected (encrypted), logical description of the IP design. The IP content is visible to DFT verification, ATPG, and diagnostic applications, but not to the SoC designer. This is known as a white-box solution. Alternatively, the IP provider can create effective and efficient predefined IP tests via the DFT and ATPG tool suite, and describe predefined test application requirements via a macro isolation control (MIC) file. The MIC information can be described in a native format or via the IEEE Std 1450.6 Core Test Language (CTL) format. This is a black-box solution because the IP content is not known to DFT, ATPG, or diagnostic applications. Our methodology enables migration of the IP-provider-supplied IP tests to SoC primary and scan I/Os.

IP-level integration flow

The IP provider executes the following steps:

1. Create the ATPG model (white box or encrypted black box) of the IP netlist.
2. Execute the standard logic test methodology. All available tool features can be used for DFT analysis, ATPG, pattern optimization, and so on.
3. For black-box cores, convert the resulting tests into a core-test-migratable format. This process removes all dependencies on the ATPG model files such that the SoC designer can use the vectors without knowledge of the IP logic. The IP test vectors can be exported in Standard Test Interface Language (STIL) (IEEE 1450.1 or 1450-1999) format.
4. Create the MIC file to describe the IP embedding environment required to apply the tests and the IP black-box cell definition (I/O description only).

The vector files and the MIC file resulting from the processing in steps 1 through 4 are then distributed to SoC customers for their use in the black-box SoC methodology.

Figure 7. Core test pattern reuse flow. Elements shown: embedded-core test isolation requirements, embedded-core test patterns, SoC model, isolation validation, core-to-SoC map, SoC patterns.

SoC-level integration

The SoC integrator executes the following steps (illustrated in Figure 7):

1. Create an integrated design with the multiple embedded IP cores, SoC glue logic, and TAMs that connect to the IP-level serial and parallel ports.

This is a fairly complex step and is well described in the literature [6].
2. Verify core isolation using the MIC files supplied by each IP provider. Make design changes in the SoC as necessary to correct the isolation failures.
3. Execute a step to migrate the core-level tests to the SoC pins. This step results in the IP vectors being mapped to the SoC primary and scan I/O identified and verified by the previous step.

Experimental results

The results we discuss here were achieved after we performed wrapper synthesis using two cores, A and B. Core A had 22,000 scan flip-flops and 358 functional I/Os. Core B was smaller, with only 1,348 scan flip-flops and 111 functional I/Os. Table 2 shows the area required for the different building blocks within the wrappers for cores A and B, and the area overhead when compared against the unwrapped cores. For each core, three columns show the area required when the core was wrapped by an IEEE 1500 wrapper implementing, respectively, only serial instructions, parallel instructions, and the compression instruction.

Table 2. Area overhead of the wrapper (in NAND2 equivalents). Core A + wrapper: 22K flip-flops, 358 WBR cells. Core B + wrapper: 1,348 flip-flops, 111 WBR cells.

| Instances within wrapper | A, serial | A, parallel | A, compression | B, serial | B, parallel | B, compression |
|---|---|---|---|---|---|---|
| Bare core | 338,777 | 338,777 | 338,777 | 18,166 | 18,166 | 18,166 |
| WIR | 51 | 59 | 59 | 184 | 217 | 217 |
| Dedicated WBR cells | 4,138 | 4,138 | 4,138 | 2,681 | 2,681 | 2,681 |
| Shared WBR cells | 2,482 | 2,482 | 2,482 | 1,608 | 1,608 | 1,608 |
| WBY | 6 | 6 | 6 | 15 | 15 | 15 |
| SCANMUX macro | 2 | 33 | 287 | 7 | 119 | 339 |
| XOR macro | | | 1,926 | | | 1,953 |
| Total area (dedicated WBR cells) | 342,974 | 343,013 | 345,193 | 21,053 | 21,198 | 23,371 |
| Area increase over bare core | 1.24% | 1.25% | 1.89% | 15.9% | 16.7% | 28.7% |
| Total area (shared WBR cells) | 341,318 | 341,357 | 343,537 | 19,980 | 20,125 | 22,298 |
| Area increase over bare core | 0.75% | 0.76% | 1.41% | 9.9% | 10.8% | 22.7% |

* WBR: wrapper boundary register; WIR: wrapper instruction register; WBY: wrapper bypass.
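The totals and percentages in Table 2 follow from simple bookkeeping: the wrapped-core area is the bare-core area plus each wrapper building block, and the overhead is the increase relative to the bare core. The short Python sketch below reproduces that arithmetic for a few columns; the function name and argument names are hypothetical, and the numbers are the ones printed in Table 2.

```python
def wrapper_area(bare_core, wir, wbr_cells, wby, scanmux, xor_macro=0):
    """Return (total area, percent increase over the bare core) in NAND2 equivalents."""
    total = bare_core + wir + wbr_cells + wby + scanmux + xor_macro
    increase_percent = 100.0 * (total - bare_core) / bare_core
    return total, increase_percent


# Core A, serial instructions only, dedicated WBR cells.
a_serial_total, a_serial_pct = wrapper_area(338_777, 51, 4_138, 6, 2)

# Core A, compression instruction, shared WBR cells.
a_comp_shared_total, a_comp_shared_pct = wrapper_area(338_777, 59, 2_482, 6, 287, 1_926)

# Core B, compression instruction, dedicated WBR cells.
b_comp_total, b_comp_pct = wrapper_area(18_166, 217, 2_681, 15, 339, 1_953)

print(a_serial_total, round(a_serial_pct, 2))
print(b_comp_total, round(b_comp_pct, 1))
```

The same wrapper blocks cost roughly the same number of gates on both cores, which is why the relative overhead is so much larger on the smaller core B.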
As expected, most of the area within the wrapper shell is taken up by the WBCs on the core s functional I/Os. Our results showed that one approach to reduce the overhead is to implement shared WBR cells, where a functional flip-flop on the core s boundary is reused as a WBR cell. This is a viable option because flipflops are commonly located near the core I/Os for isolation and timing purposes. Table 2 also presents an estimation of the area when shared WBR cells are used, and the consequently smaller area overhead incurred. In summary, for small designs like core B the wrapper s area overhead can be significant, but for bigger designs like core A the wrapper s area overhead is very small. From our results, we expect that the overhead will be less for larger designs, since the number of I/Os will not be growing at the same scale as the design itself. As mentioned earlier, each wrapper instruction maps to a test mode in which ATPG is performed. Table 3 shows the results after performing ATPG on each instruction within the wrapper for core A. The four scenarios we considered were ATPG on a bare core, on a wrapped core having serial, on a wrapped core having parallel, and finally on a wrapped core having the additional compression instruction. Each row in the table corresponds to a mode in which ATPG was run, and the modes not applicable in a 12 IEEE Design & Test of Computers

scenario are grayed out in the table. For example, ATPG was performed on the bare core only in the full-scan mode, since there is no wrapper present at this point. For the wrapped core having parallel instructions, ATPG was performed in four different modes, corresponding to the WP_INTEST, WP_EXTEST, WS_EXTEST, and WS_BYPASS instructions. For each scenario, we ran ATPG first in the mode that would target the most faults, while minimizing the number of patterns. When we ran ATPG in the next mode, it was an incremental run targeting only the faults undetected by the previous mode(s). In the scenario with compression, we first ran ATPG in the mode with both XOR spreader and XOR compactor active, resulting in 3,274 test patterns. The last ATPG run was in the mode in which the WS_BYPASS instruction was loaded in the wrapper, resulting in a total of 3,323 patterns and a final coverage of 98.01%. The total number of faults differed slightly among the scenarios, which is reflected in the small differences in the final fault coverage attained for each scenario. Table 4 presents the results when we performed similar ATPG runs on core B.

Table 3. ATPG results for core A (each row is an incremental run on top of a previous instruction).
Rows, top to bottom (instructions within wrapper): XOR spreader and compactor; XOR compactor; Full-scan (diagnostic mode); WP_INTEST; WP_EXTEST; WS_INTEST_SCAN; WS_EXTEST; WS_BYPASS; Total.
Entries are number of patterns / fault coverage (%), listed per scenario in row order with the grayed-out (not applicable) modes omitted.

Scenario                  Incremental runs                                                         Total
Bare core                 3,094 / 97.96                                                            3,094 / 97.96
Serial                    3,134 / 97.79; 11 / 97.98; 3 / 97.98                                     3,148 / 97.98
Parallel                  3,119 / 97.84; 12 / 98.04; 8 / 98.04; 1 / 98.04                          3,140 / 98.04
Compression instruction   3,274 / 97.86; 22 / 97.89; 3 / 97.89; 16 / 98.00; 7 / 98.01; 1 / 98.01   3,323 / 98.01

Table 4. ATPG results for core B (each row is an incremental run on top of a previous instruction).
Rows, top to bottom (instructions within wrapper): XOR spreader and compactor; XOR compactor; Full scan (diagnostic mode); WP_INTEST; WP_EXTEST; WS_INTEST_SCAN; WS_EXTEST; WS_BYPASS; Total.
Entries are number of patterns / fault coverage (%), listed per scenario in row order with the grayed-out (not applicable) modes omitted.

Scenario                  Incremental runs                                                         Total
Bare core                 427 / 99.92                                                              427 / 99.92
Serial                    414 / 98.81; 4 / 99.37; 3 / 99.46                                        421 / 99.46
Parallel                  429 / 98.74; 4 / 99.32; 5 / 99.41; 2 / 99.45                             440 / 99.45
Compression instruction   450 / 98.87; 11 / 98.88; 1 / 99.00; 9 / 99.31; 4 / 99.37; 2 / 99.40      477 / 99.40
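The incremental bookkeeping behind Tables 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool flow: it assumes the coverage figure reported after each run is cumulative, and the names `Run` and `summarize` are hypothetical. The data are the core A values for the parallel scenario, in the mode order given in the text.

```python
from dataclasses import dataclass


@dataclass
class Run:
    mode: str        # wrapper instruction loaded during this ATPG run
    patterns: int    # patterns added by this incremental run
    coverage: float  # fault coverage (%) reported after this run (assumed cumulative)


def summarize(runs):
    """Return (total patterns, final coverage, coverage gained per run)."""
    total = sum(r.patterns for r in runs)
    gains = []
    previous = 0.0
    for r in runs:
        # Each run targets only the faults left undetected by earlier runs,
        # so its contribution is the increase in cumulative coverage.
        gains.append((r.mode, round(r.coverage - previous, 2)))
        previous = r.coverage
    return total, runs[-1].coverage, gains


# Core A, wrapped core with parallel instructions (values from Table 3).
core_a_parallel = [
    Run("WP_INTEST", 3119, 97.84),
    Run("WP_EXTEST", 12, 98.04),
    Run("WS_EXTEST", 8, 98.04),
    Run("WS_BYPASS", 1, 98.04),
]

total, coverage, gains = summarize(core_a_parallel)
print(f"total patterns: {total}, final coverage: {coverage}%")
for mode, gain in gains:
    print(f"{mode}: +{gain} percentage points")
```

The first run contributes almost all of the patterns and coverage, while the later runs add only a handful of top-off patterns, which matches the strategy of running the most productive mode first.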

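The Ratio columns in Tables 5 and 6 use the full-scan run on the bare core as the baseline, so a ratio above 1 means a reduction and a ratio below 1 means an increase. As a hedged illustration (not the authors' tooling), the Python sketch below recomputes the core A ratios from the rounded values printed in Table 5. The names `reduction_ratio`, `core_a`, and `ratios` are hypothetical, and because the inputs are already rounded, the compression row only approximately reproduces the published value of 14.5.

```python
def reduction_ratio(baseline, value):
    """Ratio of the baseline to a mode's value; above 1 means a reduction."""
    return baseline / value


# Core A, from Table 5: (test data in millions of bits, test time in millions of cycles).
core_a = {
    "Full scan on bare core": (68.7, 8.6),
    "WS_INTEST_SCAN": (70.7, 70.7),
    "WP_INTEST": (70.4, 8.8),
    "WP_INTEST_COMPRESSION + top-off": (4.7, 0.6),
}

base_bits, base_cycles = core_a["Full scan on bare core"]

# Map each mode to its (data volume ratio, application time ratio).
ratios = {
    mode: (
        round(reduction_ratio(base_bits, bits), 2),
        round(reduction_ratio(base_cycles, cycles), 2),
    )
    for mode, (bits, cycles) in core_a.items()
}

for mode, (data_ratio, time_ratio) in ratios.items():
    print(f"{mode}: data ratio {data_ratio}, time ratio {time_ratio}")
```

The serial WS_INTEST_SCAN mode shows why the two ratios can diverge: its data volume is close to the baseline, but every bit costs a cycle, so its time ratio falls far below 1.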
Table 5 shows the test data volume and test application time (in cycles) for each of the major instructions that tested the logic in core A. As expected, the reduction in test data and test time was greatest during the compression instruction (almost 14x). The Ratio shown in the table is calculated by using as a baseline the result of running ATPG on the bare core without any compression (that is, full-scan mode). Table 6 shows the results for core B.

Table 5. Results for test data volume and test application time reduction for core A.

                                                     Test data volume         Test application time
Modes                                                Bits (millions)  Ratio   Cycles (millions)  Ratio
Full scan on bare core                               68.7             1.00    8.6                1.00
WS_INTEST_SCAN                                       70.7             0.97    70.7               0.12
WP_INTEST                                            70.4             0.98    8.8                0.98
WP_INTEST_COMPRESSION + top-off (diagnostic mode)    4.7              14.5    0.6                14.50

Table 6. Results for test data volume and test application time reduction for core B.

                                                     Test data volume         Test application time
Modes                                                Bits (thousands) Ratio   Cycles (thousands) Ratio
Full scan on bare core                               576              1.00    72                 1.00
WS_INTEST_SCAN                                       604              0.95    604                0.12
WP_INTEST                                            624              0.92    78                 0.92
WP_INTEST_COMPRESSION + top-off (diagnostic mode)    176              3.30    22                 3.30

OUR AUTOMATED METHODOLOGY for IEEE 1500 wrapper generation and test data migration, in conjunction with our flexible WPP and WBR architecture, can support most common instructions, including test compression. Future work will include providing a full description of the SoC-level architecture and using CTL to enable test data migration.

References
1. IEEE Std 1500, IEEE Standard for Embedded Core Test (SECT), IEEE, 2005; http://grouper.ieee.org/groups/1500.
2. E.J. Marinissen, S.K. Goel, and M. Lousberg, "Wrapper Design for Embedded Core Test," Proc. Int'l Test Conf. (ITC 00), IEEE CS Press, 2000, pp. 911-920.
3. Y. Zorian, E.J. Marinissen, and S. Dey, "Testing Embedded-Core-Based System Chips," Proc. Int'l Test Conf. (ITC 98), IEEE CS Press, 1998, pp. 130-143.
4. A. Sehgal et al., "IEEE P1500-Compliant Test Wrapper Design for Hierarchical Cores," Proc. Int'l Test Conf. (ITC 04), IEEE CS Press, 2004, pp. 1203-1212.
5. H.J. Vermaak and H.G. Kerkhoff, "Enhanced P1500 Compliant Wrapper Suitable for Delay Fault Testing of Embedded Cores," Proc. 8th European Test Workshop, IEEE CS Press, 2003, pp. 121-126.
6. F. da Silva, T. McLaurin, and T. Waayers, The Core Test Wrapper Handbook: Rationale and Application of IEEE Std. 1500, Springer, 2006.
7. A. Benso et al., "IEEE Standard 1500 Compliance Verification for Embedded Cores," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 16, no. 4, 2008, pp. 397-407.
8. T. Waayers, R. Morren, and R. Grandi, "Definition of a Robust Modular SOC Test Architecture; Resurrection of the Single TAM Daisy-Chain," Proc. Int'l Test Conf. (ITC 05), IEEE CS Press, 2005, pp. 610-619.
9. D. Appello et al., "On the Automation of the Test Flow of Complex SoCs," Proc. 24th VLSI Test Symp., IEEE CS Press, 2006, pp. 166-171.
10. C. Barnhart et al., "Extending OPMISR beyond 10x Scan Test Efficiency," IEEE Design & Test, vol. 19, no. 5, 2002, pp. 65-73.
11. I. Hamzaoglu and J.H. Patel, "Reducing Test Application Time for Full-Scan Embedded Cores," Proc. 29th Ann. Int'l Symp. Fault-Tolerant Computing, IEEE CS Press, 1999, pp. 260-267.
12. J. Rajski et al., "Embedded Deterministic Test for Low-Cost Manufacturing Test," Proc. Int'l Test Conf. (ITC 02), IEEE CS Press, 2002, pp. 301-310.

Krishna Chakravadhanula is a senior member of the consulting staff on the Encounter Test R&D team at Cadence Design Systems. His research interests include all aspects of VLSI testing and synthesis, including low power, test compression, and embedded test. He has a PhD in electrical and
computer engineering from the University of Texas at Austin.

Vivek Chickermane is a senior architect and R&D director for DFT products at Cadence Design Systems. His research interests include testing, synthesis, and verification of VLSI systems. He has a PhD in electrical engineering from the University of Illinois at Urbana-Champaign.

Direct questions and comments about this article to Krishna Chakravadhanula, Cadence Design Systems, 1701 North Street, Building 257-3, Endicott, NY 13760; ckrishna@cadence.com.

For further information on this or any other computing topic, please visit our Digital Library at http://www.computer.org/csdl.