A Scalable and Parallel Test Access Strategy for NoC-based Multicore System

Similar documents
Parallelized Network-on-Chip-Reused Test Access Mechanism for Multiple Identical Cores

A novel test access mechanism for parallel testing of multi-core system

ADVANCES in chip design and test technology have

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip

BISTed cores and Test Time Minimization in NOC-based Systems

3D Memory Formed of Unrepairable Memory Dice and Spare Layer

Test of NoCs and NoC-based Systems-on-Chip. UFRGS, Brazil. A small world... San Diego USA. Porto Alegre Brazil

Test Resource Reused Debug Scheme to Reduce the Post-Silicon Debug Cost

Test-Architecture Optimization and Test Scheduling for SOCs with Core-Level Expansion of Compressed Test Patterns

Deadlock-free XY-YX router for on-chip interconnection network

An Area-Efficient BIRA With 1-D Spare Segments

Kee Sup Kim Samsung Electronics. ramework for Massively Parallel esting at Wafer and Package Test

A Novel Massively Parallel Testing Method Using Multi-Root for High Reliability

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

1 Introduction & The Institution of Engineering and Technology 2008 IET Comput. Digit. Tech., 2008, Vol. 2, No. 4, pp.

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Core-Level Compression Technique Selection and SOC Test Architecture Design 1

High Performance Interconnect and NoC Router Design

AN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS

WITH the development of the semiconductor technology,

Ahierarchical system-on-chip (SOC) is designed by integrating

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

AN IMPLEMENTATION THAT FACILITATE ANTICIPATORY TEST FORECAST FOR IM-CHIPS

Test-Architecture Optimization for 3D Stacked ICs

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

A Concurrent Testing Method for NoC Switches

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Testable SOC Design. Sungho Kang

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Scan-Based BIST Diagnosis Using an Embedded Processor

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Wrapper Design for the Reuse of Networks-on-Chip as Test Access Mechanism

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

Self-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

Efficient Algorithm for Test Vector Decompression Using an Embedded Processor

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

A Built-in Self-Test for System-on-Chip

Y. Tsiatouhas. VLSI Systems and Computer Architecture Lab. Embedded Core Testing (ΙΕΕΕ SECT std) 2

Design of a System-on-Chip Switched Network and its Design Support Λ

ISSN Vol.03, Issue.02, March-2015, Pages:

A Modified NoC Router Architecture with Fixed Priority Arbiter

A New Scan Chain Fault Simulation for Scan Chain Diagnosis

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

A Multicast Routing Algorithm for 3D Network-on-Chip in Chip Multi-Processors

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

Practical Test Architecture Optimization for System-on-a-Chip under Floorplanning Constraints

A Heuristic Search Algorithm for Re-routing of On-Chip Networks in The Presence of Faulty Links and Switches

Optimal Test Time for System-on-Chip Designs using Fuzzy Logic and Process Algebra

Transaction Level Model Simulator for NoC-based MPSoC Platform

Testing TAPed Cores and Wrapped Cores With The Same Test Access Mechanism Λ

AS THE capacity and density of memory gradually

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration

Temperature and Traffic Information Sharing Network in 3D NoC

A scalable built-in self-recovery (BISR) VLSI architecture and design methodology for 2D-mesh based on-chip networks

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

POLITECNICO DI TORINO Repository ISTITUZIONALE

Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs

Lecture 18: Communication Models and Architectures: Interconnection Networks

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

Improving Memory Repair by Selective Row Partitioning

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM

A Test Integration Methodology for 3D Integrated Circuits

Wrapper/TAM Co-Optimization and Test Scheduling for SOCs Using Rectangle Bin Packing Considering Diagonal Length of Rectangles

Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination

The Design and Implementation of a Low-Latency On-Chip Network

Design and Implementation of Microcode based Built-in Self-Test for Fault Detection in Memory and its Repair

Test-Architecture Optimization for TSV-Based 3D Stacked ICs

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)

SoC Design Flow & Tools: SoC Testing

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

TSV Minimization for Circuit Partitioned 3D SoC Test Wrapper Design

Optimization of Test Scheduling and Test Access for ITC-02 SOC Benchmark Circuits

Defect Tolerance in Homogeneous Manycore Processors Using Core-Level Redundancy with Unified Topology

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation

Framework for Massively Parallel Testing at Wafer and Package Test

Network-on-Chip Architecture

DESIGN AND IMPLEMENTATION ARCHITECTURE FOR RELIABLE ROUTER RKT SWITCH IN NOC

Efficient Built In Self Repair Strategy for Embedded SRAM with selectable redundancy

Efficient And Advance Routing Logic For Network On Chip

Chapter 9. Design for Testability

A Heuristic for Concurrent SOC Test Scheduling with Compression and Sharing

A Novel Semi-Adaptive Routing Algorithm for Delay Reduction in Networks on Chip

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN)

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE

Accessing On-chip Instruments Through the Life-time of Systems ERIK LARSSON

A Literature Review of on-chip Network Design using an Agent-based Management Method

l Some materials from various sources! Soma 1! l Apply a signal, measure output, compare l 32-bit adder test example:!

DFT-3D: What it means to Design For 3DIC Test? Sanjiv Taneja Vice President, R&D Silicon Realization Group

Transcription:

A Scalable and Parallel Test Access Strategy for NoC-based Multicore System Taewoo Han, hyuk Choi, Hyunggoy Oh, Sungho Kang Department of Electrical and Electronic Engineering Computer systems & reliable SoC Lab., Yonsei University Seoul, Korea {twhan, ihchoi, kyob58}@soc.yonsei.ac.kr, shkang@yonsei.ac.kr Abstract This paper proposes a new parallel test access strategy for multiple identical cores in a network-on-chip (NoC). The proposed test strategy takes advantage of the regular design of NoC to reduce both test area overhead and test time. The proposed NoC reused test access mechanism (TAM) adopted a pipelining structure and a deterministic test data routing algorithm in order to reuse the full bandwidth of links in the NoC. Also, the architecture has complete scalability according to the number of cores and applications for 3D environment are also represented. Experimental results show that the proposed TAM can test multiple cores with the same test time as a single core and negligible hardware overhead. Keywords parallel test; multiple identical cores; NoC; TAM I. INTRODUCTION The number and complexity of cores in a system-on-chip (SoC) are predicted to increase continuously and rapidly in the next few years. It makes a major challenge to design and test reliable systems and thus efficient design-for-test (DFT) technologies are required [1]. Many difficulties in testing arise from the use of deeply embedded cores. A test access mechanism (TAM) is needed that allows the core to be efficiently accessed and tested [2]. IEEE std. 15 provides an efficient solution for testing cores by wrapping cores to prevent interaction from outside data sources and allowing tests for a single core to be generated and applied to each of the instances of the core [3]. With technology scaling, the communication architecture cause severe on-chip synchronization errors, unpredictable delays, and high power consumption. One emerging solution for such communication bottleneck is network-on-chip (NoC) as a promising alternative to bus-based and point-to-point communication architectures [4]. Furthermore, the reuse of the NoC as TAM was presented as a cost-effective strategy for the testing of embedded cores, with reduced area, pin count, and test time costs [5]. frastructures of the NoC must be tested before reusing the NoC as TAM [6]. Routers in a NoC are homogeneous structure and concurrent test strategy for them was studied [7]. Previous researches about test strategy for NoC-reused TAM were targeting heterogeneous cores and the optimization of test pin-count, test scheduling and test access was represented [8, 9]. Recently, multicore and manycore designs have evolved to include multiple identical cores. This trend has been increasingly observed for CPU cores in modern microprocessor designs [1]. The use of multiple identical cores achieves several goals: in addition to the benefits of multiprocessing, some cores can be used as redundant cores to guarantee a highly reliable system. Especially, NoC helps implementing reconfigurable system and a topology reconfiguration for defect-tolerant NoC-based homogeneous manycore systems is researched [11]. However, it did not focus about the effective testing of homogeneous cores. Some parallel TAMs for concurrent testing of multiple identical cores in not NoC but non-noc based SoC have been described previously. The AZSCAN architecture tests multiple identical cores in parallel, and the responses are compared with the expected data in the chip by using on-chip comparators [12]. A pipeline-based TAM allows a great deal of flexibility in test applications, and the pipelining helps to improve test times and to reduce the capture power requirements [13]. Also, a majority-based TAM uses the majority value of test response data and all cores can be tested using the same test pins and the same test time as required for testing a single core [14]. These parallel TAMs are based on bus-based architecture and they have not supported NoC-reuse TAM. this paper, a parallel TAM which specialized for multiple identical cores in NoC-based system is proposed. It aims to test multiple cores in parallel for minimizing test time and test pins. addition, it targets most common NoC architecture [15] and has flexibility in design, configuration, and application. By reusing the infrastructures of NoC, the hardware overhead is minimized and abundant interconnections help to reduce the test time. order to use the full bandwidth of NoC interconnection as the width of TAM, a new deterministic routing algorithm to transfer test data is designed. The proposed TAM can be used to perform a complete core-level diagnosis in the case of faults by multiple cores. This paper is organized as follows: Section 2 reviews more detailed relating works. Section 3 presents an overview of NoC design and a description of the NoC-reused TAM. It includes a new architecture for 3D environments. Section 4 discusses experimental results, while Section 5 concludes the paper.

II. RELATED WORKS On-chip access to wrapped cores embedded in a SoC is provided by a TAM. The TAM is used to transport the test stimuli data from a test pattern source to the core under test and to transport the test response data from the core under test to a test pattern sink. multicore or manycore systems, TAM can be implemented in various ways, which has test pins and test time as the main factors. Especially, multiple identical cores are expected to respond identically to given test patterns. This allows TAM to compare the test response data from cores either to the expected response or to the other cores response using on-chip comparators [16]. When the test response data of cores are compared with the expected response at on-chip, additional test pins are required for the expected data input [12]. To reduce the number of test pins to that of a single core, a pipelined TAM has been proposed [13]. this design, the test response data of multiple identical cores are compared on chip with the test response data from a primary core. The test response data of the primary core are compared with the expected data in the automated test equipment (ATE); if they agree, the primary core is considered non-faulty, and any other cores whose test response data differ from that of the primary core are thus considered faulty. On the contrary, if the primary core has a fault, the other core is selected as a primary core and the test is restarted with the new primary core. The proposed NoC-reused TAM adopted the pipelined structure and implemented the pipelining registers by reusing input buffers of the routers in NoC. Therefore, with minimized hardware overhead, the proposed TAM can test multiple identical cores with the same number of test pins and test time as that of a single core. However, if the pipeline-based TAM is applied to NoC-reused TAM as it is, the test stimuli data and the response data of primary core are transferred in one direction. It will reuse only half of the bidirectional links in NoC and it lead to loss of test time. order to reuse the full bandwidth of the links and reduce the test time, a novel routing algorithm of test data (test stimuli and response of primary core) is proposed in addition to the pipelined NoC-reused TAM architecture. III. PROPOSED NOC-REUSED PARALLEL TAM NoCs typically use the packet-passing communication model, where the IP cores attached to the network communicate by sending and receiving request and response messages. It can be described by the approaches used to implement the mechanisms of flow control, routing, arbitration, switching, and buffering [17]. Figure 1 presents a typical structure of NoC-based system with the proposed NoC-reused parallel TAM (which is denoted as NRP-TAM in this paper). More detailed descriptions about NRP-TAM are presented in the following chapters and the experimental results show that it has negligible hardware overhead. A. Architecture NRP-TAM reuses the buffers in inputs of the routers and some multiplexers and comparators (XOR gates) are added, but the comparators can be shared with the testing of routers in [7]. Fig. 1. Architecture of the proposed TAM in NoC Figure 1 shows a partial hardware architecture of the proposed TAM in NoC. It is assumed that the Router is a source & sink node and Core is a primary core. External ATE sends the test stimuli data to Router and receives the test response data from Router and confirms whether the response data is identical with expected data. Router transfers the test stimuli data from ATE to Core and at the same time, it is transferred to Router1. One input buffer in Router 1 is reused as the pipelining register for test stimuli data. The red lines indicate the transmission of test stimuli data which transfers from Router to Router1, Router5 and Router4 according to the proposed routing algorithm. Core is selected as the primary core and the test response data of Core is transferred to both Core4 and ATE. The blue lines indicate the transmission of Core s test response data which transfers from Router to Router4, router5 and Router1. All routers and cores can send and receive the test stimuli data and test response data of primary core in the same way. And then, each router compares the test response data of its core and the test response data of primary core. some cases, additional buffering is necessary for matching the timing of test response data of each core and primary core. Router1 receives the test stimuli data prior to the test response data of primary core, so the test response data of Core1 must be buffered until receiving the primary core s test response data. This buffering reuses the input buffer as the pipelining register and the routing algorithm will show that the buffering depths are always two clocks. As shown in Figure1, one of the input buffers which connected to Core1 in Router1 is reused as a pipelining register and two more buffers are reused as buffering register (the buffering in this case is called as core_buffering in this paper). Therefore, Core1 can be tested by comparing the buffered test response data and the received test response data of primary core through north input path.

Router4 receives the test response data of primary core prior to the test stimuli data, therefore the primary core s test response data must be buffered until receiving the test response data of Core4 (the buffering in this case is called as primary_buffering in this paper). Core4 can be tested by comparing the test response data of Core4 and the received and buffered test response data of primary core through south input path. On the other hand, Router5 does not need any buffering because it receives the test response data of Core5 (stimulated by the test stimuli data through south path) and the test response data of primary core at the same time. Core5 can be tested by comparing the test response data of Core5 and the received test response data of primary core through east path. According to the proposed test data routing algorithm in the next chapter, most routers do not need buffering like as Router5. B. Routing T S T R Router8 Router9 Router1 Router11 2 Router4 Router5 Router6 Router7 2 Router Router1 Router2 Router3 2 Primary column 2 row (a) Routing of test data (without faulty core) (b) Routing of test data (with faulty core) Fig. 2. Routing algorithm of the proposed TAM in NoC order to reuse the bidirectional links in NoC for an abundant bandwidth of TAM, NRP-TAM uses a novel deterministic routing algorithm for test data. Figure 2(a) and Figure 2(b) represent 4*3 NoC systems and routings according to the proposed algorithm. Cores connected to the routers are omitted in these figures. 2 The main rule of the proposed routing algorithm for test data is to route with the shortest path, while test stimuli data having priority in a row routing and test response data of primary core having priority in a column routing. Figure 2(a), Router is a source & sink node and Core is selected as a primary core. Router transfers the test stimuli data through a row line and Router1 receives this data. To transfer the test response data of primary core to Router1, it bypasses Router4 and Router5 since the east link of Router is preoccupied by the test stimuli data. As a result, Router1 requires the core_buffering and the depth of this buffering is two clocks for bypassing two Routers. On the other hand, Router transfers the test response data of primary core through a column line and Router4 receives this data. To transfer the test stimuli data to Router4, it bypasses Router1 and Router5 because the north link of Router is preoccupied by the test response data. Therefore, Router4 has the primary_buffering and the depth of this buffering is also two clocks for bypassing two Routers. The test stimuli data and test response data of the primary core for Router5 are routed from Router1 and Router4 respectively according to the rule of shortest path. After routing all the routers in a NoC by the proposed test data routing algorithm, only routers in the same row (Router1, Router2 and Router3) and column (Router4 and Router8) with the primary core have buffering and regardless of the number of cores, the depth of buffering are always two clocks. The remaining routers do not need the buffering and all cores in the NoC can be tested concurrently by the pipelined test data. If the primary core is faulty in the pipeline based TAM, that core is derated and an arbitrary core is decided as a new primary core. Figure 2(b) shows the state of Core which was a primary core has a fault and Core1 is selected as a new primary core. NRP-TAM selects the primary core in a numerical order (Core, Core1, Core2 ). The source & sink node is not changed, but the routings are mirrored on the column of Router1 (connected to the new primary core) as axis. The proposed routing algorithm offers the same routing mentioned above to the routers on the axis and the right side. On the contrary, cores on the left side are routed as mirrored. Figure 2(b), Router, Router4 and Router8 are on the left side of the primary core. Router1 receives the test stimuli data from ATE through Router. Then, Router1 sends the test stimuli data to Router2 and at the same time, Router1 sends it to Router4 through Router. other words, Router multicasts by connecting east input link to west output link and west input link to north output link. Router4 receives the test response data of the primary core from Router1 through Router5. The timing of test response data of Core4 and the primary core at Router4 are matched and no buffering is required. Router8 can be routed in the same manner. Only routers in the same row (Router2 and Router3) and column (Router5 and Router9) with the primary core have buffering as before. The proposed deterministic test data routing algorithm is arranged as pseudo-code in Figure 3. It is designed clearly and simply according to the position of routers. The routing algorithm of NRP_TAM consists of Primary_routing (routing of the primary core) and General_routing (routing of the nonprimary cores).

NRP_TAM( ){ Primary_route( ){ Routing_primary_from_source&sink( );} Non-primary_route( ){ case(position of router) same row with primary: if(right side of primary){ Routing_stimulti_from_WEST( ); Routing_response_from_NORTH( ); Buffering_core( );} else if(left side of primary){ Routing_stimulti_from_EAST( ); Faulty_cores( );}} same column with primary: if(upper side of primary){ Routing_stimuli_from_EAST( ); Routing_response_from_SOUTH( ); Buffering_primary( );} else if(down side of primary){ Faulty_cores( );}} neither same row nor column: if((right side)&&( upper side) of primary){ Routing_stimuli_from_SOUTH( ); Routing_response_from_WEST( );} else if((left side)&&(upper side) of primary){ Routing_stimuli_from_SOUTH( ); Routing_response_from_EAST( );} else{ Faulty_cores( );}} endcase}} Second, if the router is on the same column and the upper side of the primary core, it receives the test stimuli data from east link and the test response data of the primary core from south link. addition, this router requires primary_buffering. If the router is on the same column, and the down side of the primary core, it can be considered that the core connected to this router was a primary core and a fault was detected in this core. this case, this router is inactive. Last is the case when the remaining routers are on neither same row nor column of the primary core. If the router is on the right and upper side of the primary core, it receives the test stimuli data from south link and the test response data of the primary core from west link. On the other hand, if the router is on the left and upper side of the primary core, it is routed in the mirrored direction (receives the test stimuli data from south and the test response data from east). The cores connected to the other routers are faulty and they are inactive. The proposed deterministic test data routing algorithm can be extended to cases of alternative primary core or the number of cores which is changed. Likewise general NoC, the proposed routing algorithm in NoC-reused TAM design gives importance to the complete scalability. The characteristic of routing algorithm always requires two additional buffers for some routers in order to use the entire bidirectional links. These buffers can be implemented by reusing the input buffers in general routers at NoC. Even though a particular NoC has less than three input buffers in router (one for pipelining and two for additional buffering), this overhead would be negligible. C. Timing diagram test response Fig. 3. Pseudo-code of the proposed test data routing algorithm in NoC The primary core receives the test stimuli data from ATE and sends the test response data to ATE through source&sink router. The primary core transforms if it has a fault, but the source&sink router remains unchanged. The routers connected to the faulty primary cores are just used as path of the Primary_route and it can be routed in the shortest path easily because they do not communicate with the cores and all links of them are free. The other non-primary routers are routed according to its position. First, if the router is on the same row and the right side of the primary core, it receives the test stimuli data from west link and the test response data of the primary core from north link. addition, this router requires core_buffering. If the router is on the same row, and the left side of the primary core, it indicates that the core connected to this router was a primary core and a fault was detected in this core. Therefore, it transfers the test stimuli data in the mirrored direction. primary response Fig. 4. Timing diagram of test data in the proposed TAM To represent pipelined test data and concurrent test process of the NRP-TAM, Figure 4 shows the timing diagram of test data in the proposed TAM. The test stimuli data is transferred from Router to Router1, Router5 and Router4. Therefore, they receive the test response data of their core in the same order as above. The test response data of the primary core is transferred from Router to Router4, Router5 and Router1. Router1 needs core_buffering for comparing the first test response of its core (TR1) and the first test response of the primary core (PR1).

Router4 needs primary_buffering and the depths of both buffering are two clocks as described earlier. For example, when the clock time is clk4, Router receives the fourth test response data of Core (TR4) and transfers it as the fourth test data of the primary core (PR4). Router1 receives the third test response data of Core1 (TR3) and the first test response data of primary core (PR1). TR3 is buffered and PR1 is compared with the core_buffered TR1 for testing Core1. Router5 receives the second test response data of Core5 (TR2) and the primary core (PR2). They are compared for testing Core5. Router4 receives the first test response data of Core4 (TR1) and the third test response data of primary core (PR3). PR3 is buffered and TR1 is compared with the primary_buffered PR1 for testing Core4. D. 3D Extention Fig. 5. Routing results with the proposed TAM 3D NoC Advances in chip design and test technology have led to 3D stacked chips and there are many recent studies about 3D NoC topology and 3D TAM technologies for reliable and high yield 3D ICs [18, 19, 2, 21]. NRP-TAM can extend to such 3D environments and Figure 5 shows the proposed TAM in a 4*3*2 NoC system. Each layer can be tested by the proposed test data routing algorithm in NRP-TAM as mentioned earlier. This is a prebond testing process in the 3D stacked ICs, and the proposed TAM is extended for the post-bond testing. Figure 5, the routing result from the bottom die is the same as the routing result of the 2D NoC in Figure 2(a). The upper die receives the test stimuli data (TS) and the test response data of the primary core (TR) from the bottom die. this case, only two vertical links are reused for transmission of the test data and Router12 operates as a semi-primary router in the upper layer. Router12 receives the test response of the primary core from Router (primary router) and the test stimuli data from ATE through Router, Router1 and Router13. the upper layer, Router12 sends the test stimuli data and the test response data of primary core to the other routers and the cores in this layer can be tested concurrently. The routers on the same row or column with the semi-primary router require two depths of buffering as the bottom layer, but the semi-primary router also needs the buffering. IV. EXPERIMENTAL RESULTS Several experiments are performed to verify the effectiveness of the proposed NoC-reused parallel TAM. The general NoC [22] is synthesized by Synopsys 9nm generic library [23] for analyzing and implementing the proposed TAM in real NoC systems. A. Test time TAM width TABLE I. M5 in d695 WIDTH OF THE NOC-REUSED TAM AND TEST TIME Test time (cycles) M26 in p2281 M13(M14) in p93791 averaged time reduction 8 24,197 278,641 242,384-16 12,29 145,417 123,434 48.81% 32 6,215 72,981 8,729 44.5% the NoC with the bidirectional link, NRP-TAM reuses one directional link for the test stimuli data and another directional link for the test response data of primary core. Therefore, multiple cores can be tested in the same amount of test time when testing a single core by a typical TAM which has the unidirectional link. The bandwidth of the proposed TAM is twice as large as the NoC-reused TAM applied by pipeline-based TAM because it reuses the one directional link for both the test stimuli data and the test response data of primary core. order to show the relation between test time and width of the NoC-reused TAM, some modules in ITC`2 benchmarks [24] are tested according to wrapper & TAM optimization method [25]. Some hypothetical homogeneous systems are implemented by duplicating a specific module in the benchmarks. Therefore, the test time of a specific core in the benchmarks are used as the total test time of multiple identical cores with NRP-TAM. As the width increases as twice as large, the test times tends to decrease nearly as half. For example, the result value 48.81% in Table 1 indicates the average reduction rate of each test time as the width increases from 8 to 16. This experiment shows that the proposed TAM helps to reduce the test time in nearly half due to the bidirectional links. B. Hardware overhead TABLE II. HARDWARE OVERHEAD OF THE PROPOSED TAM IN 4*3 NOC NoC Hardware area buffer flit NoC Proposed TAM Overhead 6 12 18 16 9,426 1,744 1.89% 32 136,68 3,439 2.45% 64 231,623 6,866 2.88% 16 141.164 1,743 1.22% 32 224,772 3,451 1.51% 64 394,518 6,99 1.74% 16 193,38 1,765.9% 32 314,118 4,487 1.41% 64 565,113 14,127 2.44%

TABLE III. HARDWARE OVERHEAD OF THE PROPOSED TAM IN 4*3*2 (3D) NOC NoC Hardware area buffer flit NoC Proposed TAM Overhead 16 18,853 3,626 1.97% 6 32 273,361 7,142 2.55% 64 463,246 14,269 2.99% 16 282,329 3,623 1.27% 12 32 449,544 7,183 1.57% 64 789,48 14,512 1.81% 16 386,76 3,671.94% 18 32 628,224 9,261 1.45% 64 1,13,232 28,42 2.45% Table Ⅱ and Table Ⅲ represent the hardware size of the proposed TAM with NoC in the number of NAND gates. The feature of NRP-TAM is reusing the input buffers for pipelining registers and additional buffering. Additional multiplexers and comparators relating to the flit size of NoC are needed. Therefore, the hardware experimental results indicate that the hardware overhead of the propose TAM is decreased when the number of buffers in the original NoC are increased. Also, the hardware overhead is increased in the larger size of flit at NoC, which is the width of the TAM. The hardware overhead of NRP-TAM is less than 3% in the worst case of the experiments, but it does not include the hardware size of cores. Considering the fact that the number of gates of a modern multicore processor system is much more than million gates, the hardware overhead of the proposed TAM in a chip is negligible. conclusion, the proposed hardware architecture needs a few more multiplexers (other architectures can reuse the infrastructures of NoC) than the aforementioned unidirectional NoC-reused TAM applied by pipeline-based TAM, but the experimental results show that this hardware overhead is imperceptible. V. CONCLUSION This paper describes a novel NoC-reused parallel TAM for concurrent testing of multicore or manycore system. All the cores can be tested simultaneously by reusing infrastructures of the NoC and test time as required for a single core. A new deterministic test data routing algorithm is designed for reusing the full bandwidth of bidirectional NoC links as the TAM. Experimental results show that the proposed TAM has a minimized test time with abundant TAM width and negligible hardware overhead. The architecture has complete scalability and it can be extended to 3D environments for 3D NoC topology and 3D stacked ICs. The proposed NRP-TAM is only related to the delivery of test data and it can be compatible and improved with the existing DFT technologies. ACKNOWLEDGMENT This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(mest) (No. 212R1A2A1A36255). REFERENCES [1] The ternational Technology Roadmap for Semiconductors, 211, pp. 26-55 [2] Y. Zorian, E.J. Marinissen, and S. Dey, Testing Embedded-Core Based System Chips, Proceedings IEEE t. Test Conf., 1998, pp. 13-143 [3] F.da Silva, T. McLaurin, and T. Waayers, The Core Test Wrapper Handbook, Springer, 26. [4] R. Marculescu, U.Y. Ogras, L.S. Peh, N.E. Jerger, and Y. Hoskote, standing Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives, IEEE Trans. on CAD of IC and Systems, vol. 28, 29, pp. 3-21 [5] E. Cota, A.M. Amory, and M.S. Lubaszewski, Reliability, Availability and Serviceability of Networks-on-Chip, Springer, 212 [6] R. Nourmandi-Pour and N. Mousavian, A fully parallel BIST-based method to test the crosstalk defects on the inter-switch links in NOC, Microelectronics Journal, vol. 44, 213, pp. 248-257 [7] A.M. Amory, E. Briao, E. Cota, M. Lubaszewski, and F.G. Moraes, A Scalable Test Strategy for Network-on-Chip Routers, Proceedings IEEE t. Test Conf., 25, Paper 25.1 [8] T. Sbiai and K. Namba, NoC Dynamically Reconfigurable as TAM, IEEE Asian Test Symposium, 212, pp.326-331 [9] R. Michael and C. Krishnendu, Optimization of Test Pin-Count, Test Scheduling, and Test Access for NoC-Based Multicore SoCs, IEEE Trans. on Computers, vol. 63, 214, pp. 691-72 [1] I. Parulkar, T. Ziaja, R. Pendurkar, A. DSouza, and A. Majumdar, A Scalable, Low Cost Design-For-Test Architecture for UltraSPARC Chip Multi-Processors, Proceedings IEEE t. Test Conf., 22, pp. 726-735 [11] L. Zhang, Y. Han, Q. Xu, X. Li, and H. Li, On Topology Reconfigurati -on for Defect-Tolerant NoC-Based Homogeneous Manycore Systems, IEEE Trans. on VLSI systems, vol.17, 29, pp. 1173-1186 [12] S. Makar, T. Altinis, N. Patkar, and J. Wu, Testing of Vega2, a Chip Multi-Processor with Spare Processors, Proceedings IEEE t. Test Conf., 27, Paper 9.1 [13] G. Giles, J. Wang, A. Sehgal, K.J. Balakrishnan, and J. Wingfield, Test Access Mechanism for Multiple Identical Cores, Proceedings IEEE t. Test Conf., 29, Best Paper [14] T. Han, I. Choi, and S. Kang, A Novel Test Access Mechanism for Parallel Testing of Multi-core System, IEICE Electronics Express, vol. 11, 214, pp. 1-6 [15] Salminen et at., Survey of Network-on-Chip Proposals, White paper, OCP-IP, 28 [16] WD Atwell Jr, Tester on a Chip (TOAC) or Apparatus for Application of Tests for Embedded test Points, Journal of Motorola Technical Developments, vol.9, 1989. [17] P. Guerrir and A. Greiner, A Generic Architecture for on-chip Packetswitched terconnections, Proceedings Design, Automation and Test in Europe Conference and Exhibition, 2, pp. 25-156 [18] H.S. Lee and K. Charkrabarty, Test Challenges for 3D tegrated Circuits, IEEE Design & Test of Computers, vol. 26, 29, pp. 26-35 [19] B. Noia, K. Charkrabarty, and E.J. Marinissen, Optimization Methods for Post-Bond Testing of 3D Stacked ICs, Journal of Electronic Testing Theory and Applications, vol. 28, 212, pp. 13-12 [2] D. Xiang, A Cost-Effectice Scheme for Network-on-Chip Router and terconnect Testing, IEEE Asian Test Symposium, 213, pp. 27-212 [21] I. Loi, F. Angiolini, S. Fujita, S. Mitra, and L. Benini, Characterization and Implementation of Fault-Tolerant Vertical Links for 3-D NoC, IEEE Trans. on CAD of IC and Systems, vol.3, 211, pp. 124-134 [22] Daniel U. Becker, Efficient Microarchitecture for Network-on-Chip Routers, PhD thesis, Stanford University, August 212. [23] 9nm generic library, http://www.synopsys.com/comunity/university Program [24] E.J. Marinissen, V. Iyengar, and K. Chakrabarty, ITC`2 SoC Test Benchmarks, http://itc2socbenchm.pratt.duke.edu/ [25] V. Iyengar and K. Chakrabarty, Test Wrapper and Test Access Mechanism Co-Optimization for System-on-Chip, Journal of Electronic Testing Theory and Applications, vol.18, 22, pp. 213-23