IEEE TRANSACTIONS ON COMPUTERS, VOL. 66, NO. 9, SEPTEMBER 2017

DRAM-Based Error Detection Method to Reduce the Post-Silicon Debug Time for Multiple Identical Cores

Hyunggoy Oh, Inhyuk Choi, and Sungho Kang, Senior Member, IEEE

Abstract: In the post-silicon debug of multicore designs, the debug time has increased significantly because the number of cores undergoing debug has increased, while the resources available to debug the design remain limited. This paper proposes a new DRAM-based error detection method to overcome this challenge. The proposed method requires only three debug sessions, even when multiple cores are present. The first debug session detects the error intervals of each core using golden signatures. The second session detects the error clock cycles in each core using a golden data stream. Instead of storing all of the golden data, the golden data stream is generated by selecting, for each interval, debug data that the first session has guaranteed to be error-free. Finally, only the error data of all cores are captured during the third session. Experimental results on various debug cases show significant reductions in total debug time and DRAM usage compared to previous methods.

Index Terms: Multiple identical cores, DRAM-based debug method, MISR compaction, golden data stream, debug time

1 INTRODUCTION

Recent developments in semiconductor technology have allowed a large number of cores to be integrated into a single system-on-chip (SoC) and have made multicore designs prevalent in modern integrated circuits. However, the demand for multicore features makes those components harder to verify and validate, and the number of errors that escape pre-silicon verification and manufacturing tests has increased. Consequently, first silicon is rarely error-free, and errors must be detected during the post-silicon debug stage to meet stringent time-to-market requirements.
The main goal of post-silicon debug for multicore systems is to detect errors in the first silicon rapidly, in order to avoid the increased cost of a silicon respin. There are two types of errors: logical and electrical [1], [2], [3]. Logical errors are designer mistakes caused by the complexity of the design. Electrical errors, on the other hand, occur under particular electrical conditions, such as parasitic coupling noise, power supply noise, and crosstalk. Debugging electrical errors is typically more challenging than debugging logical errors, because electrical errors are difficult to predict and detect during pre-silicon verification [4], [5]. Because of these electrical errors, post-silicon debug has become a bottleneck of the design implementation process. According to [1] and [6], the engineering cost of post-silicon debug accounts for up to 35 percent of the total implementation time at 90 nm and more than 50 percent of the overall design effort at 65 nm. To support a fast and precise design process, design-for-debug (DfD) architectures and post-silicon debug methods have been introduced. Scan-based debug is a well-known post-silicon debug technique. To observe as many internal signals as possible, logic probing techniques that reuse the scan chains, which are commonly used in manufacturing tests, have been introduced [7], [8], [9], [10].

The authors are with the Department of Electrical and Electronics Engineering, Yonsei University, Seoul, Korea. E-mail: {kyob508, ihchoi}@soc.yonsei.ac.kr, shkang@yonsei.ac.kr. Manuscript received 21 Nov. 2016; revised 22 Feb. 2017; accepted 28 Feb. 2017. Date of publication 5 Mar. 2017; date of current version 15 Aug. 2017. Recommended for acceptance by C. Metra. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no /TC
Although scan-based methods can achieve high observability, they require circuit operation to be halted, which means the internal state data are acquired not in real time but only at a single instant. Therefore, this technique is inadequate for analyzing the continuous functional behavior of the circuit. Furthermore, since hard-to-detect errors may appear in any of the circuit states over thousands of clock cycles [11], it is not well suited to post-silicon debug. As a result, real-time signal tracing methods have been introduced to complement the scan-based method.

The trace buffer-based debug method is commonly used to achieve real-time signal observation. This method requires an embedded logic analyzer (ELA) to manage post-silicon debug. An ELA consists of a control unit, a trigger unit, a sample unit, and an offload unit [1], [12]. The control unit monitors the trigger, sample, and offload units during post-silicon debug. The trigger unit determines the start or end point for observing the circuit operation, and the trace signals are captured via the sample unit, which includes a trace buffer. Finally, the captured debug data are offloaded to the external workstation through the offload unit and analyzed by the debug software to detect errors. Because trace signals are captured in the trace buffer in real time and at speed, the trace buffer-based method is helpful for post-silicon debug. However, its major challenge is limited observability: the trace buffer must be kept small because its size directly adds to the DfD hardware overhead. Furthermore, observing a long window through such a small buffer increases the debug time significantly.

To overcome the limitations of the trace buffer-based method, DRAM-based debug methods have been introduced [17], [18]. For FPGA prototypes, a debug method using external DRAM was studied in [17], and it greatly improved the observability of the prototype. In [18], a massive signal tracing method has been introduced that uses on-chip DRAM, either integrated in a SoC or accessed through through-silicon vias (TSVs) in 3D-ICs. During a debug run, this method detects erroneous intervals using a multiple-input signature register (MISR) and stores the corresponding debug data dump in the DRAM through the trace buffer. Although this DRAM-based method overcomes the trace buffer size limitation, it requires many debug sessions or a large number of buffers to debug multicore designs, because it was not designed for the multicore debug case. Because SoCs and 3D-ICs contain a large number of cores, an improved debug method for multicore designs is urgently needed to reduce the debug time. In this paper, a new DRAM-based error detection method for multiple identical cores is proposed that significantly reduces the debug time and DRAM usage.

© 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See for more information.
Typically, multicore designs have evolved to include multiple identical cores, because of the performance benefits of multiprocessing and because redundant cores are an attractive and economical way to build highly reliable systems [19], [20], [21]. Note also that, in the case of electrical errors, identical cores can fail differently. Hence, the proposed method focuses on detecting these electrical errors in multiple identical cores by exploiting the fact that the cores are identical. The main contributions of this paper are as follows:

- A new DRAM-based debug method is proposed that detects error data in only three debug sessions. The first session detects error intervals, the second session detects error clock cycles, and the third session captures only the error data.
- To support the second session, a golden data stream generation method is proposed that does not require storing all golden data but instead exploits the fact that all circuits under debug (CUDs) are identical.
- New architectural features for a DRAM-based DfD using MISRs and comparators are proposed to perform the on-chip detection of error intervals and error cycles.
- Probability models are introduced to estimate the DRAM usage and debug time. With these models, the DfD designer can determine the size of the trace buffer and the number of CUDs for various debug cases.

The rest of the paper is organized as follows. Section 2 describes related works and the motivation for the proposed idea. Section 3 discusses the proposed debug framework, and Section 4 analyzes the effectiveness of the DRAM-based method with probability models. Section 5 provides experimental results for various debug cases, and Section 6 concludes the paper.

2 RELATED WORKS AND MOTIVATION

To debug multiple cores in a SoC, several DfD architectures exploiting scan-based or trace buffer-based debug methods have been introduced.
In [10], a low-cost SoC debug platform based on the on-chip test architecture has been proposed. This architecture supports multicore debugging in a SoC with hardware breakpoint insertion and cycle-based run-stop debug steps. Because test architectures such as the test access mechanism (TAM), the test bus, and IEEE standard test wrappers (e.g., IEEE 1500) are reused in this debug platform, it is a cost-effective debug solution. Nonetheless, it shares the limitations of the run-stop debug approach discussed earlier. In [12], a DfD architecture with distributed ELAs has been introduced for the post-silicon debug of multicore designs in a SoC. Because trace buffer resources are limited, this architecture allocates them to the debug data of the multiple cores according to a user-defined priority scheme. It is an effective solution when multiple cores are debugged and the CUDs must be prioritized. However, the limited observation capacity of the trace buffer remains a critical obstacle to reducing the debug time. Therefore, several techniques have been introduced to improve the effective capacity of the trace buffer [13], [14], [15], [16]. In [13], an iterative error detection method using MISR signatures has been introduced to reduce the number of debug sessions during repeatable debug experiments. First, the entire target observation window is compacted with an MISR and the signatures are captured in the trace buffer. After the signatures are transferred to the external workstation, the erroneous intervals, i.e., the error-suspect windows, are detected by comparing the acquired signatures with the golden signatures. In each following debug session, only the error-suspect windows are compacted, and the method keeps zooming into them in this way until the specific error clock cycles are identified.
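To illustrate why an iterative zoom-in can need many debug sessions, the following Python sketch models a simplified signature-based zoom-in. It is not the exact procedure of [13]: the function name, the split factor `split`, and the assumption that a signature mismatches exactly when its sub-window contains an error (no MISR aliasing) are all hypothetical.

```python
from typing import Iterable, List, Tuple


def zoom_in_sessions(error_cycles: Iterable[int], window: int, split: int) -> Tuple[int, List[int]]:
    """Count the sessions a simplified signature zoom-in needs.

    Hypothetical model: each session divides every error-suspect window into
    `split` sub-windows and takes one signature per sub-window; a signature
    mismatches exactly when its sub-window contains an erroneous cycle.
    Returns (number of sessions, erroneous cycles located).
    """
    if split < 2:
        raise ValueError("split must be at least 2")
    errors = set(error_cycles)
    suspects = [(0, window)]  # (start cycle, length) of each suspect window
    sessions = 0
    while suspects and any(length > 1 for _, length in suspects):
        sessions += 1
        refined = []
        for start, length in suspects:
            sub = -(-length // split)  # ceiling division: sub-window length
            for s in range(start, start + length, sub):
                width = min(sub, start + length - s)
                # Keep only sub-windows whose signature would mismatch.
                if any(s <= e < s + width for e in errors):
                    refined.append((s, width))
        suspects = refined
    return sessions, [start for start, _ in suspects]


print(zoom_in_sessions({5, 40}, window=64, split=4))
```

In this model, a 64-cycle window with `split=4` takes three sessions to pin down two erroneous cycles, and every fourfold increase of the window adds one more session.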
To improve on [13], an on-chip error detection method has been introduced in [14] that reuses the empty area of the trace buffer to store pre-calculated golden signatures and then compacts the debug data with a higher compaction ratio during error-suspect intervals. In [15] and [16], a 2-D compaction technique has been introduced to expand the observation window. This technique requires three debug sessions. The first session estimates the error rate using a parity generator. In the second session, the error-suspect clock cycles are determined through 2-D compaction using an MISR and a cycling register. Finally, in the third session, the erroneous debug data are selectively captured using pre-calculated tag bits. With this three-pass methodology, the method can expand the observation window significantly. However, it has some limitations. First, more debug sessions may be required, because the error-rate estimate of the first session depends strongly on the error distribution. Furthermore, the 2-D compaction process identifies error clock cycles less and less accurately, owing to misidentification, as the target observation window grows. Consequently, these methods are suitable only for short-duration debug cases, or as supplements to other debug methods in long-duration debug cases.

Fig. 1. Examples of the post-silicon debug process for n cores. (a) The previous method. (b) The proposed method.

To overcome the limited capacity of the trace buffer, a DRAM-based debug method has been introduced in [18]. The key principle of this method is to transfer the debug data dump from the trace buffer into a larger on-chip DRAM. First, golden MISR signatures are stored in the DRAM through a trace port such as JTAG. Once the debug run starts, the debug data of each interval are captured in the trace buffer and compacted by a set of MISRs. At the end of the interval, the interval is analyzed by comparing the MISR signature with the golden signature. If they are the same, the interval is error-free, and the debug data of the next interval are captured in the trace buffer. If not, the debug data captured in the trace buffer are shifted to the shadow buffer and stored in the DRAM. With this process, the DRAM-based method substantially reduces the debug time compared to the trace buffer-based methods [13], [14], [15], [16]. However, this previous method targets only a single core, although many cores must be debugged in multicore designs such as a SoC or a 3D-IC that includes embedded DRAM (eDRAM) or 3D DRAM [22], [23]. Consequently, the previous method requires multiple debug sessions and a long debug time, because the cores are debugged sequentially. A simple example of debugging n cores is shown in Fig. 1a.
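The per-interval behavior of the previous DRAM-based method [18] can be sketched for a single core as follows. This is a simplified software model, not the hardware of [18]: the function name, the in-memory `dram` list, and the use of the raw interval words as an alias-free stand-in for the MISR signature are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def prior_interval_dump(
    trace: Sequence[int],
    golden_sigs: Sequence[object],
    M: int,
    signature: Callable[[List[int]], object] = tuple,
) -> Tuple[List[int], List[List[int]]]:
    """Check each M-cycle interval and dump the whole buffer on a mismatch.

    trace       -- per-cycle debug words of ONE core
    golden_sigs -- golden signature of each interval
    signature   -- stand-in for the MISR; the default `tuple` compares the
                   raw words, so unlike a real MISR it never aliases
    Returns the indices of dumped intervals and the modeled DRAM contents.
    """
    dumped: List[int] = []
    dram: List[List[int]] = []
    for i, golden in enumerate(golden_sigs):
        buffer = list(trace[i * M:(i + 1) * M])  # trace buffer contents of interval i
        if signature(buffer) != golden:
            dumped.append(i)
            dram.append(buffer)  # the entire interval goes to DRAM via the shadow buffer
    return dumped, dram


golden = list(range(12))
observed = list(golden)
observed[6] ^= 1  # inject a single-bit error in interval 1
sigs = [tuple(golden[i * 4:(i + 1) * 4]) for i in range(3)]
print(prior_interval_dump(observed, sigs, M=4))
```

Every mismatching interval costs a full M-cycle dump of L-bit words, even if only one cycle is wrong, and in this model debugging n cores means repeating such a run once per core.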
In addition, if the previous method is used to debug multiple cores simultaneously, its hardware area and memory overhead grow with the number of cores. In this paper, a new DRAM-based error detection method for multiple identical cores in a cycle-accurate deterministic debug environment is proposed to reduce the debug time significantly. The concept of the proposed method is illustrated in Fig. 1b, where gray boxes and black lines indicate error intervals and error clock cycles, respectively. The main idea of the proposed method is to debug all cores at the same time rather than in sequence. In the first session, the error intervals of all cores are detected using golden MISR signatures. In the second session, the error clock cycles of all cores are detected using the golden data stream. Instead of storing all golden data in the DRAM, the golden data stream is generated by selecting, for each interval, debug data that the first session has guaranteed to be error-free. Finally, the erroneous data of all cores are captured. Unlike the previous method, which stores unnecessary interval data dumps in the DRAM, the proposed method stores only the data of the error clock cycles, using an error detection procedure that requires only three debug sessions even when multiple cores are present. As a result, the proposed method significantly reduces the debug time and DRAM usage compared to the previous method when multiple cores are debugged.

It is important to note that a cycle-accurate deterministic debug phase is not an impractical assumption [13], [24], [25], [26]. Post-silicon debug generally comprises two phases: non-deterministic and deterministic. In the non-deterministic debug phase, bug occurrences cannot be reproduced because of asynchronous interfaces and interrupts from peripherals. In this phase, the main goal is to determine how to control the failure; such techniques are introduced in [25], [26], [27], and [28]. Once the failures are controllable, the debug environment can be cycle-accurate and deterministic. In this deterministic phase, the main goal is to locate the root cause in space and time as quickly as possible [12], [13], [14], [15], [16]. Unlike in the non-deterministic phase, functional tests over very long debug cycles are performed repeatedly in the deterministic debug phase, which results in a tremendous debug time overhead. Hence, the proposed method focuses on cycle-accurate deterministic debug environments in order to exploit the characteristics of identical cores and significantly reduce the total debug time.

3 PROPOSED DRAM-BASED DEBUG SCHEME

As discussed previously, the proposed method consists of three debug sessions, whose flow is shown in Fig. 2. In this section, each debug session is explained in detail. Then, considerations for the DRAM-based debug method are discussed. Finally, the hardware architecture of the proposed method is introduced.

Fig. 2. The three debug session flow of the proposed method.

TABLE 1
Notations for Debug Experiments

Name       Representation
L          Number of observed signals
M          Buffer depth and interval length in cycles
N          Length of the observation window in cycles
S          Timestamp length
n          Number of CUDs
EI index   Error interval index bit
EI tag     Error interval tag bits
EC index   Error clock cycle index bit
EC tag     Error clock cycle tag bits
GDS tag    Golden data stream tag bits
To aid understanding, the notations are summarized in Table 1; they are similar to those used in [18].

3.1 Debug Session 1: Detecting Error Intervals

In the first debug session, the erroneous intervals of all CUDs are detected using an MISR-based compaction technique, which has been widely used to identify failing intervals during post-silicon debug [13], [14], [15], [16], [18]. First, the golden MISR signatures are generated by simulating the behavioral model or by using an FPGA prototyping board. So that the debug data of an interval can be captured in the trace buffer during the third session, each MISR signature covers an interval of M cycles. Then, the pre-calculated golden signatures are uploaded to the DRAM via a serial interface (e.g., JTAG) or a high-speed trace port [12]. When the debug process starts, the debug configuration sets the trigger event conditions and selects the debug data. In addition, a golden signature is loaded from the DRAM into the golden signature register. After the functional operation begins, the debug data from each CUD are compacted by the MISR over M cycles, the same interval length as the golden signature. After M cycles, the signatures are compared with the golden signature to determine whether the interval is erroneous. If the signature of a core equals the golden signature, the current interval of that core is error-free; otherwise, it is an erroneous interval. To record the results of the interval error detection for all cores, one bit is required per interval. If the bit is 1, at least one core is erroneous during the corresponding interval; if it is 0, all cores are error-free in that interval. This bit is called the EI index in this paper. If the EI index is 1, n bits, referred to as EI tags, are captured in the EI tag register to record the result of each core in the interval.
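The interval check described above can be modeled in software as follows. This is a minimal sketch, not the proposed hardware: the 8-bit signature width, the feedback taps, and the function names are hypothetical, and the golden signatures are computed here from golden data as a stand-in for the simulated or FPGA-generated ones.

```python
from typing import List, Sequence, Tuple

SIG_WIDTH = 8         # hypothetical signature width (stands in for L observed signals)
TAPS = (7, 5, 4, 3)   # hypothetical taps; including the MSB keeps the shift invertible


def misr_signature(words: Sequence[int], width: int = SIG_WIDTH,
                   taps: Sequence[int] = TAPS) -> int:
    """Compact one interval of `width`-bit debug words into a signature."""
    mask = (1 << width) - 1
    state = 0  # reset at the start of every M-cycle interval
    for word in words:
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask  # LFSR shift with feedback
        state ^= word & mask                      # XOR in the parallel inputs
    return state


def detect_error_intervals(cores: Sequence[Sequence[int]], golden: Sequence[int],
                           M: int) -> Tuple[List[int], List[List[int]]]:
    """Session 1 model: returns the EI index bits and the EI tags."""
    ei_index: List[int] = []
    ei_tags: List[List[int]] = []
    for i in range(len(golden) // M):
        golden_sig = misr_signature(golden[i * M:(i + 1) * M])
        # One EI tag bit per core: 1 if its signature differs from the golden one.
        tag = [int(misr_signature(core[i * M:(i + 1) * M]) != golden_sig) for core in cores]
        ei_index.append(int(any(tag)))  # 1 if at least one core is erroneous
        if any(tag):
            ei_tags.append(tag)  # the n-bit EI tag is stored only when the EI index is 1
    return ei_index, ei_tags


golden = [(c * 37) & 0xFF for c in range(12)]
cores = [list(golden) for _ in range(4)]
cores[1][5] ^= 0x04  # single-bit error in core 1, interval 1
print(detect_error_intervals(cores, golden, M=4))
```

Because the modeled shift is invertible, a single erroneous word always changes the final signature; with several errors per interval a real MISR can still alias, which this sketch does not rule out.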
If the EI index is 0, the EI tag does not need to be captured. After the capture process, the EI tags are stored in the DRAM before the detection process of the next interval ends. The next golden signature is then loaded, and the debug data are analyzed interval by interval in this manner over the N-cycle observation window. After the first session has completed, the EI tags in the DRAM and the EI indices in the register are transferred to the workstation (off-chip) and analyzed to generate the tag bits for the second session. This on-chip process is illustrated in Fig. 3.

Fig. 3. The on-chip error interval detection during the first debug session.

3.2 Debug Session 2: Detecting Error Clock Cycles

After the tag bits are transferred to the workstation, the off-chip debug process is performed by the debug software. Note that running the debug software is not a burden in terms of debug time, because it is performed concurrently with the on-chip debug experiments and with the transfer of debug data to and from the debug software. From the EI index and EI tags, an error interval matrix can be generated that indicates the erroneous intervals of each core. If n is 4 and there are 10 EI index bits, the matrix size is 4 × 10, as shown in Fig. 4.

Fig. 4. An example of the off-chip process before the second debug session.

During the second debug session, golden data are required to analyze the debug data of each clock cycle. However, storing all of the golden data in the DRAM becomes a tremendous burden as N or L increases. To solve this problem, this section introduces a technique for generating the golden data stream over the N cycles. It exploits the fact that, because all cores are identical, the error-free interval data of one core can serve as golden data for comparison with the erroneous data of the other cores. That is, the golden data stream can be generated by selecting, for each interval, debug data that the first session has guaranteed to be error-free. The golden data stream (GDS) selector is shown in Fig. 5.

Fig. 5. An example of the golden data stream selector.

To select the error-free data for each interval, the GDS tag is required. The algorithm for generating the GDS tag is given in Algorithm 1. The GDS tag consists of a GDS index and a core_sel. The GDS index determines how the golden data are selected, and the core_sel is the set of error-free cores from which the golden data are selected. First, the error-free cores are identified and stored in the core_sel when the EI index is not 0 (lines 3-9). Then, the algorithm determines the GDS index. If the GDS index is 0, the golden data can be selected from the core_sel.
If the GDS index is 1, all of the cores are erroneous during the interval, and the golden data must be taken from the DRAM. If the sum of the EI tag bits equals n, the GDS index is 1; otherwise, it is 0 (lines 10-13). After the GDS index is determined, the core_sel is chosen according to the following rules. First, a core_sel exists only when the EI index is 1 and the GDS index is 0. Then, the number of cores in the core_sel is calculated, and the core with the maximum number is selected. In the example of Fig. 4, intervals 1, 3, 4, 7, and 8 can select a core as the golden data. In this case, core 4 (core_sel = 11) is selected for intervals 1 and 3, and core 3 (core_sel = 10) for intervals 4, 7, and 8.

Algorithm 1. Generating GDS tag
Input: EI index, EI tag, n, N, M
Output: GDS index, core_sel
1  i = 0; k = 0;
2  for each (EI index(i)) do
3    for each (EI tag(k)) do
4      if (EI index(i) == 0) then
5        ignore(core_sel);
6      else
7        if (EI tag(k) == 0) then
8          store(core_sel, i, k);
9    end
10   if (sum of EI tag(k) == n) then
11     GDS index = 1;
12   else
13     GDS index = 0;
14 end
15 while (all cases of core_sel) do
16   if ((EI index == 1) && (GDS index == 0)) then
17     calculate_core_number(core_sel, i, k);
18     max_samecore(core_sel, i, k);
19     select_core(core_sel, i, k);
20   else
21     ignore(core_sel);
22 end
23 return GDS tag, core_sel;

After the GDS tag is generated, the second debug session is initiated to capture information about the error clock cycles of each core. First, the pre-calculated tag bits are uploaded to the DRAM. If there are intervals for which the GDS index is 1, additional golden data must also be uploaded, which increases the DRAM usage. Because the cores are identical, intervals in which the GDS index is 1 are associated with logical errors. However, such logical errors occur infrequently because, as discussed previously, they can be detected in the pre-silicon verification step.
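Algorithm 1 and the selection rules above can be sketched in Python as follows, under assumptions that go beyond the text: cores are numbered from 0, so the paper's core 4 is index 3 and core_sel is that index in binary (core 4 gives 11 and core 3 gives 10, as in the Fig. 4 example); "the core with the maximum number" is read as the highest-numbered error-free core; and the function name and tuple encoding are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One entry per interval: None (bypassed), (1, None) (golden data from DRAM),
# or (0, core_sel) with core_sel as a binary string.
GdsTag = Optional[Tuple[int, Optional[str]]]


def generate_gds_tags(ei_index: Sequence[int], ei_tags: Sequence[Sequence[int]],
                      n: int) -> List[GdsTag]:
    """Derive the GDS index and core_sel of every interval."""
    sel_bits = max(1, (n - 1).bit_length())  # core_sel width, e.g. 2 bits for n = 4
    tags = iter(ei_tags)  # EI tags exist only for intervals whose EI index is 1
    result: List[GdsTag] = []
    for idx in ei_index:
        if idx == 0:
            result.append(None)  # error-free interval: bypassed, no GDS tag needed
            continue
        tag = next(tags)
        if sum(tag) == n:
            result.append((1, None))  # all cores erroneous: golden data come from DRAM
            continue
        error_free = [k for k, bit in enumerate(tag) if bit == 0]
        core = max(error_free)  # assumption: pick the highest-numbered error-free core
        result.append((0, format(core, f"0{sel_bits}b")))
    return result


print(generate_gds_tags([0, 1, 0, 1, 1],
                        [[1, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]], n=4))
```

The EI tags are consumed in order because, as in the first session, they are stored only for intervals whose EI index is 1.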
Because such logical errors are rare, additional golden data are required only infrequently in most practical debug cases. The DRAM usage is discussed in Sections 4 and 5. When the debug process starts, the debug configuration is performed and the tag bits are uploaded to each tag register. Any additional golden data are uploaded alternately to the trace buffer and the shadow buffer, because they must be compared with the debug data over a consecutive sequence of clock cycles. In addition, these buffers are reused to capture error data in debug session 3. After the trigger point, the EI index is used to detect the erroneous intervals, and the EI tag selects the erroneous cores in real time. If the EI index is 0, the interval is bypassed. Otherwise, the data of the erroneous cores are compared with the golden data stream from the GDS selector, and the resulting comparison bits, referred to as EC tags in this paper, are captured in the on-chip EC tag buffer over M cycles. To accommodate the worst case, in which all cores are erroneous, the EC tag buffer holds n × M bits. Because the capture process is also performed over a consecutive sequence of clock cycles, a shadow EC tag buffer is required. When the EC tag buffer is full, its data are shifted to the shadow buffer and stored in the DRAM. The next tag bits are then loaded from the DRAM into the registers, and the debug data are analyzed in this manner over the N-cycle observation window. After the second session is completed, the tag bits stored in the DRAM are transferred to the workstation and analyzed for the third session. This on-chip process is illustrated in Fig. 6.

Fig. 6. The on-chip error clock cycle detection during the second debug session.

3.3 Debug Session 3: Capturing Error Data

After the EC tags are transferred to the debug software, they are regenerated so that the erroneous data of all cores can be detected cycle by cycle during the third session. First, the error cycle matrix is calculated from the EC tags and the error interval matrix. If n is 4 and the MISR interval length M is 5, a 4 × 5 error cycle matrix is generated for each interval. From this matrix, the EC index is generated, which indicates whether at least one core is erroneous in the corresponding clock cycle. If the EI index is 0, the interval is bypassed. If the EI index is 1, M EC index bits are required.
If the EC index is 1, an n-bit EC tag is required, which can be regenerated from the error cycle matrix. If the EC index is 0, the cycle is bypassed. In this way, the EC tags are regenerated, as illustrated in Fig. 7.

Fig. 7. The tag bit generation example for the third debug session.

After these tag bits are generated, the final debug session is performed. The pre-calculated tag bits are uploaded to the DRAM, and the debug configuration is performed. After the trigger point, the EI index is used to detect the erroneous intervals, the EC index detects the erroneous cycles, and the EC tag selects the erroneous cores in real time. The error data of all cores are captured in the trace buffer. When the trace buffer is full, the captured data are shifted to the shadow buffer and then stored in the DRAM. The next tag bits are then loaded from the DRAM, so that all error data are stored in the DRAM by the end of the debug run. After the debug session has completed, the stored data are transferred to the workstation and analyzed to find the root cause of the errors. This on-chip process is illustrated in Fig. 8.

Fig. 8. The on-chip error data capture during the third debug session.

3.4 Considerations of the DRAM-Based Debug Method

In the DRAM-based debug method, communication with the DRAM must be controlled during functional operation. Hence, several considerations must be addressed to satisfy the requirements of the proposed method. First, a partition of the DRAM must be reserved during the circuit design process to store the debug data. In Sections 4 and 5, the DRAM usage of the previous and proposed methods is analyzed using probability models and experimental results. With this information, the designer can determine the size of DRAM required for the post-silicon debug process for various debug cases. Second, sufficient bandwidth is required to shift the trace data from the shadow buffer to the DRAM at speed. As discussed in [17], [18], this is a reasonable assumption, because post-silicon debug is performed with real applications and memory resources are available. In addition, unlike in the previous work, the memory resource requirement of the proposed method does not grow as n increases. Third, the DRAM access operations from the DfD module must be scheduled to avoid interfering with the debug programs. In addition, the memory access latency during each interval must be within M cycles, because the debug process is performed periodically every M cycles. As a result, a sufficiently large trace buffer is required to perform the DRAM-based debug method.

Fig. 9. Operations of the proposed DfD modules during the three debug sessions. (a) The first session. (b) The second session. (c) The third session.

To show how the DfD modules communicate with the DRAM during the three debug sessions, their operations are illustrated in Fig. 9. In the first session, each interval is compacted into an MISR signature and compared with the golden signature, and the result is captured in the EI index and EI tag registers. The EI index is kept in its register and offloaded to the workstation after the debug session, because its data volume is small. The captured EI tag (n bits) is stored in the DRAM, and the next golden signature (L bits) is loaded from the DRAM, as shown in Fig. 9a. In the second session, the debug data from the erroneous cores during an erroneous interval are compared with the golden data stream and captured in the EC tag buffer, as shown in Fig. 9b. Because the capture process lasts M cycles, the shadow EC tag buffer is required.
The dotted lines indicate processes that are performed conditionally. If the EI index is 0, error clock cycle detection is unnecessary. If the EI index is 1, the EC tag is selectively captured according to the EI tag. When the EC tag buffer is full, the tag bits are shifted to the shadow buffer. The number of captured tag bits typically differs from one M-cycle interval to the next. However, the time at which the EC tag buffer becomes full can be predicted from the EI tag and EI index, which are calculated before the second session. Consequently, the DfD controller can determine the shift timing. The data are then stored in the DRAM. If additional golden data are required for the next interval, they are loaded alternately into the trace buffer and the shadow buffer, which accommodates cases in which additional golden data are required in consecutive intervals. Next, the EI tag is loaded if the EI index is 1. Finally, the GDS tag is loaded for the next interval. In the third session, the error data of all cores are captured in the trace buffer; for the reasons given above, the shadow buffer is required. As in the second session, the time at which the trace buffer becomes full is determined by the EI index, EC index, and EC tag. After the trace buffer fills up, the captured data are shifted to the shadow buffer and stored in the DRAM. Finally, for the next interval, the EC index and EC tag are loaded into their registers if the EI index is 1. Then, when the EC index is 1, the EC tag is applied to the debug data to select the erroneous cores. These tag bits are also loaded alternately to prevent the required data from overlapping, as shown in Fig. 9c. Because the store and load operations are irregular in the second and third sessions, a preserved time slot is required, which can be determined from the tag bits as discussed above. In addition, n is limited by the timing of the operations in the third session.
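The off-chip precomputation described in Sections 3.3 and 3.4 (deriving the EC index bits and EC tags from an error cycle matrix, and predicting the cycles at which the n × M-bit EC tag buffer fills so that the shift timing can be configured in advance) can be modeled in Python as follows. This is a sketch under hypothetical assumptions: the function names are illustrative, and the buffer is modeled at bit granularity with any overflow bits starting the next fill.

```python
from typing import List, Sequence, Tuple


def regenerate_ec(error_cycle_matrix: Sequence[Sequence[int]]) -> Tuple[List[int], List[List[int]]]:
    """EC index bits and n-bit EC tags of one erroneous interval.

    error_cycle_matrix[k][c] is 1 if core k was erroneous at cycle c.
    """
    n, M = len(error_cycle_matrix), len(error_cycle_matrix[0])
    ec_index: List[int] = []
    ec_tags: List[List[int]] = []
    for c in range(M):
        column = [error_cycle_matrix[k][c] for k in range(n)]
        ec_index.append(int(any(column)))
        if any(column):
            ec_tags.append(column)  # an EC tag is kept only when the EC index is 1
    return ec_index, ec_tags


def ec_tag_bits_per_cycle(ei_index: Sequence[int], ei_tags: Sequence[Sequence[int]],
                          M: int) -> List[int]:
    """Session 2: EC tag bits captured per cycle (one per erroneous core)."""
    tags = iter(ei_tags)
    bits: List[int] = []
    for idx in ei_index:
        per_cycle = sum(next(tags)) if idx else 0  # bypassed intervals capture nothing
        bits.extend([per_cycle] * M)
    return bits


def buffer_full_cycles(per_cycle_bits: Sequence[int], capacity: int) -> List[int]:
    """Cycles at which a buffer of `capacity` bits fills and must be shifted."""
    fill, events = 0, []
    for cycle, bits in enumerate(per_cycle_bits):
        fill += bits
        if fill >= capacity:
            events.append(cycle)
            fill -= capacity  # assumption: excess bits start the next buffer fill
    return events


n, M = 4, 4
bits = ec_tag_bits_per_cycle([1, 0, 1, 1], [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1]], M)
print(buffer_full_cycles(bits, capacity=n * M))
```

The same `buffer_full_cycles` helper could model the third session by feeding it the number of erroneous cores per cycle (the sum of each EC tag) with a capacity of M trace buffer entries.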
In the worst case, where all cores are erroneous in every cycle, the minimum preserved time is M/n cycles. Hence, n must not exceed M divided by the DRAM write access latency expressed in clock cycles. For example, if the average DRAM access latency is 25 ns and the circuit operates at a 1 GHz clock frequency (25 cycles), the maximum value of n is 20 when M is 512. To apply the proposed method to a practical multicore debug case, this limitation must be considered during the design process.

3.5 Hardware Architecture of the Proposed DfD

The hardware architecture of the proposed method is illustrated in Fig. 10. The debug configuration module is controlled through the trace port, e.g., JTAG, during the configuration step. This module controls the trigger point of the debug process and selects the CUDs and debug data. In addition, the MISR settings and the golden signature register (GS) are configured during the first session, and the tag bit registers are controlled for the second and third sessions. Note that these buffers and tag bit registers are reused across the three debug sessions to reduce the hardware area overhead. After the debug process starts, a finite state machine (FSM) controls the debug modules and the communication with the DRAM. In session 1, the FSM uses the interval counter to control the timing for capturing the EI tag and communicating with the DRAM controller. In session 2, the FSM controls the GDS selector using the GDS tag to generate the golden data stream. If additional golden data are required, the FSM alternately selects the trace buffer and the shadow buffer to load the data from the DRAM. In addition, the FSM determines when to capture the EC tag and shift it to the shadow EC tag. The time at which the EC tag becomes full can be calculated from the EI tag before the second session, and it is configured in the debug configuration step. After the data are shifted to the shadow EC tag, the FSM handles the communication with the DRAM controller as explained in Section 3.4. In session 3, the FSM controls the timing for capturing the erroneous data in the trace buffer, shifting them to the shadow buffer, and communicating with the DRAM. To support the operations of sessions 2 and 3, the preserved time area is controlled by the FSM and the interval counter. To store the data in the DRAM, an adapter is added in front of the shadow EC index, the shadow EC tag, and the shadow buffer. With this adapter, the debug data can be transferred to the memory interface even when the interface operates at a different frequency from the CUD.

Fig. 10. Hardware architecture of the proposed DfD for three debug sessions.

4 ANALYSIS OF THE DRAM-BASED METHOD EFFECTIVENESS WITH PROBABILITY MODELS

In this section, probability models are introduced to estimate the DRAM usage and debug time of both the previous and proposed methods. These models help the DfD designer assess various debug strategies and configure the DfD module. The notations used for the models are presented in Table 2. To ease comparison with the previous method, some variables used in [18] are reused in this paper.

TABLE 2
Notations for Probability Models
Name | Representation
$p_i$ | Error clock cycle probability of the ith core
$P_i$ | Error interval probability of the ith core
$X_i$ | Random variable for the number of erroneous intervals of the ith core
$Y$ | Random variable for the number of intervals in which all cores are erroneous
$Z$ | Random variable for the number of intervals in which at least one core has an error interval
$Z'$ | Random variable for the number of clock cycles in which at least one core has an error clock cycle
$DU_{\mathrm{prev(prop)},i}$ | DRAM usage of the ith session
$T_{\mathrm{prev(prop)}}$ | Debug time

Here, $P_i$ and the expectation of $X_i$, $E[X_i]$, are given by

$$P_i = 1 - (1 - p_i)^{M} \tag{1}$$

$$E[X_i] = P_i\,\frac{N}{M}. \tag{2}$$

Since the previous method stores debug data only for the error intervals, the DRAM usage of the ith session in the previous method is

$$DU_{\mathrm{prev},i} = E[X_i]\,(S + LM) + \frac{LN}{M}, \tag{3}$$

where S is the size of the time stamp that identifies the corresponding error interval, and LN/M is the volume of the golden MISR signatures (N/M signatures of L bits each) stored in the DRAM. In the proposed method, each session requires a different amount of DRAM. To calculate the DRAM usage of the proposed method, E[Y], E[Z], and E[Z'] are required. They are given by

$$E[Y] = \frac{N}{M}\prod_{i=1}^{n} P_i \tag{4}$$

$$E[Z] = \frac{N}{M}\left(1 - \prod_{i=1}^{n}(1 - P_i)\right) \tag{5}$$

$$E[Z'] = N\left(1 - \prod_{i=1}^{n}(1 - p_i)\right). \tag{6}$$

In the first session, golden MISR signatures are required, and the EI index and EI tag are stored in the DRAM every M cycles. Consequently, the DRAM usage for the first session is

$$DU_{\mathrm{prop},1} = \frac{LN}{M} + \frac{N}{M} + nE[Z], \tag{7}$$

where N/M is the EI index volume and nE[Z] is the EI tag volume. In the second session, the GDS tag, the additional golden data, the EI index, and the EI tag are uploaded to the DRAM to detect the erroneous clock cycles. The GDS tag and the additional golden data are

$$\text{GDS tag} = \frac{N}{M} + \log_2 n\,\bigl(E[Z] - E[Y]\bigr) \tag{8}$$

$$\text{additional golden data} = LM\,E[Y]. \tag{9}$$

The sum of the GDS tag, additional golden data, EI tag, and EI index is the total data volume uploaded before the second session starts. After the debug process starts, the EC tag is stored in the DRAM every M cycles. Hence, the DRAM usage for the second session is

$$DU_{\mathrm{prop},2} = \text{uploaded data volume} + M\,E[Z]\,(nP_i), \tag{10}$$

where the second term is the EC tag volume captured during the second session. To simplify the calculations, it is assumed that the error occurrences of all cores follow a binomial distribution during the interval and that all $P_i$ are equal. In the third session, the EI index, EC index, and EC tag are uploaded to the DRAM to detect the erroneous data of all cores. As discussed in Section 3.3, the EC tag is regenerated. After the debug process starts, the error data of all cores are stored in the DRAM every M cycles. As a result, the DRAM usage for the third session is

$$DU_{\mathrm{prop},3} = \frac{N}{M} + M\,E[Z] + nE[Z'] + \sum_{i=1}^{n} LNp_i. \tag{11}$$

As discussed in Section 3.4, a preserved area of the DRAM is required to perform the debug process, and it must be at least as large as the maximum DRAM usage across all sessions. That is, the DRAM usage of the previous and proposed methods is

$$DU_{\mathrm{prev}} = \max\left(DU_{\mathrm{prev},1}, DU_{\mathrm{prev},2}, \ldots, DU_{\mathrm{prev},n}\right) \tag{12}$$

$$DU_{\mathrm{prop}} = \max\left(DU_{\mathrm{prop},1}, DU_{\mathrm{prop},2}, DU_{\mathrm{prop},3}\right). \tag{13}$$

To calculate the debug execution time, both the on-chip sampling time and the communication time are required. The on-chip sampling time is the number of clock cycles that elapse from the trigger point until the debug session ends. In the previous method, the total on-chip sampling time is nN clock cycles, whereas the proposed method requires only 3N. In addition, if all cores are error-free, only N cycles are needed because only the first session is required. That is, the proposed method significantly reduces the on-chip sampling time. The communication time is the time during which the debug data are uploaded and offloaded through the trace port.
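The on-chip sampling comparison reduces to simple arithmetic. The sketch below uses a hypothetical function name and assumes N = 2,000,000 cycles (reading "2 million" literally); it shows that the previous method's sampling time grows with n while the proposed method's stays at 3N, so their ratio is n/3.

```python
def sampling_cycles(n: int, N: int, method: str, all_error_free: bool = False) -> int:
    """On-chip sampling cycles from the trigger point to the end of debugging.

    'prev': each of the n cores is observed in its own N-cycle session.
    'prop': three N-cycle sessions, or one if every core passes session 1.
    """
    if method == "prev":
        return n * N
    if method == "prop":
        return N if all_error_free else 3 * N
    raise ValueError(f"unknown method: {method}")


N = 2_000_000  # assumed value of the observation window
for n in (4, 16, 32):
    prev, prop = sampling_cycles(n, N, "prev"), sampling_cycles(n, N, "prop")
    print(n, prev, prop, round(prev / prop, 2))
```

For n = 4 the proposed method saves only a quarter of the sampling cycles, which is consistent with the later observation that its benefit grows with the number of CUDs.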
In the previous method, the golden MISR signatures are uploaded once, and the erroneous data and the corresponding time stamps S of each core are offloaded in every session. In the proposed method, on the other hand, the DRAM contents of all three sessions must be transferred; if all cores are error-free, only the DRAM contents of session 1 need to be transferred. $T_{\mathrm{prev}}$ and $T_{\mathrm{prop}}$ can be calculated as follows:

$$T_{\mathrm{prev}} = \frac{nN}{f_{\mathrm{CUD}}} + \frac{\frac{LN}{M} + \sum_{i=1}^{n} E[X_i]\,(S + LM)}{f_{\mathrm{trace\ port}}} \tag{14}$$

$$T_{\mathrm{prop}} = \begin{cases} \dfrac{N}{f_{\mathrm{CUD}}} + \dfrac{DU_{\mathrm{prop},1}}{f_{\mathrm{trace\ port}}}, & E[Z'] = 0 \\[2ex] \dfrac{3N}{f_{\mathrm{CUD}}} + \displaystyle\sum_{i=1}^{3} \dfrac{DU_{\mathrm{prop},i}}{f_{\mathrm{trace\ port}}}, & \text{otherwise.} \end{cases} \tag{15}$$

Fig. 11 shows the expected results obtained from the probability models when N is 2 million cycles, L is 32, M is 512, and the ratio α of $f_{\mathrm{CUD}}$ to $f_{\mathrm{trace\ port}}$ is 10. To simplify the estimation, all $p_i$ are assumed to be equal. First, Fig. 11a shows how the DRAM usage of each session of the proposed method changes with p when n = 16. In the very low error rate region, $DU_{\mathrm{prop},1}$ is the largest. As p increases, $DU_{\mathrm{prop},2}$ and $DU_{\mathrm{prop},3}$ also increase, and $DU_{\mathrm{prop},3}$ becomes the largest because the amount of stored error data increases. As p increases further, $DU_{\mathrm{prop},2}$ exceeds $DU_{\mathrm{prop},3}$, because the amount of additional golden data required increases exponentially with p. Although this increases the DRAM usage overhead, it can be avoided in practical debug cases: no additional golden data are needed as long as at least one core is error-free in each interval. This is demonstrated through various debug cases in Section 5. The DRAM usage ratio and the debug time improvement ratio are shown for various n and p in Figs. 11b and 11c. The DRAM usage ratio is $DU_{\mathrm{prop}}/DU_{\mathrm{prev}}$, and the time improvement is calculated as $(T_{\mathrm{prev}} - T_{\mathrm{prop}})/T_{\mathrm{prev}} \times 100$. For most values of p, the proposed method uses less DRAM than the previous one because it selectively captures the erroneous data using small tag bits.
After p exceeds 0.1 percent, the DRAM usage for n = 4 exceeds that for n = 8 or n = 16, because the additional golden data volume grows with E[Y], which is larger when n is small. In the high error rate region (around 1 percent), the DRAM usage ratio increases and reaches 2 when n is 32. The ratio grows further as n increases; for example, it is 2.7 when n is 64 and 4.6 when n is 128. However, this growth is reasonable in view of the increased n. In addition, this memory is part of the on-chip DRAM and does not require any additional hardware. Consequently, the DfD designer can determine the trade-off between n and the DRAM usage during the design process. The debug time improvement ratio is strongly related to n. First, the improvement ratio is constant in the error-free region because the proposed method requires only one session and the required data volume is deterministic. When n is 4 and p is very low, the improvement ratio is approximately 20 to 40 percent. The ratio increases as p increases because the debug time overhead of the proposed method grows more slowly than that of the previous method. The ratio then decreases when p is high, because additional golden data are required more often as p increases. As n increases, the baseline of the improvement ratio rises and its range of fluctuation narrows. For example, the improvement ratio always exceeds 90 percent when n is 32, as shown in Fig. 11c. Fig. 11d shows the optimal values of M with respect to the debug time. These are obtained by setting $\partial T_{\mathrm{prop}}/\partial M$ to zero and restricting M to 256, 512, 1,024, or 2,048 to simplify the implementation. When p is low, the debug time is shortest when M is 2,048. This is because M is the interval length for error detection: when there is virtually no error data, almost all intervals are error-free, and a longer interval reduces the golden signature and EI index volumes, both of which are proportional to N/M.
As p increases, shorter intervals localize the error data more precisely within the observation window. That is, a smaller trace buffer yields a larger debug time improvement as p increases. However, M is bounded below in the DRAM-based debug method, as discussed in Section 3.4. Therefore, the designer should consider this trade-off to optimize the debug time, hardware area overhead, and memory resources.
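To make the probability models concrete, the following sketch evaluates Eqs. (1) to (15) numerically. Assumptions not fixed by the text are labeled in the code: the time-stamp size S = 12 bits is hypothetical, N is taken as 2,000,000 cycles, all DRAM volumes are counted in bits, and the trace port is assumed to move one bit per trace-port clock, so that c CUD cycles plus b transferred bits cost c + αb CUD cycles. It is a sketch of the equations as written here, not the authors' evaluation script, and its numbers need not match Fig. 11.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Params:
    N: float = 2_000_000  # observation window in cycles (assumed value)
    M: int = 512          # interval length = trace buffer depth
    L: int = 32           # debug data width in bits
    alpha: float = 10.0   # f_CUD / f_trace_port
    S: int = 12           # time-stamp bits: hypothetical, not given in the text


def interval_prob(p: float, M: int) -> float:
    """Eq. (1): probability that an M-cycle interval contains an error."""
    return 1.0 - (1.0 - p) ** M


def expectations(ps: Sequence[float], pr: Params) -> Tuple[List[float], List[float], float, float, float]:
    """Eqs. (1), (2), (4)-(6): P_i, E[X_i], E[Y], E[Z], E[Z']."""
    Ps = [interval_prob(p, pr.M) for p in ps]
    iv = pr.N / pr.M  # number of intervals
    EX = [P * iv for P in Ps]
    EY = iv * math.prod(Ps)
    EZ = iv * (1.0 - math.prod(1.0 - P for P in Ps))
    EZp = pr.N * (1.0 - math.prod(1.0 - p for p in ps))
    return Ps, EX, EY, EZ, EZp


def dram_usage_prev(ps: Sequence[float], pr: Params) -> float:
    """Eqs. (3) and (12), in bits."""
    _, EX, _, _, _ = expectations(ps, pr)
    return max(e * (pr.S + pr.L * pr.M) + pr.L * pr.N / pr.M for e in EX)


def dram_usage_prop(ps: Sequence[float], pr: Params) -> Tuple[float, float, float]:
    """Eqs. (7)-(11), in bits, for sessions 1 to 3."""
    n = len(ps)
    Ps, _, EY, EZ, EZp = expectations(ps, pr)
    iv = pr.N / pr.M
    du1 = pr.L * iv + iv + n * EZ                 # Eq. (7)
    gds = iv + math.log2(n) * (EZ - EY)           # Eq. (8)
    extra = pr.L * pr.M * EY                      # Eq. (9)
    upload = gds + extra + n * EZ + iv            # plus EI tag and EI index
    du2 = upload + pr.M * EZ * sum(Ps)            # Eq. (10); sum(Ps) = nP_i for equal P_i
    du3 = iv + pr.M * EZ + n * EZp + sum(pr.L * pr.N * p for p in ps)  # Eq. (11)
    return du1, du2, du3


def debug_times(ps: Sequence[float], pr: Params) -> Tuple[float, float]:
    """Eqs. (14)-(15) in CUD cycles, assuming 1 bit per trace-port cycle."""
    n = len(ps)
    _, EX, _, _, EZp = expectations(ps, pr)
    t_prev = n * pr.N + pr.alpha * (pr.L * pr.N / pr.M
                                    + sum(e * (pr.S + pr.L * pr.M) for e in EX))
    du = dram_usage_prop(ps, pr)
    if EZp == 0:
        t_prop = pr.N + pr.alpha * du[0]          # only session 1 is needed
    else:
        t_prop = 3 * pr.N + pr.alpha * sum(du)
    return t_prev, t_prop


def improvement(ps: Sequence[float], pr: Params) -> float:
    """Debug time improvement ratio (T_prev - T_prop) / T_prev x 100."""
    t_prev, t_prop = debug_times(ps, pr)
    return (t_prev - t_prop) / t_prev * 100


pr = Params()
for p in (0.0, 1e-5, 1e-3):
    ps = [p] * 16
    ratio = max(dram_usage_prop(ps, pr)) / dram_usage_prev(ps, pr)  # Eq. (13) / Eq. (12)
    print(f"p={p:g}  DU ratio={ratio:.2f}  improvement={improvement(ps, pr):.1f}%")
```

Sweeping M over 256, 512, 1,024, and 2,048 with `Params(M=...)` and picking the smallest `debug_times(...)[1]` would mimic the discrete optimization behind Fig. 11d under the same assumptions.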

Fig. 11. Expected results obtained from the probability models with N = 2 million cycles, L = 32, M = 512, and α = 10. (a) DRAM usage with n = 16. (b) DRAM usage ratio with various n. (c) Debug time improvement ratio with various n. (d) Optimal M with various n.

5 EXPERIMENTAL RESULTS

This section discusses the experimental results on DRAM usage, debug time, and hardware area overhead to illustrate the benefits of the proposed method in multicore debug cases. The experimental results are presented for an ARM-based processor design [29] and the CPU cores of an OpenSPARC T2 [21]. First, each debug module is designed as a Verilog RTL model and synthesized using the TSMC 130 nm standard cell library [30] to estimate its area. To perform the DRAM-based debug methods, the DRAM is modeled as a Verilog module as in [18]. Faults are randomly injected into the circuits to produce misbehavior at various error rates. The ARM-based design uses a 32-bit data bus, and the CPU core of the OpenSPARC T2 uses a 64-bit data bus. The data bus is used as the debug data stored in the DRAM to compare the performance of the previous and proposed methods.

5.1 DRAM Usage and Debug Time

Table 3 shows the DRAM usage and debug time of [18] and the proposed method for the debug experiment in which N is 2 million cycles, M is 512, and α is 10, with different numbers of CUDs and error rates. The second column gives the error rate, i.e., how many errors are injected in the experiments. In addition, the error rates of the cores can differ even though the cores are identical, because electrical errors occur only under particular electrical conditions. To model this, a Gaussian distribution is used to generate different error rates for each core.
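The per-core error-rate generation and random fault injection described above could be modeled as follows. This is an assumption-laden sketch: the text does not give σ or the exact fault model, so the sigma value, the single-bit-flip fault model, clipping negative draws to zero, and the function names are all hypothetical.

```python
import random
from typing import List


def per_core_error_rates(n: int, mean_rate: float, sigma: float, seed: int = 0) -> List[float]:
    """Draw one error rate per core from N(mean_rate, sigma^2), clipped to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean_rate, sigma))) for _ in range(n)]


def inject_errors(words: List[int], rate: float, width: int, rng: random.Random) -> List[int]:
    """Flip one random bit of each debug word with probability `rate` per cycle."""
    out = []
    for w in words:
        if rng.random() < rate:
            w ^= 1 << rng.randrange(width)
        out.append(w)
    return out


# Usage: 8 cores, mean error rate 0.01 percent per cycle, hypothetical sigma.
rates = per_core_error_rates(8, mean_rate=1e-4, sigma=2e-5, seed=1)
rng = random.Random(2)
trace = [0] * 100_000  # placeholder 32-bit debug words
faulty = inject_errors(trace, rates[0], 32, rng)
print(sum(a != b for a, b in zip(trace, faulty)))  # number of corrupted cycles
```

Seeding each generator keeps the injected fault pattern repeatable across the three debug sessions, which a session-based method needs because every session must observe the same misbehavior.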
The standard deviation σ models the process variation across the cores, and the error rate of each core varies with σ. In this paper, σ is set to [...] The third and fourth columns compare the previous method [18] with the proposed method. The notation Seq indicates the previous method [18], in which each core is debugged in sequence. In addition, Equations (12) to (15) are used to calculate the DRAM usage and debug time for the experimental results. As shown in Table 3, the proposed method significantly reduces the debug time compared to [18] because it reduces both the on-chip sampling time and the communication time. When the error rate is low, the debug time is dominated by the sampling time. In this case, the debug time of the previous and proposed methods is relatively small and simply proportional to the number of debug sessions and CUDs.

TABLE 3
DRAM Usage and Debug Time Comparison for Different Numbers of CUDs and Error Rates
Columns: number of CUDs (n) | error rate (%) | DRAM usage (MB): Seq [18], Prop | debug time (million cycles): Seq [18], Prop
Row groups: ARM-based design [29]; OpenSPARC T2 [21]

However, the communication time significantly affects the debug time as the error rate increases, because the amount of stored debug data increases and the trace port, which transfers the debug data from the CUD to the external workstation, is relatively slow. In this case, the proposed method reduces the communication time compared to the previous method because it requires only three sessions and the amount of required data is significantly reduced through the tag bits (e.g., EI index, EI tag, EC index, EC tag). Furthermore, the proposed method's advantage in required data volume grows as n increases. Hence, the total debug data volume communicated between the CUDs and the workstation is much lower than in the previous method, and the debug time is reduced accordingly. The DRAM usage of the proposed method is always less than that of the previous method, except in the debug case where the error rate is only [...] percent. As discussed in Section 4, the maximum DRAM usage ratio is only about two when the error rate is very high and n is 32, which is a reasonable increase considering the reduction in debug time.

Fig. 12. Debug time improvement ratio with different numbers of CUDs and error rates.

Fig. 12 shows the debug time improvement ratio with different numbers of CUDs and error rates for the debug experiment in which N = 2 million cycles and M = 512, with a uniform error rate in the ARM-based design. When n is 4, the improvement ratio decreases as p increases.
This is because the proposed method always requires three sessions; that is, its time benefit becomes more pronounced as the number of CUDs increases. Fig. 13 shows the experimental debug time for different values of M in the debug cases where N is 2 million cycles, α is 10, and the error rate is 0.01 percent in the ARM-based design. As discussed in Section 4, the DRAM usage and debug time are smallest when M is 256. However, a prerequisite of the DRAM-based debug method is that M be larger than the memory access latency so that the DfD module can communicate with the DRAM. In addition, the write access latency must be within M/n cycles to handle the worst case in the three sessions of the proposed method. That is, the number of CUDs is determined by the trade-off between the memory resources and the trace buffer size. Although the proposed method limits n compared to the previous method, the debug time is still significantly reduced, as shown in Fig. 13.

Fig. 13. The experimental results of debug time with different trace buffer depths.
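The constraint that the write access latency fit within M/n cycles gives an upper bound on n for each trace buffer depth. A minimal sketch, assuming the latency is rounded up to whole CUD cycles (the function name `max_cores` is hypothetical); with the 25 ns, 1 GHz example of Section 3.4 it reproduces n = 20 for M = 512.

```python
import math


def max_cores(M: int, latency_ns: float, f_ghz: float) -> int:
    """Largest n satisfying latency_cycles <= M / n (worst case of the third session)."""
    latency_cycles = math.ceil(latency_ns * f_ghz)  # DRAM write latency in CUD cycles
    return M // latency_cycles


for M in (256, 512, 1024, 2048):
    print(M, max_cores(M, latency_ns=25, f_ghz=1.0))
```

For the four values of M considered in Section 4 this gives 10, 20, 40, and 81 cores, which makes the trade-off explicit: doubling M roughly doubles the admissible n but also doubles the trace buffer.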

TABLE 4
Hardware Area Overhead Comparison
Hardware area in two-input NAND (NAND2) equivalents; values in parentheses are the DfD overhead relative to the CUD area.
n | CUD: spc(s) | DfD: Seq [18] | DfD: Multi [18] | DfD: Prop
 |  | (0.30%) | (1.34%) | (1.49%)
 |  | (0.16%) | (1.47%) | (1.53%)
 |  | (0.08%) | (1.47%) | (1.64%)
32 |  | (0.05%) | (1.54%) | (1.75%)

Fig. 14. The results with different debug methods and error rates. (a) DRAM usage. (b) Debug time.

Fig. 14 illustrates the DRAM usage and debug time for different error rates in the debug cases where N is 2 million cycles, M is 512, n is 8, and α is 10. In this figure, the notation Multi indicates the previous method [18] applied so that multiple cores are debugged at the same time. To debug n cores at the same time with the previous method, 2n trace buffers are required. Furthermore, the required bandwidth for communicating with the DRAM increases. Nonetheless, these limitations are assumed to be acceptable in this simulation. As shown in Fig. 14a, the DRAM usage of Multi grows faster than that of both Seq and the proposed method because the error-suspect data of n cores during the intervals are stored in the DRAM at the same time. For example, if all intervals of all cores are erroneous, the stored data volume is nLM for each interval. In contrast, the DRAM usage of the proposed method is less than that of Multi because only the actual error data, identified over the three sessions, are stored in the DRAM. With respect to debug time, Multi requires less time than Seq and the proposed method when the error rate is very low, because the on-chip sampling time matters more than the communication time at very low error rates. However, the communication time dominates as the error rate increases. As a result, the proposed method reduces the debug time much more than Multi as the error rate increases, as shown in Fig. 14b.
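The buffer requirements compared in this section can be tallied with a short sketch. The formulas follow the text (two trace buffers of LM bits for Seq, 2n trace buffers for Multi, and 2(L + n + 1)M bits for the proposed method, i.e., double-buffered trace, EC tag, and EC index buffers, as stated for Fig. 15); the function names and the example values L = 32 and M = 512 are illustrative choices.

```python
def onchip_buffer_bits(method: str, n: int, L: int, M: int) -> int:
    """Total on-chip buffer bits, counting each buffer and its shadow copy."""
    if method == "Seq":
        return 2 * L * M            # trace buffer + shadow buffer
    if method == "Multi":
        return 2 * n * L * M        # one trace/shadow pair per core
    if method == "Prop":
        return 2 * (L + n + 1) * M  # trace (LM) + EC tag (nM) + EC index (M), doubled
    raise ValueError(f"unknown method: {method}")


def multi_worst_case_bits_per_interval(n: int, L: int, M: int) -> int:
    """Multi stores every core's full interval when all cores are erroneous."""
    return n * L * M


L, M = 32, 512
for n in (4, 8, 16, 32):
    # Normalize to the trace buffer size LM, as done for Fig. 15.
    sizes = {m: onchip_buffer_bits(m, n, L, M) / (L * M) for m in ("Seq", "Multi", "Prop")}
    print(n, sizes)
```

Because the EC tag and EC index buffers are only n bits and 1 bit wide, the proposed method's buffer cost grows by 2M bits per added core, whereas Multi grows by 2LM bits per core.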
5.2 Hardware Area Overhead

Table 4 compares the hardware area of the debug modules of Seq, Multi, and the proposed method for different n in terms of two-input NAND (NAND2) gate equivalents. The modules are designed in RTL and synthesized using the TSMC 130 nm standard cell library [30]. The results cover only the logic area and do not account for the on-chip buffers. The spc is the SPARC processor core module, and it is used to analyze the hardware overhead in a real multicore system. In the case of Seq, an MISR set, an adapter, a golden signature register, counters, and control logic are required to perform the debug experiment. In addition, the absolute hardware overhead of Seq remains almost the same as n increases because the cores are debugged in sequence. On the other hand, Multi and the proposed method require much more hardware than Seq because multiple cores are debugged at the same time. The proposed debug module consists of n MISR sets, a comparator, an adapter, a golden signature register, the EI tag, EI index, GDS tag, and control logic with the FSM. Because of these additional modules and the more complicated control logic, its area overhead is larger than that of Seq. However, it is only slightly larger than that of Multi, because Multi also requires more hardware modules than Seq does. Furthermore, the hardware overhead of the proposed method is about 1.75 percent when there are 32 cores. This result indicates that the hardware area overhead of the proposed method is negligible compared to the area of a multicore processor system. Fig. 15 shows the expected number of required on-chip buffers for different numbers of cores, obtained by setting the trace buffer size (LM) to 1. Seq requires only 2 trace buffers. However, if multiple cores are debugged with the previous method (Multi), the number of required trace buffers grows as 2n. The proposed method requires three kinds of on-chip buffers: 2 trace buffers, 2 EC tag buffers, and 2 EC index buffers. As discussed in Section 3, the size of the EC tag buffer is nM and the size of the EC index buffer is M. That is, the on-chip buffer size of the proposed method is 2(L + n + 1)M, which is a reasonable overhead compared to Multi.

Fig. 15. Expected results of the number of required on-chip buffers with different numbers of cores.

6 CONCLUSION

In this paper, a new DRAM-based post-silicon debug method for multiple identical cores is proposed to significantly reduce the total debug time across various debug cases. Unlike previous methods, which incur time or on-chip buffer overhead when debugging multicore systems, the proposed method detects the error clock cycles of all cores in only three sessions: error interval detection, error clock cycle detection, and error data capture. This method accelerates error identification as the number of CUDs increases. The hardware area and DRAM data overhead of the proposed method are negligible compared to the overhead increase of the previous methods when multiple cores are debugged at the same time. In addition, the proposed method is compatible with other debug techniques, e.g., debugging communication logic. As a result, the proposed method is suitable for practical debug cases of SoCs or 3D-ICs that include large on-chip memories, such as eDRAM or 3D DRAM.

ACKNOWLEDGMENTS

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2015R1A2A1A ). Sungho Kang is the corresponding author.

REFERENCES

[1] M. Abramovici, P. Bradley, J. Dwarakanath, P. Levin, G. Memmi, and D.
Miller, A reconfigurable design-for-debug infrastructure for SoCs, in Proc. ACM/IEEE Des. Autom. Conf., 2006, pp.
[2] X. Liu, X. Liu, and Q. Xu, On signal tracing for debugging speedpath-related electrical errors in post-silicon validation, in Proc. IEEE Asian Test Symp., Dec. 2010, pp.
[3] M. H. Neishaburi and Z. Zilic, On a new mechanism of trigger generation for post-silicon debugging, IEEE Trans. Comput., vol. 63, no. 9, pp., Sep.
[4] S.-B. Park, T. Hong, and S. Mitra, Post-silicon bug localization in processors using instruction footprint recording and analysis (IFRA), IEEE Trans. Comput.-Aided Des., vol. 28, no. 10, pp., Oct.
[5] K. Chang, I. L. Markov, and V. Bertacco, Automating post-silicon debugging and repair, in Proc. Int. Conf. Comput.-Aided Des., Nov. 2007, pp.
[6] A. Nahir, et al., Bridging pre-silicon verification and post-silicon validation, in Proc. ACM/IEEE Des. Autom. Conf., 2010, pp.
[7] X. Gu, W. Wang, K. Ki, H. Kim, and S. Chung, Re-using DFT logic for functional and silicon debugging test, in Proc. IEEE Int. Test Conf., Oct. 2002, pp.
[8] B. Vermeulen, T. Waayers, and S. K. Goel, Core-based scan architecture for silicon debug, in Proc. IEEE Int. Test Conf., Oct. 2002, pp.
[9] R. Datta, A. Sebastine, and J. A. Abraham, Delay fault testing and silicon debug using scan chains, in Proc. IEEE Eur. Test Symp., May 2004, pp.
[10] K. J. Lee, S. Y. Liang, and A. Su, A low-cost SOC debug platform based on on-chip test architectures, in Proc. SOC Conf., Sep. 2009, pp.
[11] D. Josephson, The manic depression of microprocessor debug, in Proc. IEEE Int. Test Conf., Oct. 2002, pp.
[12] H. F. Ko, A. B. Kinsman, and N. Nicolici, Design-for-debug architecture for distributed embedded logic analysis, IEEE Trans. VLSI Syst., vol. 19, no. 8, pp., Aug.
[13] E. A. Daoud and N. Nicolici, On using lossy compression for repeatable experiments during silicon debug, IEEE Trans. Comput., vol. 60, no. 7, pp., Jul.
[14] H. Oh, T. Han, I. Choi, and S.
Kang, An on-chip error detection method to reduce the post-silicon debug time, IEEE Trans. Comput., vol. 66, no. 1, pp., Jan.
[15] J.-S. Yang and N. Touba, Improved trace buffer observation via selective data capture using 2-D compaction for post-silicon debug, IEEE Trans. VLSI Syst., vol. 21, no. 2, pp., Feb.
[16] W. Jung, H. Oh, D. Kang, and S. Kang, A 2-D compaction method using macro block for post-silicon validation, in Proc. Int. SoC Des. Conf., pp., Nov.
[17] Feb. [Online]. Available: issue1.pdf
[18] S. Deutsch and K. Chakrabarty, Massive signal tracing using on-chip DRAM for in-system silicon debug, in Proc. IEEE Int. Test Conf., 2014, pp.
[19] G. Giles, J. Wang, A. Sehgal, K. J. Balakrishnan, and J. Wingfield, Test access mechanism for multiple identical cores, in Proc. IEEE Int. Test Conf., Oct. 2008, pp.
[20] M. Sharma, A. Dutta, W.-T. Cheng, B. Benware, and M. Kassab, A novel test access mechanism for failure diagnosis of multiple isolated identical cores, in Proc. IEEE Int. Test Conf., Sep. 2011, pp.
[21] T. Han, I. Choi, and S. Kang, Majority-based test access mechanism for parallel testing of multiple identical cores, IEEE Trans. Very Large Scale Integr. Syst., vol. 23, no. 8, pp., Aug.
[22] D. Wendel, et al., The Power7™ processor SoC, in Proc. Int. Conf. IC Des. Technol., 2010, pp.
[23] H. Sun, et al., 3D DRAM design and application to 3D multicore systems, IEEE Des. Test Comput., vol. 26, no. 5, pp., Sep.
[24] H. F. Ko and N. Nicolici, Combining scan and trace buffers for enhancing real-time observability in post-silicon debugging, in Proc. IEEE Eur. Test Symp., Jul. 2010, pp.
[25] S. Sarangi, B. Greskamp, and J. Torrellas, CADRE: Cycle-accurate deterministic replay for hardware debugging, in Proc. IEEE Int. Conf. Dependable Syst. Netw., Jun. 2006, pp.
[26] I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin, System-level validation of the Intel Pentium M processor, Intel Technol. J., vol. 7, no. 2, pp., May
[27] B. Quinton and S.
Wilton, Programmable logic core based post-silicon debug for SoCs, in Proc. 4th IEEE Silicon Debug Diagnosis Workshop, May
[28] M. Fujita and H. Yoshida, Post-silicon patching for verification/debugging with high-level models and programmable logic, in Proc. 17th Asia South Pacific Des. Autom. Conf., 2012, pp.
[29] Dec. 23, [Online]. Available: amber
[30] Apr. 03, [Online]. Available: downloads/tsmc_library_request/sc_brochure_9.pdf
Hyunggoy Oh received the BS degree in electrical and electronics engineering from Yonsei University, Seoul, Korea, in 2014, where he is currently working toward the MS and PhD degrees in electrical and electronics engineering. His current research interests include design for testability/debug and system-level test and validation.

Inhyuk Choi received the BS degree in electrical and electronics engineering from Yonsei University, Seoul, Korea, in 2009, where he is currently working toward the MS and PhD degrees in the same field. His current research interests include SoC design, design for testability, and system-level test and validation.
Sungho Kang received the BS degree in control and instrumentation engineering from Seoul National University, Seoul, Korea, and the MS and PhD degrees in electrical and computer engineering from the University of Texas at Austin, Austin, Texas. He was a research scientist with the Schlumberger Laboratory for Computer Science, Schlumberger Inc., Austin, Texas, and a senior staff engineer with Semiconductor Systems Design Technology, Motorola Inc., Austin, Texas. Since 1994, he has been a professor in the Department of Electrical and Electronic Engineering, Yonsei University, Seoul. His current research interests include very-large-scale integration/system-on-chip/3D IC design and testing, design-for-testability, built-in self-test, defect diagnosis, and design-for-manufacturability. He is a senior member of the IEEE.


Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Hamid Shojaei, and Azadeh Davoodi University of Wisconsin 1415 Engineering Drive, Madison WI 53706 Email: {shojaei,

More information

TEST cost in the integrated circuit (IC) industry has

TEST cost in the integrated circuit (IC) industry has IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 8, AUGUST 2014 1219 Utilizing ATE Vector Repeat with Linear Decompressor for Test Vector Compression Joon-Sung

More information

On Multiplexed Signal Tracing for Post-Silicon Debug

On Multiplexed Signal Tracing for Post-Silicon Debug On Multiplexed Signal Tracing for Post-Silicon Debug iao Liu and Qiang u Department of Computer Science & Engineering The Chinese University of Hong Kong, Shatin, NT, Hong Kong Email: {xliu,qxu}@csecuhkeduhk

More information

Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination

Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination Reconfigurable Linear Decompressors Using Symbolic Gaussian Elimination Kedarnath J. Balakrishnan and Nur A. Touba Computer Engineering Research Center University of Texas at Austin {kjbala,touba}@ece.utexas.edu

More information

RISC-V Core IP Products

RISC-V Core IP Products RISC-V Core IP Products An Introduction to SiFive RISC-V Core IP Drew Barbier September 2017 drew@sifive.com SiFive RISC-V Core IP Products This presentation is targeted at embedded designers who want

More information

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience H. Krupnova CMG/FMVG, ST Microelectronics Grenoble, France Helena.Krupnova@st.com Abstract Today, having a fast hardware

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

I/O Management and Disk Scheduling. Chapter 11

I/O Management and Disk Scheduling. Chapter 11 I/O Management and Disk Scheduling Chapter 11 Categories of I/O Devices Human readable used to communicate with the user video display terminals keyboard mouse printer Categories of I/O Devices Machine

More information

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured System Performance Analysis Introduction Performance Means many things to many people Important in any design Critical in real time systems 1 ns can mean the difference between system Doing job expected

More information

Automated Data Analysis Solutions to Silicon Debug

Automated Data Analysis Solutions to Silicon Debug Automated Data Analysis Solutions to Silicon Debug Yu-Shen Yang Dept. of ECE University of Toronto Toronto, M5S 3G4 yangy@eecg.utronto.ca Nicola Nicolici Dept. of ECE McMaster University Hamilton, L8S

More information

Testable SOC Design. Sungho Kang

Testable SOC Design. Sungho Kang Testable SOC Design Sungho Kang 2001.10.5 Outline Introduction SOC Test Challenges IEEE P1500 SOC Test Strategies Conclusion 2 SOC Design Evolution Emergence of very large transistor counts on a single

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

3D Memory Formed of Unrepairable Memory Dice and Spare Layer

3D Memory Formed of Unrepairable Memory Dice and Spare Layer 3D Memory Formed of Unrepairable Memory Dice and Spare Layer Donghyun Han, Hayoug Lee, Seungtaek Lee, Minho Moon and Sungho Kang, Senior Member, IEEE Dept. Electrical and Electronics Engineering Yonsei

More information

AS FEATURE sizes shrink and designs become more complex,

AS FEATURE sizes shrink and designs become more complex, IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 10, OCTOBER 2004 1447 Identification of Error-Capturing Scan Cells in Scan-BIST With Applications to System-on-Chip

More information

Logic Bug Detection and Localization Using Symbolic Quick Error Detection

Logic Bug Detection and Localization Using Symbolic Quick Error Detection Logic Bug Detection and Localization Using Symbolic Quick Error Detection 1 Logic Bug Detection and Localization Using Symbolic Quick Error Detection Eshan Singh, David Lin, Clark Barrett, and Subhasish

More information

Verifying the Correctness of the PA 7300LC Processor

Verifying the Correctness of the PA 7300LC Processor Verifying the Correctness of the PA 7300LC Processor Functional verification was divided into presilicon and postsilicon phases. Software models were used in the presilicon phase, and fabricated chips

More information

Design For High Performance Flexray Protocol For Fpga Based System

Design For High Performance Flexray Protocol For Fpga Based System IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) e-issn: 2319 4200, p-issn No. : 2319 4197 PP 83-88 www.iosrjournals.org Design For High Performance Flexray Protocol For Fpga Based System E. Singaravelan

More information

NEW ALGORITHMS AND ARCHITECTURES FOR POST-SILICON VALIDATION

NEW ALGORITHMS AND ARCHITECTURES FOR POST-SILICON VALIDATION NEW ALGORITHMS AND ARCHITECTURES FOR POST-SILICON VALIDATION NEW ALGORITHMS AND ARCHITECTURES FOR POST-SILICON VALIDATION BY HO FAI KO, B.Eng. & Mgt., M.A.Sc. APRIL 2009 a thesis Submitted to the School

More information

A New Scan Chain Fault Simulation for Scan Chain Diagnosis

A New Scan Chain Fault Simulation for Scan Chain Diagnosis JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.7, NO.4, DECEMBER, 2007 221 A New Scan Chain Fault Simulation for Scan Chain Diagnosis Sunghoon Chun, Taejin Kim, Eun Sei Park, and Sungho Kang Abstract

More information

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 1, JANUARY 2003 1 Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of

More information

Reducing Control Bit Overhead for X-Masking/X-Canceling Hybrid Architecture via Pattern Partitioning

Reducing Control Bit Overhead for X-Masking/X-Canceling Hybrid Architecture via Pattern Partitioning Reducing Control Bit Overhead for X-Masking/X-Canceling Hybrid Architecture via Pattern Partitioning Jin-Hyun Kang Semiconductor Systems Department Sungkyunkwan University Suwon, Korea, 16419 kangjin13@skku.edu

More information

A CAN-Based Architecture for Highly Reliable Communication Systems

A CAN-Based Architecture for Highly Reliable Communication Systems A CAN-Based Architecture for Highly Reliable Communication Systems H. Hilmer Prof. Dr.-Ing. H.-D. Kochs Gerhard-Mercator-Universität Duisburg, Germany E. Dittmar ABB Network Control and Protection, Ladenburg,

More information

4 DEBUGGING. In This Chapter. Figure 2-0. Table 2-0. Listing 2-0.

4 DEBUGGING. In This Chapter. Figure 2-0. Table 2-0. Listing 2-0. 4 DEBUGGING Figure 2-0. Table 2-0. Listing 2-0. In This Chapter This chapter contains the following topics: Debug Sessions on page 4-2 Code Behavior Analysis Tools on page 4-8 DSP Program Execution Operations

More information

Parallel Simulation Accelerates Embedded Software Development, Debug and Test

Parallel Simulation Accelerates Embedded Software Development, Debug and Test Parallel Simulation Accelerates Embedded Software Development, Debug and Test Larry Lapides Imperas Software Ltd. larryl@imperas.com Page 1 Modern SoCs Have Many Concurrent Processing Elements SMP cores

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Joseph Yiu, ARM Ian Johnson, ARM January 2013 Abstract: While the majority of Cortex -M processor-based microcontrollers are

More information

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009 Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Cache 11232011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Memory Components/Boards Two-Level Memory Hierarchy

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY Saroja pasumarti, Asst.professor, Department Of Electronics and Communication Engineering, Chaitanya Engineering

More information

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 5, MAY 2014 1061 A Cool Scheduler for Multi-Core Systems Exploiting Program Phases Zhiming Zhang and J. Morris Chang, Senior Member, IEEE Abstract Rapid growth

More information

Fault Tolerant Parallel Filters Based on ECC Codes

Fault Tolerant Parallel Filters Based on ECC Codes Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 11, Number 7 (2018) pp. 597-605 Research India Publications http://www.ripublication.com Fault Tolerant Parallel Filters Based on

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN)

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN) Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN) Abstract With increasing design complexity in modern SOC design, many memory

More information

4. Hardware Platform: Real-Time Requirements

4. Hardware Platform: Real-Time Requirements 4. Hardware Platform: Real-Time Requirements Contents: 4.1 Evolution of Microprocessor Architecture 4.2 Performance-Increasing Concepts 4.3 Influences on System Architecture 4.4 A Real-Time Hardware Architecture

More information

Lecture notes for CS Chapter 2, part 1 10/23/18

Lecture notes for CS Chapter 2, part 1 10/23/18 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

Simplifying the Development and Debug of 8572-Based SMP Embedded Systems. Wind River Workbench Development Tools

Simplifying the Development and Debug of 8572-Based SMP Embedded Systems. Wind River Workbench Development Tools Simplifying the Development and Debug of 8572-Based SMP Embedded Systems Wind River Workbench Development Tools Agenda Introducing multicore systems Debugging challenges of multicore systems Development

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory Shawn Koch Mark Doughty ELEC 525 4/23/02 A Simulation: Improving Throughput and Reducing PCI Bus Traffic by Caching Server Requests using a Network Processor with Memory 1 Motivation and Concept The goal

More information

An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification

An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification International Journal on Electrical Engineering and Informatics - Volume 1, Number 2, 2009 An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification Trio Adiono 1, Hans G. Kerkhoff 2 & Hiroaki

More information

Chapter 12. CPU Structure and Function. Yonsei University

Chapter 12. CPU Structure and Function. Yonsei University Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor

More information

High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs

High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs Vol. 3, Issue. 5, Sep - Oct. 2013 pp-2894-2900 ISSN: 2249-6645 High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs M. Reddy Sekhar Reddy, R.Sudheer Babu

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

A Reconfigured Twisted Ring Counter Using Tristate Coding For Test Data Compression

A Reconfigured Twisted Ring Counter Using Tristate Coding For Test Data Compression A Reconfigured Twisted Ring Counter Using Tristate Coding For Test Data Compression 1 R.Kanagavalli, 2 Dr.O.Saraniya 1 PG Scholar, 2 Assistant Professor Department of Electronics and Communication Engineering,

More information

Design and Implementation of High Performance DDR3 SDRAM controller

Design and Implementation of High Performance DDR3 SDRAM controller Design and Implementation of High Performance DDR3 SDRAM controller Mrs. Komala M 1 Suvarna D 2 Dr K. R. Nataraj 3 Research Scholar PG Student(M.Tech) HOD, Dept. of ECE Jain University, Bangalore SJBIT,Bangalore

More information

Real-time processing for intelligent-surveillance applications

Real-time processing for intelligent-surveillance applications LETTER IEICE Electronics Express, Vol.14, No.8, 1 12 Real-time processing for intelligent-surveillance applications Sungju Lee, Heegon Kim, Jaewon Sa, Byungkwan Park, and Yongwha Chung a) Dept. of Computer

More information

DFT Trends in the More than Moore Era. Stephen Pateras Mentor Graphics

DFT Trends in the More than Moore Era. Stephen Pateras Mentor Graphics DFT Trends in the More than Moore Era Stephen Pateras Mentor Graphics steve_pateras@mentor.com Silicon Valley Test Conference 2011 1 Outline Semiconductor Technology Trends DFT in relation to: Increasing

More information

The 80C186XL 80C188XL Integrated Refresh Control Unit

The 80C186XL 80C188XL Integrated Refresh Control Unit APPLICATION BRIEF The 80C186XL 80C188XL Integrated Refresh Control Unit GARRY MION ECO SENIOR APPLICATIONS ENGINEER November 1994 Order Number 270520-003 Information in this document is provided in connection

More information

JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD

JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD 1 MOHAMED JEBRAN.P, 2 SHIREEN FATHIMA, 3 JYOTHI M 1,2 Assistant Professor, Department of ECE, HKBKCE, Bangalore-45. 3 Software Engineer, Imspired solutions,

More information

A Partition-Based Approach for Identifying Failing Scan Cells in Scan-BIST with Applications to System-on-Chip Fault Diagnosis

A Partition-Based Approach for Identifying Failing Scan Cells in Scan-BIST with Applications to System-on-Chip Fault Diagnosis A Partition-Based Approach for Identifying Failing Scan Cells in Scan-BIST with Applications to System-on-Chip Fault Diagnosis Chunsheng Liu and Krishnendu Chakrabarty Department of Electrical & Computer

More information

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio Project Proposal ECE 526 Spring 2006 Modified Data Structure of Aho-Corasick Benfano Soewito, Ed Flanigan and John Pangrazio 1. Introduction The internet becomes the most important tool in this decade

More information

Contents 1 Basic of Test and Role of HDLs 2 Verilog HDL for Design and Test

Contents 1 Basic of Test and Role of HDLs 2 Verilog HDL for Design and Test 1 Basic of Test and Role of HDLs... 1.1 Design and Test... 1.1.1 RTL Design Process... 1.1.2 Postmanufacturing Test... 1.2 Test Concerns... 1.2.1 Test Methods... 1.2.2 Testability Methods... 1.2.3 Testing

More information

System Debugging Tools Overview

System Debugging Tools Overview 9 QII53027 Subscribe About Altera System Debugging Tools The Altera system debugging tools help you verify your FPGA designs. As your product requirements continue to increase in complexity, the time you

More information

Optimizing Emulator Utilization by Russ Klein, Program Director, Mentor Graphics

Optimizing Emulator Utilization by Russ Klein, Program Director, Mentor Graphics Optimizing Emulator Utilization by Russ Klein, Program Director, Mentor Graphics INTRODUCTION Emulators, like Mentor Graphics Veloce, are able to run designs in RTL orders of magnitude faster than logic

More information

A hardware operating system kernel for multi-processor systems

A hardware operating system kernel for multi-processor systems A hardware operating system kernel for multi-processor systems Sanggyu Park a), Do-sun Hong, and Soo-Ik Chae School of EECS, Seoul National University, Building 104 1, Seoul National University, Gwanakgu,

More information

The Design of a Debugger Unit for a RISC Processor Core

The Design of a Debugger Unit for a RISC Processor Core Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 3-2018 The Design of a Debugger Unit for a RISC Processor Core Nikhil Velguenkar nv8840@rit.edu Follow this and

More information

Nexus Instrumentation architectures and the new Debug Specification

Nexus Instrumentation architectures and the new Debug Specification Nexus 5001 - Instrumentation architectures and the new Debug Specification Neal Stollon, HDL Dynamics Chairman, Nexus 5001 Forum neals@hdldynamics.com nstollon@nexus5001.org HDL Dynamics SoC Solutions

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Objectives of Chapter To provide a grand tour of the major computer system components:

More information

Design And Implementation Of USART IP Soft Core Based On DMA Mode

Design And Implementation Of USART IP Soft Core Based On DMA Mode Design And Implementation Of USART IP Soft Core Based On DMA Mode Peddaraju Allam 1 1 M.Tech Student, Dept of ECE, Geethanjali College of Engineering & Technology, Hyderabad, A.P, India. Abstract A Universal

More information

INTERCONNECT TESTING WITH BOUNDARY SCAN

INTERCONNECT TESTING WITH BOUNDARY SCAN INTERCONNECT TESTING WITH BOUNDARY SCAN Paul Wagner Honeywell, Inc. Solid State Electronics Division 12001 State Highway 55 Plymouth, Minnesota 55441 Abstract Boundary scan is a structured design technique

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Chapter 11. I/O Management and Disk Scheduling

Chapter 11. I/O Management and Disk Scheduling Operating System Chapter 11. I/O Management and Disk Scheduling Lynn Choi School of Electrical Engineering Categories of I/O Devices I/O devices can be grouped into 3 categories Human readable devices

More information

Laboratory Finite State Machines and Serial Communication

Laboratory Finite State Machines and Serial Communication Laboratory 11 11. Finite State Machines and Serial Communication 11.1. Objectives Study, design, implement and test Finite State Machines Serial Communication Familiarize the students with Xilinx ISE WebPack

More information

Eliminating False Loops Caused by Sharing in Control Path

Eliminating False Loops Caused by Sharing in Control Path Eliminating False Loops Caused by Sharing in Control Path ALAN SU and YU-CHIN HSU University of California Riverside and TA-YUNG LIU and MIKE TIEN-CHIEN LEE Avant! Corporation In high-level synthesis,

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

System Verification of Hardware Optimization Based on Edge Detection

System Verification of Hardware Optimization Based on Edge Detection Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection

More information

08 - Address Generator Unit (AGU)

08 - Address Generator Unit (AGU) October 2, 2014 Todays lecture Memory subsystem Address Generator Unit (AGU) Schedule change A new lecture has been entered into the schedule (to compensate for the lost lecture last week) Memory subsystem

More information

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology http://dx.doi.org/10.5573/jsts.014.14.6.760 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.6, DECEMBER, 014 A 56-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology Sung-Joon Lee

More information

ADVANCES in chip design and test technology have

ADVANCES in chip design and test technology have IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 Majority-Based Test Access Mechanism for Parallel Testing of Multiple Identical Cores Taewoo Han, Inhyuk Choi, and Sungho Kang Abstract

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

An Efficient Multi Mode and Multi Resolution Based AHB Bus Tracer

An Efficient Multi Mode and Multi Resolution Based AHB Bus Tracer An Efficient Multi Mode and Multi Resolution Based AHB Bus Tracer Abstract: Waheeda Begum M.Tech, VLSI Design & Embedded System, Department of E&CE, Lingaraj Appa Engineering College, Bidar. On-Chip program

More information

An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification

An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification ITB J. ICT Vol. 3, No. 1, 2009, 51-66 51 An Infrastructural IP for Interactive MPEG-4 SoC Functional Verification 1 Trio Adiono, 2 Hans G. Kerkhoff & 3 Hiroaki Kunieda 1 Institut Teknologi Bandung, Bandung,

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Abstract In microprocessor-based systems, data and address buses are the core of the interface between a microprocessor

More information

Post Silicon Electrical Validation

Post Silicon Electrical Validation Post Silicon Electrical Validation Tony Muilenburg 1 1/21/2014 Homework 4 Review 2 1/21/2014 Architecture / Integration History 3 1/21/2014 4 1/21/2014 Brief History Of Microprocessors 5 1/21/2014 6 1/21/2014

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Atmel Exploits FPGA Flexibility in Application Development for Customizable Microcontroller-based Systems Peter Bishop, Atmel Corporation 22-Dec-2008

Atmel Exploits FPGA Flexibility in Application Development for Customizable Microcontroller-based Systems Peter Bishop, Atmel Corporation 22-Dec-2008 Atmel Exploits Flexibility in Application Development for Customizable Microcontrollerbased Peter Bishop, Atmel Corporation 22Dec2008 Introduction Designing an embedded microcontrollerbased system poses

More information

3. HARDWARE ARCHITECTURE

3. HARDWARE ARCHITECTURE 3. HARDWARE ARCHITECTURE The architecture of the Recognition Accelerator consists of two main parts: a dedicated classifier engine and a general-purpose 16-bit microcontroller. The classifier implements

More information

Assertion Checker Synthesis for FPGA Emulation

Assertion Checker Synthesis for FPGA Emulation Assertion Checker Synthesis for FPGA Emulation Chengjie Zang, Qixin Wei and Shinji Kimura Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, 808-0135,

More information

ISSN Vol.05, Issue.12, December-2017, Pages:

ISSN Vol.05, Issue.12, December-2017, Pages: ISSN 2322-0929 Vol.05, Issue.12, December-2017, Pages:1174-1178 www.ijvdcs.org Design of High Speed DDR3 SDRAM Controller NETHAGANI KAMALAKAR 1, G. RAMESH 2 1 PG Scholar, Khammam Institute of Technology

More information

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors) 1 COEN-4730 Computer Architecture Lecture 12 Testing and Design for Testability (focus: processors) Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline Testing

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

ECE 4750 Computer Architecture, Fall 2017 Lab 1: Iterative Integer Multiplier

ECE 4750 Computer Architecture, Fall 2017 Lab 1: Iterative Integer Multiplier School of Electrical and Computer Engineering Cornell University revision: 2017-08-31-12-21 The first lab assignment is a warmup lab where you will design two implementations of an integer iterative multiplier:

More information