Thermal-aware Fault-Tolerant System Design with Coarse-Grained Reconfigurable Array Architecture


2010 NASA/ESA Conference on Adaptive Hardware and Systems

Ganghee Lee and Kiyoung Choi
Department of Electrical Engineering and Computer Science
Seoul National University, Seoul, Korea

Abstract

Coarse-grained reconfigurable array architectures have drawn increasing attention due to their performance and flexibility. A typical coarse-grained reconfigurable array architecture has many processing elements (PEs) in the array, which makes it suitable for implementing the spatial redundancy used in fault-tolerant system design. In this paper, we propose implementing replications and a voting function on the PE array of a coarse-grained reconfigurable array architecture to design a fault-tolerant system. We also introduce thermal-aware application mapping onto the coarse-grained reconfigurable array architecture for reliability. The experiment with a Viterbi decoder shows that our approach implements fault tolerance with a 12% area overhead, which comes from implementing conditional execution.

1. Introduction

Fault tolerance is the property that enables a system to continue operating properly in the event of a failure. Especially in aerospace and biomedical applications, the system has to be highly reliable, since the effect of a fault can be catastrophic. To attain the required reliability, soft-error-tolerant design has been widely attempted by replicating multiple identical instances of the same system, executing all of them in parallel, and choosing the correct result on the basis of a majority vote [1]. The same inputs are provided to each replication, so the same outputs are expected; the outputs of the replications are compared using a voter.

Reconfigurable computing is becoming more and more popular with the increasing requirements for more flexibility and higher performance. In particular, coarse-grained reconfigurable array architectures [2][3] are gaining popularity, since they not only avoid the huge NRE (non-recurring engineering) cost of custom VLSI chips but also have higher area efficiency than fine-grained architectures such as FPGAs. Typically, a coarse-grained reconfigurable array architecture consists of a reconfigurable array of PEs and its controller. Due to the large number of PEs in the array, coarse-grained reconfigurable array architectures are suitable for implementing spatial redundancy for fault tolerance. However, conventional coarse-grained reconfigurable array architectures are inefficient at implementing a voter, since they are usually designed for data-intensive kernels rather than control-intensive parts such as a voter.

Most previous research on reliability in reconfigurable architectures focuses on fine-grained reconfigurable devices such as FPGAs [4][5]. The work in [6] introduces a coarse-grained reconfigurable architecture enabling flexible reliability. However, it incurs a large area overhead due to the voter implementation.

In this paper, we introduce an approach to designing fault-tolerant systems efficiently with a coarse-grained reconfigurable array architecture. In a preliminary effort [16], we presented an approach to supporting conditional execution on the reconfigurable architecture. The support of conditional execution enables efficient implementation of a voter without additional overhead. The novelty of our approach is as follows.

- We implement a low-overhead fault-tolerant system with the existing conditional execution mechanism. Since we implement both the replications and the voter on the reconfigurable PE array, we do not incur additional area overhead for implementing a voter, unlike the approach in [6].
- We consider thermal effect when generating configuration code for the coarse-grained reconfigurable array architecture for reliability.

The remainder of this paper is organized as follows. Section 2 introduces our coarse-grained reconfigurable array architecture. Section 3 explains the design flow for reliability. Section 4 shows experiments with a Viterbi decoder. Finally, Section 5 concludes with some remarks on future work.

2. Target architecture

Figure 1. Coarse-grained reconfigurable array architecture.

2.1. Coarse-grained reconfigurable array architecture

Our target architecture consists of an array of PEs, several sets of data memories, and a configuration cache memory [7]. Figure 1 shows our coarse-grained reconfigurable array architecture and the internal structure of the PEs. Each PE is connected to its nearest neighboring PEs: top, bottom, left, and right. The size of the array can be optimized for a specific application domain [7]. In Figure 1, for example, the architecture contains a 4x4 reconfigurable array of PEs. The area-critical functional units (such as multipliers or dividers) are located outside the PEs and shared among a set of PEs [7]. Each area-critical functional unit is pipelined to curtail the critical path delay, and its execution is initiated by scheduling the area-critical operation on one of the PEs that share this resource. Thus each PE can be dynamically reconfigured either to perform arithmetic and logical operations with its own ALU in one clock cycle, or to perform multiplication or division using the corresponding shared functional unit in several clock cycles with pipelining.

The data memory in Figure 1 stores data that can be accessed by the PEs. There are two sets of memory, each of which consists of three banks: one connected to the write bus and the other two connected to the read buses. These read/write buses are also shared by the PEs, like the area-critical shared functional units. The two sets of memory are used for double buffering.

The configuration cache is composed of an array of Cache Elements (CEs), whose size is the same as that of the PE array. More specifically, each PE has its own CE, and therefore the two arrays (PE array and CE array) have the same dimension.
Each CE has many layers, each holding a different context, so that the entire array of PEs can be reconfigured within just one cycle by switching layers. Note that the area-critical resources are shared by the PEs on the same row, as shown in Figure 1, and are activated through the individual PEs; thus they need not be modeled separately from the PEs.

2.2. Feature for supporting conditional execution

To support conditional execution on the reconfigurable architecture, our target architecture [16] has a condition signal, as shown in Figure 2. The condition signal can be issued by conditional operations such as comparison or logical negation, and the PE can select one of the results from multiple sources (between $A_{sel}$ and $B_{sel}$). An interconnection network is also introduced for conditional execution [16]. Among various interconnect architectures, we use the column-wide bus architecture, where buses are placed on the array along each column. Figure 3 shows the column-wide bus architecture, where the total number of buses on the array is the same as the number of columns. Each bus is 1 bit wide and carries the condition signal. Note that a conditional operation should be executed just before the resulting condition signal is used, since in the current implementation a PE broadcasts the condition signal value to the column-wide bus and the value is preserved only for the next cycle. The other PEs then read the value from the column-wide bus in that next cycle.

Figure 2. PE structure for supporting conditional execution.

Figure 3. Column-wide bus architecture.

3. Design flow for reliability

We implement three different levels of reliability with the coarse-grained reconfigurable array architecture. By exploiting the flexibility of the reconfigurable architecture, we can easily change the level of reliability without incurring any additional overhead. Figure 4 shows the three different levels of reliability: i) TMR (triple-modular redundancy), ii) DMR (double-modular redundancy), and iii) NR (no redundancy).

Figure 4. Three different levels of reliability: (a) triple-modular redundancy (TMR), (b) double-modular redundancy (DMR), (c) no redundancy (NR).

Figure 5. Design flow for fault-tolerant code generation.

In TMR mode, three replications of each element are used for reliability. The voting circuit can determine which replication is in error when a two-to-one vote is observed. In this case, the voting circuit can output the correct result and discard the erroneous version. In DMR mode, two replications of each element are used for reliability. Thus, the voting circuit can only detect a mismatch. In NR mode, the system does not check for failures. For performance-oriented applications (such as multimedia), we run the system in NR mode.
However, for applications that have to be highly reliable (such as biomedical or aerospace applications), we run the system in DMR or TMR mode.

3.1. Fault-tolerant code generation

Figure 5 shows the design flow for fault-tolerant code generation considering different levels of reliability. At the first step, we adopt HLS (high-level synthesis) techniques to map application kernels onto the reconfigurable array architecture through scheduling and binding. The mapping requires solving multiple problems simultaneously. First, we should compile the application and generate the configuration of the architecture while maximally exploiting the parallelism in both the application and the architecture. Consider two operations with a data or control dependency between them that are mapped onto two different PEs with no direct interconnection. In this case, other PEs are used for routing, and for this we add extra dummy move operations for data forwarding [9][10].

Our kernel mapping algorithm in Figure 5 consists of two phases: i) list scheduling to get an initial solution, and ii) a quantum-inspired evolutionary algorithm (QEA) [14] to get a more refined solution. QEA is a kind of evolutionary algorithm that is known to be very efficient. We seed the QEA with the list scheduling result and try to minimize the total latency. Since the QEA starts with a relatively good initial solution, it tends to reach a better solution sooner than when starting from a random seed. As a result of the QEA, the schedule and binding of each vertex are determined. Once the schedule and binding are given, the tool tries to find routing paths among the vertices using the remaining unused PEs, to see whether the schedule and binding are implementable with the limited interconnect resources.

Figure 6. TMR voter implementation: (a), (b).

Table I. Error detection result ($O_D$) of the TMR voter

$O_D$ result   Description
00             No error
01             Unrecoverable error in voter
10             Two-to-one vote
11             Unrecoverable error in replica

In the scheduling and binding algorithm, the number of allowed resources in a column is given by $\lfloor M / R \rfloor$, where $M$ is the number of PEs in a column and $R$ is the number of replications. For example, if the reconfigurable array has 8 PEs in a column and requires the TMR level of reliability, the allowed number of resources for each replication is $\lfloor 8 / 3 \rfloor = 2$. For each replica, we need the replicated data.
If we put only one copy of the data in the data memory, we may suffer from data contention due to the limited interconnect resources. To resolve such contention caused by memory accesses, we store replicated data for each replica. Thus, according to the level of reliability, we store multiple copies of the data in the data memory using DMA. After the configuration for one replication has been generated, we replicate the configuration according to the level of reliability. Then, at the final stage, we insert the voting operation. The voter uses the compare operations of the PEs (a PE in our coarse-grained reconfigurable array architecture can perform conditional operations as well as arithmetic or logical operations; see Section 2.2).

Figure 7. CDFG representations for voter implementation: (a) TMR voter, (b) DMR voter.

The outputs ($O_R$ and $O_D$) of the voter are stored in two different locations. One ($O_R$) is the computational result and the other ($O_D$) is the error detection result. Figure 6 shows the input ($X$, $Y$ and $Z$) and output ($O_D$ and $O_R$) table of the voter with triple-modular redundancy (TMR). The inputs ($X$, $Y$ and $Z$) of the table (in Figure 6(a)) are the comparison results of the replications ($A$, $B$ and $C$), as shown in Figure 6(b). Table I describes the $O_D$ result. In Table I, there are two kinds of unrecoverable errors: one from the voter part and the other from the replication part. When 11 is observed as the $O_D$ result, we see that the three replicas have generated three different results. On the other hand, if 01 is observed as the $O_D$ result, we know that the voter has a fault. For example, $X=0$, $Y=0$ and $Z=1$ (one of the cases where 01 is observed at $O_D$) means that $A$ is equal to $B$ and $A$ is equal to $C$, but $B$ is not equal to $C$. Since this is logically impossible, we infer that the voting logic has a fault. We cannot recover from the two errors 01 and 11 at $O_D$.
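The voter behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the encoding $O_D = X + Y + Z$ is an assumption chosen because it reproduces Table I and uses only the compare, add, and select operations named in the text (the truth table of Figure 6(a) is not reproduced here).

```python
# Sketch of the TMR/DMR voter. Assumption: O_D is the sum of the three
# compare outputs, which matches Table I (0, 1, 2, 3 -> 00, 01, 10, 11).
DESCRIPTION = {
    0b00: "No error",
    0b01: "Unrecoverable error in voter",
    0b10: "Two-to-one vote",
    0b11: "Unrecoverable error in replica",
}


def error_code(x, y, z):
    """Combine the compare outputs (1 = mismatch) into the 2-bit O_D."""
    return x + y + z


def tmr_vote(a, b, c):
    """Return (O_R, O_D) for the three replica outputs A, B and C."""
    x = int(a != b)  # X: A versus B
    y = int(a != c)  # Y: A versus C
    z = int(b != c)  # Z: B versus C
    o_d = error_code(x, y, z)
    # Select: if A and B agree, A is in the majority; otherwise C is the
    # majority value whenever a two-to-one vote exists.
    o_r = c if x else a
    return o_r, o_d


def dmr_check(a, b):
    """DMR needs a single compare: it detects a mismatch but cannot correct it."""
    return a, int(a != b)


if __name__ == "__main__":
    for replicas in [(5, 5, 5), (5, 5, 7), (7, 5, 5), (1, 2, 3)]:
        o_r, o_d = tmr_vote(*replicas)
        print(replicas, "->", o_r, format(o_d, "02b"), DESCRIPTION[o_d])
    # A fault-free set of comparators can never produce exactly one mismatch,
    # so X=0, Y=0, Z=1 points at the voter itself.
    print(DESCRIPTION[error_code(0, 0, 1)])
```

With fault-free comparators the sum can only be 0, 2, or 3, which is why the value 1 (code 01) can be attributed to the voter.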
Figure 7 shows the CDFG (control data flow graph) representation of the voter implementation, where each operation (compare, add, logical and, and select) is supported by a PE in the array. Compared to the TMR voter, the DMR voter can be implemented easily with only one compare operation, as shown in Figure 7(b). In [6], most of the area overhead (27% compared to the original architecture) for designing a reliable system arises from the voter implementation. In our implementation, however, since the PE array is used for the voting operation instead of extra dedicated hardware logic, there is no additional area overhead.

3.2. Thermal-aware application mapping

When we map an application onto the coarse-grained reconfigurable array architecture, we can also consider thermal effect, based on the fact that, given a certain compute processor and a steady ambient temperature, tasks with longer run-times generally cause more heat and therefore higher peak temperatures. Accordingly, tasks with shorter run-times cause less heat and therefore lower peak temperatures [11]. The reason we consider thermal effect is that the FIT (failures-in-time) rate increases dramatically with temperature [footnote 1].

Footnote 1. Time to failure is known to be a function of $e^{-E_a/kT}$ (the acceleration factor in the Arrhenius equation [17]), where $E_a$ is the activation energy of the failure mechanism being accelerated, $k$ is Boltzmann's constant, and $T$ is the absolute temperature.

Thermal management can be characterized as temporal or spatial. A temporal thermal management scheme [12] controls the amount of computation on a processing element to reduce its temperature. A spatial thermal management scheme [13], on the other hand, reduces the temperature by scheduling hot tasks on cool processing elements. In this paper, we perform spatial thermal management for reliability, since it can reduce the temperature effectively without throttling the computation [19]. The advantage of the coarse-grained reconfigurable array architecture is its flexibility. For example, the mapping of the application running on the coarse-grained reconfigurable array architecture can be changed dynamically as time elapses. Thus the temperature of each PE can change dynamically according to the workload.

Figure 8. Thermally different locations: (a) 2x2 (1 thermal location), (b) 3x3 (3 thermal locations), (c) 4x4 (3 thermal locations), (d) 5x5 (6 thermal locations).

Assuming that the temperature of each PE can be measured (or estimated by a thermal model such as the one in [15]), we map the application considering thermal effect so that the reliability of the system is improved. For the mapping, we calculate the thermal cost of each PE as

$$\mathrm{Cost} = a_1 T_C + a_2 \sum T_{DA} + a_3 \sum T_{da} \qquad (1)$$

where $T_C$ is the temperature of the candidate PE, the $T_{DA}$ are the temperatures of the directly neighboring PEs, the $T_{da}$ are the temperatures of the diagonally neighboring PEs, and the $a_i$ are the weights of the parameters. The values of the weights are determined statically through analysis and/or experiments. Equation (1) is obtained from [18] after slight modification for our mapping approach.

As shown in Figure 5, our mapping tool for the coarse-grained reconfigurable array architecture [9] takes a two-phase approach: list scheduling followed by refinement with a quantum-inspired evolutionary algorithm (QEA) [14]. In the second phase, the fitness function that we use for the QEA is the performance. In addition, we consider the thermal cost in (1). At the evaluation stage of the QEA, we calculate the thermal cost for every possible mapping. If several candidate solutions give the same performance, we choose the one with the lowest cost.

Thermal model

In (1), the temperature of each PE can be measured by a thermal sensor (the details of how to measure the temperature of each PE are out of the scope of this paper) or calculated by a thermal model as follows. Figure 8 shows the thermally different locations of the PE array. For an $N \times N$ array of identical square PEs, there are $\lceil N/2 \rceil (\lceil N/2 \rceil + 1) / 2$ different possible locations [18]. Thermally different locations reflect the fact that central PEs, such as C in Figure 8(b), tend to have a higher temperature than edge PEs such as A. Building on the idea of thermally different locations, the authors of [15] present a post-thermal map calculation that estimates the temperature change after task allocation.
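As a rough illustration of the thermal cost in (1) and of the tie-breaking rule among equally fast candidates, the following Python sketch evaluates the cost on a small temperature grid. It is a sketch under stated assumptions, not the authors' tool: the function names are hypothetical, the neighbor terms are taken as sums, PEs on the array boundary simply have fewer neighbors, and the default weights are the illustrative values $a_1=3$, $a_2=2$, $a_3=1$. The last helper evaluates the closed-form count of thermally different locations quoted from [18].

```python
from math import ceil

# Offsets of directly (T_DA) and diagonally (T_da) neighboring PEs.
DIRECT = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def thermal_cost(temps, i, j, a1=3.0, a2=2.0, a3=1.0):
    """Cost = a1*T_C + a2*sum(T_DA) + a3*sum(T_da) for the PE at (i, j).

    Assumption: neighbors outside the array contribute nothing.
    """
    rows, cols = len(temps), len(temps[0])

    def neighbor_sum(offsets):
        return sum(
            temps[i + di][j + dj]
            for di, dj in offsets
            if 0 <= i + di < rows and 0 <= j + dj < cols
        )

    return a1 * temps[i][j] + a2 * neighbor_sum(DIRECT) + a3 * neighbor_sum(DIAGONAL)


def pick_coolest(temps, candidates, a1=3.0, a2=2.0, a3=1.0):
    """Among candidate PEs with equal performance, pick the lowest-cost one."""
    return min(candidates, key=lambda pe: thermal_cost(temps, pe[0], pe[1], a1, a2, a3))


def num_thermal_locations(n):
    """Number of thermally different locations in an n x n array."""
    half = ceil(n / 2)
    return half * (half + 1) // 2


if __name__ == "__main__":
    # Hypothetical 3x3 thermal snapshot with a hot center PE.
    snapshot = [
        [50, 50, 50],
        [50, 60, 50],
        [50, 50, 50],
    ]
    print(thermal_cost(snapshot, 1, 1))
    print(thermal_cost(snapshot, 0, 0))
    print(pick_coolest(snapshot, [(1, 1), (0, 0)]))
    print([num_thermal_locations(n) for n in (2, 3, 4, 5)])
```

For the array sizes of Figure 8, the closed form gives 1, 3, 3 and 6 locations, matching the panel labels.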
A 2D thermal map is defined for the $N \times N$ PE array, where a cell value in the thermal map represents the temperature of the corresponding PE. The current thermal map is referred to as the pre-thermal map, and the temperature of the PE at location $(i, j)$, i.e., the $i$-th row and $j$-th column, in the pre-thermal map is denoted by $T_0(i, j)$. The thermal map predicting the temperature change after task allocation is referred to as the post-thermal map, and the temperature of the PE at location $(i, j)$ in the post-thermal map is denoted by $T(i, j)$. To quickly calculate the thermal distribution associated with adding a task to the PE array, they use the following equation [15]:

$$T(i, j) = T_0(i, j) + (1 - e^{-e/\tau}) \, p \, \mathrm{LUT}(k, i, j) \qquad (2)$$

where $(1 - e^{-e/\tau})$ is the architecture-dependent constant, which is calculated statically; $p$ is the power dissipated at thermally different location $k$; and $\mathrm{LUT}(k, i, j)$ is the element at location $(i, j)$ of the look-up table for $k$. There are $\lceil N/2 \rceil (\lceil N/2 \rceil + 1) / 2$ pre-built look-up tables, one for each thermally different location. The look-up table stores the steady-state temperature of each PE, which would be reached if the application were executed indefinitely. More specifically, an element in the $k$-th LUT gives the increase in temperature at the corresponding PE after one watt of power is dissipated by the application running at $k$. The details of this thermal model can be found in [15].

Figure 9. Thermal snapshot and cost analysis: (a) thermal snapshot, (b) application graph, (c) cost analysis, (d) application mapping.

Thermal-aware application mapping

Figure 9(a) shows the thermal snapshot of the PEs at a certain time $t$. Now we want to map an application, represented by the data flow graph shown in Figure 9(b), onto the PEs. Among several mapping candidates that give the same performance, we choose the one with the lowest cost. Figure 9(c) shows the cost calculated by (1) for every PE when we simply assume $a_1 = 3$, $a_2 = 2$ and $a_3 = 1$. Finally, we map the application onto the PE array as shown in Figure 9(d), which gives the best performance and the lowest peak temperature while satisfying the given resource constraints. Regarding the resource constraints, there are several problems to be considered for the mapping. For example, in Figure 9(d), diagonal interconnections and shared resources such as multipliers are not considered, for simplicity. We do not address such problems here, since they are out of the scope of this paper. The details of mapping under resource constraints can be found in our previous paper [9].

In Figure 9, darker gray in the PE array means a hotter area. From Figure 9(a) and (c), we see that the hot spots in the thermal snapshot and in the cost analysis results may differ.

3.3. Discussions on reliability

In the coarse-grained reconfigurable array architecture, when permanent faults such as manufacturing faults are detected, we can correct the problem relatively easily by reconfiguration. When broken PEs are detected in the PE array (how to find the broken PEs is not addressed in this paper, since it is a difficult subject in its own right), we map the kernel so that it avoids the broken PEs.

Transient faults can be detected and corrected by the TMR or DMR implementation. A transient fault can occur either in a PE executing the replicated tasks or in a PE executing the voting operation. In some cases, both the replications and the voter can have faults. In our fault-tolerant approach, however, only errors due to faults in the replications can be detected and corrected, with a two-to-one vote. Errors due to faults in the voter can be detected but cannot be corrected. Table II summarizes this.

Table II. Error detection and correction

           Replication part                   Voter part
Detect     possible                           partially possible
Correct    partially possible (two-to-one)    cannot

We should also consider faults that occur in memory. We expect that memory faults can be detected and corrected by inserting an ECC (error correcting code) circuit.

4. Experiment

4.1. Architecture overhead analysis

To see the implementation overhead for reliability, we designed the coarse-grained reconfigurable array architecture at the register-transfer level and synthesized a gate-level circuit targeting an FPGA. The area overhead for implementing reliability was 12% compared to the original architecture. This overhead comes from implementing conditional execution for handling the control path of the application [16]. As mentioned in Section 2.2, we added condition signals, 1-bit registers, and column-wide buses for interconnection. However, most of the area overhead (10.3%) comes from the increased logic that implements the extended operations (such as comparison or logical negation) rather than from the control interconnects (1.7%).

In our synthesis result, there was no degradation of clock speed for the reliability implementation compared to the original architecture. As mentioned in Section 2.2, we use the column-wide bus architecture for the control signal shared by the PEs in each column of the array. Adding such a 1-bit column-wide bus for the control signal does not degrade the clock speed, since we already have 16-bit column-wide buses for the data memory in the original architecture.

4.2. Evaluation

As a sample application, we implemented a kernel part of a Viterbi decoder on our coarse-grained reconfigurable array architecture. The Viterbi decoding algorithm is widely used for decoding convolutional codes in satellite communications and bioinformatics, where the system has to be highly reliable. One of the most time-consuming operations in a Viterbi decoder is the ACS (add-compare-select) operation, shown in Figure 10. Since our coarse-grained reconfigurable array architecture enables conditional execution, we can easily map this ACS operation onto our reconfigurable architecture.

Figure 10. ACS (add-compare-select) operation in Viterbi decoder.

Figure 11. Application mapping with different reliability levels: (a) TMR implementation, (b) DMR implementation.

Figure 12. Tradeoff between performance and reliability level for the ACS operation of Viterbi decoder.

Figure 11(a) shows an example where three replicas of the ACS operation and the voter are mapped, for TMR, onto a coarse-grained reconfigurable array architecture that has eight PEs in a column. With eight PEs in a column, one replica can use two PEs. The gray region in Figure 11 represents the voter implementation. We can trade reliability level for performance. If we implement DMR instead of TMR, one replica can use $8 / 2 = 4$ PEs. Thus we can run two ACS operations concurrently for each replica, which improves performance. Figure 11(b) shows the DMR implementation, whose voting function is simpler than that of the TMR implementation.
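The resource arithmetic behind this tradeoff can be written out as a short Python sketch. The names are hypothetical, and the figure of two PEs per ACS operation is an assumption inferred from the statement that one TMR replica uses two PEs; the NR value is a plain extrapolation of the same rule. PEs occupied by the voter are ignored.

```python
# Replications R for each reliability level (Section 3).
REPLICAS = {"NR": 1, "DMR": 2, "TMR": 3}

# Assumption inferred from the text: one ACS operation occupies two PEs.
PES_PER_ACS = 2


def pes_per_replica(pes_in_column, mode):
    """Allowed PEs per replica in one column: floor(M / R)."""
    return pes_in_column // REPLICAS[mode]


def concurrent_acs(pes_in_column, mode):
    """How many ACS operations one replica can run side by side."""
    return pes_per_replica(pes_in_column, mode) // PES_PER_ACS


if __name__ == "__main__":
    for mode in ("TMR", "DMR", "NR"):
        print(mode, pes_per_replica(8, mode), concurrent_acs(8, mode))
```

For eight PEs in a column this reproduces the two cases given in the text: two PEs and one ACS operation per replica for TMR, four PEs and two concurrent ACS operations per replica for DMR.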
Figure 12 shows the tradeoff between performance and reliability level when running the ACS operations of the Viterbi decoder. The performance is normalized to the TMR implementation. The performance is not simply linear in the number of PEs used for implementing one replica, since the voting operation and the latency of the implementation differ depending on the reliability level.

5. Conclusion

In this paper, we presented a thermal-aware fault-tolerant system design with a coarse-grained reconfigurable array architecture. The proposed system has several reliability levels, so that one can exploit the tradeoff between performance and reliability by adjusting the reliability level. We used the conditional execution feature to implement a reliable system, which accounts for a 12% area overhead compared to the original architecture. We also introduced temperature-aware application mapping onto the coarse-grained reconfigurable array architecture for reliability. We experimented with a Viterbi decoder, where all replications and the voting function are implemented on the reconfigurable PE array without causing additional logic overhead. As future work, we are working on a detailed reliability analysis for different implementations and on designing reliable systems that include fault-tolerant memory and a run-time adaptor.

Acknowledgment

This work was supported by KOSEF under NRL Program Grant (R0A ) funded by MEST, Korea and Nano IP/SoC Promotion Group under Seoul R&BD Program (10560).

References

[1] L. Anghel, D. Alexandrescu, and M. Nicolaidis, Evaluation of a soft error tolerance technique based on time and/or space redundancy, in Proc. ICSD,
[2] H. Singh, M. H. Lee, G. Lu, F. J. Kurdahi, N. Bagherzadeh, and E. M. C. Filho, Morphosys: an integrated reconfigurable system for data-parallel and computation-intensive applications, IEEE Trans. Computers, vol. 49, May
[3] B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins, ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix, in Proc. FPLA,
[4] J. A. Cheatham, J. M. Emmert, and S. R. Baumgart, A survey of fault tolerant methodologies for FPGAs, ACM Trans. Design Automation of Computer Systems, April
[5] S. K. Lu, F. M. Yesh, J. S. Shih, Fault detection and fault diagnosis techniques for lookup table FPGAs, VLSI Design, vol. 15,
[6] D. Alnajjar, Y. Ko, T. Imagawa, M. Hiromoto, Y. Mitsuyama, M. Hashimoto, H. Ochi, and T. Onoye, A coarse-grained dynamically reconfigurable architecture enabling flexible reliability, in Proc. FPL,
[7] Y. Kim, M. Kiemb, C. Park, J. Jung, and K. Choi, Resource sharing and pipelining in coarse-grained reconfigurable architecture for domain-specific optimization, in Proc. DATE,
[8] Y. Kim, I. Park, K. Choi, and Y. Paek, Power-conscious configuration cache structure and code mapping for coarse-grained reconfigurable architecture, in Proc. ISLPED,
[9] G. Lee, S. Lee, K. Choi, and N. Dutt, Routing-aware application mapping considering Steiner points for coarse-grained reconfigurable architecture, in Proc. ARC,
[10] G. Lee, K. Chang, and K. Choi, Automatic mapping of control-intensive kernels onto coarse-grained reconfigurable array architecture with speculative execution, in Proc. RAW,
[11] D. C. Vanderster, A. Baniasadi, and N. J. Dimopoulos, Exploiting task temperature profiling in temperature-aware task scheduling for computational clusters, in Proc. APCSAC,
[12] D. Brooks and M. Martonosi, Dynamic thermal management for high-performance microprocessors, in Proc. HPCA,
[13] K. Skadron, M. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, Temperature-aware microarchitecture: modeling and implementation, ACM Trans. Architecture and Code Optimization, vol. 1, March
[14] K. Han and J. Kim, Quantum-inspired evolutionary two phase scheme, IEEE Trans. Evolutionary Computation 8, April
[15] J. Cui and D. L. Maskell, Dynamic thermal-aware scheduling on chip multiprocessor for soft real-time system, in Proc. GLSVLSI,
[16] J. Lee, Y. Kim, J. Jung, S. Kang, and K. Choi, Reconfigurable ALU array architecture with conditional execution, in Proc. ISOCC,
[17] Compendium of Chemical Terminology, International Union of Pure and Applied Chemistry, Gold Book.
[18] K. Stavrou and P. Trancoso, Thermal-aware scheduling for future chip multiprocessors, EURASIP Journal on Embedded Systems, January
[19] M. D. Powell, M. Gomaa, and T. N. Vijaykumar, Heat-and-run: Leveraging SMT and CMP to manage power density through the operating system, in Proc. ASPLOS,

Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture

Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture P. Murali 1 (M. Tech), Dr. S. Tamilselvan 2, S. Yazhinian (Research Scholar) 3 1, 2, 3 Dept of Electronics and Communication

More information

Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization

Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi Design Automation Laboratory,

More information

Dependable VLSI Platform using Robust Fabrics

Dependable VLSI Platform using Robust Fabrics Dependable VLSI Platform using Robust Fabrics Director H. Onodera, Kyoto Univ. Principal Researchers T. Onoye, Y. Mitsuyama, K. Kobayashi, H. Shimada, H. Kanbara, K. Wakabayasi Background: Overall Design

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs

An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs Lee W. Lerner and Charles E. Stroud Dept. of Electrical and Computer Engineering Auburn University Auburn, AL, USA Abstract We present

More information

QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection

QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection Sunil Shukla 1,2, Neil W. Bergmann 1, Jürgen Becker 2 1 ITEE, University of Queensland, Brisbane, QLD 4072, Australia {sunil, n.bergmann}@itee.uq.edu.au

More information

The future is parallel but it may not be easy

The future is parallel but it may not be easy The future is parallel but it may not be easy Michael J. Flynn Maxeler and Stanford University M. J. Flynn 1 HiPC Dec 07 Outline I The big technology tradeoffs: area, time, power HPC: What s new at the

More information

A Low-Cost Correction Algorithm for Transient Data Errors

A Low-Cost Correction Algorithm for Transient Data Errors A Low-Cost Correction Algorithm for Transient Data Errors Aiguo Li, Bingrong Hong School of Computer Science and Technology Harbin Institute of Technology, Harbin 150001, China liaiguo@hit.edu.cn Introduction

More information

Error Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL

Error Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL Error Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL Ch.Srujana M.Tech [EDT] srujanaxc@gmail.com SR Engineering College, Warangal. M.Sampath Reddy Assoc. Professor, Department

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information
