Asynchronous Behavior Related Retiming in Gated-Clock GALS Systems

Sam Farrokhi, Masoud Zamani, Hossein Pedram, Mehdi Sedighi
Amirkabir University of Technology
Department of Computer Eng. & IT
{Sfarrokhi, m-zamani, pedram,

Abstract
Retiming is a well-known method for optimizing various characteristics of synchronous circuits. However, it has rarely been applied to the synchronous blocks of a Globally Asynchronous Locally Synchronous (GALS) system. In this paper, the communication protocols of gated-clock-based wrappers are analyzed so that a retiming algorithm can be applied to improve performance. We introduce a new algorithm and show that careful application of retiming concepts does two things. It prevents metastability problems between the synchronous blocks of a GALS system, and it reduces the communication gaps among those blocks, thereby increasing their operating frequency. To demonstrate the effectiveness of the proposed algorithm, a 3-stage pipeline implementation of a 16N digital filter is used as a locally synchronous block. Our experiments show a 23.2% increase in the filter's operating frequency compared to other optimized circuits that use the original retiming algorithm and common reliable metastability-free methods.

1. Introduction
The GALS design methodology was introduced by Chapiro in 1984 [1]. Its idea is to combine the advantages of synchronous and asynchronous design in one system, such as higher performance and lower power consumption. Moreover, during the 1990s the most attractive properties of the GALS methodology were the avoidance of the clock-skew problem in large systems and the insignificant logic overhead [2]. The main concerns of GALS designers can be classified as follows [2]:
1- Partitioning a system into locally synchronous (LS) modules so as to maximize the advantages of the GALS methodology.
2- Designing and synthesizing an asynchronous interface for synchronous modules that imposes as little delay overhead as possible.
3- Avoiding the metastability problem during the synchronization of modules, which must be strictly prevented.
Some basic synchronization methods for GALS systems are discussed in [3], [4] and [5].

In 1996, pausible clock circuits (PCC) were proposed to manage data transfer in the asynchronous communication of GALS systems [6]. Later, in 1997, asynchronous wrappers responsible for all asynchronous communication were introduced [7]. In this approach, each LS module is surrounded by an asynchronous wrapper that contains its own local clock generator [5][7]. During a communication, each wrapper stops its local clock until the communication completes. The idea of asynchronous wrappers based on clock gating is discussed in [8] [9]. By sharing the clock generator among different locally synchronous modules, simpler wrappers with less power and area overhead can be achieved [9].

1.1. Paper Contributions
This paper proposes a new approach for applying retiming to a GALS system built on gated-clock wrappers, taking the system's asynchronous characteristics into account. As shown in this paper, such retiming can improve the system's operating frequency and performance. In GALS systems, especially those using gated-clock wrappers, some timing gaps occur during the asynchronous handshaking of two LS modules. Designers leave these gaps unused to avoid metastability problems. The proposed retiming method uses these gaps safely and guarantees that the circuit does not encounter metastability problems. This timing-gap analysis can help designers and CAD tools during partitioning and/or synthesis. The method moves some combinational parts of each LS module to the boundary areas of the module while leaving the sequential parts inside it. The result is shorter critical paths and a higher frequency and/or performance for the system.
1.2. Paper Organization
The paper is organized as follows. Section 2 contains preliminaries on retiming, asynchronous communication choices and the GALS methodology. Section 3 describes gated-clock-based GALS timing in detail, and the timing analysis follows in Section 4. Our contribution is described in Section 5, Retiming in Gated-clock GALS systems. Section 6 provides a clarifying example. Experimental results and conclusions are given in the last sections.

IEEE EWDTS, Yerevan, September 7-10, 2007

2. Preliminaries
2.1. Asynchronous communication choices
As noted in [16], an asynchronous communication protocol can be chosen from a solution space formed by the cross product of a number of options, including:
{2-phase, 4-phase} × {bundled-data, dual-rail, 1-of-n, ...} × {push, pull}
The choice of protocol affects the circuit's implementation and characteristics such as area, speed, power and robustness.

2.2. Gated-clock GALS Methodology
In the GALS methodology, a large system is partitioned into several LS modules that communicate asynchronously. A GALS system can be viewed as an asynchronous sea with some locally synchronous islands. As mentioned above, we focus on GALS systems that use wrappers based on clock gating. The general architecture of such a wrapper circuit is shown in figure 1.

In this wrapper circuit, each locally synchronous module has a local clock. This clock is obtained by gating a separate external clock (eclk) signal according to requests from the port controllers. When a locally synchronous module enters the data communication phase, it informs the related port controller that it needs to communicate. The port controller then generates a gate request for the clock generator, and the external clock is gated. After the handshake completes, the clock is released. No handshaking is required between the asynchronous port controllers and the local clock generation circuit. As a result, these port controllers are simpler than pausible-clock-based port controllers.
Figure 1. Gated-clock-based GALS wrapper
The basic components of such gated-clock-based wrappers are:
1. Clock generator: The clock generator circuit of the gated-clock method is as simple as the usual clock gating shown in figure 2. It contains no asynchronous element. During the activity phase of the LS module, all of the gate signals are high. When the LS module enters the data communication phase, one or more of the gate signals go low and the clock is gated.
Figure 2. Clock gating in the gated-clock GALS wrapper
2. Port controller: This component interfaces the internal synchronous environment with the external asynchronous environment. Each port controller is activated by the Den signal, which is generated by the LS module. When the LS module needs data for the next clock cycle, it activates Den at the coming negative edge of the current clock pulse, as in the pausible-clock-based GALS scheme. After Den is activated, the port controller starts its work. As the first step, the external clock is gated to prevent metastability during the asynchronous data exchange.
3. Gate synchronizer: Gating of the eclk signal must happen no later than the next positive clock edge. This can be guaranteed by enforcing timing constraints on the GALS modules during synthesis.

4. Input latch: The input port stores the arriving valid data. Data validity is defined by the handshaking signals.

3. GALS Timings in Detail
Assume that 4-bit data has to be transmitted between two LS modules in a gated-clock GALS system via two wrappers of the same type, as shown in figure 3.
Figure 3. Structure of binding two wrappers
As described in [8], the sender follows this sequence to transmit its data to the receiver:
Den1 g1 Gate1 LCLK1 Rp [Ap]* Rp [Ap] g1 Gate1 LCLK1
*[ ] means that the wrapper waits until the transition is detected.
The receiver follows the dual sequence shown below:
Den2 g2 Gate2 LCLK2 [Rp] Ap [Rp] Ap g2 Gate2 LCLK2
As stated in [8], this process starts when the local clock switches to inactive mode, which avoids the metastability problem. As another solution, some designers leave primary outputs connected directly to FFs. According to figure 5, the input latch is activated by the positive edge of Ap, and the data is captured by the latch at that time. Note also that LS2 can store data only after its clock is enabled. The duration from the rise of LCLK2 to the load of LS2's FF is Tlclk2r_ff2.

4. GALS timing analysis
4.1. Sender Counterpart
As described before, two paths must be considered when analyzing the sender counterpart:
1. The control path, which produces the control signals.
2. The data path, which produces the data for transmission.
Based on the timings given in the last section, when the receiver is ready to receive the data, the transfer from sender to receiver takes:
Tsr = max(Tden1t_g1r + Tg1r_Gate1r + TGate1r_rpr, Tff1_b1 + Tb1_l) + Trpr_apr + Tapr_rpf + Trpf_apf + max(Tapf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2, Tl_b2 + Tb2_ff2)
Wrapper designers try to generate the control signals (such as Den1, g1 and Gate1) as soon as possible after the local clock falls [5].
On the other hand, as stated before, designers leave the data path free so that it produces data as soon as possible, for example by connecting primary outputs directly to FFs, in order to avoid metastability problems. In some cases, extra optimization is used to shorten the control path, such as generating Rp from the g signal instead of the Gate signal.

4.2. Receiver Counterpart
When analyzing the receiver counterpart, the data and control paths must again be considered individually. The timing is:
max(Tden2t_g2r + Tg2r_Gate2r + TGate2r_rpr, max(Tden1t_g1r + Tg1r_Gate1r + TGate1r_lclk1f + TGate1r_rpr, Tff1_b1 + Tb1_l)) + Trpr_apr + Tapr_rpf + Trpf_apf + max(Tapf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2, Tl_b2 + Tb2_ff2)
If the receiver is ready to receive data, its wrapper produces the g2 and Gate2 sequence upon sensing a positive edge on Rp. As illustrated in figure 5, the data path contains Tl_b2 and Tb2_ff2. As in the sender counterpart, the data path is left free to avoid metastability problems. If the receiver is assumed to be ready in advance, the former expression reduces to the following:
max(Tden1t_g1r + Tg1r_Gate1r + TGate1r_lclk1f + TGate1r_rpr, Tff1_b1 + Tb1_l) + Trpr_apr + Tapr_rpf + Trpf_apf + max(Tapf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2, Tl_b2 + Tb2_ff2)
Because these two delays have different sources, the rest of the paper does not consider the receiver's readiness timing in detail. As in the sender counterpart, some optimizations can be applied to the generation of these control signals. One example is releasing the clock based on the Rp signal instead of Ap. In this case, the propagation delay of the control path at the receiver counterpart is:
Tapr_rpf + Trpf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2
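The timing expressions of Sections 4.1 and 4.2 are easier to compare when written as functions. The sketch below evaluates them exactly as printed. It reads the r/f suffixes as rise/fall, and every delay value is made up for illustration. The two receiver-side expressions differ only in whether the receiver's own Den2, g2, Gate2 branch enters the first max(), so that difference is exposed as the hypothetical flag `include_readiness`.

```python
from typing import Dict

Delays = Dict[str, int]  # T<from>_<to> delay table in ps (values are hypothetical)


def handshake(t: Delays) -> int:
    # Rp rise -> Ap rise -> Rp fall -> Ap fall: the core of the 4-phase handshake
    return t["Trpr_apr"] + t["Tapr_rpf"] + t["Trpf_apf"]


def receiver_tail(t: Delays) -> int:
    # After Ap falls, the receiver's clock-release (control) path races its
    # latch-to-FF data path; the slower of the two ends the transfer.
    control = (t["Tapf_g2f"] + t["Tg2f_Gate2f"]
               + t["TGate2f_lclk2r"] + t["Tlclk2r_ff2"])
    data = t["Tl_b2"] + t["Tb2_ff2"]
    return max(control, data)


def t_sr(t: Delays) -> int:
    """Sender-side transfer time Tsr (Section 4.1), receiver already ready."""
    control = t["Tden1t_g1r"] + t["Tg1r_Gate1r"] + t["TGate1r_rpr"]
    data = t["Tff1_b1"] + t["Tb1_l"]
    return max(control, data) + handshake(t) + receiver_tail(t)


def t_receiver(t: Delays, include_readiness: bool = True) -> int:
    """Receiver-side expression (Section 4.2).

    include_readiness=True adds the receiver's own Den2 -> g2 -> Gate2 -> Rp
    branch to the first max(); False drops it (the reduced expression).
    """
    sender = max(t["Tden1t_g1r"] + t["Tg1r_Gate1r"]
                 + t["TGate1r_lclk1f"] + t["TGate1r_rpr"],
                 t["Tff1_b1"] + t["Tb1_l"])
    start = sender
    if include_readiness:
        start = max(t["Tden2t_g2r"] + t["Tg2r_Gate2r"] + t["TGate2r_rpr"], sender)
    return start + handshake(t) + receiver_tail(t)


# Made-up delays (ps), chosen only to exercise the formulas.
DELAYS: Delays = {
    "Tden1t_g1r": 60, "Tg1r_Gate1r": 40, "TGate1r_rpr": 50, "TGate1r_lclk1f": 30,
    "Tff1_b1": 20, "Tb1_l": 30,
    "Trpr_apr": 80, "Tapr_rpf": 70, "Trpf_apf": 80,
    "Tapf_g2f": 60, "Tg2f_Gate2f": 40, "TGate2f_lclk2r": 50, "Tlclk2r_ff2": 30,
    "Tl_b2": 20, "Tb2_ff2": 20,
    "Tden2t_g2r": 100, "Tg2r_Gate2r": 90, "TGate2r_rpr": 60,
}

if __name__ == "__main__":
    print("Tsr:", t_sr(DELAYS))
    print("receiver, readiness included:", t_receiver(DELAYS))
    print("receiver, assumed ready:", t_receiver(DELAYS, include_readiness=False))
```

With these invented numbers, the data branches (50 ps on the sender side, 40 ps on the receiver side) never dominate a max(). This is the situation the text describes as a data path that is left free.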

5. Retiming in Gated-clock GALS systems
A key point not considered before is the use of data-path gaps. These gaps are left free to avoid metastability. By using them safely, each LS module can offload some of the combinational logic it would otherwise have to handle. As shown in the last section, there are two timing gaps during asynchronous communication in a gated-clock GALS system.
Sender's gap: The first timing gap is the time between the sender's last FF and the input latch in the receiver's wrapper. This duration contains Tff1_b1 and Tb1_l. The latch can store data only after its control line is enabled by Ap, which takes Half_lck1_period + Tffden1_toggle + Tden1t_g1r + Tg1r_rpr + Trpr_apr time units. Therefore, the data must be ready no later than Half_lck1_period + Tffden1_toggle + Tden1t_g1r + Tg1r_rpr + Trpr_apr - Tlatch_setup time units.
Receiver's gap: The second timing gap is the time from the wrapper's input latch to the receiver's FF. The control line is ready Tapr_rpf + Trpf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2 time units after the rise of Ap. The data passes through Tl_b2 + Tb2_ff2, which is usually a simple wire with no combinational delay. This path must still respect the hold time of the input latch and the hold time of LS2's FFs.
Moving some combinational logic into these unused gaps frees an LS module from handling that logic, so designers deal with less combinational circuitry inside each module. This can shorten the critical path of each counterpart. A shorter critical path allows a higher frequency and hence better performance.
The retiming algorithm, as described in Section 2, repositions the sequential elements of a circuit. Equivalently, if the sequential elements are regarded as fixed, it moves combinational parts across them.
Theorem 1. In a retimeable sequential synchronous circuit that satisfies the retiming constraints D1, W1 and W2, and has no combinational path between its primary inputs and its primary outputs, the retiming algorithm can reposition FFs so that each primary input connects directly to at least one FF.
Proof. Two results follow from the retiming edge-weight formula for an edge $e$ from $u$ to $v$:
$w'(e) = w(e) + r(v) - r(u)$
Result 1: Consider a retimeable sequential synchronous circuit in which node V is assigned R=1 and all other nodes are assigned R=0. If this is a legal retiming of the graph, applying the retiming algorithm places at least one FF on every edge ending in node V. This procedure is shown in figure 4.
Figure 4. An example for Result 1: a) before retiming; b) after retiming
Result 2: Consider a retimeable sequential synchronous circuit containing a path of connected nodes that are all assigned R=1, called a ones path. If this is a legal retiming of the graph, applying the retiming algorithm moves one FF from the end of the ones path to its beginning. Because the retiming is assumed legal, the nodes located next to the ones path already have at least one FF on the connecting edge before the algorithm runs. This procedure is shown in figure 5.
Figure 5. An example for Result 2: a) before retiming; b) after retiming
FF repositioning algorithm. To reposition FFs so that each primary input connects directly to an FF, we assign the R vector to the nodes of the graph according to the following algorithm:
Algorithm RA-PI: R vector assignment for primary inputs
1. For all nodes in the graph, initialize R(v) to zero.
2. For all nodes connected to primary inputs (through an extra dummy node), set R = 1.
3. For all nodes reached by a combinational path starting at the nodes of step 2, set R = 1.
With R values set by RA-PI, Results 1 and 2 imply that executing the retiming algorithm on the graph yields a circuit in which every primary input is connected to at least one FF. Finally, it must be proved that this R vector is a legal retiming of the graph. As mentioned before, a legal retiming must satisfy the following two constraints:
$r(u) - r(v) \le W(u, v) - 1$ for all vertices $u, v \in V$ such that $D(u, v) > c$
$r(u) - r(v) \le w(e)$ for every edge $u \xrightarrow{e} v$ of $G$
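RA-PI, its mirror RA-PO and the edge-weight formula can be sketched in a few lines. This is a minimal sketch under stated assumptions. The circuit graph is a list of (u, v, w) edges, where w is the FF count. Primary inputs and outputs are modelled by two dummy nodes named `PI` and `PO`; these names and all helper functions are hypothetical. "Combinational path" is read as a path of zero-weight edges. Only the edge constraint $r(u) - r(v) \le w(e)$ (equivalently $w'(e) \ge 0$) is checked. The clock-period constraint involving $D(u, v)$ is left out, as in the proof.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]   # (u, v, w): w = number of FFs on edge u -> v
PI, PO = "PI", "PO"           # hypothetical dummy source and sink nodes


def _nodes(edges: List[Edge]) -> set:
    return {n for u, v, _ in edges for n in (u, v)}


def ra_pi(edges: List[Edge]) -> Dict[str, int]:
    """RA-PI: R=1 for every node fed by PI or reached from such a node
    through zero-weight (purely combinational) edges; R=0 elsewhere."""
    r = {n: 0 for n in _nodes(edges)}
    succ = defaultdict(list)
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
    queue = deque(v for u, v, _ in edges if u == PI)
    while queue:
        n = queue.popleft()
        if n == PO:
            raise ValueError("RA-PI reached PO through combinational edges")
        if r[n] == 0:
            r[n] = 1
            queue.extend(succ[n])
    return r


def ra_po(edges: List[Edge]) -> Dict[str, int]:
    """RA-PO: start from R=1 everywhere and clear R on nodes feeding PO and
    on every node reaching them through zero-weight edges."""
    r = {n: 1 for n in _nodes(edges)}
    pred = defaultdict(list)
    for u, v, w in edges:
        if w == 0:
            pred[v].append(u)
    queue = deque(u for u, v, _ in edges if v == PO)
    while queue:
        n = queue.popleft()
        if n == PI:
            raise ValueError("RA-PO reached PI through combinational edges")
        if r[n] == 1:
            r[n] = 0
            queue.extend(pred[n])
    return r


def retime(edges: List[Edge], r: Dict[str, int]) -> List[Edge]:
    """Apply w'(e) = w(e) + r(v) - r(u) and check the edge constraint."""
    out = [(u, v, w + r[v] - r[u]) for u, v, w in edges]
    if any(w < 0 for _, _, w in out):
        raise ValueError("illegal retiming: negative edge weight")
    return out


def make_io_independent(edges: List[Edge]) -> List[Edge]:
    """RA-PI followed by RA-PO, as in Step 1 of Algorithm 1."""
    after_pi = retime(edges, ra_pi(edges))
    return retime(after_pi, ra_po(after_pi))
```

Consider the chain PI→a→b→c→d→PO with two FFs on b→c and none elsewhere. RA-PI moves one FF onto the PI edge, and RA-PO then moves the other onto the PO edge. With only one FF on the chain, the second step would strip the PI edge again. Such a circuit is not IO-independent in the sense defined below.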

The edge constraint, $r(u) - r(v) \le w(e)$, is examined in four cases:
1. Two adjacent R=1 nodes: W(e) does not change, so the graph still meets the constraint, as it did before the algorithm was applied.
2. Two adjacent R=0 nodes: likewise, W(e) does not change and the constraint remains satisfied.
3. An R=1 node connected to an R=0 node: as stated in the RA-PI algorithm, the algorithm keeps setting R=1 until it reaches an edge where W(e) > 0. Since retiming uses integer values, that edge has W(e) >= 1. Here $r(u) - r(v) = 1 \le W(e)$, so the constraint is met.
4. An R=0 node connected to an R=1 node: this situation occurs only at combinational nodes directly connected to primary inputs. It does not violate the constraint, because $r(u) - r(v) = -1$ and retiming requires $W(e) \ge 0$, so the inequality holds.
The clock-period constraint involving $D(u, v)$ concerns timing and is not considered at this point. Hence, RA-PI produces a legal retiming vector that repositions FFs so that every primary input is connected directly to at least one FF.
Theorem 2. In a retimeable sequential synchronous circuit that satisfies the retiming constraints D1, W1 and W2, and has no combinational path between its primary inputs and primary outputs, the retiming algorithm can reposition FFs so that each primary output connects directly to at least one FF.
A similar proof can be given for connecting primary outputs directly to FFs, using a different R-value assignment. A suitable assignment algorithm is shown below:
Algorithm RA-PO: R vector assignment for primary outputs
1. For all nodes in the graph, initialize R(v) to one.
2. For all nodes connected to primary outputs (through an extra dummy node), set R = 0.
3. For all nodes that have combinational paths ending in the nodes of step 2, set R = 0.
As with Theorem 1, it follows that RA-PO produces a legal retiming vector that repositions FFs so that every primary output is connected directly to at least one FF.
Definition: A circuit is IO-independent if running both the RA-PI and RA-PO algorithms yields a graph in which every primary input and every primary output is connected to at least one FF.
To retime an IO-independent LS module according to its asynchronous communication, the following steps are performed:
ALGORITHM 1 - GALS retiming
Step 1: (making the circuit IO-independent)
1.1 Run the RA-PI algorithm.
1.2 Run the RA-PO algorithm.
Step 2: (filling IO gaps)
2.1 Output gaps: Initialize all R values to zero. Set a node's R value to one if it has a combinational path to a PO whose combinational delay is less than the assumed output gap (in the backward direction). Calculate the new edge weights using the retiming formula.
2.2 Input gaps: Initialize all R values to one. Set a node's R value to zero if it has a combinational path from a PI whose combinational delay is less than the assumed input gap (in the forward direction). Calculate the new edge weights using the retiming formula.
Step 3: (retiming core logic)
3.1 Extract the core logic by omitting all combinational logic connected to primary IOs.
3.2 Retime the reduced graph using the original retiming algorithm to achieve maximum frequency.
Step 4: Reattach the combinational circuits connected to primary IOs to the optimized core logic.

7. Experimental results
To demonstrate the effectiveness of the proposed algorithm, two cascaded 3-stage pipelined 16N digital filters were used as a locally synchronous block. A TSMC 0.18 µm technology and the Synopsys synthesis tool were used to synthesize the logic circuit.
First, wrappers with output and input controllers were implemented to determine the sender and receiver gaps as follows:
Sender: Half_lck1_period + Tffden1_toggle + Tden1t_g1r + Tg1r_rpr + Trpr_apr = Half_lck1_period + 412 ps
Receiver: Tapr_rpf + Trpf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2 = 492 ps
Then, the circuit was retimed with the original retiming algorithm to the lowest possible critical-path delay, which was 825 ps. The output of the proposed algorithm was optimized to a critical-path delay of 642 ps. This shows a 23.2% improvement for the selected digital filter.
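The reported figures can be checked with a few lines of arithmetic. The sketch below assumes that operating frequency is the reciprocal of the critical-path delay. For the sender gap it also assumes a latch setup time of 50 ps and takes the 642 ps critical path as the LCLK1 period; both values are invented for illustration and are not given in the text. Under these assumptions, the reported delays correspond to a critical path about 22.2% shorter and a frequency about 28.5% higher. Neither figure equals the quoted 23.2%, so that percentage may have been computed on a different basis than the one assumed here.

```python
# Figures reported in Section 7, in picoseconds.
SENDER_FIXED_PS = 412   # Tffden1_toggle + Tden1t_g1r + Tg1r_rpr + Trpr_apr
RECEIVER_GAP_PS = 492   # Tapr_rpf + Trpf_g2f + Tg2f_Gate2f + TGate2f_lclk2r + Tlclk2r_ff2
BASELINE_CP_PS = 825    # critical path reported after conventional retiming
PROPOSED_CP_PS = 642    # critical path reported after the proposed algorithm


def sender_gap_ps(lclk1_period_ps: float, latch_setup_ps: float) -> float:
    """Latest data arrival at the receiver's input latch (Section 5 sender gap)."""
    return lclk1_period_ps / 2 + SENDER_FIXED_PS - latch_setup_ps


def delay_reduction(before_ps: float, after_ps: float) -> float:
    """Fractional shortening of the critical path."""
    return (before_ps - after_ps) / before_ps


def frequency_gain(before_ps: float, after_ps: float) -> float:
    """Fractional frequency increase, assuming f = 1 / critical-path delay."""
    return before_ps / after_ps - 1


if __name__ == "__main__":
    # Hypothetical: LCLK1 period = 642 ps, latch setup time = 50 ps.
    print(f"sender gap:      {sender_gap_ps(PROPOSED_CP_PS, 50):.0f} ps")
    print(f"receiver gap:    {RECEIVER_GAP_PS} ps")
    print(f"delay reduction: {delay_reduction(BASELINE_CP_PS, PROPOSED_CP_PS):.1%}")
    print(f"frequency gain:  {frequency_gain(BASELINE_CP_PS, PROPOSED_CP_PS):.1%}")
```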

8. Conclusion
Retiming the LS modules of a gated-clock GALS system according to their asynchronous communication behavior has not been investigated before. Timing analysis showed that the asynchronous communication of a gated-clock GALS system contains timing gaps that are left free to avoid metastability problems. The proposed algorithm uses these gaps to host some combinational circuitry without encountering metastability problems. Moving this circuitry toward the boundary areas leads to a simpler core circuit with less combinational logic and the same number of pipeline stages inside the LS module. Consequently, the retimed circuit can run at higher frequencies and execute algorithms in a shorter time. Although this timing analysis has been done for gated and pausible clock wrappers, a related analysis of interblock retiming is possible as well.

10. References
[1] Chapiro, D.M. Globally Asynchronous Locally Synchronous Systems. PhD Thesis, Stanford University.
[2] Gurkaynak, F.K., and Oetiker, S. Is there hope for GALS in the future? 4th ACiD Workshop of the European Commission's Fifth Framework Programme, (Jun 2004).
[3] Gurkaynak, F.K., and Oetiker, S. On the GALS Design Methodology of ETH Zurich. FMGALS Workshop at the 12th International FME Symposium, (Sep 2003).
[4] Seizovic, J. Pipeline synchronization. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, (Nov 1994).
[5] Muttersbach, J., and Villiger, T. Practical Design of Globally-Asynchronous Locally-Synchronous Systems. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, (Apr 2000).
[6] Yun, K., and Donohue, R.P. Pausible clocking: A first step toward heterogeneous systems. In Proceedings of the International Conference on Computer Design (ICCD), (Oct 1996).
[7] Bormann, D., and Cheung, P. Asynchronous wrapper for heterogeneous systems. In Proceedings of the International Conference on Computer Design (ICCD), (Oct 1997).
[8] Amini, E., Najibi, M., and Pedram, H. Globally Asynchronous Locally Synchronous Wrapper Circuits based on Clock Gating. In Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI), (Mar 2006).
[9] Amini, E., Najibi, M., and Pedram, H. Automatic Generation of Pausible Clock Based GALS Wrapper Circuit. In Proceedings of the 11th International CSI Computer Conference, (Jan 2006).
[10] Leiserson, C., and Saxe, J. Retiming Synchronous Circuitry. Algorithmica, Vol. 6, (1991).
[11] De Micheli, G. Synthesis and Optimization of Digital Circuits. McGraw-Hill.
[12] Baumgartner, J. Min-Area Retiming on Flexible Circuit Structures. In Proceedings of the IEEE International Conference ICCAD, (2001).
[13] Hsu, Y.-L., and Wang, S.-J. Retiming-based logic synthesis for low-power. In Proceedings of the International Symposium on Low Power Electronics and Design, (2002).
[14] Dey, S., Potkonjak, M., and Rothweiler, S. G. Performance Optimization of Sequential Circuits by Eliminating Retiming Bottlenecks. In Proceedings of the IEEE International Conference ICCAD, (1992).
[15] Farrokhi, S., and Sedighi, M. Improving the Retiming Synchronous Circuitry Algorithm to Increase Clock Speed. Journal of Computer Science and Engineering, no. 3, (Fall 2003).
[16] Sparsø, J., and Furber, S. Principles of Asynchronous Circuit Design - A Systems Perspective. Kluwer Academic Publishers.


More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

An Approach for Integrating Basic Retiming and Software Pipelining

An Approach for Integrating Basic Retiming and Software Pipelining An Approach for Integrating Basic Retiming and Software Pipelining Noureddine Chabini Department of Electrical and Computer Engineering Royal Military College of Canada PB 7000 Station Forces Kingston

More information

Study of GALS based FPGA Architecture Using CAD Tool

Study of GALS based FPGA Architecture Using CAD Tool Study of GALS based FPGA Architecture Using CAD Tool Savitha Devaraj Department of Electronics Engineering Lokmanya Tilak College of Engineering, Navi Mumbai, Maharashtra, India Neeta Gargote Department

More information

ISSN Vol.08,Issue.07, July-2016, Pages:

ISSN Vol.08,Issue.07, July-2016, Pages: ISSN 2348 2370 Vol.08,Issue.07, July-2016, Pages:1312-1317 www.ijatir.org Low Power Asynchronous Domino Logic Pipeline Strategy Using Synchronization Logic Gates H. NASEEMA BEGUM PG Scholar, Dept of ECE,

More information

Modeling Asynchronous Communication at Different Levels of Abstraction Using SystemC

Modeling Asynchronous Communication at Different Levels of Abstraction Using SystemC Modeling Asynchronous Communication at Different Levels of Abstraction Using SystemC Shankar Mahadevan Inst. for Informatics and Mathematical Modeling Tech. Univ. of Denmark (DTU) Lyngby, Denmark sm@imm.dtu.dk

More information

Synchronization In Digital Systems

Synchronization In Digital Systems 2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) (2011) IACSIT Press, Singapore Synchronization In Digital Systems Ranjani.M. Narasimhamurthy Lecturer, Dr. Ambedkar

More information

A Transistor-Level Placement Tool for Asynchronous Circuits

A Transistor-Level Placement Tool for Asynchronous Circuits A Transistor-Level Placement Tool for Asynchronous Circuits M Salehi, H Pedram, M Saheb Zamani, M Naderi, N Araghi Department of Computer Engineering, Amirkabir University of Technology 424, Hafez Ave,

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams

Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams Daniel Gomez-Prado Dusung Kim Maciej Ciesielski Emmanuel Boutillon 2 University of Massachusetts Amherst, USA. {dgomezpr,ciesiel,dukim}@ecs.umass.edu

More information

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Volume 4 Issue 01 Pages-4786-4792 January-2016 ISSN (e): 2321-7545 Website: http://ijsae.in 16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Authors Channa.sravya

More information

Asynchronous Design By Conversion: Converting Synchronous Circuits into Asynchronous Ones

Asynchronous Design By Conversion: Converting Synchronous Circuits into Asynchronous Ones Asynchronous Design By onversion: onverting Synchronous ircuits into Asynchronous Ones Alex Branover, Rakefet Kol and Ran Ginosar VLSI Systems Research enter, Technion Israel Institute of Technology, Haifa

More information

Globally-asynchronous, Locally-synchronous Wrapper Configurations For

Globally-asynchronous, Locally-synchronous Wrapper Configurations For University of Central Florida Electronic Theses and Dissertations Masters Thesis (Open Access) Globally-asynchronous, Locally-synchronous Wrapper Configurations For 2004 Akarsh Ravi University of Central

More information

Implementing Synchronous Counter using Data Mining Techniques

Implementing Synchronous Counter using Data Mining Techniques Implementing Synchronous Counter using Data Mining Techniques Sangeetha S Assistant Professor,Department of Computer Science and Engineering, B.N.M Institute of Technology, Bangalore, Karnataka, India

More information

A New Optimal State Assignment Technique for Partial Scan Designs

A New Optimal State Assignment Technique for Partial Scan Designs A New Optimal State Assignment Technique for Partial Scan Designs Sungju Park, Saeyang Yang and Sangwook Cho The state assignment for a finite state machine greatly affects the delay, area, and testabilities

More information

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays Wave-Pipelining the Global Interconnect to Reduce the Associated Delays Jabulani Nyathi, Ray Robert Rydberg III and Jose G. Delgado-Frias Washington State University School of EECS Pullman, Washington,

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

Delay Estimation for Technology Independent Synthesis

Delay Estimation for Technology Independent Synthesis Delay Estimation for Technology Independent Synthesis Yutaka TAMIYA FUJITSU LABORATORIES LTD. 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, JAPAN, 211-88 Tel: +81-44-754-2663 Fax: +81-44-754-2664 E-mail:

More information

Introduction to Asynchronous Circuits and Systems

Introduction to Asynchronous Circuits and Systems RCIM Presentation Introduction to Asynchronous Circuits and Systems Kristofer Perta April 02 / 2004 University of Windsor Computer and Electrical Engineering Dept. Presentation Outline Section - Introduction

More information

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC 181 POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC R.Yamini, V.Kavitha, S.Sarmila, Anila Ramachandran,, Assistant Professor, ECE Dept, M.E Student, M.E. Student, M.E. Student Sri Eshwar

More information

the main limitations of the work is that wiring increases with 1. INTRODUCTION

the main limitations of the work is that wiring increases with 1. INTRODUCTION Design of Low Power Speculative Han-Carlson Adder S.Sangeetha II ME - VLSI Design, Akshaya College of Engineering and Technology, Coimbatore sangeethasoctober@gmail.com S.Kamatchi Assistant Professor,

More information

Functional Test Generation for Delay Faults in Combinational Circuits

Functional Test Generation for Delay Faults in Combinational Circuits Functional Test Generation for Delay Faults in Combinational Circuits Irith Pomeranz and Sudhakar M. Reddy + Electrical and Computer Engineering Department University of Iowa Iowa City, IA 52242 Abstract

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET4076) Lecture 8 (1) Delay Test (Chapter 12) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Learning aims Define a path delay fault

More information

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects

More information

A Module Diagnosis and Design-for-Debug Methodology Based on Hierarchical Test Paths

A Module Diagnosis and Design-for-Debug Methodology Based on Hierarchical Test Paths A Diagnosis and Design-for-Debug Methodology ased on Hierarchical Test s

More information

Sequential Logic Synthesis

Sequential Logic Synthesis Sequential Logic Synthesis Logic Circuits Design Seminars WS2010/2011, Lecture 9 Ing. Petr Fišer, Ph.D. Department of Digital Design Faculty of Information Technology Czech Technical University in Prague

More information

Floorplan considering interconnection between different clock domains

Floorplan considering interconnection between different clock domains Proceedings of the 11th WSEAS International Conference on CIRCUITS, Agios Nikolaos, Crete Island, Greece, July 23-25, 2007 115 Floorplan considering interconnection between different clock domains Linkai

More information

Optimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation. Computer Science Department Columbia University

Optimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation. Computer Science Department Columbia University Optimization of Robust Asynchronous ircuits by Local Input ompleteness Relaxation heoljoo Jeong Steven M. Nowick omputer Science Department olumbia University Outline 1. Introduction 2. Background: Hazard

More information

Latch Based Design (1A) Young Won Lim 2/18/15

Latch Based Design (1A) Young Won Lim 2/18/15 Latch Based Design (1A) Copyright (c) 2015 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any

More information

Lossless Compression using Efficient Encoding of Bitmasks

Lossless Compression using Efficient Encoding of Bitmasks Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

TEMPLATE BASED ASYNCHRONOUS DESIGN

TEMPLATE BASED ASYNCHRONOUS DESIGN TEMPLATE BASED ASYNCHRONOUS DESIGN By Recep Ozgur Ozdag A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the

More information

Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units

Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units Euiseok Kim, Hiroshi Saito Jeong-Gun Lee Dong-Ik Lee Hiroshi Nakamura Takashi Nanya Dependable

More information

Design of Asynchronous Interconnect Network for SoC

Design of Asynchronous Interconnect Network for SoC Final Report for ECE 6770 Project Design of Asynchronous Interconnect Network for SoC Hosuk Han 1 han@ece.utah.edu Junbok You jyou@ece.utah.edu May 12, 2007 1 Team leader Contents 1 Introduction 1 2 Project

More information

Monotonic Static CMOS and Dual V T Technology

Monotonic Static CMOS and Dual V T Technology Monotonic Static CMOS and Dual V T Technology Tyler Thorp, Gin Yee and Carl Sechen Department of Electrical Engineering University of Wasngton, Seattle, WA 98195 {thorp,gsyee,sechen}@twolf.ee.wasngton.edu

More information

DESIGN OF 2-D FILTERS USING A PARALLEL PROCESSOR ARCHITECTURE. Nelson L. Passos Robert P. Light Virgil Andronache Edwin H.-M. Sha

DESIGN OF 2-D FILTERS USING A PARALLEL PROCESSOR ARCHITECTURE. Nelson L. Passos Robert P. Light Virgil Andronache Edwin H.-M. Sha DESIGN OF -D FILTERS USING A PARALLEL PROCESSOR ARCHITECTURE Nelson L. Passos Robert P. Light Virgil Andronache Edwin H.-M. Sha Midwestern State University University of Notre Dame Wichita Falls, TX 76308

More information

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS Noureddine Chabini 1 and Wayne Wolf 2 1 Department of Electrical and Computer Engineering, Royal Military College

More information

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS International Journal of Computing Academic Research (IJCAR) ISSN 2305-9184 Volume 2, Number 4 (August 2013), pp. 140-146 MEACSE Publications http://www.meacse.org/ijcar DESIGN AND IMPLEMENTATION OF VLSI

More information

Low Power System-on-Chip Design Chapters 3-4

Low Power System-on-Chip Design Chapters 3-4 1 Low Power System-on-Chip Design Chapters 3-4 Tomasz Patyk 2 Chapter 3: Multi-Voltage Design Challenges in Multi-Voltage Designs Voltage Scaling Interfaces Timing Issues in Multi-Voltage Designs Power

More information

Low Power PLAs. Reginaldo Tavares, Michel Berkelaar, Jochen Jess. Information and Communication Systems Section, Eindhoven University of Technology,

Low Power PLAs. Reginaldo Tavares, Michel Berkelaar, Jochen Jess. Information and Communication Systems Section, Eindhoven University of Technology, Low Power PLAs Reginaldo Tavares, Michel Berkelaar, Jochen Jess Information and Communication Systems Section, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands {regi,michel,jess}@ics.ele.tue.nl

More information

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Scott C. Smith University of Missouri Rolla, Department of Electrical and Computer Engineering

More information

FPGA for Software Engineers

FPGA for Software Engineers FPGA for Software Engineers Course Description This course closes the gap between hardware and software engineers by providing the software engineer all the necessary FPGA concepts and terms. The course

More information

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany

More information

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction Rakhi S 1, PremanandaB.S 2, Mihir Narayan Mohanty 3 1 Atria Institute of Technology, 2 East Point College of Engineering &Technology,

More information

SDR Forum Technical Conference 2007

SDR Forum Technical Conference 2007 THE APPLICATION OF A NOVEL ADAPTIVE DYNAMIC VOLTAGE SCALING SCHEME TO SOFTWARE DEFINED RADIO Craig Dolwin (Toshiba Research Europe Ltd, Bristol, UK, craig.dolwin@toshiba-trel.com) ABSTRACT This paper presents

More information

Reconfigurable PLL for Digital System

Reconfigurable PLL for Digital System International Journal of Engineering Research and Technology. ISSN 0974-3154 Volume 6, Number 3 (2013), pp. 285-291 International Research Publication House http://www.irphouse.com Reconfigurable PLL for

More information

A Deterministic Globally Asynchronous Locally Synchronous Microprocessor Architecture

A Deterministic Globally Asynchronous Locally Synchronous Microprocessor Architecture A Deterministic Globally Asynchronous Locally Synchronous Microprocessor Architecture Matthew Heath and Ian Harris University of Massachusetts Amherst {mheath, harris}@ecs.umass.edu Abstract This paper

More information

Retiming and Clock Scheduling for Digital Circuit Optimization

Retiming and Clock Scheduling for Digital Circuit Optimization 184 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Retiming and Clock Scheduling for Digital Circuit Optimization Xun Liu, Student Member,

More information

PROOFS Fault Simulation Algorithm

PROOFS Fault Simulation Algorithm PROOFS Fault Simulation Algorithm Pratap S.Prasad Dept. of Electrical and Computer Engineering Auburn University, Auburn, AL prasaps@auburn.edu Term Paper for ELEC 7250 (Spring 2005) Abstract This paper

More information

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction

More information

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 04, 2015 ISSN (online): 2321-0613 Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

More information

Implementation of Asynchronous Topology using SAPTL

Implementation of Asynchronous Topology using SAPTL Implementation of Asynchronous Topology using SAPTL NARESH NAGULA *, S. V. DEVIKA **, SK. KHAMURUDDEEN *** *(senior software Engineer & Technical Lead, Xilinx India) ** (Associate Professor, Department

More information

Parallel and Distributed VHDL Simulation

Parallel and Distributed VHDL Simulation Parallel and Distributed VHDL Simulation Dragos Lungeanu Deptartment of Computer Science University of Iowa C.J. chard Shi Department of Electrical Engineering University of Washington Abstract This paper

More information

MOST computations used in applications, such as multimedia

MOST computations used in applications, such as multimedia IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 2005 1023 Pipelining With Common Operands for Power-Efficient Linear Systems Daehong Kim, Member, IEEE, Dongwan

More information

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic A Novel Design of High Speed and Area Efficient De-Multiplexer Using Pass Transistor Logic K.Ravi PG Scholar(VLSI), P.Vijaya Kumari, M.Tech Assistant Professor T.Ravichandra Babu, Ph.D Associate Professor

More information

Quasi Delay-Insensitive High Speed Two-Phase Protocol Asynchronous Wrapper for Network on Chips

Quasi Delay-Insensitive High Speed Two-Phase Protocol Asynchronous Wrapper for Network on Chips Guan XG, Tong XY, Yang YT. Quasi delay-insensitive high speed two-phase protocol asynchronous wrapper for network on chips. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 25(5): 1092 1100 Sept. 2010. DOI 10.1007/s11390-010-1086-3

More information

On Designs of Radix Converters Using Arithmetic Decompositions

On Designs of Radix Converters Using Arithmetic Decompositions On Designs of Radix Converters Using Arithmetic Decompositions Yukihiro Iguchi 1 Tsutomu Sasao Munehiro Matsuura 1 Dept. of Computer Science, Meiji University, Kawasaki 1-51, Japan Dept. of Computer Science

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

TIMA Lab. Research Reports

TIMA Lab. Research Reports ISSN 1292-862 TIMA Lab. Research Reports TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France Session 1.2 - Hop Topics for SoC Design Asynchronous System Design Prof. Marc RENAUDIN TIMA, Grenoble,

More information

X(1) X. X(k) DFF PI1 FF PI2 PI3 PI1 FF PI2 PI3

X(1) X. X(k) DFF PI1 FF PI2 PI3 PI1 FF PI2 PI3 Partial Scan Design Methods Based on Internally Balanced Structure Tomoya TAKASAKI Tomoo INOUE Hideo FUJIWARA Graduate School of Information Science, Nara Institute of Science and Technology 8916-5 Takayama-cho,

More information

LabVIEW Based Embedded Design [First Report]

LabVIEW Based Embedded Design [First Report] LabVIEW Based Embedded Design [First Report] Sadia Malik Ram Rajagopal Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 malik@ece.utexas.edu ram.rajagopal@ni.com

More information

Pooja Kawale* et al ISSN: [IJESAT] [International Journal of Engineering Science & Advanced Technology] Volume-6, Issue-3,

Pooja Kawale* et al ISSN: [IJESAT] [International Journal of Engineering Science & Advanced Technology] Volume-6, Issue-3, Pooja Kawale* et al ISSN: 2250-3676 [IJESAT] [International Journal of Engineering Science & Advanced Technology] Volume-6, Issue-3, 161-165 Design of AMBA Based AHB2APB Bridge Ms. Pooja Kawale Student

More information

RTL Power Estimation and Optimization

RTL Power Estimation and Optimization Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h)

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h) HAI ZHOU Electrical and Computer Engineering Northwestern University 2535 Happy Hollow Rd. Evanston, IL 60208-3118 Glenview, IL 60025 haizhou@ece.nwu.edu www.ece.nwu.edu/~haizhou (847) 491-4155 (o) (847)

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier U.V.N.S.Suhitha Student Department of ECE, BVC College of Engineering, AP, India. Abstract: The ever growing need for improved

More information

Clock Skew Optimization Considering Complicated Power Modes

Clock Skew Optimization Considering Complicated Power Modes Clock Skew Optimization Considering Complicated Power Modes Chiao-Ling Lung 1,2, Zi-Yi Zeng 1, Chung-Han Chou 1, Shih-Chieh Chang 1 National Tsing-Hua University, HsinChu, Taiwan 1 Industrial Technology

More information

High Performance Asynchronous Circuit Design Method and Application

High Performance Asynchronous Circuit Design Method and Application High Performance Asynchronous Circuit Design Method and Application Charlie Brej School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK. cbrej@cs.man.ac.uk Abstract

More information

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch.

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch. High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch. Martin Collier School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland. email address: collierm@eeng.dcu.ie

More information