Quasi Delay-Insensitive High Speed Two-Phase Protocol Asynchronous Wrapper for Network on Chips

Guan XG, Tong XY, Yang YT. Quasi delay-insensitive high speed two-phase protocol asynchronous wrapper for network on chips. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 25(5): Sept. 2010. DOI /s

Xu-Guang Guan, Student Member, IEEE, Xing-Yuan Tong, and Yin-Tang Yang

Institute of Microelectronics, Xidian University, Xi'an, China

guanxuguang 5@126.com; mayxt@126.com; yangyt@xidian.edu.cn

Received July 13, 2009; revised June 17,

Abstract To address the low speed and high power consumption of the asynchronous wrappers used in conventional networks on chip, this paper proposes a quasi delay-insensitive, high-speed asynchronous wrapper that operates in two-phase mode. Metastability during data sampling is avoided by detecting the write/read signal and using it to stop the clock. The empty/full level of the registers is determined by detecting the pulse signal of the two-phase asynchronous register, which then controls the wrapper's sampling of input/output data. The sender and receiver wrappers are built from C elements and threshold gates, which ensure quasi delay-insensitive behavior and enhance robustness. Simulations under different process corners are carried out in SMIC 0.18 µm standard CMOS. The sender and receiver wrappers allow synchronous modules to work at 3.08 GHz and 2.98 GHz, respectively, with average dynamic power consumption of mW and mW. Their high throughput, low power, scalability and robustness make them a viable option for high-speed, low-power network-on-chip interconnects.

Keywords asynchronous wrapper, quasi delay-insensitive, network on chip (NoC), two-phase protocol, threshold gate

1 Introduction

As semiconductor technology scales down, more IP cores are integrated onto a single chip to implement more complex and efficient on-chip system solutions.
However, as the operating speed and scale of on-chip systems increase, conventional single-clock operation faces many challenges, such as poor module reusability, rising clock-tree power consumption, large clock-tree area, clock skew and EMI. These problems greatly increase the complexity of designing very deep submicron integrated circuits, so clock-related issues have become the crucial problems to solve in ultra-large-scale integrated circuits. To reduce clock power consumption and increase communication performance, extensive research has been conducted into network-on-chip (NoC) systems [1-3]. The NoC approach particularly suits communication-dominant on-chip systems. Asynchronous NoCs have been proposed to eliminate the clock from global communication [4-5], providing better power efficiency and higher modularity than synchronous NoCs.

Asynchronous circuits have several advantages over synchronous circuits for low-power design. The absence of a clock network is a substantial advantage: high-speed clock networks have been reported to account for as much as 70% of the total power consumption of a system [6]. In addition, asynchronous circuits have the equivalent of perfect clock gating: they become active when data arrive and automatically go idle when no data arrive, without any extra control logic. Synchronous modules therefore need asynchronous wrappers so that they can communicate with each other through asynchronous routers with high throughput and low power, which makes the asynchronous wrapper an important and difficult issue in NoC design. This paper presents a novel quasi delay-insensitive two-phase protocol asynchronous wrapper that offers high speed, low power and high scalability and can meet the requirements of high-performance network-on-chip interconnection.
Regular Paper. Supported by the National Natural Science Foundation of China under Grant Nos , , the National High-Tech Research and Development 863 Program of China under Grant Nos. 2009AA01Z258, 2009AA01Z260, and the National Science & Technology Important Project under Grant No. 2009ZX Springer Science + Business Media, LLC & Science Press, China

2 Network on Chips and Asynchronous Interconnection Handshake Protocol

The operation of a network on chip follows the data transmission mode of computer networks: data are delivered to the target module by route switching instead of over the shared bus of conventional bus-based architectures, which gives the network high concurrent transmission capacity and scalability. Because data transfer in an asynchronous on-chip network depends on handshake signals rather than on a clock, clock-related problems are eliminated and modularity is greatly enhanced. Although on-chip networks come in a variety of topologies, asynchronous wrappers can be reused directly regardless of topology changes because of their high reusability. The 2D mesh topology is widely used in networks on chip; Fig.1 shows its unit structure, which consists of a synchronous module, a stoppable clock, an asynchronous wrapper, an asynchronous router and an on-chip buffer.

Fig.1. Structure diagram of 2D mesh network on chip unit.

The four-phase protocol is widely used in conventional NoC designs [7-9] because it can effectively reuse existing synchronous units and, owing to its design simplicity, suits the function modules inside asynchronous routers. However, the communication between the asynchronous router and the asynchronous wrapper often becomes the bottleneck when burst-mode data transmission occurs; in other words, four-phase operation cannot satisfy the growing demand of large-scale communication in networks on chip. Two-phase operation is triggered at both the rising and the falling edges of the request signal, so it works much faster than four-phase operation [10]. The two-phase protocol is not well suited to function module design because of its high complexity; it is therefore preferable for communication between asynchronous routers.

In the four-phase protocol, as shown in Fig.2(a), the control signals toggle four times for each data item transmitted, which costs both time and power, so it is not suitable for high-speed, low-power asynchronous on-chip interconnects. Replacing the four-phase protocol with the two-phase protocol greatly increases communication speed, as shown in Fig.2(b): in two-phase operation, each transition of the request and acknowledge signals represents one data transfer. At the same request frequency, the throughput of two-phase protocol circuits can therefore reach twice that of four-phase protocol circuits, which makes the two-phase protocol better suited to high-speed transmission in networks on chip.

Fig.2. Four-phase and two-phase transmission protocols. (a) Four-phase handshake protocol. (b) Two-phase handshake protocol.

However, conventional two-phase transmission protocols mostly use single-rail signaling, which relies on delay matching to guarantee correct operation and avoid glitches at the outputs. This is where dual-rail operation shows its particular advantages. Dual-rail protocol circuits are also called quasi delay-insensitive circuits: the circuit is insensitive to variations of physical parameters, such as doping density fluctuations and temperature and voltage variations, provided that the wires at forks and merges are equal in length. Dual-rail circuits are therefore highly robust and suitable for low-voltage, high-speed on-chip applications. Combining the high-speed two-phase protocol with the robust dual-rail protocol yields two-phase dual-rail operation. Fig.3 shows the diagram of the two-phase dual-rail transmission protocol. If the
channel transmits the same data, the codes in different cycles are not always the same: the code in the present cycle depends on the data in the previous cycle. If the data in the present cycle differ from those in the previous cycle, the two-phase dual-rail encoding changes following the sequence ; if the data in the present and following cycles are the same and equal to 1, the encoding changes following the sequence ; and if they are the same and equal to 0, it changes following the sequence . The encoding scheme of the two-phase dual-rail protocol (see Fig.3) can be seen more clearly in Fig.4. Only one bit changes in each cycle: the data are carried by S1, while S2 is an accompanying bit indicating that adjacent cycles carry the same data. This is a non-return-to-zero (NRTZ) encoding in which each transition of one line represents one data transfer, so its time utilization is twice that of four-phase operation. When two-phase dual-rail asynchronous wrappers are placed around the synchronous modules, data signals can be converted to two-phase dual-rail data at high speed, enabling high-speed transmission between wrappers and routers and greatly improving system performance.

Fig.3. Diagram of two-phase dual-rail transmission protocol.

Fig.4. Waveform of two-phase dual-rail transmission protocol.

3 Specific Implementation of Two-Phase Quasi Delay-Insensitive Asynchronous Wrapper

An asynchronous wrapper converts synchronous signals into asynchronous signals of the corresponding transmission protocol and vice versa. Its design must meet three major targets: fewer metastable states, lower delay and lower power consumption. The asynchronous wrappers proposed recently in [11-13] are four-phase single-rail wrappers. Although single-rail designs can broadly reuse traditional synchronous units, they have unavoidable limitations such as delay matching, poor EMI immunity, extra control circuits and glitches. The proposed two-phase wrapper is insensitive to delay variations, works correctly without any delay-matching effort, and has strong EMI immunity.

3.1 Sender Two-Phase Wrapper

The circuit of the sender two-phase quasi delay-insensitive wrapper is shown in Fig.5. It automatically detects the write signal and converts the output data of the synchronous module to two-phase dual-rail output data while working correctly under delay variations.

Fig.5. Implementation of sender two-phase quasi delay-insensitive wrapper.

For simplicity, only a one-bit conversion is drawn here; wider conversions are obtained simply by widening the output buffer. The difficulty in designing the wrapper lies in detecting the full/empty state of the output buffer and controlling the clock module. The circuit in Fig.5 consists of three parts: the synchronous module, the two-phase asynchronous wrapper and the asynchronous output buffers. The wrapper is responsible for stopping and releasing the clock as well as for data sampling; it stops the stoppable clock module by detecting the write signal. When the write signal goes high, signal stretch goes high after just two gate delays. The processing element serves as the synchronous module; it sends data and tells the wrapper to write. The output buffers store the data and connect to the asynchronous routers. Unlike conventional four-phase registers, two-phase registers need XOR gates at both the input and the output to detect new data, as shown in the right half of Fig.5. Two flip-flops sample the synchronous data, and a pulse generator, shown in Fig.6, controls when the D flip-flops sample. The pulse generator consists of an XNOR gate, an XOR gate and a NOR gate. N1 and N2 come from the XOR gates at the register input and output, while the ack signal comes from the next two-phase register stage. When the codes at the input and output are unequal, i.e., N1 and N2 differ, the output of the XNOR gate goes low. If ack and N2 are at the same level, the output of the XOR gate is also low. The NOR gate then drives pulse high, producing a rising edge that triggers the D flip-flops to sample the incoming data. After the D flip-flops have sampled the incoming data, pulse goes low.
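The gate-level description above reduces to a single Boolean condition. The sketch below is a static, zero-delay Python model that assumes the wiring implied by the text: the XNOR gate compares N1 and N2, the XOR gate compares ack and N2, and the NOR gate combines the two into pulse. The function names are illustrative, and reading N1/N2 as the XOR of the two rails (the "phase" of the code) at the register input and output is our interpretation. The model checks that pulse is high exactly when a new code is waiting (N1 ≠ N2) and the next stage has acknowledged the current code (ack = N2), which matches the condition restated in the hazard discussion later in the paper.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    """1 when both inputs are equal."""
    return 1 - (a ^ b)


def nor(a: int, b: int) -> int:
    """1 only when both inputs are 0."""
    return 1 - (a | b)


def sender_pulse(n1: int, n2: int, ack: int) -> int:
    """Zero-delay model of the Fig.6 pulse generator (wiring assumed from the text).

    n1:  XOR of the rails at the register input (code waiting to be captured)
    n2:  XOR of the rails at the register output (code currently held)
    ack: acknowledgement from the next two-phase register stage
    """
    return nor(xnor(n1, n2), ack ^ n2)


if __name__ == "__main__":
    for n1, n2, ack in product((0, 1), repeat=3):
        pulse = sender_pulse(n1, n2, ack)
        # pulse = "new code waiting" AND "next stage has caught up"
        assert pulse == int(n1 != n2 and ack == n2)
        print(f"N1={n1} N2={n2} ack={ack} -> pulse={pulse}")
```

Because the model has no delays, it only captures when the rising edge is enabled; the falling edge in the real circuit comes from the flip-flops capturing the code, which makes N2 equal N1 and forces the XNOR output back high.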
In this way, each change in the data signal causes the D flip-flops to sample, avoiding the null (return-to-zero) cycles of four-phase wrappers and thus increasing throughput.

Fig.6. Pulse generation module.

The working procedure of the wrapper circuit in Fig.5 is as follows. The circuit is reset first. After reset, the output of the reverse C element is high, the output of the TH33 threshold gate (signal stretch) is low, and the stoppable clock module runs normally while the wrapper waits for the write signal from the synchronous module. When the synchronous module wants to send data, the write signal goes high while the output of the reverse C element stays high because stretch is low, so the output of AND gate #1 goes high. All three inputs of the TH33 gate are now high, which satisfies its threshold condition and drives its output high. The stoppable clock therefore pauses to keep the output data stable. Two cases are then distinguished according to the input data.

In the first case, the new data differ from the previous data, i.e., signal data and D1 are unequal. XOR gate #1 detects this and drives signal H high. Because stretch is also high, the output of AND gate #2 goes high, which enables the tri-state gate and lets the data pass to D1. The pulse generator detects the change in D1 and drives signal pulse high, which satisfies the reset condition of the TH33 gate, so stretch falls and the clock is released. At the same time, the rising edge of pulse makes D flip-flops #2 and #3 sample D1 and D2. The synchronous data have now been converted to two-phase dual-rail asynchronous data.

In the second case, the new data are the same as the previous data, i.e., signal data and D1 are equal. The output H of XOR gate #1 stays low, so the output of reverse C element #2 stays high. The output of AND gate #3 therefore goes high, which triggers D flip-flop #1 to invert its output.
Similarly, the pulse generator detects the change in D2 and drives signal pulse high, which makes the D flip-flops sample their inputs. This completes the second case.

The procedure above covers the "write slowly, read quickly" situation. The opposite situation, "write quickly, read slowly", resembles blocking in the asynchronous part. If the asynchronous part transmits slowly, the ack signal of the next stage does not reach the current stage immediately, so pulse cannot go high until ack arrives, and stretch stays high until pulse changes. In other words, the synchronous module waits until the asynchronous part is free, so flow control is exercised through signal pulse. Another common situation is that the output buffers become full. In this case ack differs from N2, so the TH33 output (stretch) stays high and the clock remains paused. Only when the output buffers are no longer full can pulse rise again and release the TH33 output, after which the stoppable clock resumes.
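The two cases of the sender's working procedure amount to the two-phase dual-rail encoding of Section 2: when the new bit differs from D1, D1 takes the new value; when the bit repeats, D2 toggles. The Python sketch below models only this data path at the behavioral level, with no clock, handshake or gate delays. The reset code (0, 0) and all function names are illustrative assumptions; the structure is the same as level-encoded dual-rail (LEDR) signaling. The helper phase() suggests why the registers carry XOR gates: the XOR of the two rails flips on every transfer, so comparing it at a register's input and output reveals whether a new code is waiting. The decoder reads the data from S1 (D1) alone, as the receiver wrapper of Section 3.2 does.

```python
from typing import Iterable, List, Tuple

Code = Tuple[int, int]  # (S1, S2) = (D1, D2): data rail, accompanying rail


def encode(bits: Iterable[int], reset: Code = (0, 0)) -> List[Code]:
    """Two-phase dual-rail encoding: exactly one rail toggles per transferred bit."""
    s1, s2 = reset
    codes: List[Code] = []
    for bit in bits:
        if bit != s1:
            s1 = bit   # case 1: new data differ from D1 -> D1 takes the new value
        else:
            s2 ^= 1    # case 2: repeated data -> D2 toggles
        codes.append((s1, s2))
    return codes


def decode(codes: Iterable[Code]) -> List[int]:
    """The data bit is carried by S1 alone."""
    return [s1 for s1, _ in codes]


def phase(code: Code) -> int:
    """XOR of the two rails; alternates 0/1 on successive transfers."""
    s1, s2 = code
    return s1 ^ s2


if __name__ == "__main__":
    data = [1, 1, 0, 0, 1]
    codes = encode(data)
    print(codes)  # [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
    assert decode(codes) == data
    prev = (0, 0)
    for code in codes:
        # exactly one rail changes per transfer, so the phase always flips
        assert (prev[0] ^ code[0]) + (prev[1] ^ code[1]) == 1
        assert phase(code) != phase(prev)
        prev = code
```

Because exactly one wire changes per item, a receiver never observes an intermediate code, which is what lets the scheme dispense with delay matching.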

Fig.7. Implementation of receiver two-phase quasi delay-insensitive wrapper.

3.2 Receiver Two-Phase Wrapper

The implementation of the receiver two-phase quasi delay-insensitive wrapper is shown in Fig.7. The receiver wrapper is simpler than the sender because only D1 carries data. It automatically detects the read signal and converts two-phase dual-rail data to synchronous data. Only a one-bit conversion is shown; wider conversions are obtained simply by widening the input buffer. The difficulty in designing this wrapper lies in detecting the full/empty state of the input buffer and controlling the stoppable clock module. The circuit in Fig.7 consists of three main parts. The processing element serves as the synchronous module; it receives data and tells the wrapper when to read. When the read signal goes high, signal stretch goes high after just two gate delays and the stoppable clock stops. Signal ack informs the previous stages that data sampling has finished.

To implement flow control at the receiver, the first two-phase register at the interface between the input buffers and the two-phase wrapper must be modified so that the wrapper's stretch signal controls the pulse generation module. The improved pulse generation module is shown in Fig.8(a). G1 and G2 are produced by the XOR gates at the input and output of the first-stage two-phase register, respectively. When new data arrive, G1 is inverted and the output of the XOR gate in the pulse generation module goes high, but the module must then wait for stretch before driving pulse high. In other words, the pace of the input buffer is set by stretch, which prevents the data of the previous cycle from being overwritten by new data, as Fig.8(b) shows.
When data arrive, G1 is low while G2 is high, so the output of the XOR gate and the inverted output of the TH22 gate (signal b) are both high. When the two-phase wrapper wants to read data, stretch goes high, so all three inputs of the AND gate in the pulse generation module are high and signal pulse rises. At the same time, the high stretch signal drives the inverted TH22 output low: after the delays of the TH22 gate and an inverter, signal b goes low, which pulls the AND gate output (pulse) low and ends the pulse period.

Fig.8. Pulse generation module with flow control and its timing diagram. (a) Pulse generation module with flow control function. (b) Timing diagram of pulse generation module.
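The flow-controlled pulse generator can be illustrated with a small step-by-step simulation. Fig.8(a) is not reproduced in this text, so the wiring below is an assumption consistent with the description: the AND gate combines the XOR output x = G1 XOR G2, stretch, and b, where b is the inverted output of a TH22 gate (a two-input C element) whose inputs are taken to be x and stretch. Each step evaluates pulse with the previous value of b before updating the TH22 gate, a crude stand-in for the TH22-plus-inverter delay that sets the pulse width. The class and function names are hypothetical.

```python
from typing import List, Tuple


class TH22:
    """Two-input C element: the output changes only when both inputs agree."""

    def __init__(self) -> None:
        self.out = 0  # reset state

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def receiver_pulse_trace(steps: List[Tuple[int, int, int]]) -> List[int]:
    """Simulate pulse for a sequence of (G1, G2, stretch) snapshots.

    Hypothetical wiring (assumed, not taken from Fig.8):
      x     = G1 XOR G2          (new code waiting in the first register)
      b     = NOT TH22(x, stretch)
      pulse = x AND stretch AND b
    """
    th22 = TH22()
    b = 1
    trace = []
    for g1, g2, stretch in steps:
        x = g1 ^ g2
        trace.append(x & stretch & b)       # uses b from the previous step
        b = 1 - th22.update(x, stretch)     # TH22 + inverter react afterwards
    return trace


if __name__ == "__main__":
    steps = [
        (0, 0, 0),  # reset, no new data
        (1, 0, 0),  # new data arrive; wrapper not reading yet -> no pulse
        (1, 0, 1),  # read requested, stretch rises -> pulse rises
        (1, 0, 1),  # b has fallen -> pulse falls
        (1, 1, 1),  # D flip-flops captured the code, G2 follows G1
        (1, 1, 0),  # stretch falls, TH22 resets, b returns high
        (0, 1, 0),  # next code waits for stretch (flow control)
        (0, 1, 1),  # stretch rises again -> next pulse
    ]
    print(receiver_pulse_trace(steps))  # [0, 0, 1, 0, 0, 0, 0, 1]
```

Under these assumptions the hysteresis of the TH22 gate is what limits the circuit to one pulse per read: b cannot return high until both the new-data indication and stretch have been withdrawn.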

The working procedure of the receiver two-phase quasi delay-insensitive wrapper in Fig.7 is as follows. The circuit is reset first. After reset, the outputs of the C element and the TH33 gate are both zero, and the stoppable clock module runs normally while the module waits for the read signal. When the synchronous module wants to read data, the read signal goes high, and the output of the AND gate goes high because the inverted output of the C element is high. The TH33 gate then meets its set condition, so stretch goes high and the clock stops. If new data are present at the input port, i.e., G1 and G2 differ, the pulse generator produces a pulse that makes the D flip-flops sample the input data. Because read stays high while the clock is suspended, the tri-state gate is enabled and the data pass through it into the synchronous module. Meanwhile, the high stretch and pulse signals bring the TH33 inputs to the reset condition, so stretch goes low and the stoppable clock module resumes. This completes one data sampling period.

For wider data, this scheme uses more gates than conventional single-rail designs. However, the two-phase dual-rail scheme has unique advantages over single-rail schemes: quasi delay-insensitivity makes the circuit immune to delay variations and strongly resistant to EMI. For on-chip communication, reliability is the first consideration. Frequent transmission errors incur extra cost for retransmission or error correction and place a heavy burden on the network, so a highly robust transmission scheme is necessary, and it is generally worth spending more components to achieve robustness. Another concern is the probability of hazards.
This analysis can be divided into two parts. The first part concerns the asynchronous register, whose pulse generation module is built from specific discrete gates to avoid hazards. A rising pulse edge is produced only when the new code differs from the current code and the current code equals the code of the next stage. Hence only two kinds of transitions can generate the rising edge: the arrival of a new code, or the arrival of the next stage's acknowledgement. The falling pulse edge is always generated by the capture of the new code in the D flip-flops, so this circuit is robust against delay variations. Only one timing relationship must be guaranteed: the minimum pulse width of the D flip-flops. This requirement is easily met, because the falling edge of the pulse is not generated until the D flip-flops have captured the new value and it has propagated through the pulse generation logic. The second part concerns the control signals of the wrapper. Unlike conventional Boolean gates, C elements and null convention logic (NCL) threshold gates are state-holding devices with input threshold characteristics. They are therefore insensitive to the order in which input signals arrive: the output changes only when all inputs have arrived or all have been removed, so no glitches can appear at the outputs. In short, the circuit is hazard-free.

4 Simulations and Analysis

The complete sender and receiver wrapper circuits are implemented in SMIC 0.18 µm standard technology. Fig.9 shows the SPICE simulation waveforms.

Fig.9. Simulation waveform of sender and receiver wrapper. (a) Waveform of sender two-phase wrapper. (b) Waveform of receiver two-phase wrapper.

In Fig.9(a), the data are 1 during the first half of the time, and the dual-rail two-phase output changes following the sequence ; during the second half the data are 0, and the output changes following the sequence . The sender therefore functions correctly. In Fig.9(b), the asynchronous two-phase input data change following the sequence during the first half of the time, and the sampled data are 0; during the second half the input data change following the sequence , and the sampled data follow the sequence . The receiver therefore also functions correctly.

To test the performance under different process conditions and the sensitivity to process variations, simulations were run with three process models (tt, ss, ff) at 27 °C; the results are shown in Table 1 and Table 2. Here, delay forward is defined as the delay from the rise of the read/write signal to the rise of stretch, and delay all as the delay from the rise of the read/write signal to the fall of stretch. P dynamic is the average dynamic power consumption of the circuit and P static its average static power consumption. Tables 1 and 2 show that the circuit works correctly with good performance across process corners and that process variations have little impact on it, which indicates good robustness. To further illustrate the merits of the proposed two-phase quasi delay-insensitive asynchronous wrapper, several wrappers are compared in terms of performance, delay sensitivity and operation mode in Table 3, where Throughput sender and Throughput receiver denote the throughput of the sender and receiver wrappers, respectively.
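Section 2 states that two-phase circuits can reach twice the throughput of four-phase circuits, and the comparison below attributes the proposed wrapper's advantage to operating on both signal edges. A back-of-the-envelope version of that argument, under the simplifying assumption (ours, not the paper's) that every handshake transition takes the same time t_hop, counts transitions per transferred data item:

```latex
\begin{aligned}
T_{4\phi} &= \frac{1}{4\,t_{\mathrm{hop}}}
  && \text{four transitions per item: } \mathrm{req}\!\uparrow,\ \mathrm{ack}\!\uparrow,\ \mathrm{req}\!\downarrow,\ \mathrm{ack}\!\downarrow \\
T_{2\phi} &= \frac{1}{2\,t_{\mathrm{hop}}} = 2\,T_{4\phi}
  && \text{two transitions per item: one data-rail transition, one ack transition}
\end{aligned}
```

The factor of two is an upper bound under this idealization; in the proposed wrapper, restoring the stoppable clock also takes time, so the measured gain is smaller.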
As Table 3 shows, the proposed wrapper achieves higher throughput than most conventional single-rail four-phase asynchronous wrappers because it operates on both edges of the signal. Although the comparisons are based on different technology processes, the throughput of the proposed design is close to that of the method in [11] and far exceeds those of the methods in [12-14]. Since a more advanced process would greatly increase circuit speed, the proposed wrapper is expected to perform even better in more advanced technologies. Because the performance of the wrapper in [11] is close to that of the proposed wrapper, we focus the comparison on these two. First, the wrapper of [11] may have timing constraints. The proposed wrapper, by contrast, is based on a stoppable clock scheme, so the synchronous module does not change its state until data sampling has finished: only after the asynchronous module has successfully sampled the synchronous data does it tell the stoppable clock generator to release the clock. The conversion circuit therefore has no timing constraints, which increases the robustness of the wrapper, at the cost of some throughput because restoring the clock signal takes time. Second, the wrapper in [11] needs K + 2 conversion stages to reach its maximum throughput, whereas the throughput of the proposed wrapper does not depend on the number of conversion stages and remains relatively stable. Moreover, the wrapper in [11] needs a multiplexer, a demultiplexer, a finite state machine and a Domino controller to control data sampling in the asynchronous part, so its area may be large: the wrapper of [11] (sender and receiver together) uses 97 gates, whereas the wrapper of this paper (sender and receiver together) uses 48 gates, only about half as many.
The wrapper in [11] thus uses more area to achieve better throughput.

Table 1. Performance Test Results of the Sender Wrapper (27 °C, 1.8 V)
Corner | Max clock frequency supported | Delay forward | Delay all | P dynamic | P static
ss | 2.47 GHz | ps | ns | mW @ 1.18 GHz | µW
tt | 3.08 GHz | ps | ps | mW @ 1.43 GHz | µW
ff | 3.78 GHz | ps | ps | mW @ 1.71 GHz | µW

Table 2. Performance Test Results of the Receiver Wrapper (27 °C, 1.8 V)
Corner | Max clock frequency supported | Delay forward | Delay all | P dynamic | P static
ss | 2.39 GHz | ps | ps | 1.43 mW @ 1.18 GHz | 2.98 µW
tt | 2.98 GHz | ps | ps | mW @ 1.43 GHz | nW
ff | 3.67 GHz | ps | ps | mW @ 1.71 GHz | nW

Table 3. Performance Comparisons of Different Asynchronous Wrappers
Method | Function | Sensitivity to delay | Throughput sender (GEvents/s) | Throughput receiver (GEvents/s)
Proposed | Synchronous to dual-rail two-phase | Quasi delay-insensitive | (180 nm) | (180 nm)
Method in [11] | Synchronous to single-rail four-phase | Delay sensitive | 2.39 (90 nm) | 1.5 (90 nm)
Method in [12] | Synchronous to single-rail four-phase | Delay sensitive | 0.25 (65 nm) | 0.3 (65 nm)
Method in [13] | Synchronous to single-rail four-phase | Delay sensitive | 0.52 (65 nm) | 0.71 (65 nm)
Method in [14] | Synchronous to single-rail four-phase | Delay sensitive | 0.18 (65 nm) | 0.22 (65 nm)

We also compare the performance, structure and overheads of the wrappers in [12-14]. The wrapper in [13] uses a FIFO (first in, first out) to increase buffering space, but this brings a series of drawbacks. Specifically, the FIFO often uses a dual-port RAM as its buffering space, which makes the area overhead much larger. Furthermore, the pointers used to detect the full/empty state of the FIFO, which prevent overflow and underflow, can limit the performance of the wrapper: comparing the read and write pointers is complex, and designers must add Gray encoding and decoding modules to convert the pointers. Although the wrapper in [13] changes the encoding scheme to optimize the design, it still cannot avoid comparing the read and write pointers, and the performance improvement is modest. Another problem is that when the FIFO is empty, the write pointer must be resynchronized by a conventional two-flip-flop synchronizer, since the empty state is detected in the synchronous domain. This delays the effective pointer increment by two clock cycles, which can directly hurt performance when the FIFO is small. In contrast, the proposed wrapper needs no FIFO, so read/write pointer comparisons are eliminated and its area overhead is much smaller.
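For comparison, the pointer logic that FIFO-based wrappers must carry can be sketched with the standard Gray-coded pointer technique used in asynchronous FIFOs. This is a generic textbook scheme, not the specific design of [12] or [13]; the function names and the depth-4 example are illustrative. Pointers are one bit wider than the address so that "full" and "empty" can be told apart, and successive Gray codes differ in a single bit, which is what makes them safe to pass through a two-flip-flop synchronizer.

```python
def bin_to_gray(b: int) -> int:
    """Binary -> reflected Gray code."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Reflected Gray code -> binary (prefix XOR of the bits)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def fifo_empty(wptr_gray: int, rptr_gray: int) -> bool:
    """Empty when the (addr_bits + 1)-bit Gray pointers are identical."""
    return wptr_gray == rptr_gray


def fifo_full(wptr_gray: int, rptr_gray: int, addr_bits: int) -> bool:
    """Full when the pointers differ only in their two most significant Gray bits."""
    top_two = 0b11 << (addr_bits - 1)
    return wptr_gray == rptr_gray ^ top_two


if __name__ == "__main__":
    ADDR_BITS = 2  # depth-4 FIFO, 3-bit pointers (illustrative)
    modulus = 2 ** (ADDR_BITS + 1)
    for n in range(modulus):
        g = bin_to_gray(n)
        assert gray_to_bin(g) == n
        nxt = bin_to_gray((n + 1) % modulus)
        assert bin(g ^ nxt).count("1") == 1  # one bit changes per increment
    # after 4 writes and no reads, the depth-4 FIFO is full, not empty
    w, r = bin_to_gray(4), bin_to_gray(0)
    print(fifo_full(w, r, ADDR_BITS), fifo_empty(w, r))  # True False
```

Every pointer crossing into the other clock domain needs this encode, synchronize and compare path, which is the overhead the FIFO-free wrapper avoids.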
Moreover, the wrapper in [13] has timing constraints: after a rising clock edge, the increment of the read pointer causes a race between the signal and the data, so delay matching is needed on the clock input of the Muller gate. The wrapper in [12] has nearly the same structure as the wrapper in [13] and the same drawbacks; it differs only in its token generation/consumption mode and a smaller FIFO depth. The wrapper in [14] also has no FIFO, so its area is smaller, but its performance is limited by the asynchronous delays in the stoppable clock and by clock-tree insertion constraints. Finally, the dual-rail operation mode used in this paper embeds the request signal in the data, which avoids complex control circuits and removes the need for delay matching. Its most striking merit, however, is quasi delay-insensitivity, which greatly enhances the robustness of the circuit and makes it better suited to transmission over long interconnects.

5 Conclusions

High-performance on-chip communication has long been an active research topic, and the emergence of networks on chip has made it even more so. High-speed point-to-point communication is a key problem in networks on chip. Because asynchronous circuits inherently offer high performance, request/acknowledge-based asynchronous communication is becoming increasingly popular in asynchronous networks on chip. To further improve the performance of point-to-point asynchronous communication, this paper proposes a quasi delay-insensitive, high-speed two-phase asynchronous wrapper for networks on chip. It converts data between synchronous and two-phase dual-rail asynchronous form, and it uses a stoppable clock scheme so that metastability does not occur.
The full/empty state of the registers is determined by the pulse signals of the two-phase registers, which in turn control the wrapper's sampling of input and output data. The sender and receiver wrappers operate in a quasi delay-insensitive mode, which improves robustness. The wrapper is implemented in SMIC 0.18 µm standard CMOS technology and works correctly and robustly across process corners. It offers high speed, low power, quasi delay-insensitivity and low design complexity, making it suitable for high-performance, highly robust on-chip networks. Future work will focus on increasing the operating speed of the wrapper and improving its portability. We hope this work contributes to the development of high-speed, low-power asynchronous interconnects for networks on chip.

References

[1] Dally W J, Towles B. Route packets, not wires: On-chip interconnection networks. In Proc. 38th ACM Conf. Design Automation, Las Vegas, Nevada, Jun. 2001, pp.
[2] Benini L, Micheli G D. Networks on chips: A new SoC paradigm. Computer, 2002, 35(1):
[3] Wang J L, Xue Y B, Wang H X, Li C M, Wang D S. CCNoC: Cache-coherent network on chip for chip multiprocessors. J. Comput. Sci. & Technol., 2010, 25(2):
[4] Bainbridge J, Furber S B. CHAIN: A delay-insensitive chip area interconnect. IEEE Micro, 2002, 22(5):
[5] Lines A. Asynchronous interconnect for synchronous SoC design. IEEE Micro, 2004, 24(1):
[6] Geer D. Is it time for clockless chips? Computer, 2005, 38(3):
[7] Teehan P, Greenstreet M, Lemieux G. A survey and taxonomy of GALS design styles. IEEE Design & Test of Computers, 2007, 24(5):
[8] Sheibanyrad A, Greiner A, Miro-Panades I. Multisynchronous and fully asynchronous NoCs for GALS architectures. IEEE Design & Test of Computers, 2008, 25(6):
[9] Krstic M, Grass E, Gurkaynak F K, Vivet P. Globally asynchronous, locally synchronous circuits: Overview and outlook. IEEE Design & Test of Computers, 2007, 24(5):
[10] Dobkin R R, Ginosar R. Two-phase synchronization with subcycle latency. Integration, the VLSI Journal, 2009, 42(3):
[11] Sheibanyrad A, Greiner PA. Two efficient synchronous asynchronous converters well-suited for networks-on-chip in GALS architectures. Integration, the VLSI Journal, 2008, 41(1):
[12] Beigne E, Vivet P. Design of on-chip and off-chip interfaces for a GALS NoC architecture. In Proc. the 12th International Symposium on Advanced Research in Asynchronous Circuits and Systems, Grenoble, Mar. 2006, pp.
[13] Yvain T, Edith B, Pascal V. Design and implementation of a GALS adapter for ANoC based architectures. In Proc. the 15th International Symposium on Asynchronous Circuits and Systems, Chapel Hill, USA, May 17-20, 2009, pp.
[14] Beigne E, Clermidy F, Miermont S, Vivet P. Dynamic voltage and frequency scaling architecture for units integration within a GALS NoC. In Proc. the 2nd IEEE International Symposium on Networks-on-Chip, Newcastle Upon Tyne, UK, Apr. 7-11, 2008, pp.

Xu-Guang Guan received his B.S. degree from the Institute of Physics and Technology, Xidian University. He is currently working toward the M.D.-Ph.D. degree with the School of Microelectronics, Xidian University, China. He is a student member of IEEE. His research interests include asynchronous circuit design, networks on chip and VLSI design.

Xing-Yuan Tong received his B.S. degree from Guilin University of Electronic Technology. He is currently working toward the M.D.-Ph.D. degree with the School of Microelectronics, Xidian University, China. His research interests include VLSI design and A/D converters.

Yin-Tang Yang received his B.S. and M.S. degrees in microelectronics and solid state electronics from Xidian University, Xi'an, China, in 1982 and 1984, respectively, and his Ph.D. degree from Xi'an Jiaotong University. He is currently vice president and professor of Xidian University. His research interests include VLSI technology, new semiconductor materials and devices, and microelectronics reliability technology.


More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 5, Sep-Oct 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 5, Sep-Oct 2014 RESEARCH ARTICLE OPEN ACCESS A Survey on Efficient Low Power Asynchronous Pipeline Design Based on the Data Path Logic D. Nandhini 1, K. Kalirajan 2 ME 1 VLSI Design, Assistant Professor 2 Department of

More information

Encoding Scheme for Power Reduction in Network on Chip Links

Encoding Scheme for Power Reduction in Network on Chip Links RESEARCH ARICLE OPEN ACCESS Encoding Scheme for Power Reduction in Network on Chip Links Chetan S.Behere*, Somulu Gugulothu** *(Department of Electronics, YCCE, Nagpur-10 Email: chetanbehere@gmail.com)

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University Chapter 3 Top Level View of Computer Function and Interconnection Contents Computer Components Computer Function Interconnection Structures Bus Interconnection PCI 3-2 Program Concept Computer components

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Performance Evaluation of Elastic GALS Interfaces and Network Fabric

Performance Evaluation of Elastic GALS Interfaces and Network Fabric FMGALS 2007 Performance Evaluation of Elastic GALS Interfaces and Network Fabric Junbok You Yang Xu Hosuk Han Kenneth S. Stevens Electrical and Computer Engineering University of Utah Salt Lake City, U.S.A

More information

CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL

CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL Shyam Akashe 1, Ankit Srivastava 2, Sanjay Sharma 3 1 Research Scholar, Deptt. of Electronics & Comm. Engg., Thapar Univ.,

More information

Controller IP for a Low Cost FPGA Based USB Device Core

Controller IP for a Low Cost FPGA Based USB Device Core National Conference on Emerging Trends in VLSI, Embedded and Communication Systems-2013 17 Controller IP for a Low Cost FPGA Based USB Device Core N.V. Indrasena and Anitta Thomas Abstract--- In this paper

More information

A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip

A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip A Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless -on-chip Tobias Bjerregaard and Jens Sparsø Informatics and Mathematical Modelling Technical University of Denmark (DTU),

More information

Novel Intelligent I/O Architecture Eliminating the Bus Bottleneck

Novel Intelligent I/O Architecture Eliminating the Bus Bottleneck Novel Intelligent I/O Architecture Eliminating the Bus Bottleneck Volker Lindenstruth; lindenstruth@computer.org The continued increase in Internet throughput and the emergence of broadband access networks

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information