

774 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 5, MAY 2011

Data Encoding Schemes in Networks on Chip

Maurizio Palesi, Member, IEEE, Giuseppe Ascia, Fabrizio Fazzino, Member, IEEE, and Vincenzo Catania

Abstract: An ever more significant fraction of the overall power dissipation of a network-on-chip (NoC) based system-on-chip (SoC) is due to the interconnection system. In fact, as technology shrinks, the power contribution of NoC links starts to compete with that of NoC routers. In this paper, we propose the use of data encoding techniques as a viable way to reduce both power dissipation and energy consumption of NoC links. The proposed encoding scheme exploits the wormhole switching technique and works on an end-to-end basis. That is, flits are encoded by the network interface (NI) before they are injected into the network and are decoded by the destination NI. This makes the scheme transparent to the underlying network, since the encoder and decoder logic is integrated in the NI and no modification of the router architecture is required. We assess the proposed encoding scheme on a set of representative data streams (both synthetic and extracted from real applications), showing that it is possible to reduce the power contribution of both the self-switching activity and the coupling switching activity in inter-router links. As a result, we obtain a reduction in total power dissipation and energy consumption of up to 37% and 18%, respectively, without any significant degradation in terms of either performance or silicon area.

Index Terms: Coupling capacitance, data encoding, low power, network on chip (NoC), power analysis.

I. Introduction

As the number of cores integrated into a system-on-chip (SoC) increases, the role played by the interconnection system becomes more and more important.
The International Technology Roadmap for Semiconductors [1] depicts the on-chip communication issues as the limiting factors for performance and power consumption in current and next generation SoCs [2]. Design in the era of ultradeep submicron silicon is mainly dominated by issues concerning the communication infrastructure. As the design complexity increases, the total length of the interconnection wire increases, resulting in long transmission delay and higher power consumption. In addition, the distance between wires shrinks with technology, increasing coupling capacitance, and the height of the wire material increases, resulting in greater fringe capacitance [3]. While SoCs consisting of tens of cores were common in the last decade, common predictions foresee that the next generation of many-core SoCs will contain hundreds or thousands of cores [4]. In the many-core era, as the number of cores residing on the same SoC increases significantly, the communication solutions also need to change drastically in order to support the new inter-core communication demands.

Manuscript received April 16, 2010; revised September 14, 2010; accepted November 9, Date of current version April 20, This paper was recommended by Associate Editor D. Atienza. M. Palesi is with Kore University, Enna 94100, Italy (maurizio.palesi@unikore.it). G. Ascia and V. Catania are with the Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Università di Catania, Catania 95125, Italy (gascia@diit.unict.it; vcatania@diit.unict.it). F. Fazzino is with Icera, Inc., Bristol BS32 4AQ, U.K. (fazzino@icerasemi.com). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TCAD /$26.00 © 2011 IEEE
It is nowadays widely recognized that network-on-chip (NoC) architectures [5] represent the most viable solution to cope with the scalability issues of future many-core systems and to meet the performance, power, and reliability requirements which characterize future ambient intelligent applications. The importance of interconnects in complex many-core chips has outrun the importance of transistors as a dominant factor of performance, power, cost, and reliability [6], [7]. Sophisticated on-chip communication protocols, involving advanced adaptive routing algorithms, selection policies, data protection schemes, and mechanisms aimed at guaranteeing quality of service, are pushing the interconnect system to become one of the main elements which characterize the system in terms of both power dissipation and energy consumption. In fact, the advantages over bus-based architectures come at the cost of increased complexity, which makes the communication system one of the main elements of an SoC that strongly impact the cost, power, and performance figures of the overall system. For instance, in Intel's 80-tile TeraFLOPS processor [8] over 30% of the chip area is dedicated to the communication system, and the communication power accounts for about 28% of the total. In the Massachusetts Institute of Technology RAW chip [9] the NoC is responsible for 40% of the system power. In the Æthereal NoC the largest percentage of power dissipation (54%) is due to the NoC clock, followed by the NoC links (18%) [10]. In [11], it has been shown that on-chip interconnects account for a significant fraction (up to 50%) of the total on-chip energy consumption. The basic elements which form a NoC-based interconnect are network interfaces (NIs), routers, and links. As technology shrinks, the power dissipated by the links is as relevant as (or more relevant than) that dissipated by routers and NIs [12]–[15]. In this paper we focus on the power dissipated by network links.
Links dissipate power due to the switching activity (both self and coupling) induced by subsequent data patterns traversing the link [16]. We focus on data encoding schemes as a viable way to reduce the power dissipated by the network links. The basic idea is to suitably encode the data before their injection into the network in such a way as to reduce the switching activity of the links. Unlike previous approaches to data encoding in NoCs [16], [17], our proposal exploits the pipeline nature of the wormhole switching technique (commonly used in the NoC context) to implement an end-to-end encoding/decoding scheme. In our proposal data are encoded before transmission and are decoded at the destination. This makes the approach transparent with respect to the underlying NoC fabric, as it does not require any modification of the router architecture. It should be pointed out, however, that the proposed approach is intended for NoC architectures which do not use virtual channels (VCs). In fact, if VCs are used, the effectiveness of the proposed approach is reduced, as will be shown in the experimental section. In addition, although the proposed approach is specifically focused on reducing the power dissipated by network links, it does not conflict with other techniques which attack the power problem by acting on the other main elements of the interconnect. It could therefore be used in cooperation with other approaches to form a complete framework for power optimization of the NoC-based interconnection system.

The proposed data encoding schemes are assessed on a set of traffic scenarios, both synthetic and extracted from real applications. The analysis takes into consideration not only the power and energy saving due to the reduction of the switching activity in network links, but also the overhead (both in terms of power dissipation and silicon area) due to the encoding and decoding logic integrated into the NI. We show that up to 37% of power dissipation and up to 18% of energy consumption can be saved by adopting the proposed encoding schemes without impacting the overall performance of the network.

The rest of this paper is organized as follows. In Section II, we briefly discuss related research. An overview of the proposed data encoding scheme is presented in Section III. In Section IV, we perform a general quantitative analysis aimed at showing the power saving figures obtained using data encoding schemes when several parameters are made to vary.
The proposed data encoding scheme, along with a possible hardware implementation and its analysis, is presented in detail in Section V. The proposed data encoding scheme is assessed and compared with other approaches on a set of traffic scenarios, both synthetically generated and extracted from real applications, in Section VI. Finally, in Section VII we draw our conclusions and discuss possible future developments.

II. Related Work

The interconnection network dissipates a significant fraction of the total system power budget. For this reason, the design of power-efficient interconnection networks is today recognized as a key issue. There are several works in the literature which deal with power dissipation and energy consumption issues in NoC architectures. They differ either by the level of abstraction at which they operate or by the specific NoC element they focus on. Here we focus on the power dissipated by network links. Several techniques have been proposed in the literature to reduce the power dissipated by the links of a NoC [18]–[22]. In this section, we review the subset of them which use data encoding schemes as the main mechanism to reduce power dissipation.

Almost all the data encoding techniques proposed in the literature have been defined for bus-based architectures, with the primary goal of minimizing transition activity on buses while ignoring cross-coupled capacitance. The bus-invert method [23] can be applied to encode randomly distributed data patterns. Highly correlated access patterns exhibit spatial-temporal locality, which is exploited by Gray code [24], the T0 method [25], and working-zone encoding [26]. Application-specific approaches based on a priori knowledge of the traffic patterns have been proposed [27]–[29]. Other encoding techniques have been defined to take into consideration the contribution of cross-coupled capacitance [30], [31].

In the context of NoCs, Jantsch et al. [16] analyzed the use of partial bus-invert coding as a link-level low-power encoding technique, with the conclusion that it spends several times more power than no encoding at all if normalized for the same performance, which is done by adjusting supply voltage and frequency. However, unlike what we propose in this paper, they considered point-to-point encoding, in which every router in the NoC decodes the incoming flits and encodes the outgoing flits. In addition, [16] did not take advantage of the pipelined nature of the flow of flits through the links of the routing path, which is guaranteed by the wormhole switching technique generally used in NoCs. Conversely, the data encoding scheme proposed in this paper is designed to exploit the wormhole switching technique, making it possible to operate an end-to-end encoding which does not introduce any overhead in terms of routers and links. It only requires the upgrade of the network interface, which is augmented with the encoding/decoding logic, leaving the underlying network as is.

Pande et al. proposed the use of crosstalk avoidance codes (CAC) to improve the signal integrity by reducing the effective coupling capacitance and lowering the energy dissipation of wire segments [17]. By incorporating CAC in NoC data streams, the effective coupling capacitance of the inter-switch wire segments, and hence the communication energy, is reduced without incurring the non-optimal wire area overhead of shielding/spacing. However, its application requires redundant wires, and the encoding/decoding process is performed hop by hop for the header flit.

The data encoding schemes we present in this paper have already been introduced by the authors in [32] and [33]. In this paper, the proposed schemes are discussed in more detail and assessed by means of both a quantitative analysis and an experimental analysis.
Unlike [32] and [33], in which the analysis was carried out using synthetic traffic scenarios and without considering the interaction between concurrent communication flows (zero-load analysis), in this paper we extend the experimental analysis to real case studies using a cycle-accurate simulator in which the dynamic behavior of the NoC is modeled.

III. Overview of the Proposal

The general scheme of the proposed approach is depicted in Fig. 1. The basic idea is to apply an encoding technique end-to-end, taking advantage of the wormhole switching technique [34]. In fact, wormhole switching is the most suitable option for on-chip communication [35]. The rationale behind this idea is the pipeline nature of wormhole switching. Since all the links of the routing path are traversed by the same sequence of flits, the encoding decision taken at the network interface guarantees the same switching behavior in each link of the routing path.

Fig. 1. General scheme of the proposed approach.

As shown in Fig. 1, the NI is augmented with an encoder (E) and a decoder (D) block. With the exception of the header flit, the encoder encodes the outgoing flits of the packet in such a way as to minimize the power dissipated by the inter-router point-to-point links which form the routing path of the current packet. Since the routers are not equipped with any encoding/decoding logic, the header flit is not encoded, as it contains control information (destination address, packet size, and so on) which has to be processed by the routers along the routing path. Similarly, all the incoming flits in the network interface (with the exception of the header flit) are decoded by the decoder block.

It should be pointed out that the proposed scheme is designed to be applied to implementations without VCs. In fact, if VCs are used, the assumption that flits belonging to different packets are not interleaved in the same link is no longer valid. This does not mean that the proposed scheme cannot be applied in VC-based implementations but, rather, that the potential power savings are reduced.

Before describing the proposed encoding technique, in the next section we perform a general quantitative analysis which allows us to assess the achievable power reduction when the scheme outlined above is used.

IV. General Quantitative Analysis

In this section, we first define a general model to quantify the communication power saving that can be achieved using an end-to-end data encoding technique as sketched in Fig. 1. Then, we analyze the impact of several architectural and communication-related parameters on power saving. Finally, we summarize the results of this analysis.

A. Power Saving Estimation

Let us consider a packet of $n + 1$ flits, $pkt = \{b_H, b_1, b_2, \ldots, b_n\}$, where $b_H$ indicates the header flit and $b_i$, $i = 1, 2, \ldots, n$, the body flits. Let us suppose that a packet is transmitted from $PE_s$ to $PE_d$ involving $h$ hops (see Fig. 1), where the term hops refers to the number of links traversed and not to the number of routers traversed. Such a packet will pass through $h$ links and will be processed by $h + 1$ routers (from $R_0$ to $R_h$). The power dissipated to transmit the packet can be expressed as

$$P(pkt) = 2(n+1)P_{NI} + (h+1)P_R^{(H)} + (h+1)\,n\,P_R^{(B)} + h(n+1)P_L \tag{1}$$

where $P_R^{(H)}$ and $P_R^{(B)}$ indicate the power dissipated by the router when it routes a header flit and a body flit, respectively, $P_{NI}$ the power dissipated by the network interface, and $P_L$ the power dissipated to transmit a flit over a link.

Now, let us consider the case in which the NI encodes each flit of the packet (except the header flit) before transmission to the network and decodes each flit received from the network (except the header flit). In this case, the power dissipated to transmit the packet can be expressed as

$$\hat{P}(pkt) = 2P_{NI} + 2n\hat{P}_{NI} + (h+1)P_R^{(H)} + (h+1)\,n\,P_R^{(B)} + h(n+1)\hat{P}_L \tag{2}$$

where $\hat{P}_{NI}$ is the power dissipated by the NI augmented with the encoding/decoding logic and $\hat{P}_L$ is the power dissipated to transmit an encoded flit over a link. Let us indicate with $P_{ED}$ the power contribution of the encoding/decoding logic (we assume that the power dissipated by the encoder logic is equal to the power dissipated by the decoder logic). We can approximate $\hat{P}_{NI}$ as $P_{NI}$ plus the overhead due to the encoding/decoding logic

$$\hat{P}_{NI} \approx P_{NI} + P_{ED}. \tag{3}$$

Substituting (3) in (2) we have

$$\hat{P}(pkt) = 2P_{NI} + 2n(P_{NI} + P_{ED}) + (h+1)P_R^{(H)} \tag{4}$$
$$\qquad\qquad +\, (h+1)\,n\,P_R^{(B)} + h(n+1)\hat{P}_L. \tag{5}$$

The percentage reduction in power dissipation, $PR$, when the encoding technique is used is computed as

$$PR = 1 - \frac{\hat{P}(pkt)}{P(pkt)}. \tag{6}$$

Substituting (1) and (5) in (6) and performing some symbolic algebraic manipulations we obtain

$$PR = \frac{\dfrac{h\varepsilon}{\delta}(n+1)(1-\beta) - 2n\gamma}{2(n+1) + \varepsilon(h+1)(n+\alpha) + \dfrac{h\varepsilon}{\delta}(n+1)} \tag{7}$$

which expresses the percentage power reduction by means of the following relative parameters.

1) $\alpha \triangleq P_R^{(H)}/P_R^{(B)}$ is the header to body flit routing power ratio. This ratio is $\geq 1$, since routing a header flit involves more operations than routing a body flit (e.g., routing algorithm, selection policy, arbitration, and so on).
2) $\beta \triangleq \hat{P}_L/P_L$ is the link power reduction factor. It indicates the average reduction factor of link power dissipation when the encoding technique is used.
3) $\gamma \triangleq P_{ED}/P_{NI}$ is the amount of power dissipated by the encoder/decoder logic normalized to the power dissipated by the network interface.

4) $\delta \triangleq P_R^{(B)}/P_L$ is the ratio between the power dissipated by the router when it routes a body flit and the power dissipated to transmit a flit over a link.
5) $\varepsilon \triangleq P_R^{(B)}/P_{NI}$ is the ratio between the power dissipated by the router when it routes a body flit and the power dissipated by the network interface.

Equation (7) expresses the percentage reduction in power dissipation when the encoding technique is used as a function of packet size, distance from source to destination, and the relative parameters $\alpha$, $\beta$, $\gamma$, $\delta$, and $\varepsilon$.

B. Analysis and Discussion

To get a feel for the percentage reduction in power dissipation that can be achieved in practical cases, let us consider as baseline a router in which $\alpha = 1.04$ (i.e., the average power dissipated by the router when it processes the header flit is 4% higher than when it processes a body flit) and $\varepsilon = 1.08$ (i.e., the power dissipated by the router is 8% higher than that dissipated by the NI). The power values used to compute $\alpha$ and $\varepsilon$ have been captured from a power analysis of a bit router with input FIFO buffers of 4 flits and a NI with minimum buffering supporting OCP and AHB protocols. Additional details about the synthesis results can be found in Section V-E.

Fig. 2(a)–(d) shows a set of contour plots of the percentage power reduction for different values of the parameters $\beta$, $\gamma$, $\delta$, $h$, and $n$. Fig. 2(a) shows the contour plot of the percentage power reduction for different values of the link power reduction factor, $\beta$ (from 0% to 90%), and different power fractions of the encoding/decoding block, $\gamma$ (from 0% to 10%). As can be observed, the overall power reduction spans from 5% to 35%. The effect of hop count, $h$, is analyzed in Fig. 2(b). As can be observed, the effectiveness of data encoding increases as hop count increases [36].
Moreover, as the routing path length increases, the effectiveness of data encoding becomes less and less sensitive to the power overhead due to the encoding/decoding logic. The effect of packet size, $n$, is shown in Fig. 2(c). As can be observed, there is a $\gamma$ threshold located at about $\gamma_T = 0.04$. This value depends on the values of the remaining parameters considered in this analysis ($\beta = 0.7$, $\delta = 1$, $h = 10$). It is interesting to observe that below $\gamma_T$, an increase in packet size has a positive impact on power reduction. Conversely, above $\gamma_T$, an increase in packet size causes a reduction in power saving. Please note that such behavior is observed for small packet sizes. For packet sizes greater than six flits, the difference becomes negligible. Finally, Fig. 2(d) shows the effect of the router to link power ratio $\delta$. As expected, the larger the power contribution of the links relative to that of the routers, the larger the power saving that can be achieved.

C. Summary of the Analysis

Overall, the effectiveness of an encoding scheme strongly depends on several architectural and technological parameters as well as communication-related parameters (e.g., packet size, routing path length). To summarize the results of the above general quantitative analysis, we can state that the effectiveness of applying data encoding techniques for low power in the NoC context increases as: 1) hop count increases; 2) the power contribution of links becomes comparable with that of routers; and 3) packet size increases. It is expected that, due to the ever-growing bandwidth requirements of current and future applications, links will become wider and wider. In addition, the technological trend is pushing power consumption from logic to wiring. Based on this, the power contribution of links is expected to dominate that of routers. Moreover, as the number of cores increases, the NoC size increases as well, resulting in longer average path length.
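As a rough numerical companion to (1)–(7), the following Python sketch evaluates the percentage power reduction both from the closed form (7) and directly from (1), (5), and (6). The function and argument names are illustrative assumptions and do not come from the original work; only the formulas are taken from the text. The sample point combines the baseline $\alpha = 1.04$ and $\varepsilon = 1.08$ with one arbitrary choice of the remaining parameters.

```python
def power_reduction(n, h, alpha, beta, gamma, delta, eps):
    """Closed-form PR of (7): n body flits, h hops, relative parameters as in the text."""
    link_term = h * eps / delta * (n + 1)
    numerator = link_term * (1 - beta) - 2 * n * gamma
    denominator = 2 * (n + 1) + eps * (h + 1) * (n + alpha) + link_term
    return numerator / denominator


def power_reduction_direct(n, h, alpha, beta, gamma, delta, eps, p_ni=1.0):
    """PR from (1), (5), and (6), rebuilding absolute powers from the ratios."""
    p_rb = eps * p_ni        # router power for a body flit
    p_rh = alpha * p_rb      # router power for a header flit
    p_l = p_rb / delta       # link power for a plain flit
    p_l_enc = beta * p_l     # link power for an encoded flit
    p_ed = gamma * p_ni      # encoder/decoder overhead
    plain = (2 * (n + 1) * p_ni + (h + 1) * p_rh
             + (h + 1) * n * p_rb + h * (n + 1) * p_l)
    encoded = (2 * p_ni + 2 * n * (p_ni + p_ed) + (h + 1) * p_rh
               + (h + 1) * n * p_rb + h * (n + 1) * p_l_enc)
    return 1 - encoded / plain


if __name__ == "__main__":
    # Baseline alpha and eps from the text; the other values are one sample point.
    args = dict(n=4, h=10, alpha=1.04, beta=0.7, gamma=0.04, delta=1.0, eps=1.08)
    print(f"closed form: {power_reduction(**args):.4f}")
    print(f"direct:      {power_reduction_direct(**args):.4f}")
```

For this sample point both functions give a reduction of about 12.8%. Sweeping `h`, `n`, or `delta` in the same way is one simple way to reproduce the qualitative trends discussed above.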
Based on the above considerations, we believe that the use of data encoding techniques represents a viable solution to address low-power issues in NoC-based system architectures.

V. Proposed Encoding Scheme

In this section, we present the proposed encoding scheme, whose goal is to reduce the power dissipated by the point-to-point inter-router links of a NoC. Before discussing the proposed scheme, we briefly analyze the different contributions which determine the power dissipated by a link.

A. Power Model

The dynamic power consumed by the interconnects and drivers is given by

$$P = \left[T_{0\to1}(C_s + C_l) + T_c C_c\right] V_{dd}^2 F_{ck} \tag{8}$$

where $V_{dd}$ is the supply voltage, $F_{ck}$ is the clock frequency, $C_s$ is the self capacitance (which includes the parallel-plate capacitance and the fringe capacitance), $C_l$ is the load capacitance, and $C_c$ is the coupling capacitance. $T_{0\to1}$ and $T_c$ are the average number of effective transitions per cycle for $C_s$ and $C_c$, respectively. They are computed as follows. $T_{0\to1}$ counts the number of $0 \to 1$ transitions in the bus in two consecutive transmissions. $T_c$ counts the correlated switching between physically adjacent lines. Precisely, we can enumerate four types of coupling transitions as follows [30]. A Type I transition occurs when one of the lines switches while the other stays unchanged. In a Type II transition one line switches from low to high and the other from high to low. A Type III transition occurs when both lines switch simultaneously in the same direction. Finally, in a Type IV transition neither line switches. The effective switched capacitance varies from type to type. Thus, the coupling transition activity $T_c$ is a weighted sum of the different types of coupling transition contributions. We have

$$T_c = k_1 T_1 + k_2 T_2 + k_3 T_3 + k_4 T_4 \tag{9}$$

where the $T_i$, $i = 1, 2, 3, 4$, are the average number of transitions of type $i$ and the $k_i$ are weights. According to [30] we assume $k_1 = 1$, $k_2 = 2$, and $k_3 = k_4 = 0$.
That is, $k_1$ is taken as the reference for the other transition types. The effective capacitance in a Type II transition is usually twice that of a Type I transition. In a Type III transition, as both signals switch simultaneously, $C_c$ is not charged (here we assume that there is no misalignment between the two transitions). Finally, in a Type IV transition there is no dynamic charge distribution over $C_c$. Based on this, (8) can be expressed as follows:

$$P = \left[T_{0\to1}(C_s + C_l) + (T_1 + 2T_2)C_c\right] V_{dd}^2 F_{ck}. \tag{10}$$

In the next subsection, we present the proposed encoding scheme, whose primary goal is to minimize $T_1$ and $T_2$ and whose secondary goal is to minimize $T_{0\to1}$.

Fig. 2. Contour plot of the percentage reduction in power dissipation when the encoding technique is used and $\alpha = 1.04$, $\varepsilon = 1.08$. (a) $\delta = 1$, $h = 10$, $n = 4$. (b) $\beta = 0.7$, $\delta = 1$, $n = 4$. (c) $\beta = 0.7$, $\delta = 1$, $h = 10$. (d) $\beta = 0.7$, $h = 10$, $n = 4$.

TABLE I
How Transitions Mutate If Data Is Inverted

Normal                   Inverted
Type I                   Type I
Type II                  Type IV
Type III                 Type IV
Type IV ($T_4^{*}$)      Type III ($T_3$)
Type IV ($T_4^{**}$)     Type II ($T_2$)

Fig. 3. Flowchart to evaluate the invert condition (14) for link width greater than or equal to 8 bits.

B. Proposed Encoding Scheme

Looking at (8) and (9) we have

$$P \propto T_{0\to1} C_s + (k_1 T_1 + k_2 T_2 + k_3 T_3 + k_4 T_4) C_c. \tag{11}$$

If the data (from now on, the flit) is inverted, the link power consumption will be

$$P' \propto T'_{0\to1} C_s + (k_1 T'_1 + k_2 T'_2 + k_3 T'_3 + k_4 T'_4) C_c \tag{12}$$

where $T'_{0\to1}$, $T'_1$, $T'_2$, $T'_3$, and $T'_4$ indicate the self transition activity and the coupling transition activity of Types I, II, III, and IV, respectively, if the flit is inverted before being transmitted. It is simple to determine the relationship between the coupling transition activities if the flit is transmitted as is and the coupling transition activities if the flit is transmitted with its bits inverted. Table I reports, for each transition type, how it mutates if the flit is inverted. Data are organized as follows. The first bit is the value of the generic $i$th line of the link, whereas the second bit represents the value of the adjacent line (line $i + 1$ of the same link). For each partition, the first line represents the values at time $t - 1$, whereas the second line represents the values at time $t$. For instance, looking at the first partition, which reports Type I transitions, the first column indicates that, in time slot $t - 1$, lines $i$ and $i + 1$ of a link were 0 and 0, respectively, and in the next time slot $t$ they switch to 0 and 1, respectively.

As can be observed from Table I, Type I transitions remain Type I transitions if the flit is inverted. Type II and Type III transitions mutate into Type IV transitions if the flit is inverted. Type IV transitions mutate into either Type II or Type III transitions. In particular, the transitions indicated as $T_4^{*}$ in the table mutate into Type III transitions, whereas those indicated as $T_4^{**}$ mutate into Type II transitions. Similarly, it is simple to find that $T'_{0\to1} = T_{0\to0}$. Thus, (12) can be expressed as a function of $T_1$, $T_2$, $T_3$, $T_4^{*}$, and $T_4^{**}$ as

$$P' \propto T_{0\to0} C_s + \left[k_1 T_1 + k_2 T_4^{**} + k_3 T_4^{*} + k_4 (T_2 + T_3)\right] C_c. \tag{13}$$

It is convenient to invert the flit before transmission if $P > P'$. Taking (11) and (13) and considering, according to [30], $k_1 = 1$, $k_2 = 2$, $k_3 = k_4 = 0$, and $C_c/C_s = 4$, we obtain the following invert condition:

$$T_{0\to1} + 8T_2 > T_{0\to0} + 8T_4^{**}. \tag{14}$$

In conclusion, the proposed encoding scheme simply inverts the flit before its transmission if and only if the invert condition (14) is satisfied. In the next subsection, we assess the hardware implications of implementing this encoding scheme in the network interface of a NoC-based system.
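The invert decision can be illustrated with a small software model. The Python sketch below is only an assumption-laden illustration, not the hardware described in the paper: words are tuples of bits, the load capacitance is ignored as in (11), $C_c/C_s = 4$, and all function names are hypothetical. It classifies adjacent-line transitions into Types I to IV, evaluates the invert condition (14), and includes an exhaustive check that (14) agrees with a direct comparison of the power proxy of the plain and inverted flit.

```python
from itertools import product

CC_OVER_CS = 4  # coupling-to-self capacitance ratio assumed in the text


def pair_type(prev_pair, cur_pair):
    """Return the coupling transition type (1 to 4) for two adjacent lines."""
    first_switches = prev_pair[0] != cur_pair[0]
    second_switches = prev_pair[1] != cur_pair[1]
    if first_switches != second_switches:
        return 1  # Type I: exactly one line switches
    if not first_switches:
        return 4  # Type IV: neither line switches
    # Both lines switch: opposite directions give Type II, same direction Type III.
    return 2 if cur_pair[0] != cur_pair[1] else 3


def transition_counts(prev, cur):
    """Count T_{0->1}, T_{0->0}, T_1, T_2 and T_4** between two successive words."""
    t01 = sum(1 for p, c in zip(prev, cur) if (p, c) == (0, 1))
    t00 = sum(1 for p, c in zip(prev, cur) if (p, c) == (0, 0))
    t1 = t2 = t4_double_star = 0
    for i in range(len(prev) - 1):
        kind = pair_type(prev[i:i + 2], cur[i:i + 2])
        if kind == 1:
            t1 += 1
        elif kind == 2:
            t2 += 1
        elif kind == 4 and cur[i] != cur[i + 1]:
            t4_double_star += 1  # static 01 or 10 pair: Type II if inverted
    return t01, t00, t1, t2, t4_double_star


def link_cost(prev, cur):
    """Power proxy of (11) in units of C_s, with k1 = 1, k2 = 2, k3 = k4 = 0."""
    t01, _, t1, t2, _ = transition_counts(prev, cur)
    return t01 + CC_OVER_CS * (t1 + 2 * t2)


def should_invert(prev, cur):
    """Invert condition (14)."""
    t01, t00, _, t2, t4_double_star = transition_counts(prev, cur)
    return t01 + 2 * CC_OVER_CS * t2 > t00 + 2 * CC_OVER_CS * t4_double_star


def encode_flit(prev, cur):
    """Return (word placed on the link, inverted flag) for one flit."""
    if should_invert(prev, cur):
        return tuple(1 - b for b in cur), True
    return tuple(cur), False


def check_exhaustive(width):
    """True if (14) matches a direct cost comparison for every pair of words."""
    words = list(product((0, 1), repeat=width))
    return all(
        should_invert(p, c) == (link_cost(p, c) > link_cost(p, tuple(1 - b for b in c)))
        for p in words for c in words
    )
```

For instance, with the previously transmitted word `(0, 1, 0, 1)` and the new flit `(1, 0, 1, 0)`, every adjacent pair is a Type II transition, so `encode_flit` returns the inverted word `(0, 1, 0, 1)` together with `True`, and the link does not switch at all.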

C. Design of the Proposed Encoding Scheme

Looking again at the invert condition (14) and considering a link width less than or equal to 8 bits, if $T_2$ is greater than $T_4^{**}$ (the Type IV transitions which become Type II transitions when the flit is inverted) then the invert condition is satisfied, as $T_{0\to0}$ can be at most 8. Based on this, the flowchart shown in Fig. 3 can be considered a simple way to evaluate the invert condition. We use this algorithm as the basis for the implementation of the encoding logic. For link widths greater than 8 bits we found that the misprediction of the invert condition does not exceed 1.2% on average, as will be shown in the next subsection.

Let us consider a NoC with a link width of $w$ bits. We assume that the NI, which hosts the encoding logic, packs body flits in $w - 1$ bits. Fig. 4(a) shows the top-level view of the encoder. The $(w - 1)$-bit body flit is concatenated with a 0 bit and represents the first input of the encoder. The second input is the previously encoded body flit. The internal logic of the encoder block is sketched in Fig. 4(b). The $w - 1$ bits of the incoming body flit are indicated with $x_i$, $i = 0, 1, \ldots, w - 2$, whereas those of the previously encoded body flit are indicated with $y_i$. The $w$th bit of the previously encoded body flit is indicated with inv. This bit is used by the decoder to decide whether the received body flit has to be inverted (inv = 1) or left as is (inv = 0). The first level of the encoder determines the transition type. The four-input blocks $T_2$ and $T_4^{**}$ assert their output if $y_i y_{i+1} \to x_i x_{i+1}$ is a $T_2$ or a $T_4^{**}$ transition, respectively. The two-input blocks $T_{01}$ and $T_{00}$ assert their output if $y_i \to x_i$ is a $T_{0\to1}$ or a $T_{0\to0}$ transition, respectively. The blocks Ones in the second level are parallel counters which count the number of ones in their inputs. Finally, in the third level there is a set of parallel comparators.

Fig. 4. Encoder architecture. (a) Top level view. (b) Internal view of the encoder block (E).

D. Simplified Version of the Proposed Encoding Scheme

The invert condition (14) is the exact condition which determines whether the transmitted flit has to be inverted to reduce both the self switching activity and the coupling switching activity on the links traversed by the flit. Since the terms $T_2$ and $T_4^{**}$ are weighted by a factor of 8 with respect to $T_{0\to1}$ and $T_{0\to0}$, we can approximate the invert condition as

$$T_2 > T_4^{**}. \tag{15}$$

That is, the flit is inverted if (15) is satisfied. The architecture of the encoder implementing the simplified invert condition is shown in Fig. 5. On the one hand, as will be shown in the next subsection, the simplification of the invert condition results in a reduction of the area, power, and delay of the encoder, since the logic to evaluate the invert condition becomes simpler. On the other hand, the use of the simplified (or approximated) invert condition instead of the exact invert condition reduces the effectiveness of the encoding scheme, since for some combinations of successively transmitted flits (14) and (15) might conflict with each other.

Fig. 5. Architecture of the encoder implementing the simplified invert condition (15).

E. Logic Synthesis Results

The encoder and the decoder have been described in Verilog HDL at the RTL level, synthesized with Synopsys Design Compiler, and mapped onto a UMC 65 nm technology library. A clock speed of 700 MHz has been considered. Here, we compare the area, power, and timing figures of the proposed encoding scheme (SC) against the bus-invert (BI) coding [23], the coupling driven bus invert (CDBI) coding [30], and the forbidden pattern condition (FPC) codes [17], as they have the highest potential for power saving while still representing a feasible implementation for on-chip communication. Fig. 6 shows the percentage impact on silicon area and power dissipation of the NI due to the data encoding/decoding logic. The baseline NI has minimum buffering and supports OCP 2 and AHB protocols [37]. We assume a 32 bit link and, for each encoder type E, we consider four different versions named E4, E8, E16, and E32. In En, the link is partitioned into 32/n n-bit sub-links and the encoding scheme E is applied in parallel to each sub-link. There is only one instance of FPC, as it is specifically designed to work on 4 bit sub-links. As can be observed, the area overhead is below 8% for all the encoding schemes. With the exception of CDBI-32 and SC-32, the power overhead is below 6%. In the experimental section, we show that, in many cases, such power overhead is completely absorbed by the power saving in the NoC links. In most cases, whatever the delay introduced by the encoding logic, there will be no slowdown and the corresponding timing path can be easily ignored, since the encoding of the data can be pipelined and the inversion condition for the first chunk of data will be evaluated in parallel with the creation of the header flit.

Fig. 6. Percentage impact on silicon area and power dissipation of the network interface due to the data encoding/decoding logic.

TABLE II
Absolute Power Dissipation (mW) of the Different Elements of the NoC
Column headings: Router, NI, Link, FIFO, Arbiter, Crossbar, WHRT, Routing. The link value assumes 25% self switching activity and 25% coupling switching activity.

Table II shows the absolute power dissipated by the different elements of the NoC. The power analysis for the router and the NI has been carried out considering different packet sizes (from 2 to 16 flits) and random destinations. The power dissipated by the link strongly depends on the data traveling through it. The value in the table refers to the case in which the transfer determines 25% self switching activity and 25% coupling switching activity.

VI. Experiments

In this section, we assess the effectiveness of using data encoding techniques in NoC architectures. We restrict the analysis to the interconnect system components (i.e., links, routers, and network interfaces) without considering the power and energy contribution of the IP cores. This is not a limitation of the analysis, since the interconnect system absorbs an important fraction of the overall power budget of the entire system [8]. First, we analyze the effectiveness of different encoding schemes, in terms of both power and energy reduction, focusing on a single communication flow. Then, we perform a complete network analysis taking into account dynamic effects like congestion, blocking, multiplexing of packets, and so on. The analysis is performed on a set of data streams belonging to several media formats and under different synthetic traffic patterns. Finally, the section is closed with a real case study.

A. Zero-Load Analysis

In this analysis, we emphasize the energy improvement that can be obtained using the proposed data encoding schemes without considering any specific communication traffic. That is, we limit the analysis to a specific routing path from a source node, S, to a destination node, D. We assume only a communication flow from S to D without taking into consideration congestion and blocking issues due to the interaction of multiple concurrent communication flows. Such effects will be taken into account in the next subsection. The following parameters are used. The NoC is clocked at 700 MHz. The baseline NI (i.e., without the encoding/decoding logic) dissipates 5.3 mW. The average power dissipated by the wormhole-based router is 5.7 mW. An inter-router wire has a total capacitance of 592 fF/mm in a 65 nm UMC technology, of which about 80% is due to crosstalk. We assume 2 mm, 32 bit links and a packet size of 16 bytes (8 flits). We assess the different data encoding schemes on a set of data streams belonging to eight different media formats, namely ASCII text, PDF, gray scale image and true color image (both in BMP and JPEG formats), MP3 audio, and MPEG video. For each class, ten data streams are considered and average values are reported. Fig.
7 shows the percentage of power saving obtained with different data encoding schemes for several data streams as compared to the case in which no data encoding is used. Negative values mean that there has been an increase in average power dissipation. With the exception of Text and Pic BW bmp, for all the considered data streams almost all the data encoding schemes improve power dissipation. If we focus on SCS, for instance, average power saving ranges from 5% to 24% when we pass from SCS-32 to SCS-4. Smaller partitions result in more power saving because the estimation of the impact on power dissipation due to the inversion of the bits in a partition is more accurate for smaller partition sizes. Indeed, as partition size increases, it becomes more and more probable that sub-sequences of bits in the partition for which there is no crosstalk are inverted in favor of other sub-sequences in the partition which contribute more to the power dissipation.

It should be pointed out that the use of the considered encoding schemes increases the amount of traffic in the network, since one bit of information in each encoded word is sacrificed in favor of the inv bit. Such overhead increases with the number of sub-links into which the link is partitioned. Precisely, the overhead due to the inv bit(s) when encoder En (with E ∈ {BI, CDBI, SC} and n ∈ {8, 16, 32}) is used is 1/n. The impact of this overhead on energy consumption is analyzed in Fig. 8, which shows the percentage of energy saving obtained with different data encoding schemes for several data streams as compared to the case in which no data encoding is used. Although the average power dissipated by the encoding/decoding logic for SC and SCS is not negligible (see Fig. 6), the effectiveness of SC and SCS in reducing both the switching activity and the coupling switching activity is higher than that of the other approaches. That is, the power saving on links when SC or SCS is used counterbalances the power dissipated by the encoding/decoding logic more than in the other approaches. Thus, since the energy consumption is the area below the power curve, and the extension of the curve is the same for all the schemes using the same number of partitions, the energy saving for SC/SCS is higher than that of the other approaches.

Fig. 7. Percentage of power saving obtained with different data encoding schemes for several data streams.

Fig. 8. Percentage of energy saving obtained with different data encoding schemes for several data streams.

The above results have been obtained assuming single-hop communication. The impact of path length on both power saving and energy saving is shown in Fig. 9. For the sake of discussion, let us consider the results obtained under the music data stream. As can be observed, all the encoding schemes yield a power saving [Fig. 9(a)]. In particular, the proposed approaches together with FPC exhibit the highest power saving, up to 33%, 33%, and 36% for SC-4, SCS-4, and FPC, respectively. Average power savings above 20% are also observed when SC-8 and SCS-8 are used. Bus invert and CDBI do not pass the threshold of 17% in power saving. In terms of energy [Fig. 9(b)], only the proposed encoding schemes result in energy saving. In particular, SC/SCS in their 8-, 16-, and 32-bit configurations are effective whatever the number of hops, whereas SC/SCS-4 is effective starting from 2 hops.

B. High-Load Analysis

In this section, we compare the different encoding schemes using a cycle-accurate NoC simulator based on Noxim [38].
In this way, dynamic effects like congestion, blocking, and multiplexing of packets through the same link are taken into account, and a clearer picture of the effectiveness of data encoding can be provided. The power estimation models available in Noxim have been updated to take into account the power dissipated by the NIs (augmented with the encoding/decoding logic) and the power dissipated by the links due to both self-switching and coupling switching activity.

1) Energy Analysis: Let us start by analyzing the percentage energy reduction that can be achieved using data encoding schemes on an 8 × 8 NoC under several synthetic traffic patterns. We consider deterministic XY routing, input FIFO buffers of four flits, and packets of eight flits injected at different packet injection rates (pir). Energy figures are computed by running the simulation until 1 MB of traffic is drained by the network. Simulations are repeated for each pir value and energy values are averaged until the 95% confidence intervals are mostly within 2% of the means. Fig. 10 shows the percentage energy reduction achieved when several data encoding schemes are used for different pirs under bit-reversal traffic for 4-flit and 8-flit packets. Negative values mean an increase of energy consumption. As can be observed, the general trend is that the percentage energy reduction decreases as pir increases. This is expected, given the exponential relation between communication delay and pir: as the pir increases and approaches the saturation point, a small increment of the injected load causes a large increment of the communication delay. For a given pir, the amount of traffic injected in the network is higher for the mechanisms which use more partitions.
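The relation between the number of partitions and the injected traffic can be made concrete with a little arithmetic. The following Python sketch assumes, as stated for the En encoders, a 32-bit link split into 32/n sub-links of n bits, each of which gives up one bit to the inv flag. The function names and the 1024-bit payload are illustrative assumptions, and header flits are ignored for simplicity.

```python
import math

LINK_WIDTH_BITS = 32  # link width assumed in the text


def payload_bits_per_flit(sublink_bits: int, link_bits: int = LINK_WIDTH_BITS) -> int:
    """Data bits left in one flit after reserving one inv bit per sub-link."""
    if sublink_bits <= 0 or link_bits % sublink_bits != 0:
        raise ValueError("sub-link width must evenly divide the link width")
    num_sublinks = link_bits // sublink_bits
    return link_bits - num_sublinks


def flits_needed(payload_bits: int, sublink_bits: int,
                 link_bits: int = LINK_WIDTH_BITS) -> int:
    """Flits required to carry payload_bits (header flit not counted)."""
    return math.ceil(payload_bits / payload_bits_per_flit(sublink_bits, link_bits))


if __name__ == "__main__":
    payload = 1024  # hypothetical payload size in bits
    print(f"no encoding: {math.ceil(payload / LINK_WIDTH_BITS)} flits")
    for n in (32, 16, 8, 4):
        overhead = 1 / n  # fraction of each sub-link spent on the inv bit
        print(f"E{n}: overhead {overhead:.1%}, {flits_needed(payload, n)} flits")
```

Under these assumptions, a 1024-bit payload needs 32 flits without encoding, 34 flits with E32, and 43 flits with E4, which is why the finer-grained schemes load the network more at the same pir.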
This increment of injected load is due to the overhead traffic carrying control information (i.e., the invert-bit conditions) for decoding purposes. For this reason, as soon as the saturation point is approached, the completion time for the schemes which use more partitions (i.e., BI-4, CDBI-4, SC-4, SCS-4, and FPC) increases faster than in the other schemes. This explains why, in Fig. 10, from a certain pir value, the slope of the curves for BI-4, CDBI-4, SC-4, SCS-4, and FPC is more pronounced than for the other mechanisms.

Fig. 9. Percentage (a) power saving and (b) energy saving for different path lengths (hops) for the music data stream.

Fig. 10. Percentage energy reduction achieved for different pirs under bit-reversal traffic for 4-flit and 8-flit packets.

The experimental results shown in Fig. 10 confirm the theoretical quantitative analysis presented in Section IV. As expected, as the packet size increases, the percentage energy reduction increases. In fact, for larger packet sizes, the fraction of time a generic link is traversed by flits belonging to the same packet increases. Thus, the positive effects of data encoding are exploited for a longer time. It should be pointed out, however, that the results obtained for small packet sizes (e.g., 4-flit packets) are still worthy of consideration. For instance, the percentage energy reduction when SCS-8 is used ranges from 12% to 15% even with small packet size.

Due to space limitations, we do not report the detailed results for the other traffic scenarios. However, a summary of the improvements in terms of energy consumption with respect to the case in which no data encoding is used is shown in Fig. 11. The percentage energy reduction is computed for a pir value where none of the networks is saturated (a network is said to start saturating when an increase in applied load does not result in a linear increase in throughput [39]). On average, only SC, SCS, and FPC provide energy savings, which are up to 16%, 17%, and 3% for SC-8, SCS-8, and FPC, respectively. It is interesting to note that the energy reduction mostly depends on the encoding scheme and is fairly invariant with the traffic scenario. Thus, the average results shown in Fig.
11 are representative of the energy reduction that can be achieved in general cases using the proposed data encoding techniques.

2) Energy/Power Versus Performance: To assess the tradeoff between the reduction of average power dissipation and total energy consumption of the interconnect system, on the one hand, and the completion time (i.e., the amount of time needed to drain a given amount of traffic volume), on the other, Fig. 12 shows the distribution of the simulated configurations in the plane % increase of completion time versus % reduction of power dissipation [Fig. 12(a)] and % increase of completion time versus % reduction of energy consumption [Fig. 12(b)]. The percentage increase of completion time is defined as the percentage increase of the time needed to drain a given amount of traffic when a given data encoding scheme is used with respect to the case in which no data encoding is used. Similarly, the percentage decrease of power dissipation (energy consumption) is the percentage reduction of power dissipated (energy consumed) to drain a given amount of traffic when a given data encoding scheme is used with respect to the case in which no data encoding is used. Each point in the graph refers to one of the five synthetic traffic scenarios discussed above.

Fig. 11. Percentage energy reduction using different data encoding schemes under different synthetic traffic scenarios.

Fig. 12. Percentage increase of completion time versus (a) percentage decrease of power dissipation, and (b) percentage decrease of energy consumption to drain a given amount of traffic volume when different data encoding schemes are used under several traffic scenarios.

As can be observed in Fig. 12(a), as the percentage power reduction is always positive, there is an improvement of the average power dissipation for all the encoding schemes and under all the synthetic traffic scenarios considered. The oblique solid line partitions the area of the graph into two regions. The points belonging to the bottom region are characterized by a percentage increase of completion time which is greater than the percentage reduction of power dissipation. On the contrary, the points belonging to the top region are characterized by a percentage reduction of power dissipation which is greater than the negative impact due to the percentage increase of the completion time. From this graph, the Pareto-optimal encoding schemes (i.e., those above the oblique line) are SC and SCS. In particular, higher power savings are observed as the granularity of the encoder becomes finer (from SC/SCS-32 to SC/SCS-4). The other schemes (BI, CDBI, and FPC) fall in the bottom region, in which the penalty due to the increase of the completion time is higher than the reduction of the average power dissipation.

Fig. 12(b) shows the relationship between completion time and energy consumption. The plane is divided into three regions. The first region, bounded by the x-axis and the horizontal solid line, collects the dominated points.
Such points refer to the data encoding schemes that, for the considered traffic scenarios, resulted in an increase of both energy consumption and completion time as compared to the case in which no data encoding is used. BI and CDBI in all their configurations (32-, 16-, 8-, and 4-bit) belong to this region. The region between the oblique solid line and the horizontal line collects the data encoding schemes for which the percentage reduction in energy consumption is less than the percentage increase in completion time. Finally, the region between the y-axis and the oblique solid line collects the data encoding schemes for which the percentage reduction in energy consumption is greater than the percentage increase in completion time. This is the most interesting region and, as can be observed, only SC/SCS-32, SC/SCS-16, and SC/SCS-8 belong to it.

3) Packet Size: As discussed in the general quantitative analysis in Section IV, the effectiveness of a data encoding scheme exploiting the pipelined nature of wormhole switching increases as packet size increases. To quantify this trend, Fig. 13 shows the percentage reduction of energy per flit using SCS for different packet sizes under bit-reversal traffic. The percentage reduction of energy/flit rapidly increases as packet size goes from two flits to six flits. For packets larger than six flits, there are no further energy improvements. In fact, the encoding benefit lost when passing from one packet to the next (note that the header flit is not encoded) is completely amortized when the encoding scheme is exploited over a large number of flits.
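The three regions of Fig. 12(b) described earlier in this subsection reduce to two comparisons per simulated point. A minimal Python sketch follows. The function name, the region labels, and the sample values are illustrative assumptions rather than data read from the figure, and the completion-time increase is assumed to be non-negative.

```python
def classify_point(energy_reduction_pct: float, time_increase_pct: float) -> str:
    """Place an (energy reduction, completion-time increase) pair in a region.

    Both arguments are percentages relative to the unencoded baseline.
    """
    if time_increase_pct < 0:
        raise ValueError("sketch assumes encoding never shortens completion time")
    if energy_reduction_pct <= 0:
        # No energy saving while completion time does not improve: dominated.
        return "dominated"
    if energy_reduction_pct > time_increase_pct:
        # Energy saving outweighs the completion-time penalty.
        return "energy-favorable"
    # Energy is saved, but not more than the completion-time penalty.
    return "time-penalized"


if __name__ == "__main__":
    # Hypothetical sample points: (energy reduction %, completion-time increase %).
    samples = {"A": (-3.0, 5.0), "B": (4.0, 10.0), "C": (15.0, 6.0)}
    for name, (energy, time) in samples.items():
        print(name, classify_point(energy, time))
```

In this sketch, the "energy-favorable" label corresponds to the region between the y-axis and the oblique line, "time-penalized" to the region between the oblique and horizontal lines, and "dominated" to the region bounded by the x-axis and the horizontal line.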

Fig. 13. Percentage energy/flit reduction using SCS with different partition sizes, for different packet sizes under bit-reversal traffic.

C. Virtual Channel Based Implementations

As already stated in Section III, the effectiveness of the proposed data encoding schemes decreases if VCs are used. This worsening arises not only because flit-level multiplexing ruins the flit correlation within a packet, as already discussed in the paper, but also because the power ratio between router and link increases. In fact, VCs introduce some overhead in terms of both additional resources (e.g., buffers, replication of the routing logic, and so on) and mechanisms for their management (e.g., complex arbitration). Fig. 14 shows the percentage of energy reduction when SCS-8 is used for different pirs and numbers of VCs under uniform traffic. As expected, the average percentage of energy reduction decreases as the number of VCs increases. It should be noted, however, that the proposed encoding scheme still provides energy saving when VCs are used. As can be observed, as pir increases, the percentage of energy reduction decreases more quickly for the implementation without VCs. This is due to the fact that the saturation pir for VC-based implementations is usually higher than that of the baseline implementation (i.e., without VCs), providing higher performance (e.g., lower average delay). Thus, although the power dissipated by VC-based routers is higher than that of the baseline routers, the higher performance allows for draining more traffic during a certain time window, which positively affects energy consumption.

Fig. 14. Percentage of energy reduction when SCS-8 is used for different pirs and numbers of VCs under uniform traffic.

D. A Case Study

In the previous subsection, we found that the SCS scheme represents the Pareto-optimal scheme in terms of both energy versus completion time and power versus completion time as compared to the other data encoding schemes considered in this paper. In this subsection, we assess the effectiveness of SCS on the complex heterogeneous system shown in Fig. 15. The system is composed of the following sub-systems.

1) MMS: a generic multimedia system which includes an H.263 video encoder, an H.263 video decoder, an MP3 audio encoder, and an MP3 audio decoder [40].
2) MIMO-OFDM: a MIMO-OFDM receiver in which, to support the maximum data rate of the world-wide spectrum efficiency proposal for the next-generation wireless LAN systems, some of the IPs have been parallelized to multiple IPs [41].
3) PIP and MWD: a picture-in-picture application and a multi-window display application [42], [43].

Fig. 15. Heterogeneous system composed of a multimedia sub-system, a MIMO-OFDM receiver, a PIP, and a MWD module.

In this case study, both packet size and packet injection rate vary with communication flow. For instance, the communication flows involved in MMS-Enc and MMS-Dec use a packet size tuned on the basis of a macroblock. The packet injection rate has been computed for each communication flow on the basis of the bandwidth requirements for each application as reported in [40]–[43]. Fig. 16 shows the percentage reduction of total energy, average power, and energy per flit, and the percentage increase of completion time, when SCS is used with respect to the case in which no data encoding scheme is used. In terms of energy, the best configuration is SCS-8, which saves up to 21% of energy consumption with less than 11% penalty in completion time. The reductions in average power dissipation and energy per flit go from 18% to 46% and from 18% to 49%, respectively, passing from SCS-32 to SCS-4, with a penalty in completion time which ranges from 2% to 25%.

Fig. 16. Percentage reduction of power, energy, energy/flit, and percentage increase of completion time for different configurations of the SCS scheme.

VII. Conclusion

The power dissipated by the links of a NoC accounts for a significant fraction of the total power budget [8]–[10], [13]–[15]. In this paper, we have proposed the use of data encoding techniques as a viable way to reduce both power dissipation and energy consumption of NoC links. The proposed schemes are transparent to the underlying NoC infrastructure as they operate on an end-to-end basis. No modification of the router architecture or of the link width is needed. Only the NI is augmented with the encoding/decoding logic which, although it represents an overhead, does not introduce a significant penalty in terms of either cost (i.e., silicon area) or latency. The proposed encoding schemes have been compared with several encoding schemes proposed in the literature on a set of representative data streams, both synthetic and extracted from real applications. The experimental analysis showed that by using the proposed encoding schemes it is possible to reduce the power contribution of both the self-switching activity and the coupling switching activity in inter-router links. Precisely, as compared to a baseline implementation in which no data encoding techniques are used, a reduction of up to 37% of power dissipation and 18% of energy consumption has been observed without any significant degradation in terms of either performance or silicon area.
Currently, we are in the evaluation phase of integrating the SCS scheme into the NI of the STMicroelectronics NoC-based interconnection infrastructure.

References

[1] International Technology Roadmap for Semiconductors: Interconnect. (2006) [Online]. Semiconductor Industry Assoc. Available:
[2] S. Pasricha and N. Dutt, Trends in emerging on-chip interconnect technologies, IPSJ Trans. Syst. LSI Design Methodol., vol. 1, pp. 2–17, Aug
[3] H.-J. Yoo, K. Lee, and J. K. Kim, Low-Power NoC for High-Performance SoC Design. Boca Raton, FL: CRC Press,
[4] S. Borkar, Thousand core chips: A technology perspective, in Proc. ACM/IEEE Design Autom. Conf., Jun. 2007, pp
[5] G. D. Micheli and L. Benini, Networks on Chips: Technology and Tools. San Mateo, CA: Morgan Kaufmann,
[6] J. A. Davis, R. Venkatesan, A. Kaloyeros, M. Beylansky, S. J. Souri, K. Banerjee, K. C. Saraswat, A. Rahman, R. Reif, and J. D. Meindl, Interconnect limits on gigascale integration, Proc. IEEE, vol. 89, no. 3, pp , Mar
[7] J. D. Meindl, Interconnect opportunities for gigascale integration, IEEE Micro, Special Issue Reliab.-Aware Microarchitecture, vol. 23, no. 3, pp , May
[8] S. R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar, An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS, IEEE J. Solid-State Circuits, vol. 43, no. 1, pp , Jan
[9] M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, J.-W. Lee, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal, The raw microprocessor: A computational fabric for software circuits and general-purpose programs, IEEE Micro, vol. 22, no. 2, pp , Mar.–Apr
[10] F. Steenhof, H. Duque, B. Nilsson, K. Goossens, and R. P. Llopis, Networks on chips for high-end consumer-electronics TV system architectures, in Proc. Conf. Design Autom. Test Eur., 2006, pp
[11] A. Ejlali, B. M. Al-Hashimi, P. Rosinger, S. G. Miremadi, and L. Benini, Performability/energy tradeoff in error-control schemes for on-chip networks, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 1, pp. 1–14, Jan
[12] K. Srinivasan and K. S. Chatha, Layout aware design of mesh based NoC architectures, in Proc. Int. Conf. Hardw.-Softw. Codesign Syst. Synthesis, 2006, pp
[13] J. C. S. Palma, L. S. Indrusiak, F. G. Moraes, A. G. Ortiz, M. Glesner, and R. A. L. Reis, Inserting data encoding techniques into NoC-based systems, in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Mar. 2007, pp
[14] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, A 5-GHz mesh interconnect for a teraflops processor, IEEE Micro, vol. 27, no. 5, pp , Sep.–Oct
[15] L. Carloni, A. B. Kahng, S. Muddu, A. Pinto, K. Samadi, and P. Sharma, Interconnect modeling for improved system-level design optimization, in Proc. Asia South Pacific Design Autom. Conf., 2008, pp
[16] A. Jantsch, R. Lauter, and A. Vitkowski, Power analysis of link level and end-to-end data protection in networks on chip, in Proc. IEEE Int. Symp. Circuits Syst., vol. 2. May 2005, pp
[17] P. P. Pande, A. Ganguly, H. Zhu, and C. Grecu, Energy reduction through crosstalk avoidance coding in networks on chip, J. Syst. Architecture, vol. 54, nos. 3–4, pp ,
[18] G.-Y. Wei, J. Kim, D. Liu, S. Sidiropoulos, and M. A. Horowitz, A variable-frequency parallel I/O interface with adaptive power-supply regulation, IEEE J. Solid-State Circuits, vol. 35, no. 11, pp , Nov
[19] J. Kim and M. A. Horowitz, Adaptive supply serial links with sub-1v operation and per-pin clock recovery, IEEE J. Solid-State Circuits, vol. 37, no. 11, pp , Nov
[20] V. Soteriou and L.-S. Peh, Design-space exploration of power-aware on/off interconnection networks, in Proc. IEEE Int. Conf. Comput. Design, Oct. 2004, pp
[21] G. Chen, F. Li, and M. Kandemir, Compiler-directed channel allocation for saving power in on-chip networks, ACM SIGPLAN Not., vol. 41, no. 1, pp ,
[22] S. E. Lee and N. Bagherzadeh, A variable frequency link for a power-aware network-on-chip, Integr. VLSI J., vol. 42, no. 4, pp , Sep
[23] M. R. Stan and W. P. Burleson, Bus invert coding for low power I/O, IEEE Trans. Very Large Scale Integr. Syst., vol. 3, no. 1, pp , Mar
[24] C. Su, C. Tsui, and A. Despain, Saving power in the control path of embedded processors, IEEE Design Test Comput., vol. 11, no. 4, pp , Aug
[25] L. Benini, G. D. Micheli, E. Macii, D. Sciuto, and C. Silvano, Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems, in Proc. Great Lakes Symp. VLSI, Mar. 1997, pp
[26] E. Musoll, T. Lang, and J. Cortadella, Reducing the energy of address and data buses with the working-zone encoding technique and its effect on multimedia applications, in Proc. Power Driven Architecture Workshop, 1998, pp
[27] L. Benini, G. D. Micheli, E. Macii, M. Poncino, and S. Quer, Power optimization of core-based systems by address bus encoding, IEEE Trans. Very Large Scale Integr. Syst., vol. 6, no. 4, pp , Dec
[28] L. Benini, A. Macii, E. Macii, M. Poncino, and R. Scarsi, Architectures and synthesis algorithms for power-efficient bus interfaces, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 19, no. 9, pp , Sep
[29] G. Ascia, V. Catania, M. Palesi, and A. Parlato, Switching activity reduction in embedded systems: A genetic bus encoding approach, IEE Proc. Comput. Digital Tech., vol. 152, no. 6, pp , Nov
[30] K. W. Kim, K. H. Baek, N. Shanbhag, C. L. Liu, and S. M. Kang, Coupling-driven signal encoding scheme for low-power interface design, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2000, pp
[31] J. Henkel, H. Lekatsas, and V. Jakkula, Encoding schemes for address busses in energy efficient SoC design, in Proc. 11th VLSI-SoC Int. Conf. Very Large Scale Integration, Dec. 2001, pp
[32] M. Palesi, F. Fazzino, G. Ascia, and V. Catania, Data encoding for low-power in wormhole-switched networks-on-chip, in Proc. Euromicro Conf. Digital Syst. Des., 2009, pp
[33] G. Ascia, V. Catania, F. Fazzino, and M. Palesi, An encoding scheme to reduce power consumption in networks-on-chip, in Proc. IEEE Int. Conf. Comput. Eng. Syst., Dec. 2009, pp
[34] L. M. Ni and P. K. McKinley, A survey of wormhole routing techniques in direct networks, IEEE Comput., vol. 26, no. 2, pp , Feb
[35] L. Benini and G. D. Micheli, Networks on chips: A new SoC paradigm, IEEE Comput., vol. 35, no. 1, pp , Jan
[36] J. Xi and P. Zhong, A system-level network-on-chip simulation framework with analytical interconnecting wire models, in Proc. IEEE Int. Conf. Electro/Inform. Technol., May 2006, pp
[37] D. Bertozzi and L. Benini, Xpipes: A network-on-chip architecture for gigascale systems-on-chip, IEEE Circuits Syst. Mag., vol. 4, no. 2, pp , Apr.–Jun
[38] F. Fazzino, M. Palesi, and D. Patti. Noxim: Network-on-Chip Simulator [Online]. Available:
[39] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, Performance evaluation and design tradeoffs for network-on-chip interconnect architectures, IEEE Trans. Comput., vol. 54, no. 8, pp , Aug
[40] J. Hu and R. Marculescu, Energy and performance-aware mapping for regular NoC architectures, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 4, pp , Apr
[41] S.-R. Yoon, J. Lee, and S.-C. Park, Case study: NoC based next-generation WLAN receiver design in transaction level, in Proc. Int. Conf. Adv. Commun. Technol., 2006, pp
[42] E. G. T. Jaspers and P. H. N. de With, Chip-set for video display of multimedia information, IEEE Trans. Consumer Electron., vol. 45, no. 3, pp , Aug
[43] E. B. van der Tol and E. G. Jaspers, Mapping of MPEG-4 decoding on a flexible architecture platform, in Proc. SPIE: Media Processors, vol , pp

Maurizio Palesi (M'06) received the M.S. and Ph.D. degrees in computer engineering from the Università di Catania, Catania, Italy, in 1999 and 2003, respectively. Since November 2010, he has been an Assistant Professor with Kore University, Enna, Italy. Dr. Palesi serves on the Editorial Board of the Very Large Scale Integration Design Journal as an Associate Editor since May He has served as a Guest Editor for the Very Large Scale Integration Design Journal (Special Issue on Networks-on-Chip) in 2008, as a Guest Editor for the International Journal of High Performance Systems Architecture (Special Issue on Power-Efficient, High Performance General Purpose and Application-Specific Computing Architectures) in 2009, and as a Guest Editor for the Elsevier MICPRO Journal (Special Issue on Network-on-Chip Architectures and Design Methodologies) in He serves as the Technical Program Committee Member for the following IEEE/ACM international conferences: RTAS, CODES+ISSS, ESTIMedia, SOCC, VLSI, ISC, and SITIS. He was the Co-Organizer of the International Workshops on Network-on-Chip Architectures in 2008, 2009, and

Giuseppe Ascia received the M.S. degree in electronic engineering and the Ph.D. degree in computer science from the Università di Catania, Catania, Italy, in 1994 and 1998, respectively. In 1994, he joined the Institute of Computer Science and Telecommunications, Università di Catania. Currently, he is an Associate Professor with the Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Università di Catania. His current research interests include soft computing, very large scale integration design, hardware architectures, and low-power design.

Fabrizio Fazzino (M'08) received the M.S. degree in computer engineering from the University of Catania, Catania, Italy, in Until 2001, he was responsible for the functional verification of 32 bit lines of microprocessors with STMicroelectronics, Catania. Since 2004, he has collaborated with the Department of Computer and Telecommunications Engineering, University of Catania. He is currently a Silicon Engineer with Icera, Inc., Bristol, U.K.

Vincenzo Catania received the M.S. degree with Honors in electrical engineering from the Università di Catania, Catania, Italy, in Until 1984, he was responsible for testing microprocessor systems with STMicroelectronics, Catania. Since 1985, he has cooperated in research on advanced computer architectures and computer networks with the Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, Università di Catania, where he is currently a Full Professor of computer science. Since November 2006, he has served as the Director of the Department of Computer Science and Telecommunications Engineering, Università di Catania. He is the author of more than 200 articles on international journals and conference proceedings and holds two patents. Currently, his research focuses on pervasive embedded systems, network-on-chip architectures, and mobile terminal platforms and services.


More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

ISSN Vol.04,Issue.01, January-2016, Pages:

ISSN Vol.04,Issue.01, January-2016, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.04,Issue.01, January-2016, Pages:0077-0082 Implementation of Data Encoding and Decoding Techniques for Energy Consumption Reduction in NoC GORANTLA CHAITHANYA 1, VENKATA

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

IMPLEMENTATION OF LOW POWER DATA ENCODING TECHNIQUES FOR NoC

IMPLEMENTATION OF LOW POWER DATA ENCODING TECHNIQUES FOR NoC IMPLEMENTATION OF LOW POWER DATA ENCODING TECHNIQUES FOR NoC Swathi.Shivakumar 1 and Prasanna Kumar B. K 2 1,2 VLSI Design and Embedded Systems, Shridevi Institute of Engineering and Technology,Tumkur

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

The Benefits of Using Clock Gating in the Design of Networks-on-Chip

The Benefits of Using Clock Gating in the Design of Networks-on-Chip The Benefits of Using Clock Gating in the Design of Networks-on-Chip Michele Petracca, Luca P. Carloni Dept. of Computer Science, Columbia University, New York, NY 127 Abstract Networks-on-chip (NoC) are

More information

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Bandwidth-aware routing algorithms for networks-on-chip platforms M. Palesi 1 S. Kumar 2 V. Catania 1

Bandwidth-aware routing algorithms for networks-on-chip platforms M. Palesi 1 S. Kumar 2 V. Catania 1 Published in IET Computers & Digital Techniques Received on 6th July 2008 Revised on 2nd April 2009 In Special Issue on Networks on Chip ISSN 1751-8601 Bandwidth-aware routing algorithms for networks-on-chip

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

Multicomputer distributed system LECTURE 8

Multicomputer distributed system LECTURE 8 Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in

More information

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad NoC Round Table / ESA Sep. 2009 Asynchronous Three Dimensional Networks on on Chip Frédéric ric PétrotP Outline Three Dimensional Integration Clock Distribution and GALS Paradigm Contribution of the Third

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study William Fornaciari Politecnico di Milano, DEI Milano (Italy) fornacia@elet.polimi.it Donatella Sciuto Politecnico

More information

Network-on-Chip Micro-Benchmarks

Network-on-Chip Micro-Benchmarks Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of

More information

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G MAHESH BABU, et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G.Mahesh Babu 1*, Prof. Ch.Srinivasa Kumar 2* 1. II. M.Tech (VLSI), Dept of ECE,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

WITH scaling of process technology, the operating frequency

WITH scaling of process technology, the operating frequency IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 8, AUGUST 2007 869 Synthesis of Predictable Networks-on-Chip-Based Interconnect Architectures for Chip Multiprocessors Srinivasan

More information

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

A Dedicated Monitoring Infrastructure For Multicore Processors

A Dedicated Monitoring Infrastructure For Multicore Processors IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol. xx, No. xx, February 2010. 1 A Dedicated Monitoring Infrastructure For Multicore Processors Jia Zhao, Sailaja Madduri, Ramakrishna

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Abstract In microprocessor-based systems, data and address buses are the core of the interface between a microprocessor

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Basic Network-on-Chip (BANC) interconnection for Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Abderazek Ben Abdallah, Masahiro Sowa Graduate School of Information

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

HiRA: A Methodology for Deadlock Free Routing in Hierarchical Networks on Chip

HiRA: A Methodology for Deadlock Free Routing in Hierarchical Networks on Chip HiRA: A Methodology for Deadlock Free Routing in Hierarchical Networks on Chip Rickard Holsmark 1, Maurizio Palesi 2, Shashi Kumar 1 and Andres Mejia 3 1 Jönköping University, Sweden 2 University of Catania,

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

Boosting the Performance of Myrinet Networks

Boosting the Performance of Myrinet Networks IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. Y, MONTH 22 1 Boosting the Performance of Myrinet Networks J. Flich, P. López, M. P. Malumbres, and J. Duato Abstract Networks of workstations

More information

WITH THE CONTINUED advance of Moore s law, ever

WITH THE CONTINUED advance of Moore s law, ever IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 11, NOVEMBER 2011 1663 Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology,

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Bus Encoding Techniques for System- Level Power Optimization

Bus Encoding Techniques for System- Level Power Optimization Chapter 5 Bus Encoding Techniques for System- Level Power Optimization The switching activity on system-level buses is often responsible for a substantial fraction of the total power consumption for large

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

Conquering Memory Bandwidth Challenges in High-Performance SoCs

Conquering Memory Bandwidth Challenges in High-Performance SoCs Conquering Memory Bandwidth Challenges in High-Performance SoCs ABSTRACT High end System on Chip (SoC) architectures consist of tens of processing engines. In SoCs targeted at high performance computing

More information

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology 1 ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology Mikkel B. Stensgaard and Jens Sparsø Technical University of Denmark Technical University of Denmark Outline 2 Motivation ReNoC Basic

More information

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks 2080 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2012 Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks Rohit Sunkam

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Top-Down Transaction-Level Design with TL-Verilog

Top-Down Transaction-Level Design with TL-Verilog Top-Down Transaction-Level Design with TL-Verilog Steven Hoover Redwood EDA Shrewsbury, MA, USA steve.hoover@redwoodeda.com Ahmed Salman Alexandria, Egypt e.ahmedsalman@gmail.com Abstract Transaction-Level

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

Networks on Chip. Axel Jantsch. November 24, Royal Institute of Technology, Stockholm

Networks on Chip. Axel Jantsch. November 24, Royal Institute of Technology, Stockholm Networks on Chip Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Networks on Chip 1 Overview NoC as Future SoC Platforms What

More information

Multi-path Routing for Mesh/Torus-Based NoCs

Multi-path Routing for Mesh/Torus-Based NoCs Multi-path Routing for Mesh/Torus-Based NoCs Yaoting Jiao 1, Yulu Yang 1, Ming He 1, Mei Yang 2, and Yingtao Jiang 2 1 College of Information Technology and Science, Nankai University, China 2 Department

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks

An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks ABSTRACT High end System-on-Chip (SoC) architectures consist of tens of processing engines. These processing engines have varied

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 5, May 2015, pg.705

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

AS SILICON technology enters the nanometer-scale era,

AS SILICON technology enters the nanometer-scale era, 1572 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 10, OCTOBER 2010 An SDRAM-Aware Router for Networks-on-Chip Wooyoung Jang, Student Member, IEEE, and David

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE

LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 637 LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability Subhasis

More information
