J. Basic. Appl. Sci. Res., 1(10)1594-1602, 2011
2011, TextRoad Publication
ISSN 2090-424X
Journal of Basic and Applied Scientific Research
www.textroad.com

Simulation and Performance Evaluation of Network on Chip Architectures and Algorithms using CINSIM

Sheraz Anjum*, Ehsan Ullah Munir and Muhammad Wasif Nisar
COMSATS Institute of Information Technology, Quaid Avenue, The Mall, Wah Cantt, Pakistan
*drsheraz@comsats.edu.pk, ehsanmunir@comsats.edu.pk and wasifnisar@gmail.com

ABSTRACT
The continuous saturation problems of today's buses, caused by the large number of resources, have led System on Chip (SoC) designers to research a new SoC paradigm known as Networks-on-Chip (NoC). Research on NoC is still in its infancy, and researchers are trying to propose, design and explore different architectures, algorithms and simulation tools for NoC. In this paper we explore a tool named CINSIM (Component Based Interconnection Network Simulator), designed for Linux systems by a research team at Real-Time Systems and Robotics of Technische Universität Berlin. We first introduce the simulation components and environment of the tool and then apply the tool to a heterogeneous 2D-Mesh Network on Chip architecture. We show how different combinations of routing algorithms, switching techniques, simulation types, traffic sources and measurement types can be used to evaluate the performance of the underlying heterogeneous NoC, which in turn illustrates a general methodology for the simulation and performance evaluation of any NoC platform under consideration. We hope this effort will help NoC researchers with the quick design, analysis and selection of particular NoC platforms.

KEY WORDS: Component Based Interconnection Network Simulator (CINSIM), Switching Techniques, Routing Strategies, Traffic Sources, Flow Control Unit (Flit).

1 INTRODUCTION
According to ITRS [1], we are now able to fabricate billions of transistors on a single chip using 45 nm or smaller process technologies.
Current SoC design methodologies are not scaling well with advances in process technology. The use of buses in today's SoCs for interconnecting heterogeneous resources is becoming a bottleneck due to contention and congestion problems. Moreover, global wire delays, limited reusability, low modularity and scalability issues add to the problems of current bus-based SoCs. Consequently, several research groups [2], [3], [4] have proposed adopting a more modular and scalable design methodology known as Networks on Chip, a new SoC paradigm. The use of the Globally Asynchronous, Locally Synchronous (GALS) concept in NoCs decouples the design of resources from the rest of the network, which can enhance the scalability, modularity and reusability of IP. The design and selection of an appropriate architecture, routing algorithm and switching technique for on-chip communication play a key role in the design and implementation of a complete NoC platform, and an appropriate NoC tool can accelerate the design and selection of suitable NoC architectures and algorithms. Due to the lack of tools for NoC simulation, many researchers, as in [5], [6], [11] and [12], have used the publicly available Network Simulator 2 (NS-2) [7], [8], [9] to simulate and evaluate their NoC architectures and algorithms, but NS-2 is less suitable for the simulation and performance evaluation of networks on chip because it targets conventional packet networks rather than flit-level on-chip interconnects. Therefore we decided to explore, introduce and apply the CINSIM tool [10], which is better suited to the simulation and performance evaluation of Networks-on-Chip architectures and algorithms. In the following section we briefly discuss the specification and use of the CINSIM tool.

*Corresponding Author: Sheraz Anjum, COMSATS Institute of Information Technology, Quaid Avenue, The Mall, Wah Cantt, Pakistan. Email: drsheraz@comsats.edu.pk
Anjum et al., 2011

2 CINSIM
CINSIM is a Component-based Interconnection Network Simulator designed for Linux environments. Simulations are performed by the simulation core (cinsim), which is controlled from the command line. The core is written mainly in C++ and can carry out performance analysis of regular and irregular interconnection networks, subject to some boundary conditions. The simulation setup, including the network description, must be specified in an XML file based on an XML schema. For this purpose, CINSIM provides a fully schema-driven editor (cinsim-gui) that can visualize and edit the simulation parameters of the XML document, as shown in Figure 1, to easily describe interconnection networks. The editor is written in Java, which makes it platform independent.

Figure 1: The Global Settings Menu of the 5x5 Heterogeneous 2D-Mesh XML Document

The performance measures that CINSIM can compute for interconnection networks include mean packet delay, mean queue length, mean flit delay, and target and source throughputs. These properties can be investigated using steady-state or terminating simulations. Confidence levels and estimated precisions are observed for each measure, and the simulation stops once the desired termination criteria are met.

The component-based approach of CINSIM distinguishes several network and simulation components that can be combined in many ways. The following network components can be used to describe regular or irregular interconnection networks:
Source Buffers: used to specify traffic sources
Non-Shared Buffers: used as intermediate data (flit) holders
Routers: used to route packets
Target Buffers: used to analyze received packets
The simulation of an interconnection network invokes several independent simulation components that can also be set up using the provided XML editor.
The simulation components are listed below:
Routing Strategies: Bitmask, Shortest Path, XY and West First routing
Switching Techniques: packet switching, including Store-and-Forward, Virtual/Partial Cut Through and Wormhole switching
Simulation Types: terminating and steady-state simulations
Scheduling Algorithms: used to resolve routing conflicts
Measure Routines: flit delay, target throughput, source throughput, mean queue length
Backpressure Mechanisms: local or global

In the following sections we briefly describe the important network and simulation components that can be set up with the provided schema-driven editor for the XML document under consideration.

2.1 Source Buffer Component
A source buffer does not receive packets but creates them on demand using various distribution functions, such as geometric, periodic, Pareto and random-burst distributions. In some respects, a source buffer can be considered the counterpart of a client connected to a network, such as a single processor. A client usually transfers a message into the network addressed to a specific client or to a set of clients also connected to the network. However, a source buffer component in CINSIM cannot receive messages (defined as constant-size packets); therefore an additional target buffer component is needed to simulate a client that both sends and receives messages.

2.2 Buffer Component
Buffer components store packets along their way through an interconnection network. In every clock cycle, only one packet fragment (flit) can be received and stored, and one fragment can be sent and removed.

2.3 Target Buffer Component
Target buffers are special buffers used as destinations for packets passing through an interconnection network simulated with CINSIM. Each target buffer is represented by a single bit within the target bitmasks of packets and network components. Targets can receive one packet fragment (flit) per clock cycle, and their queue is cleared at the end of each clock cycle. Analyzer components connected to target buffers can analyze the received flits.

2.4 Router Component
The router component is an abstract switch that uses an I/O matrix to route packets. Routers are responsible for routing, scheduling and switching, and contain input and output ports. Input ports serve incoming packets, while an output port is the contended resource.
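Contention for a single output port can be illustrated with a plain round-robin arbiter, one of the scheduling policies CINSIM offers. The Python sketch below is an illustration under assumptions, not CINSIM code: the class name, the five-input configuration (four mesh directions plus the local port) and the request vector are hypothetical, and CINSIM's distinction between local and global round robin is not modeled.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter for one router output port.

    After each grant, priority moves to the input just after the winner,
    so an input that keeps requesting is served within num_inputs grants.
    """

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.next_priority = 0  # input checked first in the next round

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests."""
        if len(requests) != self.num_inputs:
            raise ValueError("one request flag per input port expected")
        for offset in range(self.num_inputs):
            idx = (self.next_priority + offset) % self.num_inputs
            if requests[idx]:
                # Rotate priority past the winner to keep the arbiter fair.
                self.next_priority = (idx + 1) % self.num_inputs
                return idx
        return None  # no input wants this output in this cycle


# Five inputs assumed: four mesh directions plus the local source buffer.
arbiter = RoundRobinArbiter(5)
requests = [True, False, True, False, False]  # inputs 0 and 2 want the same output
print([arbiter.grant(requests) for _ in range(3)])  # prints [0, 2, 0]
```

Because round robin always picks a winner when at least one input requests, it never needs the random fallback described for CINSIM's arbiters; policies such as "oldest packet first" can tie and would.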
The operation of a router can be divided into routing, scheduling and switching. Contention for an output port is resolved by one of many scheduling algorithms, such as random, global round robin, local round robin, fixed order, most recently used, least recently used, most frequently used, least frequently used, oldest packet first, longest waiting packet first, priority and deadline scheduling. If the selected scheduling algorithm cannot resolve a conflict, all arbiters fall back on the random number generator.

2.5 Routing Strategies
The layout of an interconnection network sets up one or, in the case of redundancy, more paths between a given source/destination pair. A routing strategy determines the path a given message takes to its destination. CINSIM supports four routing strategies, Bitmask, Shortest Path, XY and West First, one of which can be selected for each network configuration in the first section of the local/global settings menu by choosing the corresponding value for the attribute routingtype.

2.6 Switching Techniques
Apart from determining valid paths between sources and destinations within an interconnection network, a switching technique is needed that specifies how messages are fragmented before being passed to the network and how the resources along the path are allocated. Furthermore, a switching technique defines the preconditions that must be fulfilled before a fragment can move on to the next network component. CINSIM features several packet switching techniques, such as Store and Forward, Virtual Cut Through, Partial Cut Through and Wormhole switching.

2.7 Simulation Type
CINSIM provides two types of simulation runs. A steady-state simulation determines the performance characteristics of an interconnection network in its steady state: the simulation invokes consecutive clock cycles until the results of all measurements reach steady state.
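The paper does not say how CINSIM decides that a measurement has reached steady state beyond a confidence level and a desired precision. One common technique, assumed here for illustration only, is the method of batch means: group clock cycles into batches, record each batch's mean, and stop once the confidence-interval half-width is small relative to the overall mean. The function name, the minimum batch count and the normal-distribution quantile (a Student-t quantile would be more accurate for few batches) are all assumptions.

```python
import math
import statistics
from typing import Sequence


def reached_precision(
    batch_means: Sequence[float],
    confidence: float = 0.90,
    precision: float = 0.10,
    min_batches: int = 10,
) -> bool:
    """Batch-means stopping test (an assumed stand-in for CINSIM's criterion).

    Returns True once the confidence-interval half-width around the mean of
    the batch means is at most `precision` times that mean.
    """
    n = len(batch_means)
    if n < max(min_batches, 2):
        return False  # too few batches for a meaningful variance estimate
    mean = statistics.fmean(batch_means)
    if mean == 0:
        return False  # relative precision is undefined for a zero mean
    stdev = statistics.stdev(batch_means)
    # Two-sided normal quantile, about 1.645 for a 90% interval.
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * stdev / math.sqrt(n)
    return half_width <= precision * abs(mean)
```

A simulator loop would keep running batches of clock cycles, append each batch's mean delay or throughput, and stop once this test holds for every observed measure; the defaults mirror the 90% confidence level and 10% precision used in the experiments of this paper.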
In contrast, a terminating simulation investigates the transient behavior of an interconnection network within a given range of clock cycles. A terminating simulation run stops and restarts after the last clock cycle of the desired range has been simulated, and for every clock cycle within that range a result is determined for all configured measurements. Both steady-state and terminating simulations are monitored by CINSIM and stop once all performance characteristics under investigation reach steady state according to a given confidence level and desired precision.

2.8 Observation Type
The analyzer component was introduced to provide a kind of measurement device that can be connected to other
components of a network, although it is not itself a network component. An analyzer refers to one or more measurement variables that define measurements. A measurement variable belongs to one parameter observed by CINSIM or, in the case of terminating simulations, to a sequence of parameters. Multiple analyzers may share the same measurement variables and therefore the same parameters. The observation types supported by CINSIM include delay and latency at targets, delay and latency at buffers, source throughput, target throughput and mean queue length.

3 Implementation of a Heterogeneous 2D-Mesh NoC using CINSIM
To apply the CINSIM tool and evaluate performance for different parameters, we chose the 5x5 heterogeneous 2D-Mesh NoC architecture shown in Figure 2, in which separate symbols denote resources and routers. A resource can be a DSP, a processor, RAM, ROM, an ASIC, an FPGA, etc. The link between any two routers is a duplex link. By selecting Sim in the global settings menu of Figure 1 and choosing different combinations of routing type, switching type and simulation type (see the top-right corner of Figure 1), we created different XML documents for the heterogeneous 2D-Mesh, to be simulated and analyzed later by the cinsim core. We defined three measurement variables to measure the delay at targets, source throughput and destination throughput, respectively, and four parameters to be used with the different simulation components, as detailed in Table 1. The start value is used by the first simulation run, and the add value is applied in sequence for each subsequent simulation run.
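Read literally, the start/add scheme turns each parameter into an arithmetic progression over simulation runs. The Python sketch below is our reading of that scheme, using the start and add values listed in Table 1; the dictionary layout and function names are hypothetical, and treating c as the geometric-distribution parameter and d as the periodic-distribution parameter follows the "Related To" column only.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]

# (start value, add value) per parameter, copied from Table 1.
TABLE_1: Dict[str, Tuple[Number, Number]] = {
    "a": (1, 1),      # buffer size
    "b": (1, 1),      # packet size
    "c": (0.1, 0.1),  # geometric distribution
    "d": (1, 1),      # periodic distribution
}


def parameter_value(start: Number, add: Number, run: int) -> Number:
    """Value in simulation run `run` (0-based): start, then +add per run.

    round() hides binary floating-point noise such as 0.30000000000000004.
    """
    return round(start + add * run, 10)


def run_settings(run: int) -> Dict[str, Number]:
    """All Table 1 parameter values for one simulation run."""
    return {name: parameter_value(start, add, run)
            for name, (start, add) in TABLE_1.items()}


for run in range(3):
    print(run, run_settings(run))
```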
Figure 2: Architecture of the 5x5 2D-Mesh NoC (nodes 0 to 24; legend: router, resource)

Table 1: Parameter Values and their Relationship
Parameter  Start Value  Add Value  Related To
a          1            1          Buffer Size
b          1            1          Packet Size
c          0.1          0.1        Geometric Distribution
d          1            1          Periodic Distribution

By choosing the Mesh 0 component in the cinsim-gui editor (see Figure 1), we designed the 5x5 heterogeneous 2D-Mesh NoC architecture shown in Figure 3. To model the NoC as containing heterogeneous resources, we applied the geometric distribution to the odd-numbered source nodes and the periodic distribution to the even-numbered source nodes. The components and connections that make up a single Mesh node are shown in Figure 4. Each Mesh node contains four input and four output ports, as is also clear from Figure 3. In addition, it consists of a source buffer S0 that generates packets according to the selected distribution function, a target buffer T0 that receives traffic in the form of packets, a router R0 that routes and schedules flits according to the selected routing technique and scheduling algorithm, five buffer components B0 to B4 that hold intermediate flits on their way to the destinations, and two analyzer components A0 and A1 that measure the delay, target throughput and source throughput.
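To make the XY and West-First strategies concrete on this 5x5 mesh, the following Python sketch computes routing decisions from node numbers. It rests on assumptions not stated in the paper: nodes are numbered row-major (node = 5y + x), +x is called east and +y north, and the function names are hypothetical. West-First is shown in its minimal form: every westward hop is taken first, after which the router may choose adaptively among the productive east, north and south directions.

```python
from typing import List, Set, Tuple

MESH_WIDTH = 5  # 5x5 mesh, nodes 0..24 as in Figure 2


def coords(node: int) -> Tuple[int, int]:
    """Map a node number to (x, y), assuming row-major numbering."""
    return node % MESH_WIDTH, node // MESH_WIDTH


def xy_next_hop(current: int, dest: int) -> str:
    """Deterministic XY routing: finish the X offset, then the Y offset."""
    cx, cy = coords(current)
    dx, dy = coords(dest)
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"  # arrived: hand the flit to the local target buffer


def west_first_options(current: int, dest: int) -> Set[str]:
    """Minimal West-First: westward hops first, then any productive E/N/S."""
    cx, cy = coords(current)
    dx, dy = coords(dest)
    if dx < cx:
        return {"W"}  # no other turn is allowed while west hops remain
    options = set()
    if dx > cx:
        options.add("E")
    if dy > cy:
        options.add("N")
    if dy < cy:
        options.add("S")
    return options or {"LOCAL"}


def xy_path(src: int, dest: int) -> List[int]:
    """Node sequence visited under XY routing (source and destination included)."""
    step = {"E": 1, "W": -1, "N": MESH_WIDTH, "S": -MESH_WIDTH}
    path = [src]
    while path[-1] != dest:
        path.append(path[-1] + step[xy_next_hop(path[-1], dest)])
    return path


print(xy_path(0, 24))  # prints [0, 1, 2, 3, 4, 9, 14, 19, 24]
```

The design difference is visible in the return types: XY yields exactly one direction per hop, while West-First can return a set, which is where an adaptive router would apply its scheduling or congestion information.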
Figure 3: 5x5 Heterogeneous 2D-Mesh NoC Architecture using cinsim-gui
Figure 4: Components Comprised by a Mesh Node

4 SIMULATION RESULTS
We analyzed the architecture of Figure 3 using different combinations of routing, switching, simulation and measurement types without disturbing the local information at each node. The combinations used to generate the different XML files are listed in Table 2. The generated XML files, with a 90% confidence level and 10% precision, were used one by one as input to the cinsim core to evaluate the heterogeneous 2D-Mesh architecture for parameters such as delay at targets, source throughput and destination throughput. Figures 5 to 10 use the abbreviations given in Table 3.

Table 3: Abbreviations Used in Figures 5 to 10
Name                     Abbreviation
West First               WF
Store and Forward        SaF
Partial Cut Through      PCT
Source Throughput        STP
Target Throughput        TTP
Steady State Simulation  SSS
Terminating Simulation   TS
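The 24 combinations of Table 2 form a full factorial design: 2 routing types x 2 switching types x 2 simulation types x 3 measurement types. The Python sketch below regenerates the table's numbering with itertools.product; the list and function names are hypothetical, and the sketch stops at enumeration because the paper does not give the XML element names that cinsim-gui writes for each combination.

```python
from itertools import product
from typing import List, Tuple

ROUTING = ["West First", "XY"]
SWITCHING = ["Store and Forward", "Partial Cut Through"]
SIMULATION = ["Steady State", "Terminating"]
MEASUREMENT = ["Delay at Targets", "Source Throughput", "Target Throughput"]


def table_2() -> List[Tuple[int, str, str, str, str]]:
    """Enumerate (number, routing, switching, simulation, measurement).

    product() varies its last argument fastest, so routing alternates on
    every row, switching every two rows, simulation every four rows and
    measurement every eight rows, which reproduces the Table 2 numbering.
    """
    rows = []
    for number, (meas, sim, sw, rt) in enumerate(
        product(MEASUREMENT, SIMULATION, SWITCHING, ROUTING), start=1
    ):
        rows.append((number, rt, sw, sim, meas))
    return rows


for row in table_2()[:4]:
    print(row)
```

Generating the design programmatically makes it easy to check that no combination is missing before building the corresponding XML files.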
Table 2: Details of Combinations for the Generation of Different XML Files
Combination  Routing Type  Switching Type        Simulation Type  Measurement Type
1            West First    Store and Forward     Steady State     Delay at Targets
2            XY            Store and Forward     Steady State     Delay at Targets
3            West First    Partial Cut Through   Steady State     Delay at Targets
4            XY            Partial Cut Through   Steady State     Delay at Targets
5            West First    Store and Forward     Terminating      Delay at Targets
6            XY            Store and Forward     Terminating      Delay at Targets
7            West First    Partial Cut Through   Terminating      Delay at Targets
8            XY            Partial Cut Through   Terminating      Delay at Targets
9            West First    Store and Forward     Steady State     Source Throughput
10           XY            Store and Forward     Steady State     Source Throughput
11           West First    Partial Cut Through   Steady State     Source Throughput
12           XY            Partial Cut Through   Steady State     Source Throughput
13           West First    Store and Forward     Terminating      Source Throughput
14           XY            Store and Forward     Terminating      Source Throughput
15           West First    Partial Cut Through   Terminating      Source Throughput
16           XY            Partial Cut Through   Terminating      Source Throughput
17           West First    Store and Forward     Steady State     Target Throughput
18           XY            Store and Forward     Steady State     Target Throughput
19           West First    Partial Cut Through   Steady State     Target Throughput
20           XY            Partial Cut Through   Steady State     Target Throughput
21           West First    Store and Forward     Terminating      Target Throughput
22           XY            Store and Forward     Terminating      Target Throughput
23           West First    Partial Cut Through   Terminating      Target Throughput
24           XY            Partial Cut Through   Terminating      Target Throughput

Figure 5: Delay Vs Simulation Runs Using SSS

Figures 5 and 6, with combinations 1 to 4 and 5 to 8 (see Table 2), show the delay at targets versus simulation runs using steady-state simulation and versus clock cycle using terminating simulation, respectively. In both figures, XY with PCT has the lowest delay and proves to be the best combination for the heterogeneous 2D-Mesh NoC. Figures 7 and 8, with combinations 9 to 12 and 13 to 16, show the source throughput versus simulation runs using steady-state simulation and versus clock cycle using terminating simulation, respectively.
On average, the source throughput of the steady-state simulation increases over successive simulation runs, whereas for the terminating simulation it decreases up to cycle 15 and then varies randomly between 0.3 and 0.45.

Figure 6: Delay Vs Clock Cycle Using TS
Figure 7: STP Vs Simulation Runs Using SSS
Figure 8: STP Vs Clock Cycle Using TS

Similarly, Figures 9 and 10, with combinations 17 to 20 and 21 to 24, show the target throughput versus simulation runs using steady-state simulation and versus clock cycle using terminating simulation, respectively. Again, XY with PCT has the largest target throughput. In view of these results, we conclude that XY routing with the PCT switching technique is the best choice for the 5x5 heterogeneous 2D-Mesh NoC. Any other NoC architecture can be analyzed in the same way for different combinations, and the best one selected for implementation.
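Part of the gap between PCT and SaF in these results can be anticipated from a textbook zero-load latency model, sketched below in Python. The model is an assumption for illustration, not how CINSIM computes delay: it ignores contention and router pipeline delay, assumes one flit crosses a link per clock cycle, treats PCT like any cut-through scheme at zero load, and reuses the hypothetical row-major node numbering for hop counts. Under store and forward every hop must receive the whole packet before sending it on, while under cut-through the remaining flits stream behind the header.

```python
MESH_WIDTH = 5  # 5x5 mesh; row-major node numbering assumed


def _check(hops: int, flits: int) -> None:
    if hops < 1 or flits < 1:
        raise ValueError("need at least one hop and one flit")


def hop_count(src: int, dest: int) -> int:
    """Links traversed by a minimal route (XY or West-First) in the mesh."""
    sx, sy = src % MESH_WIDTH, src // MESH_WIDTH
    dx, dy = dest % MESH_WIDTH, dest // MESH_WIDTH
    return abs(sx - dx) + abs(sy - dy)


def saf_latency(hops: int, flits: int) -> int:
    """Store and forward: each hop spends `flits` cycles receiving the packet."""
    _check(hops, flits)
    return hops * flits


def cut_through_latency(hops: int, flits: int) -> int:
    """Cut-through: header needs `hops` cycles, the other flits follow one per cycle."""
    _check(hops, flits)
    return hops + flits - 1


hops = hop_count(0, 24)  # corner to corner: 8 hops
for flits in (1, 4, 8):
    print(flits, saf_latency(hops, flits), cut_through_latency(hops, flits))
# prints:
# 1 8 8
# 4 32 11
# 8 64 15
```

With one-flit packets (the first value of parameter b in Table 1) the two models coincide, and the advantage of cut-through grows with packet size; congestion effects, which the simulations do capture, are outside this model.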
Figure 9: TTP Vs Simulation Runs Using SSS
Figure 10: TTP Vs Clock Cycle Using TS

5 Conclusions
In this paper we introduced the reader to a tool named Component Based Interconnection Network Simulator (CINSIM), which is designed for Linux platforms and proves to be a suitable choice for network-on-chip simulation and performance evaluation. We showed the methodology of designing the XML files with the GUI-based, schema-driven editor cinsim-gui for a particular 5x5 heterogeneous 2D-Mesh NoC architecture, selecting different combinations of the routing strategies, switching techniques and simulation types provided by the editor. We then applied the cinsim core to the designed XML files of the 2D-Mesh NoC to obtain simulation results for the delay at targets, source throughput and target throughput. The results show that the XY routing algorithm with the Partial Cut Through switching technique is the best choice for this particular architecture. Hence, the CINSIM tool can be used effectively for the design, analysis and selection of different kinds of NoC architectures and algorithms before their actual implementation, which in turn helps accelerate the overall process of NoC platform design.

Acknowledgements
This work is jointly sponsored by the COMSATS Institute of Information Technology, Pakistan, and the National Natural Science Foundation of China under Grant No. 60425413.

REFERENCES
[1] ITRS. International Technology Roadmap for Semiconductors - 2004 edition, http://public.itrs.net/
[2] Benini L, De Micheli G, 2002. Networks on Chip: A New SoC Paradigm. IEEE Computer, 35(1): 70-78.
[3] Kumar S et al., 2002. A Network on Chip Architecture and Design Methodology. In the Proceedings of the 2002 IEEE Computer Society Annual Symposium on VLSI, pp: 117-124.
[4] Jantsch A, Tenhunen H, 2003. Networks on Chip. Kluwer Academic Publishers, pp: 85-106.
[5] Sun Y R, Kumar S, Jantsch A, 2002. Simulation and Evaluation for a Network on Chip Architecture Using Ns-2. In the Proceedings of the 20th NORCHIP Conference, pp: 6.
[6] Vahdatpour A, Tavakoli A, Falaki M H, 2005. Hierarchical Graph: A New Cost Effective Architecture for Network on Chip. In the Proceedings of EUC 2005 (IFIP), LNCS 3824, pp: 311-320.
[7] Fall K, Varadhan K, 2006. The NS Manual. The VINT Project, available at http://www.isi.edu/nsnam/ns/ns-documentation.html
[8] Altman E, Jimenez T, 2003. NS Simulator for Beginners: Lecture Notes. Univ. de Los Andes, Merida, Venezuela, France.
[9] The Network Simulator - NS-2, available at http://www.isi.edu/nsnam/ns/
[10] Walter A, Kühm M, Tutsch D, Lüdtke D, Zimmermann C, 2004. CINSim Handbook: Installation and User's Guide. Technische Universität Berlin, Real-Time Systems and Robotics, Copyright 2004-2007, pp: 18-68.
[11] Anjum S, Chen J, Yue P, Liu J, 2009. A Delay Optimized Architecture for On-Chip Communication. Journal of Electronic Science and Technology of China, 7(2): 104-109.
[12] Anjum S, Chen J, Yue P, Liu J, 2008. Traffic Modeling and Mapping of H.264 Encoder on 2D-Mesh Vs Application Specific NoC. Journal of System Simulation, 20(10): 2782-2788.