Energy-efficient Custom Topology-based Dynamic Voltage-frequency Island-enabled Network-on-chip Design

JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.18, NO.3, JUNE, 2018
ISSN(Print) ISSN(Online)

Chang-Lin Li, Jae-Chern Yoo, and Tae Hee Han*

Abstract
The voltage-frequency island (VFI) design paradigm has strong potential for reducing energy consumption in network-on-chip (NoC) designs. The V/F of each island can be dynamically tuned according to the application's requirements. However, dynamic VFI (DVFI) requires an efficient on-chip communication architecture to compensate for the latency overhead incurred while tuning each VFI to its proper V/F. Although a standard topology has been used in most VFI designs, this approach incurs a large energy and latency overhead owing to redundant hops. Therefore, we propose a custom topology-based DVFI for manycore platforms to maximize energy efficiency at a reasonable implementation cost. To this end, a custom topology generation method is combined with a heuristic run-time V/F tuning algorithm that considers core and link utilization. Experimental results demonstrate the effectiveness of the proposed scheme in terms of execution time and energy-delay product.

Index Terms
Network-on-chip (NoC), voltage-frequency island (VFI), dynamic voltage-frequency island (DVFI), custom topology, topology generation

Manuscript received Aug. 25, 2017; accepted Jan. 22, 2018
The College of Information and Communication Engineering, Sungkyunkwan University
than@skku.edu

I. INTRODUCTION

Owing to the diminishing returns of performance scaling in single-core processors and the ever-increasing computational demand, the system-on-chip (SoC) design paradigm has shifted to the manycore processor era.
Moreover, the communication bottleneck between the processing cores and the memory has forced the communication subsystem to adopt a scalable and distributed on-chip interconnection architecture, which is called network-on-chip (NoC) [1]. In addition, energy efficiency has become a primary design concern not only for battery-powered embedded systems but also for high-end server machines. In this regard, the voltage-frequency island (VFI) design has been widely adopted as an efficient and scalable energy optimization solution [2]. In a VFI-based manycore system, it is possible to tune the V/F of each VFI dynamically under the given performance constraints [3]. Compared with per-core dynamic voltage frequency scaling (DVFS), where each core has its own V/F scaling domain, the dynamic VFI (DVFI) is more practical for large-scale manycore processors in terms of implementation complexity and associated cost, because the required voltage regulators (VRs) and phase-locked loops (PLLs) do not scale down well with finer fabrication technologies [4]. Moreover, compared with per-core DVFS, the DVFI is well suited to state-of-the-art, highly energy-efficient asymmetric multicore architectures such as the ARM big.LITTLE technology. However, the DVFI requires distributed core- and link-level information for assigning and tuning a proper V/F value to each VFI, which incurs extra latency [5]. Therefore, a latency-aware communication architecture for DVFI-enabled NoCs is needed.

Most VFI-related NoC architectures employ a standard (e.g., mesh) topology (Fig. 1(a)). However, several studies have demonstrated that a standard topology produces a large energy and latency overhead owing to long multi-hop paths, and, thus, it cannot cope with the overhead produced in the DVFI [6, 7]. On the other hand, a custom topology (Fig. 1(b)) is well suited to the diversified requirements of today's computing environment and to the most recent leading-edge manycore architectures, such as the ARM DynamIQ technology [8]. Custom topologies provide more design optimization opportunities with less latency and energy overhead than the standard one. A custom topology should therefore be incorporated into the DVFI to further improve the energy efficiency and on-chip network latency in a flexible manner [6].

Fig. 1. VFI architecture with (a) a mesh, (b) a custom topology.

Accordingly, a new scheme for designing a custom topology-based DVFI (CT-DVFI) is proposed for achieving significant energy savings. The scheme combines a custom topology generation method with an associated DVFI tuning method. As topology generation is an NP-hard problem, a heuristic algorithm is deployed that takes the DVFI tuning into account. During the topology generation, the core utilization and communication traffic are used to cluster the cores and links. Moreover, the intra- and inter-VFI communications, together with the VFI information, are used to generate the NoC components (routers and links) so as to optimize the trade-off between performance and energy consumption.
Because the proposed custom topology uses a minimum number of routers, the additional latency produced by the DVFI tuning can be offset by pursuing a minimum hop count for the dedicated communication paths. For the DVFI, a tuning method is proposed that dynamically tunes the V/F of each VFI at runtime according to a metric built from core- and link-level utilization. The experimental results showed significant savings in execution time and energy-delay product compared with mesh-based designs across the VFI configurations considered.

The rest of the paper is organized as follows. Section II reviews the related works. Section III gives a detailed description of the proposed design method. Section IV gives the experimental results. Finally, Section V concludes the paper.

II. RELATED WORKS

VFI-based energy optimization schemes can be divided into two categories: static VFI (SVFI) and dynamic VFI (DVFI). The basic difference between SVFI and DVFI is the flexibility in VFI partitioning: the former fixes the VFI partition at design time, whereas the latter can reallocate the partitioning at runtime. In a pioneering study, Ogras et al. applied the VFI paradigm to NoC design to minimize energy consumption [2]. For further energy optimization, Jang et al. incorporated partitioning, mapping, and routing and proposed an energy optimization framework [9]. The methods used in these studies are SVFI techniques: they rely on a single VFI partition for all types of applications and do not vary the V/Fs of the VFIs at runtime. In contrast to SVFI-based schemes, DVFI-based schemes can tune the V/F values of each VFI at runtime to further reduce the energy dissipation. Yan et al. proposed a hybrid regulator scheme to improve the power efficiency of multicore architectures with restricted shape and size of the VFIs [10]. Musoll analyzed the benefits of reconfiguring the VFIs in overcoming process variations [11].
These studies evaluated the energy saving and design overhead of the DVFI compared with the SVFI and per-core DVFS, but they ignored the latency overhead produced by the underlying NoC topology, e.g., the mesh.

With respect to the NoC topology, the mesh is the most preferred topology owing to its advantages of reusability and reduced design time. However, the inherent redundancy of multi-hop paths between communicating cores produces a large latency and energy overhead, and, thus, the mesh topology cannot cope with the latency overhead in the DVFI. On the other hand, a custom topology provides better opportunities for optimization and can be generated with predefined requirements with respect to the number of routers and hop counts for a given application. A number of recent studies on custom topology generation for static VFI-based NoCs have been presented, and they have demonstrated that a custom topology-based VFI scheme can achieve latency savings compared with a mesh-based one [12, 13]. Consequently, to address the timing overhead of the DVFI and the V/F tuning in a custom topology, a new design method that incorporates both the DVFI and the custom topology is necessary.

III. CT-DVFI DESIGN METHOD

In this section, we describe a detailed design method for constructing the CT-DVFI. The CT-DVFI design method consists of custom topology generation and DVFI tuning steps for improving the energy efficiency. In the custom topology generation step, the core and communication information are employed to construct an optimal VFI cluster and to generate the NoC components (routers and links) for each application. In the DVFI tuning step, the V/F of each VFI is determined by a metric built from core- and link-level utilization.

1. Topology Generation

The goal of the custom topology generation is to construct a topology such that all cores can communicate and transfer data over the on-chip network while satisfying specific requirements, such as performance and energy consumption.
In addition, the cost of implementing mixed-clock first-in-first-out buffers (mcFIFOs), along with the partitioning between the intra- and inter-VFI communications, should be determined appropriately while considering the VFI architecture. It has been noted that, with a custom topology, the energy consumption and the network latency can be reduced compared with the mesh topology. This allows us to implement DVFI tuning while maintaining the required network performance. Hence, in this part, we propose a custom topology generation method for the VFI to support efficient data transfers among VFIs and to enable the DVFI. Considering the on-chip communication and the topological properties of the VFI, the custom topology generation method consists of two main steps: core clustering and topology construction. In the following, we give a detailed description of each step.

Core clustering: Usually, it is advantageous to cluster cores with similar operating V/F demands. Several previous works have demonstrated that, if cores with similar V/F levels share the same V/F, interface overheads such as mcFIFOs can be reduced [2]. For the NoC, the communication between the cores should also be fully accounted for during clustering to reduce the communication cost. In [12], a core clustering method based on communication volume was presented, and its energy efficiency and performance improvements were demonstrated with various evaluations. However, in the DVFI, the core-level information must cover not only the core communication but also the core utilization. The new VFI-aware clustering method therefore relies on both the core utilization and the communication traffic. The principle of this method is to cluster cores with similar utilization and communication traffic into the same VFI so that the V/F can be tuned easily.
For example, cores with low utilization should be clustered together into a cluster with a low V/F level, whereas clusters with a high V/F level should hold the cores with high utilization. In this respect, the instructions per cycle (IPC) and the communication volume are used to quantify the core utilization and the communication traffic, respectively.

The pseudocode of the proposed core clustering method is shown in Fig. 2. First, the core utilization is used to allocate the cores with similar behavior to the same clusters (lines 1-8). Here, we follow a widely known approach called k-means clustering to cluster the cores with similar utilization. The k-means algorithm iteratively partitions the elements into a predefined number of clusters such that each element belongs to the cluster with the nearest mean [14].

Fig. 2. Pseudocode of the core clustering algorithm.

In our method, we start by forming the basic initialization for the k-means clustering, where each core is randomly assigned to one of P clusters (P being the number of workload-based clusters) (line 1), and calculate the initial center of each cluster (lines 1-2). The center of the j-th cluster, $\mu_j$, is defined as the mean utilization (IPC) of all the cores included in that cluster and is calculated as

$$\mu_j = \frac{\sum_{i=1}^{N} \delta_{ij}\, u_i}{\sum_{i=1}^{N} \delta_{ij}} \tag{1}$$

Here, j is the cluster index, N is the number of cores, $u_i$ is the IPC of the i-th core, and $\delta_{ij}$ is an indicator function, which is set to 1 if and only if the i-th core belongs to the j-th cluster and to 0 otherwise. The cores in each cluster are then evaluated to iteratively find the nearest cluster and move each core to that cluster according to

$$\arg\min_{j} \lVert x_i - \mu_j \rVert^2 \tag{2}$$

where $x_i$ denotes the utilization of the i-th core (i.e., $u_i$ in Eq. (1)). This operation repeats until all the clusters are updated and P clusters with similar utilization are generated.

Then, to account for the communication-induced energy consumption, the communication volume is used to create Q clusters in each of the P clusters generated previously (Q being the number of communication-based clusters). The communication-based clustering begins by constructing the initial groups from the required minimum voltage, which strongly affects the communication energy and the design overhead, as demonstrated in [13]. The cores in each of the P clusters are allocated to the corresponding group q according to

$$V_i < \frac{(V_H - V_L)^2}{Q}\, q, \qquad q = 1, \ldots, Q \tag{3}$$

Here, $V_i$ is the required minimum voltage of the i-th core, and $V_H$ and $V_L$ are the predefined highest and lowest voltages in each of the P clusters, respectively. Since the V/F of all the cores within a cluster must be identical, the V/F of all the cores in the same cluster is set to the maximum value among those cores.
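The utilization-based step described above (Eqs. (1) and (2)) can be sketched as a one-dimensional k-means over per-core IPC values. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the initial assignment is round-robin rather than random so that the result is reproducible, and an empty cluster simply keeps its previous center. The communication-based refinement of Eq. (3) is not covered.

```python
from typing import List, Optional, Tuple


def cluster_centers(ipc: List[float], assign: List[int], P: int,
                    prev: Optional[List[float]] = None) -> List[float]:
    """Eq. (1): the center of cluster j is the mean IPC of its member cores."""
    centers = []
    for j in range(P):
        members = [u for u, a in zip(ipc, assign) if a == j]
        if members:
            centers.append(sum(members) / len(members))
        else:
            # Assumption: an empty cluster keeps its previous center (0.0 at start).
            centers.append(prev[j] if prev else 0.0)
    return centers


def kmeans_ipc(ipc: List[float], P: int,
               max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster cores by IPC; returns (cluster index per core, cluster centers)."""
    # Assumption: deterministic round-robin start instead of a random assignment.
    assign = [i % P for i in range(len(ipc))]
    centers = cluster_centers(ipc, assign, P)
    for _ in range(max_iter):
        # Eq. (2): move each core to the cluster with the nearest center.
        new_assign = [min(range(P), key=lambda j: (u - centers[j]) ** 2)
                      for u in ipc]
        if new_assign == assign:
            break  # no core changed cluster
        assign = new_assign
        centers = cluster_centers(ipc, assign, P, centers)
    return assign, centers


if __name__ == "__main__":
    # Hypothetical IPC samples: three lightly loaded and three heavily loaded cores.
    labels, mus = kmeans_ipc([0.2, 0.25, 0.3, 1.8, 1.9, 2.0], P=2)
    print(labels, mus)
```

With well-separated IPC values, as in the usage lines, the low- and high-utilization cores end up in different clusters, which is the property the V/F assignment relies on.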
Because the communication-based clustering focuses on energy consumption, the communication energy of the current clustering is first estimated as a temporary value. The energy consumption is calculated using the equation given in [4]. To reduce the communication energy, we iteratively select the core pair with the largest inter-VFI communication volume and check, for either core, whether migrating it to the other cluster reduces the total energy consumption. If none of the alternative core migrations reduces the total energy consumption, the core pair with the next highest communication volume is considered. Accordingly, the inter-VFI links are transformed into intra-VFI links for the migrated core pairs. The iteration stops when rearranging the VFIs no longer improves the energy consumption. Consequently, the cores with similar utilization and communication traffic are clustered together in this step.

Topology construction: The topology construction is built upon the clusters from the previous step and upon the cores contained in each cluster. We select the clusters one by one and assign each one to an appropriate router with a restricted number of ports. The detailed process is discussed below. The pseudocode of the proposed topology construction method is shown in Fig. 3.

Fig. 3. Pseudocode of the topology construction algorithm.

In the algorithm, we first place the routers to build an initial VFI NoC topology. The minimum number of routers should be determined while connecting the cores to the routers. To prevent excessive design complexity associated with the number of router ports, typical four-port routers, which are common in two-dimensional NoC designs, are used. Given a cluster with n nodes, the minimum number of routers, $R_{min}$, is determined as

$$R_{min} = \left\lceil \frac{n}{2} \right\rceil \tag{4}$$

Next, for each intra-VFI communication, the core pair with the largest communication volume among all the communicating core pairs is connected to the same router, iteratively, until all the core pairs are connected. Then, the routers with available ports are connected to each other to generate as many inter-VFI links as possible. Finally, network interfaces (NIs) are generated between the routers and the cores to packetize the data.

Because the routing path must be determined for the generated topology, shortest-path routing is used as the default routing. For each inter-VFI communication, the inter-VFI links on the minimal routing path are retrieved and the usage count of each of those links is incremented. Among the candidate inter-VFI links for each VFI communication, the link with the largest count is the most frequently used inter-VFI link. Finally, mcFIFOs are generated on the chosen inter-VFI links.

2. DVFI Tuning Method

With time-varying workloads, dynamic fine-tuning of the V/F levels of the VFIs is applied. The traditional DVFS uses core-level information to tune a core's V/F values. For the DVFI, we use the combined information from all the cores and links within the VFI, i.e., the core and link utilization, to determine a suitable V/F for each VFI. Therefore, we employ a metric, M, that incorporates the information of the cores and links in a VFI, which is defined as

$$M = \omega_c \sum_{\forall i \in \mathrm{VFI}_j} \frac{uc_i}{nc_j} + \omega_l \sum_{\forall l \in \mathrm{VFI}_j} \frac{ul_l}{nl_j} \tag{5}$$

Here, $uc_i$ is the utilization of the i-th core, $ul_l$ is the utilization of the l-th link, $nc_j$ is the number of cores in the j-th VFI, and $nl_j$ is the number of links in the j-th VFI. $\omega_c$ and $\omega_l$ are the weights for the utilization of the cores and links, respectively. The weights are calculated as the proportion of the core to the link utilization. According to the value of M, the predicted V/F is calculated and the V/F of the VFI is adjusted.

IV. EXPERIMENTAL RESULTS

In this section, the efficiency of the proposed CT-DVFI is evaluated against mesh-based designs and across VFI configurations. Sniper [16], a multi-core simulator, is used to obtain detailed core- and network-level information. The platform configuration was set according to the Intel Xeon Nehalem architecture to construct a 64-core system. We modified the Sniper code to support the NoC interconnection. A nominal cache configuration of 64 KB L1 instruction and data caches and a shared 8 MB L2 cache is assumed. The PARSEC and SPLASH2 benchmarks were used in the simulation. The core-level statistics generated by the Sniper simulations were fed into McPAT [17] to determine the energy consumption. To consider the nominal operation scenario, the adopted dynamic V/F levels use the discrete V/F pairs 0.5 V/1.25 GHz, 0.6 V/1.5 GHz, 0.6 V/1.5 GHz, 0.8 V/2.0 GHz, 0.9 V/2.25 GHz, and 1 V/2.5 GHz. To estimate the energy overhead introduced by the on-chip VR, we follow the method used in a recent work [2], and the overhead can be calculated as

$$E_{VR} = (1 - \eta)\, C_{filter}\, \lvert V_2^2 - V_1^2 \rvert \tag{6}$$

Here, $E_{VR}$ is the energy dissipated by the voltage regulator due to a voltage transition, $\eta$ is the power efficiency of the regulator, $C_{filter}$ is the regulator filter capacitance, and $V_2$ and $V_1$ are the two voltage levels. The energy overhead of each VFI was then calculated as the sum of the energy overheads of the clock signal, the mcFIFOs, and the VR.

To demonstrate the performance of the proposed method, we considered different VFI configurations (SVFI and DVFI) and network configurations (the mesh and a custom topology). We therefore performed simulations for mesh-based SVFI (ME-SVFI), mesh-based DVFI (ME-DVFI), custom topology-based SVFI (CT-SVFI), and custom topology-based DVFI (CT-DVFI). As the baseline for all our configurations, we considered the commonly used mesh-based non-VFI design (ME-NVFI). The per-core DVFS scheme was excluded from this experiment owing to its impracticality in manycore design.

Fig. 4 compares the execution time of all the configurations considered here with that of the baseline ME-NVFI. The custom topology produced less latency overhead than the mesh in each VFI configuration. In addition, the CT-DVFI produced the lowest values among all the configurations for the benchmarks considered. Moreover, the energy-delay product was evaluated for comparison.
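As a worked illustration of the two quantities used in this evaluation, the sketch below computes the regulator transition energy of Eq. (6) and an energy-delay product normalized to a baseline. It assumes the squared-voltage reading of Eq. (6), which is the dimensionally consistent form (capacitance times voltage squared gives energy). The function names and all numeric inputs are hypothetical and are not taken from the paper's measurements.

```python
def vr_transition_energy(eta: float, c_filter: float, v1: float, v2: float) -> float:
    """Eq. (6), assumed form: E_VR = (1 - eta) * C_filter * |V2^2 - V1^2| in joules."""
    return (1.0 - eta) * c_filter * abs(v2 ** 2 - v1 ** 2)


def normalized_edp(energy: float, delay: float,
                   base_energy: float, base_delay: float) -> float:
    """Energy-delay product of a configuration relative to a baseline."""
    return (energy * delay) / (base_energy * base_delay)


if __name__ == "__main__":
    # Hypothetical regulator: 90% efficiency, 1 uF filter capacitance, 0.8 V -> 1.0 V.
    print(vr_transition_energy(eta=0.9, c_filter=1e-6, v1=0.8, v2=1.0))
    # Hypothetical run: 20% less energy and 10% more delay than the baseline.
    print(normalized_edp(energy=80.0, delay=1.1, base_energy=100.0, base_delay=1.0))
```

A normalized value below 1 means the configuration has a lower energy-delay product than the baseline, which is how plots normalized to ME-NVFI would be read.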
Fig. 5 shows the energy-delay product of all the configurations normalized to ME-NVFI. The CT-DVFI configuration shows a lower energy-delay product than its mesh counterpart in all the benchmarks. In addition, the DVFI outperforms the other configurations on both the mesh and the custom topology, owing to its capability to reduce energy consumption with little performance impact.

Fig. 4. Comparison of the execution time using different benchmarks.

Fig. 5. Comparison of the energy delay products using different benchmarks.
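To make the runtime behavior behind these DVFI results more concrete, the tuning metric of Eq. (5) in Section III can be sketched as follows. Only the metric itself follows the paper. The weights are passed in as plain parameters, and the rule that maps M to a V/F level (pick the lowest level whose frequency, relative to the maximum, is at least M) is a hypothetical stand-in, because the paper does not spell out how the predicted V/F is derived from M. The level table lists the distinct V/F pairs given in Section IV.

```python
from typing import List, Sequence, Tuple

# Distinct (voltage in V, frequency in GHz) pairs listed in Section IV.
VF_LEVELS: List[Tuple[float, float]] = [
    (0.5, 1.25), (0.6, 1.5), (0.8, 2.0), (0.9, 2.25), (1.0, 2.5),
]


def vfi_metric(core_util: Sequence[float], link_util: Sequence[float],
               w_c: float, w_l: float) -> float:
    """Eq. (5): weighted mean core utilization plus weighted mean link utilization."""
    core_term = sum(core_util) / len(core_util) if core_util else 0.0
    link_term = sum(link_util) / len(link_util) if link_util else 0.0
    return w_c * core_term + w_l * link_term


def pick_vf(m: float,
            levels: Sequence[Tuple[float, float]] = VF_LEVELS) -> Tuple[float, float]:
    """Hypothetical mapping: lowest level whose relative frequency covers M."""
    f_max = levels[-1][1]
    for volt, freq in levels:
        if freq / f_max >= m:
            return (volt, freq)
    return levels[-1]  # saturate at the highest level


if __name__ == "__main__":
    # Hypothetical VFI with two cores and one link, equal weights.
    m = vfi_metric([0.4, 0.6], [0.2], w_c=0.5, w_l=0.5)
    print(m, pick_vf(m))
```

Keeping the metric and the level selection in separate functions makes it easy to swap in a different prediction rule without touching the utilization bookkeeping.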

V. CONCLUSIONS

In this paper, a new scheme for designing a custom topology-based DVFI is proposed for energy-efficient manycore platforms. We demonstrated that the proposed CT-DVFI significantly improved the energy efficiency without sacrificing performance. We also showed that, for all the benchmarks considered, it achieved significant energy-delay product savings across all topological and VFI configurations and their combinations.

ACKNOWLEDGMENTS

This work was supported by the MOTIE (Ministry of Trade, Industry & Energy ( )) and KSRC (Korea Semiconductor Research Consortium) support program for the development of the future semiconductor device and by the IT R&D Program of MSIP/IITP ( ).

REFERENCES

[1] L. Benini and G. De Micheli, "Networks on Chips: A New SoC Paradigm," Computer, Vol. 35, No. 1.
[2] U. Y. Ogras, et al., "Voltage-Frequency Island Partitioning for GALS-based Networks-on-Chip," IEEE Design Automation Conference.
[3] R. David, et al., "Dynamic Power Management of Voltage-Frequency Island Partitioned Networks-on-Chip using Intel's Single-chip Cloud Computer," IEEE/ACM International Symposium on Networks on Chip.
[4] S. Herbert and D. Marculescu, "Analysis of Dynamic Voltage/Frequency Scaling in Chip-Multiprocessors," International Symposium on Low Power Electronics and Design.
[5] L. Guang, et al., "Autonomous DVFS on Supply Islands for Energy-Constrained NoC Communication," International Conference on Architecture of Computing Systems.
[6] S. Tosun, et al., "Application-specific topology generation algorithms for network-on-chip design," IET Computers & Digital Techniques, Vol. 6, No. 5.
[7] B. Huang, et al., "Application-Specific Network-on-Chip Synthesis with Topology-Aware Floorplanning," Symposium on Integrated Circuits and Systems Design, pp. 1-6.
[8]
[9] W. Jang and D. Z. Pan, "A Voltage-Frequency Island Aware Energy Optimization Framework for Network-on-Chip," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 1, No. 3.
[10] P. Choudhary and D. Marculescu, "Power management of voltage/frequency Island-based systems using hardware based methods," IEEE Transactions on Very Large Scale Integration Systems, Vol. 17, No. 3.
[11] J. Howard, et al., "A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling," IEEE J. Solid-State Circuits, Vol. 46, No. 1, Jan.
[12] C. L. Li, et al., "Communication-aware custom topology generation for VFI network-on-chip," IEICE Electronics Express, Vol. 11, No. 18, pp. 1-8, 2014.
[13] C. Li, et al., "Energy-efficient Custom Topology Generation for Link-failure-aware Network-on-chip in Voltage-frequency Island Regime," Journal of Semiconductor Technology and Science, Vol. 16, No. 6.
[14] S. Jin, et al., "Statistical Energy Optimization on Voltage Frequency Island based MPSoCs in the Presence of Process Variations," Microelectronics Journal, Vol. 54.
[15] J. A. Hartigan, Clustering Algorithms, Wiley.
[16] T. E. Carlson, et al., "Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-Core Simulation," International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12.
[17] S. Li, et al., "McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures," International Symposium on Microarchitecture, 2009.

Chang-Lin Li received his B.S. degree from the Department of Computer, Electronics and Telecommunication Engineering at Yanbian University of Science and Technology, Yanji, China, in 2010, and his M.S. degree from the Department of Cogno-Mechatronics Engineering at Pusan National University, Busan, Korea. He is currently a combined M.S. and Ph.D. student in the Department of Electrical and Computer Engineering at Sungkyunkwan University, Suwon, Korea.

Tae Hee Han received his B.S., M.S., and Ph.D. degrees in electrical engineering from KAIST, Daejeon, Republic of Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2006, he was with the Telecom R&D Center at Samsung Electronics, Suwon, Korea. Since March 2008, he has been with Sungkyunkwan University, Suwon, Republic of Korea, as a professor. His research interests include SoC architectures and design technologies. From May 2011 to April 2013, he served as a full-time advisor on semiconductor devices for the Korean government.

Jae-Chern Yoo received the B.S. degree in electronics from Sungkyunkwan University, Korea, in 1986, and the M.S. and Ph.D. degrees in information & communication engineering, and electronics from KAIST and POSTECH, Korea, in 1996 and 2001, respectively. Since March 2008, he has been with Sungkyunkwan University as an Associate Professor.


More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Multi Core Chips No more single processor systems High computational power requirements Increasing clock frequency increases power dissipation

More information

342 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH /$ IEEE

342 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH /$ IEEE 342 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH 2009 Custom Networks-on-Chip Architectures With Multicast Routing Shan Yan, Student Member, IEEE, and Bill Lin,

More information

Temperature and Traffic Information Sharing Network in 3D NoC

Temperature and Traffic Information Sharing Network in 3D NoC , October 2-23, 205, San Francisco, USA Temperature and Traffic Information Sharing Network in 3D NoC Mingxing Li, Ning Wu, Gaizhen Yan and Lei Zhou Abstract Monitoring Network on Chip (NoC) status, such

More information

The (Low) Power of Less Wiring: Enabling Energy Efficiency in Many-Core Platforms Through Wireless NoC (Invited Paper)

The (Low) Power of Less Wiring: Enabling Energy Efficiency in Many-Core Platforms Through Wireless NoC (Invited Paper) The (Low) Power of Less Wiring: Enabling Energy Efficiency in Many-Core Platforms Through Wireless NoC (Invited Paper) Partha Pratim Pande, Ryan Gary Kim, Wonje Choi School of EECS, Washington State University,

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Published in: Proceedings of the 2010 International Conference on Field-programmable

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

More information

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Power-Mode-Aware Buffer Synthesis for Low-Power

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

Clustering-Based Topology Generation Approach for Application-Specific Network on Chip

Clustering-Based Topology Generation Approach for Application-Specific Network on Chip Proceedings of the World Congress on Engineering and Computer Science Vol II WCECS, October 9-,, San Francisco, USA Clustering-Based Topology Generation Approach for Application-Specific Network on Chip

More information

3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016

3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016 3118 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 10, OCTOBER 2016 Hybrid L2 NUCA Design and Management Considering Data Access Latency, Energy Efficiency, and Storage

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Parallelized Network-on-Chip-Reused Test Access Mechanism for Multiple Identical Cores

Parallelized Network-on-Chip-Reused Test Access Mechanism for Multiple Identical Cores IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 35, NO. 7, JULY 2016 1219 Parallelized Network-on-Chip-Reused Test Access Mechanism for Multiple Identical Cores Taewoo

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

Cycle accurate transaction-driven simulation with multiple processor simulators

Cycle accurate transaction-driven simulation with multiple processor simulators Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea

More information

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Bandwidth Aware Routing Algorithms for Networks-on-Chip 1 Bandwidth Aware Routing Algorithms for Networks-on-Chip G. Longo a, S. Signorino a, M. Palesi a,, R. Holsmark b, S. Kumar b, and V. Catania a a Department of Computer Science and Telecommunications Engineering

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

A Low-Power ECC Check Bit Generator Implementation in DRAMs

A Low-Power ECC Check Bit Generator Implementation in DRAMs 252 SANG-UHN CHA et al : A LOW-POWER ECC CHECK BIT GENERATOR IMPLEMENTATION IN DRAMS A Low-Power ECC Check Bit Generator Implementation in DRAMs Sang-Uhn Cha *, Yun-Sang Lee **, and Hongil Yoon * Abstract

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

DVFS-ENABLED SUSTAINABLE WIRELESS NoC ARCHITECTURE

DVFS-ENABLED SUSTAINABLE WIRELESS NoC ARCHITECTURE DVFS-ENABLED SUSTAINABLE WIRELESS NoC ARCHITECTURE Jacob Murray, Partha Pratim Pande, Behrooz Shirazi School of Electrical Engineering and Computer Science Washington State University {jmurray, pande,

More information

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications Li Tan 1, Zizhong Chen 1, Ziliang Zong 2, Rong Ge 3, and Dong Li 4 1 University of California, Riverside 2 Texas

More information

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems International Journal of Information and Education Technology, Vol., No. 5, December A Level-wise Priority Based Task Scheduling for Heterogeneous Systems R. Eswari and S. Nickolas, Member IACSIT Abstract

More information

Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters

Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Gregor von Laszewski, Lizhe Wang, Andrew J. Younge, Xi He Service Oriented Cyberinfrastructure Lab Rochester Institute of Technology,

More information

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University Design and Test Solutions for Networks-on-Chip Jin-Ho Ahn Hoseo University Topics Introduction NoC Basics NoC-elated esearch Topics NoC Design Procedure Case Studies of eal Applications NoC-Based SoC Testing

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Self-adaptability in Secure Embedded Systems: an Energy-Performance Trade-off

Self-adaptability in Secure Embedded Systems: an Energy-Performance Trade-off Self-adaptability in Secure Embedded Systems: an Energy-Performance Trade-off N. Botezatu V. Manta and A. Stan Abstract Securing embedded systems is a challenging and important research topic due to limited

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Reliable Time Synchronization Protocol for Wireless Sensor Networks

Reliable Time Synchronization Protocol for Wireless Sensor Networks Reliable Time Synchronization Protocol for Wireless Sensor Networks Soyoung Hwang and Yunju Baek Department of Computer Science and Engineering Pusan National University, Busan 69-735, South Korea {youngox,yunju}@pnu.edu

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI, CHEN TIANZHOU, SHI QINGSONG, JIANG NING College of Computer Science Zhejiang University College of Computer

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network Improving the Data Scheduling Efficiency of the IEEE 802.16(d) Mesh Network Shie-Yuan Wang Email: shieyuan@csie.nctu.edu.tw Chih-Che Lin Email: jclin@csie.nctu.edu.tw Ku-Han Fang Email: khfang@csie.nctu.edu.tw

More information

Heuristics Core Mapping in On-Chip Networks for Parallel Stream-Based Applications

Heuristics Core Mapping in On-Chip Networks for Parallel Stream-Based Applications Heuristics Core Mapping in On-Chip Networks for Parallel Stream-Based Applications Piotr Dziurzanski and Tomasz Maka Szczecin University of Technology, ul. Zolnierska 49, 71-210 Szczecin, Poland {pdziurzanski,tmaka}@wi.ps.pl

More information

Hardware-Software Codesign. 1. Introduction

Hardware-Software Codesign. 1. Introduction Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2

More information

DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE

DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE DEMYSTIFYING INTEL IVY BRIDGE MICROARCHITECTURE Roger Luis Uy College of Computer Studies, De La Salle University Abstract: Tick-Tock is a model introduced by Intel Corporation in 2006 to show the improvement

More information

Page Mapping Scheme to Support Secure File Deletion for NANDbased Block Devices

Page Mapping Scheme to Support Secure File Deletion for NANDbased Block Devices Page Mapping Scheme to Support Secure File Deletion for NANDbased Block Devices Ilhoon Shin Seoul National University of Science & Technology ilhoon.shin@snut.ac.kr Abstract As the amount of digitized

More information

Transaction Level Model Simulator for NoC-based MPSoC Platform

Transaction Level Model Simulator for NoC-based MPSoC Platform Proceedings of the 6th WSEAS International Conference on Instrumentation, Measurement, Circuits & Systems, Hangzhou, China, April 15-17, 27 17 Transaction Level Model Simulator for NoC-based MPSoC Platform

More information

Cache-Aware Utilization Control for Energy-Efficient Multi-Core Real-Time Systems

Cache-Aware Utilization Control for Energy-Efficient Multi-Core Real-Time Systems Cache-Aware Utilization Control for Energy-Efficient Multi-Core Real-Time Systems Xing Fu, Khairul Kabir, and Xiaorui Wang Dept. of Electrical Engineering and Computer Science, University of Tennessee,

More information

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip 2010 25th International Symposium on Defect and Fault Tolerance in VLSI Systems A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip Min-Ju Chan and Chun-Lung Hsu Department of Electrical

More information

An Industrial Employee Development Application Protocol Using Wireless Sensor Networks

An Industrial Employee Development Application Protocol Using Wireless Sensor Networks RESEARCH ARTICLE An Industrial Employee Development Application Protocol Using Wireless Sensor Networks 1 N.Roja Ramani, 2 A.Stenila 1,2 Asst.professor, Dept.of.Computer Application, Annai Vailankanni

More information

An Energy Efficient Topology Augmentation Methodology Using Hash Based Smart Shortcut Links in 2-D Mesh

An Energy Efficient Topology Augmentation Methodology Using Hash Based Smart Shortcut Links in 2-D Mesh International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 12, Issue 10 (October 2016), PP.12-23 An Energy Efficient Topology Augmentation

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

CLUSTER BASED ROUTING PROTOCOL FOR WIRELESS SENSOR NETWORKS

CLUSTER BASED ROUTING PROTOCOL FOR WIRELESS SENSOR NETWORKS CLUSTER BASED ROUTING PROTOCOL FOR WIRELESS SENSOR NETWORKS M.SASIKUMAR 1 Assistant Professor, Dept. of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamilnadu,

More information

A Network Storage LSI Suitable for Home Network

A Network Storage LSI Suitable for Home Network 258 HAN-KYU LIM et al : A NETWORK STORAGE LSI SUITABLE FOR HOME NETWORK A Network Storage LSI Suitable for Home Network Han-Kyu Lim*, Ji-Ho Han**, and Deog-Kyoon Jeong*** Abstract Storage over (SoE) is

More information

OVERVIEW: NETWORK ON CHIP 3D ARCHITECTURE

OVERVIEW: NETWORK ON CHIP 3D ARCHITECTURE OVERVIEW: NETWORK ON CHIP 3D ARCHITECTURE 1 SOMASHEKHAR, 2 REKHA S 1 M. Tech Student (VLSI Design & Embedded System), Department of Electronics & Communication Engineering, AIET, Gulbarga, Karnataka, INDIA

More information

SoC Communication Complexity Problem

SoC Communication Complexity Problem When is the use of a Most Effective and Why MPSoC, June 2007 K. Charles Janac, Chairman, President and CEO SoC Communication Complexity Problem Arbitration problem in an SoC with 30 initiators: Hierarchical

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

ShuttleNoC: Boosting On-chip Communication Efficiency by Enabling Localized Power Adaptation

ShuttleNoC: Boosting On-chip Communication Efficiency by Enabling Localized Power Adaptation ShuttleNoC: Boosting On-chip Communication Efficiency by Enabling Localized Power Adaptation Hang Lu, Guihai Yan, Yinhe Han, Ying Wang and Xiaowei Li State Key Laboratory of Computer Architecture, Institute

More information

Caching video contents in IPTV systems with hierarchical architecture

Caching video contents in IPTV systems with hierarchical architecture Caching video contents in IPTV systems with hierarchical architecture Lydia Chen 1, Michela Meo 2 and Alessandra Scicchitano 1 1. IBM Zurich Research Lab email: {yic,als}@zurich.ibm.com 2. Politecnico

More information

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction

More information

A Cache Utility Monitor for Multi-core Processor

A Cache Utility Monitor for Multi-core Processor 3rd International Conference on Wireless Communication and Sensor Network (WCSN 2016) A Cache Utility Monitor for Multi-core Juan Fang, Yan-Jin Cheng, Min Cai, Ze-Qing Chang College of Computer Science,

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 5, MAY 2014 1061 A Cool Scheduler for Multi-Core Systems Exploiting Program Phases Zhiming Zhang and J. Morris Chang, Senior Member, IEEE Abstract Rapid growth

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

Low Power Bus Binding Based on Dynamic Bit Reordering

Low Power Bus Binding Based on Dynamic Bit Reordering Low Power Bus Binding Based on Dynamic Bit Reordering Jihyung Kim, Taejin Kim, Sungho Park, and Jun-Dong Cho Abstract In this paper, the problem of reducing switching activity in on-chip buses at the stage

More information

Parallel Simulated Annealing for VLSI Cell Placement Problem

Parallel Simulated Annealing for VLSI Cell Placement Problem Parallel Simulated Annealing for VLSI Cell Placement Problem Atanu Roy Karthik Ganesan Pillai Department Computer Science Montana State University Bozeman {atanu.roy, k.ganeshanpillai}@cs.montana.edu VLSI

More information

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 1 School of

More information

A Dedicated Monitoring Infrastructure For Multicore Processors

A Dedicated Monitoring Infrastructure For Multicore Processors IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, Vol. xx, No. xx, February 2010. 1 A Dedicated Monitoring Infrastructure For Multicore Processors Jia Zhao, Sailaja Madduri, Ramakrishna

More information

A Heuristic Search Algorithm for Re-routing of On-Chip Networks in The Presence of Faulty Links and Switches

A Heuristic Search Algorithm for Re-routing of On-Chip Networks in The Presence of Faulty Links and Switches A Heuristic Search Algorithm for Re-routing of On-Chip Networks in The Presence of Faulty Links and Switches Nima Honarmand, Ali Shahabi and Zain Navabi CAD Laboratory, School of ECE, University of Tehran,

More information

SpiNNaker - a million core ARM-powered neural HPC

SpiNNaker - a million core ARM-powered neural HPC The Advanced Processor Technologies Group SpiNNaker - a million core ARM-powered neural HPC Cameron Patterson cameron.patterson@cs.man.ac.uk School of Computer Science, The University of Manchester, UK

More information

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 20: Main Memory II Prof. Onur Mutlu Carnegie Mellon University Today SRAM vs. DRAM Interleaving/Banking DRAM Microarchitecture Memory controller Memory buses

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI CHEN TIANZHOU SHI QINGSONG JIANG NING College of Computer Science Zhejiang University College of Computer Science

More information

DATA REUSE DRIVEN MEMORY AND NETWORK-ON-CHIP CO-SYNTHESIS *

DATA REUSE DRIVEN MEMORY AND NETWORK-ON-CHIP CO-SYNTHESIS * DATA REUSE DRIVEN MEMORY AND NETWORK-ON-CHIP CO-SYNTHESIS * University of California, Irvine, CA 92697 Abstract: Key words: NoCs present a possible communication infrastructure solution to deal with increased

More information

Performance Evaluation of Mesh - Based Multicast Routing Protocols in MANET s

Performance Evaluation of Mesh - Based Multicast Routing Protocols in MANET s Performance Evaluation of Mesh - Based Multicast Routing Protocols in MANET s M. Nagaratna Assistant Professor Dept. of CSE JNTUH, Hyderabad, India V. Kamakshi Prasad Prof & Additional Cont. of. Examinations

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

Encoding Scheme for Power Reduction in Network on Chip Links

Encoding Scheme for Power Reduction in Network on Chip Links RESEARCH ARICLE OPEN ACCESS Encoding Scheme for Power Reduction in Network on Chip Links Chetan S.Behere*, Somulu Gugulothu** *(Department of Electronics, YCCE, Nagpur-10 Email: chetanbehere@gmail.com)

More information

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS INSTRUCTOR: Dr. MUHAMMAD SHAABAN PRESENTED BY: MOHIT SATHAWANE AKSHAY YEMBARWAR WHAT IS MULTICORE SYSTEMS? Multi-core processor architecture means placing

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information