NETWORK-ON-CHIP FLOORPLANNING AND APPLICATION MAPPING USING CROSS-ENTROPY METHOD TAN CHEE WEI UNIVERSITI TEKNOLOGI MALAYSIA
NETWORK-ON-CHIP FLOORPLANNING AND APPLICATION MAPPING USING CROSS-ENTROPY METHOD TAN CHEE WEI A project report submitted in partial fulfilment of the requirements for the award of the degree of Master of Engineering (Electrical - Computer and Microelectronic System) Faculty of Electrical Engineering Universiti Teknologi Malaysia MAY 2015
To my beloved father, mother and sister. iii
iv ACKNOWLEDGEMENT First and foremost, I would like to express my deepest appreciation to my Master project supervisor, Assoc. Prof. Dr. Muhammad Nadzir Marsono for giving me the opportunity to complete my Master project under his supervision and guidance. I would like to grab this opportunity to thank him for his patience in guiding me and giving me lot of creative ideas on how to complete my project and research. Special thanks to Ms. Tei Yin Zhen, a PhD candidate from UTM for sharing with me her valuable research results on Network-on-Chip (NoC) application mapping using Genetic Algorithm (GA). I would also like to salute all the authors and researchers that published their valuable research results and findings through journals, articles or books that are vital for my references to understand more about Network-on-Chip (NoC) architecture and NoC floorplanning and application mapping techniques so that I can complete this project successfully. Last but not least, I would like to thank all lecturers, moderators, department staffs and friends that helped me in making my Master project successful. Tan Chee Wei Bayan Lepas, Penang
v ABSTRACT The increase in number of on-chip components (IP core) integration on System on Chip (SoC) has caused the communication of on-chip components (IP core) to hit the bottleneck of communication due to bandwidth limitation of buses. Networkon-Chip (NoC) is introduced to solve the communication bandwidth problem and it is widely used in System-on-Chip (SoC) nowadays to enable the communication between on-chip components through routers and network channel within the chip so that the complexity of communication between on-chip components can be reduced by reducing number of wire used which can lead to huge saving in chip area and reducing dynamic power significantly. The performance of Network-on-Chip (NoC) is highly dependence on floorplanning methodology used which can improve performance (transfer rate) of Network-on-Chip (NoC) blocks while meeting communication requirements and achieving minimal area overhead. The Cross-Entropy (CE) method has been applied successfully by researcher in various optimization problems and able to produce promising results. Therefore, the Cross-Entropy (CE) Method is introduced to solve optimization problems for Network on Chip (NoC) floorplanning and application mapping. The Cross-Entropy (CE) method is used to generate optimal floorplan with optimal communication cost for various multimedia benchmark applications. Evaluation results show that the Cross-Entropy (CE) method is able to produce comparable results compared to other selected methods from published journal papers and has faster convergence in terms of iteration/generation when compared to GA with heuristic crossover and random initial mapping.
vi ABSTRAK Peningkatan dalam bilangan atas cip komponen (teras IP) integrasi Sistem pada Chip (SoC) menyebabkan komunikasi komponen atas cip (teras IP) megalami kesesakan komunikasi kerana had jalur lebar bas. Rangkaian pada cip (NOC) yang diperkenalkan untuk menyelesaikan masalah lebar jalur komunikasi dan ia digunakan secara meluas dalam Sistem pada Chip (SoC) hari ini untuk membolehkan komunikasi antara komponen atas cip melalui router dan saluran rangkaian dalam cip supaya kerumitan komunikasi antara komponen atas cip boleh dikurangkan dengan mengurangkan beberapa wayar yang digunakan yang boleh membawa kepada penjimatan kos yang besar di kawasan cip dan mengurangkan kuasa dinamik dengan ketara. Prestasi Rangkaian pada cip (NOC) adalah sangat bergantung kepada metodologi floorplanning digunakan yang boleh meningkatkan prestasi (kelajuan pemindahan) blok rangkaian pada cip (NOC) di samping memenuhi keperluan komunikasi dan mencapai overhed kawasan yang minimum. Kaedah Cross-Entropy (CE) telah digunakan dengan berjayanya oleh penyelidik dalam pelbagai masalah pengoptimuman dan dapat menghasilkan kejayaan awal. Oleh itu, Kaedah Cross- Entropy (CE) diperkenalkan untuk menyelesaikan masalah pengoptimuman untuk rangkaian pada cip (NOC) floorplanning dan aplikasi pemetaan. Kaedah Cross- Entropy (CE) yang digunakan untuk menghasilkan Pelan Lantai optimum dengan kos komunikasi yang optimum untuk pelbagai aplikasi penanda aras multimedia. Keputusan penilaian menunjukkan bahawa kaedah Cross-Entropy (CE) mampu menghasilkan keputusan yang setanding berbanding dengan kaedah lain dipilih dari kertas jurnal yang diterbitkan dan mempunyai penumpuan yang lebih cepat dari segi lelaran / generasi berbanding Genetic Algorithm (GA) dengan crossover heuristik dan pemetaan awal rawak.
vii TABLE OF CONTENTS CHAPTER TITLE PAGE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF ABBREVIATIONS LIST OF SYMBOLS ii iii iv v vi vii x xi xiv xv 1 INTRODUCTION 1 1.1 Motivation of Research 1 1.2 Problem Statement 1 1.3 Objectives 2 1.4 Scope of Work 2 1.5 Report Organization 2 2 LITERATURE REVIEW 4 2.1 Overview of NoC Architecture 4 2.2 NoC Architecture 4 2.2.1 Regular NoC topology 5 2.2.2 Other NoC topology 6 2.3 Floorplan Representation Models 7 2.3.1 Slicing Floorplans 8 2.3.2 Non-slicing Floorplans 9 2.4 NoC Floorplanning and Application Mapping 10 2.5 The Cross-Entropy Method 14
viii 2.5.1 Smoothed Probabilities Updating Equation 14 2.5.2 Main Cross-Entropy (CE) Algorithm for Optimization 15 2.5.3 Node Transition Algorithm (Fast Trajectories Generation Algorithm) 17 2.6 Chapter Summary 19 3 PROPOSED SOLUTION AND IMPLEMENTATION STRATEGY 20 3.1 Introduction 20 3.2 NoC Floorplanning and Application Mapping Methodology 20 3.3 CE Method for NoC Floorplanning and Application Mapping 21 3.3.1 CE NoC Floorplanning and Application Mapping Algorithm 22 3.3.2 Sink Node and NoC Sequence Handling in CE Program 25 3.4 Communication Cost 26 3.5 Node Transition Probabilities Matrix 27 3.6 Representation of Solution 27 3.7 Chapter Summary 30 4 EVALUATION RESULTS 31 4.1 Generated NoC Floorplans 32 4.1.1 Multiwindow Display (MWD) Benchmark Application with 9 IP Cores 32 4.1.2 Audio Video Benchmark Application with 18 IP Cores 34 4.1.3 MPEG4 Decoder Benchmark Application with 12 IP Cores 36 4.1.4 Video Object Plane Decoder (VOPD) Benchmark Application with 16 IP Cores 38 4.1.5 263 Decoder MP3 Decoder Benchmark Application with 14 IP Cores 40 4.2 NoC Floorplan Comparison: CE Method versus Selected Methods 42
ix 4.3 Relationship between Parameters 47 4.4 Chapter Summary 50 5 CONCLUSIONS AND FUTURE WORK 51 5.1 Summary of findings 51 5.2 Future Work 52 REFERENCES 53 Appendices A D 55 71
x LIST OF TABLES TABLE NO. TITLE PAGE 2.1 Advantages and disadvantages of mapping techniques 13 3.1 Descriptions of variables used in CE MATLAB program 23 4.1 Number of IP core and size of mesh topology of various multimedia benchmark applications 31 4.2 Possible search space and startpoint defined for various multimedia benchmark applications 31 4.3 Results comparison: CE vs selected methods for MWD, Audio Video, MPEG4, VOPD and MP3 42
xi LIST OF FIGURES FIGURE NO. TITLE PAGE 2.1 a) Generic 3x3 Mesh Network with 9 routers b) Torus network NoC architecture 5 2.2 Butterfly fat tree NoC architecture 6 2.3 Extended butterfly fat tree NoC architecture 7 2.4 a) Example of a slicing floorplan with 6 blocks (P1 to P6) and its possible corresponding slicing tree b) Another possible slicing tree for slicing floorplan that consists of 6 blocks (P1 to P6) 8 2.5 Two types of minimal non-slicing floorplans (wheel floorplans) 9 2.6 A hierarchical floorplan and its corresponding floorplan tree where W represents wheel floorplan that consists of 5 blocks (P3 to P7) 10 2.7 Different types of mapping techniques 12 2.8 Flowchart of Main Cross-Entropy (CE) Algorithm for Optimization 16 2.9 Flowchart of Node Transition Algorithm (Fast Trajectories Generation Algorithm) 18 3.1 CE concept for NoC floorplanning and application mapping 21 3.2 Flowchart of CE Algorithm for NoC floorplanning and application mapping 24 3.3 Optimal NoC mapping sequence sink node handling and transformation 25 3.4 Example of optimal NoC mapping sequence 28 3.5 Example for graphical representation of node transition probabilities matrix 28 3.6 Example of mathematical representation of probabilities matrix (in excel - xls format) 29 3.7 Example for graph of Communication Cost (Mbps) versus Iteration 29
xii 4.1 TDG of Multiwindow Display (MWD) benchmark application with 9 IP cores 32 4.2 CE generated optimal floorplan for Multiwindow Display (MWD) benchmark application with 9 IP cores 33 4.3 a) Node transition probabilities matrix b) Graph of Communication Cost (Mbps) versus Iteration for Multiwindow Display (MWD) benchmark application with 9 IP cores 33 4.4 TDG of Audio Video benchmark application with 18 IP cores 34 4.5 CE generated optimal floorplan for Audio Video benchmark application with 18 IP cores 35 4.6 a) Node transition probabilities matrix b) Graph of Communication Cost (Mbps) versus Iteration for Audio Video benchmark application with 18 IP cores 35 4.7 TDG of MPEG4 decoder benchmark application with 12 IP cores 36 4.8 CE generated optimal floorplan for MPEG4 decoder benchmark application with 12 IP cores 37 4.9 a) Node transition probabilities matrix b) Graph of Communication Cost (Mbps) versus Iteration for MPEG4 decoder benchmark application with 12 IP cores 37 4.10 TDG of Video Object Plane Decoder (VOPD) benchmark application with 16 IP cores 38 4.11 CE generated optimal floorplan for VOPD benchmark application with 16 IP cores 39 4.12 a) Node transition probabilities matrix b) Graph of Communication Cost (Mbps) versus Iteration for VOPD benchmark application with 16 IP cores 39 4.13 TDG of 263 decoder mp3 decoder benchmark application with 14 IP cores 40 4.14 CE generated optimal floorplan for 263 decoder mp3 decoder benchmark application with 14 IP cores 41 4.15 a) Node transition probabilities matrix b) Graph of Communication Cost (Mbps) versus Iteration for 263 decoder mp3 decoder benchmark application with 14 IP cores 41 4.16 Communication Cost (Mbps) of various mapping techniques for MWD, Audio Video, MPEG4, VOPD and MP3 43 4.17 Communication Cost (Mbps) of various mapping techniques for VOPD 43
xiii 4.18 Communication Cost (Mbps) versus Iteration/Generation for MWD - CE vs GA 44 4.19 Communication Cost (Mbps) versus Iteration/Generation for Audio Video - CE vs GA 44 4.20 Communication Cost (Mbps) versus Iteration/Generation for MPEG4 - CE vs GA 45 4.21 Communication Cost (Mbps) versus Iteration/Generation for VOPD - CE vs GA 45 4.22 Communication Cost (Mbps) versus Iteration/Generation for MP3 - CE vs GA 46 4.23 Graph of computation time versus total number of random NoC sequences generated each iteration, N 48 4.24 Graph of computation time versus Alpha, α 49 4.25 Graph of communication cost (Mbps) versus Alpha, α 49 A.1 Netlist of MWD (multiwindow_netlist.m) 55 A.2 Netlist of Audio Video (audiovideo_netlist.m) 56 A.3 Netlist of MPEG4 (mpeg4_netlist.m) 56 A.4 Netlist of VOPD (VOPD_netlist.m) 57 A.5 Netlist of MP3 (mp3_netlist.m) 57 A.6 CE program wrapper (demo_noc_ce.m) part 1 58 A.7 CE program wrapper (demo_noc_ce.m) part 2 58 A.8 CE program wrapper (demo_noc_ce.m) part 3 59 A.9 Main CE program (NOC_CE.m) part 1 59 A.10 Main CE program (NOC_CE.m) part 2 60 A.11 Main CE program (NOC_CE.m) part 3 60 A.12 Main CE program (NOC_CE.m) part 4 61 A.13 Main CE program (NOC_CE.m) part 5 61 A.14 Main CE program (NOC_CE.m) part 6 62 A.15 Main CE program (NOC_CE.m) part 7 62 A.16 Node Transition Algorithm (Fast Trajectories Generation Algorithm) (gen_noc.m) part 1 63 A.17 Node Transition Algorithm (Fast Trajectories Generation Algorithm) (gen_noc.m) part 2 64 A.18 Node Transition Algorithm (Fast Trajectories Generation Algorithm) (gen_noc.m) part 3 65 A.19 Algorithm for calculating communication cost 66 A.20 Algorithm for updating probabilities matrix based on best NoC sequence generated 66
xiv LIST OF ABBREVIATIONS SoC - System-on-Chip NoC - Network-on-Chip IC - Integrated Circuit IP - Intellectual Property VLSI - Very-large-scale integration PE - Processing Element TDG - Traffic Distribution Graph TDM - Traffic Distribution Matrix BSG - Bounded Slicing Grid Structure NP-hard - Non-deterministic Polynomial-time hard CBL - Corner Block List GA - Genetic Algorithm PSO - Particle Swarm Optimization DPSO - Discrete Particle Swarm Optimization ILP - Integer Linear Programming ACO - Ant Colony Optimization SA - Simulated Annealing CE - Cross-Entropy MWD - Multiwindow Display VOPD - Video Object Plane Decoder NP_GA - Genetic Algorithm with Network Partitioning NMAP - Fast algorithm for mapping IP cores onto mesh floorplan PSMAP - Meta-heuristic strategy using Particle Swarm Optimization technique
xv LIST OF SYMBOLS α - Smoothing parameter ρ - Rarity parameter
CHAPTER 1 INTRODUCTION 1.1 Motivation of Research Performance, power and area are three areas of focus of System-on-Chips (SoCs) design nowadays. Network-on-Chip (NoC) plays an important role in balancing and achieving these aggressive three critical key targets. The performance of NoC is highly dependence on floorplanning methodology used as the placement of NoC core and routers may impact the data transfer rate and area overhead of the chip. Optimal NoC floorplan can help to improve performance (transfer rate) of NoC while meeting communication requirements and achieving minimal area overhead. Besides this, optimal NoC floorplan can help to achieve low power while meeting performance and area requirements. 1.2 Problem Statement Heuristic optimization method has been used to solve application mapping and floorplanning problem due to large NoC mapping search space. Simulated Annealing (SA) and other heuristic optimization methods such as Genetic Algorithm (GA) and adaptive search techniques can be trapped in local minimum. Reference [1] shows that the Cross-Entropy (CE) method has fast convergence and is less susceptible trapped in a local minimum as Cross-Entropy (CE) method is a global search method.
2 1.3 Objectives The main objectives of this project are: 1. To propose Network-on-Chip (NoC) floorplanning and application mapping based on Traffic Distribution Graph (TDG) using the Cross-Entropy (CE) method. 2. To find the optimal source-to-destination communication path through routers for each communication path in Traffic Distribution Graph (TDG) given. 1.4 Scope of Work The main focus of this project is to develop MATLAB program based on the Cross-Entropy (CE) method that can: 1. Generate random NoC mapping sequences based on given Traffic Distribution Graph (TDG). 2. Find optimal NoC floorplan/mapping sequence with the lowest communication cost. 1.5 Report Organization This report is divided into 5 chapters: 1. Chapter 1 discusses about the motivation of research, problem statement, objective, scope of work and report organization. 2. Chapter 2 summarizes literature reviews on the floorplan representation models, NoC architecture and NoC application mapping and floorplanning. Various NoC application mapping and floorplanning techniques/methods are discussed. A brief introduction of the Cross-Entropy (CE) method, smoothed probabilities updating equation, the main Cross-Entropy (CE) Algorithm for Optimization and Node Transition Algorithm (Fast Trajectories Generation Algorithm) are discussed at the end of this chapter.
3 3. Chapter 3 discusses about the NoC floorplanning and application mapping, communication cost, node transition matrix and how solution can be represented. 4. Chapter 4 shows CE generated floorplans for each of the multimedia benchmark applications. Optimal floorplans with optimal communication cost generated using the Cross-Entropy (CE) method are then compared with floorplans generated using selected methods in term of communication cost and convergence speed in terms of iteration/generation. Besides this, the relationship between parameter is also discussed at the end of this chapter. 5. Chapter 5 summarizes all the findings of this project and discusses briefly about future works recommendations for future researchers that plan to do research on NoC floorplanning and application mapping.
REFERENCES 1. I. Walter, I. Cidon, A. Kolodny, and D. Sigalov, The era of many-modules soc: revisiting the noc mapping problem, in Proceedings of the 2nd International Workshop on Network on Chip Architectures, pp. 43 48, ACM, 2009. 2. M. Danashtalab and M. Palesi, Basic concepts on on-chip networks, in Routing Algorithms in Networks-on-Chip, pp. 1 18, Springer, 2014. 3. H. Zhou, C.-W. Sham, and H. Yao, Slicing floorplans with handling symmetry and general placement constraints, in VLSI (ISVLSI), 2014 IEEE Computer Society Annual Symposium on, pp. 112 117, IEEE, 2014. 4. L. Jain and A. Singh, Non slicing floorplan representations in vlsi floorplanning: A summary, International Journal of Computer Applications, vol. 71, no. 15, pp. 12 20, 2013. 5. R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, and Y. Hoskote, Outstanding research problems in noc design: system, microarchitecture, and circuit perspectives, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 28, no. 1, pp. 3 21, 2009. 6. P. K. Sahu and S. Chattopadhyay, A survey on application mapping strategies for network-on-chip design, Journal of Systems Architecture, vol. 59, no. 1, pp. 60 76, 2013. 7. Y. Z. Tei, Y. W. Hau, N. Shaikh-Husin, and M. Marsono, Network partitioning domain knowledge multiobjective application mapping for large-scale network-on-chip, Applied Computational Intelligence and Soft Computing, vol. 2014, p. 9, 2014. 8. A. A. Morgan, H. Elmiligi, M. W. El-Kharashi, and F. Gebali, Area and delay optimization for networks-on-chip architectures using genetic algorithms, in Design and Test Workshop (IDT), 2009 4th International, pp. 1 6, IEEE, 2009. 9. Y. Z. Tei, M. Marsono, N. Shaikh-Husin, and Y. W. Hau, Network partitioning and ga heuristic crossover for noc application mapping, in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pp. 1228 1231, May 2013.
54 10. J. Soumya and S. Chattopadhyay, Application-specific network-on-chip synthesis with flexible router placement, Journal of Systems Architecture, vol. 59, no. 7, pp. 361 371, 2013. 11. J. Soumya, S. Tiwary, and S. Chattopadhyay, Area-performance trade-off in floorplan generation of application-specific network-on-chip with soft cores, Journal of Systems Architecture, vol. 61, no. 1, pp. 1 11, 2015. 12. P. K. Sahu, T. Shah, K. Manna, and S. Chattopadhyay, Application mapping onto mesh-based network-on-chip using discrete particle swarm optimization, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 22, no. 2, pp. 300 312, 2014. 13. R. Y. Rubinstein and D. P. Kroese, The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media, 2004. 14. V. Dumitriu and G. N. Khan, Throughput-oriented noc topology generation and analysis for high performance socs, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 17, no. 10, pp. 1433 1446, 2009. 15. S. Murali and G. De Micheli, Bandwidth-constrained mapping of cores onto noc architectures, in Proceedings of the conference on Design, automation and test in Europe-Volume 2, p. 20896, IEEE Computer Society, 2004. 16. K. Srinivasan, K. S. Chatha, and G. Konjevod, Linear-programming-based techniques for synthesis of network-on-chip architectures, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 14, no. 4, pp. 407 420, 2006. 17. I. C. Chia, Floorplaning methodology for network on chip, Master s thesis, Universiti Teknologi Malaysia, 2012. 18. Z. Lu, L. Xia, and A. Jantsch, Cluster-based simulated annealing for mapping cores onto 2d mesh networks on chip, in Design and Diagnostics of Electronic Circuits and Systems, 2008. DDECS 2008. 11th IEEE Workshop on, pp. 1 6, IEEE, 2008. 19. S. Tosun, Cluster-based application mapping method for network-on-chip, Advances in Engineering Software, vol. 42, no. 10, pp. 868 874, 2011. 20. P. K. Sahu, P. Venkatesh, S. Gollapalli, and S. Chattopadhyay, Application mapping onto mesh structured network-on-chip using particle swarm optimization., in ISVLSI, pp. 335 336, 2011.