IEEE EMBEDDED SYSTEMS LETTERS, VOL. XX, NO. Y, MONTH XX, 2013

ORION3.0: A Comprehensive NoC Router Estimation Tool

Andrew B. Kahng, Fellow, IEEE, Bill Lin, Senior Member, IEEE, and Siddhartha Nath, Student Member, IEEE

Abstract: Networks-on-Chip (NoCs) are increasingly used in many-core architectures. ORION 2.0 [9] is a widely adopted NoC power and area estimation tool, but its estimation models can have large errors (up to 185%) versus actual implementation. We present ORION3.0, an open-source tool whose parametric and non-parametric modeling methodologies fundamentally differ from the logic template-based approach of ORION 2.0 in that the estimation models are derived from actual physical implementation data. When compared with actual implementations, ORION3.0 models achieve average estimation errors of no more than 9.8% across microarchitecture, implementation, and operational parameters, as well as across multiple router RTL generators. A comprehensive suite of these methodologies has been implemented in the ORION3.0 distribution [20].

Index Terms: Network-on-Chip, regression, metamodeling.

I. INTRODUCTION

NETWORKS-ON-CHIP (NoCs) have proven to be highly scalable and low-latency interconnection fabrics in the era of many-core architectures, as evidenced by commercial chips such as the Intel 80-core [16] and IBM Blue Gene [15] processors. NoCs are now a key uncore element in the updated MPU system driver model in the 2013 International Technology Roadmap for Semiconductors (ITRS) [17]. Because of their growing importance, NoCs must be optimized for latency and power [11]. To facilitate early design-space exploration, accurate NoC power and area estimation tools are required. We describe ORION3.0, a comprehensive NoC router estimation tool that embodies both parametric and non-parametric models.
A. B. Kahng is with the Departments of Computer Science and Engineering, and of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA; abk@ucsd.edu. B. Lin is with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA; billlin@ece.ucsd.edu. S. Nath is with the Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA; sinath@ucsd.edu. Copyright 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

We include new models of router component blocks using parametric [5] and non-parametric [6] modeling methodologies that fundamentally differ from ORION 2.0 [9] in that the estimation models are derived from post-place-and-route (post-P&R) data that correspond to a given RTL generator and target cell library. Within this paradigm, we describe two approaches that are implemented in ORION3.0.

The first approach is based on parametric modeling. Our work in [5] makes a substantial departure from the ORION 2.0 approach in that no logic template is assumed for any router component block. Instead, for each component block in the router RTL, appropriate parametric models are derived from post-synthesis netlists by observing how instance counts change with microarchitectural, implementation, and operational parameters. We call these models ORION_NEW. We perform least-squares regression (LSQR) with actual post-P&R power and area data to refine these ORION_NEW models. The resulting parametric models achieve worst-case errors significantly better than those of ORION 2.0. This parametric modeling methodology enables a separation of concerns and skill sets: it does not require the architect or developer to understand how the architectural components are implemented on chip.
Rather, the methodology relies on a one-time characterization of post-synthesis data to derive parametric models of component blocks, and on automatic fitting of these models to post-P&R data using parametric regression.

The second approach is based on non-parametric modeling. Estimation models are again derived from post-P&R power and area data that correspond to a given RTL generator and target cell library. The non-parametric modeling approach can automatically derive accurate estimation models from a sample set of post-P&R results. ORION3.0 extends ideas from [8] and [3] by incorporating four metamodeling techniques for automatic model generation: Radial Basis Functions (RBF), Kriging (KG), Multivariate Adaptive Regression Splines (MARS) with linear and cubic splines, and Support Vector Machine (SVM) regression. The non-parametric modeling approach does not require the architect or developer to understand how architectural components are physically implemented.

For backward compatibility, ORION3.0 also includes the logic template-based models that comprise ORION 2.0. Depending on modeling accuracy requirements and on the availability of training and testing data for regression, users can choose the appropriate modeling methodology. For example, when training and testing data for a technology or tool flow are not available, users may use ORION 2.0 models with scaled technology parameters.

Our main contributions are as follows:
1) We describe a new parametric modeling methodology that derives accurate parametric models from post-synthesis netlists by observing how instance counts change with microarchitectural, implementation, and operational parameters. Use of post-synthesis netlists accurately captures contributions from both control and datapath in the design.
2) We demonstrate that the non-parametric regression techniques RBF, KG, MARS and SVM can yield highly accurate (worst-case error 2%) NoC power and area estimates.
3) ORION3.0 is available on the web for download. Over 1 downloads have been made from industry and academia since availability commenced in February 2013.

The remainder of this paper is organized as follows. Section II presents the ORION_NEW models and describes our parametric modeling methodology, and Section III describes our non-parametric modeling methodology. Section IV describes the ORION3.0 distribution itself, including its software architecture, its extensibility with user-defined models, and its training and testing datasets. Section V concludes this work.

II. PARAMETRIC MODELING

Figure 1 shows an example of a modern on-chip network router with input and output buffers, switch and virtual-channel arbiters, and a crossbar. ORION 2.0 uses logic template models for these router blocks. However, these models can be inaccurate because of mismatches between the actual RTL and the assumed templates. Moreover, typical design flows involve sophisticated design steps with complex interactions among them, which makes their effects difficult to characterize. Figure 2 shows power and instance-count estimation errors at for ORION 2.0 relative to two router RTL generators (Netmaker [19] from Cambridge and the Stanford NoC router [22]), as a function of the number of input ports in the router. The maximum errors are greater than 1% and 1%, respectively. For these two RTL generators, [5] reports significant improvements in power, area, and instance-count estimation using ORION_NEW models.

Fig. 1. Router architecture [11].

Fig. 2. Poor estimations by ORION 2.0 [9]. Power and instance counts of Netmaker and Stanford NoC vs. ORION 2.0 at as a function of #ports.

A. Model Enhancements. The ORION_NEW router block models in ORION3.0 model the instances (or gates) in each router block, which our studies show to be required for accurate estimation of area and power. The microarchitecture parameters used are #Ports (P), #VCs (V), #Buffers (B) and Flit-width (F).
The constant factors in the instance-count models of InBUF, OutBUF and frequency derating are derived by linear regression with post-synthesis netlists.

Crossbar (XBAR) Model. ORION_NEW models comprehend modern router RTL implementations, which use smaller crossbars instead of the traditional matrix [11] and multiplexer-tree [9] implementation options in ORION 2.0. The XBAR uses tri-state buffers (modeled as 2:1 MUXes) to control each flit. Hence, the total number of such MUXes required is $P \times P \times F$.

Switch and VC Arbiter (SWVC) Model. ORION_NEW removes the default overhead factor of 3% used by ORION 2.0, because our analysis indicates that this overhead is not needed with frequencies in the range 4 MHz 9 MHz for process nodes to 13. The instance count in SWVC is modeled as $9 \times (P \times (P \times (V^2 + 1) + (V^2 - 1)))$. The constant factor 9 arises because six 2-input NOR gates, two inverters, and one D flip-flop are used to generate one grant signal on each path.

Input Buffer (InBUF) Model. ORION_NEW models take into account the control signals and housekeeping logic that are required to decode flits and manage VCs. ORION 2.0 models lack these components and are hence inaccurate. In ORION_NEW, FIFO buffers are modeled as $2 \times P \times V \times B \times F$, and control signals and housekeeping logic are modeled as $18PV + 2P^2VB + 3PVB + 5P^2B + P^2 + FP + 15P$.

Output Buffer (OutBUF) Model. ORION_NEW models take into account the hybrid output buffer implementations in modern router RTLs, as well as the control signals per port and VC associated with each buffer. Output buffers are thus modeled differently from input buffers, with OutBUF given as $P \times (8V + 25)$.

Clock and Control Logic (CLKCTRL) Model. Unlike ORION 2.0, ORION_NEW models the clock buffers and routing resources as frequency scales. These resources are modeled as 2% of the sum of instances in the SWVC, InBUF and OutBUF component blocks.

Frequency Derating Model. ORION 2.0 models are agnostic to implementation parameters such as clock frequency, which results in large estimation errors at high frequencies.
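Leaving frequency derating aside (its multiplier is described next), the Section II.A block models can be combined into a per-router instance count. The Python sketch below assumes the formulas read XBAR = P×P×F; SWVC = 9×(P×(P×(V²+1) + (V²−1))); InBUF = 2×P×V×B×F plus the control term 18PV + 2P²VB + 3PVB + 5P²B + P² + FP + 15P; OutBUF = P×(8V+25); and CLKCTRL = 2% of SWVC + InBUF + OutBUF. The names `RouterConfig` and `orion_new_instance_counts` are hypothetical and are not part of the ORION3.0 sources, which are written in C and MATLAB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    """Microarchitecture parameters used by the Section II.A models (hypothetical type)."""
    P: int  # number of ports
    V: int  # number of virtual channels
    B: int  # number of buffers
    F: int  # flit width


def orion_new_instance_counts(cfg: RouterConfig) -> dict:
    """Per-block instance counts before frequency derating (hypothetical helper)."""
    P, V, B, F = cfg.P, cfg.V, cfg.B, cfg.F
    xbar = P * P * F                                  # P x P x F 2:1 MUXes
    swvc = 9 * (P * (P * (V**2 + 1) + (V**2 - 1)))    # 9 cells per grant signal
    inbuf_fifo = 2 * P * V * B * F                    # FIFO storage
    inbuf_ctrl = (18 * P * V + 2 * P**2 * V * B + 3 * P * V * B
                  + 5 * P**2 * B + P**2 + F * P + 15 * P)  # control and housekeeping
    inbuf = inbuf_fifo + inbuf_ctrl
    outbuf = P * (8 * V + 25)
    clkctrl = 0.02 * (swvc + inbuf + outbuf)          # 2% of SWVC + InBUF + OutBUF
    counts = {"XBAR": xbar, "SWVC": swvc, "InBUF": inbuf,
              "OutBUF": outbuf, "CLKCTRL": clkctrl}
    counts["total"] = sum(counts.values())            # sum of the five blocks
    return counts


if __name__ == "__main__":
    for block, n in orion_new_instance_counts(RouterConfig(P=5, V=2, B=8, F=32)).items():
        print(f"{block:8s} {n:10.1f}")
```

For P = 5, V = 2, B = 8 and F = 32, this sketch gives 800 (XBAR), 1260 (SWVC), 7600 (InBUF), 205 (OutBUF) and 181.3 (CLKCTRL) instances; a derating multiplier would then scale these raw counts.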
We first find the frequency below which instance counts change by less than 1%. We then derate instance counts by a multiplier Instance that is based on this frequency as Instance = Frequency ConstantFactor.

B. Modeling Methodology. Figure 3 shows our flow to derive the new parametric models, ORION_NEW, from post-synthesis netlists, and the subsequent refinement process that fits the models to post-P&R area, power, and instance-count data. We use the Netmaker and Stanford NoC router RTL generators, and a range of values of microarchitecture parameters (P, V, B and F) and implementation parameters (clock frequency and technology node), to configure the router. We synthesize the router RTLs using Synopsys Design Compiler vf sp4-64 (DC) [23] and Cadence RTL Compiler vedi1.1 (RC) [13], with options to preserve module hierarchy so that we can analyze each router component block. We derive ORION_NEW models from analysis of post-synthesis netlists of the component blocks. To refine these models, we generate post-P&R power and area data. We place and route the synthesized netlists using Cadence SOC Encounter vedi1.1 (SOCE) with a die utilization of 0.75 and a die aspect ratio of 1.0, and use Synopsys PrimeTime-PX vf sp3-7 (PT-PX) [23] to run power analysis based on the post-P&R netlist, SPEF [23] and SDC [1]. Finally, we use the MATLAB vR2011b [18] function lsqnonneg to fit the models to post-P&R data.

C. Results. Figure 4 compares the power and area estimation errors of the regression fit to those of ORION.
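The refinement step of Section II.B relies on MATLAB's lsqnonneg, which solves min ||Ax − b||² subject to x ≥ 0 with the Lawson-Hanson active-set algorithm. As a stdlib-only Python sketch of the same nonnegative least-squares problem, the code below swaps in a simpler cyclic coordinate-descent solver. The setup is a hypothetical assumption: each column of A holds the model-predicted instance counts of one component block across router configurations, b holds the matching post-P&R areas, and the fitted x is a nonnegative area per instance for each block. The actual ORION3.0 fitting scripts may be organized differently.

```python
def nnls_coordinate_descent(A, b, max_sweeps=10000, tol=1e-12):
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of m rows with n columns; b has length m. Returns x (length n).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]                        # residual A x - b at x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j in range(n):
            if col_sq[j] == 0.0:
                continue                         # all-zero column: leave x[j] at 0
            corr = sum(A[i][j] * r[i] for i in range(m))     # A_j^T r
            new_xj = max(0.0, x[j] - corr / col_sq[j])       # exact 1-D minimizer, clipped at 0
            step = new_xj - x[j]
            if step != 0.0:
                for i in range(m):
                    r[i] += A[i][j] * step       # keep the residual in sync with x
                x[j] = new_xj
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:                   # no coordinate moved noticeably
            break
    return x


if __name__ == "__main__":
    # Hypothetical data: instance counts of two blocks in four configurations,
    # and made-up areas generated with per-instance areas 3.0 and 0.5.
    A = [[800, 1260], [1200, 2000], [500, 3000], [2000, 800]]
    b = [3.0 * a0 + 0.5 * a1 for a0, a1 in A]
    print(nnls_coordinate_descent(A, b))         # approximately [3.0, 0.5]
```

Coordinate descent is easy to verify but converges slowly when columns are strongly correlated; an active-set method such as lsqnonneg is the more robust choice for real fitting.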

The ORION_NEW estimates are very close to actual implementation (average error of 9.8% in estimating Netmaker power at ) and are robust across multiple microarchitecture and implementation parameters and router RTLs.

Fig. 3. Development of ORION_NEW and fitted models using post-P&R data. (Flow-chart blocks: NoC router RTL generators; implementation parameters: clock frequency; µarch parameters: P, V, B, F; synthesis and P&R: DC/RC, SOCE; analysis of blocks: XBAR, SW & VC arbiter, input and output buffers; post-P&R area, power, instances; ORION_NEW models for each component block; LSQR; new fitted models.)

Fig. 4. Regression fit vs. ORION: power and area estimation errors for the Stanford NoC and Netmaker routers.

III. NON-PARAMETRIC MODELING

Non-parametric regression techniques provide another approach to estimating NoC power and area [8], [3]. The models determine the interactions between all input variables and how they affect the output (or response). This alleviates the effort needed to model architecture-level implementations of NoCs. At the same time, non-parametric regression approaches are scalable across multiple router RTLs, technology libraries and commercial tool flows. In ORION3.0, we implement four popular non-parametric regression or metamodeling techniques: RBF, KG, MARS and SVM. Detailed descriptions of these techniques are in [2].

A. Modeling Methodology. We derive NoC area and power models by performing a non-parametric fit of post-P&R data using and technology libraries. We first synthesize the Netmaker [19] router RTL using Synopsys Design Compiler vf sp4-64 [23], and then place and route it using Cadence SOC Encounter vedi1.1 [13]. Figure 5 shows our flow to derive non-parametric regression models.

Fig. 5. Development of non-parametric regression models using post-P&R data.

B. Results. We use 256 data points of post-P&R power and area values using and technology libraries to generate training and test data points. The input variables to all the models are P, V, B and F, and the responses are post-P&R power and area. We use two training set sizes: sparse and restricted, with 5 data points that omit higher values of the microarchitectural parameters,¹ and sparse only, with 64 data points that are sampled using Latin Hypercube Sampling [4]. The sparse and restricted set allows us to assess how well the models generalize in estimating area and power for input parameters that are beyond the range of values used for training. In each experiment, model generation takes around 3s and response estimation takes around 1.88s. We repeat all experiments 1 times for each training set size, and report the averages of all the error values across the 1 trials.

Figure 6 compares area and power estimation errors across all modeling techniques at and for the sparse and restricted training sets. The average errors are 1% in area and 2% in power. performs better than other techniques across technologies in both area and power estimation. can be up to 3 more accurate than, and. Figure 7 shows similar plots for the sparse only training sets. Again, is more accurate than other techniques, but the difference in accuracy is not as significant with and as compared to Figure 6. Across all training set sizes used in our experiments, area and power estimation errors are the smallest for and are the largest for.

Fig. 6. Estimation errors in area and power at with sparse and restricted training sets (i.e., 5 data points for training).

Fig. 7. Estimation errors in area and power at with sparse only training sets (i.e., 64 data points for training).

¹ More precisely, the resulting training sets omit all values of {B = 7}, or of {P = 9}, or of {V = 7}, or of {F = 64}.
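The two training-set constructions of Section III.B can be sketched in Python as follows. `latin_hypercube` draws a Latin Hypercube Sample (exactly one point in every stratum of every dimension), `to_levels` maps it onto discrete parameter values, and `restricted_split` holds out every point with a given parameter value, following one reading of footnote 1 (one training set per omitted value). All function names are hypothetical, and the level lists in the usage block are made up so that only their top values match footnote 1 (P = 9, V = 7, B = 7, F = 64); they are not the grid used in the paper.

```python
import random
from itertools import product


def latin_hypercube(n, dims, rng):
    """Draw n points in [0, 1)^dims with exactly one point per stratum per dimension."""
    columns = []
    for _ in range(dims):
        # One uniform draw inside each of the n equal-width strata, then shuffle
        # so that strata are paired randomly across dimensions.
        col = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]


def to_levels(point, levels):
    """Map a point in [0, 1)^d onto discrete levels (one list of levels per dimension)."""
    return tuple(lv[min(int(u * len(lv)), len(lv) - 1)] for u, lv in zip(point, levels))


def restricted_split(points, index, value):
    """Hold out every point whose parameter at position `index` equals `value`."""
    train = [p for p in points if p[index] != value]
    held_out = [p for p in points if p[index] == value]
    return train, held_out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical P, V, B, F levels (only the top values come from footnote 1).
    levels = [[3, 5, 7, 9], [1, 3, 5, 7], [1, 3, 5, 7], [16, 32, 48, 64]]
    sparse_only = [to_levels(p, levels) for p in latin_hypercube(64, 4, rng)]
    grid = list(product(*levels))                    # full 4 x 4 x 4 x 4 grid
    train, held_out = restricted_split(grid, 0, 9)   # omit every point with P = 9
    print(len(sparse_only), len(train), len(held_out))  # 64 192 64
```

Mapping a continuous sample onto a small set of levels can repeat configurations, so a real flow would deduplicate the sampled points or sample directly over the level indices.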
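Of the four metamodeling techniques in Section III, RBF is the simplest to sketch. The stdlib-only Python below fits an exact-interpolation Gaussian RBF model: one basis function is centered on each training point, and the weights come from solving the kernel system K w = y by Gaussian elimination. It is an illustration only. The RBF toolbox [21] distributed with ORION3.0 may use a different basis, center selection and regularization, and the class name, the `width` parameter and the data are hypothetical.

```python
import math


def _solve(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [y[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        if abs(A[p][k]) < 1e-12:
            raise ValueError("singular system")
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                         # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


class GaussianRBF:
    """f(x) = sum_i w_i * exp(-||x - c_i||^2 / (2 * width^2)), one center per training point."""

    def __init__(self, width=1.0):
        self.width = width

    def _phi(self, a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, y):
        self.centers = [tuple(x) for x in X]
        K = [[self._phi(a, c) for c in self.centers] for a in self.centers]
        self.weights = _solve(K, y)   # Gaussian kernel matrix is nonsingular for distinct points
        return self

    def predict(self, x):
        return sum(w * self._phi(x, c) for w, c in zip(self.weights, self.centers))


if __name__ == "__main__":
    # Hypothetical inputs already scaled to [0, 1] and made-up responses.
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    y = [1.0, 2.0, 3.0, 4.0, 2.5]
    model = GaussianRBF(width=0.5).fit(X, y)
    print(round(model.predict((0.25, 0.75)), 3))
```

Because one width is shared by all inputs, P, V, B and F would in practice be scaled to a common range before fitting. Exact interpolation also reproduces any noise in the training data, which is one reason practical RBF tools add regularization.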

IV. ORION3.0 DISTRIBUTION

We now describe the ORION3.0 software architecture, its extensibility with new models and training/testing datasets, and details of the ORION3.0 software distribution.

A. Software Architecture and Extensibility. ORION3.0 uses a modular software architecture and is written in C and MATLAB vR2012b. Figure 8 shows the high-level software architecture: how the router configuration is read by the tool and how models are invoked. ORION3.0 offers command-line options for the user to choose (1) ORION3.0 vs. ORION 2.0 models, (2) specific ORION3.0 modeling techniques (basic, lsqr, rbf, kg, mars, svm), and (3) training and/or testing datasets when any of {lsqr, rbf, kg, mars, svm} is used. As with ORION 2.0, users can configure microarchitecture parameters such as flit-width, #input and output buffers, #virtual channels, #pipeline stages, type of crossbar, #ports, etc. in the SIM_port.h file. These parameters are used with either the ORION 2.0 models or the ORION3.0 basic model; the latter refers to ORION_NEW (cf. Section II), which is used by default when no options are specified by the user. Users may specify implementation and operational parameters such as the technology library, the size of MUXes in the crossbar, and the input load on the router from the link. We have updated the technology files with accurate leakage and internal power data from a leading foundry's GS libraries for all cell types used in modeling, and are in the process of calibrating foundry 28 library models. All models report area in µm², power in mW, and energy in J.

Fig. 8. Software architecture of ORION3.0.

When choosing any of the lsqr, rbf, kg, mars or svm methods, a user may optionally provide training and testing data points. ORION3.0 performs basic validation of such user data to ensure that it can be converted to a non-singular matrix.
In the absence of user-provided data, the tool uses default training and testing data points based on the technology configured in the PARM_TECH_POINT field of the SIM_port.h file. Users may also develop their own regression models in MATLAB: the ORION3.0 distribution provides a template shell script that executes the regression model in MATLAB, and the shell script is called from orion_router.c using the C library function system().

B. Software Distribution. ORION3.0 is downloadable at [20]. We provide academic MATLAB toolboxes for RBF [21], KG [10], MARS [12] and SVM [14] under the same copyright and license agreements as available in their distributions. Doxygen-based documentation of all functions and implemented structures is provided in the distribution.

V. CONCLUSION

Accurate modeling for NoC area and power estimation is critical to successful early design-space exploration in the era of many-core computing. ORION 2.0, while very popular, has large errors versus actual implementation. This is because there is often a mismatch between the actual router RTL and the assumed templates. Also, typical design flows involve sophisticated optimizations that are difficult to characterize. We present ORION3.0, an open-source tool that incorporates comprehensive parametric and non-parametric modeling techniques to accurately estimate NoC power and area. Our ORION_NEW parametric models explicitly account for control and datapath resources. We further refine these parametric models by least-squares regression (LSQR) on post-P&R data. ORION3.0 non-parametric models include four popular techniques: RBF, KG, MARS and SVM. Our studies show that these techniques can be low-overhead and highly accurate in estimating NoC power and area, with being more accurate than the other methods for sparse and restricted training sets. ORION3.0 is now available for web download [20].

ACKNOWLEDGMENTS

This work was supported in part by NSF grant SHF, the MARCO GSRC focus center, and the SRC. We thank Mr. Jeremiah Fong for developing the front-end interfaces for ORION3.0.

REFERENCES
[1] J. Bhasker and R. Chadha, Static Timing Analysis for Nanometer Designs: A Practical Approach, Springer, 2009.
[2] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009.
[3] K. Jeong, A. B. Kahng, B. Lin and K. Samadi, Accurate Machine Learning-Based On-Chip Router Modeling, IEEE ESL 2(3) (2010), pp.
[4] R. Jin, W. Chen and T. W. Simpson, Comparative Studies of Metamodeling Techniques Under Multiple Modeling Criteria, Trans. Struct. Multidiscip. Optim. 23 (2001), pp.
[5] A. B. Kahng, B. Lin and S. Nath, Explicit Modeling of Control and Data for Improved NoC Router Estimation, Proc. DAC, 2012, pp.
[6] A. B. Kahng, B. Lin and S. Nath, Comprehensive Modeling Methodologies for NoC Router Estimation, TR CS, UCSD CSE Dept., 2012.
[7] A. B. Kahng, B. Lin and S. Nath, Enhanced Metamodeling Techniques for High-Dimensional IC Design Estimation Problems, Proc. DATE, 2013, pp.
[8] A. B. Kahng, B. Lin and K. Samadi, Improved On-Chip Router Analytical Power and Area Modeling, Proc. ASP-DAC, 2010, pp.
[9] A. B. Kahng, B. Li, L.-S. Peh and K. Samadi, ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration, Proc. DATE, 2009, pp.
[10] S. N. Lophaven, H. B. Nielsen and J. Sondergaard, Aspects of the MATLAB Toolbox DACE, TR IMM-REP-2002-13, Tech. Univ. of Denmark, 2002.
[11] H.-S. Wang, L.-S. Peh and S. Malik, Orion: A Power-Performance Simulator for Interconnection Networks, Proc. MICRO, 2002, pp.
[12] ARESLab.
[13] Cadence Design Systems, Inc.
[14] LIBSVM. cjlin/libsvm
[15] IBM Blue Gene.
[16] Intel 80-Core.
[17] ITRS Edition Reports and Ordering.
[18] MATLAB.
[19] Netmaker. rdm34/wiki
[20] ORION3.0.
[21] RBF2 Manual. mjo/rbf.html
[22] Stanford NoC.
[23] Synopsys, Inc.

Dear Editor and Reviewers:

We are submitting "ORION3.0: A Comprehensive NoC Router Estimation Tool" for publication in IEEE Embedded Systems Letters. In our paper, we describe ORION3.0, an open-source tool recently released for public download. ORION3.0 makes a number of significant improvements over ORION 2.0 (2009). The quality of the release is supported by over 1 downloads by academic and industry users during the past several months. We point out that some of the ORION3.0 improvements to NoC parametric modeling are described in our earlier conference paper, "Explicit Modeling of Control and Data for NoC Router Estimation," Proc. ACM/IEEE/EDAC Design Automation Conference, 2012 (reference [5] in our submission).

The major contributions of ORION3.0 are as follows.

We significantly improve the accuracy of parametric models of NoC router blocks, as compared to ORION 2.0 (2009), by implementing the ORION_NEW models described in [5]. These models are derived from analysis of post-synthesis netlists of multiple router RTL generators (synthesized using multiple commercial tools).

We demonstrate further improvement of model accuracy by automatically fitting post-layout data to the ORION_NEW models using least-squares regression.

We develop a methodology to derive accurate non-parametric models by applying Radial Basis Functions (RBF), Kriging (KG), Multivariate Adaptive Regression Splines (MARS) and Support Vector Machines (SVM). These models automatically fit post-layout data that is specific to a commercial SP&R tool flow and technology library. Separation of concerns is supported in that architects and front-end designers can perform NoC design-space exploration without understanding physical implementation issues.

We provide training and testing datasets for NoC routers implemented using a leading foundry's and technologies.

We describe the software architecture, user interface, and extensibility mechanisms of ORION3.0. The parametric and non-parametric modeling options available to the user are (1) ORION 2.0, (2) ORION_NEW models [5], (3) ORION_NEW fitted with post-layout data using least-squares regression, (4) RBF, (5) KG, (6) MARS, and (7) SVM. Users can seamlessly extend the ORION3.0 interfaces to incorporate new non-parametric area and power models, as well as new training and testing datasets.

Please contact me with any questions concerning the submission. Thank you for your consideration.

Sincerely,
Siddhartha Nath (on behalf of co-authors Andrew B. Kahng and Bill Lin)
Department of Computer Science and Engineering
University of California at San Diego
La Jolla, CA
sinath@ucsd.edu
