Detection of Weak Spots in Benchmarks Memory Space by using PCA and CA

Leonardo Electronic Journal of Practices and Technologies, ISSN 1583-1078, Issue 16, January-June 2010, p. 43-52
http://lejpt.academicdirect.org

Abdul Kareem PARCHUR*, Fazal NOORBASHA and Ram Asaray SINGH
Department of Physics and Electronics, Dr. H. S. Gour University, Sagar, India-470003
* E-mail: kareemskpa@hotmail.com (* Corresponding author: +91-9907048098)

Abstract
This paper describes the weak spots in the SPEC CPU INT 2006 memory space, identified by using Principal Component Analysis and Cluster Analysis. We used recently published SPEC CPU INT 2006 benchmark scores of AMD Opteron 2000+ and AMD Opteron 8000+ series processors. The four most significant principal components (PCs) are retained: PC1 accounts for 72.6% of the variance, and PC2, PC3, and PC4 account for 26.5%, 0.91%, and 0.019% of the variance, respectively. The dendrogram is useful for identifying the similarities and dissimilarities between the benchmarks in workload space. These results and analysis can be used by performance engineers, scientists and developers to better understand benchmark behavior in workload space and to design a benchmark suite that covers the complete workload space.

Keywords
SPEC CPU INT 2006, Principal Component Analysis (PCA), Cluster Analysis (CA), Performance.

Introduction
AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors are members of a new family of seventh-generation AMD processors designed to meet the computation-intensive requirements of cutting-edge software applications running on high-performance desktop systems, workstations, and servers with the most advanced x86-64 architecture. Technology innovation for the x86 architecture drives today's personal computers. The architecture incorporates new microarchitectural features, including a system bus faster than 200 MHz, a high-performance cache architecture, and enhanced 3DNow! technology. 3DNow! technology is a set of 21 new instructions designed to open the traditional processing bottlenecks for floating-point-intensive and multimedia applications. It enables faster frame rates on high-resolution scenes, better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near theater-quality audio. Future AMD processors, designed to operate at frequencies greater than 3 GHz, should provide even higher-performance implementations of 3DNow! technology [1]. AMD Athlon processors are manufactured on AMD's robust 0.18-micron aluminum process technology and on AMD's leading-edge HiP6L 0.18-micron process technology featuring copper interconnects. The new AMD Athlon processor, with approximately 37 million transistors, has a die size of 120 mm² on 0.18-micron technology [2]. Computer architectural complexity is growing so dramatically that performance has become an important consideration in taking full advantage of the hardware's computational potential [3]. With CMOS scaling leading to an ever-increasing level of transistor integration on a chip, designers of high-performance embedded processors have ample area available to increase processor resources in order to improve performance [4]. The SPEC CPU2006 benchmark suite contains several programs from different application areas, such as physics, artificial intelligence, and combinatorial optimization.
The recently released SPEC CPU2006 benchmark suite is expected to be used by computer designers and computer architecture researchers for pre-silicon early design analysis [5]. The accuracy of a processor performance estimate depends on the benchmarks selected for the simulation study. The selected benchmarks should cover a wide spectrum of the application area. Increasing the number of benchmark programs increases the simulation time; at the same time, an improper selection of benchmarks may not accurately determine the performance of the processor. The increasing size of the benchmarks makes detailed simulation an extremely time-consuming process [6]. In this paper we have detected the weak spots in the benchmark memory space by using AMD Opteron 2000+ and AMD Opteron 8000+ series processors SPEC CPU INT 2006

performance scores, PCA and CA techniques. The similarities and dissimilarities between the benchmarks have been identified.

Scope of This Study
Building a high-performance microprocessor presents many reliability challenges. Today we are moving towards the nanotechnology era, and also from a 32-bit processor environment to a 64-bit processor environment. Our study examines the weak spots in different series of AMD processors (AMD Opteron 2000+ and AMD Opteron 8000+ Series), which are fabricated to meet the requirements of modern applications. This study is helpful for building a complete benchmark suite that covers the entire spectrum of the application area and for predicting the performance of the processor more accurately. We previously reported the performance prediction of the processors and evaluated the scalability of the Memory Wait Time, which degrades the performance of the processor, by using a simple statistical correlation technique [7]. The present analysis is useful to performance engineers, scientists and developers who want to better understand benchmark behavior in workload space and the scalability of performance in modern commercial processors.

Benchmarks are used for the performance evaluation of processors. SPEC, HINT, and TPC are the most important and popular benchmarks available for performance evaluation. SPEC is a nonprofit corporation formed to establish, maintain, and endorse a standardized set of benchmarks. SPEC's membership includes computer hardware and software vendors, leading universities, and research facilities worldwide. SPEC CPU2006 is designed to provide a comparative measure of compute-intensive performance across a range of hardware. Comprising two suites of benchmarks, SPEC CPU2006 gauges compute-intensive integer performance with CINT2006 and measures floating-point performance with CFP2006.
CINT2006 and CFP2006 results are presented as ratios, which are calculated using a reference time determined by SPEC and the runtime of the benchmark; higher scores indicate better performance [8].
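The ratio calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not SPEC's own tooling: the function names, the benchmark labels, and all timings are hypothetical. Combining the per-benchmark ratios with a geometric mean is how SPEC CPU2006 reports its overall metrics, although the text above does not state it.

```python
import math


def spec_ratio(reference_seconds, run_seconds):
    """Reference time divided by measured runtime; higher means faster."""
    if run_seconds <= 0:
        raise ValueError("run time must be positive")
    return reference_seconds / run_seconds


def geometric_mean(values):
    """Geometric mean, computed through logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) times in seconds; not published results.
timings = {
    "bench_a": (9000.0, 600.0),
    "bench_b": (8000.0, 400.0),
    "bench_c": (6000.0, 600.0),
}

ratios = {name: spec_ratio(ref, run) for name, (ref, run) in timings.items()}
overall = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: ratio = {ratio:.1f}")
print(f"overall (geometric mean) = {overall:.2f}")
```

Because each ratio is normalized by its own reference time, a benchmark with a long runtime does not dominate the overall figure the way it would in a plain sum of runtimes.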

Table 1. The CINT2006 Suite
S. No  Integer Benchmark  Language  Description
1      [...]              C         PERL Programming Language
2      [...]              C         Data Compression
3      [...]              C         C Language Optimizing Compiler
4      [...]              C         Combinatorial Optimization
5      [...]              C         Artificial Intelligence: Game Playing
6      [...]              C         Search a Gene Sequence Database
7      [...]              C         Artificial Intelligence: Chess
8      [...]              C         Physics / Quantum Computing
9      [...]              C         Video Compression
10     [...]              C++       Discrete Event Simulation
11     [...]              C++       Path Finding Algorithm
12     [...]              C++       XSLT Processor

Table 2. The CFP2006 Suite
S. No  Floating Point Benchmark  Language      Description
1      410.bwaves                Fortran 77    Computational Fluid Dynamics
2      416.gamess                Fortran       Quantum Chemical Computations
3      433.milc                  C             Physics / Quantum Chromodynamics
4      434.zeusmp                Fortran 77    Physics / Magnetohydrodynamics
5      435.gromacs               C/Fortran     Chemistry / Molecular Dynamics
6      436.cactusADM             C/Fortran 90  Physics / General Relativity
7      437.leslie3d              Fortran 90    Computational Fluid Dynamics
8      444.namd                  C++           Scientific, Structural Biology, Classical Molecular Dynamics Simulation
9      447.dealII                C++           Solution of Partial Differential Equations using the Adaptive Finite Element Method
10     450.soplex                C++           Simplex Linear Programming Solver
11     453.povray                C++           Computer Visualization / Ray Tracing
12     454.calculix              C/Fortran 90  Structural Mechanics
13     459.GemsFDTD              Fortran 90    Computational Electromagnetics
14     465.tonto                 Fortran 95    Quantum Crystallography
15     470.lbm                   C             Computational Fluid Dynamics
16     481.wrf                   C/Fortran 90  Weather Processing
17     482.sphinx3               C             Speech Recognition

The SPEC CPU2006 suite contains 17 floating-point programs (written in C, C++, Fortran, or a mix of C and Fortran) and 12 integer programs (9 written in C and 3 in C++). Table 1 and Table 2 list the benchmarks in the SPEC CPU2006 suite. The SPEC CPU2006 benchmarks replace the SPEC89, SPEC92, SPEC95 and SPEC CPU

2000 benchmarks [8, 9, 10].

Methodology
In this study we use the integer benchmarks from the newly released SPEC CPU2006 suite for the detection of weak spots. Benchmark scores for AMD Opteron 2000+ series processors and AMD Opteron 8000+ series processors were obtained under the same operating conditions. We previously reported the performance scaling in AMD Opteron 2000+ series and AMD Opteron 8000+ series processors [7]. Principal Component Analysis and Cluster Analysis are used to identify the weak spots in workload memory space and to find the similarities and dissimilarities between different benchmarks in that space. We used the commercial statistical software STATISTICA v.7.0 [11] for evaluating PCA and CA.

Results and Discussion
Using the benchmark scores of AMD Opteron 2000+ series and AMD Opteron 8000+ series processors, we obtained the four most significant principal components: the first principal component (PC1) covers 72.6% of the variance, PC2 covers 26.5%, PC3 covers 0.91%, and PC4 covers 0.019%. Among all PCs, the first two principal components give the most important information about benchmark behavior. The eigenvalues scree plot of all principal components (PC1-PC4) is shown in Figure 1. Figure 2 shows the benchmark behavior in the PC1 and PC2 memory space. Among all the benchmarks, [...] shows high deviation in memory space. The C Language Optimizing Compiler and Discrete Event Simulation benchmarks overlap at the top of the memory space and show high variance. The PERL Programming Language and Video Compression benchmarks, and the Artificial Intelligence: Game Playing and XSLT Processor benchmarks, overlap at the bottom of the memory space. Such overlapping benchmarks only increase the simulation time without providing extra information.
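The two observations above (that the first two PCs carry almost all of the variance, and that overlapping benchmarks are redundant) can be checked with a short sketch. The variance percentages are the ones printed in Figure 1. The score coordinates, the labels A to E, and the overlap radius are hypothetical values chosen only for illustration, and flagging pairs by Euclidean distance is an assumed criterion, not necessarily the procedure behind the gray shapes in Figure 2.

```python
import math
from itertools import accumulate, combinations


def components_needed(percentages, target):
    """Smallest number of leading PCs whose cumulative variance reaches target."""
    running = 0.0
    for count, pct in enumerate(percentages, start=1):
        running += pct
        if running >= target:
            return count
    return len(percentages)


def overlapping_pairs(scores, radius):
    """Pairs of benchmarks whose PC scores lie within `radius` of each other."""
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(scores.items()), 2):
        if math.dist(pa, pb) <= radius:
            pairs.append((a, b))
    return pairs


# Variance explained by PC1..PC4, as printed in Figure 1 (percent).
figure1_percentages = [72.5993, 26.4710, 0.9103, 0.0195]
cumulative = list(accumulate(figure1_percentages))
print("cumulative variance (%):", [round(c, 4) for c in cumulative])
print("PCs needed for 99%:", components_needed(figure1_percentages, 99.0))

# Hypothetical (PC1, PC2) scores; labels and coordinates are illustrative only.
hypothetical_scores = {
    "A": (0.90, 0.80),
    "B": (0.92, 0.78),
    "C": (0.60, -0.50),
    "D": (0.62, -0.52),
    "E": (0.95, 0.00),
}
print("overlapping pairs:", overlapping_pairs(hypothetical_scores, radius=0.05))
```

Each flagged pair is a candidate for dropping one member from a simulation study, since both members occupy nearly the same position in the PC space.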
These weak spots are represented by gray shapes in the PC1 vs. PC2 memory space. These

weak-spot identifications provide the information needed to build a complete benchmark suite that covers the complete workload space.

Figure 1. Eigenvalues scree plot of all principal components, which explain the variance in the workload (PC1-PC4). Axes: eigenvalues vs. principal components; PC1 72.5993%, PC2 26.4710%, PC3 0.9103%, PC4 0.0195%.

Figure 2. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC1 vs. PC2). Weak spots are highlighted through gray shapes.

Figure 3 and Figure 4 show the SPEC CINT 2006 programs plotted in the PC space using memory access characteristics, PC2 vs. PC3 and PC3 vs. PC4 respectively; weak spots are highlighted through gray shapes. Figure 5 represents the variance in the four significant individual principal components. Panels (a)-(d) present the variation of the individual principal component scores corresponding to each benchmark. Figure 5(a) shows the most significant

variations in the benchmarks; PC1 covers 72.6% of the variation in memory space. The benchmarks with dissimilar behavior are marked with red circles in Figure 5(a).

Figure 3. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC2 vs. PC3). Weak spots are highlighted through gray shapes.

Figure 4. SPEC CINT 2006 programs plotted in the PC space using memory access characteristics (PC3 vs. PC4). Weak spots are highlighted through gray shapes.

Figure 6 shows the dendrogram, which explains the similarities and dissimilarities in the workload space of AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors. The benchmarks [...] and [...] are linked with a smaller linkage distance; on the other hand, the Physics / Quantum Computing benchmark shows a long linkage distance. This dendrogram is useful for selecting a benchmark suite for performance evaluation. The line drawn at linkage distance L=400 selects K=4 benchmarks, so one can

reduce the program execution time. Figure 7 shows the two-way joining results of AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors and SPEC CPU INT 2006 benchmarks. The benchmark [...] shows a high execution time, which is represented through the 1800 score point boxes.

Figure 5. Variance in the four significant principal components. Panels (a)-(d) present the variation of the individual principal component scores (Principal Components 1-4) corresponding to each benchmark.

Disclaimer
All the observations and analysis in this paper on SPEC CPU2006int are the authors' opinions and should not be used as official or unofficial guidelines from SPEC for selecting benchmarks for any purpose. This paper only provides guidelines for performance engineers, academic users, scientists and developers to better understand the benchmark suite and to build a complete benchmark suite that covers the entire spectrum of the memory space without weak spots.
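The dendrogram cut discussed with Figure 6 (a line at linkage distance L=400 that selects K=4 benchmarks) can be illustrated with a small agglomerative clustering sketch. The paper does not state which linkage rule or distance measure was used in STATISTICA, so single linkage with Euclidean distance is an assumption here, and the benchmark labels and score vectors are hypothetical.

```python
import math


def single_linkage_clusters(points, cutoff):
    """Merge the closest clusters until the smallest gap exceeds `cutoff`.

    This is equivalent to cutting a single-linkage dendrogram at height `cutoff`.
    """
    clusters = [[name] for name in sorted(points)]

    def gap(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        if gap(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical score vectors (two processors per benchmark); illustrative only.
hypothetical_points = {
    "bench_1": (650.0, 640.0),
    "bench_2": (660.0, 655.0),
    "bench_3": (1200.0, 1250.0),
    "bench_4": (1210.0, 1240.0),
    "bench_5": (900.0, 1700.0),
    "bench_6": (1800.0, 1750.0),
}

clusters = single_linkage_clusters(hypothetical_points, cutoff=400.0)
# Keep one representative per cluster to shorten a simulation run.
representatives = [members[0] for members in clusters]
print("clusters:", clusters)
print("representatives:", representatives)
```

Picking the first member of each cluster is an arbitrary choice made for brevity; a benchmark closest to the cluster centre would be an equally reasonable representative.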

Figure 6. Dendrogram showing the similarities and dissimilarities in the workload space of AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors. Axis: linkage distance (0 to 2000); annotations: K=4 at L=400, smaller linkage distance, long linkage distance.

Figure 7. Cluster analysis of two-way joining results showing the similarities and dissimilarities in the workload space of AMD Opteron 2000+ and AMD Opteron 8000+ Series Processors. Processors shown: AMD Opteron 2222, 8220, 8218, 2356, 8216; score scale: 600 to 1800.

Acknowledgement
The authors would like to thank Prof. D. K. Gautam, Head, Department of Electronics, North Maharashtra University, Jalgaon (M.S.), India, and Prof. Ravi Pandey, Professor and Department Chair, Michigan Tech. University, USA, for many stimulating comments and discussions. One of the authors (A.K.P.) gratefully acknowledges the financial support of UGC for a meritorious research fellowship.

References
1. Oberman S., Favor G., Weber F., AMD 3DNow! Technology: architecture and implementations, IEEE Micro, 1999, 19, p. 37-48.
2. AMD x86-64 Architecture Manuals [online] [accessed on August, 2009]. Available at: http://www.amd.com.
3. Xue Y., Zhao C., Automated Phase-Ordering of Loop Optimizations Based on Polyhedron Model, Proc. 10th IEEE International Conference on High Performance Computing and Communications HPCC '08, 2008, p. 672-677.
4. Homayoun H., Pasricha S., Makhzar M., Veidenbaum A., Dynamic register file resizing and frequency scaling to improve embedded processor performance and energy-delay efficiency, Proc. 45th ACM/IEEE Design Automation Conference DAC 2008, 2008, p. 68-71.
5. Phansalkar A., Joshi A., John L. K., Analysis of Redundancy and Application Balance in the SPEC CPU2006 Benchmark Suite, ISCA '07, June 9-13, 2007.
6. Nair A., John L., Simulation points for SPEC CPU 2006, Proc. IEEE International Conference on Computer Design ICCD 2008, 2008, p. 397-403.
7. Abdul Kareem P., Singh R. A., Performance Scaling of Individual SPEC INT 2006 Results for AMD Processors, Leonardo Electronic Journal of Practices and Technologies, 2009, 14, p. 65-72.
8. SPEC CPU2000 Press Release FAQ [online] [accessed on August, 2009]. Available at: http://www.spec.org/osg/cpu2000/press/faq.html
9. KleinOsowski A. J., Lilja D. J., MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research, Computer Architecture Letters, 2002, 1, p. 7-7.
10. Henning J. L., SPEC CPU2000: Measuring CPU performance in the new millennium, IEEE Computer, 2000, 33, p. 28-35.
11. StatSoft, Inc. (2004). STATISTICA (data analysis software system), version 7, for Windows. www.statsoft.com.