COMPUTER ARCHITECTURE SIMULATOR

International Journal of Electrical and Electronics Engineering Research (IJEEER) ISSN X Vol. 3, Issue 1, Mar 2013, TJPRC Pvt. Ltd.

P. ANURADHA 1, HEMALATHA RALLAPALLI 2 & SYED MUSTAK AHMED 3
1,3 Department of ECE, SR Engineering College, Warangal, Andhra Pradesh, India
2 Department of ECE, University College of Engineering, Osmania University, Hyderabad, Andhra Pradesh, India

ABSTRACT
The SimpleScalar tool set is a computer architecture simulator that emulates the behavior of a computing device. It is a system software infrastructure used to build modeling applications for program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification. Using the SimpleScalar tools, we can build modeling applications that simulate real programs running on a range of modern processors and systems. The tool set includes sample simulators ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution, and state-of-the-art branch prediction. In this paper we present simulations of SPEC 95 benchmarks.

KEYWORDS: Simulation, SimpleScalar, SPEC

INTRODUCTION
Modern processors are incredibly complex marvels of engineering that are becoming increasingly hard to evaluate. The SimpleScalar tool set performs fast, flexible, and accurate simulation of modern processors that implement the SimpleScalar architecture. The tool set takes binaries compiled for the SimpleScalar architecture and simulates their execution on one of the provided processor simulators. There are seven SimpleScalar simulators: sim-safe, sim-fast, sim-profile, sim-cache, sim-cheetah, sim-bpred and sim-outorder. A summary of the simulators is given below.
Table 1: SimpleScalar Simulators
Simulator      Description                                                 Lines of code
sim-fast       Speed-optimized functional simulator                                  402
sim-safe       Simple functional simulator                                           307
sim-profile    Functional simulator with profiling                                   812
sim-cache      Hierarchical memory simulator                                         782
sim-bpred      Customizable branch prediction simulator                              513
sim-outorder   Detailed microarchitectural simulator with dynamic                   4555
               instruction scheduler and multi-level memory hierarchy

The SimpleScalar tools support several benchmark suites (such as SPEC 95, SPEC 2000 and SPEC 2006) whose programs are written in FORTRAN and C. FORTRAN sources must first be converted to C; the C code is then compiled with the SimpleScalar version of GCC, which generates SimpleScalar assembly from the benchmark source code. The SimpleScalar assembler and loader, along with the necessary ported libraries, produce SimpleScalar executables that can be fed directly into one of the provided simulators. (The simulators themselves are compiled with the host platform's native compiler; any ANSI C compiler will do.) A graphical overview of the tool set is depicted in Figure 1.

The SimpleScalar architecture, like the MIPS architecture, supports both big-endian and little-endian executables. The tool set supports compilation for either target; the big-endian and little-endian architectures are named ssbig-na-sstrix and sslittle-na-sstrix, respectively. We should use the target endianness that matches the host platform; the simulators may not work correctly if the compiler is forced to provide cross-endian support.

Figure 1: SimpleScalar Tool Set Overview

SimpleScalar can simulate the Alpha and PISA (Portable ISA) instruction sets, and others are being added. PISA and MIPS differ only slightly in terms of their ISAs, so most of the MIPS compilation tools, especially the linker, can be reused for PISA. Because only one real processor architecture (Alpha) was supported until the recent release, the SimpleScalar team has spent considerable effort manually porting the SimpleScalar infrastructure to other processor architectures. In this paper we present results for a few of the available SPEC 95 Alpha and PISA benchmarks.

After installing SimpleScalar, go into the simplescalar-3.0 directory and type make config-alpha followed by make to configure and build it for Alpha. To configure it for PISA instead, first clean the build with make clean, then run make config-pisa and make. The benchmarks directory contains the binaries and inputs for four programs (gcc, anagram, compress and go), which we simulated with the SimpleScalar simulators and analyzed with graphs.

LITERATURE REVIEW
The benchmarks are simulated with the SimpleScalar simulators, which are briefly described below.

Sim-Fast
It performs no time accounting, only functional simulation: it executes each instruction serially, simulating no instructions in parallel. sim-fast is optimized for raw speed; it assumes no caches and performs no instruction checking.

Sim-Safe
It also performs functional simulation, but checks for correct alignment and access permissions on each memory reference. Neither sim-fast nor sim-safe accepts any additional command-line arguments.

Sim-Profile
It can generate detailed profiles of instruction classes and addresses, text symbols, memory accesses, branches, and data segment symbols.
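The difference between the functional simulators described above can be illustrated with a toy interpreter. The Python sketch below is a minimal model under stated assumptions: the tuple instruction format, the opcodes (li, add, lw, sw, bnez), the register count and the data-segment bounds are all hypothetical and are not SimpleScalar's PISA or Alpha encodings. With safe=False it behaves like sim-fast (memory references are not checked); with safe=True it adds sim-safe-style alignment and permission checks; passing a Counter adds sim-profile-style instruction-class counts.

```python
from collections import Counter
from dataclasses import dataclass, field


class MemoryFault(Exception):
    """Raised by the checked (sim-safe-like) mode on a bad memory reference."""


@dataclass
class Machine:
    # Hypothetical machine state: 8 registers and a sparse word memory.
    regs: list = field(default_factory=lambda: [0] * 8)
    mem: dict = field(default_factory=dict)
    data_start: int = 0x1000  # hypothetical data-segment bounds
    data_end: int = 0x2000


def check_access(m: Machine, addr: int) -> None:
    """sim-safe-style check: 4-byte alignment and data-segment permission."""
    if addr % 4 != 0:
        raise MemoryFault(f"misaligned access at {addr:#x}")
    if not m.data_start <= addr < m.data_end:
        raise MemoryFault(f"access outside data segment at {addr:#x}")


def run(program, safe=False, profile=None):
    """Execute instructions one at a time, serially, with no timing model."""
    m = Machine()
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if profile is not None:
            profile[op] += 1  # sim-profile-style instruction-class count
        if op == "li":  # rd <- immediate
            rd, imm = args
            m.regs[rd] = imm
        elif op == "add":  # rd <- rs + rt
            rd, rs, rt = args
            m.regs[rd] = m.regs[rs] + m.regs[rt]
        elif op in ("lw", "sw"):  # memory reference at regs[base] + offset
            r, base, offset = args
            addr = m.regs[base] + offset
            if safe:
                check_access(m, addr)  # skipped entirely in sim-fast-like mode
            if op == "lw":
                m.regs[r] = m.mem.get(addr, 0)
            else:
                m.mem[addr] = m.regs[r]
        elif op == "bnez":  # jump to instruction index `target` if rs != 0
            rs, target = args
            if m.regs[rs] != 0:
                pc = target
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return m


# Hypothetical demo: store a countdown 3, 2, 1 to one word, then load it back.
DEMO_PROGRAM = [
    ("li", 1, 0x1000),  # r1 <- base address
    ("li", 2, 3),       # r2 <- loop counter
    ("li", 3, -1),      # r3 <- -1
    ("sw", 2, 1, 0),    # mem[r1] <- r2
    ("add", 2, 2, 3),   # r2 <- r2 - 1
    ("bnez", 2, 3),     # loop back to the sw while r2 != 0
    ("lw", 4, 1, 0),    # r4 <- mem[r1]
]

if __name__ == "__main__":
    counts = Counter()
    machine = run(DEMO_PROGRAM, safe=True, profile=counts)
    print(machine.regs[4], dict(counts))
```

Running the demo program executes 13 instructions (three each of li, sw, add and bnez, plus one lw) and leaves 1 in r4. A load from the misaligned address 0x1002 silently succeeds in the unchecked mode but raises MemoryFault in the checked mode, which is the kind of extra work that makes a checked functional simulator slower.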
Sim-Cache and Sim-Cheetah
These simulators are ideal for fast simulation of caches when the effect of cache performance on execution time is not needed.

Sim-Outorder
This simulator supports out-of-order issue and execution, based on the Register Update Unit (RUU). The RUU scheme uses a reorder buffer to automatically rename registers and hold the results of pending instructions. Each cycle, the reorder buffer retires completed instructions in program order to the architected register file. The processor's memory system employs a load/store queue. Store values are placed in the queue if the store is speculative. Loads are dispatched to the memory system when the addresses of all previous stores are known. Loads may be satisfied either by the memory system or by an earlier store value residing in the queue, if their addresses match. Speculative loads may generate cache misses, but speculative TLB misses stall the pipeline until the branch condition is known. sim-outorder runs about an order of magnitude slower than sim-fast. It is a performance simulator that tracks the latency of all pipeline operations.

Sim-Cache
This simulator can emulate a system with multiple levels of instruction and data caches, each of which can be configured with different sizes and organizations. It is ideal for fast cache simulation when the effect of cache performance on execution time is not needed.

Sim-Bpred
This is the branch predictor simulator. It can simulate different branch prediction schemes and reports results such as prediction hit and miss rates. Like sim-cache, it does not simulate the effect of branch prediction on execution time.

Alpha is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC), designed to replace the 32-bit VAX complex instruction set computer (CISC) ISA and its implementations. Alpha was implemented in microprocessors originally developed and fabricated by DEC. These microprocessors were most prominently used in a variety of DEC workstations and servers, which eventually formed the basis for almost their entire mid-to-upper-scale lineup.
Several third-party vendors also produced Alpha systems, including PC form-factor motherboards. In the Alpha architecture, a byte is defined as an 8-bit datum, a word as a 16-bit datum, a longword as a 32-bit datum, a quadword as a 64-bit datum and an octaword as a 128-bit datum. The Alpha architecture originally defined six data types: quadword (64-bit) integer, longword (32-bit) integer, IEEE T-floating-point (double precision, 64-bit), IEEE S-floating-point (single precision, 32-bit) and, for compatibility with the VAX, VAX G-floating-point (double precision, 64-bit) and VAX F-floating-point (single precision, 32-bit). The Alpha had some provision for future expansion of the instruction set to include 128-bit data types. The Alpha instruction set is classified into control instructions, integer arithmetic, logical and shift instructions, and extensions. The Alpha has a 64-bit linear virtual address space with no memory segmentation. Implementations can implement a smaller virtual address space with a minimum size of 43 bits. Although the unused bits were not implemented in hardware such as TLBs, the architecture required implementations to check that they are zero, to ensure software compatibility with implementations that implement a larger or the full virtual address space.

The PISA instruction set (Portable Instruction Set Architecture) is a simple MIPS-like instruction set maintained primarily for instructional use. A GNU GCC-based cross-compiler and pre-built libraries are also available for this target. The PISA target is particularly useful for computer engineering instruction, as the tools can be built on a wide range of host platforms, including Linux/x86, Win2000, SPARC Solaris, and others.

RESEARCH METHODOLOGY
This section presents our methodology: the microarchitecture-independent metrics, the statistical data analysis techniques, the tools, and the benchmarks used in this paper.

Metrics
In this research work we selected microarchitecture-independent metrics to characterize the behavior of the instruction and data streams of every benchmark program.
Microarchitecture-independent metrics allow programs to be compared on the basis of their inherent characteristics, isolated from the features of particular microarchitectural components. We therefore use a range of microarchitecture-independent metrics that we believe affect overall program performance, and we give intuitive reasoning to illustrate how the measured metrics can affect the manifested performance. The metrics measured in this study are a subset of all the microarchitecture-independent characteristics that could potentially be measured, but we believe they cover a wide enough range of program characteristics to make a meaningful comparison between the programs. We present results for the following metrics: number of instructions committed, simulation time in cycles, CPI (cycles per instruction), L1 data cache miss rate and L2 data cache miss rate.

Benchmarks
The SPEC 95 programs and inputs used are shown in Table 2.

Table 2: SPEC 95 Alpha Programs
Program       Input           INT/FP
anagram       anagram.in      INT
cc1           1stmt.i         INT
compress95    compress95.in   INT
go            2stone9.in      INT

RESULTS
In this section, we present our results in Tables 3-5. We first focus on particular benchmark statistics: total number of instructions, loads and stores, branches, DL1 hits and DL1 misses. We then plot the variation in CPI, L1 data cache miss rate and L2 data cache miss rate across the applications for the Alpha benchmarks.

Table 3: Results of SPEC 95 Alpha
Program       Total instructions   Loads and stores   Branches   DL1 hits   DL1 misses
anagram       …                    …                  …          …          …
cc1           …                    …                  …          …          …
compress95    …                    …                  …          …          …
go            …                    …                  …          …          …

Figure 2: Results of SPEC 95 Alpha
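The quantities in Tables 3 and 4 are read from the statistics that sim-outorder prints at the end of a run. The Python sketch below shows one way to collect them and to derive CPI and the L1 data cache miss rate. It assumes that each statistic is printed on a line of the form "name value # description" and that the relevant names are sim_num_insn, sim_num_refs, sim_num_branches, sim_cycle, dl1.hits and dl1.misses; these should be checked against the output of your own SimpleScalar build. The sample text and its numbers are hypothetical, not results from this paper.

```python
import re

# Assumed line shape of a statistics dump: "<name> <value> # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)\s*(?:#.*)?$")

# Statistic names assumed to correspond to the columns of Table 3.
TABLE3_COLUMNS = {
    "sim_num_insn": "Total number of instructions",
    "sim_num_refs": "Loads and stores",
    "sim_num_branches": "Branches",
    "dl1.hits": "DL1 hits",
    "dl1.misses": "DL1 misses",
}


def parse_stats(text: str) -> dict:
    """Return {stat_name: value} for every line that looks like 'name value # ...'."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            name, value = match.groups()
            stats[name] = float(value)
    return stats


def table3_row(stats: dict) -> dict:
    """Pick out the Table 3 columns from the parsed statistics."""
    return {column: stats[name] for name, column in TABLE3_COLUMNS.items()}


def derived_metrics(stats: dict) -> dict:
    """CPI = cycles / committed instructions; miss rate = misses / (hits + misses)."""
    dl1_accesses = stats["dl1.hits"] + stats["dl1.misses"]
    return {
        "CPI": stats["sim_cycle"] / stats["sim_num_insn"],
        "L1 data cache miss rate": stats["dl1.misses"] / dl1_accesses,
    }


# Hypothetical numbers, for illustration only.
SAMPLE_OUTPUT = """\
sim_num_insn                1000000 # total number of instructions committed
sim_num_refs                 400000 # total number of loads and stores committed
sim_num_branches             150000 # total number of branches committed
sim_cycle                   2500000 # total simulation time in cycles
dl1.hits                     380000 # total number of hits
dl1.misses                    20000 # total number of misses
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE_OUTPUT)
    print(table3_row(stats))
    print(derived_metrics(stats))
```

With the hypothetical sample, derived_metrics reports a CPI of 2.5 (2,500,000 cycles over 1,000,000 committed instructions) and an L1 data cache miss rate of 0.05 (20,000 misses out of 400,000 accesses). Lines that do not match the assumed shape, such as section banners, are simply skipped.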

Table 4: SPEC 95 Alpha Results
Program       CPI      L1 data cache miss rate   L2 data cache miss rate
anagram       2.6783   0.1396                    0.5527
cc1           0.9511   0.0606                    0.5066
compress95    0.554    …                         …
go            …        …                         …

Figure 3: Variation in CPI, L1 and L2 Data Cache Miss Rates
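The L1 and L2 miss rates in Table 4 come from simulating the cache hierarchy. The following Python sketch is a simplified two-level model, not SimpleScalar's cache code: each level is set-associative with LRU replacement, only reads are modeled (no write-backs or timing), and only L1 misses are forwarded to L2, so the L2 figure is a local miss rate measured over the accesses that reach L2. The nsets, bsize and assoc parameters mirror the fields of SimpleScalar's name:nsets:bsize:assoc:repl cache configuration strings; the cache sizes and the address trace are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """One cache level: set-associative, LRU replacement, reads only."""

    def __init__(self, nsets: int, bsize: int, assoc: int):
        self.nsets, self.bsize, self.assoc = nsets, bsize, assoc
        # One OrderedDict per set, mapping tag -> None; the first key is the LRU block.
        self.sets = [OrderedDict() for _ in range(nsets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.bsize
        index, tag = block % self.nsets, block // self.nsets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)  # now the most recently used block
            self.hits += 1
            return True
        self.misses += 1
        if len(ways) >= self.assoc:
            ways.popitem(last=False)  # evict the least recently used block
        ways[tag] = None
        return False

    @property
    def miss_rate(self) -> float:
        accesses = self.hits + self.misses
        return self.misses / accesses if accesses else 0.0


def simulate(trace, l1: LRUCache, l2: LRUCache):
    """Feed a trace through L1; only L1 misses are looked up in L2."""
    for addr in trace:
        if not l1.access(addr):
            l2.access(addr)
    return l1.miss_rate, l2.miss_rate


# Hypothetical trace: sweep a 64-byte array word by word, twice.
TRACE = [addr for _ in range(2) for addr in range(0, 64, 4)]

if __name__ == "__main__":
    l1 = LRUCache(nsets=2, bsize=16, assoc=1)  # 32-byte direct-mapped L1
    l2 = LRUCache(nsets=4, bsize=16, assoc=2)  # 128-byte 2-way L2
    print(simulate(TRACE, l1, l2))
```

On this trace the direct-mapped L1 misses 8 of 32 accesses (0.25), because blocks 0 and 2 (and blocks 1 and 3) map to the same set and keep evicting each other, while L2 misses 4 of the 8 accesses that reach it (0.5). A local L2 miss rate can therefore exceed the L1 rate even though L2 is the larger cache, which may help explain why the L2 rates in Table 4 are higher than the L1 rates.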

Next, we change the type of branch predictor to (a) taken, (b) not-taken and (c) perfect, collect the new CPI (cycles per instruction), and plot the CPI across each application.

Table 5: SPEC 95 Alpha Results
Program             Taken   Not taken   Perfect
CPI in anagram      …       …           …
CPI in compress95   …       …           …

Figure 4: Variation in CPI for Change in the Type of Branch Predictor

We simulated the SPEC 95 benchmarks anagram, cc1, compress95 and go with the SimpleScalar simulator sim-outorder. For all of these binaries we observed the total number of instructions, loads and stores, branches, DL1 hits, DL1 misses, CPI, L1 data cache miss rate and L2 data cache miss rate. For the go binary, all of these statistics are much higher than for the other binaries.

CONCLUSIONS
With the objective of understanding the SPEC 95 benchmarks, we characterized them using several microarchitecture-independent metrics. Analyzing the executables generated by compiling these programs with state-of-the-art compilers at full optimization, we put the programs into a common perspective and examined the trends. The total number of instructions, loads and stores, branches, DL1 hits, DL1 misses, CPI, L1 data cache miss rate and L2 data cache miss rate were studied across the applications. No single characteristic has changed as dramatically as the dynamic instruction count. Although the diversity of newer generations of SPEC CPU benchmarks has increased, about half of the programs in SPEC CPU 2000 are redundant. While researchers in the past have picked subsets of suites based on convenience, we have presented results of clustering analysis based on several innate program characteristics, and our results should be useful for selecting representative subsets (should experimentation with the whole suite be prohibitively expensive). We have also put programs from four different suites into a common perspective, in case anyone wants to compare results of particular programs from past suites with the newest programs. Our recommendation to SPEC is to continue broadening the diversity of programs in future generations of benchmark suites while reducing the redundancy among programs and increasing the static instruction count of the program binaries. We also recommend that computer architects and researchers concentrate on designing well-performing memory hierarchies in anticipation of increasingly poor temporal data locality in future generations of SPEC CPU benchmark programs.

ACKNOWLEDGEMENTS
The authors would like to acknowledge the various websites that provided information on this topic.
