COMPUTER ARCHITECTURE SIMULATOR

International Journal of Electrical and Electronics Engineering Research (IJEEER) ISSN X Vol. 3, Issue 1, Mar 2013, TJPRC Pvt. Ltd.

P. ANURADHA 1, HEMALATHA RALLAPALLI 2 & SYED MUSTAK AHMED 3
1,3 Department of ECE, SR Engineering College, Warangal, Andhra Pradesh, India
2 Department of ECE, University College of Engineering, Osmania University, Hyderabad, Andhra Pradesh, India

ABSTRACT

The SimpleScalar tool set is a computer architecture simulator: it reproduces the behavior of a computing device. It is a system software infrastructure used to build modeling applications for program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification. Using the SimpleScalar tools, we can build modeling applications that simulate real programs running on a range of modern processors and systems. The tool set includes sample simulators ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution, and state-of-the-art branch prediction. In this paper we present simulations of SPEC 95 benchmarks.

KEYWORDS: Simulation, SimpleScalar, SPEC

INTRODUCTION

Modern processors are incredibly complex marvels of engineering that are becoming increasingly hard to evaluate. The SimpleScalar tool set performs fast, flexible, and accurate simulation of modern processors that implement the SimpleScalar architecture. The tool set takes binaries compiled for the SimpleScalar architecture and simulates their execution on one of six provided processor simulators: sim-safe, sim-fast, sim-cache, sim-outorder, sim-bpred, and sim-profile. Table 1 summarizes the simulators.
Table 1: SimpleScalar Simulators

Simulator      Description                                          #Lines
Sim-fast       Simple functional simulator                          402
Sim-safe       Speed-optimized functional simulator                 307
Sim-profile    Functional simulator with profiling                  812
Sim-cache      Hierarchical memory simulator                        782
Sim-bpred      Customizable branch prediction simulator             513
Sim-outorder   Detailed microarchitectural simulator with dynamic   4555
               instruction scheduler and multilevel memory
               hierarchy

The SimpleScalar tools support several benchmark suites (such as SPEC 95, SPEC 2000, and SPEC 2006) written in Fortran and C. Fortran sources must first be converted to C; the C code is then compiled with the SimpleScalar version of GCC, which generates SimpleScalar assembly from the benchmark source code. The SimpleScalar assembler and loader, along with the necessary ported libraries, produce SimpleScalar executables that can be fed directly into one of the provided simulators. (The simulators themselves are compiled with the host platform's native compiler; any ANSI C compiler will do.) Figure 1 gives a graphical overview of the tool set.

The SimpleScalar architecture, like the MIPS architecture, supports both big-endian and little-endian executables. The tool set supports compilation for either target; the names for the big-endian and little-endian architectures are ssbig-na-sstrix and sslittle-na-sstrix, respectively. The target endianness should match that of the host platform; the simulators may not work correctly if the compiler is forced to provide cross-endian support.

Figure 1: SimpleScalar Tool Set Overview

SimpleScalar can simulate the Alpha and PISA (Portable ISA) instruction sets, and others are being added. PISA and MIPS differ only slightly in their ISAs, so most of the MIPS compilation tools, especially the linker, are reusable for PISA. Having supported only one real processor architecture (Alpha) until the recent release, the SimpleScalar team has spent considerable effort manually porting the SimpleScalar infrastructure to other processor architectures. In this paper we present results for a few available SPEC 95 Alpha and PISA benchmarks.

After installing SimpleScalar, go into the simplescalar-3.0 directory and type make config-alpha, followed by make, to build it for Alpha. To configure it for PISA instead, first clean the build with make clean, then run make config-pisa and make. The benchmarks directory contains the binaries and inputs for four programs: gcc, anagram, compress, and go. These were simulated with the SimpleScalar simulators and the results analyzed with graphs.

LITERATURE REVIEW

The benchmarks are simulated with the SimpleScalar simulators, which are briefly described below.

Sim-Fast

It performs functional simulation only, with no time accounting: it executes each instruction serially, simulating no instructions in parallel. sim-fast is optimized for raw speed; it assumes no cache and performs no instruction checking.

Sim-Safe

It also performs functional simulation, but checks for correct alignment and access permissions on each memory reference. sim-fast and sim-safe do not accept any additional command line arguments.

Sim-Profile

It can generate detailed profiles on instruction classes and addresses, text symbols, memory accesses, branches, and data segment symbols.
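The per-reference checks attributed to sim-safe above (correct alignment and access permissions) can be illustrated with a minimal Python sketch. This is not SimpleScalar's code: the names `Segment`, `is_aligned`, and `check_access`, and the idea of describing permissions as a list of address ranges, are assumptions made only for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical description of one mapped memory region."""
    start: int   # first valid byte address
    end: int     # one past the last valid byte address
    perms: str   # allowed access kinds: "r", "w", or "rw"


def is_aligned(addr: int, size: int) -> bool:
    """Return True if a `size`-byte access at `addr` is naturally aligned."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return addr & (size - 1) == 0


def check_access(addr: int, size: int, kind: str, segments: List[Segment]) -> None:
    """Raise ValueError on misalignment, PermissionError if no segment allows the access."""
    if not is_aligned(addr, size):
        raise ValueError(f"misaligned {size}-byte access at {addr:#x}")
    for seg in segments:
        inside = seg.start <= addr and addr + size <= seg.end
        if inside and kind in seg.perms:
            return
    raise PermissionError(f"{kind!r} access of {size} bytes at {addr:#x} not permitted")


# Example: a read-only region; aligned reads pass, writes are rejected.
text = [Segment(0x1000, 0x2000, "r")]
check_access(0x1000, 4, "r", text)
```

A simulator that skips these checks, as sim-fast is described as doing, trades this safety for speed.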
Sim-Cache and Sim-Cheetah

These simulators are ideal for fast simulation of caches if the effect of cache performance on execution time is not needed.

Sim-Outorder

This simulator supports out-of-order issue and execution, based on the Register Update Unit (RUU). The RUU scheme uses a reorder buffer to automatically rename registers and hold the results of pending instructions. Each cycle, the reorder buffer retires completed instructions in program order to the architected register file. The processor's memory system employs a load/store queue. Store values are placed in the queue if the store is speculative. Loads are dispatched to the memory system when the addresses of all previous stores are known. Loads may be satisfied either by the memory system or by an earlier store value residing in the queue, if their addresses match. Speculative loads may generate cache misses, but speculative TLB misses stall the pipeline until the branch condition is known. sim-outorder runs about an order of magnitude slower than sim-fast. It is a performance simulator, tracking the latency of all pipeline operations.

Sim-Cache

This simulator can emulate a system with multiple levels of instruction and data caches, each of which can be configured for different sizes and organizations. It is ideal for fast cache simulation if the effect of cache performance on execution time is not needed.

Sim-Bpred

This is a branch predictor simulator. It can simulate different branch prediction schemes and reports results such as prediction hit and miss rates. Like sim-cache, it does not accurately simulate the effect of branch prediction on execution time.

Alpha is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC), designed to replace the 32-bit VAX complex instruction set computer (CISC) ISA and its implementations. Alpha was implemented in microprocessors originally developed and fabricated by DEC. These microprocessors were most prominently used in a variety of DEC workstations and servers, which eventually formed the basis for almost their entire mid-to-upper-scale lineup.
Several third-party vendors also produced Alpha systems, including PC form factor motherboards. In the Alpha architecture, a byte is defined as an 8-bit datum, a word as a 16-bit datum, a longword as a 32-bit datum, a quadword as a 64-bit datum, and an octaword as a 128-bit datum. The Alpha architecture originally defined six data types: quadword (64-bit) integer, longword (32-bit) integer, IEEE T-floating-point (double precision, 64-bit), IEEE S-floating-point (single precision, 32-bit), and, for VAX compatibility, VAX G-floating-point (double precision, 64-bit) and VAX F-floating-point (single precision, 32-bit). Alpha had some provision for future expansion of the instruction set to include 128-bit data types. The Alpha instruction set is classified into control instructions, integer arithmetic instructions, and logical and shift instructions; extensions also exist.

Alpha has a 64-bit linear virtual address space with no memory segmentation. Implementations can implement a smaller virtual address space with a minimum size of 43 bits. Although the unused bits were not implemented in hardware such as TLBs, the architecture required implementations to check whether they are zero, to ensure software compatibility with implementations that implemented a larger or the full virtual address space.

The PISA instruction set (the Portable Instruction Set Architecture) is a simple MIPS-like instruction set maintained primarily for instructional use. A GNU GCC-based cross-compiler and pre-built libraries are also available for this target. The PISA target is particularly useful for computer engineering instruction, as the tools can be built on a wide range of host platforms, including Linux/x86, Windows 2000, SPARC Solaris, and others.

RESEARCH METHODOLOGY

This section presents our methodology: the microarchitecture-independent metrics, the statistical data analysis techniques, the tools, and the benchmarks that are used in this paper.

Metrics

In this research work we selected microarchitecture-independent metrics to characterize the behavior of the instruction and data stream of every benchmark program.
Microarchitecture-independent metrics allow programs to be compared by their inherent characteristics, isolated from the features of particular microarchitectural components. We therefore use a range of microarchitecture-independent metrics that we believe affect overall program performance, and we give intuitive reasoning for how the measured metrics can affect the manifested performance. The metrics measured in this study are a subset of all the microarchitecture-independent characteristics that could potentially be measured, but we believe that they cover a wide enough range of program characteristics to make a meaningful comparison between the programs.

We have presented results for the following metrics:

Number of instructions committed
Simulation time in cycles
CPI (cycles per instruction)
L1 data cache miss rate
L2 data cache miss rate

Benchmarks

RESULTS

The SPEC 95 benchmark programs and their inputs are shown in Table 2.

Table 2: SPEC 95 Alpha Programs

Program      Input           INT/FP
anagram      anagram.in      INT
cc1          1stmt.i         INT
compress95   compress95.in   INT
go           2stone9.in      INT

In this section, we present our results in Tables 3-5. We first focus on particular benchmark statistics: total number of instructions, loads and stores, branches, dl1 hits, and dl1 misses. We then plot the variation in CPI, L1 data cache miss rate, and L2 data cache miss rate across the applications for the Alpha benchmarks.

Table 3: Results of SPEC 95 Alpha

Program    Total Number of Instructions   Loads and Stores   Branches   Dl1 Hits   Dl1 Misses
anagram
cc1
compress
go

Figure 2: Results of SPEC 95 Alpha
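The derived metrics reported in this section follow directly from raw simulator counters: CPI is cycles divided by committed instructions, and a cache miss rate is misses divided by total accesses (hits plus misses). The Python sketch below shows that arithmetic. The function names and the counter values in the example are hypothetical placeholders, not figures from the paper's tables.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: simulation time in cycles over committed instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def miss_rate(hits: int, misses: int) -> float:
    """Cache miss rate: misses over total accesses (hits + misses)."""
    accesses = hits + misses
    if accesses <= 0:
        raise ValueError("no cache accesses recorded")
    return misses / accesses


# Illustrative counters only (assumed values, not measured results).
example_cycles = 1500
example_instructions = 1000
example_dl1_hits = 900
example_dl1_misses = 100

print(cpi(example_cycles, example_instructions))        # 1.5
print(miss_rate(example_dl1_hits, example_dl1_misses))  # 0.1
```

Applying the same two functions to the counters of each benchmark gives the per-program CPI and L1/L2 data cache miss rates that the figures plot.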

Table 4: SPEC 95 Alpha Results

Program    CPI   L1 Data Cache Miss Rate   L2 Data Cache Miss Rate
anagram
cc1
compress
go

Figure 3: Variation in CPI, L1 and L2 Data Cache Miss Rates

Next, we change the type of branch predictor to (a) taken, (b) not-taken, and (c) perfect, collect the new CPI for each setting, and plot the CPI across each application.

Table 5: SPEC 95 Alpha Results

Program           Taken   Not Taken   Perfect
CPI in anagram
CPI in compress

Figure 4: Variation in CPI for Change in the Type of Branch Predictor

We simulated the SPEC 95 benchmarks anagram, cc1, compress95, and go with the SimpleScalar simulator sim-outorder. For all of these binaries we observed the total number of instructions, loads and stores, branches, dl1 hits, dl1 misses, CPI, L1 data cache miss rate, and L2 data cache miss rate. For the go binary, all of these statistics are very high compared to the other binaries.

CONCLUSIONS

With the objective of understanding the SPEC 95 benchmarks since the inception of SPEC, we characterized different microarchitecture-independent metrics. Analyzing the executables generated by compiling these programs with state-of-the-art compilers at full optimization levels, we put the programs into a common perspective and examined the trends. The total number of instructions, loads and stores, branches, dl1 hits, dl1 misses, CPI, L1 data cache miss rate, and L2 data cache miss rate across the applications were studied. No single characteristic has changed as dramatically as the dynamic instruction count. Although the diversity of newer generations of SPEC CPU benchmarks has increased, about half of the programs in SPEC CPU 2000 are redundant. While researchers in the past have picked subsets of suites based on convenience, we have presented results of clustering analysis based on several innate program characteristics, and our results should be useful for selecting representative subsets (should experimentation with the whole suite be prohibitively expensive). We have also put programs from four different suites into a common perspective, in case anyone wants to compare results of particular programs from past suites with the newest programs. Our recommendations to SPEC would be to continue broadening the diversity of programs in future generations of benchmark suites while at the same time reducing the redundancy in programs, and to increase the static instruction count in the program binaries. We also recommend that computer architects and researchers concentrate on designing well-performing memory hierarchies in anticipation of increasingly poor temporal data locality in future generations of SPEC CPU benchmark programs.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of the various sites that provided information on this topic.
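The three branch predictor settings compared in the results (taken, not-taken, and perfect) differ only in how often they guess a branch outcome correctly. The Python sketch below computes the prediction hit rate of each setting over a recorded sequence of branch outcomes. It is a simplified illustration rather than sim-bpred's implementation: the function name `prediction_hit_rate`, the setting strings, and the ten-branch trace are assumptions, and the sketch says nothing about the resulting CPI.

```python
from typing import Sequence


def prediction_hit_rate(outcomes: Sequence[bool], setting: str) -> float:
    """Fraction of branches predicted correctly.

    outcomes: actual branch results, True meaning the branch was taken.
    setting:  "taken" (always predict taken), "nottaken" (always predict
              not taken), or "perfect" (always correct).
    """
    if not outcomes:
        raise ValueError("empty branch trace")
    if setting == "taken":
        hits = sum(1 for taken in outcomes if taken)
    elif setting == "nottaken":
        hits = sum(1 for taken in outcomes if not taken)
    elif setting == "perfect":
        hits = len(outcomes)
    else:
        raise ValueError(f"unknown predictor setting: {setting!r}")
    return hits / len(outcomes)


# Hypothetical trace: a loop branch taken nine times, then falling through once.
loop_trace = [True] * 9 + [False]
for name in ("taken", "nottaken", "perfect"):
    print(name, prediction_hit_rate(loop_trace, name))
```

On a loop-dominated trace like this one, always-taken is right on every iteration except the exit, while always-not-taken is right only on the exit, which is why the choice of static predictor can shift CPI noticeably in a timing simulator.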

SimpleScalar v2.0 Tutorial. Simulator Basics. Types of Simulators. What is a simulator? Why use a simulator? Why not use a simulator?

SimpleScalar v2.0 Tutorial. Simulator Basics. Types of Simulators. What is a simulator? Why use a simulator? Why not use a simulator? SimpleScalar v2.0 Tutorial Univ of Wisc - CS 752 Dana Vantrease (leveraged largely from Austin & Burger) Simulator Basics What is a simulator? Tool that runs and emulates the behavior of a computing device

More information

CprE Computer Architecture and Assembly Level Programming Spring Lab-2

CprE Computer Architecture and Assembly Level Programming Spring Lab-2 CprE 381 - Computer Architecture and Assembly Level Programming Spring 2017 Lab-2 INTRODUCTION: This introductory lab is aimed at introducing you to the Simplescalar simulator, while letting you explore

More information

Using the SimpleScalar Tool Set at UT-CS

Using the SimpleScalar Tool Set at UT-CS Handout #3 Using the SimpleScalar Tool Set at UT-CS Prof. Steve Keckler skeckler@cs.utexas.edu Version 1.0: 1/25/99 1 Introduction The SimpleScalar Tool Set performs fast, flexible, and accurate simulations

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Architecture Tuning Study: the SimpleScalar Experience

Architecture Tuning Study: the SimpleScalar Experience Architecture Tuning Study: the SimpleScalar Experience Jianfeng Yang Yiqun Cao December 5, 2005 Abstract SimpleScalar is software toolset designed for modeling and simulation of processor performance.

More information

Limitations of Scalar Pipelines

Limitations of Scalar Pipelines Limitations of Scalar Pipelines Superscalar Organization Modern Processor Design: Fundamentals of Superscalar Processors Scalar upper bound on throughput IPC = 1 Inefficient unified pipeline

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

For Problems 1 through 8, You can learn about the "go" SPEC95 benchmark by looking at the web page

For Problems 1 through 8, You can learn about the go SPEC95 benchmark by looking at the web page Problem 1: Cache simulation and associativity. For Problems 1 through 8, You can learn about the "go" SPEC95 benchmark by looking at the web page http://www.spec.org/osg/cpu95/news/099go.html. This problem

More information

Exploitation of instruction level parallelism

Exploitation of instruction level parallelism Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering

More information

A Framework for the Performance Evaluation of Operating System Emulators. Joshua H. Shaffer. A Proposal Submitted to the Honors Council

A Framework for the Performance Evaluation of Operating System Emulators. Joshua H. Shaffer. A Proposal Submitted to the Honors Council A Framework for the Performance Evaluation of Operating System Emulators by Joshua H. Shaffer A Proposal Submitted to the Honors Council For Honors in Computer Science 15 October 2003 Approved By: Luiz

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

C152 Laboratory Exercise 3

C152 Laboratory Exercise 3 C152 Laboratory Exercise 3 Professor: Krste Asanovic TA: Christopher Celio Department of Electrical Engineering & Computer Science University of California, Berkeley March 7, 2011 1 Introduction and goals

More information

Design of Experiments - Terminology

Design of Experiments - Terminology Design of Experiments - Terminology Response variable Measured output value E.g. total execution time Factors Input variables that can be changed E.g. cache size, clock rate, bytes transmitted Levels Specific

More information

A Quantitative/Qualitative Study for Optimal Parameter Selection of a Superscalar Processor using SimpleScalar

A Quantitative/Qualitative Study for Optimal Parameter Selection of a Superscalar Processor using SimpleScalar A Quantitative/Qualitative Study for Optimal Parameter Selection of a Superscalar Processor using SimpleScalar Abstract Waseem Ahmad, Enrico Ng {wahmad1, eng3}@uic.edu Department of Electrical and Computer

More information

HW#3 COEN-4730 Computer Architecture. Objective:

HW#3 COEN-4730 Computer Architecture. Objective: HW#3 COEN-4730 Computer Architecture Objective: To learn about SimpleScalar and Wattch tools. Learn how to find a suitable superscalar architecture for a specific benchmark through so called Design Space

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Towards a More Efficient Trace Cache

Towards a More Efficient Trace Cache Towards a More Efficient Trace Cache Rajnish Kumar, Amit Kumar Saha, Jerry T. Yen Department of Computer Science and Electrical Engineering George R. Brown School of Engineering, Rice University {rajnish,

More information

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture Lecture Topics ECE 486/586 Computer Architecture Lecture # 5 Spring 2015 Portland State University Quantitative Principles of Computer Design Fallacies and Pitfalls Instruction Set Principles Introduction

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Using a Victim Buffer in an Application-Specific Memory Hierarchy

Using a Victim Buffer in an Application-Specific Memory Hierarchy Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need?? Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross

More information

IF1 --> IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB. add $10, $2, $3 IF1 IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB sub $4, $10, $6 IF1 IF2 ID1 ID2 --> EX1 EX2 ME1 ME2 WB

IF1 --> IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB. add $10, $2, $3 IF1 IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB sub $4, $10, $6 IF1 IF2 ID1 ID2 --> EX1 EX2 ME1 ME2 WB EE 4720 Homework 4 Solution Due: 22 April 2002 To solve Problem 3 and the next assignment a paper has to be read. Do not leave the reading to the last minute, however try attempting the first problem below

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

Performance of several branch predictor types and different RAS configurations

Performance of several branch predictor types and different RAS configurations Performance of several branch predictor types and different RAS configurations Advanced Computer Architecture Simulation project First semester, 2009/2010 Done by: Dua'a AL-Najdawi Date: 20-1-2010 1 Design

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

Superscalar Processor

Superscalar Processor Superscalar Processor Design Superscalar Architecture Virendra Singh Indian Institute of Science Bangalore virendra@computer.orgorg Lecture 20 SE-273: Processor Design Superscalar Pipelines IF ID RD ALU

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic

More information

15-740/ Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 22: Superscalar Processing (II) Prof. Onur Mutlu Carnegie Mellon University Announcements Project Milestone 2 Due Today Homework 4 Out today Due November 15

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Walking Four Machines by the Shore

Walking Four Machines by the Shore Walking Four Machines by the Shore Anastassia Ailamaki www.cs.cmu.edu/~natassa with Mark Hill and David DeWitt University of Wisconsin - Madison Workloads on Modern Platforms Cycles per instruction 3.0

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E.

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E. MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR BY AMOL SHAMKANT PANDIT, B.E. A thesis submitted to the Graduate School in partial fulfillment of the requirements

More information
