DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )

Size: px
Start display at page:

Download "DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )"

Transcription

1 ISSN , CODEN RUXUEW Journal of Software, Vol18, No4, April 2007, pp DOI: /jos Tel/Fax: by Journal of Software All rights reserved CPU SimOS-Goodson 1,2+, 1, 1, 1, 1 1, 1 (, ) 2 (, ) SimOS-Goodson: A Goodson-Processor Based Multi-Core Full-System Simulator GAO Xiang 1,2+, ZHANG Fu-Xin 1, TANG Yan 1, ZHANG Long-Bing 1, HU Wei-Wu 1, TANG Zhi-Min 1 1 (Institute of Computing Technology, The Chinese Academy of Sciences, Beijing , China) 2 (Department of Computer Science and Technology, University of Science and Technology of China, Hefei , China) + Corresponding author: Phn: ext 9320, gaoxiang@ictaccn Gao X, Zhang FX, Tang Y, Zhang LB, Hu WW, Tang ZM SimOS-Goodson: A Goodson-processor based multi-core full-system simulator Journal of Software, 2007,18(4): /1047htm Abstract: As the Chip MultiProcessors (CMPs) have become the trend of high performance microprocessors, the target workloads become more and more diversified The traditional user-level simulators cannot handle them, so new simulators are needed for the future architecture research Based on the SimOS full-system environment, a new multi-core full-system simulator of Goodson processors, SimOS-Goodson, has been designed and implemented The SimOS-Goodson decouples the simulation functionality and timing It adopts a new value-prediction approach to implement memory consistency in the simulation environment The credibility and accuracy of SimOS-Goodson are achieved by cross-validating the simulator with the actual hardware The simulator inherits the benefits such as high speed and high flexibility from the traditional user-level simulators It also has the new benefits such as accuracy, full-system support and easy to use By porting the entire Linux OS, analysis and evaluation of the microarchitecture and workloads can be conducted easily in the SimOS-Goodson full-system environment On a machine of Pentium4 30GHz, the speed of SimOS-Goodson exceeds 300K instructions per second SimOS-Goodson will play a key role in the research of future Goodson multi-core architecture Key words: simulator; Goodson-2 processor; full-system; multi-core; SimOS : Supported by the National Natural Foundation of China for Distinguished Young Scholars under Grant No ( ); the National High-Tech Research and Development Plan of China under Grant Nos2005AA110010, 2005AA ( (863)); the National Grand Fundamental Research 973 Program of China under Grant No2005CB ( (973)); the Basic Research Foundation of the Institute of Computing Technology, the Chinese Academy of Sciences under Grant No ( ); the Knowledge Innovation Program of the Institute of Computing Technology, the Chinese Academy of Sciences under Grant No ( ) Received ; Accepted

2 1048 Journal of Software Vol18, No4, April 2007 SimOS, CPU SimOS-Goodson SimOS-Goodson,SimOS-Goodson, Linux, SimOS-Goodson 30GHz Pentium4,SimOS-Goodson 300K/ SimOS-Goodson CPU : ; 2 ; ;SimOS : TP302 : A, 3 SimpleScalar [1], :, Web,, ;,,Intel,AMD, IBM,CMP(chip multiprocessor), 3, [2],,,, Stanford SimOS [4], CPU [5] SimOS-Goodson :, SimOS-Goodson 300K/ ; SimOS-Goodson 2 CPU, 1 2 SimOS-Goodson 3 SimOS-Goodson 4 SimOS-Goodson 5 SimOS-Goodson 6 1 [3] 11 2 CPU 2 4,,

3 : CPU SimOS-Goodson , ICT-Goodson,,, ICT-Goodson Sim-Goodson ICT-Goodson 2 (execution-driven) SimpleScalar 2 CPU SPEC CPU2000,Sim-Goodson,,Sim-Goodson 12 SimOS SimOS Stanford DASH/FLASH SimOS MIPS R4000 SimOS : ; IRIX ( ) 2 SimOS-Goodson SimOS-Goodson,SimOS-Goodson, SimOS, 2 SimOS SimOS-Goodson 1 Linux/MIPS, Application: SPEC CPU2000, Splash2, Linux-Mips operating system SimOS interface/device driver Console Interrupt-Controller, Timer NIC DISK Memory controller CPU interface/interconnect SimOS Goodson CPU core Goodson CPU core Fig1 Diagram of SimOS-Goodson 1 SimOS-Goodson SimOS 2 3 : 2 21 SimOS-Goodson 2 Sim-Goodson

4 1050 Journal of Software Vol18, No4, April 2007 (proxy), ; SimOS-Goodson 2 (reorder queue, ROQ),,,,, Cache : ; 22 2 (processor consistency),,,, :, [1],,,,,, 2, [6], 2,, 2 23

5 : CPU SimOS-Goodson 1051 SimOS-Goodson SimOS,, : ; : 8255, Linux/MIPS,SimOS-Goodson,Linux/MIPS ELF(executable and linkable format),, SimOS-Goodson ELF : 2 Linux ;, Linux SimOS-Goodson SCSI(small computer system interface) SimOS-Goodson Linux ( 2420) SimOS-Goodson, VxWorks SimOS-Goodson SimOS-Goodson 3 31 SimOS-Goodson 3, GDB(the GNU project debugger) (stub) (checkpoint), SimOS-Goodson SimOS-Fast,,SimOS-Fast SimOS, 2M / 2,,SimOS-Fast,SimOS-Goodson ;, SimOS-Fast, SimOS-Goodson SimOS-Goodson Share memory PC trace SimOS-Fast InterProcess communication Fig3 On-Line cross validating 3 SimOS GDB, SimOS GDB (416) Linux/MIPS GDB(53 60) SimOS-Goodson, 2 SimOS-Goodson Linux/MIPS SimOS-Goodson,,

6 1052 Journal of Software Vol18, No4, April 2007 (warm-up), 32 SimOS-Goodson Tcl, SimOS ELF dwarf,, Tcl, SimOS IRIX, Linux Linux Tcl 2 Load/Store ( cp0queue) [5],cp0queue, Load/Store, Store CACHE,Load Store cp0queue LW/SW LW/SW,Load Store ( 30%), Load Store Load, SPEC CPU 11% 7% 4 SimOS-Goodson Linux Unix 30G Pentium 4, 2GB, 4GB Red Hat Linux 73, GCC 296 Linux/MIPS SimOS-Goodson,, SPEC CPU 2000 [7] LMbench [8], SPEC CPU 2000 CPU IPC(instructons per cycle)simos-goodson CPU,,, 1 2 CPU,SPEC CPU2000 test, 10, 2 3 IPC 1 ICT-Goodson SimOS-Goodson 10%, SimOS-Goodson ; Sim-Goodson,SimOS-Goodson IPC Cache LMbench, ( ) 2 300MHZ 2 SimOS-Goodson 15%, SimOS-Goodson

7 : CPU SimOS-Goodson Table 1 Validating with SPEC CPU SPEC CPU 2000 Prog (SPECint) ICT-Goodson Sim-Goodson SimOS-Goodson IPC-Err Mcf Parser Perlbmk Crafty Twolf Vortex Vpr Gap Gcc Gzip Eon Applu Apsi Art Swim Equake Mesa Wupwise Sixtrack Table 2 Validating with LMbench 2 LMbench Application Real SimOS-Goodson Err Description Syscall null Simple syscall Syscall read Read syscall Pipe Interprocess communication through pipes Sig catch Signal handler Unix Unix stream socket Fifo Pipe operation Ctx Context switching, IPS(instructions per second) CPS(cycles per second) 4 3 Linux SPEC CPU 2000 SPLASH2 [9] Apache Benchmark,,SPLASH2 3 SimOS-Goodson Table 3 Simlating speed of SimOS-Goodson 3 SimOS-Goodson Core count Workload Cycles (K) Instructions (K) Time (s) CPS (K) IPS (K) 1 Linux OS boot Spec2k (MESA) Splash2 (radix) Apache benchmark Linux OS boot Splash2 (radix) Spec2k (MESA+MESA) Apache benchmark Linux OS boot Splash2 (radix) Apache benchmark

8 1054 Journal of Software Vol18, No4, April 2007 Linux 506K CPS IPC IPC,,SimOS-Goodson 300K IPS, 405K IPS 4 Linux Linux cpu_idle Cache 1, IPC 2 CPU,ICT-Goodson 50K / ;Sim-Goodson 500K / ;SimOS-Goodson 300K / 400K / ICT-Goodson,,, ;Sim-Goodson, ;SimOS-Goodson, Sim-Goodson, 5 DEC IBM SimOS Alpha PowerPC SimOS-Alpha SimOS-PPC SimOS-Goodson SimOS,, ;, GEMS(general execution model simulator) [10] Wisconsin, Simics [11] Simics SimOS-Goodson, GEMS Simics, [2] 120K IPS,GEMS (flat), SimOS-Goodson,,SimOS-Goodson 300 K IPS,SimOS-Goodson, 6 SimOS SimOS-Goodson,, 3 SPEC CPU 2000 LMbench, SimOS-Goodson 15% SimOS-Goodson, 2 CPU Linux,SimOS-Goodson 30%, 300K / SimOS-Goodson, SimOS-Goodson Cache,

9 : CPU SimOS-Goodson 1055 SimOS-Goodson, References: [1] Burger DC, Austin TM The simplescalar tool set, version 20 Technical Report, CS-TR , Madison: University of Wisconsin, 1997 [2] Mauer CJ, Hill MD, Wood DA Full system timing-first simulation In: Proc of the 2002 ACM Sigmetrics Conf on Measurement and Modeling of Computer Systems ACM Press, [3] Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J, Heinrich M FLASH vs (simulated) FLASH: Closing the simulation loop In: Proc of the 9th Int l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) IEEE Computer Society, [4] Rosenblum M, Herrod SA, Witchel E, Gupta A Complete computer system simulation: The SimOS approach IEEE Parallel and Distributed Technology: Systems and Applications, 1995,3(4):34 43 [5] Hu WW, Zhang FX, Li ZS Microarchitecture of the Goodson-2 processor Journal of Computer Science and Technology, 2005, 20(2): [6] Martin MMK, Sorin DJ, Cain HW, Hill MD, Lipasti MH Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing In: Proc of the 34th Int l Symp on Microarchitecture IEEE Computer Society, [7] Henning J SPEC CPU2000: Measuring CPU performance in the new millennium IEEE Computer, 2000,33(7):28 35 [8] McVoy L, Staelin C LMbench: Portable tools for performance analysis In: USENIX Annual Technical Conf San Diego, [9] Woo SC, Ohara M, Torrie E, Singh JP, Gupta A The SPLASH-2 programs: Characterization and methodological considerations In: Proc of the 22nd Annual Int l Symp on Computer Architecture IEEE Computer Society, [10] Martin MM, Sorin DJ, Beckmann BM, Marty MR, Xu M, Alameldeen AR, Moore KE, Hill MD, Wood DA Multifacet s general execution-driven multiprocessor simulator (GEMS) toolset, submitted to computer architecture news (CAN) wiscedu/gems [11] Magnusson PS, Cbristensson M, Eskilson J, Forsgren D, Hällberg G, Högberg J, Larsson F, Moestedt A, Werner B Simics: A full system simulation platform IEEE Computer, 2002,35(2):50 58 (1982 ),, (1974 ),, (1976 ),,,,CCF, (1968 ),,,VLSI (1981 ),,, (1966 ),,,CCF,

2. Futile Stall HTM HTM HTM. Transactional Memory: TM [1] TM. HTM exponential backoff. magic waiting HTM. futile stall. Hardware Transactional Memory:

2. Futile Stall HTM HTM HTM. Transactional Memory: TM [1] TM. HTM exponential backoff. magic waiting HTM. futile stall. Hardware Transactional Memory: 1 1 1 1 1,a) 1 HTM 2 2 LogTM 72.2% 28.4% 1. Transactional Memory: TM [1] TM Hardware Transactional Memory: 1 Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan a) tsumura@nitech.ac.jp HTM HTM

More information

Low-Complexity Reorder Buffer Architecture*

Low-Complexity Reorder Buffer Architecture* Low-Complexity Reorder Buffer Architecture* Gurhan Kucuk, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower

More information

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Impact of Cache Coherence Protocols on the Processing of Network Traffic Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background

More information

Full-System Timing-First Simulation

Full-System Timing-First Simulation Full-System Timing-First Simulation Carl J. Mauer Mark D. Hill and David A. Wood Computer Sciences Department University of Wisconsin Madison The Problem Design of future computer systems uses simulation

More information

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor

Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Jong Wook Kwak 1, Seong Tae Jhang 2, and Chu Shik Jhon 1 1 Department of Electrical Engineering and Computer Science, Seoul

More information

Exploiting Streams in Instruction and Data Address Trace Compression

Exploiting Streams in Instruction and Data Address Trace Compression Exploiting Streams in Instruction and Data Address Trace Compression Aleksandar Milenkovi, Milena Milenkovi Electrical and Computer Engineering Dept., The University of Alabama in Huntsville Email: {milenka

More information

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More

More information

MEMORY ORDERING: A VALUE-BASED APPROACH

MEMORY ORDERING: A VALUE-BASED APPROACH MEMORY ORDERING: A VALUE-BASED APPROACH VALUE-BASED REPLAY ELIMINATES THE NEED FOR CONTENT-ADDRESSABLE MEMORIES IN THE LOAD QUEUE, REMOVING ONE BARRIER TO SCALABLE OUT- OF-ORDER INSTRUCTION WINDOWS. INSTEAD,

More information

Simulating Server Consolidation

Simulating Server Consolidation 421 A Coruña, 16-18 de septiembre de 2009 Simulating Server Consolidation Antonio García-Guirado, Ricardo Fernández-Pascual, José M. García 1 Abstract Recently, virtualization has become a hot topic in

More information

A Shared-Variable-Based Synchronization Approach to Efficient Cache Coherence Simulation for Multi-Core Systems

A Shared-Variable-Based Synchronization Approach to Efficient Cache Coherence Simulation for Multi-Core Systems A Shared-Variable-Based Synchronization Approach to Efficient Cache Coherence Simulation for Multi-Core Systems Cheng-Yang Fu, Meng-Huan Wu, and Ren-Song Tsay Department of Computer Science National Tsing

More information

Decoupled Zero-Compressed Memory

Decoupled Zero-Compressed Memory Decoupled Zero-Compressed Julien Dusser julien.dusser@inria.fr André Seznec andre.seznec@inria.fr Centre de recherche INRIA Rennes Bretagne Atlantique Campus de Beaulieu, 3542 Rennes Cedex, France Abstract

More information

A SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC

A SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC A SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC Wendong Wang, Tony Tuo, Kevin Lo, Dongchen Ren, Gary Hau, Jun zhang, Dong Huang {wendong.wang,

More information

Cache Optimization by Fully-Replacement Policy

Cache Optimization by Fully-Replacement Policy American Journal of Embedded Systems and Applications 2016; 4(1): 7-14 http://www.sciencepublishinggroup.com/j/ajesa doi: 10.11648/j.ajesa.20160401.12 ISSN: 2376-6069 (Print); ISSN: 2376-6085 (Online)

More information

Configurable Virtual Platform Environment Using SID Simulator and Eclipse*

Configurable Virtual Platform Environment Using SID Simulator and Eclipse* Configurable Virtual Platform Environment Using SID Simulator and Eclipse* Hadipurnawan Satria, Baatarbileg Altangerel, Jin Baek Kwon, and Jeongbae Lee Department of Computer Science, Sun Moon University

More information

ECE404 Term Project Sentinel Thread

ECE404 Term Project Sentinel Thread ECE404 Term Project Sentinel Thread Alok Garg Department of Electrical and Computer Engineering, University of Rochester 1 Introduction Performance degrading events like branch mispredictions and cache

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

Speculative Multithreaded Processors

Speculative Multithreaded Processors Guri Sohi and Amir Roth Computer Sciences Department University of Wisconsin-Madison utline Trends and their implications Workloads for future processors Program parallelization and speculative threads

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 10: Runahead and MLP Prof. Onur Mutlu Carnegie Mellon University Last Time Issues in Out-of-order execution Buffer decoupling Register alias tables Physical

More information

Frequent Value Compression in Packet-based NoC Architectures

Frequent Value Compression in Packet-based NoC Architectures Frequent Value Compression in Packet-based NoC Architectures Ping Zhou, BoZhao, YuDu, YiXu, Youtao Zhang, Jun Yang, Li Zhao ECE Department CS Department University of Pittsburgh University of Pittsburgh

More information

Leveraging Bloom Filters for Smart Search Within NUCA Caches

Leveraging Bloom Filters for Smart Search Within NUCA Caches Leveraging Bloom Filters for Smart Search Within NUCA Caches Robert Ricci, Steve Barrus, Dan Gebhardt, and Rajeev Balasubramonian School of Computing, University of Utah {ricci,sbarrus,gebhardt,rajeev}@cs.utah.edu

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Chip-Multithreading Systems Need A New Operating Systems Scheduler

Chip-Multithreading Systems Need A New Operating Systems Scheduler Chip-Multithreading Systems Need A New Operating Systems Scheduler Alexandra Fedorova Christopher Small Daniel Nussbaum Margo Seltzer Harvard University, Sun Microsystems Sun Microsystems Sun Microsystems

More information

Execution-based Prediction Using Speculative Slices

Execution-based Prediction Using Speculative Slices Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers

More information

CSE 502 Graduate Computer Architecture. Lec 11 Simultaneous Multithreading

CSE 502 Graduate Computer Architecture. Lec 11 Simultaneous Multithreading CSE 502 Graduate Computer Architecture Lec 11 Simultaneous Multithreading Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,

More information

HIVE: Fault Containment for Shared-Memory Multiprocessors J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta

HIVE: Fault Containment for Shared-Memory Multiprocessors J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta HIVE: Fault Containment for Shared-Memory Multiprocessors J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta CSE 598C Presented by: Sandra Rueda The Problem O.S. for managing FLASH architecture

More information

Data Access History Cache and Associated Data Prefetching Mechanisms

Data Access History Cache and Associated Data Prefetching Mechanisms Data Access History Cache and Associated Data Prefetching Mechanisms Yong Chen 1 chenyon1@iit.edu Surendra Byna 1 sbyna@iit.edu Xian-He Sun 1, 2 sun@iit.edu 1 Department of Computer Science, Illinois Institute

More information

Software-assisted Cache Mechanisms for Embedded Systems. Prabhat Jain

Software-assisted Cache Mechanisms for Embedded Systems. Prabhat Jain Software-assisted Cache Mechanisms for Embedded Systems by Prabhat Jain Bachelor of Engineering in Computer Engineering Devi Ahilya University, 1986 Master of Technology in Computer and Information Technology

More information

Power-Efficient Spilling Techniques for Chip Multiprocessors

Power-Efficient Spilling Techniques for Chip Multiprocessors Power-Efficient Spilling Techniques for Chip Multiprocessors Enric Herrero 1,JoséGonzález 2, and Ramon Canal 1 1 Dept. d Arquitectura de Computadors Universitat Politècnica de Catalunya {eherrero,rcanal}@ac.upc.edu

More information

Computer System. Performance

Computer System. Performance Computer System Performance Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

SIMFLEX: A Fast, Accurate, Flexible Full-System Simulation Framework for Performance Evaluation of Server Architecture

SIMFLEX: A Fast, Accurate, Flexible Full-System Simulation Framework for Performance Evaluation of Server Architecture ACM SIGMETRICS Performance Evaluation Review (PER) Special Issue on Tools for Computer Architecture Research, Volume 31, Number 4, pages 31-35, March 2004 SIMFLEX: A Fast, Accurate, Flexible Full-System

More information

Improved Estimation for Software Multiplexing of Performance Counters

Improved Estimation for Software Multiplexing of Performance Counters Improved Estimation for Software Multiplexing of Performance Counters Wiplove Mathur Texas Instruments, Inc. San Diego, CA 92121 wiplove@ti.com Jeanine Cook New Mexico State University Las Cruces, NM 88003

More information

DOM: A Data-dependency-Oriented Modeling Approach for Efficient Simulation of OS Preemptive Scheduling

DOM: A Data-dependency-Oriented Modeling Approach for Efficient Simulation of OS Preemptive Scheduling DOM: A Data-dependency-Oriented Modeling Approach for Efficient Simulation of OS Preempte Scheduling Peng-Chih Wang, Meng-Huan Wu, and Ren-Song Tsay Department of Computer Science National Tsing Hua Unersity

More information

Workloads, Scalability and QoS Considerations in CMP Platforms

Workloads, Scalability and QoS Considerations in CMP Platforms Workloads, Scalability and QoS Considerations in CMP Platforms Presenter Don Newell Sr. Principal Engineer Intel Corporation 2007 Intel Corporation Agenda Trends and research context Evolving Workload

More information

Performance Validation of Network-Intensive Workloads on a Full-System Simulator

Performance Validation of Network-Intensive Workloads on a Full-System Simulator Performance Validation of Network-Intensive Workloads on a Full-System Ali G. Saidi Nathan L. Binkert Lisa R. Hsu Steven K. Reinhardt {saidi,binkertn,hsul,stever}@umich.edu Advanced Computer Architecture

More information

Improving Data Cache Performance via Address Correlation: An Upper Bound Study

Improving Data Cache Performance via Address Correlation: An Upper Bound Study Improving Data Cache Performance via Address Correlation: An Upper Bound Study Peng-fei Chuang 1, Resit Sendag 2, and David J. Lilja 1 1 Department of Electrical and Computer Engineering Minnesota Supercomputing

More information

An Analysis of the Amount of Global Level Redundant Computation in the SPEC 95 and SPEC 2000 Benchmarks

An Analysis of the Amount of Global Level Redundant Computation in the SPEC 95 and SPEC 2000 Benchmarks An Analysis of the Amount of Global Level Redundant Computation in the SPEC 95 and SPEC 2000 s Joshua J. Yi and David J. Lilja Department of Electrical and Computer Engineering Minnesota Supercomputing

More information

Cycle accurate transaction-driven simulation with multiple processor simulators

Cycle accurate transaction-driven simulation with multiple processor simulators Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea

More information

CATS: Cycle Accurate Transaction-driven Simulation with Multiple Processor Simulators

CATS: Cycle Accurate Transaction-driven Simulation with Multiple Processor Simulators CATS: Cycle Accurate Transaction-driven Simulation with Multiple Processor Simulators Dohyung Kim Soonhoi Ha Rajesh Gupta Department of Computer Science and Engineering School of Computer Science and Engineering

More information

COMPUTER SYSTEM EVALUATION WITH COMMERCIAL WORKLOADS

COMPUTER SYSTEM EVALUATION WITH COMMERCIAL WORKLOADS COMPUTER SYSTEM EVALUATION WITH COMMERCIAL WORKLOADS JIM NILSSON, FREDRIK DAHLGREN, MAGNUS KARLSSON, PETER MAGNUSSON y, and PER STENSTRÖM fj,dahlgren,karlsson,persg@ce.chalmers.se Department of Computer

More information

Multifacet s General Execution-driven Multiprocessor Simulator (GEMS) Toolset

Multifacet s General Execution-driven Multiprocessor Simulator (GEMS) Toolset Multifacet s General Execution-driven Multiprocessor Simulator (GEMS) Toolset Milo M. K. Martin 1, Daniel J. Sorin 2, Bradford M. Beckmann 3, Michael R. Marty 3, Min Xu 4, Alaa R. Alameldeen 3, Kevin E.

More information

Using a Serial Cache for. Energy Efficient Instruction Fetching

Using a Serial Cache for. Energy Efficient Instruction Fetching Using a Serial Cache for Energy Efficient Instruction Fetching Glenn Reinman y Brad Calder z y Computer Science Department, University of California, Los Angeles z Department of Computer Science and Engineering,

More information

DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )

DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , ) ISSN 1000-9825, CODEN RUXUEW E-mail jos@iscasaccn Journal of Software, Vol17, No5, May 2006, pp1195 1203 http//wwwjosorgcn DOI 101360/jos171195 Tel/Fax +86-10-62562563 2006 by Journal of Software All rights

More information

Integrated CPU and Cache Power Management in Multiple Clock Domain Processors

Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC

More information

Reducing Latencies of Pipelined Cache Accesses Through Set Prediction

Reducing Latencies of Pipelined Cache Accesses Through Set Prediction Reducing Latencies of Pipelined Cache Accesses Through Set Prediction Aneesh Aggarwal Electrical and Computer Engineering Binghamton University Binghamton, NY 1392 aneesh@binghamton.edu Abstract With the

More information

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of

More information

Visualizing Page Replacement Techniques based on Page Frequency and Memory Access Pattern

Visualizing Page Replacement Techniques based on Page Frequency and Memory Access Pattern Visualizing Page Replacement Techniques based on Page Frequency and Memory Access Pattern Ruchin Gupta, Narendra Teotia Information Technology, Ajay Kumar Garg Engineering College Ghaziabad, India Abstract

More information

Dataflow Dominance: A Definition and Characterization

Dataflow Dominance: A Definition and Characterization Dataflow Dominance: A Definition and Characterization Matt Ramsay University of Wisconsin-Madison Department of Electrical and Computer Engineering 1415 Engineering Drive Madison, WI 53706 Submitted in

More information

DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )

DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , ) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscasaccn Journal of Software, Vol17, No2, February 2006, pp315 324 http://wwwjosorgcn DOI: 101360/jos170315 Tel/Fax: +86-10-62562563 2006 by Journal of Software

More information

High Performance Memory Requests Scheduling Technique for Multicore Processors

High Performance Memory Requests Scheduling Technique for Multicore Processors High Performance Memory Requests Scheduling Technique for Multicore Processors Walid El-Reedy Electronics and Comm. Engineering Cairo University, Cairo, Egypt walid.elreedy@gmail.com Ali A. El-Moursy Electrical

More information

Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring

Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Lei Jin Sangyeun Cho Department of Computer Science University of Pittsburgh jinlei,cho@cs.pitt.edu Abstract Private

More information

Instruction Based Memory Distance Analysis and its Application to Optimization

Instruction Based Memory Distance Analysis and its Application to Optimization Instruction Based Memory Distance Analysis and its Application to Optimization Changpeng Fang cfang@mtu.edu Steve Carr carr@mtu.edu Soner Önder soner@mtu.edu Department of Computer Science Michigan Technological

More information

Bank-aware Dynamic Cache Partitioning for Multicore Architectures

Bank-aware Dynamic Cache Partitioning for Multicore Architectures Bank-aware Dynamic Cache Partitioning for Multicore Architectures Dimitris Kaseridis, Jeffrey Stuecheli and Lizy K. John Department of Electrical and Computer Engineering, The University of Texas at Austin,

More information

Exploring Wakeup-Free Instruction Scheduling

Exploring Wakeup-Free Instruction Scheduling Exploring Wakeup-Free Instruction Scheduling Jie S. Hu, N. Vijaykrishnan, and Mary Jane Irwin Microsystems Design Lab The Pennsylvania State University Outline Motivation Case study: Cyclone Towards high-performance

More information

Scalable Directory Organization for Tiled CMP Architectures

Scalable Directory Organization for Tiled CMP Architectures Scalable Directory Organization for Tiled CMP Architectures Alberto Ros, Manuel E. Acacio, José M. García Departamento de Ingeniería y Tecnología de Computadores Universidad de Murcia {a.ros,meacacio,jmgarcia}@ditec.um.es

More information

Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors. Moinuddin K. Qureshi Onur Mutlu Yale N.

Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors. Moinuddin K. Qureshi Onur Mutlu Yale N. Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors Moinuddin K. Qureshi Onur Mutlu Yale N. Patt High Performance Systems Group Department of Electrical

More information

Shengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota

Shengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Loop Selection for Thread-Level Speculation, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Chip Multiprocessors (CMPs)

More information

Reducing Reorder Buffer Complexity Through Selective Operand Caching

Reducing Reorder Buffer Complexity Through Selective Operand Caching Appears in the Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2003 Reducing Reorder Buffer Complexity Through Selective Operand Caching Gurhan Kucuk Dmitry Ponomarev

More information

Official Agenda Review the most commonly used tools in CMP µarch research. Unofficial Agenda Convince you to use Simics

Official Agenda Review the most commonly used tools in CMP µarch research. Unofficial Agenda Convince you to use Simics Simics and Friends Modeling Tools for CMP Research Zvika Guz, Isask har (Zigi Zigi) Walter The Technion Israel Institute of Technology Official Agenda Agenda Review the most commonly used tools in CMP

More information

Keywords Cache memory, replacement policy, performance evaluation.

Keywords Cache memory, replacement policy, performance evaluation. Performance Evaluation of Cache Replacement Policies for the SPEC CPU Benchmark Suite Hussein Al-Zoubi, Aleksandar Milenkovic, Milena Milenkovic Department of Electrical and Computer Engineering The University

More information

Cache Insertion Policies to Reduce Bus Traffic and Cache Conflicts

Cache Insertion Policies to Reduce Bus Traffic and Cache Conflicts Cache Insertion Policies to Reduce Bus Traffic and Cache Conflicts Yoav Etsion Dror G. Feitelson School of Computer Science and Engineering The Hebrew University of Jerusalem 14 Jerusalem, Israel Abstract

More information

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Wann-Yun Shieh Department of Computer Science and Information Engineering Chang Gung University Tao-Yuan, Taiwan Hsin-Dar Chen

More information

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

More information

Wisconsin Computer Architecture. Nam Sung Kim

Wisconsin Computer Architecture. Nam Sung Kim Wisconsin Computer Architecture Mark Hill Nam Sung Kim Mikko Lipasti Karu Sankaralingam Guri Sohi David Wood Technology & Moore s Law 35nm Transistor 1947 Moore s Law 1964: Integrated Circuit 1958 Transistor

More information

Instruction Recirculation: Eliminating Counting Logic in Wakeup-Free Schedulers

Instruction Recirculation: Eliminating Counting Logic in Wakeup-Free Schedulers Instruction Recirculation: Eliminating Counting Logic in Wakeup-Free Schedulers Joseph J. Sharkey, Dmitry V. Ponomarev Department of Computer Science State University of New York Binghamton, NY 13902 USA

More information

Porting GCC to the AMD64 architecture p.1/20

Porting GCC to the AMD64 architecture p.1/20 Porting GCC to the AMD64 architecture Jan Hubička hubicka@suse.cz SuSE ČR s. r. o. Porting GCC to the AMD64 architecture p.1/20 AMD64 architecture New architecture used by 8th generation CPUs by AMD AMD

More information

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E.

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E. MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR BY AMOL SHAMKANT PANDIT, B.E. A thesis submitted to the Graduate School in partial fulfillment of the requirements

More information

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Mitesh R. Meswani and Patricia J. Teller Department of Computer Science, University

More information

That all computers can simulate each other

That all computers can simulate each other COVER FEATURE Simics: A Full System Simulation Platform A full system simulator attempts to strike a balance between accuracy and performance by modeling the complete final application and providing a

More information

Linux/SimOS - A Simulation Environment for Evaluating High-speed Communication Systems 1

Linux/SimOS - A Simulation Environment for Evaluating High-speed Communication Systems 1 Linux/SimOS - A Simulation Environment for Evaluating High-speed Communication Systems 1 Chulho Won, Ben Lee, Chansu Yu *, Sangman Moh, Yong-Youn Kim, Kyoung Park Electrical and Computer Engineering Department

More information

Reducing Overhead for Soft Error Coverage in High Availability Systems

Reducing Overhead for Soft Error Coverage in High Availability Systems Reducing Overhead for Soft Error Coverage in High Availability Systems Nidhi Aggarwal, Norman P. Jouppi, James E. Smith Parthasarathy Ranganathan and Kewal K. Saluja Abstract High reliability/availability

More information

Boost Sequential Program Performance Using A Virtual Large. Instruction Window on Chip Multicore Processor

Boost Sequential Program Performance Using A Virtual Large. Instruction Window on Chip Multicore Processor Boost Sequential Program Performance Using A Virtual Large Instruction Window on Chip Multicore Processor Liqiang He Inner Mongolia University Huhhot, Inner Mongolia 010021 P.R.China liqiang@imu.edu.cn

More information

CMP Memory Modeling: How Much Does Accuracy Matter?

CMP Memory Modeling: How Much Does Accuracy Matter? CMP Modeling: How Much Does Accuracy Matter? Sadagopan Srinivasan Li Zhao Brinda Ganesh Bruce Jacob* Mike Espig Ravi Iyer Systems Technology Lab, Intel Corporation * University of Maryland, College Park

More information

Computer Science 246. Computer Architecture

Computer Science 246. Computer Architecture Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Performance Metrics Averaging Amdahl s Law Benchmarks The CPU Performance Equation Optimal

More information

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Computer Systems Laboratory Stanford University Motivation Dynamic analysis help

More information

Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX

Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Keerthi Bhushan Rajesh K Chaurasia Hewlett-Packard India Software Operations 29, Cunningham Road Bangalore 560 052 India +91-80-2251554

More information

On the Predictability of Program Behavior Using Different Input Data Sets

On the Predictability of Program Behavior Using Different Input Data Sets On the Predictability of Program Behavior Using Different Input Data Sets Wei Chung Hsu, Howard Chen, Pen Chung Yew Department of Computer Science University of Minnesota Abstract Smaller input data sets

More information

Characterizing the d-tlb Behavior of SPEC CPU2000 Benchmarks

Characterizing the d-tlb Behavior of SPEC CPU2000 Benchmarks Characterizing the d-tlb Behavior of SPEC CPU2 Benchmarks Gokul B. Kandiraju Anand Sivasubramaniam Dept. of Computer Science and Engineering The Pennsylvania State University University Park, PA 682. fkandiraj,anandg@cse.psu.edu

More information

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved C5 Micro-Kernel: Real-Time Services for Embedded and Linux Systems Copyright 2003- Jaluna SA. All rights reserved. JL/TR-03-31.0.1 1 Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview

More information

Parallel Computing 38 (2012) Contents lists available at SciVerse ScienceDirect. Parallel Computing

Parallel Computing 38 (2012) Contents lists available at SciVerse ScienceDirect. Parallel Computing Parallel Computing 38 (2012) 533 551 Contents lists available at SciVerse ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco Algorithm-level Feedback-controlled Adaptive data

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Bijay K.Paikaray Debabala Swain Dept. of CSE, CUTM Dept. of CSE, CUTM Bhubaneswer, India Bhubaneswer, India

More information

Precise Instruction Scheduling

Precise Instruction Scheduling Journal of Instruction-Level Parallelism 7 (2005) 1-29 Submitted 10/2004; published 04/2005 Precise Instruction Scheduling Gokhan Memik Department of Electrical and Computer Engineering Northwestern University

More information

SimOS PPC. Full System Simulation of PowerPC Architecture. Tom Keller Austin Research Lab IBM Research Division

SimOS PPC. Full System Simulation of PowerPC Architecture. Tom Keller Austin Research Lab IBM Research Division SimOS PPC Full System Simulation of PowerPC Architecture Tom Keller IBM Research Division tkeller@us.ibm.com This Talk:(Inside IBM) (Outside IBM) http://simos.austin.ibm.com http://www.ece.utexas.edu/projects/ece/lca/events

More information

Non-Uniform Instruction Scheduling

Non-Uniform Instruction Scheduling Non-Uniform Instruction Scheduling Joseph J. Sharkey, Dmitry V. Ponomarev Department of Computer Science State University of New York Binghamton, NY 13902 USA {jsharke, dima}@cs.binghamton.edu Abstract.

More information

Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors

Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors Dan Wallin and Erik Hagersten Uppsala University Department of Information Technology P.O. Box 337, SE-751 05 Uppsala, Sweden

More information

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB

Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Wann-Yun Shieh * and Hsin-Dar Chen Department of Computer Science and Information Engineering Chang Gung University, Taiwan

More information

Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation

Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation Yue Luo, Ajay Joshi, Aashish Phansalkar, Lizy John, and Joydeep Ghosh Department of Electrical and Computer Engineering University

More information

Performance Prediction using Program Similarity

Performance Prediction using Program Similarity Performance Prediction using Program Similarity Aashish Phansalkar Lizy K. John {aashish, ljohn}@ece.utexas.edu University of Texas at Austin Abstract - Modern computer applications are developed at a

More information

Energy Efficient Asymmetrically Ported Register Files

Energy Efficient Asymmetrically Ported Register Files Energy Efficient Asymmetrically Ported Register Files Aneesh Aggarwal ECE Department University of Maryland College Park, MD 20742 aneesh@eng.umd.edu Manoj Franklin ECE Department and UMIACS University

More information

Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors

Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors Ilya Ganusov and Martin Burtscher Computer Systems Laboratory Cornell University {ilya, burtscher}@csl.cornell.edu Abstract This

More information

Speculative Multithreaded Processors

Speculative Multithreaded Processors Guri Sohi and Amir Roth Computer Sciences Department University of Wisconsin-Madison utline Trends and their implications Workloads for future processors Program parallelization and speculative threads

More information

Pipeline Muffling and A Priori Current Ramping: Architectural Techniques to Reduce High-Frequency Inductive Noise

Pipeline Muffling and A Priori Current Ramping: Architectural Techniques to Reduce High-Frequency Inductive Noise Pipeline Muffling and A Priori Current Ramping: Architectural Techniques to Reduce High-Frequency Inductive Noise ABSTRACT While circuit and package designers have addressed microprocessor inductive noise

More information

Building and Using the ATLAS Transactional Memory System

Building and Using the ATLAS Transactional Memory System Building and Using the ATLAS Transactional Memory System Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford

More information

Post-Silicon Verification for Cache Coherence

Post-Silicon Verification for Cache Coherence Post-Silicon Verification for Cache Coherence Andrew DeOrio, Adam Bauserman and Valeria Bertacco Dept. of Electrical Engineering and Computer Science University of Michigan, Ann Arbor {awdeorio, adambb,

More information

Improving Instruction Delivery with a Block-Aware ISA

Improving Instruction Delivery with a Block-Aware ISA 530 Improving Instruction Delivery with a Block-Aware ISA Ahmad Zmily, Earl Killian, and Christos Kozyrakis Electrical Engineering Department Stanford University {zmily,killian,kozyraki}@stanford.edu Abstract.

More information

Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1

Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1 I Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL Inserting Prefetches IA-32 Execution Layer - 1 Agenda IA-32EL Brief Overview Prefetching in Loops IA-32EL Prefetching in

More information

Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs

Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs Alberto Ros 1 and Manuel E. Acacio 2 1 Dpto. de Informática de Sistemas y Computadores Universidad Politécnica de Valencia,

More information

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Sarah Bird ϕ, Aashish Phansalkar ϕ, Lizy K. John ϕ, Alex Mericas α and Rajeev Indukuru α ϕ University

More information

Microarchitectural Design Space Exploration Using An Architecture-Centric Approach

Microarchitectural Design Space Exploration Using An Architecture-Centric Approach Microarchitectural Design Space Exploration Using An Architecture-Centric Approach Christophe Dubach, Timothy M. Jones and Michael F.P. O Boyle Member of HiPEAC School of Informatics University of Edinburgh

More information