DOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )
|
|
- Logan Morgan
- 5 years ago
- Views:
Transcription
1 ISSN , CODEN RUXUEW Journal of Software, Vol18, No4, April 2007, pp DOI: /jos Tel/Fax: by Journal of Software All rights reserved CPU SimOS-Goodson 1,2+, 1, 1, 1, 1 1, 1 (, ) 2 (, ) SimOS-Goodson: A Goodson-Processor Based Multi-Core Full-System Simulator GAO Xiang 1,2+, ZHANG Fu-Xin 1, TANG Yan 1, ZHANG Long-Bing 1, HU Wei-Wu 1, TANG Zhi-Min 1 1 (Institute of Computing Technology, The Chinese Academy of Sciences, Beijing , China) 2 (Department of Computer Science and Technology, University of Science and Technology of China, Hefei , China) + Corresponding author: Phn: ext 9320, gaoxiang@ictaccn Gao X, Zhang FX, Tang Y, Zhang LB, Hu WW, Tang ZM SimOS-Goodson: A Goodson-processor based multi-core full-system simulator Journal of Software, 2007,18(4): /1047htm Abstract: As the Chip MultiProcessors (CMPs) have become the trend of high performance microprocessors, the target workloads become more and more diversified The traditional user-level simulators cannot handle them, so new simulators are needed for the future architecture research Based on the SimOS full-system environment, a new multi-core full-system simulator of Goodson processors, SimOS-Goodson, has been designed and implemented The SimOS-Goodson decouples the simulation functionality and timing It adopts a new value-prediction approach to implement memory consistency in the simulation environment The credibility and accuracy of SimOS-Goodson are achieved by cross-validating the simulator with the actual hardware The simulator inherits the benefits such as high speed and high flexibility from the traditional user-level simulators It also has the new benefits such as accuracy, full-system support and easy to use By porting the entire Linux OS, analysis and evaluation of the microarchitecture and workloads can be conducted easily in the SimOS-Goodson full-system environment On a machine of Pentium4 30GHz, the speed of SimOS-Goodson exceeds 300K instructions per second SimOS-Goodson will play a key role in the research of future Goodson multi-core architecture Key words: simulator; Goodson-2 processor; full-system; multi-core; SimOS : Supported by the National Natural Foundation of China for Distinguished Young Scholars under Grant No ( ); the National High-Tech Research and Development Plan of China under Grant Nos2005AA110010, 2005AA ( (863)); the National Grand Fundamental Research 973 Program of China under Grant No2005CB ( (973)); the Basic Research Foundation of the Institute of Computing Technology, the Chinese Academy of Sciences under Grant No ( ); the Knowledge Innovation Program of the Institute of Computing Technology, the Chinese Academy of Sciences under Grant No ( ) Received ; Accepted
2 1048 Journal of Software Vol18, No4, April 2007 SimOS, CPU SimOS-Goodson SimOS-Goodson,SimOS-Goodson, Linux, SimOS-Goodson 30GHz Pentium4,SimOS-Goodson 300K/ SimOS-Goodson CPU : ; 2 ; ;SimOS : TP302 : A, 3 SimpleScalar [1], :, Web,, ;,,Intel,AMD, IBM,CMP(chip multiprocessor), 3, [2],,,, Stanford SimOS [4], CPU [5] SimOS-Goodson :, SimOS-Goodson 300K/ ; SimOS-Goodson 2 CPU, 1 2 SimOS-Goodson 3 SimOS-Goodson 4 SimOS-Goodson 5 SimOS-Goodson 6 1 [3] 11 2 CPU 2 4,,
3 : CPU SimOS-Goodson , ICT-Goodson,,, ICT-Goodson Sim-Goodson ICT-Goodson 2 (execution-driven) SimpleScalar 2 CPU SPEC CPU2000,Sim-Goodson,,Sim-Goodson 12 SimOS SimOS Stanford DASH/FLASH SimOS MIPS R4000 SimOS : ; IRIX ( ) 2 SimOS-Goodson SimOS-Goodson,SimOS-Goodson, SimOS, 2 SimOS SimOS-Goodson 1 Linux/MIPS, Application: SPEC CPU2000, Splash2, Linux-Mips operating system SimOS interface/device driver Console Interrupt-Controller, Timer NIC DISK Memory controller CPU interface/interconnect SimOS Goodson CPU core Goodson CPU core Fig1 Diagram of SimOS-Goodson 1 SimOS-Goodson SimOS 2 3 : 2 21 SimOS-Goodson 2 Sim-Goodson
4 1050 Journal of Software Vol18, No4, April 2007 (proxy), ; SimOS-Goodson 2 (reorder queue, ROQ),,,,, Cache : ; 22 2 (processor consistency),,,, :, [1],,,,,, 2, [6], 2,, 2 23
5 : CPU SimOS-Goodson 1051 SimOS-Goodson SimOS,, : ; : 8255, Linux/MIPS,SimOS-Goodson,Linux/MIPS ELF(executable and linkable format),, SimOS-Goodson ELF : 2 Linux ;, Linux SimOS-Goodson SCSI(small computer system interface) SimOS-Goodson Linux ( 2420) SimOS-Goodson, VxWorks SimOS-Goodson SimOS-Goodson 3 31 SimOS-Goodson 3, GDB(the GNU project debugger) (stub) (checkpoint), SimOS-Goodson SimOS-Fast,,SimOS-Fast SimOS, 2M / 2,,SimOS-Fast,SimOS-Goodson ;, SimOS-Fast, SimOS-Goodson SimOS-Goodson Share memory PC trace SimOS-Fast InterProcess communication Fig3 On-Line cross validating 3 SimOS GDB, SimOS GDB (416) Linux/MIPS GDB(53 60) SimOS-Goodson, 2 SimOS-Goodson Linux/MIPS SimOS-Goodson,,
6 1052 Journal of Software Vol18, No4, April 2007 (warm-up), 32 SimOS-Goodson Tcl, SimOS ELF dwarf,, Tcl, SimOS IRIX, Linux Linux Tcl 2 Load/Store ( cp0queue) [5],cp0queue, Load/Store, Store CACHE,Load Store cp0queue LW/SW LW/SW,Load Store ( 30%), Load Store Load, SPEC CPU 11% 7% 4 SimOS-Goodson Linux Unix 30G Pentium 4, 2GB, 4GB Red Hat Linux 73, GCC 296 Linux/MIPS SimOS-Goodson,, SPEC CPU 2000 [7] LMbench [8], SPEC CPU 2000 CPU IPC(instructons per cycle)simos-goodson CPU,,, 1 2 CPU,SPEC CPU2000 test, 10, 2 3 IPC 1 ICT-Goodson SimOS-Goodson 10%, SimOS-Goodson ; Sim-Goodson,SimOS-Goodson IPC Cache LMbench, ( ) 2 300MHZ 2 SimOS-Goodson 15%, SimOS-Goodson
7 : CPU SimOS-Goodson Table 1 Validating with SPEC CPU SPEC CPU 2000 Prog (SPECint) ICT-Goodson Sim-Goodson SimOS-Goodson IPC-Err Mcf Parser Perlbmk Crafty Twolf Vortex Vpr Gap Gcc Gzip Eon Applu Apsi Art Swim Equake Mesa Wupwise Sixtrack Table 2 Validating with LMbench 2 LMbench Application Real SimOS-Goodson Err Description Syscall null Simple syscall Syscall read Read syscall Pipe Interprocess communication through pipes Sig catch Signal handler Unix Unix stream socket Fifo Pipe operation Ctx Context switching, IPS(instructions per second) CPS(cycles per second) 4 3 Linux SPEC CPU 2000 SPLASH2 [9] Apache Benchmark,,SPLASH2 3 SimOS-Goodson Table 3 Simlating speed of SimOS-Goodson 3 SimOS-Goodson Core count Workload Cycles (K) Instructions (K) Time (s) CPS (K) IPS (K) 1 Linux OS boot Spec2k (MESA) Splash2 (radix) Apache benchmark Linux OS boot Splash2 (radix) Spec2k (MESA+MESA) Apache benchmark Linux OS boot Splash2 (radix) Apache benchmark
8 1054 Journal of Software Vol18, No4, April 2007 Linux 506K CPS IPC IPC,,SimOS-Goodson 300K IPS, 405K IPS 4 Linux Linux cpu_idle Cache 1, IPC 2 CPU,ICT-Goodson 50K / ;Sim-Goodson 500K / ;SimOS-Goodson 300K / 400K / ICT-Goodson,,, ;Sim-Goodson, ;SimOS-Goodson, Sim-Goodson, 5 DEC IBM SimOS Alpha PowerPC SimOS-Alpha SimOS-PPC SimOS-Goodson SimOS,, ;, GEMS(general execution model simulator) [10] Wisconsin, Simics [11] Simics SimOS-Goodson, GEMS Simics, [2] 120K IPS,GEMS (flat), SimOS-Goodson,,SimOS-Goodson 300 K IPS,SimOS-Goodson, 6 SimOS SimOS-Goodson,, 3 SPEC CPU 2000 LMbench, SimOS-Goodson 15% SimOS-Goodson, 2 CPU Linux,SimOS-Goodson 30%, 300K / SimOS-Goodson, SimOS-Goodson Cache,
9 : CPU SimOS-Goodson 1055 SimOS-Goodson, References: [1] Burger DC, Austin TM The simplescalar tool set, version 20 Technical Report, CS-TR , Madison: University of Wisconsin, 1997 [2] Mauer CJ, Hill MD, Wood DA Full system timing-first simulation In: Proc of the 2002 ACM Sigmetrics Conf on Measurement and Modeling of Computer Systems ACM Press, [3] Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J, Heinrich M FLASH vs (simulated) FLASH: Closing the simulation loop In: Proc of the 9th Int l Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS) IEEE Computer Society, [4] Rosenblum M, Herrod SA, Witchel E, Gupta A Complete computer system simulation: The SimOS approach IEEE Parallel and Distributed Technology: Systems and Applications, 1995,3(4):34 43 [5] Hu WW, Zhang FX, Li ZS Microarchitecture of the Goodson-2 processor Journal of Computer Science and Technology, 2005, 20(2): [6] Martin MMK, Sorin DJ, Cain HW, Hill MD, Lipasti MH Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing In: Proc of the 34th Int l Symp on Microarchitecture IEEE Computer Society, [7] Henning J SPEC CPU2000: Measuring CPU performance in the new millennium IEEE Computer, 2000,33(7):28 35 [8] McVoy L, Staelin C LMbench: Portable tools for performance analysis In: USENIX Annual Technical Conf San Diego, [9] Woo SC, Ohara M, Torrie E, Singh JP, Gupta A The SPLASH-2 programs: Characterization and methodological considerations In: Proc of the 22nd Annual Int l Symp on Computer Architecture IEEE Computer Society, [10] Martin MM, Sorin DJ, Beckmann BM, Marty MR, Xu M, Alameldeen AR, Moore KE, Hill MD, Wood DA Multifacet s general execution-driven multiprocessor simulator (GEMS) toolset, submitted to computer architecture news (CAN) wiscedu/gems [11] Magnusson PS, Cbristensson M, Eskilson J, Forsgren D, Hällberg G, Högberg J, Larsson F, Moestedt A, Werner B Simics: A full system simulation platform IEEE Computer, 2002,35(2):50 58 (1982 ),, (1974 ),, (1976 ),,,,CCF, (1968 ),,,VLSI (1981 ),,, (1966 ),,,CCF,
2. Futile Stall HTM HTM HTM. Transactional Memory: TM [1] TM. HTM exponential backoff. magic waiting HTM. futile stall. Hardware Transactional Memory:
1 1 1 1 1,a) 1 HTM 2 2 LogTM 72.2% 28.4% 1. Transactional Memory: TM [1] TM Hardware Transactional Memory: 1 Nagoya Institute of Technology, Nagoya, Aichi, 466-8555, Japan a) tsumura@nitech.ac.jp HTM HTM
More informationLow-Complexity Reorder Buffer Architecture*
Low-Complexity Reorder Buffer Architecture* Gurhan Kucuk, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower
More informationImpact of Cache Coherence Protocols on the Processing of Network Traffic
Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background
More informationFull-System Timing-First Simulation
Full-System Timing-First Simulation Carl J. Mauer Mark D. Hill and David A. Wood Computer Sciences Department University of Wisconsin Madison The Problem Design of future computer systems uses simulation
More informationAccuracy Enhancement by Selective Use of Branch History in Embedded Processor
Accuracy Enhancement by Selective Use of Branch History in Embedded Processor Jong Wook Kwak 1, Seong Tae Jhang 2, and Chu Shik Jhon 1 1 Department of Electrical Engineering and Computer Science, Seoul
More informationExploiting Streams in Instruction and Data Address Trace Compression
Exploiting Streams in Instruction and Data Address Trace Compression Aleksandar Milenkovi, Milena Milenkovi Electrical and Computer Engineering Dept., The University of Alabama in Huntsville Email: {milenka
More informationJosé F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2
CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More
More informationMEMORY ORDERING: A VALUE-BASED APPROACH
MEMORY ORDERING: A VALUE-BASED APPROACH VALUE-BASED REPLAY ELIMINATES THE NEED FOR CONTENT-ADDRESSABLE MEMORIES IN THE LOAD QUEUE, REMOVING ONE BARRIER TO SCALABLE OUT- OF-ORDER INSTRUCTION WINDOWS. INSTEAD,
More informationSimulating Server Consolidation
421 A Coruña, 16-18 de septiembre de 2009 Simulating Server Consolidation Antonio García-Guirado, Ricardo Fernández-Pascual, José M. García 1 Abstract Recently, virtualization has become a hot topic in
More informationA Shared-Variable-Based Synchronization Approach to Efficient Cache Coherence Simulation for Multi-Core Systems
A Shared-Variable-Based Synchronization Approach to Efficient Cache Coherence Simulation for Multi-Core Systems Cheng-Yang Fu, Meng-Huan Wu, and Ren-Song Tsay Department of Computer Science National Tsing
More informationDecoupled Zero-Compressed Memory
Decoupled Zero-Compressed Julien Dusser julien.dusser@inria.fr André Seznec andre.seznec@inria.fr Centre de recherche INRIA Rennes Bretagne Atlantique Campus de Beaulieu, 3542 Rennes Cedex, France Abstract
More informationA SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC
A SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC Wendong Wang, Tony Tuo, Kevin Lo, Dongchen Ren, Gary Hau, Jun zhang, Dong Huang {wendong.wang,
More informationCache Optimization by Fully-Replacement Policy
American Journal of Embedded Systems and Applications 2016; 4(1): 7-14 http://www.sciencepublishinggroup.com/j/ajesa doi: 10.11648/j.ajesa.20160401.12 ISSN: 2376-6069 (Print); ISSN: 2376-6085 (Online)
More informationConfigurable Virtual Platform Environment Using SID Simulator and Eclipse*
Configurable Virtual Platform Environment Using SID Simulator and Eclipse* Hadipurnawan Satria, Baatarbileg Altangerel, Jin Baek Kwon, and Jeongbae Lee Department of Computer Science, Sun Moon University
More informationECE404 Term Project Sentinel Thread
ECE404 Term Project Sentinel Thread Alok Garg Department of Electrical and Computer Engineering, University of Rochester 1 Introduction Performance degrading events like branch mispredictions and cache
More informationWhich is the best? Measuring & Improving Performance (if planes were computers...) An architecture example
1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles
More informationSpeculative Multithreaded Processors
Guri Sohi and Amir Roth Computer Sciences Department University of Wisconsin-Madison utline Trends and their implications Workloads for future processors Program parallelization and speculative threads
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More information15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 10: Runahead and MLP Prof. Onur Mutlu Carnegie Mellon University Last Time Issues in Out-of-order execution Buffer decoupling Register alias tables Physical
More informationFrequent Value Compression in Packet-based NoC Architectures
Frequent Value Compression in Packet-based NoC Architectures Ping Zhou, BoZhao, YuDu, YiXu, Youtao Zhang, Jun Yang, Li Zhao ECE Department CS Department University of Pittsburgh University of Pittsburgh
More informationLeveraging Bloom Filters for Smart Search Within NUCA Caches
Leveraging Bloom Filters for Smart Search Within NUCA Caches Robert Ricci, Steve Barrus, Dan Gebhardt, and Rajeev Balasubramonian School of Computing, University of Utah {ricci,sbarrus,gebhardt,rajeev}@cs.utah.edu
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationChip-Multithreading Systems Need A New Operating Systems Scheduler
Chip-Multithreading Systems Need A New Operating Systems Scheduler Alexandra Fedorova Christopher Small Daniel Nussbaum Margo Seltzer Harvard University, Sun Microsystems Sun Microsystems Sun Microsystems
More informationExecution-based Prediction Using Speculative Slices
Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers
More informationCSE 502 Graduate Computer Architecture. Lec 11 Simultaneous Multithreading
CSE 502 Graduate Computer Architecture Lec 11 Simultaneous Multithreading Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,
More informationHIVE: Fault Containment for Shared-Memory Multiprocessors J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta
HIVE: Fault Containment for Shared-Memory Multiprocessors J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta CSE 598C Presented by: Sandra Rueda The Problem O.S. for managing FLASH architecture
More informationData Access History Cache and Associated Data Prefetching Mechanisms
Data Access History Cache and Associated Data Prefetching Mechanisms Yong Chen 1 chenyon1@iit.edu Surendra Byna 1 sbyna@iit.edu Xian-He Sun 1, 2 sun@iit.edu 1 Department of Computer Science, Illinois Institute
More informationSoftware-assisted Cache Mechanisms for Embedded Systems. Prabhat Jain
Software-assisted Cache Mechanisms for Embedded Systems by Prabhat Jain Bachelor of Engineering in Computer Engineering Devi Ahilya University, 1986 Master of Technology in Computer and Information Technology
More informationPower-Efficient Spilling Techniques for Chip Multiprocessors
Power-Efficient Spilling Techniques for Chip Multiprocessors Enric Herrero 1,JoséGonzález 2, and Ramon Canal 1 1 Dept. d Arquitectura de Computadors Universitat Politècnica de Catalunya {eherrero,rcanal}@ac.upc.edu
More informationComputer System. Performance
Computer System Performance Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationSIMFLEX: A Fast, Accurate, Flexible Full-System Simulation Framework for Performance Evaluation of Server Architecture
ACM SIGMETRICS Performance Evaluation Review (PER) Special Issue on Tools for Computer Architecture Research, Volume 31, Number 4, pages 31-35, March 2004 SIMFLEX: A Fast, Accurate, Flexible Full-System
More informationImproved Estimation for Software Multiplexing of Performance Counters
Improved Estimation for Software Multiplexing of Performance Counters Wiplove Mathur Texas Instruments, Inc. San Diego, CA 92121 wiplove@ti.com Jeanine Cook New Mexico State University Las Cruces, NM 88003
More informationDOM: A Data-dependency-Oriented Modeling Approach for Efficient Simulation of OS Preemptive Scheduling
DOM: A Data-dependency-Oriented Modeling Approach for Efficient Simulation of OS Preempte Scheduling Peng-Chih Wang, Meng-Huan Wu, and Ren-Song Tsay Department of Computer Science National Tsing Hua Unersity
More informationWorkloads, Scalability and QoS Considerations in CMP Platforms
Workloads, Scalability and QoS Considerations in CMP Platforms Presenter Don Newell Sr. Principal Engineer Intel Corporation 2007 Intel Corporation Agenda Trends and research context Evolving Workload
More informationPerformance Validation of Network-Intensive Workloads on a Full-System Simulator
Performance Validation of Network-Intensive Workloads on a Full-System Ali G. Saidi Nathan L. Binkert Lisa R. Hsu Steven K. Reinhardt {saidi,binkertn,hsul,stever}@umich.edu Advanced Computer Architecture
More informationImproving Data Cache Performance via Address Correlation: An Upper Bound Study
Improving Data Cache Performance via Address Correlation: An Upper Bound Study Peng-fei Chuang 1, Resit Sendag 2, and David J. Lilja 1 1 Department of Electrical and Computer Engineering Minnesota Supercomputing
More informationAn Analysis of the Amount of Global Level Redundant Computation in the SPEC 95 and SPEC 2000 Benchmarks
An Analysis of the Amount of Global Level Redundant Computation in the SPEC 95 and SPEC 2000 s Joshua J. Yi and David J. Lilja Department of Electrical and Computer Engineering Minnesota Supercomputing
More informationCycle accurate transaction-driven simulation with multiple processor simulators
Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea
More informationCATS: Cycle Accurate Transaction-driven Simulation with Multiple Processor Simulators
CATS: Cycle Accurate Transaction-driven Simulation with Multiple Processor Simulators Dohyung Kim Soonhoi Ha Rajesh Gupta Department of Computer Science and Engineering School of Computer Science and Engineering
More informationCOMPUTER SYSTEM EVALUATION WITH COMMERCIAL WORKLOADS
COMPUTER SYSTEM EVALUATION WITH COMMERCIAL WORKLOADS JIM NILSSON, FREDRIK DAHLGREN, MAGNUS KARLSSON, PETER MAGNUSSON y, and PER STENSTRÖM fj,dahlgren,karlsson,persg@ce.chalmers.se Department of Computer
More informationMultifacet s General Execution-driven Multiprocessor Simulator (GEMS) Toolset
Multifacet s General Execution-driven Multiprocessor Simulator (GEMS) Toolset Milo M. K. Martin 1, Daniel J. Sorin 2, Bradford M. Beckmann 3, Michael R. Marty 3, Min Xu 4, Alaa R. Alameldeen 3, Kevin E.
More informationUsing a Serial Cache for. Energy Efficient Instruction Fetching
Using a Serial Cache for Energy Efficient Instruction Fetching Glenn Reinman y Brad Calder z y Computer Science Department, University of California, Los Angeles z Department of Computer Science and Engineering,
More informationDOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )
ISSN 1000-9825, CODEN RUXUEW E-mail jos@iscasaccn Journal of Software, Vol17, No5, May 2006, pp1195 1203 http//wwwjosorgcn DOI 101360/jos171195 Tel/Fax +86-10-62562563 2006 by Journal of Software All rights
More informationIntegrated CPU and Cache Power Management in Multiple Clock Domain Processors
Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC
More informationReducing Latencies of Pipelined Cache Accesses Through Set Prediction
Reducing Latencies of Pipelined Cache Accesses Through Set Prediction Aneesh Aggarwal Electrical and Computer Engineering Binghamton University Binghamton, NY 1392 aneesh@binghamton.edu Abstract With the
More informationReducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research
Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of
More informationVisualizing Page Replacement Techniques based on Page Frequency and Memory Access Pattern
Visualizing Page Replacement Techniques based on Page Frequency and Memory Access Pattern Ruchin Gupta, Narendra Teotia Information Technology, Ajay Kumar Garg Engineering College Ghaziabad, India Abstract
More informationDataflow Dominance: A Definition and Characterization
Dataflow Dominance: A Definition and Characterization Matt Ramsay University of Wisconsin-Madison Department of Electrical and Computer Engineering 1415 Engineering Drive Madison, WI 53706 Submitted in
More informationDOI: /jos Tel/Fax: by Journal of Software. All rights reserved. , )
ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscasaccn Journal of Software, Vol17, No2, February 2006, pp315 324 http://wwwjosorgcn DOI: 101360/jos170315 Tel/Fax: +86-10-62562563 2006 by Journal of Software
More informationHigh Performance Memory Requests Scheduling Technique for Multicore Processors
High Performance Memory Requests Scheduling Technique for Multicore Processors Walid El-Reedy Electronics and Comm. Engineering Cairo University, Cairo, Egypt walid.elreedy@gmail.com Ali A. El-Moursy Electrical
More informationBetter than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring
Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Lei Jin Sangyeun Cho Department of Computer Science University of Pittsburgh jinlei,cho@cs.pitt.edu Abstract Private
More informationInstruction Based Memory Distance Analysis and its Application to Optimization
Instruction Based Memory Distance Analysis and its Application to Optimization Changpeng Fang cfang@mtu.edu Steve Carr carr@mtu.edu Soner Önder soner@mtu.edu Department of Computer Science Michigan Technological
More informationBank-aware Dynamic Cache Partitioning for Multicore Architectures
Bank-aware Dynamic Cache Partitioning for Multicore Architectures Dimitris Kaseridis, Jeffrey Stuecheli and Lizy K. John Department of Electrical and Computer Engineering, The University of Texas at Austin,
More informationExploring Wakeup-Free Instruction Scheduling
Exploring Wakeup-Free Instruction Scheduling Jie S. Hu, N. Vijaykrishnan, and Mary Jane Irwin Microsystems Design Lab The Pennsylvania State University Outline Motivation Case study: Cyclone Towards high-performance
More informationScalable Directory Organization for Tiled CMP Architectures
Scalable Directory Organization for Tiled CMP Architectures Alberto Ros, Manuel E. Acacio, José M. García Departamento de Ingeniería y Tecnología de Computadores Universidad de Murcia {a.ros,meacacio,jmgarcia}@ditec.um.es
More informationMicroarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors. Moinuddin K. Qureshi Onur Mutlu Yale N.
Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors Moinuddin K. Qureshi Onur Mutlu Yale N. Patt High Performance Systems Group Department of Electrical
More informationShengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota
Loop Selection for Thread-Level Speculation, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Chip Multiprocessors (CMPs)
More informationReducing Reorder Buffer Complexity Through Selective Operand Caching
Appears in the Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2003 Reducing Reorder Buffer Complexity Through Selective Operand Caching Gurhan Kucuk Dmitry Ponomarev
More informationOfficial Agenda Review the most commonly used tools in CMP µarch research. Unofficial Agenda Convince you to use Simics
Simics and Friends Modeling Tools for CMP Research Zvika Guz, Isask har (Zigi Zigi) Walter The Technion Israel Institute of Technology Official Agenda Agenda Review the most commonly used tools in CMP
More informationKeywords Cache memory, replacement policy, performance evaluation.
Performance Evaluation of Cache Replacement Policies for the SPEC CPU Benchmark Suite Hussein Al-Zoubi, Aleksandar Milenkovic, Milena Milenkovic Department of Electrical and Computer Engineering The University
More informationCache Insertion Policies to Reduce Bus Traffic and Cache Conflicts
Cache Insertion Policies to Reduce Bus Traffic and Cache Conflicts Yoav Etsion Dror G. Feitelson School of Computer Science and Engineering The Hebrew University of Jerusalem 14 Jerusalem, Israel Abstract
More informationSaving Register-File Leakage Power by Monitoring Instruction Sequence in ROB
Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Wann-Yun Shieh Department of Computer Science and Information Engineering Chang Gung University Tao-Yuan, Taiwan Hsin-Dar Chen
More informationA Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors
, July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As
More informationWisconsin Computer Architecture. Nam Sung Kim
Wisconsin Computer Architecture Mark Hill Nam Sung Kim Mikko Lipasti Karu Sankaralingam Guri Sohi David Wood Technology & Moore s Law 35nm Transistor 1947 Moore s Law 1964: Integrated Circuit 1958 Transistor
More informationInstruction Recirculation: Eliminating Counting Logic in Wakeup-Free Schedulers
Instruction Recirculation: Eliminating Counting Logic in Wakeup-Free Schedulers Joseph J. Sharkey, Dmitry V. Ponomarev Department of Computer Science State University of New York Binghamton, NY 13902 USA
More informationPorting GCC to the AMD64 architecture p.1/20
Porting GCC to the AMD64 architecture Jan Hubička hubicka@suse.cz SuSE ČR s. r. o. Porting GCC to the AMD64 architecture p.1/20 AMD64 architecture New architecture used by 8th generation CPUs by AMD AMD
More informationMODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E.
MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR BY AMOL SHAMKANT PANDIT, B.E. A thesis submitted to the Graduate School in partial fulfillment of the requirements
More informationEvaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000
Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Mitesh R. Meswani and Patricia J. Teller Department of Computer Science, University
More informationThat all computers can simulate each other
COVER FEATURE Simics: A Full System Simulation Platform A full system simulator attempts to strike a balance between accuracy and performance by modeling the complete final application and providing a
More informationLinux/SimOS - A Simulation Environment for Evaluating High-speed Communication Systems 1
Linux/SimOS - A Simulation Environment for Evaluating High-speed Communication Systems 1 Chulho Won, Ben Lee, Chansu Yu *, Sangman Moh, Yong-Youn Kim, Kyoung Park Electrical and Computer Engineering Department
More informationReducing Overhead for Soft Error Coverage in High Availability Systems
Reducing Overhead for Soft Error Coverage in High Availability Systems Nidhi Aggarwal, Norman P. Jouppi, James E. Smith Parthasarathy Ranganathan and Kewal K. Saluja Abstract High reliability/availability
More informationBoost Sequential Program Performance Using A Virtual Large. Instruction Window on Chip Multicore Processor
Boost Sequential Program Performance Using A Virtual Large Instruction Window on Chip Multicore Processor Liqiang He Inner Mongolia University Huhhot, Inner Mongolia 010021 P.R.China liqiang@imu.edu.cn
More informationCMP Memory Modeling: How Much Does Accuracy Matter?
CMP Modeling: How Much Does Accuracy Matter? Sadagopan Srinivasan Li Zhao Brinda Ganesh Bruce Jacob* Mike Espig Ravi Iyer Systems Technology Lab, Intel Corporation * University of Maryland, College Park
More informationComputer Science 246. Computer Architecture
Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Performance Metrics Averaging Amdahl s Law Benchmarks The CPU Performance Equation Optimal
More informationDecoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor
Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Computer Systems Laboratory Stanford University Motivation Dynamic analysis help
More informationAries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX
Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Keerthi Bhushan Rajesh K Chaurasia Hewlett-Packard India Software Operations 29, Cunningham Road Bangalore 560 052 India +91-80-2251554
More informationOn the Predictability of Program Behavior Using Different Input Data Sets
On the Predictability of Program Behavior Using Different Input Data Sets Wei Chung Hsu, Howard Chen, Pen Chung Yew Department of Computer Science University of Minnesota Abstract Smaller input data sets
More informationCharacterizing the d-tlb Behavior of SPEC CPU2000 Benchmarks
Characterizing the d-tlb Behavior of SPEC CPU2 Benchmarks Gokul B. Kandiraju Anand Sivasubramaniam Dept. of Computer Science and Engineering The Pennsylvania State University University Park, PA 682. fkandiraj,anandg@cse.psu.edu
More informationOutline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved
C5 Micro-Kernel: Real-Time Services for Embedded and Linux Systems Copyright 2003- Jaluna SA. All rights reserved. JL/TR-03-31.0.1 1 Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview
More informationParallel Computing 38 (2012) Contents lists available at SciVerse ScienceDirect. Parallel Computing
Parallel Computing 38 (2012) 533 551 Contents lists available at SciVerse ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco Algorithm-level Feedback-controlled Adaptive data
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!
More informationRelative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review
Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Bijay K.Paikaray Debabala Swain Dept. of CSE, CUTM Dept. of CSE, CUTM Bhubaneswer, India Bhubaneswer, India
More informationPrecise Instruction Scheduling
Journal of Instruction-Level Parallelism 7 (2005) 1-29 Submitted 10/2004; published 04/2005 Precise Instruction Scheduling Gokhan Memik Department of Electrical and Computer Engineering Northwestern University
More informationSimOS PPC. Full System Simulation of PowerPC Architecture. Tom Keller Austin Research Lab IBM Research Division
SimOS PPC Full System Simulation of PowerPC Architecture Tom Keller IBM Research Division tkeller@us.ibm.com This Talk:(Inside IBM) (Outside IBM) http://simos.austin.ibm.com http://www.ece.utexas.edu/projects/ece/lca/events
More informationNon-Uniform Instruction Scheduling
Non-Uniform Instruction Scheduling Joseph J. Sharkey, Dmitry V. Ponomarev Department of Computer Science State University of New York Binghamton, NY 13902 USA {jsharke, dima}@cs.binghamton.edu Abstract.
More informationMiss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors
Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors Dan Wallin and Erik Hagersten Uppsala University Department of Information Technology P.O. Box 337, SE-751 05 Uppsala, Sweden
More informationSaving Register-File Leakage Power by Monitoring Instruction Sequence in ROB
Saving Register-File Leakage Power by Monitoring Instruction Sequence in ROB Wann-Yun Shieh * and Hsin-Dar Chen Department of Computer Science and Information Engineering Chang Gung University, Taiwan
More informationAnalyzing and Improving Clustering Based Sampling for Microprocessor Simulation
Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation Yue Luo, Ajay Joshi, Aashish Phansalkar, Lizy John, and Joydeep Ghosh Department of Electrical and Computer Engineering University
More informationPerformance Prediction using Program Similarity
Performance Prediction using Program Similarity Aashish Phansalkar Lizy K. John {aashish, ljohn}@ece.utexas.edu University of Texas at Austin Abstract - Modern computer applications are developed at a
More informationEnergy Efficient Asymmetrically Ported Register Files
Energy Efficient Asymmetrically Ported Register Files Aneesh Aggarwal ECE Department University of Maryland College Park, MD 20742 aneesh@eng.umd.edu Manoj Franklin ECE Department and UMIACS University
More informationFuture Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors Ilya Ganusov and Martin Burtscher Computer Systems Laboratory Cornell University {ilya, burtscher}@csl.cornell.edu Abstract This
More informationSpeculative Multithreaded Processors
Guri Sohi and Amir Roth Computer Sciences Department University of Wisconsin-Madison utline Trends and their implications Workloads for future processors Program parallelization and speculative threads
More informationPipeline Muffling and A Priori Current Ramping: Architectural Techniques to Reduce High-Frequency Inductive Noise
Pipeline Muffling and A Priori Current Ramping: Architectural Techniques to Reduce High-Frequency Inductive Noise ABSTRACT While circuit and package designers have addressed microprocessor inductive noise
More informationBuilding and Using the ATLAS Transactional Memory System
Building and Using the ATLAS Transactional Memory System Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford
More informationPost-Silicon Verification for Cache Coherence
Post-Silicon Verification for Cache Coherence Andrew DeOrio, Adam Bauserman and Valeria Bertacco Dept. of Electrical Engineering and Computer Science University of Michigan, Ann Arbor {awdeorio, adambb,
More informationImproving Instruction Delivery with a Block-Aware ISA
530 Improving Instruction Delivery with a Block-Aware ISA Ahmad Zmily, Earl Killian, and Christos Kozyrakis Electrical Engineering Department Stanford University {zmily,killian,kozyraki}@stanford.edu Abstract.
More informationInserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1
I Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL Inserting Prefetches IA-32 Execution Layer - 1 Agenda IA-32EL Brief Overview Prefetching in Loops IA-32EL Prefetching in
More informationEvaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs
Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs Alberto Ros 1 and Manuel E. Acacio 2 1 Dpto. de Informática de Sistemas y Computadores Universidad Politécnica de Valencia,
More informationPerformance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor
Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Sarah Bird ϕ, Aashish Phansalkar ϕ, Lizy K. John ϕ, Alex Mericas α and Rajeev Indukuru α ϕ University
More informationMicroarchitectural Design Space Exploration Using An Architecture-Centric Approach
Microarchitectural Design Space Exploration Using An Architecture-Centric Approach Christophe Dubach, Timothy M. Jones and Michael F.P. O Boyle Member of HiPEAC School of Informatics University of Edinburgh
More information